19:01:09 #startmeeting infra
19:01:10 Meeting started Tue May 7 19:01:09 2019 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:11 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:13 The meeting name has been set to 'infra'
19:01:28 o/
19:01:46 #link http://lists.openstack.org/pipermail/openstack-infra/2019-May/006360.html
19:01:55 #topic Announcements
19:02:47 I've got a really weird week coming up between recovering from being away from home for 9 days, my wife having minor surgery on Wednesday, and somewhere in there I'm visiting the dentist too. Mostly just a heads up that I may not be around during normal hours
19:03:00 I get a feeling that for many of us that were in denver this is also true
19:03:47 this is true for me, though the particulars are different
19:04:04 yeah
19:04:17 best to assume it's a fairly weird week for everyone
19:04:28 and i'll be entirely afk thursday through monday
19:04:38 for vacation
19:05:08 oh also I've got a shower that leaks
19:05:31 though I think I'm just gonna have a plumber deal with that so I can do the other stuff
19:05:37 #topic Actions from last meeting
19:05:44 #link http://eavesdrop.openstack.org/meetings/infra/2019/infra.2019-04-23-19.01.txt minutes from last meeting
19:05:58 It has been a while since we had a meeting, and we were all still recovering from the opendev migration at that one
19:06:28 May as well jump ahead straight into the fun stuff I think
19:06:33 #topic Priority Efforts
19:06:40 #topic OpenDev
19:07:07 At the PTG fungi, corvus, mordred, ttx and myself were able to have some discussions around the high level needs for OpenDev
19:07:27 #link https://etherpad.openstack.org/p/2019-denver-ptg-infra-planning PTG Discussion notes
19:07:54 it was good
19:07:55 That has some notes on the thoughts for this. Basically we would like to have namespace owners and delegate responsibility for what goes into a namespace to them
19:08:02 clarkb: lines 59-69?
19:08:07 yup
19:08:23 We were thinking that we would create a meta config repo per namespace to track and manage that
19:08:52 can folks not at the discuss follow lines 59-69? or do we need to write that up in more words (as a ML post or spec?)
19:08:59 not at the *discussion*
19:09:17 I half expect we'll need to write that up as a spec
19:10:16 but I agree, it would be good to get input from folks that weren't able to be at the PTG.
19:10:23 yeah. I imagine it will be basically what's in those lines with more conjunctions and stuff
19:10:35 I'm happy to write up a recap email using more words too (which may help me remember things better as well)
19:10:37 maybe more pronouns and adverbs thrown in
19:11:32 Other opendev related items: We need to fix our gitea deployment so that we can upgrade gitea to 1.8.0 + corvus' patch
19:11:47 Deleting the old cgit servers is on my todo list as well
19:11:55 ++
19:12:08 the update to the test jobs is getting *closer*
19:12:41 ya, that's the next topic of discussion. maybe we should jump ahead, and I'll take on the todo of writing a recap email in the near future (to add more words to our notes on the etherpad)
19:12:44 the hangup on gitea is https://review.opendev.org/655522
19:13:04 tests for that change caught a legit behavior change which needs to be investigated
19:13:28 #link https://review.opendev.org/655522 Upgrades gitea to 1.8.0 but our tests discovered a legit behavior change that needs investigating
19:13:54 yeah. we're also making updates to that test job in https://review.opendev.org/#/c/656871/ - which is orthogonal to the behavior change
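[Note, added for illustration and not part of the original log: the "meta config repo per namespace" idea discussed above (etherpad lines 59-69) was not specified in any detail in the meeting. Purely as a guess at the shape of such a thing, a per-namespace metadata file might carry content along these lines; every file name, key, and value below is a hypothetical example, not anything the team agreed on.]

    # namespaces/zuul/metadata.yaml (hypothetical path and layout)
    namespace: zuul
    # people empowered to decide what belongs in this namespace
    owners:
      - corvus
    # repos the namespace owners have approved for creation under zuul/
    repos:
      - zuul/zuul
      - zuul/nodepool
      - zuul/zuul-jobs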
19:13:57 #action clarkb Write PTG (and summit) recap email to capture our discussions there
19:14:18 #topic Update Configuration Management
19:14:41 Another major topic of discussion was how to get from where we are today with cron + ansible to zuul + ansible for deploying our systems
19:14:53 lines 46-57 on the etherpad
19:15:02 #link https://review.opendev.org/#/c/656871/ Split base playbook into many playbooks
19:15:16 656871 implements lines 47-50
19:15:31 it is currently unhappy making the root account in the gitea install for reasons I don't grok yet
19:15:33 This change is the result of those discussions. The general idea being that if we have finer grained playbooks we can have zuul run those when appropriate without needing to run the whole thing every time
19:15:40 ++
19:15:52 The step after that change will be to add zuul jobs to run those playbooks
19:16:04 that'll be important for infra-core folks to know about, since once we land that, we'll want to keep going in that direction and no longer overload the base playbook.
19:16:12 ++
19:16:53 it's very exciting progress
19:17:07 I think we expect to have a cron of some sort (perhaps zuul periodic pipeline driven) to ensure base level things like firewalls are up to date across the board
19:17:19 but have most service level stuff driven by specific zuul updates
19:18:15 Also on the config management side of the house we are ready to do our last set of puppet 4 updates. So I may bug people for reviews on those once I am properly back to being able to watch that
19:18:26 Once that is done we can clean up a bunch of our old puppet 3 jobs
19:18:37 which should hopefully make testing quicker and more reliable for our system changes
19:18:50 also implications that we'll be able to (eventually) drop the puppet apply and beaker jobs
19:19:04 once we're exercising them all via the ansible run jobs
19:19:21 yeah. because then puppet becomes an impl detail of the playbook for the service
19:19:49 And finally, I think our docker image jobs are pretty robust at this point, but we still do sometimes have jobs fail because they try to authenticate to dockerhub as if it were our local registry. I've got an agenda item for this towards the end of the meeting
19:20:18 #topic Storyboard
19:20:41 we discovered that we had improperly created some storyboard groups at the summit
19:20:47 fungi got those sorted out
19:21:12 my email inbox went crazy with storyboard story updates during the last week as well
19:21:17 I think SotK persia and diablo_rojo_phon were busy
19:21:30 was anyone better than me at following along on what changes were being made?
19:21:32 yes, they had a storysquash at the ptg
19:22:03 basically triaging and combining/splitting/closing stories for storyboard and storyboard-webclient
19:23:11 There was also a constructive feedback session at the forum
19:23:20 * clarkb finds a link to the etherpad for that
19:23:46 #link https://etherpad.openstack.org/p/storyboard-pain-points
19:24:33 #topic General Topics
19:24:47 #link https://etherpad.openstack.org/p/2019-denver-ptg-infra-planning Denver PTG 2019 notes
19:25:00 in case anyone missed it ^ is where the notes were taken during the PTG
19:25:13 I've also just taken an action to write up a recap email so keep an eye out on the -infra list for that
19:25:30 Is there anything we haven't already touched on that people want more details on? questions? concerns?
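[Note, added for illustration and not part of the original log: a minimal sketch of the direction discussed under "Update Configuration Management" above, assuming the general split described there: one finer-grained playbook per service, plus a Zuul job that runs only that playbook (still via bridge) when matching files change. All file, role, and job names below are assumptions, not what change 656871 actually contains.]

    # playbooks/service-gitea.yaml (hypothetical): one small playbook per service
    - hosts: gitea
      roles:
        - install-docker
        - gitea

    # In Zuul project config (separate file), a job could run just that
    # playbook instead of re-running the whole base playbook every time:
    - job:
        name: infra-prod-service-gitea
        description: Run playbooks/service-gitea.yaml from bridge
        run: playbooks/zuul/run-production-playbook.yaml
        vars:
          production_playbook: service-gitea.yaml
        files:
          - playbooks/service-gitea.yaml
          - playbooks/roles/gitea/.*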
19:26:39 does splitting up the base playbook like this mean the bridge mysql/ara work is basically not necessary?
19:27:11 we'll just essentially have the logs from post job runs instead?
19:27:20 ianw: we're still expecting the playbooks to run via bridge's ansible, so that could still be useful
19:27:27 (i think that was always the plan)
19:27:31 ianw: we will have those logs, but the way the current CD jobs are set up is to execute ansible-playbook on bridge
19:27:42 (so we won't get the data in the zuul ara)
19:27:56 ahh, right, that was your root login stuff
19:28:10 given that I think having ara for the bridge run ansible-playbook is still useful
19:28:10 which is probably good for avoiding information leaks anyway
19:28:41 ++
19:28:52 cool, ok thanks for clarifying
19:29:21 ianw: oh - speaking of - I pinged you in scrollback about the letsencrypt roles - but it turns out we weren't running one of them, so nothing to worry about :)
19:29:33 That is a good transition to the next topic :)
19:29:49 we have a buildset registry and a buildset proxy registry in our docker image build jobs
19:30:19 sometimes that buildset proxy registry fails (seems to happen more on ipv6 clouds that NAT to dockerhub) so my idea to fix that is to use the in region docker proxy
19:30:32 ++
19:30:35 those in region mirror nodes have a floating IP so they don't go through the many to one nat shared by the test nodes
19:30:52 However corvus pointed out that since we'll be publishing the resulting docker images we should get tls set up on those first
19:31:41 my plan is roughly to add all our in region mirrors to opendev.org DNS, add ssl vhosts to them, then run the LE stuff to get certs for the opendev.org names
19:31:53 Then with SNI I don't even need to change ports
19:32:13 that'll be nice
19:32:15 ++
19:32:23 As part of this we probably want to transition most/many things to https (and not just the dockerhub proxy)
19:33:03 I expect apt to present the biggest pain there but otherwise it should be a win all around
19:33:14 ianw: ^ any concerns with doing that given your LE work?
19:33:36 like should https://review.opendev.org/#/c/652801/ get in first?
19:34:09 #link https://review.opendev.org/#/c/652801/ Add handlers for LE certs updates for integration with existing services
19:34:18 infra-root ^ if you have time to review that it would be much appreciated
19:35:13 clarkb: yeah, i think it could be a bit asynchronous. we can deploy the keys first and make sure they're all there ok, then update the puppet to insert them into apache config
19:35:34 then, within at least 90 days i guess, we need something to figure out restarting apache when they renew :)
19:35:43 but probably having that before we start is a better idea
19:35:47 ya, would be good to have that sorted out before we have too many to retrofit
19:36:15 my idea, with 652801, would be that ansible would drop a flag file, which puppet would notice, remove, and restart apache
19:37:07 as we, someday, move the hosts to more ansible control, we should integrate more with listeners etc
19:37:34 yup
19:37:41 ianw: have you had a chance to see my comments on that change yet?
19:37:50 curious to see what you think about them (but overall I was happy with the change)
19:37:52 why have puppet do it and not ansible directly?
19:37:59 corvus: that was one of my questions :)
19:38:13 ah i see that now :)
19:38:21 i haven't cycled back, but will today. i'm happy to help setup the initial mirror certs
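[Note, added for illustration and not part of the original log: a minimal sketch of the flag-file idea described above for 652801. The role, task, path, and handler names here are assumptions for illustration, not what the change actually uses. The Ansible role notifies a handler after installing a renewed cert, the handler only touches a flag file, and puppet (which still manages apache on these hosts) notices the flag, removes it, and restarts the service; under fuller ansible control the handler could restart apache directly instead.]

    # roles/letsencrypt-install-cert/tasks/main.yaml (hypothetical)
    - name: Install renewed certificate
      copy:
        src: "{{ le_cert_src }}"
        dest: "/etc/ssl/{{ le_cert_name }}.crt"
      notify: letsencrypt cert updated

    # roles/letsencrypt-install-cert/handlers/main.yaml (hypothetical)
    - name: letsencrypt cert updated
      file:
        path: /var/run/letsencrypt-updated
        state: touch
    # puppet then watches for /var/run/letsencrypt-updated, removes it,
    # and restarts apache on its next run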
19:38:28 great thanks
19:38:42 (it doesn't bother me that ansible would be restarting a service it didn't set up)
19:38:52 The last item I had on the agenda this week was to bring up the trusty upgrades/cleanups
19:38:58 #link https://etherpad.openstack.org/p/201808-infra-server-upgrades-and-cleanup
19:39:06 will respond, but the idea was that ansible *can* do it, if we want -- but there might be things where it's more complex than "service foo restart" so i wanted to make sure it could be flexible
19:39:17 I'd like to delete the groups servers this week if I can find time to sit down and make sure I back them up and delete the right instances
19:39:37 If we can finish cleaning up this list we'll be able to remove old nodepool images and test jobs, which is nice to have on top of the EOL problem
19:39:52 fungi: ^ re groups, ashlee had thought you had already done a db backup and/or instance snapshot
19:40:00 fungi: do you know if any of that has been done yet?
19:40:01 yes, i did
19:40:17 cool so mostly just a matter of deleting the instances now?
19:40:23 there is a db snapshot and a server snapshot
19:40:26 I can add them to the list with the old cgit cluster
19:40:35 yes, should be safe to proceed
19:41:47 For those of you with your names on some of those remaining services, let me know if I can do anything to help
19:42:05 And with that I think we've hit the end of our agenda
19:42:11 #topic Open Discussion
19:42:39 We should set a date for straggler opendev renames to happen
19:42:54 then encourage everyone to update the rename request list for that to happen if they want a rename
19:43:30 maybe the end of the month (gives people time to get back from summit and realize they want a rename)
19:43:51 sounds good to me, i should be around after this weekend
19:44:09 one thing to update on is testinfra which i've spent a bit of time with. it's ansible 2.8 compatible now ... but not using ansible-runner unfortunately
19:44:09 my schedule returns to normal after may 22
19:44:33 ianw: I suppose as long as it works with newer ansible our use case is met
19:44:59 corvus: fungi: rgr, seems like the 31st might be a good day to pencil in then (thinking others are also in recovery mode and may not even notice yet that a name needs changing)
19:45:10 clarkb: sort of, i've sent in a fix for ipv6 addresses that broke us
19:45:18 I actually had one, but didn't add it to the meeting agenda. Where do we stand on getting nodepool / k8s integration? ~5 months ago we landed https://review.opendev.org/620755 for kube config for nodepool, but I kinda lost track of where things are. Given what we know about namespaces in k8s / nodepool / zuul, is that something someone could pick up again and try to, say, get a tox job on k8s? Or do we need
19:45:21 new features in zuul / nodepool?
19:45:23 ianw: fun
19:45:41 mordred: could you look at https://review.opendev.org/656908 which lets us install testinfra in tox siblings correctly
19:45:53 pabelanger: I believe everything is in place for that to work now. Just need to add jobs for it (but I'm sure that if you add jobs for it we'll discover other fixes are necessary)
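[Note, added for illustration and not part of the original log: for context on the nodepool / k8s question above, a rough sketch of what a nodepool Kubernetes provider definition looks like. The provider, context, and label names are assumptions for illustration; the real cluster credentials would be whatever was set up alongside 620755.]

    # nodepool.yaml fragment (illustrative names)
    labels:
      - name: kubernetes-namespace
        min-ready: 0
      - name: pod-fedora
        min-ready: 0

    providers:
      - name: example-k8s
        driver: kubernetes
        context: example-kube-context   # from the kube config on the launcher
        pools:
          - name: main
            labels:
              - name: kubernetes-namespace
                type: namespace
              - name: pod-fedora
                type: pod
                image: docker.io/fedora:29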
19:45:54 pabelanger: it's a thing I think we're definitely still poking at - but I don't think we're in a rush to run tox jobs in k8s because it'll ultimately be more expensive to do so
19:46:14 mordred: pabelanger: right, I think the idea was to start with jobs that exercise nodepool's support of k8s
19:46:19 yeah
19:46:22 not to shift a large workload to it
19:46:39 yeah, if someone wants to poke at that i say go for it
19:46:42 ++
19:46:49 okay, that is fine, I mostly wanted to say I am interested in it, and think I can offer some time to enable it. I just used tox as an example.
19:47:13 totes - and I think it would be great to have you poke at it
19:47:14 there is a chance that i screwed up the credentials for that and we might need to re-export them
19:47:22 so, expect some questions in coming days on what we'd like to do :)
19:47:27 \o/
19:47:48 cool, thanks!
19:48:03 * mordred hands pabelanger a turkey
19:50:59 Alright, anything else before we call it a meeting?
19:52:25 Sounds like no. Everyone enjoy their weird weeks and thank you! See you here next week.
19:52:28 #endmeeting