19:01:09 <clarkb> #startmeeting infra
19:01:10 <openstack> Meeting started Tue May  7 19:01:09 2019 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:11 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:13 <openstack> The meeting name has been set to 'infra'
19:01:28 <ianw> o/
19:01:46 <clarkb> #link http://lists.openstack.org/pipermail/openstack-infra/2019-May/006360.html
19:01:55 <clarkb> #topic Announcements
19:02:47 <clarkb> I've got a really weird week coming up between recovering from being away from home for 9 days, my wife having minor surgery on Wednesday, and a dentist visit somewhere in there. Mostly just a heads up that I may not be around during normal hours
19:03:00 <clarkb> I get the feeling that for many of us who were in denver this is also true
19:03:47 <corvus> this is true for me, though the particulars are different
19:04:04 <fungi> yeah
19:04:17 <fungi> best to assume it's a fairly weird week for everyone
19:04:28 <fungi> and i'll be entirely afk thursday through monday
19:04:38 <fungi> for vacation
19:05:08 <clarkb> oh also I've got a shower that leaks
19:05:31 <clarkb> though I think I'm just gonna have a plumber deal with that so I can do the other stuff
19:05:37 <clarkb> #topic Actions from last meeting
19:05:44 <clarkb> #link http://eavesdrop.openstack.org/meetings/infra/2019/infra.2019-04-23-19.01.txt minutes from last meeting
19:05:58 <clarkb> It has been a while since we had a meeting, and at that last one we were all still recovering from the OpenDev migration
19:06:28 <clarkb> May as well jump ahead straight into the fun stuff I think
19:06:33 <clarkb> #topic Priority Efforts
19:06:40 <clarkb> #topic OpenDev
19:07:07 <clarkb> At the PTG fungi, corvus, mordred, ttx and myself were able to have some discussions around the high level needs for OpenDev
19:07:27 <clarkb> #link https://etherpad.openstack.org/p/2019-denver-ptg-infra-planning PTG Discussion notes
19:07:54 <mordred> it was good
19:07:55 <clarkb> That has some notes on the thoughts for this. Basically we would like to have namespace owners and delegate responsibility for what goes into a namespace to them
19:08:02 <corvus> clarkb: lines 59-69?
19:08:07 <clarkb> yup
19:08:23 <clarkb> We were thinking that we would create a meta config repo per namespace to track and manage that
19:08:52 <corvus> can folks not at the discussion follow lines 59-69?  or do we need to write that up in more words (as a ML post or spec?)
19:09:17 <clarkb> I half expect we'll need to write that up as a spec
19:10:16 <clarkb> but I agree, would be good to get input from folks that weren't able to be at the PTG.
19:10:23 <mordred> yeah. I imagine it will be basically what's in those lines with more conjunctions and stuff
19:10:35 <clarkb> I'm happy to write up a recap email using more words too (which may help me remember things better as well)
19:10:37 <mordred> maybe more pronouns and adverbs thrown in
19:11:32 <clarkb> Other opendev related items: We need to fix our gitea deployment so that we can upgrade gitea to 1.8.0 + corvus' patch
19:11:47 <clarkb> Deleting the old cgit servers is on my todo list as well
19:11:55 <mordred> ++
19:12:08 <mordred> the update to the test jobs is getting *closer*
19:12:41 <clarkb> ya that's the next topic of discussion, maybe we should jump ahead and I'll take on the todo of writing a recap email in the near future (to add more words to our notes on the etherpad)
19:12:44 <corvus> the hangup on gitea is https://review.opendev.org/655522
19:13:04 <corvus> tests for that change caught a legit behavior change which needs to be investigated
19:13:28 <clarkb> #link https://review.opendev.org/655522 Upgrades gitea to 1.8.0 but our tests discovered a legit behavior change that needs investigating
19:13:54 <mordred> yeah. we're also making updates to that test job in https://review.opendev.org/#/c/656871/ - which is orthogonal to the behavior change
19:13:57 <clarkb> #action clarkb Write PTG (and summit) recap email to capture our discussions there
19:14:18 <clarkb> #topic Update Configuration Management
19:14:41 <clarkb> Another major topic of discussion was how to get from where we are today with cron + ansible to zuul + ansible for deploying our systems
19:14:53 <corvus> lines 46-57 on the etherpad
19:15:02 <clarkb> #link https://review.opendev.org/#/c/656871/ Split base playbook into many playbooks
19:15:16 <corvus> 656871 implements lines 47-50
19:15:31 <mordred> it is currently unhappy making the root account in the gitea install for reasons I don't grok yet
19:15:33 <clarkb> This change is the result of those discussions. The general idea being that if we have finer grained playbooks we can have zuul run those when appropriate without needing to run the whole thing every time
19:15:40 <mordred> ++
19:15:52 <clarkb> The step after that change will be to add zuul jobs to run those playbooks
19:16:04 <corvus> that'll be important for infra-core folks to know about, since once we land that, we'll want to keep going in that direction and no longer overload the base playbook.
19:16:12 <clarkb> ++
19:16:53 <mordred> it's very exciting progress
19:17:07 <clarkb> I think we expect to have a cron of some sort (perhaps zuul periodic pipeline driven) to ensure base level things like firewalls are up to date across the board
19:17:19 <clarkb> but have most service level stuff driven by specific zuul updates
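As an illustration of the direction described here — service playbooks triggered by Zuul only when their files change, plus a periodic run for base things like firewalls — a minimal sketch in Zuul job YAML might look like the following (job, playbook, and pipeline names are hypothetical, not the actual system-config layout):

    - job:
        name: infra-prod-service-gitea
        description: Run the gitea service playbook via ansible on bridge
        run: playbooks/zuul/run-service-gitea.yaml   # hypothetical wrapper playbook
        files:
          # only trigger when files relevant to this service change
          - playbooks/service-gitea.yaml
          - inventory/.*

    - project:
        deploy:            # assumes a pipeline that runs on merged config changes
          jobs:
            - infra-prod-service-gitea
        periodic:          # base things (firewall, users) still run on a timer
          jobs:
            - infra-prod-base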
19:18:15 <clarkb> Also on the config management side of the house we are ready to do our last set of puppet 4 updates. So I may bug people for reviews on those once I am properly back to being able to watch that
19:18:26 <clarkb> Once that is done we can cleanup a bunch of our old puppet 3 jobs
19:18:37 <clarkb> which should hopefully make testing quicker and more reliable for our system changes
19:18:50 <fungi> this also implies that we'll be able to (eventually) drop the puppet apply and beaker jobs
19:19:04 <fungi> once we're exercising them all via the ansible run jobs
19:19:21 <mordred> yeah. because then puppet becomes an impl detail of the playbook for the service
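To make that point concrete, once the playbooks are split per service, a host that is still puppeted would just get puppet applied as one step inside its own playbook; a rough sketch, with hypothetical role and group names:

    # hypothetical per-service playbook where puppet is just one role among others
    - hosts: review
      roles:
        - iptables          # base-ish roles can still be composed in
        - puppet            # runs puppet apply on the host; an implementation detail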
19:19:49 <clarkb> And finally, I think our docker image jobs are pretty robust at this point but we still do sometimes have jobs fail because they try to authenticate to dockerhub as if it were our local registry. I've got an agenda item for this towards the end of the meeting
19:20:18 <clarkb> #topic Storyboard
19:20:41 <clarkb> we discovered that we had improperly created some storyboard groups at the summit
19:20:47 <clarkb> fungi got those sorted out
19:21:12 <clarkb> my email inbox went crazy with storyboard story updates last week as well
19:21:17 <clarkb> I think SotK, persia, and diablo_rojo_phon were busy
19:21:30 <clarkb> was anyone better than me at following along on what changes were being made?
19:21:32 <fungi> yes, they had a storysquash at the ptg
19:22:03 <fungi> basically triaging and combining/splitting/closing stories for storyboard and storyboard-webclient
19:23:11 <clarkb> There was also a constructive feedback session at the forum
19:23:20 * clarkb finds a link to the etherpad for that
19:23:46 <clarkb> #link https://etherpad.openstack.org/p/storyboard-pain-points
19:24:33 <clarkb> #topic General Topics
19:24:47 <clarkb> #link https://etherpad.openstack.org/p/2019-denver-ptg-infra-planning Denver PTG 2019 notes
19:25:00 <clarkb> in case anyone missed it ^ is where the notes were taken during the PTG
19:25:13 <clarkb> I've also just taken an action to write up a recap email so keep an eye out on the -infra list for that
19:25:30 <clarkb> Is there anything we haven't already touched on that people want more details on? questions? concerns?
19:26:39 <ianw> does splitting up the base playbook like this mean the bridge mysql/ara work is basically not necessary?
19:27:11 <ianw> we'll just essentially have the logs from post job runs instead?
19:27:20 <corvus> ianw: we're still expecting the playbooks to run via bridge's ansible, so that could still be useful
19:27:27 <corvus> (i think that was always the plan)
19:27:31 <clarkb> ianw: we will have those logs, but the way the current CD jobs are set up is to execute ansible-playbook on bridge
19:27:42 <clarkb> (so we won't get the data in the zuul ara)
19:27:56 <ianw> ahh, right, that was your root login stuff
19:28:10 <clarkb> given that I think having ara for the bridge run ansible-playbook is still useful
19:28:10 <fungi> which is probably good for avoiding information leaks anyway
19:28:41 <mordred> ++
19:28:52 <ianw> cool, ok thanks for clarifying
19:29:21 <mordred> ianw: oh - speaking of - I pinged you in scrollback about the letsencrypt roles - but it turns out we weren't running one of them, so nothing to worry about :)
19:29:33 <clarkb> That is a good transition to the next topic :)
19:29:49 <clarkb> we have a buildset registry and a buildset proxy registry in our docker image build jobs
19:30:19 <clarkb> sometimes that buildset proxy registry fails (seems to happen more on ipv6 clouds that NAT to dockerhub) so my idea to fix that is to use the in region docker proxy
19:30:32 <mordred> ++
19:30:35 <clarkb> those in-region mirror nodes have a floating IP so they don't go through the many-to-one NAT shared by the test nodes
19:30:52 <clarkb> However corvus pointed out that since we'll be publishing the resulting docker images we should get tls set up on those first
19:31:41 <clarkb> my plan is roughly to add all our in-region mirrors to opendev.org DNS, add ssl vhosts to them, then run the LE stuff to get certs for the opendev.org names
19:31:53 <clarkb> Then with SNI I don't even need to change ports
19:32:13 <fungi> that'll be nice
19:32:15 <corvus> ++
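For reference, the mirror TLS plan sketched above might translate into Ansible tasks along these lines (the template name, cert paths, and LE role output are assumptions for illustration); with SNI each mirror's opendev.org name can be served from the same vhost on port 443:

    - hosts: mirror
      tasks:
        - name: Install https vhost for the opendev.org mirror name
          template:
            src: mirror-https.vhost.j2            # hypothetical vhost template
            dest: /etc/apache2/sites-enabled/50-mirror-https.conf
          vars:
            # assumed cert layout produced by the LE role
            cert_path: /etc/letsencrypt-certs/{{ inventory_hostname }}/fullchain.pem
            key_path: /etc/letsencrypt-certs/{{ inventory_hostname }}/privkey.pem
          notify: reload apache2
      handlers:
        - name: reload apache2
          service:
            name: apache2
            state: reloaded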
19:32:23 <clarkb> As part of this we probably want to transition most/many things to https (and not just the dockerhub proxy)
19:33:03 <clarkb> I expect apt to present the biggest pain there, but otherwise it should be a win all around
19:33:14 <clarkb> ianw: ^ any concerns with doing that given your LE work?
19:33:36 <clarkb> like should https://review.opendev.org/#/c/652801/ get in first?
19:34:09 <clarkb> #link https://review.opendev.org/#/c/652801/ Add handlers for LE certs updates for integration with existing services
19:34:18 <clarkb> infra-root ^ if you have time to review that it would be much appreciated
19:35:13 <ianw> clarkb: yeah, i think it could be a bit asynchronous.  we can deploy the keys first and make sure they're all there ok, then update the puppet to insert them into apache config
19:35:34 <ianw> then, within at least 90 days i guess, we need something to figure out restarting apache when they renew :)
19:35:43 <ianw> but probably having that before we start is a better idea
19:35:47 <clarkb> ya, would be good to have that sorted out before we have too many to retrofit
19:36:15 <ianw> my idea, with 652801 would be that ansible would drop a flag file, which puppet would notice, remove and restart apache
19:37:07 <ianw> as we, someday, move the hosts to more ansible control, we should integrate more with listeners etc
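A minimal sketch of the flag-file idea in 652801 (paths and variable names here are illustrative, not necessarily what the change uses): the LE role touches a marker when a cert is renewed, and the puppeted service restarts its daemon and removes the marker on its next run.

    # ansible side: record that a cert changed
    - name: Drop a marker so the service knows to restart
      file:
        path: /var/run/certs-updated            # hypothetical marker path
        state: touch
      when: acme_cert_renewed | default(false)  # assumed fact set by the LE role
    # puppet (or a future ansible handler) would then check for the marker,
    # restart apache if it exists, and remove it.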
19:37:34 <clarkb> yup
19:37:41 <clarkb> ianw: have you had a chance to see my comments on that change yet?
19:37:50 <clarkb> curious to see what you think about them (but overall I was happy with the change)
19:37:52 <corvus> why have puppet do it and not ansible directly?
19:37:59 <clarkb> corvus: that was one of my questions :)
19:38:13 <corvus> ah i see that now :)
19:38:21 <ianw> i haven't cycled back, but will today.  i'm happy to help set up the initial mirror certs
19:38:28 <clarkb> great thanks
19:38:42 <corvus> (it doesn't bother me that ansible would be restarting a service it didn't set up)
19:38:52 <clarkb> The last item I had on the agenda this week was to bring up the trusty upgrades/cleanups
19:38:58 <clarkb> #link https://etherpad.openstack.org/p/201808-infra-server-upgrades-and-cleanup
19:39:06 <ianw> will respond, but the idea was that ansible *can* do it, if we want -- but there might be things where it's more complex than service foo restart so i wanted to make sure it could be flexible
19:39:17 <clarkb> I'd like to delete the groups servers this week if I can find time to sit down and make sure I back them up and delete the right instances
19:39:37 <clarkb> If we can finish cleaning up this list we'll be able to remove old nodepool images and test jobs which is nice to have on top of the EOL problem
19:39:52 <clarkb> fungi: ^ re groups ashlee had thought you had already done a db backup and/or instance snapshot
19:40:00 <clarkb> fungi: do you know if any of that has been done yet?
19:40:01 <fungi> yes, i did
19:40:17 <clarkb> cool so mostly just a matter of deleting the instances now?
19:40:23 <fungi> there is a db snapshot and a server snapshot
19:40:26 <clarkb> I can add them to the list with the old cgit cluster
19:40:35 <fungi> yes, should be safe to proceed
19:41:47 <clarkb> For those of you who have your names next to some of those remaining services, let me know if I can do anything to help
19:42:05 <clarkb> And with that I think we've hit the end of our agenda
19:42:11 <clarkb> #topic Open Discussion
19:42:39 <clarkb> We should set a date for straggler opendev renames to happen
19:42:54 <clarkb> then encourage everyone to update the rename request list if they want a rename
19:43:30 <clarkb> maybe the end of the month (gives people time to get back from summit and realize they want a rename)
19:43:51 <fungi> sounds good to me, i should be around after this weekend
19:44:09 <ianw> one thing to update on is testinfra which i've spent a bit of time with.  it's ansible 2.8 compatible now ... but not using ansible-runner unfortunately
19:44:09 <corvus> my schedule returns to normal after may 22
19:44:33 <clarkb> ianw: I suppose as long as it works with newer ansible our use case is met
19:44:59 <clarkb> corvus: fungi rgr, seems like the 31st might be a good day to pencil in then (thinking others are also in recovery mode and may not even notice yet that a name needs changing)
19:45:10 <ianw> clarkb: sort of, i've sent in a fix for ipv6 addresses that broke us
19:45:18 <pabelanger> I actually had one, but didn't add it to the meeting agenda. Where do we stand on getting nodepool / k8s integration? ~5 months ago we landed https://review.opendev.org/620755 for kube config for nodepool, but I kinda lost track of where things are. Given what we know about namespaces in k8s / nodepool / zuul, is that something someone could pick up again and try to, say, get a tox job on k8s? Or do we need new features in zuul / nodepool?
19:45:23 <clarkb> ianw: fun
19:45:41 <ianw> mordred: could you look at https://review.opendev.org/656908 which lets us install testinfra in tox siblings correctly
19:45:53 <clarkb> pabelanger: I believe everything is in place for that to work now. Just need to add jobs for it (but I'm sure that if you add jobs for it we'll discover other fixes are necessary)
19:45:54 <mordred> pabelanger: it's a thing I think we're definitely still poking at - but I don't think we're in a rush to run tox jobs in k8s because it'll ultimately be more expensive to do so
19:46:14 <clarkb> mordred: pabelanger right I think the idea was to start with jobs that exercise nodepool's support of k8s
19:46:19 <mordred> yeah
19:46:22 <clarkb> not to shift large workload to it
19:46:39 <corvus> yeah, if someone wants to poke at that i say go for it
19:46:42 <mordred> ++
19:46:49 <pabelanger> okay, that is fine, I mostly wanted to say I am interested in it, and think I can offer some time to enable it.  I just used tox as an example.
19:47:13 <mordred> totes - and I think it would be great to have you poke at it
19:47:14 <corvus> there is a chance that i screwed up the credentials for that and we might need to re-export them
19:47:22 <pabelanger> so, expect some questions in coming days on what we'd like to do :)
19:47:27 <mordred> \o/
19:47:48 <pabelanger> cool, thanks!
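For anyone picking this up, the nodepool kubernetes driver configuration being discussed looks roughly like the sketch below (the context and label names are examples, not the actual provider config), which would give a first test job a pod-type label to request:

    providers:
      - name: k8s-example
        driver: kubernetes
        context: example-k8s-context        # assumes the previously exported credentials/context
        pools:
          - name: main
            labels:
              - name: pod-fedora            # hypothetical label an initial job could use
                type: pod
                image: docker.io/fedora:28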
19:48:03 * mordred hands pabelanger a turkey
19:50:59 <clarkb> Alright, anything else before we call it a meeting?
19:52:25 <clarkb> Sounds like no. Everyone enjoy their weird weeks and thank you! See you here next week.
19:52:28 <clarkb> #endmeeting