19:01:38 <clarkb> #startmeeting infra
19:01:39 <openstack> Meeting started Tue May 28 19:01:38 2019 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:40 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:42 <openstack> The meeting name has been set to 'infra'
19:01:49 <clarkb> #link http://lists.openstack.org/pipermail/openstack-infra/2019-May/006391.html The agenda
19:02:09 <clarkb> Sorry for getting that out late. I thought I would have computer time yesterday but the kids just got bikes and insisted on basically riding them all day
19:02:46 <fungi> soon they can just ride them all day while you get more computer time ;)
19:03:01 <fungi> an investment in your future
19:03:07 <clarkb> that is the goal
19:03:18 <clarkb> #topic Announcements
19:03:25 <clarkb> I announce that I have no other announcements
19:03:39 <fungi> concise
19:03:56 <clarkb> Anything worth mentioning that didn't make it onto the agenda?
19:05:12 <clarkb> sounds like no. Let's move on
19:05:14 <clarkb> #topic Actions from last meeting
19:05:21 <clarkb> #link http://eavesdrop.openstack.org/meetings/infra/2019/infra.2019-05-21-19.01.txt minutes from last meeting
19:05:31 <clarkb> No recorded actions from last meeting
19:05:49 <clarkb> #topic Specs approval
19:06:11 <clarkb> As mentioned previously there are a few new specs up since the summit/forum/ptg. If you have a free moment they would likely appreciate your feedback
19:06:21 <clarkb> but I don't think anything is at a point where we are ready to approve things
19:06:44 <clarkb> With that out of the way lets jump into the recent happenings.
19:06:52 <clarkb> #topic Priority Efforts
19:07:00 <clarkb> #topic Update Config Management
19:07:19 <clarkb> Puppet 4 upgrades are basically done at this point
19:07:31 <cmurphy> do we have anything left?
19:07:41 <cmurphy> is ask still undone?
19:07:42 <clarkb> there are three hosts that are on puppet 3. The arm64 builder. The arm64 mirror. And the backup target host
19:07:45 <clarkb> cmurphy: ask is done
19:07:59 <clarkb> the Arm64 nodes are puppet 3 because there is no puppet 4 packaging for arm64
19:08:14 <clarkb> ianw has done a bunch of work to ansible our mirror nodes so the mirror node is a non issue
19:08:25 <clarkb> That mostly leaves us with the arm64 builder and backup target host
19:08:46 <clarkb> Thinking about this more I think the backup target host should be straightforward to ansible so I don't think we need to upgrade puppet on that one
19:08:51 <corvus> agree
19:09:00 <clarkb> (in fact we may already effectively ansible it after our base stuff moved into ansible)
19:09:02 <corvus> (it's mostly base++)
19:09:10 <mordred> yah
19:09:19 <clarkb> I think that leaves us with the arm64 nodepool builder to worry about
19:09:20 <ianw> i can take an action item to get onto that, as i've played quite a bit with bup now for prior volume switches
19:09:25 <clarkb> ianw: thanks
19:09:29 <fungi> any idea if anyone's tried to rebuild the puppet 4 source packages on arm64?
19:09:45 <clarkb> #action ianw ansiblify our backup host so that we don't have to upgrade it to puppet 4
19:10:16 <clarkb> fungi: I haven't heard of anyone trying that
19:11:08 <ianw> yeah; i guess the distros might have done that, but it seems not
19:11:32 <clarkb> We can probably prioritize ansiblifying nodepool instead
19:11:39 <corvus> we could probably ansible/containerize nodepool before the rest of the zuul system
19:11:42 <clarkb> ya
19:11:44 <ianw> (i haven't done a really deep investigation, though .. there may be conversations out there about it)
19:11:45 <clarkb> that
19:11:49 <corvus> it has fewer interactions with the other systems
19:12:24 <corvus> (ansible/containerizing zuul itself has dependencies on gerrit/opendev/etc, which we are close to resolving, but haven't yet, which makes it more complex)
19:12:25 <fungi> really just need to containerize the builder for that matter
19:12:34 <clarkb> I think we can pretty much say puppet 4 is done at this point and we'll paper over corners with more ansible as necessary.
19:12:43 <clarkb> big thank you to cmurphy for pushing this work and getting it done
19:12:46 <fungi> but would presumably do all of nodepool in one coordinated effort
19:13:05 <clarkb> fungi: I expect the images will be built for all the bits of nodepool so may as well
19:13:06 <fungi> yes, thanks again cmurphy! this has been a huge help
19:13:11 <corvus> fungi: yeah, with the nodepool config files, i think doing all of nodepool together would make the most sense
19:13:12 <cmurphy> \o/
19:13:18 <corvus> cmurphy++
19:14:53 <clarkb> On the ansible and container side of things we should have reliable docker image builds now regardless of ip version
19:15:43 <clarkb> Anything else to bring up on this topic?
19:16:36 <clarkb> Oh right Zuul Driven CD
19:16:55 <clarkb> last week we merged the base playbook split out change
19:17:08 <clarkb> so if you have/had changes that touch base.yaml you'll likely need to rebase and do some minor reorging
19:17:33 <clarkb> This should put us in a position to start having zuul trigger some of the other playbooks when things merge (like dns zone files)
19:17:45 <corvus> ianw has made new changes to other systems since then based on the new framework (yay!)
19:18:06 <fungi> also this has meant instant increase in test coverage for us
19:18:29 <corvus> as for zuul itself, i think the next step in that system is moving the opendev project creation from ansible into python
19:18:30 <clarkb> related to that I need to figure out where we are on having nodepool build control plane images for us
19:18:54 <corvus> clarkb: i think the nodepool control plane images change is ready to approve
19:19:13 <corvus> https://review.opendev.org/640044 just needs a +w
19:19:27 <clarkb> kk so after my backlog of haproxy and project rename stuff I'll look at that next
19:19:57 <corvus> oh i think https://review.opendev.org/#/c/640027/ also needs a +w; note wrong depends-on host
19:20:14 <corvus> but it's also ready to go
19:20:54 <clarkb> #link https://review.opendev.org/640044 and https://review.opendev.org/#/c/640027/ nodepool built control plane images
19:21:35 <clarkb> seems like we continue to have slow but steady progress on updating our config management tooling. Thank you everyone for that
19:21:37 <corvus> once we have gitea project creation in python, we can clean up the TODO in that base playbook refactor, and then start to have the gerrit-gitea-zuul playbook run out of the run_all loop.
19:21:56 <corvus> (and then i think we'll be clear to start ansible/containerizing zuul)
19:22:08 <mordred> I agree with corvus
19:23:10 <clarkb> what is the dependency there? the puppet web that would otherwise need to be untangled?
19:23:53 <corvus> can't run that playbook in test mode because it'll take too long
19:24:09 <clarkb> gotcha
19:24:17 <corvus> nor could we run it in zuul in prod mode, for that matter, for the same reason
19:24:19 <clarkb> right takes like 4 hours right now to do it from scratch?
19:24:41 <corvus> $longtime at least
19:24:44 <clarkb> That is probably a good transition point into opendev stuff
19:24:47 <clarkb> #topic OpenDev
19:24:58 <clarkb> Last week we talked about general next step items
19:25:44 <mordred> clarkb: when we update the gitea project creation into python - should we add support to projects.yaml for listing old names for projects? otherwise if we create things from scratch in a gitea we'll miss renames/redirects
19:25:49 <fungi> which were mostly "let's finish cleaning up and polishing what's left from the last transition before we embark on new services (unless they're blocking cleanup from those we already switched)"
19:26:04 <mordred> fungi: ++
19:26:29 <clarkb> ya thats how I ended up digging into the docker job flakiness. It was making it hard for us to deploy updated gitea :)
19:26:38 <clarkb> mordred: ya I think we need to go back and record that somewhere probably
19:26:47 <fungi> i do expect we need some way of tracking what redirects we're maintaining
19:27:00 <corvus> or do we just want to consider it precious data and back it up?
19:27:07 <clarkb> corvus: that would be database backups right?
19:27:20 <corvus> yeah
19:27:28 <fungi> doable, though that makes it harder to rebuild a gitea db from scratch if we want to be able to do that
19:27:29 <clarkb> we do know how to make those so that is an easy option too probably
19:27:57 <corvus> i guess the main thing is, do we want the complexity of a system which creates projects only to rename them (possibly multiple times), or the complexity of additional precious data + backups
19:28:11 <fungi> depending on the schema, we could even just back up dumps of the redirects table or something like that
19:28:24 <mordred> I mean - we have to encode renames into a yaml file anyway to do the renames - seems like if we just put that into a file that we cared about and tracked we'd be closer to being able to handle renames as part of normal process - assuming gerrit can ever grow rename ability
19:28:44 <corvus> mordred: yeah, that's a good point
19:29:06 <clarkb> maybe we do both (backups as a backup)
19:29:08 <corvus> mordred: i suspect, however, that we could end up in a situation where the "replay" won't work, depending on how we implement it
19:29:15 <mordred> corvus: indeed
19:29:23 <corvus> eg, if multiple projects are renamed over each other, the sequence is important
19:29:36 <corvus> so a naive implementation which just did one at a time could deadlock
19:29:39 <fungi> i'm fond of the belt-n-braces approach too if we want both
19:29:42 <mordred> corvus: that is an excellent point
19:30:08 <clarkb> fungi: ya the more I think about it the more I think proper backups for the db is a good idea regardless of our normal operating procedure
19:30:35 <fungi> it is true that any encoding would need to match gitea's logic wrt replacing old redirects
19:30:54 <fungi> or would need to at least encode the replay sequence
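The replay-sequencing hazard corvus describes above (projects renamed over each other, so a naive one-at-a-time replay can deadlock) amounts to an ordering problem. A minimal sketch of how a replay tool might order a batch of renames; the input format and function name are hypothetical illustrations, not anything projects.yaml actually encodes:

```python
# Order a batch of project renames so that each target name is free
# when its rename is applied. Purely illustrative; not the real
# projects.yaml schema or rename tooling.

def order_renames(renames):
    """renames: list of (old_name, new_name) tuples.

    Returns an ordering where no rename targets a name still held by
    a project that has yet to be renamed. Raises on cycles (e.g. two
    projects swapped), which need a temporary name to break.
    """
    pending = list(renames)
    occupied = {old for old, _ in pending}  # names still to be vacated
    ordered = []
    while pending:
        progress = False
        for item in list(pending):
            old, new = item
            # Safe to apply once the target name is no longer occupied
            if new not in occupied or new == old:
                ordered.append(item)
                pending.remove(item)
                occupied.discard(old)
                progress = True
        if not progress:
            raise ValueError("rename cycle, needs a temporary name: %r"
                             % pending)
    return ordered
```

A two-project swap (a→b together with b→a) has no valid ordering and raises, which is exactly the deadlock case; a real tool would break it with an intermediate temporary name.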
19:31:06 <corvus> mordred: at the very least, i do agree we should go ahead and adjust the yaml schema to accommodate that, if it makes sense... i do think that a "previous name" field is probably the best way for us to eventually implement renames
19:31:16 <fungi> not sure we need backups from all 8 servers though. one copy is probably sufficient
19:31:51 <clarkb> corvus: ++
19:32:10 <corvus> there's probably another way to approach things as well, which would be to maintain a list of "current" previous names for a project
19:32:39 <corvus> which would mean removing previous names from another project if the other project was renamed over the first
19:32:45 <corvus> does that make sense?
19:32:47 <mordred> corvus: ++ I actually really like that
19:32:57 <clarkb> maybe step 0 is checking in the repos.yaml in ~root/ on bridge.o.o so that we have a copy of that floating about
19:33:04 <mordred> corvus: because it makes projects.yaml contain the current state of canonical+redirects expected
19:33:18 <corvus> it doesn't match how gitea works right now, so we'd have to think about how to implement it, but it would be the best sort of yaml state representation of the current status.
19:33:19 <corvus> ya that
19:33:33 <clarkb> corvus: I like that
19:33:37 <corvus> let's mull that over and see if we can come up with a way to do that
19:33:51 <mordred> ++
19:34:05 <clarkb> sounds good. I can grab a copy of repos.yaml from the great renaming and throw it up on gerrit so that it is somewhere too.
19:34:17 <corvus> clarkb: ++
19:34:29 <corvus> and anything we do friday we should record with that
19:34:48 <clarkb> ya can push a followup with fridays repos.yaml content
19:34:55 <clarkb> maybe a new dir in system-config called renames for now
19:35:00 <mordred> yah
19:35:02 <clarkb> er project-config
19:35:40 <corvus> system-project
19:35:51 <fungi> matching the present mapping data could solve overwrites with a ci job to check for duplicates
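The duplicate check fungi mentions could be a small CI-side validation: no previous name may collide with a current project name, and no two projects may claim the same previous name. The mapping shape below is a hypothetical sketch of the "previous names" schema discussed above, not the real projects.yaml format:

```python
# CI-style consistency check for redirect (previous-name) data.
# The input shape is a hypothetical "previous names" schema, not
# something projects.yaml supports today.

def check_redirects(projects):
    """projects: mapping of current project name -> list of previous names.

    Returns a list of human-readable error strings (empty if consistent).
    """
    errors = []
    current = set(projects)
    seen_previous = {}  # previous name -> project that claims it
    for name, previous in projects.items():
        for old in previous:
            # A redirect source must not shadow a live project
            if old in current:
                errors.append(
                    "%s lists previous name %s, which is a current project"
                    % (name, old))
            # Two projects must not fight over the same redirect
            if old in seen_previous and seen_previous[old] != name:
                errors.append(
                    "previous name %s claimed by both %s and %s"
                    % (old, seen_previous[old], name))
            seen_previous[old] = name
    return errors
```

Run as a gate job, this would catch the overwrite case at review time rather than at rename time.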
19:35:53 <clarkb> Ok we've got 25 minutes left so let's keep moving. Anything else related to OpenDev or are we generally happy with the plan of make things more reliable and pick off other bits as it will help us with reliability
19:36:02 <clarkb> corvus: k
19:36:15 <clarkb> corvus: (I'm not actually going to make a new repo)
19:36:21 <corvus> ;)
19:36:28 <fungi> i got nothin urgent
19:36:50 <clarkb> #topic Storyboard
19:37:06 <clarkb> fungi: diablo_rojo_phon: Anything to bring up re storyboard?
19:37:25 <diablo_rojo_phon> fungi did the swift migration
19:37:32 <fungi> swift *test* migration
19:37:36 <diablo_rojo_phon> Test migration
19:37:38 <diablo_rojo_phon> Yes
19:37:59 <timburke> Yay! Thank you!
19:38:17 <fungi> which surfaced an issue in that we've grown py3k-isms in the lp migration/import tooling, probably best solved by just rebuilding the current deployments on python 3
19:38:33 <diablo_rojo_phon> Not a problem with swift, more with our tools.
19:38:39 <corvus> setting up the slow query log (or doing something equivalent) is very near the top of my TODO list now (it's below "file expense report")
19:38:40 <fungi> right, that
19:39:17 * fungi also needs to file expense report
19:39:21 <clarkb> oh hrm I never filed an expense report either
19:39:35 <fungi> yeah... yeah
19:39:40 <corvus> unexpectedly productive topic, apparently!
19:39:44 <fungi> easy to forget
19:39:50 <corvus> thus the list ;)
19:39:56 <fungi> pile of denver receipts on the workbench behind me
19:39:57 <clarkb> corvus: ya that suddenly jumped up on my list by being added to it :)
19:40:22 <clarkb> Alright lets move on
19:40:26 <clarkb> #topic General Topics
19:40:36 <clarkb> Friday May 31, 2019 project renames (that is this friday)
19:40:55 <clarkb> #link https://etherpad.openstack.org/p/project-renames-2019-05-31 Renaming process notes
19:41:00 <clarkb> I started writing down a plan there
19:41:12 <clarkb> #link https://review.opendev.org/#/q/topic:project-rename+status:open Changes to reflect project renames
19:41:28 <clarkb> if you all can review the plan as well as the changes that that query pulls up it will be helpful
19:41:42 <fungi> and we determined that last rename maintenance (other than the opendev transition) we bypassed zuul to merge the rename patches, so the test errors are expected?
19:41:44 <clarkb> One question I had was do we think we need to stack the project-rename changes so that we resolve any conflicts?
19:41:57 <clarkb> fungi: that is what our etherpad notes say from a previous one yes
19:42:09 <fungi> sequencing them is probably not a terrible idea
19:42:10 <clarkb> fungi: with notes about doing it before stopping gerrit so that they can replicate too
19:42:11 <corvus> yeah, if we're force-merging them that's probably a good idea
19:42:30 <clarkb> k I can take the todo to sequence those changes up
19:42:38 <fungi> "back in the day" when renames were rampant, we used to sequence them all as a matter of course
19:42:59 <fungi> (after running into more than one which needed a rebase after another merged, slowing the maintenance)
19:43:02 <clarkb> The other item was what time do we want to start? I can send an email out today with a notice of gerrit downtime to our gerrit user mailing lists
19:43:27 <clarkb> I'm good with anything after 9am local time and can probably do an early start at 8am if people prefer earlier. That is 1500UTC
19:43:31 <fungi> i'm good with pretty much any time
19:43:33 <clarkb> 8am is 1500UTC
19:43:50 <fungi> i have the benefit of living in your future, after all
19:44:12 <clarkb> mordred: and I think you said you wouldn't be around right? So probably fungi, corvus, myself and maybe frickler/ajaeger?
19:44:26 <mordred> yah - I'm out thurs/fri
19:44:31 <clarkb> corvus: ^ did you have an opinion on start time?
19:45:08 <corvus> i can be around; earlier is better
19:45:20 <clarkb> ok lets say 1500UTC if earlier is better
19:45:31 <fungi> wfm, updated my reminder
19:45:57 <clarkb> I think that covers my todo items for project renaming
19:46:03 <clarkb> Next up is server upgrades
19:46:06 <clarkb> #link https://etherpad.openstack.org/p/201808-infra-server-upgrades-and-cleanup
19:46:12 <clarkb> mordred: did you have an update on status.o.o upgrades?
19:46:38 <clarkb> and fungi did you manage to get that wiki puppet change in?
19:46:41 <mordred> I put up two cleanup patches
19:46:51 <clarkb> mordred: can haz links?
19:46:58 <mordred> https://review.opendev.org/#/c/661385/ and
19:47:05 <mordred> https://review.opendev.org/#/c/661399/1
19:47:19 <clarkb> #link https://review.opendev.org/#/c/661385/ https://review.opendev.org/#/c/661399/ status.o.o server upgrade related cleanup changes
19:47:25 <fungi> yeah, last wiki-dev puppet change merged, i just haven't had time to check back in on it
19:47:42 <clarkb> cool so progress here too.
19:47:49 <corvus> one of those should get a review from ttx
19:47:51 <mordred> I think my next step is just going to be spinning up a new host with the existing stuff on xenial and seeing if it works fine - the only thing that's really potentially yuck is the libjq-jquery package - but it's probably fine
19:48:14 <clarkb> mordred: and we can probably grab that from upstream if necessary pretty easily
19:48:16 <mordred> (potentially yuck because xenial has a different version of jquery - not because there's anything wrong with the package)
19:48:19 <mordred> clarkb: yah
19:48:24 <clarkb> ianw: fungi had mentioned that he thought you had said you would do static? is that the case or do we need to find someone to work on that?
19:48:38 <fungi> looks like puppet hasn't been running on wiki-dev01.opendev.org for a little while, so need to dig into why
19:48:38 <mordred> there are many things I'd like to change - but trying my best to avoid unnecessary nerd-sniping
19:48:51 <corvus> mordred: zuul status used to use jquery, does anything current?
19:48:53 <clarkb> corvus: https://review.opendev.org/#/c/661385/ that one?
19:48:53 <ianw> umm, well i haven't been looking at it ... does it sound appealing to anyone else?
19:48:56 <fungi> er, i mean on wiki-dev01.openstack.org
19:48:59 <corvus> clarkb: yes
19:49:06 <mordred> corvus: yes - elastic-recheck does
19:49:12 <corvus> mordred: thx
19:49:19 <clarkb> ianw: looking at notes more closely I think that you put your name on notes about switching cinder volumes around which may be where that idea came from
19:49:21 <fungi> ianw: i probably just misremembered, sorry
19:49:43 <clarkb> ianw: doesn't have to be you, just wanted to check if it already was :)
19:49:48 <fungi> clarkb: that may have been what stuck in my mind, yes
19:49:57 <mordred> corvus: I think in the future we could improve that by updating the elastic-recheck web stuff to build with webpack in the gate so we just deploy a tarball
19:50:39 <clarkb> in any case progress being made on a few fronts here. Thank you to everyone that has helped with this.
19:50:41 <corvus> mordred: yeah, or build a container or something
19:50:44 <mordred> yeah
19:51:13 <mordred> basically - we can make some improvements - but for now just gonna upgrade it as it is and move on
19:51:14 <clarkb> Next up is the mirror replacements which will give us tls on the mirrors which should hopefully eventually lead to more reliable docker jobs
19:51:24 <clarkb> mordred: +1
19:51:40 <clarkb> #link https://review.opendev.org/#/c/661187/ switch rax dfw mirror to new opendev mirror.
19:52:08 <clarkb> is the site variables file expected to be treated as jinja as in that change?
19:52:10 <clarkb> mordred: corvus ^
19:52:37 <clarkb> that was my only concern with that approach. Otherwise I think we can go ahead and do that
19:53:05 <corvus> i think it's "-e" included and the point of evaluation is when ansible loads that, so i think that should work.
19:53:44 <clarkb> awesome, I'll stamp a +2 on it and ianw can approve when he's ready to watch it
19:53:46 <corvus> also i don't think it will require a restart
19:53:54 <corvus> i went ahead and +w it
19:53:57 <ianw> does it need a restart?
19:54:01 <clarkb> corvus: cool
19:54:06 <ianw> oh, ok, if corvus says no ++ :)
19:54:17 <clarkb> And that takes us to the end of our regularly scheduled agenda
19:54:21 <clarkb> #topic Open Discussion
19:54:38 <clarkb> Anything else we may have missed or should otherwise bring up in our last 5 minutes or so?
19:55:12 <ianw> corvus (mostly, but anyone) if we could loop back on the dns jobs in https://review.opendev.org/#/q/status:open+topic:check-zone that would be good to just get out of my queue
19:55:21 <ianw> and stop me messing up any future mirror bringups :)
19:55:44 <corvus> ianw: will do
19:56:55 <corvus> oh that's easy
19:56:59 <corvus> someone should +3 those now :)
19:57:07 <fungi> i already did
19:57:09 <clarkb> An early heads up: mid June I'll be in central oregon digging up rocks, so I probably won't work much between June 17-20. That happens to be the week our certs expire, so I'll work on replacing certs the week prior; I've checked that fungi will be around the 17-20 to finish up any stragglers
19:57:30 <fungi> yep, and then i disappear the next week
19:57:46 <corvus> clarkb: like digging up with a backhoe, or digging up with a paintbrush and dental pick?
19:57:56 <ianw> and mordred, or anyone ansibly -- https://github.com/ansible/ansible/pull/57038 i noticed from our devel job.  not really sure if it needs another +2 or whatever the ansible version of that is
19:58:21 <clarkb> corvus: shovel and pick/hoe. We are going to try our luck at finding geodes/thunder eggs
19:58:32 <corvus> ooh
19:58:43 <mordred> I will also be out june 17-20 - as well as most of the week after - but then I should be here all of july
19:59:16 <corvus> i don't have any plans, but maybe i should make some :)
19:59:20 <clarkb> ha
19:59:45 <clarkb> And that takes us to our hour time limit.
19:59:47 <clarkb> Thanks everyone!
19:59:50 <clarkb> #endmeeting