19:01:38 #startmeeting infra
19:01:39 Meeting started Tue May 28 19:01:38 2019 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:40 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:42 The meeting name has been set to 'infra'
19:01:49 #link http://lists.openstack.org/pipermail/openstack-infra/2019-May/006391.html The agenda
19:02:09 Sorry for getting that out late. I thought I would have computer time yesterday but the kids just got bikes and insisted on basically riding them all day
19:02:46 soon they can just ride them all day while you get more computer time ;)
19:03:01 an investment in your future
19:03:07 that is the goal
19:03:18 #topic Announcements
19:03:25 I announce that I have no other announcements
19:03:39 concise
19:03:56 Anything worth mentioning that didn't make it onto the agenda?
19:05:12 sounds like no. Let's move on
19:05:14 #topic Actions from last meeting
19:05:21 #link http://eavesdrop.openstack.org/meetings/infra/2019/infra.2019-05-21-19.01.txt minutes from last meeting
19:05:31 No recorded actions from last meeting
19:05:49 #topic Specs approval
19:06:11 As mentioned previously there are a few new specs up since the summit/forum/ptg. If you have a free moment they would likely appreciate your feedback
19:06:21 but I don't think anything is at a point where we are ready to approve things
19:06:44 With that out of the way lets jump into the recent happenings.
19:06:52 #topic Priority Efforts
19:07:00 #topic Update Config Management
19:07:19 Puppet 4 upgrades are basically done at this point
19:07:31 do we have anything left?
19:07:41 is ask still undone?
19:07:42 there are three hosts that are on puppet 3. The arm64 builder. The arm64 mirror. And the backup target host
19:07:45 cmurphy: ask is done
19:07:59 the Arm64 nodes are puppet 3 because there is no puppet 4 packaging for arm64
19:08:14 ianw has done a bunch of work to ansible our mirrors nodes so the mirror node is a non issue
19:08:25 That mostly leaves us with the arm64 builder and backup target host
19:08:46 Thinking about this more I think the backup target host should be straightforward to ansible so I don't think we need to upgrade puppet on that one
19:08:51 agree
19:09:00 (in fact we may already effectively ansible it after our base stuff moved into ansible)
19:09:02 (it's mostly base++)
19:09:10 yah
19:09:19 I think that leaves us with the arm64 nodepool builder to worry about
19:09:20 i can take an action item to get onto that, as i've played quite a bit with bup now for prior volume switches
19:09:25 ianw: thanks
19:09:29 any idea if anyone's tried to rebuild the puppet 4 source packages on arm64?
19:09:45 #action ianw ansiblify our backup host so that we don't have to upgrade it to puppet 4
19:10:16 fungi: I haven't heard of any trying that
19:11:08 yeah; i guess the distros might have done that, but it seems not
19:11:32 We can probably prioritize ansiblifying nodepool instead
19:11:39 we could probably ansible/containerize nodepool before the rest of the zuul system
19:11:42 ya
19:11:44 (i haven't done a really deep investigation, though .. there may be conversations out there about it)
19:11:45 that
19:11:49 it has fewer interactions with the other systems
19:12:24 (ansible/containerizing zuul itself has dependencies on gerrit/opendev/etc, which we are close to resolving, but haven't yet, which makes it more complex)
19:12:25 really just need to containerize the builder for that matter
19:12:34 I think we can pretty much say puppet 4 is done at this point and we'll paper over corners with more ansible as necessary.
19:12:43 big thank you to cmurphy for pushing this work and getting it done
19:12:46 but would presumably do all of nodepool in one coordinated effort
19:13:05 fungi: I expect the images will be built for all the bits of nodepool so may as well
19:13:06 yes, thanks again cmurphy! this has been a huge help
19:13:11 fungi: yeah, with the nodepool config files, i think doing all of nodepool together would make the most sense
19:13:12 \o/
19:13:18 cmurphy++
19:14:53 On the ansible and container side of things we should have reliable docker image builds now regardless of ip version
19:15:43 Anything else to bring up on this topic?
19:16:36 Oh right Zuul Driven CD
19:16:55 last week we merged the base playbook split out change
19:17:08 so if you have/had changes that touch base.yaml you'll likely need to rebase and do some minor reorging
19:17:33 This should put us in a position to start having zuul trigger some of the other playbooks when things merge (like dns zone files)
19:17:45 ianw had made new changes to other systems since then based on the new framework (yay!)
19:18:06 also this has meant instant increase in test coverage for us
19:18:29 as for zuul itself, i think the next step in that system is moving the opendev project creation from ansible into python
19:18:30 related to that I need to figure out where we are on having nodepool build control plane images for us
19:18:54 clarkb: i think nodepool control plane images change ready to approve
19:18:59 + "is"
19:19:13 https://review.opendev.org/640044 just needs a +w
19:19:27 kk so after my backlog of haproxy and project rename stuff I'll look at that next
19:19:57 oh i think https://review.opendev.org/#/c/640027/ also needs a +w; note wrong depends-on host
19:20:14 but it's also ready to go
19:20:54 #link https://review.opendev.org/640044 and https://review.opendev.org/#/c/640027/ nodepool built control plane images
19:21:35 seems like we continue to have slow but steady progress on updating our config management tooling. Thank you everyone for that
19:21:37 once we have gitea project creation in python, we can clean up the TODO in that base playbook refactor, and then start to have the gerrit-gitea-zuul playbook run out of the run_all loop.
19:21:56 (and then i think we'll be clear to start ansible/containerizing zuul)
19:22:08 I agree with corvus
19:23:10 what is the dependency there? the puppet web that would otherwise need to be untangled?
19:23:53 can't run that playbook in test mode because it'll take too long
19:24:09 gotcha
19:24:17 nor could we run it in zuul in prod mode, for that matter, for the same reason
19:24:19 right takes like 4 hours right now to do it from scratch?
19:24:41 $longtime at least
19:24:44 That is probably a good transition point into opendev stuff
19:24:47 #topic OpenDev
19:24:58 Last week we talked about general next step items
19:25:44 clarkb: when we update the gitea project creation into python - should we add support to projects.yaml for listing old names for projects? otherwise if we create things from scratch in a gitea we'll miss renames/redirects
19:25:49 which were mostly "let's finish cleaning up and polishing what's left from the last transition before we embark on new services (unless they're blocking cleanup from those we already switched)
19:25:59 "
19:26:04 fungi: ++
19:26:29 ya thats how I ended up digging into the docker job flakyness. It was making it hard for us to deploy updated gitea :)
19:26:38 mordred: ya I think we need to go back and record that somewhere probably
19:26:47 i do expect we need some way of tracking what redirects we're maintaining
19:27:00 or do we just want to consider it precious data and back it up?
19:27:07 corvus: that would be database backups right?
19:27:20 yeah
19:27:28 doable, though that makes it harder to rebuild a gitea db from scratch if we want to be able to do that
19:27:29 we do know how to make those so that is an easy option too probably
19:27:57 i guess the main thing is, do we want the complexity of a system which creates projects only to rename them (possibly multiple times), or the complexity of additional precious data + backups
19:28:11 depending on the schema, we could even just back up dumps of the redirects table or something like that
19:28:24 I mean - we have to encode renames into a yaml file anyway to do the renames - seems like if we just put that into a file that we cared about and tracked we'd be closer to being able to handle renames as part of normal process - assuming gerrit can ever grow rename ability
19:28:44 mordred: yeah, that's a good point
19:29:06 maybe we do both (backups as a backup)
19:29:08 mordred: i suspect, however, that we could end up in a situation where the "replay" won't work, depending on how we implement it
19:29:15 corvus: indeed
19:29:23 eg, if multiple projects are renamed over each other, the sequence is important
19:29:36 so a naive implementation which just did one at a time could deadlock
19:29:39 i'm fond of the belt-n-braces approach too if we want both
19:29:42 corvus: that is an excellent point
19:30:08 fungi: ya the more I think about it the more I think proper backups for the db is a good idea regardless of our normal operating procesure
19:30:14 *procedure
19:30:35 it is true that any encoding would need to match gitea's logic wrt replacing old redirects
19:30:54 or sould need to at least encode the replay sequence
19:31:06 mordred: at the very least, i do agree we should go aheand and adjust the yaml schema to accomodate that, if it makes sense... i do think that a "previous name" field is probably the best way for us to eventually implement renames
19:31:16 not sure we need backups from all 8 servers though. one copy is probably sufficient
19:31:51 corvus: ++
19:32:10 there's probably another way to approach things as well, which would be to maintain a list of "current" previous names for a project
19:32:39 which would mean removing previous names from another project if the other project was renamed over the first
19:32:45 does that make sense?
19:32:47 corvus: ++ I actually really like that
19:32:57 maybe step 0 is checking in the repos.yaml in ~root/ on bridge.o.o so that we hvae a copy of that floating about
19:33:04 corvus: because it makes projects.yaml contain the current state of canonical+redirects expeted
19:33:06 expected
19:33:18 it doesn't match how gitea works right now, so we'd have to think about how to implement it, but it would be the best sort of yaml state representation of the current status.
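(As a rough illustration of the sequencing concern raised above: if a batch of recorded renames is replayed one at a time, each rename can only be applied once its target name has been vacated, and a cycle of renames cannot be sequenced at all without a temporary name. The sketch below assumes a hypothetical mapping of old project name to new project name; it is not the actual projects.yaml schema or opendev tooling.)

```python
# Minimal sketch, not the real tooling: order a batch of renames so that
# each rename's target name is free before it is applied. "renames" is a
# hypothetical mapping of old project name -> new project name.
def order_renames(renames):
    pending = dict(renames)
    order = []
    while pending:
        # A rename is safe to apply when its target is not still occupied
        # by the source of another pending rename.
        ready = [old for old, new in pending.items() if new not in pending]
        if not ready:
            # e.g. ns/a -> ns/b and ns/b -> ns/a: no order works without a
            # temporary name; this is the "deadlock" case noted above.
            raise ValueError("cannot sequence cyclic renames: %s" % pending)
        for old in sorted(ready):
            order.append((old, pending.pop(old)))
    return order

# order_renames({"ns/a": "ns/b", "ns/b": "ns/c"})
# -> [("ns/b", "ns/c"), ("ns/a", "ns/b")]
```

(A check along these lines run as a CI job against the recorded rename data would also be one way to implement the duplicate/overwrite check mentioned a few lines later.)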
19:33:19 ya that
19:33:33 corvus: I like that
19:33:37 let's mull that over and see if we can come up with a way to do that
19:33:51 ++
19:34:05 sounds good. I can grab a copy of repos.yaml from the great renaming and throw it up on gerrit os that it is somewhere too.
19:34:17 clarkb: ++
19:34:29 and anything we do friday we should record with that
19:34:48 ya can push a followup with fridays repos.yaml content
19:34:55 maybe a new dir in system-cofnig called renames for now
19:35:00 yah
19:35:02 er project-config
19:35:40 system-project
19:35:51 matching teh present mapping data could solve overwrites with a ci job to check for duplicates
19:35:53 Ok we've got 25 minutes left so let's keep moving. Anything else related to OpenDev or are we generally happy with the plan of make things more reliable and pick off other bits as it will help us with reliability
19:36:02 corvus: k
19:36:15 corvus: (I'm not actually going to make a new repo)
19:36:21 ;)
19:36:28 i got nothin urgent
19:36:50 #topic Storyboard
19:37:06 fungi: diablo_rojo_phon: Anything to bring up re storyboard?
19:37:25 fungi did the swift migration
19:37:32 swift *test* migration
19:37:36 Test migration
19:37:38 Yes
19:37:59 Yay! Thank you!
19:38:17 which surfaced an issue in that we've grown py3k-isms in the lp migration/import tooling, probably best solved by just rebuilding the current deployments on python 3
19:38:33 Not a problem with swift, more with our tools.
19:38:39 setting up the slow query log (or doing something equivalent) is very near the top of my TODO list now (it's below "file expense report")
19:38:40 right, that
19:39:17 * fungi also needs to file expense report
19:39:21 oh hrm I never filed an expense report either
19:39:35 yeah... yeah
19:39:40 unexpectedly productive topic, apparently!
19:39:44 easy to forget
19:39:50 thus the list ;)
19:39:56 pile of denver receipts on teh workbench behind me
19:39:57 corvus: ya that suddenly jumped up on my list by being added to it :)
19:40:22 Alright lets move on
19:40:26 #topic General Topics
19:40:36 Friday May 31, 2019 project renames (that is this friday)
19:40:55 #link https://etherpad.openstack.org/p/project-renames-2019-05-31 Renaming process notes
19:41:00 I started writing down a plan there
19:41:12 #link https://review.opendev.org/#/q/topic:project-rename+status:open Changes to reflect project renames
19:41:28 if you all can review the plan as well as the changes that that query pulls up it will be helpful
19:41:42 and we determined that last rename maintenance (other than the opendev transition) we bypassed zuul to merge the rename patches, so the test errors are expected?
19:41:44 One question I had was do we think we need to stack the project-rename changes so that we resolve any conflicts?
19:41:57 fungi: that is what our etherpad notes say from a previous one yes
19:42:09 sequencing them is probably not a terrible idea
19:42:10 fungi: with notes about doing it before stopping gerrit so that they can replicate too
19:42:11 yeah, if we're force-merging them that's probably a good idea
19:42:30 k I can take the todo to sequence those changes up
19:42:38 "back in the day" when renames were rampant, we used to sequence them all as a matter of course
19:42:59 (after running into more than one which needed a rebase after another merged, slowing the maintenance)
19:43:02 The other item was what time do we want to start? I can send an email out today with a notice of gerrit downtime to our gerrit user mailing lists
19:43:27 I'm good with anything after 9am local time and can probably do an early start at 8am if people prefer earlier. That is 1500UTC
19:43:31 i'm good with pretty much any time
19:43:33 8am is 1500UTC
19:43:50 i have the benefit of living in your future, after all
19:44:12 mordred: and I think you said you wouldn't be around right? So probably fungi, corvus, myself and maybe frickler/ajaeger?
19:44:26 yah - I'm out thurs/fri
19:44:31 corvus: ^ did you have an opinion on start time?
19:45:08 i can be around; earlier is better
19:45:20 ok lets say 1500UTC if earlier is better
19:45:31 wfm, updated my reminder
19:45:57 I think that covers my todo items for project renaming
19:46:03 Next up is server upgrades
19:46:06 #link https://etherpad.openstack.org/p/201808-infra-server-upgrades-and-cleanup
19:46:12 mordred: did you have an update on status.o.o upgrades?
19:46:38 and fungi did you manage to get that wiki puppet change in?
19:46:41 I put up two cleanup patches
19:46:51 mordred: can haz links?
19:46:58 https://review.opendev.org/#/c/661385/ and
19:47:05 https://review.opendev.org/#/c/661399/1
19:47:19 #link https://review.opendev.org/#/c/661385/ https://review.opendev.org/#/c/661399/ status.o.o server upgrade related cleanup changes
19:47:25 yeah, last wiki-dev puppet change merged, i just haven't had time to check back in on it
19:47:42 cool so progress here too.
19:47:49 one of those should get a review from ttx
19:47:51 I think my next step is just going to be spinning up a new host with the existing stuff on xenial and seeing if it works fine - the only thing htat's really potentially yuck is the libjq-jquery package - but it's probably fine
19:48:14 mordred: and we can probably grab that from upstream if necessary pretty easily
19:48:16 (potentially yuck because xenial has a different version of jquery - not because there's anything wrong with the package)
19:48:19 clarkb: yah
19:48:24 ianw: fungi had mentioned that he thought you had said you would do static? is that the case or do we need to find someone to work on that?
19:48:38 looks like puppet hasn't been running on wiki-dev01.opendev.org for a little while, so need to dig into why
19:48:38 there are many things I'd like to change - but trying my best to avoid unnecessary nerd-sniping
19:48:51 mordred: zuul status used to use jquery, does anything current?
19:48:53 corvus: https://review.opendev.org/#/c/661385/ that one?
19:48:53 umm, well i haven't been looking at it ... does it sound appealing to anyone else?
19:48:56 er, i mean on wiki-dev01.openstack.org
19:48:59 clarkb: yes
19:49:06 clarkb: yes - elastic-recheck does
19:49:08 gah
19:49:10 corvus: ^^
19:49:12 mordred: thx
19:49:19 ianw: looking at notes more closely I think that you put your name on notes about switching cinder volumes around which may be where that idea came from
19:49:21 ianw: i probably just misremembered, sorry
19:49:43 ianw: doesn't have to be you, just wanted to check if it already was :)
19:49:48 clarkb: that may have been what stuck in my mind, yes
19:49:57 corvus: I think in the future we could improve that by updating the elastic-recheck web stuff to build with webpack in the gate so we just deploy a tarball
19:50:39 in any case progress being made on a few fronts here. Thank you to everyone that has helped with this.
19:50:41 mordred: yeah, or build a container or something
19:50:44 yeah
19:51:13 basically - we can make some improvements - but for now just gonna upgrade it as it is and move on
19:51:14 Next up is the mirror replacements which will gives us tls on the mirrors which should hopefully eventually lead to more relaible docker jobs
19:51:24 mordred: +1
19:51:36 link https://review.opendev.org/#/c/661187/ switch rax dfw mirror to new opendev mirror.
19:51:40 #link https://review.opendev.org/#/c/661187/ switch rax dfw mirror to new opendev mirror.
19:52:08 is the site variables file expected to be treated as jinja as in that change?
19:52:10 mordred: corvus ^
19:52:37 that was my only concern with that approach. Otherwise I think we can go ahead and do that
19:53:05 i think it's "-e" included and the point of evaluation is when ansible loads that, so i think that should work.
19:53:44 awesome I should stamp a +2 on it and have ianw approve when ready to watch it?
19:53:46 also i don't think it will require a restart
19:53:54 i went ahead and +w it
19:53:57 does it need a restart?
19:54:01 corvus: cool
19:54:06 oh, ok, if corvus says no ++ :)
19:54:17 And that takes us to the end of our regularly scheduled agenda
19:54:21 #topic Open Discussion
19:54:38 Anything else we may have missed or should otherwise bring up in our last 5 minutes or so?
19:55:12 corvus (mostly, but anyone) if we could loop back on the dns jobs in https://review.opendev.org/#/q/status:open+topic:check-zone that would be good to just get out of my queue
19:55:21 and stop me messing up any future mirror bringups :)
19:55:44 ianw: will do
19:56:55 oh that's easy
19:56:59 someone should +3 those now :)
19:57:07 i already did
19:57:09 An early heads up: Mid june I'll be in central oregon digging up rocks so probably won't work much between June 17-20. This happens to be the week our certs expire so I'll work on replacing certs the week prior and checked that fungi will be around the 17-20 to finish up any stragglers
19:57:30 yep, and then i disappear the next week
19:57:46 clarkb: like digging up with a backhoe, or digging up with a paintbrush and dental pick?
19:57:56 and mordred, or anyone ansibly -- https://github.com/ansible/ansible/pull/57038 i noticed from our devel job. not really sure if it needs another +2 or whatever the ansible version of that is
19:58:21 corvus: shovel and pick/hoe. We are going to try our luck at finding geodes/thunder eggs
19:58:32 ooh
19:58:43 I will also be out june 17-20 - as well as most of the week after - but then I should be here all of july
19:59:16 i don't have any plans, but maybe i should make some :)
19:59:20 ha
19:59:45 And that takes us to our hour time limit.
19:59:47 Thanks everyone!
19:59:50 #endmeeting