19:01:16 #startmeeting infra
19:01:17 Meeting started Tue Jun 4 19:01:16 2019 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:18 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:20 The meeting name has been set to 'infra'
19:01:26 #link http://lists.openstack.org/pipermail/openstack-infra/2019-June/006396.html
19:01:35 #topic Announcements
19:02:07 These didn't make it to the wiki agenda but I should mention them anyway. I'll be digging up rocks with family June 17-20 in the desert. Don't expect I'll have much of an internet connection
19:02:18 as a result of that I plan to do ssl cert refreshes late next week
19:02:25 (they expire on like the 20th or 21st)
19:02:59 i hear that desert has too many rocks anyway, they won't miss a few
19:03:18 i'm sure it won't be hot at all
19:03:31 he's going before summer!
19:03:39 (a day before summer anyway)
19:03:41 high of 83F today. Only 59F friday
19:03:49 so could be scorching or cool
19:04:04 #topic Actions from last meeting
19:04:14 #link http://eavesdrop.openstack.org/meetings/infra/2019/infra.2019-05-28-19.01.txt minutes from last meeting
19:04:35 ianw volunteered to look into what is necessary to ansible our backup server (instead of updating puppet 3 to 4 on it)
19:04:42 ianw: ^ anything to report on that yet?
19:05:13 i've started on https://review.opendev.org/#/c/662657; it's still in progress, but i think it will work
19:05:51 i think it should allow us to stop with all the manual user adds etc too (something i've had on my todo list for a long time!) ... so next week maybe something to review
19:05:54 #link https://review.opendev.org/#/c/662657 backup server and backup ansible stuff
19:06:20 thank you for kicking that off
19:06:28 also slightly related is https://review.opendev.org/#/c/662662, to use ansible 2.8 on bridge, if people could take a look ... it has some features helpful for deploying ssh keys
19:06:29 that will tie into the gitea mysql backups work I plan to start today
19:07:27 #topic Specs approval
19:07:55 No specs up for approval but a call out for reviews. https://review.opendev.org/#/q/project:opendev/infra-specs+status:open is worth a look if you have a moment (it's on my todo list somewhere after followups on gitea things)
19:08:16 corvus: ^ you may be interested in the irc spec (involves email and irc)
19:08:43 ack, thx
19:08:56 #topic Priority Efforts
19:09:02 #topic Update Config Management
19:09:28 Any news on this topic other than ianw's ansible backup work and my plan to do mysql backups on gitea servers?
19:09:51 Seems we're starting to transition from concrete targets to organic changes on this item
19:09:51 i think we're also waiting on the gitea-project-creation-in-python work
19:09:52 none i'm aware of
19:10:01 corvus: oh right
19:10:06 that's a blocker for finishing up the independent zuul playbook
19:10:31 right so the concrete target we have is the CD'ing end goal
19:10:33 mordred: now that you're back, do you think you might be able to get to that soon?
if not, i should be able to do it
19:10:36 * mordred waves
19:10:48 which deps on a few things like independent zuul playbook and gitea project management in python
19:10:49 I can work on that soon
19:11:03 cool, thx
19:11:25 I believe I have no outstanding higher-priority tasks to clear first
19:12:17 great, then once that's in place, we can run the zuul service playbook in test, and then in cd
19:12:26 luckily the logic is in ansible already, so that's awesome
19:12:28 ++
19:13:56 we may want to consider this item as "done" (enough) and drop it from priority topics once we've got CDing happening. Then we can transition to more organic updates like the mirror update and backup server update. Then decide if we want to have a concerted effort for replacing puppet
19:14:08 but should probably consider that once we get to the CDing state
19:14:47 ++
19:15:10 alright, let's move on to the next topic
19:15:14 #topic OpenDev
19:15:52 that's a mighty fine topic you have there
19:16:29 Last Friday we did the project renaming thing. We learned we had a few more gaps in the renaming playbooks but I think we fixed them all (gerrit start timeout, lack of var loading in play, missing grab of existing orgs to check if we need to create new orgs)
19:16:51 we've also got groups other than ara using zuul based git replication to github
19:17:07 and openstack is talking about how to manage their github orgs now too
19:17:26 all that to say I think we are in really good shape for managing new projects and renames (with the caveat that we still need gerrit downtime)
19:17:33 thank you to everyone who helped with that
19:18:09 we did miss one rename we intended to do
19:18:25 i forget which
19:18:31 yup, it wasn't in the gerrit projects.yaml so didn't make it into my repos.yaml input for the playbook
19:18:40 it was ansible-role-cloud-launcher
19:18:49 ajaeger has pushed changes up to get it on the list for next time
19:20:34 I believe the spec for the migration is still open, but at this point I'm comfortable marking it complete. Any objections?
19:21:02 opendev-gerrit specifically
19:21:29 i think we've done all the work items
19:21:40 so yeah, i think it's done :)
19:21:50 #link opendev-gerrit spec https://docs.opendev.org/opendev/infra-specs/latest/specs/opendev-gerrit.html
19:21:57 we also had a last-minute rename request... did we end up folding it into the batch? i don't recall now
19:22:03 fungi: we did
19:22:19 we did not fold mnaser's last-minute request in due to people wanting more discussion on whether or not those belong in openstack/
19:22:22 okay, so it's just x/ansible-role-cloud-launcher which we're currently missing
19:22:37 yup
19:22:38 oh, right, i was confusing two last-minute requests
19:23:03 seems likely the ones mnaser brought up may move into the openstack org, so would still be renamed
19:23:14 ya but to a different target
19:23:23 right
19:23:29 yes, sorry, I will -w that for now
19:23:45 but i'm expecting them to be in the list for the next maintenance, whenever that winds up getting scheduled
19:23:58 as the logistics will likely be worked out by then
19:24:10 ++
19:24:23 Anything else opendev related worth bringing up?
19:24:33 corvus: ^ did you want to talk about tarballs.opendev.org/
19:25:28 oh, yeah...
19:25:58 zuul needs a place to put branch-tip tarballs before it can move out of the openstack zuul tenant
19:26:07 (presumably, other opendev projects will want the same)
19:26:44 some opendev admins and zuul maint folks had a quick chat in #zuul about it, and we figured making tarballs.opendev.org, similar to tarballs.openstack.org, is the best way forward for now
19:27:02 i don't think that's controversial, so i plan on just doing it
19:27:11 changes are in progress and nearly complete
19:27:40 there are two big differences from the old openstack tarballs site, which are that it will be using afs instead of "local" disk and that projects are expected to be fully namespaced (not just using shortnames)
19:27:40 main differences between this and existing tarballs site are: 1) new site will be in afs; 2) it will be scoped by full project name rather than the short name
19:28:20 yay I didn't say inaccurate things :)
19:28:22 #link opendev tarball publish jobs https://review.opendev.org/662877
19:28:32 that's the current tip of development (follow depends-on for the rabbit hole)
19:28:57 i'm currently testing its dependencies
19:28:58 thanks for the update on that.
19:29:05 np. eol.
19:29:14 any other opendev items before we move on to storyboard?
19:30:00 also we have the option to migrate tarballs.openstack.org into that afs volume if we update the publishing jobs and adjust the vhost config for it, i think?
19:30:27 (well, and move the vhost to somewhere like files.o.o or add an afs client on static.o.o)
19:30:30 fungi: should be possible. basically have new runs of jobs publish to afs then rsync the old stuff over
19:30:46 yeah. we should schedule that since we're changing publication locations
19:30:56 yeah, would like an eventual result where we're not indefinitely maintaining two different solutions for it
19:31:06 ++
19:31:09 but it's branch-tip tarballs -- i don't think it's going to be a big deal.
19:31:22 well, it's also our release tarballs
19:31:25 one thought popped up ... we've had issues before with parallel vos releases ... i haven't looked but is good locking around this explicitly handled?
19:31:42 ianw: i plan on adding it to the periodic vos release cron
19:31:49 so publication to the new site will be delayed up to 5m.
19:32:07 i think that's fine if we set that expectation
19:32:13 (and that's only one actor)
19:32:27 that cron runs on afsdb01 iirc
19:32:30 so ya only one actor for that
19:32:33 if jobs need a tighter timeline to reuse those artifacts, then they can use paused jobs or something
19:33:00 like the image jobs do it with their own ephemeral repository
19:33:14 fungi: can also read from the RW afs mount
19:33:54 could the pypi publication jobs be affected?
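For context on the periodic release step discussed above, it is roughly the following OpenAFS command; the volume name and paths below are illustrative placeholders, not necessarily the actual opendev configuration:

    # run from a single host (the existing afsdb cron), so releases are
    # serialized rather than racing one another
    vos release project.tarballs -localauth

    # jobs write to the read-write path; the read-only replicas that the
    # vhost serves only pick up content after the release, hence the
    # up-to-5-minute publication delay mentioned above
    ls /afs/.openstack.org/project/tarballs/   # read-write, visible immediately
    ls /afs/openstack.org/project/tarballs/    # read-only, visible after release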
19:33:58 i suppose, but probably better we just warn against reconsuming those artifacts in dependent jobs where you may race a vos release
19:34:07 off the top of my head, i don't remember the sequence for that
19:34:15 the pypi publication jobs don't reconsume what's pushed onto the tarballs site any longer
19:34:24 that change happened with the zuulv3 transition
19:34:32 they used to, but the zuul v3 refactor eliminated the push/pull/push model
19:34:40 yeah, what clarkb said
19:34:54 ok we're probably good then
19:34:59 we only did it back then to be able to have signing keys on a different (static) host for added protection
19:35:09 zuul v3 solves that problem more cleanly for us
19:35:32 well, and also pypi creds
19:35:45 alright we have ~25 minutes left in our meeting so let's continue on
19:35:50 #topic Storyboard
19:36:05 fungi: diablo_rojo how are storyboard things?
19:36:24 oh, i set up the slow query log but have not pulled in the net to see what it's caught yet
19:36:27 i need to do that
19:36:29 No negative feedback from the swift team yet
19:36:34 post test migration
19:36:37 so that's cool
19:36:45 corvus: i skimmed it the day after you turned it on and there are some egregious examples
19:36:54 fungi: great news!
19:37:00 We should definitely pull that.
19:37:11 Now that our outreachy intern is getting started
19:37:13 queries running over 3 minutes with inner-joins several layers deep and majorly compound selects
19:37:28 #action corvus do something with the slow query log
19:37:28 She could use that to help target what she can improve
19:37:37 corvus: other great news is that SotK thinks some of those may be solved with a change he has up for review
19:37:43 fungi: corvus diablo_rojo probably make sense to throw that up on paste.o.o or an etherpad and get eyeballs on it?
19:37:45 (we have an outreachy intern and she has started now)
19:37:54 Yes please :)
19:37:56 ok i'll do that asap
19:38:02 corvus, merci :)
19:38:19 #link http://paste.openstack.org/show/752274 an egregious example from the slow-query log
19:38:43 note that's a single entry for a single query
19:38:54 and time is in seconds?
19:38:57 yup
19:39:12 wow.
19:39:19 Ouch.
19:39:32 sounds like progress at least :)
19:39:34 i wonder if we can pair that up with a request log to figure out what code path triggered it
19:39:38 granted, that was the longest query execution i could find in the first 24 hours of the log, so it's a doozy
19:40:01 one could probably identify the code location from the query alone, but it might be difficult.
19:40:19 i think SotK knew where it likely was off the top of his head
19:40:26 oh good
19:40:55 then maybe let's just grab the queries, and if we can't figure it out from that, go to the request logs as needed.
19:41:03 sounds like a plan
19:41:10 yeah, i think that's a pragmatic approach
19:41:19 Yeah I can probably guess at most things from the queries :)
19:41:30 Anything else storyboard related before we move on?
19:41:37 just feed them to the sotk api ;)
19:42:00 clarkb: nothing from me
19:42:34 #topic General Topics
19:42:44 #link https://etherpad.openstack.org/p/201808-infra-server-upgrades-and-cleanup
19:42:48 mordred: any luck with status?
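Referring back to the slow query log discussed under the Storyboard topic, enabling one in MySQL/MariaDB looks roughly like the following; the threshold and log path are illustrative, not necessarily what was configured on the storyboard server:

    # log any statement taking longer than 2 seconds (times are reported in seconds)
    mysql -e "SET GLOBAL slow_query_log = 'ON';
              SET GLOBAL long_query_time = 2;
              SET GLOBAL slow_query_log_file = '/var/log/mysql/slow-query.log';"

    # once the log has collected data, summarize the worst offenders by time
    mysqldumpslow -s t /var/log/mysql/slow-query.log | head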
19:42:55 and fungi same question for you re wiki
19:43:14 Nothing from me
19:43:22 i have another puppet module change for more packages which needed replacing
19:43:27 * fungi finds
19:43:47 #link https://review.opendev.org/661848 More transitional Xenial package name replacements
19:43:53 it's rather like whack-a-mole
19:44:07 looks like it only needs one more +2 (I already got that one)
19:44:10 the entire kitchen sink was explicitly specified in the manifest
19:44:22 and now we have to do dishes
19:44:27 in a variety of places too
19:44:53 yeah, next time we need to do dishes i think it may be easier to burn down the kitchen and build a new one
19:44:59 clarkb: oh - crappit - THAT was the thing I had on my plate before gitea project creation
19:45:13 mordred: yay for reminders then :)
19:45:16 yay!
19:45:22 mordred: too late, we just put the plate in the dishwasher
19:45:22 i think
19:45:24 again let me know if I can assist on these
19:45:28 it's hard to follow the metaphors here
19:45:36 I thought the plate got put in the mole
19:45:45 Next up was following up on the mirror rebuilds
19:46:19 ianw do you think we are ready to do that in all regions now? We did have a slight hiccup with mirror.dfw.rax.opendev.org after the apache disk ran out of space, but it had to do with the afs ubuntu volume cache
19:46:28 I did a flushvolume and all is well again
19:46:44 i think so
19:47:07 yeah, still a mystery what happened there
19:47:12 #link https://review.opendev.org/#/c/662357/
19:47:22 one possible very minor thing is to get the names a bit clearer on the logs ^
19:47:34 looks like we were returning truncated results from the afs cache starting around the time the apache proxy ran out of disk space in the other volume
19:47:45 getting that rolled out, then adding https to the docker proxy, will simplify image building things too, so worthwhile to keep pushing on that
19:48:07 fungi: i'm not sure i want to think too hard on why they're related
19:48:20 also it looks like it may be required in order to use buildkit to build images
19:48:39 but yeah, the misconfiguration of disk sizes was my mistake
19:49:21 Next item on the list is cleaning up our arm64 cloud configuration from nodepool. This is a bit intertwined with mordred's work to have nodepool builders build control plane images
19:49:36 mordred: ^ if you want to start approving those I'm able to help keep an eye on stuff this afternoon
19:50:06 we had to cleanup nb03's disk and reboot it (restarting services) to get it building images again.
But it is now doing that successfully so I think the functional problems are largely addressed
19:50:13 now just a matter of cleaning out the clouds that no longer exist
19:50:43 clarkb: sweet
19:51:05 And then if we can sneak it in before our hour is up I wanted to bring up setting a flag day(s) for bindep fallback removal and z-c shim removal from our base jobs
19:51:05 related to mirroring and arm64, hrw has a change up since yesterdayish to add arm64 mirrors of centos to our rsync config
19:51:34 clarkb: ++ to both of those
19:51:44 This week is an openstack milestone week but we are clear on openstack's schedule for like the next 6 weeks
19:51:49 #link https://review.opendev.org/662727 centos-mirror-update: handle non-x86 architectures
19:51:52 (for reference)
19:52:01 ppc64 as well i guess
19:52:08 maybe do the switch on the 24th (that should be plenty of time for warning people and testing)
19:52:16 fungi: yeah, i can babysit that today
19:52:40 fungi: ^ do you think the 24th would be too soon to do the bindep and z-c removals?
19:53:07 clarkb: of june? yeah that's probably sufficient notice
19:53:14 ya
19:53:18 if we get to communicating broadly right away
19:53:33 fungi: I think step 0 is get the change up for z-c removal so we can link to it in communications
19:53:39 then ya step 1 is send email todayish
19:53:52 sure, i'll push it right after the meeting, was mostly looking for consensus on the direction first
19:53:55 and I figure one day with one big bandaid pull is better than multiple days
19:54:23 i think my jaunt through time reminded me that it was something we'd been meaning to do early in the v3 transition and then we lost track of it
19:54:41 I think we realized that people were making new legacy jobs
19:54:53 and so the jobs that depended on it were getting larger in number and not smaller
19:55:41 next up after this, removing legacy-base :)
19:56:14 #topic Open Discussion
19:56:20 We have a few minutes for any last items
19:57:04 or more sandwiches
19:57:13 i'm all out of both
19:57:21 me too. My burger is gone
19:57:40 oh wait I have a small thing. Larissa has jury duty tomorrow so I'm doing dadops
19:57:45 I'll be less useful tomorrow than today
19:58:32 trying to explain jury duty to a 4 year old is difficult if anyone is wondering
19:59:00 clarkb: does that make it doubly difficult to explain to two 4 year olds?
19:59:07 it's been FOUR years?
19:59:12 one reason the week of the 24th is good for ripping off the legacy job bandages is that i'll be out of town the following week so immune to the flamewar which ensues
19:59:32 corvus: as of two days ago it has been four years
19:59:41 fungi: ha
19:59:56 happy birthdays!
20:00:01 look on the bright side... you only need to throw one birthday party ;)
20:00:08 fungi: it's great actually :)
20:00:11 and we are at time
20:00:13 thank you everyone
20:00:15 #endmeeting
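As background for the bindep flag day discussed above: once the fallback package list is removed from the base jobs, a project that needs distro packages in CI declares them in its own bindep.txt. A minimal sketch, with package names and profiles chosen purely for illustration:

    $ cat bindep.txt
    # build-time dependencies, selected per distro family
    gcc [compile]
    libffi-dev [compile platform:dpkg]
    libffi-devel [compile platform:rpm]
    # service needed only by the test profile
    mariadb-server [platform:dpkg test]

    $ bindep -b          # list still-missing packages for the default profile
    $ bindep -b test     # same, including the 'test' profile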