19:01:16 <clarkb> #startmeeting infra
19:01:17 <openstack> Meeting started Tue Jun  4 19:01:16 2019 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:18 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:20 <openstack> The meeting name has been set to 'infra'
19:01:26 <clarkb> #link http://lists.openstack.org/pipermail/openstack-infra/2019-June/006396.html
19:01:35 <clarkb> #topic Announcements
19:02:07 <clarkb> These didn't make it to the wiki agenda but I should mention them anyway. I'll be digging up rocks with family June 17-20 in the desert. Don't expect I'll have much of an internet connection
19:02:18 <clarkb> as a result of that I plan to do ssl cert refreshes late next week
19:02:25 <clarkb> (they expire on like the 20th or 21st)
19:02:59 <fungi> i hear that desert has too many rocks anyway, they won't miss a few
19:03:18 <Shrews> i'm sure it won't be hot at all
19:03:31 <fungi> he's going before summer!
19:03:39 <fungi> (a day before summer anyway)
19:03:41 <clarkb> high of 83F today. Only 59F friday
19:03:49 <clarkb> so could be scorching or cool
19:04:04 <clarkb> #topic Actions from last meeting
19:04:14 <clarkb> #link http://eavesdrop.openstack.org/meetings/infra/2019/infra.2019-05-28-19.01.txt minutes from last meeting
19:04:35 <clarkb> ianw volunteered to look into what is necessary to ansible our backup server (instead of updating puppet 3 to 4 on it)
19:04:42 <clarkb> ianw: ^ anything to report on that yet?
19:05:13 <ianw> i've started on https://review.opendev.org/#/c/662657 ... it's still in progress, but i think it will work
19:05:51 <ianw> i think it should allow us to stop with all the manual user adds etc too (something i've had on my todo list for a long time!) ... so next week maybe something to review
19:05:54 <clarkb> #link https://review.opendev.org/#/c/662657 backup server and backup ansible stuff
19:06:20 <clarkb> thank you for kicking that off
19:06:28 <ianw> also slightly related is https://review.opendev.org/#/c/662662 if people could look to use ansible 2.8 on bridge ... it has some features helpful for deploying ssh keys
19:06:29 <clarkb> that will tie into the gitea mysql backups work I plan to start today
19:07:27 <clarkb> #topic Specs approval
19:07:55 <clarkb> No specs up for approval but a call out for reviews. https://review.opendev.org/#/q/project:opendev/infra-specs+status:open is worth a look if you have a moment (it's on my todo list somewhere after followups on gitea things)
19:08:16 <clarkb> corvus: ^ you may be interested in the irc spec (involves email and irc)
19:08:43 <corvus> ack, thx
19:08:56 <clarkb> #topic Priority Efforts
19:09:02 <clarkb> #topic Update Config Management
19:09:28 <clarkb> Any news on this topic other than ianw's ansible backup work and my plan to do mysql backups on gitea servers?
19:09:51 <clarkb> Seems we're starting to transition from concrete targets to organic changes on this item
19:09:51 <corvus> i think we're also waiting on the gitea-project-creation-in-python work
19:09:52 <fungi> none i'm aware of
19:10:01 <clarkb> corvus: oh right
19:10:06 <corvus> that's a blocker for finishing up the independent zuul playbook
19:10:31 <clarkb> right so the concrete target we have is the CD'ing end goal
19:10:33 <corvus> mordred: now that you're back, do you think you might be able to get to that soon?  if not, i should be able to do it
19:10:36 * mordred waves
19:10:48 <clarkb> which deps on a few things like independent zuul playbook and gitea project management in python
19:10:49 <mordred> I can work on that soon
19:11:03 <corvus> cool, thx
19:11:25 <mordred> I believe I have no outstanding higher-priority tasks to clear first
19:12:17 <corvus> great, then once that's in place, we can run the zuul service playbook in test, and then in cd
19:12:26 <mordred> luckily the logic is in ansible already, so that's awesome
19:12:28 <mordred> ++
19:13:56 <clarkb> we may want to consider this item as "done" (enough) and drop it from priority topics once we've got CDing happening. Then we can transition to more organic updates like the mirror update and backup server update. Then decide if we want to have a concerted effort for replacing puppet
19:14:08 <clarkb> but should probably consider that once we get to the CDing state
19:14:47 <mordred> ++
19:15:10 <clarkb> alright lets move on to the next topic
19:15:14 <clarkb> #topic OpenDev
19:15:52 <fungi> that's a mighty fine topic you have there
19:16:29 <clarkb> Last friday we did the project renaming thing. We learned we had a few more gaps in the renaming playbooks but I think we fixed them all (gerrit start timeout, lack of var loading in play, missing grab of existing orgs to check if we need to create new orgs)
19:16:51 <clarkb> we've also got groups other than ara using zuul based git replication to github
19:17:07 <clarkb> and openstack is talking about how to manage their github orgs now too
19:17:26 <clarkb> all that to say I think we are in really good shape for managing new projects and renames (with the caveat that we still need gerrit downtime)
19:17:33 <clarkb> thank you to everyone who helped with that
19:18:09 <corvus> we did miss one rename we intended to do
19:18:25 <corvus> i forget which
19:18:31 <clarkb> yup, it wasn't in the gerrit projects.yaml so didn't make it into my repos.yaml input for the playbook
19:18:40 <clarkb> it was ansible-role-cloud-launcher
19:18:49 <clarkb> ajaeger has pushed changes up to get it on the list for next time
19:20:34 <clarkb> I believe the spec for the migration is still open, but at this point I'm comfortable marking it complete. Any objections?
19:21:02 <clarkb> opendev-gerrit specifically
19:21:29 <corvus> i think we've done all the work items
19:21:40 <corvus> so yeah, i think it's done :)
19:21:50 <corvus> #link opendev-gerrit spec https://docs.opendev.org/opendev/infra-specs/latest/specs/opendev-gerrit.html
19:21:57 <fungi> we also had a last-minute rename request... did we end up folding it into the batch? i don't recall now
19:22:03 <clarkb> fungi: we did
19:22:19 <clarkb> we did not fold mnasers last minute request in due to people wanting more discussion on whether or not those belong in openstack/
19:22:22 <fungi> okay, so it's just x/ansible-role-cloud-launcher which we're currently missing
19:22:37 <clarkb> yup
19:22:38 <fungi> oh, right, i was confusing two last-minute requests
19:23:03 <fungi> seems likely the ones mnaser brought up may move into the openstack org, so would still be renamed
19:23:14 <clarkb> ya but to a different target
19:23:23 <fungi> right
19:23:29 <mnaser> yes, sorry, I will -w that for now
19:23:45 <fungi> but i'm expecting them to be in the list for the next maintenance, whenever that winds up getting scheduled
19:23:58 <fungi> as the logistics will likely be worked out by then
19:24:10 <clarkb> ++
19:24:23 <clarkb> Anything else opendev related worth bringing up?
19:24:33 <clarkb> corvus: ^ did you want to talk about tarballs.opendev.org/
19:25:28 <corvus> oh, yeah...
19:25:58 <corvus> zuul needs a place to put branch-tip tarballs before it can move out of the openstack zuul tenant
19:26:07 <corvus> (presumably, other opendev projects will want the same)
19:26:44 <corvus> some opendev admins and zuul maint folks had a quick chat in #zuul about it, and we figured making tarballs.opendev.org, similar to tarballs.openstack.org, is the best way forward for now
19:27:02 <corvus> i don't think that's controversial, so i plan on just doing it
19:27:11 <corvus> changes are in progress and nearly complete
19:27:40 <clarkb> there are two big differences from the old openstack tarballs host: it will be using afs instead of "local" disk, and projects are expected to be fully namespaced (not just using shortnames)
19:27:40 <corvus> main differences between this and existing tarballs site are: 1) new site will be in afs; 2) it will be scoped by full project name rather than the short name
19:28:20 <clarkb> yay I didn't say inaccurate things :)
19:28:22 <corvus> #link opendev tarball publish jobs https://review.opendev.org/662877
19:28:32 <corvus> that's the current tip of development (follow depends-on for the rabbit hole)
19:28:57 <corvus> i'm currently testing its dependencies
19:28:58 <clarkb> thanks for the update on that.
19:29:05 <corvus> np.  eol.
19:29:14 <clarkb> any other opendev items before we move on to storyboard?
19:30:00 <fungi> also we have the option to migrate tarballs.openstack.org into that afs volume if we update the publishing jobs and adjust the vhost config for it, i think?
19:30:27 <fungi> (well, and move the vhost to somewhere like files.o.o or add an afs client on static.o.o)
19:30:30 <clarkb> fungi: should be possible. basically have new runs of jobs publish to afs then rsync the old stuff over
19:30:46 <corvus> yeah.  we should schedule that since we're changing publication locations
19:30:56 <fungi> yeah, would like an eventual result where we're not indefinitely maintaining two different solutions for it
19:31:06 <clarkb> ++
19:31:09 <corvus> but it's branch-tip tarballs -- i don't think it's going to be a big deal.
19:31:22 <corvus> well, it's also our release tarballs
19:31:25 <ianw> one thought popped up ... we've had issues before with parallel vos releases ... i haven't looked but is good locking around this explicitly handled?
19:31:42 <corvus> ianw: i plan on adding it to the periodic vos release cron
19:31:49 <corvus> so publication to the new site will be delayed up to 5m.
19:32:07 <fungi> i think that's fine if we set that expectation
19:32:13 <corvus> (and that's only one actor)
19:32:27 <clarkb> that cron runs on afsdb01 iirc
19:32:30 <clarkb> so ya only one actor for that
19:32:33 <fungi> if jobs need a tighter timeline to reuse those artifacts, then they can use paused jobs or something
19:33:00 <fungi> like the image jobs do it with their own ephemeral repository
19:33:14 <clarkb> fungi: can also read from the RW afs mount
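For context on the locking concern ianw raised: the usual fix is to funnel every release of a volume through a single periodic actor that holds a lock, so two publishers can never run "vos release" on the same volume concurrently. A minimal sketch of that idea in Python, with a hypothetical volume name and lock path rather than the actual afsdb01 cron:

    #!/usr/bin/env python3
    # Serialize "vos release" behind an exclusive lock so only one
    # release can be in flight at a time on this host.
    import fcntl
    import subprocess
    import sys

    LOCKFILE = "/var/run/vos-release.lock"  # hypothetical lock path

    def release(volume):
        with open(LOCKFILE, "w") as lock:
            # Block until any other in-progress release finishes,
            # then hold the lock for the duration of this one.
            fcntl.flock(lock, fcntl.LOCK_EX)
            subprocess.run(["vos", "release", volume, "-localauth"], check=True)

    if __name__ == "__main__":
        release(sys.argv[1])  # e.g. a hypothetical "project.tarballs" volume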
19:33:54 <corvus> could the pypi publication jobs be affected?
19:33:58 <fungi> i suppose, but probably better we just warn against reconsuming those artifacts in dependent jobs where you may race a vos release
19:34:07 <corvus> off the top of my head, i don't remember the sequence for that
19:34:15 <fungi> the pypi publication jobs don't reconsume what's pushed onto the tarballs site any longer
19:34:24 <clarkb> that change happened with the zuulv3 transition
19:34:32 <fungi> they used to, but the zuul v3 refactor eliminated the push/pull/push model
19:34:40 <fungi> yeah, what clarkb said
19:34:54 <corvus> ok we're probably good then
19:34:59 <fungi> we only did it back then to be able to have signing keys on a different (static) host for added protection
19:35:09 <fungi> zuul v3 solves that problem more cleanly for us
19:35:32 <fungi> well, and also pypi creds
19:35:45 <clarkb> alright we have ~25 minutes left in our meeting so lets continue on
19:35:50 <clarkb> #topic Storyboard
19:36:05 <clarkb> fungi: diablo_rojo how are storyboard things?
19:36:24 <corvus> oh, i set up the slow query log but have not pulled in the net to see what it's caught yet
19:36:27 <corvus> i need to do that
19:36:29 <diablo_rojo> No negative feedback from the swift team yet
19:36:34 <diablo_rojo> post test migration
19:36:37 <diablo_rojo> so that's cool
19:36:45 <fungi> corvus: i skimmed it the day after you turned it on and there are some egregious examples
19:36:54 <corvus> fungi: great news!
19:37:00 <diablo_rojo> We should definitely pull that.
19:37:11 <diablo_rojo> Now that our outreachy intern is getting started
19:37:13 <fungi> queries running over 3 minutes with inner-joins several layers deep and majorly compound selects
19:37:28 <corvus> #action corvus do something with the slow query log
19:37:28 <diablo_rojo> She could use that to help target what she can improve
19:37:37 <fungi> corvus: other great news is that SotK thinks some of those may be solved with a change he has up for review
19:37:43 <clarkb> fungi: corvus diablo_rojo probably make sense to throw that up on paste.o.o or an etherpad and get eyeballs on it?
19:37:45 <diablo_rojo> (we have an outreachy intern and she has started now)
19:37:54 <diablo_rojo> Yes please :)
19:37:56 <corvus> ok i'll do that asap
19:38:02 <diablo_rojo> corvus, merci :)
19:38:19 <fungi> #link http://paste.openstack.org/show/752274 an egregious example from the slow-query log
19:38:43 <fungi> note that's a single entry for a single query
19:38:54 <clarkb> and time is in seconds?
19:38:57 <fungi> yup
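For readers unfamiliar with the slow query log corvus set up: MySQL/MariaDB enable it through standard server options along these lines; the exact thresholds and paths on the storyboard database server are not shown in this log, so treat the values below as illustrative only:

    [mysqld]
    slow_query_log      = 1
    slow_query_log_file = /var/log/mysql/mysql-slow.log
    # log any statement that runs longer than this many seconds
    long_query_time     = 1

Each log entry then reports Query_time in seconds, which is what clarkb and fungi confirm here.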
19:39:12 <mordred> wow.
19:39:19 <diablo_rojo> Ouch.
19:39:32 <clarkb> sounds like progress at least :)
19:39:34 <corvus> i wonder if we can pair that up with a request log to figure out what code path triggered it
19:39:38 <fungi> granted, that was the longest query execution i could find in the first 24 hours of the log, so it's a doozy
19:40:01 <corvus> one could probably identify the code location from the query alone, but it might be difficult.
19:40:19 <fungi> i think SotK knew where it likely was off the top of his head
19:40:26 <corvus> oh good
19:40:55 <corvus> then maybe let's just grab the queries, and if we can't figure it out from that, go to the request logs as needed.
19:41:03 <clarkb> sounds like a plan
19:41:10 <fungi> yeah, i think that's a pragmatic approach
19:41:19 <SotK> Yeah I can probably guess at most things from the queries :)
19:41:30 <clarkb> Anything else storyboard related before we move on?
19:41:37 <fungi> just feed them to the sotk api ;)
19:42:00 <fungi> clarkb: nothing from me
19:42:34 <clarkb> #topic General Topics
19:42:44 <clarkb> #link https://etherpad.openstack.org/p/201808-infra-server-upgrades-and-cleanup
19:42:48 <clarkb> mordred: any luck with status?
19:42:55 <clarkb> and fungi same question for you re wiki
19:43:14 <diablo_rojo> Nothing from me
19:43:22 <fungi> i have another puppet module change for more packages which needed replacing
19:43:27 * fungi finds
19:43:47 <fungi> #link https://review.opendev.org/661848 More transitional Xenial package name replacements
19:43:53 <fungi> it's rather like whack-a-mole
19:44:07 <clarkb> looks like it only needs one more +2 (I already got that one)
19:44:10 <fungi> the entire kitchen sink was explicitly specified in the manifest
19:44:22 <clarkb> and now we have to do dishes
19:44:27 <fungi> in a variety of places too
19:44:53 <fungi> yeah, next time we need to do dishes i think it may be easier to burn down the kitchen and build a new one
19:44:59 <mordred> clarkb: oh - crappit - THAT was the thing I had on my plate before gitea project creation
19:45:13 <clarkb> mordred: yay for reminders then :)
19:45:16 <mordred> yay!
19:45:22 <corvus> mordred: too late, we just put the plate in the dishwasher
19:45:22 <corvus> i think
19:45:24 <clarkb> again let me know if I can assist on these
19:45:28 <corvus> it's hard to follow the metaphors here
19:45:36 <mordred> I thought the plate got put in the mole
19:45:45 <clarkb> Next up was following up on the mirror rebuilds
19:46:19 <clarkb> ianw do you think we are ready to do that in all regions now? We did have a slight hiccup with mirror.dfw.rax.opendev.org after the apache disk ran out of space but it had to do with the afs ubuntu volume cache
19:46:28 <clarkb> I did a flushvolume and all is well again
19:46:44 <ianw> i think so
19:47:07 <fungi> yeah, still a mystery what happened there
19:47:12 <ianw> #link https://review.opendev.org/#/c/662357/
19:47:22 <ianw> one possible very minor thing is to get the names a bit clearer on the logs ^
19:47:34 <fungi> looks like we were returning truncated results from the afs cache starting around the time the apache proxy ran out of disk space in the other volume
19:47:45 <clarkb> getting that rolled out then adding https to the docker proxy will simplify image building things too so worthwhile to keep pushing on that
19:48:07 <ianw> fungi: i'm not sure i want to think too hard on why they're related
19:48:20 <corvus> also it looks like it may be required in order to use buildkit to build images
19:48:39 <ianw> but yeah, the misconfiguration of disk sizes was my mistake
19:49:21 <clarkb> Next item on the list is cleaning up our arm64 cloud configuration from nodepool. This is a bit intertwined with mordred's work to have nodepool builders build control plane images
19:49:36 <clarkb> mordred: ^ if you want to start approving those I'm able to help keep an eye on stuff this afternoon
19:50:06 <clarkb> we had to cleanup nb03's disk and reboot it (restarting services) to get it building images again. But it is now doing that successfully so I think the functional problems are largely addressed
19:50:13 <clarkb> now just a matter of cleaning out the clouds that no longer exist
19:50:43 <mordred> clarkb: sweet
19:51:05 <clarkb> And then if we can sneak it in before our hour is up I wanted to bring up setting a flag day(s) for bindep fallback removal and z-c shim removal from our base jobs
19:51:05 <fungi> related to mirroring and arm64, hrw has a change up since yesterdayish to add arm64 mirrors of centos to our rsync config
19:51:34 <mordred> clarkb: ++ to both of those
19:51:44 <clarkb> This week is an openstack milestone week but we are clear on openstack's schedule for like the next 6 weeks
19:51:49 <fungi> #link https://review.opendev.org/662727 centos-mirror-update: handle non-x86 architectures
19:51:52 <fungi> (for reference)
19:52:01 <fungi> ppc64 as well i guess
19:52:08 <clarkb> maybe do the switch on the 24th (that should be plenty of time for warning people and testing)
19:52:16 <ianw> fungi: yeah, i can babysit that today
19:52:40 <clarkb> fungi: ^ do you think the 24th would be too soon to do the bindep and z-c removals?
19:53:07 <fungi> clarkb: of june? yeah that's probably sufficient notice
19:53:14 <clarkb> ya
19:53:18 <fungi> if we get to communicating broadly right away
19:53:33 <clarkb> fungi: I think step 0 is get the change up for z-c removal so we can link to it in communications
19:53:39 <clarkb> then ya step 1 is send email todayish
19:53:52 <fungi> sure, i'll push it right after the meeting, was mostly looking for consensus on the direction first
19:53:55 <clarkb> and I figure one day with one big bandaid pull is better than multiple days
19:54:23 <fungi> i think my jaunt through time reminded me that it was something we'd been meaning to do early in the v3 transition and then we lost track of it
19:54:41 <clarkb> I think we realized that people were making new legacy jobs
19:54:53 <clarkb> and so the jobs that depended on it were getting larger in number and not smaller
19:55:41 <corvus> next up after this, removing legacy-base :)
19:56:14 <clarkb> #topic Open Discussion
19:56:20 <clarkb> We have a few minutes for any last items
19:57:04 <fungi> or more sandwiches
19:57:13 <corvus> i'm all out of both
19:57:21 <clarkb> me too. My burger is gone
19:57:40 <clarkb> oh wait I have a small thing. Larissa has jury duty tomorrow so I'm doing dadops
19:57:45 <clarkb> I'll be less useful tomorrow than today
19:58:32 <clarkb> trying to explain jury duty to a 4 year old is difficult if anyone is wondering
19:59:00 <mordred> clarkb: does that make it doubly difficult to explain to 2 4 year olds?
19:59:07 <corvus> it's been FOUR years?
19:59:12 <fungi> one reason the week of the 24th is good for ripping off the legacy job bandages is that i'll be out of town the following week so immune to the flamewar which ensues
19:59:32 <clarkb> corvus: as of two days ago it has been four years
19:59:41 <clarkb> fungi: ha
19:59:56 <corvus> happy birthdays!
20:00:01 <fungi> look on the bright side... you only need to throw one birthday party ;)
20:00:08 <clarkb> fungi: its great actually :)
20:00:11 <clarkb> and we are at time
20:00:13 <clarkb> thank you everyone
20:00:15 <clarkb> #endmeeting