19:01:16 <ianw> #startmeeting infra
19:01:17 <openstack> Meeting started Tue Mar 23 19:01:16 2021 UTC and is due to finish in 60 minutes.  The chair is ianw. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:18 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:20 <openstack> The meeting name has been set to 'infra'
19:01:28 <ianw> #topic Annoucements
19:01:54 <fungi> this just in: clark takes a week off
19:02:15 <ianw> i also spelt that wrong
19:02:33 <fungi> nothing wrong with an annoucement or two
19:02:37 <ianw> anyway, yes, no other global announcements
19:02:49 <ianw> #topic Actions from last meeting
19:03:01 <ianw> #link http://eavesdrop.openstack.org/meetings/infra/2021/infra.2021-03-16-19.01.html minutes from last meeting
19:03:16 <ianw> we didn't seem to have any particular action items
19:03:42 <ianw> #topic Specs approval
19:04:09 <fungi> is the gerrit server replacement spec ready for consideration?
19:04:45 <ianw> not quite, i was going to start up a new server and then fill in some things from info from that
19:05:16 <fungi> cool. i'm good with what's there so far anyway
19:05:37 <ianw> #topic Priority Efforts
19:05:55 <ianw> #topic Update Config Management
19:06:08 <ianw> I think we will cover the active parts of this in other topics
19:06:33 <ianw> #topic Opendev
19:06:47 <ianw> the main work here is the Gerrit account inconsistencies
19:07:08 <ianw> this is really being driven by clarkb, but maybe fungi you have a update?
19:07:18 <fungi> nothing new this week, no
19:07:35 <fungi> in some belated afs news, the debian 10.9 stable point release will include the awaited openafs fix
19:07:56 <fungi> so we should be able to simplify our buster image builds next week
19:08:02 <fungi> release is scheduled for saturday
19:08:48 <ianw> cool.  executors rely on AFS from outside the container though, right?
19:09:21 <fungi> yes, in our case i believe so
19:09:33 <fungi> but this was for testing it
19:09:42 <fungi> where we added the temporary workaround
19:10:04 <fungi> or at least that's the only lingering workaround i remember
19:10:38 <ianw> ++
19:10:51 <fungi> also i've just about hammered out getting zuul-jobs working with our gentoo images again, thanks to prometheanfire's help
19:12:09 <ianw> yeah i saw something fly by -- i feel like gentoo images are currently not building
19:12:25 <fungi> oh, again? i'll check that too
19:12:26 <ianw> that was something to do with iscsi and newer gcc
19:12:30 <fungi> they were working a few days ago
19:12:41 <fungi> ahh, right, that. i think he had a fix happening upstream there
19:13:21 <fungi> and for simplifying our gerrit all-projects acl, i looked into repurposing the openstack/openstack acl to contain the openstack release management bits, but ultimately determined that was a non-starter due to an exclusive setting in one section. so i've tentatively settled on making "openstack/meta-config" as the empty project for other openstack projects to inherit, but am not thrilled with the name
19:13:23 <fungi> (especially considering we may want to recommend this model to other namespaces)
19:13:30 <ianw> ok, i've noticed failures on some glean changes i've pushed, will have to look closer
19:14:29 <fungi> i'll push the change up for openstack/meta-config later today, folks can follow up there if they have good name suggestions
19:14:58 <ianw> ok, this is for release managers to remove old branches?
19:15:59 <fungi> well, more generally, to get openstack release manager permissions out of our global config and into an openstack-only acl
19:16:22 <fungi> so that, e.g., openstack release managers can't accidentally push tags for airship
19:16:42 <ianw> ahh, right, got it
19:17:14 <fungi> but yes also so that they can't accidentally delete another project's branches
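[editor's note: a hedged sketch of what the empty inheritable ACL discussed above might look like in Gerrit's project.config format. The group name "openstack-release-managers" and the description are assumptions for illustration, not taken from the meeting; exact permission names vary by Gerrit version.]

```ini
; Hypothetical project.config for openstack/meta-config (name still tentative).
; Other openstack/* projects would set this as their parent, so release-manager
; rights apply only within the openstack namespace rather than globally.
[project]
  description = Parent ACL for OpenStack projects (release management rights)

[access "refs/tags/*"]
  ; allow release managers to create (signed) tags, but only on inheriting projects
  createTag = group openstack-release-managers
  createSignedTag = group openstack-release-managers

[access "refs/heads/*"]
  ; allow old stable branch deletion, again scoped to inheriting projects
  delete = group openstack-release-managers
```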
19:18:40 <ianw> there's also a note in the agenda about configuration tuning
19:18:53 <fungi> ahh, yup
19:18:58 <ianw> i'm not sure we've discussed that previously
19:19:22 <fungi> oh, also on the gerrit theme, i've pushed up some changes to partially restore launchpad in-progress integration
19:19:36 <fungi> as a stopgap until someone has time to write the replacement
19:19:58 <fungi> #link https://review.opendev.org/782538 Stop trying to assign Launchpad bugs
19:20:12 <fungi> #link https://review.opendev.org/782540 Run update-bug on patchset-created again
19:20:29 <ianw> ++ that seems like a good compromise
19:20:37 <fungi> the first one seems to have a job failure, likely bitrot for jeepyb
19:20:59 <fungi> i'll look at it shortly
19:21:22 <fungi> oh, and we're on a new version of zuul (4.1.0) but had to roll back off master temporarily
19:21:45 <fungi> corvus has fixed the bug we rolled back for, and we'll be restarting again on latest master shortly after this meeting
19:22:02 <corvus> and swest fixed the next bug we would have seen which avass found :)
19:22:41 <corvus> (2nd bug only affected github; we probably would have seen it eventually)
19:23:33 <fungi> i also revisited the gerrit upgrade fallout pad and tried to catch it up to current reality
19:23:40 <ianw> ok, will watch out for all that and any new behaviour
19:23:45 <fungi> #link https://etherpad.opendev.org/p/gerrit-3.2-post-upgrade-notes
19:23:49 <corvus> however, i think we had a "pretty good" run on 4.1.0 in that the openstack tenant was fully running with the event queues in zk, with, afaict, no appreciable change in performance or load on zk.  so i'm not too worried about the switch back.
19:24:00 <fungi> if there's anything still in there which we've fixed or can stop worrying about, please mark it off the list
19:25:21 <ianw> #link https://grafana.opendev.org/d/5Imot6EMk/zuul-status?orgId=1
19:25:31 <ianw> for anyone who hasn't seen recent updates to add zookeeper stats in there
19:26:26 <fungi> oh, also there's a push to get debian-bullseye images added, starting with package mirroring. i think we'll need to evaluate quota usage on that volume as well as afs01.dfw overall
19:26:57 <ianw> i think i may still owe some cleanups on fedora
19:27:04 <fungi> checking out the volume utilization on our afs stats grafana dashboard, quite a few volumes are almost full, yeah
19:27:48 <ianw> #link https://grafana.opendev.org/d/T5zTt6PGk/afs?orgId=1
19:27:56 <fungi> i suggested seeing if we can drop debian-stretch mirroring, but a number of openstack projects are still testing with it on older stable branches
19:27:58 <ianw> the wheel release stats there are depressing.  i'll have to look at that
19:28:30 <fungi> though related, we still have a node label named "debian-stable" aliasing stretch, when buster is the current stable as of a couple of years ago
19:29:43 <fungi> we should probably encourage people to reevaluate their use of that, and either update or remove it
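[editor's note: a rough sketch of how a label alias like "debian-stable" is typically expressed in a Nodepool builder/launcher config. Provider and image names here are placeholders, not the actual opendev configuration.]

```yaml
# Hypothetical nodepool.yaml fragment: the legacy "debian-stable" label still
# maps to the stretch image; consumers would need to move to buster or drop it.
labels:
  - name: debian-stretch
  - name: debian-stable      # legacy alias, still pointing at stretch

providers:
  - name: example-provider
    pools:
      - name: main
        labels:
          - name: debian-stretch
            diskimage: debian-stretch
          - name: debian-stable
            diskimage: debian-stretch   # would become debian-buster, or be removed
```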
19:30:20 <ianw> we do have plenty of disk quota in rax dfw so adding a drive to vicepa might be the simplest thing
19:31:13 <fungi> yeah, though the more cinder volumes we attach the more precarious it becomes, as we saw with the old static.o.o
19:31:44 <fungi> we're basically multiplying the odds of catastrophic failure by the number of volumes
19:31:52 <ianw> or even afs01.dfw, when i rebooted it recently :)
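[editor's note: fungi's point that adding cinder volumes multiplies the odds of catastrophic failure can be sketched numerically. This assumes independent failures and an illustrative 1% per-period failure rate; the numbers are not from the meeting.]

```python
def p_any_failure(p: float, n: int) -> float:
    """Probability that at least one of n independent volumes fails,
    given each volume fails with probability p in the period."""
    return 1 - (1 - p) ** n

# For small p this grows roughly linearly in n, so striping vicepa across
# more volumes proportionally raises the chance of losing the whole server.
for n in (1, 4, 12):
    print(n, round(p_any_failure(0.01, n), 4))
```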
19:32:21 <ianw> one thing i've been meaning to look at too, after that OVH region burnt down, is the redundancy of tarballs in particular
19:32:58 <fungi> in theory we replicate that, and can turn a read-only replica into the new read-write replica
19:33:11 <ianw> it's sort of related to the failure mode; when we have vos release failures and require full releases, we get tied up in days and days of copying
19:33:29 <fungi> yup
19:34:03 <fungi> as for the recent afs01.dfw boot failure, i'm almost certain it's because we created the pv on the raw volume block device and not a partition
19:34:16 <fungi> i have a feeling we could reproduce that if we wanted
19:34:27 <ianw> still, since we moved to running releases via ssh i think things have generally been more reliable
19:34:39 <fungi> yes, that's helped immensely
19:35:23 <ianw> we also spent quite a long time diagnosing and tuning rsync to stop touching every file for some updates too, which helped
19:36:05 <ianw> alright, i think let's move on
19:36:08 <ianw> #topic General Topics
19:36:15 <ianw> #topic Puppet/Ansible rewrites
19:36:35 <ianw> i think the news of the week here was the launchers all switched over to fresh opendev.org versions
19:36:56 <ianw> i think that leaves zuul scheduler host as the only Xenial system in that ecosystem?
19:37:12 <ianw> executors, mergers, builders and launchers are all done now
19:38:05 <fungi> zk servers?
19:37:12 <ianw> i'm guessing with the pace of zuul development at the moment, we're better off waiting a little to tackle that host
19:38:33 <fungi> yeah, just double-checked, our zk servers are also still xenial
19:38:45 <fungi> we should be able to rolling-replace those live
19:39:12 <fungi> though as corvus observed, doing that will end in zuul only connecting to two out of the three until the next time the zuul services are restarted
19:40:01 <fungi> because it won't automatically redistribute connections, only reconnect as needed
19:40:27 <ianw> i'm willing to help out on that, a good way to become more familiar with zk operations
19:41:11 <ianw> corvus: ^ maybe reach out when it's a better time to consider this, i.e. not while restarts for bug fix updates are pending :)
19:41:33 <ianw> #link https://etherpad.opendev.org/p/infra-puppet-conversions-and-xenial-upgrades
19:41:41 <ianw> i had a quick pass through that
19:42:36 <fungi> i'm tempted to snapshot the wiki server and try an in-place ubuntu upgrade for now, as repugnant as that idea may be
19:43:22 <fungi> part of why the wiki server isn't listed there is that it's not running xenial. still on trusty :/
19:43:28 <corvus> i think we can upgrade zk any time; it's containerized, so we should already be running a recent release of the software; hopefully an os upgrade won't have too big of an impact
19:44:23 <corvus> ianw: i think if you want to go ahead and stage the patches to do that, it's probably okay to do so more or less any time
19:44:55 <ianw> corvus: ok, i'll take a look and see what i come up with
19:45:15 <ianw> i feel like clarkb might have already written a change to switch to focal in testing at least
19:45:27 <fungi> that does sound familiar
19:45:56 <ianw> one from that list was the asterisk server; i feel like retirement is probably the best idea there
19:46:25 <ianw> do we want a spec, or an announcement, or just changes we can vote on?
19:47:11 <fungi> announcement is probably in order, just in case anyone was using it
19:47:40 <fungi> ideally we'd work out how to move the current dial-in trunk's sip config to meetpad, but that's not absolutely necessary
19:47:52 <ianw> openstack-discuss or just the service list?
19:48:28 <fungi> i'd say service-announce
19:48:45 <ianw> ok, i'll give myself an action item to get that going
19:48:51 <fungi> thanks!
19:49:06 <ianw> #action ianw start retirement for asterisk
19:49:39 <ianw> there's nothing else on that list that is a surprise ... just a bunch of things we know we need to do :)  but it is getting smaller
19:50:11 <ianw> #topic Refstack
19:50:22 <ianw> speaking of, i think this is almost ready to be dropped as a topic
19:50:30 <ianw> i have one outstanding bugfix review
19:50:30 <fungi> excellent
19:50:41 <ianw> #link https://review.opendev.org/c/opendev/system-config/+/781593
19:51:39 <ianw> i will put in a todo to clean up the old server in a few months just to be super safe
19:51:54 <ianw> otherwise, i'd say this one is done
19:52:04 <fungi> yay!
19:52:12 <ianw> #topic PTG planning
19:52:33 <ianw> last week clarkb put out a call for suggestions on this, did we want dedicated times to talk, or a hackathon, etc
19:52:50 <ianw> tbh i feel like pretty much every day is a hackathon :)
19:53:31 <fungi> yeah, it's more like do we want a hackathon where we're all awake at the same relatively inconvenient time ;)
19:54:59 <ianw> i feel like the requests for times deadline was this thursday?
19:54:59 <fungi> anyway, i gave my loose suggestions last week, don't really have any new ideas personally
19:55:33 <fungi> yeah, maybe i'll double-check the ethercalc and see if he reserved anything
19:56:27 <fungi> amusing side-note, the ptg organizers forgot we run an ethercalc instance and created a spreadsheet on the ethercalc.org site instead, which has been going up and down and returning random errors to people
19:56:49 <ianw> ok, maybe i'll send a mail too.  just in case anyone who doesn't hang out in meetings has an interest
19:56:59 <fungi> thanks
19:57:08 <ianw> it would certainly be worth it if we have a dedicated time to help onboard people who are interested, etc.
19:57:30 <ianw> #topic Open Discussion
19:57:35 <fungi> yes, especially new config reviewers
19:57:49 <fungi> but anybody really
19:58:02 <ianw> this is true, it is probably worth reserving a time dedicated for that, see who turns up
19:58:45 <ianw> there's been a bit of work on glean lately if anyone wants to look
19:58:48 <fungi> #link https://ethercalc.net/oz7q0gds9zfi PTG schedule spreadsheet
19:58:56 <fungi> i don't see opendev reserving any slots in there yet
19:59:19 <ianw> basically all open changes.  ironic have some requirements there
19:59:49 <fungi> oh, following up on the gentoo image builds, prometheanfire has a dib change proposed to solve it
19:59:57 <fungi> see #opendev for details
20:00:07 <ianw> ok will look
20:00:12 <fungi> thanks for chairing, ianw!
20:00:22 <ianw> that's about time, see you all next time!
20:00:27 <ianw> #endmeeting