19:04:42 <clarkb> #startmeeting infra
19:04:43 <openstack> Meeting started Tue Apr 27 19:04:42 2021 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:04:44 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:04:46 <openstack> The meeting name has been set to 'infra'
19:05:06 <ianw> o/
19:05:35 <clarkb> #link http://lists.opendev.org/pipermail/service-discuss/2021-April/000227.html Our Agenda
19:05:42 <clarkb> #topic Announcements
19:06:07 <clarkb> Airship and openstack have largely finished up their releases so I think we can stop holding off on things for that (like the zk cluster upgrade)
19:06:30 <clarkb> #topic Actions from last meeting
19:06:35 <clarkb> #link http://eavesdrop.openstack.org/meetings/infra/2021/infra.2021-04-13-19.01.txt minutes from last meeting
19:06:54 <clarkb> ianw had an action to push changes up to retire pbx.o.o and fungi had an action to push changes to retire survey.o.o
19:07:01 <clarkb> ianw: fungi: any updates on those items?
19:07:08 <fungi> yeah, i haven't gotten to it yet
19:07:18 <clarkb> #action fungi push changes to retire survey.o.o
19:07:24 <fungi> maybe sometime this week, though i'm trying to not be around the computer much
19:07:30 <ianw> i totally forgot about pbx, sorry
19:07:33 <ianw> on todo list now
19:07:49 <clarkb> #action ianw push changes to retire pbx.o.o
19:07:53 <clarkb> thanks!
19:07:59 <clarkb> #topic Priority Efforts
19:08:03 <clarkb> #topic OpenDev
19:08:18 <clarkb> I've done more account cleanups. We are down to ~250 iirc with conflicts now
19:08:44 <fungi> excellent!
19:08:45 <clarkb> I also put together a list that is probably a bit more "dangerous" but if we start by disabling the accounts and waiting a week or two we can probably rule out issues that way
19:09:26 <clarkb> ~clarkb/gerrit_user_cleanups/notes/proposed-cleanups.20210416 has that list if you want to look it over
19:09:27 <fungi> sounds reasonable
19:09:42 <clarkb> ~clarkb/gerrit_user_cleanups/notes/audit-results.yaml.20210415 is the info that was used to produce that list
19:09:43 <fungi> how many for that batch?
19:09:54 <fungi> is that the remaining 250 or a subset?
19:09:59 <clarkb> ~180, it is a subset
19:10:29 <clarkb> there are some folks like donnyd and stackalytics who show up and I do want to reach out to them by email once we get the list down to a manageable set
19:11:18 <clarkb> a good chunk of this group are accounts that haven't been used in many years
19:11:25 <clarkb> I think I used ~2018 as a rough cut off
19:12:22 <fungi> so ~70 which will need an even higher level of care, got it
19:13:07 <clarkb> but ya if y'all can skim it and see if anything stands out as a bad idea I can go through it again if that happens
19:13:12 <clarkb> otherwise will try to proceed with it
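For reference, disabling an account here means marking it inactive, which is reversible; a minimal sketch of the Gerrit SSH admin commands that kind of cleanup typically uses, assuming admin credentials on the review host (the account id below is a placeholder):

    # mark a conflicting account inactive (reversible if someone turns out to need it)
    ssh -p 29418 admin@review.opendev.org gerrit set-account --inactive 12345
    # re-enable it if a problem surfaces during the waiting period
    ssh -p 29418 admin@review.opendev.org gerrit set-account --active 12345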
19:13:38 <clarkb> In other gerrit news we upgraded to 3.2.8 and incorporated fungi's jeepyb improvements on our image
19:13:56 <clarkb> ianw noticed that gerrit lost the account lock again :/ it seems to do that after we update and restart, but it doesn't seem to happen much after that
19:14:14 <clarkb> anyway we'll continue to keep an eye on it. Earlier today when I checked lslocks showed gerrit still had the lock
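A rough sketch of that lslocks check, assuming the Gerrit JVM can be located by its gerrit.war command line (that filter is an assumption):

    # find the Gerrit java process and list the file locks it currently holds
    sudo lslocks -p "$(pgrep -f gerrit.war | head -1)"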
19:14:29 <clarkb> Any other OpenDev related discussion?
19:15:02 <clarkb> #topic Update Config Management
19:15:25 <clarkb> Between ptg and adding inmotion cloud and now zookeeper I've been pretty busy doing other things. Anyone have config management updates to call out?
19:16:20 <fungi> none i can recall
19:17:06 <ianw> not really, other things popping up
19:17:18 <clarkb> I have been meaning to followup with the gerrit sql stuff but we can do that out of the meeting
19:17:21 <clarkb> #topic General Topics
19:17:31 <clarkb> #topic Server Upgrades
19:18:05 <clarkb> The zk cluster upgrade is happening as we speak. One of three hosts has been swapped out so far. Waiting on some changes to apply on bridge before starting on the second
19:18:10 <clarkb> #link https://etherpad.opendev.org/p/opendev-zookeeper-upgrade-2021 tracking progress there
19:18:35 <clarkb> had one small issue that needed fixing, but otherwise seems to be working about how I expected it to
19:19:46 <clarkb> Once zk is done I'll be looking at the zuul scheduler. I think the rough idea there is have ansible configure a new zuul02.opendev.org host, get everything LE'd and in place, copy the keys from old to new, prime repos, then schedule a cutover
19:19:57 <clarkb> My read of the ansible is that this should work because we don't start zuul automatically
19:20:12 <clarkb> but I don't want to get too ahead of myself while in the middle of the zk upgrade
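A rough sketch of the key-copy step, assuming the per-project keys still live on disk under /var/lib/zuul/keys (the path, user and host layout are assumptions, not the actual runbook):

    # with zuul stopped on the new scheduler, copy the project key store across
    rsync -av /var/lib/zuul/keys/ root@zuul02.opendev.org:/var/lib/zuul/keys/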
19:20:22 <clarkb> #topic survey.openstack.org
19:20:40 <fungi> probably don't need this on the agenda with the action item
19:20:42 <clarkb> I added this to the agenda because we're getting the "your cert will expire" warnings from it. fungi is on the case but no progress yet
19:20:59 <clarkb> ya it was on the agenda so just quickly covering it, but agreed we can move on
19:21:10 <fungi> yeah, well i mean, if it stops serving a valid cert, it's as good as down ;)
19:21:12 <clarkb> #topic Debian Bullseye Images
19:21:51 <clarkb> I just wanted to call out the odd situation these are in. Bullseye has not released yet, but we're under some pressure to provide images for bullseye because openstack (specifically nova) dropped support for buster
19:22:16 <clarkb> We've run into at least one issue related to "bullseye hasn't been released yet" causing problems with ansible facts that ianw has a workaround in dib for
19:22:34 <fungi> that hasn't been approved yet though
19:22:35 <clarkb> the problem with that is we've had some persistent nova 500 errors when doing end to end functional testing of the dib changes
19:23:06 <ianw> yeah it looks like that got a +1 from zuul, but it took a few rounds
19:23:07 <clarkb> I've got a change up to nodepool to collect openstack logs to help us debug these problems but you have to depends-on that change from dib to get the logs. I don't think nodepool is interested in collecting all of those devstack logs
19:23:07 <fungi> yep, that's also been hampering another bullseye-related fix in dib
19:23:29 <clarkb> if we want we can split the depends-on for the nodepool change in dib out to another change and then try and land the dib stuff as is
19:23:35 <fungi> (the one to correct the security mirror path)
19:24:14 <clarkb> once zk things settle down I can look into that again, but I'm happy if others want to try and work past it for now
19:25:23 <clarkb> that was all I had on this, wanted to record why bullseye is important before it even releases
19:25:28 <clarkb> anything else to add ?
19:25:56 <ianw> not really, we can keep rechecking, but hopefully we can just get the logs thing in to help debug
19:26:05 <fungi> which change was that again?
19:26:14 <ianw> https://review.opendev.org/c/zuul/nodepool/+/788028
19:26:17 <ianw> #link https://review.opendev.org/c/zuul/nodepool/+/788028
19:26:46 <fungi> aha, thanks
19:26:49 <clarkb> but I think corvus and others have expressed an opinion that they don't want those logs in there because we aren't testing devstack/openstack
19:27:05 <clarkb> (if it were me, having the debug info there makes sense since it does seem to cause a nonzero number of failures)
19:27:38 <clarkb> maybe bring it up for discussion in #zuul
19:28:54 <clarkb> #topic Minor git-review release to support --no-thin
19:29:14 <clarkb> I put this on the agenda because this is a new feature we added to git-review that users aren't likely to know they want until they ask us for help
19:29:41 <clarkb> if individuals come to us with problems pushing to gerrit related to missing trees/objects in packfiles you can ask them to update git-review (if necessary) then do git review --no-thin
19:29:45 <clarkb> that should work around the problem
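A quick sketch of what to tell affected users, assuming a pip-installed git-review:

    # --no-thin needs git-review 2.1.0 or newer
    pip install --upgrade git-review
    git review --version
    # push without thin packs to work around the missing tree/object errors
    git review --no-thin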
19:30:33 <fungi> 2.1.0 is the minimum version needed to get that option
19:31:02 <fungi> (noted for posterity)
19:31:47 <clarkb> #topic openEuler patches
19:32:36 <clarkb> linaro is asking us to get openEuler test instances running. They discussed this with us at the PTG. I don't have any major concerns other than ensuring we don't become a defacto official mirror because we're pulling from the root with some magic rsync key
19:33:02 <clarkb> ianw: I don't think we need to get the TC involved. We can provide testing platforms in opendev that go beyond what openstack wants to test on
19:33:19 <ianw> ok, the first such change was
19:33:22 <ianw> #link https://review.opendev.org/c/opendev/system-config/+/784874
19:33:45 <clarkb> ya thats the one at the PTG that I mentioned we should pull anonymously and from another mirror (not the root) if necessary
19:33:58 <clarkb> so that we aren't any more special than anyone else pulling the packages
19:34:24 <clarkb> (they don't seem to have any mirrors outside of china currently so I want to avoid giving the impression that our mirrors are official)
19:34:48 <clarkb> ianw: at the PTG they agreed they would look for a different mirror to pull from that allows anonymous access (they thought at least one of the other mirrors may do that)
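For the record, the anonymous pull being asked for looks roughly like this; the mirror host and local path below are placeholders, not an endorsed source:

    # sync from a public (non-root) mirror with plain anonymous rsync, no special key involved
    rsync -avz --delete rsync://mirror.example.org/openeuler/ /afs/openstack.org/mirror/openeuler/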
19:35:11 <ianw> ok; i just received a couple of personal pings, so while i'm happy for this to go ahead i just didn't want to be the defacto owner of this
19:35:49 <clarkb> ianw: definitely push them to the channel (#opendev) and the mailing list so that it doesn't become something you alone are negotiating
19:36:45 <ianw> ++, thanks
19:36:55 <clarkb> ianw: I also suggested that we ensure the functional end to end testing for the dib changes is working before we try to land those
19:37:02 <fungi> at a minimum, we need people with a vested interest in the platform hanging out in irc or reading our mailing lists in case we need something fixed
19:37:06 <clarkb> I think they are currently non-voting so the changes get +1 but the tests for openeuler don't actually work
19:37:16 <clarkb> they thought this was due to the lack of a mirror (I haven't looked at the failures yet myself)
19:37:32 <clarkb> fungi: ya I also meant to check if it is even localized into english?
19:37:40 <clarkb> I assume it is as a fork of centos/rhel but who knows :)
19:37:52 <ianw> yeah, i think the other issue is that we haven't figured out a way to really do boot tests of arm64
19:37:52 <clarkb> but that didn't occur to me until later
19:38:13 <ianw> qemu binary translation, at least in linaro, is just too slow
19:38:23 <fungi> it was supposedly a fork of centos, i couldn't find any information on what they're doing wrt stream though
19:39:16 <ianw> it can get a cirros instance up in devstack but trying to boot a dib image like we do in the devstack+nodepool tests was something i could barely do even after leaving it for literally hours
19:39:26 <fungi> why would we need qemu binary translation in linaro? figured we'd have to use it only when the architecture differs
19:40:08 <clarkb> fungi: no nested virt on arm64
19:40:14 <fungi> or is the problem that we want to test booting amd64 cirros images on arm64 devstack?
19:40:17 <ianw> this is for the nested case.  so we build the arm64 image on the native host, then try to boot it
19:40:48 <ianw> i mean, in theory, we could make a multi-node devstack, have a compute node separate and boot our image on that
19:40:57 <fungi> and qemu needs binary translation to boot arm64 on arm64?
19:41:08 <ianw> which is probably the solution, but also a lot of work
19:42:36 <ianw> fungi: there's no nested virt, at least on linaro, so yeah it's going old-school.  i haven't checked on osu
19:42:59 <clarkb> ianw: I think linaro said there isn't any nested virt support in kvm yet for arm64
19:43:23 <clarkb> fungi: basically if you don't have nested virt then its always binary translation regardless of the targets
19:44:44 <fungi> interesting, i thought that was the point of paravirtualization vs not
19:45:15 <clarkb> fungi: there are optimizations depending on the whether or not you have a common arch but in general you're still hitting expensive paths
19:45:16 <ianw> i think there is, there is some flag it says during boot, but as for end-to-end plumbing of everything from kvm/qemu -> nova -> userspace, i don't know
19:45:26 <clarkb> ianw: ah interesting
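A quick way to see whether a guest will get hardware virt or fall back to qemu's TCG binary translation is to check for /dev/kvm; a minimal sketch to run on the test node:

    # no /dev/kvm means qemu falls back to software emulation (TCG), which is what makes this so slow
    test -c /dev/kvm && echo "kvm available" || echo "no kvm, expect slow TCG emulation"
    # libvirt's own sanity check, if the client tools are installed
    virt-host-validate qemu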
19:47:53 <clarkb> anything else on the openeuler topic?
19:48:06 <ianw> nope, thanks
19:49:33 <clarkb> #topic Open Discussion
19:49:59 <clarkb> That was everything on the agenda. I'm currently working through the zk upgrade as zk05 was added to the cluster a few minutes earlier than I expected (it's fine)
19:50:25 <clarkb> was there anything else to cover? Otherwise I'm going to dig into zk again
19:50:48 <fungi> as mentioned i'm not really around this week if i can help it, but will still make time if there's something urgent
19:52:12 <clarkb> enjoy the break
19:52:21 <fungi> oh, one thing not yet resolved... ansible fact gathering crashes the python interpreter for centos 8 on our arm64 nodes, i have one held, going to try to wrestle a core dump out of it but not sure how much that's going to tell us
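A rough sketch of one way to coax that core dump out of the held node, reproducing fact gathering locally with core files enabled (the core file path is illustrative):

    # allow core files in this shell and point them somewhere findable
    ulimit -c unlimited
    echo '/tmp/core.%e.%p' | sudo tee /proc/sys/kernel/core_pattern
    # re-run ansible fact gathering against localhost to reproduce the interpreter crash
    ansible localhost -m setup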
19:52:21 <clarkb> sounds like that may be it so I'll stop the meeting here.
19:52:23 <clarkb> Thank you everyone
19:52:26 <clarkb> #endmeeting