19:01:18 <clarkb> #startmeeting infra
19:01:18 <opendevmeet> Meeting started Tue Jun  6 19:01:18 2023 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:18 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:18 <opendevmeet> The meeting name has been set to 'infra'
19:01:34 <clarkb> there we go, was wondering what happened to the bot
19:01:37 <clarkb> #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/Y3ZTMR6ZJJDWPZPNWYB32UC2HHGFZH73/ Our Agenda
19:01:48 <clarkb> This just went out. Sorry about that, I got super nerd sniped yesterday afternoon looking into the python and openafs thing
19:02:13 <clarkb> debuginfod is really cool and useful btw
19:02:23 <clarkb> #topic Announcements
19:02:40 <clarkb> A reminder that next week we'll skip having a meeting since several of us will be attending the open infra summit
19:03:10 <clarkb> Then for June 20th (the meeting after next) I won't be able to make it as I'll be in the middle of travel. I'm happy for others to run a meeting if they like. I just can't run it myself
19:03:59 <clarkb> #topic Topics
19:04:12 <clarkb> I removed the quay topic. I think it's basically in a steady state now
19:04:22 <clarkb> dib functional testing is working again too
19:04:28 <clarkb> #topic Bastion Host Updates
19:04:48 <clarkb> #link https://review.opendev.org/q/topic:bridge-backups
19:05:13 <clarkb> tonyb appears to have reviewed this stack, thanks! Fungi mentioned he would take a look too but I don't think that has happened yet
19:05:26 <clarkb> The other thing I was thinking about recently is we should probably start looking at updating our ansible version on bridge to ansible 8 (we are currently on 7)
19:05:30 <fungi> no, i've been too distracted by other things, sorry
19:05:47 <clarkb> In theory that will be self-testing, and if we get the change that bumps things up to run all the system-config-run-* jobs we should get really good coverage of it
19:06:11 <clarkb> Not sure if anyone is interested in doing that. I don't think I have time for the next couple of weeks but I may start looking after that if no one else beats me to it
19:06:27 <clarkb> Throwing it out there if there is interest since I think it could be a good one as our testing for it should be robust
19:06:40 <ianw> i think we're testing the git master so it should be fairly easy
19:07:06 <clarkb> ianw: I think that has been failing though. But proposing a change to move the cap from <8 to <9 should do what we need
19:07:15 <clarkb> and then ensuring all the jobs we want to trigger also trigger
19:07:45 <fungi> zuul is well behind that in what versions it supports, but shouldn't be an issue for our nested ansible calls, right?
19:07:52 <clarkb> fungi: correct
19:07:59 <clarkb> they are pretty well separated in our environment
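As a rough illustration of the cap bump being discussed (a sketch only; the lower bounds and the location of the real pin in system-config are assumptions, not taken from this log), the version-range semantics look like this:

```python
# Sketch of relaxing the ansible cap from "<8" to "<9" using packaging's
# specifier handling; the exact requirement strings here are hypothetical.
from packaging.specifiers import SpecifierSet

old_cap = SpecifierSet(">=7,<8")   # roughly today's pin (ansible 7.x)
new_cap = SpecifierSet(">=8,<9")   # the proposed pin (ansible 8.x)

for version in ("7.6.0", "8.1.0"):
    print(version, version in old_cap, version in new_cap)
# 7.6.0 True False
# 8.1.0 False True
```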
19:08:46 <clarkb> #topic Mailman 3
19:08:55 <clarkb> fungi: any updates with your testing of the vhost fixes?
19:09:06 <fungi> nope, distractions
19:09:09 <fungi> sorry
19:09:20 <clarkb> hopefully post summit we'll all have fewer distractions
19:09:26 <clarkb> #topic Gerrit Updates
19:09:34 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/884779 Revert bind mounts for Gerrit plugin data
19:10:01 <clarkb> I pushed this change because I think I've decided that in the short term the best thing for us may be to just clear out that data when we launch new gerrit containers
19:10:13 <clarkb> I'd love feedback on that and/or the changes I pushed to manually clear things on gerrit startup
19:10:20 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/880672 Dealing with leaked replication tasks on disk
19:10:46 <clarkb> I don't think this second change will clear all the files we need to clear but it will be easier to see what else is leaking after we land it if we decide to go that route instead. Feedback very much welcome either way or if you have alternative suggestions
19:11:08 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/885317 Update Gerrit 3.8 image to 3.8.0 final release
19:11:31 <clarkb> I also pushed this change yesterday to update our 3.8 image. This won't affect production, but will make our upgrade testing a bit more realistic
19:12:30 <clarkb> #topic Upgrading Old Servers
19:12:36 <clarkb> #link https://etherpad.opendev.org/p/opendev-bionic-server-upgrades Notes
19:12:51 <clarkb> I've replaced the old zp01 server with a zp02 server. Though nothing may be using it...
19:13:04 <clarkb> Still on the list are a handful of mirror nodes, meetpad, and I think the insecure ci registry
19:13:27 <clarkb> I'm going to continue to try and pick them off one by one as I can. Probably mirror nodes will be my next target
19:13:39 <clarkb> Help welcome
19:13:58 <tonyb> can you link to the zp01 change?
19:14:02 <clarkb> tonyb: yes one sec
19:14:08 <tonyb> that'd help me grok what's needed
19:14:20 <tonyb> rather than thinking I know what to do ;P
19:14:37 <clarkb> tonyb: https://review.opendev.org/q/topic:replace-zp01+OR+topic:zp02
19:15:17 <clarkb> tonyb: you do need a root to launch the new node, but if you propose a change like https://review.opendev.org/c/opendev/system-config/+/885076 which updates the test node label type so that testing shows jammy works then I'm happy to do that with/for you
19:15:37 <tonyb> perfect
19:15:38 <clarkb> basically mock it up and see that testing shows it is happy, then I can launch the node and stick it in the inventory hosts file in the change
19:16:17 <clarkb> tonyb: insecure-ci-registry would probably be a good one and/or the mirrors. Since they have minimal state on host. meetpad is a bit weird because we need to sort out how to replace the control plane for that service
19:16:38 <clarkb> Related to all this but slightly different is the update of zuul servers to jammy for potential podman use
19:16:44 <clarkb> (currently going to jammy but still using docker)
19:17:14 <tonyb> got it
19:17:33 <clarkb> All of the mergers have been replaced and 6 executors have been booted. This exposed a behavior in new openafs where lseek()ing the openafs ioctl device/file crashes the process with a kernel oops
19:18:02 <clarkb> this was a problem with zuul because it uses python open() which does an lseek under the hood. corvus replaced that open() with an os.open() which does not do any magic under the hood
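A minimal sketch of the open() vs os.open() distinction described above (the ioctl path is illustrative and this is not the actual Zuul change):

```python
import os

# Hypothetical path for the openafs ioctl file; shown for illustration only.
PATH = "/proc/fs/openafs/afs_ioctl"

# Buffered open(): the io layer probes whether the file is seekable, which
# issues an lseek(fd, 0, SEEK_CUR) under the hood -- the kind of implicit
# seek that hit the kernel oops described above.
with open(PATH, "rb") as f:
    pass

# os.open(): returns a bare file descriptor with no buffering layer and no
# implicit lseek(); reads/ioctls happen directly on the fd.
fd = os.open(PATH, os.O_RDONLY)
try:
    pass  # e.g. fcntl.ioctl(fd, ...) with no seek involved
finally:
    os.close(fd)
```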
19:18:19 <corvus> (actually all 12 booted)
19:18:26 <clarkb> corvus: oh cool I thought it was only 6
19:18:50 <clarkb> static02 is a jammy node running jammy openafs and that has been functional so we expect that once we launch updated executor code on the new nodes they should be happy
19:18:55 <clarkb> this was a specific corner case that was sad
19:18:56 <corvus> yeah, plan was to get it all done before monday :/
19:19:37 <ianw> so was this https://gerrit.openafs.org/#/c/14918/ ?  i'm unclear if that fix wasn't there or if it's a different issue
19:20:13 <clarkb> ianw: the problem persisted in the 1.8.9 package the ppa built
19:20:25 <clarkb> I think you expected that version to include that patch? if so then I don't think that patch was the fix
19:20:40 <clarkb> however the description in that change definitely seems to match the behavior we saw
19:21:07 <corvus> (though i agree, the words in the commit message sure make that sound like it should be a fix)
19:21:10 <ianw> yeah, i perhaps miscalculated if that patch is there ...
19:21:56 <clarkb> in any case I think it is unlikely we'll be opening that device outside of the single place zuul does it or in openafs itself
19:22:24 <clarkb> and openafs itself seems to be working on static02 so we're probably good? Let's proceed and keep an eye on it and we can always look at rebuilding new ppa packages later if necessary
19:23:11 <ianw> $ cat src/afs/LINUX/osi_ioctl.c | grep default_llseek
19:23:12 <ianw> $
19:23:25 <ianw> ... sigh ... you would have thought I'd think to check that :/
19:23:29 <corvus> (pam modules would be the other place to watch out for that)
19:23:47 <ianw> that at least explains why 1.8.9 didn't fix it for us
19:24:06 <clarkb> cool I think that gives us a path forward should this problem continue to persist. We can build a 1.8.9 with that fix backported
19:24:09 <ianw> so the counterpoint to that is that we could pull that patch in
19:24:25 <clarkb> ianw: yup or convince ubuntu to pull it in maybe
19:24:34 <ianw> sorry about that.  i don't know why i didn't think of it until just then
19:24:44 <clarkb> but it doesn't seem necessary yet so I think we can proceed with distro 1.8.8 and take it from there
19:25:35 <clarkb> Anything else related to server updates?
19:26:17 <ianw> it is in the openafs-stable-1_8_x branch in the openafs git
19:26:35 <fungi> fixed in 1.9.x at all?
19:26:57 <fungi> mainly curious, i have no idea how much unrelated breakage 1.9 will mean
19:27:01 <clarkb> cool so we can fetch the backport out of that branch then and it should apply cleanly to the existing packaging
19:27:07 <ianw> that was what i looked at.  but that doesn't seem to have tags
19:27:18 <clarkb> fungi: I think it merged to 1.9
19:27:26 <fungi> ahh
19:27:33 <clarkb> ya I think 1.9 is master right now?
19:28:55 <clarkb> #topic Fedora cleanup
19:29:18 <clarkb> tonyb: you've been poking at the zuul-jobs role stuff for this and corvus pointed out the possibility of using the new thing that we never actually switched to...
19:29:34 <tonyb> Yeah
19:29:36 <clarkb> tonyb: did you have thoughts on what makes sense for pushing this ahead?
19:29:57 <tonyb> I'm looking for feedback on timing and priorities
19:30:18 <fungi> (there's an openafs 1.9.1 or 1.9.2 maybe, so they're supposedly releasing from it, but maybe they aren't making tags there)
19:30:27 <tonyb> The new thing looks good but I admit I'm not in a position to judge the effort needed to make it a reality
19:30:36 <clarkb> I'll be honest my personal priority for this is low, I was just trying to find easy wins for maybe adding rocky/bookworm mirroring
19:31:01 <tonyb> I get that the fedora_mirror_enabled flag is a hack
19:31:13 <clarkb> for this reason I'm personally happy to take our time and add the new thing to configure mirrors. But that likely wouldn't be done before bookworm's release date
19:31:40 <frickler> how much effort is increasing afs capacity?
19:31:44 <clarkb> tonyb: the main thing is going to be adding configuration for the new thing and adding the new thing to the base-test base jobs and then reparenting a representative sampling of jobs to base-tests to ensure it is doing what we expect
19:31:56 <tonyb> corvus: It'd be good to get your thoughts on how desirable the new thing is
19:32:05 <fungi> bookworm release day is saturday, btw
19:32:08 <clarkb> frickler: you need to add volumes to existing servers (easy but increases potential for failures) or add new servers (more work)
19:33:07 <tonyb> I hesitate to suggest it, but could we "fork" the configure-mirrors role into openstack-jobs and remove the fedora stuff while I do the right thing in zuul-jobs?
19:33:29 <tonyb> clarkb: Yup I can totally do that.
19:33:36 <clarkb> tonyb: you'd probably need to fork it into opendev/base-jobs
19:33:39 <frickler> maybe it still would be worth to decouple cleaning up old mirrors from setting up new stuff?
19:33:45 <clarkb> tonyb: since this will affect all opendev users
19:33:46 <corvus> well, the new thing is designed to have the kind of flexibility we apparently are now starting to require, so i think it's better for opendev and the wider community (in that it lets others actually use the mirror roles in zuul-jobs, which basically only we can use right now)
19:33:57 <tonyb> Okay, same idea but wrong repo ;P
19:34:27 <clarkb> frickler: yes that is doable too. It would still be lowish priority for me though which is why I was looking for easy wins
19:34:41 <clarkb> I just don't have time to add new distro content when I can barely keep up with what we already have so my personal preference is cleanup first
19:35:02 <corvus> i think it's the classic long-term/short-term balancing act, and i don't have a good read on making that decision.  so i'm just able to provide background.  :)
19:35:18 <frickler> iirc the patch to add bookworm mirroring is already present, just needs afs capacity
19:35:29 <tonyb> corvus: okay.
19:35:41 <clarkb> frickler: that and cleanup of buster
19:36:14 <clarkb> I think if I were trying to lead this I would look at doing the new mirror setup thing, test it with base-test, update base, clean up fedora mirroring, then decide if we need to adjust capacity from there or not
19:36:25 <clarkb> because from where I'm sitting we can't keep up so reducing effort first is a win
19:36:52 <clarkb> but if others want to push bookworm ahead and do something different i'm ok with that too
19:37:07 <tonyb> Okay.  We'll make that the plan.
19:37:20 <clarkb> also we can do the two things concurrently which is nice
19:37:24 <clarkb> they don't conflict with each other
19:37:44 <tonyb> I'll put myself on the hook for making that happen .... as long as I can count on help/support for doing the new mirror_info thing
19:38:01 <clarkb> yup I can continue to help
19:38:14 <tonyb> perfect
19:39:38 <clarkb> #topic Storyboard
19:39:58 <clarkb> I think fungi has been keeping up with the updates there. But anything I've missed worth calling out?
19:40:09 <fungi> nothing new since last week, afaik
19:41:13 <clarkb> #topic Open Infra Summit
19:42:13 <clarkb> I sent out an email trying to organize a low key gathering for those of us that will be there (for Zuul and OpenDev but really no one is counting). The beer garden worked really well in Berlin and about 2.5km away from the summit venue is a brewery with a ton of outdoor picnic table type setups
19:42:35 <clarkb> I'm hoping it doesn't rain (current forecast says it will rain in the morning but be dry in the afternoon) and we can go hang out Thursday at 6ish there
19:43:10 <clarkb> It looks like it may be a little cool. 67F/19C as a high and it will probably be a bit cooler in the evening so fingers crossed that still works out
19:43:20 <clarkb> if it gets too cold or rainy we'll figure it out then
19:44:02 <corvus> that's warmer and drier than here now... maybe i should open a beer
19:44:03 <clarkb> Also a reminder that next week is the summit. I expect it will get quiet around here.
19:44:54 <clarkb> #topic Open Discussion
19:44:56 <clarkb> Anything else?
19:45:52 <frickler> I'm slowly working my way through zuul config error cleanups
19:46:10 <corvus> related to the ongoing work to clean up errors, there are some zuul changes arriving soon that will hopefully help with that.  new layout for the config errors page, ability to filter, and display of warnings.
19:46:35 <corvus> (i also have a change adding sorting to that list, but that will make more sense after the warnings arrive, so that may be a few changes away still)
19:46:40 <frickler> got the consent from the TC now to force-merge things that get stalled due to failing CI
19:46:47 <clarkb> frickler: I pushed a DNM change to confirm that github reports there are no valid merge methods for ansible and testinfra :/
19:47:02 <clarkb> frickler: but I think your removal of the project listings is working fine so we don't need to dig into that with any urgency
19:47:39 <frickler> clarkb: yes, I saw that. some day zuul will likely need to switch to graphql for that
19:48:22 <clarkb> or figure out if different perms are now required
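For context, a hedged sketch of one way to check which merge methods GitHub reports for a repository via its REST API; whether this matches what the DNM change or zuul actually query is an assumption, and the fields can be omitted depending on the credentials' permissions:

```python
import json
import urllib.request

def allowed_merge_methods(owner, repo, token=None):
    """Return the merge-method flags GitHub exposes on the repository object."""
    req = urllib.request.Request("https://api.github.com/repos/%s/%s" % (owner, repo))
    req.add_header("Accept", "application/vnd.github+json")
    if token:
        # These flags may only be populated for sufficiently privileged
        # credentials, which is where the permissions question comes in.
        req.add_header("Authorization", "Bearer %s" % token)
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return {key: data.get(key) for key in
            ("allow_merge_commit", "allow_squash_merge", "allow_rebase_merge")}

print(allowed_merge_methods("ansible", "ansible"))
```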
19:49:05 <frickler> corvus: I've switched to using the json from the API, will that also change?
19:49:48 <corvus> frickler: so far only new fields
19:50:53 <clarkb> I think I can give everyone 10 minutes back
19:51:01 <clarkb> feel free to continue discussion in #opendev or on the mailing list
19:51:09 <clarkb> thank you for your time and help!
19:51:11 <clarkb> #endmeeting