19:02:07 <clarkb> #startmeeting infra
19:02:07 <opendevmeet> Meeting started Tue Sep 26 19:02:07 2023 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:02:07 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:02:08 <opendevmeet> The meeting name has been set to 'infra'
19:02:35 <clarkb> #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/JUUNKEA2W2T4IA64DHBIYKUHBXOH7W3D/ Our Agenda
19:03:46 <clarkb> #topic Announcements
19:04:06 <clarkb> OpenStack is going to be making final release candidates this week and the actual Bobcat release should occur next week
19:04:11 <clarkb> please be aware of that as we make changes
19:06:08 <clarkb> #topic Mailman 3
19:06:19 <clarkb> fungi: should we jump right into planning for the final mailman3 migration?
19:06:37 <fungi> sure, i had a proposed date on the etherpad... just a sec
19:07:44 <fungi> #link https://etherpad.opendev.org/p/mm3migration
19:08:02 <fungi> line 265
19:08:12 <fungi> tentative maintenance 15:30-19:30 utc on thursday 2023-10-12 (week after release)
19:08:29 <fungi> that's for lists.openstack.org which is the only remaining site on the old server
19:09:11 <clarkb> that date works for me
19:09:15 <fungi> if folks generally find that acceptable i'll announce it on openstack-discuss this week and start putting together a detailed maintenance plan based on the previous 6 sites we've migrated
19:09:26 <frickler> +1
19:10:04 <fungi> i've started a handful of notes below there on things we'll want to be mindful of for this specific maintenance, either lessons learned from earlier migrations or due to the size of the data
19:10:25 <fungi> i'll take those into account when drafting the full plan
19:11:06 <fungi> aside from that, last week's maintenance went smoothly
19:11:30 <fungi> we've had a couple of requests for clarification on new urls for starlingx mailing lists, but no problems reported
19:12:48 <fungi> sounds like we can proceed with the date and time indicated, not hearing any objections
19:13:02 <fungi> i didn't have anything else on this topic, but am happy to answer questions
19:13:46 <clarkb> #topic Server upgrades
19:14:15 <clarkb> Nothing new to add here
19:14:44 <clarkb> #topic Nodepool Image Build and Upload Status
19:15:17 <clarkb> has anyone checked whether the uploads in the various clouds (particularly rax iad?) are looking good since our config changes (ending with the increase of the upload timeout about a week ago)?
19:15:33 <fungi> i have not
19:15:53 <frickler> I just checked rax-iad
19:16:07 <frickler> looking as expected, all uploads succeeded at the first attempt
19:16:13 <fungi> yay!
19:16:21 <fungi> i guess we can drop this from the agenda?
19:16:48 <frickler> yes, we will handle inmotion as a different topic I think
19:16:54 <fungi> agreed
19:17:18 <frickler> except maybe check for leftover images one last time?
19:17:27 <clarkb> feel free :)
19:17:31 <frickler> or did you check last week?
19:17:43 <clarkb> I have not done a pass since I last cleaned up images, which was more than a week ago
19:17:47 <fungi> i don't remember any more, but i can take a look
19:20:28 <clarkb> thanks
19:20:37 <clarkb> #topic OpenMetal
19:20:53 <clarkb> Haven't heard back from Yuriy since I responded last week. Did anyone else see an email that I overlooked?
19:20:59 <clarkb> I probably need to prod him on that
19:21:04 <frickler> nope
19:22:48 <fungi> i did not
19:23:21 <clarkb> ok ya I'll try to remember to send a followup again then
19:23:33 <clarkb> #topic Zuul PCRE regexes are deprecated
19:23:39 <clarkb> I don't think there is anything new to cover on this
19:23:44 <clarkb> But happy to be wrong :)
19:24:33 <frickler> I saw some teams doing patches in reaction to the announcement
19:24:54 <frickler> so IMO we can give it some weeks maybe and then recheck the remainders
19:25:12 <clarkb> sounds good
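[Editor's note, not part of the meeting log: for context on the patches teams are pushing, Zuul's move from PCRE to RE2 regexes drops support for lookahead assertions; a hedged sketch of the typical conversion, assuming current Zuul matcher syntax with an explicit negate flag:]

```yaml
# Before (PCRE negative lookahead, now deprecated):
# branches: ^(?!master).*$
#
# After (RE2-compatible, using Zuul's regex matcher with negate):
branches:
  - regex: ^master$
    negate: true
```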
19:25:34 <clarkb> #topic Python container image updates
19:25:38 <clarkb> #link https://review.opendev.org/q/(topic:bookworm-python3.11+OR+hashtag:bookworm)+status:open
19:25:52 <clarkb> As noted previously we'll defer Gerrit until after the release. As a result I pushed up more changes
19:26:13 <clarkb> reviews welcome. Be aware that zuul/zuul-registry's image doesn't have a straightforward move to bookworm because it currently relies on openssl 1.1
19:26:30 <clarkb> we may end up leaving that on bullseye for now and then keeping a bullseye image around longer term for python3.11?
19:29:50 <clarkb> in any case we can sort that out later. Reviews are welcome on the other changes
19:29:57 <clarkb> #topic Etherpad 1.9.3 Upgrade
19:29:57 <fungi> yeah, that's an unfortunate situation
19:30:12 <clarkb> fungi: want to fill us in on where this ended up yesterday?
19:30:33 <fungi> #link https://review.opendev.org/896454 Upgrade Etherpad to 1.9.3
19:30:47 <fungi> clarkb and i tested a held upgraded node yesterday
19:31:17 <fungi> i didn't observe any problems, though i think you noticed some weirdness with cached something in chrom*?
19:32:08 <clarkb> ya my chrome browser kept reconnecting to the etherpad
19:32:12 <fungi> the changelog is short and doesn't seem to touch anything our deployment should care about
19:32:19 <clarkb> well, and before I did a hard refresh it got an error about an undefined symbol
19:32:34 <clarkb> but switching into incognito mode made the problems go away so almost certainly something was cached badly
19:34:04 <frickler> sounds like a 3rd test might be helpful to decide how reproducible this is
19:34:15 <frickler> I'll try to do that this week, then
19:34:23 <clarkb> frickler: ++ if anyone other than fungi or myself have time for that. the clarkb-test pad is where we were testing and it has content already
19:35:34 <clarkb> #topic Gitea 1.21.0 Upgrade
19:35:41 <clarkb> Gitea 1.21.0-rc0 is out now
19:36:01 <clarkb> There is no changelog yet so this isn't urgent, but in the past I've tried to push up an upgrade testing change once RCs are available in order to start working through problems
19:36:19 <clarkb> It would probably be good for someone other than myself to have a go at it. Any interest in doing a 1.21.0 upgrade?
19:37:58 <clarkb> The typical process I use is 1) check if go and/or nodejs need to be updated 2) update our templates to keep them in sync with upstream changes 3) sort out any other items called out in the changelog
19:38:20 <clarkb> 3) is what usually takes the most time, since you need to understand changes if they intersect with our use cases and occasionally write tests to cover the updates
19:38:41 <clarkb> but even just having 1 and 2 done can help rule out a bunch of stuff pretty quickly in our test system
19:39:26 <clarkb> if there is interest say something in #opendev so that we don't end up doing duplicate work
19:40:42 <clarkb> #topic Gerrit Replication Task Leaks Fix
19:40:52 <clarkb> #link https://gerrit-review.googlesource.com/c/plugins/replication/+/387314
19:41:32 <clarkb> I've written a change that, when tested by hand on a held node, seems to do what we want. Somewhat predictably, the gerrit maintainers have asked me to write test cases. I've spent the morning doing my best to pretend I understand what is going on there, and the end result is my latest comment
19:42:16 <clarkb> tl;dr I think I have two test cases that are very close but haven't figured out how to run them locally yet, and a third is basically missing a major piece of testing in the replication plugin (replicating when permissions say no); I am currently completely lost in Gerrit's internal models for permissions
19:42:37 <clarkb> I'm hopeful we can get that merged by the time we want to restart Gerrit though so that we can fix the problem for good
19:45:19 <clarkb> #topic PTGBot Webserver crashing
19:45:35 <clarkb> seems to be running right now. Did anyone have to restart it since we restarted it yesterday?
19:46:01 <frickler> not me
19:46:30 <fungi> i don't see any indication it's been restarted
19:46:46 <fungi> Sep 25 19:20:43 eavesdrop01 docker-ptgbot[646]: DEBUG:root:Debugging on
19:46:51 <fungi> that was the last thing it logged
19:47:02 <fungi> which is when i started it yesterday
19:47:09 <frickler> ps also says running since yesterday
19:47:23 <fungi> yeah, so i think it has not crashed (yet anyway)
19:48:10 <clarkb> ok so no new information to look at there. Did we figure out if the webserver logging was broken?
19:48:33 <fungi> i think it's just not instrumented for request logging
19:48:57 <fungi> it's logging an explicit line proving that debug logging was enabled at start
19:49:08 <fungi> i have no idea if it's sufficient to log exceptions/tracebacks though
19:50:19 <frickler> so that'll need some further testing once someone has time, but it doesn't seem urgent for now
19:50:40 <clarkb> ok sounds good
19:50:48 <clarkb> #topic Open Discussion
19:50:50 <clarkb> Anything else?
19:51:00 <fungi> i did check for leaked nodes in rax-iad and found 19
19:51:10 <frickler> nodes or images?
19:51:11 <fungi> not sure how old they are
19:51:14 <fungi> sorry, images
19:52:16 <fungi> one i just looked at was from 2023-09-01
19:52:27 <fungi> so they might not be very recent
19:52:34 <frickler> so that's before the timeout bump
19:52:37 <clarkb> ~2023-09-18 is when we got the config where we wanted it
19:52:53 <clarkb> I would probably clean anything before the 18th up and anything after can be subject to debugging
19:54:01 <fungi> looping through to get dates now
19:54:31 <fungi> most are from 2023-08-30 and 2023-09-01
19:54:41 <clarkb> ya so possibly timed out uploads that didn't clean up properly
19:55:00 <fungi> i think 2023-09-13 is the most recent
19:55:09 <fungi> so all from before the change
19:55:33 <fungi> i think that indicates it's been successful, and we can go ahead and mop up these remnants
19:55:41 <fungi> i'll delete them now
19:55:51 <clarkb> ++
19:56:09 <frickler> ack
19:57:10 <fungi> and now that's done too
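[Editor's note, not part of the meeting log: the cleanup criterion agreed above — anything created before the ~2023-09-18 config fix is a leaked remnant, anything after is subject to debugging — can be sketched as a small predicate. The function name and cutoff handling are illustrative assumptions, not the actual commands run:]

```python
from datetime import datetime, timezone

# Hypothetical sketch of the cutoff logic discussed in the meeting:
# images created before the upload-timeout config landed (~2023-09-18)
# are treated as leaked remnants that are safe to clean up.
CONFIG_FIX_DATE = datetime(2023, 9, 18, tzinfo=timezone.utc)

def is_leaked_remnant(created_at: str) -> bool:
    """Return True if an image's creation timestamp predates the config fix.

    created_at is an ISO 8601 string as reported by the image API,
    e.g. "2023-09-01T12:00:00Z".
    """
    # Python's fromisoformat (pre-3.11) does not accept a trailing "Z",
    # so normalize it to an explicit UTC offset first.
    created = datetime.fromisoformat(created_at.replace("Z", "+00:00"))
    return created < CONFIG_FIX_DATE
```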
19:58:22 <frickler> next tuesday is a bank holiday here, so I might not be around
19:58:27 <clarkb> enjoy the day off
19:58:41 <clarkb> I don't think we have any holidays here until november
19:58:48 <fungi> also corvus and i looked closely at one and they're completely missing metadata because the sdk adds the metadata after the image import tasks complete, so we can't use metadata to indicate they're safe to clean up
19:59:12 <fungi> though maybe we could look for a complete lack of metadata, i dunno
19:59:13 <clarkb> and we are just about at time. Thank you everyone. We'll be back next week
19:59:26 <clarkb> fungi: complete lack of metadata is normal for user uploads which is the risk there
19:59:34 <fungi> right, exactly
19:59:34 <clarkb> you might delete something someone has uploaded iirc
19:59:42 <clarkb> #endmeeting