19:02:07 #startmeeting infra
19:02:07 Meeting started Tue Sep 26 19:02:07 2023 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:02:07 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:02:08 The meeting name has been set to 'infra'
19:02:35 #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/JUUNKEA2W2T4IA64DHBIYKUHBXOH7W3D/ Our Agenda
19:03:46 #topic Announcements
19:04:06 OpenStack is going to be making final release candidates this week and the actual Bobcat release should occur next week
19:04:11 please be aware of that as we make changes
19:06:08 #topic Mailman 3
19:06:19 fungi: should we jump right into planning for the final mailman3 migration?
19:06:37 sure, i had a proposed date on the etherpad... just a sec
19:07:44 #link https://etherpad.opendev.org/p/mm3migration
19:08:02 line 265
19:08:12 tentative maintenance 15:30-19:30 utc on thursday 2023-10-12 (week after release)
19:08:29 that's for lists.openstack.org which is the only remaining site on the old server
19:09:11 that date works for me
19:09:15 if folks generally find that acceptable i'll announce it on openstack-discuss this week and start putting together a detailed maintenance plan based on the previous 6 sites we've migrated
19:09:26 +1
19:10:04 i've started a handful of notes below there on things we'll want to be mindful of for this specific maintenance, either lessons learned from earlier migrations or due to the size of the data
19:10:25 i'll take those into account when drafting the full plan
19:11:06 aside from that, last week's maintenance went smoothly
19:11:30 we've had a couple of requests for clarification on new urls for starlingx mailing lists, but no problems reported
19:12:48 sounds like we can proceed with the date and time indicated, not hearing any objections
19:13:02 i didn't have anything else on this topic, but am happy to answer questions
19:13:46 #topic Server upgrades
19:14:15 Nothing new to add here
19:14:44 #topic Nodepool Image Build and Upload Status
19:15:17 has anyone checked if the uploads in various clouds (particularly rax iad?) are looking good since our config changes (ending with the increase of the upload timeout about a week ago)
19:15:33 i have not
19:15:53 I just checked rax-iad
19:16:07 looking as expected, all uploads succeeded at the first attempt
19:16:13 yay!
19:16:21 i guess we can drop this from the agenda?
19:16:48 yes, we will handle inmotion as a different topic I think
19:16:54 agreed
19:17:18 except maybe check for leftover images one last time?
19:17:27 feel free :)
19:17:31 or did you check last week?
19:17:43 I did not do a pass since I last cleaned up the images, which was more than a week ago
19:17:47 i don't remember any more, but i can take a look
19:20:28 thanks
19:20:37 #topic OpenMetal
19:20:53 Haven't heard back from Yuriy since I responded last week. Did anyone else see email that I overlooked?
19:20:59 I probably need to prod him on that
19:21:04 nope
19:22:48 i did no
19:22:49 t
19:23:21 ok ya I'll try to remember to send a followup again then
19:23:33 #topic Zuul PCRE regexes are deprecated
19:23:39 I don't think there is anything new to cover on this
19:23:44 But happy to be wrong :)
19:24:33 I saw some teams doing patches in reaction to the announcement
19:24:54 so IMO we can give it some weeks maybe and then recheck the remainders
19:25:12 sounds good
19:25:34 #topic Python container image updates
19:25:38 #link https://review.opendev.org/q/(topic:bookworm-python3.11+OR+hashtag:bookworm)+status:open
19:25:52 As noted previously we'll defer Gerrit until after the release. As a result I pushed up more changes
19:26:13 reviews welcome. Be aware that zuul/zuul-registry's image doesn't have a straightforward move to bookworm because it currently relies on openssl 1.1
19:26:30 we may end up leaving that on bullseye for now and then keeping a bullseye image longer term for python3.11?
19:29:50 in any case we can sort that out later. Reviews are welcome on the other changes
19:29:57 #topic Etherpad 1.9.3 Upgrade
19:29:57 yeah, that's an unfortunate situation
19:30:12 fungi: want to fill us in on where this ended up yesterday?
19:30:33 #link https://review.opendev.org/896454 Upgrade Etherpad to 1.9.3
19:30:47 clarkb and i tested a held upgraded node yesterday
19:31:17 i didn't observe any problems, though i think you noticed some weirdness with cached something in chrom*?
19:32:08 ya my chrome browser kept reconnecting to the etherpad
19:32:12 the changelog is short and doesn't seem to touch anything our deployment should care about
19:32:19 well and before I did a hard refresh it got an error for some unfound symbol
19:32:34 but switching into incognito mode made the problems go away so almost certainly something was cached badly
19:34:04 sounds like a 3rd test might be helpful to decide how reproducible this is
19:34:15 I'll try to do that this week, then
19:34:23 frickler: ++ if anyone other than fungi or myself has time for that. the clarkb-test pad is where we were testing and it has content already
19:35:34 #topic Gitea 1.21.0 Upgrade
19:35:41 Gitea 1.21.0-rc0 is out now
19:36:01 There is no changelog yet so this isn't urgent, but in the past I've tried to push up an upgrade testing change once RCs are available in order to start working through problems
19:36:19 It would probably be good for someone other than myself to have a go at it. Any interest in doing a 1.21.0 upgrade?
19:37:58 The typical process I use is 1) check if go and/or nodejs need to be updated 2) update our templates to keep in sync with updates upstream 3) sort out any other items called out in the changelog
19:38:20 3) is what usually takes the most time as you need to understand changes if they intersect with our use cases and occasionally write tests to cover the updates
19:38:41 but even just having 1 and 2 done can help rule out a bunch of stuff pretty quickly in our test system
19:39:26 if there is interest say something in #opendev so that we don't end up doing duplicate work
19:40:42 #topic Gerrit Replication Task Leaks Fix
19:40:52 #link https://gerrit-review.googlesource.com/c/plugins/replication/+/387314
19:41:32 I've written a change that when tested by hand on a held node seems to do what we want. Somewhat predictably gerrit maintainers have asked me to write test cases though. I've spent the morning doing my best to pretend I understand what is going on there and the end result is my latest comment
19:42:16 tl;dr I think i have two test cases that are very close but haven't figured out how to run them locally yet and a third is basically missing a major piece of testing in the replication plugin (replicating when permissions say no) and I am currently completely lost in Gerrit's internal models for permissions
19:42:37 I'm hopeful we can get that merged by the time we want to restart Gerrit though so that we can fix the problem for good
19:45:19 #topic PTGBot Webserver crashing
19:45:35 seems to be running right now. Did anyone have to restart it since we restarted it yesterday?
19:46:01 not me
19:46:30 i don't see any indication it's been restarted
19:46:46 Sep 25 19:20:43 eavesdrop01 docker-ptgbot[646]: DEBUG:root:Debugging on
19:46:51 that was the last thing it logged
19:47:02 which is when i started it yesterday
19:47:09 ps also says running since yesterday
19:47:23 yeah, so i think it has not crashed (yet anyway)
19:48:10 ok so no new information to look at there. Did we figure out if the webserver logging was broken?
19:48:33 i think it's just not instrumented for request logging
19:48:57 it's logging an explicit line proving that debug logging was enabled at start
19:49:08 i have no idea if it's sufficient to log exceptions/tracebacks though
19:50:19 so that'll need some further testing once someone has time, but it doesn't seem urgent for now
19:50:40 ok sounds good
19:50:48 #topic Open Discussion
19:50:50 Anything else?
19:51:00 i did check for leaked nodes in rax-iad and found 19
19:51:10 nodes or images?
19:51:11 not sure how old they are
19:51:14 sorry, images
19:52:16 one i just looked at was from 2023-09-01
19:52:27 so they might not be very recent
19:52:34 so that's before the timeout bump
19:52:37 ~2023-09-18 is when we got the config where we wanted it
19:52:53 I would probably clean up anything from before the 18th and anything after can be subject to debugging
19:54:01 looping through to get dates now
19:54:31 most are from 2023-08-30 and 2023-09-01
19:54:41 ya so possibly timed out uploads that didn't clean up properly
19:55:00 i think 2023-09-13 is the most recent
19:55:09 so all from before the change
19:55:33 i think that indicates it's been successful, and we can go ahead and mop up these remnants
19:55:41 i'll delete them now
19:55:51 ++
19:56:09 ack
19:57:10 and now that's done too
19:58:22 next tuesday is a bank holiday here, so I might not be around
19:58:27 enjoy the day off
19:58:41 I don't think we have any holidays here until november
19:58:48 also corvus and i looked closely at one and they're completely missing metadata because the sdk adds the metadata after the image import tasks complete, so we can't use metadata to indicate they're safe to clean up
19:59:12 though maybe we could look for a complete lack of metadata, i dunno
19:59:13 and we are just about at time. Thank you everyone. We'll be back next week
19:59:26 fungi: complete lack of metadata is normal for user uploads which is the risk there
19:59:34 right, exactly
19:59:34 you might delete something someone has uploaded iirc
19:59:42 #endmeeting