19:01:35 <clarkb> #startmeeting infra
19:01:35 <opendevmeet> Meeting started Tue Jul 25 19:01:35 2023 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:35 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:35 <opendevmeet> The meeting name has been set to 'infra'
19:01:50 <clarkb> #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/LODRYKSZXR4VEE2OJF2HMJN5E5GXH3LF/ Our Agenda
19:02:00 <frickler> o/
19:02:02 <clarkb> #topic Announcements
19:02:41 <clarkb> I'll not be able to make the August 8 meeting due to travel
19:02:54 <clarkb> but I intend on running next week's meeting.
19:03:14 <clarkb> Seems like we also have service coordinator election type stuff coming up. I need to go look at timelines set in the last election
19:03:17 <fungi> i could chair on the 8th if others want a meeting
19:04:42 <clarkb> thanks. We can sort that out when we get there I guess
19:05:01 <clarkb> also the bugs are really happy with my laptop screen again
19:05:07 <clarkb> #topic Bastion Host Updates
19:05:27 <clarkb> Nothing new here other than requesting reviews on the stack that implements shared encryption key backups
19:05:34 <clarkb> #link https://review.opendev.org/q/topic:bridge-backups
19:06:05 <clarkb> #topic Mailman 3
19:06:24 <clarkb> Looks like there are some recent developments here. Fungi can you get us up to date?
19:07:25 <fungi> as discussed in last week's meeting, we're rolling forward with manual creation of a django site in its admin webui and association of the corresponding mailman mail domain. i stuck some notes in comments on #link https://review.opendev.org/867981
19:07:42 <fungi> i meant to stick them on a different child change. anyway i'll get that into a docs change
19:08:12 <fungi> i manually created a django site (per the notes there) for lists.zuul-ci.org and then set that as the site for the lists.zuul-ci.org mail domain
19:08:49 <fungi> i also manually applied #link https://review.opendev.org/867987 to production and restarted the containers just to make sure what we're seeing matches the earlier tests
19:09:24 <fungi> really just the edit to /var/lib/mailman/web/settings.py
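For anyone repeating this for another domain, a rough sketch of the same step from a Django shell inside the mailman-web container follows; note the production change was made through the admin web UI, and the container name, manage.py invocation, and model names here are assumptions based on the stock django.contrib.sites and django_mailman3 apps:

    # hedged sketch only; run via something like:
    #   docker exec -it mailman-web python3 manage.py shell
    from django.contrib.sites.models import Site
    from django_mailman3.models import MailDomain

    # create (or reuse) the django site for the new list domain
    site, _ = Site.objects.get_or_create(
        domain='lists.zuul-ci.org', defaults={'name': 'lists.zuul-ci.org'})
    # point the matching mailman mail domain at that site (assumes the
    # mail domain record already exists)
    MailDomain.objects.filter(mail_domain='lists.zuul-ci.org').update(site=site)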
19:09:33 <clarkb> fungi: will the other domains in mm3 get similar treatment once you are happy with the zuul results? Also, where does the dummy domain fit into the future planning here?
19:09:57 <fungi> i think we can dispense with the dummy domain i was originally planning to add
19:10:17 <fungi> things seem so far to be working as intended with lists.opendev.org as our first site/domain
19:10:51 <fungi> but keep an eye out for any subtle oddities in list post headers or post moderation
19:11:03 <fungi> or inconsistencies i haven't spotted in the web interfaces
19:11:07 <clarkb> cool that helps simplify things
19:11:58 <fungi> if all goes well, and once we get 867987 deployed officially, we can decide to either upgrade to latest mailman 3 with #link https://review.opendev.org/869210 or migrate more sites first
19:12:21 <clarkb> fwiw the domain listing at https://lists.opendev.org seems to show what we want after your manual changes
19:12:32 <fungi> i'm in favor of upgrading first, before we migrate more sites
19:12:46 <clarkb> makes sense. I'll need to review that change I guess
19:12:51 <fungi> just to reduce the blast radius if the upgrade goes sideways
19:13:44 <fungi> anyway, that's it for my update. nice to have a little progress on this again, hopefully it will pick up steam in the coming weeks
19:13:53 <clarkb> to summarize then we need to land 867987 to sync up with your manual changes. Then we can upgrade mm3. Then we can schedule more list moves
19:14:11 <fungi> that would be my recommendation, yes
19:14:39 <clarkb> sounds good, thanks
19:15:11 <fungi> and i saw your comments on 867987, will address those shortly
19:15:34 <clarkb> #topic Gerrit Updates
19:16:00 <clarkb> fungi updated All-Projects to reject implicit merges
19:16:27 <clarkb> be on the lookout for people finding the new behavior to be undesired. They should be able to override the setting in their own project acls but I think a global reject is safest
19:16:27 <fungi> yeah, just a heads up to keep an eye out for any problems people may encounter, though i don't expect any
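For reference, the setting lives in All-Projects' project.config, and a project that wants the old behavior back can override it in its own ACL config; a sketch using the standard Gerrit submit option:

    # refs/meta/config:project.config in All-Projects (global default)
    [submit]
        rejectImplicitMerges = true

    # a child project can opt back out in its own project.config
    [submit]
        rejectImplicitMerges = false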
19:16:57 <clarkb> Then separately we have three interrelated items that we should try and coordinate around
19:17:30 <clarkb> 1) Gerrit 3.7.4 update https://review.opendev.org/c/opendev/system-config/+/885317 2) jeepyb updates needing a new gerrit image deployment and 3) replication task leaks
19:18:16 <clarkb> I think my preferred course of action would be to land 885317 and build new images if we are happy with that change. Then schedule a restart to deploy 885317 which will also address 2). During that restart we should manually clear out the replication task files (or move them aside etc)
19:19:02 <clarkb> if that sounds reasonable I should be able to help with that around 2100 UTC tomorrow or the next day etc so that we can do it during a quieter gerrit period
19:19:42 <fungi> i can do 21z tomorrow, sure
19:20:43 <clarkb> that assumes 885317 lands before then, but that seems reasonably safe to assume
19:21:06 <clarkb> I'll check in tomorrow morning then and we can take it from there. fungi maybe you can take a look at the gerrit replication stuff before then too just to have a bit of familiarity with it?
19:21:26 <fungi> sure. where are the notes on that again?
19:21:48 <fungi> never mind, i see the links in the agenda
19:21:56 <clarkb> probably mostly in the changes i wrote around it and the upstream issues
19:22:10 <clarkb> /home/gerrit2/review_site/data/replication/ref-updates/ is the host side path for the contents on review02
19:22:11 <fungi> yep, will refresh my noodle on those
19:22:17 <clarkb> they are in a different path on the container side
19:22:24 <clarkb> thanks
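For the restart itself, a hedged sketch of the cleanup step on review02 (exact handling to be decided when we do it) would be to move the directory aside rather than delete it, so the leaked task files stay available for inspection:

    # on review02, with the gerrit container stopped; sketch only
    cd /home/gerrit2/review_site/data/replication
    sudo mv ref-updates ref-updates.old-$(date +%Y%m%d)
    # the replication plugin should recreate the directory when gerrit starts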
19:22:51 <clarkb> That was all I had for Gerrit.
19:22:58 <clarkb> #topic Server Upgrades
19:23:19 <clarkb> I continue to have no progress here myself. I saw that corvus deleted the old zuul executors though so that is good cleanup
19:23:42 <corvus> yup, all gone, and zuul still thinks it has 12 executors so i think i got the right ones :)
19:23:50 <clarkb> excellent
19:24:09 <corvus> that's zm, and ze upgraded to jammy now
19:24:29 <corvus> also the registry is jammy
19:24:49 <clarkb> only the schedulers remain in zuul land
19:24:55 <fungi> and the lb
19:24:56 <corvus> yep
19:25:39 <clarkb> Anything else on the subject of server upgrades?
19:26:26 <clarkb> #topic Fedora Cleanup
19:27:01 <clarkb> Sometime in the last couple of days (the dateline makes it confusing) tonyb and I discussed making progress on this
19:27:20 <clarkb> Basically plan is to copy roles as necessary, make modifications, point base-test at modified roles and test from there
19:27:30 <clarkb> specifically for mirror selection updates to fedora
19:27:39 <clarkb> so that we can stop mirroring fedora before cleaning up the images entirely
19:28:03 <clarkb> I don't know how far tonyb got on that. I know tony wanted to test things a bit locally before pushing stuff up (since our mirrors are public anyway this is something that can be done)
19:29:08 <clarkb> Hopefully we'll have changes we can review soon
19:29:14 <clarkb> #topic Storyboard
19:29:39 <clarkb> Anything new to report here? Should we drop it out of the meeting agenda until we've had time to process any next steps?
19:30:00 <frickler> yes, I was thinking the same
19:30:10 <fungi> wfm unless something changes
19:30:12 <clarkb> The needed gerrit restart is related to updating issues correctly but only the lp side I think
19:30:31 <fungi> right, the sb equivalent has remained working
19:30:50 <clarkb> ack
19:30:55 <clarkb> Let's move on then
19:30:59 <fungi> because it uses the its plugin instead of hook scripts in jeepyb
19:31:27 <clarkb> #topic Gitea 1.20 upgrade
19:32:00 <clarkb> Gitea has released a 1.20.1 version now (they appear to have backpedalled on allowing all url types in markdown by default for security reasons)
19:32:30 <clarkb> I'm still trying to work through the breaking changes list, and a list of disallowed url types was on the todo. Hopefully 1.20.1 makes that simpler
19:32:39 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/886993 Gitea 1.20 change.
19:33:01 <clarkb> Unfortunately I'm struggling to collect the access log files which is why that change currently fails in ci. I think I need to hold the node and check directly
19:33:18 <clarkb> (Access log format changes in 1.20 and I want to confirm we still get something useful from it)
19:33:59 <clarkb> The other item on the todo list that will need help is the theme color selection change. I'm not quite parsing the intended update given by the release notes
19:34:11 <clarkb> we need to set some sort of attributes in the base template or something
19:34:39 <clarkb> No progress on the oauth2 jwt stuff unfortunately
19:34:50 <clarkb> at least not according to release notes. I guess I can test that when I hold a node
19:35:08 <clarkb> #topic Etherpad 1.9.1
19:35:18 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/887006 Etherpad 1.9.1
19:35:35 <clarkb> There is a held node if you look at the child change of ^ and pull up the logs for the failed test run (sorry I don't have that in front of me right now)
19:35:54 <clarkb> After making the config change from false to null values for color and username the UI appears to act like it did in 1.8.x
19:36:24 <clarkb> I did some simple testing with the held node and I think that 1.9.1 change is ready for review/testing by others and hopefully merging
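The false-to-null tweak mentioned above lands in the padOptions block of etherpad's settings.json template; roughly, assuming the stock option names and with the other padOptions keys omitted:

    "padOptions": {
      "userName": null,
      "userColor": null
    }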
19:37:14 <clarkb> #topic Python Container Image Updates
19:37:23 <clarkb> #link https://review.opendev.org/q/topic:bookworm-python
19:37:30 <clarkb> #link https://review.opendev.org/q/topic:force-base-image-build
19:37:39 <clarkb> I think this is moving along well. Mostly needs reviews at this point
19:38:02 <clarkb> Note the limnoria image update should be approved during a period where we don't have any meetings running as the image update will restart the bot and interrupt any meetings
19:38:58 <fungi> yeah, i keep forgetting to do that at an appropriate time, i only ever remember about it at inconvenient times
19:39:30 <clarkb> heh ya I theoretically have time here but so many distractions
19:40:20 <clarkb> I've also made a note in the agenda that we should consider deleting our old bare python version tags so that image builds for anything still using them fail, forcing a move to the distro + python version tags
19:40:36 <clarkb> and a zuul-jobs tool that can scan docker hub and gerrit to determine if change tags are leaked and can be cleaned up
19:41:06 <clarkb> I can probably look at that second thing as it is probably similarish to the script we used to sync from docker hub to quay and back again
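As a starting point for that tool, a rough sketch of the scan (Docker Hub tag listing plus Gerrit change status lookups; the change_<number> tag prefix and the example repository name are assumptions about how the leaked tags are named):

    #!/usr/bin/env python3
    # Hedged sketch of the leaked-tag scan discussed above. Assumes leaked
    # per-change tags start with "change_<gerrit change number>"; adjust the
    # pattern to match whatever the upload/promote jobs actually push.
    import json
    import re

    import requests

    HUB_TAGS = 'https://hub.docker.com/v2/repositories/{repo}/tags/?page_size=100'
    GERRIT_CHANGE = 'https://review.opendev.org/changes/{number}'


    def list_tags(repo):
        # walk docker hub's paginated tag listing
        url = HUB_TAGS.format(repo=repo)
        while url:
            data = requests.get(url, timeout=30).json()
            for result in data.get('results', []):
                yield result['name']
            url = data.get('next')


    def change_status(number):
        # gerrit prefixes its JSON responses with ")]}'" to defeat XSSI
        text = requests.get(GERRIT_CHANGE.format(number=number), timeout=30).text
        return json.loads(text.split('\n', 1)[1]).get('status')


    def leaked_tags(repo):
        for tag in list_tags(repo):
            match = re.match(r'change_(\d+)', tag)
            if match and change_status(match.group(1)) in ('MERGED', 'ABANDONED'):
                yield tag


    if __name__ == '__main__':
        # example repository name only
        for tag in leaked_tags('opendevorg/python-builder'):
            print(tag)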
19:41:33 <clarkb> #topic Meetpad LE Cert Refresh
19:42:04 <clarkb> As fungi pointed out we have a cert but the handler that syncs from the acme path on disk to the jitsi meet service path on disk didn't run so we don't have a new cert where we need it
19:42:13 <clarkb> this means simply restarting services won't fix this
19:42:47 <clarkb> instead we should determine what is necessary to rerun the acme system in its entirety so that it either succeeds and we win or it fails and we can debug further
19:42:53 <fungi> frickler was the one who spotted that actually
19:43:27 <clarkb> alternatively we can manually copy the file and manually restart services
19:43:46 <frickler> I'm not sure if the rerun will work if LE thinks the cert has already been refreshed
19:43:51 <clarkb> ianw: if you get a chance to see this, what do you think is the preferred way to trigger a cert refresh from scratch
19:44:04 <frickler> so the second option would be my preferred choice
19:44:04 <clarkb> frickler: ya there are some things we need to do to force the system to rerun again
19:44:25 <clarkb> if ianw remembers those without us needing to dig into the acme stuff we should probably document it and attempt that with meetpad
19:44:35 <fungi> in theory, the staged cert being absent would look like a newly-deployed server to acme.sh, right? which should force it to request a new cert?
19:44:43 <clarkb> I think there is some record file it keeps that we can either edit a timestamp in or delete the file then it retriggers
19:45:00 <fungi> ahh, so there's more retained state than just the cert
19:45:04 <clarkb> fungi: maybe? I think it keeps some state info in a config/metadata file though
19:45:11 <clarkb> it may check both things
19:45:58 <clarkb> side note: we should follow up with acme.sh on fixing the bug that is preventing us from upgrading to the latest release, which has us backporting patches instead
19:47:06 <clarkb> I'll try to take a look at that stuff and if I hit a wall we can do the manual sync and restart for now
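In case it helps the debugging, a couple of generic acme.sh approaches for forcing a reissue (domain is a placeholder and how our letsencrypt roles actually drive acme.sh may differ):

    # option 1: force a renewal even though acme.sh thinks the cert is current
    acme.sh --renew -d <domain> --force

    # option 2: clear the recorded renewal state so the next normal run reissues;
    # acme.sh keeps it in the per-domain conf (Le_NextRenewTime and friends), e.g.
    #   ~/.acme.sh/<domain>/<domain>.conf   (or <domain>_ecc for EC certs)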
19:47:15 <clarkb> #topic Open Discussion
19:47:18 <clarkb> Anything else?
19:49:03 <fungi> oh, one change
19:49:19 <fungi> #link https://review.opendev.org/888901 Add 32GB Ubuntu Focal and Jammy nodes
19:49:38 <fungi> that basically just duplicates the existing bionic nodes in vexxhost ca-ymq-1
19:49:56 <fungi> so probably uncontroversial, but i figure it might need another set of eyes/opinions
19:50:17 <clarkb> ya I think we can go ahead and approve that. I'll do that as soon as I'm done with the meeting
19:51:07 <fungi> cool
19:51:48 <clarkb> sounds like that may be everything then. Thank you everyone!
19:51:56 <clarkb> #endmeeting