Tuesday, 2023-08-01

clarkbhello everyone19:00
clarkbthe meeting will get started shortly19:00
fungiohai19:00
tonybo/19:00
clarkb#startmeeting infra19:01
opendevmeetMeeting started Tue Aug  1 19:01:35 2023 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.19:01
opendevmeetUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.19:01
opendevmeetThe meeting name has been set to 'infra'19:01
clarkb#link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/TGHMCISZUPXZ6QOAOOXRSAIHN6WYKPP4/ Our Agenda19:02
clarkb#topic Announcements19:03
clarkbA reminder that I won't be able to attend next week due to travel19:03
clarkbI also looked at service coordinator things and it appears I did not propose an election time frame for the next election during the last one19:04
* tonyb has no idea how to parse that statement19:04
clarkbwe are theoretically supposed to run elections every 6 months and I have tried to list dates for the next period during the current election19:05
clarkblast january/february I did not do this19:05
clarkbDoing a direct january 31 - February 14, 2023 + 6 months set of math I think the nomination period would open today and end in two weeks.19:06
tonybOh okay, I think I get it19:06
clarkbI'm happy to do that though for selfish reasons would prefer to delay by a week19:06
clarkbDelaying would allow me to get emails out on time and travel without being in the middle of nomination period19:07
fungii'm in favor of time travel, yep19:07
fungior time and travel separately, whatever works19:08
tonybFWIW I'm fine with a delay.19:08
clarkb:) cool in that case I'll send emails to make August 8 to 22 as a nomination period with a week long voting period afterwards should that be necessary19:08
fungithanks!19:08
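The date arithmetic agreed above can be sketched as follows. This is a minimal illustration, not a tool the team uses: the function name and its parameters are hypothetical, and the only inputs taken from the log are the nominal opening date (the day of this meeting), the one-week delay, the two-week nomination period and the one-week voting period.

```python
from datetime import date, timedelta


def election_schedule(nominations_open, delay_days=0,
                      nomination_days=14, voting_days=7):
    """Return (nominations open, nominations close, voting close).

    Hypothetical helper: shifts the opening date by an optional delay,
    then adds the nomination and voting period lengths.
    """
    start = nominations_open + timedelta(days=delay_days)
    nominations_close = start + timedelta(days=nomination_days)
    voting_close = nominations_close + timedelta(days=voting_days)
    return start, nominations_close, voting_close


# Nominal opening date is the meeting day; delay by one week as agreed.
opens, closes, voting_ends = election_schedule(date(2023, 8, 1), delay_days=7)
print(opens, closes, voting_ends)
```

With a seven day delay this reproduces the August 8 to 22 nomination window, with voting (if needed) ending a week later.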
clarkb#topic Bastion Host19:09
clarkb#link https://review.opendev.org/q/topic:bridge-backups19:09
clarkbThis stack needs reviews. Other than that I think things are going well with the bastion19:09
clarkb#topic Mailman 319:09
fungitwo steps forward, one step back19:09
fungiwhile waiting for reviews on the open topic:mailman3 changes, i was going to do a new held node and run through more imports to make sure the import process isn't broken by the new mailman releases19:10
fungihowever that timed out, then a recheck failed19:10
fungimailman rest api wasn't starting up, on the second failed node i found tracebacks in the container logs19:10
fungistory short, a recent importlib_resources release removed a bunch of deprecated stuff, breaking mailman-core19:11
fungii proposed a new change to pin back importlib_resources<6 and restacked the other changes on top of that19:11
clarkbfungi: was that sufficient to get a held node and get back onto the original plan?19:12
fungii think so, will know shortly19:12
fungiin better news, i added a section to the migration pad detailing the manual django site/mail host creation steps19:12
clarkb#link https://review.opendev.org/c/opendev/system-config/+/890220 pin importlib_resources for mm319:12
fungi#link https://etherpad.opendev.org/p/mm3migration "Add Django Site and Postorius Mail Host"19:12
fungionce i get a good held node, i'll do those manual steps on it and then try to run through some imports from production data again19:13
clarkband that will be on top of the upgraded version right?19:13
fungibut in the meantime, assuming those changes pass latest testing, they should still be ready to merge19:13
fungicorrect19:13
fungisince we decided we wanted to upgrade before scheduling the remaining imports19:14
clarkbsounds good, thank you19:14
clarkb++19:14
fungiso i just want to be sure it will go as smoothly as the previous imports19:14
fungibut the topic:mailman3 changes still need reviews if anyone gets a spare moment or two19:14
clarkb#topic Gerrit Updates19:16
clarkbAs planned last week we (mostly fungi) managed to push some of this forward. We are running on Gerrit 3.7.4 images with updated jeepyb now. During the restart process we cleared out the replication plugins' on disk waiting queue and the plugin seemed fine starting back up that way19:17
clarkbfungi: do we have confirmation yet of happy lp bug updates?19:17
fungii haven't checked, and nobody's said19:18
clarkback19:19
clarkbThe only other remaining gerrit todos are deciding if we want to do anything different with the replication plugin (I think what we've got now is a decent on the fly workaround actually) and a start towards a 3.8 upgrade19:20
clarkbneither of which are super urgent19:20
clarkb#topic Server Upgrades19:21
clarkbI have nothing new on this topic19:21
clarkbThe last week has generally been full of real world distractions. Both good and bad :)19:21
tonybI got distracted with python and zuul mirror updates19:21
clarkbsounds like we can go to the next topic then19:22
clarkb#topic Fedora Cleanup19:22
clarkbI noticed some discussion around general base job cleanup which I suspect is related to the fedora cleanup work19:23
tonybI've been working on the zuul side of this19:23
fricklerdevstack just dropped fedora support in master fwiw19:23
tonybI'll publish something this week19:23
tonybnothing super interesting19:24
clarkbok will look forward to reviewing those changes19:24
clarkb#topic Gitea 1.20 Upgrade19:24
clarkbThey are moving faster than I can keep up right now19:25
clarkbthere is a 1.20.2 release already19:25
tonybyikes19:25
clarkbI need to update my change to 1.20.2 and hold a node to figure out access log file locations so that we can cross check the changes they made to that log file before they end up in production19:26
clarkbunfortunately I don't think any of these releases have done anything to make oauth2 disablement easier to configure19:26
clarkbBut I guess it is a good thing they are fixing bugs generally19:26
clarkb#topic Etherpad 1.9.1 upgrade19:27
clarkb#link https://review.opendev.org/c/opendev/system-config/+/887006 Etherpad 1.9.119:27
clarkbI feel like this is ready but it would be good to have others check the held system works for them too19:27
clarkbthe comments on that change have the held node ip and ianw appears to have checked it. Thank you ianw19:28
clarkbif another core can check it out we can schedule a day to land and monitor that change?19:28
clarkblet me know if you think it looks good and I'll try to find a day that seems safe for that (but I've got ~6 days left on my trip that aren't consumed by flying so I'm in crunch time for doing things here too)19:30
clarkb#topic Python container image updates19:30
clarkbthe changes to update irc bots did land iirc and as far as I can tell the bots are all still functional?19:30
tonybI think so :-)19:31
tonybI'll get back to that after the zuul stuff19:31
tonybas discussed last time I'll target as much as possible to be on bookworm 3.1019:31
clarkbat this point moving consumer images to the new stuff then cleaning up the old stuff should be mostly smooth sailing at least as far as the base image is concerned. We may still need to sort out $service on newer python problems19:32
tonybyeah.  it looks like there are many that will be very simple19:33
tonybptgbot will need small amounts of work19:33
tonybgotta find a core for that project :-P19:33
clarkbzuul and nodepool should be easy transitions and will allow them to clean up backported package installs19:34
clarkbbut ya it will be an iterative process to get through the list19:34
tonybyup.19:34
clarkb#topic Meetpad LE Cert Update19:34
fungiwe're 12 days out from expiration, fwiw19:35
clarkbI saw frickler and ianw discussing and debugging this. It appears that some sort of cache is preventing a new DNS verification key from being issued which prevents our ansible machinery from triggering appropriately?19:35
fricklerianw and me noticed that this is an interesting edge case not currently handled properly in driver.sh, yes19:35
clarkbfrickler: can you give us a quick overview of that edge case?19:36
frickleressentially what you wrote, the dns-01 is still valid from a previous attempt, so acme.sh actually issues a new cert at the first stage ("issue" iirc)19:37
fricklerwhile driver.sh only expects some dns auth verification record to be produced at that point19:37
clarkbaha19:37
fricklerwe can retry in a couple of days to see if the old auth expired by then19:37
clarkbmaybe wait until 7 days before expiry and if the cache is still stale we can manually copy the cert and restart the docker services?19:38
fricklerother options are either generating a new key or changing the cert content like adding a different hostname, meetpad02 maybe19:38
frickleror doing the manual path, right19:38
clarkbI wonder if changing the order of the names in the cert would do it19:39
clarkbthat might be an easy option, if that fails then do the manual copy19:39
clarkbmostly thinking that we can debug acme and improve it on a longer time frame than 12 days19:39
fungias far as avoiding it in the future, do we have the ability to pick the validity period for those tokens?19:40
clarkbso decouple that from making the service happy again19:40
fricklerI don't think we can choose an interval there19:40
frickleralso it likely isn't easy to intentionally trigger the current situation19:40
clarkbya so fixing the service is a higher priority than fixing our acme.sh integration19:41
clarkbour own evidence would indicate this is rare19:41
fricklerroot cause seems to have been an internal failure at LE at the initial renew attempt19:41
clarkbneat19:41
clarkbmy suggestion is wait until ~7 days before expiry. Attempt reissue automatically and if that fails manually copy the file and restart containers19:42
fungiwfm19:42
fricklerack19:42
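The plan agreed above (wait until roughly 7 days before expiry, retry the automatic reissue, fall back to a manual copy and container restart) can be written down as a small decision function. This is only a sketch of the decision order discussed in the meeting: the function name, its arguments and the returned strings are hypothetical, and it does not call acme.sh or driver.sh.

```python
def renewal_action(days_until_expiry, reissue_succeeded=None, threshold_days=7):
    """Pick the next step for the meetpad cert, per the plan in the log.

    reissue_succeeded is None until an automatic reissue has been tried,
    then True or False depending on the outcome. All names here are
    illustrative assumptions.
    """
    if days_until_expiry > threshold_days:
        # Give the cached dns-01 authorization time to expire.
        return "wait"
    if reissue_succeeded is None:
        return "attempt automatic reissue"
    if reissue_succeeded:
        return "done"
    return "manually copy cert and restart containers"


print(renewal_action(12))          # state at the time of the meeting
print(renewal_action(7))           # threshold reached, no attempt yet
print(renewal_action(7, False))    # automatic reissue failed
```

Keeping the service fix separate from the acme.sh integration fix, as suggested in the discussion, means only the last branch needs a human.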
clarkb#topic Open Discussion19:42
clarkbI think fungi wanted to bring up matrix room and space creation on our homeserver19:43
fungithe starlingx community is interested in making use of our matrix homeserver19:43
fungisounds like they may want as many as 20 channels (some general discussion channels, per-project channels, separate channels for gerritbot)19:43
ildikovo/19:44
ildikovmore along the lines of 12, but yes, the idea is to have rooms/channels per project team19:44
fungier, "rooms" i guess they're termed in matrix parlance. and with the number of rooms they're thinking about, they also want to know if grouping those with a "space" makes sense19:44
clarkbI'm not sure we're in a spot to say using a space makes sense since we've not done it before19:45
clarkbbut we can certainly put the rooms in a space and find out19:45
fungialso there was some question as to the creation workflow. for rooms (and spaces) on our opendev.org homeserver, that's only doable by our matrix admin account, so requests for new rooms would need to come to one of us19:45
tonybis there an API we can use later?19:46
clarkbtonyb: yes I think so19:46
tonybokay.  that's nice19:46
ildikovyeah, I tried to click around, but I can only create a room on matrix.org19:47
clarkbit is intentional to restrict what rooms can be created19:47
clarkber who can create rooms19:48
fungianyway, it sounds like there's no objection to hosting multiple rooms and a space for the starlingx community if they decide that's what they want19:48
clarkbbut as mentioned we could potentially automate the actual creation after writing a tool to do it from reviewed inputs19:48
tonybhttps://element.io/blog/spaces-the-next-frontier/ also introduces subspaces19:49
clarkbI don't have any objections from a service hosting standpoint. My only concern is as a user I get frustrated when projects have a bunch of channels and they are all dead or say not my problem19:49
clarkbOpenstack is particularly bad about this for example19:49
fungiyeah, we can definitely provide them with feedback/recommendations that room proliferation can lead to user confusion19:49
ildikovpeople in the StarlingX community are currently saying that they are uncomfortable throwing every topic into one channel on IRC19:50
ildikovso for them it seems more appealing to have channels/rooms for more focused discussions, and we landed on grouping by project teams19:50
tonybI guess all we can do is explain the options that we can host/manage and let the community decide 19:50
corvusyes i think a space would be appropriate19:50
fungiildikov: sounds like you can let me know what the names are they want (now that i've figured out where to create those). with that many rooms they probably need to pick a common name prefix for clarity, like #starlingx-whatever or #stx-whatever19:51
clarkbildikov: yes I think those optimizations prioritize long term core developers over new contributors or people trying to debug a problem. It's a choice and one I personally dislike but the homeserver can handle X rooms just fine19:51
fungiwe can also start out with just one so they can try it out, i suppose19:51
fungiwhatever works better19:51
ildikovclarkb: the plan is to have a general channel, where new people can bring up any topic, etc19:52
tonyb++19:52
clarkbfungi: the initial test room we made cleaned up just fine so I think we can go ahead and create them all19:52
clarkbif we need to they can be cleaned up later19:52
ildikovwe've tried the one channel option through IRC, but people didn't really get into it19:52
tonybso the homeserver will be OpenDev.org19:53
ildikov@fungi I'll get that info to you19:53
ildikovtonyb: +119:53
tonyband There will be a stx space19:53
tonybwith several rooms?19:53
fungiyes19:53
fungiwhere several is maybe a dozen19:54
tonybyup19:54
clarkbtonyb: that was my take away. And fungi's idea to use a consistent room prefix in #opendev like starlingx or stx is also a good idea imo19:54
ildikovI would call the space StarlingX, and yes, several rooms, I think there are currently 11 project teams, I'll double check with the team leads who objects to having a room19:54
fungimainly, i think "spaces" don't operate like room namespaces, so you still end up needing to namespace the rooms themselves for clarity19:54
tonybyeah, depending how the space and room names interact19:54
ildikovclarkb: +1, that's how I wrote up the proposal to the community19:54
tonybcool beans19:55
corvusspaces are  just groups of rooms, no namespacing19:55
tonybokay19:55
corvus(also, users can make their own spaces, server-hosted spaces are basically just suggestions of how users can discover rooms -- which is appropriate here)19:55
corvusroom logging and gerrit bot announcements are available; the only service we're missing in matrix is meetbot19:57
ildikovcorvus: that's great info, thank you for sharing19:57
fungiyeah, i dug up the links to the right files for the bots and pasted them in #opendev earlier but i'll try to collect all this into a rudimentary document19:57
ildikovfungi: +1, I'm happy to help19:59
clarkbsounds like a plan19:59
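As a rough illustration of the "tool to do it from reviewed inputs" idea raised earlier, the sketch below builds the request bodies that the Matrix client-server API expects for creating prefixed rooms and a space that groups them. It only constructs the JSON and sends nothing. The room suffixes and display names are hypothetical placeholders (the real list is still to come from the StarlingX community), and the helper names are illustrative rather than anything from an existing OpenDev tool.

```python
import json

HOMESERVER = "opendev.org"   # homeserver named in the log
PREFIX = "starlingx"         # one of the two prefixes suggested in the log


def room_payload(suffix, display_name):
    """Body for POST /_matrix/client/v3/createRoom for one public room."""
    return {
        "name": display_name,
        # Local part of the alias, e.g. #starlingx-general:opendev.org
        "room_alias_name": f"{PREFIX}-{suffix}",
        "preset": "public_chat",
    }


def space_payload(display_name):
    """A space is a room whose creation content has type m.space."""
    return {
        "name": display_name,
        "preset": "public_chat",
        "creation_content": {"type": "m.space"},
    }


def space_child_event(child_room_id):
    """m.space.child state event that lists a room inside the space."""
    return {
        "type": "m.space.child",
        "state_key": child_room_id,
        "content": {"via": [HOMESERVER]},
    }


# Hypothetical reviewed input; the actual room names are not in the log.
reviewed_rooms = [("general", "StarlingX General"),
                  ("gerritbot", "StarlingX Gerrit Bot")]

print(json.dumps(space_payload("StarlingX"), indent=2))
for suffix, name in reviewed_rooms:
    print(json.dumps(room_payload(suffix, name), indent=2))
```

Because a space does not namespace its rooms, the prefix is applied to each room alias directly, which matches the naming advice given in the discussion.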
clarkband we are at time. Thank you everyone!19:59
fungithanks clarkb!19:59
tonybthanks clarkb 20:00
clarkbreminder I can't host a meeting next week20:00
tonybthanks all20:00
clarkbbut happy for someone else to chair without me20:00
clarkb#endmeeting20:00
opendevmeetMeeting ended Tue Aug  1 20:00:12 2023 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)20:00
opendevmeetMinutes:        https://meetings.opendev.org/meetings/infra/2023/infra.2023-08-01-19.01.html20:00
opendevmeetMinutes (text): https://meetings.opendev.org/meetings/infra/2023/infra.2023-08-01-19.01.txt20:00
opendevmeetLog:            https://meetings.opendev.org/meetings/infra/2023/infra.2023-08-01-19.01.log.html20:00
ildikovthank you all!!20:00
corvusildikov: https://zuul-ci.org/docs/zuul/latest/howtos/matrix.html  may be useful20:00
ildikovcorvus: it's a good one, I've been using it already :)20:03
ildikovthank you for sharing20:03
ildikovI was more lost with regards to setting up rooms, etc20:03
corvusildikov: understood :)20:03
ildikovand I'm very grateful for everyone's help here! :)20:06
