19:01:35 <clarkb> #startmeeting infra
19:01:35 <opendevmeet> Meeting started Tue Aug  1 19:01:35 2023 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:35 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:35 <opendevmeet> The meeting name has been set to 'infra'
19:02:37 <clarkb> #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/TGHMCISZUPXZ6QOAOOXRSAIHN6WYKPP4/ Our Agenda
19:03:21 <clarkb> #topic Announcements
19:03:44 <clarkb> A reminder that I won't be able to attend next week due to travel
19:04:13 <clarkb> I also looked at service coordinator things and it appears I did not propose an election time frame for the next election during the last one
19:04:55 * tonyb has no idea how to parse that statement
19:05:36 <clarkb> we are theoretically supposed to run elections every 6 months and I have tried to list dates for the next period during the current election
19:05:46 <clarkb> last january/february I did not do this
19:06:24 <clarkb> Doing a direct January 31 - February 14, 2023 + 6 months set of math, I think the nomination period would open today and end in two weeks.
19:06:37 <tonyb> Oh okay, I think I get it
19:06:49 <clarkb> I'm happy to do that though for selfish reasons would prefer to delay by a week
19:07:16 <clarkb> Delaying would allow me to get emails out on time and travel without being in the middle of nomination period
19:07:37 <fungi> i'm in favor of time travel, yep
19:08:00 <fungi> or time and travel separately, whatever works
19:08:01 <tonyb> FWIW I'm fine with a delay.
19:08:24 <clarkb> :) cool, in that case I'll send emails to make August 8 to 22 the nomination period, with a week-long voting period afterwards should that be necessary
19:08:35 <fungi> thanks!
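[editor's note: a quick sketch of the date math discussed above; dateutil is a third-party library and is assumed to be available]

    from datetime import date
    from dateutil.relativedelta import relativedelta  # third-party; assumed available

    # Previous nomination period was January 31 - February 14, 2023; add six months.
    prev_open, prev_close = date(2023, 1, 31), date(2023, 2, 14)
    print(prev_open + relativedelta(months=6))   # 2023-07-31, i.e. roughly "today"
    print(prev_close + relativedelta(months=6))  # 2023-08-14, two weeks out
    # With the one-week delay agreed above, the period becomes August 8 - 22.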
19:09:13 <clarkb> #topic Bastion Host
19:09:15 <clarkb> #link https://review.opendev.org/q/topic:bridge-backups
19:09:27 <clarkb> This stack needs reviews. Other than that I think things are going well with the bastion
19:09:31 <clarkb> #topic Mailman 3
19:09:40 <fungi> two steps forward, one step back
19:10:17 <fungi> while waiting for reviews on the open topic:mailman3 changes, i was going to do a new held node and run through more imports to make sure the import process isn't broken by the new mailman releases
19:10:30 <fungi> however that timed out, then a recheck failed
19:10:50 <fungi> mailman rest api wasn't starting up, on the second failed node i found tracebacks in the container logs
19:11:13 <fungi> story short, a recent importlib_resources release removed a bunch of deprecated stuff, breaking mailman-core
19:11:33 <fungi> i proposed a new change to pin back importlib_resources<6 and restacked the other changes on top of that
19:12:02 <clarkb> fungi: was that sufficient to get a held node and get back onto the original plan?
19:12:20 <fungi> i think so, will know shortly
19:12:23 <fungi> in better news, i added a section to the migration pad detailing the manual django site/mail host creation steps
19:12:31 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/890220 pin importlib_resources for mm3
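[editor's note: for context, a minimal sketch of the kind of API break behind that pin; the package and resource names below are made up for illustration, and the actual call sites in mailman-core may differ]

    import importlib_resources  # the third-party backport package; assumed installed

    # Legacy top-level helpers (read_text, open_text, path, contents, ...) were deprecated
    # during the 5.x series and removed in 6.0, which is what broke mailman-core's REST API
    # startup; pinning importlib_resources<6 is the interim fix until mailman migrates.
    print("legacy read_text() still present:", hasattr(importlib_resources, "read_text"))

    # Old style (removed in 6.x):  importlib_resources.read_text("some.package", "data.cfg")
    # New files() API:             importlib_resources.files("some.package").joinpath("data.cfg").read_text()
    # ("some.package"/"data.cfg" are hypothetical names used only for illustration)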
19:12:36 <fungi> #link https://etherpad.opendev.org/p/mm3migration "Add Django Site and Postorius Mail Host"
19:13:07 <fungi> once i get a good held node, i'll do those manual steps on it and then try to run through some imports from production data again
19:13:23 <clarkb> and that will be on top of the upgraded version right?
19:13:27 <fungi> but in the meantime, assuming those changes pass latest testing, they should still be ready to merge
19:13:33 <fungi> correct
19:14:02 <fungi> since we decided we wanted to upgrade before scheduling the remaining imports
19:14:04 <clarkb> sounds good, thank you
19:14:07 <clarkb> ++
19:14:17 <fungi> so i just want to be sure it will go as smoothly as the previous imports
19:14:55 <fungi> but the topic:mailman3 changes still need reviews if anyone gets a spare moment or two
19:16:14 <clarkb> #topic Gerrit Updates
19:17:15 <clarkb> As planned last week we (mostly fungi) managed to push some of this forward. We are running on Gerrit 3.7.4 images with updated jeepyb now. During the restart process we cleared out the replication plugin's on-disk waiting queue and the plugin seemed fine starting back up that way
19:17:31 <clarkb> fungi: do we have confirmation yet of happy lp bug updates?
19:18:38 <fungi> i haven't checked, and nobody's said
19:19:20 <clarkb> ack
19:20:03 <clarkb> The only other remaining gerrit todos are deciding if we want to do anything different with the replication plugin (I think what we've got now is a decent on the fly workaround actually) and a start towards a 3.8 upgrade
19:20:09 <clarkb> neither of which are super urgent
19:21:19 <clarkb> #topic Server Upgrades
19:21:25 <clarkb> I have nothing new on this topic
19:21:44 <clarkb> The last week has generally been full of real world distractions. Both good and bad :)
19:21:59 <tonyb> I got distracted with python and zuul mirror updates
19:22:24 <clarkb> sounds like we can go to the next topic then
19:22:41 <clarkb> #topic Fedora Cleanup
19:23:02 <clarkb> I noticed some discussion around general base job cleanup which I suspect is related to the fedora cleanup work
19:23:10 <tonyb> I've been working on the zuul side of this
19:23:21 <frickler> devstack just dropped fedora support in master fwiw
19:23:29 <tonyb> I'll publish something this week
19:24:12 <tonyb> nothing super interesting
19:24:24 <clarkb> ok will look forward to reviewing those changes
19:24:31 <clarkb> #topic Gitea 1.20 Upgrade
19:25:16 <clarkb> They are moving faster than I can keep up right now
19:25:22 <clarkb> there is a 1.20.2 release already
19:25:51 <tonyb> yikes
19:26:07 <clarkb> I need to update my change to 1.20.2 and hold a node to figure out access log file locations so that we can cross check the changes they made to that log file before they end up in production
19:26:25 <clarkb> unfortunately I don't think any of these releases have done anything to make oauth2 disablement easier to configure
19:26:59 <clarkb> But I guess it is a good thing they are fixing bugs generally
19:27:26 <clarkb> #topic Etherpad 1.9.1 upgrade
19:27:29 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/887006 Etherpad 1.9.1
19:27:47 <clarkb> I feel like this is ready but it would be good to have others check the held system works for them too
19:28:21 <clarkb> the comments on that change have the held node ip and ianw appears to have checked it. Thank you ianw
19:28:37 <clarkb> if another core can check it out we can schedule a day to land and monitor that change?
19:30:17 <clarkb> let me know if you think it looks good and I'll try to find a day that seems safe for that (but I've got ~6 days left on my trip that aren't consumed by flying so I'm in crunch time for doing things here too)
19:30:30 <clarkb> #topic Python container image updates
19:30:47 <clarkb> the changes to update irc bots did land iirc and as far as I can tell the bots are all still functional?
19:31:05 <tonyb> I think so :⁠-⁠)
19:31:20 <tonyb> I'll get back to that after the zuul stuff
19:31:52 <tonyb> as discussed last time I'll target as much as possible to be on bookworm 3.10
19:32:10 <clarkb> at this point moving consumer images to the new stuff then cleaning up the old stuff should be mostly smooth sailing at least as far as the base image is concerned. We may still need to sort out $service on newer python problems
19:33:00 <tonyb> yeah.  it looks like there are many that will be very simple
19:33:28 <tonyb> ptgbot will need small amounts of work
19:33:46 <tonyb> gotta find a core for that project :⁠-⁠P
19:34:08 <clarkb> zuul and nodepool should be easy transitions and will allow them to clean up backported package installs
19:34:20 <clarkb> but ya it will be an iterative process to get through the list
19:34:34 <tonyb> yup.
19:34:40 <clarkb> #topic Meetpad LE Cert Update
19:35:09 <fungi> we're 12 days out from expiration, fwiw
19:35:27 <clarkb> I saw frickler and ianw discussing and debugging this. It appears that some sort of cache is preventing a new DNS verification key from being issued which prevents our ansible machinery from triggering appropriately?
19:35:37 <frickler> ianw and I noticed that this is an interesting edge case not currently handled properly in driver.sh, yes
19:36:09 <clarkb> frickler: can you give us a quick overview of that edge case?
19:37:01 <frickler> essentially what you wrote, the dns-01 is still valid from a previous attempt, so acme.sh actually issues a new cert at the first stage ("issue" iirc)
19:37:32 <frickler> while driver.sh only expects some dns auth verification record to be produced at that point
19:37:38 <clarkb> aha
19:37:57 <frickler> we can retry in a couple of days to see if the old auth expired by then
19:38:25 <clarkb> maybe wait until 7 days before expiry and if the cache is still stale we can manually copy the cert and restart the docker services?
19:38:37 <frickler> other options are either generating a new key or changing the cert content, like adding a different hostname, meetpad02 maybe
19:38:59 <frickler> or doing the manual path, right
19:39:24 <clarkb> I wonder if changing the order of the names in the cert would do it
19:39:35 <clarkb> that might be an easy option, if that fails then do the manual copy
19:39:58 <clarkb> mostly thinking that we can debug acme and improve it on a longer time frame than 12 days
19:40:05 <fungi> as far as avoiding it in the future, do we have the ability to pick the validity period for those tokens?
19:40:05 <clarkb> so decouple that from making the service happy again
19:40:25 <frickler> I don't think we can choose an interval there
19:40:48 <frickler> also it likely isn't easy to intentionally trigger the current situation
19:41:15 <clarkb> ya so fixing the service is a higher priority than fixing our acme.sh integration
19:41:25 <clarkb> our own evidence would indicate this is rare
19:41:28 <frickler> root cause seems to have been an internal failure at LE at the initial renew attempt
19:41:33 <clarkb> neat
19:42:04 <clarkb> my suggestion is wait until ~7 days before expiry. Attempt reissue automatically and if that fails manually copy the file and restart containers
19:42:33 <fungi> wfm
19:42:34 <frickler> ack
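[editor's note: for reference, a minimal sketch of checking how many days remain on the served certificate, using only the Python standard library; the hostname is assumed to be meetpad.opendev.org]

    import socket
    import ssl
    from datetime import datetime, timezone

    host = "meetpad.opendev.org"  # assumed hostname for the service in question

    ctx = ssl.create_default_context()
    with socket.create_connection((host, 443), timeout=10) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            cert = tls.getpeercert()

    # notAfter is formatted like "Aug 13 12:00:00 2023 GMT"
    expires = datetime.strptime(cert["notAfter"], "%b %d %H:%M:%S %Y %Z").replace(tzinfo=timezone.utc)
    days_left = (expires - datetime.now(timezone.utc)).days
    print(f"{host}: certificate expires in {days_left} days ({expires:%Y-%m-%d})")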
19:42:51 <clarkb> #topic Open Discussion
19:43:05 <clarkb> I think fungi wanted to bring up matrix room and space creation on our homeserver
19:43:11 <fungi> the starlingx community is interested in making use of our matrix homeserver
19:43:49 <fungi> sounds like they may want as many as 20 channels (some general discussion channels, per-project channels, separate channels for gerritbot)
19:44:00 <ildikov> o/
19:44:24 <ildikov> more along the lines of 12, but yes, the idea is to have rooms/channels per project team
19:44:29 <fungi> er, "rooms" i guess they're termed in matrix parlance. and with the number of rooms they're thinking about, they also want to know if grouping those with a "space" makes sense
19:45:31 <clarkb> I'm not sure we're in a spot to say using a space makes sense since we've not done it before
19:45:41 <clarkb> but we can certainly put the rooms in a space and find out
19:45:47 <fungi> also there was some question as to the creation workflow. for rooms (and spaces) on our opendev.org homeserver, that's only doable by our matrix admin account, so requests for new rooms would need to come to one of us
19:46:31 <tonyb> is there an API we can use later?
19:46:44 <clarkb> tonyb: yes I think so
19:46:55 <tonyb> okay.  that's nice
19:47:31 <ildikov> yeah, I tried to click around, but I can only create a room on matrix.org
19:47:51 <clarkb> it is intentional to restrict what rooms can be created
19:48:05 <clarkb> er who can create rooms
19:48:16 <fungi> anyway, it sounds like there's no objection to hosting multiple rooms and a space for the starlingx community if they decide that's what they want
19:48:27 <clarkb> but as mentioned we could potentially automate the actual creation after writing a tool to do it from reviewed inputs
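[editor's note: a rough sketch of what such a tool could do against the Matrix client-server API; the homeserver URL, access token, and room names are all placeholders, and the real tool would take its inputs from reviewed configuration. Grouping a room into the space afterwards is a separate m.space.child state event, not shown here]

    import requests  # third-party; assumed available

    HOMESERVER = "https://example.opendev.org"  # placeholder for the real homeserver URL
    TOKEN = "ADMIN_ACCESS_TOKEN"                # placeholder for the admin account's token

    def create_room(alias, name, topic="", as_space=False):
        """Create a public room (or space) via POST /_matrix/client/v3/createRoom."""
        body = {
            "preset": "public_chat",
            "visibility": "public",
            "room_alias_name": alias,  # local part only, e.g. "starlingx-general"
            "name": name,
            "topic": topic,
        }
        if as_space:
            # A space is just a room created with type m.space
            body["creation_content"] = {"type": "m.space"}
        resp = requests.post(
            f"{HOMESERVER}/_matrix/client/v3/createRoom",
            headers={"Authorization": f"Bearer {TOKEN}"},
            json=body,
            timeout=30,
        )
        resp.raise_for_status()
        return resp.json()["room_id"]

    space_id = create_room("starlingx", "StarlingX", as_space=True)
    room_id = create_room("starlingx-general", "StarlingX General")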
19:49:13 <tonyb> https://element.io/blog/spaces-the-next-frontier/ also introduces subspaces
19:49:17 <clarkb> I don't have any objections from a service hosting standpoint. My only concern is as a user I get frustrated when projects have a bunch of channels and they are all dead or say not my problem
19:49:32 <clarkb> Openstack is particularly bad about this for example
19:49:52 <fungi> yeah, we can definitely provide them with feedback/recommendations that room proliferation can lead to user confusion
19:50:07 <ildikov> people in the StarlingX community are currently saying that they are uncomfortable throwing every topic into one channel on IRC
19:50:47 <ildikov> so for them it seems more appealing to have channels/rooms for more focused discussions, and we landed on grouping by project teams
19:50:56 <tonyb> I guess all we can do is explain the options that we can host/manage and let the community decide
19:50:59 <corvus> yes i think a space would be appropriate
19:51:27 <fungi> ildikov: sounds like you can let me know what the names are they want (now that i've figured out where to create those). with that many rooms they probably need to pick a common name prefix for clarity, like #starlingx-whatever or #stx-whatever
19:51:34 <clarkb> ildikov: yes I think those optimizations prioritize long term core developers over new contributors or people trying to debug a problem. It's a choice, and one I personally dislike, but the homeserver can handle X rooms just fine
19:51:48 <fungi> we can also start out with just one so they can try it out, i suppose
19:51:56 <fungi> whatever works better
19:52:20 <ildikov> clarkb: the plan is to have a general channel, where new people can bring up any topic, etc
19:52:23 <tonyb> ++
19:52:36 <clarkb> fungi: the initial test room we made cleaned up just fine so I think we can go ahead and create them all
19:52:45 <clarkb> if we need to they can be cleaned up later
19:52:59 <ildikov> we've tried the one channel option through IRC, but people didn't really get into it
19:53:12 <tonyb> so the homeserver will be OpenDev.org
19:53:15 <ildikov> @fungi I'll get that info to you
19:53:22 <ildikov> tonyb: +1
19:53:27 <tonyb> and There will be a stx space
19:53:33 <tonyb> with several rooms?
19:53:57 <fungi> yes
19:54:02 <fungi> where several is maybe a dozen
19:54:08 <tonyb> yup
19:54:09 <clarkb> tonyb: that was my take away. And fungi's idea to use a consistent room prefix in #opendev like starlingx or stx is also a good idea imo
19:54:26 <ildikov> I would call the space StarlingX, and yes, several rooms, I think there are currently 11 project teams, I'll double check with the team leads who objects to having a room
19:54:42 <fungi> mainly, i think "spaces" don't operate like room namespaces, so you still end up needing to namespace the rooms themselves for clarity
19:54:46 <tonyb> yeah, depending how the space and room names interact
19:54:54 <ildikov> clarkb: +1, that's how I wrote up the proposal to the community
19:55:04 <tonyb> cool beans
19:55:16 <corvus> spaces are  just groups of rooms, no namespacing
19:55:22 <tonyb> okay
19:55:45 <corvus> (also, users can make their own spaces, server-hosted spaces are basically just suggestions of how users can discover rooms -- which is appropriate here)
19:57:10 <corvus> room logging and gerrit bot announcements are available; the only service we're missing in matrix is meetbot
19:57:33 <ildikov> corvus: that's great info, thank you for sharing
19:57:53 <fungi> yeah, i dug up the links to the right files for the bots and pasted them in #opendev earlier but i'll try to collect all this into a rudimentary document
19:59:40 <ildikov> fungi: +1, I'm happy to help
19:59:42 <clarkb> sounds like a plan
19:59:49 <clarkb> and we are at time. Thank you everyone!
19:59:54 <fungi> thanks clarkb!
20:00:00 <tonyb> thanks clarkb
20:00:02 <clarkb> reminder I can't host a meeting next week
20:00:03 <tonyb> thanks all
20:00:08 <clarkb> but happy for someone else to chair without me
20:00:12 <clarkb> #endmeeting