19:01:35 #startmeeting infra
19:01:35 Meeting started Tue Aug 1 19:01:35 2023 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:35 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:35 The meeting name has been set to 'infra'
19:02:37 #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/TGHMCISZUPXZ6QOAOOXRSAIHN6WYKPP4/ Our Agenda
19:03:21 #topic Announcements
19:03:44 A reminder that I won't be able to attend next week due to travel
19:04:13 I also looked at service coordinator things and it appears I did not propose an election time frame for the next election during the last one
19:04:55 * tonyb has no idea how to parse that statement
19:05:36 we are theoretically supposed to run elections every 6 months and I have tried to list dates for the next period during the current election
19:05:46 last january/february I did not do this
19:06:24 Doing a direct january 31 - February 14, 2023 + 6 months set of math I think the nomination period would open today and end in two weeks.
19:06:37 Oh okay, I think I get it
19:06:49 I'm happy to do that though for selfish reasons would prefer to delay by a week
19:07:16 Delaying would allow me to get emails out on time and travel without being in the middle of nomination period
19:07:37 i'm in favor of time travel, yep
19:08:00 or time and travel separately, whatever works
19:08:01 FWIW I'm fine with a delay.
19:08:24 :) cool in that case I'll send emails to make August 8 to 22 as a nomination period with a week long voting period afterwards should that be necessary
19:08:35 thanks!
19:09:13 #topic Bastion Host
19:09:15 #link https://review.opendev.org/q/topic:bridge-backups
19:09:27 This stack needs reviews.
Other than that I think things are going well with the bastion
19:09:31 #topic Mailman 3
19:09:40 two steps forward, one step back
19:10:17 while waiting for reviews on the open topic:mailman3 changes, i was going to do a new held node and run through more imports to make sure the import process isn't broken by the new mailman releases
19:10:30 however that timed out, then a recheck failed
19:10:50 mailman rest api wasn't starting up, on the second failed node i found tracebacks in the container logs
19:11:13 story short, a recent importlib_resources release removed a bunch of deprecated stuff, breaking mailman-core
19:11:33 i proposed a new change to pin back importlib_resources<6 and restacked the other changes on top of that
19:12:02 fungi: was that sufficient to get a held node and get back onto the original plan?
19:12:20 i think so, will know shortly
19:12:23 in better news, i added a section to the migration pad detailing the manual django site/mail host creation steps
19:12:31 #link https://review.opendev.org/c/opendev/system-config/+/890220 pin importlib_resources for mm3
19:12:36 #link https://etherpad.opendev.org/p/mm3migration "Add Django Site and Postorius Mail Host"
19:13:07 once i get a good held node, i'll do those manual steps on it and then try to run through some imports from production data again
19:13:23 and that will be on top of the upgraded version right?
19:13:27 but in the meantime, assuming those changes pass latest testing, they should still be ready to merge
19:13:33 correct
19:14:02 since we decided we wanted to upgrade before scheduling the remaining imports
19:14:04 sounds good, thank you
19:14:07 ++
19:14:17 so i just want to be sure it will go as smoothly as the previous imports
19:14:55 but the topic:mailman3 changes still need reviews if anyone gets a spare moment or two
19:16:14 #topic Gerrit Updates
19:17:15 As planned last week we (mostly fungi) managed to push some of this forward.
We are running on Gerrit 3.7.4 images with updated jeepyb now. During the restart process we cleared out the replication plugin's on-disk waiting queue and the plugin seemed fine starting back up that way
19:17:31 fungi: do we have confirmation yet of happy lp bug updates?
19:18:38 i haven't checked, and nobody's said
19:19:20 ack
19:20:03 The only other remaining gerrit todos are deciding if we want to do anything different with the replication plugin (I think what we've got now is a decent on the fly workaround actually) and a start towards a 3.8 upgrade
19:20:09 neither of which are super urgent
19:21:19 #topic Server Upgrades
19:21:25 I have nothing new on this topic
19:21:44 The last week has generally been full of real world distractions. Both good and bad :)
19:21:59 I got distracted with python and zuul mirror updates
19:22:24 sounds like we can go to the next topic then
19:22:41 #topic Fedora Cleanup
19:23:02 I noticed some discussion around general base job cleanup which I suspect is related to the fedora cleanup work
19:23:10 I've been working on the zuul side of this
19:23:21 devstack just dropped fedora support in master fwiw
19:23:29 I'll publish something this week
19:24:12 nothing super interesting
19:24:24 ok will look forward to reviewing those changes
19:24:31 #topic Gitea 1.20 Upgrade
19:25:16 They are moving faster than I can keep up right now
19:25:22 there is a 1.20.2 release already
19:25:51 yikes
19:26:07 I need to update my change to 1.20.2 and hold a node to figure out access log file locations so that we can cross check the changes they made to that log file before they end up in production
19:26:25 unfortunately I don't think any of these releases have done anything to make oauth2 disablement easier to configure
19:26:59 But I guess it is a good thing they are fixing bugs generally
19:27:26 #topic Etherpad 1.9.1 upgrade
19:27:29 #link https://review.opendev.org/c/opendev/system-config/+/887006 Etherpad 1.9.1
19:27:47 I feel like this is
ready but it would be good to have others check the held system works for them too
19:28:21 the comments on that change have the held node ip and ianw appears to have checked it. Thank you ianw
19:28:37 if another core can check it out we can schedule a day to land and monitor that change?
19:30:17 let me know if you think it looks good and I'll try to find a day that seems safe for that (but I've got ~6 days left on my trip that aren't consumed by flying so I'm in crunch time for doing things here too)
19:30:30 #topic Python container image updates
19:30:47 the changes to update irc bots did land iirc and as far as I can tell the bots are all still functional?
19:31:05 I think so :-)
19:31:20 I'll get back to that after the zuul stuff
19:31:52 as discussed last time I'll target as much as possible to be on bookworm 3.10
19:32:10 at this point moving consumer images to the new stuff then cleaning up the old stuff should be mostly smooth sailing at least as far as the base image is concerned. We may still need to sort out $service on newer python problems
19:33:00 yeah. it looks like there are many that will be very simple
19:33:28 ptgbot will need small amounts of work
19:33:46 gotta find a core for that project :-P
19:34:08 zuul and nodepool should be easy transitions and will allow them to clean up backported package installs
19:34:20 but ya it will be an iterative process to get through the list
19:34:34 yup.
19:34:40 #topic Meetpad LE Cert Update
19:35:09 we're 12 days out from expiration, fwiw
19:35:27 I saw frickler and ianw discussing and debugging this. It appears that some sort of cache is preventing a new DNS verification key from being issued which prevents our ansible machinery from triggering appropriately?
19:35:37 ianw and me noticed that this is an interesting edge case not currently handled properly in driver.sh, yes
19:36:09 frickler: can you give us a quick overview of that edge case?
19:37:01 essentially what you wrote, the dns-01 is still valid from a previous attempt, so acme.sh actually issues a new cert at the first stage ("issue" iirc)
19:37:32 while driver.sh only expects some dns auth verification record to be produced at that point
19:37:38 aha
19:37:57 we can retry in a couple of days to see if the old auth expired by then
19:38:25 maybe wait until 7 days before expiry and if the cache is still stale we can manually copy the cert and restart the docker services?
19:38:37 other options are either generating a new key or changing the cert content, like adding a different hostname, meetpad02 maybe
19:38:59 or doing the manual path, right
19:39:24 I wonder if changing the order of the names in the cert would do it
19:39:35 that might be an easy option, if that fails then do the manual copy
19:39:58 mostly thinking that we can debug acme and improve it on a longer time frame than 12 days
19:40:05 as far as avoiding it in the future, do we have the ability to pick the validity period for those tokens?
19:40:05 so decouple that from making the service happy again
19:40:25 I don't think we can choose an interval there
19:40:48 also it likely isn't easy to intentionally trigger the current situation
19:41:15 ya so fixing the service is a higher priority than fixing our acme.sh integration
19:41:25 our own evidence would indicate this is rare
19:41:28 root cause seems to have been an internal failure at LE at the initial renew attempt
19:41:33 neat
19:42:04 my suggestion is wait until ~7 days before expiry.
Attempt reissue automatically and if that fails manually copy the file and restart containers
19:42:33 wfm
19:42:34 ack
19:42:51 #topic Open Discussion
19:43:05 I think fungi wanted to bring up matrix room and space creation on our homeserver
19:43:11 the starlingx community is interested in making use of our matrix homeserver
19:43:49 sounds like they may want as many as 20 channels (some general discussion channels, per-project channels, separate channels for gerritbot)
19:44:00 o/
19:44:24 more along the lines of 12, but yes, the idea is to have rooms/channels per project team
19:44:29 er, "rooms" i guess they're termed in matrix parlance. and with the number of rooms they're thinking about, they also want to know if grouping those with a "space" makes sense
19:45:31 I'm not sure we're in a spot to say using a space makes sense since we've not done it before
19:45:41 but we can certainly put the rooms in a space and find out
19:45:47 also there was some question as to the creation workflow. for rooms (and spaces) on our opendev.org homeserver, that's only doable by our matrix admin account, so requests for new rooms would need to come to one of us
19:46:31 is there an API we can use later?
19:46:44 tonyb: yes I think so
19:46:55 okay. that's nice
19:47:31 yeah, I tried to click around, but I can only create a room on matrix.org
19:47:51 it is intentional to restrict what rooms can be created
19:48:05 er who can create rooms
19:48:16 anyway, it sounds like there's no objection to hosting multiple rooms and a space for the starlingx community if they decide that's what they want
19:48:27 but as mentioned we could potentially automate the actual creation after writing a tool to do it from reviewed inputs
19:49:13 https://element.io/blog/spaces-the-next-frontier/ also introduces subspaces
19:49:17 I don't have any objections from a service hosting standpoint.
My only concern is as a user I get frustrated when projects have a bunch of channels and they are all dead or say not my problem
19:49:32 Openstack is particularly bad about this for example
19:49:52 yeah, we can definitely provide them with feedback/recommendations that room proliferation can lead to user confusion
19:50:07 people in the StarlingX community are currently saying that they are uncomfortable throwing every topic into one channel on IRC
19:50:47 so for them it seems more appealing to have channels/rooms for more focused discussions, and we landed on grouping by project teams
19:50:56 I guess all we can do is explain the options that we can host/manage and let the community decide
19:50:59 yes i think a space would be appropriate
19:51:27 ildikov: sounds like you can let me know what the names are they want (now that i've figured out where to create those). with that many rooms they probably need to pick a common name prefix for clarity, like #starlingx-whatever or #stx-whatever
19:51:34 ildikov: yes I think those optimizations prioritize long term core developers over new contributors or people trying to debug a problem. It's a choice and one I personally dislike but the homeserver can handle X rooms just fine
19:51:48 we can also start out with just one so they can try it out, i suppose
19:51:56 whatever works better
19:52:20 clarkb: the plan is to have a general channel, where new people can bring up any topic, etc
19:52:23 ++
19:52:36 fungi: the initial test room we made cleaned up just fine so I think we can go ahead and create them all
19:52:45 if we need to they can be cleaned up later
19:52:59 we've tried the one channel option through IRC, but people didn't really get into it
19:53:12 so the homeserver will be OpenDev.org
19:53:15 @fungi I'll get that info to you
19:53:22 tonyb: +1
19:53:27 and there will be a stx space
19:53:33 with several rooms?
19:53:57 yes
19:54:02 where several is maybe a dozen
19:54:08 yup
19:54:09 tonyb: that was my take away. And fungi's idea to use a consistent room prefix in #opendev like starlingx or stx is also a good idea imo
19:54:26 I would call the space StarlingX, and yes, several rooms, I think there are currently 11 project teams, I'll double check with the team leads who objects to having a room
19:54:42 mainly, i think "spaces" don't operate like room namespaces, so you still end up needing to namespace the rooms themselves for clarity
19:54:46 yeah, depending how the space and room names interact
19:54:54 clarkb: +1, that's how I wrote up the proposal to the community
19:55:04 cool beans
19:55:16 spaces are just groups of rooms, no namespacing
19:55:22 okay
19:55:45 (also, users can make their own spaces, server-hosted spaces are basically just suggestions of how users can discover rooms -- which is appropriate here)
19:57:10 room logging and gerrit bot announcements are available; the only service we're missing in matrix is meetbot
19:57:33 corvus: that's great info, thank you for sharing
19:57:53 yeah, i dug up the links to the right files for the bots and pasted them in #opendev earlier but i'll try to collect all this into a rudimentary document
19:59:40 fungi: +1, I'm happy to help
19:59:42 sounds like a plan
19:59:49 and we are at time. Thank you everyone!
19:59:54 thanks clarkb!
20:00:00 thanks clarkb
20:00:02 reminder I can't host a meeting next week
20:00:03 thanks all
20:00:08 but happy for someone else to chair without me
20:00:12 #endmeeting
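
Editor's note, not part of the log: the Open Discussion topic floated "writing a tool to do it from reviewed inputs" for creating the StarlingX rooms and space. The following is only a rough sketch of what such a tool might prepare, not anything the team agreed on. It assumes the standard Matrix client-server endpoint `POST /_matrix/client/v3/createRoom` and the `m.space.child` state event; the team names, the helper function names, and the choice of the `starlingx` prefix over `stx` are all hypothetical. It only builds request bodies and sends nothing.

```python
"""Sketch: turn a reviewed list of team names into Matrix request bodies."""
import json

HOMESERVER = "opendev.org"  # homeserver named in the discussion
PREFIX = "starlingx"        # assumption: "stx" was the other suggested prefix


def room_alias_localpart(team: str, prefix: str = PREFIX) -> str:
    """Return a prefixed, lower-case alias localpart, e.g. 'starlingx-general'."""
    slug = "-".join(team.lower().split())
    return f"{prefix}-{slug}"


def create_room_body(team: str, prefix: str = PREFIX) -> dict:
    """Body for POST /_matrix/client/v3/createRoom for a single public room."""
    return {
        "room_alias_name": room_alias_localpart(team, prefix),
        "name": f"StarlingX {team}",
        "preset": "public_chat",
    }


def create_space_body(name: str = "StarlingX") -> dict:
    """A space is created like a room, with creation_content type 'm.space'."""
    return {
        "name": name,
        "preset": "public_chat",
        "creation_content": {"type": "m.space"},
    }


def space_child_event(server: str = HOMESERVER) -> dict:
    """Content of an m.space.child state event (state key: the child room ID)."""
    return {"via": [server]}


if __name__ == "__main__":
    teams = ["general", "Example Team"]  # hypothetical, not real team names
    print(json.dumps(create_space_body(), indent=2))
    for team in teams:
        print(json.dumps(create_room_body(team), indent=2))
```

Keeping the payload construction separate from any network call means the reviewed input (the list of team names) can be checked in code review before the admin account submits anything. As noted in the meeting, spaces do not namespace rooms, which is why the prefix is applied to each room alias.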