19:00:08 #startmeeting infra
19:00:08 Meeting started Tue Dec 5 19:00:08 2023 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:00:08 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:00:08 The meeting name has been set to 'infra'
19:00:10 #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/JBOF56GLO37SHVBZEXWSP4N3CIRIDQGG/ Our Agenda
19:01:01 #topic Announcements
19:01:16 I mentioned this previously but I won't be able to make our meeting time next week
19:01:42 I've put on the agenda a proposal to go ahead and skip the meeting on these days due to holidays: December 12, 26 and January 2
19:01:55 sgtm
19:01:59 thanks!
19:02:38 * tonyb will be traveling back to AU this weekend so basically absent
19:02:41 That means our last meeting of the year would be December 19 and we'll be back January 9
19:02:55 tonyb: safe travels!
19:03:15 "cheers mate!"
19:03:51 #topic Server Updates
19:04:00 Just this morning we put three new mirror nodes into use
19:04:23 thanks tonyb!
19:04:42 https://review.opendev.org/q/topic:mirror-distro-updates+is:open finishes off the process IIUC
19:05:01 well the CI side, I need to remove the old servers
19:05:08 with a manual step of removing the server and volumes
19:05:10 ya that
19:05:17 ++
19:05:38 I should be able to review those soon. I'll shout if I see anything amiss
19:05:53 Starting on JVB+meetpad later today
19:06:16 thanks clarkb
19:07:27 sounds good. For meetpad I think we can take two approaches. One is to replace the JVB then the main meetpad server in sequence and have them sync up with each other at each step of the way. The other would be to deploy a new meetpad and a new JVB independent of the old stuff and then swap over DNS. I'm not sure if the existing ansible is built for the second option
19:07:36 it may just try to automatically make everything work together based on ansible groups
19:08:13 anything else server related?
19:08:16 Good to know. I'll poke around
19:08:20 not from me
19:08:40 #topic DIB Bionic Support
19:08:53 DIB's python3.6 unit testing is broken
19:08:55 #link https://review.opendev.org/c/openstack/diskimage-builder/+/901093 proposal to drop py36 testing
19:09:17 In this change ianw suggested we (opendev) pause/stop building bionic nodes and DIB would then actively remove py36 support entirely
19:09:23 o/
19:10:03 I have no problem with removing py36 unit testing in dib but am a bit more cautious around the assumption we don't need to build bionic nodes. That may be necessary due to changes in clouds for example
19:10:25 My proposal would be that dib remove py36 testing as necessary to get its CI stable again, but not actively take steps to break py36 yet
19:10:45 the code base is fairly stable so I don't expect big changes that would cause that to happen unless dib explicitly set a python requires or similar
19:11:12 ianw: feel free to jump in with other info, background or thoughts
19:11:21 yeah, so i don't think anyone has anything 3.6 breaking as you say
19:11:50 but opendev might be in a position of wanting a bionic build from a nodepool host with a dib that can't do that (theoretically)
19:12:04 so we'd just need to stop building new bionic images, we can still use the existing ones, right?
19:12:15 my thought was to just avoid that possibility up front -- but it's only a possibility
19:12:23 i agree with not dropping bionic image builds while we're still relying on them. we've gotten ourselves into situations in the past where we lost our local copy of an image we were no longer able to build and so couldn't reupload it to providers
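[Note: the "pause/stop building bionic nodes" option discussed above corresponds to the pause setting on a diskimage in the nodepool-builder configuration. The snippet below is only an illustrative sketch with made-up element choices, not opendev's actual nodepool config.]

    # nodepool-builder config (illustrative): keep the image defined but stop new builds
    diskimages:
      - name: ubuntu-bionic
        pause: true          # existing built images/uploads stay usable; the builder skips rebuilds
        elements:
          - ubuntu-minimal
          - simple-init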
19:12:24 You could still use older versions of DIB to build bionic images, correct?
19:12:48 frickler: we could manually manage bionic images copying from one cloud to another yes as a possibility
19:13:12 someone could, i'm sure, manually build with old versions of dib or manually move images. this would suck for that person :)
19:13:13 JayF: not easily, that would potentially require a dedicated builder just for bionic
19:13:25 ack
19:13:27 basically my point is bionic builds work today
19:13:43 we can drop testing without changing that state and then we don't need to pause builds
19:14:03 if bionic builds do break in say a year or two hopefully we don't need bionic anymore and if we do we can decide if we stop building them at that point or fix dib
19:14:42 rather than an upfront shift to explicitly say this isn't supported and intentional steps to make it not work
19:15:13 I do think the suggestion for a tag nowish is a good idea as a fallback point if that becomes necessary
19:15:20 so no intentional steps to make it not work -- maybe i mentioned that but i don't think we've ever done that
19:15:23 at the moment, openstack stable/ussuri and older branches rely on bionic jobs
19:15:30 yeah, i think as long as we have bionic images, we need to keep the builder going. if it breaks, then at that point we should consider that we no longer have bionic images.
19:15:34 ianw: I may have also misparsed
19:15:58 it sounds like you're ok with the uncertainty window that i was thinking should be closed
19:16:08 which is fine too
19:16:12 keeping an image around without the ability to build a new version is not feasible in the long run -- as soon as we can't build an image, that's when the clock starts on either fixing it or removing it.
19:16:25 also I've just checked a recent dib change and py36 passed
19:16:42 so maybe this is a non-issue?
19:16:47 under openstack's old extended maintenance model it would normally have forced stable/ussuri to eol around mid-2025
19:16:52 https://review.opendev.org/c/openstack/diskimage-builder/+/902497 this change
19:17:02 with the new unmaintained model, maybe we don't have to worry about it for quite that long
19:17:05 to build on ianw's nomenclature, i think the uncertainty window is "when does a bionic build finally accidentally break"
19:18:11 huh, maybe stestr had a similar discussion and restored 3.6 support?
19:18:20 ianw: that would be my hunch
19:18:31 anyway, i think the whole thing is on borrowed time until we have this discussion again :)
19:19:35 so basically, it seems like we're ok with dib not actively breaking 3.6, but also not taking efforts to do 3.6 testing when it fails
19:19:42 yes I think we should have a general plan. I'll propose this: Keep py36 testing as long as it is passing without a ton of effort and opendev still needs bionic images. If testing starts to fail and there isn't an easy fix, drop the testing but don't explicitly break old pythons. Tag at this point as fallback. if at some point in the future bionic builds break opendev can either
19:19:43 fix it in dib with less automated testing or accelerate bionic removal
19:20:04 clarkb: ++
19:20:13 I think 3.6 is still unsupported: https://github.com/mtreinish/stestr/commit/cc66736df90968126585540b220fea663b124bbf
19:20:18 additionally I think it would be trivial to cap stestr for py36 testing
19:20:33 environment marker, i guess
19:20:42 now that I've had more time to digest what initiated this change I think I'm actually -2 even if stestr is broken
19:20:52 we just install older stestr for py36 and move on
19:21:02 fungi: yes exactly
19:21:46 anyway that is beside the point considering py36 testing does work
19:21:46 well i would say removing 3.6 in dib has been a topic of interest for a long time, so it's good to have a discussion about it
19:22:43 anyway, i think we've sorted out the major points so can follow-up in the changes
19:22:58 cool. As a side note I think rhel 8's main python is 3.6 as well so not bionic specific?
19:23:06 but rhel 8 has packaging for newer pythons so less impactful
19:23:16 #topic Python Container Updates
19:23:23 #link https://review.opendev.org/c/opendev/system-config/+/898756 And parent add python3.12 images
19:23:39 Not aware of any major movement here, but a reminder that I think we can go ahead and approve ^
19:24:16 #topic Gitea 1.21
19:24:20 Yeah no progress, the time I wanted to spend on this got redirected into a downstream project
19:24:47 Lots of movement on the gitea side of things though it's a bit one step forward, two back
19:25:06 the tl;dr is that we added a new larger key to gitea as the first step to rotating the old key out
19:25:27 the next step is adding the key to gerrit and having it use the new larger key. Unfortunately when we attempted that on friday none of it worked
19:25:36 And the problem was entirely pebkac
19:25:42 le sigh
19:26:00 I used review02 host side file paths to ssh keys and not gerrit container filesystem file paths to keys
19:26:08 so our .ssh/config wasn't working because the file "didn't exist"
19:26:12 #link https://review.opendev.org/c/opendev/system-config/+/902490 Configure Gerrit to use new SSH key
19:26:22 I've pushed a new change up which corrects that and has been tested on some held nodes
19:26:36 i'm going to blame it being a long week and we weren't as observant as we might have been were it not the end of the day on a friday
19:27:28 I don't think I've got time to do a gerrit restart today given how much we've already gotten wrong around that. I want to ensure I can focus on it
19:27:31 but maybe tomorrow?
19:27:32 So that ssh config works as expected on the test nodes?
19:27:36 tonyb: yes
19:27:43 Oh cool
19:27:55 tonyb: the key's IdentityFile path needed to use the container's filesystem paths and not the host's
19:27:56 tomorrow will work for me
19:28:12 tomorrow is good for me.
19:28:36 excellent we can sync back up again tomorrow and figure out timing
19:29:04 ++
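[Note: the IdentityFile mix-up described above is the kind of thing sketched in this illustrative ~/.ssh/config stanza; the hostname and both paths are hypothetical placeholders, not the real review02 or container layout.]

    # ~/.ssh/config as read from *inside* the gerrit container (all paths illustrative)
    Host gitea-backend.example.org
        User git
        # Wrong: a host-side path on review02; this file does not exist inside the container
        #IdentityFile /home/gerrit2/keys/gitea_ssh_key
        # Right: the path where the key is made available inside the container
        IdentityFile /var/gerrit/etc/gitea_ssh_key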
19:29:48 #topic Upgrading Zuul's DB Server
19:30:03 #link https://etherpad.opendev.org/p/opendev-zuul-mysql-upgrade info gathering document
19:30:31 I didn't end up having much time to look into this, but I did briefly pull up postgres' HA documentation (note postgres clustering is completely different so you want to use HA for search terms)
19:31:02 The impression it gave me was that there are a lot of options and your use case is important to picking the right one and none are super drop in out of the box easy
19:31:14 So basically if we go down the postgres path we'll need to do a bit of investigating
19:32:16 Has anyone else done any research to add to the pile?
19:34:22 I'm guessing that is a no. I'm hopeful that I'll have time to look into this more around the holidays as research tends to be a bit more time flexible for me
19:34:34 #topic Annual Report Season
19:34:38 #link https://etherpad.opendev.org/p/2023-opendev-annual-report OpenDev draft report
19:34:43 There is a draft in this etherpad now
19:35:01 I did a small amount on MySQL 5.x to 8.x DB upgrades.
19:35:28 hrm can I undo twice?
19:35:30 #undo
19:35:30 Removing item from minutes: #link https://etherpad.opendev.org/p/2023-opendev-annual-report
19:35:32 #undo
19:35:32 Removing item from minutes: #topic Annual Report Season
19:35:34 I can!
19:35:53 As expected it's basically a dump+restore but going through 6 and 7 before 8 so really 3 dump+restores
19:35:54 tonyb: anything jump out to you as important?
19:36:11 oh if we just want to upgrade i think we would just dump and restore
19:36:12 once
19:36:19 I picked 8 as it was the latest in trove
19:36:45 corvus: That's not what the docs say, but they could be overly cautious
19:36:55 i mean, it's sql
19:37:05 i'm talking mysqldump + mysql
19:37:25 yeah, it would mean an outage, but hopefully not too long of one
19:37:31 Yeah me too
19:37:42 i've done that exact upgrade something like 20 times over the past month. it's fine. :)
19:37:43 also we could time a dry run to find out approximately how long of an outage we're talking about
19:37:51 corvus: Okay
19:37:58 could be a generic suggestion for specific corner cases
19:38:06 but in this case it sounds like we know this db can make the transition just fine
19:38:49 Okay. I was worried it would triple the outage window but it sounds like not
19:39:45 fungi: it's basically about the same amount of time our recent schema change took
19:39:51 noted
19:39:55 because our schema change was "rewrite the two biggest tables"
19:40:20 (hopefully the actual cpu time, not the "figure out what went wrong" time)
19:40:39 i guess the workloads for updates and inserts are comparable in that case
19:40:56 well, i'm trying to say they're both inserts :)
19:41:02 ah
19:41:18 which is why they're equivalent in this specific case
19:41:24 makes sense, yep
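[Note: the "mysqldump + mysql" single dump-and-restore described above is sketched below; the hostnames, user, and database name are placeholders, not the actual Zuul database details or the agreed procedure.]

    # Illustrative one-shot dump from an old 5.x server and restore into a new 8.x server;
    # timing this against a copy gives a rough estimate of the outage window.
    mysqldump --single-transaction --routines --triggers \
        -h old-db.example.org -u zuul -p zuul > zuul-dump.sql
    mysql -h new-db.example.org -u zuul -p zuul < zuul-dump.sql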
19:41:32 #topic Annual Report Season
19:41:42 #link https://etherpad.opendev.org/p/2023-opendev-annual-report OpenDev draft report
19:42:20 as I mentioned prior to the undos there is a draft here now. I'm considering this draft fairly complete but please do make suggestions or point out missing content if you notice anything like that. I've also got placeholders for stats data that fungi will be collecting
19:42:29 i read that yesterday and it looks good to me! thanks!
19:43:12 This document goes into the foundation's annual report and is a good way for us to show off to people that may not typically be in the know about what we do
19:43:23 so definitely call out anything you feel is important and missing
19:44:15 #topic Mailing lists and Gmail requirements
19:44:46 this is something frickler spotted and brought up in irc earlier today
19:44:46 This is a new one. Apparently gmail is throttling lists.openstack.org because it lacks spf and dkim. Adding one or the other seems to be sufficient according to the message they respond with to stop getting rate limited
19:44:55 #link https://review.opendev.org/q/topic:%22argh-gmail%22 Add an SPF record for the listserv
19:45:13 also according to the article i linked in the commit message, it will go from throttling to rejecting in february
19:45:21 I have no concerns or objections with the proposed dns record update
19:45:35 As written it should largely be a noop and it would be good to confirm it makes google happy
19:46:16 as little love as i harbor for gmail, i think it will mean less work for our mailman server if it doesn't have to hold all those deferrals, and less work for list owners/moderators dealing with gmail users getting their subscriptions put on hold
19:46:29 * frickler likes the gerrit topic
19:46:49 lgtm
19:46:49 the two changes in that topic cover the list sites we do dns in opendev for
19:46:57 the other 5 domains i'll make similar edits to in their respective hosts (rackspace and cloudflare)
19:47:14 since i'm not hearing objections, i'll un-wip them now
19:47:17 Seems good to me
19:48:31 and again, thanks to frickler for spotting this and bringing it up
19:48:50 I'm trying to think how we can test it directly after the dns updates are made but my gmail account isn't subscribed to anything anymore because years ago I got auto unsubbed for some reason that had to do with gmail
19:49:00 but it is probably sufficient to make the changes and monitor the exim logs
19:49:18 yes, that's my plan
19:49:31 see if, after a while, it stops logging deferrals
19:50:03 sounds good
19:50:07 #topic Open Discussion
19:50:42 I've got one more "it should be a straightforward review stack because it doesn't actually touch production": https://review.opendev.org/c/opendev/system-config/+/901469/ and parent add gerrit 3.9 images and testing
19:51:08 oh, i'm going to be travelling a big chunk of thursday and friday, so am not expecting to be around much
19:51:18 should be back on for a bit in my friday afternoon. may also be able to check in from the hotel thursday night my time
19:51:30 tonyb: I also left the "python testing with newer python in containers or else" item under open discussion as a reminder to ask if you were able to look at the existing job roles for that yet
19:51:41 fungi: thanks for the heads up
19:52:27 clarkb: I have not, later today for sure!
19:53:58 I'll give it a few more minutes for any other topics but it's looking like we might be able to end early and go find a meal
19:56:08 thank you for your time and effort everyone!
19:56:12 thanks clarkb!
19:56:14 #endmeeting