Tuesday, 2023-12-05

clarkbMeeting time18:59
clarkb#startmeeting infra19:00
opendevmeetMeeting started Tue Dec  5 19:00:08 2023 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.19:00
opendevmeetUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.19:00
opendevmeetThe meeting name has been set to 'infra'19:00
clarkb#link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/JBOF56GLO37SHVBZEXWSP4N3CIRIDQGG/ Our Agenda19:00
clarkb#topic Announcements19:01
clarkbI mentioned this previously but I won't be able to make our meeting time next week19:01
clarkbI've put on the agenda a proposal to go ahead and skip the meeting on these days due to holidays: December 12, 26 and January 219:01
fungisgtm19:01
fungithanks!19:01
* tonyb will be traveling back to AU this weekend so basically absent19:02
clarkbThat means our last meeting of the year would be December 19 and we'll be back January 919:02
clarkbtonyb: safe travels!19:02
tonyb"cheers mate!"19:03
clarkb#topic Server Updates19:03
clarkbJust this morning we put three new mirror nodes into use19:04
fungithanks tonyb!19:04
tonybhttps://review.opendev.org/q/topic:mirror-distro-updates+is:open finishes off the process IIUC19:04
tonybwell the CI side, I need to remove the old servers19:05
clarkbwith a manual step of removing the server and volumes19:05
clarkbya that19:05
tonyb++19:05
clarkbI should be able to review those soon. I'll shout if I see anything amiss19:05
tonybStarting on JVB+meetpad later today19:05
tonybthanks clarkb 19:06
clarkbsounds good. For meetpad I think we can take two approaches. One is to replace the jvb then the main meetpad server in sequence and have them sync up with each other at each step of the way. The other would be to deploy a new meetpad and a new jvb independent of the old stuff and then swap over dns. I'm not sure if the existing ansible is built for the second option19:07
clarkbit may just try to automatically make everything work together based on ansible groups19:07
clarkbanything else server related?19:08
tonybGood to know.  I'll poke around19:08
tonybnot from me19:08
clarkb#topic DIB Bionic Support19:08
clarkbDIB's python3.6 unit testing is broken19:08
clarkb#link https://review.opendev.org/c/openstack/diskimage-builder/+/901093 proposal to drop py36 testing19:08
clarkbIn this change ianw suggested we (opendev) pause/stop building bionic nodes and DIB would then actively remove py36 support entirely19:09
ianwo/19:09
clarkbI have no problem with removing py36 unit testing in dib but am a bit more cautious around the assumption we don't need to build bionic nodes. That may be necessary due to changes in clouds for example19:10
clarkbMy proposal would be that dib remove py36 testing as necessary to get its CI stable again, but not actively take steps to break py36 yet19:10
clarkbthe code base is fairly stable so I don't expect big changes that would cause that to happen unless dib explicitly set a python requires or similar19:10
clarkbianw: feel free to jump in with other info, background or thoughts19:11
ianwyeah, so i don't think anyone has anything 3.6 breaking as you say19:11
ianwbut opendev might be in a position of wanting a bionic build from a nodepool host with a dib that can't do that (theoretically)19:11
fricklerso we'd just need to stop building new bionic images, we can still use the existing ones, right?19:12
ianwmy thought was to just avoid that possibility up front -- but it's only a possibility19:12
fungii agree with not dropping bionic image builds while we're still relying on them. we've gotten ourselves into situations in the past where we lost our local copy of an image we were no longer able to build and so couldn't reupload it to providers19:12
JayFYou could still use older versions of DIB to build bionic images, correct?19:12
clarkbfrickler: we could manually manage bionic images copying from one cloud to another yes as a possibility19:12
ianwsomeone could, i'm sure, manually build with old versions of dib or manually move images.  this would suck for that person :)19:13
clarkbJayF: not easily; that would potentially require a dedicated builder just for bionic19:13
JayFack19:13
clarkbbasically my point is bionic builds work today19:13
clarkbwe can drop testing without changing that state and then we don't need to pause builds19:13
clarkbif bionic builds do break in say a year or two hopefully we don't need bionic anymore and if we do we can decide if we stop building them at that point or fix dib19:14
clarkbrather than an upfront shift to explicitly say this isn't supported and intentional steps to make it not work19:14
clarkbI do think the suggestion for a tag nowish is a good idea as a fallback point if that becomes necessary19:15
ianwso no intentional steps to make it not work -- maybe i mentioned that but i don't think we've ever done that19:15
fungiat the moment, openstack stable/ussuri and older branches rely on bionic jobs19:15
corvusyeah, i think as long as we have bionic images, we need to keep the builder going.  if it breaks, then at that point we should consider that we no longer have bionic images.19:15
clarkbianw: I may have also misparsed19:15
ianwit sounds like you're ok with the uncertainty window that i was thinking should be closed19:15
ianwwhich is fine too19:16
corvuskeeping an image around without the ability to build a new version is not feasible in the long run -- as soon as we can't build an image, that's when the clock starts on either fixing it or removing it.19:16
clarkbalso I've just checked a recent dib change and py36 passed19:16
clarkbso maybe this is a non issue?19:16
fungiunder openstack's old extended maintenance model it would normally have forced stable/ussuri to eol around mid-202519:16
clarkbhttps://review.opendev.org/c/openstack/diskimage-builder/+/902497 this change19:16
fungiwith the new unmaintained model, maybe we don't have to worry about it for quite that long19:17
corvusto build on ianw's nomenclature, i think the uncertainty window is "when does a bionic build finally accidentally break"19:17
ianwhuh, maybe stestr had a similar discussion and restored 3.6 support?19:18
clarkbianw: that would be my hunch19:18
ianwanyway, i think the whole thing is on borrowed time until we have this discussion again :)19:18
ianwso basically, it seems like we're ok with dib not actively breaking 3.6, but also not taking efforts to do 3.6 testing when it fails19:19
clarkbyes I think we should have a general plan. I'll propose this: Keep py36 testing as long as it is passing without a ton of effort and opendev still needs bionic images. If testing starts to fail and there isn't an easy fix drop the testing but don't explicitly break old pythons. Tag at this point as fallback. If at some point in the future bionic builds break opendev can either19:19
clarkbfix it in dib with less automated testing or accelerate bionic removal19:19
corvusclarkb: ++19:20
tonybI think 3.6 is still unsupported: https://github.com/mtreinish/stestr/commit/cc66736df90968126585540b220fea663b124bbf19:20
clarkbadditionally I think it would be trivial to cap stestr for py36 testing19:20
fungienvironment marker, i guess19:20
clarkbnow that I've had more time to digest what initiated this change I think I'm actually -2 even if stestr is broken19:20
clarkbwe just install older stestr for py36 and move on19:20
clarkbfungi: yes exactly19:21
clarkbanyway that is beside the point considering py36 testing does work19:21
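If capping stestr for old interpreters ever did become necessary, it would amount to a pip environment marker in the test requirements. A sketch (the exact stestr version boundary here is an assumption, not checked against stestr's changelog):

```
# hypothetical test-requirements fragment: keep an older stestr on python 3.6
# (the "<4" boundary is an illustrative guess -- verify against stestr releases)
stestr<4 ; python_version < '3.7'
stestr ; python_version >= '3.7'
```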
ianwwell i would say removing 3.6 in dib has been a topic of interest for a long time, so it's good to have a discussion about it19:21
ianwanyway, i think we've sorted out the major points so can follow-up in the changes19:22
clarkbcool. As a side note I think rhel 8's main python is 3.6 as well so not bionic specific?19:22
clarkbbut rhel 8 has packaging for newer pythons so less impactful19:23
clarkb#topic Python Container Updates19:23
clarkb#link https://review.opendev.org/c/opendev/system-config/+/898756 And parent add python3.12 images19:23
clarkbNot aware of any major movement here, but a reminder that I think we can go ahead and approve ^19:23
clarkb#topic Gitea 1.2119:24
tonybYeah no progress, the time I wanted to spend on this got redirected into a downstream project19:24
clarkbLots of movement on the gitea side of things though it's a bit one step forward two back19:24
clarkbthe tl;dr is that we added a new larger key to gitea as the first step to rotating the old key out19:25
clarkbthe next step is adding the key to gerrit and having it use the new larger key. Unfortunately when we attempted that on Friday none of it worked19:25
clarkbAnd the problem was entirely pebkac19:25
fungile sigh19:25
clarkbI used review02 host side file paths to ssh keys and not gerrit container filesystem file paths to keys19:26
clarkbso our .ssh/config wasn't working because the file "didn't exist"19:26
clarkb#link https://review.opendev.org/c/opendev/system-config/+/902490 Configure Gerrit to use new SSH key19:26
clarkbI've pushed a new change up which corrects that and has been tested on some held nodes19:26
fungii'm going to blame it being a long week and we weren't as observant as we might have been were it not the end of the day on a friday19:26
clarkbI don't think I've got time to do a gerrit restart today given how much we've already gotten wrong around that. I want to ensure I can focus on it19:27
clarkbbut maybe tomorrow?19:27
tonybSo that ssh config works as expected on the test nodes?19:27
clarkbtonyb: yes19:27
tonybOh cool19:27
clarkbtonyb: the fix was having the IdentityFile path use the container's filesystem paths and not the host's19:27
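The mistake described above boils down to an IdentityFile path that only exists on the host. A sketch of the corrected entry, with entirely hypothetical paths and host patterns (the real ones live in the linked system-config change):

```
# Hypothetical ~/.ssh/config entry as seen from inside the gerrit container.
# Wrong: a host-side path like /home/gerrit2/.ssh/id_new, which only exists
# on the review server itself, so ssh inside the container can't find it.
Host gitea*.opendev.org
    # Right: the path where the key is mounted inside the container
    IdentityFile /var/gerrit/.ssh/id_new
```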
fungitomorrow will work for me19:27
tonybtomorrow is good for me.19:28
clarkbexcellent we can sync back up again tomorrow and figure out timing19:28
tonyb++19:29
clarkb#topic Upgrading Zuul's DB Server19:29
clarkb#link https://etherpad.opendev.org/p/opendev-zuul-mysql-upgrade info gathering document19:30
clarkbI didn't end up having much time to look into this, but I did briefly pull up postgres' HA documentation (note postgres clustering is completely different so you want to use HA for search terms)19:30
clarkbThe impression it gave me was that there are a lot of options and your use case is important to picking the right one and none are super drop in out of the box easy19:31
clarkbSo basically if we go down the postgres path we'll need to do a bit of investigating19:31
clarkbHas anyone else done any research to add to the pile?19:32
clarkbI'm guessing that is a no. I'm hopeful that I'll have time to look into this more around the holidays as research tends to be a bit more time flexible for me19:34
clarkb#topic Annual Report Season19:34
clarkb#link https://etherpad.opendev.org/p/2023-opendev-annual-report OpenDev draft report19:34
clarkbThere is a draft in this etherpad now19:34
tonybI did a small amount on Mysql 5.X to 8.x DB upgrades.19:35
clarkbhrm can I undo twice?19:35
clarkb#undo19:35
opendevmeetRemoving item from minutes: #link https://etherpad.opendev.org/p/2023-opendev-annual-report19:35
clarkb#undo19:35
opendevmeetRemoving item from minutes: #topic Annual Report Season19:35
clarkbI can !19:35
tonybAs expected it's basically a dump+restore but going through 6 and 7 before 8 so really 3 dump+restores19:35
clarkbtonyb: anything jump out to you as important?19:35
corvusoh if we just want to upgrade i think we would just dump and restore19:36
corvusonce19:36
tonybI picked 8 as it was the latest in trove19:36
tonybcorvus: That's not what the docs say, but they could be overly cautious19:36
corvusi mean, it's sql19:36
corvusi'm talking mysqldump + mysql19:37
fungiyeah, it would mean an outage, but hopefully not too long of one19:37
tonybYeah me too19:37
corvusi've done that exact upgrade something like 20 times over the past month.  it's fine.  :)19:37
fungialso we could time a dry run to find out approximately how long of an outage we're talking about19:37
tonybcorvus: Okay19:37
clarkbcould be a generic suggestion for specific corner cases19:37
clarkbbut in this case it sounds like we know this db can make the transition just fine19:38
tonybOkay.  I was worried it would triple the outage window but it sounds like not19:38
corvusfungi: it's basically about the same amount of time our recent schema change took19:39
funginoted19:39
corvusbecause our schema change was "rewrite the two biggest tables"19:39
corvus(hopefully the actual cpu time, not the "figure out what went wrong" time)19:40
fungii guess the workload for updates and inserts are comparable in that case19:40
corvuswell, i'm trying to say they're both inserts :)19:40
fungiah19:41
corvuswhich is why they're equivalent in this specific case19:41
fungimakes sense, yep19:41
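The single dump-and-restore corvus describes can be sketched as one pipeline. Host names, credentials, and the database name below are placeholders, not our actual setup:

```
# Sketch of a one-shot logical upgrade: dump from the old 5.x server and
# load straight into the new 8.x server. --single-transaction takes a
# consistent InnoDB snapshot without locking tables for the whole dump.
mysqldump --single-transaction --routines --triggers \
    -h old-db.example.org -u zuul -p zuul \
  | mysql -h new-db.example.org -u zuul -p zuul
```

Timing this against a copy of the data would also give the dry-run outage estimate fungi mentions.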
clarkb#topic Annual Report Season19:41
clarkb#link https://etherpad.opendev.org/p/2023-opendev-annual-report OpenDev draft report19:41
clarkbas I mentioned prior to the undos there is a draft here now. I'm considering this draft fairly complete but please do make suggestions or point out missing content if you notice anything like that. I've also got placeholders for stats data that fungi will be collecting19:42
corvusi read that yestrday and it looks good to me!  thanks!19:42
clarkbThis document goes into the foundation's annual report and is a good way for us to show off to people that may not typically be in the know about what we do19:43
clarkbso definitely call out anything you feel is important and missing19:43
clarkb#topic Mailing lists and Gmail requirements19:44
fungithis is something frickler spotted and brought up in irc earlier today19:44
clarkbThis is a new one. Apparently gmail is throttling lists.openstack.org because it lacks spf and dkim. Adding one or the other seems to be sufficient according to the message they respond with to stop getting rate limited19:44
clarkb#link https://review.opendev.org/q/topic:%22argh-gmail%22 Add an SPF record for the listserv19:44
fungialso according to the article i linked in the commit message, it will go from throttling to rejecting in february19:45
clarkbI have no concerns or objections with the proposed dns record update19:45
clarkbAs written it should largely be a noop and it would be good to confirm it makes google happy19:45
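For reference, the record in question would look something along these lines; the actual policy string is whatever the linked changes contain, so this is only an illustrative sketch:

```
; Hypothetical zone fragment -- see the changes under topic:argh-gmail for
; the real record. "a" authorizes the domain's own A/AAAA addresses to send
; mail; "-all" tells receivers to reject anything else claiming this domain.
lists.openstack.org.  3600  IN  TXT  "v=spf1 a -all"
```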
fungias little love as i harbor for gmail, i think it will mean less work for our mailman server if it doesn't have to hold all those deferrals, and less work for list owners/moderators dealing with gmail users getting their subscriptions put on hold19:46
* frickler likes the gerrit topic19:46
corvuslgtm19:46
fungithe two changes in that topic cover the list sites we do dns in opendev for19:46
fungithe other 5 domains i'll make similar edits to in their respective hosts (rackspace and cloudflare)19:46
fungisince i'm not hearing objections, i'll un-wip them now19:47
tonybSeems good to me19:47
fungiand again, thanks to frickler for spotting this and bringing it up19:48
clarkbI'm trying to think how we can test it directly after the dns updates are made but my gmail account isn't subscribed to anything anymore because years ago I got auto unsubbed for some reason that had to do with gmail19:48
clarkbbut it is probably sufficient to make the changes and monitor the exim logs19:49
fungiyes, that's my plan19:49
fungisee if, after a while, it stops logging deferrals19:49
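Checking the exim logs could be as simple as counting deferral lines for gmail recipients. A minimal sketch; the sample log lines and the wording of Google's 421 response are assumptions for illustration, not copied from our servers:

```python
import re

# Exim marks deferred deliveries with "==" in its mainlog; the exact text of
# Google's rate-limit response below is an illustrative assumption.
SAMPLE_LOG = """\
2023-12-05 18:59:01 1r9abc-000123-AB => user@gmail.com R=dnslookup T=remote_smtp H=gmail-smtp-in.l.google.com [142.250.0.26] C="250 2.0.0 OK"
2023-12-05 19:00:02 1r9def-000456-CD == other@gmail.com R=dnslookup T=remote_smtp defer (-44) H=gmail-smtp-in.l.google.com [142.250.0.27]: SMTP error after end of data: 421-4.7.28 Your email has been rate limited
2023-12-05 19:01:03 1r9ghi-000789-EF => someone@example.com R=dnslookup T=remote_smtp C="250 OK"
"""

# A deferral ("==") line whose recipient is a gmail.com address.
GMAIL_DEFER = re.compile(r"== \S+@gmail\.com .* defer ")

def count_gmail_deferrals(log_text: str) -> int:
    """Count exim deferral lines for gmail.com recipients."""
    return sum(bool(GMAIL_DEFER.search(line)) for line in log_text.splitlines())

print(count_gmail_deferrals(SAMPLE_LOG))
```

Run periodically after the SPF records land, a falling count would be the signal that google is satisfied.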
clarkbsounds good19:50
clarkb#topic Open Discussion19:50
clarkbI've got one more "it should be a straightforward review stack because it doesn't actually touch production": https://review.opendev.org/c/opendev/system-config/+/901469/ and parent add gerrit 3.9 images and testing19:50
fungioh, i'm going to be travelling a big chunk of thursday and friday, so am not expecting to be around much19:51
fungishould be back on for a bit in my friday afternoon. may also be able to check in from the hotel thursday night my time19:51
clarkbtonyb: I also left the python testing with newer python in containers or else item under open discussion as a reminder to ask if you were able to look at the existing job roles for that yet19:51
clarkbfungi: thanks for the heads up19:51
tonybclarkb: I have not, Later today for sure!19:52
clarkbI'll give it a few more minutes for any other topics but it's looking like we might be able to end early and go find a meal19:53
clarkbthank you for your time and effort everyone!19:56
fungithanks clarkb!19:56
clarkb#endmeeting19:56
opendevmeetMeeting ended Tue Dec  5 19:56:14 2023 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)19:56
opendevmeetMinutes:        https://meetings.opendev.org/meetings/infra/2023/infra.2023-12-05-19.00.html19:56
opendevmeetMinutes (text): https://meetings.opendev.org/meetings/infra/2023/infra.2023-12-05-19.00.txt19:56
opendevmeetLog:            https://meetings.opendev.org/meetings/infra/2023/infra.2023-12-05-19.00.log.html19:56
tonybThanks all!19:56

Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!