19:00:32 <clarkb> #startmeeting infra
19:00:32 <opendevmeet> Meeting started Tue Oct 31 19:00:32 2023 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:00:32 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:00:32 <opendevmeet> The meeting name has been set to 'infra'
19:00:39 <clarkb> #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/IILSVTDAEDTRTCRSZZ3P2UKY4CIOKUEY/ Our Agenda
19:00:53 <clarkb> #topic Announcements
19:01:13 <clarkb> I announced the gerrit 3.8 upgrade for November 17 and fungi announced a mm3 upgrade for Thursday
19:01:49 <fungi> short notice, i mainly just didn't want ui changes and logouts to surprise anyone
19:02:24 <clarkb> Also worth noting that November 23 is a big US holiday which may mean people have varying numbers of days off around then
19:02:41 <clarkb> other than that I didn't have any announcements
19:02:50 <clarkb> #topic Mailman 3
19:02:58 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/899300 Upgrade to latest mm3 suite of tools
19:03:08 <clarkb> this is the change that corresponds to the mailman3 upgrade fungi announced
19:03:11 <fungi> as of moments ago the remaining cleanup changes merged
19:03:19 <clarkb> there were two other cleanup changes but ya they have both merged now
19:03:41 <fungi> so, yes, and yes, 899300 is the upgrade for thursday
19:04:03 <clarkb> outside of those changes we still need to snapshot and cleanup the old server at some point
19:04:15 <clarkb> and I've left the question of whether or not we add MX records on the agenda
19:04:31 <clarkb> I'm somewhat inclined to leave things alone in DNS since this seems to be working and is simpler to manage
19:04:51 <tonyb> yeah they shouldn't be needed.
19:05:39 <fungi> down the road we can revisit things like spf and dkim signing if they become necessary, but i'd rather avoid them as long as we can get away with
19:05:57 <tonyb> ++
19:06:02 <clarkb> fungi: both google and yahoo have been making statements about requiring that stuff early next year...
19:06:11 <clarkb> but ya lets worry about it when we get more concrete details
19:06:24 <clarkb> meanwhile the vast majority of spam I receive comes from gmail addresses
19:06:36 <clarkb> Anything else mailing list related?
19:07:00 <fungi> nope
19:07:07 <clarkb> #topic Server Upgrades
19:07:23 <clarkb> tonyb started looking at jammy mirror testing
19:07:24 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/899710
19:07:44 <tonyb> it's currently failing due to a change in behaviour in curl
19:07:45 <clarkb> this appears to be failing but I think for a test framework reason not necessarily because the jammy deployment is failing. I don't fully understand the failure though
19:07:54 <clarkb> oh is it curl that changed? fun
19:07:58 <tonyb> yeah
19:08:24 <tonyb> a command line that works on focal fails on newer curls
19:08:43 <tonyb> I'm looking at what the right fix is.
19:08:54 <clarkb> tonyb: I think what the goal is there is to set SNI stuff properly so that we get the correct responses?
19:09:06 <tonyb> I'll propose a testing fix first and then add the jammy server
19:09:07 <clarkb> otherwise using localhost we don't align with the apache vhost matching
19:09:45 <fungi> i suppose an alternative could be to fiddle with /etc/hosts on the node, but that feels... dirty
19:10:12 <tonyb> yeah. that makes sense.  we could probably do something in python itself
19:10:31 <tonyb> it should be a quick fix once I get back to my laptop
19:10:40 <clarkb> looking forward to it
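
A minimal sketch of the "do something in python itself" idea, assuming the goal is to connect to a local address while still presenting the vhost name for SNI and the Host header. The function names and the hostname used below are hypothetical; the meeting does not say what the eventual test fix looked like.

```python
import socket
import ssl


def build_get_request(hostname: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.1 GET whose Host header names the vhost."""
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {hostname}\r\n"
        "Connection: close\r\n\r\n"
    ).encode("ascii")


def fetch_via_address(address: str, hostname: str, path: str = "/",
                      port: int = 443, verify: bool = True) -> bytes:
    """Connect to `address` but present `hostname` for SNI and Host."""
    context = ssl.create_default_context()
    if not verify:
        # Assumption: test nodes may use self-signed certificates.
        # check_hostname must be disabled before verify_mode is relaxed.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    with socket.create_connection((address, port), timeout=10) as sock:
        # server_hostname sets the SNI value sent in the TLS ClientHello,
        # which is what name-based vhost selection keys on.
        with context.wrap_socket(sock, server_hostname=hostname) as tls:
            tls.sendall(build_get_request(hostname, path))
            chunks = []
            while True:
                data = tls.recv(4096)
                if not data:
                    break
                chunks.append(data)
    return b"".join(chunks)
```

With curl, the comparable approach is the `--resolve host:port:address` option, which keeps the real hostname in the URL while pinning the address it connects to.
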
19:11:10 <clarkb> I am not aware of any other updates but wanted to mention we are also in a good period of time to look at meetpad server replacements since the PTG just ended
19:11:22 <tonyb> at some point we should decide if we need testing for 3 Ubuntu releases
19:11:44 <tonyb> meetpad is next
19:11:54 <clarkb> tonyb: in general I think we're trying to align with all the things we're deploying. As we replace servers and reduce the total list of mirrors we can reduce the ubuntu flavors
19:12:11 <clarkb> tonyb: the main concern there is apache version differences (which haven't been a problem in more recent years) and openafs functionality
19:12:17 <tonyb> okay that's pretty much what I thought
19:12:40 <fungi> yeah, basically we want to test that the changes we make to stuff continues to work on the platforms we're currently running on, and then once we're not running on those platforms we can stop testing them
19:12:56 <tonyb> ++
19:13:03 <fungi> on a service-by-service basis
19:13:18 <clarkb> once upon a time we tried to be more generally compatible with people doing similar to us outside of our env but realized it was too much effort and should focus on our set of things
19:13:30 <fungi> so basically, if we upgrade the meetpad servers to jammy, we can then switch to only testing that meetpad deployments work on jammy
19:14:19 <clarkb> #topic Python Container Updates
19:14:27 <clarkb> #link https://review.opendev.org/q/(+topic:bookworm-python3.11+OR+hashtag:bookworm+)status:open
19:14:38 <clarkb> this is very close to the finish line (as much as there is one)
19:14:52 <clarkb> python 3.9 is gone and the current TODOs are to update zuul-operator and OSC to python3.11
19:15:10 <clarkb> OSC should merge its change soon I expect as openstack is voting on python3.11 jobs now which makes switching the image to python3.11 safe
19:15:27 <clarkb> on the zuul-operator side of things the CI jobs there are all unhappy and I'm not quite sure the scope of the necessary fixes yet
19:15:44 <clarkb> I was hoping zuul-operator users would get it sorted soon enough but I may need to help out
19:15:51 <clarkb> once that is done we can drop python3.10 image builds
19:15:58 <tonyb> yay
19:16:29 <clarkb> I've also got a change up to add python 3.12 images but that is failing because uwsgi doesn't support python3.12 yet.
19:16:43 <clarkb> I think we can wait for them to make a release that works (there is upstream effort to support newer python but not yet in a release)
19:16:44 <tonyb> a quick tangent, I think it'd be good to remove old images/tags from the public registry
19:17:04 <tonyb> leaving buster based 3.7 images feels dangerous?
19:17:28 <clarkb> maybe? openshift recently broke zuul's openshift functional testing because they deleted old images
19:17:39 <fungi> sounds like a refcounting challenge
19:17:47 <tonyb> I could generate a list of things we could tag as deprecated and pull later
19:17:54 <clarkb> there is definitely a tradeoff. I think if someone is using an image for testing its fine, but you're right you wouldn't want it in production
19:18:07 <clarkb> maybe retag as foo-deprecated
19:18:17 <tonyb> fair enough.
19:18:19 <fungi> foo-dangerous
19:18:21 <clarkb> then people have an out but it also makes it more apparent if something should not be used
19:18:31 <tonyb> yeah that's sort of what I was thinking
19:18:32 <clarkb> I think that would be my preference over proper deletion
19:18:42 <fungi> foo-if-it-breaks-you-get-to-keep-the-pieces
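
The retagging idea could be planned with something like the sketch below. It only computes a mapping from old tag names to "-deprecated" names and does not talk to any registry. The tag names and the prefix-matching rule are hypothetical, not the actual tags in the public registry.

```python
def deprecation_plan(tags, deprecated_prefixes, suffix="-deprecated"):
    """Map each tag that should be retired to its replacement tag name."""
    plan = {}
    for tag in tags:
        if tag.endswith(suffix):
            # Already retagged on a previous run; leave it alone.
            continue
        if any(tag.startswith(prefix) for prefix in deprecated_prefixes):
            plan[tag] = tag + suffix
    return plan


# Hypothetical tag list: only the un-suffixed 3.7 tag is selected.
print(deprecation_plan(
    ["3.7-buster", "3.11-bookworm", "3.7-buster-deprecated"],
    deprecated_prefixes=["3.7"],
))
```

Keeping the plan as plain data means the list can be reviewed before anything is retagged or pulled.
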
19:19:09 <tonyb> that's all I had
19:19:18 <clarkb> #topic Gitea 1.21
19:19:25 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/897679
19:19:58 <clarkb> still no proper release and no changelog
19:20:39 <clarkb> I have tried to keep up with their updates though and it generally works for us other than the ssh key size check thing that I disabled in that change
19:20:56 <clarkb> I've left this on the agenda under the assumption we'll have to make decisions soon but upstream hasn't made that the case yet
19:21:08 <clarkb> #topic Gerrit 3.8 Upgrade
19:21:17 <clarkb> This is one with a bit more detail
19:21:19 <clarkb> #link https://etherpad.opendev.org/p/gerrit-upgrade-3.8
19:21:24 <clarkb> #link https://lists.opendev.org/archives/list/service-announce@lists.opendev.org/thread/XT26HFG2FOZL3UHZVLXCCANDZ3TJZM7Q/
19:21:37 <clarkb> I have announced the plan to upgrade Gerrit to 3.8 on November 17 at 15:30-16:30 UTC
19:21:54 <clarkb> I've tested the downgrade path on a held CI node and then re upgraded it for the experience of it
19:22:32 <clarkb> Yesterday we merged a config update necessary for 3.8 that we'll want to have in place under 3.7 to ensure it is working there as well. My plan is to restart Gerrit later today
19:22:55 <clarkb> this config update shouldn't result in any behavioral differences. It is entirely about maintaining compatibility of acceptable config in gerrit 3.8
19:23:30 <clarkb> fungi: if I want to do that restart around say 22:00 UTC is that a bad time for you?
19:23:46 <clarkb> fungi: maybe better to ask if there is a good time for you later today?
19:24:28 <fungi> sure, i can help at 22:00 utc
19:24:52 <clarkb> cool I think that time should work for me
19:24:56 <fungi> great
19:25:36 <clarkb> the last thing I've noticed is a traceback starting up the plugin manager plugin. Upstream thought they had already fixed it which made me concerned this was a problem with our builds but on closer inspection it seems to be a different problem (tracebacks differ)
19:25:46 <clarkb> also we hit it on 3.7 too so shouldn't impact 3.8
19:25:57 <fungi> so basically not a regression
19:26:07 <clarkb> more of a thing to be aware of as an expected startup traceback that looks scary but is believed to be fine
19:26:07 <fungi> just some continued broken for a feature we're not using
19:26:11 <clarkb> yup
19:26:51 <clarkb> and that was all I had. The etherpad has pointers to the held node if anyone wants to take a look at it
19:27:03 <clarkb> #topic Etherpad 1.9.4
19:27:26 <fungi> in progress
19:27:34 <clarkb> in the time it took us to be ready to upgrade to 1.9.3 they released 1.9.4. Fun fact: 1.9.4 fixes the mysql isn't utf8mb4 encoded bug I filed with them years ago
19:27:35 <fungi> i need to finish diffing the upstream container configs
19:27:53 <clarkb> we worked around that by manually setting the encoding on the db but before that etherpad hard crashed because it couldn't log in this instance
19:28:37 <tonyb> was that the poo emoji crash from like Vancouver?
19:28:44 <fungi> snowman
19:28:53 <fungi> but yes
19:28:56 <tonyb> okay
19:28:58 <clarkb> tonyb: no, this was on the db level not the table level. They had fixed the table level thing prior
19:29:02 <clarkb> its all related though.
19:29:19 <clarkb> In this case they wanted to log "warning this is probably a problem" but their logging was broken so the whole thing crashed
19:29:28 <clarkb> rather than bad bytes causing the crash later
19:29:36 <tonyb> lol
19:29:37 <fungi> ah, right, that problem
19:30:00 <clarkb> fixing the db level encoding meant it never tried to log and things proceeded :)
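
For background on why the encoding matters: MySQL's legacy utf8 character set (utf8mb3) stores at most three bytes per character, so anything outside the Basic Multilingual Plane needs utf8mb4. A small sketch of that check follows. The helper name is hypothetical, and this is not how Etherpad itself detects the condition.

```python
def needs_utf8mb4(text: str) -> bool:
    """Return True if any character needs four bytes in UTF-8."""
    # Code points above U+FFFF encode to four bytes, which the
    # three-byte utf8mb3 character set cannot store.
    return any(ord(ch) > 0xFFFF for ch in text)


# U+00E9 fits in two bytes; U+1F600 needs four.
print(needs_utf8mb4("caf\u00e9"))
print(needs_utf8mb4("pad \U0001F600"))
```
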
19:30:11 <fungi> also related, update to log4js which invalidates some of the config we're carrying, preventing the service from starting, which is why i need to more deeply diff the configs
19:30:11 <clarkb> fungi: I guess once we have an updated change we'll hold a node and do another round of testing
19:30:40 <fungi> correct
19:30:41 <tonyb> sounds good.
19:31:05 <clarkb> #topic Open Discussion
19:31:10 <clarkb> that was it for the emailed agenda
19:31:29 <clarkb> worth noting we just updated nodepool to exclude openstacksdk 2.0.0 as it isn't compatible with rax cinder v2 apis
19:31:44 <clarkb> a fix is in progress in openstacksdk which frickler and I mentioned we could help test
19:32:08 <clarkb> this effectively took rax offline in nodepool for a few days. It also causes nodepool to not mark nodes as node failures when a cloud is failing like that
19:32:31 <clarkb> I kinda want to make nodepool fail the request in that cloud when the cloud is throwing errors rather than try forever
19:33:10 <tonyb> that seems like it'd be more visible
19:33:27 <fungi> the openstack vmt would like a private room on the opendev matrix homeserver to use instead of its current restricted irc channel, since some members are joining from the oftc matrix bridge which doesn't handle nickserv identification very well. i doubt there will be any objections, but... objections? otherwise i'll work on adding it
19:33:29 <frickler> it failed very early, kind of a similar scenario to the expired cert issue, which also could use better handling
19:33:56 <clarkb> frickler: ya I think they have the same underlying failure method for request handling internally with nodepool which is to basically move on and then the request is never completed
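
The "fail the request rather than try forever" idea can be sketched as a bounded retry that ends in an explicit failed state. This is an illustration only: the exception class, function name, and result dictionary are hypothetical and are not Nodepool's actual request-handling code.

```python
class LaunchError(Exception):
    """Hypothetical error raised when the cloud rejects a launch."""


def handle_request(launch, max_attempts=3):
    """Try `launch` a bounded number of times, then fail the request."""
    errors = []
    for attempt in range(1, max_attempts + 1):
        try:
            node = launch()
        except LaunchError as exc:
            # Record the error and retry until the budget is spent.
            errors.append(str(exc))
            continue
        return {"state": "fulfilled", "node": node, "attempts": attempt}
    # Out of attempts: report failure so the requester sees it
    # instead of the request staying pending indefinitely.
    return {"state": "failed", "node": None,
            "attempts": max_attempts, "errors": errors}
```

Returning an explicit failed state is the design choice discussed here: a failing cloud becomes visible to whoever made the request instead of leaving it incomplete.
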
19:34:43 <clarkb> fungi: no objections from me. Worth noting private and encrypted are distinct in matrix so you'll have to decide on those two things separately iirc
19:34:55 <fungi> yeah, it'll be both in this case
19:35:00 <clarkb> private is basically invite only and then encrypted is whether or not everyone is doing e2e amongst themselves
19:35:31 <frickler> afaict even then some things are not encrypted like emojis
19:35:54 <frickler> but not worse than IRC likely, so no objection either
19:36:20 <tonyb> thanks fungi
19:36:33 <fungi> the vmt uses its private communication channel only for coordinating things which can't be mentioned in public (and even then it's just things like "i triaged this private bug, please take a look: <url>" so emojis rarely come into it ;)
19:37:39 <clarkb> re Holidays the 10th is also a holiday here and I'm taking advantage for a long weekend. I won't be around on the 10th and 13th
19:38:01 <fungi> i'll try to be around
19:38:07 <frickler> one question regarding the branch deletion I did for kayobe earlier: the github mirror should sync this on the next merged change, right?
19:38:21 <fungi> correct
19:38:42 <fungi> that job only gets triggered by changes merging, so addition/deletion of branches or pushing tags doesn't replicate immediately
19:39:16 <frickler> ok, so we'll wait for that to happen and then can check again
19:39:39 <fungi> it could probably be added to additional pipelines, if that becomes a bigger problem
19:40:22 <frickler> I don't think it is urgent in this case, just wanted to cross that check off my list
19:41:31 <clarkb> last call for anything else. Otherwise we can all have a few minutes back for $meal or sleep
19:43:12 <clarkb> thank you for your time and help everyone! We'll be back here same time and place next week.
19:43:17 <clarkb> #endmeeting