19:01:06 <clarkb> #startmeeting infra
19:01:06 <opendevmeet> Meeting started Tue Jul 18 19:01:06 2023 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:06 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:06 <opendevmeet> The meeting name has been set to 'infra'
19:01:23 <clarkb> #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/JVMGLDPDLQW5L3FFIKWILIJU5DJS77ES/ Our Agenda
19:01:30 <clarkb> #topic Announcements
19:02:25 <clarkb> A minor announcement. I'm not actually here today. The only reason this works out is the super early (relative to local time) hour of the meeting. But the lowest tide of our trip occurs in ~5 hours so we're taking advantage of that for "tide pooling"
19:02:45 <clarkb> and it's a whole production to get the boat out before it gets stuck in the mud
19:02:56 <fungi> i can only imagine
19:03:03 <tonyb> Sounds like fun
19:03:42 <clarkb> Ya I think everyone is looking forward to it. But I definitely won't be around after the meeting today
19:03:47 <clarkb> #topic Bastion Host Updates
19:03:55 <clarkb> #link https://review.opendev.org/q/topic:bridge-backups
19:04:21 <clarkb> I think this one still deserves multiple core/root reviewers if we can manage it
19:04:28 <clarkb> fungi: frickler fyi if you have time
19:05:17 <fungi> oh yep
19:05:37 <clarkb> #topic Mailman 3
19:06:11 <clarkb> The 429 spam seems to have gone away as quickly as it started. I don't think we made changes for that yet so the other end must've gotten bored
19:06:15 <fungi> no appreciable progress. life is starting to get out of the way and i'm working on catching back up to where i left off (new held node, et cetera)
19:06:48 <fungi> i'm wondering if documenting manual steps for adding a new domain is simpler than trying to orchestrate django for now
19:07:07 <clarkb> fungi: considering the number of domains I think that is workable
19:07:11 <clarkb> we are at ~6 today?
19:07:21 <fungi> given for the current ones we have manual import steps to perform anyway
19:07:25 <fungi> yeah
19:08:01 <clarkb> that works for me. We have manual steps elsewhere too
19:08:02 <fungi> i'll shift my focus to working out those steps through the webui in that case
19:08:34 <clarkb> sounds good. Anything else mailman related?
19:08:35 <fungi> the existing wip changes are still good for either approach
19:08:41 <fungi> nothing from me
19:09:03 <clarkb> #topic Gerrit Updates
19:09:16 <clarkb> There are a few Gerrit items I've merged into one block here
19:09:25 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/885317 Build gerrit 3.7.4 and 3.8.1 images
19:09:35 <clarkb> The first is Gerrit did a whole bunch of releases over the weekend
19:09:48 <clarkb> 3.7.4 and 3.8.1 are both new and that change updates our image builds to match
19:10:14 <clarkb> We run 3.7.3 in prod so 3.7.4 will be our prod update and 3.8.1 will be used for 3.8 testing and 3.7 -> 3.8 upgrade testing
19:10:35 <clarkb> I made a note about a recorded breaking change that I'm pretty sure doesn't affect us
19:11:04 <clarkb> Note we need to manually replace the container for gerrit after that lands. It won't be automatic
19:11:22 <clarkb> Next is the leaking replication task files
19:11:35 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/884779 Revert Gerrit replication task bind mount
19:11:45 <clarkb> is one option and one that we might want to combine with the 3.7.4 container replacement
19:12:06 <clarkb> since that will give Gerrit a fresh ephemeral directory for those files, then we can manually clean up the old bind mount location
19:12:18 <clarkb> the alternative is my somewhat hacky changes to add a startup script that scans all the json files and prunes them
19:12:49 <clarkb> Unfortunately no updates to my gerrit issues filed for this and they changed bug trackers so I'm not even sure my old links will work
19:13:12 <clarkb> Finally the rejection of implicit merges
19:13:14 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/885318 Merge this to reflect change to All-Projects once made
19:13:27 <clarkb> fungi: Not sure if you were still planning to push that update to All-Projects
19:13:58 <fungi> oh, yes i can do that
19:14:13 <fungi> related to gerrit, zuul (as of... yesterday?) has support for the kafka event plugin too, wonder if we should consider working toward using that or stick with ssh event stream (we'd presumably still need to support the latter for existing third-party ci systems anyway, but looks like there are some resiliency benefits if we switch our zuul's connection to kafka)
19:14:37 <clarkb> the main issue with kafka is going to be running it
19:14:49 <fungi> yep
19:14:58 <corvus> the gerrit folks have been frowning at ssh for a while.... but i don't think they have plans to remove it
19:15:01 <clarkb> it is a fairly large and complicated system aiui (they even deleted zookeeper and now do that all internally)
19:15:07 <fungi> that's why it's a bit of an open question
19:15:13 <corvus> when developing the zuul stuff, i used the bitnami all-in-one container
19:15:40 <corvus> i didn't look into it much, but it might be easy enough if we want a simple system...
19:15:43 <fungi> not something we need to decide any time soon, mainly just curious
19:15:50 <corvus> but if we want multi-host, yeah, probably more work
19:15:51 <clarkb> an all in one container won't give us much extra resiliency when compared to ssh though. Except that we could potentially restart kafka less often than gerrit
19:16:00 <corvus> clarkb: exactly
19:16:10 <corvus> also, did they delete all the zk stuff?  or just augment it with more complexity?
19:16:20 <corvus> i still saw a lot of "set up zk" instructions...
19:16:36 <corvus> (which i didn't follow on account of using the bitnami aio, so i don't really know)
19:16:47 <fungi> as we all know, the solution to complexity is to layer on more complexity ;)
19:16:54 <clarkb> corvus: my understanding is that kafka removed the zk dependency, or is working toward that, in order to do simpler/cheaper/quicker elections internally
19:17:19 <frickler> as long as we only have a single gerrit, aio kafka sounds fine
19:18:08 <clarkb> Anything else Gerrit related?
19:18:28 <tonyb> Just quickly
19:18:39 <corvus> and that reminds me, fyi, the reason gerrit supports kafka is mostly to support multi-master stuff... so that's potentially a stepping stone ...
19:18:56 <tonyb> Should I base the python updates on your 3.7.4 review for ordering?
19:19:49 <fungi> good point about the path to multi-master gerrit. i mainly saw kafka as a way to avoid losing gerrit events if zuul gets disconnected
19:19:53 <clarkb> tonyb: Yes, I think we should try to update Gerrit first since they tend to have good bugfixes and bullseye is still supported for a bit making bookworm less urgent but still an important update
19:20:06 <tonyb> ++
19:21:17 <clarkb> #topic Server Upgrades
19:21:27 <clarkb> The 12 zuul executors are all running Jammy now
19:21:47 <corvus> i reckon i'll delete the old ones today
19:21:56 <clarkb> cool was just going to ask about that
19:22:10 <clarkb> I need to look at cleaning up the old ci registry too. Probably a tomorrow task at this point
19:22:29 <clarkb> Other than that I didn't have any news here. Anyone else have updates?
19:22:34 * tonyb will watch how corvus does it and then copy it for the ci-registry
19:22:49 <corvus> oh i think all the changes are done now
19:23:02 <corvus> only thing left is manually deleting them using openstack cli
19:23:06 <tonyb> although I expect the actual server destruction will be done by y'all
19:23:17 <ianw> if anyone is picking up mirrors, i do think it's worth going back to re-evaluate kafs with them
19:23:17 <clarkb> ya server destruction is a bit manual
19:23:20 <tonyb> Oh okay
19:23:32 <clarkb> ianw: ya mirrors and meetpad are next up on the todo list
19:23:36 <fungi> ianw: great reminder about kafs, thanks
19:23:55 <corvus> ianw: any reason in particular, or just lets check in since it's been a while?
19:23:56 <tonyb> ianw I can keep you in the loop on that
19:24:01 <clarkb> ianw: for kafs you were thinking we could just deploy a node and then use dns to flip back and forth as necessary?
19:24:35 <ianw> corvus: it's come a long way; i've started using it locally and it's working fine
19:24:43 <fungi> jammy upgrades means newer kernel means newer kafs
19:25:11 <ianw> yeah, i have some changes up to implement it with a flag; we could put up a trial host and do some load testing with a dns switch
19:25:31 <ianw> what i'm not 100% on is the caching layers
19:25:39 <corvus> kk
19:25:45 <ianw> and that's the type of thing we'd probably need some real loads to tune
19:26:31 <ianw> but ultimately the reason is if we can get away from external builds of openafs that would be nice
19:26:51 <clarkb> though with jammy the openafs version there seems to be working too (at least for now)
19:27:01 <clarkb> but agreed adds more flexibility across platforms and updates etc
19:27:16 <ianw> yeah, it's never a problem till it is :)
19:27:56 <ianw> i'm willing to bring up a node, etc., but will require more than just my eyes
19:28:36 <clarkb> maybe put it in one of the rax regions since that is sizeable enough for data collection and for feeling confident that the other regions will be happy too
19:29:03 <fungi> openafs dkms builds are back to broken in debian/unstable (seems to be related to linux 6.1 or maybe newer compiler/klibc), so i'm tempted to give kafs a whirl there
19:30:10 <clarkb> #topic Fedora cleanup
19:30:30 <clarkb> I haven't had time to look at the mirroring stuff since we last met. tonyb do you have anything to add?
19:30:53 <tonyb> No progress from me.  I need to write up how I think the mirroring setup should work for review
19:31:16 <clarkb> and feel free to ping me with questions and point me to the write up when ready
19:31:26 <clarkb> #topic Storyboard
19:31:42 <clarkb> I haven't seen anything new here either but figured I would check
19:31:50 <fungi> nope
19:32:00 <clarkb> #topic Gitea 1.20 Upgrade
19:32:07 <clarkb> We did the 1.19.4 upgrade of Gitea last week
19:32:13 <clarkb> Was straightforward as expected
19:32:22 <clarkb> 1.20 is a bit more involved unfortunately
19:32:33 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/886993 Gitea 1.20 change
19:32:53 <clarkb> I finally got our test suite to pass, but there are a number of todos I've noted in the commit message about stuff we should check
19:33:29 <clarkb> The main frustrations I've hit so far: 1) oauth2 is a disabled feature but we still need to configure all of its jwt stuff to avoid startup errors, which means more config and state on disk that we don't use but is required
19:34:13 <clarkb> and 2) they have changed their WORK_DIR/WORK_PATH expectations for the second time and we need to go through that and ensure we aren't orphaning data in our containers' ephemeral disk areas and instead have all that covered by bind mounts
19:34:43 <clarkb> for 2) the idea I had was we could hold a node and compare the resulting bind mounts and gitea dir locations with our prod stuff to make sure they roughly align and if they do we should be good
19:35:22 <corvus> (this meeting moved from lunch to breakfast for clarkb)
19:35:25 <clarkb> From a feature perspective this release doesn't seem to add anything flashy which is probably good as we don't have to wrangle features on top of this
19:36:05 <clarkb> anyway I think reviews may be helpful at this point looking over the change log from gitea and ensuring we haven't missed anything important. And I'll try to work through those TODOs as I'm able and update the change
19:36:47 <clarkb> and feel free to add more todos if you find items that need to be addressed
19:37:34 <clarkb> #topic Etherpad 1.9.1
19:37:45 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/887006 Etherpad 1.9.1
19:38:12 <clarkb> Better news here. I think I sorted out that the username and user color problems are due to a change in handling: falsey boolean config entries now need to be null entries in the config
19:38:27 <clarkb> I made that update to our settings.json on the old held node and seemed to fix it
19:38:37 <fungi> awesome
19:38:47 <clarkb> I should have a new held node somewhere built from the code update based on that manual update
19:38:54 <clarkb> so we need to retest and check that it actually helps
19:38:59 <clarkb> also numbered lists seem to work for us
19:39:37 <clarkb> and they appear to have updated the git tag so we don't need to use a random git sha
19:39:52 <clarkb> I'm hopeful that after round two of checking we'll be in a good spot to land the update
19:40:31 <clarkb> #topic Python Container Updates
19:40:59 <clarkb> Typically we talk about this in the context of updating python versions but due to the recent Debian bookworm release we're doing base OS container updates instead
19:41:14 <clarkb> #link https://review.opendev.org/q/topic:bookworm-python
19:41:21 <clarkb> #link https://review.opendev.org/q/topic:force-base-image-build
19:41:39 <clarkb> tonyb had two specific questions listed on the agenda.
19:41:52 <clarkb> The first is a question of updating openstacksdk's old dockerfile and I think we should
19:42:04 <clarkb> we can't merge that change but can propose it to them and hopefully they approve it
19:42:29 <tonyb> They're okay to do whatever we suggest and the change is up for review
19:42:37 <frickler> #link https://review.opendev.org/c/openstack/python-openstackclient/+/888744
19:42:46 <clarkb> and secondly, should we manually clean up the leaked zuul change_* tags in docker hub
19:43:35 <clarkb> ah cool. then ya I think they should update their base image. It should be pretty safe since this is just a client tool we run on all the operating systems with minimal OS integration
19:43:57 <clarkb> For leaked zuul change_* tags I wonder if we should write a script to clean those up and have it run against all our images
19:44:20 <clarkb> the script could check gerrit's api to see which changes are no longer open and then delete those tags
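A minimal sketch of what such a cleanup script could look like, assuming the leaked tags start with a change_<number> pattern (the exact tag format in use may differ), that anonymous reads of Gerrit's REST API are sufficient, and that the Docker Hub v2 login/list/delete tag endpoints behave as shown; none of this was settled in the meeting:

```python
#!/usr/bin/env python3
"""Sketch: prune leaked change_* tags from Docker Hub repositories."""
import json
import re
import sys

import requests

GERRIT_URL = "https://review.opendev.org"
HUB_URL = "https://hub.docker.com/v2"
# Assumption: leaked tags begin with change_<gerrit change number>.
TAG_RE = re.compile(r"^change_(\d+)")


def hub_login(username, password):
    # Docker Hub issues a JWT used for authenticated API calls.
    resp = requests.post(f"{HUB_URL}/users/login/",
                         json={"username": username, "password": password})
    resp.raise_for_status()
    return resp.json()["token"]


def list_tags(repo):
    # Walk the paginated tag listing for a repository like "zuul/zuul".
    url = f"{HUB_URL}/repositories/{repo}/tags/?page_size=100"
    while url:
        data = requests.get(url).json()
        for tag in data.get("results", []):
            yield tag["name"]
        url = data.get("next")


def change_is_closed(change_number):
    # Gerrit prefixes JSON responses with )]}' to defeat XSSI; strip it.
    resp = requests.get(f"{GERRIT_URL}/changes/{change_number}")
    if resp.status_code == 404:
        return True  # change not found; treat the tag as stale
    resp.raise_for_status()
    change = json.loads(resp.text.split("\n", 1)[1])
    return change["status"] in ("MERGED", "ABANDONED")


def delete_tag(repo, tag, token):
    # Assumed Hub endpoint for removing a single tag from a repository.
    resp = requests.delete(f"{HUB_URL}/repositories/{repo}/tags/{tag}/",
                           headers={"Authorization": f"JWT {token}"})
    resp.raise_for_status()


def main():
    username, password, repos = sys.argv[1], sys.argv[2], sys.argv[3:]
    token = hub_login(username, password)
    for repo in repos:
        for tag in list_tags(repo):
            match = TAG_RE.match(tag)
            if match and change_is_closed(match.group(1)):
                print(f"deleting {repo}:{tag}")
                delete_tag(repo, tag, token)


if __name__ == "__main__":
    main()
```

Run periodically (e.g. from cron on bridge) against the list of image repositories, it would only remove tags whose corresponding change has merged or been abandoned.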
19:44:29 <tonyb> With the SDK there is a "meta" question about tags: we've stopped pushing 3.x as tags; should we restart so that consumers can just use whatever we "suggest"?
19:45:00 <tonyb> those tags are pretty old (buster) which isn't great
19:45:05 <clarkb> tonyb: I think with the buster -> bullseye transition we decided that was not explicit enough
19:45:22 <clarkb> end users were expected to switch to the specific OS version tags, but I'm not surprised some were missed
19:47:20 <tonyb> Okay as long as it's been considered
19:47:27 <clarkb> we didn't remove the old tags to give people the ability to transition but maybe we should consider cleaning them up eventually
19:48:16 <tonyb> Do we have any way to see how many pulls a tag is getting?
19:48:24 <clarkb> I don't know if docker exposes that to us
19:48:46 <tonyb> I wondered if it was something the org owner could see
19:48:47 <clarkb> (quay does)
19:49:22 <tonyb> I did a grep/codesearch but that only helps for opendev
19:49:33 <clarkb> https://github.com/docker/hub-feedback/issues/1047
19:49:47 <clarkb> seems like we could fetch the total pulls at intervals and calculate the delta ourselves
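A rough sketch of that idea, assuming Docker Hub's repository endpoint exposes an aggregate pull_count field (it does not appear to break pulls down per tag, so this only yields a per-repository delta); the repository names are just examples:

```python
#!/usr/bin/env python3
"""Sketch: record Docker Hub pull counts and report the delta per run."""
import json
import pathlib
import time

import requests

STATE_FILE = pathlib.Path("pull_counts.json")
REPOS = ["opendevorg/python-base", "opendevorg/python-builder"]  # examples


def fetch_pull_count(repo):
    # Assumption: the repository detail endpoint includes "pull_count".
    resp = requests.get(f"https://hub.docker.com/v2/repositories/{repo}/")
    resp.raise_for_status()
    return resp.json()["pull_count"]


def main():
    previous = json.loads(STATE_FILE.read_text()) if STATE_FILE.exists() else {}
    current = {"timestamp": time.time(), "counts": {}}
    for repo in REPOS:
        count = fetch_pull_count(repo)
        current["counts"][repo] = count
        old = previous.get("counts", {}).get(repo)
        if old is not None:
            print(f"{repo}: {count - old} pulls since last run")
        else:
            print(f"{repo}: {count} pulls total (no previous sample)")
    STATE_FILE.write_text(json.dumps(current))


if __name__ == "__main__":
    main()
```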
19:50:08 <clarkb> corvus: I also wanted to mention that zuul/nodepool etc can probably look at bookworm now as the base images are present
19:50:23 <clarkb> I think that will allow zuul to clean up at least one backported package install
19:50:31 <clarkb> (bwrap?)
19:50:48 <corvus> ah cool thx
19:51:39 <tonyb> corvus: FWIW I have zuul containers on my list to tackle
19:51:44 <clarkb> tonyb: what we could do if we want to be super careful is tag :3.9 as :3.9-deprecated and delete :3.9
19:51:55 <clarkb> then if anyone screams they can switch to the new tag and know that they should update to something else soon
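For reference, a minimal sketch of that retag step driven through the docker CLI; the image list and the :3.9-deprecated name are illustrative, and actually removing the old :3.9 tag from Docker Hub would still be a separate step via the Hub API or web UI:

```python
#!/usr/bin/env python3
"""Sketch: republish an old tag under a -deprecated name before removal."""
import subprocess

# Illustrative: adjust to the images that still publish a bare :3.9 tag.
IMAGES = ["opendevorg/python-base", "opendevorg/python-builder"]
OLD_TAG = "3.9"
NEW_TAG = "3.9-deprecated"


def run(*cmd):
    # Echo then execute each docker command, failing fast on errors.
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)


for image in IMAGES:
    run("docker", "pull", f"{image}:{OLD_TAG}")
    run("docker", "tag", f"{image}:{OLD_TAG}", f"{image}:{NEW_TAG}")
    run("docker", "push", f"{image}:{NEW_TAG}")
    # "docker rmi" only cleans up the local copy; deleting the :3.9 tag
    # from Docker Hub itself needs the Hub API or web UI.
    run("docker", "rmi", f"{image}:{OLD_TAG}")
```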
19:52:09 <tonyb> clarkb: that'd be cool
19:52:42 <tonyb> My first round of changes will just be to s/bullseye/bookworm/
19:52:50 <clarkb> tonyb: let's revisit that once we're happily on bookworm and we can go back and clean things up. Also need to look at removing 3.9 builds too
19:52:55 <clarkb> ++
19:53:01 <clarkb> definitely an iterative process here
19:53:08 <tonyb> and then do any python version bumps after that
19:53:28 <tonyb> and I was kinda thinking of doing 3.9 to 3.10 and then 3.10 to 3.11
19:53:46 <tonyb> depending on my perception of risk / downtime
19:54:08 <clarkb> The main drawback to 3.11 is ease of testing, but now that bookworm itself is 3.11 that is less of a concern
19:54:28 <clarkb> (previously you had to install extra packages on ubuntu and I think fedora/centos/rhel were all 3.10 as the newest?)
19:54:34 <clarkb> definitely less of an issue today
19:54:56 <clarkb> #topic Open Discussion
19:55:09 <clarkb> We have about 5 minutes left and I wanted to make sure we didn't miss anything else that may be important
19:55:29 <clarkb> Anything else?
19:55:42 <tonyb> nope.
19:55:55 <fungi> i got nothin'
19:56:31 <tonyb> I can use the time to make coffee
19:56:39 <clarkb> do that!
19:56:43 <ianw> sounds like clarkb gets an early mark to get the boat ready :)
19:56:45 <clarkb> thank you for your time everyone!
19:56:47 <tonyb> and not be late for my next meeting
19:56:52 <clarkb> We'll be back next week
19:56:58 <tonyb> have fun clarkb
19:57:00 <clarkb> #endmeeting