19:01:06 #startmeeting infra
19:01:06 Meeting started Tue Jul 18 19:01:06 2023 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:06 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:06 The meeting name has been set to 'infra'
19:01:23 #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/JVMGLDPDLQW5L3FFIKWILIJU5DJS77ES/ Our Agenda
19:01:30 #topic Announcements
19:02:25 A minor announcement. I'm not actually here today. The only reason that this works out is the super early (relative to local time) hour of the meeting. But the lowest tide of our trip occurs in ~5 hours so we're taking advantage of that for "tide pooling"
19:02:45 and it's a whole production to get the boat out before it gets stuck in the mud
19:02:56 i can only imagine
19:03:03 Sounds like fun
19:03:42 Ya I think everyone is looking forward to it. But I definitely won't be around after the meeting today
19:03:47 #topic Bastion Host Updates
19:03:55 #link https://review.opendev.org/q/topic:bridge-backups
19:04:21 I think this one still deserves multiple core/root reviewers if we can manage it
19:04:28 fungi: frickler: fyi if you have time
19:05:17 oh yep
19:05:37 #topic Mailman 3
19:06:11 The 429 spam seems to have gone away as quickly as it started. I don't think we made changes for that yet so the other end must've gotten bored
19:06:15 no appreciable progress. life is starting to get out of the way and i'm working on catching back up to where i left off (new held node, et cetera)
19:06:48 i'm wondering if documenting manual steps for adding a new domain is simpler than trying to orchestrate django for now
19:07:07 fungi: considering the number of domains I think that is workable
19:07:11 we are at ~6 today?
19:07:21 given for the current ones we have manual import steps to perform anyway
19:07:25 yeah
19:08:01 that works for me. We have manual steps elsewhere too
19:08:02 i'll shift my focus to working out those steps through the webui in that case
19:08:34 sounds good. Anything else mailman related?
19:08:35 the existing wip changes are still good for either approach
19:08:41 nothing from me
19:09:03 #topic Gerrit Updates
19:09:16 There are a few Gerrit items I've merged into one block here
19:09:25 #link https://review.opendev.org/c/opendev/system-config/+/885317 Build gerrit 3.7.4 and 3.8.1 images
19:09:35 The first is Gerrit did a whole bunch of releases over the weekend
19:09:48 3.7.4 and 3.8.1 are both new and that change updates our image builds to match
19:10:14 We run 3.7.3 in prod so 3.7.4 will be our prod update and 3.8.1 will be used for 3.8 testing and 3.7 -> 3.8 upgrade testing
19:10:35 I made a note about a recorded breaking change that I'm pretty sure doesn't affect us
19:11:04 Note we need to manually replace the container for gerrit after that lands. It won't be automatic
19:11:22 Next is the leaking replication task files
19:11:35 #link https://review.opendev.org/c/opendev/system-config/+/884779 Revert Gerrit replication task bind mount
19:11:45 is one option and one that we might want to combine with the 3.7.4 container replacement
19:12:06 since that will give Gerrit a fresh ephemeral directory for those files then we can manually clean up the old bind mount location
19:12:18 the alternative is my somewhat hacky changes to add a startup script that scans all the json files and prunes them
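As a rough illustration of the startup-script approach just mentioned, a minimal sketch of age-based pruning follows; the task directory path and the one-week cutoff are assumptions for illustration, not the actual proposed change:

```python
#!/usr/bin/env python3
"""Prune old Gerrit replication task files at startup.

A minimal sketch; the directory path and the age-based criterion
are assumptions, not the change actually under review.
"""
import json
import pathlib
import time

# Assumed location of the replication plugin's persisted task files.
TASK_DIR = pathlib.Path("/var/gerrit/data/replication/ref-updates")
MAX_AGE = 7 * 24 * 3600  # prune tasks older than a week (arbitrary cutoff)

now = time.time()
for task_file in TASK_DIR.glob("**/*.json"):
    try:
        json.loads(task_file.read_text())  # only touch files that parse as tasks
    except ValueError:
        continue
    if now - task_file.stat().st_mtime > MAX_AGE:
        task_file.unlink()
```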
19:12:49 Unfortunately no updates to my gerrit issues filed for this, and they changed bug trackers so I'm not even sure my old links will work
19:13:12 Finally the rejection of implicit merges
19:13:14 #link https://review.opendev.org/c/opendev/system-config/+/885318 Merge this to reflect change to All-Projects once made
19:13:27 fungi: Not sure if you were still planning to push that update to All-Projects
19:13:58 oh, yes i can do that
19:14:13 related to gerrit, zuul (as of... yesterday?) has support for the kafka event plugin too, wonder if we should consider working toward using that or stick with ssh event stream (we'd presumably still need to support the latter for existing third-party ci systems anyway, but looks like there are some resiliency benefits if we switch our zuul's connection to kafka)
19:14:37 the main issue with kafka is going to be running it
19:14:49 yep
19:14:58 the gerrit folks have been frowning at ssh for a while.... but i don't think they have plans to remove it
19:15:01 it is a fairly large and complicated system aiui (they even deleted zookeeper and now do that all internally)
19:15:07 that's why it's a bit of an open question
19:15:13 when developing the zuul stuff, i used the bitnami all-in-one container
19:15:40 i didn't look into it much, but it might be easy enough if we want a simple system...
19:15:43 not something we need to decide any time soon, mainly just curious
19:15:50 but if we want multi-host, yeah, probably more work
19:15:51 an all in one container won't give us much extra resiliency when compared to ssh though. Except that we could potentially restart kafka less often than gerrit
19:16:00 clarkb: exactly
19:16:10 also, did they delete all the zk stuff? or just augment it with more complexity?
19:16:20 i still saw a lot of "set up zk" instructions...
19:16:36 (which i didn't follow on account of using the bitnami aio, so i don't really know)
19:16:47 as we all know, the solution to complexity is to layer on more complexity ;)
19:16:54 corvus: my understanding is that kafka deleted the zk dependency, or is working toward that, in order to do simpler/cheaper/quicker elections internally
19:17:19 as long as we only have a single gerrit, aio kafka sounds fine
19:18:08 Anything else Gerrit related?
19:18:28 Just quickly
19:18:39 and that reminds me, fyi, the reason gerrit supports kafka is mostly to support multi-master stuff... so that's a potential stepping stone ...
19:18:56 Should I base the python updates on your 3.7.4 review for ordering?
19:19:49 good point about the path to multi-master gerrit. i mainly saw kafka as a way to avoid losing gerrit events if zuul gets disconnected
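For context on what a kafka connection would replace: zuul currently consumes Gerrit's SSH event stream, roughly like the sketch below. `gerrit stream-events` is standard Gerrit; the host and account are placeholders. The resiliency gap is that events emitted while the SSH connection is down are simply lost, whereas kafka keeps a durable, replayable log.

```python
#!/usr/bin/env python3
"""Tail Gerrit's SSH event stream (the mechanism kafka would replace).

The `gerrit stream-events` command is standard Gerrit; the host and
user below are placeholders, not OpenDev's actual configuration.
"""
import json
import subprocess

cmd = ["ssh", "-p", "29418", "zuul@review.example.org", "gerrit", "stream-events"]
with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as proc:
    for line in proc.stdout:
        event = json.loads(line)
        # e.g. patchset-created, change-merged, comment-added, ...
        print(event.get("type"))
```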
19:19:53 tonyb: Yes, I think we should try to update Gerrit first since they tend to have good bugfixes, and bullseye is still supported for a bit, making bookworm less urgent but still an important update
19:20:06 ++
19:21:17 #topic Server Upgrades
19:21:27 The 12 zuul executors are all running Jammy now
19:21:47 i reckon i'll delete the old ones today
19:21:56 cool, was just going to ask about that
19:22:10 I need to look at cleaning up the old ci registry too. Probably a tomorrow task at this point
19:22:29 Other than that I didn't have any news here. Anyone else have updates?
19:22:34 * tonyb will watch how corvus does it and then copy it for the ci-registry
19:22:49 oh i think all the changes are done now
19:23:02 only thing left is manually deleting them using openstack cli
19:23:06 although I expect the actual server destruction will be done by y'all
19:23:17 if anyone is picking up mirrors, i do think it's worth going back to re-evaluate kafs with them
19:23:17 ya server destruction is a bit manual
19:23:20 Oh okay
19:23:32 ianw: ya mirrors and meetpad are next up on the todo list
19:23:36 ianw: great reminder about kafs, thanks
19:23:55 ianw: any reason in particular, or just let's check in since it's been a while?
19:23:56 ianw: I can keep you in the loop on that
19:24:01 ianw: for kafs you were thinking we could just deploy a node and then use dns to flip back and forth as necessary?
19:24:35 corvus: it's come a long way; i've started using it locally and it's working fine
19:24:43 jammy upgrades means newer kernel means newer kafs
19:25:11 yeah, i have some changes up to implement it with a flag; we could put up a trial host and do some load testing with a dns switch
19:25:31 what i'm not 100% on is the caching layers
19:25:39 kk
19:25:45 and that's the type of thing we'd probably need some real loads to tune
19:26:31 but ultimately the reason is that if we can get away from external builds of openafs, that would be nice
19:26:51 though with jammy the openafs version there seems to be working too (at least for now)
19:27:01 but agreed, adds more flexibility across platforms and updates etc
19:27:16 yeah, it's never a problem till it is :)
19:27:56 i'm willing to bring up a node, etc., but will require more than just my eyes
19:28:36 maybe put it in one of the rax regions since that is sizeable enough for data collection, and confidence there should carry over elsewhere
19:29:03 openafs dkms builds are back to broken in debian/unstable (seems to be related to linux 6.1 or maybe newer compiler/klibc), so i'm tempted to give kafs a whirl there
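A sketch of the kind of load comparison discussed above, hitting a trial kafs-backed mirror and a production openafs-backed one over HTTP; the hostnames and sample path are hypothetical, and a real evaluation would replay recorded CI traffic rather than fetch one object:

```python
#!/usr/bin/env python3
"""Rough latency comparison for a trial kafs mirror vs production.

Hostnames and the sample path are hypothetical placeholders.
"""
import statistics
import time
import urllib.request

HOSTS = {
    "openafs (prod)": "https://mirror01.example.opendev.org",
    "kafs (trial)": "https://mirror02.example.opendev.org",
}
SAMPLE = "/ubuntu/dists/jammy/Release"  # hypothetical sample object

for label, base in HOSTS.items():
    timings = []
    for _ in range(20):
        start = time.monotonic()
        with urllib.request.urlopen(base + SAMPLE) as resp:
            resp.read()
        timings.append(time.monotonic() - start)
    print(f"{label}: median {statistics.median(timings):.3f}s")
```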
19:30:10 #topic Fedora cleanup
19:30:30 I haven't had time to look at the mirroring stuff since we last met. tonyb do you have anything to add?
19:30:53 No progress from me. I need to write up how I think the mirroring setup should work for review
19:31:16 and feel free to ping me with questions and point me to the write up when ready
19:31:26 #topic Storyboard
19:31:42 I haven't seen anything new here either but figured I would check
19:31:50 nope
19:32:00 #topic Gitea 1.20 Upgrade
19:32:07 We did the 1.19.4 upgrade of Gitea last week
19:32:13 Was straightforward as expected
19:32:22 1.20 is a bit more involved unfortunately
19:32:33 #link https://review.opendev.org/c/opendev/system-config/+/886993 Gitea 1.20 change
19:32:53 I finally got our test suite to pass, but there are a number of todos I've noted in the commit message about stuff we should check
19:33:29 The main frustrations I've hit so far: 1) oauth2 is a disabled feature but we still need to configure all of its jwt stuff to avoid startup errors, which means more config and state on disk that we don't use but is required
19:34:13 and 2) they have changed their WORK_DIR/WORK_PATH expectations for the second time and we need to go through that and ensure we aren't orphaning data in our containers' ephemeral disk areas and instead have all that covered by bind mounts
19:34:43 for 2) the idea I had was we could hold a node and compare the resulting bind mounts and gitea dir locations with our prod stuff to make sure they roughly align, and if they do we should be good
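A sketch of that comparison: dump the container's mounts on both the held node and prod, then diff the output. `docker inspect` and its Mounts data are standard docker; the container name is a placeholder:

```python
#!/usr/bin/env python3
"""Compare a container's bind mounts between a held test node and prod.

Run on each host and diff the results. The container name is a
placeholder; the actual prod layout is not assumed here.
"""
import json
import subprocess

CONTAINER = "gitea"  # placeholder name

raw = subprocess.check_output(["docker", "inspect", CONTAINER], text=True)
mounts = json.loads(raw)[0]["Mounts"]
for mount in sorted(mounts, key=lambda m: m["Destination"]):
    # Bind mounts survive container replacement; anything gitea writes
    # outside these destinations lives on ephemeral container disk.
    print(f"{mount['Type']:6} {mount.get('Source', '-')} -> {mount['Destination']}")
```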
19:35:22 (this meeting moved from lunch to breakfast for clarkb)
19:35:25 From a feature perspective this release doesn't seem to add anything flashy, which is probably good as we don't have to wrangle features on top of this
19:36:05 anyway I think reviews may be helpful at this point, looking over the change log from gitea and ensuring we haven't missed anything important. And I'll try to work through those TODOs as I'm able and update the change
19:36:47 and feel free to add more todos if you find items that need to be addressed
19:37:34 #topic Etherpad 1.9.1
19:37:45 #link https://review.opendev.org/c/opendev/system-config/+/887006 Etherpad 1.9.1
19:38:12 Better news here. I think I sorted out that the username and user color problems are due to a change in handling of falsey boolean config entries to null entries in config
19:38:27 I made that update to our settings.json on the old held node and it seemed to fix it
19:38:37 awesome
19:38:47 I should have a new held node somewhere built from the code update based on that manual update
19:38:54 so we need to retest and check that it actually helps
19:38:59 also numbered lists seem to work for us
19:39:37 and they appear to have updated the git tag so we don't need to use a random git sha
19:39:52 I'm hopeful that after round two of checking we'll be in a good spot to land the update
19:40:31 #topic Python Container Updates
19:40:59 Typically we talk about this in the context of updating python versions, but due to the recent Debian bookworm release we're doing base OS container updates instead
19:41:14 #link https://review.opendev.org/q/topic:bookworm-python
19:41:21 #link https://review.opendev.org/q/topic:force-base-image-build
19:41:39 tonyb had two specific questions listed on the agenda.
19:41:52 The first is a question of updating openstacksdk's old dockerfile, and I think we should
19:42:04 we can't merge that change but can propose it to them and hopefully they approve it
19:42:29 They're okay to do whatever we suggest and the change is up for review
19:42:37 #link https://review.opendev.org/c/openstack/python-openstackclient/+/888744
19:42:46 and secondly, should we manually clean up the leaked zuul change_* tags in docker hub
19:43:35 ah cool. then ya I think they should update their base image. It should be pretty safe since this is just a client tool we run on all the operating systems with minimal OS integration
19:43:57 For leaked zuul change_* tags I wonder if we should write a script to clean those up and have it run against all our images
19:44:20 the script could check gerrit's api to see which changes are no longer open and then delete those tags
19:44:29 With the SDK there is a "meta" question about tags: we've stopped pushing 3.x as tags; should we restart so that consumers can just use whatever we "suggest"?
19:45:00 those tags are pretty old (buster) which isn't great
19:45:05 tonyb: I think with the buster -> bullseye transition we decided that was not explicit enough
19:45:22 end users were expected to switch to the specific OS version tags, but I'm not surprised some were missed
19:47:20 Okay, as long as it's been considered
19:47:27 we didn't remove the old tags to give people the ability to transition, but maybe we should consider cleaning them up eventually
19:48:16 Do we have any way to see how many pulls a tag is getting?
19:48:24 I don't know if docker exposes that to us
19:48:46 I wondered if it was something the org owner could see
19:48:47 (quay does)
19:49:22 I did a grep/codesearch but that only helps for opendev
19:49:33 https://github.com/docker/hub-feedback/issues/1047
19:49:47 seems like we could fetch the total pulls at intervals and calculate the delta ourselves
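A sketch combining both ideas above against Docker Hub's v2 API: sample the repository's pull_count for later delta calculations, and delete change_* tags whose Gerrit change is no longer open. The Hub endpoints reflect the commonly used public API but should be verified; the change_<number> tag format is an assumption, the token flow is only noted in a comment, and the repository name is an example:

```python
#!/usr/bin/env python3
"""Sample pull counts and prune leaked change_* tags on Docker Hub.

A sketch; Hub endpoints, tag format, and repository name are
assumptions to verify, not a tested implementation.
"""
import json
import time
import urllib.request

HUB = "https://hub.docker.com/v2"
GERRIT = "https://review.opendev.org"
REPO = "zuul/zuul-executor"  # example repository

def get_json(url, token=None):
    req = urllib.request.Request(url)
    if token:
        req.add_header("Authorization", f"JWT {token}")
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# 1) Record total pulls; sampling this periodically yields per-interval deltas.
repo_info = get_json(f"{HUB}/repositories/{REPO}/")
print(time.time(), repo_info["pull_count"])

# 2) Delete change_* tags whose Gerrit change is closed.
def change_is_open(number):
    with urllib.request.urlopen(f"{GERRIT}/changes/{number}") as resp:
        body = resp.read().decode().removeprefix(")]}'")  # strip XSSI prefix
    return json.loads(body)["status"] == "NEW"

token = "..."  # obtained via the Hub login endpoint (assumed auth flow)
tags = get_json(f"{HUB}/repositories/{REPO}/tags/?page_size=100")["results"]
for tag in tags:
    name = tag["name"]
    # Assumes leaked tags look like change_<gerrit change number>.
    if name.startswith("change_") and not change_is_open(name.split("_", 1)[1]):
        req = urllib.request.Request(
            f"{HUB}/repositories/{REPO}/tags/{name}/", method="DELETE")
        req.add_header("Authorization", f"JWT {token}")
        urllib.request.urlopen(req)
```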
19:50:08 corvus: I also wanted to mention that zuul/nodepool etc can probably look at bookworm now, as the base images are present
19:50:23 I think that will allow zuul to clean up at least one backported package install
19:50:31 (bwrap?)
19:50:48 ah cool thx
19:51:39 corvus: FWIW I have zuul containers on my list to tackle
19:51:44 tonyb: what we could do if we want to be super careful is tag :3.9 as :3.9-deprecated and delete :3.9
19:51:55 then if anyone screams they can switch to the new tag and know that they should update to something else soon
19:52:09 clarkb: that'd be cool
19:52:42 My first round of changes will just be to s/bullseye/bookworm/
19:52:50 tonyb: let's revisit that once we're happily on bookworm and we can go back and clean things up. Also need to look at removing 3.9 builds too
19:52:55 ++
19:53:01 definitely an iterative process here
19:53:08 and then do any python version bumps after that
19:53:28 and I was kinda thinking of doing 3.9 to 3.10 and then 3.10 to 3.11
19:53:46 depending on my perception of risk / downtime
19:54:08 The main drawback to 3.11 is ease of testing, but now that bookworm itself is 3.11 that is less of a concern
19:54:28 (previously you had to install extra packages on ubuntu and I think fedora/centos/rhel were all 3.10 as the newest?)
19:54:34 definitely less of an issue today
19:54:56 #topic Open Discussion
19:55:09 We have about 5 minutes left and I wanted to make sure we didn't miss anything else that may be important
19:55:29 Anything else?
19:55:42 nope.
19:55:55 i got nothin'
19:56:31 I can use the time to make coffee
19:56:39 do that!
19:56:43 sounds like clarkb gets an early mark to get the boat ready :)
19:56:45 thank you for your time everyone!
19:56:47 and not be late for my next meeting
19:56:52 We'll be back next week
19:56:58 have fun clarkb
19:57:00 #endmeeting