19:00:52 #startmeeting infra
19:00:52 Meeting started Tue Nov 21 19:00:52 2023 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:00:52 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:00:52 The meeting name has been set to 'infra'
19:00:53 indeed
19:00:59 #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/WBYBD2663WL2IJD7NLDHBQ5ANRNRSMX3/ Our Agenda
19:01:02 #topic Announcements
19:01:34 It is Thanksgiving week in the US. I saw the TC meeting was cancelled today as a result. I will be less and less around as the week progresses. Have to start on food prep tomorrow
19:01:47 basically heads up that it may get quiet but I'll probably check my matrix connection at times
19:02:16 #topic Server Upgrades
19:02:41 tonyb has made progress on this and replaced the ord mirror. The new jammy mirror is in use
19:02:55 \o/
19:03:02 #link https://review.opendev.org/c/opendev/system-config/+/901504 Helper tool for mirror node volume management
19:03:16 awesome job
19:03:18 I created mirror02.bhs1 today, and tested ^^^
19:03:23 one thing that came out of that is the mirror nodes have volumes that are set up differently than all our other hosts so the existing tools can't be used
19:03:46 to avoid manual effort which results in errors and deltas, tonyb volunteered to write a tool to simplify things.
19:03:49 I need to rereview it
19:04:23 tonyb: other than reviewing changes and answering questions you have is there anything the rest of us can be doing to help?
19:04:42 Nope I'm working through things.
19:04:58 if anything comes up I'll yell
19:05:08 sounds good and thank you for the help!
19:05:23 #topic Python Container Updates
19:05:51 No update on getting zuul-operator off of old debian. But uwsgi builds against python3.12 now so we can add python3.12 images if we want
19:06:01 #link https://review.opendev.org/c/opendev/system-config/+/898756 And parent add python3.12 images
19:06:19 I don't expect we'll be making use of those quickly, but I do like getting the base images ready so that we aren't preventing anyone from testing with them
19:06:41 ++
19:06:57 They should be straightforward reviews. The parent is a bookkeeping noop and the child only adds new images that you have to explicitly opt into using
19:08:39 #topic Gitea 1.21.0
19:09:02 I worked through the changelog and have the gitea test job running with screenshots that look correct now
19:09:26 However, it seems there is rough consensus that we'd like to rotate our ssh keys out in gitea before we upgrade to avoid needing to disable ssh key length checking
19:09:32 #link https://review.opendev.org/c/opendev/system-config/+/901082 Support gitea key rotation
19:09:57 This change should allow us to do that entirely through configuration management. (the existing config management doesn't quite do what we need for rotating keys)
19:10:20 As written it should noop. Then we can create a new key, add it to gitea, then also update gerrit config management to deploy the key there and select it
19:10:36 the gerrit side is not yet implemented as I was hoping for feedback on 901082 first
19:11:02 Oh and I think we should use an ed25519 key because they have a single length which hopefully avoids gitea changing minimum lengths in the future on us
19:11:40 Sounds good to me.
19:12:06 i'm fine with it
19:12:09 If you are interested in seeing what changes with gitea other than the ssh key stuff the change is ready for review
19:12:14 #link https://review.opendev.org/c/opendev/system-config/+/897679 Upgrade to 1.21.0
19:12:49 There are other things that change but none of them in a very impactful way
19:13:55 #topic Gerrit 3.8 Upgrade
19:13:58 This is done!
19:14:03 \o/
19:14:03 It went really well as far as I can tell
19:14:21 The one issue we've seen is that html/js resources seem to be cached on the old version affecting the web ui file editor
19:14:32 If you hard refresh or delete your caches this seems to fix it
19:14:55 I've gone ahead and started on the gerrit container image cleanup for 3.7 and updates for 3.9
19:14:57 #link https://review.opendev.org/c/opendev/system-config/+/901469 Updates our gerrit image builds post upgrade
19:15:21 I figure we can merge those first thing next week if we still don't have a reason to rollback to 3.7
19:15:33 Is it worth sending an email to service-announce (and pointing other projects at it) explaining the html/js issue?
19:15:55 tonyb: ya I can do that as a response to the upgrade announcement
19:16:18 Okay, I wasn't volunteering you ;P
19:16:28 tonyb: no it's fine, then I don't have to moderate it through the list :)
19:16:34 :)
19:16:46 i get the impression not many people use the built-in change editor, and some of them will end up never seeing the problem because of their browser pulling the new version before they do
19:16:47 I also sent email to upstream about it and at least one person indicated they had seen the issue before as well but weren't sure of any changes that avoid it
19:18:10 In related good news the gerrit 3.9 upgrade looks similar to the 3.8 upgrade. Minimal downtime to run init and then online reindexing
19:18:25 I haven't gone through the change list though so there may be annoying things we have to deal with pre upgrade
19:19:03 Anyway if you agree about removing 3.7 early next week maybe review the change and indicate that in review or something
19:19:21 #topic Upgrading Zuul's MySQL DB Server
19:19:45 In addition to upgrading gerrit last friday we also did a big zuul db migration to accommodate buildsets with multiple refs
19:20:04 in that migration we discovered that the older mysql there didn't support modern sql syntax for renaming foreign key constraints
19:20:28 This has since been fixed in the zuul migration, but to avoid similar problems in the future it is probably a good idea for us to look into running a more modern mysql/maria db for zuul
19:21:27 I don't think we're going to create a plan for that here in this meeting but wanted to bring it up so that we can call out concerns or items to think about. I have 2. The first is where do we run it? Should it be on a dedicated host or just on say zuul01? I think disk size and memory needs will determine that. And are we currently backing up the db? If not should we before we
19:21:29 move it?
19:22:47 I suspect that the size of the database may make it somewhat impactful to run it alongside of the existing schedulers and we'll need a new host dedicated to the database instead. That's fine but a small departure from how we run mariadb next to our other services
19:22:51 i don't see a mysqldump in root's crontab on either of the schedulers, for reference
19:24:26 It'd be a departure from how we typically run the DB, but consistent with how we're running it for zuul today right?
19:24:34 tonyb: correct.
19:24:52 tonyb: basically all of the self hosted non trove dbs currently are run out of the same docker compose for $service on the same host
19:25:05 but that is because all of those dbs are small enough or servers are large enough that the impact is minimal
19:25:09 I suspect that won't be the case here
19:25:17 Yup that makes sense
19:25:32 well, first off, we're running zuul with no spof other than haproxy and that trove instance at the moment. would we want a db cluster?
19:25:36 maybe the thing to do is collect info in an etherpad (current db version, current db size needs for disk and memory, backups and backup sizes if any) and then use that to build a plan off of
19:26:13 so I'm not sure how zuul would handle that
19:26:17 for example is it galera safe?
19:26:46 all questions we ought to ask
19:27:08 unlike zookeeper which automatically fails over and handles clustering out of the box, with db servers it's a lot more hands on and has impacts on the sorts of queries and inserts you can do for example
19:27:16 in the short term though, should we schedule some downtime to reboot the current trove instance onto a newer version (if available)?
19:28:10 I think it depends on how much newer we can get? If it is still fairly ancient then probably not worthwhile but if it is modern then it may be worth doing
19:28:39 but ya this is the sort of info gathering we need before we can make any reasonable decisions
19:29:00 Yup.
19:29:04 https://etherpad.opendev.org/p/opendev-zuul-mysql-upgrade <-
19:29:08 let's collect questions and answers there
19:29:42 the "upgrade instance" option is greyed out in the rackspace webui for that db, just checked. not sure if that means 5.7 is the latest they have, or what
19:30:52 Well that's a start.
19:30:53 if i create a new instance they have mysql 8.0 or percona 8.0 or mariadb 10.4 as options
19:31:46 so anyway, in-place upgrading seems to be unavailable for it
19:31:58 no idea if those versions are also ~ancient
19:32:03 So we could stick with trove and dump|restore
19:32:28 10.4 is like old old stable but still supported for a bit iirc
19:32:42 it's what a lot of our stuff runs on and I haven't prioritized upgrades yet because it isn't EOL for another year or two iirc
19:32:56 I've got a list of questions in that etherpad now
19:32:56 10.4.32 was released last week
19:34:48 I think collect what we can on that etherpad then loop corvus in and make an informed decision
19:35:13 oh hi, today has been busy for me, sorry just catching up
19:35:35 corvus: I don't think it is urgent. Just trying to get a handle on what an updated non trove zuul db looks like
19:35:45 i think i'd add that we have generally been okay with losing the entire build db, thus the current decisions around deployment
19:35:56 and lack of backups etc
19:36:12 we could decide to change that, but that's definitely a first-order input into requirements :)
19:36:36 if we wanted to remove the spof, we could do what the zuul operator does and run percona xtradb
19:36:53 but none of us knows how to run it other than just using the pxc operator, so that's a k8s.
19:37:37 if we run our own mysql spof, then i think it should be on a separate host since we now treat the schedulers as disposable
19:39:12 those all sound like reasonable constraints
19:41:30 maybe worth doing a survey of db clustering solutions that are reasonably low effort
19:41:54 ++
19:42:18 i feel like this is not important enough for us to sink huge amounts of ops time into running a zero-downtime cluster and risk more downtime by not doing it well enough.
19:42:30 and that aren't kv stores, presumably. we need an actual rdbms right?
19:42:35 so if it's hard/risky, i would lean toward just run a tidy mariadb on a dedicated node.
19:42:44 corvus: ++
19:42:53 but if it's reasonably easy (like it is in k8s with pxc-operator), then maybe worth it
19:43:09 I think it may be a really interesting learning experience if people are into that but also based on people's struggles with openstack stuff it seems running db clusters isn't always straightforward
19:43:12 fungi: yes, mysql/mariadb or postgres specifically. no others.
19:43:38 #nopostgres
19:43:46 we should probably not exclude pgsql from our consideration, even though we're generally mysql biased so far.
19:44:00 ah, okay
19:44:25 fungi: was that a veto of postgres, or tongue in cheek?
19:44:45 it was an interpretation of your followup to my question about rdbms
19:44:55 but then you clarified
19:45:05 ok I've tried to collect what we've said so far in that etherpad
19:45:16 oh that was a deep cut. i get it. :)
19:45:18 * fungi has no preference, just remember the postgres wars in openstack
19:45:38 I think the next step(s) is/are to fill in the rest of the answers to those questions and get some basic info on clustering options
19:45:54 anyway, the ship has sailed on zuul supporting both. both are first-class citizens and will continue to be afaict, even though supporting two is O(n^2) effort.
19:45:56 I'm definitely not going to get to that this week :) I can already start to feel the pull of cooking happening
19:46:07 so either is fine, if, say, postgres clustering is easy.
19:46:12 wfm
19:46:48 yep. these are good notes to take and will help me remember this after recovering from pumpkin pie.
19:46:51 maybe resync at next week's meeting and find specific volunteers for remaining info gathering post holiday
19:47:18 ++
19:47:31 sounds good
19:47:39 #topic Open Discussion
19:48:25 I think if I could get one more thing done this week it would be to land the python3.12 image updates since that is low impact. But otherwise I'm happy to wait on the gitea ssh stuff and gerrit image cleanup/additions
19:48:54 I'm definitely going to start being around less regularly. Apparently we have to roast turkey stuff tomorrow because we're not cooking it whole and need gravy makings
19:49:08 but also before that happens the turkey needs to be "deconstructed"
19:49:13 such a kind way of saying "butchered"
19:49:41 openinfra foundation board individual member representative nominations are open until december 15
19:50:36 #link https://lists.openinfra.dev/archives/list/foundation@lists.openinfra.dev/thread/YJIQL444JMKFRSHUBYDWUQHBF7P7UDJF/ 2024 Open Infrastructure Foundation Individual Director nominations are open
19:51:12 One thing on my todo list for after thanksgiving is to start on Foundation Annual Report content for OpenDev (and Zuul)
19:51:26 I plan to stick that into etherpads like I've done before so that others can provide feedback easily
19:51:45 If there is something specific you're proud of or really want to see covered feel free to let me know
19:52:31 Last call for anything else? otherwise we can go eat $meal a bit early
19:53:09 nothing from me
19:53:28 Sounds like that is everything. Thank you everyone for your time and I hope you get to enjoy Thanksgiving if you are celebrating
19:53:33 #endmeeting