19:00:52 <clarkb> #startmeeting infra
19:00:52 <opendevmeet> Meeting started Tue Nov 21 19:00:52 2023 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:00:52 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:00:52 <opendevmeet> The meeting name has been set to 'infra'
19:00:53 <fungi> indeed
19:00:59 <clarkb> #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/WBYBD2663WL2IJD7NLDHBQ5ANRNRSMX3/ Our Agenda
19:01:02 <clarkb> #topic Announcements
19:01:34 <clarkb> It is Thanksgiving week in the US. I saw the TC meeting was cancelled today as a result. I will be less and less around as the week progresses. Have to start on food prep tomorrow
19:01:47 <clarkb> basically heads up that it may get quiet but I'll probably check my matrix connection at times
19:02:16 <clarkb> #topic Server Upgrades
19:02:41 <clarkb> tonyb has made progress on this and replaced the ord mirror. The new jammy mirror is in use
19:02:55 <tonyb> \o/
19:03:02 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/901504 Helper tool for mirror node volume management
19:03:16 <fungi> awesome job
19:03:18 <tonyb> I created mirror02.bhs1 today, and tested ^^^
19:03:23 <clarkb> one thing that came out of that is that the mirror nodes have volumes set up differently than all our other hosts, so the existing tools can't be used
19:03:46 <clarkb> to avoid manual effort, which results in errors and deltas, tonyb volunteered to write a tool to simplify things.
19:03:49 <clarkb> I need to rereview it
19:04:23 <clarkb> tonyb: other than reviewing changes and answering questions you have is there anything the rest of us can be doing to help?
19:04:42 <tonyb> Nope I'm working through things.
19:04:58 <tonyb> if anything comes up I'll yell
19:05:08 <clarkb> sounds good and thank you for the help!
19:05:23 <clarkb> #topic Python Container Updates
19:05:51 <clarkb> No update on getting zuul-operator off of old debian. But uwsgi builds against python3.12 now so we can add python3.12 images if we want
19:06:01 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/898756 And parent add python3.12 images
19:06:19 <clarkb> I don't expect we'll be making use of those quickly, but I do like getting the base images ready so that we aren't preventing anyone from testing with them
19:06:41 <tonyb> ++
19:06:57 <clarkb> They should be straightforward reviews. The parent is a bookkeeping noop and the child only adds new images that you have to explicitly opt into using
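As a rough sketch of how a consumer would opt in once the 3.12 images exist, assuming the new tags follow the existing 3.11-bookworm naming pattern in the opendevorg namespace:

```
# Hypothetical smoke test of the new base image; the 3.12-bookworm tag is an
# assumption based on the existing 3.11-bookworm naming convention.
docker pull docker.io/opendevorg/python-base:3.12-bookworm
docker run --rm docker.io/opendevorg/python-base:3.12-bookworm python3 --version
```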
19:08:39 <clarkb> #topic Gitea 1.21.0
19:09:02 <clarkb> I worked through the changelog and have the gitea test job running with screenshots that look correct now
19:09:26 <clarkb> However, it seems there is rough consensus that we'd like to rotate our ssh keys out in gitea before we upgrade to avoid needing to disable ssh key length checking
19:09:32 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/901082 Support gitea key rotation
19:09:57 <clarkb> This change should allow us to do that entirely through configuration management. (the existing config management doesn't quite do what we need for rotating keys)
19:10:20 <clarkb> As written it should noop. Then we can create a new key, add it to gitea, then also update gerrit config management to deploy the key there and select it
19:10:36 <clarkb> the gerrit side is not yet implemented as I was hoping for feedback on 901082 first
19:11:02 <clarkb> Oh and I think we should use an ed25519 key because they have a single fixed length, which hopefully avoids gitea changing minimum lengths on us in the future
19:11:40 <tonyb> Sounds good to me.
19:12:06 <fungi> i'm fine with it
19:12:09 <clarkb> If you are interested in seeing what changes with gitea other than the ssh key stuff the change is ready for review
19:12:14 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/897679 Upgrade to 1.21.0
19:12:49 <clarkb> There are other things that change but none of them in a very impactful way
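A minimal sketch of generating the new key discussed above, assuming the standard ssh-keygen tool; the file name and comment are illustrative only, and the actual key would be deployed and selected through the config management changes referenced earlier in the topic:

```
# Hypothetical key generation for the gerrit-to-gitea replication key rotation.
# ed25519 keys have a single fixed length, so gitea's minimum key length checks
# for RSA keys stop being a concern. Names below are illustrative.
ssh-keygen -t ed25519 -N '' -C 'gerrit-to-gitea-replication' -f ./id_ed25519_gitea
```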
19:13:55 <clarkb> #topic Gerrit 3.8 Upgrade
19:13:58 <clarkb> This is done!
19:14:03 <tonyb> \o/
19:14:03 <clarkb> It went really well as far as I can tell
19:14:21 <clarkb> The one issue we've seen is that html/js resources from the old version seem to be cached, affecting the web ui file editor
19:14:32 <clarkb> If you hard refresh or delete your caches this seems to fix it
19:14:55 <clarkb> I've gone ahead and started on the gerrit container image cleanup for 3.7 and updates for 3.9
19:14:57 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/901469 Updates our gerrit image builds post upgrade
19:15:21 <clarkb> I figure we can merge those first thing next week if we still don't have a reason to rollback to 3.7
19:15:33 <tonyb> Is it worth sending an email to service-announce (and pointing other projects at it) explaining the html/js issue?
19:15:55 <clarkb> tonyb: ya I can do that as a response to the upgrade announcement
19:16:18 <tonyb> Okay, I wasn't volunteering you ;P
19:16:28 <clarkb> tonyb: no it's fine, then I don't have to moderate it through the list :)
19:16:34 <tonyb> :)
19:16:46 <fungi> i get the impression not many people use the built-in change editor, and some of them will end up never seeing the problem because of their browser pulling the new version before they do
19:16:47 <clarkb> I also sent email to upstream about it and at least one person indicated they had seen the issue before as well but weren't sure of any changes that avoid it
19:18:10 <clarkb> In related good news the gerrit 3.9 upgrade looks similar to the 3.8 upgrade. Minimal downtime to run init and then online reindexing
19:18:25 <clarkb> I haven't gone through the change list though so there may be annoying things we have to deal with pre upgrade
19:19:03 <clarkb> Anyway if you agree about removing 3.7 early next week maybe review the change and indicate that in review or something
19:19:21 <clarkb> #topic Upgrading Zuul's MySQL DB Server
19:19:45 <clarkb> In addition to upgrading gerrit last friday we also did a big zuul db migration to accommodate buildsets with multiple refs
19:20:04 <clarkb> in that migration we discovered that the older mysql there didn't support modern sql syntax for renaming foreign key constraints
19:20:28 <clarkb> This has since been fixed in the zuul migration, but to avoid similar problems in the future it is probably a good idea for us to look into running a more modern mysql/maria db for zuul
19:21:27 <clarkb> I don't think we're going to create a plan for that here in this meeting but wanted to bring it up so that we can call out concerns or items to think about. I have 2. The first is where do we run it? Should it be on a dedicated host or just on say zuul01? I think disk size and memory needs will determine that. And are we currently backing up the db? If not should we before we
19:21:29 <clarkb> move it?
19:22:47 <clarkb> I suspect that the size of the database may make it somewhat impactful to run it alongside the existing schedulers and we'll need a new host dedicated to the database instead. That's fine but a small departure from how we run mariadb next to our other services
19:22:51 <fungi> i don't see a mysqldump in root's crontab on either of the schedulers, for reference
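For context, if a backup were wanted before any move, a one-off dump along these lines would be the simplest option; the credentials file, database name, and output path are placeholders, and nothing like this is configured today:

```
# Hypothetical one-off dump of the zuul build database prior to a migration.
# Credentials file, database name, and output path are placeholders.
mysqldump --defaults-file=/root/.zuul-db.cnf \
          --single-transaction --routines --no-tablespaces \
          zuul | gzip > /var/backups/zuul-db-$(date +%F).sql.gz
```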
19:24:26 <tonyb> It'd be a departure from how we typically run the DB, but consistent with how we're running it for zuul today, right?
19:24:34 <clarkb> tonyb: correct.
19:24:52 <clarkb> tonyb: basically all of the self-hosted non-trove dbs are currently run out of the same docker compose for $service on the same host
19:25:05 <clarkb> but that is because all of those dbs are small enough or servers are large enough that the impact is minimal
19:25:09 <clarkb> I suspect that won't be the case here
19:25:17 <tonyb> Yup that makes sense
19:25:32 <fungi> well, first off, we're running zuul with no spof other than haproxy and that trove instance at the moment. would we want a db cluster?
19:25:36 <clarkb> maybe the thing to do is collect info in an etherpad (current db version, current db size needs for disk and memory, backups and backup sizes if any) and then use that to build a plan off of
19:26:13 <clarkb> so I'm not sure how zuul would handle that
19:26:17 <clarkb> for example is it galera safe?
19:26:46 <fungi> all questions we ought to ask
19:27:08 <clarkb> unlike zookeeper, which automatically fails over and handles clustering out of the box, with db servers it's a lot more hands on and has impacts on the sorts of queries and inserts you can do, for example
19:27:16 <fungi> in the short term though, should we schedule some downtime to reboot the current trove instance onto a newer version (if available)?
19:28:10 <clarkb> I think it depends on how much newer we can get? If it is still fairly ancient then probably not worthwhile but if it is modern then it may be worth doing
19:28:39 <clarkb> but ya this is the sort of info gathering we need before we can make any reasonable decisions
19:29:00 <tonyb> Yup.
19:29:04 <clarkb> https://etherpad.opendev.org/p/opendev-zuul-mysql-upgrade <-
19:29:08 <clarkb> lets collect questions and answers there
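A sketch of how the version and disk-size questions in the etherpad could be answered directly from the running server; the host and user variables are placeholders for the current trove instance:

```
# Hypothetical queries to fill in the etherpad's version and sizing questions.
# Connection details are placeholders.
mysql -h "$ZUUL_DB_HOST" -u "$ZUUL_DB_USER" -p -e 'SELECT VERSION();'
mysql -h "$ZUUL_DB_HOST" -u "$ZUUL_DB_USER" -p -e '
  SELECT table_schema,
         ROUND(SUM(data_length + index_length)/1024/1024/1024, 2) AS size_gb
    FROM information_schema.tables
   GROUP BY table_schema;'
```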
19:29:42 <fungi> the "upgrade instance" option is greyed out in the rackspace webui for that db, just checked. not sure if that means 5.7 is the latest they have, or what
19:30:52 <tonyb> Well that's a start.
19:30:53 <fungi> if i create a new instance they have mysql 8.0 or percona 8.0 or mariadb 10.4 as options
19:31:46 <fungi> so anyway, in-place upgrading seems to be unavailable for it
19:31:58 <fungi> no idea if those versions are also ~ancient
19:32:03 <tonyb> So we could stick with trove and dump|restore
19:32:28 <clarkb> 10.4 is like old old stable but still supported for a bit iirc
19:32:42 <clarkb> its what a lot of our stuff runs on and I haven't prioritized upgrades yet because it isn't EOL for another year or two iirc
19:32:56 <clarkb> I've got a list of questions in that etherpad now
19:32:56 <tonyb> 10.4.32 was released last week
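If the dump|restore route tonyb mentions were taken, the downtime is roughly the runtime of a pipeline like the following; the old and new connection details are placeholders, and zuul's schedulers and web servers would need to be repointed at the new host around that window:

```
# Hypothetical dump|restore from the current trove instance to a replacement.
# Both defaults files and the database name are placeholders.
mysqldump --defaults-file=/root/.zuul-db-old.cnf \
          --single-transaction --databases zuul \
  | mysql --defaults-file=/root/.zuul-db-new.cnf
```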
19:34:48 <clarkb> I think we collect what we can on that etherpad, then loop corvus in and make an informed decision
19:35:13 <corvus> oh hi, today has been busy for me, sorry just catching up
19:35:35 <clarkb> corvus: I don't think it is urgent. Just trying to get a handle on what an updated non trove zuul db looks like
19:35:45 <corvus> i think i'd add that we have generally been okay with losing the entire build db, thus the current decisions around deployment
19:35:56 <corvus> and lack of backups etc
19:36:12 <corvus> we could decide to change that, but that's definitely a first-order input into requirements :)
19:36:36 <corvus> if we wanted to remove the spof, we could do what the zuul operator does and run percona xtradb
19:36:53 <corvus> but none of us knows how to run it other than just using the pxc operator, so that's a k8s dependency.
19:37:37 <corvus> if we run our own mysql spof, then i think it should be on a separate host since we now treat the schedulers as disposable
19:39:12 <fungi> those all sound like reasonable constraints
19:41:30 <corvus> maybe worth doing a survey of db clustering solutions that are reasonably low effort
19:41:54 <clarkb> ++
19:42:18 <corvus> i feel like this is not important enough for us to sink huge amounts of ops time into running a zero-downtime cluster and risk more downtime by not doing it well enough.
19:42:30 <fungi> and that aren't kv stores, presumably. we need an actual rdbms right?
19:42:35 <corvus> so if it's hard/risky, i would lean toward just run a tidy mariadb on a dedicated node.
19:42:44 <clarkb> corvus: ++
19:42:53 <corvus> but if it's reasonably easy (like it is in k8s with pxc-operator), then maybe worth it
19:43:09 <clarkb> I think it may be a really interesting learning experience if people are into that, but also based on people's struggles with openstack stuff it seems running db clusters isn't always straightforward
19:43:12 <corvus> fungi: yes, mysql/mariadb or postgres specifically.  no others.
19:43:38 <fungi> #nopostgres
19:43:46 <corvus> we should probably not exclude pgsql from our consideration, even though we're generally mysql biased so far.
19:44:00 <fungi> a, okay
19:44:25 <corvus> fungi: was that a veto of postgres, or tongue in cheek?
19:44:45 <fungi> it was an interpretation of your followup to my question about rdbms
19:44:55 <fungi> but then you clarified
19:45:05 <clarkb> ok I've tried to collect what we've said so far in that etherpad
19:45:16 <corvus> oh that was a deep cut.  i get it.  :)
* fungi has no preference, just remembers the postgres wars in openstack
19:45:38 <clarkb> I think the next step(s) is/are to fill in the rest of the answers to those questions and get some basic info on clustering options
19:45:54 <corvus> anyway, the ship has sailed on zuul supporting both.  both are first-class citizens and will continue to be afaict, even though supporting two is O(n^2) effort.
19:45:56 <clarkb> I'm definitely not going to get to that this week :) I can already start to feel the pull of cooking happening
19:46:07 <corvus> so either is fine, if, say, postgres clustering is easy.
19:46:12 <fungi> wfm
19:46:48 <corvus> yep. these are good notes to take and will help me remember this after recovering from pumpkin pie.
19:46:51 <clarkb> maybe resync at next week's meeting and find specific volunteers for remaining info gathering post holiday
19:47:18 <corvus> ++
19:47:31 <fungi> sounds good
19:47:39 <clarkb> #topic Open Discussion
19:48:25 <clarkb> I think if I could get one more thing done this week it would be to land the python3.12 image updates since that is low impact. But otherwise I'm happy to wait on the gitea ssh stuff and gerrit image cleanup/additions
19:48:54 <clarkb> I'm definitely going to start being around less regularly. Apparently we have to roast turkey stuff tomorrow because we're not cooking it whole and need gravy makings
19:49:08 <clarkb> but also before that happens the turkey needs to be "deconstructed"
19:49:13 <clarkb> such a kind way of saying "butchered"
19:49:41 <fungi> openinfra foundation board individual member representative nominations are open until december 15
19:50:36 <fungi> #link https://lists.openinfra.dev/archives/list/foundation@lists.openinfra.dev/thread/YJIQL444JMKFRSHUBYDWUQHBF7P7UDJF/ 2024 Open Infrastructure Foundation Individual Director nominations are open
19:51:12 <clarkb> One thing on my todo list for after thanksgiving is to start on Foundation Annual Report content for OpenDev (and Zuul)
19:51:26 <clarkb> I plan to stick that into etherpads like I've done before so that others can provide feedback easily
19:51:45 <clarkb> If there is something specific you're proud of or really want to see covered feel free to let me know
19:52:31 <clarkb> Last call for anything else? otherwise we can go eat $meal a bit early
19:53:09 <tonyb> nothing from me
19:53:28 <clarkb> Sounds like that is everything. Thank you everyone for your time and I hope you get to enjoy Thanksgiving if you are celebrating
19:53:33 <clarkb> #endmeeting