Tuesday, 2023-11-28

18:58 <clarkb> almost meeting time
19:00 <clarkb> #startmeeting infra
19:00 <opendevmeet> Meeting started Tue Nov 28 19:00:15 2023 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:00 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:00 <opendevmeet> The meeting name has been set to 'infra'
19:00 <clarkb> #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/3B75BMYDEBIQ56DW355IGF72ZH6JVVQI/ Our Agenda
19:00 <clarkb> #topic Announcements
19:00 <clarkb> I didn't have anything to announce.
19:01 <clarkb> OpenInfra Foundation individual board member seat nominations are open now
19:01 <clarkb> if that interests you I'm sure we can point you in the right direction
19:02 <clarkb> I'll give it a couple more minutes before diving into the agenda
19:05 <clarkb> #topic Server Upgrades
19:05 <clarkb> tonyb continues to push this along
19:05 <clarkb> #link https://review.opendev.org/q/topic:%22mirror-distro-updates%22+status:open
19:05 <clarkb> there are three mirrors all booted and ready to be swapped in now. Just waiting on reviews
19:06 <clarkb> one thing tonyb and I discovered yesterday is that the launcher venv cannot create new volumes in rax. We had to use fungi's xyzzy env for that. The xyzzy env cannot attach the volume :/
19:06 <clarkb> fungi: so maybe don't go cleaning up that env anytime soon :)
19:07 <clarkb> tonyb: once we get those servers swapped in we'll need to go through and clean out the old servers too. I'm happy to sit down for that and we can go over some other root topics as well
19:07 <fungi> yeah, a good quiet-time project for someone might be to do another round of bisecting sdk/cli versions to figure out what will actually work
19:08 <fungi> i think the launch venv might be usable for all those things? and we just didn't try it for volume creation
19:08 <fungi> but then ended up using it for volume attachment
19:08 <tonyb> clarkb: Yup.  That'd probably be good to have extra eyes
19:09 <frickler> iiuc the intention for latest sdk/cli is still to support rax, so reporting bugs if things don't work would be an option, too
19:09 <clarkb> fungi: yes, the launch env worked for everything but volume creation. volume creation failed
19:10 <clarkb> frickler: ya we can also run with the --debug flag to see what calls are actually failing
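For reference, a minimal sketch of the kind of debug run being discussed; the cloud name, size, and server/volume names below are placeholders rather than the actual commands that were run:

    # placeholder names/size; --debug prints every REST call and response so the failing API request is visible
    openstack --os-cloud rax --debug volume create --size 100 mirror-main01
    openstack --os-cloud rax --debug server add volume mirror01.opendev.org mirror-main01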
19:10 <tonyb> frickler: I think so.  The challenge is the CLI/SDK team don't have easy access to testing (yet)
19:11 <clarkb> anyway reviews for the current set of nodes would be good so we can get them in place and then figure out cleanup of the old nodes
19:11 <clarkb> anything else related to this?
19:11 <frickler> tonyb: that's why feedback from us would be even more valuable
19:12 <tonyb> frickler: fair point
19:12 <tonyb> clarkb: not from me.
19:12 <clarkb> #topic Python Container Updates
19:12 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/898756 And parent add python3.12 images
19:13 <clarkb> At this point I think adding python3.12 images is the only action we can take as we are still waiting on the zuul-operator fixups. I have not personally had time to look into that more closely
19:13 <clarkb> That said I don't think anything should stop us from adding those images
19:13 <tonyb> Neither have I.  It's in the "top 5" items on my todo list
19:14 <clarkb> #topic Gitea 1.21
19:14 <clarkb> Gitea just released a 1.20.6 bugfix release that we should upgrade to prior to upgrading to 1.21
19:15 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/902094 Upgrade gitea to 1.20.6 first
19:15 <clarkb> They also made a 1.21.1 release which I bumped our existing 1.21 change to
19:15 <clarkb> in #opendev earlier today we said we'd approve the 1.20.6 update after this meeting. I think that still works for me though I will be popping out from about 2100-2230 UTC
19:16 <clarkb> My hope is that later this week (maybe thursday at this rate?) I'll be able to write a change for the gerrit half of the key rotation and then generate a new key and stash it in the appropriate locations
19:16 <tonyb> Sounds good.
19:16 <clarkb> That said the gitea side of key rotation is ready for review and landable as is: #link https://review.opendev.org/c/opendev/system-config/+/901082 Support gitea key rotation
19:16 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/901082 Support gitea key rotation
19:17 <clarkb> The change there is set up to manage the single existing key and we can do a followup to add the new key
19:18 <clarkb> for clarity I think the rough plan here is 0) upgrade to 1.20.6 1) add gitea key rotation support 2) add gerrit key rotation support 3) add new key to gitea 4) add new key to gerrit 5) use new key in gerrit 6) remove old key from gitea (and gerrit?) 7) upgrade gitea
19:18 <clarkb> steps 0) and 1) should be good to go.
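For reference, generating the new replication key mentioned above would look roughly like the sketch below; the filename and comment are placeholders, not the paths actually used in system-config:

    # hypothetical filename/comment; creates an ed25519 keypair with no passphrase for gerrit-to-gitea replication
    ssh-keygen -t ed25519 -N '' -C 'gerrit-to-gitea-replication-2023' -f gerrit-replication-ed25519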
19:19 <tonyb> Seems like a plan, FWIW, I'll look again at 0 and 1
19:19 <clarkb> #topic Upgrading Zuul's DB Server
19:19 <clarkb> #link https://etherpad.opendev.org/p/opendev-zuul-mysql-upgrade info gathering document
19:20 <clarkb> I haven't had time to dig into db cluster options yet
19:20 <frickler> I'm wondering whether we could reuse what kolla does for that
19:21 <clarkb> Looking at the document it seems like some conclusions can be made though. Backups are not currently critical, database size is about 18GB uncompressed so the server(s) don't need to be large, and the database should not be hosted on zuul nodes because we auto upgrade zuul nodes
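As an aside, a size check like the one behind that 18GB figure is typically just a query against information_schema; this is a generic sketch run with appropriate credentials, not necessarily how the number in the etherpad was produced:

    # per-schema size in GB from table statistics (approximate for InnoDB)
    mysql -e "SELECT table_schema, ROUND(SUM(data_length + index_length)/1024/1024/1024, 1) AS size_gb
              FROM information_schema.tables GROUP BY table_schema;"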
19:22 <clarkb> frickler: that is an interesting idea.
19:22 <tonyb> Yup.  Given it won't be on any zuul servers I guess the RAM requirements are less interesting
19:22 <tonyb> frickler: Can you drop some pointers?
19:22 <fungi> also we can resize the instances if we run into memory pressure
19:23 <frickler> I need to look up the pointers in the docs, but in general there is quite a bit of logic in there to make things like upgrades work without interruption
19:23 <fungi> at one point we had played around with percona replicating to a hot standby
19:23 <clarkb> yes you need a lot of explicit coordination unlike say zookeeper
19:24 <clarkb> and you have to run a proxy
19:24 <fungi> may have relied on ndb?
19:25 <frickler> kolla uses either haproxy or proxysql
19:25 <clarkb> I don't remember that. The zuul-operator uses percona xtradb cluster and I think kolla uses galera
19:25 <clarkb> which are very similar backends and then ya a proxy in front
19:25 <corvus> looks like kolla may use galera.  that's one of the options (in addition to percona xtradb, and whatever postgres does for clustering these days)
19:26 <corvus> i don't think ndb is an option due to memory requirements
19:26 <fungi> i trust the former mysql contributors in these matters, i'm mostly database illiterate
19:27 <clarkb> one thing we should look at too is whether or not we can promote an existing mysql/mariadb to a galera/xtradb cluster and similar with postgres
19:27 <corvus> (and the sort of archival nature of the data seems like not a great fit for ndb; though it is my favorite cluster tech just because of how wonderfully crazy it is)
19:27 <clarkb> then one option we may have is to start with a spof which isn't a regression then later add in the more complicated load balanced cluster
19:27 <fungi> in theory the trove instance is already a spof
19:28 <clarkb> yes that is why this isn't a regression
19:28 <corvus> clarkb: i think that's useful to know, but in all cases, a db migration for us won't be too burdensome
19:28 <corvus> worst case we're talking like an hour on a weekend for an outage if we want to completely change architecture
19:29 <clarkb> good point
19:30 <corvus> (so i agree, a good plan might look like "move to a spof mariadb and then make it better later" but also it's not the end of the world if we decide "move to a spof mariadb then move to a non-spof mariadb during a maint window")
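To ground the "not too burdensome" point: on the zuul side a move like that is mostly repointing the database connection in zuul.conf and restarting the schedulers/web; a minimal sketch with placeholder credentials and host:

    # zuul.conf database section; user, password, and host below are placeholders
    [database]
    dburi=mysql+pymysql://zuul:secret@db01.opendev.org/zuul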
19:31 <corvus> anyway, seems like a survey of HA options is still on the task list
19:31 <clarkb> fwiw it looks like postgres ha options are also fairly involved and require you to manage fault identification and failover
19:31 <clarkb> ++ let's defer any decision making until we have a bit more data. But I think we're leaning towards running our own system on dedicated machine(s) at the very least
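For anyone surveying the galera/xtradb option later, the core of a mariadb galera member is a small wsrep section like the sketch below; the cluster name and host names are placeholders, and a real deployment (kolla-managed or otherwise) also adds SST credentials and a proxy in front:

    # illustrative [galera] section for each cluster member (placeholder hosts)
    [galera]
    wsrep_on = ON
    wsrep_provider = /usr/lib/galera/libgalera_smm.so
    wsrep_cluster_name = zuul-db
    wsrep_cluster_address = gcomm://db01.opendev.org,db02.opendev.org,db03.opendev.org
    binlog_format = ROW
    default_storage_engine = InnoDB
    innodb_autoinc_lock_mode = 2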
19:32 <clarkb> #topic Annual Report Season
19:32 <clarkb> #link https://etherpad.opendev.org/p/2023-opendev-annual-report OpenDev draft report
19:32 <clarkb> I've done this for a number of years now. I'll be drafting a section of the openinfra foundation's annual report that covers opendev
19:33 <clarkb> I'm still in the brainstorming just get something started phase but feel free to add items to the etherpad
19:33 <clarkb> Once I've actually written something I'll ask for feedback as well. I think they want them written by the 22nd of december. Something like that
19:33 <fungi> i'll try to get preliminary 2023 engagement report data to you soon, though the mailing list measurements need to be reworked for mm3
19:34 <tonyb> Okay so there is some time, but not lots of time
19:34 <clarkb> tonyb: ya it's a bit earlier than previous years too. Usually we have until the first week of january
19:35 <fungi> final counts of things get an exception beyond december for obvious reasons, but the prose needs to be ready with placeholders or preliminary numbers
19:35 <tonyb> clarkb: Hmm okay
19:36 <clarkb> it's a good opportunity to call out work you've been involved in :)
19:37 <clarkb> definitely add those items to the brainstorm list so I don't forget about them
19:37 <clarkb> #topic Open Discussion
19:37 <fungi> yeah, whatever you think we should be proud of
19:37 <clarkb> tonyb stuck the idea of making it possible for people to run unittest jobs using python containers under open discussion
19:37 <corvus> what's the use case that's motivating this?  is it that someone wants to run jobs on containers instead of vms?  or that they want an easier way to customize our vm images than using dib?
19:38 <tonyb> Yeah, I just wanted to get a feel for what's been tried.  I guess there are potentially 2 "motivators"
19:39 <tonyb> 1) a possibly flawed assumption that we could do more unit tests in some form of container system as the startup/reset costs are lower?
19:39 <tonyb> 2) making it possible, if not easy, for the community to test newer pythons without the problems of chasing unstable distros
19:40 <fungi> where would those containers run?
19:40 <corvus> ok!  for 1 there are a few things:
19:40 <tonyb> Well that'd be part of the discussion.
19:41 <corvus> - in opendev, we're usually not really limited by startup/recycle time.  most of our clouds are fast.
19:41 <corvus> (and we have enough capacity we can gloss over the recycle time)
19:41 <clarkb> also worth noting that the last time I checked we utilize less than 30% of our total available resources on a long term basis
19:42 <tonyb> We could make zuul job templates like openstack-tox-* to set up a VM with an appropriate container runtime and run tox in there
19:42 <clarkb> from an efficiency standpoint we'd need to cut our resource usage down to about 1/3 if we use always on/running container runners
19:42 <tonyb> but that would negate the 1st motivator
19:42 <corvus> - nevertheless, nodepool and zuul do support running jobs in containers via k8s or openshift.  we ran a k8s cluster for a short time, but running it required a lot of work that no one had time for.  only one of our clouds provides a k8s aas, so that doesn't meet our diversity requirements
19:43 <tonyb> Both fair points
19:43 <corvus> that ^ goes to fungi's point about where to run them.  i don't think the answer has changed since then, sadly :(
19:44 <tonyb> Okay.
19:44 <corvus> yeah, we could write jobs/roles to pull in the image and run it, but if we do that a lot, that'll be slow and drive a lot of network traffic
19:44 <corvus> if the motivation is to expand python versions, we might want to consider new dib images with them?
19:44 <clarkb> part of the problem with that is dib images are extremely heavyweight
19:44 <corvus> i think there was talk of using stow to have a bunch of pythons on one image?
19:45 <clarkb> they are massive (each image is like 50GB * 2 of storage) and uploads are slow to certain clouds
19:45 <clarkb> ya so we could colocate instead.
19:45 <clarkb> My hesitancy here is that in the times where we've tried to make it easier for the projects to test with new stuff it's not gone anywhere because they have a hard time keeping up in general
19:46 <clarkb> tumbleweed and fedora are examples of this
19:46 <clarkb> but even today openstack isn't testing with python3.11 across the board yet (though it is close)
19:46 <clarkb> I think there is probably a balance in effort vs return and maybe containers are a good tool in balancing that out?
19:47 <tonyb> Yeah that's why I thought avoiding the DIB image side might be helpful
19:47 <clarkb> basically I don't expect all of openstack to run python3.12 jobs until well after ubuntu has packages for it anyway. But maybe a project like zuul would run python3.12 jobs and those are relatively infrequent compared to openstack
19:48 <clarkb> but also having a dib step install python3.12 on jammy is not a ton of work if we think this is largely a python problem
19:48 <clarkb> (I think generally it could be a nodejs, golang, rust, etc problem but many of those ecosystems make it a bit easier to get a random version)
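As a rough illustration of how small such a dib step could be, something like the following would get python3.12 onto jammy; the deadsnakes PPA is just one possible source here, not an endorsed choice:

    # hypothetical element install script for jammy; pulls python3.12 from the deadsnakes PPA
    apt-get update
    apt-get install -y software-properties-common
    add-apt-repository -y ppa:deadsnakes/ppa
    apt-get update
    apt-get install -y python3.12 python3.12-venv python3.12-dev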
19:49 <clarkb> corvus: does ensure-python with appropriate flags already know how to go to the internet to fetch a python version and build it?
19:49 <clarkb> I think it does? maybe we start there and see if there is usage and we can optimize from there?
19:50 <corvus> yeah, there's pyenv and stow
19:50 <corvus> in ensure-python
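For context, the pyenv path boils down to roughly the following on a test node; the version number and paths are illustrative only, not what the role literally runs:

    # rough equivalent of a pyenv-based install: build CPython from source, then use it for a test venv
    curl -fsSL https://pyenv.run | bash
    ~/.pyenv/bin/pyenv install 3.12.0
    ~/.pyenv/versions/3.12.0/bin/python -m venv ~/venv-py312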
19:51 <tonyb> Okay.
19:52 <tonyb> I think that was helpful, I'd be willing to look at the ensure-python part and see what works and doesn't
19:52 <tonyb> it seems like the idea of using a container runtime isn't justified right now.
19:53 <clarkb> if our clouds had first class container runtimes as a service it would be much easier to sell/experiment with. But without that there is a lot of bootstrapping overhead for the humans and networking
19:54 <clarkb> side note: dox is a thing that mordred experimented with for a while: https://pypi.org/project/dox/
19:55 <clarkb> but ya let's start with the easy thing which is try ensure-python's existing support for getting a random python and take what we learn from there
19:55 <clarkb> Anything else? We are just about out of time for our hour
19:55 <clarkb> Is everyone still comfortable merging that gitea 1.20.6 update even if I'm gone from 2100 to 2230?
19:55 <clarkb> if so I say someone should approve it :)
19:56 <fungi> i will, i can keep an eye on it
19:56 <clarkb> thanks!
19:56 <fungi> and done
19:56 <clarkb> I guess it's worth mentioning I think I'll miss our meeting on December 12. I'll be around for the very first part of the day but then I'm popping out
19:57 * tonyb will be back in AU by then
19:57 <clarkb> thank you for your time everyone
19:57 <clarkb> #endmeeting
19:57 <opendevmeet> Meeting ended Tue Nov 28 19:57:13 2023 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)
19:57 <opendevmeet> Minutes:        https://meetings.opendev.org/meetings/infra/2023/infra.2023-11-28-19.00.html
19:57 <opendevmeet> Minutes (text): https://meetings.opendev.org/meetings/infra/2023/infra.2023-11-28-19.00.txt
19:57 <opendevmeet> Log:            https://meetings.opendev.org/meetings/infra/2023/infra.2023-11-28-19.00.log.html
19:57 <fungi> things will probably be very quiet from that point on until the end of the year anyway
19:57 <clarkb> fungi: ya that's my hope
19:57 <corvus> we'll miss you too clarkb
19:57 <tonyb> Thanks all
