19:01:37 <clarkb> #startmeeting infra
19:01:37 <opendevmeet> Meeting started Tue Oct 10 19:01:37 2023 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:37 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:37 <opendevmeet> The meeting name has been set to 'infra'
19:01:47 <clarkb> #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/7WZKZSWIX2W3OHQFM6OZNJU54BUD4IIT/ Our Agenda
19:01:53 <clarkb> #topic Announcements
19:02:19 <clarkb> The OpenInfra PTG will take place October 23-27. Please keep this in mind when making changes to tools like etherpad and meetpad
19:02:49 <clarkb> #topic Mailman 3
19:02:55 <clarkb> and now we can dive right in to our agenda
19:03:09 <clarkb> The plan is still to migrate lists.openstack.org on Thursday starting around 15:30 UTC
19:03:24 <clarkb> yesterday I asked fungi if there was anything we should do to help prepare and it sounded like it was well in hand?
19:03:30 <fungi> still on track for maintenance thursday, yes. i sent a reminder to openstack-discuss on friday
19:04:02 <fungi> i'm working on the changes we'll merge as part of the maintenance and will ping infra-root once they're ready for review today or worst case tomorrow
19:04:19 <fungi> i'll also finish fleshing out the maintenance plan on the etherpad
19:04:27 <clarkb> sounds good
19:04:35 <fungi> but it's mostly a copy/paste of the earlier maintenances with timelines adjusted
19:04:53 <fungi> there are a few steps to do tomorrow, including initial rsync and checking dns record ttls
19:05:17 <clarkb> ping me if/when I can help and I can dig into things
19:05:25 <fungi> will do
19:05:48 <clarkb> The other mailman3 item of note is that we created a new mailing list through our automation outside of a migration. That seems to have worked as well
19:05:57 <clarkb> we expected it to as our testing covers that but always good to confirm
19:06:11 <fungi> yes, the list owner confirmed they were able to create an account and got granted control of the settings
19:06:40 <fungi> they do not yet seem to have posted anything though, empty archive
19:07:07 <fungi> (i subscribed successfully when it was first created, and haven't seen any messages for it either, so no reason to think the archiving is broken)
19:07:49 <clarkb> ack
19:07:54 <clarkb> anything else mailman 3 related?
19:08:05 <fungi> i have nothing
19:08:07 <clarkb> #topic Updating Our OpenMetal Cloud
19:08:16 <clarkb> #link https://etherpad.opendev.org/p/redeploying-opendevs-openmetal-cloud Notes from discussion with Yuriy
19:09:39 <clarkb> This was an informative discussion. There are 3 timeframes we can upgrade within which will produce different results. If we upgrade today we'll get a newer base OS. If we upgrade in about a month we'll get the same newer base OS and the same openstack versions as before but with more robust configs. And finally if we upgrade early next year (sounded like February) we'll get
19:09:41 <clarkb> an even newer base OS and openstack 2023.1 or 2023.2
19:10:19 <frickler> did you discuss whether they might offer debian or ubuntu as base os?
19:10:23 <fungi> upgrading today would also get us more ip addresses, right?
19:10:29 <clarkb> given that I think we shouldn't upgrade until the more robust openstack configs are available. That means not upgrading right away. But then we have to decide if we want to upgrade twice or just once
19:10:56 <clarkb> fungi: oh yes a few more addresses since we would reduce the total number of subnets in use which helps with vrrp
19:11:21 <clarkb> frickler: yes that came up. It didn't sound like they have fully decided which base os would be used in the February update
19:11:34 <clarkb> frickler: but currently it's all a variation of centos
19:12:25 <clarkb> we don't need to decide anything right now, but I think the decision we need to make is whether we want to upgrade twice (once in november and once in february) or just once, and if just once, whether we wait for february or do it in november ish
19:12:26 <fungi> sounds like they're more comfortable with red hat based platforms from a sheer familiarity standpoint but are considering alternatives due to recent changing winds in that regard
19:12:48 <clarkb> I don't think there is much value in upgrading right now and ending up doing three upgrades
19:13:05 <frickler> I would be fine with waiting until feb
19:13:55 <clarkb> considering we seem to lack time as a major resource I think I'm leaning that way too
19:14:52 <clarkb> other useful bits of info: we do have to shut down the existing cloud to build a new one because we would recycle the same hardware. The cloud should be renamed from inmotion to openmetal, which is particularly important on the grafana side, so we may update that dashboard with a new name earlier
19:15:15 <clarkb> maybe something like "OpenMetal (formerly InMotion)" to reduce confusion
19:15:49 <clarkb> maybe think this over and we can put together a plan next week
19:16:10 <clarkb> and then relay that back to Yuriy as necessary to ensure we're all on the same page
19:16:13 <fungi> worth noting, the openstack feature freeze for their next release will be at the end of february, so that will probably be the busiest time for quota usage
19:16:34 <fungi> though the openmetal cloud being offline at that time wouldn't severely diminish available quota
19:16:50 <clarkb> it's about 10% of our quota? maybe a little less
19:16:56 <fungi> right
19:16:58 <clarkb> probably noticeable but we should be able to limp along
19:17:04 <fungi> maybe a little more after the reprovision
19:18:08 <clarkb> alright please think it over during the next week and bring up any other concerns like the release schedule timing for openstack and we can put a rough plan in place soon
19:18:25 <clarkb> #topic Python Container Updates
19:19:15 <clarkb> The Gerrit update is complete. We are now running Gerrit on bookworm with java 17
19:19:15 <clarkb> I haven't noticed any problems, but please say something if you notice anything off or weird
19:19:15 <clarkb> In theory GC performance is much improved under java 17 so we should see Gerrit being more responsive
19:19:19 <clarkb> #link https://review.opendev.org/q/(+topic:bookworm-python3.11+OR+hashtag:bookworm+)status:open
19:19:50 <clarkb> There are some other containers that need updates though. Within OpenDev we have lodgeit and gear. I think both of those should be good to go. In lodgeit's case I had to do some python3.11 porting
19:20:09 <clarkb> On the zuul side I updated zuul-registry to simply move to python3.11 and stay on bullseye for now
19:20:45 <clarkb> My hope is that we can get everything to python3.11 and then drop python3.9 and python3.10 builds on both bullseye and bookworm. That will have a big impact on the total number of images we need to juggle
19:20:49 <clarkb> not as great as having just bookworm but still a big improvement
19:21:10 <clarkb> if we want to do lodgeit today I should be around to keep an eye on it
19:21:29 <corvus> seems reasonable; i looked a bit at the zuul-registry issue, and i didn't see a quick fix.  it could be a problem in the future.
19:21:35 <clarkb> and gear is something we don't use anymore so should be fine wheneever
19:21:42 <corvus> (i mean, it's a problem now; could be a bigger one later)
19:22:14 <clarkb> corvus: ya I too looked at rehash and decided this would require actually understanding openssl internals which felt like a pain. One crazy idea I had was to use golang's sha256sum implementation instead since it too is resumable
19:22:14 <corvus> (so if anyone wants to explore python/c libssl bindings, there's a good puzzle for you!)
19:22:38 <corvus> good idea
19:22:41 <clarkb> but then you are shuffling bits around between python and go instead of python and C and that may be more painful
19:22:52 <fungi> i take it pyca/cryptography doesn't have resumable hash primitives
19:24:01 <corvus> not that i have found
19:24:16 <clarkb> cryptography does allow you to make copies of hash objects in order to get intermediate results. But what we really need is the ability to serialize the objects in that state and I don't know that they support this
19:24:41 <fungi> right, openssl lets you get intermediate results too
19:24:49 <fungi> (even in v3 i mean)
19:25:04 <fungi> but i also didn't see anywhere that it allowed exporting that state
19:26:05 <clarkb> https://github.com/stevvooe/resumable is the golang thing I found which is deprecated and says stdlib can do it directly
19:26:56 <clarkb> appears to do something similar to pickling the hash object and then unpickling it later
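For reference, a minimal sketch of the limitation being discussed, using Python's stdlib hashlib for brevity (the same constraint applies to pyca/cryptography's OpenSSL-backed hashes): copying a hash object gives intermediate digests easily, but the internal state can't be serialized for a later process to pick up, which is what resuming an interrupted upload would actually need. The chunk contents are placeholders.

```python
import hashlib
import pickle

h = hashlib.sha256()
h.update(b"first chunk of an upload")  # placeholder data

# copy() gives an independent object, so intermediate digests are easy
print("intermediate:", h.copy().hexdigest())

h.update(b"second chunk")
print("final:", h.hexdigest())

# ...but the hash state cannot be persisted across processes/restarts,
# which is what resuming an interrupted upload actually requires
try:
    pickle.dumps(h)
except TypeError as exc:
    print("cannot persist hash state:", exc)
```

By contrast, Go's standard library hash implementations (including crypto/sha256) implement encoding.BinaryMarshaler/BinaryUnmarshaler (added around Go 1.9, if memory serves), which is the stdlib support the deprecated stevvooe/resumable repository points at.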
19:27:28 <clarkb> in any case reviews on those outstanding changes are helpful and I'm happy to help shepherd things through with approvals once reviewed
19:27:36 <clarkb> #topic Etherpad 1.9.3
19:28:13 <clarkb> Pretty sure I mentioned this last week? But tonyb ran into similar cache related problems with 1.9.3 so I'm wary of upgrading prior to the PTG. My big concern is that people won't use etherpad until the PTG then it won't work due to cache issues
19:28:29 <clarkb> For this reason I think we should defer this upgrade until after the PTG and then people can clear caches when not in a rush to attend sessions
19:28:41 <fungi> fwiw, i don't believe this is the first time i've seen persisted cookies/storage/state get confused after an etherpad upgrade
19:28:57 <fungi> but i agree avoiding that right before a ton of users swamp the system is probably wise
19:29:03 <frickler> +1
19:29:14 <tonyb> +1
19:29:28 <clarkb> fungi: ya I think we've hit it at least once before
19:29:36 <clarkb> #topic Gitea 1.21
19:29:52 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/897679 A test change for gitea 1.21.0-rc1
19:30:12 <clarkb> I went ahead and got this started since the template updates tend to be annoying but don't usually need many further changes after the first RC
19:30:43 <clarkb> there is still no changelog but this change found at least one problem we will need to address: Gitea by default requires rsa keys to be 3072 bits or longer in 1.21
19:31:00 <clarkb> The key that trips over this is the gerrit replication key
19:31:20 <clarkb> In the change I've disabled key length checking, but I think we should also take a todo to rotate the key out with something larger or with another key type
19:31:23 <fungi> which has been the default size ssh-keygen emits for a while now, just not yet back when we generated that key
19:32:03 <clarkb> to do that we should be able to prestage the new key alongside the old key in gitea then add it to the gerrit user's home dir and remove the old key from gerrit
19:32:19 <clarkb> but we need to check that the key isn't used anywhere else before removing it
19:32:29 <fungi> and yeah, we could either rotate the key before upgrading, or upgrade with the restriction disabled, rotate after, and then remove the override
19:32:58 <clarkb> We can probably wait for the changelog to exist before deciding just in case there is any extra info they have to share
19:33:10 <fungi> replacing the key first might be less work, but could delay the upgrade
19:33:24 <fungi> and i have no real preference either way
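As a side note, a quick hedged sketch of how the existing replication key could be checked against Gitea 1.21's 3072-bit RSA minimum (per the discussion above) before deciding whether to rotate first or keep the override. The file path is hypothetical; this uses pyca/cryptography only as an illustration.

```python
from cryptography.hazmat.primitives.asymmetric import rsa
from cryptography.hazmat.primitives.serialization import load_ssh_public_key

MIN_BITS = 3072  # Gitea 1.21 default minimum for RSA keys, per the discussion above

# hypothetical path to the public half of the gerrit replication key
with open("gerrit-replication-key.pub", "rb") as f:
    key = load_ssh_public_key(f.read())

if isinstance(key, rsa.RSAPublicKey) and key.key_size < MIN_BITS:
    print(f"RSA key is only {key.key_size} bits; Gitea 1.21 will reject it by default")
else:
    print("key is long enough (or not RSA), no override needed")
```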
19:33:53 <clarkb> #topic Gerrit Replication File Leaks
19:34:16 * fungi pops some popcorn for this one
19:34:30 <fungi> any new plot twists?
19:34:52 <clarkb> I was really hopeful that this bugfix would be in place by the time we updated Gerrit to bookworm. Unfortunately I got feedback late last week asking for a redesign of the approach. I'm a bit frustrated by that because I am not responsible for the approach in the current code base; I'm merely patching it to stop leaking files (which my change does successfully do with three test
19:34:54 <clarkb> cases...)
19:35:35 <clarkb> I spent Friday trying to make the suggested approach work with minimal refactoring. I'm missing something clearly important in the handling of the events because I can get it to work when replication actually does need to happen but if you filter out the project using the replication config I'm now leaking more files
19:35:52 <fungi> who knew trying to fix one bug was going to result in you being the new maintainer?
19:36:18 <clarkb> one step forward two back sort of deal. The latest thing I'm attempting is getting eclipse going so that I can more easily use a debugger to understand that new behavior. Unfortunately I broke eclipse in about 30 seconds
19:36:43 <clarkb> I now have like 5 different eclipse workspaces and two separate gerrit clones... I think I'm understanding eclipse better but it's been interesting
19:36:55 <clarkb> If anyone understands eclipse and wants to walk through it with me I would be grateful :)
19:37:24 <clarkb> If that doesn't go anywhere I'll fallback to jdb which is not my favorite debugger but should be able to do what I need it to do
19:38:02 <corvus> System.out.println is my fav
19:38:09 <clarkb> I'm speedrunning java developer setup
19:38:33 <fungi> you should twitch stream that
19:38:34 <tonyb> and we're enjoying the show
19:38:44 <tonyb> I'd watch!
19:38:46 <clarkb> corvus: oh that is the other issue. The test suite appears to eat the replication logs. Normal error logs are emitted though so I've thought about a mass replacement of the replication log calls with normal log calls so that I can see what is happening
19:39:00 <corvus> oO
19:39:23 <clarkb> I think it is writing them to a virtual in memory disk location that goes away when the test finishes
19:39:38 <clarkb> similar to how we do it with python except they aren't capturing the contents so they can be given to you on completion
19:39:51 <fungi> yeah, gerrit plugins have independent logging, right? i vaguely recall configuring that somewhere
19:39:52 <clarkb> whereas the normal log goes to stdout
19:39:58 <clarkb> fungi: the replication plugin does
19:40:51 <clarkb> anyway I haven't given up. I have some avenues to explore and will continue to poke at them as time permits
19:41:13 <clarkb> #topic Re-evaluating the meeting time
19:42:05 <clarkb> frickler mentioned that now that ianw is no longer attending the meetings we might be able to reschedule to be friendlier to other timezones. One problem with that is I'd like to continue to encourage tonyb to participate and moving the meeting earlier may make that difficult for tonyb
19:42:22 <clarkb> that said, it is still pretty early for tonyb  and maybe tonyb would rather catch up asynchronously anyway?
19:42:30 <clarkb> thoughts?
19:42:43 <fungi> i'm open to different times, though i do have (rather a lot of) other meetings on tuesdays. up to 4 other meetings besides this one depending on how the weeks/months fall
19:43:12 <tonyb> selfishly I'd like to keep the meeting at a time I can attend
19:43:19 <clarkb> when frickler brought the idea up to me I think we both decided that 1700-1800 UTC would work for us
19:43:22 <fungi> and some of my tuesday meetings follow daylight/summer time shifts, which add to the complexity
19:43:43 <clarkb> tonyb: ok that is good to know. I assume 1900 UTC is about as early as is practical for you too?
19:43:48 <tonyb> and with AU having done DST it's 6am which is about as early as I can do
19:44:31 <tonyb> correct
19:44:53 <tonyb> the winter 5am block was hard for me to get to
19:45:16 <frickler> hmm, o.k., so let's stick to this time, then
19:45:19 <clarkb> another approach would be to shift the meeting so that it is more convenient for australia and europe and those of us in north america take the hit
19:45:24 <tonyb> we could alternate, but that sucks for different reasons
19:45:33 <fungi> i guess there are better au/de overlap times which are like 3am for some folks in the us
19:45:43 <clarkb> I would need to look at a map and do some maths to figure out when ^ those times are
19:46:25 <fungi> can we shrink the planet, or turn it at a different speed to fix this?
19:46:43 <fungi> wait, no, quantum tunneling
19:46:49 <tonyb> we could all move to NZ?
19:46:56 <clarkb> I hear it is nice there
19:47:04 <fungi> i would find that an acceptable solution as well
19:47:06 <corvus> best suggestion so far
19:47:35 <clarkb> but ya no easy solutions. Let's keep it at 19:00 for now and I can look at a map and see if anything jumps out as a good alternative that still lets those who wish to attend do so
19:47:46 <fungi> i'm okay with alternating times too
19:48:15 <fungi> spread the pain around
19:48:20 <corvus> fungi: you'd have to adjust your meeting orrery
19:48:30 <tonyb> I can do some research and dump it to an etherpad
19:48:34 <clarkb> my concern with alternating times is that there are already so few of us that we might end up with two meetings with significantly fewer people
19:48:39 <frickler> well the EU/AU overlap would be around 6 UTC likely
19:49:06 <clarkb> frickler: that is approximately midnight where I am and 3am where fungi is so ya fungi's number above seems spot on
19:49:28 <fungi> 0600z is 2am local for me at the moment, right. in about a month that will change to 1am
19:49:46 <clarkb> clearly we should do more brainstorming. At least now we know where the interest is and can do more accurate brainstorming
19:50:03 <clarkb> feedback welcome if/when we have a great idea on how to make this better :)
19:50:08 <clarkb> #topic Open Discussion
19:50:23 <clarkb> I know fungi had one item to discuss before we end the meeting so jumping to open discussion now
19:50:48 <fungi> folks in the #openstack-kolla channel were talking about it being maybe nice if we could start mirroring the "osbpo" (debian openstack backports) package repository
19:51:06 <clarkb> functionally that repo is very similar to Ubuntu Cloud Archive but for Debian?
19:51:12 <frickler> yes
19:51:18 <fungi> i realized we have a mirror.deb-openstack volume in afs which is 5 years stale, the predecessor of osbpo essentially
19:51:31 <fungi> none of the files in there are of any use now
19:51:50 <clarkb> I have no objections to mirroring it. Maybe we can get the kolla team to modify configs for mirror.deb-openstack to write out the current content?
19:52:15 <fungi> size estimates for the whole of osbpo are around 22gb at the moment, but with reprepro we'd probably only mirror a fraction of that (like we don't care about openstack victoria packages for debian stretch)
19:52:47 <fungi> i expect the amount of data would come out about the same as uca, that order of magnitude anyway
19:53:10 <fungi> and yeah, i'll work with kolla folks on it, just making sure there were no immediate objections
19:53:17 <clarkb> none from me.
19:53:24 <frickler> if it is only for kolla, just bookworm and maybe bullseye would be enough
19:53:44 <tonyb> seems like a reasonable ask to me
19:53:58 <frickler> I fear this may end up as task for me, but no objection
19:54:13 <fungi> frickler: yeah, i was going to start suggesting openstack bobcat packages for bookworm and then adding on if they realize they need more
19:54:40 <frickler> ack
19:54:45 <fungi> okay, that's all i needed to confirm
19:54:49 <fungi> thanks!
19:55:10 <tonyb> frickler: I'm willing to help learn if you're willing to instruct.
19:55:22 <clarkb> Depending on how the next couple of days go I may try to disappear on Friday. The kids randomly have that day off (teacher in-service day or something) and I will attempt to take advantage. But I expect wednesday and thursday to be busy
19:55:27 <tonyb> assuming infra-root isn't a prerequisite
19:55:40 <clarkb> tonyb: root shouldn't be necessary for most of it
19:55:42 <fungi> i can also provide pointers. it's pretty straightforward since we don't need to make any changes to afs other than delete some old files out of the volume for our own sake
19:55:52 <clarkb> root may be required to delete the old files if reprepro can't be convinced to do it
19:56:19 <fungi> tonyb: it'll be 100% config files, maybe a root sysadmin running the initial mirror pulse without a timeout just to make sure it populates safely
19:56:46 <tonyb> that was my expectation just wanted to be clear
19:56:47 <fungi> but logs are also public
19:56:59 <fungi> so you can quickly see what's going wrong if something is
19:57:16 <frickler> on the topic of being away, I'll be mostly offline tomorrow
19:57:32 <fungi> thanks for the heads up
19:57:43 <fungi> also i'll be absent the first half of next week
19:58:03 <fungi> all things open runs sunday through tuesday so i'll be driving saturday and wednesday
19:58:15 <clarkb> fungi: have fun!
19:58:22 <tonyb> enjoy
19:58:23 <fungi> but i should be around again thursday/friday next week
19:58:54 <fungi> also i've got some family obligations that following weekend up into wednesday of ptg week, so will have limited availability then as well
19:59:40 <clarkb> and that is all we have time for. Feel free to continue discussion on the mailing list or in #opendev
19:59:45 <clarkb> Thank you everyone for your time
19:59:47 <fungi> thanks clarkb!
19:59:49 <clarkb> #endmeeting