Tuesday, 2023-10-10

clarkbhello19:00
clarkbtime for our weekly team meeting19:00
corvusaloha19:01
clarkb#startmeeting infra19:01
opendevmeetMeeting started Tue Oct 10 19:01:37 2023 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.19:01
opendevmeetUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.19:01
opendevmeetThe meeting name has been set to 'infra'19:01
clarkb#link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/7WZKZSWIX2W3OHQFM6OZNJU54BUD4IIT/ Our Agenda19:01
clarkb#topic Announcements19:01
clarkbThe OpenInfra PTG will take place October 23-27. Please keep this in mind when making changes to tools like etherpad and meetpad19:02
clarkb#topic Mailman 319:02
clarkband now we can dive right in to our agenda19:02
clarkbThe plan is still to migrate lists.openstack.org on Thursday starting around 15:30 UTC19:03
clarkbyesterday I asked fungi if there was anything we should do to help prepare and it sounded like it was well in hand?19:03
fungistill on track for maintenance thursday, yes. i sent a reminder to openstack-discuss on friday19:03
fungii'm working on the changes we'll merge as part of the maintenance and will ping infra-root once they're ready for review today or worst case tomorrow19:04
fungii'll also finish fleshing out the maintenance plan on the etherpad19:04
clarkbsounds good19:04
fungibut it's mostly a copy/paste of the earlier maintenances with timelines adjusted19:04
fungithere are a few steps to do tomorrow, including initial rsync and checking dns record ttls19:04
clarkbping me if/when I can help and I can dig into things19:05
fungiwill do19:05
clarkbThe other mailman3 item of note is we created a new mailing list through our automation outside of a migration. That seems to have worked as well19:05
clarkbwe expected it to as our testing covers that but always good to confirm19:05
fungiyes, the list owner confirmed they were able to create an account and got granted control of the settings19:06
fungithey do not yet seem to have posted anything though, empty archive19:06
fungi(i subscribed successfully when it was first created, and haven't seen any messages for it either, so no reason to think the archiving is broken)19:07
clarkback19:07
clarkbanything else mailman 3 related?19:07
fungii have nothing19:08
clarkb#topic Updating Our OpenMetal Cloud19:08
clarkb#link https://etherpad.opendev.org/p/redeploying-opendevs-openmetal-cloud Notes from discussion with Yuriy19:08
clarkbThis was an informative discussion. There are 3 timeframes we can upgrade within which will produce different results. If we upgrade today we'll get a newer base OS. If we upgrade in about a month we'll get the same newer base OS and the same openstack versions as before but with more robust configs. And finally if we upgrade early next year (sounded like February) we'll get19:09
clarkban even newer base OS and openstack 2023.1 or 2023.219:09
fricklerdid you discuss whether they might offer debian or ubuntu as base os?19:10
fungiupgrading today would also get us more ip addresses, right?19:10
clarkbgiven that I think we shouldn't upgrade until the more robust openstack configs are available. That means not upgrading right away. But then we have to decide if we want to upgrade twice or just once19:10
clarkbfungi: oh yes a few more addresses since we would reduce the total number of subnets in use which helps with vrrp19:10
clarkbfrickler: yes that came up. It didn't sound like they have fully decided which base os would be used in the February update19:11
clarkbfrickler: but currently it's all a variation of centos19:11
clarkbwe don't need to decide anything right now, but I think the decision we need to make is if we want to upgrade twice (once in november and once in february) or just once and if just once decide if we wait for february or do it in november ish19:12
fungisounds like they're more comfortable with red hat based platforms from a sheer familiarity standpoint but are considering alternatives due to recent changing winds in that regard19:12
clarkbI don't think there is much value to doing three upgrades and doing an upgrade right now19:12
fricklerI would be fine with waiting until feb19:13
clarkbconsidering we seem to lack time as a major resource I think I'm leaning that way too19:13
clarkbother useful bits of info: we do have to shut down the existing cloud to build a new one because we would recycle the same hardware. The cloud should be renamed to openmetal from inmotion, which is particularly important on the grafana side, so we may update that dashboard with a new name earlier19:14
clarkbmaybe something like "OpenMetal (formerly InMotion)" to reduce confusion19:15
clarkbmaybe think this over and we can put together a plan next week19:15
clarkband then relay that back to Yuriy as necessary to ensure we're all on the same page19:16
fungiworth noting, the openstack feature freeze for their next release will be at the end of february, so that will probably be the busiest time for quota usage19:16
fungithough the openmetal cloud being offline at that time wouldn't severely diminish available quota19:16
clarkbit's about 10% of our quota? maybe a little less19:16
fungiright19:16
clarkbprobably noticeable but we should be able to limp along19:16
fungimaybe a little more after the reprovision19:17
clarkbalright please think it over during the next week and bring up any other concerns like the release schedule timing for openstack and we can put a rough plan in place soon19:18
clarkb#topic Python Container Updates19:18
clarkbThe Gerrit update is complete. We are now running Gerrit on bookworm with java 1719:19
clarkbI haven't noticed any problems, but please say something if you notice anything off or weird19:19
clarkbIn theory GC performance is much improved under java 17 so we should see Gerrit being more responsive19:19
clarkb#link https://review.opendev.org/q/(+topic:bookworm-python3.11+OR+hashtag:bookworm+)status:open19:19
clarkbThere are some other containers that need updates though. Within OpenDev we have lodgeit and gear. I think both of those should be good to go. In lodgeit's case I had to do some python3.11 porting19:19
clarkbOn the zuul side I updated zuul-registry to simply move to python3.11 and stay on bullseye for now19:20
clarkbMy hope is that we can get everything to python3.11 and then drop python3.9 and python3.10 builds on both bullseye and bookworm. And that will be a big impact on the total number of images we need to juggle19:20
clarkbnot as great as having just bookworm but still a big improvement19:20
clarkbif we want to do lodgeit today I should be around to keep an eye on it19:21
corvusseems reasonable; i looked a bit at the zuul-registry issue, and i didn't see a quick fix.  it could be a problem in the future.19:21
clarkband gear is something we don't use anymore so should be fine whenever19:21
corvus(i mean, it's a problem now; could be a bigger one later)19:21
clarkbcorvus: ya I too looked at rehash and decided this would require actually understanding openssl internals which felt like a pain. One crazy idea I had was to use golang's sha256sum implementation instead since it too is resumable19:22
corvus(so if anyone wants to explore python/c libssl bindings, there's a good puzzle for you!)19:22
corvusgood idea19:22
clarkbbut then you are shuffling bits around between python and go instead of python and C and that may be more painful19:22
fungii take it pyca/cryptography doesn't have resumable hash primitives19:22
corvusnot that i have found19:24
clarkbcryptography does allow you to make copies of hash objects in order to get intermediate results. But what we really need is the ability to serialize the objects in that state and I don't know that they support this19:24
fungiright, openssl lets you get intermediate results too19:24
fungi(even in v3 i mean)19:24
fungibut i also didn't see anywhere that it allowed exporting that state19:25
clarkbhttps://github.com/stevvooe/resumable is the golang thing I found which is deprecated and says stdlib can do it directly19:26
clarkbappears to do something similar to pickling the hash object then unpickles it later19:26
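The distinction drawn above, between copying a hash object to read an intermediate digest and serializing its state so hashing can resume later, can be illustrated with Python's standard hashlib. This is a minimal sketch that uses hashlib as a stand-in for pyca/cryptography and OpenSSL; the helper names are hypothetical and this is not the zuul-registry code.

```python
import hashlib
import pickle


def intermediate_digest(hasher):
    """Return the digest so far without finalizing the original hasher."""
    # copy() clones the internal state, so the original can keep receiving data.
    return hasher.copy().hexdigest()


def can_serialize(hasher):
    """Report whether the in-progress hash state can be pickled for later resumption."""
    try:
        pickle.dumps(hasher)
    except (TypeError, pickle.PicklingError):
        return False
    return True


hasher = hashlib.sha256()
hasher.update(b"first chunk of an upload")
partial = intermediate_digest(hasher)

# The original object is still usable after taking the intermediate digest.
hasher.update(b"second chunk of an upload")
final = hasher.hexdigest()

print(partial)
print(final)
print("state can be pickled:", can_serialize(hasher))
```

Intermediate results are easy to get through copy(), but on CPython the pickling check typically reports that the state cannot be serialized, which is the gap described in the discussion.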
clarkbin any case reviews on those outstanding changes are helpful and I'm happy to help shepherd things through with approvals once reviewed19:27
clarkb#topic Etherpad 1.9.319:27
clarkbPretty sure I mentioned this last week? But tonyb ran into similar cache related problems with 1.9.3 so I'm wary of upgrading prior to the PTG. My big concern is that people won't use etherpad until the PTG then it won't work due to cache issues19:28
clarkbFor this reason I think we defer this upgrade until after the PTG and then people can clear caches when not in a rush to attend sessions19:28
fungifwiw, i don't believe this is the first time i've seen persisted cookies/storage/state get confused after an etherpad upgrade19:28
fungibut i agree avoiding that right before a ton of users swamp the system is probably wise19:28
frickler+119:29
tonyb+119:29
clarkbfungi: ya I think we've hit it at least once before19:29
clarkb#topic Gitea 1.2119:29
clarkb#link https://review.opendev.org/c/opendev/system-config/+/897679 A test change for gitea 1.21.0-rc119:29
clarkbI went ahead and got this started since the template updates tend to be annoying and not need many updates after the first RC19:30
clarkbthere is still no changelog but this change found at least one problem we will need to address: Gitea by default requires rsa keys to be 3072 bits or longer in 1.2119:30
clarkbThe key that trips over this is the gerrit replication key19:31
clarkbIn the change I've disabled key length checking, but I think we should also take a todo to rotate the key out with something larger or with another key type19:31
fungiwhich has been the default size ssh-keygen emits for a while, just not as long as when we generated that key19:31
clarkbto do that we should be able to prestage the new key alongside the old key in gitea then add it to the gerrit user's home dir and remove the old key from gerrit19:32
clarkbbut we need to check that the key isn't used anywhere else before removing it19:32
fungiand yeah, we could either rotate the key before upgrading, or upgrade with the restriction disabled, rotate after, and then remove the override19:32
clarkbWe can probably wait for the changelog to exist before deciding just in case there is any extra info they have to share19:32
fungireplacing the key first might be less work, but could delay the upgrade19:33
fungiand i have no real preference either way19:33
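Before rotating, it may help to confirm which public keys fall under the 3072-bit RSA minimum mentioned above. The following is a rough sketch using only the Python standard library to read the modulus size from an OpenSSH "ssh-rsa" public key line; the function names and the constant are hypothetical, and this is not part of gitea or system-config.

```python
import base64
import struct

MINIMUM_RSA_BITS = 3072  # the gitea 1.21 default discussed above


def _read_field(blob, offset):
    """Read one length-prefixed field from an SSH wire-format blob."""
    (length,) = struct.unpack_from(">I", blob, offset)
    start = offset + 4
    return blob[start:start + length], start + length


def rsa_key_bits(public_key_line):
    """Return the modulus size in bits for an 'ssh-rsa AAAA... comment' line."""
    blob = base64.b64decode(public_key_line.split()[1])
    key_type, offset = _read_field(blob, 0)
    if key_type != b"ssh-rsa":
        raise ValueError("not an RSA key: %r" % key_type)
    # Fields after the type are the public exponent, then the modulus.
    _exponent, offset = _read_field(blob, offset)
    modulus, _ = _read_field(blob, offset)
    return int.from_bytes(modulus, "big").bit_length()


def needs_rotation(public_key_line):
    """True when the RSA key is shorter than the assumed minimum."""
    return rsa_key_bits(public_key_line) < MINIMUM_RSA_BITS
```

Calling needs_rotation() on the contents of a .pub file would flag a 2048-bit key while leaving a 3072-bit or larger key alone; non-RSA key types raise ValueError rather than being judged.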
clarkb#topic Gerrit Replication File Leaks19:33
* fungi pops some popcorn for this one19:34
fungiany new plot twists?19:34
clarkbI was really hopeful that this bugfix would be in place by the time we updated Gerrit to bookworm. Unfortunately I got feedback late last week for a redesign on the approach. I'm a bit frustrated by that because I am not responsible for the approach in the current code base I'm merely patching it to stop leaking files (which my change does successfully do with three test19:34
clarkbcases...)19:34
clarkbI spent Friday trying to make the suggested approach work with minimal refactoring. I'm missing something clearly important in the handling of the events because I can get it to work when replication actually does need to happen but if you filter out the project using the replication config I'm now leaking more files19:35
fungiwho knew trying to fix one bug was going to result in you being the new maintainer?19:35
clarkbone step forward two back sort of deal. The latest thing I'm attempting is getting eclipse going so that I can more easily use a debugger to understand that new behavior. Unfortunately I broke eclipse in about 30 seconds19:36
clarkbI now have like 5 different eclipse workspaces and two separate gerrit clones... I think I'm understanding eclipse better but it's been interesting19:36
clarkbIf anyone understands eclipse and wants to walk through it with me I would be grateful :)19:36
clarkbIf that doesn't go anywhere I'll fallback to jdb which is not my favorite debugger but should be able to do what I need it to do19:37
corvusSystem.out.println is my fav19:38
clarkbI'm speedrunning java developer setup19:38
fungiyou should twitch stream that19:38
tonyband we're enjoying the show19:38
tonybI'd watch!19:38
clarkbcorvus: oh that is the other issue. The test suite appears to eat the replication logs. Normal error logs are emitted though so I've thought about a mass replacement of the replication log calls with normal log calls so that I can see what is happening19:38
corvusoO19:39
clarkbI think it is writing them to a virtual in memory disk location that goes away when the test finishes19:39
clarkbsimilar to how we do it with python except they aren't capturing the contents so they can be given to you on completion19:39
fungiyeah, gerrit plugins have independent logging, right? i vaguely recall configuring that somewhere19:39
clarkbwhereas the normal log goes to stdout19:39
clarkbfungi: the replication plugin does19:39
clarkbanyway I haven't given up. I have some avenues to explore and will continue to poke at them as time permits19:40
clarkb#topic Re-evaluating the meeting time19:41
clarkbfrickler mentioned that now that ianw is no longer attending the meetings we might be able to reschedule to be friendlier to other timezones. One problem with that is I'd like to continue to encourage tonyb to participate and moving the meeting earlier may make that difficult for tonyb 19:42
clarkbthat said, it is still pretty early for tonyb  and maybe tonyb would rather catch up asynchronously anyway?19:42
clarkbthoughts?19:42
fungii'm open to different times, though i do have (rather a lot of) other meetings on tuesdays. up to 4 other meetings besides this one depending on how the weeks/months fall19:42
tonybselfishly I'd like to keep the meeting at a time I can attend 19:43
clarkbwhen frickler brought the idea up to me I think we both decided that 1700-1800 UTC would work for us19:43
fungiand some of my tuesday meetings follow daylight/summer time shifts, which add to the complexity19:43
clarkbtonyb: ok that is good to know. I assume 1900 UTC is about as early as is practical for you too?19:43
tonyband with AU having done DST it's 6am which is about as early as I can do19:43
tonybcorrect 19:44
tonybthe winter 5am block was hard for me to get to19:44
fricklerhmm, o.k., so let's stick to this time, then19:45
clarkbanother approach would be to shift the meeting so that it is more convenient for australia and europe and those of us in north america take the hit19:45
tonybwe could alternate, but that sucks for different reasons 19:45
fungii guess there are better au/de overlap times which are like 3am for some folks in the us19:45
clarkbI would need to look at a map and do some maths to figure out when ^ those times are19:45
fungican we shrink the planet, or turn it at a different speed to fix this?19:46
fungiwait, no, quantum tunneling19:46
tonybwe could all move to NZ?19:46
clarkbI hear it is nice there19:46
fungii would find that an acceptable solution as well19:47
corvusbest suggestion so far19:47
clarkbbut ya no easy solutions. Let's keep it at 19:00 for now and I can look at a map and see if anything jumps out as a good alternative while also enabling those who wish to attend time to attend19:47
fungii'm okay with alternating times too19:47
fungispread the pain around19:48
corvusfungi: you'd have to adjust your meeting orrery19:48
tonybI can do some research and dump it to an etherpad19:48
clarkbmy concern with alternating times is that there are already so few of us that we might end up with two meetings with significantly fewer people19:48
fricklerwell the EU/AU overlap would be around 6 UTC likely19:48
clarkbfrickler: that is approximately midnight where I am and 3am where fungi is so ya fungi's number above seems spot on19:49
fungi0600z is 2am local for me at the moment, right. in about a month that will change to 1am19:49
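The "look at a map and do some maths" step can be sketched with Python's zoneinfo module. The zone names below are assumptions inferred from the local times quoted in the discussion (the log itself names no cities), and the helper name is hypothetical.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Assumed zones; chosen only because they match the offsets mentioned above.
ZONES = {
    "US Pacific": "America/Los_Angeles",
    "US Eastern": "America/New_York",
    "Central Europe": "Europe/Berlin",
    "Eastern Australia": "Australia/Sydney",
}


def local_times(utc_dt, zones=ZONES):
    """Map each label to the given UTC datetime converted to that zone."""
    return {label: utc_dt.astimezone(ZoneInfo(name)) for label, name in zones.items()}


current_slot = datetime(2023, 10, 10, 19, 0, tzinfo=timezone.utc)
proposed_slot = datetime(2023, 10, 10, 6, 0, tzinfo=timezone.utc)

for slot in (current_slot, proposed_slot):
    print(slot.strftime("%Y-%m-%d %H:%M UTC"))
    for label, local in local_times(slot).items():
        print("  %-18s %s" % (label, local.strftime("%a %H:%M")))
```

Because zoneinfo applies daylight saving rules per date, running the same function for a date in mid-November shows how the proposed slot shifts for zones that have changed their clocks by then.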
clarkbclearly we should do more brainstorming. At least now we know where the interest is and can do more accurate brainstorming19:49
clarkbfeedback welcome if/when we have a great idea on how to make this better :)19:50
clarkb#topic Open Discussion19:50
clarkbI know fungi had one item to discuss before we end the meeting so jumping to open discussion now19:50
fungifolks in the #openstack-kolla channel were talking about it being maybe nice if we could start mirroring the "osbpo" (debian openstack backports) package repository19:50
clarkbfunctionally that repo is very similar to Ubuntu Cloud Archive but for Debian?19:51
frickleryes19:51
fungii realized we have a mirror.deb-openstack volume in afs which is 5 years stale, the predecessor of osbpo essentially19:51
funginone of the files in there are of any use now19:51
clarkbI have no objections to mirroring it. Maybe we can get the kolla team to modify configs for mirror.deb-openstack to write out the current content?19:51
fungisize estimates for the whole of osbpo are around 22gb at the moment, but with reprepro we'd probably only mirror a fraction of that (like we don't care about openstack victoria packages for debian stretch)19:52
fungii expect the amount of data would come out about the same as uca, that order of magnitude anyway19:52
fungiand yeah, i'll work with kolla folks on it, just making sure there were no immediate objections19:53
clarkbnone from me.19:53
fricklerif it is only for kolla, just bookworm and maybe bullseye would be enough19:53
tonybseems like a reasonable ask to me 19:53
fricklerI fear this may end up as task for me, but no objection19:53
fungifrickler: yeah, i was going to start suggesting openstack bobcat packages for bookworm and then adding on if they realize they need more19:54
fricklerack19:54
fungiokay, that's all i needed to confirm19:54
fungithanks!19:54
tonybfrickler: I'm willing to help learn if you're willing to instruct.19:55
clarkbDepending on how the next couple of days go I may try to disappear on Friday. Kids have it randomly off (teacher inservice day or something) and will attempt to take advantage. But I expect wednesday and thursday to be busy19:55
tonybassuming infra-root isn't a prerequisite 19:55
clarkbtonyb: root shouldn't be necessary for most of it19:55
fungii can also provide pointers. it's pretty straightforward since we don't need to make any changes to afs other than delete some old files out of the volume for our own sake19:55
clarkbroot may be required to delete the old files if reprepro can't be convinced to do it19:55
fungitonyb: it'll be 100% config files, maybe a root sysadmin running the initial mirror pulse without a timeout just to make sure it populates safely19:56
tonybthat was my expectation just wanted to be clear 19:56
fungibut logs are also public19:56
fungiso you can quickly see what's going wrong if something is19:56
frickleron the topic of being away, I'll be mostly offline tomorrow19:57
fungithanks for the heads up19:57
fungialso i'll be absent the first half of next week19:57
fungiall things open runs sunday through tuesday so i'll be driving saturday and wednesday19:58
clarkbfungi: have fun!19:58
tonybenjoy 19:58
fungibut i should be around again thursday/friday next week19:58
fungialso i've got some family obligations that following weekend up into wednesday of ptg week, so will have limited availability then as well19:58
clarkband that is all we have time for. Feel free to continue discussion on the mailing list or in #opendev19:59
clarkbThank you everyone for your time19:59
fungithanks clarkb!19:59
clarkb#endmeeting19:59
opendevmeetMeeting ended Tue Oct 10 19:59:49 2023 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)19:59
opendevmeetMinutes:        https://meetings.opendev.org/meetings/infra/2023/infra.2023-10-10-19.01.html19:59
opendevmeetMinutes (text): https://meetings.opendev.org/meetings/infra/2023/infra.2023-10-10-19.01.txt19:59
opendevmeetLog:            https://meetings.opendev.org/meetings/infra/2023/infra.2023-10-10-19.01.log.html19:59
tonybthanks all19:59
fricklero/20:00
