19:01:37 #startmeeting infra
19:01:37 Meeting started Tue Oct 10 19:01:37 2023 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:37 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:37 The meeting name has been set to 'infra'
19:01:47 #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/7WZKZSWIX2W3OHQFM6OZNJU54BUD4IIT/ Our Agenda
19:01:53 #topic Announcements
19:02:19 The OpenInfra PTG will take place October 23-27. Please keep this in mind when making changes to tools like etherpad and meetpad
19:02:49 #topic Mailman 3
19:02:55 and now we can dive right into our agenda
19:03:09 The plan is still to migrate lists.openstack.org on Thursday starting around 15:30 UTC
19:03:24 yesterday I asked fungi if there was anything we should do to help prepare, and it sounded like it was well in hand?
19:03:30 still on track for maintenance thursday, yes. i sent a reminder to openstack-discuss on friday
19:04:02 i'm working on the changes we'll merge as part of the maintenance and will ping infra-root once they're ready for review today or worst case tomorrow
19:04:19 i'll also finish fleshing out the maintenance plan on the etherpad
19:04:27 sounds good
19:04:35 but it's mostly a copy/paste of the earlier maintenances with timelines adjusted
19:04:53 there are a few steps to do tomorrow, including initial rsync and checking dns record ttls
19:05:17 ping me if/when I can help and I can dig into things
19:05:25 will do
19:05:48 The other mailman3 item of note is that we created a new mailing list through our automation outside of a migration. That seems to have worked as well
19:05:57 we expected it to, as our testing covers that, but it is always good to confirm
19:06:11 yes, the list owner confirmed they were able to create an account and got granted control of the settings
19:06:40 they do not yet seem to have posted anything though, empty archive
19:07:07 (i subscribed successfully when it was first created, and haven't seen any messages for it either, so no reason to think the archiving is broken)
19:07:49 ack
19:07:54 anything else mailman 3 related?
19:08:05 i have nothing
19:08:07 #topic Updating Our OpenMetal Cloud
19:08:16 #link https://etherpad.opendev.org/p/redeploying-opendevs-openmetal-cloud Notes from discussion with Yuriy
19:09:39 This was an informative discussion. There are 3 timeframes we can upgrade within, which will produce different results. If we upgrade today we'll get a newer base OS. If we upgrade in about a month we'll get the same newer base OS and the same openstack versions as before but with more robust configs. And finally if we upgrade early next year (sounded like February) we'll get
19:09:41 an even newer base OS and openstack 2023.1 or 2023.2
19:10:19 did you discuss whether they might offer debian or ubuntu as base os?
19:10:23 upgrading today would also get us more ip addresses, right?
19:10:29 given that, I think we shouldn't upgrade until the more robust openstack configs are available. That means not upgrading right away. But then we have to decide if we want to upgrade twice or just once
19:10:56 fungi: oh yes, a few more addresses, since we would reduce the total number of subnets in use, which helps with vrrp
19:11:21 frickler: yes that came up. It didn't sound like they have fully decided which base os would be used in the February update
19:11:34 frickler: but currently it's all a variation of centos
19:12:25 we don't need to decide anything right now, but I think the decision we need to make is if we want to upgrade twice (once in november and once in february) or just once, and if just once, decide if we wait for february or do it in november-ish
19:12:26 sounds like they're more comfortable with red hat based platforms from a sheer familiarity standpoint but are considering alternatives due to recent changing winds in that regard
19:12:48 I don't think there is much value in doing three upgrades, which is what an upgrade right now would mean
19:13:05 I would be fine with waiting until feb
19:13:55 considering we seem to lack time as a major resource, I think I'm leaning that way too
19:14:52 other useful bits of info: we do have to shut down the existing cloud to build a new one because we would recycle the same hardware. The cloud should be renamed from inmotion to openmetal, which is particularly important on the grafana side, so we may update that dashboard with a new name earlier
19:15:15 maybe something like "OpenMetal (formerly InMotion)" to reduce confusion
19:15:49 maybe think this over and we can put together a plan next week
19:16:10 and then relay that back to Yuriy as necessary to ensure we're all on the same page
19:16:13 worth noting, the openstack feature freeze for their next release will be at the end of february, so that will probably be the busiest time for quota usage
19:16:34 though the openmetal cloud being offline at that time wouldn't severely diminish available quota
19:16:50 it's about 10% of our quota? maybe a little less
19:16:56 right
19:16:58 probably noticeable but we should be able to limp along
19:17:04 maybe a little more after the reprovision
19:18:08 alright, please think it over during the next week and bring up any other concerns, like the release schedule timing for openstack, and we can put a rough plan in place soon
19:18:25 #topic Python Container Updates
19:19:15 The Gerrit update is complete. We are now running Gerrit on bookworm with java 17
19:19:15 I haven't noticed any problems, but please say something if you notice anything off or weird
19:19:15 In theory GC performance is much improved under java 17, so we should see Gerrit being more responsive
19:19:19 #link https://review.opendev.org/q/(+topic:bookworm-python3.11+OR+hashtag:bookworm+)status:open
19:19:50 There are some other containers that need updates though. Within OpenDev we have lodgeit and gear. I think both of those should be good to go. In lodgeit's case I had to do some python3.11 porting
19:20:09 On the zuul side I updated zuul-registry to simply move to python3.11 and stay on bullseye for now
19:20:45 My hope is that we can get everything to python3.11 and then drop python3.9 and python3.10 builds on both bullseye and bookworm. That will have a big impact on the total number of images we need to juggle
19:20:49 not as great as having just bookworm but still a big improvement
19:21:10 if we want to do lodgeit today I should be around to keep an eye on it
19:21:29 seems reasonable; i looked a bit at the zuul-registry issue, and i didn't see a quick fix. it could be a problem in the future.
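As context for the zuul-registry discussion that continues below, here is a minimal Python sketch of the resumable-hash gap (illustrative only, not zuul-registry's actual code): hashlib objects can be copied to read an intermediate digest, but their internal state cannot be serialized for a separate process to resume, which is roughly what a registry restarting mid-upload would need. Go's standard sha256 can export that state, which is where the golang idea mentioned below comes from.

```python
import hashlib
import pickle

h = hashlib.sha256()
h.update(b"first chunk of a blob upload")

# A copy yields an intermediate digest without finalizing the original object.
print(h.copy().hexdigest())

# But the hash state cannot be persisted for another process to pick up later.
try:
    pickle.dumps(h)
except TypeError as exc:
    print(f"cannot serialize hash state: {exc}")

h.update(b"second chunk")
print(h.hexdigest())
```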
19:21:35 and gear is something we don't use anymore so should be fine whenever
19:21:42 (i mean, it's a problem now; could be a bigger one later)
19:22:14 corvus: ya, I too looked at rehash and decided this would require actually understanding openssl internals, which felt like a pain. One crazy idea I had was to use golang's sha256sum implementation instead, since it too is resumable
19:22:14 (so if anyone wants to explore python/c libssl bindings, there's a good puzzle for you!)
19:22:38 good idea
19:22:41 but then you are shuffling bits around between python and go instead of python and C, and that may be more painful
19:22:52 i take it pyca/cryptography doesn't have resumable hash primitives
19:24:01 not that i have found
19:24:16 cryptography does allow you to make copies of hash objects in order to get intermediate results. But what we really need is the ability to serialize the objects in that state, and I don't know that they support this
19:24:41 right, openssl lets you get intermediate results too
19:24:49 (even in v3 i mean)
19:25:04 but i also didn't see anywhere that it allowed exporting that state
19:26:05 https://github.com/stevvooe/resumable is the golang thing I found; it is deprecated and says the stdlib can do it directly
19:26:56 it appears to do something similar to pickling the hash object and then unpickling it later
19:27:28 in any case, reviews on those outstanding changes are helpful and I'm happy to help shepherd things through with approvals once reviewed
19:27:36 #topic Etherpad 1.9.3
19:28:13 Pretty sure I mentioned this last week? But tonyb ran into similar cache related problems with 1.9.3, so I'm wary of upgrading prior to the PTG. My big concern is that people won't use etherpad until the PTG and then it won't work due to cache issues
19:28:29 For this reason I think we should defer this upgrade until after the PTG, and then people can clear caches when not in a rush to attend sessions
19:28:41 fwiw, i don't believe this is the first time i've seen persisted cookies/storage/state get confused after an etherpad upgrade
19:28:57 but i agree avoiding that right before a ton of users swamp the system is probably wise
19:29:03 +1
19:29:14 +1
19:29:28 fungi: ya I think we've hit it at least once before
19:29:36 #topic Gitea 1.21
19:29:52 #link https://review.opendev.org/c/opendev/system-config/+/897679 A test change for gitea 1.21.0-rc1
19:30:12 I went ahead and got this started since the template updates tend to be annoying and not need many updates after the first RC
19:30:43 there is still no changelog, but this change found at least one problem we will need to address: Gitea by default requires rsa keys to be 3072 bits or longer in 1.21
19:31:00 The key that trips over this is the gerrit replication key
19:31:20 In the change I've disabled key length checking, but I think we should also take a todo to rotate the key out with something larger or with another key type
19:31:23 3072 bits has been the default size ssh-keygen emits for a while now, just not as far back as when we generated that key
19:32:03 to do that we should be able to prestage the new key alongside the old key in gitea, then add it to the gerrit user's home dir and remove the old key from gerrit
19:32:19 but we need to check that the key isn't used anywhere else before removing it
19:32:29 and yeah, we could either rotate the key before upgrading, or upgrade with the restriction disabled, rotate after, and then remove the override
19:32:58 We can probably wait for the changelog to exist before deciding, just in case there is any extra info they have to share
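As an aside, auditing which deployed keys would trip the new 3072-bit minimum could look roughly like the sketch below, using pyca/cryptography; the file name is just a placeholder, and running ssh-keygen -lf on the public key reports the same bit length.

```python
from cryptography.hazmat.primitives.asymmetric import rsa
from cryptography.hazmat.primitives.serialization import load_ssh_public_key

# Placeholder path; point this at the public key gitea has on file.
with open("gerrit-replication-key.pub", "rb") as f:
    key = load_ssh_public_key(f.read())

if isinstance(key, rsa.RSAPublicKey) and key.key_size < 3072:
    print(f"RSA key is only {key.key_size} bits; rotate it or keep the override")
else:
    print("key satisfies gitea 1.21's default minimum")
```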
19:33:10 replacing the key first might be less work, but could delay the upgrade
19:33:24 and i have no real preference either way
19:33:53 #topic Gerrit Replication File Leaks
19:34:16 * fungi pops some popcorn for this one
19:34:30 any new plot twists?
19:34:52 I was really hopeful that this bugfix would be in place by the time we updated Gerrit to bookworm. Unfortunately I got feedback late last week asking for a redesign of the approach. I'm a bit frustrated by that because I am not responsible for the approach in the current code base; I'm merely patching it to stop leaking files (which my change does successfully do, with three test
19:34:54 cases...)
19:35:35 I spent Friday trying to make the suggested approach work with minimal refactoring. I'm missing something clearly important in the handling of the events, because I can get it to work when replication actually does need to happen, but if you filter out the project using the replication config I'm now leaking more files
19:35:52 who knew trying to fix one bug was going to result in you being the new maintainer?
19:36:18 one step forward, two back sort of deal. The latest thing I'm attempting is getting eclipse going so that I can more easily use a debugger to understand that new behavior. Unfortunately I broke eclipse in about 30 seconds
19:36:43 I now have like 5 different eclipse workspaces and two separate gerrit clones... I think I'm understanding eclipse better, but it's been interesting
19:36:55 If anyone understands eclipse and wants to walk through it with me I would be grateful :)
19:37:24 If that doesn't go anywhere I'll fall back to jdb, which is not my favorite debugger but should be able to do what I need it to do
19:38:02 System.out.println is my fav
19:38:09 I'm speedrunning java developer setup
19:38:33 you should twitch stream that
19:38:34 and we're enjoying the show
19:38:44 I'd watch!
19:38:46 corvus: oh, that is the other issue. The test suite appears to eat the replication logs. Normal error logs are emitted though, so I've thought about a mass replacement of the replication log calls with normal log calls so that I can see what is happening
19:39:00 oO
19:39:23 I think it is writing them to a virtual in-memory disk location that goes away when the test finishes
19:39:38 similar to how we do it with python, except they aren't capturing the contents so they can be given to you on completion
19:39:51 yeah, gerrit plugins have independent logging, right? i vaguely recall configuring that somewhere
19:39:52 whereas the normal log goes to stdout
19:39:58 fungi: the replication plugin does
19:40:51 anyway, I haven't given up. I have some avenues to explore and will continue to poke at them as time permits
19:41:13 #topic Re-evaluating the meeting time
19:42:05 frickler mentioned that now that ianw is no longer attending the meetings we might be able to reschedule to be friendlier to other timezones. One problem with that is I'd like to continue to encourage tonyb to participate, and moving the meeting earlier may make that difficult for tonyb
19:42:22 that said, it is still pretty early for tonyb and maybe tonyb would rather catch up asynchronously anyway?
19:42:30 thoughts?
19:42:43 i'm open to different times, though i do have (rather a lot of) other meetings on tuesdays; up to 4 other meetings besides this one depending on how the weeks/months fall
19:43:12 selfishly I'd like to keep the meeting at a time I can attend
19:43:19 when frickler brought the idea up to me I think we both decided that 1700-1800 UTC would work for us
19:43:22 and some of my tuesday meetings follow daylight/summer time shifts, which add to the complexity
19:43:43 tonyb: ok, that is good to know. I assume 1900 UTC is about as early as is practical for you too?
19:43:48 and with AU having done DST it's 6am, which is about as early as I can do
19:44:31 correct
19:44:53 the winter 5am block was hard for me to get to
19:45:16 hmm, o.k., so let's stick to this time, then
19:45:19 another approach would be to shift the meeting so that it is more convenient for australia and europe and those of us in north america take the hit
19:45:24 we could alternate, but that sucks for different reasons
19:45:33 i guess there are better au/de overlap times, which are like 3am for some folks in the us
19:45:43 I would need to look at a map and do some maths to figure out when ^ those times are
19:46:25 can we shrink the planet, or turn it at a different speed to fix this?
19:46:43 wait, no, quantum tunneling
19:46:49 we could all move to NZ?
19:46:56 I hear it is nice there
19:47:04 i would find that an acceptable solution as well
19:47:06 best suggestion so far
19:47:35 but ya, no easy solutions. Let's keep it at 19:00 for now, and I can look at a map and see if anything jumps out as a good alternative while still enabling those who wish to attend to do so
19:47:46 i'm okay with alternating times too
19:48:15 spread the pain around
19:48:20 fungi: you'd have to adjust your meeting orrery
19:48:30 I can do some research and dump it to an etherpad
19:48:34 my concern with alternating times is that there are already so few of us that we might end up with two meetings, each with significantly fewer people
19:48:39 well, the EU/AU overlap would likely be around 06:00 UTC
19:49:06 frickler: that is approximately midnight where I am and 3am where fungi is, so ya, fungi's number above seems spot on
19:49:28 0600z is 2am local for me at the moment, right. in about a month that will change to 1am
19:49:46 clearly we should do more brainstorming. At least now we know where the interest is and can do more accurate brainstorming
19:50:03 feedback welcome if/when we have a great idea on how to make this better :)
19:50:08 #topic Open Discussion
19:50:23 I know fungi had one item to discuss before we end the meeting, so jumping to open discussion now
19:50:48 folks in the #openstack-kolla channel were talking about it being maybe nice if we could start mirroring the "osbpo" (debian openstack backports) package repository
19:51:06 functionally that repo is very similar to Ubuntu Cloud Archive but for Debian?
19:51:12 yes
19:51:18 i realized we have a mirror.deb-openstack volume in afs which is 5 years stale; essentially the predecessor of osbpo
19:51:31 none of the files in there are of any use now
19:51:50 I have no objections to mirroring it. Maybe we can get the kolla team to modify configs for mirror.deb-openstack to write out the current content?
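For a rough idea of what the reprepro side would involve, a sketch of the two config stanzas follows; the codename, component, and URL here are placeholders and would need to be confirmed against the actual osbpo repository, and a real config would pin the repository signing key rather than trusting it blindly.

```
# conf/distributions (codename and components are assumptions)
Origin: osbpo
Label: osbpo
Codename: bookworm-bobcat-backports
Architectures: amd64 source
Components: main
Update: osbpo

# conf/updates
Name: osbpo
# Placeholder URL; substitute the real osbpo repository location.
Method: https://osbpo.example.org/debian
Components: main
# Sketch only; a real config would verify against the osbpo signing key.
VerifyRelease: blindtrust
```

Limiting the Codename list to the suites kolla actually needs is what keeps the mirrored data to a fraction of the full 22gb mentioned below.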
19:52:15 size estimates for the whole of osbpo are around 22gb at the moment, but with reprepro we'd probably only mirror a fraction of that (like we don't care about openstack victoria packages for debian stretch)
19:52:47 i expect the amount of data would come out about the same as uca, that order of magnitude anyway
19:53:10 and yeah, i'll work with kolla folks on it, just making sure there were no immediate objections
19:53:17 none from me.
19:53:24 if it is only for kolla, just bookworm and maybe bullseye would be enough
19:53:44 seems like a reasonable ask to me
19:53:58 I fear this may end up as a task for me, but no objection
19:54:13 frickler: yeah, i was going to start by suggesting openstack bobcat packages for bookworm and then add on if they realize they need more
19:54:40 ack
19:54:45 okay, that's all i needed to confirm
19:54:49 thanks!
19:55:10 frickler: I'm willing to help learn if you're willing to instruct.
19:55:22 Depending on how the next couple of days go I may try to disappear on Friday. Kids have it randomly off (teacher inservice day or something) and I will attempt to take advantage. But I expect wednesday and thursday to be busy
19:55:27 assuming infra-root isn't a prerequisite
19:55:40 tonyb: root shouldn't be necessary for most of it
19:55:42 i can also provide pointers. it's pretty straightforward since we don't need to make any changes to afs other than deleting some old files out of the volume for our own sake
19:55:52 root may be required to delete the old files if reprepro can't be convinced to do it
19:56:19 tonyb: it'll be 100% config files, maybe a root sysadmin running the initial mirror pulse without a timeout just to make sure it populates safely
19:56:46 that was my expectation, just wanted to be clear
19:56:47 but logs are also public
19:56:59 so you can quickly see what's going wrong if something is
19:57:16 on the topic of being away, I'll be mostly offline tomorrow
19:57:32 thanks for the heads up
19:57:43 also i'll be absent the first half of next week
19:58:03 all things open runs sunday through tuesday, so i'll be driving saturday and wednesday
19:58:15 fungi: have fun!
19:58:22 enjoy
19:58:23 but i should be around again thursday/friday next week
19:58:54 also i've got some family obligations that following weekend up into wednesday of ptg week, so will have limited availability then as well
19:59:40 and that is all we have time for. Feel free to continue discussion on the mailing list or in #opendev
19:59:45 Thank you everyone for your time
19:59:47 thanks clarkb!
19:59:49 #endmeeting