19:01:25 #startmeeting infra
19:01:25 Meeting started Tue Sep 19 19:01:25 2023 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:25 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:25 The meeting name has been set to 'infra'
19:01:34 #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/ZBZMOM7RXD4AXORIXJO537T3YDOJPFTW/ Our Agenda
19:01:41 #topic Announcements
19:01:56 We are deep into the OpenStack release period and OpenStack elections end tomorrow
19:02:02 go vote if you haven't already
19:03:00 #topic Mailman 3
19:03:20 fungi: the plan is still for starlingx and openinfra to migrate on thursday at ~15:30 UTC?
19:03:41 #link https://etherpad.opendev.org/p/mm3migration maintenance plan starts at line 198
19:03:44 yes
19:04:04 tomorrow i'll prime the rsyncs and remind the relevant community managers/liaisons
19:04:10 but we're on track there
19:04:18 fungi: probably want to update dns records at the same time tomorrow too?
19:04:21 (to reduce TTLs?)
19:04:33 i did the dns ttl adjustments early because one of the domains is in cloudflare
19:04:41 already crossed off the list
19:04:48 wanted to make sure i could still get to it
19:05:32 aha
19:05:32 this time we'll approve the change earlier ahead of the window so we don't end up starting late
19:05:55 #link https://review.opendev.org/895205 Move OpenInfra and StarlingX lists to Mailman 3
19:05:59 please review
19:06:17 i plan to approve it by 13:30 utc on thursday
19:06:46 sounds like a plan. Any other prep work besides reviewing that change we can help with?
19:07:05 it can technically be approved as far ahead as we like, but easier not to need to roll that back if we end up postponing for some reason
19:07:32 and avoids having the two lists of mailing lists diverge
19:07:42 though at this point I suspect we'd say please wait for the migration to complete before adding a new list
19:07:50 no remaining prep work for this window, though assuming it goes well i'll start planning for the openstack lists (15:30-19:30 utc on thursday 2023-10-12, a week after their release)
19:08:54 i've already jotted some preliminary notes for that one, and plan to split the server/config management deprovisioning to a separate change and put the old server in emergency disable to avoid any accidents
19:09:28 you can find the section for the final maintenance at the very end of the previously mentioned pad
19:09:55 sounds good, thank you for pushing this along
19:10:19 sure, looking forward to being done with it after what's been about a year of off-and-on effort from several of us
19:11:42 #topic Server Upgrades
19:11:44 nothing new here...
19:11:53 #topic Nodepool image upload situation
19:12:20 the timeout increase merged yeah?
19:12:31 yup yesterday
19:12:47 our image builds look good too. The thing we lack in the dashboard is a listing of how old images are in each cloud provider
19:13:00 but I think in a week we can do a nodepool image-list and check for any that are more than 7 days old
19:13:30 in semi-related news, cloudnull is back at rackspace and possibly has leverage/mandate to assign effort for fixing some of the problems we've observed
19:13:33 there are 6d and 13d old ones
19:14:04 frickler: the 13d images are the fedora ones
19:14:10 we'll need to clean up that dashboard to remove them I think
19:14:11 6d and 13d sounds perfect. like we'll be getting 0d and 7d tomorrow
19:14:18 oh you mean in the cloud sorry
19:14:47 all that to say early signs are this is working well enough for us again
19:15:06 but let's check back in again in a week and ensure that the complete set of changes is looking good
19:15:14 ack
19:15:43 we should probably also do another leaked upload cleanup to see if we keep getting more
19:16:42 fungi: good idea
19:16:47 yes, now would be a good time to see if the 6h timeout helps with that
19:17:09 i can try to find time for that later this week
19:18:14 thanks. Anything else nodepool related?
19:19:04 nothing i'm aware of
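For the per-provider image age check discussed in the nodepool topic above, a minimal sketch using openstacksdk (rather than the nodepool CLI) to flag uploads older than 7 days. The cloud name "example-provider" is a placeholder, and the lack of name filtering is an assumption; adjust both for the real clouds.yaml entries and nodepool image name prefixes.

```python
# Minimal sketch: flag images older than 7 days in one cloud provider.
# Assumes openstacksdk is installed and clouds.yaml has an entry named
# "example-provider" (a placeholder); nodepool-built images could be
# narrowed down further by filtering on their name prefixes.
from datetime import datetime, timedelta, timezone

import openstack

MAX_AGE = timedelta(days=7)
conn = openstack.connect(cloud="example-provider")
now = datetime.now(timezone.utc)

for image in conn.image.images():
    created = datetime.fromisoformat(image.created_at.replace("Z", "+00:00"))
    age = now - created
    if age > MAX_AGE:
        print(f"{image.name} is {age.days}d old ({image.id})")
```

Running something like this against each provider would give the per-cloud age listing the dashboard currently lacks.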
19:19:30 #topic Zuul PCRE deprecation
19:19:37 #link https://etherpad.opendev.org/p/3FbfuhNmIWT33fCFK1oK Draft announcement for OpenDev users (particularly OpenStack)
19:19:55 I think corvus was looking for feedback on this before sending it out. At this point I want to say we have chimed in, so corvus you are probably good to send that when ready?
19:20:37 note I added some comments in the etherpad, so it shouldn't be sent as is
19:22:04 thank you everyone for reviewing that
19:22:06 I also tasked kopecmartin to look at the qa projects and started doing patches for OSC myself, those are the largest batches of warnings I saw
19:22:21 tripleo-ci repo had a lot of them too last I looked
19:22:29 or some of the largest, yes
19:22:50 but tripleo was to be retired somehow? need to check the timeline for that
19:23:05 yeah, that's one to bring up with the tc
19:23:40 i wouldn't sink lots of effort digging into tripleo, we can just ask the tc how they want it handled
19:24:10 maybe it can be retired instead
19:24:27 I think that repo supports the stable branches they've kept open but in that case they should be able to fix it
19:24:32 either way we can come up with a solution
19:25:03 right. it's a question of whether they're keeping those open in light of the new "unmaintained" and opt-in or automatic eol resolution
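The warnings discussed above come from regex constructs that RE2, the engine Zuul is moving to, does not support (lookahead, lookbehind, backreferences). Below is a rough triage sketch for spotting them in a repo's Zuul configuration; the file globs are assumptions about repo layout, and this is not an official Zuul linter.

```python
# Rough sketch: list lines in Zuul config files that use PCRE-only
# constructs RE2 rejects (lookahead/lookbehind and backreferences).
# The file globs below are assumptions; adjust for the repo layout.
import re
from pathlib import Path

# matches (?=  (?!  (?<=  (?<!  and \1 .. \9 backreferences
UNSUPPORTED = re.compile(r"\(\?[=!<]|\\[1-9]")
CONFIG_GLOBS = ("zuul.yaml", ".zuul.yaml", "zuul.d/*.yaml", ".zuul.d/*.yaml")

for pattern in CONFIG_GLOBS:
    for path in Path(".").glob(pattern):
        for lineno, line in enumerate(path.read_text().splitlines(), start=1):
            if UNSUPPORTED.search(line):
                print(f"{path}:{lineno}: {line.strip()}")
```

It scans whole lines rather than just regex values, so it will produce some false positives, but it is enough to prioritize repos with large batches of warnings.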
19:25:39 #topic Python image updates
19:25:43 #link https://review.opendev.org/q/(+topic:bookworm-python3.11+OR+hashtag:bookworm+)status:open
19:26:26 I think I've decided that we should avoid the Gerrit update until after the openstack release. The two drivers for this are that I realized there are plenty of other less impactful services that we can work on in the meantime (changes for some pushed up under that link) and Gerrit made a 3.7.5 release that we should also update to at the same time
19:26:50 that means our gerrit update will be a 3.7.4->3.7.5 upgrade, java 11 -> java 17 upgrade, and bullseye -> bookworm upgrade
19:27:07 we can split those up or do them all at once, but either way enough is changing that I think we should avoid the openstack release
19:27:32 sounds reasonable to me
19:27:33 In the meantime reviews on the other changes I've pushed are appreciated and I think I can approve and monitor those while we wait
19:28:00 note I've asked kopecmartin to weigh in on the refstack update too
19:28:43 I'm hopeful that we'll be able to drop bullseye and python3.9/3.10 image builds soon enough. Then we can look at adding 3.12 builds once that release occurs
19:28:46 but one step at a time :)
19:29:28 #topic Redeploying the InMotion/OpenMetal cloud
19:29:47 I sent email to yuriy to gather some initial information on what this involves. I cc'd fungi and frickler
19:30:25 Yuriy responded, but it wasn't immediately clear to me what a redeployment would be based on if we deployed today. Yuriy did talk about how in the new year they would be on 2023.1 or 2023.2 openstack.
19:30:54 Anyway I asked for further clarification and volunteered to set up time to meet and coordinate if necessary (I think that yuriy in particular finds the more synchronous interaction beneficial)
19:30:55 so currently they would be able to deploy yoga iiuc
19:30:57 in light of that, it might make sense to delay rebuilding for a few more months
19:31:22 ya I wanted more clarification on the base OS stuff before we decide
19:31:36 but if the base OS doesn't move forward much by redeploying today then a delay may be worthwhile
19:32:01 right, that's more the deciding factor than openstack version, since we can in-place upgrade openstack
19:32:29 Other items of note: we should call the new cloud openmetal not inmotion. They seem interested in continuing to help us successfully consume this hardware and service as well. Thank you openmetal for the help
19:33:35 but we wouldn't rename the cloud before the redeploy? other than possibly in grafana?
19:34:05 frickler: I think we can if it is important to them. It is a bit of a pain to do as we have to shut down and start up a new provider in nodepool
19:34:10 yeah, i would say anything that's a significant lift in the renaming department should get wrapped into the redeployment
19:34:13 doable but if we can tie it into a redeployment that will be easiest
19:34:33 we did something similar for internap->iweb though
19:35:05 also they want to rework the networking
19:35:34 seems currently we have 6 x /28, they'd provide a single /26 instead
19:35:38 easy things to rename we can do straight away because why not? and right, if they express urgency then we can accommodate that
19:35:51 frickler: yup that should hopefully simplify things for us
19:36:14 yes, their platform previously could only allocate networks in /28 prefixes
19:36:27 but they've apparently overcome that design limitation now
19:36:35 (i wonder if they're any closer to ipv6)
19:36:40 It also sounded like we may need to do the self signed cert thing
19:36:55 kolla supports that so hopefully just a matter of setting the correct vars and having kolla rerun
19:37:17 overall though an encouraging first email trade. I'll try to keep the thread alive and drive it towards something we can make a plan off of
19:37:18 in 2023.2 kolla might support LE
19:37:48 thanks!
19:38:37 #topic Open Discussion
19:38:40 Anything else?
19:38:55 if we served the dns for the api endpoints in our opendev.org zone, we could probably wrangle our own le with our usual ansible
19:39:06 thanks for the suggestions on the email, i'll update the etherpad later
19:39:14 just missing the bit to inject that into kolla
19:39:36 just wanted to make sure we were all on board with the tone and the requested actions
19:39:47 yep, still lgtm
19:39:54 fungi: you'd just place the resulting cert file into the correct location in the kolla config
19:40:30 frickler: right, we're missing however we'd tie that file deployment into our ansible
19:40:57 i guess we'd put something in our inventory for wherever that is
19:41:34 ianw had started looking into that
19:41:49 seems like the linaro cloud could benefit from the same
19:41:50 basically a lighter weight base role application for systems where we don't want to manage firewalls and email and so on
19:42:00 yup it was the linaro cloud where he was proving this out
19:42:21 basically add our users and potentially other lightweight stuff. I could see working some of the LE provisioning into that
19:42:30 speaking of LE I wonder if we can unfork acme.sh yet
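On the missing glue for injecting a Let's Encrypt cert into kolla mentioned above, a minimal sketch of just the file-placement step, assuming the usual haproxy convention of one PEM containing the chain followed by the key. The input filenames and the /etc/kolla/certificates/haproxy.pem destination are assumptions to verify against the actual kolla-ansible settings, not a description of our ansible.

```python
# Minimal sketch: assemble the single PEM file kolla-ansible's haproxy is
# pointed at from a Let's Encrypt fullchain and private key. All paths
# here are placeholders/assumptions, not the values our ansible uses.
from pathlib import Path

LE_DIR = Path("/etc/letsencrypt-certs/cloud.example.org")  # assumed LE output dir
KOLLA_CERT = Path("/etc/kolla/certificates/haproxy.pem")   # assumed kolla default

fullchain = (LE_DIR / "fullchain.pem").read_text()
key = (LE_DIR / "privkey.pem").read_text()

KOLLA_CERT.parent.mkdir(parents=True, exist_ok=True)
KOLLA_CERT.write_text(fullchain + key)  # haproxy expects cert chain then key
KOLLA_CERT.chmod(0o600)
```

The real glue would presumably be an ansible task doing roughly this, plus setting the matching kolla TLS vars before rerunning kolla.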
19:43:20 Last call for any other items otherwise we can probably end a bit early today
19:43:59 thanks clarkb!
19:44:23 ack, thx all
19:45:06 #endmeeting