19:00:12 <clarkb> #startmeeting infra
19:00:12 <opendevmeet> Meeting started Tue Apr 16 19:00:12 2024 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:00:12 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:00:12 <opendevmeet> The meeting name has been set to 'infra'
19:00:17 <clarkb> #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/X5LH5DM5F4KOX5X2D2IGRGVJ5USFL3SZ/ Our Agenda
19:00:28 <clarkb> #topic Announcements
19:00:42 <clarkb> I wanted to call out this email fungi sent to the service-announce list
19:01:00 <clarkb> #link https://lists.opendev.org/archives/list/service-announce@lists.opendev.org/thread/4HUXROUE4ZRWLC6JFT5YF3GO6G3UULWL/
19:01:12 <fungi> i had fun pulling those numbers
19:01:15 <clarkb> Basically if you use ecdsa keys and putty you'll want to check that out, as your private key material is potentially recoverable
19:01:47 <clarkb> anything else to announce?
19:02:38 <fungi> not just any ecdsa, one specific set of nist parameters
19:02:44 <fungi> but yes
19:03:22 * fungi has nothing else to announce
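For reference, a minimal check for the affected key type, assuming the curve fungi refers to is NIST P-521 (key type string "ecdsa-sha2-nistp521" in OpenSSH-format public key files):

    #!/usr/bin/env python3
    """Flag OpenSSH-format public keys that use the ECDSA curve
    assumed to be the one named in the advisory (NIST P-521)."""
    import sys

    AFFECTED = "ecdsa-sha2-nistp521"

    for path in sys.argv[1:]:
        with open(path) as f:
            for num, line in enumerate(f, 1):
                fields = line.split()
                if fields and fields[0] == AFFECTED:
                    print(f"{path}:{num}: affected key type {AFFECTED}")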
19:03:45 <clarkb> #topic Upgrading Servers
19:04:04 <tonyb> no new progress.
19:04:06 <clarkb> I don't think there is anything new to report here, but it is worth noting that OpenStack and StarlingX have made major releases and the PTG has concluded
19:04:19 <clarkb> we should be good to start making changes with minimized impact to development cycles
19:04:37 <tonyb> I'll get back into the swing of things this week
19:04:38 <clarkb> #topic MariaDB Upgrades
19:04:41 <clarkb> ack
19:04:48 <clarkb> Etherpad, Gitea, Gerrit, and Mailman could use upgrades.
19:04:56 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/915183 Upgrade mailman3 mariadb to 10.11
19:05:01 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/911000 Upgrade etherpad mariadb to 10.11
19:05:21 <corvus> zuul runs on mariadb now and does not need upgrading :)
19:05:43 <clarkb> I believe that ansible and docker compose will automatically upgrade these db servers for us when we land these changes
19:05:47 <fungi> i think the mailman one could go in without much concern. i'm happy to monitor it as it deploys
19:06:02 <clarkb> if we don't want the upgrades to be automated we can put nodes in the emergency file and do it manually
19:06:24 <clarkb> fungi: ya I agree. Maybe we should just go ahead with that one and do similar with etherpad once we've got a better idea of what the 2.0.x upgrade path looks like?
19:06:49 <fungi> sure. we've already got one +2 on it, didn't know if anyone else wanted to look it over first
19:08:02 <clarkb> I guess any other reviewers can chime in during the meeting otherwise I think you're clear to proceed when ready
19:09:06 <fungi> thanks!
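For context, a minimal sketch of the upgrade step the deploy automation performs, assuming a compose file already pinned to mariadb:10.11; the directory path here is a hypothetical stand-in:

    #!/usr/bin/env python3
    """Pull the newly pinned image and recreate the container so the
    database server comes back up on the new version."""
    import subprocess

    COMPOSE_DIR = "/etc/mailman-compose"  # hypothetical path for illustration

    subprocess.run(["docker-compose", "pull"], cwd=COMPOSE_DIR, check=True)
    subprocess.run(["docker-compose", "up", "-d"], cwd=COMPOSE_DIR, check=True)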
19:09:10 <clarkb> #topic AFS Mirror Cleanups
19:09:35 <clarkb> Now that the PTG is over I need to start pulling up existing xenial configs in zuul to figure out a course of action for cleaning up xenial
19:09:55 <clarkb> I suspect with this one we're going to have to set a hard date and then just accept zuul errors because there are a lot of tendrils in everything
19:11:01 <clarkb> once I've got a general sense of what needs to be cleaned up I'll try to write appropriate mailing list emails for those affected and then we can set a date and aim for that
19:11:11 <fungi> may need to merge some things bypassing errors too
19:11:48 <clarkb> #topic Building Ubuntu Noble Nodes
19:11:58 <clarkb> The other related item to cleaning old ubuntu up is adding new ubuntu
19:12:16 <clarkb> frickler has started some testing of this locally and discovered two problems. The first is glean needs some small updates to be python3.12 ready
19:12:25 <clarkb> #link https://review.opendev.org/c/opendev/glean/+/915907 Glean updates for python3.12 support
19:13:01 <clarkb> The other is that debootstrap in our debian bookworm based nodepool-builder images is not new enough to build noble. I think this is "normal" and we have had to pull newer debootstrap from testing/unstable to accommodate prior ubuntu releases
19:13:13 <clarkb> frickler is there a change to do that bump yet?
19:13:30 <clarkb> I haven't seen one, if so. Adding one would probably be a good idea
19:13:40 <frickler> no, I just created the missing symlink locally for me
19:14:07 <frickler> then there's also some infra elements that need updating, I hope I can do a patch for that tomorrow
19:14:28 <clarkb> sounds good, thank you for looking into that
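A rough sketch of that local workaround, assuming the Debian-style debootstrap layout where Ubuntu suite scripts are symlinks to a shared script:

    #!/usr/bin/env python3
    """Create the 'noble' suite symlink that older debootstrap lacks."""
    import os

    scripts = "/usr/share/debootstrap/scripts"
    link = os.path.join(scripts, "noble")

    if not os.path.exists(link):
        # older debootstrap ships Ubuntu suites as symlinks to 'gutsy'
        os.symlink("gutsy", link)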
19:16:03 <clarkb> #topic review gerrit service troubles
19:16:48 <clarkb> I haven't seen any word from mnaser or guillhermesp on why the server shutdown under us. At this point maybe we file this topic away and address it if we do get that info
19:17:17 <clarkb> However, on Sunday the service stopped responding again. This time the server itself was up and running but spinning its cpus and not getting any useful work done
19:17:46 <clarkb> I believe I tracked this down to a bad client and blocked them then restarted services and things have been happy since
19:18:08 <clarkb> We also upgraded Gerrit yesterday which brought in a bug fix for a potential dos vector
19:18:38 <frickler> so that was just a single IP address?
19:18:42 <clarkb> frickler: yes
19:19:28 <clarkb> as a side note when we rebuilt the 3.8 image we also rebuilt the 3.9 image which brought in fixes for the issues I was concerned about upon upgrading. We can probably start upgrade planning and testing now
19:21:20 <clarkb> I don't think there is anything we need to do related to gerrit at this moment. I just wanted to get everyone up to date on the issues we have had and point out we're in a position to begin upgrade testing and planning
19:21:29 <clarkb> #topic Project Renaming
19:21:53 <clarkb> Before we upgrade Gerrit we have a project rename request. We pencilled in April 19th as renaming day as it happens after the PTG and that happens to be this Friday.
19:22:30 <clarkb> Do we want to proceed with an April 19 renaming? If so we need to land https://review.opendev.org/c/opendev/system-config/+/911622 (or something like it ) and prep the record keeping changes
19:22:42 <clarkb> Oh and we need to decide on a time to do that so we can send an announcement
19:23:35 <clarkb> I'm happy to shepherd that stuff along but don't want to be the only one around on Friday if we proceed
19:24:04 <fungi> sounds good to me
19:24:19 <fungi> i can be available whenever you are
19:24:45 <clarkb> fungi: ok in that case 9am pacific is good for me. I think that is 1600 UTC. Lets announce 1600-1700 UTC as the window?
19:25:21 <clarkb> and I'll dedicate a chunk of tomorrow to getting everything prepared well in advance
19:25:57 <fungi> wfm
19:26:18 <clarkb> great I'll send an announcement later today
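Double-checking that conversion with zoneinfo confirms 09:00 Pacific on the 19th falls at 16:00 UTC while daylight time is in effect:

    #!/usr/bin/env python3
    from datetime import datetime, timezone
    from zoneinfo import ZoneInfo

    local = datetime(2024, 4, 19, 9, 0, tzinfo=ZoneInfo("America/Los_Angeles"))
    print(local.astimezone(timezone.utc))  # 2024-04-19 16:00:00+00:00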
19:26:39 <clarkb> #topic Etherpad 2.0.x Upgrade
19:26:45 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/914119 WIP change for Etherpad 2.0.3
19:27:00 <clarkb> This change passes testing now which is a nice improvement.
19:27:25 <clarkb> The background on this is that Etherpad made a 2.0.0 release that largely didn't change anything user-facing and had everything to do with how you install and deploy etherpad using pnpm now
19:28:01 <clarkb> This resulted in dockerfile updates but was reasonably straightforward. Then before the PTG could end 2.0.2 was released and they removed support for APIKEY.txt based authentication and moved everything to oauth2.0
19:28:26 <fungi> so much for semver ;)
19:28:29 <clarkb> I filed a bug asking them to document how you can use the api like before and the result of that was new functionality in the oauth2 server to support client_credentials grants
19:28:55 <clarkb> The reason why that change above is a WIP is that this update (which does work for our purposes) is not in a release yet. I suspect that release will be 2.0.3 or 2.1.0
19:29:15 <clarkb> I also updated testing to cover the documented api tasks that we perform to ensure we can perform them via the new auth process
19:29:19 <clarkb> and I updated the docs
19:29:38 <clarkb> I do think this change is ready for review. I hope that when the release happens I can update the git checkouts as the only updates to the change and we can upgrade
19:29:54 <fungi> thanks for solving that!
19:30:26 <clarkb> Given we don't know when that release will happen I think we can probably try to do the mariadb upgrade before we upgrade etherpad
19:30:46 <clarkb> I'll try to find time to watch that upgrade if no one has objections
19:30:49 <fungi> wfm
19:31:59 <clarkb> it was a fun one. I had to rtfs to find the api endpoints because even after the docs updates, details like that were not mentioned. Then spent time reading the oauth2.0 rfc to figure out the client_credentials request flow
19:32:14 <clarkb> it's actually fairly straightforward once you have that info; the hard part was discovering all the breadcrumbs myself
19:32:15 <fungi> use the source, luke
19:32:49 <clarkb> I'll get a held node up soon that we can use to test normal functionality hasn't regressed as well
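For reference, a sketch of the client_credentials flow described above; the token and API endpoint paths here are illustrative assumptions, not confirmed Etherpad 2.0.x URLs, and the credentials are placeholders:

    #!/usr/bin/env python3
    """RFC 6749 section 4.4 client_credentials grant, then a bearer-
    authenticated API call in place of the old APIKEY.txt scheme."""
    import requests

    BASE = "https://etherpad.example.org"  # hypothetical server
    CLIENT_ID = "api-client"               # hypothetical credentials
    CLIENT_SECRET = "s3cret"

    # Exchange client credentials for an access token.
    resp = requests.post(f"{BASE}/oidc/token",  # assumed token endpoint
                         data={"grant_type": "client_credentials",
                               "client_id": CLIENT_ID,
                               "client_secret": CLIENT_SECRET})
    resp.raise_for_status()
    token = resp.json()["access_token"]

    # Call the HTTP API with the bearer token instead of an api key.
    pads = requests.get(f"{BASE}/api/1.2.15/listAllPads",  # illustrative endpoint
                        headers={"Authorization": f"Bearer {token}"})
    print(pads.json())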
19:33:46 <clarkb> #topic Gitea 1.21.11 Upgrade
19:33:53 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/916004 Upgrade Gitea to 1.21.11
19:34:02 <clarkb> Gitea made a release overnight
19:34:11 <clarkb> there are bug fixes in there that we should probably consider upgrading for
19:34:35 <clarkb> The templates we override did not change so we don't have any template updates either
19:35:20 <frickler> anything that might be related to the missing tags?
19:35:34 <clarkb> frickler: unfortunately I didn't see anything that looks related to that
19:36:41 <clarkb> #topic Open Discussion
19:36:43 <clarkb> Anything else?
19:37:27 <corvus> as alluded to earlier, zuul-db01 is running mariadb now and zuul is using that as its db
19:37:43 <clarkb> corvus: and the host is out of the emergency file?
19:37:46 <corvus> i've removed it from emergency, so we should consider it back in normal service
19:37:52 <clarkb> thanks!
19:38:05 <clarkb> the web ui works for me so this seems to be happy
19:38:14 <clarkb> corvus: we should probably plan cleanup of the trove db at some point?
19:38:18 <corvus> i did leave the mysql 8 files on disk; maybe we'll delete them next weekend/week?
19:38:31 <fungi> sounds good
19:38:35 <corvus> yes, maybe do both mysql 8 and trove cleanups at the same time?
19:38:41 <clarkb> wfm
19:39:19 <corvus> how about we action me on that, and i'll try to do it after fri and before tues?
19:39:28 <fungi> i think i still need to clean up the old keycloak server too
19:40:14 <clarkb> #action corvus cleanup the zuul trove and mysql dbs
19:41:38 <clarkb> fungi: did you want an action on that too?
19:41:49 <fungi> sure
19:41:55 <frickler> /var/mariadb looks pretty full
19:42:05 <clarkb> #action fungi cleanup old keycloak server
19:42:42 <fungi> speaking of cleanup, openinfra foundation staff are looking at moving the openstack.org dns hosting from rackspace's dns service to cloudflare where they have the openinfra.dev domain hosted. we don't rely on it for much any more, but probably worth talking through. i was at least going to see about deleting any old records of ours we no longer need so we have a better idea of what's still in there
19:42:43 <frickler> just looking at cacti
19:43:04 <clarkb> frickler: agreed. The volume is mounted via lvm though so in theory we can add another and grow it. Or add a bigger one, grow, then remove the old smaller one
19:43:24 <clarkb> fungi: ++ deleting old records is a great idea
19:43:27 <corvus> well, removing the mysql 8 data will give half the used space back
19:43:32 <corvus> if we're talking about zuul-db01
19:43:33 <clarkb> corvus: oh I see ++
19:43:59 <fungi> we can also pvmove extents from a smaller pv to a larger one in the same vg, if it becomes necessary
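A sketch of that grow-then-retire sequence, with hypothetical device and volume group names; each step would normally be run by hand on the server:

    #!/usr/bin/env python3
    """Migrate LVM extents from a small PV to a larger one."""
    import subprocess

    VG, OLD_PV, NEW_PV = "main", "/dev/xvdb", "/dev/xvdc"  # assumed names

    for cmd in (["pvcreate", NEW_PV],        # label the new, larger disk
                ["vgextend", VG, NEW_PV],    # add it to the volume group
                ["pvmove", OLD_PV, NEW_PV],  # move extents off the old PV
                ["vgreduce", VG, OLD_PV],    # drop the old PV from the VG
                ["pvremove", OLD_PV]):       # and remove its LVM label
        subprocess.run(cmd, check=True)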
19:44:07 <clarkb> fungi: I also mentioned to them that ianw wrote a tool to dump the rax zones in zonefile format which I offered to provide them to simplify the move
19:45:18 <clarkb> Probably also worth noting that meetpad seems to have done well during the PTG. There were some reports of mic problems that apparently don't happen in other tools and some indicated they couldn't connect. But on the whole it worked well and performance was reasonable from what I could see
19:46:29 <fungi> some users also found the built-in noise cancellation option helpful to turn on, if their own mics didn't do a good enough job of it
19:46:37 <frickler> yes, not sure still what happened in the openeuler session
19:46:55 <clarkb> last call on any other items. Otherwise I'll end the meeting about 12 minutes early.
19:47:06 <fungi> frickler: diablo_rojo was going to check in with them to get details
19:47:54 <fungi> the edge wg session(s) used the recording feature, which seems to have worked out
19:48:19 <fungi> not sure if any other tracks recorded anything
19:48:29 <clarkb> I haven't seen any if so
19:49:25 <clarkb> sounds like that may be all. Thank you everyone! We'll be back next week at the same time and location.
19:49:29 <clarkb> #endmeeting