19:01:02 <clarkb> #startmeeting infra
19:01:02 <opendevmeet> Meeting started Tue Apr 25 19:01:02 2023 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:02 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:02 <opendevmeet> The meeting name has been set to 'infra'
19:01:10 <clarkb> #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/UT4QRPDF3HXQNR3PTIJZR5MIP5WPOTAW/ Our Agenda
19:01:17 <clarkb> #topic Announcements
19:01:23 <clarkb> I don't have any announcements this week
19:01:41 <clarkb> #topic Topics
19:01:43 <clarkb> we can dive right in
19:01:52 <clarkb> #topic Migrating to Quay.io
19:02:12 <clarkb> A fair bit has happened on this since we last met
19:02:27 <clarkb> In the Zuul world changes are up to convert all of the zuul image publications to quay.io instead of docker hub
19:02:45 <clarkb> I think they all pass CI now too so Zuul should move soon and we'll want to update all our docker-compose.yaml files to match when that happens
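When that swap happens the consumer-side edit is small; a minimal sketch, assuming an image moving from docker.io to a quay.io org (names and paths illustrative, not the exact commands used):

    # point the docker-compose.yaml image reference at the new registry
    sed -i 's|docker.io/zuul/zuul-scheduler|quay.io/zuul-ci/zuul-scheduler|' docker-compose.yaml
    # then redeploy from the new home
    docker-compose pull && docker-compose up -d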
19:03:37 <clarkb> Piggybacking off of the hard work done in Zuul I've started doing some of the work for opendev as well. In particular I have copied our images from docker hub to quay.io to pull in old content. Most of our images only tag latest so for most things this isn't super necessary, but other images (like Gerrit) do tag versions.
19:03:48 <corvus> technically the zuul-operator doesn't pass, but i think the current issue is bitrot and there's only a small chance there's a container image issue lurking there.
19:04:10 <clarkb> I did skip four images that I don't plan to transfer over. opendevorg/bazel, opendevorg/grafana, opendevorg/jitsi-meet-prosody, and opendevorg/jitsi-meet-web since none of these are images we use today
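A minimal sketch of that copy step using skopeo, assuming push access to the quay.io org (image name and tag illustrative):

    # log in once with credentials that can push to the target org
    skopeo login quay.io
    # copy one tag, including all architectures in the manifest list
    skopeo copy --all docker://docker.io/opendevorg/gerrit:3.7 \
        docker://quay.io/opendevorg/gerrit:3.7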
19:04:27 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/881285 WIP change to convert an image to quay.io using new jobs.
19:04:40 <fungi> sounds good to me, thanks for working through that
19:04:46 <clarkb> This change is WIP and open for feedback. corvus already had some feedback which i need to address
19:05:25 <clarkb> corvus: did you see my message in #opendev earlier today? I'm tempted to update ensure-quay-repo to check for a registry-type flag on each image in container_images and if it is set to quay then the role would act on it otherwise it would skip. Then we can run the role in the opendev base job
19:05:56 <clarkb> the major downside to this is that opendev quay users would need to include the api_token. I suppose I can skip if that isn't defined either
19:06:30 <corvus> clarkb: ah missed that message, but i like the idea.  also, checking for the api_token is probably a good idea; i don't think we want to *require* the role :)
19:06:45 <clarkb> ok, I can work on that and that will result in changes to 881285 (and a new change)
19:06:55 <clarkb> I did test ensure-quay-repo this morning and it worked for me locally outside of zuul
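For context, repository creation in ensure-quay-repo goes through the Quay REST API; a rough shell equivalent of what the role might do per image, assuming the v1 repository endpoint (the endpoint details, token handling, and names here are assumptions):

    # per the gating discussed above, this would be skipped for images
    # without registry-type: quay, or when no api_token is defined
    curl -X POST https://quay.io/api/v1/repository \
        -H "Authorization: Bearer $QUAY_API_TOKEN" \
        -H "Content-Type: application/json" \
        -d '{"namespace": "opendevorg", "repository": "gerrit", "visibility": "public", "description": ""}'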
19:07:34 <clarkb> The other thing to keep in mind for opendev image updates is that we should resync the images from docker hub to quay.io just before we update the zuul jobs to build the image. That way we have the latest and greatest on both sides before the publication method changes
19:08:06 <clarkb> And I don't think we should try to do all images at once. It will be too much churn and potential broken stuff to debug. We should go an image or two at a time and ensure that we're deploying the image from its new home afterwards
19:08:38 <ianw> ++
19:08:51 <clarkb> I'll try to pick this back up again later today. I think it is realistic that at least some of our images are published to quay.io automatically before our next meeting
19:09:15 <clarkb> any other questions/concerns/comments on this effort?
19:10:42 <clarkb> #topic Bastion Host Updates
19:11:08 <clarkb> I think the backups changes still need reviews? Is any other infra root willing/able to look at those? I think there is value here but also getting it wrong is potentially dangerous so review is valuable
19:11:38 <ianw> yeah it only makes sense if we have enough people who want to hold bits of it
19:12:04 <fungi> i can, keep meaning to
19:12:11 <clarkb> thanks.
19:12:21 <fungi> what's the review topic again?
19:12:31 <clarkb> #link https://review.opendev.org/q/topic:bridge-backups
19:12:39 <fungi> thanks
19:13:03 <clarkb> Anything else bastion related?
19:13:54 <ianw> not from me
19:14:07 <clarkb> #topic Mailman 3
19:14:17 <clarkb> fungi: any progress with the held node?
19:14:42 <fungi> sorry, nothing to report yet. i've started playing with a new held node, but spent the past week catching up on urgent things that piled up while i was on vacation
19:15:11 <clarkb> #topic Gerrit Updates
19:15:32 <clarkb> before more important Gerrit items I want to note we are still running the most recent proper release :) but gerrit 3.8 seems imminent
19:15:42 <clarkb> #link https://review.opendev.org/c/openstack/project-config/+/879906 Gerrit also normalized indentation of config files; we should consider this change to keep ours in sync
19:16:12 <clarkb> fungi: ^ is a WIP change. We had talked about modifying the normalizer to insert a comment to explain the format to users. Is that the main missing piece before we unwip this change?
19:16:43 <fungi> yeah, i guess so
19:17:01 <ianw> i did propose something for that
19:17:30 <clarkb> #link https://review.opendev.org/c/openstack/project-config/+/880898/ better document gerrit acl config format
19:17:30 <ianw> #link https://review.opendev.org/c/openstack/project-config/+/880898
19:17:49 <fungi> also can't recall if i mentioned it already, but i saw something recently in the gerrit upstream matrix channel about changes to how links are done with ui plugins in 3.8, so we'll need to make sure we test our gitea integration
19:17:53 <ianw> that i think helps generally.  if something is wrong, it should give you a clue what it is without having to read the source
19:18:05 <clarkb> looks like the stack needs a rebase too?
19:18:16 <ianw> oh something might have merged, i can rebase
19:18:27 <clarkb> thanks i think we can remove the wip from the parent change at that point too
19:18:38 <clarkb> then it is just a matter of reviewing and applying the changes
19:18:49 <fungi> it'll need a rebase for every single change that touches or adds an acl, so we probably need to just make a decision to merge, then rebase, then review while other changes are on hold
19:19:03 <clarkb> fungi: yes, that should be testable from our test instances since the links will send you to valid opendev.org links
19:19:09 <clarkb> fungi: something worth checking when we do upgrade testing though
19:20:02 <ianw> fungi: you're also still wip on it
19:20:17 <ianw> oh sorry, clarkb mentioned that
19:20:26 <clarkb> The other Gerrit thing to call out is cleaning up 3.6 images and updating our upgrade job
19:20:40 <fungi> ianw: yeah, i had proposed it as a straw man seeking folks to convince me the benefits outweighed the negatives
19:20:46 <clarkb> I put this on the list of things that new contributors could do, but I think we should probably tackle this sooner than later. I can probably poke at this later this week
19:21:27 <fungi> i guess i'm convinced enough to un-wip since others see that balance tipping toward benefit
19:21:30 <ianw> ++ i might have some time too
19:21:52 <clarkb> fungi: ya having cleaner diffs in the future is my motivation and that seems worthwhile to me
19:22:16 <clarkb> The last Gerrit item I have is the replication plugin leaking all over the review_site/data dir
19:22:31 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/880672 Dealing with leaked replication tasks on disk
19:22:47 <clarkb> This is where I started digging into the problem and while that will help a bit it won't fully solve the issue
19:23:17 <clarkb> I ended up reading a fair bit of the plugin code and that resulted in two upstream bugs
19:23:23 <clarkb> #link https://bugs.chromium.org/p/gerrit/issues/detail?id=16867
19:24:14 <clarkb> This first issue is related to trying to replicate All-Projects and All-Users. I'm pretty sure that what happens here is the replication plugin checks whether it should replicate those repos after creating the waiting/ file on disk, but then short-circuits when it shouldn't replicate, without sending the finished event that cleans things up
19:24:21 <clarkb> #link https://bugs.chromium.org/p/gerrit/issues/detail?id=16868
19:25:47 <clarkb> This second one is related to the plugin filtering out refs that it shouldn't replicate on changes it would otherwise replicate. For example, you update a change via the web ui and this updates three refs. Two of the three should be replicated but not the third. I think this creates a new file on disk that gets properly managed and deleted, orphaning the original with all three refs
19:26:22 <clarkb> I think this situation is also related to the massive number of errors we get on startup. In those cases I suspect that we are actually filtering all refs for replication and it confuses the state machine
19:26:46 <clarkb> My workaround change addresses the first issue and some of the second issue.
19:27:07 <clarkb> I think it is still worth landing to reduce mental overhead but it likely won't completely solve the noise problem on startup.
19:27:30 <clarkb> I am hoping someone upstream can chime in on whether or not my ideas for fixing these problems are sound but I haven't seen any responses yet
19:28:02 <clarkb> And after doing all that I'm reasonably sure that deleting the files is completely fine and that the only harm here is the leaking on disk
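Given that conclusion, a hedged sketch of what cleanup could look like, assuming the leaked task files live under the plugin's waiting/ directory in review_site/data (the exact path may differ by plugin version):

    # list replication task files older than a day before touching anything
    find /home/gerrit2/review_site/data/replication/ref-updates/waiting \
        -type f -mtime +1 -print
    # once satisfied with the list, append -delete to remove them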
19:28:11 <ianw> i'm guessing there's not a ton of activity on that plugin ... you might be the expert now :)
19:28:52 <clarkb> ya if I can find some time to set up a test instance (I'll likely hold one launched by zuul) and recompile etc I will try to fix it
19:29:02 <clarkb> just lots of other stuff going on now that I'm really certain this is almost harmless :)
19:29:05 <fungi> and also the people who do have an interest in it are mostly interested in privileged gerrit-to-gerrit replication
19:29:28 <clarkb> fungi: yup that was something I realized in debugging this. Seems like very few people use the plugin with the restrictions we have
19:29:44 <clarkb> That was all I had for Gerrit. Anything else?
19:31:04 <clarkb> #topic Upgrading Servers
19:31:22 <clarkb> Etherpad is now running on the new server. The move went well and it stuck to the expected timeline
19:31:40 <clarkb> We have since removed etherpad01 from our inventory and dns. I shutdown the server via the nova api too.
19:31:53 <clarkb> At this point we're probably comfortable deleting the server and its cinder volume?
19:32:01 <fungi> fine by me, yeah
19:32:50 <ianw> me too, new server lgtm
19:32:58 <ianw> and we double checked the backups
19:32:59 <fungi> i've been using it heavily, no problems
19:33:02 <clarkb> great I'll try to get that done (probably tomorrow)
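For the record, the remaining cleanup is a couple of openstack CLI calls; a sketch (the volume name here is illustrative):

    # confirm it is the old server and already shut off first
    openstack server show etherpad01.opendev.org -c name -c status
    openstack server delete etherpad01.opendev.org
    # then the cinder volume that was attached to it
    openstack volume delete etherpad01-data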
19:33:16 <clarkb> Nameserver replacement is also in progress
19:33:28 <clarkb> Looks like the change to deploy the new servers (but not use them as resolvers yet) has landed
19:33:34 <clarkb> #link https://etherpad.opendev.org/p/2023-opendev-dns
19:33:39 <ianw> yep they should be active today, i'll keep ontop of it
19:34:19 <clarkb> ianw: is there anything else you need at this point?
19:34:39 <ianw> nope, i'll let people know when we can change registry records
19:34:46 <clarkb> thanks!
19:35:11 <ianw> i am assuming gating.dev and zuul* on gandi are corvus?
19:35:12 <clarkb> I doubt I'll have time to take on any replacements this week, but I'd like to grab another one or two and push on them next week. Maybe jitsi meet or mirrors or something
19:35:45 <clarkb> ianw: I think so
19:35:46 <ianw> actually zuul-ci.org is different to zuulci.org
19:35:57 <fungi> the zuulci.org domain is likely the foundation
19:36:02 <ianw> at least the registrar is
19:36:14 <corvus> yep
19:36:52 <fungi> yeah, zuulci.org is using csc global for its registrar, which is who the foundation goes through
19:37:33 <fungi> any whois for our domains with "Registrar: CSC Corporate Domains, Inc." is pretty much guaranteed to be foundation staff coordinated
19:37:47 <clarkb> I did give the foundation registrar managers a heads up this was going to happen soon too. They are ready and willing to help us when the time comes
19:38:16 <fungi> having a clear list of the domains we want them to update will be key, of course
19:38:28 <clarkb> fungi: sounds like opendev.org and zuulci.org
19:38:45 <clarkb> gating.dev and zuul-ci.org are the other two domains hosted by our nameservers and corvus has control of those
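Once the registrar updates land, the new delegation can be checked from outside; a sketch using dig, assuming the new servers are named ns03/ns04.opendev.org:

    # the NS set as seen through public resolution
    dig +short NS opendev.org
    # confirm a new server answers authoritatively for the zone
    dig +norecurse SOA opendev.org @ns03.opendev.org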
19:39:11 <clarkb> Anything else related to nameservers or replacing servers generally?
19:40:43 <ianw> nope
19:40:51 <clarkb> #topic AFS volume quotas
19:41:00 <clarkb> Good news! The usage seems to have stabilized?
19:41:08 <clarkb> it's still quite high but I don't think it has budged since last week
19:41:18 <fungi> the universe has changed in your favor, collect three tokens
19:41:39 <ianw> i hope to get back to wheel cleanup
19:41:47 <ianw> and fedora upgrades
19:42:02 <clarkb> sounds good and ya this continues to not be an emergency but I'm trying to keep an eye on it
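For anyone spot-checking, per-volume usage is visible with the OpenAFS fs command; a sketch with an illustrative mirror path:

    # prints Volume Name, Quota, Used, %Used, Partition for the volume
    fs listquota /afs/openstack.org/mirror/wheel/debian-11-x86_64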
19:43:02 <clarkb> #topic Gitea 1.19
19:43:32 <clarkb> At this point I'm happy waiting for 1.19.2. One of ianw's bugs has been fixed and will be included in 1.19.2. The other is marked as should be fixed for 1.19.2 but hasn't been fixed yet
19:43:50 <clarkb> There are no critical features we are missing. 1.19 was largely cut to add their actions implementation from what I can see
19:44:02 <clarkb> If that changes we may accelerate and deploy 1.19.1 before .2
19:44:15 <clarkb> #topic Quo vadis Storyboard
19:44:18 <fungi> as an aside, are we any closer to being concerned about needing to switch to forgejo, or has that teapot stopped tempesting?
19:44:24 <clarkb> #undo
19:44:24 <opendevmeet> Removing item from minutes: #topic Quo vadis Storyboard
19:44:54 <clarkb> fungi: no, as far as I can tell upstream development is still happening in the open via loosely organized maintainers that write down their release goals at the beginning of their dev cycles
19:45:07 <clarkb> No features have been removed. Licensing has remained the same. etc
19:45:24 <fungi> any idea if forgejo has hard-forked or is continuing to pull new commits from gitea?
19:45:43 <clarkb> I think their intention is to avoid a hard fork
19:45:54 <fungi> just wondering if it's diverging substantially over time, or sitting stagnant
19:46:13 <clarkb> from what I've seen (and I'm totally not an expert) it's basically the same code base with a different name
19:46:28 <fungi> the less it becomes like gitea, the harder it will be to switch if we decide we want to
19:46:36 <fungi> hence my curiosity
19:47:21 <fungi> anyway, i can read up on it, just didn't know if anyone had seen any news on that front
19:48:04 <clarkb> looking at their release notes they seem to be making intermediate releases between gitea releases but following gitea releases overall
19:48:22 <clarkb> and ya double checking on that is probably reasonable. I just haven't seen anything indicating major differences yet
19:48:28 <clarkb> #topic Pruning vexxhost backup server
19:48:49 <clarkb> We've gotten a few emails about this recently. We are at 92% of capacity. I suspect adding some of these new servers hasn't helped either
19:49:37 <clarkb> I can probably run the pruning script tomorrow
19:49:46 <ianw> i think we're about monthly on pruning that, so it probably tracks
19:49:47 <clarkb> and then maybe we check relative disk usage compared to prior prunes
19:49:51 <clarkb> ah ok
19:50:44 <ianw> i can run it and watch it
19:50:59 <clarkb> ianw: if you like. I'm happy to do it tomorrow too (I just have enough on my plate for today)
19:51:22 <ianw> it would be interesting to track individual backup sizes
19:51:27 <clarkb> And with that we've made it to the end of our agenda (actually this last topic was added by me at the last minute)
19:51:29 <fungi> yeah, i can also do it in the background tomorrow probably. i guess whoever gets to it first just let everyone else know ;)
19:51:34 <clarkb> ++
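The prune script presumably wraps borg prune; a generic sketch of the kind of retention pass involved (repo path and keep counts are illustrative, not our actual policy):

    # dry-run first to see what would be removed, then re-run without it
    borg prune --dry-run --list \
        --keep-daily 7 --keep-weekly 4 --keep-monthly 6 \
        /opt/backups/borg/etherpad01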
19:51:37 <clarkb> #topic Open Discussion
19:51:39 <clarkb> Anything else?
19:52:10 <frickler> did you skip sb intentionally? not much to say probably anyway
19:52:30 <frickler> also I'll be away for 3 weeks starting thursday
19:52:32 <clarkb> frickler: oh you know what, I undid it and then forgot it
19:52:35 <clarkb> that's on me
19:52:47 <ianw> (backup prune running in a screen now)
19:53:12 <clarkb> frickler: fungi: re sb I did want to ask about recent discussion around that. Is that mostly around the logistics of reflecting changes in projects within our project-config repo?
19:53:17 <fungi> ianw: noted, thanks!
19:54:10 <frickler> clarkb: the latest was just about removing it for retired projects
19:54:29 <fungi> clarkb: sb's database schema supports marking projects as "inactive" in order to not present them as autocompletion options and stuff, so i've been trying to remember to set that when projects retire, but should do an audit to see if any have been missed
19:54:50 <clarkb> got it. I guess it is a good idea to mark both retired projects that way as well as those that move elsewhere
19:55:01 <fungi> also removing "use-storyboard" and "groups" entries from retired projects in our gerrit/projects.yaml file if there are any
19:55:22 <frickler> another short note: the wheel builds are still broken, I think ianw has some patches up for that?
19:55:48 <frickler> current issue seems to be afs on centos once again
19:56:09 <fungi> on a similar note, there was at least one dib change to fix openeuler image builds, do we need a new dib release for that before we turn them back on, or did it already happen?
19:56:11 <ianw> i do have something up that will release the builds separately.  i'll try to take a look at the build failures
19:56:31 <frickler> fungi: another dib release will be needed iirc
19:56:31 <ianw> we would need a dib release and nodepool rebuild.  i can look at that
19:56:51 <ianw> i don't think we have anything else in the queue for dib
19:57:33 <ianw> if centos is failing that often means our images are out of sync and have different kernel versions to the headers available from the mirror
19:59:24 <frickler> oh, there's also the pypi org thing
19:59:39 <frickler> do we want to register at least an opendev org there?
19:59:49 <clarkb> I'll admit I haven't read it yet.
19:59:56 <clarkb> The trusted publisher thing kinda made me grumpy
20:00:43 <fungi> related, i registered an "opendev.org" user on pypi ("opendev" was already taken) in case we eventually want to replace the openstackci account there
20:01:03 <clarkb> we are officially at time
20:01:10 <fungi> thanks clarkb!
20:01:17 <clarkb> I'll end the meeting here but feel free to continue discussion in #opendev or on the mailing list
20:01:19 <clarkb> #endmeeting