19:01:02 <clarkb> #startmeeting infra
19:01:02 <opendevmeet> Meeting started Tue May  9 19:01:02 2023 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:02 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:02 <opendevmeet> The meeting name has been set to 'infra'
19:01:09 <clarkb> #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/BQ5T6VULIAYPCN6LPWSEMA4XITIXTSZB/ Our Agenda
19:01:24 <clarkb> I didn't have any announcements. We can dive right into our topics
19:01:28 <clarkb> #topic Migrating to Quay
19:01:38 <clarkb> A bunch of progress has been made on this since ~Friday
19:01:44 <clarkb> #link https://etherpad.opendev.org/p/opendev-quay-migration-2023 Plan/TODO list
19:01:54 <clarkb> I wrote this plan / todo list document to keep everything straight
19:02:19 <clarkb> since then I've pushed like 25 changes and a good number of them have landed. At this point I think about 12/34 images are being pushed and pulled from quay.io
19:02:39 <clarkb> That said, late yesterday I realized we were pushing change tags to quay.io which we didn't want to do
19:02:43 <clarkb> There are two potential fixes for this
19:02:49 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/882628 Don't run the upload role at all
19:03:07 <clarkb> In this one we modify our jobs to not run the upload role in the gate at all (the upload role in the gate is what is pushing those change tags)
19:03:13 <clarkb> #link https://review.opendev.org/c/zuul/zuul-jobs/+/882724 Upload role skip the push
19:03:28 <clarkb> In this option we modify the upload role to check the promotion-behavior flag and only push when the method that actually needs the push is selected (sketched below)
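For context, option two amounts to a conditional in the upload role roughly like the following. This is a minimal sketch only; the variable and tag names are assumptions, not the actual zuul-jobs code:

```yaml
# Hypothetical guard for the upload role: only push the per-change tag
# when the tag-based promote method (the one that consumes it) is in use.
- name: Push per-change tag to the registry
  command: "docker push {{ image.repository }}:change_{{ zuul.change }}"
  when: promote_method | default('tag') == 'tag'
```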
19:04:04 <clarkb> corvus: likes 882724 more. I do as well, as it simplifies reorganizing all of the jobs as we go through this process. The changes to swap image publication over to quay.io are pretty minimal with this approach which is nice
19:04:42 <clarkb> If you have time to review both of these changes and weigh in that would be great. If we get one of them landed I can continue with my progress in migrating things (depending on which option we choose I will need to do different things, so I don't want to move ahead right now)
19:04:53 <ianw> the only thing about that is all the documentation now says "use build in check and gate" and then promote for the intermediate registry (IR) path
19:05:21 <ianw> except then the main example of it doesn't do that :)
19:05:51 <clarkb> ianw: ya and I think we could possibly switch it over to that after the migration too. I'm mostly concerned that the amount of stuff to change to swap the images increases quite a bit if we go that route now. But both approaches should work and I'm only preferring one for the amount of work it creates :)
19:06:27 <clarkb> but ya if we prefer the more explicit approach that's fine. I will just need to modify some changes and land the fixup change that ianw started
19:06:28 <corvus> i think things changed once we accepted the idea of being able to switch the promote mechanism with a variable
19:07:13 <corvus> i'm not sure it makes as much sense to switch that with a variable and then have to change the job structure too...
19:08:09 <clarkb> ya though it may be confusing to have two different jobs that behave identically if the flag is set
19:08:22 <clarkb> I can really go either way on this.
19:09:01 <clarkb> but making a decision between one approach and the other is the next step in migrating so that we don't create more of a cleanup backlog. I can also do the cleanup once this is sorted out. It's only a few extra tags that can be removed by hand
19:09:12 <corvus> arguably, we should probably not have the role switch but instead have the job switch which role it includes... but this is sprawling and complex enough that putting the logic in the role seems like a reasonable concession.
19:10:01 <clarkb> maybe ianw and fungi can weigh in on the two changes and we can take it from there?
19:10:11 <fungi> sounds good
19:10:17 <clarkb> I'll use grafyaml's image as the confirmation whatever choice we make is functional
19:10:21 <clarkb> it is next in my list
19:11:22 <clarkb> anything else related to the container images moving to quay?
19:13:03 <clarkb> #topic Bridge Updates
19:13:08 <clarkb> #link https://review.opendev.org/q/topic:bridge-backups
19:13:20 <clarkb> This topic could still use an additional infra-root to sanity check it
19:13:36 <clarkb> Other than that I'm not aware of any changes to bridge. It seems to be happy
19:14:31 <clarkb> #topic Mailman 3
19:14:39 <clarkb> fungi: looks like you have some updates on this one
19:15:33 <fungi> nothing exciting yet. new held node at 23.253.160.97 and the list filtering per-domain is working correctly by default, but there is a singular list domain name which gets displayed in the corner of the hyperkitty archive pages
19:16:03 <fungi> think i've figured out how that's piped in through the settings, though it also impacts the domain name used for admin communications the server sends out
19:16:16 <clarkb> yes I don't think that is going to be configurable unfortunately
19:16:40 <fungi> the current stack of changes tries to turn that default domain name into one which isn't one of the list domains, but i missed that setting i just found so it needs another revision
19:16:44 <clarkb> or at least not with current mailman 3. It has all of the info it needs to properly configure that though it would require updates to mailman to do so
19:17:36 <fungi> well, or it needs more advanced django site partitioning, below the mailman services layer
19:17:50 <clarkb> I think that template is in mailman itself
19:18:14 <fungi> postorius and hyperkitty are just delegating to django for that stuff, and this is bubbling up from django site info in settings and database
19:18:42 <fungi> so creating the sites in django and having per-site settings files would probably work around it
19:19:07 <fungi> but yes, using a singular settings.py is part of the problem, i think
19:19:34 <clarkb> mailman/hyperkitty/hyperkitty/templates/hyperkitty/*.html fwiw
19:19:37 <fungi> we could possibly just hide that string displayed in the webui
19:19:59 <fungi> to reduce confusion. though i still need to check whether it does the right thing with headers in messages
19:20:31 <fungi> which, with the held node, means fishing deferred copies out of the exim queue
19:20:52 <fungi> (and injecting them locally to start with)
19:21:06 <fungi> hopefully i'll get to that shortly. i'm coming out of a dark tunnel of other work finally
19:21:30 <fungi> anyway, i didn't have any real updates on it this week
19:21:42 <clarkb> let us know if/when we can help or review things
19:21:47 <clarkb> #topic Gerrit Updates
19:21:48 <fungi> will do
19:21:54 <clarkb> #link https://review.opendev.org/c/openstack/project-config/+/882075 last little bit of ACL cleanup work
19:22:18 <clarkb> This change (and its child) are unmerged due to documentation/ease of use concerns. I like the suggestion of having a tox target for it
19:22:28 <clarkb> since we expect people to be able to do that generally to run local tooling
19:23:08 <clarkb> Not urgent but also something we can probably clear out of the backlog quickly if that works for others
19:24:04 <fungi> would be easy to add a tox testenv to run that command in a followup change
19:24:13 <fungi> i'll try to knock that out really quick
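A minimal testenv along these lines would likely do it; the tool path and env name here are assumptions rather than what eventually merged:

```ini
# Hypothetical tox.ini sketch for running the ACL normalizer locally.
[testenv:normalize-acls]
description = Normalize Gerrit ACL files the same way CI checks them
skip_install = true
deps =
commands = python3 {toxinidir}/tools/normalize_acl.py {posargs}
```

Contributors could then run `tox -e normalize-acls <acl file>` before pushing, matching the expectation that the same tooling runs locally and in CI.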
19:24:26 <clarkb> thanks!
19:25:21 <clarkb> for replication tasks leaking I haven't seen any movement on the issues I filed. I'm || that close to creating a proper discord account to join the server to try and have a conversation about it (I think the matrix bridge may be losing messages and they are discussing using discord for the community meeting next month after May's was effectively cancelled due to no googler
19:25:23 <clarkb> being present to let people into the google meet...)
19:25:42 <clarkb> Assuming I can get through some higher priority items I can also spin up a gerrit dev env again and try to fix it myself.
19:26:04 <clarkb> The good news is the growth isn't insane. Currently just less than 16k files
19:26:17 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/880672 Dealing with leaked replication tasks on disk
19:26:25 <clarkb> happy for feedback on my poor attempt at working around this locally too
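For reference, the local workaround under review boils down to periodically pruning the leaked task files. A minimal sketch, assuming the replication plugin's default on-disk layout (the path and age threshold here are assumptions, not the actual change):

```yaml
# Hypothetical Ansible task: delete waiting replication task files that
# have sat on disk for more than a day, which healthy tasks never do.
- name: Prune leaked Gerrit replication tasks
  ansible.builtin.command: >-
    find /home/gerrit2/review_site/data/replication/ref-updates/waiting
    -type f -mtime +1 -delete
```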
19:26:53 <clarkb> And finally for Gerrit we should plan to restart the server on the new image once we move things to quay. That will also pick up the theming changes that were done for 3.8 upgrade prep
19:27:41 <clarkb> #topic Upgrading older servers
19:28:01 <clarkb> We've upgraded all of the servers we had been working on and need to pick up some new ones
19:28:05 <clarkb> #link https://etherpad.opendev.org/p/opendev-bionic-server-upgrades Notes
19:28:18 <clarkb> mirror nodes, registry, meetpad, etc could all be done.
19:28:34 <clarkb> I'll be looking at this once quay is off my plate. Help very much welcome if you have time to do some as well
19:29:43 <clarkb> This also ties into the docker-compose stuff we hit last week
19:30:10 <clarkb> basically if we can get all the container based services running on focal or newer it looks like we can pretty safely switch from pip docker-compose to distro docker compose
19:30:28 * tonyb has some time to look at migrating some of the older servers
19:30:47 <clarkb> there is also a newer docker compose tool written in go that we can switch to, but it changes container names and other things so we need to be more careful with this one
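The container naming difference is the main gotcha: compose v1 (the pip docker-compose) defaults to names like example_web_1 while the Go plugin produces example-web-1, so anything that matches on container names needs auditing before a switch. One way to sidestep it, sketched with illustrative service and image names:

```yaml
# Illustrative compose fragment: pinning container_name yields the same
# name under pip docker-compose (v1) and the Go "docker compose" (v2),
# whose defaults would otherwise be example_web_1 vs example-web-1.
services:
  web:
    image: quay.io/opendev/example:latest
    container_name: example-web
```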
19:31:36 <clarkb> tonyb: thanks! we can sync up on that and go through it. But one of the first things that can be done is updating the CI for a service to run against focal or jammy (jammy would be preferred) and ensure everything is working there. Then an infra root can deploy a new server and add it to inventory
19:31:50 <clarkb> tonyb: you should be able to do everything up to the point of deploying the replacement server and adding it to inventory
19:32:20 <tonyb> clarkb: sounds good
19:32:29 <clarkb> tonyb: system-config/zuul.d/system-config-run.yaml is the interesting file to start on that as it defines the nodesets for each service under testing
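Concretely, that first step is a nodeset bump in that file, something like the following (the job name, hostname, and exact layout are illustrative, not copied from system-config):

```yaml
# Hypothetical snippet for zuul.d/system-config-run.yaml: run a
# service's deployment test on jammy before replacing the real server.
- job:
    name: system-config-run-meetpad
    nodeset:
      nodes:
        - name: meetpad99.opendev.org
          label: ubuntu-jammy
```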
19:32:47 <clarkb> #topic OpenAFS disk utilization
19:33:17 <clarkb> in unexpected news utilization is down slightly from last week
19:33:29 <clarkb> this is a good thing. It isn't drastic but it is noticeable on the grafana dashboard
19:33:52 <clarkb> I also started the discussion about winding down fedora. Either just the mirrors for the distro or the test images entirely
19:34:44 <tonyb> Is there a timeline we'd like to hit to begin winding things down?
19:34:52 <clarkb> So far I haven't seen any objections to removing the mirrors. Some libvirt and fedora folks are interested in keeping the images to help ensure openstack works with fedora and new virtualization stuff. But new virtualization stuff is being built for centos stream as well so less important I think
19:35:47 <clarkb> tonyb: I think I'll give the thread a bump later today and maybe give it another week for feedback just to be sure anyone with an important use case and/or willingness to help hasn't missed it. But then I suspect we can start next week if nothing in the feedback changes
19:36:08 <clarkb> My main concern is moving too quickly and someone missing the discussion. 2 weeks seems like plenty to avoid that problem
19:37:13 <tonyb> Okay.  I have it on my TODO for today to raise awareness inside RH, as well
19:37:34 <clarkb> I think even if we just drop the mirrors that would be a big win on the opendev side
19:37:49 <fungi> also if libvirt/fedora folks find it useful, then they clearly haven't been taking advantage of it for a while given it's not even the latest fedora any longer
19:37:49 <clarkb> rocky seems to do well enough without "local" mirrors since we don't run a ton of jobs on it and I think fedora is in a similar situation
19:38:13 <clarkb> but ya given the current situation I think we can probably go as far as removal. But we'll see where the feedback takes us
19:38:17 <tonyb> It can be a staged thing, right?
19:38:32 <clarkb> tonyb: yes we could start with mirror removal first as well. That won't solve fedora being 2 releases behind though
19:38:51 <tonyb> we can clean up the mirrors and then work on winding down Fedora and/or focusing on stream
19:39:04 <clarkb> step one would be configuring jobs to not use the mirrors, then delete the mirrors, then $somethingelse
19:39:15 <clarkb> yup
19:39:16 <tonyb> okay, got it
19:39:27 <fungi> if the objections to removing fedora images are really because they provide newer $whatever then i don't mind keeping it around, but that implies actually having newest fedora which we don't, and nobody but us seems to have even noticed that
19:40:25 <clarkb> we will free up 400 GB of disk doing that, or about 10%
19:40:34 <clarkb> will have a big impact
19:41:04 <tonyb> fungi: Yup I agree, seems like a theoretical objection
19:41:29 <clarkb> #topic Quo vadis Storyboard
19:41:42 <clarkb> There continues to be a steady trickle of projects moving off of storyboard
19:42:13 <clarkb> I haven't seen evidence of collaboration around tooling for that. I think the bulk of moves are just creating a line in the sand and switching
19:42:16 <clarkb> which is fine I guess
19:42:30 <fungi> nothing new to report on my end, but i have been making a point of switching projects to inactive and updating their descriptions to link to their new bug trackers. if anyone spots a "move off sb" change in project-config please make sure it's come to my attention
19:42:41 <clarkb> can do!
19:43:09 <fungi> i comment in them once i've done any relevant post-deploy cleanup
19:43:20 <fungi> just for tracking purposes
19:44:13 <clarkb> #topic Open Discussion
19:44:31 <clarkb> The wiki cert will need to be renewed. Historically I've done that with about 7 days remaining on it.
19:44:38 <clarkb> Apologies for the email it generates until then
19:45:42 <fungi> thanks for handling that
19:46:03 <clarkb> it is our last remaining non-LE cert :/
19:46:10 <clarkb> but one cert a year isn't so bad
19:47:14 <clarkb> Last call for anything else
19:49:20 <clarkb> Thank you everyone. genekuo tonyb feel free to ping me directly if I'm not reviewing things or if you have questions about where you can help. I'm definitely feeling like I've got too many things to think about at once right now, and I appreciate the help you have offered so I don't want you to feel ignored
19:49:40 <clarkb> and with that I think we can end the meeting a little early
19:49:51 <clarkb> thanks again!
19:49:53 <clarkb> #endmeeting