19:01:02 #startmeeting infra
19:01:02 Meeting started Tue May 9 19:01:02 2023 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:02 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:02 The meeting name has been set to 'infra'
19:01:09 #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/BQ5T6VULIAYPCN6LPWSEMA4XITIXTSZB/ Our Agenda
19:01:24 I didn't have any announcements. We can dive right into our topics
19:01:28 #topic Migrating to Quay
19:01:38 A bunch of progress has been made on this since ~Friday
19:01:44 #link https://etherpad.opendev.org/p/opendev-quay-migration-2023 Plan/TODO list
19:01:54 I wrote this plan / todo list document to keep everything straight
19:02:19 since then I've pushed like 25 changes and a good number of them have landed. At this point I think about 12/34 images are being pushed and pulled from quay.io
19:02:39 That said, late yesterday I realized we were pushing change tags to quay.io which we didn't want to do
19:02:43 There are two potential fixes for this
19:02:49 #link https://review.opendev.org/c/opendev/system-config/+/882628 Don't run the upload role at all
19:03:07 In this one we modify our jobs to not run the upload role in the gate at all (the upload role in the gate is what is pushing those change tags)
19:03:13 #link https://review.opendev.org/c/zuul/zuul-jobs/+/882724 Upload role skips the push
19:03:28 In this option we modify the upload role to check the flag for promotion behavior and only push if the method that needs the push is selected
19:04:04 corvus likes 882724 more. I do as well, as it simplifies reorganizing all of the jobs as we go through this process. The changes to swap image publication over to quay.io are pretty minimal with this approach which is nice
19:04:42 If you have time to review both of these changes and weigh in that would be great. If we get one of them landed I can continue with my progress in migrating things (depending on which option we choose I will need to do different things, so I don't want to move ahead right now)
19:04:53 the only thing about that is all the documentation now says "use build in check and gate" and then promote for the IR path
19:05:21 except then the main example of it doesn't do that :)
19:05:51 ianw: ya and I think we could possibly switch it over to that after the migration too. I'm mostly concerned that the amount of stuff to change to swap the images increases quite a bit if we go that route now. But both approaches should work and I'm only preferring one for the amount of work it creates :)
19:06:27 but ya if we prefer the more explicit approach that's fine. I will just need to modify some changes and land the fixup change that ianw started
19:06:28 i think things changed once we accepted the idea of being able to switch the promote mechanism with a variable
19:07:13 i'm not sure it makes as much sense to switch that with a variable and then have to change the job structure too...
19:08:09 ya though it may be confusing to have two different jobs that behave identically if the flag is set
19:08:22 I can really go either way on this.
19:09:01 but making a decision between one approach and the other is the next step in migrating so that we don't create more of a cleanup backlog. I can also do the cleanup once this is sorted out. It's only a few extra tags that can be removed by hand
19:09:12 arguably, we should probably not have the role switch but instead have the job switch which role it includes... but this is sprawling and complex enough that putting the logic in the role seems like a reasonable concession.
19:10:01 maybe ianw and fungi can weigh in on the two changes and we can take it from there?
19:10:11 sounds good
19:10:17 I'll use grafyaml's image as confirmation that whatever choice we make is functional
19:10:21 it is next on my list
19:11:22 anything else related to the container images moving to quay?
19:13:03 #topic Bridge Updates
19:13:08 #link https://review.opendev.org/q/topic:bridge-backups
19:13:20 This topic could still use an additional infra-root to sanity check it
19:13:36 Other than that I'm not aware of any changes to bridge. It seems to be happy
19:14:31 #topic Mailman 3
19:14:39 fungi: looks like you have some updates on this one
19:15:33 nothing exciting yet. new held node at 23.253.160.97 and the list filtering per-domain is working correctly by default, but there is a singular list domain name which gets displayed in the corner of the hyperkitty archive pages
19:16:03 think i've figured out how that's piped in through the settings, though it also impacts the domain name used for admin communications the server sends out
19:16:16 yes I don't think that is going to be configurable unfortunately
19:16:40 the current stack of changes tries to turn that default domain name into one which isn't one of the list domains, but i missed that setting i just found so it needs another revision
19:16:44 or at least not with current mailman 3. It has all of the info it needs to properly configure that, though it would require updates to mailman to do so
19:17:36 well, or it needs more advanced django site partitioning, below the mailman services layer
19:17:50 I think that template is in mailman itself
19:18:14 postorius and hyperkitty are just delegating to django for that stuff, and this is bubbling up from django site info in settings and database
19:18:42 so creating the sites in django and having per-site settings files would probably work around it
19:19:07 but yes, using a singular settings.py is part of the problem, i think
19:19:34 mailman/hyperkitty/hyperkitty/templates/hyperkitty/*.html fwiw
19:19:37 we could possibly just hide that string displayed in the webui
19:19:59 to reduce confusion. though i still need to check whether it does the right thing with headers in messages
19:20:31 which, with the held node, means fishing deferred copies out of the exim queue
19:20:52 (and injecting them locally to start with)
19:21:06 hopefully i'll get to that shortly. i'm coming out of a dark tunnel of other work finally
19:21:30 anyway, i didn't have any real updates on it this week
19:21:42 let us know if/when we can help or review things
19:21:47 #topic Gerrit Updates
19:21:48 will do
19:21:54 #link https://review.opendev.org/c/openstack/project-config/+/882075 last little bit of ACL cleanup work
19:22:18 This change (and its child) are unmerged due to documentation/ease of use concerns. I like the suggestion of having a tox target for it
19:22:28 since we expect people to be able to do that generally to run local tooling
19:23:08 Not urgent but also something we can probably clear out of the backlog quickly if that works for others
19:24:04 would be easy to add a tox testenv to run that command in a followup change
19:24:13 i'll try to knock that out really quick
19:24:26 thanks!
19:25:21 for replication tasks leaking I haven't seen any movement on the issues I filed. I'm || that close to creating a proper discord account to join the server to try and have a conversation about it (I think the matrix bridge may be losing messages and they are discussing using discord for the community meeting next month after May's was effectively cancelled due to no googler
19:25:23 being present to let people into the google meet...)
19:25:42 Assuming I can get through some higher priority items I can also spin up a gerrit dev env again and try to fix it myself.
19:26:04 The good news is the growth isn't insane. Currently just less than 16k files
19:26:17 #link https://review.opendev.org/c/opendev/system-config/+/880672 Dealing with leaked replication tasks on disk
19:26:25 happy for feedback on my poor attempt at working around this locally too
19:26:53 And finally for Gerrit we should plan to restart the server on the new image once we move things to quay. That will also pick up the theming changes that were done for 3.8 upgrade prep
19:27:41 #topic Upgrading older servers
19:28:01 We've upgraded all of the servers we had been working on and need to pick up some new ones
19:28:05 #link https://etherpad.opendev.org/p/opendev-bionic-server-upgrades Notes
19:28:18 mirror nodes, registry, meetpad, etc could all be done.
19:28:34 I'll be looking at this once quay is off my plate. Help very much welcome if you have time to do some as well
19:29:43 This also ties into the docker-compose stuff we hit last week
19:30:10 basically if we can get all the container based services running on focal or newer it looks like we can pretty safely switch from pip docker-compose to distro docker compose
19:30:28 * tonyb has some time to look at migrating some of the older servers
19:30:47 there is also a newer docker compose tool written in go that we can switch to, but it changes container names and other things so we need to be more careful with this one
19:31:36 tonyb: thanks! we can sync up on that and go through it. But one of the first things that can be done is updating the CI for a service to run against focal or jammy (jammy would be preferred) and ensure everything is working there. Then an infra-root can deploy a new server and add it to inventory
19:31:50 tonyb: you should be able to do everything up to the point of deploying the replacement server and adding it to inventory
19:32:20 clarkb: sounds good
19:32:29 tonyb: system-config/zuul.d/system-config-run.yaml is the interesting file to start on that as it defines the nodesets for each service under testing
19:32:47 #topic OpenAFS disk utilization
19:33:17 in unexpected news utilization is down slightly from last week
19:33:29 this is a good thing. It isn't drastic but it is noticeable on the grafana dashboard
19:33:52 I also started the discussion about winding down fedora. Either just the mirrors for the distro or the test images entirely
19:34:44 Is there a timeline we'd like to hit to begin winding things down?
19:34:52 So far I haven't seen any objections to removing the mirrors. Some libvirt and fedora folks are interested in keeping the images to help ensure openstack works with fedora and new virtualization stuff. But new virtualization stuff is being built for centos stream as well, so that is less important I think
19:35:47 tonyb: I think I'll give the thread a bump later today and maybe give it another week for feedback just to be sure anyone with an important use case and/or willingness to help hasn't missed it. But then I suspect we can start next week if nothing in the feedback changes
19:36:08 My main concern is moving too quickly and someone missing the discussion. 2 weeks seems like plenty to avoid that problem
19:37:13 Okay. I have it on my TODO for today to raise awareness inside RH as well
19:37:34 I think even if we just drop the mirrors that would be a big win on the opendev side
19:37:49 also if libvirt/fedora folks find it useful, then they clearly haven't been taking advantage of it for a while given it's not even the latest fedora any longer
19:37:49 rocky seems to do well enough without "local" mirrors since we don't run a ton of jobs on it, and I think fedora is in a similar situation
19:38:13 but ya given the current situation I think we can probably go as far as removal. But we'll see where the feedback takes us
19:38:17 It can be a staged thing right
19:38:32 tonyb: yes we could start with mirror removal first as well. That won't solve fedora being 2 releases behind though
19:38:51 we can clean up the mirrors and then work on winding down Fedora and/or focusing on stream
19:39:04 step one would be configuring jobs to not use the mirrors, then delete the mirrors, then $somethingelse
19:39:15 yup
19:39:16 okay, got it
19:39:27 if the objections to removing fedora images are really because they provide newer $whatever then i don't mind keeping it around, but that implies actually having newest fedora which we don't, and nobody but us seems to have even noticed that
19:40:25 we will free up 400 GB of disk doing that, or about 10%
19:40:34 will have a big impact
19:41:04 fungi: Yup I agree, seems like a theoretical objection
19:41:29 #topic Quo vadis Storyboard
19:41:42 There continues to be a steady trickle of projects moving off of storyboard
19:42:13 I haven't seen evidence of collaboration around tooling for that. I think the bulk of moves are just creating a line in the sand and switching
19:42:16 which is fine I guess
19:42:30 nothing new to report on my end, but i have been making a point of switching projects to inactive and updating their descriptions to link to their new bug trackers. if anyone spots a "move off sb" change in project-config please make sure it's come to my attention
19:42:41 can do!
19:43:09 i comment in them once i've done any relevant post-deploy cleanup
19:43:20 just for tracking purposes
19:44:13 #topic Open Discussion
19:44:31 The wiki cert will need to be renewed. Historically I've done that with about 7 days remaining on it.
19:44:38 Apologies for the email it generates until then
19:45:42 thanks for handling that
19:46:03 it is our last remaining non-LE cert :/
19:46:10 but one cert a year isn't so bad
19:47:14 Last call for anything else
19:49:20 Thank you everyone. genekuo tonyb feel free to ping me directly if I'm not reviewing things or if you have questions about where you can help. I'm definitely feeling like I've got too many things to think about at once right now, and I appreciate the help you have offered so I don't want you to feel ignored
19:49:40 and with that I think we can end the meeting a little early
19:49:51 thanks again!
19:49:53 #endmeeting