19:01:11 #startmeeting infra
19:01:11 Meeting started Tue Nov 7 19:01:11 2023 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:11 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:11 The meeting name has been set to 'infra'
19:01:17 #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/MRP4DFT7DBT56U56R6LCFHG7X36SS554/ Our Agenda
19:01:20 #topic Announcements
19:01:38 I believe that the majority (all?) of us have had DST start or end over the last month. Double check your meeting times :)
19:01:55 Related to that the OpenInfra Foundation Board meeting for November will start in 2 hours
19:03:02 also I'll be AFK November 10-13 (that's Friday and Monday on both ends of the weekend)
19:03:44 #topic Mailman 3
19:04:10 All lists are now hosted on mailman 3, the mailman 3 services are upgraded to their latest versions, and the old mailman2 servers have been deleted
19:04:33 We're just about done with this item (thank you fungi!), but there was a django template parsing error during the upgrade we need to run down as we thought that was corrected
19:04:46 https://paste.opendev.org/show/bc7jfeZCt97fZm0dCPKw/ is the paste of that I pulled out of logs when the upgrade occurred
19:05:10 yeah, i need to check whether those show up in the log in zuul
19:05:13 it doesn't appear to be fatal (probably because we aren't relying on social media logins or similar functionality in django) so I think the bulk of the issue here is to understand why this happened
19:05:31 also whether it was only during the initial restart or whether it recurs
19:05:36 ++
19:06:44 I think we can probably drop this off of next week's agenda
19:07:02 agreed
19:07:04 thanks again for getting this over the finish line fungi
19:07:06 #GreatSuccess
19:07:15 i just hope it keeps working
19:07:31 keep an ear to the ground for people talking about delivery issues
19:08:17
#topic Upgrading Servers
19:08:33 tonyb has started pushing on this for mirrors.
19:08:42 And I think started investigating meetpad servers
19:08:51 tonyb: any concerns or items that need review etc?
19:09:33 Nope. I think the mirror servers are ready to launch new versions. I'm assuming that's paused due to 900220
19:09:43 I think meetpad will be pretty quick
19:10:17 after that it's just the hard ones, cacti, wiki, translate and storyboard
19:10:20 yup I think we go through the 900220 stuff and use this all as a good learning experience
19:10:34 let's move on. We'll discuss 900220 shortly
19:10:42 #topic Python container updates
19:10:50 Everything is running python3.11 except for zuul-operator
19:11:06 The reason for that is zuul-operator's k8s jobs haven't been working
19:11:26 dpawlik was poking at it and details ended up in https://review.opendev.org/c/zuul/zuul-operator/+/881245 and its depends on
19:11:56 We don't use the zuul-operator so I don't have a ton of context for this stuff. Despite that I've been meaning to try and page it in just haven't had time
19:12:06 I can work with dpawlik to get that all finished.
19:12:20 just in time to start talking about 3.12 ;)
19:12:28 \o/
19:12:39 I think the short version is that something with the way k8s is deployed there causes the operator to not function.
dpawlik's changes address the k8s issues and now there is maybe a problem in zuul-operator itself that needs fixing
19:13:05 but ya start from that change and its depends on and once we can get it green then we should be good to land changes that update the python version for zuul-operator as well
19:13:34 ++
19:13:42 #topic Gitea 1.21
19:13:58 I've left this item on the agenda because each week I think "this is the week there will be a release and changelog we can discuss"
19:14:04 unfortunately this week is not that week
19:14:31 I saw a message from one of the gitea maintainers on discord/matrix saying that the main release blocker at this point is the blog post. I think this must include writing up a change log because the change log doesn't exist yet
19:14:56 changelog-as-an-afterthought always baffles me
19:15:33 maybe next week will be the week :)
19:15:41 #topic Gerrit 3.8 Planning
19:15:52 #link https://etherpad.opendev.org/p/gerrit-upgrade-3.8
19:16:01 if others could look over that etherpad I think it is ready for review
19:16:55 Otherwise I think we are about as ready as we can be. We got the commentlink update in and restarted Gerrit 3.7 to ensure that is working as expected. The downgrade back to 3.7 is tested and the only issue we've found so far is related to a plugin bug in a plugin we don't use
19:17:17 898989 isn't marked as done, should be though, yeah?
19:17:34 we restarted onto it and manually tested
19:17:57 yup marked as done now
19:18:12 awesome, just wanted to be sure there wasn't anything outstanding there
19:18:41 as far as gerrit upgrades go this one seems to be an easy one (I've just jinxed it)
19:18:55 uncool man
19:18:56 feel free to review the change log as well to make sure I didn't miss anything
19:19:06 but I tried to put the important bits in the etherpad
19:19:22 yeah, seems to me like we're ready for maintenance day
19:19:35 ~1.5 weeks out?
19:19:39 which as a reminder is November 17, 2023 at 15:30 UTC
19:19:52 just shy of 10 days now
19:19:54 I actually failed to remember that I would be on standard time for that day so 15:30 UTC is a bit early for me
19:20:01 but I'll be fine, just get up a little early
19:20:13 07:30 pst i guess
19:20:49 yup
19:20:54 * tonyb will be around for the morning FWIW
19:20:55 I thought it was 8:30 am
19:20:59 i'm happy to run the maintenance if you want to focus on getting your tea steeped
19:21:00 tonyb: awesome
19:21:08 DST strikes again
19:21:17 fungi: cool we can decide when we get closer to the day of
19:21:21 wfm
19:22:20 #topic Adding tonyb to infra-root
19:22:42 rocketship emoji
19:22:48 we've had discussions about this outside of the meeting, but tonyb is willing to be added to infra-root and help us out with even more stuff :)
19:22:50 #link https://review.opendev.org/c/opendev/system-config/+/900220 Will make it official
19:22:56 thank you tonyb!
19:23:02 Thank you all
19:23:10 yay!
19:23:20 I understand the level of trust that's being shown here
19:23:31 I appreciate that
19:23:42 try not to give away the homeworld
19:23:45 feel free to pester me for access to things as you find you're missing something (we don't really have a checklist of everything)
19:24:23 the "plan" I've got here is we can approve this change after the meeting. Then I need to edit gerrit groups and some other things. Maybe tomorrow and/or thursday we can meet up and work through things like server boots and adding a gerrit admin account and so on
19:24:24 fungi: will do. It will be a slow process as my "comfort zone" increases.
19:24:43 yeah, there's no need to ask for access to stuff until you're ready to do something with it anyway
19:24:53 Sounds good.
19:25:05 it doesn't come up that often, because it doesn't change that often, but i do think a lot of the docs are mostly current: https://docs.opendev.org/opendev/system-config/latest/sysadmin.html#root-only-information
19:25:08 yup I mostly want to make sure we've given a reasonable base line of access so that you aren't in a weird spot of not being able to say approve changes but can ssh into things
19:25:18 If you want to do that via meetpad or similar I can make sure I'm in a quiet place
19:25:39 tonyb: ya I was thinking a call like that then we can use shared screen sessions (gnu screen) to share context
19:25:52 perfect
19:26:35 also for stuff like the upcoming gerrit upgrade maintenance we explicitly start a screen session on the server so that other sysadmins can observe or participate as needed
19:27:22 (you'll see it called out in the maintenance plan)
19:27:25 Cool. I'll have to page in my gnu screen keybindings etc
19:27:47 I recently "switched" to tmux/tmate
19:28:06 for a long time we used screen because not all the systems (there were old centos systems for cgit) had tmux
19:28:12 and then we never switched
19:28:14 i've been using tmux personally for a decade or more, but still fall back on screen for some stuff it does better
19:28:38 these days though, about the only thing screen does better is connect to serial lines
19:28:56 I've got a usb to rs232 cable I use with screen :)
19:29:02 Yeah that was the only thing I really noticed
19:29:04 bingo
19:29:20 #topic Open Discussion
19:29:27 That was it for the posted agenda, is there anything else?
19:29:55 reminder that there's an openinfra foundation board of directors meeting in 2.5 hours
19:30:04 #link https://board.openinfra.dev/en/meetings/2023-11-07
19:30:04 1.5 I think
19:30:19 1.5, yep, i can't count
19:30:25 21:00 utc
19:30:41 yup I've got lunch then that consuming my next 3.5 hours or whatever the scheduled time is
19:30:54 spoiler: the budget discussion will probably have nice things to say about our work
19:31:15 \o/
19:31:27 also there's discussion of upcoming bylaws changes, updating the diversity and inclusion wg's charter, and use of ai in code contributions
19:31:45 something for everyone
19:31:49 ooo that could be fun
19:31:52 yup I think it will be one where there is a lot of interesting content which isn't always the case
19:32:31 as long as you can make it through the first 15 minutes of rollcall
19:32:40 LOL
19:33:23 oh hi
19:33:39 heh
19:33:44 just a heads up that we merged a nodepool change that is having a small performance impact
19:33:52 corvus: is this the ssh keyscanning state machine change?
19:33:59 yep
19:34:11 I keep meaning to look at what motivated that
19:34:17 i don't think it's user-visible, but i did notice some extra time-to-ready
19:34:25 and some extra launch retries
19:34:27 i have a fix up
19:34:27 seems like scan in a loop until good or timeout doesn't really need a proper state machine :)
19:34:39 clarkb: parallelization
19:34:57 we could only do 10 before; get 10 slow machines booting and everything stops
19:35:21 ah is that the size of our threadpool?
19:35:31 yep. and increasing thread pool workers was :( because it would 2x the threads thanks to paramiko
19:35:39 yeah, i guess you want to be able to have fewer active loops than node requests
19:35:47 so now it's N+1 instead of 2N
19:35:55 threads
19:36:17 got it
19:36:29 polling state machine architecture takes me back to my mud coding days
19:37:09 anyway, i don't think we need to revert or anything, and i'll be monitoring it.
but wanted to bring it up so folks are aware.
19:37:31 thanks. I'll try to review that change (as well as rereview that one zuul error handling change) this afternoon either during or after the board meeting
19:37:44 cool, thx :)
19:38:30 sounds like that may be everything. I'm going to hit +A on 900220 then go find lunch
19:38:44 thank you for your time today everyone and for all the help running these services
19:38:46 thanks!
19:38:48 #endmeeting
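
A rough sketch of the polling state machine idea from the nodepool keyscan discussion above: instead of parking one blocked thread per booting node, a single loop gives every pending scan one non-blocking step and moves on, so a handful of slow nodes cannot stall the rest. This is not nodepool's code. `KeyscanMachine`, `ready_after`, `max_polls` and `run_all` are hypothetical names, and the ssh keyscan itself is faked with a poll counter.

```python
import enum
from dataclasses import dataclass


class State(enum.Enum):
    WAITING = "waiting"
    READY = "ready"
    FAILED = "failed"


@dataclass
class KeyscanMachine:
    """Tracks one node's keyscan; `ready_after` fakes boot time in polls."""

    node: str
    ready_after: int   # hypothetical: poll count at which the fake scan succeeds
    max_polls: int = 5  # hypothetical: give up once this many polls have passed
    polls: int = 0
    state: State = State.WAITING

    def advance(self) -> None:
        """Take one non-blocking step, then hand control back to the caller."""
        if self.state is not State.WAITING:
            return
        self.polls += 1
        if self.polls >= self.ready_after:
            self.state = State.READY
        elif self.polls >= self.max_polls:
            self.state = State.FAILED


def run_all(machines):
    """Drive every machine from one loop until none is still waiting."""
    rounds = 0
    while any(m.state is State.WAITING for m in machines):
        for m in machines:
            m.advance()
        rounds += 1
    return rounds


if __name__ == "__main__":
    # Twelve slow nodes plus one fast one, all driven by a single loop.
    nodes = [KeyscanMachine(f"slow-{i}", ready_after=4) for i in range(12)]
    nodes.append(KeyscanMachine("fast", ready_after=1))
    print(run_all(nodes), [m.state.value for m in nodes])
```

A real loop would sleep briefly between rounds and each step would attempt an actual connection; both are left out here to keep the sketch self-contained.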