19:01:14 #startmeeting infra
19:01:14 Meeting started Tue Mar 7 19:01:14 2023 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:14 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:14 The meeting name has been set to 'infra'
19:01:15 o/
19:01:20 #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/3ATGLS7XQ2Z2TUC7W4U2747T2XPBR2WL/ Our Agenda
19:01:27 #topic Announcements
19:01:54 I didn't think to put this on the agenda but the next virtual PTG and OpenStack release are just a couple/few weeks away
19:02:08 OpenStack release is March 22 and PTG is March 27-31
19:02:17 Things to be aware of as we're making changes
19:03:11 #topic Bastion Host Updates
19:03:18 #link https://review.opendev.org/q/topic:bridge-backups
19:03:38 I've reviewed this stack of changes and I think it is ready. That said there were two things I wanted to bring up
19:04:08 First: are we comfortable sticking the private key in the backup itself (fungi might have opinions on this). The alternative is storing it with our portion of the passphrase and keeping it more secret
19:04:41 this is the private key for encrypting the backup?
19:04:55 Second: how do we want to assign passphrase portions and ensure that everyone has taken the appropriate one? This only works if we all save our bit and don't all save the same one
19:05:10 fungi: correct. The private key that encrypted the backup and is itself protected by the split up passphrase
19:05:10 yes it's a trade off between losing the key, or having the wrong one stored
19:05:45 for distributing the key, i can just generate one on bridge and put usernames against it. when people have stored it safely, we can delete everything
19:05:53 the usual way shamir is done is for each party to generate and provide part of the key, then there's no risk that people hold onto the wrong parts
19:05:54 ianw: that method works for me
19:06:54 but yeah, i'm fine with either way
19:07:32 fungi: the ssss tool has a 128 character encoding limit. so we are encoding a 128 character mixed case password to the private key
19:09:05 I just wanted to make sure other reviewers considered the private key risk there and were ok with it.
19:09:22 I think the plan of annotating with usernames and then everyone indicating when they've captured their bit should work well
19:10:20 Anything else bridge related or should we move on?
19:10:29 not from me
19:10:55 #topic Mailman 3
19:11:08 fungi: any progress creating the sites in django?
19:11:24 nope. maybe if unrelated things will stop breaking ;)
19:11:46 hopefully with the nodepool provider issues almost behind us i can get back to it
19:12:01 #topic Gerrit Updates
19:12:10 cool we can continue on then
19:12:42 ianw has created a number of changes and a plan for updating our Gerrit ACLs to avoid items that are deprecated in 3.7 (and won't be accepted in new acl updates)
19:12:50 #link https://review.opendev.org/q/topic:gerrit-s-r-3.7 Cleaning up deprecated copy conditions in project ACLs
19:13:19 oh my upload may have broken the topic on some changes, i'll put them all back
19:13:35 there are a couple of axes in this. The first is the vote copy conditions need updating as well as converting to submit requirements. The other axis is updating regular project ACLs vs All-Projects
19:14:10 It would be good for others to review this due to its potential for broad impact. Keep in mind the docs are a bit clunky however ianw has tested on a held node to try and understand behaviors better
19:14:24 I think things are looking good at this point though and it's just a matter of applying the updates
19:14:27 #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/message/ZOI6Z3M3K45FCX2CNZ6DYM5QBYZBDFP4/
19:14:32 is a summary too
19:14:40 I do need to rereview the latest updates though
19:15:13 Once this is done we're basically ready to start planning a Gerrit 3.7 upgrade too
19:16:22 https://104.130.253.50/c/x/test-project/+/3 is All-Projects with s-r's
19:16:57 ianw: looks like that is a 3.7.1 host. Should we apply All-Projects updates to 3.6.4 too?
19:17:32 the little view conditions drop down is maybe a bit detail specific but cool to have that info at the change level
19:17:44 that is also there in 3.6 as well
19:18:06 neat
19:18:44 i can apply it to 3.6 if you like, but i don't think in this regard anything is different
19:18:54 thank you for putting this together. The docs definitely didn't make it easy and probably led to more experimenting than should be necessary
19:19:29 ianw: oh! should we be doing any updates to our CI jobs? I already converted Verified to submit requirements there. But we could update All-Projects for code-review too?
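A minimal Python sketch of the copy-condition cleanup discussed above, assuming the usual mapping from Gerrit's deprecated boolean label options to `copyCondition` predicates. The function name and the flag table are illustrative and are not taken from the opendev changes; the mapping should be checked against the Gerrit documentation for the version in use.

```python
# Assumed mapping from deprecated boolean label options to copyCondition
# predicates. Illustrative only; verify against the Gerrit docs before use.
LEGACY_FLAG_TO_PREDICATE = {
    "copyAllScoresIfNoChange": "changekind:NO_CHANGE",
    "copyAllScoresOnTrivialRebase": "changekind:TRIVIAL_REBASE",
    "copyAllScoresIfNoCodeChange": "changekind:NO_CODE_CHANGE",
    "copyMinScore": "is:MIN",
    "copyMaxScore": "is:MAX",
}


def to_copy_condition(label_options):
    """Build a copyCondition string from a dict of legacy flag -> bool."""
    predicates = [
        predicate
        for flag, predicate in LEGACY_FLAG_TO_PREDICATE.items()
        if label_options.get(flag, False)
    ]
    # Each legacy flag independently allowed a vote to be copied,
    # so the predicates are combined with OR.
    return " OR ".join(predicates)


# Hypothetical label definition that still uses the old boolean flags.
legacy = {
    "copyAllScoresIfNoChange": True,
    "copyAllScoresOnTrivialRebase": True,
    "copyMinScore": True,
}
print("copyCondition = " + to_copy_condition(legacy))
```

Joining with OR mirrors how the old flags behaved: any one of them being true was enough for the vote to carry over to the new patchset.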
19:20:05 yeah, currently in CI we don't set all-projects, so by default we don't have the workflow label
19:20:49 I don't think it's super urgent but that might be a good way to get ongoing coverage of this
19:20:54 i wonder if it would make sense to put some of that in all-projects
19:21:19 oh, yes that's what you're pointing out
19:21:21 i guess the problem is that it's either editing what's there, or maintaining a complete all-projects for each release
19:21:37 hrm ya since we already get a precanned version
19:21:44 at the moment we just get the gerrit default version
19:22:08 which gives us perfect "bootstrapability" -- but are we ever going to actually start with a completely fresh gerrit?
19:22:16 if the worst happened, we'd restore from backup anyway
19:22:34 another thought is that I think code-review is officially defined with function = MaxWithBlock. Maybe we should ask upstream about changing that to a submit requirement and if that will impact any potential future migrations?
19:23:02 ianw: ya I think it's mostly about having the test env align with prod as much as possible. Less about automating things
19:23:33 But it's also not urgent considering we haven't yet needed to do that
19:23:39 we'll likely continue to be fine as is
19:24:19 i've fiddled too much now, i feel like the 3.7 started with code-review as a SR in its pre-canned All-Projects
19:24:38 ianw: ah ok so there must already be a migration?
19:25:01 Or maybe they don't migrate it and expect you to update it by hand before/after 3.7. Some understanding of that might not be a bad idea
19:25:14 actually no, looking at the history of All-Projects
19:25:16 +[label "Code-Review"]
19:25:16 + function = MaxWithBlock
19:25:22 I'm just worried we'll get to doing our upgrade and it will break because the migration tool doesn't know how to reconcile what we've got and the new thing
19:25:28 Initialized Gerrit Code Review 3.7.1-1-ga57c4bf868-dirty
19:25:45 ya ok so they haven't updated it internally yet, but eventually will
19:26:03 we should see if they have a plan for that and if converting ahead of time is in conflict with that plan
19:26:30 being 100% like production, i think the cla's might be a problem
19:26:56 (making testing All-Projects 100% like production, i mean)
19:27:01 because we would need to add the clas to testing too?
19:27:43 yeah, i think so
19:27:45 + copyCondition = changekind:NO_CHANGE OR changekind:TRIVIAL_REBASE OR is:MIN
19:28:06 at least that explains where i copied the NO_CHANGE copy-condition -- that's the pre-canned 3.7 copyConditions
19:28:10 for code-review
19:28:41 but ya if others can look it over to make sure we aren't changing behaviors or doing anything unexpected that would be great
19:28:57 I also wanted to call out the results of the gerrit community meeting
19:29:27 For Java 17 they are in no rush to drop Java 11. This means we can stick to Java 11 for now and wait for upstream to fix the issues rather than deploying annoying workarounds
19:29:41 Sounds like part of what drives this is the version of java Google wants to deploy with and that isn't expected to change soon
19:29:55 For the ssh thing I was told they would accept a docs update for the secret config option
19:30:08 #link https://gerrit-review.googlesource.com/c/gerrit/+/362054 document secret Gerrit config option
19:30:24 And NasserG would try to test ianw's fix for the underlying issue in their env which has this problem a lot more than ours
19:30:53 tangentially related, sometime i want to pick the brains of people who have messed with automated gerrit setup for our deployment testing and zuul quickstart tests in order to figure out how to get un-stuck with updating the gerrit version we test git-review against, last i tried i was running up against problems bootstrapping user accounts (i think), maybe it was specifically
19:30:55 This ended up being fairly productive for us. I'll make an effort to continue attending that meeting each month (it is at 8am pacific on the first thursday of a month)
19:30:55 administrator bootstrapping for project creation
19:31:51 fungi: the zuul quickstart stuff is pretty easy to read though. Probably more so than our system-config stuff since the zuul deployment is far more focused
19:32:09 (we're trying to make something that has attributes similar to our prod setup to check things we care about whereas zuul quickstart is just give me a gerrit)
19:32:24 I want to say the trick is the development flag
19:33:13 yeah, i want to say there were things the git-review tests need which the zuul quickstart doesn't, and that's where i was stuck, but i need to page all that back in again
19:33:15 we've got a submitted feature addition to support something in newer gerrit we currently can't run the tests for because of being unable to bootstrap newer gerrit for it
19:33:38 oh, right, setting develop mode broke some things about auth
19:33:50 huh ssh should still work as normal
19:33:59 we do a lot of ssh with dev mode in the system-config job
19:34:11 #link https://review.opendev.org/849419 Upgrade testing to Gerrit 3.4.4
19:34:35 i'll follow up after the meeting trying to see if i can find where i got stuck with it
19:34:43 #topic Upgrading Old Servers
19:35:07 gitea05-08 are now all removed from gerrit replication and not behind the gitea haproxy
19:35:16 #link https://review.opendev.org/c/opendev/system-config/+/876470 Remove old giteas from config management
19:35:30 This change will remove those servers from configuration management so that we can start to delete the old servers
19:36:09 For next steps we need to decide how many (if any) additional larger gitea servers we need in the cluster. To help figure this out I've disabled gitea01-04 via the haproxy command socket
19:36:35 So far the four new servers have been holding up. But we may need or want additional servers anyway. I'm open to feedback on that.
19:36:45 The other next step is moving gitea backups from gitea01 to gitea09
19:37:02 #link https://review.opendev.org/c/opendev/system-config/+/876471/ should do that
19:37:51 Good progress here overall. Just a couple of changes to land and then decisions on the remaining old servers and if they need to be replaced. Please let me know what you think if you get a chance to look at this
19:38:12 The other two types of servers on my radar for prioritizing replacements are the nameservers and etherpad
19:38:38 I think I'll do etherpad after the PTG. Since etherpad is heavily relied upon during the PTG I don't want to be rushed or find jammy's kernel makes nodejs sad or something
19:38:52 It should be a quick one too. Deploy new server. Take downtime, move database, update dns, done
19:39:15 ianw: I think you mentioned being able to look at nameservers. Have you had a chance to start that yet?
19:39:36 I think the main thing here is going to be coordinating the changes in DNS itself? And that's something I need to page back in before I start
19:39:42 I always forget who has what records
19:40:00 not yet sorry, sniped by iterations on gerrit stuff. i think it's probably one we want to pre-plan with a checklist, i can come up with that
19:40:14 ++
19:40:34 definitely a multistep process that we need to take and possibly one with hold points to ensure DNS out in the wild isn't relying on stale nameservers
19:40:50 #topic AFS quotas
19:41:08 I noticed yesterday a number of afs volumes are nearing their quota limits
19:41:15 in particular debian-security is down to a couple hundred MB
19:41:37 ubuntu-ports and one (or both?) of the centos volumes are also close.
19:41:50 Should I take some time to increase the debian-security one by say 20GB today?
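A rough sketch of the arithmetic behind a quota bump like the 20GB one proposed above, assuming the OpenAFS `fs setquota` command, which takes the new maximum in kilobytes. The volume path and the current quota below are made-up placeholders, not values from the meeting.

```python
KB_PER_GB = 1024 * 1024  # AFS quotas are expressed in 1 KiB blocks


def bumped_quota_kb(current_quota_kb, increase_gb):
    """Return the new quota in KB after adding increase_gb gigabytes."""
    return current_quota_kb + increase_gb * KB_PER_GB


def setquota_command(path, new_quota_kb):
    """Build the argv for `fs setquota`; nothing is executed here."""
    return ["fs", "setquota", "-path", path, "-max", str(new_quota_kb)]


# Hypothetical inputs: a 100GB quota on a placeholder mirror path.
current_kb = 100 * KB_PER_GB
path = "/afs/.example.org/mirror/debian-security"

new_kb = bumped_quota_kb(current_kb, 20)
print(" ".join(setquota_command(path, new_kb)))
```

The command is only printed, not run, so the numbers can be checked against `fs listquota` output before anything is changed.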
19:43:41 I don't hear any objections so I'll try to get that done before it creates problems
19:44:00 ++
19:44:14 i think we determined there wasn't much we could drop
19:44:30 ya I poked around and didn't find anything obvious at least
19:44:48 I'm a little less worried about the others but I should look at them and give them similar small bumps if appropriate too
19:44:56 still somewhat curious where the extra utilization has cropped up
19:45:34 for centos it seems they add packages without always removing packages
19:45:43 for example there are several gigabytes of just thunderbird packages
19:45:58 I suspect similar may happen with debian for security updates?
19:46:11 where they don't remove the security updates from the security mirror once they add them to regular mirrors
19:46:20 the debian increases are a bit odd, those typically get folded into point releases eventually
19:46:58 just wondering why it would have shot up in the past few months
19:47:33 lots of security updates that they don't actually remove when they add them to the point releases?
19:47:37 that's my hunch
19:49:02 But ya it's a bit annoying things just grow forever even on old stable platforms
19:50:23 Anyway I'll try to get that done today. I've got an errand to run and school pickup and there is an openinfra foundation board meeting so I may end up getting to it tomorrow
19:50:31 #topic Scheduling project renames
19:51:45 fungi: you wanted to discuss this. I think we should wait for the openstack release and ptg to complete but then we're probably wide open?
19:51:59 (this assumes I have the gitea work done by then and I expect this will be the case)
19:52:22 yes, i concur
19:52:36 easter is april 9 which may or may not be a good time to do it
19:52:47 we'll probably be in a place to upgrade gerrit then, if we want to do both
19:52:48 good friday or monday after easter?
19:53:08 ianw: I think we should do them as distinct steps but could share a downtime window
19:53:12 2023-03-22 is the openstack release
19:53:32 yeah i did mean sequentially :)
19:54:04 I think ianw's part of the world gets easter and adjacent days as holidays
19:54:11 should we look at the week after that instead?
19:54:38 (my kids have a four day weekend the week after so I personally prefer using the easter holiday, but I can be flexible)
19:54:43 15/16/18 are public holidays here
19:54:48 friday 2023-03-31 would be the friday after openstack release week, and yeah 2023-04-07 would be the friday before easter i guess
19:54:50 (also I don't know who did the scheduling at school but they mess that up)
19:55:13 fungi: I expect we'll all be tired from the PTG so wouldn't want to do it that week. I'm good with the 7th
19:55:29 or 10th
19:55:37 I expect both of those days will be quiet due to holidays
19:55:44 i'm travelling later in april, need to go check the dates if we're pushing it to later
19:56:15 ianw: do you have a preference?
19:57:20 maybe 7th ish? i'm happy to drive things at a quiet time
19:57:23 I'm good anytime in April and would prefer to avoid the 13-16th
19:57:32 looks like i'm gone april 10-16
19:57:52 if we have a good checklist i think the issues are minimal
19:57:56 ok why don't we pencil in the 7th and we can refine that as we get things ready
19:58:03 that sounds good to me
19:58:53 #info April 7th ish for Gerrit downtime to rename projects and do 3.7 upgrade. Specific downtime window to be determined as we get closer to the day of.
19:58:59 #topic Open Discussion
19:59:04 You have one minute for anything else
20:00:39 Sounds like that was it. Thank you everyone
20:00:46 we'll be back here same time and location next week.
20:00:49 #endmeeting