19:01:14 <clarkb> #startmeeting infra
19:01:14 <opendevmeet> Meeting started Tue Mar  7 19:01:14 2023 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:14 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:14 <opendevmeet> The meeting name has been set to 'infra'
19:01:15 <ianw> o/
19:01:20 <clarkb> #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/3ATGLS7XQ2Z2TUC7W4U2747T2XPBR2WL/ Our Agenda
19:01:27 <clarkb> #topic Announcements
19:01:54 <clarkb> I didn't think to put this on the agenda but the next virtual PTG and OpenStack release are just a couple/few weeks away
19:02:08 <clarkb> OpenStack release is March 22 and PTG is March 27-31
19:02:17 <clarkb> Things to be aware of as we're making changes
19:03:11 <clarkb> #topic Bastion Host Updates
19:03:18 <clarkb> #link https://review.opendev.org/q/topic:bridge-backups
19:03:38 <clarkb> I've reviewed this stack of changes and I think it is ready. That said there were two things I wanted to bring up
19:04:08 <clarkb> First: are we comfortable sticking the private key in the backup itself (fungi might have opinions on this). The alternative is storing it with our portion of the passphrase and keeping it more secret
19:04:41 <fungi> this is the private key for encrypting the backup?
19:04:55 <clarkb> Second: how do we want to assign passphrase portions and ensure that everyone has taken the appropriate one? This only works if we all save our bit and don't all save the same one
19:05:10 <clarkb> fungi: correct. The private key that encrypted the backup and is itself protected by the split up passphrase
19:05:10 <ianw> yes it's a trade off between losing the key, or having the wrong one stored
19:05:45 <ianw> for distributing the key, i can just generate one on bridge and put usernames against it.  when people have stored it safely, we can delete everything
19:05:53 <fungi> the usual way shamir is done is for each party to generate and provide part of the key, then there's no risk that people hold onto the wrong parts
19:05:54 <clarkb> ianw: that method works for me
19:06:54 <fungi> but yeah, i'm fine with either way
19:07:32 <ianw> fungi: the ssss tool has a 128 character encoding limit.  so we are encoding a 128 character mixed case password to the private key
19:09:05 <clarkb> I just wanted to make sure other reviewers considered the private key risk there and were ok with it.
19:09:22 <clarkb> I think the plan of annotating with usernames and then everyone indicating when they've captured their bit should work well
19:10:20 <clarkb> Anything else bridge related or should we move on?
19:10:29 <ianw> not from me
19:10:55 <clarkb> #topic Mailman 3
19:11:08 <clarkb> fungi: any progress creating the sites in django?
19:11:24 <fungi> nope. maybe if unrelated things will stop breaking ;)
19:11:46 <fungi> hopefully with the nodepool provider issues almost behind us i can get back to it
19:12:01 <clarkb> #topic Gerrit Updates
19:12:10 <clarkb> cool we can continue on then
19:12:42 <clarkb> ianw has created a number of changes and a plan for updating our Gerrit ACLs to avoid items that are deprecated in 3.7 (and won't be accepted in new acl updates)
19:12:50 <clarkb> #link https://review.opendev.org/q/topic:gerrit-s-r-3.7 Cleaning up deprecated copy conditions in project ACLs
19:13:19 <ianw> oh my upload may have broken the topic on some changes, i'll put them all back
19:13:35 <clarkb> there are a couple of axes in this. The first is the vote copy conditions need updating as well as converting to submit requirements. The other axis is updating regular project ACLs vs All-Projects
19:14:10 <clarkb> It would be good for others to review this due to its potential for broad impact. Keep in mind the docs are a bit clunky however ianw has tested on a held node to try and understand behaviors better
19:14:24 <clarkb> I think things are looking good at this point though and it's just a matter of applying the updates
19:14:27 <ianw> #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/message/ZOI6Z3M3K45FCX2CNZ6DYM5QBYZBDFP4/
19:14:32 <ianw> is a summary too
19:14:40 <clarkb> I do need to rereview the latest updates though
19:15:13 <clarkb> Once this is done we're basically ready to start planning a Gerrit 3.7 upgrade too
19:16:22 <ianw> https://104.130.253.50/c/x/test-project/+/3 is All-Projects with s-r's
19:16:57 <clarkb> ianw: looks like that is a 3.7.1 host. Should we apply All-Projects updates to 3.6.4 too?
19:17:32 <clarkb> the little view conditions drop down is maybe a bit detail specific but cool to have that info at the change level
19:17:44 <ianw> that is also there in 3.6 as well
19:18:06 <clarkb> neat
19:18:44 <ianw> i can apply it to 3.6 if you like, but i don't think in this regard anything is different
19:18:54 <clarkb> thank you for putting this together. The docs definitely didn't make it easy and probably led to more experimenting than should be necessary
19:19:29 <clarkb> ianw: oh! should we be doing any updates to our CI jobs? I already converted Verified to submit requirements there. But we could update All-Projects for code-review too?
19:20:05 <ianw> yeah, currently in CI we don't set all-projects, so by default we don't have the workflow label
19:20:49 <clarkb> I don't think its super urgent but that might be a good way to get ongoing coverage of this
19:20:54 <fungi> i wonder if it would make sense to put some of that in all-projects
19:21:19 <fungi> oh, yes that's what you're pointing out
19:21:21 <ianw> i guess the problem is that it's either editing what's there, or maintaining a complete all-projects for each release
19:21:37 <clarkb> hrm ya since we already get a precanned version
19:21:44 <ianw> at the moment we just get the gerrit default version
19:22:08 <ianw> which gives us perfect "bootstrapability" -- but are we ever going to actually start with a completely fresh gerrit?
19:22:16 <ianw> if the worst happened, we'd restore from backup anyway
19:22:34 <clarkb> another thought is that I think code-review is officially defined with function = MaxWithBlock. Maybe we should ask upstream about changing that to a submit requirement and if that will impact any potential future migrations?
19:23:02 <clarkb> ianw: ya I think its mostly about having the test env align with prod as much as possible. Less about automating things
19:23:33 <clarkb> But its also not urgent considering we haven't yet needed to do that
19:23:39 <clarkb> we'll likely continue to be fine as is
19:24:19 <ianw> i've fiddled too much now, i feel like the 3.7 started with code-review as a SR in its pre-canned All-Projects
19:24:38 <clarkb> ianw: ah ok so there must already be a migration?
19:25:01 <clarkb> Or maybe they don't migrate it and expect you to update it by hand before/after 3.7. Some understanding of that might not be a bad idea
19:25:14 <ianw> actually no, looking at the history of All-Projects
19:25:16 <ianw> +[label "Code-Review"]
19:25:16 <ianw> +	function = MaxWithBlock
19:25:22 <clarkb> I'm just worried we'll get to doing our upgrade and it will break because the migration tool doesn't know how to reconcile what we've got and the new thing
19:25:28 <ianw> Initialized Gerrit Code Review 3.7.1-1-ga57c4bf868-dirty
19:25:45 <clarkb> ya ok so they haven't updated it internally yet, but eventually will
19:26:03 <clarkb> we should see if they have a plan for that and if converting ahead of time is in conflict with that plan
19:26:30 <ianw> being 100% like production, i think the cla's might be a problem
19:26:56 <ianw> (making testing All-Projects 100% like production, i mean)
19:27:01 <clarkb> because we would need to add the clas to testing too?
19:27:43 <ianw> yeah, i think so
19:27:45 <ianw> +	copyCondition = changekind:NO_CHANGE OR changekind:TRIVIAL_REBASE OR is:MIN
19:28:06 <ianw> at least that explains where i copied the NO_CHANGE copy-condition -- that's the pre-canned 3.7 copyConditions
19:28:10 <ianw> for code-review
19:28:41 <clarkb> but ya if others can look it over to make sure we aren't changing behaviors or doing anything unexpected that would be great
19:28:57 <clarkb> I also wanted to call out the results of the gerrit community meeting
19:29:27 <clarkb> For Java 17 they are in no rush to drop Java 11. This means we can stick to Java 11 for now and wait for upstream to fix the issues rather than deploying annoying workarounds
19:29:41 <clarkb> Sounds like part of what drives this is the version of java Google wants to deploy with and that isn't expected to change soon
19:29:55 <clarkb> For the ssh thing I was told they would accept a docs update for the secret config option
19:30:08 <clarkb> #link https://gerrit-review.googlesource.com/c/gerrit/+/362054 document secret Gerrit config option
19:30:24 <clarkb> And NasserG would try to test ianw's fix for the underlying issue in their env which has this problem a lot more than ours
19:30:53 <fungi> tangentially related, sometime i want to pick the brains of people who have messed with automated gerrit setup for our deployment testing and zuul quickstart tests in order to figure out how to get un-stuck with updating the gerrit version we test git-review against, last i tried i was running up against problems bootstrapping user accounts (i think), maybe it was specifically
19:30:55 <clarkb> This ended up being fairly productive for us. I'll make an effort to continue attending that meeting each month (it is at 8am pacific on the first thursday of a month)
19:30:55 <fungi> administrator bootstrapping for project creation
19:31:51 <clarkb> fungi: the zuul quickstart stuff is pretty easy to read though. Probably more so than our system-config stuff since the zuul deployment is far more focused
19:32:09 <clarkb> (we're trying to make something that has attributes similar to our prod setup to check things we care about whereas zuul quickstart is just give me a gerrit)
19:32:24 <clarkb> I want to say the trick is the development flag
19:33:13 <fungi> yeah, i want to say there were things the git-review tests need which the zuul quickstart doesn't, and that's where i was stuck, but i need to page all that back in again
19:33:15 <fungi> we've got a submitted feature addition to support something in newer gerrit we currently can't run the tests for because of being unable to bootstrap newer gerrit for it
19:33:38 <fungi> oh, right, setting develop mode broke some things about auth
19:33:50 <clarkb> huh ssh should still work as normal
19:33:59 <clarkb> we do a lot of ssh with dev mode in the system-config job
19:34:11 <fungi> #link https://review.opendev.org/849419 Upgrade testing to Gerrit 3.4.4
19:34:35 <fungi> i'll follow up after the meeting trying to see if i can find where i got stuck with it
19:34:43 <clarkb> #topic Upgrading Old Servers
19:35:07 <clarkb> gitea05-08 are now all removed from gerrit replication and not behind the gitea haproxy
19:35:16 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/876470 Remove old giteas from config management
19:35:30 <clarkb> This change will remove those servers from configuration management so that we can start to delete the old servers
19:36:09 <clarkb> For next steps we need to decide how many (if any) additional larger gitea servers we need in the cluster. To help figure this out I've disabled gitea01-04 via the haproxy command socket
19:36:35 <clarkb> So far the four new servers have been holding up. But we may need or want additional servers anyway. I'm open to feedback on that.
19:36:45 <clarkb> The other next step is moving gitea backups from gitea01 to gitea09
19:37:02 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/876471/ should do that
19:37:51 <clarkb> Good progress here overall. Just a couple of changes to land and then decisions on the remaining old servers and if they need to be replaced. Please let me know what you think if you get a chance to look at this
19:38:12 <clarkb> The other two types of servers on my radar for prioritizing replacements are the nameservers and etherpad
19:38:38 <clarkb> I think I'll do etherpad after the PTG. Since etherpad is heavily relied upon during the PTG I don't want to be rushed or find jammy's kernel makes nodejs sad or something
19:38:52 <clarkb> It should be a quick one too. Deploy new server. Take downtime, move database, update dns, done
19:39:15 <clarkb> ianw: I think you mentioned being able to look at nameservers. Have you had a chance to start that yet?
19:39:36 <clarkb> I think the main thing here is going to be coordinating the changes in DNS itself? And that's something I need to page back in before I start
19:39:42 <clarkb> I always forget who has what records
19:40:00 <ianw> not yet sorry, sniped by iterations on gerrit stuff.  i think it's probably one we want to pre-plan with a checklist, i can come up with that
19:40:14 <clarkb> ++
19:40:34 <clarkb> definitely a multistep process that we need to take and possibly one with hold points to ensure DNS out in the wild isn't relying on stale nameservers
19:40:50 <clarkb> #topic AFS quotas
19:41:08 <clarkb> I noticed yesterday a number of afs volumes are nearing their quota limits
19:41:15 <clarkb> in particular debian-security is down to a couple hundred MB
19:41:37 <clarkb> ubuntu-ports and one (or both?) of the centos volumes are also close.
19:41:50 <clarkb> Should I take some time to increase the debian-security one by say 20GB today?
19:43:41 <clarkb> I don't hear any objections so I'll try to get that done before it creates problems
19:44:00 <ianw> ++
19:44:14 <ianw> i think we determined there wasn't much we could drop
19:44:30 <clarkb> ya I poked around and didn't find anything obvious at least
19:44:48 <clarkb> I'm a little less worried about the others but I should look at them and give them similar small bumps if appropriate too
19:44:56 <fungi> still somewhat curious where the extra utilization has cropped up
19:45:34 <clarkb> for centos it seems they add packages without always removing packages
19:45:43 <clarkb> for example there are several gigabytes of just thunderbird packages
19:45:58 <clarkb> I suspect similar may happen with debian for security updates?
19:46:11 <clarkb> where they don't remove the security updates from the security mirror once they add them to regular mirrors
19:46:20 <fungi> the debian increases are a bit odd, those typically get folded into point releases eventually
19:46:58 <fungi> just wondering why it would have shot up in the past few months
19:47:33 <clarkb> lots of security updates that they don't actually remove when they add them to the point releases?
19:47:37 <clarkb> thats my hunch
19:49:02 <clarkb> But ya its a bit annoying things just grow forever even on old stable platforms
19:50:23 <clarkb> Anyway I'll try to get that done today. I've got an errand to run and school pickup and there is an openinfra foundation board meeting so I may end up getting to it tomorrow
19:50:31 <clarkb> #topic Scheduling project renames
19:51:45 <clarkb> fungi: you wanted to discuss this. I think we should wait for the openstack release and ptg to complete but then we're probably wide open?
19:51:59 <clarkb> (this assumes I have the gitea work done by then and I expect this will be the case)
19:52:22 <fungi> yes, i concur
19:52:36 <clarkb> easter is april 9 which may or may not be a good time to do it
19:52:47 <ianw> we'll probably be in a place to upgrade gerrit then, if we want to do both
19:52:48 <clarkb> good friday or monday after easter?
19:53:08 <clarkb> ianw: I think we should do them as distinct steps but could share a downtime window
19:53:12 <fungi> 2023-03-22 is the openstack release
19:53:32 <ianw> yeah i did mean sequentially :)
19:54:04 <clarkb> I think ianw's part of the world gets easter and adjacent days as holidays
19:54:11 <clarkb> should we look at the week after that instead?
19:54:38 <clarkb> (my kids have a four day weekend the week after so I personally prefer using the easter holiday, but I can be flexible)
19:54:43 <ianw> 15/16/18 are public holidays here
19:54:48 <fungi> friday 2023-03-31 would be the friday after openstack release week, and yeah 2023-04-07 would be the friday before easter i guess
19:54:50 <clarkb> (also I don't know who did the scheduling at school but they mess that up)
19:55:13 <clarkb> fungi: I expect we'll all be tired from the PTG so wouldn't want to do it that week. I'm good with the 7th
19:55:29 <clarkb> or 10th
19:55:37 <clarkb> I expect both of those days will be quiet due to holidays
19:55:44 <fungi> i'm travelling later in april, need to go check the dates if we're pushing it to later
19:56:15 <clarkb> ianw: do you have a preference?
19:57:20 <ianw> maybe 7th ish?  i'm happy to drive things at a quiet time
19:57:23 <clarkb> I'm good anytime in April and would prefer to avoid the 13-16th
19:57:32 <fungi> looks like i'm gone april 10-16
19:57:52 <ianw> if we have a good checklist i think the issues are minimal
19:57:56 <clarkb> ok why don't we pencil in the 7th and we can refine that as we get things ready
19:58:03 <fungi> that sounds good to me
19:58:53 <clarkb> #info April 7th ish for Gerrit downtime to rename projects and do 3.7 upgrade. Specific downtime window to be determined as we get closer to the day of.
19:58:59 <clarkb> #topic Open Discussion
19:59:04 <clarkb> You have one minute for anything else
20:00:39 <clarkb> Sounds like that was it. Thank you everyone
20:00:46 <clarkb> we'll be back here same time and location next week.
20:00:49 <clarkb> #endmeeting