19:01:11 <clarkb> #startmeeting infra
19:01:11 <openstack> Meeting started Tue May  4 19:01:11 2021 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:12 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:14 <openstack> The meeting name has been set to 'infra'
19:01:18 <clarkb> #link http://lists.opendev.org/pipermail/service-discuss/2021-May/000228.html Our Agenda
19:01:27 <clarkb> #topic Announcements
19:01:43 <clarkb> I won't be able to make the meeting next week as it conflicts with an appointment I can't easily move
19:02:01 <clarkb> we'll either need a volunteer meeting chair or cancel it
19:02:12 <clarkb> (I'm happy with canceling it as I suspect we may only have 2 or 3 participants)
19:02:44 <diablo_rojo> o/
19:02:49 <fungi> i'm fine skipping next week
19:03:52 <clarkb> wfm
19:04:00 <clarkb> #topic Actions from last meeting
19:04:05 <clarkb> #link http://eavesdrop.openstack.org/meetings/infra/2021/infra.2021-04-27-19.04.txt minutes from last meeting
19:04:19 <clarkb> fungi has/had an action to push changes to retire survey and ianw has a similar action for pbx
19:04:28 <clarkb> have those changes made it to gerrit yet?
19:04:33 <ianw> not yet
19:04:44 <clarkb> #action ianw Push changes to retire pbx.openstack.org
19:05:23 <clarkb> fungi: ^ any news on that from you?
19:05:27 <fungi> oh, yeah i have that up for review, just a sec
19:05:42 <fungi> #link https://review.opendev.org/789060 Deprovision Limesurvey config management and docs
19:05:57 <fungi> should be complete, reviews welcome
19:05:58 <clarkb> thanks!
19:06:05 <fungi> once it merges i can take the server offline
19:06:13 <clarkb> #topic Priority Efforts
19:06:24 <clarkb> #topic OpenDev
19:06:46 <fungi> i still find it amusing that the opendev infra meeting has an opendev topic ;)
19:06:53 <clarkb> heh
19:07:01 <clarkb> fungi: you have a couple of opendev related items to bring up
19:07:11 <fungi> oh, yeah probably. dealer's choice
19:07:15 <clarkb> #link https://review.opendev.org/789098 Update our base nodeset to focal
19:07:25 <fungi> right, that one
19:07:31 <fungi> so mainly we need to announce a date
19:07:48 <fungi> what seems like a reasonable warning timeframe? one week? two? a month?
19:07:55 <clarkb> I was thinking 2-4 weeks
19:08:14 <clarkb> it's fairly easy to test if it works as a user. You push up changes to swap early and if it breaks you fix it
19:08:17 <fungi> the openstack-discuss ml has a related thread in progress about devstack's master branch dropping bionic support
19:08:42 <fungi> also our base-test job is already updated to ubuntu-focal, so you can try reparenting a job in a dnm change
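The reparenting fungi mentions can be sketched like this in a throwaway DNM change; the job name here is illustrative, and the exact shape depends on where the job is defined:

```yaml
# .zuul.yaml in a DNM change: temporarily point a job at base-test,
# which already uses the ubuntu-focal nodeset, to see if it still passes
- job:
    name: example-tox-job   # hypothetical job name
    parent: base-test
```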
19:09:30 <fungi> okay, so send announcement today, switch on may 18? 25? june 1?
19:09:42 <fungi> june 8 would be 4 weeks
19:09:44 <ianw> i think earlier is better IMO
19:09:52 <fungi> i'm good with may 18
19:10:00 <ianw> ++
19:10:00 <clarkb> no objections to that from me
19:10:06 <ianw> pull the bandaid off :)
19:10:07 <fungi> this ubuntu version is already over a year old
19:10:35 <fungi> #agreed announce base job nodeset change to ubuntu-focal for 2021-05-18
19:10:57 <fungi> i'll send that asap after the meeting ends
19:11:00 <clarkb> thanks
19:11:17 <clarkb> #link https://review.opendev.org/789383 Updating gerrit config docs and planning acl simplification
19:11:45 <clarkb> This is the other one. Basically it documents a simplification of the global gerrit acls that we'd like to make now that openstack has a meta project for openstack specific acls
19:12:09 <clarkb> I think at this point we should probably go for it and then do the acl update too
19:12:56 <fungi> yeah, the summary is that we've moved the openstack release management special cases into a contentless repo inherited by everything in the openstack/ namespace
19:13:26 <fungi> so we should be able to remove their permission to do stuff to projects outside openstack/ with no impact to them now
19:13:48 <fungi> i also rolled some config doc cleanup in with that change
19:14:12 <fungi> but basically happy to remove those permissions once the documentation change reflecting it merges
19:14:25 <clarkb> sounds good
19:14:44 <clarkb> Has anyone had time to look over my large list of probably safe gerrit account cleanups?
19:15:44 <fungi> i would like to say i have, but that's probably a lie?
19:15:50 <clarkb> heh ok :)
19:16:01 <fungi> i honestly don't remember now, so i should look again anyway
19:16:03 <clarkb> well if y'all manage to find time to look at those I would appreciate it so I can tackle the next group
19:16:18 <clarkb> #topic Update Config Management
19:16:31 <clarkb> I've been dabbling with the ansible recently.
19:17:00 <fungi> a time-honored tradition
19:17:01 <clarkb> I have a change up to enable ubuntu ESM so we don't have to do that by hand. In particular we do need to update the unattended-upgrades origins list if we want those updates to apply automatically, and my change reflects that
19:17:15 <clarkb> I also have a change up to start ansibling mailman stuff but it is nowhere near complete
19:17:22 <fungi> maybe we should talk about esm a bit first?
19:17:26 <clarkb> sure
19:18:01 <clarkb> The way I have written that change it would only apply if we set some host_vars flags and secrets on a per host basis
19:18:18 <clarkb> It should be safe to land as is, then we can "enroll" servers simply by adding that host var info
19:18:27 <fungi> so, what is ubuntu esm, and how do we have access?
19:18:31 <clarkb> ah right
19:18:44 <clarkb> ubuntu esm is ubuntu's extended support effort for LTS releases
19:19:02 <fungi> so, like, after they reach eol
19:19:25 <clarkb> LTSs typically get 5 years of support, but if you need more you can enroll in ubuntu advantage and make use of the "esm-infra" packaging
19:20:14 <fungi> and that's normally a paid thing, right?
19:20:16 <clarkb> This is available for free to various contributors and such. ttx and I reached out to them to see if opendev could have access too and they said yes
19:20:49 <fungi> awesome. thanks canonical! very generous, and may help us limp along on some of our xenial servers a bit longer while we're behind on upgrades
19:21:05 <clarkb> but ya typically I expect most users would pay for this. It also includes other things like fips support and landscape and lvie kernel patching. We don't want any of that just esm access
19:21:37 <clarkb> The way I've written the change it tries to be very careful and only enable esm
19:21:48 <clarkb> by default you get live kernel patching too if you enroll a server
19:21:48 <ianw> it looks safe enough.  it's one global token good for all servers?
19:22:17 <clarkb> ianw: yes seems to be a token associated with the account and it doesn't seem to change. Though I've only enrolled a single host so far to test things out
19:22:46 <clarkb> anything else on esm? do we want to talk mailman ansible?
19:23:29 <ianw> esm looks fine, i guess we can't really test it, so just a matter of babysitting
19:23:37 <clarkb> ianw: ya
19:23:49 <fungi> i'm good to move on
19:23:59 <ianw> i presume the follow-on is to add the enable flag to hosts gradually
19:24:12 <clarkb> ianw: yes and we would be doing that in private host vars I think
19:24:25 <ianw> yeah, good idea
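The enrollment clarkb describes boils down to a per-host flag plus the token in private host_vars, and an unattended-upgrades tweak so the ESM packages actually get applied. A rough sketch of the apt side — the origin patterns below are the standard esm-infra ones, not lifted from the actual change:

```text
# /etc/apt/apt.conf.d/50unattended-upgrades (fragment)
# Extra origin needed so esm-infra security updates apply automatically
Unattended-Upgrade::Allowed-Origins {
        "${distro_id}:${distro_codename}-security";
        "${distro_id}ESM:${distro_codename}-infra-security";
};
```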
19:25:03 <clarkb> for mailman ansible I've started what is largely a 1:1 mapping of the existing puppet. It is nowhere near complete and there are a lot of moving pieces and I'm a mailman noob so review is welcome :)
19:25:33 <clarkb> I think we should be able to get it to a point where a system-config-run-mailman job is able to validate some stuff though
19:26:04 <clarkb> it would just create a bunch of empty lists, which the old puppet never really managed beyond that either
19:26:17 <ianw> this is with mailman 2 though, right?
19:26:21 <clarkb> correct
19:26:42 <clarkb> it doesn't currently try to convert mailman or do docker or anything like that yet. Just a 1:1 mapping
19:27:06 <clarkb> thinking that maybe we can test upgrades etc if we do it this way
19:28:11 <fungi> yeah, this also gets us a nice transitional state to mm3 i think
19:28:16 <ianw> yeah i'm afraid i'll have to seriously bootstrap my mailman config knowledge :)
19:28:18 <clarkb> if you think this is a terrible idea or want to see a different approach let me know (though I'm not sure I've got enough of the background and content mapped in to do something like the upgrade)
19:28:37 <fungi> because we can install the mm3 containers onto the current listserv with ansible too, and then only map specific domains to it
19:28:38 <clarkb> basically I can do a 1:1 mapping, but beyond that I'm going to need a lot of catching up
19:28:47 <fungi> luckily exim is already managed by ansible
19:29:05 <ianw> by mailman containers do we mean -- https://github.com/maxking/docker-mailman ?
19:29:59 <fungi> maybe, or some we build ourselves
19:31:22 <fungi> at this point the ubuntu packages for mailman3 in focal are likely fine too, but that's another upgrade or two to get there
19:31:41 <fungi> less disruptive i think if we switch to containers for that
19:32:02 <fungi> but anyway, somewhat tangential to what clarkb has written
19:32:24 <clarkb> #topic General Topics
19:32:31 <clarkb> #topic Server Upgrades
19:32:54 <clarkb> the zk cluster is done. I've started thinking about the zuul scheduler but haven't gotten to launching/writing anything yet
19:33:37 <clarkb> I think the way that will look is we launch a new scheduler and have ansible ansible it without starting services. Then we schedule a time to stop existing zuul, copy data as necessary to new zuul, and then start services
19:33:48 <clarkb> in theory the total downtime shouldn't be much longer than a typical zuul restart
19:34:11 <clarkb> But I need to double check the ansible actually works that way (I think it does but I don't want to be wrong and have two zuuls start)
19:34:48 <fungi> i suppose this isn't something we want to pair with a distributed scheduler rollout
19:35:12 <clarkb> I don't think so as in theory a distributed scheduler rollout could just be a second new zuul later
19:35:19 <fungi> (because if we did want to, that's an awesome example of where the distributed scheduler work shines)
19:35:26 <clarkb> we don't make the distributed rollout easier by waiting
19:35:31 <fungi> yeah
19:35:54 <fungi> more thinking distributed scheduler could allow for a zero-downtime scheduler server upgrade
19:36:01 <fungi> but the timing isn't ideal
19:37:13 <clarkb> Oh I also want to shrink the size of the scheduler
19:37:30 <clarkb> it is currently fairly large which helps us when we have memory leaks, but we don't have memory leaks as often anymore and detecting them more quickly might be a good thing
19:37:39 <clarkb> I'm thinking a 16GB instance instead of the current 30GB is probably good?
19:38:33 <clarkb> anyway that was all I had on this
19:38:36 <fungi> seems fine, yeah, we grew it in response to leaks
19:38:47 <clarkb> #topic OpenEuler
19:39:17 <clarkb> ianw: I haven't seen any discussion for this on the mailing list or elsewhere yet. Wanted to make sure that we're pushing things in that direction if it's still necessary
19:39:39 <ianw> ahh yeah sorry i haven't had any further updates
19:40:10 <ianw> i guess
19:40:12 <ianw> #link https://review.opendev.org/c/opendev/system-config/+/784874
19:40:21 <ianw> is the outstanding task, for the mirror
19:40:51 <clarkb> ah ok so just need reviews (as well as someone to make the afs volume and babysit its syncing when that is happy)
19:40:55 <ianw> MIRROR="rsync://root@mirrors.nju.edu.cn/openeuler" still feels odd, but at least there's no password in there now
19:42:12 <clarkb> #topic InMotion cloud network reorg
19:42:38 <clarkb> This is mostly to call out that we're using the inmotion deployed cloud as nodepool resources now. But we are currently limited by ipv4 address availability
19:43:08 <clarkb> One thing we can try is running a zuul executor in the cloud region directly then have it talk to test nodes via private networking.
19:43:18 <clarkb> This hasn't been done by us before but corvus thought it should be doable.
19:43:52 <clarkb> There is one gotcha which is that while zuul+nodepool try really hard to guarantee all the jobs that share dependencies run in the same cloud they don't fully guarantee it
19:43:59 <clarkb> but it is probably good enough for us if we do that
19:44:25 <fungi> also held nodes which land there will need temporary floating ips allocated in order to be reachable
19:44:43 <clarkb> I don't think this is urgent, but wanted to call it out as an option for us there. We suspect we would at least triple our node count if we did this (from 8 to 25 ish)
19:44:50 <fungi> not ideal, but we have spare fip quota
19:44:54 <clarkb> yup
19:45:24 <clarkb> Anyway lets move on. The information is out there now :)
19:45:33 <clarkb> #topic Removing our registration requirement on IRC channels
19:45:39 <ianw> how would the mirror work?
19:45:47 <clarkb> #topic undo
19:46:28 <clarkb> ianw: the mirror could remain as is with external connectivity and test nodes would NAT to it. Or we could redeploy the mirror with a floating IP and have it talk over the private network too
19:46:39 <clarkb> would be similar to how we use the rax mirrors today if we redeployed it that way
19:47:50 <ianw> ok, yeah i guess maybe NAT to it shortcuts the internet, maybe?
19:48:25 <ianw> presumably networking magic would keep the hops low.  seems like the easiest approach.  anyway, sorry, move on
19:48:48 <clarkb> #topic Removing our registration requirement on IRC channels
19:49:02 <clarkb> Late last week TheJulia asked if we had looked at removing our registration requirement on IRC channels
19:49:30 <clarkb> I mentioned that last I checked we had seen fairly regular (but less frequent) spamming in the unregistered channel. However I looked at recent logs and we had ~1.5 spam instances over the last month
19:49:50 <clarkb> One was clearly spam. The other was a join our discord server message which may have been legit. I didn't want to check it out :)
19:50:12 <clarkb> Given that I think we can probably consider updating the accessbot acls and see what things look like afterwards? we can always put the restriction back again?
19:50:18 <clarkb> but wanted to make sure others thought this was a good idea
19:51:48 <ianw> it seems fine; i mean the spam attacks come and go
19:52:01 <ianw> a few weeks ago i was getting the privmsg spam again
19:53:23 <clarkb> ok I'll try to sort that out when I've got time
19:53:42 <clarkb> #topic Switching artifact signing keys from RSA to ECC
19:53:49 <clarkb> #link https://review.opendev.org/789062
19:53:55 <clarkb> fungi: want to take this one?
19:54:14 <fungi> mmm
19:54:38 <fungi> yeah, so frickler pointed out that we might want to reconsider our previous rsa/3072 default
19:55:03 <fungi> and the openstack artifact signing key rotation was overdue anyway
19:55:28 <fungi> i looked, and the latest release of gnu privacy guard switched the default to ecc
19:55:57 <fungi> the version we've been using on bridge to generate keys supports creating ecc keys as long as you pass --expert on the command line
19:56:37 <clarkb> ya so seems like its mostly a small docs update to show how to do the ecc keys
19:56:41 <fungi> so i've taken a shot at rotating to an elliptic curve keypair for this cycle, and documented the process in that change
19:56:41 <clarkb> It looked good to me
19:56:48 <ianw> oohh i am going to make sure to add a --expert argument to all future programs i write :)
19:56:53 <clarkb> ha
19:57:20 <fungi> to be clear, more recent gnupg can create ecc keys without passing --expert
19:57:47 <fungi> they were just somewhat new and so hidden by default in the version shipped in ubuntu bionic (what bridge is running)
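The key generation fungi describes can be sketched non-interactively like this — a minimal sketch assuming gnupg >= 2.1; the uid and expiry are illustrative, not the actual OpenStack signing key parameters:

```shell
# Use a throwaway keyring so nothing touches the real homedir
export GNUPGHOME="$(mktemp -d)"
chmod 700 "$GNUPGHOME"

# Newer gpg can create an Ed25519 signing key directly; on bionic's gpg
# the interactive equivalent of this is hidden behind --expert
gpg --batch --pinentry-mode loopback --passphrase '' \
    --quick-generate-key 'Example Cycle Signing Key <example@example.org>' \
    ed25519 sign 1y

# In --with-colons output, the pub record names the curve for ECC keys
gpg --list-keys --with-colons | grep '^pub'
```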
19:58:51 <clarkb> and if you use new gnupg you are automatically promoted to expert :)
19:59:02 <fungi> #link https://review.opendev.org/789063 Replace old Wallaby cycle signing key with Xena
19:59:08 <fungi> that's the actual key rotation
19:59:39 <fungi> in case folks want to review and/or attest to the correctness of the key
19:59:46 <clarkb> thanks for putting this together!
19:59:51 <clarkb> We are just about at time though so I'll end it here
19:59:53 <clarkb> Thank you everyone
19:59:57 <clarkb> #endmeeting