19:01:11 #startmeeting infra
19:01:11 Meeting started Tue May 4 19:01:11 2021 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:12 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:14 The meeting name has been set to 'infra'
19:01:18 #link http://lists.opendev.org/pipermail/service-discuss/2021-May/000228.html Our Agenda
19:01:27 #topic Announcements
19:01:43 I won't be able to make the meeting next week as it conflicts with an appointment I can't easily move
19:02:01 we'll either need a volunteer meeting chair or cancel it
19:02:12 (I'm happy with canceling it as I suspect we may only have 2 or 3 participants)
19:02:44 o/
19:02:49 i'm fine skipping next week
19:03:52 wfm
19:04:00 #topic Actions from last meeting
19:04:05 #link http://eavesdrop.openstack.org/meetings/infra/2021/infra.2021-04-27-19.04.txt minutes from last meeting
19:04:19 fungi has/had an action to push changes to retire survey and ianw has a similar action for pbx
19:04:28 have those changes made it to gerrit yet?
19:04:33 not yet
19:04:44 #action ianw Push changes to retire pbx.openstack.org
19:05:23 fungi: ^ any news on that from you?
19:05:27 oh, yeah i have that up for review, just a sec
19:05:42 #link https://review.opendev.org/789060 Deprovision Limesurvey config management and docs
19:05:57 should be complete, reviews welcome
19:05:58 thanks!
19:06:05 once it merges i can take the server offline
19:06:13 #topic Priority Efforts
19:06:24 #topic OpenDev
19:06:46 i still find it amusing that the opendev infra meeting has an opendev topic ;)
19:06:53 heh
19:07:01 fungi: you have a couple of opendev related items to bring up
19:07:11 oh, yeah probably. dealer's choice
19:07:15 #link https://review.opendev.org/789098 Update our base nodeset to focal
19:07:25 right, that one
19:07:31 so mainly we need to announce a date
19:07:48 what seems like a reasonable warning timeframe? one week? two? a month?
19:07:55 I was thinking 2-4 weeks
19:08:14 it's fairly easy to test if it works as a user. You push up changes to swap early and if it breaks you fix it
19:08:17 the openstack-discuss ml has a related thread in progress about devstack's master branch dropping bionic support
19:08:42 also our base-test job is already updated to ubuntu-focal, so you can try reparenting a job in a dnm change
19:09:30 okay, so send announcement today, switch on may 18? 25? june 1?
19:09:42 june 8 would be 4 weeks
19:09:44 i think earlier is better IMO
19:09:52 i'm good with may 18
19:10:00 ++
19:10:00 no objects to that from me
19:10:05 *no objections
19:10:06 pull the bandaid off :)
19:10:07 this ubuntu version is already over a year old
19:10:35 #agreed announce base job nodeset change to ubuntu-focal for 2021-05-18
19:10:57 i'll send that asap after the meeting ends
19:11:00 thanks
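(For reference, the reparenting test mentioned above might look roughly like this in a project's .zuul.yaml in a do-not-merge change; the job name below is a hypothetical placeholder rather than anything from the meeting.)

    # DNM: temporarily reparent an existing job onto base-test, which
    # already runs on the ubuntu-focal nodeset, to check nothing breaks
    - job:
        name: my-project-unit-tests   # hypothetical job name
        parent: base-test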
19:11:17 #link https://review.opendev.org/789383 Updating gerrit config docs and planning acl simplification
19:11:45 This is the other one. Basically it documents a simplification of the global gerrit acls that we'd like to make now that openstack has a meta project for openstack specific acls
19:12:09 I think at this point we should probably go for it and then do the acl update too
19:12:56 yeah, the summary is that we've moved the openstack release management special cases into a contentless repo inherited by everything in the openstack/ namespace
19:13:26 so we should be able to remove their permission to do stuff to projects outside openstack/ with no impact to them now
19:13:48 i also rolled some config doc cleanup in with that change
19:14:12 but basically happy to remove those permissions once the documentation change reflecting it merges
19:14:25 sounds good
19:14:44 Has anyone had time to look over my large list of probably safe gerrit account cleanups?
19:15:44 i would like to say i have, but that's probably a lie?
19:15:50 heh ok :)
19:16:01 i honestly don't remember now, so i should look again anyway
19:16:03 well if y'all manage to find time to look at those I would appreciate it so I can tackle the next group
19:16:18 #topic Update Config Management
19:16:31 I've been dabbling with the ansible recently.
19:17:00 a time-honored tradition
19:17:01 Have a change up to enable ubuntu ESM if we don't want to do that by hand. In particular we do need to update the unattended upgrades list if we want to have those updates apply automatically and my change reflects that
19:17:15 I also have a change up to start ansibling mailman stuff but it is nowhere near complete
19:17:22 maybe we should talk about esm a bit first?
19:17:26 sure
19:18:01 The way I have written that change it would only apply if we set some host_vars flags and secrets on a per host basis
19:18:18 It should be safe to land as is, then we can "enroll" servers simply by adding that host var info
19:18:27 so, what is ubuntu esm, and how do we have access?
19:18:31 ah right
19:18:44 ubuntu esm is ubuntu's extended support effort for LTS releases
19:19:02 so, like, after they reach eol
19:19:25 LTSs typically get 5 years of support, but if you need more you can enroll in ubuntu advantage and make use of the "esm-infra" packaging
19:20:14 and that's normally a paid thing, right?
19:20:16 This is available for free to various contributors and such. ttx and I reached out to them to see if opendev could have access too and they said yes
19:20:49 awesome. thanks canonical! very generous, and may help us limp along on some of our xenial servers a bit longer while we're behind on upgrades
19:21:05 but ya typically I expect most users would pay for this. It also includes other things like fips support and landscape and live kernel patching. We don't want any of that, just esm access
19:21:37 The way I've written the change it tries to be very careful and only enable esm
19:21:48 by default you get live kernel patching too if you enroll a server
19:21:48 it looks safe enough. it's one global token good for all servers?
19:22:17 ianw: yes seems to be a token associated with the account and it doesn't seem to change. Though I've only enrolled a single host so far to test things out
19:22:46 anything else on esm? do we want to talk mailman ansible?
19:23:29 esm looks fine, i guess we can't really test it, so just a matter of babysitting
19:23:37 ianw: ya
19:23:49 i'm good to move on
19:23:59 i presume the follow-on is to add the enable flag to hosts gradually
19:24:12 ianw: yes and we would be doing that in private host vars I think
19:24:25 yeah, good idea
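(As a concrete sketch of that per-host opt-in, enrolling a server would amount to adding something like the following to its private host_vars; the variable names are hypothetical placeholders, not necessarily what the actual change defines.)

    # private host_vars for one server (hypothetical variable names)
    ubuntu_advantage_enable_esm: true
    ubuntu_advantage_token: "<attach token from the shared account>"
    # only esm-infra is wanted; livepatch, fips and landscape stay off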
19:25:03 for mailman ansible I've started what is largely a 1:1 mapping of the existing puppet. It is nowhere near complete and there are a lot of moving pieces and I'm a mailman noob so review is welcome :)
19:25:33 I think we should be able to get it to a point where a system-config-run-mailman job is able to validate some stuff though
19:26:04 it would just create a bunch of empty lists which the old puppet never really managed beyond either
19:26:17 this is with mailman 2 though, right?
19:26:21 correct
19:26:42 it doesn't currently try to convert mailman or do docker or anything like that yet. Just a 1:1 mapping
19:27:06 thinking that maybe we can test upgrades etc if we do it this way
19:28:11 yeah, this also gets us a nice transitional state to mm3 i think
19:28:16 yeah i'm afraid i'll have to seriously bootstrap my mailman config knowledge :)
19:28:18 if you think this is a terrible idea or want to see a different approach let me know (though I'm not sure I've got enough of the background and content mapped in to do something like the upgrade)
19:28:37 because we can install the mm3 containers onto the current listserv with ansible too, and then only map specific domains to it
19:28:38 basically I can do a 1:1 mapping, but beyond that I'm going to need a lot of catching up
19:28:47 luckily exim is already managed by ansible
19:29:05 by mailman containers do we mean https://github.com/maxking/docker-mailman ?
19:29:59 maybe, or some we build ourselves
19:31:22 at this point the ubuntu packages for mailman3 in focal are likely fine too, but that's another upgrade or two to get there
19:31:41 less disruptive i think if we switch to containers for that
19:32:02 but anyway, somewhat tangential to what clarkb has written
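(To give a feel for the 1:1 puppet-to-ansible mapping being discussed, the role might start with tasks along these lines; the task layout, paths and variable names are illustrative guesses, not the contents of the actual change.)

    # rough sketch of a mailman 2 role task file using distro packages
    - name: Install mailman
      ansible.builtin.package:
        name: mailman
        state: present

    - name: Create lists that do not exist yet
      ansible.builtin.command: >
        newlist --quiet {{ item.name }} {{ item.admin }} {{ item.password }}
      args:
        creates: "/var/lib/mailman/lists/{{ item.name }}"
      loop: "{{ mailman_lists }}"   # hypothetical list definitions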
19:32:24 #topic General Topics
19:32:31 #topic Server Upgrades
19:32:54 the zk cluster is done. I've started thinking about the zuul scheduler but haven't gotten to launching/writing anything yet
19:33:37 I think the way that will look is we launch a new scheduler and have ansible ansible it without starting services. Then we schedule a time to stop the existing zuul, copy data as necessary to the new zuul, and then start services
19:33:48 in theory the total downtime shouldn't be much longer than a typical zuul restart
19:34:11 But I need to double check the ansible actually works that way (I think it does but I don't want to be wrong and have two zuuls start)
19:34:48 i suppose this isn't something we want to pair with a distributed scheduler rollout
19:35:12 I don't think so as in theory a distributed scheduler rollout could just be a second new zuul later
19:35:19 (because if we did want to, that's an awesome example of where the distributed scheduler work shines)
19:35:26 we don't make the distributed rollout easier by waiting
19:35:31 yeah
19:35:54 more thinking distributed scheduler could allow for a zero-downtime scheduler server upgrade
19:36:01 but the timing isn't ideal
19:37:13 Oh I also want to shrink the size of the scheduler
19:37:30 it is currently fairly large which helps us when we have memory leaks, but we don't have memory leaks as often anymore and detecting them more quickly might be a good thing
19:37:39 I'm thinking a 16GB instance instead of the current 30GB is probably good?
19:38:33 anyway that was all I had on this
19:38:36 seems fine, yeah, we grew it in response to leaks
19:38:47 #topic OpenEuler
19:39:17 ianw: I haven't seen any discussion for this on the mailing list or elsewhere yet. Wanted to make sure that we are pushing things in that direction if still necessary
19:39:39 ahh yeah sorry i haven't had any further updates
19:40:10 i guess
19:40:12 #link https://review.opendev.org/c/opendev/system-config/+/784874
19:40:21 is the outstanding task, for the mirror
19:40:51 ah ok so just need reviews (as well as someone to make the afs volume and babysit its syncing when that is happy)
19:40:55 MIRROR="rsync://root@mirrors.nju.edu.cn/openeuler" still feels odd, but at least there's no password in there now
19:42:12 #topic InMotion cloud network reorg
19:42:38 This is mostly to call out that we're using the inmotion deployed cloud as nodepool resources now. But we are currently limited by ipv4 address availability
19:43:08 One thing we can try is running a zuul executor in the cloud region directly then have it talk to test nodes via private networking.
19:43:18 This hasn't been done by us before but corvus thought it should be doable.
19:43:52 There is one gotcha which is that while zuul+nodepool try really hard to guarantee all the jobs that share dependencies run in the same cloud they don't fully guarantee it
19:43:59 but it is probably good enough for us if we do that
19:44:25 also held nodes which land there will need temporary floating ips allocated in order to be reachable
19:44:43 I don't think this is urgent, but wanted to call it out as an option for us there. We suspect we would at least triple our node count if we did this (from 8 to 25 ish)
19:44:50 not ideal, but we have spare fip quota
19:44:54 yup
19:45:24 Anyway let's move on. The information is out there now :)
19:45:33 #topic Removing our registration requirement on IRC channels
19:45:39 how would the mirror work?
19:45:47 #topic undo
19:46:28 ianw: the mirror could remain as is with external connectivity and test nodes would NAT to it. Or we could redeploy the mirror with a floating IP and have it talk over the private network too
19:46:39 would be similar to how we use the rax mirrors today if we redeployed it that way
19:47:50 ok, yeah i guess maybe NAT to it shortcuts the internet, maybe?
19:48:25 presumably networking magic would keep the hops low. seems like the easiest approach. anyway, sorry, move on
19:48:48 #topic Removing our registration requirement on IRC channels
19:49:02 Late last week TheJulia asked if we had looked at removing our registration requirement on IRC channels
19:49:30 I mentioned that last I checked we had seen fairly regular (but less frequent) spamming in the unregistered channel. However I looked at recent logs and we had ~1.5 spam instances over the last month
19:49:50 One was clearly spam. The other was a join our discord server message which may have been legit. I didn't want to check it out :)
19:50:12 Given that I think we can probably consider updating the accessbot acls and see what things look like afterwards? we can always put the restriction back again?
19:50:18 but wanted to make sure others thought this was a good idea
19:51:48 it seems fine; i mean the spam attacks come and go
19:52:01 a few weeks ago i was getting the privmsg spam again
19:53:23 ok I'll try to sort that out when I've got time
19:53:42 #topic Switching artifact signing keys from RSA to ECC
19:53:49 #link https://review.opendev.org/789062
19:53:55 fungi: want to take this one?
19:54:14 mmm
19:54:38 yeah, so frickler pointed out that we might want to reconsider our previous rsa/3072 default
19:55:03 and the openstack artifact signing key rotation was overdue anyway
19:55:28 i looked, and the latest release of gnu privacy guard switched the default to ecc
19:55:57 the version we've been using on bridge to generate keys supports creating ecc keys as long as you pass --expert on the command line
19:56:37 ya so seems like it's mostly a small docs update to show how to do the ecc keys
19:56:41 so i've taken a shot at rotating to an elliptic curve keypair for this cycle, and documented the process in that change
19:56:41 It looked good to me
19:56:48 oohh i am going to make sure to add a --expert argument to all future programs i write :)
19:56:53 ha
19:57:20 to be clear, more recent gnupg can create ecc keys without passing --expert
19:57:47 they were just somewhat new and so hidden by default in the version shipped in ubuntu bionic (what bridge is running)
19:58:51 and if you use new gnupg you are automatically promoted to expert :)
19:59:02 #link https://review.opendev.org/789063 Replace old Wallaby cycle signing key with Xena
19:59:08 that's the actual key rotation
19:59:39 in case folks want to review and/or attest to the correctness of the key
19:59:46 thanks for putting this together!
19:59:51 We are just about at time though so I'll end it here
19:59:53 Thank you everyone
19:59:57 #endmeeting