19:01:10 #startmeeting infra
19:01:12 Meeting started Tue May 19 19:01:10 2020 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:13 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:15 The meeting name has been set to 'infra'
19:01:19 #link http://lists.opendev.org/pipermail/service-discuss/2020-May/000025.html Our Agenda
19:01:25 #topic Announcements
19:01:43 The OpenStack release went really smoothly
19:01:52 thank you to everyone for ensuring that services were running and happy for that
19:02:33 This next weekend is also a holiday weekend. OpenStack Foundation staff have decided that we're gonna tack on an extra day on the other side of it, making Friday a day off too
19:02:37 and that was with ovh entirely offline for us too ;)
19:03:04 as a heads up, that means I'll be largely away from my computer Friday and Monday
19:03:32 i'll likely be around if something comes up, because i'm terrible at vacationing
19:04:08 but i may defer non-emergency items
19:04:09 #topic Actions from last meeting
19:04:20 #link http://eavesdrop.openstack.org/meetings/infra/2020/infra.2020-05-12-19.01.txt minutes from last meeting
19:04:31 There are no actions to call out
19:04:37 #topic Priority Efforts
19:04:44 #topic Update Config Management
19:05:06 any movement on the gerritbot installation? I haven't seen any
19:05:08 there's been a ton of progress on our ci mirrors
19:05:35 yes, ianw has redeployed them all to be ansible managed under the opendev.org domain
19:05:52 at least the ones which weren't yet
19:06:07 i did similarly for the ovh mirrors since they needed rebuilding anyway
19:07:19 corvus: mordred: I think the other big item on this topic is the nb03 arm64 docker image (which still needs debugging on the image build side) and the work to improve testing of Zuul deployments pre-merge with zuul
19:07:29 is there anything else to call out on that? maybe changes that are ready for review?
19:07:35 so far everything seems to be working after the mass mirror replacements, so that went extremely well
19:07:47 i think the work on getting zuul+nodepool working in tests is nearly there -- at least, i think the 2 problems we previously identified have solutions which are complete and ready for review except:
19:08:00 * diablo_rojo sneaks in late again
19:08:18 diablo_rojo: there might still be a few cookies left, but the coffee's gone already
19:08:24 they keep hitting random internet hangups, so they're both still 'red'. but i think their constituent jobs have individually demonstrated functionality
19:08:34 2 main changes:
19:08:36 I'm planning on digging back in to figuring out what's up with the multi-arch stuff :(
19:08:55 #link better iptables support in testing https://review.opendev.org/726475
19:09:01 fungi, that's alright, I prefer tea as my hot caffeine source anyway.
19:09:07 #link run zuul as zuuld user https://review.opendev.org/726958
19:09:18 corvus: gerrit reports they conflict with each other. Do they need to be stacked?
19:10:03 that last one is worth a little extra discussion -- we went back and forth on irc on how to handle that, and we decided to use a 'container' user, but once i was done with that, i really didn't like it, so i'm proposing we zig and use a 'zuuld' user instead.
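A minimal sketch of the sanity check implied by the rename discussion that follows: confirm the renamed service account keeps its original uid so existing file ownership is untouched. The account names come from the change above; the expected uid is a placeholder to fill in before running, and this is not the code in review 726958.

```python
# Sketch only, not the change under review: verify that the zuul -> zuuld
# rename kept the same uid, so files owned by the old account are unaffected.
import pwd
import sys

OLD_NAME = "zuul"
NEW_NAME = "zuuld"
EXPECTED_UID = None  # placeholder: record the uid before the rename

def main():
    try:
        new = pwd.getpwnam(NEW_NAME)
    except KeyError:
        sys.exit(f"{NEW_NAME} does not exist yet; rename has not happened")
    try:
        pwd.getpwnam(OLD_NAME)
        print(f"note: {OLD_NAME} still present alongside {NEW_NAME}")
    except KeyError:
        pass
    if EXPECTED_UID is not None and new.pw_uid != EXPECTED_UID:
        sys.exit(f"uid changed: {new.pw_uid} != {EXPECTED_UID}")
    print(f"{NEW_NAME}: uid={new.pw_uid} home={new.pw_dir}")

if __name__ == "__main__":
    main()
```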
19:10:18 ++
19:10:24 the bulk of the change is normalizing all the variables around that, so, really, it's not a big deal to change (now or later)
19:11:06 the main thing we were worried about is whether it would screw up the executors, but i walked through the process, and i think the executor is going to happily ssh to the 'zuul' user on the worker nodes even though it's running as zuuld
19:11:32 cool, and that will avoid zuul on the test nodes conflicting with zuul the service user
19:11:39 clarkb: unsure about conflicts; i'll check. i might rebase both of them on the 'vendor ppa keys' change i'm about to write
19:12:01 clarkb: yep
19:12:34 i'm unsure if the ansible will rename the zuul->zuuld user on the real executors or not
19:12:52 but if it errors, we can manually do that pretty easily
19:12:55 the uids stay the same right? so it's just a minor /etc/passwd edit?
19:12:57 the uid will be the same
19:12:58 yep
19:13:44 alright, anything else on the topic of config management updates?
19:14:43 #topic OpenDev
19:14:48 #link https://etherpad.opendev.org/p/XRyf4UliAKI9nRGstsP4 Email draft for building advisory board.
19:15:06 I wrote this draft email yesterday and am hoping I can get some reviews of it before sending it out
19:15:28 basic idea is to send email to service-discuss while BCC'ing people who I think may be interested. Then have those that are interested respond to the list and volunteer
19:16:16 if that plan sounds good and people are ok with the email content I'll go ahead and send that out and start trying to drum up interest there
19:17:15 clarkb: generally looks good. i noted 2 things that could use fixing
19:17:31 corvus: thanks, I'll try to incorporate feedback after the meeting
19:17:43 Probably don't need to discuss this further here, just wanted to make sure people were aware of it
19:17:54 Any other OpenDev topics to bring up before we move on?
19:19:18 #topic General Topics
19:19:27 #topic pip-and-virtualenv next steps
19:19:38 ianw you added this item, want to walk us through it?
19:20:24 yes, so the status of this is currently that centos-8, all suse images, and all arm64 images have dropped pip-and-virtualenv
19:20:40 that of course leaves the "hard" ones :)
19:21:04 we have "-plain" versions of our other platform nodes, and these are passing all tests for zuul-jobs
19:21:23 i'm not sure what to do but notify people we'll be changing this, and do it?
19:21:30 i'm open to suggestions
19:21:45 yeah - I think that's likely the best bet
19:22:00 folks can test by adding a "nodeset" line to a project-pipeline job variant, yeah?
19:22:01 oh and the plain nodes pass devstack
19:22:03 ianw: ya I think now is likely a good time for openstack. Zuul is trying to get a release done, but is also capable of debugging quickly. Airship is probably my biggest concern as they are trying to do their big 2.0 release sometime soonish
19:22:06 it's the beginning of an openstack cycle - and there should be a clear and easy answer to "my job isn't working any more", right
19:22:08 ?
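One hedged illustration of the "clear and easy answer" being asked for here: on the -plain images the common failure is simply that pip/virtualenv are no longer preinstalled, which a broken job can confirm in a few lines. The tool names checked below are examples; the actual remedy discussed next is adding an ensure-virtualenv style role rather than pinning the old image.

```python
# Illustration only: spot the most likely breakage on a "-plain" image,
# i.e. pip/virtualenv are no longer baked in.
import shutil

def missing_tools(tools=("pip", "pip3", "virtualenv")):
    found = {tool: shutil.which(tool) for tool in tools}
    for tool, path in found.items():
        print(f"{tool}: {path or 'MISSING'}")
    return [tool for tool, path in found.items() if path is None]

if __name__ == "__main__":
    if missing_tools():
        print("pip-and-virtualenv is not on this image; add the ensure-* role to the job")
```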
19:22:18 corvus: yes I think so
19:22:27 alternative suffix -vanilla :P
19:22:35 zbr: we'll be dropping the suffixes entirely I think
19:22:43 zbr: it's just there temporarily for testing
19:22:49 yes, i'd like to drop the suffix
19:22:58 at a minimum we should make sure the ubuntu-focal images are "plain"
19:23:02 maybe the thing to do is notify people of the change with direction on testing using a variant and the -plain images
19:23:05 maybe we should send out an email saying "we will change by date $x; you can test now by following procedure $y" ?
19:23:09 oh, yeah the focal images dropped it too
19:23:09 clarkb: yeah, that :)
19:23:11 corvus: ++
19:23:20 openstack projects are supposed to be switching to focal and replacing legacy imported zuul v2 jobs this cycle anyway
19:23:42 the bigger impact for openstack will likely be stable branches
19:23:53 doing it next week may work; the week after is PTG though, which we'd probably want to avoid unnecessary disruption during
19:23:58 fungi: yeah - where I imagine there will be a long tail of weird stuff
19:23:59 they may have to backport some job fixes to cope with missing pip/virtualenv
19:24:14 fungi: ya, though I think the idea is that we've handled it for them?
19:24:15 yeah, at worst you might have to put in an ensure-virtualenv role
19:24:30 set the date at next friday? (or thursday?)
19:25:16 we can also do a soft update where we switch default nodesets to -plain while keeping the other images around. Then in a few more weeks remove the -plain images
19:26:00 ok, i will come up with a draft email, noting the things around plain images etc and circulate it
19:26:15 sounds good
19:26:22 clarkb: I like that idea
19:26:24 but if ianw's hunch is right, then the job changes needed to add the role are just as trivial as the job changes needed to temporarily use the other image name
19:27:26 ya I think we've done a fair bit of prep work to reduce impact and make fixing it easy. Rolling forward is likely as easy as being extra cautious
19:27:30 so keeping the nonplain images around after the default switch may just be creating more work for us and an attractive nuisance for projects who don't know the fix is as simple as the workaround
19:27:37 the key thing is to give people the info they need to know what to do if it breaks them
19:28:14 ++ i will definitely explain what was happening and what should happen now
19:28:23 ++
19:28:30 whether anyone reads it is of course another matter :)
19:28:51 they'll read it after they pop into irc asking why their jobs broke and we link them to the ml archive
19:29:07 better late than never
19:29:36 anything else on this topic?
19:29:42 no thanks
19:29:46 #topic DNS cleanup
19:29:59 #link https://review.opendev.org/728739 : Add tool to export Rackspace DNS domains to bind format
19:30:03 ahh, also me
19:30:07 go for it
19:30:24 so when i was clearing out old mirrors, i very nearly deleted the openstack.org DOMAIN ... the buttons start to look very similar
19:30:44 after i regained composure, i wondered if we had any sort of recovery for that
19:30:52 but also, there's a lot of stuff in there
19:31:10 I like the idea of exporting to a bind file
19:31:18 firstly, is there any issue with me pasting it to an etherpad for shared eyes to do an audit of what we can remove?
19:31:43 ianw: I don't have any concern doing that for the records we manage in that zone. I'm not sure if that is true of the records the foundation manages
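For context on the export tool linked at the start of this topic (review 728739), a heavily simplified sketch of pulling a bind-format export through the Rackspace Cloud DNS API. This is not that tool; the endpoint path, the asynchronous-job polling flow, and the response field names here are assumptions and should be checked against the change and the Cloud DNS docs.

```python
# Sketch only. Assumes: a valid identity token, the v1.0 Cloud DNS endpoint,
# and that GET /domains/{id}/export returns an async job whose completed
# result carries the BIND-format zone text. Field names are assumptions.
import time
import requests

DNS_ENDPOINT = "https://dns.api.rackspacecloud.com/v1.0/{tenant}"  # assumed base URL

def export_zone(tenant, token, domain_id, poll_interval=2):
    headers = {"X-Auth-Token": token, "Accept": "application/json"}
    base = DNS_ENDPOINT.format(tenant=tenant)
    # Export is asynchronous: the initial call returns a job to poll.
    job = requests.get(f"{base}/domains/{domain_id}/export", headers=headers).json()
    callback = job["callbackUrl"]  # assumed async-job payload shape
    while True:
        status = requests.get(callback, params={"showDetails": "true"},
                              headers=headers).json()
        if status.get("status") == "COMPLETED":
            return status["response"]["contents"]  # assumed key for the zone text
        if status.get("status") == "ERROR":
            raise RuntimeError(f"export failed: {status}")
        time.sleep(poll_interval)
```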
19:31:52 ianw: you poked at the api - how hard do you think it would be to make a tool that would diff against a bind file and what's in the api? (other than just another export and a diff)
19:31:54 fungi: ^ you may have a sense for that?
19:31:56 istr there was some maybe semi-private foundation stuff
19:32:13 last time i looked, there was no export option in the webui and their nameservers reject axfr queries
19:32:36 ... ok, so maybe i won't post it
19:32:52 clarkb: we can ask the osf admins if there's anything sensitive about dns entries
19:32:52 ianw: we could use a file on bridge instead of etherpad
19:33:01 could we verify with the foundation if there actually are semi-private things? it would be nice if we could export it and put a copy into git for reference
19:33:04 and ya, fungi and I can ask jimmy and friends if there are reasons to not do that
19:33:08 fungi: there isn't an export option in the UI, but the change linked above uses the API for a bind export
19:33:39 so yeah, i'm thinking at a minimum we should dump it, and other domains, periodically to bridge or somewhere in case someone does someday click the delete domain button
19:33:42 ianw: oh, does that support pagination? last time i looked the zone was too big and just got truncated
19:33:54 ianw: ++
19:34:18 fungi: it looks complete to me ... let me put it on bridge if you want to look quickly
19:34:31 sure, happy to skim
19:34:45 ya I like the idea of a periodic dump. I actually got introduced to Anne via a phone call while walking around seattle one afternoon due to a dns mistake that happened :/
19:34:58 ~ianw/openstack.org-bind.txt
19:35:46 ok, i could write it as an ansible role to dump the domain, although it would probably have to be skipped for testing
19:35:51 or, just a cron job
19:35:53 ideally we won't be adding any new names to that zone though, right? at most replacing a/aaaa with cname and adding le cnames
19:36:07 fungi: yeah - but we remove names from it
19:36:10 as we retire things
19:36:12 ya, deleting will be common
19:36:41 i suppose the periodic dump is to remind us what's still there
19:36:41 fungi: yep ... and there's also a ton of other domains under management ... i think it would be good to back them up too, just as a service
19:37:00 oh, got it, disaster recovery
19:37:32 and, if we want to migrate things, a bind file is probably a good place to start :)
19:37:49 in the past we've shied away from adding custom automation which speaks proprietary rackspace protocols... how is this different?
19:37:54 ianw: feature request ... better sorting
19:38:24 fungi: we use the rax dns api for reverse dns
19:38:50 also this is less automation and more record keeping (e.g. it can run in the background and if it breaks it won't affect anything)
19:38:54 i like it since it's a bridge to help us move off of the platform
19:38:58 mordred: we do, but we do that manually
19:39:10 oh and ya, if you want to move off, having bind zone files is an excellent place to start
19:39:20 fungi: i guess it's not ... but we are just making a copy of what's there. i feel like we're cutting off our nose to spite our face if i had deleted the domain and we didn't back it up for ideological reasons :)
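A small sketch of the "dump file before change, dump file after change, diff to verify" workflow suggested just below: sort and normalize the exported records so the diff is stable and easy for humans to audit. The parsing here is deliberately naive (whitespace-separated records, comments and directives skipped) and only illustrative.

```python
# Sketch: normalize a BIND-format dump so before/after exports diff cleanly.
import difflib
import sys

def normalized_records(path):
    records = []
    with open(path) as f:
        for line in f:
            line = line.split(";", 1)[0].strip()  # drop comments
            if line and not line.startswith("$"):  # skip $TTL/$ORIGIN etc.
                records.append(" ".join(line.split()))
    # Records start with the name, so a plain sort is alphabetical by hostname.
    return sorted(records)

def diff_zones(before_path, after_path):
    before = normalized_records(before_path)
    after = normalized_records(after_path)
    return list(difflib.unified_diff(before, after,
                                     fromfile=before_path, tofile=after_path,
                                     lineterm=""))

if __name__ == "__main__":
    for line in diff_zones(sys.argv[1], sys.argv[2]):
        print(line)
```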
19:39:23 fungi: yeah - but what clarkb and corvus said
19:39:46 i find the "we have periodic exports of your zones in a format you could feed into a better nameserver" argument compelling, yes
19:39:51 ++
19:40:03 fwiw I've asked the question about publishing the zone file more publicly
19:40:07 will let everyone know what i hear back
19:40:08 cool
19:40:29 clarkb: thanks, we can do a shared cleanup if that's ok then, and i'll work on some sort of backup solution as well
19:40:32 but also - I think we should spend a few minutes to sort/format that file - it'll make diffing easier
19:40:45 like "dump file before change, dump file after change, diff to verify"
19:41:05 mordred: ya that sounds like a good idea
19:41:14 mordred: alphabetical sort of hostnames?
19:41:20 I could see just straight alphabetical sorting being the most sensible - I don't think it needs to be grouped by type
19:41:24 ianw: ++
19:41:36 i can make the export tool do that, no probs
19:41:46 that should be the easiest for our human brains to audit too
19:42:13 (in fact, the rax ui grouping things by type drives me crazy)
19:42:37 ok, i'll also list out the other domains under management
19:42:51 ianw: your zone export looks comprehensive to me, just skimming. i want to say the limit was something like 500 records so maybe we've sufficiently shrunk our usage that their api is no longer broken for us
19:43:05 (it's 432 now, looks like)
19:43:23 that sounds like a legit number :)
19:43:24 oh good, maybe i got lucky :)
19:43:30 as a timecheck we have ~17 minutes and a few more items to get through. Sounds like we've got next steps for this item sorted out. I'm going to move things along
19:43:39 yep, thanks for the feedback!
19:43:44 #topic HTTPS on in region mirrors
19:43:50 #link https://review.opendev.org/#/c/728986/ Enable more ssl vhosts on mirrors
19:44:07 I wrote a change to enable more ssl on our mirrors. Then ianw mentioned testing and I ended up down that rabbit hole today
19:44:15 it's a good thing too, because the change as-is wouldn't have worked
19:44:31 I think the latest patchset is good to go now though
19:44:44 The idea there is we can start using ssl anywhere that the clients can parse it
19:45:11 In particular things like pypi and docker and apt repos should all benefit
19:45:23 that is basically everything! apt on xenial?
19:45:35 ianw: ya, and maybe older debian if we still have that around
19:46:13 if we get that in I'll confirm all the mirrors are happy and then look at updating our mirror configuration in jobs
19:46:29 This has the potential to be disruptive but I think it should be transparent to users
19:46:35 ohh s/region/openafs/ ... doh
19:47:24 mostly just a request for review on that and a heads up that there will be changes in the space
19:47:39 #topic Scaling Jitsi Meet for Meetpad
19:47:52 Last Friday diablo_rojo hosted a conference call on meetpad
19:48:11 we had something like ~22 concurrent users at peak and it all seemed to work well
19:48:19 what we did notice is that the jitsi video bridge (jvb) was using significant cpu resources though
19:48:23 Yeah it was good. I had no lag or freezing.
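Tying back to the mirror SSL vhosts item above, a rough sketch of the "confirm all the mirrors are happy" step: request each mirror over https before pointing job configuration at it. The hostnames below are placeholders, not the real mirror list, and this is not the change under review.

```python
# Hedged sketch: check that each mirror answers over https with a valid cert
# before switching the in-job mirror configuration to https URLs.
import ssl
import urllib.request

MIRRORS = [
    "mirror.example-region.example-cloud.opendev.org",  # placeholder hostname
]

def check_https(host, timeout=10):
    url = f"https://{host}/"
    try:
        with urllib.request.urlopen(url, timeout=timeout,
                                    context=ssl.create_default_context()) as resp:
            return resp.status
    except Exception as exc:  # report any failure so the audit sees it
        return f"FAILED: {exc}"

if __name__ == "__main__":
    for host in MIRRORS:
        print(host, check_https(host))
```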
19:48:26 i thought there would be an open bar
19:48:37 corvus, there was at my house
19:48:46 I dug into jitsi docs over the weekend and found that jvb is the main component necessary to scale as it does all the video processing
19:49:05 thankfully we can run multiple jvbs and they load balance reasonably well (each conference is run on a single jvb though)
19:49:15 #link https://review.opendev.org/#/c/729008/ Starting to poke at running more jvbs
19:49:17 clarkb: did you read frickler's link about that?
19:49:18 and it can be network distributed from the other services too i guess?
19:49:22 corvus: I couldn't find it
19:49:36 fungi: yes
19:49:50 basically where that took me was the 729008 change above
19:50:06 so run multiple virtual machines with a jvb each, and then another virtual machine with the remaining services
19:50:09 got it
19:50:12 which I think is really close at this point and is going to be blocked on dns in our fake setup as well as firewall hole punching
19:50:56 clarkb: maybe rebase on my firewall patch?
19:51:14 corvus: ya I think the firewall patch solves the firewall problem. Did you also end up doing the /etc/hosts or similar stuff?
19:51:51 my thinking on this is if we can get 729008 in, or something like it, then we can just spin up a few extra jvbs next week, have them run during the ptg, then shut them down afterwards
19:52:20 clarkb: no, firewall config is ip addresses from inventory
19:52:38 clarkb: I have a patch for hostnames
19:52:48 clarkb: at least, that's the backend implementation. the frontend is 'just add the group'
19:52:50 mordred: you what?
19:52:51 https://review.opendev.org/#/c/726910/
19:53:00 if we want to do that
19:53:06 i thought we decided not to?
19:54:24 I'll continue and we can sort out those details after the meeting. Have a few more items to bring up before our hour is up
19:54:27 I don't remember us deciding that - but if we did, cool - I can abandon that patch
19:54:35 #topic Project Renames
19:54:46 We have a single project queued up for renaming in gerrit and gitea
19:55:06 we've been asked about this several times and the response previously was that we didn't want to take an outage just prior to the openstack release
19:55:10 it's been waiting since just after our last rename maintenance
19:55:18 that release is now done so we can schedule a new rename.
19:55:29 fungi: right, one of the problems here was they didn't get on the queue last time and showed up like the day after
19:55:50 this time around I'll try to "advertise" the scheduled rename date to get things on the list as early as possible
19:56:07 My current thinking is that with all the prep for the ptg, the ptg itself, and holidays, the best time for this may be the week after the ptg
19:56:24 June 8-12 ish time frame
19:56:45 also post PTG tends to be a quiet time, so may be good for users too
19:57:07 any preferences or other ideas?
19:58:02 doesn't sound like it. Let's pencil in June 12 and start getting potential renames queued up with that deadline in mind?
19:58:21 (sorry if I'm moving too fast, I keep looking at the clock and realize we are just about at time for the day)
19:58:31 wfm, thanks
19:58:41 #topic Virtual PTG Attendance
19:58:48 #link https://virtualptgjune2020.eventbrite.com Register if you plan to attend. This helps with planning details.
19:58:53 #link https://etherpad.opendev.org/p/opendev-virtual-ptg-june-2020 PTG Ideas
19:59:01 A friendly reminder to register for the PTG if you plan to attend
19:59:10 as well as a link to our planning document with connection and time details
19:59:22 This will be all new and different. Will be interesting to see how it goes
19:59:44 And that basically takes us to the end of the hour
19:59:53 thanks clarkb!
19:59:58 Thank you everyone for your time. Feel free to continue discussions in #opendev
20:00:02 #endmeeting