19:01:10 #startmeeting infra
19:01:12 Meeting started Tue May 19 19:01:10 2020 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:13 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:15 The meeting name has been set to 'infra'
19:01:19 #link http://lists.opendev.org/pipermail/service-discuss/2020-May/000025.html Our Agenda
19:01:25 #topic Announcements
19:01:43 The OpenStack release went really smoothly
19:01:52 thank you to everyone for ensuring that services were running and happy for that
19:02:33 This next weekend is also a holiday weekend. OpenStack Foundation staff have decided that we're gonna tack on an extra day on the other side of it, making Friday a day off too
19:02:37 and that was with ovh entirely offline for us too ;)
19:03:04 as a heads up, that means I'll be largely away from my computer Friday and Monday
19:03:32 i'll likely be around if something comes up, because i'm terrible at vacationing
19:04:08 but i may defer non-emergency items
19:04:09 #topic Actions from last meeting
19:04:20 #link http://eavesdrop.openstack.org/meetings/infra/2020/infra.2020-05-12-19.01.txt minutes from last meeting
19:04:31 There are no actions to call out
19:04:37 #topic Priority Efforts
19:04:44 #topic Update Config Management
19:05:06 any movement on the gerritbot installation? I haven't seen any
19:05:08 there's been a ton of progress on our ci mirrors
19:05:35 yes, ianw has redeployed them all to be ansible managed under the opendev.org domain
19:05:52 at least the ones which weren't yet
19:06:07 i did similarly for the ovh mirrors since they needed rebuilding anyway
19:07:19 corvus: mordred: I think the other big item on this topic is the nb03 arm64 docker image (which still needs debugging on the image build side) and the work to improve testing of Zuul deployments pre-merge with zuul
19:07:29 is there anything else to call out on that? maybe changes that are ready for review?
19:07:35 so far everything seems to be working after the mass mirror replacements, so that went extremely well
19:07:47 i think the work on getting zuul+nodepool working in tests is nearly there -- at least, i think the 2 problems we previously identified have solutions which are complete and ready for review except:
19:08:00 * diablo_rojo sneaks in late again
19:08:18 diablo_rojo: there might still be a few cookies left, but the coffee's gone already
19:08:24 they keep hitting random internet hangups, so they're both still 'red'. but i think their constituent jobs have individually demonstrated functionality
19:08:34 2 main changes:
19:08:36 I'm planning on digging back in to figuring out what's up with the multi-arch stuff :(
19:08:55 #link better iptables support in testing https://review.opendev.org/726475
19:09:01 fungi, that's alright, I prefer tea as my hot caffeine source anyway.
19:09:07 #link run zuul as zuuld user https://review.opendev.org/726958
19:09:18 corvus: gerrit reports they conflict with each other. Do they need to be stacked?
19:10:03 that last one is worth a little extra discussion -- we went back and forth on irc on how to handle that, and we decided to use a 'container' user, but once i was done with that, i really didn't like it, so i'm proposing we zig and use a 'zuuld' user instead.
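A minimal sketch of the sanity check implied by the rename discussion that follows: confirm the renamed service account keeps its original uid so existing file ownership is untouched. The account names come from the change above; the expected uid is a placeholder to fill in before running, and this is not the code in review 726958.

```python
# Sketch only, not the change under review: verify that the zuul -> zuuld
# rename kept the same uid, so files owned by the old account are unaffected.
import pwd
import sys

OLD_NAME = "zuul"
NEW_NAME = "zuuld"
EXPECTED_UID = None  # placeholder: record the uid before the rename

def main():
    try:
        new = pwd.getpwnam(NEW_NAME)
    except KeyError:
        sys.exit(f"{NEW_NAME} does not exist yet; rename has not happened")
    try:
        pwd.getpwnam(OLD_NAME)
        print(f"note: {OLD_NAME} still present alongside {NEW_NAME}")
    except KeyError:
        pass
    if EXPECTED_UID is not None and new.pw_uid != EXPECTED_UID:
        sys.exit(f"uid changed: {new.pw_uid} != {EXPECTED_UID}")
    print(f"{NEW_NAME}: uid={new.pw_uid} home={new.pw_dir}")

if __name__ == "__main__":
    main()
```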
19:10:18 ++
19:10:24 the bulk of the change is normalizing all the variables around that, so, really, it's not a big deal to change (now or later)
19:11:06 the main thing we were worried about is whether it would screw up the executors, but i walked through the process, and i think the executor is going to happily ssh to the 'zuul' user on the worker nodes even though it's running as zuuld
19:11:32 cool, and that will avoid zuul on the test nodes conflicting with zuul the service user
19:11:39 clarkb: unsure about conflicts; i'll check. i might rebase both of them on the 'vendor ppa keys' change i'm about to write
19:12:01 clarkb: yep
19:12:34 i'm unsure if the ansible will rename the zuul->zuuld user on the real executors or not
19:12:52 but if it errors, we can manually do that pretty easily
19:12:55 the uids stay the same right? so it's just a minor /etc/passwd edit?
19:12:57 the uid will be the same
19:12:58 yep
19:13:44 alright, anything else on the topic of config management updates?
19:14:43 #topic OpenDev
19:14:48 #link https://etherpad.opendev.org/p/XRyf4UliAKI9nRGstsP4 Email draft for building advisory board.
19:15:06 I wrote this draft email yesterday and am hoping I can get some reviews of it before sending it out
19:15:28 basic idea is to send email to service-discuss while BCC'ing people who I think may be interested. Then have those that are interested respond to the list and volunteer
19:16:16 if that plan sounds good and people are ok with the email content I'll go ahead and send that out and start trying to drum up interest there
19:17:15 clarkb: generally looks good. i noted 2 things that could use fixing
19:17:31 corvus: thanks, I'll try to incorporate feedback after the meeting
19:17:43 Probably don't need to discuss this further here, just wanted to make sure people were aware of it
19:17:54 Any other OpenDev topics to bring up before we move on?
19:19:18 #topic General Topics
19:19:27 #topic pip-and-virtualenv next steps
19:19:38 ianw you added this item, want to walk us through it?
19:20:24 yes, so the status of this is currently that centos-8, all suse images, and all arm64 images have dropped pip-and-virtualenv
19:20:40 that of course leaves the "hard" ones :)
19:21:04 we have "-plain" versions of our other platform nodes, and these are passing all tests for zuul-jobs
19:21:23 i'm not sure what to do but notify people we'll be changing this, and do it?
19:21:30 i'm open to suggestions
19:21:45 yeah - I think that's likely the best bet
19:22:00 folks can test by adding a "nodeset" line to a project-pipeline job variant, yeah?
19:22:01 oh and the plain nodes pass devstack
19:22:03 ianw: ya I think now is likely a good time for openstack. Zuul is trying to get a release done, but is also capable of debugging quickly. Airship is probably my biggest concern as they are trying to do their big 2.0 release sometime soonish
19:22:06 it's the beginning of an openstack cycle - and there should be a clear and easy answer to "my job isn't working any more", right
19:22:08 ?
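One hedged illustration of the "clear and easy answer" being asked for here: on the -plain images the common failure is simply that pip/virtualenv are no longer preinstalled, which a broken job can confirm in a few lines. The tool names checked below are examples; the actual remedy discussed next is adding an ensure-virtualenv style role rather than pinning the old image.

```python
# Illustration only: spot the most likely breakage on a "-plain" image,
# i.e. pip/virtualenv are no longer baked in.
import shutil

def missing_tools(tools=("pip", "pip3", "virtualenv")):
    found = {tool: shutil.which(tool) for tool in tools}
    for tool, path in found.items():
        print(f"{tool}: {path or 'MISSING'}")
    return [tool for tool, path in found.items() if path is None]

if __name__ == "__main__":
    if missing_tools():
        print("pip-and-virtualenv is not on this image; add the ensure-* role to the job")
```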
19:22:18 corvus: yes I think so
19:22:27 alternative suffix -vanilla :P
19:22:35 zbr: we'll be dropping the suffixes entirely I think
19:22:43 zbr: it's just there temporarily for testing
19:22:49 yes, i'd like to drop the suffix
19:22:58 at a minimum we should make sure the ubuntu-focal images are "plain"
19:23:02 maybe the thing to do is notify people of the change with direction on testing using a variant and the -plain images
19:23:05 maybe we should send out an email saying "we will change by date $x; you can test now by following procedure $y" ?
19:23:09 oh, yeah the focal images dropped it too
19:23:09 clarkb: yeah, that :)
19:23:11 corvus: ++
19:23:20 openstack projects are supposed to be switching to focal and replacing legacy imported zuul v2 jobs this cycle anyway
19:23:42 the bigger impact for openstack will likely be stable branches
19:23:53 doing it next week may work; the week after is PTG though, which we'd probably want to avoid unnecessary disruption during
19:23:58 fungi: yeah - where I imagine there will be a long tail of weird stuff
19:23:59 they may have to backport some job fixes to cope with missing pip/virtualenv
19:24:14 fungi: ya, though I think the idea is that we've handled it for them?
19:24:15 yeah, at worst you might have to put in an ensure-virtualenv role
19:24:30 set the date at next friday? (or thursday?)
19:25:16 we can also do a soft update where we switch default nodesets to -plain while keeping the other images around. Then in a few more weeks remove the -plain images
19:26:00 ok, i will come up with a draft email, noting the things around plain images etc and circulate it
19:26:15 sounds good
19:26:22 clarkb: I like that idea
19:26:24 but if ianw's hunch is right, then the job changes needed to add the role are just as trivial as the job changes needed to temporarily use the other image name
19:27:26 ya I think we've done a fair bit of prep work to reduce impact and make fixing it easy. Rolling forward is likely as easy as being extra cautious
19:27:30 so keeping the nonplain images around after the default switch may just be creating more work for us and an attractive nuisance for projects who don't know the fix is as simple as the workaround
19:27:37 the key thing is to give people the info they need to know what to do if it breaks them
19:28:14 ++ i will definitely explain what was happening and what should happen now
19:28:23 ++
19:28:30 whether anyone reads it is of course another matter :)
19:28:51 they'll read it after they pop into irc asking why their jobs broke and we link them to the ml archive
19:29:07 better late than never
19:29:36 anything else on this topic?
19:29:42 no thanks
19:29:46 #topic DNS cleanup
19:29:59 #link https://review.opendev.org/728739 : Add tool to export Rackspace DNS domains to bind format
19:30:03 ahh, also me
19:30:07 go for it
19:30:24 so when i was clearing out old mirrors, i very nearly deleted the openstack.org DOMAIN ... the buttons start to look very similar
19:30:44 after i regained composure, i wondered if we had any sort of recovery for that
19:30:52 but also, there's a lot of stuff in there
19:31:10 I like the idea of exporting to a bind file
19:31:18 firstly, is there any issue with me pasting it to an etherpad for shared eyes to do an audit of what we can remove?
19:31:43 ianw: I don't have any concern doing that for the records we manage in that zone. I'm not sure if that is true of the records the foundation manages
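For context on the export tool linked at the start of this topic (review 728739), a heavily simplified sketch of pulling a bind-format export through the Rackspace Cloud DNS API. This is not that tool; the endpoint path, the asynchronous-job polling flow, and the response field names here are assumptions and should be checked against the change and the Cloud DNS docs.

```python
# Sketch only. Assumes: a valid identity token, the v1.0 Cloud DNS endpoint,
# and that GET /domains/{id}/export returns an async job whose completed
# result carries the BIND-format zone text. Field names are assumptions.
import time
import requests

DNS_ENDPOINT = "https://dns.api.rackspacecloud.com/v1.0/{tenant}"  # assumed base URL

def export_zone(tenant, token, domain_id, poll_interval=2):
    headers = {"X-Auth-Token": token, "Accept": "application/json"}
    base = DNS_ENDPOINT.format(tenant=tenant)
    # Export is asynchronous: the initial call returns a job to poll.
    job = requests.get(f"{base}/domains/{domain_id}/export", headers=headers).json()
    callback = job["callbackUrl"]  # assumed async-job payload shape
    while True:
        status = requests.get(callback, params={"showDetails": "true"},
                              headers=headers).json()
        if status.get("status") == "COMPLETED":
            return status["response"]["contents"]  # assumed key for the zone text
        if status.get("status") == "ERROR":
            raise RuntimeError(f"export failed: {status}")
        time.sleep(poll_interval)
```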
19:31:52 ianw: you poked at the api - how hard do you think it would be to make a tool that would diff against a bind file and what's in the api? (other than just another export and a diff)
19:31:54 fungi: ^ you may have a sense for that?
19:31:56 istr there was some maybe semi-private foundation stuff
19:32:13 last time i looked, there was no export option in the webui and their nameservers reject axfr queries
19:32:36 ... ok, so maybe i won't post it
19:32:52 clarkb: we can ask the osf admins if there's anything sensitive about dns entries
19:32:52 ianw: we could use a file on bridge instead of etherpad
19:33:01 could we verify with the foundation if there actually are semi-private things? it would be nice if we could export it and put a copy into git for reference
19:33:04 and ya, fungi and I can ask jimmy and friends if there are reasons to not do that
19:33:08 fungi: there isn't an export option in the UI, but the change linked above uses the API for a bind export
19:33:39 so yeah, i'm thinking at a minimum we should dump it, and other domains, periodically to bridge or somewhere in case someone does someday click the delete domain button
19:33:42 ianw: oh, does that support pagination? last time i looked the zone was too big and just got truncated
19:33:54 ianw: ++
19:34:18 fungi: it looks complete to me ... let me put it on bridge if you want to look quickly
19:34:31 sure, happy to skim
19:34:45 ya I like the idea of a periodic dump. I actually got introduced to Anne via a phone call while walking around seattle one afternoon due to a dns mistake that happened :/
19:34:58 ~ianw/openstack.org-bind.txt
19:35:46 ok, i could write it as an ansible role to dump the domain, although it would probably have to be skipped for testing
19:35:51 or, just a cron job
19:35:53 ideally we won't be adding any new names to that zone though, right? at most replacing a/aaaa with cname and adding le cnames
19:36:07 fungi: yeah - but we remove names from it
19:36:10 as we retire things
19:36:12 ya, deleting will be common
19:36:41 i suppose the periodic dump is to remind us what's still there
19:36:41 fungi: yep ... and there's also a ton of other domains under management ... i think it would be good to back them up too, just as a service
19:37:00 oh, got it, disaster recovery
19:37:32 and, if we want to migrate things, a bind file is probably a good place to start :)
19:37:49 in the past we've shied away from adding custom automation which speaks proprietary rackspace protocols... how is this different?
19:37:54 ianw: feature request ... better sorting
19:38:24 fungi: we use the rax dns api for reverse dns
19:38:50 also this is less automation and more record keeping (e.g. it can run in the background and if it breaks it won't affect anything)
19:38:54 i like it since it's a bridge to help us move off of the platform
19:38:58 mordred: we do, but we do that manually
19:39:10 oh and ya, if you want to move off, having bind zone files is an excellent place to start
19:39:20 fungi: i guess it's not ... but we are just making a copy of what's there. i feel like we're cutting off our nose to spite our face if i had deleted the domain and we didn't back it up for ideological reasons :)
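A small sketch of the "dump file before change, dump file after change, diff to verify" workflow suggested just below: sort and normalize the exported records so the diff is stable and easy for humans to audit. The parsing here is deliberately naive (whitespace-separated records, comments and directives skipped) and only illustrative.

```python
# Sketch: normalize a BIND-format dump so before/after exports diff cleanly.
import difflib
import sys

def normalized_records(path):
    records = []
    with open(path) as f:
        for line in f:
            line = line.split(";", 1)[0].strip()  # drop comments
            if line and not line.startswith("$"):  # skip $TTL/$ORIGIN etc.
                records.append(" ".join(line.split()))
    # Records start with the name, so a plain sort is alphabetical by hostname.
    return sorted(records)

def diff_zones(before_path, after_path):
    before = normalized_records(before_path)
    after = normalized_records(after_path)
    return list(difflib.unified_diff(before, after,
                                     fromfile=before_path, tofile=after_path,
                                     lineterm=""))

if __name__ == "__main__":
    for line in diff_zones(sys.argv[1], sys.argv[2]):
        print(line)
```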
19:39:23 fungi: yeah - but what clarkb and corvus said
19:39:46 i find the "we have periodic exports of your zones in a format you could feed into a better nameserver" argument compelling, yes
19:39:51 ++
19:40:03 fwiw I've asked the question about publishing the zone file more publicly
19:40:07 will let everyone know what i hear back
19:40:08 cool
19:40:29 clarkb: thanks, we can do a shared cleanup if that's ok then, and i'll work on some sort of backup solution as well
19:40:32 but also - I think we should spend a few minutes to sort/format that file - it'll make diffing easier
19:40:45 like "dump file before change, dump file after change, diff to verify"
19:41:05 mordred: ya that sounds like a good idea
19:41:14 mordred: alphabetical sort of hostnames?
19:41:20 I could see just straight alphabetical sorting being the most sensible - I don't think it needs to be grouped by type
19:41:24 ianw: ++
19:41:36 i can make the export tool do that, no probs
19:41:46 that should be the easiest for our human brains to audit too
19:42:13 (in fact, the rax ui grouping things by type drives me crazy)
19:42:37 ok, i'll also list out the other domains under management
19:42:51 ianw: your zone export looks comprehensive to me, just skimming. i want to say the limit was something like 500 records so maybe we've sufficiently shrunk our usage that their api is no longer broken for us
19:43:05 (it's 432 now, looks like)
19:43:23 that sounds like a legit number :)
19:43:24 oh good, maybe i got lucky :)
19:43:30 as a timecheck we have ~17 minutes and a few more items to get through. Sounds like we've got next steps for this item sorted out. I'm going to move things along
19:43:39 yep, thanks for the feedback!
19:43:44 #topic HTTPS on in region mirrors
19:43:50 #link https://review.opendev.org/#/c/728986/ Enable more ssl vhosts on mirrors
19:44:07 I wrote a change to enable more ssl on our mirrors. Then ianw mentioned testing and I ended up down that rabbit hole today
19:44:15 it's a good thing too, because the change as-is wouldn't have worked
19:44:31 I think the latest patchset is good to go now though
19:44:44 The idea there is we can start using ssl anywhere that the clients can parse it
19:45:11 In particular things like pypi and docker and apt repos should all benefit
19:45:23 that is basically everything! apt on xenial?
19:45:35 ianw: ya, and maybe older debian if we still have that around
19:46:13 if we get that in I'll confirm all the mirrors are happy and then look at updating our mirror configuration in jobs
19:46:29 This has the potential to be disruptive but I think it should be transparent to users
19:46:35 ohh s/region/openafs/ ... doh
19:47:24 mostly just a request for review on that and a heads up that there will be changes in the space
19:47:39 #topic Scaling Jitsi Meet for Meetpad
19:47:52 Last Friday diablo_rojo hosted a conference call on meetpad
19:48:11 we had something like ~22 concurrent users at peak and it all seemed to work well
19:48:19 what we did notice is that the jitsi video bridge (jvb) was using significant cpu resources though
19:48:23 Yeah it was good. I had no lag or freezing.
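Tying back to the mirror SSL vhosts item above, a rough sketch of the "confirm all the mirrors are happy" step: request each mirror over https before pointing job configuration at it. The hostnames below are placeholders, not the real mirror list, and this is not the change under review.

```python
# Hedged sketch: check that each mirror answers over https with a valid cert
# before switching the in-job mirror configuration to https URLs.
import ssl
import urllib.request

MIRRORS = [
    "mirror.example-region.example-cloud.opendev.org",  # placeholder hostname
]

def check_https(host, timeout=10):
    url = f"https://{host}/"
    try:
        with urllib.request.urlopen(url, timeout=timeout,
                                    context=ssl.create_default_context()) as resp:
            return resp.status
    except Exception as exc:  # report any failure so the audit sees it
        return f"FAILED: {exc}"

if __name__ == "__main__":
    for host in MIRRORS:
        print(host, check_https(host))
```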
19:48:26 i thought there would be an open bar
19:48:37 corvus, there was at my house
19:48:46 I dug into jitsi docs over the weekend and found that jvb is the main component necessary to scale as it does all the video processing
19:49:05 thankfully we can run multiple jvbs and they load balance reasonably well (each conference is run on a single jvb though)
19:49:15 #link https://review.opendev.org/#/c/729008/ Starting to poke at running more jvbs
19:49:17 clarkb: did you read frickler's link about that?
19:49:18 and it can be network distributed from the other services too i guess?
19:49:22 corvus: I couldn't find it
19:49:36 fungi: yes
19:49:50 basically where that took me was the 729008 change above
19:50:06 so run multiple virtual machines with a jvb each, and then another virtual machine with the remaining services
19:50:09 got it
19:50:12 which I think is really close at this point and is going to be blocked on dns in our fake setup as well as firewall hole punching
19:50:56 clarkb: maybe rebase on my firewall patch?
19:51:14 corvus: ya I think the firewall patch solves the firewall problem. Did you also end up doing the /etc/hosts or similar stuff?
19:51:51 my thinking on this is if we can get 729008 in, or something like it, then we can just spin up a few extra jvbs next week, have them run during the ptg, then shut them down afterwards
19:52:20 clarkb: no, firewall config is ip addresses from inventory
19:52:38 clarkb: I have a patch for hostnames
19:52:48 clarkb: at least, that's the backend implementation. the frontend is 'just add the group'
19:52:50 mordred: you what?
19:52:51 https://review.opendev.org/#/c/726910/
19:53:00 if we want to do that
19:53:06 i thought we decided not to?
19:54:24 I'll continue and we can sort out those details after the meeting. Have a few more items to bring up before our hour is up
19:54:27 I don't remember us deciding that - but if we did, cool - I can abandon that patch
19:54:35 #topic Project Renames
19:54:46 We have a single project queued up for renaming in gerrit and gitea
19:55:06 we've been asked about this several times and the response previously was that we didn't want to take an outage just prior to the openstack release
19:55:10 it's been waiting since just after our last rename maintenance
19:55:18 that release is now done so we can schedule a new rename.
19:55:29 fungi: right, one of the problems here was they didn't get on the queue last time and showed up like the day after
19:55:50 this time around I'll try to "advertise" the scheduled rename date to get things on the list as early as possible
19:56:07 My current thinking is that with all the prep for the ptg, the ptg itself, and holidays, the best time for this may be the week after the ptg
19:56:24 June 8-12 ish time frame
19:56:45 also post PTG tends to be a quiet time, so may be good for users too
19:57:07 any preferences or other ideas?
19:58:02 doesn't sound like it. Let's pencil in June 12 and start getting potential renames queued up with that deadline in mind?
19:58:21 (sorry if I'm moving too fast, I keep looking at the clock and realize we are just about at time for the day)
19:58:31 wfm, thanks
19:58:41 #topic Virtual PTG Attendance
19:58:48 #link https://virtualptgjune2020.eventbrite.com Register if you plan to attend. This helps with planning details.
19:58:53 #link https://etherpad.opendev.org/p/opendev-virtual-ptg-june-2020 PTG Ideas
19:59:01 A friendly reminder to register for the PTG if you plan to attend
19:59:10 as well as a link to our planning document with connection and time details
19:59:22 This will be all new and different. Will be interesting to see how it goes
19:59:44 And that basically takes us to the end of the hour
19:59:53 thanks clarkb!
19:59:58 Thank you everyone for your time. Feel free to continue discussions in #opendev
20:00:02 #endmeeting