19:01:11 <clarkb> #startmeeting infra
19:01:12 <openstack> Meeting started Tue Apr 28 19:01:11 2020 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:13 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:15 <openstack> The meeting name has been set to 'infra'
19:01:22 <clarkb> #link http://lists.opendev.org/pipermail/service-discuss/2020-April/000011.html Our Agenda
19:01:31 <clarkb> #topic Announcements
19:02:08 <clarkb> For the OpenDev Service Coordinator I said we would wait for volunteers until the end of the month. That gives you a few more days if interested :)
19:02:40 <mordred> o/
19:02:41 <fungi> how many people have volunteered so far?
19:02:55 <clarkb> fungi: I think only me with my informal "I'm willing to do it" portion of the message
19:03:10 <clarkb> I figured I would send a separate email thursday if no one else did first :)
19:03:18 <fungi> thanks!
19:03:35 <diablo_rojo_phon> o/
19:04:00 <clarkb> #topic Actions from last meeting
19:04:07 <clarkb> #link http://eavesdrop.openstack.org/meetings/infra/2020/infra.2020-04-21-19.01.txt minutes from last meeting
19:04:24 <clarkb> There were no actions recorded last meeting. Why don't we dive right into all the fun ansible docker puppet things
19:04:32 <clarkb> #topic Priority Efforts
19:04:36 <fungi> why don't we?
19:04:39 <clarkb> #topic Update Config Management
19:04:50 <fungi> oh, i see, that was a rhetorical question! ;)
19:04:53 <clarkb> fungi: :)
19:05:11 <clarkb> mordred: first up on this agenda we've got dockerization of gerrit. Are we happy with how that has gone and can we remove that agenda item now?
19:05:19 <clarkb> mordred: I think all of the outstanding issues I knew about were addressed
19:05:48 <fungi> we do seem to be caught back up since friday's unanticipated excitements
19:06:10 <fungi> and now we're much further along with things too
19:06:21 <corvus> so i guess that's done and we could start thinking about the upgrade to 2.16?
19:06:27 <mordred> WELLL
19:06:36 <mordred> there's still a cleanup task
19:06:40 <mordred> which is gerritbot
19:06:47 <mordred> I don't want to forget about it
19:07:05 <fungi> oh, right, tied in with the eavesdrop dockering
19:07:06 <mordred> but we've got eavesdrop split into a playbook and are containering accessbot on eavesdrop now
19:07:20 <mordred> as well as a gerritbot container patch up
19:07:41 <mordred> oh - we landed that
19:07:49 <mordred> cool - I wanna get that bit finished
19:08:05 <mordred> but then, yes, I agree with corvus - next step is working on the 2.16 upgrade
19:08:26 <clarkb> sounds like we are very close
19:08:45 <mordred> yeah. I think we can remove the agenda item - gerritbot is more just normal work
19:09:12 <fungi> though the gerrit upgrade becomes an agenda item
19:09:32 <clarkb> Next up is the Zuul driven playbooks. Last Friday we dove head first into running all of zuul from ansible and most of zuul with containers
19:10:01 <mordred> yeah - zuul-executor is still installed via pip - because we haven't figured out docker+bubblewrap+afs yet
19:10:07 <clarkb> #link https://review.opendev.org/#/c/724115/ Fix for zuul-scheduler ansible role
19:10:19 <mordred> everything else is in containers - which means everything else is now running python 3.7
19:10:21 <clarkb> I found a recent issue with that that could use some eyeballs (looks like testing failed)
19:10:30 <fungi> we sort of didn't anticipate that merging the change would go through and change uids/gids on servers, which turned it into a bit of a fire drill, but the outcome is marvellous
19:11:02 <clarkb> related to this is work to do the same with nodepool services
19:11:11 <mordred> clarkb: oh - that's the reason I didn't have the value set in the host_vars
19:11:34 <clarkb> mordred: does ansible load from the production value and override testing?
19:13:18 <clarkb> for nodepool services I think we are sorting out multiarch container builds as we have x86 and arm64 nodepool builders
19:13:23 <mordred> clarkb: I think so
19:13:48 <mordred> we're VERY close to having multiarch working
19:14:00 <mordred> https://review.opendev.org/#/c/722339/
19:14:01 <clarkb> then once that is done and deployed I'd like to land a rebase of https://review.opendev.org/#/c/722394/ to make it easier to work with all of our new jobs in system-config
19:14:13 <corvus> yes please :)
19:14:56 <mordred> ++
19:15:11 <mordred> https://review.opendev.org/#/c/724079/ <-- we need this to work with the multi-arch job on an ipv6-only cloud
19:16:43 <ianw> ^ that has an unknown configuration error?
19:18:26 <mordred> ianw: AROO
19:19:33 <corvus> maybe that's the long-standing bug we've been trying to track down
19:19:50 <corvus> i can take a look after mtg
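For context on the multi-arch builds discussed above: one common way to produce a single image manifest covering x86_64 and arm64 is docker buildx with QEMU emulation. A minimal sketch, assuming an illustrative image name rather than the actual opendev job configuration:

    # one-time setup: register qemu binfmt handlers and create a buildx builder
    docker run --privileged --rm tonistiigi/binfmt --install arm64
    docker buildx create --name multiarch --use

    # build and push one manifest that covers both architectures
    docker buildx build \
        --platform linux/amd64,linux/arm64 \
        --tag registry.example.org/nodepool-builder:latest \
        --push .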
19:20:43 <clarkb> The other thing I wanted to call out is that we are learning quite a bit about using Zuul for CD
19:21:01 <ianw> apropos the container services, i don't see any reason not to replace nb01 & nb02 with container based versions now?
19:21:02 <clarkb> for example I think we've decided that periodic jobs should pull latest git repo state rather than rely on what zuul provided
19:21:03 <fungi> i like to think we're improving zuul's viability for cd
19:21:06 <clarkb> ianw: ++
19:21:41 <clarkb> if you notice irregularities in playbook application to production please call it out. Because as fungi points out I think we are improving things by learning here :)
19:21:49 <ianw> i can do that and move builds there
19:21:55 <ianw> (new builders, i mean)
19:22:01 <fungi> at the very least we're becoming an early case study on complex cd with zuul
19:22:31 <fungi> (spoiler, it's working remarkably well!)
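On "periodic jobs should pull latest git repo state": a minimal sketch of what that could look like as an ansible task in a deploy playbook, assuming an illustrative checkout path rather than the real one:

    - name: Ensure system-config checkout is at latest master
      git:
        repo: https://opendev.org/opendev/system-config
        dest: /opt/system-config
        version: master
        force: yes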
19:23:07 <clarkb> anything else on the subject of config management and continuous deployment?
19:23:30 <mordred> oh - I'm testing focal for zuul-executors
19:23:44 <mordred> https://review.opendev.org/#/c/723528/
19:23:58 <fungi> it's due to be officially released next week, right?
19:24:06 <mordred> once that's working - I wanna start replacing ze*.openstack.org with ze*.opendev.org on focal
19:24:08 <mordred> it's already released
19:24:14 <fungi> oh!
19:24:22 <fungi> i'm clearly behind on the news
19:24:51 <fungi> time has become more of an illusion than usual
19:25:13 <ianw> hrm, so for new builders, such as nb0X ... focal too?
19:25:30 <fungi> might as well
19:25:32 <ianw> s/builders/servers  ?
19:25:57 <clarkb> my only concern with it at this point is that major services like mysql crash on it
19:26:06 <ianw> that will add some testing updates into the loop, but that's ok
19:26:15 <mordred> clarkb: "awesome"
19:26:15 <clarkb> we should carefully test that things are working as we put focal into production
19:26:18 <mordred> ++
19:26:20 <fungi> in theory anything we can deploy on bionic we ought to be able to deploy on focal, bugs like what clarkb mentions aside
19:26:44 <fungi> stuff that's still relying on puppet is obviously frozen in the xenial past
19:27:08 <ianw> i doubt system-config testing will work on it ATM for testing infrastructure reasons ... i'm working on it
19:27:08 <clarkb> less and less stuff is relying on puppet though
19:27:10 <mordred> ianw: we could also go ahead and just do bionic for builders and get that done
19:27:37 <ianw> particularly the ensure-tox role and it installing as a user
19:27:56 <mordred> ianw: since those are in containers - I mostly wanted to roll executors out on focal so that we could run all of zuul on the same version of python
19:28:06 <ianw> ... yes pip-and-virtualenv is involved somehow
19:28:33 <mordred> ianw: the focal test nodes are actually going ok for the run-service-zuul test job fwiw
19:29:27 <ianw> mordred: hrm, perhaps it's mostly if we were to have a focal bridge.o.o in the mix in the test suite, where testinfra is run
19:30:03 <mordred> yeah - for things where we're running our ansible - the lack of pre-installed stuff is good, since we install everything from scratch anyway
19:31:58 <clarkb> Sounds like that may be it for this topic
19:32:02 <clarkb> #topic OpenDev
19:32:13 <clarkb> Another friendly reminder to volunteer for service coordinator if interested
19:32:36 <clarkb> On a services front we upgraded gitea thursdayish then had reports of failed git clones over the weekend
19:32:50 <clarkb> "thankfully" it seems that is a network problem and unrelated to our upgrade
19:33:05 <mordred> \o.
19:33:07 <clarkb> citycloud kna1 (and kna3?) was losing packets sent to vexxhost sjc1
19:33:07 <mordred> I mean
19:33:09 <mordred> \o/
19:33:25 <clarkb> fungi was able to track that down using our mirror in kna1 to reproduce the user reports
19:33:33 <clarkb> and from there we did some traceroutes and passed that along to the cloud providers
19:33:52 <clarkb> something to be aware of if we have more reports of this. Double checking the origin is worthwhile
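For reference, confirming a report like this from the in-cloud mirror usually amounts to measuring loss along the path, e.g. (hostnames illustrative):

    # run from the kna1 mirror toward the gitea load balancer
    mtr --report --report-cycles 100 opendev.org
    # or, without mtr, a traceroute plus a simple loss measurement
    traceroute opendev.org
    ping -c 100 -q opendev.org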
19:34:05 <clarkb> Also lists.* has been OOMing daily at between 1000-1200UTC
19:34:33 <fungi> yeah, that one's not been so easy to correlate
19:34:40 <clarkb> fungi has been running dstat data collection to help debug that and I think the data shows it isn't bup or mailman. During the period of sadness we get many listinfo processes
19:34:49 <fungi> i think you're on to something with the semrush bot in the logs
19:35:15 <clarkb> those listinfo processes are started by apache to render webpage stuff for mailman, and correlating with the logs we have a semrush bot hitting us during every OOM I've checked
19:35:26 <clarkb> I've manually dropped in a robots.txt file to tell semrushbot to go away
19:35:33 <fungi> though ultimately, this probably means we should eventually upgrade the lists server to something with a bit more oomph
19:35:43 <clarkb> I've also noticed a "The Knowledge AI" bot but it doesn't seem to show up when things are sad
19:35:43 <fungi> or tune apache to not oom the server
19:35:55 <clarkb> fungi: and maybe even both things :)
19:36:29 <clarkb> but ya if the robots.txt "fixes" things I think we can encode that in puppet and then look at tuning apache to reduce the number of connections?
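A robots.txt that tells SemrushBot to stay away is only a couple of lines; the apache side would then be a matter of capping worker counts so listinfo CGI processes cannot exhaust memory. A sketch of both, with the file location and worker ceiling being assumed values to tune, not what is actually deployed:

    # robots.txt in the vhost's DocumentRoot
    User-agent: SemrushBot
    Disallow: /

    # apache prefork MPM tuning, e.g. in mods-enabled/mpm_prefork.conf
    MaxRequestWorkers 50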
19:37:48 <clarkb> #topic General Topics
19:37:49 <fungi> i think so, yes
19:38:13 <clarkb> A virtual PTG is planned for the beginning of june
19:38:32 <clarkb> I've requested these time blocks for us: Monday 1300-1500 UTC, Monday 2300-0100 UTC, Wednesday 0400-0600 UTC
19:38:54 <clarkb> fungi: have you been tracking what registration and other general "getting involved" requires?
19:39:29 <fungi> not really
19:39:41 <fungi> i mean, i understand registration is free
19:40:46 <clarkb> k I'll try to get more details on that so that anyone interested in participating can do so
19:40:55 <clarkb> (I expect it will be relatively easy compared to typical PTGs)
19:40:57 <fungi> but folks are encouraged to register so that 1. organizers can have a better idea of what capacity to plan for, and 2. to help the osf meet legal requirements for things like code of conduct agreement
19:41:18 <fungi> there's also discussion underway to do requirements gathering for tools
19:41:56 <fungi> i can find an ml archive link
19:42:07 <clarkb> thanks!
19:43:46 <corvus> should we do anything with meetpad?
19:43:51 <fungi> #link http://lists.openstack.org/pipermail/openstack-discuss/2020-April/014367.html PTG Signup Reminder & PTG Registration
19:44:26 <fungi> #link http://lists.openstack.org/pipermail/openstack-discuss/2020-April/014481.html Virtual PTG Tooling Requirements & Suggestions
19:44:32 <clarkb> I think we should keep pushing on meetpad. Last I checked the deployment issues were addressed. Should we plan another round of test calls?
19:44:50 <corvus> is it generally working now?  does it need more testing?  are there bugs we should look at?  does it meet any of the requirements the foundation has set?
19:45:33 <fungi> there's an etherpad in that second ml archive link where a list of requirements is being gathered
19:45:33 <clarkb> I don't know if it's working -> more testing probably a good idea. One outstanding issue is the http to https redirect is still missing iirc
19:45:35 <corvus> i noticed there was a bunch of 'required' stuff on https://etherpad.opendev.org/p/virt-ptg-requirements
19:45:51 <corvus> clarkb: ah, yeah forgot about that.  i can add that real quick.
19:46:11 <fungi> probably one of the harder things to meet there is "Legal/Compliance Approval (e.g. OSF Privacy Policy, OSF Code of Conduct, GDPR)"
19:46:22 <corvus> i don't even know what that means
19:46:25 <fungi> but i don't really know, maybe that's easy
19:47:43 <clarkb> corvus: it might be a good idea to get meetpad to a place we are generally happy with it, then we can put it in front of the osf and ask them for more details on those less explicit requirements?
19:47:43 <fungi> i think it's supposed to mean that the osf privacy policy is linked from the service in an easy to find way, that the service complies with the gdpr (that's vague too), and that something enforces that all people connecting agree to follow the code of conduct
19:48:14 <fungi> but yes, getting clarification on those points would be good
19:49:10 <fungi> also we've (several of us) done our best to remind osf folks that people will use whatever tools they want at the end of the day, so it may be a matter of legal risks for the osf endorsing specific tools vs just accepting their use
19:49:14 <clarkb> I do think on paper jitsi meets the more concrete requirements with the possible exception of the room size (depends on whether or not we can bump that up?)
19:49:28 <corvus> is there a limit on room size?
19:49:40 <clarkb> corvus: did you say it is a limit of 35? I thought someone said that
19:49:44 <corvus> i thought i read about hundreds of people in a jitsi room
19:49:46 <clarkb> but there was some workaround for that
19:50:04 <fungi> i think that one was more of "the service should still be usable for conversation when 50 people are in the same room"
19:50:22 <clarkb> https://community.jitsi.org/t/maximum-number-of-participants-on-a-meeting-on-meet-jit-si-server/22273
19:50:29 <clarkb> maybe the 35 number came from something like that
19:50:37 <fungi> (noting that 10 people talking at once is tough to manage, much less 50, and that has little to do with the tools)
19:50:50 <clarkb> also those numbers may be specific to the meet.jit.si deployment (and we can tune ours separate?)
19:51:25 <fungi> also osf has concerns that whatever platforms are endorsed have strong controls allowing rooms to be moderated, people to be muted by moderators, and abusive attendees to be removed reliably
19:51:27 <corvus> yeah, sounds like there may be issues with larger numbers of folks
19:51:39 <clarkb> in any case I think step 0 is getting it to work in our simpler case with the etherpad integration
19:52:00 <clarkb> I think we haven't quite picked that up again since all the etherpad and jitsi updates, so worth retesting and seeing where it is at now
19:52:26 <corvus> k, i'll do the http redirect and ping some folks maybe tomorrow for testing
19:52:39 <clarkb> thanks!
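The missing http-to-https redirect mentioned above is a stock apache rewrite on the port-80 vhost; a minimal sketch, assuming the service name meetpad.opendev.org:

    <VirtualHost *:80>
        ServerName meetpad.opendev.org
        Redirect permanent / https://meetpad.opendev.org/
    </VirtualHost>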
19:52:49 <clarkb> fungi any wiki updates?
19:53:07 <fungi> none, the most i find time for is patrolling page edits from new users
19:53:18 <fungi> (and banning and mass deleting the spam)
19:53:32 <clarkb> #topic Open Discussion
19:53:39 <clarkb> That takes us to the end of our agenda
19:54:10 <clarkb> As a quick note my ISP has been sold off and acquired by a new company. That transition takes effect May 1st (Friday). I don't expect outages but seems like chances for them are higher under those circumstances
19:54:43 <corvus> clarkb: i hear the internet is gonna be big
19:55:22 <clarkb> corvus: do you think we could sell books over the internet?
19:55:57 <ianw> if i could get an eye on
19:56:02 <ianw> #link https://review.opendev.org/#/c/723309/
19:56:26 <ianw> that is part of the pip-and-virtualenv work to add an ensure-virtualenv role for things that actually require virtualenv
19:57:02 <ianw> dib is one such thing, this gets the arm64 builds back to testing; we have dropped pip-and-virtualenv from them
19:57:21 <clarkb> ianw: I'll take a look after lunch if it is still up there then
19:58:03 <ianw> #link http://eavesdrop.openstack.org/irclogs/%23opendev/%23opendev.2020-04-28.log.html#t2020-04-28T04:47:00
19:58:10 <ianw> is my near term plan for that work
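An ensure-virtualenv role of the kind ianw describes essentially boils down to installing virtualenv only when it is missing; a minimal sketch, not the actual zuul-jobs implementation:

    - name: Check whether virtualenv is already available
      command: virtualenv --version
      register: virtualenv_check
      failed_when: false
      changed_when: false

    - name: Install virtualenv with pip when it is missing
      become: yes
      pip:
        name: virtualenv
      when: virtualenv_check.rc != 0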
19:58:16 <fungi> i have a very rough start on the central auth spec drafted locally, though it's still a patchwork of prose i've scraped from various people's e-mail messages over several years so i'm trying to find time to wrangle that into something sensible and current
19:59:14 <fungi> and i've been fiddling with a new tool to gather engagement metrics from gerrit (and soon mailman pipermail archives, meetbot channel logs, et cetera)
19:59:33 <fungi> trying to decide whether i should push that up to system-config or make a new repo for it
20:00:18 <clarkb> fungi: new repo might be worthwhile. Thinking out loud here: the zuul work in system-config is really orienting it towards deployment of tools but not in defining the tools themselves as much?
20:00:36 <fungi> yeah
20:00:42 <fungi> i concur
20:00:50 <fungi> i should make it an installable python project
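Making the metrics tool installable is mostly a matter of adding packaging metadata; a minimal setuptools sketch, with the project name and entry point being made-up placeholders:

    [build-system]
    requires = ["setuptools>=61"]
    build-backend = "setuptools.build_meta"

    [project]
    name = "engagement-stats"        # hypothetical project name
    version = "0.1"

    [project.scripts]
    engagement-stats = "engagement_stats.cli:main"   # hypothetical entry point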
20:01:10 <clarkb> and we are at time
20:01:13 <clarkb> thank you everyone@
20:01:15 <clarkb> er !
20:01:17 <clarkb> #endmeeting