19:01:11 #startmeeting infra
19:01:12 Meeting started Tue Apr 28 19:01:11 2020 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:13 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:15 The meeting name has been set to 'infra'
19:01:22 #link http://lists.opendev.org/pipermail/service-discuss/2020-April/000011.html Our Agenda
19:01:31 #topic Announcements
19:02:08 For the OpenDev Service Coordinator I said we would wait for volunteers until the end of the month. That gives you a few more days if interested :)
19:02:40 o/
19:02:41 how many people have volunteered so far?
19:02:55 fungi: I think only me with my informal "I'm willing to do it" portion of the message
19:03:10 I figured I would send a separate email thursday if no one else did first :)
19:03:18 thanks!
19:03:35 o/
19:04:00 #topic Actions from last meeting
19:04:07 #link http://eavesdrop.openstack.org/meetings/infra/2020/infra.2020-04-21-19.01.txt minutes from last meeting
19:04:24 There were no actions recorded last meeting. Why don't we dive right into all the fun ansible docker puppet things
19:04:32 #topic Priority Efforts
19:04:36 why don't we?
19:04:39 #topic Update Config Management
19:04:50 oh, i see, that was a rhetorical question! ;)
19:04:53 fungi: :)
19:05:11 mordred: first up on this agenda we've got dockerization of gerrit. Are we happy with how that has gone and can we remove that agenda item now?
19:05:19 mordred: I think all of the outstanding issues I knew about were addressed
19:05:48 we do seem to be caught back up since friday's unanticipated excitements
19:06:10 and now we're much further along with things too
19:06:21 so i guess that's done and we could start thinking about the upgrade to 2.16?
19:06:27 WELLL
19:06:36 there's still a cleanup task
19:06:40 which is gerritbot
19:06:47 I don't want to forget about it
19:07:05 oh, right, tied in with the eavesdrop dockering
19:07:06 but we've got eavesdrop split into a playbook and are containering accessbot on eavesdrop now
19:07:20 as well as a gerritbot container patch up
19:07:41 oh - we landed that
19:07:49 cool - I wanna get that bit finished
19:08:05 but then, yes, I agree with corvus - next step is working on the 2.16 upgrade
19:08:26 sounds like we are very close
19:08:45 yeah. I think we can remove the agenda item - gerritbot is more just normal work
19:09:12 though the gerrit upgrade becomes an agenda item
19:09:32 Next up is the Zuul driven playbooks. Last Friday we dove head first into running all of zuul from ansible and most of zuul with containers
19:10:01 yeah - zuul-executor is still installed via pip - because we haven't figured out docker+bubblewrap+afs yet
19:10:07 #link https://review.opendev.org/#/c/724115/ Fix for zuul-scheduler ansible role
19:10:19 everything else is in containers - which means everything else is now running python 3.7
19:10:21 I found a recent issue with that that could use some eyeballs (looks like testing failed)
19:10:30 we sort of didn't anticipate that merging the change would go through and change uids/gids on servers, which turned it into a bit of a fire drill, but the outcome is marvellous
19:11:02 related to this is work to do the same with nodepool services
19:11:11 clarkb: oh - that's the reason I didn't have the value set in the host_vars
19:11:34 mordred: does ansible load from the production value and override testing?
19:13:18 for nodepool services I think we are sorting out multiarch container builds as we have x86 and arm64 nodepool builders
19:13:23 clarkb: I think so
19:13:48 we're VERY close to having multiarch working
19:14:00 https://review.opendev.org/#/c/722339/
19:14:01 then once that is done and deployed I'd like to land a rebase of https://review.opendev.org/#/c/722394/ to make it easier to work with all of our new jobs in system-config
19:14:13 yes please :)
19:14:56 ++
19:15:11 https://review.opendev.org/#/c/724079/ <-- we need this to work with the multi-arch job on an ipv6-only cloud
19:16:43 ^ that has an unknown configuration error?
19:18:26 ianw: AROO
19:19:33 maybe that's the long-standing bug we've been trying to track down
19:19:50 i can take a look after mtg
19:20:43 The other thing I wanted to call out is that we are learning quite a bit about using Zuul for CD
19:21:01 apropos the container services, i don't see any reason not to replace nb01 & nb02 with container based versions now?
19:21:02 for example I think we've decided that periodic jobs should pull latest git repo state rather than rely on what zuul provided
19:21:03 i like to think we're improving zuul's viability for cd
19:21:06 ianw: ++
19:21:41 if you notice irregularities in playbook application to production please call it out. Because as fungi points out I think we are improving things by learning here :)
19:21:49 i can do that and move builds there
19:21:55 (new builders, i mean)
19:22:01 at the very least we're becoming an early case study on complex cd with zuul
19:22:31 (spoiler, it's working remarkably well!)
19:23:07 anything else on the subject of config management and continuous deployment?
19:23:30 oh - I'm testing focal for zuul-executors
19:23:44 https://review.opendev.org/#/c/723528/
19:23:58 it's due to be officially released next week, right?
19:24:06 once that's working - I wanna start replacing ze*.openstack.org with ze*.opendev.org on focal
19:24:08 it's already released
19:24:14 oh!
19:24:22 i'm clearly behind on the news
19:24:51 time has become more of an illusion than usual
19:25:13 hrm, so for new builders, such as nb0X ... focal too?
19:25:30 might as well
19:25:32 s/builders/servers ?
19:25:57 my only concern at this point with it is that major services like mysql crash on it
19:26:06 that will add some testing updates into the loop, but that's ok
19:26:15 clarkb: "awesome"
19:26:15 we should carefully test things are working as we put focal into production
19:26:18 ++
19:26:20 in theory anything we can deploy on bionic we ought to be able to deploy on focal, bugs like what clarkb mentions aside
19:26:44 stuff that's still relying on puppet is obviously frozen in the xenial past
19:27:08 i doubt system-config testing will work on it ATM for testing infrastructure reasons ... i'm working on it
19:27:08 less and less stuff is relying on puppet though
19:27:10 ianw: we could also go ahead and just do bionic for builders and get that done
19:27:37 particularly the ensure-tox role and it installing as a user
19:27:56 ianw: since those are in containers - I mostly wanted to roll executors out on focal so that we could run all of zuul on the same version of python
19:28:06 ... yes pip-and-virtualenv is involved somehow
19:28:33 ianw: the focal test nodes are actually going ok for the run-service-zuul test job fwiw
19:29:27 mordred: hrm, perhaps it's mostly if we were to have a focal bridge.o.o in the mix in the test suite, where testinfra is run
19:30:03 yeah - for things where we're running our ansible - the lack of pre-installed stuff is good, since we install everything from scratch anyway
19:31:58 Sounds like that may be it for this topic
19:32:02 #topic OpenDev
19:32:13 Another friendly reminder to volunteer for service coordinator if interested
19:32:36 On a services front we upgraded gitea thursdayish then had reports of failed git clones over the weekend
19:32:50 "thankfully" it seems that is a network problem and unrelated to our upgrade
19:33:05 \o.
19:33:07 citycloud kna1 (and kna3?) was losing packets sent to vexxhost sjc1
19:33:07 I mean
19:33:09 \o/
19:33:25 fungi was able to track that down using our mirror in kna1 to reproduce the user reports
19:33:33 and from there we did some traceroutes and passed that along to the cloud providers
19:33:52 something to be aware of if we have more reports of this. Double checking the origin is worthwhile
19:34:05 Also lists.* has been OOMing daily between 1000-1200 UTC
19:34:33 yeah, that one's not been so easy to correlate
19:34:40 fungi has been running dstat data collection to help debug that and I think the data shows it isn't bup or mailman. During the period of sadness we get many listinfo processes
19:34:49 i think you're on to something with the semrush bot in the logs
19:35:15 those listinfo processes are started by apache to render webpage stuff for mailman and correlating to logs we have a semrush bot hitting us during every OOM I've checked
19:35:26 I've manually dropped in a robots.txt file to tell semrushbot to go away
19:35:33 though ultimately, this probably means we should eventually upgrade the lists server to something with a bit more oomph
19:35:43 I've also noticed a "The Knowledge AI" bot but it doesn't seem to show up when things are sad
19:35:43 or tune apache to not oom the server
19:35:55 fungi: and maybe even both things :)
19:36:29 but ya if the robots.txt "fixes" things I think we can encode that in puppet and then look at tuning apache to reduce number of connections?
19:37:48 #topic General Topics
19:37:49 i think so, yes
19:38:13 A virtual PTG is planned for the beginning of june
19:38:32 I've requested these time blocks for us: Monday 1300-1500 UTC, Monday 2300-0100 UTC, Wednesday 0400-0600 UTC
19:38:54 fungi: have you been tracking what registration and other general "getting involved" requires?
19:39:29 not really
19:39:41 i mean, i understand registration is free
19:40:46 k I'll try to get more details on that so that anyone interested in participating can do so
19:40:55 (I expect it will be relatively easy compared to typical PTGs)
19:40:57 but folks are encouraged to register so that 1. organizers can have a better idea of what capacity to plan for, and 2. to help the osf meet legal requirements for things like code of conduct agreement
19:41:18 there's also discussion underway to do requirements gathering for tools
19:41:56 i can find an ml archive link
19:42:07 thanks!
19:43:46 should we do anything with meetpad?
19:43:51 #link http://lists.openstack.org/pipermail/openstack-discuss/2020-April/014367.html PTG Signup Reminder & PTG Registration
19:44:26 #link http://lists.openstack.org/pipermail/openstack-discuss/2020-April/014481.html Virtual PTG Tooling Requirements & Suggestions
19:44:32 I think we should keep pushing on meetpad. Last I checked the deployment issues were addressed. Should we plan another round of test calls?
19:44:50 is it generally working now? does it need more testing? are there bugs we should look at? does it meet any of the requirements the foundation has set?
19:45:33 there's an etherpad in that second ml archive link where a list of requirements is being gathered
19:45:33 I don't know if it's working -> more testing probably a good idea. One outstanding issue is the http to https redirect is still missing iirc
19:45:35 i noticed there was a bunch of 'required' stuff on https://etherpad.opendev.org/p/virt-ptg-requirements
19:45:51 clarkb: ah, yeah forgot about that. i can add that real quick.
19:46:11 probably one of the harder things to meet there is "Legal/Compliance Approval (e.g. OSF Privacy Policy, OSF Code of Conduct, GDPR)"
19:46:22 i don't even know what that means
19:46:25 but i don't really know, maybe that's easy
19:47:43 corvus: it might be a good idea to get meetpad to a place where we are generally happy with it, then we can put it in front of the osf and ask them for more details on those less explicit requirements?
19:47:43 i think it's supposed to mean that the osf privacy policy is linked from the service in an easy to find way, that the service complies with the gdpr (that's vague too), and that something enforces that all people connecting agree to follow the code of conduct
19:48:14 but yes, getting clarification on those points would be good
19:49:10 also we've (several of us) done our best to remind osf folks that people will use whatever tools they want at the end of the day, so it may be a matter of legal risks for the osf endorsing specific tools vs just accepting their use
19:49:14 I do think on paper jitsi meets the more concrete requirements with maybe the exception of the room size (depends on whether or not we can bump that up?)
19:49:28 is there a limit on room size?
19:49:40 corvus: did you say it is a limit of 35? I thought someone said that
19:49:44 i thought i read about hundreds of people in a jitsi room
19:49:46 but there was some workaround for that
19:50:04 i think that one was more of "the service should still be usable for conversation when 50 people are in the same room"
19:50:22 https://community.jitsi.org/t/maximum-number-of-participants-on-a-meeting-on-meet-jit-si-server/22273
19:50:29 maybe the 35 number came from something like that
19:50:37 (noting that 10 people talking at once is tough to manage, much less 50, and that has little to do with the tools)
19:50:50 also those numbers may be specific to the meet.jit.si deployment (and we can tune ours separately?)
19:51:25 also osf has concerns that whatever platforms are endorsed have strong controls allowing rooms to be moderated, people to be muted by moderators, and abusive attendees to be removed reliably
19:51:27 yeah, sounds like there may be issues with larger numbers of folks
19:51:39 in any case I think step 0 is getting it to work in our simpler case with the etherpad integration
19:52:00 I think we haven't quite picked that up again since all the etherpad and jitsi updates so worth retesting and seeing where it is at now
19:52:26 k, i'll do the http redirect and ping some folks maybe tomorrow for testing
19:52:39 thanks!
19:52:49 fungi any wiki updates?
19:53:07 none, the most i find time for is patrolling page edits from new users
19:53:18 (and banning and mass deleting the spam)
19:53:32 #topic Open Discussion
19:53:39 That takes us to the end of our agenda
19:54:10 As a quick note my ISP has been sold off and acquired by a new company. That transition takes effect May 1st (Friday). I don't expect outages but it seems like chances for them are higher under those circumstances
19:54:43 clarkb: i hear the internet is gonna be big
19:55:22 corvus: do you think we could sell books over the internet?
19:55:57 if i could get an eye on
19:56:02 #link https://review.opendev.org/#/c/723309/
19:56:26 that is part of the pip-and-virtualenv work to add an ensure-virtualenv role for things that actually require virtualenv
19:57:02 dib is one such thing, this gets the arm64 builds back to testing; we have dropped pip-and-virtualenv from them
19:57:21 ianw: I'll take a look after lunch if it is still up there then
19:58:03 #link http://eavesdrop.openstack.org/irclogs/%23opendev/%23opendev.2020-04-28.log.html#t2020-04-28T04:47:00
19:58:10 is my near term plan for that work
19:58:16 i have a very rough start on the central auth spec drafted locally, though it's still a patchwork of prose i've scraped from various people's e-mail messages over several years so i'm trying to find time to wrangle that into something sensible and current
19:59:14 and i've been fiddling with a new tool to gather engagement metrics from gerrit (and soon mailman pipermail archives, meetbot channel logs, et cetera)
19:59:33 trying to decide whether i should push that up to system-config or make a new repo for it
20:00:18 fungi: new repo might be worthwhile. Thinking out loud here: the zuul work in system-config is really orienting it towards deployment of tools but not so much defining the tools themselves?
20:00:36 yeah
20:00:42 i concur
20:00:50 i should make it an installable python project
20:01:10 and we are at time
20:01:13 thank you everyone@
20:01:15 er !
20:01:17 #endmeeting
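
For context on the robots.txt approach mentioned at 19:35:26 and 19:36:29: a minimal sketch of the kind of file that could be served from the lists site's web root. This is illustrative only; the SemrushBot user-agent token and the blanket Disallow are assumptions, not a record of what was actually deployed on lists.*.

    # Hypothetical robots.txt for the lists site, asking the crawler
    # correlated with the OOM windows to stop crawling entirely
    User-agent: SemrushBot
    Disallow: /

If that holds the OOMs at bay, the same content could later be encoded in the server's puppet configuration, as suggested in the discussion.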
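The apache tuning floated at 19:36:29 would likely amount to capping how many worker processes (and therefore concurrent mailman listinfo CGI renders) apache will spawn. A rough sketch, assuming Apache 2.4 with the prefork MPM; the values are placeholders that would need to be sized to the server's memory, not numbers taken from the meeting:

    <IfModule mpm_prefork_module>
        # Hypothetical caps to keep a burst of crawler requests from
        # spawning enough listinfo processes to exhaust RAM
        ServerLimit          12
        MaxRequestWorkers    12
    </IfModule>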