19:01:24 #startmeeting infra
19:01:25 Meeting started Tue May 5 19:01:24 2020 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:26 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:28 The meeting name has been set to 'infra'
19:01:35 #link http://lists.opendev.org/pipermail/service-discuss/2020-May/000017.html Our Agenda
19:01:59 o/
19:02:01 o/
19:02:10 #topic Announcements
19:02:28 Just the standing reminder that we are meeting here now. I can probably drop that at this point
19:02:38 #topic Actions from last meeting
19:02:46 #link http://eavesdrop.openstack.org/meetings/infra/2020/infra.2020-04-28-19.01.txt minutes from last meeting
19:02:58 There were no actions from last meeting (maybe we need to get better at tracking those)
19:03:12 it's a cool meetbot feature
19:03:17 I think we can dive straight into the agenda for today
19:03:19 #topic Priority Efforts
19:03:25 but i kinda like having no action items ;)
19:03:26 #topic Update Config Management
19:03:30 fungi: me too :)
19:03:47 First up is a check-in on the Gerrit and particularly Gerritbot situation
19:03:56 People keep noticing that gerritbot's config isn't updating
19:04:05 mordred: ^ anything new to share there or things people can help with?
19:04:16 uh
19:04:29 no - mostly still slogging through the zuul/nodepool rollout
19:05:12 so I have made zero progress on gerrit/gerritbot since last week
19:05:24 mordred: is the plan there to deploy it in a container on eavesdrop?
19:05:31 BUT - as soon as we roll out nodepool and then do your reorg patch, I'll work on that
19:05:33 yes
19:05:44 so it should be easy - once there are slightly fewer plates spinning
19:05:48 k
19:05:59 also I suppose if anyone wanted to help they could start with changes to build those docker image(s)?
19:06:01 you know - and fewer memory leaks
19:06:06 already done
19:06:12 ah perfect
19:06:25 done and merged even: https://review.opendev.org/#/c/715635/
19:06:35 so it's really just adding the ansible to run gerritbot on eavesdrop
19:06:54 but - that's new jobs, so I think it is an after-clarkb-zuul.yaml-split sort of thing
19:07:12 else we might lose our minds ;)
19:07:18 #info Next up for Gerritbot is to add ansible to run the gerritbot docker image on eavesdrop. Will want to do it after the system-config zuul config reorg
19:07:50 Let's talk about that then. The bit we are waiting for before reorging system-config's zuul configs is getting nodepool running via ansible and docker
19:08:08 mordred: I think most of the changes for that are ready to go? it's just a matter of deploying them?
19:08:46 yup
19:08:50 maybe we can pick a launcher and a builder to convert, test that everything works on them, then roll out to the other launchers and builders?
my biggest concern is that with openstack doing release things nowish we want to avoid big disruptions, but a staged rollout should be pretty safe
19:09:11 yeah - we'll have chowns to do like we did for zuul
19:09:26 but - it's launchers and builders - so one being down for a minute isn't a big deal
19:09:38 yup, rolling updates on those is pretty small impact
19:09:44 clarkb: maybe we put everything into emergency, then just work them one at a time
19:09:49 ++
19:10:27 then when that is rolled out I can refresh my system-config reorg and hopefully land that quickly before it conflicts with anything
19:11:19 The last thing I wanted to talk about here was trying to minimize user impacts for a bit to help openstack's release process go as smoothly as possible
19:11:32 ++
19:11:38 I know we want to restart gerrit and we'll want to restart zuul services on the scheduler once new images land to pick up the jemalloc update
19:12:04 if we want to do that in the next week and a half we should coordinate with the openstack release team
19:12:19 Both should be relatively low impact but zuul-scheduler in particular will dump queues, restarting jobs
19:12:47 clarkb: we could do the scheduler restart on the weekend when there's less traffic
19:13:17 the gerrit restart should be super easy - that one we can probably do whenever - no new functionality - really just the repl config
19:14:18 ya, maybe plan both for friday (release things go quiet on fridays iirc)
19:14:29 I'll ask the release team if that works for them
19:14:52 sounds good to me
19:15:11 great. Anything else on the subject of our config management maneuvers?
19:15:22 oh - re: nodepool, we've also got arm images working through the system - so we should also be able to update nb03 to ansible/docker
19:15:43 gerrit restart still needs the release team to avoid pushing release changes though, since in the past they've had trouble with replication lag and a gerrit restart will force a full replication run, right?
19:15:44 and the other nb's can get rebuilt as opendev and go ansible/docker too
19:15:55 so anything which merges shortly after the restart may not get replicated for a while
19:15:57 fungi: it should not force a full repl
19:16:00 fungi: we don't force replication on restart anymore
19:16:07 yeah - we finally fixed that
19:16:11 oh, excellent, i had clearly forgotten that!
19:16:13 but we'll clear it with them anyway
19:17:37 #topic OpenDev
19:18:17 I made a call for Service Coordinator volunteers with a deadline set for the end of April. Last week I promised I would volunteer officially to make that more official, and I did not see any other volunteers
19:18:51 I think that means I'm it, but wanted to get some acknowledgement of that beyond me asserting it, if that makes sense
19:19:11 congratudolences!
19:20:39 Are there any other opendev related topics? I haven't heard anything new on the citycloud/vexxhost network trouble so I assume that got sorted?
19:20:40 thanks clarkb for volunteering!
19:20:42 ++
19:20:44 oh, in other opendev news, i've started on a poc for reporting aggregate user activity statistics for our services
19:20:48 #link https://review.opendev.org/724886 Create new project for OpenDev Engagement Stats
19:21:00 once that is approved i can push my initial work in there
19:21:27 so far it's just measuring gerrit changes created/merged, revisions pushed, comments added, et cetera
19:21:39 queried from the rest api
19:21:53 planning to add stuff for counting our mailing list archives, irc logs, and so on
19:22:21 oh, related to new projects, we fixed the long standing bug where jeepyb didn't update gerrit acls for retired projects
19:22:24 stackalytics version 3? ;)
19:22:27 fungi: ^ thank you for getting that in and running it
19:22:39 not per-user stats, no
19:22:45 and nothing related to affiliation
19:23:18 fungi: so overall commits for all of openstack, all of Zuul?
19:23:25 this is more like how many reviewers interacted with projects in a given namespace, how many reviews were done, et cetera
19:23:34 yep
19:23:52 the gerrit stuff so far has per-namespace and total across the gerrit deployment
19:23:53 more a project health metric than a comparison metric
19:24:06 and can chunk by month, quarter or year
19:24:06 understood now
19:25:20 the idea is that we can at least provide some additional useful numbers to folks who are investing in our services (whether volunteering their time, donating server quota, ...)
19:25:42 so that they can see what kind of impact we're having
19:26:28 anyway, once the code's up i'll start up a conversation on what direction(s) we should take it in
19:27:32 #topic General Topics
19:27:38 First up is the PTG
19:27:47 #link https://virtualptgjune2020.eventbrite.com Register if you plan to attend. This helps with planning details.
19:28:11 It is free registration. This is mostly for planning and related bookkeeping purposes
19:28:34 I've requested three 2-hour chunks of time as mentioned previously and haven't been told no, so I expect we'll get those
19:28:46 and it makes sure "attendees" have agreed to the event's code of conduct
19:28:54 Monday 1300-1500 UTC, Monday 2300-0100 UTC, Wednesday 0400-0600 UTC, with Monday being June 1, 2020
19:28:59 as much as it can, anyway
19:29:15 #action clarkb prep PTG agenda etherpad
19:29:27 I've been meaning to do ^ and now that things are getting close I should actually do that
19:29:53 I'll send a link for that out to the mailing list once I've got it up
19:30:05 Hope to see you there, and please register when you have a moment if you plan to attend
19:30:18 Next is the wiki server. fungi: any updates there?
19:30:45 nothnig
19:30:59 or some similarly-spelled word
19:31:04 fungi: you're still next: Gerrit reviewers plugin
19:31:10 #link https://review.opendev.org/#/q/topic:gerrit-plugins+is:open
19:31:13 ahh, so this is an interesting one
19:31:41 jim sommerville from the starlingx project reached out to me about this because i was the only user in #opendev...
19:31:48 ...on oftc
19:31:58 (i'm squatting the channel there in case we ever need it)
19:32:37 not sure what prompted him to /join it, though i did suggest to him that in the future the equivalent channel on freenode was a better bet for finding us
19:32:52 o/
19:33:21 anyway, it sounds like folks in some of the starlingx teams are looking for better ways to organize their review backlogs and notifications
19:33:49 and he found the reviewers gerrit plugin and was asking whether it was something we'd consider adding
19:34:01 so i took a look, it seems to be maintained
19:34:06 when I saw this go on the agenda I did check that the plugin has gerrit 2.13 versions, and it does
19:34:18 i pushed up a couple of preliminary changes to add it to our images
19:34:30 marked wip for now until we could talk it through
19:34:48 there's a big missing piece, which is figuring out how to manage the configuration for it
19:34:53 from a technical standpoint I think my only concern would be if we need to rebuild the plugins like we did for melody and storyboard its?
19:35:02 fungi: oh, it's configurable?
19:35:24 when I looked at it, it seemed to look for a file in repos and, if that was there, add the listed users as reviewers
19:35:41 a quick summary is that the project-specific configs are where certain file or branch patterns in a repo get mapped to specific reviewers or groups (i think we'd maybe just support the latter for sanity)
19:35:58 these are implemented similarly to how acls and other project options are set
19:36:14 so i think we'd need to handle them the same way we do acl configs with manage-projects
19:36:24 "Per project configuration of the @PLUGIN@ plugin is done in the reviewers.config file of the project."
19:36:45 my guess is that's in the refs/meta/config branch
19:36:46 and probably auto-create any new groups they reference in reviewers.config, just like we do for groups referenced in an acl
19:36:53 I think it should be fine to roll that out as part of >=2.14 - I don't want to try to build new stuff for 2.13
19:36:55 but i don't know that
19:36:55 corvus: yeah, that's what it looked like to me as well
19:37:17 like - last time I tried, it was a complete failure, which is why the dockerfile for 2.13 just downloads the war
19:37:38 mordred: ya, that's why I was concerned about needing to build things
19:37:41 so anyway, it's not just as simple as adding the plugin, doesn't look like (though that part seems straightforward and easy, thus the changes i've pushed so far)
19:37:48 we're planning on stopping at 2.16 for a bit; how about we open the window for new plugins there?
19:38:05 fungi: also we might want to make sure we understand what starlingx's needs are to ensure that plugin would help them
19:38:08 that seems reasonable to me, for sure
19:38:11 wfm.
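(Reference note, not part of the meeting: a minimal sketch of what the per-project reviewers.config discussed above could look like, assuming the plugin's documented [filter "<query>"] / reviewer = format and the standard Gerrit refs/meta/config workflow. Repo and group names here are placeholders, and as noted in the discussion this would likely end up driven by manage-projects rather than pushed by hand.)

    # Illustrative sketch (placeholder repo/group names): the reviewers plugin
    # reads a reviewers.config from each project's refs/meta/config branch.
    # Example reviewers.config contents, per the plugin's documented format:
    #
    #   [filter "*"]
    #       reviewer = example-review-group
    #   [filter "branch:master file:^doc/.*"]
    #       reviewer = example-docs-group
    #
    # Editing it by hand looks roughly like the usual project-config workflow:
    git clone https://review.opendev.org/example/some-repo && cd some-repo
    git fetch origin refs/meta/config && git checkout -b project-config FETCH_HEAD
    $EDITOR reviewers.config
    git add reviewers.config && git commit -m "Add reviewers plugin configuration"
    # Direct push requires permission on refs/meta/config; otherwise push to
    # refs/for/refs/meta/config so the change goes through review.
    git push origin HEAD:refs/meta/config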
and I don't see anything about this plugin that would make me think it's a bad idea
19:38:39 corvus: ++
19:38:50 right, it seemed like a reasonable request, i wanted to dig into it some more, but since it looks like it'll require additional automation i wanted to make sure to start talking about it
19:39:22 i can start up a thread on the service-discuss ml if that'll help us get through the specifics
19:39:34 and i agree, waiting until 2.16 is probably warranted
19:39:47 so as not to further complicate our upgrades
19:40:00 i can update my wip patches to remove it from <2.16
19:40:31 anyway, that answers my questions for now, i think
19:40:52 fungi: thanks, this was a good discussion :)
19:41:12 i didn't know if it warranted a full spec, but i'll write up the use case and how i would envision it working and send it to the ml
19:42:00 and solicit feedback from the starlingx crew there
19:42:10 ++
19:42:12 thanks!
19:42:49 #topic Open Discussion
19:43:02 That was all I had on the agenda. Anything else before we close the meeting?
19:43:16 fungi: it's worth noting that we're motivated to get upgraded to 2.16 - so we're not talking about years here
19:43:28 yup
19:43:53 we've found some more stale afs volume locks from broken vos releases resulting from the 2020-04-28 afs01.dfw.o.o outage
19:44:15 i took care of mirror.fedora earlier today, corvus fixed mirror.centos yesterday
19:44:22 i've got mirror.opensuse running now
19:44:33 and mirror.yum-puppet needs doing as well
19:44:34 mirror.opensuse and mirror.yum-puppetlabs in particular seem to be the remaining locked-but-no-transaction cases
19:44:35 there may be others
19:44:49 fungi: I don't think there are, based on my read of vos listvldb and vos status
19:44:58 i recall i (or someone) fixed some of the docs/static site volumes last week
19:45:00 (someone should double check it though)
19:45:20 thanks for checking that, clarkb
19:46:34 matches up with http://grafana.openstack.org/d/ACtl1JSmz/afs?orgId=1 , thanks
19:47:28 yeah, i think the fedora one should clear when the current rsync and vos release pulse ends
19:47:52 the manual vos release in these cases seems to only redo/finish the incomplete vos release from the time of the crash
19:48:00 so it doesn't actually switch to the current data
19:49:16 the second vos release will do the catch-up
19:49:22 yep
19:49:32 which i've just been letting the mirror-update cronjobs take care of
19:50:58 alright, last call for any other items, otherwise I'll call it a few minutes early today
19:51:35 if i could request some reviews on https://review.opendev.org/#/q/status:open+topic:cleanup-test-image-python
19:51:39 #link https://review.opendev.org/#/q/status:open+topic:cleanup-test-image-python
19:51:50 that will help me make progress getting rid of pip-and-virtualenv
19:53:40 there's also some nodepool things out there that could use some eyes too
19:54:35 #link https://review.opendev.org/724214
19:54:52 is another one, just to get afs testing on focal
19:55:07 focal did the same thing as bionic unfortunately and shipped a ~pre version of openafs
19:55:34 * fungi grumbles
19:55:40 a different version though, I expect
19:55:43 not the same ~pre version i guess?
19:55:48 the arm64 job @ https://review.opendev.org/724439 fails due to pip-and-virtualenv
19:55:53 ianw: know how well it works?
19:56:22 being removed on that platform. i need to fix ensure-tox, as we install tox for the zuul user and then try to run it as root, which blows up
19:57:04 corvus: it being openafs?
we put 1.8.5 packages in the ppa and it passes testing, at least
19:57:25 ya. thx. so we just need to get one of the mirrors on it and see how it holds up over time?
19:57:48 also sounds like we'll be wanting the new executors on it too
19:58:05 great point
19:58:13 yep, that would be a good start. although our bionic hosts are on the 1.8 series too, so i would expect things are ok
19:58:46 ya, iirc 1.8 can client to 1.6 but we can't mix fileservers between 1.6 and 1.8?
19:58:56 so it's the upgrade of the fileservers that will be "fun"
19:59:15 and we are at time. Thanks everyone
19:59:18 #endmeeting
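(Reference note, not part of the meeting: a rough sketch of the OpenAFS commands behind the stale volume lock cleanup discussed in open discussion above. The volume name comes from the log, the server name is the afs01.dfw.o.o shorthand spelled out, and the exact invocations are illustrative rather than a record of what was actually run; they assume -localauth access on an AFS server.)

    # Rough illustration of clearing a volume left locked by an interrupted
    # vos release (names taken from the discussion above).
    vos listvldb -name mirror.opensuse -localauth             # is the VLDB entry locked?
    vos status -server afs01.dfw.openstack.org -localauth     # any transactions still running?
    # Locked but no transaction in flight: drop the lock, then release.
    vos unlock -id mirror.opensuse -localauth
    # The first release typically just finishes the interrupted one; a second
    # release (or the next mirror-update cron pulse) brings the read-only
    # replicas up to the current read/write data.
    vos release -id mirror.opensuse -localauth -verbose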