19:01:20 <clarkb> #startmeeting infra
19:01:21 <openstack> Meeting started Tue Apr 14 19:01:20 2020 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:22 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:24 <openstack> The meeting name has been set to 'infra'
19:01:29 <clarkb> #link http://lists.opendev.org/pipermail/service-discuss/2020-April/000002.html Our Agenda
19:01:35 <clarkb> #topic Announcements
19:01:46 <clarkb> Just a reminder that we are meeting here today and not in #openstack-meeting
19:02:00 * clarkb goes over to the other channel to post that reminder
19:02:49 <clarkb> #topic Actions from last meeting
19:02:53 <fungi> do we have a change up to fix http://eavesdrop.openstack.org/#Project_Infrastructure_Team_Meeting yet?
19:03:03 <fungi> it still lists the old location
19:03:06 <clarkb> fungi: oh not yet, though that was on my list
19:03:16 <clarkb> I got distracted by container and zuul things
19:03:19 <fungi> i'll push one up while i'm thinking about it
19:03:21 <clarkb> #link http://eavesdrop.openstack.org/meetings/infra/2020/infra.2020-04-07-19.01.txt minutes from last meeting
19:04:01 <clarkb> there were no actions from last meeting
19:04:29 <clarkb> There are also no specs to discuss so we can dive straight into the things that had me distracted :)
19:04:30 <clarkb> #topic Priority Efforts
19:04:39 <clarkb> #topic Update Config Management
19:04:48 <clarkb> why don't we start with gerrit?
19:05:00 <clarkb> We are now running gerrit from a docker image
19:05:07 <mordred> BOOM
19:05:15 <clarkb> there were additional corner cases that we had to handle to make that happen
19:05:26 <mordred> yeah - but it seems to mostly be working now
19:05:29 <clarkb> mordred: maybe you want to summarize the relevant bits to help ensure we've written down the current status?
19:05:34 <mordred> uhm
19:05:59 <mordred> we were missing a few bindmounts for the gerrit hook scripts
19:06:18 <mordred> there were also some places where jeepyb was hardcoding paths that are now different inside the container
19:06:25 <mordred> so we fixed both of those
19:06:49 <mordred> clarkb has a patch up to have docker-compose gracefully shutdown gerrit
19:06:56 <clarkb> #link https://review.opendev.org/719051
19:07:15 <clarkb> that change requires us to use a newer version of docker-compose than is on ubuntu-xenial so we switch to installing it from pypi across the board
19:07:39 <clarkb> and I've tried to do extra validation of that in https://review.opendev.org/719682 to ensure the differences we've found don't create problems for us
19:07:46 <clarkb> and the testing seems to show docker-compose handles it just fine
19:08:04 <clarkb> but others should review that and check it for themselves :)
19:09:04 <clarkb> mordred: we also still need to update gerritbot right?
19:09:14 <mordred> yeah - gerritbot is one of the next things on my list
19:09:16 <clarkb> is that the last major outstanding item for gerrit?
19:09:18 <mordred> I'm going to do it after project-config
19:09:31 <mordred> since - you know - it's driven by project-config :)
19:09:56 <mordred> that said - step one is here: https://review.opendev.org/#/c/715635/
19:10:20 <clarkb> that is probably a good lead into the zuul cd efforts?
19:10:31 <mordred> sure!
19:11:04 <mordred> so - we're now running all ansible from zuul and not from cron!
19:11:37 <mordred> as a followup, there is a stack, ending at https://review.opendev.org/#/c/720029 - which is working on getting that to use the zuul checkout of system-config and project-config instead of independently cloned copies of those
19:11:57 <mordred> doing that will let us land stacks of changes and have each one run in prod with the actual state of the change
19:12:17 <clarkb> #link https://review.opendev.org/#/c/720029 use Zuul provided system-config and project-config when Zuul runs our playbooks
19:12:37 <clarkb> it's on my list to re-review all of that after the meeting
19:13:48 <clarkb> I guess the big thing to be aware of here is that zuul is running the jobs and we may still need to tune when certain jobs run?
19:13:58 <clarkb> though I think we've fixed the majority of those problems
19:14:35 <mordred> yeah - but it's definitely a thing to keep watching
19:15:02 <mordred> also - we should look at logs in /var/log/ansible, verify that various service logs are clean and set the flag in zuul.yaml to collect and publish the log files
19:15:05 <clarkb> oh and if you need logs the vast majority are at bridge:/var/log/ansible
19:15:23 <clarkb> and ya as we verify they are safe to publish we can flip that flag and have zuul collect the logs
19:16:14 <clarkb> any questions on this? or anything else to bring up?
19:16:25 <mordred> things run real quick now
19:18:12 <clarkb> mordred: fungi should we talk about etherpad upgrades now or later (I have it later in the agenda but it's somewhat relevant to the topic)
19:18:42 <mordred> we can do that now too
19:18:47 <mordred> (agree, it seems of a piece)
19:18:54 <mordred> we are now running etherpad from docker too!
19:19:39 <clarkb> and in the process upgraded it to 1.8.0 release
19:19:48 <clarkb> are we still on 1.8.0 now that we build our own images?
19:20:00 <mordred> yup
19:20:15 <mordred> our Dockerfile points to :1.8.0 - when they release 1.8.3 we can update that
19:21:14 <fungi> yeah, "soon"
19:21:51 <fungi> also something between what we were deploying on the old server and 1.8.0 seems to have fixed the line number misalignment bug
19:22:20 <clarkb> oh and we are hosting the db for etherpad on the server itself (no more trove db)
19:22:22 <fungi> oh, for 1.8.(1? 3?) we need to watch out for breaking plugins
19:22:32 <clarkb> mordred: related to ^ the db thing did we confirm that backups are running properly now?
19:22:40 <fungi> there's apparently a non-backward-compatible change coming which impacts the plugin interface
19:22:44 <mordred> clarkb: yeah - they seem to be running well now
19:22:46 <mordred> however ...
19:23:20 <mordred> https://review.opendev.org/#/c/719192/ <-- we should land that for gerrit backups (they're working because we used to write that file out)
19:23:45 <mordred> also - stack ending here: https://review.opendev.org/#/c/719484/ has a few more backup updates
19:24:06 <clarkb> oh good call (we haven't needed that in the other container/ansible things because they use local dbs)
19:24:22 <mordred> yeah
19:24:38 <clarkb> we should do similar with gitea0X fwiw
19:24:42 <mordred> so - fwiw - I have confirmed that we're doing the mysqldump, and that we are backing up with bup
19:24:47 <clarkb> I don't think we need to back them all up but should back up one of them
19:24:51 <mordred> I have not confirmed that the db backup is making it in to bup
19:24:57 <mordred> clarkb: it's in that stack
19:25:11 <mordred> https://review.opendev.org/#/c/719484/
19:25:20 <clarkb> oh perfect
19:25:25 <clarkb> I have lots of things to review this afternoon :)
19:25:42 <mordred> (there's actually a bunch of things in system-config with 1 +2 from several days ago that could use a pass to pick up)
19:25:56 <fungi> should we be looking to move the gerrit db into a local container anyway?
19:26:18 <clarkb> fungi: gerrit uses a lot of memory and I expect its db does too
19:26:27 <clarkb> it might be an exception to general cohabitation for our services
19:27:04 <clarkb> but I think we should consider not using trove as part of its upgrades
19:27:06 <mordred> yeah. I think eventually having a dedicated galera cluster for gerrit might be nice
19:27:13 <fungi> ahh, yeah, maybe we'd deploy it in a container on a separate server in that case
19:27:16 <fungi> or serverS
19:27:17 <mordred> oh - wait - duh
19:27:19 <mordred> NEVERMIND
19:27:26 <mordred> as part of gerrit upgrades we're getting rid of mysql
19:27:31 <clarkb> mordred: not entirely
19:27:33 <mordred> so - it stops being an issue
19:27:42 <clarkb> but I bet the db does use far less memory in the new setup
19:27:42 <fungi> er, well, it's not so heavily used at least
19:27:45 <mordred> mostly - I don't really care about the one remaining table
19:27:47 <mordred> yeah
19:27:47 <fungi> so maybe would fit on the same server
19:27:50 <mordred> we can switch to local for that
19:27:54 <clarkb> got it
19:27:57 <fungi> makes sense
19:28:01 <mordred> we could probably just use h2 even
19:28:02 <fungi> i agree
19:28:48 <clarkb> alright anything else on config management, docker, gerrit, backups?
19:28:55 <mordred> once the project-config stack and then gerritbot are done I'll start working on gerrit upgrade planning
19:29:04 <clarkb> mordred: !
19:29:33 <clarkb> mordred: I wonder if we can manipulate my ideas for testing of upgrades to do some of the work there
19:29:48 <mordred> (general idea is probably going to be an upgrade event to get to 2.16, then pause there for a while letting it do online migration to notedb - then upgrade to 3.1)
19:29:49 <clarkb> I think I'll have time tomorrow to really dig into that. We'll see :)
19:30:07 <mordred> but we'll do testing of that first and find out
19:30:20 <fungi> 2.16 has the new review screen and polygerrit both available, right?
19:30:40 <clarkb> fungi: yes
19:30:42 <mordred> yes
19:30:50 <mordred> 3.0 removes non-polygerrit
19:31:14 <fungi> that does make it a good resting place briefly, for folks to get used to the newest interface design
19:31:22 <mordred> me is hoping that subchecks are ready before we're ready to do the 2.16->3.1 bump
19:31:33 <fungi> though i assume the polygerrit interface in 3.1 is way nicer than in 2.16
19:31:38 <mordred> so much nicer
19:31:53 <mordred> current polygerrit is pretty awesome
19:31:55 <fungi> so 2.16 may provide a poor impression of polygerrit
19:32:13 <fungi> we'll probably want to couch that announcement carefully with that in mind
19:32:17 <mordred> maybe - the biggest issue is going to be the ci summary table
19:32:33 <mordred> for the period between when we're on 2.16 and the time that subchecks is fully ready
19:33:05 <mordred> but we'll have to see how long it takes for us to be ready to upgrade in the first place - and then see where subchecks are
19:33:12 <fungi> even after subchecks is fully ready, there's likely to be a ton of third-party ci systems who are not on a checks-api-capable platform
19:33:24 <mordred> fungi: good reason for them to upgrade :)
19:33:28 <fungi> i concur
19:33:45 <mordred> but yeah - there's a pile of stuff to make a plan for
19:33:51 <fungi> but we'll need to take the opportunity to make sure folks know that's coming
19:34:25 <clarkb> seems like we may end up updating our js too if that isn't too bad and we are expecting to be on 2.16 for a while
19:34:41 <fungi> i have a feeling "nobody sees your ci results" is going to be a major fulcrum to get 3pci operators to upgrade/rebuild
19:34:43 <clarkb> testing should help sort all of that out
19:34:49 <mordred> ++
19:35:07 <mordred> first step is the biggy - getting to 2.16 and notedb is the big challenge
19:35:27 <mordred> once we're there - the subsequent steps are all much more manageable
19:35:49 <clarkb> ok let's move on, we have a few more things to talk about
19:36:06 <clarkb> #topic OpenDev
19:36:20 <clarkb> #link http://lists.opendev.org/pipermail/service-discuss/2020-April/000001.html Email thread on next plans for governance
19:37:05 <clarkb> I sent that email out. TLDR is I've opened nomination for those that want to do PTL duties until end of April. Which gives us plenty of time (I think) to account for pandemic and such
19:37:23 <mordred> ++
19:37:23 <clarkb> Also I was going to kick-start bringing up the advisory board discussions with people that can facilitate that in their constituent communities
19:37:41 <clarkb> I haven't done that yet, containers and zuul things were distracting, but it's high on my list to get to
19:38:15 <clarkb> if you are interested in the advisory board please reach out as well
19:38:23 <fungi> s/ptl/project coordinator/
19:38:27 <clarkb> fungi: right
19:38:36 <fungi> er, "service coordinator"
19:38:40 <fungi> my bad
19:38:49 <fungi> we'll get all this terminology figured out someday
19:38:54 <clarkb> fungi: it's difficult when you have to go from one set of terms to another :)
19:39:20 <clarkb> fungi: on the authentication spec/summary/planning is there anything new to add?
19:39:47 <fungi> no, not yet
19:40:00 <fungi> oh, well, a little
19:40:45 <fungi> corvus found some of mordred's prior draft plan from years ago, so i'll be using that as a sort of checklist to make sure i cover the use cases he identified
19:42:22 <clarkb> #topic General Topics
19:42:37 <clarkb> The only item here is server upgrades and in particular the wiki
19:42:41 <clarkb> fungi: anything new to add re the wiki
19:42:55 <fungi> zilch
19:43:18 <clarkb> I also noticed that my refstack changes that were pushed to make it possible to build docker images for refstack seem to have landed
19:43:26 <clarkb> the board meeting today prompted me to take a look
19:43:44 <clarkb> so I may pick that back up again if time permits
19:44:18 <clarkb> I think the board wants to make that toolchain less openstack specific so maybe it is something that can fit in opendev but probably to start it will be a good test case for openstack-infra specific things
19:44:20 <clarkb> we'll see
19:45:17 <clarkb> #topic Open Discussion
19:45:21 <clarkb> Anything else?
19:45:24 <fungi> yeah, i'm struggling to see how to make refstack non-openstack-centric
19:46:15 <clarkb> fungi: what I took away from it was that the interop wg could use tools built for openstack to do validation and verification of other tools in a similar way
19:46:26 <clarkb> really I doubt that will get much traction, but time will tell
19:46:31 <fungi> oh, i've pushed up the change i mentioned earlier to update our meeting info, and another to make the irc meetings index page less openstacky:
19:46:42 <ianw> i'd appreciate it if people could look over
19:46:50 <fungi> #link https://review.opendev.org/720060 Update OpenDev meeting location and name
19:46:50 <ianw> #link https://review.opendev.org/#/q/topic:ensure-pip+status:open
19:47:14 <fungi> #link https://review.opendev.org/720063 Not all meetings are OpenStack
19:47:15 <ianw> to get support for pip-and-virtualenv-less images in zuul-jobs
19:47:32 <clarkb> ianw: added to my list
19:47:56 <clarkb> I've got kid school things at 2:30 but my rough plan is to review between now and then (and probably continue to review if they can do school things without too much help)
19:48:04 <clarkb> because there are a lot of changes out there :)
19:48:19 <ianw> also, i'm not sure what people's thoughts on suse are, wrt
19:48:20 <ianw> #link https://review.opendev.org/718299
19:48:43 <ianw> i would like to start building it without pip/virtualenv (like fedora is now) and push on that prior stack to get it back into shape
19:49:07 <clarkb> ianw: that makes sense given how things have changed upstream
19:49:34 <fungi> oh, for last week's maintenance we used this channel, and did an ad hoc meetbot meeting under the opendev-maint name
19:49:58 <mordred> ++
19:50:10 <fungi> which might be a pattern we want to keep in mind for future scheduled maintenance activities
19:50:44 <clarkb> mordred: not to add to your stack but I was thinking we might want to consider a zuul.d for system-config
19:50:56 <clarkb> mordred: then we can put docker image jobs in a file and prod jobs in a file and pre merge or whatever
19:51:26 <fungi> #link http://eavesdrop.openstack.org/meetings/opendev_maint/ maintenance meeting(s)
19:51:31 <mordred> clarkb: I had the same thought
19:53:09 <clarkb> I'll call the meeting here. Thanks everyone!
19:53:23 <clarkb> we get a few minutes back to eat breakfast/lunch/dinner or do more reviews :)
19:53:29 <clarkb> #endmeeting