19:01:20 #startmeeting infra
19:01:21 Meeting started Tue Apr 14 19:01:20 2020 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:22 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:24 The meeting name has been set to 'infra'
19:01:29 #link http://lists.opendev.org/pipermail/service-discuss/2020-April/000002.html Our Agenda
19:01:35 #topic Announcements
19:01:46 Just a reminder that we are meeting here today and not in #openstack-meeting
19:02:00 * clarkb goes over to the other channel to post that reminder
19:02:49 #topic Actions from last meeting
19:02:53 do we have a change up to fix http://eavesdrop.openstack.org/#Project_Infrastructure_Team_Meeting yet?
19:03:03 it still lists the old location
19:03:06 fungi: oh, not yet, though that was on my list
19:03:16 I got distracted by container and zuul things
19:03:19 i'll push one up while i'm thinking about it
19:03:21 #link http://eavesdrop.openstack.org/meetings/infra/2020/infra.2020-04-07-19.01.txt minutes from last meeting
19:04:01 there were no actions from last meeting
19:04:29 There are also no specs to discuss so we can dive straight into the things that had me distracted :)
19:04:30 #topic Priority Efforts
19:04:39 #topic Update Config Management
19:04:48 why don't we start with gerrit?
19:05:00 We are now running gerrit from a docker image
19:05:07 BOOM
19:05:15 there were additional corner cases that we had to handle to make that happen
19:05:26 yeah - but it seems to mostly be working now
19:05:29 mordred: maybe you want to summarize the relevant bits to help ensure we've written down the current status?
19:05:34 uhm
19:05:59 we were missing a few bindmounts for the gerrit hook scripts
19:06:18 there were also some places where jeepyb was hardcoding paths that are now different inside the container
19:06:25 so we fixed both of those
19:06:49 clarkb has a patch up to have docker-compose gracefully shut down gerrit
19:06:56 #link https://review.opendev.org/719051
19:07:15 that change requires us to use a newer version of docker-compose than is on ubuntu-xenial, so we switch to installing it from pypi across the board
19:07:39 and I've tried to do extra validation of that in https://review.opendev.org/719682 to ensure the differences we've found don't create problems for us
19:07:46 and the testing seems to show docker-compose handles it just fine
19:08:04 but others should review that and check it for themselves :)
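For context, a minimal sketch of the kind of compose setting involved in a graceful shutdown, assuming stop_grace_period is the knob used; the image tag and bind-mount path are illustrative assumptions, not the actual contents of change 719051:

    version: '3.4'
    services:
      gerrit:
        # image tag and volume path are assumptions for illustration only
        image: opendevorg/gerrit:2.13
        volumes:
          - /home/gerrit2/review_site:/var/gerrit
        # give Gerrit minutes (not the default 10s) to stop cleanly; settings
        # like this need a newer compose file format than the docker-compose
        # shipped with xenial supports, hence installing it from pypi
        stop_grace_period: 5m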
19:09:04 mordred: we also still need to update gerritbot right?
19:09:14 yeah - gerritbot is one of the next things on my list
19:09:16 is that the last major outstanding item for gerrit?
19:09:18 I'm going to do it after project-config
19:09:31 since - you know - it's driven by project-config :)
19:09:56 that said - step one is here: https://review.opendev.org/#/c/715635/
19:10:20 that is probably a good lead into the zuul cd efforts?
19:10:31 sure!
19:11:04 so - we're now running all ansible from zuul and not from cron!
19:11:37 as a followup, there is a stack ending at https://review.opendev.org/#/c/720029 which is working on getting that to use the zuul checkout of system-config and project-config instead of independently cloned copies of those
19:11:57 doing that will let us land stacks of changes and have each one run in prod with the actual state of the change
19:12:17 #link https://review.opendev.org/#/c/720029 use Zuul provided system-config and project-config when Zuul runs our playbooks
19:12:37 it's on my list to re-review that all after the meeting
19:13:48 I guess the big thing to be aware of here is that zuul is running the jobs and we may still need to tune when certain jobs run?
19:13:58 though I think we've fixed the majority of those problems
19:14:35 yeah - but it's definitely a thing to keep watching
19:15:02 also - we should look at logs in /var/log/ansible, verify that various service logs are clean, and set the flag in zuul.yaml to collect and publish the log files
19:15:05 oh and if you need logs, the vast majority are at bridge:/var/log/ansible
19:15:23 and yeah, as we verify they are safe to publish we can flip that flag and have zuul collect the logs
19:16:14 any questions on this? or anything else to bring up?
19:16:25 things run real quick now
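A rough sketch of what the zuul-provided-checkouts approach looks like in a job definition; the job name and the log-collection variable are hypothetical, while required-projects is the standard Zuul way to hand a job the (possibly speculative) checkouts of other repos:

    - job:
        name: infra-prod-run-playbook
        description: Run a system-config playbook on bridge (hypothetical job).
        # required-projects gives the job Zuul's prepared checkouts of these
        # repos, so stacks of changes run in prod with their actual state
        required-projects:
          - opendev/system-config
          - opendev/project-config
        vars:
          # hypothetical flag: only collect and publish /var/log/ansible output
          # for a playbook once its logs are verified safe to make public
          infra_prod_playbook_collect_log: false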
19:18:12 mordred: fungi: should we talk about etherpad upgrades now or later? (I have it later in the agenda but it's somewhat relevant to the topic)
19:18:42 we can do that now too
19:18:47 (agree, it seems of a piece)
19:18:54 we are now running etherpad from docker too!
19:19:39 and in the process upgraded it to the 1.8.0 release
19:19:48 are we still on 1.8.0 now that we build our own images?
19:20:00 yup
19:20:15 our Dockerfile points to :1.8.0 - when they release 1.8.3 we can update that
19:21:14 yeah, "soon"
19:21:51 also something between what we were deploying on the old server and 1.8.0 seems to have fixed the line number misalignment bug
19:22:20 oh and we are hosting the db for etherpad on the server itself (no more trove db)
19:22:22 oh, for 1.8.(1? 3?) we need to watch out for breaking plugins
19:22:32 mordred: related to ^ the db thing, did we confirm that backups are running properly now?
19:22:40 there's apparently a non-backward-compatible change coming which impacts the plugin interface
19:22:44 clarkb: yeah - they seem to be running well now
19:22:46 however ...
19:23:20 https://review.opendev.org/#/c/719192/ <-- we should land that for gerrit backups (they're working because we used to write that file out)
19:23:45 also - stack ending here: https://review.opendev.org/#/c/719484/ has a few more backup updates
19:24:06 oh good call (we haven't needed that in the other container/ansible things because they use local dbs)
19:24:22 yeah
19:24:38 we should do similar with gitea0X fwiw
19:24:42 so - fwiw - I have confirmed that we're doing the mysqldump, and that we are backing up with bup
19:24:47 I don't think we need to back them all up but we should back up one of them
19:24:51 I have not confirmed that the db backup is making it into bup
19:24:57 clarkb: it's in that stack
19:25:11 https://review.opendev.org/#/c/719484/
19:25:20 oh perfect
19:25:25 I have lots of things to review this afternoon :)
19:25:42 (there's actually a bunch of things in system-config with 1 +2 from several days ago that could use a pass to pick up)
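A minimal compose sketch of the etherpad-plus-local-database shape described above; the service layout follows standard docker-compose conventions, but the image names, paths, and credentials are assumptions rather than the actual system-config deployment:

    version: '3.4'
    services:
      mariadb:
        image: mariadb:10.4
        environment:
          MYSQL_ROOT_PASSWORD: examplepassword   # placeholder only
          MYSQL_DATABASE: etherpad
        volumes:
          # local database on the server itself, replacing the old trove db
          - /var/etherpad/db:/var/lib/mysql
      etherpad:
        # built from our own Dockerfile, which currently pins the 1.8.0
        # release and gets bumped when upstream tags 1.8.3
        image: opendevorg/etherpad:latest
        depends_on:
          - mariadb
        ports:
          - "127.0.0.1:9001:9001"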
19:25:56 should we be looking to move the gerrit db into a local container anyway?
19:26:18 fungi: gerrit uses a lot of memory and I expect its db does too
19:26:27 it might be an exception to general cohabitation for our services
19:27:04 but I think we should consider not using trove as part of its upgrades
19:27:06 yeah. I think eventually having a dedicated galera cluster for gerrit might be nice
19:27:13 ahh, yeah, maybe we'd deploy it in a container on a separate server in that case
19:27:16 or serverS
19:27:17 oh - wait - duh
19:27:19 NEVERMIND
19:27:26 as part of gerrit upgrades we're getting rid of mysql
19:27:31 mordred: not entirely
19:27:33 so - it stops being an issue
19:27:42 but I bet the db does use far less memory in the new setup
19:27:42 er, well, it's not so heavily used at least
19:27:45 mostly - I don't really care about the one remaining table
19:27:47 yeah
19:27:47 so maybe it would fit on the same server
19:27:50 we can switch to local for that
19:27:54 got it
19:27:57 makes sense
19:28:01 we could probably just use h2 even
19:28:02 i agree
19:28:48 alright, anything else on config management, docker, gerrit, backups?
19:28:55 once the project-config stack and then gerritbot are done I'll start working on gerrit upgrade planning
19:29:04 mordred: !
19:29:33 mordred: I wonder if we can adapt my ideas for testing of upgrades to do some of the work there
19:29:48 (general idea is probably going to be an upgrade event to get to 2.16, then pause there for a while letting it do online migration to notedb - then upgrade to 3.1)
19:29:49 I think I'll have time tomorrow to really dig into that. We'll see :)
19:30:07 but we'll do testing of that first and find out
19:30:20 2.16 has the new review screen and polygerrit both available, right?
19:30:40 fungi: yes
19:30:42 yes
19:30:50 3.0 removes non-polygerrit
19:31:14 that does make it a good resting place briefly, for folks to get used to the newest interface design
19:31:22 me is hoping that subchecks are ready before we're ready to do the 2.16->3.1 bump
19:31:33 though i assume the polygerrit interface in 3.1 is way nicer than in 2.16
19:31:38 so much nicer
19:31:53 current polygerrit is pretty awesome
19:31:55 so 2.16 may provide a poor impression of polygerrit
19:32:13 we'll probably want to couch that announcement carefully with that in mind
19:32:17 maybe - the biggest issue is going to be the ci summary table
19:32:33 for the period between when we're on 2.16 and the time that subchecks is fully ready
19:33:05 but we'll have to see how long it takes for us to be ready to upgrade in the first place - and then see where subchecks are
19:33:12 even after subchecks is fully ready, there's likely to be a ton of third-party ci systems that are not on a checks-api-capable platform
19:33:24 fungi: good reason for them to upgrade :)
19:33:28 i concur
19:33:45 but yeah - there's a pile of stuff to make a plan for
19:33:51 but we'll need to take the opportunity to make sure folks know that's coming
19:34:25 seems like we may end up updating our js too if that isn't too bad and we are expecting to be on 2.16 for a while
19:34:41 i have a feeling "nobody sees your ci results" is going to be a major fulcrum to get 3pci operators to upgrade/rebuild
19:34:43 testing should help sort all of that out
19:34:49 ++
19:35:07 first step is the biggie - getting to 2.16 and notedb is the big challenge
19:35:27 once we're there - the subsequent steps are all much more manageable
19:35:49 ok, let's move on, we have a few more things to talk about
19:36:06 #topic OpenDev
19:36:20 #link http://lists.opendev.org/pipermail/service-discuss/2020-April/000001.html Email thread on next plans for governance
19:37:05 I sent that email out. TLDR is I've opened nominations for those that want to do PTL duties until the end of April, which gives us plenty of time (I think) to account for the pandemic and such
19:37:23 ++
19:37:23 Also I was going to kick-start bringing up the advisory board discussions with people that can facilitate that in their constituent communities
19:37:41 I haven't done that yet, containers and zuul things were distracting, but it's high on my list to get to
19:38:15 if you are interested in the advisory board please reach out as well
19:38:23 s/ptl/project coordinator/
19:38:27 fungi: right
19:38:36 er, "service coordinator"
19:38:40 my bad
19:38:49 we'll get all this terminology figured out someday
19:38:54 fungi: it's difficult when you have to go from one set of terms to another :)
19:39:20 fungi: on the authentication spec/summary/planning is there anything new to add?
19:39:47 no, not yet
19:40:00 oh, well, a little
19:40:45 corvus found some of mordred's prior draft plan from years ago, so i'll be using that as a sort of checklist to make sure i cover the use cases he identified
19:42:22 #topic General Topics
19:42:37 The only item here is server upgrades and in particular the wiki
19:42:41 fungi: anything new to add re the wiki?
19:42:55 zilch
19:43:18 I also noticed that the changes I pushed to make it possible to build docker images for refstack seem to have landed
19:43:26 the board meeting today prompted me to take a look
19:43:44 so I may pick that back up again if time permits
19:44:18 I think the board wants to make that toolchain less openstack-specific, so maybe it is something that can fit in opendev, but probably to start it will be a good test case for openstack-infra-specific things
19:44:20 we'll see
19:45:17 #topic Open Discussion
19:45:21 Anything else?
19:45:24 yeah, i'm struggling to see how to make refstack non-openstack-centric
19:46:15 fungi: what I took away from it was that the interop wg could use tools built for openstack to do validation and verification of other tools in a similar way
19:46:26 really I doubt that will get much traction, but time will tell
19:46:31 oh, i've pushed up the change i mentioned earlier to update our meeting info, and another to make the irc meetings index page less openstacky:
19:46:42 i'd appreciate it if people could look them over
19:46:50 #link https://review.opendev.org/720060 Update OpenDev meeting location and name
19:46:50 #link https://review.opendev.org/#/q/topic:ensure-pip+status:open
19:47:14 #link https://review.opendev.org/720063 Not all meetings are OpenStack
19:47:15 to get support for pip-and-virtualenv-less images in zuul-jobs
19:47:32 ianw: added to my list
19:47:56 I've got kid school things at 2:30 but my rough plan is to review between now and then (and probably continue to review if they can do school things without too much help)
19:48:04 because there are a lot of changes out there :)
19:48:19 also, i'm not sure what people's thoughts on suse are, wrt
19:48:20 #link https://review.opendev.org/718299
19:48:43 i would like to start building it without pip/virtualenv (like fedora is now) and push on that prior stack to get it back into shape
19:49:07 ianw: that makes sense given how things have changed upstream
19:49:34 oh, for last week's maintenance we used this channel, and did an ad hoc meetbot meeting under the opendev-maint name
19:49:58 ++
19:50:10 which might be a pattern we want to keep in mind for future scheduled maintenance activities
19:50:44 mordred: not to add to your stack but I was thinking we might want to consider a zuul.d for system-config
19:50:56 mordred: then we can put docker image jobs in a file and prod jobs in a file and pre-merge or whatever
19:51:26 #link http://eavesdrop.openstack.org/meetings/opendev_maint/ maintenance meeting(s)
19:51:31 clarkb: I had the same thought
19:53:09 I'll call the meeting here. Thanks everyone!
19:53:23 we get a few minutes back to eat breakfast/lunch/dinner or do more reviews :)
19:53:29 #endmeeting
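A minimal sketch of the zuul.d split floated at 19:50:44 and 19:50:56; Zuul loads every YAML file in a zuul.d/ directory in place of a single .zuul.yaml, and the file and job names below are hypothetical:

    # possible grouping for system-config (file names are made up):
    #   zuul.d/docker-images.yaml  - image build/upload jobs
    #   zuul.d/infra-prod.yaml     - production deployment jobs
    #   zuul.d/project.yaml        - the project stanza wiring jobs to pipelines
    # zuul.d/infra-prod.yaml would then hold only entries like:
    - job:
        name: infra-prod-service-example   # hypothetical job name
        description: Deploy an example service from bridge.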