19:01:20 #startmeeting infra
19:01:21 Meeting started Tue Dec 19 19:01:20 2017 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01:22 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01:24 The meeting name has been set to 'infra'
19:01:29 #link https://wiki.openstack.org/wiki/Meetings/InfraTeamMeeting#Agenda_for_next_meeting
19:01:38 #topic Announcements
19:01:51 We will be skipping next week's meeting on the 26th of December
19:01:57 I won't be around and don't really expect anyone else to be around
19:02:29 Related to that, we should probably be very slushy next week in the changes we make, as I expect there won't be many of us around for significant periods of time
19:03:13 #topic Actions from last meeting
19:03:24 fungi modified the jenkins votes in gerrit
19:03:38 there is no jenkins, only fungi
19:03:38 i did indeed. spammed the whole community. it was glorious
19:03:51 yeah!
19:04:14 and at least one person asked about it so it didn't go completely unnoticed
19:04:35 i got plenty of comments through various channels
19:04:35 #link http://eavesdrop.openstack.org/meetings/infra/2017/infra.2017-12-12-19.01.txt Minutes from last meeting
19:04:43 that was the only action item from last time
19:04:57 #topic Specs approval
19:05:26 We don't have any current specs up for approval, but thank you to everyone who read and reviewed the top level project hosting spec; that spec is now in and currently being worked on
19:05:44 there's still a work item up for grabs
19:05:47 talk to me if you want to help
19:05:53 git servers, right?
19:05:55 yep
19:06:34 #info talk to jeblair if you want to help with work items on the top level project hosting spec
19:06:51 also everyone should feel free to review the changes that go up for that :)
19:06:57 #topic Priority Efforts
19:07:06 #topic Zuul v3
19:07:32 There are one and a half zuul v3 topics on the agenda. The first is closing down the zuulv3 issues etherpad and moving more towards typical bug tracking
19:07:47 considering that we are no longer in firefighting mode I am in favor of this
19:07:59 ++
19:08:09 ++
19:08:14 absolutely
19:08:20 also - thank god for not being in firefighting mode
19:08:34 * fungi hangs up asbestos suit
19:08:44 we should probably ask folks to file general job problems against their best guess as to the jobs repo involved -- openstack-zuul-jobs, zuul-jobs, project-config
19:08:53 jeblair: ++
19:08:56 please don't file "my job doesn't work" against zuul itself :)
19:09:01 ... individual project repo
19:09:02 we still have some backlog for Zuulv3 - like the docs jobs that mordred has been working on...
19:09:11 but those are smallish fires ;)
19:09:26 in storyboard or launchpad?
19:09:30 but ya, if we can all take ~15 minutes to migrate the items we are familiar with I expect that will go quickly
19:09:35 storyboard
19:09:40 AJaeger: these are more fires related to other efforts which have involved zuul jobs, not zuul v3 imposed breakage, right?
19:09:40 ianw: the projects we care about are in storyboard, so I think there
19:09:49 a few people have turned up and said "i have a bug not a story", so we might like to make that clear
19:10:04 a bug is just a sad story
19:10:09 hah
19:10:44 i've also suggested in the past we might want a custom reporting view for the storyboard webclient which is a "bug/defect reporting" interface
19:10:51 we should replace the etherpad content with a big "report bugs here" thing
19:10:59 #link https://etherpad.openstack.org/zuulv3-issues please take a moment to migrate the items you are familiar with on this etherpad to storyboard and try to be specific about using the correct repo (zuul-jobs, openstack-zuul-jobs, project-config, etc)
19:11:03 the current work in progress for custom reporting urls likely makes that a lot saner
19:11:20 fungi: we're fixing the problem that repos which need required-projects in docs publishing are failing to publish - so that's migration fallout
19:11:33 AJaeger: oh, yep definitely
19:11:50 i was thinking you meant the pti-related docs job work
19:12:03 The other half of an issue is how many changes are required to create new projects. This is listed under general topics, but I think zuul v3 is the major reason for it, so maybe we want to talk about it in this context instead
19:12:07 fungi: but mordred added some more goals into it as well - like the pti work. so, those are related.
19:12:15 hrm, no dmsimard here
19:12:18 fungi: solving the docs publishing is a side effect of the PTI work
19:12:35 got it
19:13:01 but basically you have to create the project in gerrit first, then in zuul, then add jobs to it
19:13:27 and that is because zuul needs to load the project from its source before it can do anything with it
19:13:40 did we merge mordred's fix to bring this back down to 2 changes?
19:14:08 #link https://review.openstack.org/#/c/528375/ documents the process
19:14:15 jeblair: I'm not aware of that change so I don't know
19:14:50 is this the idea where we'd do zuul after gerrit in the ansible cron?
19:14:51 AJaeger's change which you linked documents the 2 change process
19:14:55 clarkb: yep
19:15:00 the need to split adding a project to zuul from adding jobs for the project (in project-config) isn't solvable without altering how zuul handles validating configuration changes, right?
19:15:07 jeblair: ah ok, the meeting agenda item still said 3
19:15:13 and yes, that change merged
19:15:15 clarkb: I think we did
19:15:17 yah
19:15:27 #link 2 change process https://review.openstack.org/528744
19:16:35 clarkb, jeblair: I think the topic was added *before* mordred did his change, so I consider this void and propose to move on
19:16:36 fungi: yep. if we consider that a problem, then it's a much different problem.
19:16:46 I don't expect we'll be able to improve the number of changes much beyond that; are there other things we can do to make the process easier? (documentation like AJaeger's change helps I'm sure)
19:17:08 we probably *could* but they would be more than simple bugfixes
19:17:25 seems hairy and fraught with possible security implications
19:17:35 let's merge my change for now - I had too many people just send on, and I think we need to be louder about it
19:17:48 like extending zuul to dynamically evaluate main.yaml (crazy!) or not adding projects to project-config and only doing in-repo config.
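
(For reference, a rough sketch of the 2 change process described above, following the openstack-infra/project-config layout as I recall it; the repo name openstack/example-project and the exact keys are illustrative, not taken from the meeting.)

    # Change 1: create the repo in gerrit and register it with zuul in the
    # same change (workable now that the ansible cron applies gerrit config
    # before zuul config)

    # gerrit/projects.yaml
    - project: openstack/example-project
      description: Hypothetical example project.

    # zuul/main.yaml: add the repo to the tenant's untrusted-projects
    - tenant:
        name: openstack
        source:
          gerrit:
            untrusted-projects:
              - openstack/example-project

    # Change 2: attach jobs/templates once zuul has loaded the project
    # zuul.d/projects.yaml
    - project:
        name: openstack/example-project
        templates:
          - system-required
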
19:18:06 AJaeger: ya, I think since we improved the docs we can table this and see how it goes
19:18:11 ++
19:18:40 the latter solution of the two seems more likely to be viable, and only loses us the ability to have an explicit system-required project-template
19:19:04 (which could probably be resolved somewhat safely by implementing support for implicit project-templates?)
19:19:27 oh, there was a change for that... did it merge?
19:19:57 implicit project templates meaning every project gets the system-required template regardless of any explicit config?
19:20:21 also, new projects are an excellent place to start with in-repo configs :)
19:20:26 clarkb: ++
19:20:28 if that's already a thing, then maybe we don't need a second project-config change for projects that want to define jobs in-repo
19:20:34 yeah, tobiash had a change
19:20:41 neat
19:21:17 i'm having trouble finding it
19:21:38 https://review.openstack.org/513368
19:21:50 #link https://review.openstack.org/513368 implicit project-templates in zuul
19:21:52 still open but has a +2 from me
19:22:38 so... soon to become possibly simpler to add new projects
19:22:50 cool, sounds like there is work in progress (and some done) to make this whole project addition process easier
19:23:05 any other zuulv3 related items?
19:23:12 yeah, i think considering that generally the only thing we expect a new project to do in project-config is add system-required, if we drop that, we could get down to one change
19:23:18 with no loss in functionality
19:23:28 or substantial change in policies
19:23:29 oh yeah - that change. I thought I'd reviewed that
19:23:51 mordred: i think you expressed pleasure about it
19:24:02 which is like a review
19:25:23 not hearing about other zuulv3 things so moving on
19:25:47 I'm going to skip dmsimard's other topic in hopes that dmsimard joins us soon and talk about the control plane upgrades first
19:25:56 #topic General Topics
19:26:17 as I mentioned in an email on friday, I think we were reasonably successful with the control plane upgrade sprint
19:26:24 i have a sort of zv3-related topic i didn't put on the agenda and was saving for open discussion
19:26:26 o/
19:26:51 fungi: ok, I expect we will have time at the end of the meeting
19:26:57 though it's sort of related to the upgrade sprint as well
19:27:04 oh well then maybe talk about it now ;)
19:27:20 it is conveniently related to all agenda topics
19:27:29 I did want to mention I thought jeblair's idea of each of us trying to knock out a service per week or so would be nice
19:27:38 basically we still have some v2-specific servers hanging out there, and not only are they potentially wasting resources now but they're somewhat of a liability
19:27:39 keeps the momentum going and we don't ignore the problem until the next rush
19:28:03 fungi: +1
19:28:06 fungi: at this point it is just zuul.openstack.org and nodepool.openstack.org right?
19:28:15 clarkb: cool... i should get started -- i haven't done one this week :)
19:28:17 e.g., rackspace decided to do the reboot-based migration of zuul.o.o a day later than they told us, so it came up with the service running and commenting on changes for several days
19:28:34 zuul.openstack.org has an http redirect, and nodepool.o.o is still our zookeeper server?
19:28:43 for nodepool.o.o we need to decide how to migrate zookeeper into zk01.o.o/zk02.o.o/zk03.o.o
19:28:50 jeblair: yes
19:28:52 jeblair: yes. I think for zuul.o.o we can just cname to zuulv3.openstack.org?
19:29:01 clarkb: openstackstatus/etherpad/ethercalc should all be ready-ish to go
19:29:02 clarkb: we should
19:29:06 wondering if we can safely remove the zuul.o.o A/AAAA records and replace them with a cname to zuulv3.o.o for now
19:29:24 if so, that should allow us to drop the redirect/delete the old server?
19:29:32 the replacement zookeeper is up and running, we just need to confirm it works as expected (plus migrate)
19:29:33 fungi: can we add a vhost on zuulv3.o.o to keep the redirect rule?
19:29:55 jeblair: ahh, good point, or a catch-all redirect on the zuulv3 vhost
19:29:59 ++
19:30:24 assuming zuulv3.o.o is the default vhost though, it may "just work" anyway?
19:30:27 and then, when we redeploy on zuul.o.o, which we should do soon (maybe after the puppet-openstackci stuff lands?), we can do the same in reverse.
19:30:44 #agreed add vhost to zuulv3.o.o and redirect zuul.openstack.org to zuulv3.openstack.org then when we redeploy zuul.openstack.org we can do the same in reverse
19:30:48 something like ^
19:30:55 that will allow us to delete the existing zuul.openstack.org
19:31:01 wfm
19:31:04 yeah -- whatever allows us to accomplish that :)
19:31:08 i can safely test whether a redirect is needed or abusing the default vhost is sufficient
19:31:23 and then we can move forward with better information
19:31:34 for nodepool/zookeeper, my reading of the zk docs is that we can't easily do an online migration of data from the existing single node cluster into the new 3 node cluster
19:31:36 pabelanger: did you find out if we can add zk* to the current nodepool.o.o zk system?
19:31:44 oh :( ^
19:31:57 we can take a downtime: stop the old zk, copy all data to the other nodes, then start zk on the other nodes and point the nodepool services at the new cluster
19:32:06 and zuul
19:32:12 ya
19:32:15 yah, that's basically what I found too
19:32:25 or we could take a downtime and just build new state on the new cluster
19:32:33 so no expand/contract for migrating clusters i guess
19:32:54 fungi: not with the version we have aiui, the next version would allow it if you can ninja interactive management socket commands
19:32:56 clarkb: we'd need to build / upload images for that
19:32:59 was it just that it's unsafe to have an even number of cluster members, or were there additional issues?
19:33:14 how long do image rebuilds and uploads take in openstack land?
19:33:16 sounds like the challenge ran deeper. okay
19:33:27 which we could start now with images, if we move 1 builder to the new zookeeper
19:33:27 fungi: the way I understand it, the existing 3 node cluster has quorum and numbers, so it will win out if we join the old node to it
19:33:39 ahh
19:33:46 clarkb: what if we do one at a time?
19:33:49 fungi: the way you work around that is with the management socket stuff where you say "these N nodes are from the other side of quorum"
19:33:57 tobiash: it used to take a day? but i think it got faster...
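
(A minimal sketch of the test fungi offers above, using standard dig/curl; the Host header trick approximates what the cname would do without touching DNS first.)

    # what does zuul.o.o resolve to today (A/AAAA vs cname)?
    dig +short zuul.openstack.org A
    dig +short zuul.openstack.org AAAA

    # would zuulv3.o.o's default vhost answer for the old name?
    curl -sI -H 'Host: zuul.openstack.org' http://zuulv3.openstack.org/

    # does the old server currently redirect, and where to?
    curl -sI http://zuul.openstack.org/ | grep -i '^location:'
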
19:34:04 tobiash: a few hours at this point i think, but i haven't looked in a while so don't have hard numbers
19:34:15 jeblair: I couldn't find any docs that said that would work; it was all either do the shutdown-and-start or do the newer management stuff
19:34:15 so just rebuilding probably takes too long
19:34:32 yah, we can do an image build / upload in 60 minutes with nb04 I think
19:34:38 pabelanger: we could even move both builders, continue to just use the old images on the old side, and then switch over
19:34:40 nb03 is much slower due to network limits
19:34:43 tobiash: yeah, long enough that if we did that, we'd want to start up a builder ahead of time like pabelanger suggested
19:34:45 we'd just want to do that in a shortish period of time
19:34:54 clarkb: and we couldn't incrementally add the new cluster members to the "old" cluster of one?
19:35:05 yah, if we want to use old images for a day or so, we can do that
19:35:07 the data isn't too huge, yeah? so a downtime to export/import data wouldn't be super long most likely?
19:35:10 might be a good test too
19:35:13 fungi: I couldn't find any documentation on how to do that
19:35:14 of zookeeper
19:35:31 mordred: correct, it's mostly about coordinating all the things I think
19:35:36 so zk basically has a predetermined cluster size you can't expand later i guess?
19:35:37 we could schedule that for whenever our next gerrit project rename downtime might be
19:36:06 fungi: ya, that was the sense I got, like you didn't want to add data until you had quorum
19:36:27 thanks, good to keep in mind for future scaling plans
19:36:53 it's also possible I just didn't find the relevant docs or they haven't been written :)
19:37:00 i guess clustering with zk doesn't eliminate the need for forklift migrations
19:37:34 mordred: or we just pick a slow day and do it separately
19:37:38 yah
19:37:39 (to reduce things in flux)
19:37:53 ++
19:37:57 I know a lot of people are not around that first week of january, but I will be here to do things like this if we want
19:38:00 speaking of upcoming outages, we still need to do the mysql charset migration for gerrit right?
19:38:06 fungi: yes
19:38:36 (but it's also getting close to feature freeze and release and all that)
19:38:43 so anything we decide on we should run by the release team
19:38:47 yup
19:38:56 similarly, we need an outage to fix nova-specs
19:39:15 and of course pending project rename(s)
19:39:18 and there's a rename request from some months ago which needs an outage
19:39:27 wonder how many of these things we can safely roll together
19:39:49 on the project rename front, we said last week (or was it the week before last?) that we just need to write down what the new process is in zuulv3 land
19:39:59 I have not had time to look at that with the sprint and other stuff going on
19:40:10 if someone else wants to do that the help is much appreciated :)
19:40:15 mostly to do with order of operations for merging changes?
19:40:28 ya, basically keeping zuul happy as things move
19:40:51 (remove project from zuul, stop gerrit, rename things, start gerrit, add back to zuul with the new name, or similar)
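
(A sketch of the shutdown-and-copy zookeeper migration discussed above; the host names are from the meeting, but the service name, data directory, and config locations are assumptions about a stock Ubuntu zookeeper install, not verified commands from the log.)

    # on nodepool.o.o: stop the consumers first, then the old single-node zk
    sudo systemctl stop zookeeper   # after stopping nodepool/zuul services

    # copy the data directory to each new cluster member (assumed path)
    for host in zk01.openstack.org zk02.openstack.org zk03.openstack.org; do
        sudo rsync -a /var/lib/zookeeper/ root@$host:/var/lib/zookeeper/
    done

    # on each zk0*: set a unique myid and the server.N ensemble entries in
    # zoo.cfg, then start zookeeper and verify quorum with the 4-letter words
    echo ruok | nc zk01.openstack.org 2181                 # expect "imok"
    echo stat | nc zk01.openstack.org 2181 | grep -i mode  # leader/follower

    # finally, repoint nodepool and zuul at zk01/zk02/zk03 in their configs
    # and restart them
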