22:02:45 <jeblair> #startmeeting zuul
22:02:46 <openstack> Meeting started Mon Aug 21 22:02:45 2017 UTC and is due to finish in 60 minutes.  The chair is jeblair. Information about MeetBot at http://wiki.debian.org/MeetBot.
22:02:47 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
22:02:49 <openstack> The meeting name has been set to 'zuul'
22:02:51 <jeblair> #link agenda https://wiki.openstack.org/wiki/Meetings/Zuul
22:03:00 <jeblair> #link previous meeting http://eavesdrop.openstack.org/meetings/zuul/2017/zuul.2017-08-14-22.03.html
22:03:26 <jeblair> the agenda remains: "What needs to happen before PTG" (as i expect it to until the ptg)
22:03:43 <jeblair> #link pre-ptg etherpad https://etherpad.openstack.org/p/zuulv3-pre-ptg
22:04:02 <jeblair> so let's go through those which we identified last time
22:04:23 <jeblair> actually, i'm going to do easy things first
22:04:33 <jeblair> #topic startup time
22:04:52 <jeblair> we ran this test, at least as best we can until we have more jobs defined
22:05:11 <jeblair> we got a baseline for how long it will take zuul to checkout all branches of all projects
22:05:30 <jeblair> if we size zuulv3 at least as large as zuulv2, things look good.  it should be able to start in less than 2.5 minutes
22:05:53 <jlk> switching to phone, family dragging me to ice cream
22:06:02 <mordred> that's excellent time for an initial start - especially considering it doesn't need to initial start much
22:06:08 <jeblair> i think that's quite manageable for a system that large which is, after all, not actually intended to restart very often
22:06:10 <jeblair> mordred: ya that :)
22:06:36 <jeblair> (ftr: that's ~1600 repos, with 8 mergers and 8 executors)
22:06:49 <Shrews> is a progress indicator emitted during that?
22:06:52 <clarkb> and that was with pre primed repos right?
22:07:09 <clarkb> (that may be worthy of deployment documentation if expected to be used)
22:07:19 <jeblair> Shrews: no
22:07:32 <jlk> Not bad at all.
22:07:42 <jeblair> clarkb: yes -- that's a good point.  if we want to minimize this for our very first restart, we should prime the git repos on our new hosts
22:07:53 <jlk> There are logs in the file...
22:08:09 <mordred> jeblair: was 2.5 minutes with or without pre-primed git repos?
22:08:19 <jeblair> clarkb: i don't think i'd mention this in general documentation though.  most other zuul v3 instances will grow to this size gradually
22:08:24 <jeblair> mordred: with pre-primed
22:08:27 <mordred> ok. cool
22:08:33 <jeblair> primed in this case means they were already cloned onto the host
22:09:22 <mordred> and yah, I don't think it's super useful for normal docs - our migration is a weird special case of starting a new v3 at massive size :)
22:09:25 <jeblair> clarkb: by which i mean, it's really only the case where a zuul v3 instance springs from nowhere with 1600 repos that it's worth considering
22:09:49 <jeblair> i'm comfortable scratching this from the pending list; any other concerns?
22:10:02 <mordred> the results have exceeded my expectations
22:10:08 <jlk> That's like replacing an existing Zuul site with a new host and new file systems.
22:10:18 <jlk> So, like a major failure.
22:10:40 <mordred> yah. and in that case, pre-cloning the repos isn't likely to buy you much in terms of downtime response
22:10:54 <jeblair> jlk: yeah, and if you're a large site, you'd have to lose all your (ideally many) executors+mergers to achieve this level of failure
22:10:58 <mordred> in fact, it's likely to make it slower, since you'll have to clone the repos which means scripting that real quick
22:11:08 <jeblair> mordred: indeed
22:11:24 <jlk> So a very unlikely scenario.
22:11:43 <jeblair> #topic fix for "A worker was found in a dead state" bug
22:11:53 <jeblair> we didn't actually talk about this last time
22:12:18 <jeblair> but i put it on the list because it was killing a significant portion of our ansible-playbook runs
22:12:34 <jeblair> good news: we tracked it down to a python segfault which has been fixed in current versions of python
22:12:54 <jeblair> mordred created a ppa with the backported bugfix (which we're running now i believe?)
22:13:07 <jeblair> and SpamapS started the ubuntu SRU process for it
22:13:32 <mordred> https://launchpad.net/~openstack-ci-core/+archive/ubuntu/python-bpo-27945-backport <- ppa exists with python3.5 package built
22:13:42 <mordred> so the package there is good to go
22:13:53 <jeblair> mordred: ah, do we still need to add that to our puppet?
22:14:23 <SpamapS> I did
22:14:26 <mordred> no, we have done that already
22:14:28 <jeblair> cool
22:14:30 <jlk> Neat.
22:14:31 <SpamapS> and sorry I missed the start AGAIN... weird day. :-P
22:14:36 <mordred> https://review.openstack.org/495399 was merged
22:14:43 <jeblair> SpamapS: sundial messed up?
22:14:45 <mordred> jlk, SpamapS: y'all likely want to add that :)
22:14:46 <SpamapS> exactly
22:15:29 <jeblair> SpamapS: let us know what happens with the sru process, please :)
22:15:33 <SpamapS> I would expect that upload to be released into xenial-proposed within a week.
22:15:53 <SpamapS> and once it's there, we should report back that the binaries work, and it will spend another few days in proposed before they release it in updates.
22:15:59 <SpamapS> I suggest subscribing to the bug.
22:16:27 <jlk> Okay
22:16:32 <jeblair> SpamapS: have the bug link handy?
22:16:49 <jeblair> oh i do
22:16:54 <SpamapS> https://bugs.launchpad.net/ubuntu/+source/python3.6/+bug/1711724
22:16:55 <openstack> Launchpad bug 1711724 in python3.5 (Ubuntu Xenial) "Segfaults with dict" [High,In progress] - Assigned to Clint Byrum (clint-fewbar)
22:17:06 <jeblair> #link https://bugs.launchpad.net/ubuntu/+source/python3.6/+bug/1711724
22:17:16 <jeblair> SpamapS: thanks!
22:17:31 <jeblair> #topic tarball/publish jobs
22:17:46 <jeblair> #link https://etherpad.openstack.org/p/mVSVwG4xos
22:18:02 <jeblair> pabelanger and mordred have been working on this
22:18:19 <mordred> pabelanger has some next patches up - I've got some locally I wrote on the plane that follow up to his patches that I'll get pushed up after the meeting
22:18:43 <jeblair> #link https://review.openstack.org/494672
22:18:45 <pabelanger> yes, python-branch-tarball should be ready for testing
22:18:50 <pabelanger> also added upload-twine role
22:19:03 <jeblair> #link https://review.openstack.org/495972
22:19:08 <jeblair> i think those are those two patches ^
22:19:17 <pabelanger> yes
22:19:36 <pabelanger> https://review.openstack.org/495973/ uses the new twine role
22:19:43 <jeblair> pabelanger: would you mind keeping the etherpad updated and adding links to those patches in there?
22:19:52 <jeblair> i just added those, i mean in the future
22:19:53 <pabelanger> jeblair: sure
22:20:06 <jlk> Also ping me if you need more work. I have an empty plate.
22:20:34 <dmsimard> jeblair: pong (sorry got sidetracked)
22:20:51 <pabelanger> release-openstack-tarball still needs to be pushed up, but holding off until we get branch-tarball working
22:20:52 * dmsimard reads backlog
22:21:31 <jeblair> pabelanger: you mean release-openstack-python?
22:21:42 <pabelanger> Ah, yes
22:22:32 <pabelanger> actually, I'll push up that review shortly
22:22:44 <jeblair> pabelanger: thank you.  that will let us work on things in parallel
22:22:53 <pabelanger> agree
22:23:06 <jeblair> pabelanger, mordred: do you think we can wrap up these jobs in the next 2 days or so?
22:23:23 <pabelanger> yes, I hope we can finish them by tomorrow
22:23:56 <mordred> jeblair: yes. I agree with pabelanger on tomorrow
22:24:01 <jeblair> okay, thanks!
22:24:03 <mordred> they're very close
22:24:12 <pabelanger> if we land 494672 today, that would be helpful too
22:25:40 <jeblair> please continue to ping me as soon as any patches directly related to these efforts are ready for review
22:25:53 <jeblair> shall we move on to devstack now?
22:26:05 <pabelanger> yes
22:26:15 <jeblair> #topic devstack jobs
22:26:18 <jeblair> #link https://etherpad.openstack.org/p/AIFz4wRKQm
22:26:28 <jeblair> there's the brainstorming etherpad for this one
22:26:50 <jeblair> jlk: i suspect that there may be more opportunity for you to jump in here, as compared to the tarball jobs
22:26:58 <jlk> Okay
22:27:46 <mordred> jeblair: I did some (mostly useless) noodling in an off moment over the weekend - I do not think we'll wind up being able to use anything I poked at over the weekend directly...
22:28:10 <jeblair> once mordred and pabelanger finish up with the tarball jobs, i expect their focus will shift to this
22:28:11 <pabelanger> yes
22:28:24 <mordred> jeblair: but one thing that jumped out that we're missing from the list is a role or something to get from our new repo structure to something devstack can consume
22:28:26 <jeblair> and clarkb has volunteered to review some of this as well :)
22:28:39 <jlk> I'll read up this evening / tomorrow.
22:28:44 <mordred> because the PROJECTS list and repos on disk are drastically different
22:28:58 <pabelanger> I think mordred added devstack-gate to zuulv3 last week?
22:29:06 <mordred> yes. d-g is in v3 currently
22:29:40 <pabelanger> great
22:30:19 <jeblair> mordred: good point.  i added an item to the etherpad todo list
22:30:24 <mordred> jeblair: my hunch was that it might be good for you to at least eyeball that and ponder it
22:31:00 <jeblair> mordred: i expect 'required-projects' to be a replacement for the $PROJECTS variable
22:31:06 <mordred> I do too
22:31:10 <pabelanger> +1
22:31:15 <jeblair> clarkb's work to reduce the use of that to a minimum will be helpful here
22:31:55 <jeblair> so yeah, i'll plan on thinking about that next
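A minimal sketch of the idea that 'required-projects' replaces $PROJECTS. It assumes each entry is either a bare project name or a mapping with a "name" key, and it assumes the v3 checkout layout src/<canonical-hostname>/<project>, with git.openstack.org used only as an example hostname. The helper names are hypothetical; bridging that on-disk layout to what devstack expects is the open item mordred raised.

```python
from typing import Dict, Iterable, List, Union

# Assumed: a required-projects entry is a bare name or a mapping with "name".
ProjectEntry = Union[str, Dict[str, str]]


def project_names(required_projects: Iterable[ProjectEntry]) -> List[str]:
    """Normalize entries to names, keeping first occurrences in order."""
    names: List[str] = []
    for entry in required_projects:
        name = entry if isinstance(entry, str) else entry["name"]
        if name not in names:
            names.append(name)
    return names


def projects_var(required_projects: Iterable[ProjectEntry]) -> str:
    """Space-separated string in the shape of devstack-gate's $PROJECTS."""
    return " ".join(project_names(required_projects))


def checkout_paths(required_projects: Iterable[ProjectEntry],
                   hostname: str = "git.openstack.org",
                   root: str = "src") -> Dict[str, str]:
    """Map each project to its assumed <root>/<hostname>/<project> checkout."""
    return {name: f"{root}/{hostname}/{name}"
            for name in project_names(required_projects)}
```

Deduplicating while preserving order keeps the resulting $PROJECTS string stable across runs, which matters if anything downstream depends on the order.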
22:32:22 <pabelanger> one question on d-g, for legacy hooks, that is basically just going to be a shell task right?
22:32:26 <jeblair> in the mean time, i started on the localrc ansible module we discussed.  that's a nice out-of-the-way thing i could start while other folks were finishing the tarball jobs
22:32:33 <mordred> jeblair: ++
22:32:51 <jeblair> the guts of that are done, i just need to wrap it up in module boilerplate.  should have that up tomorrow.
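The localrc module had not been posted yet at this point. Below is a hypothetical sketch of what its core rendering step could look like, not the actual module. It assumes job variables arrive as a flat mapping and that services are toggled with devstack's enable_service/disable_service lines. render_localrc is an illustrative name.

```python
import shlex
from typing import Dict, Iterable


def render_localrc(variables: Dict[str, object],
                   enable: Iterable[str] = (),
                   disable: Iterable[str] = ()) -> str:
    """Render KEY=value lines plus service toggles as devstack localrc text."""
    lines = []
    for key, value in variables.items():
        # devstack conventionally spells booleans as True/False.
        if isinstance(value, bool):
            value = "True" if value else "False"
        # Quote so values with spaces or shell metacharacters survive sourcing.
        lines.append(f"{key}={shlex.quote(str(value))}")
    lines.extend(f"enable_service {name}" for name in enable)
    lines.extend(f"disable_service {name}" for name in disable)
    return "\n".join(lines) + "\n"
```

Keeping the rendering as a pure function from data to text makes it easy to unit test separately from the Ansible module boilerplate.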
22:33:25 <jeblair> pabelanger: the etherpad says: make "legacy" playbooks that run hook scripts with vars collected by part one
22:33:32 <mordred> yah
22:33:34 <jeblair> (part one is "process env vars")
22:33:53 <mordred> and in non-legacy jobs people can just use pre-tasks as needed
22:34:09 <jeblair> pabelanger: so yeah, i think so -- a playbook with a shell task with the current hook content
22:34:21 <jeblair> automatically generated by the migration script
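A rough sketch of what the migration script might emit for each legacy hook: a one-play playbook whose single shell task carries the existing hook content unchanged. The function name, the play targeting "all", and running the task under /bin/bash are assumptions for illustration. The result is dumped as JSON only to avoid a third-party YAML dependency, since JSON is essentially a subset of YAML 1.2.

```python
import json
from typing import Any, Dict, List, Optional


def legacy_hook_playbook(hook_name: str, hook_body: str,
                         chdir: Optional[str] = None) -> List[Dict[str, Any]]:
    """Wrap existing hook content, unchanged, in a one-task playbook."""
    task: Dict[str, Any] = {
        "name": f"Run legacy {hook_name}",
        "shell": hook_body,
        # Hook scripts were written for bash, so don't rely on /bin/sh.
        "args": {"executable": "/bin/bash"},
    }
    if chdir:
        task["args"]["chdir"] = chdir
    return [{"hosts": "all", "tasks": [task]}]


if __name__ == "__main__":
    playbook = legacy_hook_playbook("gate_hook", "$BASE/new/nova/gate_hook.sh\n")
    print(json.dumps(playbook, indent=2))
```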
22:34:29 <mordred> this: https://review.openstack.org/#/c/495930/ does not work - but is the first attempt at dealing with part one "process env vars" btw
22:34:49 <mordred> it does not work and probably should be deleted - but that was the inspiration
22:34:49 <pabelanger> Right, so I guess we'll have to do some magic for bash variables like WORKSPACE too
22:35:19 <mordred> well - the idea in my brainhole is that we make a thing that produces all of the legacy envvars that things normally run with
22:35:31 <pabelanger> okay, cool
22:35:37 <mordred> then when we take a hook script, we run it in a script that first sources those vars, then runs the hook script
22:35:51 <pabelanger> perfect
22:36:03 <jeblair> we'll want legacy vars for a bunch of types of jobs, so that's probably a role to generate the file, then source when appropriate?
22:36:19 <mordred> yah
22:37:15 <jlk> A role that wraps a script, script passed as role var at role call?
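A minimal sketch of the "generate a file of legacy env vars, then source it before running the hook" approach discussed above. It is written in Python for illustration and is not the actual Ansible role. Variable names such as WORKSPACE and ZUUL_BRANCH are examples of zuul v2-era variables. write_legacy_env and run_with_legacy_env are hypothetical names.

```python
import shlex
import subprocess
from typing import Dict


def write_legacy_env(path: str, env: Dict[str, object]) -> None:
    """Write one `export KEY=value` line per variable, shell-quoted."""
    with open(path, "w") as f:
        for key, value in env.items():
            f.write(f"export {key}={shlex.quote(str(value))}\n")


def run_with_legacy_env(env_file: str, hook_script: str) -> str:
    """Source the env file, then run the hook in the same shell; return stdout.

    env_file should contain a slash (e.g. an absolute path): POSIX `.`
    searches $PATH for bare file names.
    """
    command = f". {shlex.quote(env_file)} && {hook_script}"
    result = subprocess.run(
        ["/bin/sh", "-c", command],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

In this sketch the hook script text is passed through as-is, matching jlk's suggestion of a role that takes the script as a parameter. check=True means a failing hook raises instead of being silently ignored.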
22:37:29 <pabelanger> not that I am looking for an answer, but have we given any thought on how long we'd support a legacy hook for?
22:37:38 <mordred> jlk: yah- something like that
22:38:22 <mordred> pabelanger: preferably not a super long time :)
22:38:36 <jeblair> pabelanger: i think once we get the automagic conversion done, we (openstack) set a release+1 goal for projects to migrate their own jobs (and help projects as we're able)
22:38:47 <mordred> jeblair: ++
22:38:49 <pabelanger> okay, cool
22:38:50 <jlk> Less time than keystone v2...
22:39:14 <jeblair> anything else devstack related?
22:39:19 <mordred> also - fwiw, MOST of the hook scripts are one line scripts
22:39:24 <mordred> that just call a script in the repo
22:39:31 <clarkb> which then calls our d-g script
22:39:34 <pabelanger> mordred: tripleo-ci is the usecase I am thinking off
22:39:35 <clarkb> so we also have to keep that around
22:39:37 <pabelanger> of*
22:39:43 <SpamapS> Just a hook, script, and a jump?
22:39:49 <mordred> the ones that are more than one line are mostly 2 lines, with the first line being a CD
22:39:52 <mordred> cd
22:39:54 <mordred> gah
22:40:07 <jeblair> CD \
22:40:17 <pabelanger> jeblair: nothing more here
22:40:35 <jeblair> #topic migration script
22:40:44 <mordred> so thankfully there's not a ton of actual complex logic in the job definitions themselves - and we can probably take any of the very few actual special cases and nudge them to fit into the pattern of the other hook scripts
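Since most hook scripts are one line, or two lines starting with cd, the migration script could sort them before conversion so that only the special cases need human attention. A hypothetical classifier, assuming blank lines and comments (including a shebang line) can be ignored:

```python
def classify_hook(script: str) -> str:
    """Bucket a hook script by how mechanically it can be migrated."""
    # Ignore blank lines and comments (which also covers a #! line).
    lines = [line.strip() for line in script.splitlines()
             if line.strip() and not line.strip().startswith("#")]
    if len(lines) == 1:
        return "single-call"
    if len(lines) == 2 and lines[0].split()[0] == "cd":
        return "cd-then-call"
    return "special-case"
```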
22:41:04 <jeblair> this is basically pending completion of the publish and devstack jobs
22:41:23 <jeblair> mostly because we need to know what to migrate to
22:41:48 <jeblair> i'm hoping we can get those jobs wrapped up very soon and have at least a couple of weeks to work on the migration script and deal with the output
22:41:56 <mordred> ++
22:42:11 <pabelanger> ++
22:42:39 <mordred> luckily the migration script is a thing that just reads local yaml files - so it's actually easy to iterate on locally
22:42:56 <mordred> it may hurt one's brain - but other than that it's just normal hacking
22:43:39 <jeblair> #topic migration docs
22:43:52 <jeblair> Shrews has started to pick up work on this
22:43:59 <jlk> YAML2YAML The YAMLing
22:44:31 <jeblair> updating the infra-manual migration page based on his experiences trying to use zuulv3 with almost no relevant documentation :)
22:44:31 <Shrews> https://review.openstack.org/495971
22:44:44 <jlk> Yay
22:45:23 <jeblair> Shrews: should i continue to flesh out some of the todo items i left in there?  or did you want to take some of them over?
22:45:47 <Shrews> that link is just for a 0-day, "omg, how do i do something" guide
22:46:39 <Shrews> jeblair: i'm afraid of assuming where you were heading with some of those todos
22:46:39 <jeblair> (eg: one of the important things i think we need to communicate is how the variant binding works.  there's some tribal knowledge about metajobs and skip-if that we need to show people how to replicate with variants)
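Here is a hypothetical sketch of the skip-if to variant translation. It assumes the zuul v2 skip-if form with project, optional branch, and all-files-match-any keys. It maps each condition onto a v3 job variant using irrelevant-files, plus branches when a branch matcher is present. This is an approximation and not the migration script's actual logic. The branch case in particular may not match v2 semantics exactly. Conditions without a file matcher are skipped.

```python
from typing import Any, Dict, List, Optional, Tuple


def skip_if_to_variants(job_name: str, skip_if: List[Dict[str, Any]]
                        ) -> List[Tuple[Optional[str], Dict[str, Any]]]:
    """Return (project matcher, v3 job variant) pairs for file-based skips."""
    variants = []
    for condition in skip_if:
        files = condition.get("all-files-match-any")
        if not files:
            # Only file-based conditions map onto irrelevant-files here.
            continue
        variant: Dict[str, Any] = {"name": job_name,
                                   "irrelevant-files": list(files)}
        if "branch" in condition:
            variant["branches"] = condition["branch"]
        variants.append((condition.get("project"), variant))
    return variants
```

The project matcher is returned alongside each variant because, in this sketch, it decides which project's pipeline configuration the variant would be attached to.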
22:46:45 * jlk has to go afk. Will review logs.
22:47:20 <jeblair> Shrews: okay, i'll continue to poke at them as able, and maybe try to trick you into doing some if i can articulate it adequately :)
22:47:24 <Shrews> jeblair: does what i put up cover your "actually work" todos?
22:47:40 <Shrews> or were you planning to go deeper?
22:48:18 <SpamapS> There should definitely be a "zuulv3 for the jjb programmer"
22:48:34 <Shrews> i guess it doesn't really cover inheritance or roles very well
22:50:39 <pabelanger> has anybody looked at project-template from python-jobs?
22:51:12 <pabelanger> possible that is in mordred migration script, but I haven't looked
22:53:01 <jeblair> Shrews: i'm not sure; there's some good stuff in there, but it almost looks like it's more aimed at the user who hasn't used zuulv2 in openstack.  i think we'll be able to use a lot of that in the project drivers guide of the infra-manual to replace the current jjb/zuulv2 stuff that's there.
22:54:47 <jeblair> i'll leave some suggestions as to how we might change things for the migration audience
22:57:09 <jeblair> anything else?
22:58:24 <jeblair> i forgot to start the meeting with our countdown clock, so i'll end it with: we have 15 workdays until the scheduled ptg cutover
22:58:27 <jeblair> thanks everyone!
22:58:31 <jeblair> #endmeeting