22:01:48 <jeblair> #startmeeting zuul
22:01:49 <openstack> Meeting started Mon Nov 21 22:01:48 2016 UTC and is due to finish in 60 minutes.  The chair is jeblair. Information about MeetBot at http://wiki.debian.org/MeetBot.
22:01:50 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
22:01:52 <openstack> The meeting name has been set to 'zuul'
22:01:57 <Shrews> yo
22:02:10 <jeblair> hi all.  i know we have some apologies for absences today...
22:02:19 <phschwartz> Hi
22:02:26 <jeblair> so this might be brief
22:02:33 <jeblair> i know that would be disappointing
22:02:37 <jeblair> but we will cope
22:02:43 <Shrews> say it ain't so
22:02:46 <jeblair> #link agenda https://wiki.openstack.org/wiki/Meetings/Zuul
22:03:04 <jeblair> there is our shiny new agenda!
22:03:19 <jeblair> #topic Actions from last meeting
22:03:25 <jeblair> jeblair set up meeting agenda wiki page
22:03:26 <jeblair> done!
22:03:34 <jeblair> #action jeblair work with Shuo_ to document roadmap location / process
22:03:38 <jeblair> not so much with that one yet
22:03:59 <jeblair> #topic Status updates
22:04:10 <jeblair> let's talk about nodepool/zookeeper
22:04:15 * zaro lurks for the latest news
22:04:17 <jeblair> i think we're really really close there
22:04:27 <pabelanger> Ya, I'm happy how well it is going too
22:04:53 <jeblair> pabelanger, Shrews: are we about there on the test-reenablement front?
22:05:39 <pabelanger> I think we are in good shape. I believe all our existing tests that were disabled have been re-enabled (up for review or merged)
22:05:42 <pabelanger> I
22:05:45 <Shrews> 4 in test_commands, 3 in test_nodepool, 1 in test_webapp
22:05:54 <Shrews> according to a quick "git grep"
22:06:05 <Shrews> some of those probably have reviews up
22:06:20 <jeblair> yeah, i know there are at least a few
22:06:28 <jeblair> so we're probably down to the last 1-3 or something like that
22:06:54 <jeblair> we've also been polishing up the procedures for enabling/disabling images/providers
22:07:13 <jeblair> and we identified last week an operational need for a 'pause' attribute to temporarily suspend image builds
22:07:28 <jeblair> i think that will end up being a 10 line change, i can write that after the meeting
22:07:57 <jeblair> we also recently discussed rolling this out into production
22:08:20 <jeblair> and that we would like to spin up a new builder host and run this on it in parallel while we continue using the current system
22:08:58 <jeblair> that way we can watch image builds and uploads, make sure it works well, and see if we can get a quorum of images in zk
22:09:07 <jeblair> (if we can't, we could always port the data over)
22:09:35 <jeblair> i think we can probably make that happen by the end of wednesday?  what do folks think?
22:10:23 <Shrews> i'm out on Wednesday
22:11:34 <pabelanger> internet is poor where I am :(
22:11:43 <jeblair> Shrews: maybe we can make it happen before then :)  but also, maybe by then we'll just be down to puppet changes
22:12:20 <pabelanger> Ya, I've started on puppet-nodepool changes. I can pick that up again this week
22:13:08 <jeblair> pabelanger: let's see how far we can get
22:13:20 <pabelanger> ack
22:13:37 <Shrews> how many ZK nodes in the cluster are you planning to use?
22:13:37 <rcarrillocruz> i'll be around, so i can check things during EMEA times if needed
22:13:45 <jeblair> Shrews: just one for starters
22:13:53 <jeblair> rcarrillocruz: cool!
22:14:27 <jhesketh> ditto APAC time for me
22:14:49 <Shrews> jhesketh: who needs redundancy?  :-P
22:15:00 <Shrews> err, jeblair
22:15:09 <Shrews> darn you, tab completion
22:15:53 <jeblair> Shrews: not us -- not on a system which is, today, entirely contained on a single host.  :)
22:15:58 <fungi> wednesday will mostly be entertaining in-laws for me, so not much help sorry :/
22:16:25 <Shrews> jeblair: maybe we should do some disaster testing... like, killing the single ZK node to see what breaks and what cleanup looks like
22:16:26 <phschwartz> About the same here, but for most of the morning EST I can lend a hand.
22:16:38 <jeblair> Shrews: we can expand that perhaps after we start running a builder, and maybe before we start running separate node workers.
22:17:08 <phschwartz> If this will be turned on at all in production I would advise against 1 zk node
22:17:09 <pabelanger> I'll be on a train again Wednesday, but should be able to help too
22:17:16 <jeblair> Shrews: yeah, that's a good idea.  i feel certain we're going to want some "--force" commands to forcibly alter the ZK tree.
22:17:40 <jeblair> Shrews: and of course, developing some tests for that once we figure out how would be nice as well.  :)
22:17:51 <Shrews> yup
22:18:38 <jeblair> phschwartz: well, we want the entire system to scale down to being able to run on one node (zuul and zookeeper) for the very small case.  so if zk can't run with just a single node, we have made a very bad mistake.
22:18:54 <mordred> ohai
22:19:09 <jeblair> i haven't encountered anything that suggests that though
22:19:19 <phschwartz> You can run it with a single zk node but there is no data safety guarantee which is bad.
22:20:01 <phschwartz> You can lose the running data set with the last journal set to replay from which can be missing stuff.
22:20:49 <phschwartz> It is not a show stopper for dev systems and such. But I would say yes for prod if we don't want a chance of losing stuff.
22:21:24 <jeblair> well, if we want to support one-node operation, we may need to be aware of that limitation and work around it
22:21:46 <phschwartz> Ack
22:22:06 <Shrews> yep, thus my suggestion for disaster scenario testing
22:22:28 <mordred> what does "lose the running data set with the last journal set to replay from which can be missing stuff" mean?
22:22:40 <mordred> is that a thing that's a case that happens if the machine restarts hard?
22:22:47 <mordred> or just a thing that can happen while it's running?
22:22:50 <Shrews> mordred: it means we all go drinking when that happens
22:22:55 <mordred> neat!
22:22:56 <jeblair> (for instance, it sounds like we might end up leaking an image or node -- fortunately, openstack has taught us to be pretty skeptical of that sort of thing, so we often have routines that deal with stuff like that)
22:23:06 <clarkb> mordred: hard restart
22:23:13 <clarkb> it's no different than today with gearman aiui
22:23:21 <pabelanger> right
22:23:48 <mordred> ack
22:24:18 <phschwartz> The only difference is the journal can be damn large and take a long time to replay to get back up to date
22:24:29 <phschwartz> But not a showstopper by any means.
22:24:48 <jeblair> cool, i'm in favor of building systems that are pretty resilient to that kind of thing.  i think nodepool is currently to a significant degree.  we'll run into some edge cases with zk which are new to us and we'll have to fix them up.  but dealing with surprising data is something we can do.
22:27:37 <jeblair> i think that covers it for nodepool for now
22:27:48 <jeblair> any status updates re zuul?
22:28:12 <pabelanger> nothing, for me. I've shifted to nodepool this past week
22:28:42 <jeblair> (and yeah, i think the push on nodepool right now is a good idea)
22:28:59 <rcarrillocruz> i have a handful of devstack-gate ansible roles in review for some time
22:29:02 <jhesketh> jeblair: it's not v3 related, but I'd like reviews on the sql reporter: https://review.openstack.org/#/c/223333/
22:29:17 <mordred> I keep thinking that's landed
22:29:36 <mordred> rcarrillocruz: oh yeah? you got a topic?
22:29:40 <rcarrillocruz> also, waiting for the secrets spec to land (i +1, jhesketh too), so i can move forward with the implementation
22:29:44 <rcarrillocruz> mordred: zuulv3
22:29:45 <jeblair> jhesketh: oh yes, that's a good idea.  i'm happy to do so (i think it's grandfathered in to pass the v2 feature freeze)
22:30:05 <jhesketh> I'm also working on merging master into feature/zuulv3 which I assume we still want to do
22:30:13 <jeblair> jhesketh: ++
22:30:31 <jeblair> rcarrillocruz: i'm not seeing your changes under topic zuulv3
22:30:52 <jhesketh> I poked around merging nodepool master into the feature branch too, which looks like it might pick up some useful changes... Is this a thing we want to try and do, or should I go through and cherry-pick anything that might be applicable given how close it is?
22:31:06 <mordred> jeblair: I do actually: https://review.openstack.org/#/q/status:open+project:openstack-infra/devstack-gate+branch:master+topic:zuulv3
22:31:16 <rcarrillocruz> jeblair: https://review.openstack.org/#/q/topic:zuulv3 i do, maybe cos those changes are openstack-infra/devstack-gate ?
22:31:19 <mordred> rcarrillocruz: sorry - I had missed those - they're on my list now
22:31:26 <rcarrillocruz> no worries, thanks
22:31:41 <jeblair> rcarrillocruz: i'm very excited that's happening!  :)
22:31:49 <rcarrillocruz> ++
22:32:11 <jeblair> rcarrillocruz: oh, i see my problem.  sorry.  :)
22:32:44 <jeblair> jhesketh: if you can give that a shot, i think it would be useful.
22:32:58 <rcarrillocruz> we need cross-project topics in gerrit naow :P
22:33:03 <jeblair> jhesketh: i hope the builder in master hasn't changed much
22:33:04 <jhesketh> jeblair: we'll need to set the ACLs to allow merge commits, so I'll do that shortly
22:33:22 <jeblair> jhesketh: cool.
22:33:24 <jhesketh> err, there are some big merge conflicts that I need to figure out if they are useful
22:34:03 <jeblair> jhesketh: ok, let me know if you need help sorting through that.
22:34:09 <jhesketh> thanks :-)
22:34:23 <jhesketh> so I realise I'm jumping back to nodepool again, but after we test/prove the new system, how soon do we plan on cutting the feature branch over to master?
22:34:34 <jeblair> and, ftr, i think the plan is that once we are ready to use this builder in production, we will merge the zuulv3 branch of nodepool into master
22:34:34 <jhesketh> can we do that before zuulv3 is ready for example? (ie asap)
22:34:39 <jeblair> jhesketh: yep
22:34:52 <jeblair> puppet-openstackci has a pin on it, so it should be safe
22:35:05 <jhesketh> heh, cool, thanks :-)
22:35:26 <jeblair> and, it is definitely something people can start running if they want
22:35:33 <jhesketh> jeblair: you want to merge the feature branch into master? or do you want to switch the branch to become master?
22:35:34 <jeblair> ie, it's still compatible with zuulv2
22:36:08 <clarkb> jhesketh: merge, otherwise it's non-fast-forward
22:36:22 <jeblair> jhesketh: i think merge into master (but prefer v3 in conflict resolution -- which, hopefully, there won't be any after you finish your work)
22:36:25 <jhesketh> okay, we'll have to resolve the diffs then
22:36:31 <jhesketh> yep
22:39:00 <jeblair> one last status update:  SpamapS found an issue with zuulv3 job configuration in his work on tests.  i think i have a solution for that which i will write a patch for soon
22:39:37 <jeblair> moving on -- i'm going to skip progress summary since SpamapS isn't around...
22:39:44 <jeblair> #topic Announcements of areas of interest
22:40:07 <jeblair> anyone have anything new they'd like to start working on?
22:40:37 <rcarrillocruz> i have, but don't want to put more on my plate till i have something decent for the zuul secrets stuff
22:40:50 <jeblair> rcarrillocruz: ++ that's a big one
22:41:07 <jeblair> and actually, with that...
22:41:08 <rcarrillocruz> in particular, nodepool working with other node providers, which can be another big one :P
22:41:46 <jeblair> rcarrillocruz: yes, that will be important, but it's also not in our critical path to get the initial zuulv3 version out the door
22:41:54 <rcarrillocruz> ++ agreed
22:42:30 <jeblair> rcarrillocruz: but that will build on the next stage of nodepool work, so good to be involved in that and make sure we're setting things up so we can accomplish that later
22:42:51 <jeblair> (the use zk for node builders phase)
22:42:54 <rcarrillocruz> i'll def. keep an eye on that
22:43:23 <fungi> jeblair: are you planning to propose the outstanding zuul spec updates at the infra meeting tomorrow, or giving them another week to firm up first?
22:43:25 <jeblair> i think next week i should be able to clean up the spec update for that, and maybe we can review it
22:43:40 <jeblair> #topic Secrets spec update
22:43:42 <fungi> aha, excellent
22:43:52 <jeblair> #link https://review.openstack.org/386281
22:43:56 <jeblair> i think that is ready now
22:44:13 <jeblair> fungi: so i'd like to put it on tomorrow's agenda
22:44:26 <fungi> sounds great
22:44:45 <jeblair> mordred, fungi: i would especially like your reviews on it before it merges
22:45:00 <fungi> already have the latest diff pulled up, thanks for the reminder
22:45:08 <jeblair> but you've both been involved in it enough now that i think we can safely open it up for council vote
22:45:22 <fungi> i agree
22:45:37 <jeblair> (and of course, i only didn't mention rcarrillocruz there since he already voted on it :)
22:45:46 <rcarrillocruz> ++
22:46:31 <jeblair> but anyone else who would like to go over it with a fine-tooth comb, that would be great.  there is much opportunity for us to shoot ourselves in the foot with this.  :)
22:47:07 <jeblair> i'll put it on the infra meeting agenda
22:47:49 <jeblair> fungi: and yeah, the other 2 outstanding ones i'm not quite ready for, but hope to clean up after this one
22:48:13 <jeblair> #topic Open Discussion
22:48:23 <jeblair> ...so not that short after all...
22:49:18 <Shrews> Many thanks to pabelanger for jumping in on the nodepool test enablement and new tests. That pushed us along quite awesomely.
22:49:30 <rcarrillocruz> \o/
22:49:32 <Shrews> and found cool bugs
22:49:57 <pabelanger> Shrews: no problem, thanks for the awesome zk.py, for making it easy to use :)
22:49:58 <clarkb> for turning the builder on, if we preserve the existing log publishing that should help people debug any potential issues
22:50:04 <clarkb> also expect upload failures
22:50:12 <jeblair> yes, i'd like to add that it's been really fun working with Shrews, pabelanger, and clarkb on pushing hard on getting the nodepool builder ready for production :)
22:50:42 <clarkb> ok now I'm afk again
22:51:32 <jeblair> shall we end this with clarkb's mic drop?
22:51:37 <Shrews> word
22:51:43 <jeblair> #endmeeting