22:03:44 <jeblair> #startmeeting zuul
22:03:45 <openstack> Meeting started Mon Jun  5 22:03:44 2017 UTC and is due to finish in 60 minutes.  The chair is jeblair. Information about MeetBot at http://wiki.debian.org/MeetBot.
22:03:46 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
22:03:49 <openstack> The meeting name has been set to 'zuul'
22:04:14 <jeblair> #link previous meeting http://eavesdrop.openstack.org/meetings/zuul/2017/zuul.2017-05-22-22.04.html
22:04:22 <jeblair> #link agenda https://wiki.openstack.org/wiki/Meetings/Zuul
22:04:30 <jeblair> #topic Actions from last meeting
22:04:43 <jeblair> pabelanger create zuul-base-jobs and zuul-jobs repos
22:04:53 <pabelanger> they should be created
22:05:02 <jeblair> w00t
22:05:17 <pabelanger> confirmed
22:05:27 <jeblair> i think reviews there have probably taken a back seat to the server work
22:05:35 <SpamapS> o/
22:05:41 <pabelanger> yes, I haven't started migrating anything into them as of yet
22:05:55 <pabelanger> likely after new servers are online I can switch back to that
22:05:58 <jeblair> #topic Status updates:  Zuul sample jobs
22:06:03 <jeblair> let's just segue into that...
22:06:18 <jeblair> since i think this rightly takes a back seat to the new server work right now
22:06:38 <jeblair> since we've just landed some changes that will prevent us running zuulv3 on the current dev server if we restart it
22:06:47 <pabelanger> ya, I haven't done much work over the last week on jobs
22:06:57 * mordred waves
22:07:11 <pabelanger> zuulv3.o.o is online, working on ze01.o.o next
22:07:41 <jeblair> it makes sense to me to prioritize getting the xenial server up, then resume work on the jobs.
22:07:44 <jeblair> there's also one more config-breaking change that i think we need to land before we can really push on this
22:08:01 <pabelanger> wfm, I plan on focusing on ze01 tomorrow
22:08:01 <jeblair> it's the one that should let us more easily share repos across zuul installs
22:08:17 <SpamapS> oooo like, puppetized zuulv3.o.o ?
22:08:19 <SpamapS> that's nice. :)
22:08:26 <jeblair> (eg, share repos between zuul and bonnyci, etc)
22:08:30 <jeblair> SpamapS: yep!
22:08:44 <pabelanger> python3 too
22:09:04 * SpamapS is playing with trying to get jlk's containerization work in zuulv3 deployed into Bluemix's shiny new k8s service.
22:09:14 <jeblair> (in order to share repos effectively, we need to be able to indicate that we only want to load certain config objects from certain repos, eg, bonnyci does not want to load our pipeline config.  this is also important for third-party ci)
22:10:16 <mordred> jeblair: ++
22:10:20 <pabelanger> great
22:10:41 <jeblair> #info zuul now requires bubblewrap which needs at least ubuntu xenial, and will require python3 soon
22:10:46 <jeblair> #info work on puppetized production xenial zuulv3.openstack.org is in progress
22:11:19 <pabelanger> 469994 could use a review
22:11:25 <pabelanger> that is the puppet executor part
22:11:39 <pabelanger> system-config is ready
22:12:08 <jeblair> #info effective use of sample jobs requires a change to zuul to specify which config objects should be loaded per-project
22:12:14 <jeblair> #info work on sample jobs will resume when xenial server work and repo-config-object change is complete
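For context on the repo-sharing change summarized above: per-project filtering of config objects would presumably live in the tenant configuration. A minimal sketch of the idea follows; the key names and project names are illustrative assumptions, not the syntax of the pending change.

```yaml
# Sketch only: a tenant declares which config object types it loads from a
# shared repo, so another Zuul installation (e.g. BonnyCI, or a third-party
# CI) can reuse job definitions without importing this site's pipelines.
- tenant:
    name: openstack
    source:
      gerrit:
        config-projects:
          - openstack-infra/project-config
        untrusted-projects:
          - openstack-infra/zuul-jobs:
              include:            # assumed keyword for per-project filtering
                - job
                - project-template
              # pipelines, secrets, etc. defined in this repo would be ignored
```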
22:12:39 <pabelanger> I have to step away for 20mins or so
22:12:47 <jeblair> pabelanger: thanks!
22:13:01 <jeblair> #topic Status updates: Zuul test enablement
22:13:16 <jeblair> i know there are some outstanding changes in need of review
22:13:36 <SpamapS> I don't think we have any left that are straightforward
22:13:58 <jeblair> i hope to review them soon; i've just been trying to push some production-blockers through
22:14:18 <jeblair> one of which might (depending on how you count) dispose of a number of tests
22:14:28 <jeblair> (it deals with the cloner tests)
22:15:27 <jeblair> i hope to have this bubble back up to the top of my list soon
22:15:34 <jeblair> anything else related to test enablement?
22:15:51 <jeblair> (as usual, if anyone is working on a test and has questions, feel free to ping me)
22:15:57 <SpamapS> The sshkey one is a bit confusing to me
22:16:13 <jeblair> SpamapS: which one is that?
22:16:21 <SpamapS> jeblair: test_timer_sshkey
22:16:33 <SpamapS> it tries to find an SSH wrapper for gerrit when a job is started via timer
22:16:42 <SpamapS> but the code that makes ssh wrappers is gone? or something? I'm not sure.
22:17:28 <SpamapS> I've bounced off of it twice.
22:17:30 <jeblair> SpamapS: hrm, i'll look at that post-meeting
22:17:42 <SpamapS> yeah it's the only one I'm still trying to tackle
22:17:53 <SpamapS> the others are related to your branch work and the cloner work
22:18:12 <SpamapS> (I think if you count all in-flight we're down to 5 that have not been addressed)
22:18:29 <jeblair> \o/
22:18:49 <jeblair> #topic Status updates: Github
22:19:03 <jeblair> i propose we drop this from the fixed agenda, on account of the branch having landed :)
22:19:47 <jeblair> (there's still more work to do, but we have something that, if not working, at least passes tests :)
22:19:51 <SpamapS> Seconded.
22:20:41 <jeblair> motion carries, or something
22:20:42 <jeblair> #topic Status updates: Bubblewrap
22:20:46 <jeblair> ditto :)
22:21:21 <jeblair> i'm not sure anyone has run this in production yet, and we won't until we have the xenial server
22:21:22 <SpamapS> There is one minor topic here.. which is that with bubblewrap landing, we've made the zuul-executor difficult to run in a container
22:21:24 <jeblair> but that's rsn.
22:21:48 <jamielennox> omg, github AND bubblewrap landed? it's like christmas
22:21:50 <clarkb> SpamapS: it currently needs to be privileged container?
22:21:53 <SpamapS> Containers running zuul-executor need to be run privileged now.
22:22:11 <SpamapS> Or you can use the nullwrap driver and drop bubblewrap
22:22:14 <mordred> jamielennox: welcome back!
22:22:18 <jamielennox> o/
22:22:39 <pabelanger> have internet again
22:23:02 <jeblair> SpamapS: how onerous a requirement would you say that is?
22:23:10 <pabelanger> they also need port 79/tcp right?
22:23:15 <jamielennox> SpamapS: also do you know exactly why?
22:23:16 <SpamapS> So, that's why I started playing with deploying in Bluemix's k8s
22:23:22 <Shrews> pabelanger: no, that's configurable now
22:23:24 <SpamapS> Which does allow your pods to be privileged.
22:23:30 <tristanC> if zuul-executor needs to contain its own execution, what's the issue with it running in a privileged container?
22:23:39 <pabelanger> Shrews: cool, wasn't following that. thanks
22:23:58 <mordred> tristanC: that would make it harder to run in a public k8s service
22:24:01 <fungi> i guess the issue is that it becomes a nested containerization problem?
22:24:04 <SpamapS> tristanC: some container services don't allow you to run privileged.
22:24:07 <mordred> SpamapS: oh -it does? neat
22:24:18 <SpamapS> mordred: yeah, since your cluster is made up of VMs that you own.
22:24:39 <SpamapS> Anyway, it's just something I wanted to make sure we socialize.
22:24:44 <mordred> SpamapS: sweet. that should buy a little time - I had some thoughts on this topic over the weekend I need to vomit out somewhere so that you can comment on them
22:25:13 <pabelanger> using nullwrap driver seems to be a good compromise for unprivileged containers
22:25:21 <jeblair> pabelanger: compromise being the operative word
22:25:23 <pabelanger> would it make sense to run bwrap in a container?
22:25:27 <SpamapS> There's a thought I have also which is to possibly have untrusted-only and trusted-only executors which would make running w/ nullwrap possibly safer.
22:25:43 <SpamapS> but it's more of a high-scale high-class problem so I'll keep stewing on it.
22:25:44 <tristanC> then how difficult would it be to run a zuul-executor container for trusted config, and another one for untrusted config?
22:26:07 <tristanC> to move the nested logic one layer up...
22:26:17 <SpamapS> pabelanger: it definitely makes sense to run bwrap in a container, since the point is to shield executor data from untrusted playbooks.
22:26:24 <mordred> tristanC, SpamapS: jobs can have trusted and untrusted content in the same job
22:26:35 <SpamapS> mordred: I know, that's why this is still stewing. :)
22:26:48 <pabelanger> SpamapS: right, I see what you are saying now
22:26:50 <mordred> SpamapS: stew is tasty
22:26:56 <SpamapS> and chunky
22:27:30 <jeblair> (the use case for that is: consider a user may want to take the logs from an untrusted job and put them somewhere that requires special access to write to)
22:27:37 <jeblair> (it's not farfetched :)
22:28:30 <jeblair> SpamapS: i guess the action item at the moment is: once we write the operational docs, make sure we treat this subject.
22:29:22 <SpamapS> jeblair: agreed
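To make the "privileged" requirement concrete: on a Kubernetes service that allows it (as SpamapS describes for Bluemix), the executor pod has to opt in via its security context. A minimal sketch, assuming a hypothetical image name and config volume:

```yaml
# Sketch only -- not a supported deployment.  The image and ConfigMap names
# are assumptions; the point is the privileged security context, which
# bubblewrap needs in order to create its namespaces inside the container.
apiVersion: v1
kind: Pod
metadata:
  name: zuul-executor
spec:
  containers:
    - name: executor
      image: example/zuul-executor:latest    # hypothetical image
      securityContext:
        privileged: true                     # needed when using the bubblewrap driver
      volumeMounts:
        - name: zuul-config
          mountPath: /etc/zuul
  volumes:
    - name: zuul-config
      configMap:
        name: zuul-config                    # hypothetical ConfigMap holding zuul.conf
```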
22:29:34 <jeblair> i expect docs will be a standing agenda item in the near future... :)
22:29:58 <jeblair> let's move on to...
22:30:03 <jeblair> #topic Progress summary
22:30:24 <SpamapS> so the board has been getting almost no attention from me lately
22:30:27 <SpamapS> #link https://storyboard.openstack.org/#!/board/41
22:30:30 <jeblair> i got a lot of emails from my cron script
22:30:35 <jeblair> so *something* has been happening :)
22:31:01 <SpamapS> Yeah we're using storyboard for sure
22:31:10 <jeblair> #action triage new lane in storyboard
22:31:14 <jeblair> #undo
22:31:15 <openstack> Removing item from minutes: #action triage new lane in storyboard
22:31:17 <jeblair> #action jeblair triage new lane in storyboard
22:31:21 <SpamapS> Lots of New stuff
22:32:12 <jeblair> and yeah, for me i've just been mostly head down trying to push stuff through :)
22:33:10 <jeblair> #topic nodepool-drivers status
22:33:13 <SpamapS> I do think we may want to think about pruning the board down to just blockers for zuulv3 going into prod for infra
22:33:33 <jeblair> #unfo
22:33:36 <jeblair> #undo
22:33:37 <openstack> Removing item from minutes: #topic nodepool-drivers status
22:33:41 <jeblair> SpamapS: yeah, we may be losing focus a bit there
22:33:50 <mordred> SpamapS: ++
22:34:00 <jeblair> maybe a new lane or a new board?
22:34:27 <SpamapS> I think a new tag and splitting of the things is in order
22:34:57 * jeblair updates a task he's working on
22:35:00 <SpamapS> zuulv30 maybe? like, zuulv3 is "zuulv3 running jobs in infra" and zuulv30 is "we finished the spec" ?
22:36:33 <jeblair> SpamapS: can we boolean 'and' the tags?
22:36:38 <jeblair> i can't remember if a board can do that
22:37:04 <jeblair> zuulv3 and openstack-infra or something might be nice...
22:37:15 <SpamapS> worth looking into
22:37:17 <jeblair> anyway, whatever it takes, that sounds good to me.  :)
22:37:25 <fungi> i think as long as the query itself works you should be able to use it in an automatic worklist or board situation too
22:37:49 <jeblair> ok, let's move on to nodepool drivers
22:37:51 <jeblair> #topic nodepool-drivers status
22:38:01 <jeblair> tristanC: i think you added this?
22:38:09 <jeblair> tristanC: care to introduce it?
22:38:23 <tristanC> indeed, well this is related to the nodepool-drivers spec
22:38:47 <tristanC> i *tried* to hack something together to get static node support in nodepool
* fungi notes that the spec in question has neither assignees nor a storyboard story
22:39:28 <tristanC> fungi: *cough* https://review.openstack.org/#/c/468623/ :)
22:39:45 <fungi> awesome! rebasing my help-wanted change on top of that now
22:39:52 <jeblair> cool
22:40:02 <tristanC> so basically, if this is what was expected, then I could continue on this and start adding tests
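For readers not following the review linked above: the idea is that a provider entry in nodepool.yaml selects a driver, and a static driver simply hands out pre-existing hosts. A rough sketch, with key names and hosts as illustrative assumptions rather than the syntax of change 468623:

```yaml
# Sketch only: a driver-selectable provider serving pre-registered hosts.
providers:
  - name: static-nodes
    driver: static                   # pick the non-OpenStack backend
    pools:
      - name: main
        nodes:
          - name: node01.example.com   # hypothetical host
            labels: ubuntu-xenial
            username: zuul
            ssh-port: 22
```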
22:40:10 <Shrews> tristanC: any chance you've been collaborating with smyers (also a RH person) on that? his team is very interested in that as well
22:40:28 <jeblair> we can also add tristanC and smeyers if he's interested, (and maybe tobiash after he gets back?) to the spec as assignees too
22:41:33 <tristanC> Shrews: no, this is a one off change done over the week end, I can contact him for follow-ups
22:41:54 <jeblair> for my part, i very much want to look this over, but haven't had a chance yet.  i'm feeling some urgency to try to get some things out of the way that are blocking running zuulv3 in production (which should enable more folks to start working on other facets of using zuulv3 in openstack infra), so i've been trying to focus on that.
22:42:11 <clarkb> I liked where the implementation of things started but then as the changes were added on top I felt that the driver api became very muddied. I think we just need to carefully write down some api contracts and stick with them until we decide they are wrong. (tristanC seemed to agree in review so not very controversial just wanted to point it out to bigger audience)
22:42:34 <jeblair> basically, i'm trying to find a balance between staying focused on what we need to get into production asap, but i also don't want to block follow-on efforts for too long
22:42:36 <fungi> sounds like a reasonable approach
22:43:27 <SpamapS> that goes back to the /41 board
22:43:58 <SpamapS> if we can push some of the things to "later", I think we're pretty close to saying zuulv3 does all the things infra needs
22:43:59 <pabelanger> clarkb: I agree with your statement
22:45:02 <jeblair> SpamapS: yeah, i feel like we ought to be able to clear my urgent list by the end of the week
22:45:27 * mordred agrees with jeblair and SpamapS
22:45:34 <jeblair> tristanC: can you give me till the end of the week to review your work so far?
22:45:36 <SpamapS> jeblair: maybe if you could stop uncovering race conditions we could ship it. ;)
22:45:59 <fungi> we're going to need a bigger broom?
22:46:21 <jeblair> tristanC: that might be a good opportunity to try to loop smeyers into the process as well
22:46:22 <tristanC> clarkb: pabelanger: agreed, though this is quite difficult to do without disrupting the openstack driver too much
22:46:26 <pabelanger> should care be taken for linchpin driver when we are working on the nodepool driver? eg: some sort of ansible driver from nodepool?
22:46:37 <tristanC> jeblair: sure thing!
22:47:12 <tristanC> well to be honest, I stopped when the static and oci drivers started to work, I'm sure the interface can be improved
22:47:51 <Shrews> pabelanger: yeah, i think linchpin is probably what smyers would be thinking of adding to it
22:48:07 <jeblair> possibly dumb question, but since folks are here, is "oci" something that can/should be handled by linchpin, or does it need its own driver?
22:49:08 <pabelanger> jeblair: that's one question I had: are we expecting people to write nodepool (python logic) drivers each time something new comes along, or maybe abstract some of it via ansible?
22:49:15 <jeblair> i'm working from the assumption that linchpin would give us access to *many* clouds at small scale, and the reason we still need an openstack driver is for our large-scale needs
22:49:40 <jeblair> and then, if folks want to use, i dunno, aws at our scale, maybe they write an aws driver
22:49:59 <SpamapS> +1 from me
22:50:07 <clarkb> I think linchpin is largely new to the larger audience. If it uses ansible to provision instances in clouds, I'd personally rather write a specific driver for $cloud, having tried to use ansible for that in the past
22:50:28 <clarkb> you run into a lot of fun problems like what I've hit with random networking stuff in say citycloud
22:50:37 <SpamapS> may as well libcloud, right?
22:50:37 <mordred> jeblair: the way the oci driver tristanC put together opened up a way of thinking about that space I hadn't considered yet
22:50:44 <mordred> SpamapS: that doesn't work
22:50:45 <SpamapS> would that be a nice perfect full circle?
22:50:53 <mordred> :)
22:51:07 <pabelanger> my only concern for nodepool drivers is: what is our plan for testing them? Ansible I think we have a good hold on now, but a nodepool driver for aws?
22:51:17 <mordred> jeblair: so I may have said linch-pin before, but there are some really nice possibilities with the simplicity of what's in that patch currently
22:51:21 <SpamapS> pabelanger: there's a free tier
22:51:49 <jeblair> pabelanger: i would definitely be looking for someone to sign up for ongoing test maintenance before we consider it.  :)
22:51:58 <mordred> jeblair: oh - I mean "may have said linch-pin before for oci" - not talking in general
22:51:59 <tristanC> pabelanger: it seems like what matters the most is a solid interface; then you can use whatever tool to do the implementation
22:52:15 <clarkb> tristanC: ++
22:52:29 <jeblair> pabelanger: but to bring that to the present...
22:52:36 <jeblair> pabelanger, tristanC: how can we test an oci driver?
22:52:58 <clarkb> jeblair: you have the nodepool driver test ssh into itself and execute containers
22:53:03 <clarkb> it should be straightforward to do that
22:53:05 <mordred> I think doing an linch-pin driver once we have a driver interface we're happy with, will open a door to folks experimenting. whether or not it winds up being a preferred way to get non-openstack resources or not will remain to be seen I'd guess
22:53:17 <mordred> jeblair: yah. what clarkb said
22:53:34 <jeblair> ok
22:53:41 <mordred> the oci driver makes little oci processes running ssh - so it by-and-large looks a bunch like what we get from vms
22:54:17 <jeblair> thanks for entertaining my questions even though i'm ignorant of the patch series contents.  that's some good background for me to have while i look at it.  :)
22:54:17 <tristanC> jeblair: the oci driver I proposed needs ssh access to the "hypervisor", and the runc program installed
22:54:19 <pabelanger> okay, we should be able to update nodepool dsvm for POC in that case
22:55:27 <tristanC> and well i wrote it that way to check that the initial driver interface could work for things other than openstack/static nodes
22:56:04 <jeblair> going from 1 to 3 drivers should give us a pretty good triangulation on the api.  :)
22:56:09 <pabelanger> Oh, we could do a multinode job then for oci test
22:56:12 <mordred> yah- at first I was like "2 drivers" - but then it was helpful for looking at the api - yah
22:56:43 <clarkb> pabelanger: multinode not required (nor desirable, I think)
22:56:52 <clarkb> pabelanger: single node will exercise all of the code just as well as multinode
22:57:09 <pabelanger> clarkb: agree
22:57:14 <fungi> yeah, trying to work out where multinode testing would even come into the picture
22:57:26 <fungi> seems unwarranted
22:57:48 <fungi> unless we somehow exceed the amount of resources we have available on one node
22:58:19 <fungi> which i think is extremely unlikely
22:58:19 <pabelanger> the SSH-to-hypervisor part made me think about multinode
22:58:39 <pabelanger> however, loopback should also work
23:00:05 <jeblair> this has been a good discussion
23:00:10 <jeblair> and we're out of time
23:00:14 <jeblair> tristanC: thanks!
23:00:21 <jeblair> and everyone else, thanks too
23:00:24 <jeblair> #endmeeting