22:03:44 #startmeeting zuul
22:03:45 Meeting started Mon Jun 5 22:03:44 2017 UTC and is due to finish in 60 minutes. The chair is jeblair. Information about MeetBot at http://wiki.debian.org/MeetBot.
22:03:46 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
22:03:49 The meeting name has been set to 'zuul'
22:04:14 #link previous meeting http://eavesdrop.openstack.org/meetings/zuul/2017/zuul.2017-05-22-22.04.html
22:04:22 #link agenda https://wiki.openstack.org/wiki/Meetings/Zuul
22:04:30 #topic Actions from last meeting
22:04:43 pabelanger create zuul-base-jobs and zuul-jobs repos
22:04:53 they should be created
22:05:02 w00t
22:05:17 confirmed
22:05:27 i think reviews there have probably taken a back seat to the server work
22:05:35 o/
22:05:41 yes, I haven't started migrating anything into them as of yet
22:05:55 likely after new servers are online I can switch back to that
22:05:58 #topic Status updates: Zuul sample jobs
22:06:03 let's just segue into that...
22:06:18 since i think this rightly takes a back seat to the new server work right now
22:06:38 since we've just landed some changes that will prevent us running zuulv3 on the current dev server if we restart it
22:06:47 ya, I haven't done much work over the last week on jobs
22:06:57 * mordred waves
22:07:11 zuulv3.o.o is online, working on ze01.o.o next
22:07:41 it makes sense to me to prioritize getting the xenial server up, then resume work on the jobs.
22:07:44 there's also one more config-breaking change that i think we need to land before we can really push on this
22:08:01 wfm, I plan on focusing on ze01 tomorrow
22:08:01 it's the one that should let us more easily share repos across zuul installs
22:08:17 oooo like, puppetized zuulv3.o.o ?
22:08:19 that's nice. :)
22:08:26 (eg, share repos between zuul and bonnyci, etc)
22:08:30 SpamapS: yep!
22:08:44 python3 too
22:09:04 * SpamapS is playing with trying to get jlk's containerization work in zuulv3 deployed into Bluemix's shiny new k8s service.
22:09:14 (in order to share repos effectively, we need to be able to indicate that we only want to load certain config objects from certain repos, eg, bonnyci does not want to load our pipeline config. this is also important for third-party ci)
22:10:16 jeblair: ++
22:10:20 great
22:10:41 #info zuul now requires bubblewrap which needs at least ubuntu xenial, and will require python3 soon
22:10:46 #info work on puppetized production xenial zuulv3.openstack.org is in progress
22:11:19 469994 could use a review
22:11:25 that is the puppet executor part
22:11:39 system-config is ready
22:12:08 #info effective use of sample jobs requires a change to zuul to specify which config objects should be loaded per-project
22:12:14 #info work on sample jobs will resume when xenial server work and repo-config-object change is complete
22:12:39 I have to step away for 20mins or so
22:12:47 pabelanger: thanks!
22:13:01 #topic Status updates: Zuul test enablement
22:13:16 i know there are some outstanding changes in need of review
22:13:36 I don't think we have any left that are straightforward
22:13:58 i hope to review them soon; i've just been trying to push some production-blockers through
22:14:18 one of which might (depending on how you count) dispose of a number of tests
22:14:28 (it deals with the cloner tests)
22:15:27 i hope to have this bubble back up to the top of my list soon
22:15:34 anything else related to test enablement?
22:15:51 (as usual, if anyone is working on a test and has questions, feel free to ping me)
22:15:57 The sshkey one is a bit confusing to me
22:16:13 SpamapS: which one is that?
22:16:21 jeblair: test_timer_sshkey
22:16:33 it tries to find an SSH wrapper for gerrit when a job is started via timer
22:16:42 but the code that makes ssh wrappers is gone? or something? I'm not sure.
22:17:28 I've bounced off of it twice.
22:17:30 SpamapS: hrm, i'll look at that post-meeting
22:17:42 yeah it's the only one I'm still trying to tackle
22:17:53 the others are related to your branch work and the cloner work
22:18:12 (I think if you count all in-flight we're down to 5 that have not been addressed)
22:18:29 \o/
22:18:49 #topic Status updates: Github
22:19:03 i propose we drop this from the fixed agenda, on account of the branch has landed :)
22:19:47 (there's still more work to do, but we have something that, if not working, at least passes tests :)
22:19:51 Seconded.
22:20:41 motion carries, or something
22:20:42 #topic Status updates: Bubblewrap
22:20:46 ditto :)
22:21:21 i'm not sure anyone has run this in production yet, and we won't until we have the xenial server
22:21:22 There is one minor topic here.. which is that with bubblewrap landing, we've made the zuul-executor difficult to run in a container
22:21:24 but that's rsn.
22:21:48 omg, github AND bubblewrap landed? it's like christmas
22:21:50 SpamapS: it currently needs to be a privileged container?
22:21:53 Containers running zuul-executor need to be run privileged now.
22:22:11 Or you can use the nullwrap driver and drop bubblewrap
22:22:14 jamielennox: welcome back!
22:22:18 o/
22:22:39 have internet again
22:23:02 SpamapS: how onerous a requirement would you say that is?
22:23:10 they also need port 79/tcp right?
22:23:15 SpamapS: also do you know exactly why?
22:23:16 So, that's why I started playing with deploying in Bluemix's k8s
22:23:22 pabelanger: no, that's configurable now
22:23:24 Which does allow your pods to be privileged.
22:23:30 if zuul-executor needs to contain its own execution, what's the issue with it running in a privileged container?
22:23:39 Shrews: cool, wasn't following that. thanks
22:23:58 tristanC: that would make it harder to run in a public k8s service
22:24:01 i guess the issue is that it becomes a nested containerization problem?
22:24:04 tristanC: some container services don't allow you to run privileged.
22:24:07 SpamapS: oh - it does? neat
22:24:18 mordred: yeah, since your cluster is made up of VMs that you own.
22:24:39 Anyway, it's just something I wanted to make sure we socialize.
22:24:44 SpamapS: sweet. that should buy a little time - I had some thoughts on this topic over the weekend I need to vomit out somewhere so that you can comment on them
22:25:13 using nullwrap driver seems to be a good compromise for unprivileged containers
22:25:21 pabelanger: compromise being the operative word
22:25:23 would it make sense to run bwrap in a container?
22:25:27 There's a thought I have also which is to possibly have untrusted-only and trusted-only executors which would make running w/ nullwrap possibly safer.
22:25:43 but it's more of a high-scale high-class problem so I'll keep stewing on it.
22:25:44 then how difficult would it be to run a zuul-executor container for trusted config, and another one for untrusted config?
22:26:07 to move the nested logic one layer up...
22:26:17 pabelanger: it definitely makes sense to run bwrap in a container, since the point is to shield executor data from untrusted playbooks.
22:26:24 tristanC, SpamapS: jobs can have trusted and untrusted content in the same job
22:26:35 mordred: I know, that's why this is still stewing. :)
22:26:48 SpamapS: right, I see what you are saying now
22:26:50 SpamapS: stew is tasty
22:26:56 and chunky
22:27:30 (the use case for that is: consider a user may want to take the logs from an untrusted job and put them somewhere that requires special access to write to)
22:27:37 (it's not farfetched :)
22:28:30 SpamapS: i guess the action item at the moment is: once we write the operational docs, make sure we treat this subject.
22:29:22 jeblair: agreed
22:29:34 i expect docs will be a standing agenda item in the near future... :)
22:29:58 let's move on to...
22:30:03 #topic Progress summary
22:30:24 so the board has been getting almost no attention from me lately
22:30:27 #link https://storyboard.openstack.org/#!/board/41
22:30:30 i got a lot of emails from my cron script
22:30:35 so *something* has been happening :)
22:31:01 Yeah we're using storyboard for sure
22:31:10 #action triage new lane in storyboard
22:31:14 #undo
22:31:15 Removing item from minutes: #action triage new lane in storyboard
22:31:17 #action jeblair triage new lane in storyboard
22:31:21 Lots of New stuff
22:32:12 and yeah, for me i've just been mostly head down trying to push stuff through :)
22:33:10 #topic nodepool-drivers status
22:33:13 I do think we may want to think about pruning the board down to just blockers for zuulv3 going into prod for infra
22:33:33 #unfo
22:33:36 #undo
22:33:37 Removing item from minutes: #topic nodepool-drivers status
22:33:41 SpamapS: yeah, we may be losing focus a bit there
22:33:50 SpamapS: ++
22:34:00 maybe a new lane or a new board?
22:34:27 I think a new tag and splitting of the things is in order
22:34:57 * jeblair updates a task he's working on
22:35:00 zuulv30 maybe? like, zuulv3 is "zuulv3 running jobs in infra" and zuulv30 is "we finished the spec" ?
22:36:33 SpamapS: can we boolean 'and' the tags?
22:36:38 i can't remember if a board can do that
22:37:04 zuulv3 and openstack-infra or something might be nice...
22:37:15 worth looking into
22:37:17 anyway, whatever it takes, that sounds good to me. :)
22:37:25 i think as long as the query itself works you should be able to use it in an automatic worklist or board situation too
22:37:49 ok, let's move on to nodepool drivers
22:37:51 #topic nodepool-drivers status
22:38:01 tristanC: i think you added this?
22:38:09 tristanC: care to introduce it?
22:38:23 indeed, well this is related to the nodepool-drivers spec
22:38:47 i *tried* to hack something together to get static node support in nodepool
22:38:47 * fungi notes that the spec in question has no assignees nor a storyboard story
22:39:28 fungi: *cough* https://review.openstack.org/#/c/468623/ :)
22:39:45 awesome! rebasing my help-wanted change on top of that now
22:39:52 cool
22:40:02 so basically, if this is what was expected, then I could continue on this and start adding tests
22:40:10 tristanC: any chance you've been collaborating with smyers (also a RH person) on that? his team is very interested in that as well
22:40:28 we can also add tristanC and smeyers if he's interested, (and maybe tobiash after he gets back?) to the spec as assignees too
22:41:33 Shrews: no, this is a one off change done over the weekend, I can contact him for follow-ups
22:41:54 for my part, i very much want to look this over, but haven't had a chance yet. i'm feeling some urgency to try to get some things out of the way that are blocking running zuulv3 in production (which should enable more folks to start working on other facets of using zuulv3 in openstack infra), so i've been trying to focus on that.
22:42:11 I liked where the implementation of things started but then as the changes were added on top I felt that the driver api became very muddied. I think we just need to carefully write down some api contracts and stick with them until we decide they are wrong. (tristanC seemed to agree in review so not very controversial just wanted to point it out to bigger audience)
22:42:34 basically, i'm trying to find a balance between staying focused on what we need to get into production asap, but i also don't want to block follow-on efforts for too long
22:42:36 sounds like a reasonable approach
22:43:27 that goes back to the /41 board
22:43:58 if we can push some of the things to "later", I think we're pretty close to saying zuulv3 does all the things infra needs
22:43:59 clarkb: I agree with your statement
22:45:02 SpamapS: yeah, i feel like we ought to be able to clear my urgent list by the end of the week
22:45:27 * mordred agrees with jeblair and SpamapS
22:45:34 tristanC: can you give me till the end of the week to review your work so far?
22:45:36 jeblair: maybe if you could stop uncovering race conditions we could ship it. ;)
22:45:59 we're going to need a bigger broom?
22:46:21 tristanC: that might be a good opportunity to try to loop smeyers into the process as well
22:46:22 clarkb: pabelanger: agreed, though this is quite difficult to do without disrupting the openstack driver too much
22:46:26 should care be taken for linchpin driver when we are working on the nodepool driver? eg: some sort of ansible driver from nodepool?
22:46:37 jeblair: sure thing!
22:47:12 well to be honest, I stopped when the static and oci driver started to work, I'm sure the interface can be improved
22:47:51 pabelanger: yeah, i think linchpin is probably what smyers would be thinking of adding to it
22:48:07 possibly dumb question, but since folks are here, is "oci" something that can/should be handled by linchpin, or does it need its own driver?
22:49:08 jeblair: that's one question I had, are we expecting people to write nodepool (python logic) drivers each time something new comes along or maybe abstract some of it via ansible
22:49:15 i'm working from the assumption that linchpin would give us access to *many* clouds at small scale, and the reason we still need an openstack driver is for our large-scale needs
22:49:40 and then, if folks want to use, i dunno, aws at our scale, maybe they write an aws driver
22:49:59 +1 from me
22:50:07 I think linchpin is largely new to the larger audience. If it uses ansible to provision instances in clouds I'd personally write a specific driver for $cloud, having tried to use ansible for that in the past
22:50:28 you run into a lot of fun problems like what I've hit with random networking stuff in say citycloud
22:50:37 may as well libcloud, right?
22:50:37 jeblair: the way the oci driver tristanC put together opened up a way of thinking about that space I hadn't considered yet
22:50:44 SpamapS: that doesn't work
22:50:45 would that be a nice perfect full circle?
22:50:53 :)
22:51:07 my only concern for nodepool drivers is what our plan is for testing them. Ansible I think we have a good hold on now, but a nodepool driver for aws?
22:51:17 jeblair: so I may have said linch-pin before, but there's some real nice possibilities with the simplicity of what's in that patch currently
22:51:21 pabelanger: there's a free tier
22:51:49 pabelanger: i would definitely be looking for someone to sign up for ongoing test maintenance before we consider it. :)
22:51:58 jeblair: oh - I mean "may have said linch-pin before for oci" - not talking in general
22:51:59 pabelanger: it seems like what matters the most is a solid interface, then you can use whatever tool to do the implementation
22:52:15 tristanC: ++
22:52:29 pabelanger: but to bring that to the present...
22:52:36 pabelanger, tristanC: how can we test an oci driver?
22:52:58 jeblair: you have the nodepool driver test ssh into itself and execute containers
22:53:03 it should be straightforward to do that
22:53:05 I think doing a linch-pin driver once we have a driver interface we're happy with, will open a door to folks experimenting. whether or not it winds up being a preferred way to get non-openstack resources will remain to be seen I'd guess
22:53:17 jeblair: yah. what clarkb said
22:53:34 ok
22:53:41 the oci driver makes little oci processes running ssh - so it by-and-large looks a bunch like what we get from vms
22:54:17 thanks for entertaining my questions even though i'm ignorant of the patch series contents. that's some good background for me to have while i look at it. :)
22:54:17 jeblair: the oci driver I proposed needs ssh access to the "hypervisor", and the runc program installed
22:54:19 okay, we should be able to update nodepool dsvm for POC in that case
22:55:27 and well i wrote it that way to check that the initial driver interface could work for things other than openstack/static nodes
22:56:04 going from 1 to 3 drivers should give us a pretty good triangulation on the api. :)
22:56:09 Oh, we could do a multinode job then for the oci test
22:56:12 yah- at first I was like "2 drivers" - but then it was helpful for looking at the api - yah
22:56:43 pabelanger: multinode not required (nor desirable I don't think)
22:56:52 pabelanger: single node will exercise all of the code just as well as multinode
22:57:09 clarkb: agree
22:57:14 yeah, trying to work out where multinode testing would even come into the picture
22:57:26 seems unwarranted
22:57:48 unless we somehow exceed the amount of resources we have available on one node
22:58:19 which i think is extremely unlikely
22:58:30 the SSH to hypervisor part made me think about multi node
22:58:39 however, loopback should also work
23:00:05 this has been a good discussion
23:00:10 and we're out of time
23:00:14 tristanC: thanks!
23:00:21 and everyone else, thanks too
23:00:24 #endmeeting
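
As a footnote to the driver API discussion above ("carefully write down some api contracts and stick with them"), here is a rough sketch of the kind of per-provider contract being described; every class and method name below is a hypothetical illustration for discussion, not nodepool's actual driver interface.

```python
# Hypothetical sketch of a per-provider driver contract, as discussed in the
# meeting above. Names are illustrative only; this is not nodepool's real API.
from abc import ABC, abstractmethod


class ProviderDriver(ABC):
    """Minimal contract a provider driver might be asked to satisfy."""

    @abstractmethod
    def start(self):
        """Initialize any clients or connections the provider needs."""

    @abstractmethod
    def labels(self):
        """Return the node labels this provider can supply."""

    @abstractmethod
    def launch(self, label):
        """Create or allocate a node for the given label and return it."""

    @abstractmethod
    def cleanup(self, node):
        """Release or destroy a node once it is no longer needed."""


class StaticDriver(ProviderDriver):
    """Toy static-node driver: hands out pre-registered hosts."""

    def __init__(self, hosts):
        # Pool of hostnames registered in the provider config.
        self._free = list(hosts)

    def start(self):
        pass

    def labels(self):
        return ["static-node"]

    def launch(self, label):
        # Raises IndexError when the pool is exhausted; a real driver would
        # instead defer the request until a host is returned.
        return self._free.pop()

    def cleanup(self, node):
        self._free.append(node)
```

Pinning down a small contract like this is what would let the static, OCI, and openstack implementations vary freely behind it; that is the "triangulation on the api" mentioned above.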