22:05:40 #startmeeting zuul
22:05:41 Meeting started Mon May 1 22:05:40 2017 UTC and is due to finish in 60 minutes. The chair is jeblair. Information about MeetBot at http://wiki.debian.org/MeetBot.
22:05:42 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
22:05:44 The meeting name has been set to 'zuul'
22:05:52 #link agenda https://wiki.openstack.org/wiki/Meetings/Zuul
22:06:11 #topic Status updates: Zuul test enablement
22:06:20 SpamapS is on a roll here
22:06:57 there are still some more remaining if folks are looking for fun and educational tasks
22:07:12 not sure there's much else to add on this
22:07:49 #topic Status updates: Zuul sample jobs
22:07:59 pabelanger: you restarted zuulv3-dev, right?
22:08:08 yes, last week
22:08:53 pabelanger: what's the status with the sample jobs; should we be writing or reviewing anything?
22:09:14 Also started on our first stdlib: https://review.openstack.org/#/c/459066/ was going to ping some people this week to bikeshed on it, specifically how we handle variables
22:10:06 oh cool, so we're ready to start reviewing that
22:10:20 would like to engage BonnyCI folks to see if they could use that if we merged it or not
22:10:30 i think shortly afterwards, we need to start thinking about how to make that reusable
22:10:52 in what sense?
22:11:12 yeah, SpamapS and I would like to get other BonnyCI folks engaged on the standard lib front
22:11:36 459066 should be a simple one to tackle first
22:11:45 it has default users, and ports for zuul_stream
22:11:47 whether it should go in a new repo, or be installed with zuul as a special case. the "playbooks/" directory within zuul is not necessarily the right place for it (it will contain playbooks for running and testing zuul, not necessarily things that should be in the stdlib)
22:12:14 agree
22:12:26 (i don't think we need to block on that -- just indicating that it's destined for the stdlib vs something else is good enough for us to iterate on it for now)
22:13:32 jlk: ++
22:13:46 anything else folks should be aware of on this?
22:14:15 #info starting to organize "standard library" roles/playbooks in https://review.openstack.org/459066
22:15:22 #topic Status updates: Github
22:15:51 oh hey, that's me.
22:16:07 jlk, mordred, and I had a significant conversation about what the config syntax should look like last week
22:16:14 we had an excellent chat on Friday about the patch set, and I got some direction to move in
22:16:28 (that should be archived by eavesdrop.openstack.org for posterity)
22:16:46 I've made a lot of progress on that front, another pass of rebasing on top of the earlier changes
22:17:06 I should have the whole patch set ready to put up for review again later today if I'm lucky.
22:17:12 neat!
22:17:43 i'll continue to try to review down the stack toward the tip (i've only gotten about 7 patches in so far), as well as loop back to the top as things are addressed
22:17:55 and jamielennox is going to have an implementation of zv3 with our github patches running hooked up to github, so we can do some extra validation.
22:18:35 jlk, jamielennox: great! (caveat: there's still some known incorrect behavior and missing security bits, but i think you know that)
22:19:02 yup!
22:19:38 mordred has been chiming in on reviews too; if anyone else wants to review, please feel free to do so, and let me know if you want me to make sure we wait for your reviews before merging
22:20:57 #info github series is being rebased and revised to address review comments
22:21:04 anything else on this?
22:21:58 #topic Status updates: Bubblewrap
22:22:25 SpamapS, pabelanger: can one of you fill us in on the latest here?
22:23:12 i heard we're no longer concerned about whether the ubuntu ppa system is (still) working
22:23:26 mordred suggested we might use the PPA now that vhd-utils built over the weekend
22:23:26 fungi: yeah, apparently it's working now? so ppas are an option?
22:23:40 I have nothing new to report
22:24:01 I am not sure if anybody is working on backporting that or not
22:24:02 if PPAs are in fact building, then I'd suggest we just use a PPA
22:24:11 so maybe uploading the source package from stretch or yakkety to rebuild would be an option
22:24:12 until xenial-backports gets bubblewrap
22:24:18 which I do expect to happen at some point
22:24:44 I can see about doing an upload, to my own PPA first
22:24:49 okay, ppa until xenial-backports sounds like it's an option
22:25:05 i have a feeling pushing that source package into a ppa would "just work[tm]" since the claimed deps are already met on xenial
22:25:06 There's a script in ubuntu-dev-tools that will do it for you even
22:25:33 So, has anybody seen this: https://launchpad.net/~ansible/+archive/ubuntu/bubblewrap
22:25:34 backportpackage -d xenial -u ppa:foo bubblewrap <-- that's all you need
22:25:38 can we use the same ppa we use for vhd-util?
22:25:51 which is a xenial package
22:25:59 maybe we can just work with ansible on it
22:26:05 i have not seen that.
22:26:12 pabelanger: that's also under Ansible's control. So it may not match what infra wants.
22:26:13 ppa:ansible/bubblewrap
22:26:21 but it's likely exactly the same thing
22:26:37 right, but we could audit and copy into our PPA
22:26:49 btw, are we positive that's really ansible?
22:27:01 i think we should create a new ppa under the same team
22:27:05 link at https://launchpad.net/~openstack-ci-core
22:27:20 fungi: that seems safest and most stable
22:27:33 the existing ppa is "VHD Util" which is probably surprising if it contains a package set for bubblewrap
22:27:35 ya, we can check GPG keys first
22:28:07 literally make a PPA and run that backportpackage command.. that's all you need to do
22:28:09 #link https://launchpad.net/~openstack-ci-core create a new ppa here, adjacent to "VHD Util"
22:28:49 solution so simple, it's probably time to stop debating it until we find out there's a problem ;)
22:28:51 Can verify that the source package in the ansible PPA is a no-change backport
22:28:59 should we call the ppa 'zuul-deps' or 'bubblewrap'?
22:29:14 would be safe to just copy that into a PPA and we don't have to wait for build
22:29:46 copying their ppa to one under our control seems fine to me. relying on their ppa indefinitely does not
22:30:02 SpamapS: by "Can verify" do you mean that you have verified or that one is able to verify?
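(For reference, the backport flow discussed above amounts to little more than the sketch below. It assumes the new PPA ends up under the openstack-ci-core team with a name like ppa:openstack-ci-core/bubblewrap, which was not settled in the meeting, and that backportpackage from the ubuntu-dev-tools package is run on an Ubuntu machine with upload rights to that PPA.)

    # install the backporting helper
    sudo apt-get install ubuntu-dev-tools
    # fetch the current bubblewrap source package, retarget it at xenial,
    # and upload it to the team PPA for Launchpad to build
    backportpackage -d xenial -u ppa:openstack-ci-core/bubblewrap bubblewrap

(Alternatively, as noted in the discussion, the already-verified source package in ppa:ansible/bubblewrap can simply be copied into the team PPA through Launchpad, avoiding the wait for a fresh build.)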
22:30:25 but we have the source package they used and assume the canonical build infrastructure is reliable, so can trust what's currently there at least
22:30:44 http://paste.openstack.org/show/608551/
22:30:57 jeblair: have verified
22:31:06 cool
22:31:21 also neat that they seem to already be using or otherwise want it
22:31:33 (assuming that's really ansible)
22:31:35 #action pabelanger create new ppa under openstack-ci-core and add bubblewrap to it; copy existing source package from ansible bubblewrap ppa.
22:31:41 does that look correct ^?
22:32:10 sure
22:32:11 we don't, however, have any proof that launchpad.net/~ansible _is_ ansible. That GPG key is single-purpose for debs
22:32:19 but again, safe to copy the debs for speed
22:32:34 yep. out of curiosity, i'll ask 'em next time i see 'em. :)
22:32:45 jeblair: indeed, tell them to sign their keys ;)
22:33:13 (and maybe also add contact info to that lp profile)
22:33:23 I am guessing it is driven by jenkins, based on SSH keys added to https://launchpad.net/~ansible
22:33:28 yup
22:33:32 that was my guess as well
22:33:41 hopefully we can move this along quickly now -- we have patches we should be able to test as soon as the ppa is available
22:33:43 fungi: the GPG keys list security@ansible.com as their email, that counts.
22:33:53 jeblair: indeed!
22:34:02 I started poking at seccomp
22:34:02 cool, this has been productive, thanks :)
22:34:20 I'm happy to give backportpackage a go too, if I can find it for fedora :)
22:34:25 SpamapS: what does that entail?
22:34:35 SpamapS: yeah, i suppose openpgp uids are a relevant contact ;)
22:34:41 jeblair: stracing ansible-playbook and then locking down to just those syscalls
22:34:55 there's a tool to generate a binary file to feed into bwrap
22:35:01 though one of them lists security@, the other jenkins@
22:35:36 pabelanger: you could debootstrap an ubuntu chroot... ;)
22:35:49 auditd will also log seccomp for you if you ask it to
22:35:55 SpamapS: neat... think it'll be workable/useful?
22:36:22 (there's probably some number of file and network operations we'll have to allow in order to have useful untrusted playbooks)
22:36:36 fungi: it looks to be python. Going to clone source
22:37:05 jeblair: yes I do, it's pretty straightforward whitelisting of behavior.
22:37:20 yay!
22:37:37 anything else on this topic then?
22:37:58 (you can blacklist too, but that's for small minds, and we are HUGE minds.. YUGE)
22:38:14 oy
22:38:20 that's my cue
22:38:26 #topic nodepool image testing spec (pabelanger)
22:39:24 so, I wanted to talk a little about this if possible
22:39:41 basically, over the last 3 weeks we worked in openstack-infra to remove puppet from our images
22:40:06 it became obvious pretty fast that it was a little painful to ensure our image didn't break as we started deleting puppet things
22:40:44 I know this was a topic in the past, but was curious if we could maybe spend some time talking about this
22:41:17 as an operator, it is hard to stage images in nodepool before they go live
22:41:28 and I'm not sure how to make this easier moving forward
22:41:37 #link nodepool image testing spec https://review.openstack.org/#/c/139598/
22:41:48 pabelanger: well, the *best* outcome would be gate testing of images, right?
22:42:16 then we can stop problems before they start
22:42:38 Right, that is something I am trying to do now. Hence my recent changes to project-config
22:42:56 but, it would mean running nodepool dsvm against project-config, maybe?
22:43:18 and tripleo and puppet openstack and osa and ....
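(Referring back to the seccomp approach SpamapS described under the bubblewrap topic, namely strace a representative ansible-playbook run and then whitelist only the syscalls it actually used, the moving parts look roughly like the sketch below. File names and the bwrap invocation are illustrative; a real invocation needs more mount and namespace options than shown here.)

    # 1. collect a per-syscall summary from a representative run
    #    (-f follows forks, -c writes the summary to the -o file)
    strace -f -c -o syscall-summary.txt ansible-playbook test-playbook.yaml

    # 2. compile the resulting whitelist into a seccomp BPF program
    #    (for example with the libseccomp bindings); auditd can also log
    #    seccomp denials to catch anything the whitelist missed once enforced

    # 3. hand the compiled filter to bubblewrap on an open file descriptor
    bwrap --ro-bind / / --seccomp 9 9< whitelist.bpf ansible-playbook untrusted-playbook.yaml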
22:43:32 pabelanger: yep; probably with file restrictions since it's so heavyweight. or in experimental pipeline.
22:43:36 however, thinking of general zuul users, I am not sure if users will do that or expect some sort of audit in nodepool / zuul
22:43:48 clarkb: i don't think we use tripleo to build images?
22:44:14 pabelanger: i think many zuul users don't have nearly so complex images. to be fair, we're trying to have simpler images too.
22:44:18 jeblair: the problem isn't the build failing, though; it's tripleo failing when it runs on the image
22:44:22 If it's really really heavyweight, can also be done as a more stable-release paradigm.. let post-commit jobs test all the combos and tag good images when things stabilize.
22:44:27 so just checking if dib works or not isn't going to get us much
22:44:35 s/post-commit/periodic/
22:45:02 clarkb: that situation is pretty untenable from a post-image build testing situation as well. i think an image-validity check has to be relatively low cost.
22:45:10 jeblair: yup I agree
22:45:40 problem is I don't know that such a thing exists. We could start with something arbitrary like jenkins and zuul users exist and then tack on things as we go
22:45:59 v3's secrets thing comes in handy here..
22:46:03 "it uploads and boots"
22:46:26 I think most of this could be covered by a pre-task in zuulv3, ensure directories exist, proper user permissions, etc from DIB creation. But, is that something we'd want all jobs to run? or just 1 time after we upload / launch an image
22:46:34 though that eats into your quota unless you set aside a separate tenant
22:46:42 pabelanger: how does a pre-task help?
22:47:11 jeblair: I might be thinking wrong, but a job will not fail if a pre-task does?
22:47:15 pabelanger: or even make it part of the dib build itself? and inspect the chroot?
22:47:32 pabelanger: right, it will run again. but it will fail the second and third times as well.
22:47:37 pabelanger: and then the job will fail.
22:47:43 fungi: can be a very small flavor at least.
22:47:50 clarkb: I'd love to use the ansible chroot connection to audit, if we want to do that
22:47:58 something a little less error-prone than when we ran devstack smoke on new images before putting them into use ;)
22:48:00 we've been embedding some basic validation checks into an element's post-install.d phase, so we can fail the build and prevent nodepool from uploading and using a broken image
22:48:09 jeblair: k, I wasn't sure if we limited retries on pre-task failures
22:48:38 pabelanger: well, what i'm saying is that by the time zuul runs the job, if it's failing because the image is broken, there's no way to fix that.
22:48:48 adam_g: what kind of things do you check?
22:48:52 pabelanger: I don't know that you'd need to specify the tool. Just have a run stage in dib for checking things then each element can publish its own arbitrary checkers
22:48:54 adam_g: yeah, self-checking elements seem like a great idea
22:49:10 or just use post-install.d
22:49:14 * clarkb catches up to adam_g
22:49:35 jeblair: atm just some things that have broken in recent DIB releases (a functioning python, specifically).. that zuul-cloner at least loads and spits out its usage, etc.
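(A self-checking element of the kind adam_g describes can be as small as a post-install.d script that exits non-zero when something basic is broken, which fails the whole image build before nodepool ever uploads the result. A minimal sketch follows; the element name, script number, and exact checks are illustrative, and the zuul-cloner path is the one used on openstack-infra images, so adjust as needed.)

    #!/bin/bash
    # hypothetical elements/infra-validate/post-install.d/99-validate
    # runs inside the image chroot; any failing command aborts the build
    set -eux

    # a functioning python interpreter
    python --version

    # zuul-cloner at least loads and prints its usage
    /usr/zuul-env/bin/zuul-cloner --help > /dev/null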
22:49:43 adam_g: good ideas
22:50:12 a proper validate-image.d or similar phase in DIB would be preferable
22:50:48 oh also, we then monitor nodepool image age and alert if we don't have an image newer than $foo, so we know when builds start failing
22:50:59 adam_g: very easy to add phases
22:51:22 and I think it's a good idea
22:51:39 we also do test our dib elements in the nodepool integration job, so can easily reject changes which would cause images to fail such self-validation
22:51:56 maybe have those checks inside the chroot but after everything else including cleanup
22:52:36 fungi: I don't think we do a full build with our elements, it's minimal
22:53:12 So, if we want to do this pre-commit, that is not a problem. I already have some thoughts / plans to help make project-config better. I mostly wanted to see if we want to add this functionality natively into nodepool / zuul as an audit / testing stage
22:53:13 clarkb: because of git clones?
22:53:26 clarkb: oh, well, i suppose we could require that at least those minimal elements pass their validation
22:53:33 then have the operator either manually promote images or auto-promote if the audit passed
22:53:36 pabelanger: i feel like with our limited developer bandwidth at the moment, the best bang for the buck is gate jobs and dib validation. i think that nodepool image validation is workable and i'm not opposed to implementing it (i think that spec was heading in a good direction), but from the openstack-infra operator perspective, i don't think it's critical to our workflow if we are able to do the other checks.
22:54:21 it does seem like something an industrious downstream consumer could contribute if they need it
22:54:24 fungi: yes, we can add a nodepool-dsvm job to project-config, but full builds today are 1hr+
22:54:39 jeblair: okay, that works for now. I'll work on this from the openstack-infra POV today
22:54:42 jeblair: our builds take like an hour right now with a hot cache
22:54:52 not sure how feasible a full build in the integration job is as a result
22:55:02 if we drop cache-devstack / openstack-repos (what I am testing) it is about 20mins
22:55:33 again, though, if this is because of iterating over all 1.7k (or however many we're up to now) git repos to see whether they need to be updated, that's likely to get much better if we can drop git caches from our images with zuul v3
22:55:39 clarkb: i'd say 1 hour is in the realm of feasible, certainly for an experimental pipeline job.
22:55:59 jeblair: it will be longer on test nodes with a cold cache
22:56:07 clarkb: test nodes have a warm cache :)
22:56:19 sure, but today there are only 2 things that depend on openstack cached repos at build time, zuul-env and tempest-env
22:56:34 which we could update to git clone from another source
22:56:48 we should probably look into moving the tempest-env addition to job runtime
22:57:15 pabelanger: agreed, then the only issue becomes how important it is that we validate the cache on image builds (maybe it's not that important)
22:57:29 creating zuul-env at job runtime is more likely to be a catch-22 but maybe we make it stop requiring a local cache
22:57:43 jeblair: yes, so far, that hasn't been our issue
22:58:11 unless a repo didn't mirror from gerrit to git.o.o properly
22:58:22 pabelanger: yeah, i can get behind severing those deps and dropping the repo cache from the gate testing
22:58:49 fungi: pabelanger's proposed change just clones master directly, which i think is reasonable. it's functionally the same as what we're doing now.
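(The "just clones master directly" approach for zuul-env boils down to building the virtualenv from an upstream clone instead of the on-image repo cache. A rough sketch follows; it is not necessarily what the proposed project-config change does, and the paths and cache location are assumptions.)

    # instead of installing zuul from the image's cached repo (assumed under /opt/git),
    # clone it straight from the canonical source and build the venv from that
    git clone https://git.openstack.org/openstack-infra/zuul /tmp/zuul-src
    virtualenv /usr/zuul-env
    /usr/zuul-env/bin/pip install /tmp/zuul-src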
22:58:49 that totally brings the runtime into a reasonable realm for check jobs, in my opinion
22:59:11 fungi: yeah, we're into the territory where we can run it on a wider variety of changes
22:59:17 but if all we check is whether it can build, we haven't done much to fix real problems?
22:59:28 if something fails to build we continue to use the old image happily
22:59:36 if the elements incorporate validation
22:59:43 next step would be post-build auditing
22:59:46 i think that was the assumption
22:59:54 fungi: if elements incorporate validation we can likely even defer that to build time too
22:59:56 for the same reason
23:00:08 (though in check it will make review easier)
23:00:09 or upload the image somewhere (as SpamapS suggested, or to a devstack, if we're really patient)
23:00:21 we're out of time
23:00:28 well, if we defer validation to production build time (is that what you mean?) then we get to merge changes which break image builds
23:00:33 thanks everyone!
23:00:39 fungi: yes but don't impact production
23:00:41 thanks jeblair!
23:00:44 #endmeeting