22:05:40 #startmeeting zuul
22:05:41 Meeting started Mon May 1 22:05:40 2017 UTC and is due to finish in 60 minutes. The chair is jeblair. Information about MeetBot at http://wiki.debian.org/MeetBot.
22:05:42 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
22:05:44 The meeting name has been set to 'zuul'
22:05:52 #link agenda https://wiki.openstack.org/wiki/Meetings/Zuul
22:06:11 #topic Status updates: Zuul test enablement
22:06:20 SpamapS is on a roll here
22:06:57 there are still some more remaining if folks are looking for fun and educational tasks
22:07:12 not sure there's much else to add on this
22:07:49 #topic Status updates: Zuul sample jobs
22:07:59 pabelanger: you restarted zuulv3-dev, right?
22:08:08 yes, last week
22:08:53 pabelanger: what's the status with the sample jobs; should we be writing or reviewing anything?
22:09:14 Also started on our first stdlib: https://review.openstack.org/#/c/459066/ was going to ping some people this week to bikeshed on it, specifically how we handle variables
22:10:06 oh cool, so we're ready to start reviewing that
22:10:20 would like to engage BonnyCI folks to see if they could use that if we merged it or not
22:10:30 i think shortly afterwards, we need to start thinking about how to make that reusable
22:10:52 in what sense?
22:11:12 yeah, SpamapS and I would like to get other BonnyCI folks engaged on the standard lib front
22:11:36 459066 should be a simple one to tackle first
22:11:45 it has default users, and ports for zuul_stream
22:11:47 whether it should go in a new repo, or be installed with zuul as a special case. the "playbooks/" directory within zuul is not necessarily the right place for it (it will contain playbooks for running and testing zuul, not necessarily things that should be in the stdlib)
22:12:14 agree
22:12:26 (i don't think we need to block on that -- just indicating that it's destined for the stdlib vs something else is good enough for us to iterate on it for now)
22:13:32 jlk: ++
22:13:46 anything else folks should be aware of on this?
22:14:15 #info starting to organize "standard library" roles/playbooks in https://review.openstack.org/459066
22:15:22 #topic Status updates: Github
22:15:51 oh hey, that's me.
22:16:07 jlk, mordred, and I had a significant conversation about what the config syntax should look like last week
22:16:14 we had an excellent chat on Friday about the patch set, and I got some direction to move in
22:16:28 (that should be archived by eavesdrop.openstack.org for posterity)
22:16:46 I've made a lot of progress on that front, another pass of rebasing on top of the earlier changes
22:17:06 I should have the whole patch set ready to put up for review again later today if I'm lucky.
22:17:12 neat!
22:17:43 i'll continue to try to review down the stack toward the tip (i've only gotten about 7 patches in so far), as well as loop back to the top as things are addressed
22:17:55 and jamielennox is going to have an implementation of zv3 with our github patches running hooked up to github, so we can do some extra validation.
22:18:35 jlk, jamielennox: great! (caveat: there's still some known incorrect behavior and missing security bits, but i think you know that)
22:19:02 yup!
22:19:38 mordred has been chiming in on reviews too; if anyone else wants to review, please feel free to do so, and let me know if you want me to make sure we wait for your reviews before merging
22:20:57 #info github series is being rebased and revised to address review comments
22:21:04 anything else on this?
22:21:58 #topic Status updates: Bubblewrap
22:22:25 SpamapS, pabelanger: can one of you fill us in on the latest here?
22:23:12 i heard we're no longer concerned about whether the ubuntu ppa system is (still) working
22:23:26 mordred suggested we might use the PPA now that vhd-utils built over the weekend
22:23:26 fungi: yeah, apparently it's working now? so ppas are an option?
22:23:40 I have nothing new to report
22:24:01 I am not sure if anybody is working on backporting that or not
22:24:02 if PPAs are in fact building, then I'd suggest we just use a PPA
22:24:11 so maybe uploading the source package from stretch or yakkety to rebuild would be an option
22:24:12 until xenial-backports gets bubblewrap
22:24:18 which I do expect to happen at some point
22:24:44 I can see about doing an upload, to my own PPA first
22:24:49 okay, ppa until xenial-backports sounds like it's an option
22:25:05 i have a feeling pushing that source package into a ppa would "just work[tm]" since the claimed deps are already met on xenial
22:25:06 There's a script in ubuntu-dev-tools that will do it for you even
22:25:33 So, has anybody seen this: https://launchpad.net/~ansible/+archive/ubuntu/bubblewrap
22:25:34 backportpackage -d xenial -u ppa:foo bubblewrap <-- that's all you need
22:25:38 can we use the same ppa we use for vhd-util?
22:25:51 which is a xenial package
22:25:59 maybe we can just work with ansible on it
22:26:05 i have not seen that.
22:26:12 pabelanger: that's also under Ansible's control. So it may not match what infra wants.
22:26:13 ppa:ansible/bubblewrap
22:26:21 but it's likely exactly the same thing
22:26:37 right, but we could audit and copy into our PPA
22:26:49 btw, are we positive that's really ansible?
22:27:01 i think we should create a new ppa under the same team
22:27:05 link at https://launchpad.net/~openstack-ci-core
22:27:20 fungi: that seems safest and most stable
22:27:33 the existing ppa is "VHD Util" which is probably surprising if it contains a package set for bubblewrap
22:27:35 ya, we can check GPG keys first
22:28:07 literally make a PPA and run that backportpackage command.. that's all you need to do
22:28:09 #link https://launchpad.net/~openstack-ci-core create a new ppa here, adjacent to "VHD Util"
22:28:49 solution so simple, it's probably time to stop debating it until we find out there's a problem ;)
22:28:51 Can verify that the source package in the ansible PPA is a no-change backport
22:28:59 should we call the ppa 'zuul-deps' or 'bubblewrap'?
22:29:14 would be safe to just copy that into a PPA and we don't have to wait for build
22:29:46 copying their ppa to one under our control seems fine to me. relying on their ppa indefinitely does not
22:30:02 SpamapS: by "Can verify" do you mean that you have verified or that one is able to verify?
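(For reference, the backport flow discussed above amounts to little more than the sketch below. It assumes the new PPA ends up under the openstack-ci-core team with a name like ppa:openstack-ci-core/bubblewrap, which was not settled in the meeting, and that backportpackage from the ubuntu-dev-tools package is run on an Ubuntu machine with upload rights to that PPA.)

    # install the backporting helper
    sudo apt-get install ubuntu-dev-tools
    # fetch the current bubblewrap source package, retarget it at xenial,
    # and upload it to the team PPA for Launchpad to build
    backportpackage -d xenial -u ppa:openstack-ci-core/bubblewrap bubblewrap

(Alternatively, as noted in the discussion, the already-verified source package in ppa:ansible/bubblewrap can simply be copied into the team PPA through Launchpad, avoiding the wait for a fresh build.)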
22:30:25 but we have the source package they used and assume the canonical build infrastructure is reliable, so can trust what's currently there at least
22:30:44 http://paste.openstack.org/show/608551/
22:30:57 jeblair: have verified
22:31:06 cool
22:31:21 also neat that they seem to already be using or otherwise want it
22:31:33 (assuming that's really ansible)
22:31:35 #action pabelanger create new ppa under openstack-ci-core and add bubblewrap to it; copy existing source package from ansible bubblewrap ppa.
22:31:41 does that look correct ^?
22:32:10 sure
22:32:11 we don't, however, have any proof that launchpad.net/~ansible _is_ ansible. That GPG key is single-purpose for debs
22:32:19 but again, safe to copy the debs for speed
22:32:34 yep. out of curiosity, i'll ask 'em next time i see 'em. :)
22:32:45 jeblair: indeed, tell them to sign their keys ;)
22:33:13 (and maybe also add contact info to that lp profile)
22:33:23 I am guessing it is driven by jenkins, based on SSH keys added to https://launchpad.net/~ansible
22:33:28 yup
22:33:32 that was my guess as well
22:33:41 hopefully we can move this along quickly now -- we have patches we should be able to test as soon as the ppa is available
22:33:43 fungi: the GPG keys list security@ansible.com as their email, that counts.
22:33:53 jeblair: indeed!
22:34:02 I started poking at seccomp
22:34:02 cool, this has been productive, thanks :)
22:34:20 I'm happy to give backportpackage a go too, if I can find it for fedora :)
22:34:25 SpamapS: what does that entail?
22:34:35 SpamapS: yeah, i suppose openpgp uids are a relevant contact ;)
22:34:41 jeblair: stracing ansible-playbook and then locking down to just those syscalls
22:34:55 there's a tool to generate a binary file to feed into bwrap
22:35:01 though one of them lists security@, the other jenkins@
22:35:36 pabelanger: you could debootstrap an ubuntu chroot... ;)
22:35:49 auditd will also log seccomp for you if you ask it to
22:35:55 SpamapS: neat... think it'll be workable/useful?
22:36:22 (there's probably some number of file and network operations we'll have to allow in order to have useful untrusted playbooks)
22:36:36 fungi: it looks to be python. Going to clone source
22:37:05 jeblair: yes I do, it's pretty straightforward whitelisting of behavior.
22:37:20 yay!
22:37:37 anything else on this topic then?
22:37:58 (you can blacklist too, but that's for small minds, and we are HUGE minds.. YUGE)
22:38:14 oy
22:38:20 that's my cue
22:38:26 #topic nodepool image testing spec (pabelanger)
22:39:24 so, I wanted to talk a little about this if possible
22:39:41 basically, over the last 3 weeks we worked in openstack-infra to remove puppet from our images
22:40:06 it became obvious pretty fast that it was a little painful to ensure our image didn't break as we started deleting puppet things
22:40:44 I know this was a topic in the past, but was curious if we could maybe spend some time talking about this
22:41:17 as an operator, it is hard to stage images in nodepool before they go live
22:41:28 and I'm not sure how to make this easier moving forward
22:41:37 #link nodepool image testing spec https://review.openstack.org/#/c/139598/
22:41:48 pabelanger: well, the *best* outcome would be gate testing of images, right?
22:42:16 then we can stop problems before they start
22:42:38 Right, that is something I am trying to do now. Hence my recent changes to project-config
22:42:56 but, it would mean running nodepool dsvm against project-config, maybe?
22:43:18 and tripleo and puppet openstack and osa and ....
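(Referring back to the seccomp approach SpamapS described under the bubblewrap topic, namely strace a representative ansible-playbook run and then whitelist only the syscalls it actually used, the moving parts look roughly like the sketch below. File names and the bwrap invocation are illustrative; a real invocation needs more mount and namespace options than shown here.)

    # 1. collect a per-syscall summary from a representative run
    #    (-f follows forks, -c writes the summary to the -o file)
    strace -f -c -o syscall-summary.txt ansible-playbook test-playbook.yaml

    # 2. compile the resulting whitelist into a seccomp BPF program
    #    (for example with the libseccomp bindings); auditd can also log
    #    seccomp denials to catch anything the whitelist missed once enforced

    # 3. hand the compiled filter to bubblewrap on an open file descriptor
    bwrap --ro-bind / / --seccomp 9 9< whitelist.bpf ansible-playbook untrusted-playbook.yaml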
22:43:32 pabelanger: yep; probably with file restrictions since it's so heavyweight. or in experimental pipeline.
22:43:36 however, thinking of general zuul users, I am not sure if users will do that or expect some sort of audit in nodepool / zuul
22:43:48 clarkb: i don't think we use tripleo to build images?
22:44:14 pabelanger: i think many zuul users don't have nearly so complex images. to be fair, we're trying to have simpler images too.
22:44:18 jeblair: the problem isn't the build failing, though; it's tripleo failing when it runs on the image
22:44:22 If it's really really heavyweight, can also be done as a more stable-release paradigm.. let post-commit jobs test all the combos and tag good images when things stabilize.
22:44:27 so just checking if dib works or not isn't going to get us much
22:44:35 s/post-commit/periodic/
22:45:02 clarkb: that situation is pretty untenable from a post-image build testing situation as well. i think an image-validity check has to be relatively low cost.
22:45:10 jeblair: yup I agree
22:45:40 problem is I don't know that such a thing exists. We could start with something arbitrary like jenkins and zuul users exist and then tack on things as we go
22:45:59 v3's secrets thing comes in handy here..
22:46:03 "it uploads and boots"
22:46:26 I think most of this could be covered by a pre-task in zuulv3, ensure directories exist, proper user permissions, etc from DIB creation. But, is that something we'd want all jobs to run? or just 1 time after we upload / launch an image
22:46:34 though that eats into your quota unless you set aside a separate tenant
22:46:42 pabelanger: how does a pre-task help?
22:47:11 jeblair: I might be thinking wrong, but a job will not fail if a pre-task does?
22:47:15 pabelanger: or even make it part of the dib build itself? and inspect the chroot?
22:47:32 pabelanger: right, it will run again. but it will fail the second and third times as well.
22:47:37 pabelanger: and then the job will fail.
22:47:43 fungi: can be a very small flavor at least.
22:47:50 clarkb: I'd love to use the ansible chroot connection to audit, if we want to do that
22:47:58 something a little less error-prone than when we ran devstack smoke on new images before putting them into use ;)
22:48:00 we've been embedding some basic validation checks into an element's post-install.d phase, so we can fail the build and prevent nodepool from uploading and using a broken image
22:48:09 jeblair: k, I wasn't sure if we limited retries on pre-task failures
22:48:38 pabelanger: well, what i'm saying is that by the time zuul runs the job, if it's failing because the image is broken, there's no way to fix that.
22:48:48 adam_g: what kind of things do you check?
22:48:52 pabelanger: I don't know that you'd need to specify the tool. Just have a run stage in dib for checking things then each element can publish its own arbitrary checkers
22:48:54 adam_g: yeah, self-checking elements seem like a great idea
22:49:10 or just use post-install.d
22:49:14 * clarkb catches up to adam_g
22:49:35 jeblair: atm just some things that have broken in recent DIB releases (a functioning python, specifically).. that zuul-cloner at least loads and spits out its usage, etc.
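(A self-checking element of the kind adam_g describes can be as small as a post-install.d script that exits non-zero when something basic is broken, which fails the whole image build before nodepool ever uploads the result. A minimal sketch follows; the element name, script number, and exact checks are illustrative, and the zuul-cloner path is the one used on openstack-infra images, so adjust as needed.)

    #!/bin/bash
    # hypothetical elements/infra-validate/post-install.d/99-validate
    # runs inside the image chroot; any failing command aborts the build
    set -eux

    # a functioning python interpreter
    python --version

    # zuul-cloner at least loads and prints its usage
    /usr/zuul-env/bin/zuul-cloner --help > /dev/null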
22:49:43 adam_g: good ideas
22:50:12 a proper validate-image.d or similar phase in DIB would be preferable
22:50:48 oh also, we then monitor nodepool image age and alert if we don't have an image newer than $foo, so we know when builds start failing
22:50:59 adam_g: very easy to add phases
22:51:22 and I think it's a good idea
22:51:39 we also do test our dib elements in the nodepool integration job, so can easily reject changes which would cause images to fail such self-validation
22:51:56 maybe have those checks inside the chroot but after everything else including cleanup
22:52:36 fungi: I don't think we do a full build with our elements, it's minimal
22:53:12 So, if we want to do this pre-commit, that is not a problem. I already have some thoughts / plans to help make project-config better. I mostly wanted to see if we want to add this functionality natively into nodepool / zuul as an audit / testing stage
22:53:13 clarkb: because of git clones?
22:53:26 clarkb: oh, well, i suppose we could require that at least those minimal elements pass their validation
22:53:33 then have the operator either manually promote images or auto-promote if the audit passed
22:53:36 pabelanger: i feel like with our limited developer bandwidth at the moment, the best bang for the buck is gate jobs and dib validation. i think that nodepool image validation is workable and i'm not opposed to implementing it (i think that spec was heading in a good direction), but from the openstack-infra operator perspective, i don't think it's critical to our workflow if we are able to do the other checks.
22:54:21 it does seem like something an industrious downstream consumer could contribute if they need it
22:54:24 fungi: yes, we can add a nodepool-dsvm job to project-config, but full builds today are 1hr+
22:54:39 jeblair: okay, that works for now. I'll work on this from the openstack-infra POV today
22:54:42 jeblair: our builds take like an hour right now with a hot cache
22:54:52 not sure how feasible a full build in the integration job is as a result
22:55:02 if we drop cache-devstack / openstack-repos (what I am testing) it is about 20mins
22:55:33 again, though, if this is because of iterating over all 1.7k (or however many we're up to now) git repos to see whether they need to be updated, that's likely to get much better if we can drop git caches from our images with zuul v3
22:55:39 clarkb: i'd say 1 hour is in the realm of feasible, certainly for an experimental pipeline job.
22:55:59 jeblair: it will be longer on test nodes with a cold cache
22:56:07 clarkb: test nodes have a warm cache :)
22:56:19 sure, but today there are only 2 things that depend on openstack cached repos at build time, zuul-env and tempest-env
22:56:34 which we could update to git clone from another source
22:56:48 we should probably look into moving the tempest-env addition to job runtime
22:57:15 pabelanger: agreed, then the only issue becomes how important it is that we validate the cache on image builds (maybe it's not that important)
22:57:29 creating zuul-env at job runtime is more likely to be a catch-22 but maybe we make it stop requiring a local cache
22:57:43 jeblair: yes, so far, that hasn't been our issue
22:58:11 unless a repo didn't mirror from gerrit to git.o.o properly
22:58:22 pabelanger: yeah, i can get behind severing those deps and dropping the repo cache from the gate testing
22:58:49 fungi: pabelanger's proposed change just clones master directly, which i think is reasonable. it's functionally the same as what we're doing now.
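(The "just clones master directly" approach for zuul-env boils down to building the virtualenv from an upstream clone instead of the on-image repo cache. A rough sketch follows; it is not necessarily what the proposed project-config change does, and the paths and cache location are assumptions.)

    # instead of installing zuul from the image's cached repo (assumed under /opt/git),
    # clone it straight from the canonical source and build the venv from that
    git clone https://git.openstack.org/openstack-infra/zuul /tmp/zuul-src
    virtualenv /usr/zuul-env
    /usr/zuul-env/bin/pip install /tmp/zuul-src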
22:58:49 that totally brings the runtime into a reasonable realm for check jobs, in my opinion
22:59:11 fungi: yeah, we're into the territory where we can run it on a wider variety of changes
22:59:17 but if all we check is whether it can build, we haven't done much to fix real problems?
22:59:28 if something fails to build we continue to use the old image happily
22:59:36 if the elements incorporate validation
22:59:43 next step would be post-build auditing
22:59:46 i think that was the assumption
22:59:54 fungi: if elements incorporate validation we can likely even defer that to build time too
22:59:56 for the same reason
23:00:08 (though in check it will make review easier)
23:00:09 or upload the image somewhere (as SpamapS suggested, or to a devstack, if we're really patient)
23:00:21 we're out of time
23:00:28 well, if we defer validation to production build time (is that what you mean?) then we get to merge changes which break image builds
23:00:33 thanks everyone!
23:00:39 fungi: yes but don't impact production
23:00:41 thanks jeblair!
23:00:44 #endmeeting