20:04:12 <lifeless> #startmeeting tripleo
20:04:13 <openstack> Meeting started Mon Jul 15 20:04:12 2013 UTC.  The chair is lifeless. Information about MeetBot at http://wiki.debian.org/MeetBot.
20:04:14 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
20:04:17 <openstack> The meeting name has been set to 'tripleo'
20:04:27 <lifeless> sorry I'm late; had a non-sleeping baby night :(
20:04:45 <SpamapS> lifeless: had one of those too. Pondering a post meeting nap :)
20:05:26 <lifeless> mmm nap
20:05:35 <jog0> o/
20:05:41 <lifeless> #topic agenda
20:05:42 <lifeless> bugs
20:05:43 <lifeless> Grizzly test rack status
20:05:43 <lifeless> CI virtualized testing progress
20:05:43 <lifeless> open discussion
20:05:49 <lifeless> #topic bugs
20:06:05 <lifeless> https://bugs.launchpad.net/tripleo/
20:06:19 <lifeless> #link https://bugs.launchpad.net/tripleo/
20:06:23 <lifeless> #link https://bugs.launchpad.net/diskimage-builder/
20:06:55 <lifeless> #link https://bugs.launchpad.net/os-refresh-config
20:07:10 <lifeless> #link https://bugs.launchpad.net/os-config-applier
20:07:11 <SpamapS> as usual, my two bugs are both a bit lacking in context
20:07:35 <lifeless> SpamapS: is there a config-collector tracker now?
20:08:04 <SpamapS> lifeless: no, I set it aside a bit to get my tripleo setup in order for testing it..
20:08:08 <SpamapS> that was a week ago
20:08:10 <SpamapS> have not booted a VM yet. :-/
20:08:27 <lifeless> ok. Would you like me to do the LP administrivia?
20:09:00 <SpamapS> lifeless: yeah, I don't think I'm admin of the teams anyway
20:10:11 <lifeless> SpamapS: will do. (You don't need to be admin of them to make and hand off a project)
20:12:09 <lifeless> ok so
20:12:22 <lifeless> hmm, bug https://bugs.launchpad.net/tripleo/+bug/1182241
20:12:25 <uvirtbot> Launchpad bug 1182241 in tripleo "first-boot.d rules are running on every boot" [Critical,Triaged]
20:12:56 <lifeless> I think that's fixed, should have been closed.
20:13:17 <lifeless> I moved the rules to orc scripts
20:13:29 <lifeless> and made them idempotent
20:13:46 <lifeless> we can't delete the first-boot feature yet
20:13:55 <lifeless> perhaps we should deprecate it though?
20:14:29 <SpamapS> lifeless: indeed I think it may need to stay around with big ugly DEPRECATED warnings for a while .. since we seem to have some adoption now.
20:14:30 <lifeless> salv-orlando can't reproduce the quantum load issue
20:14:43 <stevemar2> ayoung: why would having two delegated auth mechanisms be bad?
20:14:50 <lifeless> in bug 1184484
20:14:52 <uvirtbot> Launchpad bug 1184484 in tripleo "Quantum default settings will cause deadlocks due to overflow of sqlalchemy_pool" [Critical,Triaged] https://launchpad.net/bugs/1184484
20:14:58 <lifeless> stevemar2: this is a different meeting
20:15:08 <stevemar2> lifeless, wrong window, sry
20:15:10 <ayoung> stevemar2, having a broken delegation mechanism would be bad
20:15:13 <lifeless> stevemar2: please use the dev channel for out-of-meeting chat
20:15:16 <lifeless> stevemar2: np, thanks!
20:15:48 <ayoung> stevemar2, and having two mechanisms is fine, but duplication in general leads to fixes in one needing to be made in the other as well
20:15:51 <lifeless> We may need to do a manual update of the control plane in the POC to help him reproduce, or perhaps we can trigger it in virt.
20:16:01 <lifeless> ayoung: hey, you too please! -> ~-meeting.
20:16:10 <ayoung> lifeless, sorry
20:16:31 <lifeless> bug 1199412
20:16:33 <uvirtbot> Launchpad bug 1199412 in tripleo "seed vm build fails during cinder service install" [Critical,Triaged] https://launchpad.net/bugs/1199412
20:17:07 <lifeless> That's fixed too, isn't it?
20:17:46 <SpamapS> lifeless: yes
20:17:59 <lifeless> and so is bug 1199568
20:18:01 <uvirtbot> Launchpad bug 1199568 in tripleo "keystone service not running during wipe-openstack" [Critical,Triaged] https://launchpad.net/bugs/1199568
20:19:08 <lifeless> jog0: you've got docs somewhere on the fake virt driver for nova right?
20:19:26 <jog0> lifeless: yeah
20:19:29 <lifeless> jog0: so that we can try booting 100 vms at once without having a 100-vm capable control plane
20:19:47 <lifeless> jog0: perhaps you'd like to try reproducing 1184484 ?
20:20:10 <jog0> https://github.com/openstack-dev/devstack/commit/baf37ea81720982050eceea2b1b1e9bbdf6f0c94
20:20:29 <lifeless> jog0: just take an overcloud and change the virt driver on the compute node then throw a big boot request at it
20:21:03 <lifeless> ok, that's all the crits I can see
20:21:06 <jog0> lifeless: sounds good to me
20:21:16 <lifeless> anyone have high bugs they want to discuss?
20:22:09 <lifeless> nada, ok.
20:22:17 <jog0> lifeless: getting a gate up?
20:22:25 <jog0> not sure if that counts as a bug
20:22:27 <SpamapS> https://bugs.launchpad.net/tripleo/+bug/1201056
20:22:29 <uvirtbot> Launchpad bug 1201056 in tripleo "init-nova requires internet access" [High,Triaged]
20:22:29 <SpamapS> fix released right?
20:22:36 <lifeless> SpamapS: yes plox
20:22:45 <lifeless> jog0: yeah, other business
20:22:53 <lifeless> #topic grizzly POC rack status
20:23:02 <lifeless> SpamapS: you were going to file some bugs about this?
20:23:08 <SpamapS> I'm having problems right now with setup-baremetal ...
20:23:33 <SpamapS> lifeless: drafting them now.
20:23:44 <lifeless> SpamapS: cool
20:24:02 <lifeless> #action SpamapS to finish drafting the bugs about long term rack running
20:24:21 <lifeless> we had the control plane for the POC go offline mid last week
20:24:45 <lifeless> the good news is that 'nova boot' on the undercloud brought it right back.
20:25:28 <lifeless> The bad news is that I suspect the [I think fixed in nova trunk - devananda will know] 'oh look IPMI didn't respond quickly, clearly the machine wants to be off' bug turned it off in the first place.
20:25:50 <lifeless> we haven't confirmed that via logs.
20:26:15 <lifeless> I'm not sure if we need to bother, since the only reason it was a fire drill was this being a non-HA setup.
20:26:27 <lifeless> thoughts?
20:27:01 <SpamapS> I think it is worth confirming that is what happened.
20:27:13 <SpamapS> It's a serious enough thing that we don't want to gloss over it because it is "likely"
20:27:29 <SpamapS> Also if we had better monitoring on our POC we'd have known sooner.
20:27:43 <lifeless> SpamapS: entirely agreed.
20:27:51 <lifeless> SpamapS: perhaps you could include a bug about both of those points.
20:28:00 <lifeless> SpamapS: in your drafts
20:28:44 <SpamapS> https://bugs.launchpad.net/tripleo/+bugs?field.tag=poc
20:30:18 <SpamapS> lifeless: there, I think now I have all of them
20:30:57 <lifeless> #link https://bugs.launchpad.net/tripleo/+bugs?field.tag=poc
20:31:16 <lifeless> ok
20:31:46 <lifeless> so I think we should treat these as indeed critical and move on them after the current crits are closed
20:31:50 <lifeless> which are SpamapS's two fuzzy ones
20:32:04 <lifeless> #topic CI virtualized testing progress
20:32:10 <pleia2> hello
20:32:15 <lifeless> pleia2: how goes it? We synced a little on the weekend
20:32:46 <pleia2> yeah, so that was helpful in understanding some of the networking stuff that I was tripping up on, now just nailing down the specific things I want to run for this testing
20:33:15 <lifeless> pleia2: is your kvm seed setup working - can you nova boot a bm node?
20:33:32 <pleia2> lifeless: unfortunately not :(
20:33:42 <pleia2> in the middle of backing out some changes
20:34:07 <lifeless> pleia2: perhaps we should pair up again after this meeting?
20:34:31 <pleia2> lifeless: yeah, that would be good (but lunch for me first)
20:34:41 <lifeless> pleia2: kk; ping me maybe.
20:34:45 <lifeless> #topic open discussion
20:34:45 <pleia2> will do
20:35:07 <lifeless> jog0: You wanted to talk CI gates
20:35:14 <SpamapS> lifeless: I feel like we need to start pushing harder down the path toward gating.
20:35:19 <dkehn> https://review.openstack.org/#/c/30441/ fingers crossed again
20:35:59 <SpamapS> There are enough people involved, and enough moving parts, that breaking stuff is worse than not moving forward at the highest possible speed.
20:36:36 <lifeless> so we break for two reasons. Other projects. Our stuffups.
20:36:54 <pleia2> are there any small gating tests that can help that don't require my bit yet?
20:36:56 <lifeless> My sense is that in the last 2 weeks it's been about a 50-50 split
20:37:36 <jog0> SpamapS: ++
20:37:36 <lifeless> to gate we need all our components mutually gated/low risk of random breaks/or used via releases
20:38:27 <lifeless> uhm
20:38:49 <jog0> what kinds of issues have been breaking trunk? Perhaps there is a smaller gate we can start with as pleia2 suggested
20:38:53 <lifeless> e.g. diskimage-builder - to gate on that we need to get the ubuntu and fedora images it needs cached into the CI infrastructure so we're not dependent on internet access.
20:39:09 <jog0> maybe just getting through DIB or something
20:39:22 <lifeless> for tie we need the git caching derekh is working on, and pip caching, which I put up an etherpad designing
20:39:37 <lifeless> jog0: so we do need that; but note that dib hasn't broken.
20:40:09 <jog0> lifeless: I was referring to the image elements aspect, so what generally breaks.
20:40:14 <lifeless> we had a tie rule break (cinder builds), we had neutron break (the rename of the client) and then (the quantum-server compatibility script broke)
20:40:58 <lifeless> anyhow, just to say - I'm totally +100 on CI
20:41:41 <jog0> how much of that can be detected with just getting a seed-stack vm running?
20:41:58 <lifeless> the cinder rule break would have been detected
20:42:02 <jog0> (and not any fake baremetal booting)
20:42:16 <lifeless> both neutron failures were silent until we tried to do stuff in anger
20:42:44 <lifeless> jog0: pleia2: so - we can indeed get some benefit from smaller gate checks.
20:43:00 <jog0> is it possible to do seed-stack + tempest?
20:43:18 <lifeless> jog0: in principle yes.
20:43:27 <jog0> err rather would that help us
20:43:29 <lifeless> jog0: in the current gate I very much doubt it.
20:43:32 <rwsu> assuming we can get toci back in working order, is it possible to ask everyone to run it before checking in new changes? if there is a smaller set of tests folks can run, i would be in favor of that too
20:43:51 <SpamapS> Another thing that might help is getting error reporting into the heat templates (via waitconditions + orc)
20:44:04 <lifeless> rwsu: what would be awesome would be if toci just subscribed to branches proposed to tie/the/dib/oac/orc/occ
20:44:16 <jog0> lifeless: ++
20:45:12 <lifeless> so at the moment, the toci folk are carrying the CI burden in chase-mode for most of tripleo, and pleia2 is the only person working directly on gating infrastructure
20:45:25 <lifeless> pleia2 is working on /nova/ gating infrastructure atm
20:45:28 <rwsu> lifeless: good idea, it would be nice to have it report yea or nay in the review process
20:45:30 <pleia2> and I'm getting impacted by breakage too, so it's a bit slow going :(
20:45:30 <lifeless> because nova bm isn't gated
20:46:10 <lifeless> a consequence of that (that it will gate nova) is that it's going to have to be super reliable
20:46:31 <lifeless> which probably means stevebaker's packaging patch set, derekh's git cache stuff, and a pip cache
20:46:44 <lifeless> plus other ancillary changes will all be needed just to get that
20:47:07 <lifeless> I'm going to suggest that pushing straight at that target is better than picking small side gates
20:47:55 <lifeless> because I don't think small side gates will catch any breakage that actually impacts pleia2's work - she has been hitting all the quantum rename stuff, and also issues with running in lxc not kvm etc.
20:48:23 <lifeless> opinion: we should make bugs that will prevent her gate being activated critical
20:48:37 <lifeless> because getting CI for us is now critical
20:48:54 <jog0> lifeless: ++
20:49:37 <lifeless> we may have program status soon
20:49:50 <lifeless> if we do, we can ask for -infra help gating everything
20:49:53 <lifeless> which we can't at the moment
20:50:37 <pleia2> I'm going to be out of the office next week for OSCON (checking in, but solid testing+work will be hard) but I hope to have enough progress this week that I have some kind of dependency list of what we'll need in the gate (caches, etc)
20:50:55 <rwsu> is someone already working on the pip cache?
20:51:24 <SpamapS> infra has a nice pip cache :)
20:51:51 <lifeless> rwsu: we have a design nutted out - http://etherpad.openstack.org/TripleO-pip-mirror
20:52:17 <rwsu> nice
20:52:47 <lifeless> hmm last minutes
20:52:51 <lifeless> I followed up on SpamapS's sprint idea
20:53:11 <lifeless> one thing is that a bunch of folk have said 'after the beta milestone please'
20:53:20 <lifeless> implicitly, just by 'this date is better'
20:53:38 <lifeless> how important is it that we be sprinting before the milestone (to get things in order for it)
20:53:48 <lifeless> vs that we be sprinting together (to get things together)
20:54:31 <SpamapS> Hrm
20:54:52 <SpamapS> Well my thinking was to get together to push things into h3.
20:54:58 <lifeless> indeed
20:55:04 <lifeless> which is sept 5th
20:55:18 <SpamapS> But if people would rather get together to hash out ideas for working on before the icehouse summit.. I would find value in that as well.
20:55:45 <jog0> lifeless: why not something over sept 5th so we can both push to finish features and then switch to finding/triaging and fixing bugs?
20:56:09 <jog0> (so we can make things work for the burners)
20:56:14 <lifeless> jog0: I have a conference 6/7/8 sept
20:56:14 <SpamapS> I get the feeling that people not explicitly "working on tripleo" are constrained by their primary focus.
20:56:48 <lifeless> jog0: though like mordred I don't strictly need to be at the sprint, I'd really /like/ to be there.
20:57:19 <lifeless> jog0: aug 19th is early enough for most burners I think, at least for 1/2 the week.
20:57:28 <SpamapS> lifeless: I think we'd be less productive without you
20:57:29 <lifeless> 26th is the burn start
20:57:56 <jog0> aug 19th doesn't work for me, but if it works for everyone else I will just have to skip it
20:58:01 <SpamapS> And yeah this is a short sprint.. 19/20 gives them 6 days of pre-burn prep time :)
20:58:29 <jog0> what about sept 2/3?
20:58:31 <devananda> lifeless: 19/20 is probably no good for either monty or me, FWIW
20:58:39 <lifeless> devananda: ack
20:58:57 <lifeless> devananda: mordred had indicated he could do mon maybe tuesday
20:59:08 <lifeless> jog0: terrible for burners
20:59:09 <mordred> maybe tuesday, but it would be pushing it
20:59:17 <devananda> mon yes, maybe. tues is kinda driving day
20:59:18 <lifeless> jog0: they need a couple weeks decompression after the thing
20:59:31 <jog0> right
20:59:31 <jog0> maybe we should do this in http://www.doodle.com/
20:59:37 <SpamapS> and I'm not available 8/27 - 9/9 .. (not burning.. ;)
20:59:38 <devananda> doodle ++
20:59:39 <lifeless> out of time
20:59:44 <lifeless> I will take it to the list.
20:59:49 <lifeless> #endmeeting