20:00:21 <lifeless> #startmeeting tripleo
20:00:22 <openstack> Meeting started Mon Jun  3 20:00:21 2013 UTC.  The chair is lifeless. Information about MeetBot at http://wiki.debian.org/MeetBot.
20:00:23 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
20:00:25 <openstack> The meeting name has been set to 'tripleo'
20:00:48 <lifeless> #agenda
20:00:51 <lifeless> bugs
20:00:51 <lifeless> Grizzly test rack progress
20:00:51 <lifeless> CI virtualized testing progress
20:00:51 <lifeless> open discussion
20:00:52 <lifeless> bah
20:00:57 <lifeless> #topic agenda
20:01:01 <lifeless> bugs
20:01:01 <lifeless> Grizzly test rack progress
20:01:01 <lifeless> CI virtualized testing progress
20:01:01 <lifeless> open discussion
20:01:10 <lifeless> #topic bugs
20:01:26 <lifeless> https://bugs.launchpad.net/tripleo/
20:01:30 <lifeless> sigh.
20:01:32 <lifeless> #link https://bugs.launchpad.net/tripleo/
20:02:15 <SpamapS> o/
20:02:20 <lifeless> 10 criticals
20:02:37 <lifeless> 4 in progress
20:02:53 <lifeless> SpamapS: am I wrong, or do you have 1182249 too ?
20:03:11 <lifeless> and 1182732 and 1182737 ?
20:03:24 <SpamapS> checking
20:03:35 <lifeless> and 1183442 ? :)
20:03:45 <SpamapS> I believe 1182249 yes
20:04:08 <Ng> lifeless: which bug? :)
20:04:19 <lifeless> #action lifeless https://bugs.launchpad.net/tripleo/+bug/1184484 I will add it to the discussion about defaults on the -dev list.
20:04:20 <uvirtbot> Launchpad bug 1184484 in tripleo "Quantum default settings will cause deadlocks due to overflow of sqlalchemy_pool" [Critical,Triaged]
20:04:23 <lifeless> Ng: iLO
20:04:24 <SpamapS> Not sure if my patches in review handle 1182732
20:04:32 <lifeless> Ng: https://bugs.launchpad.net/tripleo/
20:05:23 <SpamapS> I do have 1183442
20:06:22 <SpamapS> is there some reason gerrit doesn't manage LP projects for stackforge?
20:07:15 <lifeless> unlinking https://bugs.launchpad.net/tripleo/+bug/1182732 - we have a separate workaround task.
20:07:16 <uvirtbot> Launchpad bug 1182732 in quantum "bad dependency on quantumclient breaks metadata agent" [High,Confirmed]
20:07:55 <lifeless> SpamapS: it will
20:07:58 <lifeless> SpamapS: if it's configured correctly
20:08:12 <lifeless> SpamapS: we should be configured correctly now; clarkb gave us a hand last week to sort it out
20:09:16 <SpamapS> good, will cross my fingers :)
20:09:48 <lifeless> Ng: https://bugs.launchpad.net/tripleo/+bug/1178112 specifically
20:09:49 <uvirtbot> Launchpad bug 1178112 in tripleo "baremetal kernel boot options make console inaccessible on ILO environments" [Critical,Triaged]
20:11:14 <lifeless> so that leaves two
20:11:21 <lifeless> one is a workaround issue
20:11:34 <lifeless> not a lot we can do; clearly quantum hasn't been used at moderate scale
20:11:43 <lifeless> that's bug
20:11:48 <lifeless> bug 1184484
20:11:49 <uvirtbot> Launchpad bug 1184484 in tripleo "Quantum default settings will cause deadlocks due to overflow of sqlalchemy_pool" [Critical,Triaged] https://launchpad.net/bugs/1184484
20:12:10 <lifeless> and I'm fairly sure we have to have 1182737 fixed to bring up an automated overcloud
20:12:16 <Ng> lifeless: repointed at the commit that landed in dib and marked as fix committed
20:12:36 <lifeless> Ng: \o/ - as dib isn't doing releases yet, just fix released please.
20:12:49 <Ng> k
20:12:53 <lifeless> SpamapS: are you sure you're not installing git trunk of quantumclient yet ?
20:13:07 <SpamapS> lifeless: was just looking
20:13:08 <lifeless> SpamapS: I thought you brought up an overcloud in a fully automated fashion and it worked?
20:13:57 <SpamapS> lifeless: 99% automated.. still getting stuck at booting an instance and having metadata because of a lack of routers..
20:14:11 <SpamapS> lifeless: in my notes I have "install quantumclient from trunk in quantum venv"
20:14:23 <lifeless> kk
20:14:33 <lifeless> SpamapS: I will debug that with you later today?
20:14:51 <lifeless> so, bugs done to death I think; lots of high-priority ones, but let's get the fire drill sorted before we worry about those.
20:15:07 <SpamapS> lifeless: yes I've got it working well with very straightforward manual steps
20:15:08 <lifeless> #topic Grizzly test rack POC
20:15:21 <SpamapS> lifeless: also we should lean on quantumclient maintainers maybe?
20:15:25 <lifeless> So, we have a live working grizzly cloud.
20:15:42 <lifeless> SpamapS: we should.
20:16:28 <lifeless> but we have no monitoring in place.
20:16:33 <SpamapS> lifeless: I need to lean on the keystoneclient maintainers for similar reasons. :)
20:16:47 <lifeless> Ng: GheRivero: perhaps that's something you guys have lots of experience with and would like to take up the mantle for?
20:16:48 <SpamapS> lifeless: sure we do, our POC users will phone us if it breaks. ;)
20:16:49 * SpamapS hides
20:17:50 <Ng> lifeless: monitoring? sure. do we have any ideas about what we want?
20:18:01 <GheRivero> lifeless: yeah, sure (don't know what, but if you say so... :)
20:18:13 <lifeless> well
20:18:38 <lifeless> icinga or nagios perhaps? NobodyCam had the start of an element, but it's stalled AFAICT
20:18:58 <lifeless> perhaps work with him on that, and on the heat template for same
20:19:07 <jog0> I assume alerting is included in monitoring?
20:19:10 <cody-somerville> I can also give a hand there.
20:19:20 <lifeless> cody-somerville: cool
20:19:25 <SpamapS> I'd love to have an icinga-heat bit that, given heat read credentials, can interpret a heat stack and generate all of the monitoring.
20:19:40 <lifeless> SpamapS: +1, and I welcome our strong AI overlords.
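A rough sketch of what that icinga-heat idea could look like: walk a stack's resources and emit Icinga/Nagios object definitions for every instance found. The resource-dict shape, the generic-host/generic-service templates, and the check_ssh command below are illustrative assumptions, not an existing tool or the Heat API itself.

    # Sketch only: generate Icinga/Nagios config from a Heat stack's resource list.
    # A real version would fetch the resources from Heat with read credentials;
    # here they are passed in as plain dicts (shape assumed for illustration).
    def stack_to_icinga(resources):
        blocks = []
        for res in resources:
            if res.get("type") != "AWS::EC2::Instance":
                continue  # only monitor instances in this sketch
            blocks.append(
                "define host {\n"
                "  use        generic-host\n"
                "  host_name  %(name)s\n"
                "  address    %(ip)s\n"
                "}\n"
                "define service {\n"
                "  use                 generic-service\n"
                "  host_name           %(name)s\n"
                "  service_description SSH\n"
                "  check_command       check_ssh\n"
                "}\n" % res
            )
        return "\n".join(blocks)

    if __name__ == "__main__":
        print(stack_to_icinga([
            {"name": "notcompute0", "type": "AWS::EC2::Instance", "ip": "10.0.0.5"},
        ]))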
20:19:49 <NobodyCam> I meant to get back to the nagios ... but got sidetracked on ironic
20:20:17 <lifeless> so, we've three weeks of POC to go; it'd be really good to have monitoring sooner rather than later.
20:20:20 <lifeless> jog0: oh yeah, HI
20:20:21 <lifeless> !
20:20:26 <lifeless> what do we want?
20:20:34 <lifeless> I think we want base level hardware/OS monitoring.
20:20:49 <lifeless> We want cloud health - have we maxed out any resource
20:20:50 * jog0 waves to lifeless and the rest of the room
20:21:05 <lifeless> We want API health: are all the API endpoints answering, and doing so in a reasonable timeframe.
20:21:27 <lifeless> We want functional monitoring - is spinning up/down instances working, is networking working.
20:21:43 <lifeless> it's likely we want more than one tool; but a consolidated view of their data.
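As a concrete illustration of the "API health" item, a minimal sketch that hits each endpoint and flags anything down or slow; the endpoint URLs and the two-second threshold are placeholder assumptions, not values agreed here.

    import time
    import urllib2  # python 2 era; urllib.request.urlopen on python 3

    # Placeholder endpoints - a real check would read these from the keystone catalog.
    ENDPOINTS = {
        "keystone": "http://192.0.2.1:5000/v2.0/",
        "nova": "http://192.0.2.1:8774/",
        "glance": "http://192.0.2.1:9292/",
    }

    def check_endpoints(endpoints, timeout=10, slow_after=2.0):
        results = {}
        for name, url in endpoints.items():
            start = time.time()
            try:
                urllib2.urlopen(url, timeout=timeout)
                status = "OK"
            except urllib2.HTTPError:
                status = "OK"  # any HTTP response means the API answered
            except Exception as exc:
                status = "DOWN: %s" % exc
            elapsed = time.time() - start
            if status == "OK" and elapsed > slow_after:
                status = "SLOW (%.1fs)" % elapsed
            results[name] = status
        return results

    for name, status in check_endpoints(ENDPOINTS).items():
        print("%s: %s" % (name, status))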
20:21:58 <lifeless> should I turn this into a blueprint/etherpad?
20:22:34 <Ng> we should probably have this captured somewhere
20:22:39 <lifeless> I'd love it if someone can pick it up and run with it; dragging other folk in as needed.
20:22:45 <jog0> lifeless: and cloud health also detects if a box dies?
20:23:00 <lifeless> jog0: the base hardware/os layer stuff would capture that
20:23:25 <lifeless> jog0: I'm inclined to worry about automated remedial actions at a later date
20:23:40 <jog0> ah, perhaps we should sync up offline, so I can get up to speed
20:24:00 <lifeless> jog0: sure; though we have some time here.
20:24:24 <SpamapS> Note that Heat wants to be able to do some of that remedial action stuff too.
20:24:27 <lifeless> Basically, tripleo aims to deliver a production ready cloud; having *an* answer for monitoring is an important thing.
20:24:32 <lifeless> SpamapS: right
20:24:41 <lifeless> SpamapS: I'm thinking an icinga endpoint can be the canary check.
20:24:52 <SpamapS> yes indeed
20:24:55 <lifeless> SpamapS: at 10000ft view
20:25:16 <lifeless> jog0: the other thing, as SpamapS just brought up, is that solid service monitoring is a key part of safe deployment automation.
20:25:28 <lifeless> jog0: so you can stop a deploy mid-way if things go pear shaped.
20:26:24 <jog0> right
20:26:33 <lifeless> right now we have nothing; so we need something
20:26:47 <lifeless> #action lifeless to capture 10000ft view of monitoring needs in a blueprint
20:26:58 <lifeless> #action someone to take point on monitoring
20:27:44 <lifeless> I have an open todo to track down the missing machines; echohead gave me a spreadsheet, but I don't know [yet] the network topology
20:27:51 <lifeless> anything else about the test rack?
20:28:53 <lifeless> ok
20:29:02 <lifeless> #topic
20:29:02 <SpamapS> next topic: 2nd test rack? :)
20:29:07 <lifeless> #topic CI virtualized testing progress
20:29:15 <pleia2> so, this one is lots of fun
20:29:24 <lifeless> SpamapS: once we can bring up and pull down parallel clouds in this rack I'll ask for a test row.
20:29:35 <pleia2> a couple of months ago I was tasked with testing nova-baremetal https://bugs.launchpad.net/openstack-ci/+bug/1082795
20:29:36 <lifeless> pleia2: tag, you're it
20:29:36 <uvirtbot> Launchpad bug 1082795 in openstack-ci "Add baremetal testing" [High,Triaged]
20:29:55 <pleia2> as we all know, a ton has changed since then, ironic and all
20:30:19 <pleia2> but I've still been focusing on tripleo to do virtualized testing of the soundness of launching these test bmnodes
20:30:30 <pleia2> so two things
20:30:59 <pleia2> 1. This is difficult. I tried the straight toci that dprince worked on, but our virtualized environments don't really allow for this (they don't have kvm, and qemu is way, way too slow)
20:31:18 <lifeless> how slow?
20:31:26 <lifeless> Like, can we get working-but-slow, and then iterate?
20:31:30 <pleia2> slow enough that openstack starts exhibiting strange timeout bugs; not usable
20:31:40 <pleia2> 2 minutes to ssh in
20:32:19 <lifeless> ok, thats pretty messed up.
20:32:29 <pleia2> 2. Am I still on the right track here at all by using tripleo? If I use lifeless' takeover node I end up pulling out so much virtualization that I'm really just testing dib and launching of nodes (and haven't quite figured out networking on that)
20:32:57 <pleia2> which isn't really tripleo anymore, but probably is where I want to be testing baremetal-wise (I think)
20:33:01 <SpamapS> pleia2: I have working nested kvm on my i7 laptop
20:33:05 <lifeless> pleia2: so, what code path do you want to test ?
20:33:07 <SpamapS> pleia2: using boot-stack
20:33:14 <lifeless> SpamapS: cloud test environments are rackspace/HPCS.
20:33:21 <lifeless> SpamapS: so that's interesting but irrelevant
20:33:22 <pleia2> SpamapS: me too, but not on hpcloud
20:33:27 <SpamapS> gah
20:33:32 <pleia2> can't even load kvm module on hpcloud
20:33:42 <SpamapS> yeah, didn't realize that's what we were talking about
20:33:45 <lifeless> pleia2: what codepath are you aiming to test?
20:34:20 <pleia2> lifeless: so that's what I realized this morning - I don't know, nova-baremetal is now ironic (and doesn't yet have a nova driver afaik)
20:34:30 <lifeless> nova-baremetal still exists.
20:34:36 <lifeless> ironic is coming together.
20:34:47 <pleia2> right, and it's a goal for ironic to have a driver which I assume will behave the same way
20:34:53 <pleia2> +for nova
20:34:58 <lifeless> once ironic is integrated, we'll still want to know that the 'nova boot' baremetal use case works.
20:35:05 <lifeless> so, let's ignore ironic.
20:35:28 <lifeless> if we do that, what codepath do you want to test?
20:35:51 <pleia2> nova
20:35:58 <lifeless> ok
20:36:04 <lifeless> so the minimum you need for that is
20:36:07 <lifeless> the nova code
20:36:10 <lifeless> configured for baremetal
20:36:15 * pleia2 nods
20:36:19 <lifeless> you need a dedicated network
20:36:37 <lifeless> with 'physical' machines on it w/PXE boot configured
20:36:38 <pleia2> yes, this is my current challenge when trying to do this virtualized without nesting
20:36:52 <lifeless> and you need a power driver capable of turning them on / off.
20:37:26 <lifeless> I suggest that a solid 'it worked' test is to boot a vanilla ubuntu image and ssh in with a metadata-supplied ssh key.
20:37:48 <lifeless> tripleo's boot-stack is neither here nor there w.r.t. testing this specific code path.
20:37:55 <pleia2> ok
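A minimal sketch of that "it worked" test, driving the nova CLI from Python. The image, flavor, keypair, and address below are placeholders (a real harness would look the IP up from the booted instance), and the CLI-based approach is just one way to wire it up.

    import subprocess
    import time

    def run(cmd):
        return subprocess.check_output(cmd, shell=True)

    def baremetal_smoke_test(name="bm-smoke", image="ubuntu-vanilla",
                             flavor="baremetal", key="smoke-key",
                             address="192.0.2.10"):
        # boot a vanilla image on the baremetal flavor with an injected keypair
        run("nova boot --image %s --flavor %s --key-name %s --poll %s"
            % (image, flavor, key, name))
        # --poll waits for the instance to leave BUILD; give cloud-init a
        # little longer to install the key before trying to log in
        time.sleep(30)
        run("ssh -o StrictHostKeyChecking=no -i smoke-key.pem ubuntu@%s true"
            % address)
        print("smoke test passed: booted %s and logged in over ssh" % name)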
20:37:57 <SpamapS> Just a thought. Have we ever tried lxc as a way around the nesting problems?
20:38:05 <pleia2> SpamapS: nope
20:38:10 <lifeless> SpamapS: lxc container set to pxe boot ?
20:38:19 <lifeless> SpamapS: with a different kernel....
20:38:20 <pleia2> just qemu (drop-in for kvm, easy to test)
20:38:26 <lifeless> SpamapS: I don't think it's a fit.
20:38:37 <lifeless> SpamapS: though it's an interesting idea
20:38:40 <SpamapS> lifeless: oh well if we're testing _that_ part yeah there's no point.
20:38:50 <pleia2> so I've had a few ideas, but they end up being so weird that we don't end up testing what we think we're testing (and the tests could break in weird ways)
20:39:06 <lifeless> pleia2: so, *tripleo* want to test the full path.
20:39:26 <lifeless> pleia2: one reason you've been steered at tripleo, I think, is so that you can kill two birds with one stone.
20:39:47 <lifeless> pleia2: a) nova baremetal functional/integration test. b) tripleo boot-stack functional/integration test.
20:40:01 <lifeless> pleia2: I'll let -infra folk weigh in on the relative importance of that, but...
20:40:03 <pleia2> yeah, and it also tests dib
20:40:16 <lifeless> pleia2: for my part, I think 'let's get /a/ test in place and upgrade it later'
20:40:18 <pleia2> (or, is potentially broken by dib :))
20:40:59 <lifeless> now, in the absence of a test cloud with nested vm enabled.... which, btw, the grizzly POC rack could be set up as
20:41:06 <lifeless> or a bare metal test cloud
20:41:15 <lifeless> we're going to have nested KVM for the baremetal node you boot
20:41:27 <lifeless> we don't have to have nested KVM for the boot-stack node.
20:41:34 <lifeless> SpamapS: oh, I may have misinterpreted you....
20:41:41 <pleia2> so my thought was to spin up 3 hpcloud/rackspace instances
20:41:43 <lifeless> pleia2: we could run the boot-stack image in lxc perhaps.
20:42:08 <lifeless> SpamapS: ^ is that what you meant ?
20:42:28 <pleia2> one would be what usually is physical hardware, then boot-stack, then the baremetal node, but those are all public machines, not a private lan where they all talk
20:43:06 <lifeless> I don't think it will buy you anything
20:43:16 <pleia2> yeah, it's a mess
20:43:20 <lifeless> as they'll have to run a VPN to get the layer 2 network to do PXE
20:43:27 <lifeless> and that implies nested KVM on each machine
20:43:50 <lifeless> except the boot-stack one; but - see lxc.
20:43:55 <lifeless> so, SpamapS is afk ;). I'll riff
20:44:23 <lifeless> use dib to build a boot-stack image. loopback mount it and lxc boot it - no nested kvm
20:44:29 <pleia2> hah, so lxc container inside the hpcloud instance?
20:44:33 <lifeless> we document 'use kvm to boot the seed cloud'
20:44:46 <lifeless> we can also document 'use lxc to boot the seed cloud', just as well
20:44:56 <lifeless> bmnodes will still be nested kvm
20:45:14 <lifeless> and you'll still have a br99 or whatever between the bm nodes and eth1 in the boot-stack container.
20:45:14 <SpamapS> sorry yeah had local interrupt
20:45:17 <pleia2> well, qemu, right?
20:45:23 <pleia2> since we can't do nested kvm
20:45:23 <lifeless> pleia2: ack
20:45:35 <SpamapS> lifeless: and yes I meant run boot-stack in lxc
20:45:49 <lifeless> SpamapS: so yeah, I misinterpreted you :(.
20:45:53 <lifeless> SpamapS: argue more, dammit!
20:46:12 <SpamapS> lifeless: I was mid-argument when wife needed muscles
20:47:00 <lifeless> doh!
20:47:03 <lifeless> pleia2: what do you think ?
20:47:05 <pleia2> ok, so instead of booting boot-stack as a kvm instance, we make it lxc (can lxc boot qcow2?), right?
20:47:36 <pleia2> then we just create the bmnode as usual (except with qemu rather than kvm)
20:48:12 <NobodyCam> just I run boot-stack setup on three virtualbox vms: dib, boot-stack, bm-node.... no nested vms at all
20:48:25 <NobodyCam> *justFYI*
20:48:34 <lifeless> NobodyCam: those are nested when your host is a cloud instance
20:48:38 <lifeless> NobodyCam: thats the issue
20:48:44 <pleia2> yeah, we're doing this on a public cloud
20:48:47 <lifeless> pleia2: yes. And qemu-nbd can loopback mount qcow2.
20:48:57 <pleia2> lifeless: ok, cool
20:49:15 <pleia2> ok, I have a plan, thanks lifeless and SpamapS
20:49:19 <lifeless> cool
20:49:23 <pleia2> (now to learn more about lxc :))
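For reference, a sketch of the loopback-mount step just discussed: expose the dib-built qcow2 with qemu-nbd and mount its first partition so it can serve as an lxc rootfs. The device and path names are assumptions, and wiring up the lxc container itself is left out.

    import subprocess

    def mount_qcow2(image="boot-stack.qcow2", dev="/dev/nbd0",
                    target="/mnt/boot-stack"):
        # attach the qcow2 to an nbd device, then mount its first partition
        subprocess.check_call(["sudo", "modprobe", "nbd", "max_part=8"])
        subprocess.check_call(["sudo", "qemu-nbd", "--connect=%s" % dev, image])
        subprocess.check_call(["sudo", "mkdir", "-p", target])
        subprocess.check_call(["sudo", "mount", "%sp1" % dev, target])
        return target  # point the lxc container's rootfs at this path

    def unmount_qcow2(dev="/dev/nbd0", target="/mnt/boot-stack"):
        subprocess.check_call(["sudo", "umount", target])
        subprocess.check_call(["sudo", "qemu-nbd", "--disconnect", dev])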
20:49:32 <lifeless> #topic open discussion
20:50:15 <lifeless> anything?
20:50:17 <SpamapS> so many bugs
20:50:20 <SpamapS> so little time :)
20:50:32 <SpamapS> (I think that may mean tripleo is healthy)
20:51:44 <SpamapS> oh
20:51:46 <lifeless> MORE PEOPLEZ PLEASE
20:51:51 <SpamapS> os-config-applier is now os-apply-config
20:52:24 <SpamapS> also I was thinking o-a-c should have a way to reference instance metadata the same way it references heat metadata.
20:52:43 <lifeless> mmm
20:52:55 <lifeless> what about a thing to suck instance metadata down to disk as json
20:53:13 <lifeless> and oac unions multiple json files in some well-defined manner?
20:53:26 <SpamapS> yeah, that's the way I was thinking of doing it actually.
20:53:58 <SpamapS> local_ip = {{instance_metadata.private_ip}} or something like that.
20:54:16 <lifeless> sob, you want to kill my sed ?:)
20:54:39 <lifeless> should we namespace the heat variables too ?
20:54:43 <lifeless> {{heat.goo}} ?
20:54:49 <SpamapS> Yeah, that's what pops into my head as well
20:55:01 <lifeless> so
20:55:04 <SpamapS> though another thought is to just reserve some namespaces
20:55:07 <lifeless> what I was thinking was that neither was namespaced
20:55:22 <lifeless> and we define what happens on conflicts in a formal, predictable manner
20:55:25 <sthakkar> hey guys
20:55:35 <lifeless> so that you can locally override something
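A tiny sketch of that union idea: merge several json metadata files with a defined "last one wins" policy, so locally written data can override what came from heat. The file paths and the shallow-merge policy are illustrative assumptions, not oac's actual behaviour.

    import json

    def union_metadata(paths):
        # later files win on key conflicts, which is what
        # makes local overrides possible
        merged = {}
        for path in paths:
            with open(path) as f:
                merged.update(json.load(f))
        return merged

    # e.g. heat metadata first, instance metadata second:
    # config = union_metadata(["/var/lib/heat.json", "/var/lib/instance.json"])
    # a template can then use {{local_ip}} no matter which file supplied it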
20:55:41 <lifeless> sthakkar: hi ?
20:56:04 * mestery thinks sthakkar is early for the next meeting. :)
20:56:20 <sthakkar> mestery is right. sorry guys :)
20:56:55 <SpamapS> lifeless: well I will put together a bug about the need for access to metadata.. the design can come later.
20:57:16 <lifeless> ok, so I think that's a wrap then.
20:57:19 <lifeless> last call
20:57:57 <lifeless> #endmeeting