19:00:30 <devananda> #startmeeting ironic
19:00:31 <openstack> Meeting started Mon Dec  2 19:00:30 2013 UTC and is due to finish in 60 minutes.  The chair is devananda. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:00:32 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:00:35 <openstack> The meeting name has been set to 'ironic'
19:00:37 <devananda> #chair NobodyCam
19:00:38 <openstack> Current chairs: NobodyCam devananda
19:00:45 <NobodyCam> :)
19:00:52 <devananda> #topic greetings & roll call
19:00:55 <devananda> hi all! who's here?
19:00:57 <GheRivero> o/
19:00:58 <lucasagomes> o/
19:01:03 <agordeev> o/
19:01:04 * NobodyCam o/
19:01:08 <rloo> o/
19:01:11 <yuriyz> o/
19:01:29 <devananda> great!
19:01:33 <devananda> for reference, here's the agenda
19:01:34 <devananda> #link https://wiki.openstack.org/wiki/Meetings/Ironic#Agenda_for_next_meeting
19:02:02 <devananda> #topic announcements / updates
19:02:13 <devananda> just one today
19:02:42 <devananda> i started writing a consistent hash ring for the conductor service
19:03:03 <devananda> to solve several problems that we ran into over the last two weeks
19:03:10 <lucasagomes> nice, any idea when ur going to put a review up?
19:03:29 <devananda> around routing RPC messages, knowing which conductor is responsible for which instance, and handling failures
19:03:32 <devananda> yes
19:03:34 <devananda> one already landed
19:03:40 <lucasagomes> and btw, do we need it in order to get out of the incubation process?
19:04:24 <NobodyCam> lucasagomes: I think we do, or we'll have to do a lot of re-engineering
19:04:35 * devananda looks for the commit
19:04:35 <lucasagomes> NobodyCam, right
19:04:54 <lucasagomes> right yea def it's important
19:05:03 <devananda> #link https://review.openstack.org/#/c/58607/
19:05:14 <lucasagomes> just asking cause in our last conversation it was unclear whether we would need it or not
19:05:17 <devananda> yea
19:05:40 <NobodyCam> #link https://blueprints.launchpad.net/openstack/?searchtext=instance-mapping-by-consistent-hash <- fyi
19:05:43 <devananda> so i did the mental exercise of: what happens if we don't have this? --> we can only run one conductor instance --> can we exit incubation like that? --> probably not.
19:06:05 <devananda> and decided to dig in and get'er'done
19:06:19 <lucasagomes> great!
19:06:32 <devananda> also talked with the nova folks briefly about that, and got the same impression from them
19:07:05 <vkozhukalov> o/
19:07:24 <lucasagomes> I know tripleo might not need it at the moment, but yea having it done is def a big/great step
19:08:21 <devananda> the lack of this in nova-bm can be worked around with eg. pacemaker + drbd
19:09:17 <devananda> in theory that could also provide some HA for ironic. but we would still need to restrict it to a single conductor instance (until we add this hash ring)
19:09:26 <devananda> so. bleh. more to do.
19:09:46 <devananda> ok, moving on (we can come back to this in open discussion)
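
(For reference, a minimal sketch of the consistent-hash-ring idea discussed above; this is not the implementation under review in https://review.openstack.org/#/c/58607/, whose details may differ. Each conductor is hashed onto a ring at several points, and a node UUID maps to the first conductor found walking clockwise, so adding or removing a conductor only remaps a small fraction of nodes.)

    import bisect
    import hashlib


    def _hash(key):
        # Map an arbitrary string onto a large integer key space.
        return int(hashlib.md5(key.encode('utf-8')).hexdigest(), 16)


    class HashRing(object):
        """Toy consistent hash ring mapping node UUIDs to conductor hosts."""

        def __init__(self, hosts, partitions_per_host=32):
            self._ring = {}
            for host in hosts:
                for i in range(partitions_per_host):
                    self._ring[_hash('%s-%d' % (host, i))] = host
            self._sorted_keys = sorted(self._ring)

        def get_host(self, node_uuid):
            # Walk clockwise to the first partition at or after the node's hash.
            position = bisect.bisect(self._sorted_keys, _hash(node_uuid))
            return self._ring[self._sorted_keys[position % len(self._sorted_keys)]]


    ring = HashRing(['conductor-1', 'conductor-2', 'conductor-3'])
    print(ring.get_host('a-node-uuid'))   # illustrative node identifier

(Removing one of the three conductors above only remaps the nodes whose ring partitions it owned; the rest keep their assignment, which is the point of using a consistent hash here.)
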
19:09:55 <devananda> #topic action items from last week
19:10:06 <devananda> #link http://eavesdrop.openstack.org/meetings/ironic/2013/ironic.2013-11-25-19.01.html
19:10:20 <devananda> i think 2 of those were romcheg's, and he's not here today
19:10:30 <lucasagomes> I ported the last bug on the list at whiteboard today
19:10:42 <lucasagomes> gotta find some more things that might need to be ported from nova-bm Havana to ironic
19:10:54 <devananda> dkehn and NobodyCam and I talked about the nova-network api bits, and I think it's clear. dkehn, any updates?
19:10:56 <lucasagomes> #link https://review.openstack.org/#/c/59493/
19:11:04 <devananda> lucasagomes: awesome, thanks!
19:11:12 <dkehn> nothing that I've heard
19:11:14 <NobodyCam> lucasagomes: great!
19:11:45 <dkehn> I think everyone is watching the neutron stabilization progress
19:12:09 <devananda> dkehn: when do you think you'll have some code for the pxe driver -> neutron integration?
19:12:37 <dkehn> working it presently, ran into issues with bringing up the dev env
19:12:47 <dkehn> working with NobodyCam to resolve
19:12:53 <devananda> ack
19:13:21 <dkehn> but the 1st stage is just the PXE data, then I'll work on the rest assuming no issues with the env
19:14:24 <NobodyCam> dkehn: I will be afk a good chunk of today; hit me on gtalk if you have questions
19:14:35 <devananda> as far as oslo/sqla-migrate vs. alembic, folks seem to still like alembic in principle, but i don't have any concrete "yes we should move" answer yet
19:14:36 <dkehn> k
19:15:20 <devananda> #topic integration and testing
19:15:53 <devananda> romcheg isn't here today, and he's been doing most of the work in this area
19:16:19 <devananda> yuriyz: don't suppose you have any updates from him?
19:16:26 <NobodyCam> I have been working on the nova integration
19:16:42 <NobodyCam> we are making progress.
19:16:51 <yuriyz> no from romcheg
19:17:02 <vkozhukalov> we are going to make a proof-of-concept scheme for integration testing
19:17:06 <NobodyCam> Exposing node deploy() will be a biggie
19:17:33 <devananda> vkozhukalov: are you working with -infra on that?
19:17:59 <vkozhukalov> something like launching one VM, installing ironic on it, then launching another VM and booting it from the first one
19:18:23 <vkozhukalov> devananda: no, we just started to do that
19:18:44 <devananda> vkozhukalov: there's a lot of work that has been / is being done around testing tripleo in -infra, which means using nova-baremetal. much of that work can probably be used in the same way for testing ironic
19:19:16 <devananda> vkozhukalov: will you be around after this meeting? we should chat with infra team :)
19:19:31 <vkozhukalov> devananda: ok, we can
19:19:36 <devananda> vkozhukalov: great, thanks!
19:19:43 <devananda> #topic nova driver
19:19:53 <NobodyCam> oh thats me
19:19:55 <devananda> (skipping the client because NobodyCam has to leave soon -- will come back to it)
19:20:03 <devananda> NobodyCam: hi! how goes it?
19:20:12 <NobodyCam> we are making progress. :-p
19:20:22 <NobodyCam> can we jump to api
19:20:29 <devananda> oh. sure
19:20:39 <devananda> #topic python-ironicclient & API service
19:20:43 <devananda> lucasagomes: that's you!
19:20:49 <NobodyCam> lucasagomes: you have thoughts on deploy?
19:20:49 <lucasagomes> vkozhukalov, might be worth taking a look at https://github.com/openstack-infra/tripleo-ci
19:20:56 <lucasagomes> oh right, so as NobodyCam mentioned
19:21:11 <lucasagomes> we need to expose a way to trigger the node deploy from the API/client libraries
19:21:16 <lucasagomes> I thought about something like
19:21:48 <lucasagomes> POST /nodes/<uuid>/deploy returning 202 with the Location header field pointing to /nodes/<uuid>/state in case the request gets accepted
19:22:00 <lucasagomes> 403 in case the deployment was already triggered and not completed
19:22:06 <lucasagomes> also we need a way to abort
19:22:19 <lucasagomes> so folk could do a DELETE /nodes/<uuid>/deploy
19:22:26 <NobodyCam> lucasagomes: will deploy be sync or async?
19:22:28 <lucasagomes> to abort the operation
19:22:31 <lucasagomes> async
19:22:36 <lucasagomes> that's why 202 + location
19:22:55 <lucasagomes> location = he can look at the state resource to see in which state the node currently is
19:23:01 <devananda> lucasagomes: what will be in the POST body?
19:23:02 <lucasagomes> + the target state
19:23:26 <lucasagomes> devananda, didn't think about it yet
19:23:35 <lucasagomes> just got to thinking about how it would work at the end of today
19:23:42 <devananda> NobodyCam: you'll need to have a while loop in the nova driver, polling nodes/<uuid>/state, to see when it reaches "done", or if it errors, and also tracking some timeout in Nova
19:23:42 <lucasagomes> so there's some gaps, just the initial idea
19:23:44 <NobodyCam> so nova driver will have to poll
19:23:50 <NobodyCam> yep
19:24:35 <NobodyCam> if nova times out (ie.. very long deploy) we will be able to roll back / delete
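
(A sketch of the contract lucasagomes is proposing, shown with python-requests; the endpoint does not exist yet, and the address, port, and status handling below are assumptions, not the final API.)

    import requests

    IRONIC_API = 'http://ironic-api:6385/v1'   # assumed address/port
    node_uuid = 'a-node-uuid'                  # illustrative identifier

    # Trigger the (asynchronous) deploy.
    resp = requests.post('%s/nodes/%s/deploy' % (IRONIC_API, node_uuid))

    if resp.status_code == 202:
        # Accepted: the Location header would point at the state resource,
        # which the caller then polls to watch the deploy progress.
        state_url = resp.headers.get(
            'Location', '%s/nodes/%s/state' % (IRONIC_API, node_uuid))
        print(requests.get(state_url).json())
    elif resp.status_code == 403:
        print('a deploy is already in progress for this node')
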
19:24:45 <rloo> does the nova driver use the ironicclient, or issue a POST directly?
19:24:46 <devananda> NobodyCam: eg, https://github.com/openstack/nova/blob/master/nova/virt/baremetal/pxe.py#L455
19:24:53 <devananda> rloo: ironicclient
19:24:58 <lucasagomes> rloo, it uses the ironic client libs
19:24:59 <NobodyCam> currently ironic client
19:25:04 <NobodyCam> :)
19:25:04 <rloo> hmm, what about an eg --poll option, like nova boot has?
19:25:21 <devananda> rloo: that's a CLI thing
19:25:40 <devananda> rloo: CLI and API are distinct, even though they're packaged together
19:26:01 <NobodyCam> ya that loop really needs to be in nova
19:26:05 <rloo> oops.
19:26:08 <NobodyCam> *nova driver
19:26:20 <devananda> nova driver wraps the client API. the CLI also wraps the client API,.... BUT the CLI shouldn't include any "deploy" method
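
(A rough sketch of that polling loop for the nova driver, in the spirit of the nova-baremetal code linked above; the client call and state names used here are hypothetical placeholders, not the real python-ironicclient API.)

    import time

    DEPLOY_TIMEOUT = 1800   # seconds; assumed value, Nova would own this timeout
    POLL_INTERVAL = 5


    def wait_for_deploy(client, node_uuid):
        """Poll the node's state until the deploy finishes, fails, or times out."""
        deadline = time.time() + DEPLOY_TIMEOUT
        while time.time() < deadline:
            # Hypothetical client call returning the node's state resource.
            state = client.node.states(node_uuid).provision_state
            if state == 'deployed':      # assumed "done" state name
                return
            if state == 'error':
                raise RuntimeError('deploy failed for node %s' % node_uuid)
            time.sleep(POLL_INTERVAL)
        raise RuntimeError('timed out waiting for node %s to deploy' % node_uuid)
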
19:26:22 <romcheg_> Is Ironic meeting still here?
19:26:27 <devananda> romcheg_: hi! yes
19:26:34 <NobodyCam> yea
19:26:39 <NobodyCam> hi romcheg_ :)
19:26:46 <lucasagomes> yea just the lib will contain the method to trigger the deployment
19:26:57 <lucasagomes> cli won't expose it
19:26:58 <romcheg_> Hi, sorry for being late. Street meeting took more time :)
19:27:00 <devananda> romcheg_: we can come back to your things in a few minutes
19:27:24 <devananda> lucasagomes: I think a POST with 202 is fine in principle
19:27:42 <NobodyCam> I'm good with polling
19:27:44 <lucasagomes> right, it's also good to say that
19:28:01 <lucasagomes> WSME right now doesn't support returning a Location in the HTTP header
19:28:03 <NobodyCam> so long as we can "break/stop" the deploy
19:28:10 <lucasagomes> #link https://bugs.launchpad.net/wsme/+bug/1233687
19:28:12 <uvirtbot> Launchpad bug 1233687 in wsme "Return Location with POST 201 return code" [Wishlist,New]
19:28:24 <lucasagomes> same prob for 202 ^
19:28:25 <devananda> hah
19:28:27 <devananda> ok
19:28:41 <lucasagomes> so people can go there and click the "affects me" button :P
19:28:44 <devananda> so we can just do that in the nova driver anyway
19:28:58 * devananda clicks "affects me"
19:29:01 <NobodyCam> lucasagomes: I could build the link
19:29:08 <NobodyCam> ya
19:29:19 <lucasagomes> NobodyCam, yes you can build it, np
19:29:40 <devananda> i dont think you need to build it, really
19:29:49 <lucasagomes> yea build it = call a method in the lib
19:29:51 * NobodyCam is running short on time
19:29:51 <devananda> the client lib already has a node.state object, ya?
19:29:53 <devananda> right
19:30:24 <devananda> NobodyCam: go if/when you need to. i can fill you in later
19:30:46 <NobodyCam> devananda: TY ... sorry for running out 1/2 way thru..
19:31:09 <lucasagomes> NobodyCam_afk, see ya later
19:31:26 <devananda> lucasagomes: have more to discuss on the API / client libs?
19:31:34 <lucasagomes> devananda, not from me
19:31:41 <lucasagomes> if there's no objections I will start working on that tomorrow
19:31:48 <devananda> lucasagomes: ++
19:31:52 <lucasagomes> so NobodyCam_afk can start using it asap
19:31:58 <devananda> anyone else, questions on API / client?
19:32:12 <rloo> what about a way to interrupt it?
19:32:30 <lucasagomes> rloo, it will use DELETE to abort operation
19:32:41 <lucasagomes> so the same way you POST to that resource to trigger the deploy
19:32:47 <lucasagomes> you can DELETE to abort
19:32:55 <rloo> Ok. (I have to admit, i don't know what already exists.)
19:32:58 <devananda> as far as an API goes, I think that's reasonable
19:33:18 <devananda> i'm not sure how easily we can get the plumbing to actually interrupt an in-progress deploy
19:33:38 <lucasagomes> yea, that will be another challenge :)
19:33:50 <lucasagomes> probably solved by the way the ramdisk will do things
19:33:53 <devananda> and we probably shouldn't implement DELETE /nodes/<uuid>/deploy until we can actually satisfy that request
19:33:57 <lucasagomes> like asking for the next steps
19:34:10 <devananda> perhaps
19:34:16 <devananda> my concern is more around the node locking
19:34:57 <lucasagomes> like aborting not releasing the node?
19:35:13 <devananda> whether DELETE // interrupt is async or sync, we'll still have the problem that the node resource is locked by the greenthread which is doing the deploy
19:35:42 <devananda> we can't just go update the DB record while that's going on and expect deploy() to behave reasonably
19:36:15 <lucasagomes> oh yea, aborting will need efforts in a couple of areas
19:36:24 <lucasagomes> most important maybe is the ramdisk
19:36:30 <lucasagomes> how it will know it has to abort etc
19:36:39 <devananda> right. I think DELETE an in-progress deploy should wait until we can look more into those areas, and it's not needed for coming out of incubation
19:37:00 <lucasagomes> cool, so do not expose DELETE for now?
19:37:13 <lucasagomes> or expose it and raise an NotImplemented error?
19:37:28 <devananda> I would not expose it yet
19:37:41 <lucasagomes> right
19:37:47 <romcheg_> +1 for not exposing DELETE
19:37:55 <devananda> NotImplemented vs NotFound. I prefer the latter
19:38:00 <rloo> I was going to suggest exposing/raising error, and adding comment why.
19:38:45 <rloo> or have some place so someone knows why DELETE doesn't exist yet.
19:39:03 <lucasagomes> rloo, I think the main thing is that we don't need it for coming out of incubation so, we can add it after (and also docs about it)
19:39:09 <devananda> we can certainly add inline documentation in the API code about it
19:39:16 <devananda> and it may be worth adding a BP to track the intent for it
19:39:17 <lucasagomes> yea like a TODO there
19:39:29 <rloo> yeah, i understand. but it is hard to know, during all this progress, what is avail, not avail, and why.
19:39:31 <lucasagomes> devananda, if u want to give me one action to write a bp
19:39:49 <devananda> #action lucasagomes to file a BP for DELETE /nodes/<uuid>/deploy
19:40:03 <lucasagomes> rloo, do you think a TODO in the code explaining our intentions and why it's not implemented at the moment would be enough?
19:40:16 <rloo> yup. enough for me anyway :-) thx.
19:40:25 <lucasagomes> ok will do that :)
19:41:19 <devananda> ok, moving on
19:41:23 <devananda> romcheg_: still around?
19:41:48 <romcheg_> devananda: yup
19:42:20 <devananda> #topic integration and testing
19:42:22 <romcheg_> Actually I do not have a lot of updates.
19:42:52 <devananda> romcheg_: give us what you've got :)
19:43:16 <romcheg_> I rebased my patch to infra-config to clarkb's and waiting until that refactoring is finished
19:43:54 <clarkb> romcheg_: my change just got approved, it needs a little babysitting, but once we are happy with it, your change will be reviewable
19:43:55 <romcheg_> The tempest patch does not attract a lot of people, unfortunately
19:44:20 <romcheg_> clarkb: Cool. Will take a look at that in the morning
19:44:28 <devananda> romcheg_: if i understand correctly, after your infra/config patch lands, we should have some devstack tests in ironic's pipeline, yes?
19:45:31 <romcheg_> devananda: as we discussed previously, we will add tempest tests for Ironic to Ironic's gate and check pipelines, and to the experimental pipeline for tempest
19:45:48 <devananda> romcheg_: right
19:46:32 <devananda> romcheg_: i'm wondering if there are any other dependencies, besides https://review.openstack.org/#/c/53917, to get it working in the ironic pipeline
19:47:25 <romcheg_> devananda: No, only this configuration change and the tests
19:47:39 <devananda> great
19:48:09 <devananda> #topic open discussion
19:48:15 <lucasagomes> aight
19:48:24 <devananda> look! a whole 12 minutes for open discussion today :)
19:48:29 <lucasagomes> devananda, is it part of Ironic's plans to get metrics from other devices (e.g. storage arrays) just like we will be getting metrics for servers (via IPMI)?
19:48:36 <romcheg_> I continuously check the tests against the latest Ironic to detect any changes that broke Ironic
19:48:52 <devananda> lucasagomes: only devices which ironic is managing/deploying to
19:49:03 <lucasagomes> right
19:49:16 <romcheg_> Hopefully everything works now, so as soon as those two patches land, we will have tempest tests for Ironic
19:49:28 <devananda> lucasagomes: there was some interest in having ironic able to deploy firmware // small OS images to network switches (eg, open compute / ODCA stuff)
19:49:46 <devananda> lucasagomes: which I generally dont think we're ready for. but that kinda touches on the same space as your question
19:49:47 <lucasagomes> so switches, for example, would be one of the kinds of devices we would be able to control using ironic?
19:49:49 <lucasagomes> right
19:50:18 <lucasagomes> that makes sense to me
19:50:45 <devananda> lucasagomes: the whole "configure the hardware" bit gets very weird if ironic starts configuring switches and SAN
19:51:04 <devananda> lucasagomes: i really don't think it should do that. we have other services for that
19:51:45 <lucasagomes> right yea we should not start squashing a lot of things into ironic for sure
19:51:49 <lucasagomes> focused tools
19:52:09 <devananda> lucasagomes: right. OTOH, if someone wants to install a new OS on their switch, well, _that_ is Ironic's domain
19:52:28 <devananda> but it exposes some really weird questions
19:53:03 <devananda> talking to Nova's API to have Ironic deploy an image from Glance onto their hardware switch, then using Neutron to configure that switch
19:53:15 <lucasagomes> yea, it's not something for icehouse for sure, but in the future we might need to start discussing things like it
19:53:17 <devananda> but we needed some networking in order to do the deploy in the first place ....
19:53:37 <devananda> definitely worth longer discussions
19:54:20 <lucasagomes> devananda, another question re consistent hashing... I saw ur implementing the hashring class, we r not going to use any lib that does it already? any reasons for that, lack of py3 support?
19:54:44 <lucasagomes> devananda, +1 for discussions
19:55:30 <devananda> lucasagomes: i found 2 py libs out there, one of which was unmaintained, and none within openstack yet
19:55:40 <devananda> i looked at swift's hash ring
19:55:55 <devananda> and talked with notmyname to get some ideas, but that code is too tightly coupled to swift's particular needs
19:56:17 <devananda> so this is coupled to our needs, and the ring code itself is pretty small
19:56:30 <lucasagomes> right
19:56:34 <devananda> the complexity is going to be in the routing and rebalancing code that I'm working on now
19:56:54 <lucasagomes> that has to do with the list of dead conductors?
19:56:57 <devananda> yes
19:57:05 <lucasagomes> right yea I understand
19:57:18 <lucasagomes> good stuff :)
19:57:21 <devananda> eg, a conductor misses a few heartbeats -- don't take it out and rebalance everything. just skip it and talk to the next replica
19:57:58 <devananda> i think we only need to do a full rebalance in two cases: new conductor joins the ring; admin removes a conductor from the ring
19:57:59 <lucasagomes> anything about the number of replicas?
19:58:08 <lucasagomes> I saw that they tend to use like loads of replicas to make it faster
19:58:31 <devananda> lucasagomes: in swift, sure. in ironic, more replicas won't make deploys faster or anything
19:58:35 <lucasagomes> devananda, ahh thats interesting yea, if someone joins
19:58:36 <devananda> just means more resilience to temporary failures
19:58:53 <lucasagomes> we would need to rebalance and set the nodes to be controlled by specific conductors
19:59:22 <devananda> rebalance will redistribute the nodes across conductors (with the appropriate drivers)
19:59:39 <devananda> it's not a manual admin-has-to-move-nodes thing
20:00:08 <devananda> and the conductor<->node relationship isn't stored in the DB (apart from TaskManager locks)
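
(A small, self-contained sketch of the failover behaviour described here, with illustrative names: the ring hands back an ordered list of replica conductors for a node, and the RPC routing skips any conductor that has missed its heartbeats instead of triggering a full rebalance.)

    def pick_conductor(replica_hosts, alive_hosts):
        """Return the first live conductor among this node's replicas.

        replica_hosts: ordered list of conductors the hash ring maps the node to.
        alive_hosts: set of conductors whose heartbeats are current.
        """
        for host in replica_hosts:
            if host in alive_hosts:
                return host
        # No replica alive: surface an error rather than rebalancing the ring.
        raise RuntimeError('no live conductor available for this node')


    # e.g. conductor-2 missed a few heartbeats, so RPC goes to the next replica:
    print(pick_conductor(['conductor-2', 'conductor-1', 'conductor-3'],
                         alive_hosts={'conductor-1', 'conductor-3'}))
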
20:00:38 <devananda> i think we'll need some good docs for the hash ring stuff
20:00:39 <lucasagomes> oh yea otherwise it would be even more complicated to do a take over
20:00:47 <devananda> so i'm going to work on diagrams today to explain them
20:00:53 <lucasagomes> cool
20:00:57 <lucasagomes> looking fwd to see some patches coming
20:01:19 <devananda> i'll un-draft the patch i have once i've cleaned it up a bit
20:01:45 <lucasagomes> great :)
20:02:19 <devananda> anything else? we're a tad over time now
20:02:49 <devananda> ok, thanks all!
20:02:58 <devananda> #endmeeting