#openstack-meeting-cp log

17:00:08 <ildikov> #startmeeting cinder-nova-api-changes
17:00:09 <openstack> Meeting started Mon May 23 17:00:08 2016 UTC and is due to finish in 60 minutes.  The chair is ildikov. Information about MeetBot at http://wiki.debian.org/MeetBot.
17:00:10 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
17:00:12 <openstack> The meeting name has been set to 'cinder_nova_api_changes'
17:00:17 <ildikov> scottda ildikov DuncanT ameade cFouts johnthetubaguy jaypipes takashin alaski e0ne jgriffith tbarron andrearosa hemna erlon mriedem gouthamr ebalduf patrickeast smcginnis diablo_rojo gsilvis
17:00:28 <scottda> Hi
17:00:35 <scottda> ildikov: Just pinged you in Cinder :)
17:00:44 <ildikov> scottda: :)
17:01:04 <geguileo> Hi
17:01:05 <patrickeast> hi
17:01:10 <ildikov> I started to answer there, but then I switch to this channel
17:01:53 <hemna> hey
17:02:02 <hemna> is this our meeting slot now?
17:02:06 <hemna> I can't keep it straight
17:02:08 <ildikov> hemna: yes :)
17:02:18 <hemna> seems to change every week
17:02:19 <ildikov> hemna: sorry for the many variations
17:02:35 <hemna> nah it's cool.
17:02:36 <hemna> :)
17:02:51 <ildikov> hemna: last week's slot was meant to be the one, but then it turned out we don't have a channel for that slot anymore...
17:03:18 <hemna> coolio
17:03:23 <ildikov> hemna: and next week Monday is a US holiday, so we need to find another slot or sync up separately...
17:03:25 <hemna> so do we have an agenda today?
17:03:43 <ildikov> I hope we have jgriffith around
17:03:59 <ildikov> to discuss his part as I haven't seen patches from him but I might missed it
17:04:29 <hemna> man that'd be great to see that patch(s)
17:04:30 <ildikov> hemna: I also checked check_attach a bit last week
17:04:57 <ildikov> and I found that in case of attaching a volume at boot time Nova does not call reserve_volume
17:05:17 <hemna> w t f
17:05:19 <hemna> seriously?
17:05:30 <patrickeast> lol
17:05:35 <hemna> ugh, lets follow that up
17:05:39 <hemna> that needs to get fixed asap
17:05:56 <ildikov> I raised an exception from the reserve call and attach at boot time went through and then detach and the attach after that failed there
17:06:24 <ildikov> I mean I had this dumb way of testing, but anyway, the point is the same
17:06:34 <hemna> yah
17:06:47 <ildikov> I think if we can have that fixed than check_attach can be removed from there too
17:06:57 <ildikov> s/than/then/
17:07:25 <ildikov> maybe where they call check_attach first we can call reserve volume there
17:07:39 <ildikov> let's follow that up after the meeting
17:07:54 <hemna> https://github.com/openstack/nova/blob/master/nova/compute/api.py#L1288
17:08:07 <hemna> that's the one I thought should be changed to a reserve_volume call
17:08:18 <hemna> I haven't tried it yet though
17:08:26 <ildikov> yeah, I meant this one
17:08:38 <ildikov> I think this method is called only in that flow
17:08:48 <hemna> in the boot from volume flow ?
17:08:53 <ildikov> yeap
17:09:05 <ildikov> otherwise they just call attach and that creates the BDM
17:09:19 <hemna> ok, so do you think I should try the same thing I did here: https://review.openstack.org/#/c/315789/ ?
17:09:36 <ildikov> https://github.com/openstack/nova/blob/master/nova/compute/api.py#L3101
17:10:21 <ildikov> that and also call reserve_volume
17:10:25 <hemna> ildikov, so for that one, my patch changes it already
17:10:27 <ildikov> as otherwise no one will
17:10:29 <hemna> https://review.openstack.org/#/c/315789/2/nova/compute/api.py
17:11:14 <ildikov> right, I meant that for other attach cases this method is called here, but for boot from volume they use the _validate_bdm, etc
17:11:36 <hemna> *smh*
17:12:30 <hemna> ok, so at some point in the boot from volume workflow there has to be a call to initialize_connection and attach
17:12:38 <ildikov> regarding the reserve call, I would guess we need to check that what if anything goes wrong, I did not analyse the full call sequence...
17:12:40 <hemna> I'll see if I can track those down
17:13:06 <ildikov> I didn't look for initialize_connection, next time I will be smarter
17:13:12 <hemna> I can't comprehend why nova has copy/paste code for doing the same operation.
17:13:16 <hemna> mind boggling
17:13:18 <ildikov> do you have any other case I should check?
17:13:46 <ildikov> agreed, good that we're doing some cleanup
17:14:39 <ildikov> so normal attach has reserve, live migration is a special case anyway
17:15:17 <hemna> http://paste.openstack.org/show/498223/
17:15:23 <hemna> that's the entire list of check_attach references
17:15:26 <hemna> fwiw
17:15:46 <hemna> lots of tests referencing it
17:16:44 <hemna> anyway, I'll try testing my patch against he boot from volme and modify the check_attach case there to call check_availability_zone and reserve_volume instead
17:16:48 <ildikov> there is a fake check_attach code as well, which is the copy of the normal basically
17:16:54 <hemna> and see how that goes
17:16:56 <ildikov> that code is a nightmare
17:17:14 <johnthetubaguy> +1 to the above comments, that seems crazy
17:17:19 <hemna> check_attach should go away IMHO
17:17:28 <johnthetubaguy> where is the cut and paste again?
17:17:34 <ildikov> I meant line 33 for my previous comment
17:17:34 * johnthetubaguy reads more of scrollback
17:17:43 <hemna> https://github.com/openstack/nova/blob/master/nova/compute/api.py#L1287-L1291
17:18:02 <hemna> https://github.com/openstack/nova/blob/master/nova/compute/api.py#L3097-L3099
17:18:08 <hemna> those should be the same thing IMHO
17:18:14 <hemna> but should be changed to....
17:18:14 <ildikov> johnthetubaguy: the base of this discussion is that the BFV case with existing Cinder volume does not call reserve_volume
17:18:22 <hemna> https://review.openstack.org/#/c/315789/2/nova/compute/api.py
17:18:23 <hemna> that
17:18:24 <johnthetubaguy> ah...
17:20:05 <ildikov> so we thought to add that missing call and also remove check_attach from there as well
17:20:31 <johnthetubaguy> a new method in volume_api to do those calls?
17:20:40 <hemna> possibly yah
17:20:49 <hemna> I had thought of refactoring those 2 into a single place
17:20:53 <hemna> if all goes well :)
17:21:13 <johnthetubaguy> yeah, that makes sense
17:21:34 <johnthetubaguy> seems like reserve and check should always both we done?
17:21:49 <hemna> possibly
17:21:55 <hemna> live migration is a special case though
17:21:57 <hemna> :(
17:22:02 <johnthetubaguy> oh, true
17:22:15 <johnthetubaguy> its always a special case
17:22:20 <johnthetubaguy> thats why its so dammed broken
17:22:25 <hemna> yup
17:22:26 <hemna> pain
17:22:33 <hemna> ok so baby steps
17:22:34 <ildikov> but in general the less state check in Nova is the better
17:22:54 <hemna> ildikov, +1
17:23:04 <ildikov> and use reserve what it's for
17:23:07 <johnthetubaguy> yeah, +1
17:23:16 <ildikov> do we have any other special case like live migration?
17:23:27 <hemna> I'm shocked that BFV doesn't use reserve
17:23:35 <ildikov> or it's the only, but big monster?
17:23:37 <hemna> I'd like to confirm that
17:23:46 <ildikov> I haven't checked how it is done when Nova creates the volume
17:23:58 <johnthetubaguy> so I wonder if we should get you lovely cinder folks to write a very quick doc on how we *should* use your API? we could put that in our devref thingy?
17:23:59 <scottda> ildikov: I think we need to look at shelve_offload, as cinder state if fudged there as well.
17:24:12 <hemna> heh
17:24:12 <johnthetubaguy> ildikov: so migrate, and shelved, evacuate, resize
17:24:16 <ildikov> just add an exception to the client call and it will get through or I'm a magician...
17:24:16 <scottda> johnthetubaguy: That's a good idea.
17:24:23 <hemna> I think jgriffith created a nice blog about it and was working on a devref for it as well
17:24:31 <johnthetubaguy> that would be prefect
17:25:07 <ildikov> johnthetubaguy: migrate is live migrate or Cinder migrate there?
17:25:12 <johnthetubaguy> so I think its when the server moves, we have issues
17:25:20 <johnthetubaguy> let me find a doc on those, one second...
17:25:36 <hemna> johnthetubaguy, http://docs.openstack.org/developer/cinder/devref/attach_detach_conventions.html
17:25:37 <hemna> :)
17:25:51 <johnthetubaguy> http://developer.openstack.org/api-guide/compute/server_concepts.html
17:26:22 <johnthetubaguy> hemna: ah, coolness, we might want to add something in the Nova docs, that just link to your doc :) but that might be overkill!
17:26:22 <ildikov> hemna: nice, we could add a ref to your diagrams there as well
17:26:34 <hemna> johnthetubaguy, I think that'd be nice.
17:26:43 <johnthetubaguy> so the api docs cover server moves, we have very silly names
17:26:49 <johnthetubaguy> live-migrate is what you know
17:26:54 <johnthetubaguy> migrate is cold migrate
17:27:01 <hemna> server move = live migration ?
17:27:02 <johnthetubaguy> i.e. turn it off, move it, turn it on
17:27:20 <johnthetubaguy> yeah, live-migration is move the server, but with only a slight pause and no reboot
17:27:22 <ildikov> johnthetubaguy: yeah, right, I forgot you call it migrate as well, my bad :)
17:27:36 <johnthetubaguy> yeah, live migrate and cold migrate is what we often call it now
17:27:56 <johnthetubaguy> now we have dead parrot migrate (I totally just made that up)
17:28:03 <johnthetubaguy> its called evacuate
17:28:08 <ildikov> johnthetubaguy: how much are these things tested?
17:28:13 <johnthetubaguy> when you host is dead, you need to move your server
17:28:18 <johnthetubaguy> ildikov: honestly, its a WIP
17:28:47 <johnthetubaguy> ildikov: migrate is OK, live-migrate is only tested in the experimental queue, we can't make it pass enough yet
17:28:48 <hemna> is cold migrate = shelve/unshelve ?
17:28:53 <johnthetubaguy> nope
17:28:54 <hemna> ok
17:28:58 <johnthetubaguy> lets do evacuate
17:29:03 <johnthetubaguy> your host is head
17:29:06 <johnthetubaguy> you can't get to it
17:29:11 <johnthetubaguy> its in a skip
17:29:15 <johnthetubaguy> but you want your VM back
17:29:19 <johnthetubaguy> so you use evacuate
17:29:38 <johnthetubaguy> (resurrect is another name we came up with, since we created that API call)
17:29:43 <johnthetubaguy> does that make any sense?
17:29:50 <hemna> yah
17:29:53 <ildikov> I know the operation
17:29:53 <hemna> thanks :)
17:29:56 <johnthetubaguy> sweet
17:29:59 <ildikov> but what happens with the volume there?
17:30:10 <johnthetubaguy> it gets re-attached on the destination host
17:30:21 <johnthetubaguy> we don't have access to the old host to do anything there
17:30:32 <johnthetubaguy> we technically share that code with rebuild
17:30:39 <ildikov> but then from volume perspective it's like a migrate?
17:30:44 <scottda> so volume does not need to be "reserved", i.e. set to attaching, it should already be attached and in-use
17:30:46 <johnthetubaguy> I am not sure if we call terminate correctly
17:30:50 <johnthetubaguy> ildikov: kinda, yeah
17:31:02 <johnthetubaguy> scottda: yeah, thats the key bit I guess
17:31:14 <johnthetubaguy> the detach is harder, as we can't call os-brick on the old host, its dead
17:31:27 <hemna> so nova will call initialize_connection  on a new host then, when the volume is already 'in-use'
17:31:32 <scottda> and we don't want Nova to check the volume state, because it won't be correct for non-multi-attach case
17:31:37 <johnthetubaguy> hemna: I think so
17:31:40 <hemna> ok
17:31:50 <hemna> yet another case where the connector information should be updated
17:31:52 <ildikov> yeah, like migrate
17:31:57 <hemna> ildikov, yup
17:32:02 <johnthetubaguy> yeah, its very similar
17:32:09 <scottda> johnthetubaguy: Yes, detach won't work for Nova, we'll want to call cinder force-detach (which needs fixing after jgriffith patches)
17:32:32 <johnthetubaguy> scottda: yeah, that sounds correct for evacuate (although wrong for rebuild/migrate)
17:32:56 <johnthetubaguy> which might be tricky, and rebuild and evacuate are mostly common code, but lets ignore that for now
17:33:02 <ildikov> I just wanted to ask whether we could call force-detach :)
17:33:16 <johnthetubaguy> hehe, I think we should be able to fix that
17:33:21 <scottda> ildikov: Force detach won't work properly until Cinder saves the connector info...
17:33:29 <hemna> scottda, +!
17:33:30 <scottda> Which is will soon with jgriffith patches.
17:33:31 <hemna> +1
17:33:31 <johnthetubaguy> scottda: +1
17:33:40 <johnthetubaguy> cool beans, we should do shelved?
17:33:42 <ildikov> scottda: right, I got it now :)
17:34:00 <johnthetubaguy> so that migrate, with a really long pause, and possible a delete before the resume
17:34:02 <scottda> johnthetubaguy: Yes, please proceed with shelved :)
17:34:13 <johnthetubaguy> the idea is this:
17:34:18 <ildikov> johnthetubaguy: please educate us :)
17:34:21 <johnthetubaguy> user wants to keep all their resources
17:34:36 <johnthetubaguy> but they don't want to pay for compute resources when the instance isn't used
17:34:43 <johnthetubaguy> (i.e. over night)
17:34:58 <johnthetubaguy> and what a really quick resume, where possible, when they do want to use it
17:35:09 <johnthetubaguy> its a bit messy
17:35:27 <johnthetubaguy> but the instance is basically not really on a host, once it is selved (offloaded)
17:35:45 <johnthetubaguy> the offloaded bit is about snapshotting any local disks, so you can resume on a different host)
17:35:50 <hemna> but the volume is still 'in-use'
17:35:56 <johnthetubaguy> yeah, volume stays in-use
17:36:06 <johnthetubaguy> it shouldn't be attached to any other place
17:36:16 <scottda> so similar to evacuate, we need to skip state checking
17:36:37 <johnthetubaguy> ... now there is a spec to detach disks from a shelved instance, which we currently don't support, but thats not approved yet, AFAIK
17:36:42 <johnthetubaguy> scottda: +1
17:37:14 <scottda> johnthetubaguy: I believe that under the covers, os-brick is called to actually disconnect the disk when we shelve, does that sound right?
17:37:29 <scottda> and a new connection occurs during un-shelve.
17:37:31 <johnthetubaguy> yes, that does sound correct
17:37:43 <johnthetubaguy> yeah, its assumed to be on a new host, and you want to clean up the old host
17:38:03 <johnthetubaguy> not sure how well shelve is tested yet, I don't remember if the tempest stuff landed or not
17:38:11 <johnthetubaguy> but thats the general intent
17:38:16 <scottda> So, there's no messy force-detach issues, or dangling connection. Just a basic disconncect/re-connect in brick and dealing with weird state management.
17:38:26 <johnthetubaguy> thats my understanding of it
17:38:34 <johnthetubaguy> evacuate is the only one with the force, I think...
17:39:31 <johnthetubaguy> I think thats the main move operations all covered
17:39:52 <johnthetubaguy> oh, right, live-migrate is slightly different to the other migrates
17:40:04 <smcginnis> johnthetubaguy: So for shelve, were you saying we need to detach the volume but still keep it in-use?
17:40:12 <johnthetubaguy> smcginnis: yes
17:40:39 <johnthetubaguy> its possible we just don't call terminate_connection thinking about it
17:40:44 <ildikov> I guess it's only disconnected from the host
17:40:50 <johnthetubaguy> ildikov: +1
17:40:56 <johnthetubaguy> ildikov: at least, thats the intent
17:41:05 <johnthetubaguy> its logically still attached to the instance
17:41:23 <ildikov> johnthetubaguy: otherwise the state would change and you don't want to have the volume available
17:41:31 <johnthetubaguy> +1
17:42:02 <scottda> I need to move to IRC_on_phone mode, sorry folks, mostly Read-only from here on out....
17:42:20 <ildikov> scottda: any updtes on the migrate test?
17:42:35 <ildikov> scottda: on the Cinder migrate ones I meant
17:43:58 <johnthetubaguy> are there any other questions like the above ones I can help with?
17:44:46 <scottda> ildikov: sorry, no update.
17:44:54 <ildikov> johnthetubaguy: I wonder where we will update the host info for the shelve case
17:45:04 <ildikov> scottda: no worries, I just wanted to check
17:45:31 <johnthetubaguy> ildikov: so there is no host, until you do unshelve, and we call initialize_connection again
17:45:32 <hemna> ildikov, any time initialize_connection is called the connector passed in has the host and it should be updated
17:46:01 <johnthetubaguy> I guess for local volumes, that messes things up, but otherwise, that should be OK?
17:46:02 <hemna> I'm hoping jgriffith's change has this in place as part of his patch(es)
17:46:05 <ildikov> johnthetubaguy: ah, ok, I wasn't sure we have the initialize_connection call
17:46:14 <johnthetubaguy> ildikov: I think we should have
17:46:56 <ildikov> ok, cool, I'll check if I get there
17:47:17 <ildikov> johnthetubaguy: can you also take a look at the multiattach spec?
17:47:26 <ildikov> next week is the deadline if I'm not mistaken
17:48:03 <johnthetubaguy> ildikov: for unshelve, its doing this: https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L4205
17:48:16 <johnthetubaguy> ildikov: yeah, sorry, I must look at that again, ASAP
17:48:38 <hemna> I think we are stuck waiting for jgriffith's patches no ?
17:48:42 <johnthetubaguy> I have a quite a few specs in that list, I am afraid, but hope to get through more of those tomorrow
17:49:38 <johnthetubaguy> we should be able to merge the spec before the patches, or is there something undecided we are depending on?
17:50:29 <ildikov> I think we should be able to conclude on the spec
17:51:00 <johnthetubaguy> cool
17:51:37 <ildikov> at least in my opinion, but it would be nice to get a heads up if there's anything missing
17:51:52 <johnthetubaguy> totally, I will try hit that soon
17:52:04 <johnthetubaguy> ildikov: you totally have permission to keep bugging me every morning till I review it
17:52:14 <ildikov> I tried to handle things like jgriffith's patches as an identified dependency, but not part of the multiattach spec technically
17:52:34 <johnthetubaguy> ildikov: that sounds correct
17:52:46 <johnthetubaguy> we could call the other stuff a bug fix
17:52:47 <johnthetubaguy> ish
17:52:48 <ildikov> johnthetubaguy: thanks :) I might use it, but hopefully I will not need :)
17:53:02 <ildikov> johnthetubaguy: +1
17:53:23 <ildikov> many of the discussions today sound like a cleanup and fixing issues we have hidden
17:53:51 <johnthetubaguy> yeah
17:53:52 <smcginnis> ildikov: +1
17:54:04 <ildikov> ok, we have 7 minutes left from the today's slot
17:54:05 <johnthetubaguy> honestly, most of this comes from neither side understand what the other is wanting to do
17:54:16 <johnthetubaguy> so great we are cleaning this up now :)
17:54:25 <ildikov> is there anything we should discuss?
17:54:37 <johnthetubaguy> I was looking at the devref stuff in cinder, I think we might want a more API focused doc
17:54:37 <ildikov> johnthetubaguy: my thinking as well :)
17:54:48 <johnthetubaguy> that includes os-brick in the discussion, maybe
17:54:49 <smcginnis> johnthetubaguy: I do like the suggestion of a devref in Nova for some of this. It's duplication, but hopefully it saves this from happening again once it's cleaned up.
17:55:01 <johnthetubaguy> actually, that would work
17:55:09 <johnthetubaguy> do the nova focused one in Nova
17:55:16 <johnthetubaguy> i.e. here are the patterns
17:55:41 <ildikov> smcginnis: johnthetubaguy: I think we can also add hemna's diagrams
17:55:44 <johnthetubaguy> if we do create server, delete server, live-migrate server, evacuate and shelve, we should be there
17:55:56 <johnthetubaguy> yeah, diagrams are good to make this clearer
17:56:15 <ildikov> not as an API ref, but if someone would like to figure it out what's in the background it's pretty useful
17:56:22 <johnthetubaguy> yeah
17:57:16 <scottda> I'll put it on my list to put up a patch in Nova devref
17:57:18 <johnthetubaguy> it goes the core reviews something to go double check when we can't remember the details next time
17:57:31 <johnthetubaguy> s/ws/wers/
17:57:40 <johnthetubaguy> scottda: awesome, thank you
17:58:01 <johnthetubaguy> nothing more from me for this week :)
17:58:35 <ildikov> I think we have home work with these server moving actions and fix the reserve_volume in BFV
17:58:48 <ildikov> I will try to hunt down jgriffith as soon as I can
17:58:58 <johnthetubaguy> yeah, feel free to ping me in channel if later in the week, if thats useful
17:59:24 <ildikov> johnthetubaguy: cool, thanks
17:59:41 <ildikov> anything else from anyone before we close?
18:00:07 <ildikov> then, thanks for the good chat!
18:00:12 <johnthetubaguy> +1
18:00:15 <scottda> Thanks everyone
18:00:17 <ildikov> I learnt a lot today :)
18:00:18 <smcginnis> ildikov: Thanks again for arranging this.
18:00:24 <ildikov> oh one thing
18:00:44 <ildikov> next Monday is US holiday, so I will ask around when we could have a quick sync
18:00:59 <scottda> I'm out until Wed next week
18:01:12 <hemna> I think next week is going to be a bit of a loss for some
18:01:20 <hemna> I might be taking time off next week too.
18:01:22 <johnthetubaguy> actually, its a UK holiday too
18:01:27 <ildikov> thanks for the info, Matt is out too at the beginning
18:01:31 <johnthetubaguy> I forget about those
18:01:52 <johnthetubaguy> by which I mean, I forget they are coming up
18:02:05 <ildikov> ok, then let's aim for having a short chat mid next week
18:02:13 <scottda> cool
18:02:13 <johnthetubaguy> that works
18:02:26 <ildikov> johnthetubaguy: we need to get the spec landed, otherwise we should be good
18:02:32 <johnthetubaguy> yeah, +1
18:02:39 <smcginnis> Maybe sync up on Thurday?
18:02:54 <johnthetubaguy> thats a good plan
18:02:57 <ildikov> smcginnis: yeap, sounds good
18:03:00 <scottda> yup
18:03:23 <ildikov> I hope 1700UTC can work
18:03:39 <johnthetubaguy> if its quick, that should be OK
18:03:46 <ildikov> I will send out a heads up mail later this week
18:03:54 <scottda> thanks ildikov
18:03:54 <smcginnis> Works for me. We can hijack the cindre channel again if we need to.
18:04:01 <ildikov> johnthetubaguy: due to the holidays I would assume we can keep it short
18:04:02 <johnthetubaguy> +1
18:04:09 <johnthetubaguy> :)
18:04:24 <ildikov> smcginnis: cool, thanks, I'll check the channels
18:04:51 <ildikov> ok, I don't want steal more of your time today :)
18:05:00 <ildikov> thanks guys!
18:05:17 <ildikov> #endmeeting