17:00:08 #startmeeting cinder-nova-api-changes 17:00:09 Meeting started Mon May 23 17:00:08 2016 UTC and is due to finish in 60 minutes. The chair is ildikov. Information about MeetBot at http://wiki.debian.org/MeetBot. 17:00:10 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. 17:00:12 The meeting name has been set to 'cinder_nova_api_changes' 17:00:17 scottda ildikov DuncanT ameade cFouts johnthetubaguy jaypipes takashin alaski e0ne jgriffith tbarron andrearosa hemna erlon mriedem gouthamr ebalduf patrickeast smcginnis diablo_rojo gsilvis 17:00:28 Hi 17:00:35 ildikov: Just pinged you in Cinder :) 17:00:44 scottda: :) 17:01:04 Hi 17:01:05 hi 17:01:10 I started to answer there, but then I switch to this channel 17:01:53 hey 17:02:02 is this our meeting slot now? 17:02:06 I can't keep it straight 17:02:08 hemna: yes :) 17:02:18 seems to change every week 17:02:19 hemna: sorry for the many variations 17:02:35 nah it's cool. 17:02:36 :) 17:02:51 hemna: last week's slot was meant to be the one, but then it turned out we don't have a channel for that slot anymore... 17:03:18 coolio 17:03:23 hemna: and next week Monday is a US holiday, so we need to find another slot or sync up separately... 17:03:25 so do we have an agenda today? 17:03:43 I hope we have jgriffith around 17:03:59 to discuss his part as I haven't seen patches from him but I might missed it 17:04:29 man that'd be great to see that patch(s) 17:04:30 hemna: I also checked check_attach a bit last week 17:04:57 and I found that in case of attaching a volume at boot time Nova does not call reserve_volume 17:05:17 w t f 17:05:19 seriously? 17:05:30 lol 17:05:35 ugh, lets follow that up 17:05:39 that needs to get fixed asap 17:05:56 I raised an exception from the reserve call and attach at boot time went through and then detach and the attach after that failed there 17:06:24 I mean I had this dumb way of testing, but anyway, the point is the same 17:06:34 yah 17:06:47 I think if we can have that fixed than check_attach can be removed from there too 17:06:57 s/than/then/ 17:07:25 maybe where they call check_attach first we can call reserve volume there 17:07:39 let's follow that up after the meeting 17:07:54 https://github.com/openstack/nova/blob/master/nova/compute/api.py#L1288 17:08:07 that's the one I thought should be changed to a reserve_volume call 17:08:18 I haven't tried it yet though 17:08:26 yeah, I meant this one 17:08:38 I think this method is called only in that flow 17:08:48 in the boot from volume flow ? 17:08:53 yeap 17:09:05 otherwise they just call attach and that creates the BDM 17:09:19 ok, so do you think I should try the same thing I did here: https://review.openstack.org/#/c/315789/ ? 17:09:36 https://github.com/openstack/nova/blob/master/nova/compute/api.py#L3101 17:10:21 that and also call reserve_volume 17:10:25 ildikov, so for that one, my patch changes it already 17:10:27 as otherwise no one will 17:10:29 https://review.openstack.org/#/c/315789/2/nova/compute/api.py 17:11:14 right, I meant that for other attach cases this method is called here, but for boot from volume they use the _validate_bdm, etc 17:11:36 *smh* 17:12:30 ok, so at some point in the boot from volume workflow there has to be a call to initialize_connection and attach 17:12:38 regarding the reserve call, I would guess we need to check that what if anything goes wrong, I did not analyse the full call sequence... 17:12:40 I'll see if I can track those down 17:13:06 I didn't look for initialize_connection, next time I will be smarter 17:13:12 I can't comprehend why nova has copy/paste code for doing the same operation. 17:13:16 mind boggling 17:13:18 do you have any other case I should check? 17:13:46 agreed, good that we're doing some cleanup 17:14:39 so normal attach has reserve, live migration is a special case anyway 17:15:17 http://paste.openstack.org/show/498223/ 17:15:23 that's the entire list of check_attach references 17:15:26 fwiw 17:15:46 lots of tests referencing it 17:16:44 anyway, I'll try testing my patch against he boot from volme and modify the check_attach case there to call check_availability_zone and reserve_volume instead 17:16:48 there is a fake check_attach code as well, which is the copy of the normal basically 17:16:54 and see how that goes 17:16:56 that code is a nightmare 17:17:14 +1 to the above comments, that seems crazy 17:17:19 check_attach should go away IMHO 17:17:28 where is the cut and paste again? 17:17:34 I meant line 33 for my previous comment 17:17:34 * johnthetubaguy reads more of scrollback 17:17:43 https://github.com/openstack/nova/blob/master/nova/compute/api.py#L1287-L1291 17:18:02 https://github.com/openstack/nova/blob/master/nova/compute/api.py#L3097-L3099 17:18:08 those should be the same thing IMHO 17:18:14 but should be changed to.... 17:18:14 johnthetubaguy: the base of this discussion is that the BFV case with existing Cinder volume does not call reserve_volume 17:18:22 https://review.openstack.org/#/c/315789/2/nova/compute/api.py 17:18:23 that 17:18:24 ah... 17:20:05 so we thought to add that missing call and also remove check_attach from there as well 17:20:31 a new method in volume_api to do those calls? 17:20:40 possibly yah 17:20:49 I had thought of refactoring those 2 into a single place 17:20:53 if all goes well :) 17:21:13 yeah, that makes sense 17:21:34 seems like reserve and check should always both we done? 17:21:49 possibly 17:21:55 live migration is a special case though 17:21:57 :( 17:22:02 oh, true 17:22:15 its always a special case 17:22:20 thats why its so dammed broken 17:22:25 yup 17:22:26 pain 17:22:33 ok so baby steps 17:22:34 but in general the less state check in Nova is the better 17:22:54 ildikov, +1 17:23:04 and use reserve what it's for 17:23:07 yeah, +1 17:23:16 do we have any other special case like live migration? 17:23:27 I'm shocked that BFV doesn't use reserve 17:23:35 or it's the only, but big monster? 17:23:37 I'd like to confirm that 17:23:46 I haven't checked how it is done when Nova creates the volume 17:23:58 so I wonder if we should get you lovely cinder folks to write a very quick doc on how we *should* use your API? we could put that in our devref thingy? 17:23:59 ildikov: I think we need to look at shelve_offload, as cinder state if fudged there as well. 17:24:12 heh 17:24:12 ildikov: so migrate, and shelved, evacuate, resize 17:24:16 just add an exception to the client call and it will get through or I'm a magician... 17:24:16 johnthetubaguy: That's a good idea. 17:24:23 I think jgriffith created a nice blog about it and was working on a devref for it as well 17:24:31 that would be prefect 17:25:07 johnthetubaguy: migrate is live migrate or Cinder migrate there? 17:25:12 so I think its when the server moves, we have issues 17:25:20 let me find a doc on those, one second... 17:25:36 johnthetubaguy, http://docs.openstack.org/developer/cinder/devref/attach_detach_conventions.html 17:25:37 :) 17:25:51 http://developer.openstack.org/api-guide/compute/server_concepts.html 17:26:22 hemna: ah, coolness, we might want to add something in the Nova docs, that just link to your doc :) but that might be overkill! 17:26:22 hemna: nice, we could add a ref to your diagrams there as well 17:26:34 johnthetubaguy, I think that'd be nice. 17:26:43 so the api docs cover server moves, we have very silly names 17:26:49 live-migrate is what you know 17:26:54 migrate is cold migrate 17:27:01 server move = live migration ? 17:27:02 i.e. turn it off, move it, turn it on 17:27:20 yeah, live-migration is move the server, but with only a slight pause and no reboot 17:27:22 johnthetubaguy: yeah, right, I forgot you call it migrate as well, my bad :) 17:27:36 yeah, live migrate and cold migrate is what we often call it now 17:27:56 now we have dead parrot migrate (I totally just made that up) 17:28:03 its called evacuate 17:28:08 johnthetubaguy: how much are these things tested? 17:28:13 when you host is dead, you need to move your server 17:28:18 ildikov: honestly, its a WIP 17:28:47 ildikov: migrate is OK, live-migrate is only tested in the experimental queue, we can't make it pass enough yet 17:28:48 is cold migrate = shelve/unshelve ? 17:28:53 nope 17:28:54 ok 17:28:58 lets do evacuate 17:29:03 your host is head 17:29:06 you can't get to it 17:29:11 its in a skip 17:29:15 but you want your VM back 17:29:19 so you use evacuate 17:29:38 (resurrect is another name we came up with, since we created that API call) 17:29:43 does that make any sense? 17:29:50 yah 17:29:53 I know the operation 17:29:53 thanks :) 17:29:56 sweet 17:29:59 but what happens with the volume there? 17:30:10 it gets re-attached on the destination host 17:30:21 we don't have access to the old host to do anything there 17:30:32 we technically share that code with rebuild 17:30:39 but then from volume perspective it's like a migrate? 17:30:44 so volume does not need to be "reserved", i.e. set to attaching, it should already be attached and in-use 17:30:46 I am not sure if we call terminate correctly 17:30:50 ildikov: kinda, yeah 17:31:02 scottda: yeah, thats the key bit I guess 17:31:14 the detach is harder, as we can't call os-brick on the old host, its dead 17:31:27 so nova will call initialize_connection on a new host then, when the volume is already 'in-use' 17:31:32 and we don't want Nova to check the volume state, because it won't be correct for non-multi-attach case 17:31:37 hemna: I think so 17:31:40 ok 17:31:50 yet another case where the connector information should be updated 17:31:52 yeah, like migrate 17:31:57 ildikov, yup 17:32:02 yeah, its very similar 17:32:09 johnthetubaguy: Yes, detach won't work for Nova, we'll want to call cinder force-detach (which needs fixing after jgriffith patches) 17:32:32 scottda: yeah, that sounds correct for evacuate (although wrong for rebuild/migrate) 17:32:56 which might be tricky, and rebuild and evacuate are mostly common code, but lets ignore that for now 17:33:02 I just wanted to ask whether we could call force-detach :) 17:33:16 hehe, I think we should be able to fix that 17:33:21 ildikov: Force detach won't work properly until Cinder saves the connector info... 17:33:29 scottda, +! 17:33:30 Which is will soon with jgriffith patches. 17:33:31 +1 17:33:31 scottda: +1 17:33:40 cool beans, we should do shelved? 17:33:42 scottda: right, I got it now :) 17:34:00 so that migrate, with a really long pause, and possible a delete before the resume 17:34:02 johnthetubaguy: Yes, please proceed with shelved :) 17:34:13 the idea is this: 17:34:18 johnthetubaguy: please educate us :) 17:34:21 user wants to keep all their resources 17:34:36 but they don't want to pay for compute resources when the instance isn't used 17:34:43 (i.e. over night) 17:34:58 and what a really quick resume, where possible, when they do want to use it 17:35:09 its a bit messy 17:35:27 but the instance is basically not really on a host, once it is selved (offloaded) 17:35:45 the offloaded bit is about snapshotting any local disks, so you can resume on a different host) 17:35:50 but the volume is still 'in-use' 17:35:56 yeah, volume stays in-use 17:36:06 it shouldn't be attached to any other place 17:36:16 so similar to evacuate, we need to skip state checking 17:36:37 ... now there is a spec to detach disks from a shelved instance, which we currently don't support, but thats not approved yet, AFAIK 17:36:42 scottda: +1 17:37:14 johnthetubaguy: I believe that under the covers, os-brick is called to actually disconnect the disk when we shelve, does that sound right? 17:37:29 and a new connection occurs during un-shelve. 17:37:31 yes, that does sound correct 17:37:43 yeah, its assumed to be on a new host, and you want to clean up the old host 17:38:03 not sure how well shelve is tested yet, I don't remember if the tempest stuff landed or not 17:38:11 but thats the general intent 17:38:16 So, there's no messy force-detach issues, or dangling connection. Just a basic disconncect/re-connect in brick and dealing with weird state management. 17:38:26 thats my understanding of it 17:38:34 evacuate is the only one with the force, I think... 17:39:31 I think thats the main move operations all covered 17:39:52 oh, right, live-migrate is slightly different to the other migrates 17:40:04 johnthetubaguy: So for shelve, were you saying we need to detach the volume but still keep it in-use? 17:40:12 smcginnis: yes 17:40:39 its possible we just don't call terminate_connection thinking about it 17:40:44 I guess it's only disconnected from the host 17:40:50 ildikov: +1 17:40:56 ildikov: at least, thats the intent 17:41:05 its logically still attached to the instance 17:41:23 johnthetubaguy: otherwise the state would change and you don't want to have the volume available 17:41:31 +1 17:42:02 I need to move to IRC_on_phone mode, sorry folks, mostly Read-only from here on out.... 17:42:20 scottda: any updtes on the migrate test? 17:42:35 scottda: on the Cinder migrate ones I meant 17:43:58 are there any other questions like the above ones I can help with? 17:44:46 ildikov: sorry, no update. 17:44:54 johnthetubaguy: I wonder where we will update the host info for the shelve case 17:45:04 scottda: no worries, I just wanted to check 17:45:31 ildikov: so there is no host, until you do unshelve, and we call initialize_connection again 17:45:32 ildikov, any time initialize_connection is called the connector passed in has the host and it should be updated 17:46:01 I guess for local volumes, that messes things up, but otherwise, that should be OK? 17:46:02 I'm hoping jgriffith's change has this in place as part of his patch(es) 17:46:05 johnthetubaguy: ah, ok, I wasn't sure we have the initialize_connection call 17:46:14 ildikov: I think we should have 17:46:56 ok, cool, I'll check if I get there 17:47:17 johnthetubaguy: can you also take a look at the multiattach spec? 17:47:26 next week is the deadline if I'm not mistaken 17:48:03 ildikov: for unshelve, its doing this: https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L4205 17:48:16 ildikov: yeah, sorry, I must look at that again, ASAP 17:48:38 I think we are stuck waiting for jgriffith's patches no ? 17:48:42 I have a quite a few specs in that list, I am afraid, but hope to get through more of those tomorrow 17:49:38 we should be able to merge the spec before the patches, or is there something undecided we are depending on? 17:50:29 I think we should be able to conclude on the spec 17:51:00 cool 17:51:37 at least in my opinion, but it would be nice to get a heads up if there's anything missing 17:51:52 totally, I will try hit that soon 17:52:04 ildikov: you totally have permission to keep bugging me every morning till I review it 17:52:14 I tried to handle things like jgriffith's patches as an identified dependency, but not part of the multiattach spec technically 17:52:34 ildikov: that sounds correct 17:52:46 we could call the other stuff a bug fix 17:52:47 ish 17:52:48 johnthetubaguy: thanks :) I might use it, but hopefully I will not need :) 17:53:02 johnthetubaguy: +1 17:53:23 many of the discussions today sound like a cleanup and fixing issues we have hidden 17:53:51 yeah 17:53:52 ildikov: +1 17:54:04 ok, we have 7 minutes left from the today's slot 17:54:05 honestly, most of this comes from neither side understand what the other is wanting to do 17:54:16 so great we are cleaning this up now :) 17:54:25 is there anything we should discuss? 17:54:37 I was looking at the devref stuff in cinder, I think we might want a more API focused doc 17:54:37 johnthetubaguy: my thinking as well :) 17:54:48 that includes os-brick in the discussion, maybe 17:54:49 johnthetubaguy: I do like the suggestion of a devref in Nova for some of this. It's duplication, but hopefully it saves this from happening again once it's cleaned up. 17:55:01 actually, that would work 17:55:09 do the nova focused one in Nova 17:55:16 i.e. here are the patterns 17:55:41 smcginnis: johnthetubaguy: I think we can also add hemna's diagrams 17:55:44 if we do create server, delete server, live-migrate server, evacuate and shelve, we should be there 17:55:56 yeah, diagrams are good to make this clearer 17:56:15 not as an API ref, but if someone would like to figure it out what's in the background it's pretty useful 17:56:22 yeah 17:57:16 I'll put it on my list to put up a patch in Nova devref 17:57:18 it goes the core reviews something to go double check when we can't remember the details next time 17:57:31 s/ws/wers/ 17:57:40 scottda: awesome, thank you 17:58:01 nothing more from me for this week :) 17:58:35 I think we have home work with these server moving actions and fix the reserve_volume in BFV 17:58:48 I will try to hunt down jgriffith as soon as I can 17:58:58 yeah, feel free to ping me in channel if later in the week, if thats useful 17:59:24 johnthetubaguy: cool, thanks 17:59:41 anything else from anyone before we close? 18:00:07 then, thanks for the good chat! 18:00:12 +1 18:00:15 Thanks everyone 18:00:17 I learnt a lot today :) 18:00:18 ildikov: Thanks again for arranging this. 18:00:24 oh one thing 18:00:44 next Monday is US holiday, so I will ask around when we could have a quick sync 18:00:59 I'm out until Wed next week 18:01:12 I think next week is going to be a bit of a loss for some 18:01:20 I might be taking time off next week too. 18:01:22 actually, its a UK holiday too 18:01:27 thanks for the info, Matt is out too at the beginning 18:01:31 I forget about those 18:01:52 by which I mean, I forget they are coming up 18:02:05 ok, then let's aim for having a short chat mid next week 18:02:13 cool 18:02:13 that works 18:02:26 johnthetubaguy: we need to get the spec landed, otherwise we should be good 18:02:32 yeah, +1 18:02:39 Maybe sync up on Thurday? 18:02:54 thats a good plan 18:02:57 smcginnis: yeap, sounds good 18:03:00 yup 18:03:23 I hope 1700UTC can work 18:03:39 if its quick, that should be OK 18:03:46 I will send out a heads up mail later this week 18:03:54 thanks ildikov 18:03:54 Works for me. We can hijack the cindre channel again if we need to. 18:04:01 johnthetubaguy: due to the holidays I would assume we can keep it short 18:04:02 +1 18:04:09 :) 18:04:24 smcginnis: cool, thanks, I'll check the channels 18:04:51 ok, I don't want steal more of your time today :) 18:05:00 thanks guys! 18:05:17 #endmeeting