17:01:01 #startmeeting cinder-nova-api-changes 17:01:02 Meeting started Mon Nov 21 17:01:01 2016 UTC and is due to finish in 60 minutes. The chair is ildikov. Information about MeetBot at http://wiki.debian.org/MeetBot. 17:01:03 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. 17:01:05 The meeting name has been set to 'cinder_nova_api_changes' 17:01:18 scottda ildikov DuncanT ameade cFouts johnthetubaguy jaypipes takashin alaski e0ne jgriffith tbarron andrearosa hemna erlon mriedem gouthamr ebalduf patrickeast smcginnis diablo_rojo gsilvis xyang1 raj_singh lyarwood 17:01:29 o/ 17:01:45 hi 17:02:07 mriedem: I would offer that I ping you to do home work this week, but Thanksgiving kills my joy of annoying people... 17:02:07 o/ 17:02:59 I know that smcginnis and patrickeast are traveling this week 17:03:10 my todo this week is to review johnthetubaguy's updates to https://review.openstack.org/#/c/373203/ which i didn't get to last week because of the nova spec freeze 17:03:21 but i'm also straddling some internal things i need to work on, and it's a short week 17:03:40 mriedem: the Cinder side is updated too 17:04:00 mriedem: code #link https://review.openstack.org/#/c/387712/ 17:04:22 mriedem: spec #link https://review.openstack.org/#/c/361512/ 17:04:26 yough 17:05:05 there is a comment from jaypipes regarding the reserve part, which creates an empty attachment and also changes the volume state to 'reserved' 17:05:14 I am curious if they are both pointing at the same thing 17:05:31 johnthetubaguy: that's just illusion :) 17:05:58 i'd kind of hoped we wouldn't have reserve, and we'd just have nova-api call create_attachment which would create the empty attachment and set the volume status to 'attaching' 17:06:13 I think the reserve should return the attachment uuid 17:06:15 then update_attachment in nova-compute when we have the connector details from os-brick 17:06:18 that's what I asked for a while back 17:06:29 I think it just makes the entire flow more explicit and sane 17:06:51 either way i think we agree to create the attachment from nova-api 17:06:57 mriedem, the problem with that is the create_attachment has to call the cinder backend 17:06:59 which is slow 17:07:10 so my spec calls reserve create attachment 17:07:10 mriedem: the current reserve does this except the name of the volume state 17:07:14 hemna: even if nova doesn't provide a connector? 17:07:26 right, connector is later 17:07:41 (optionally, later) 17:07:42 why would it call it w/o a connector? 17:07:43 i'm cool with leaving a reserve call in 17:07:52 johnthetubaguy: the Cinder API proposal is not finalized yet completely, so if the functionality is the same I think we can wait with updating the name in your spec 17:07:54 it would do that to reserve the volume from other people 17:07:56 just to reserve? 17:07:59 it's just a bit odd that reserve will create an attachment too 17:08:07 and we'll also have a POST attachment 17:08:07 johnthetubaguy: but basically the answer is yes :) 17:08:14 well it's 'odd' either way 17:08:28 calling create_attachment, which doesn't create an attachment but is used to reserve 17:08:30 yes it's odd either way i agree 17:08:36 kinda....hidden API 17:08:40 it would create an attachment record in the db 17:08:42 mriedem: I know, it's attachment-reserve atm though 17:08:51 I prefer having a named API for the capability 17:08:55 that returns the attachment uuid 17:09:03 sure that's fine 17:09:10 os-reserve creates an empty attachment and returns the id 17:09:14 s/id/uuid/ 17:09:19 nova stores the attachment uuid in the bdm 17:09:20 mriedem: reserve only asks for a volume_id and an instance_id IIRC 17:09:21 yuh 17:09:23 if you look at build instance, and how retry works, it might change your mind about reserve being separate to attachment uuid, at least thats what pushed me the way I suggested in the current Nova spec 17:09:30 then nova-compute uses the bdm.attachment_id to update the attachment later with the connector 17:10:16 johnthetubaguy: with bfv wouldn't we still reserve the volume in the api? 17:10:41 in the case that a volume_id is provided for server create i mean 17:10:54 i'm trying to think of it like with ports 17:11:09 I would guess that reserving the volume or move it to 'attaching' state or smth is needed in the API anyhow 17:11:29 we don't do that today for bfv i don't think, but i think that's wrong 17:11:51 mriedem: so attaching a volume during boot, I like to think about that separately from bfv, yes you reserve in the API, and then the compute 17:12:01 mriedem: no it's not there today in BFV 17:12:10 mriedem: yeah, its broken today, but thats what we should do 17:12:11 mriedem: I have a patch for that though on review 17:12:38 mriedem: I have a few questions on the review to ensure we're on the same page, so if you can take a look that would be pretty great too :) 17:13:33 remove check_attach patch #link https://review.openstack.org/#/c/335358/ 17:15:30 created a new section in here: https://etherpad.openstack.org/p/ocata-nova-priorities-tracking 17:16:37 johnthetubaguy: coolio, thanks! 17:17:27 so I guess we agree that having smth to call from the API that locks the volume and returns an attachment_id is what we would like to have 17:17:34 is my understanding correct? 17:18:08 i think so 17:18:17 That's my understanding 17:18:35 the cinder spec seems to include reserver already 17:19:13 johnthetubaguy: it does, but the functionality of that attachment-reserve is what I typed in above 17:19:37 so it seems that we kind of have the functionality, but don't like the naming 17:19:51 * ildikov maybe oversimplifies the situation 17:21:39 actually, I am not sure if reserve works for the retry build instance on a different host 17:22:09 we want to keep the volume attached to the instance, but we need to do a detach on the old host, so we end up with two attachments 17:22:29 I am not sharing enough context there probably 17:22:46 so instance is building on host 1 17:22:47 volume 1 is attached to host 1 17:22:48 it fails, boo 17:22:55 does retry build equal to evacuate? 17:23:07 close, but not quite 17:23:17 we can actually clean up host 1, and host 1 is still alive 17:23:30 but we do need to keep it reserved during the process 17:24:10 johnthetubaguy: isn't it like live migrate? 17:24:19 yeah, basically the same 17:24:28 where we have two attachments for the same instance? 17:24:47 yeah 17:24:52 johnthetubaguy: we don't reserve from the compute 17:25:02 just like with ports, we bind on each host 17:25:10 mriedem: +1 17:25:14 and cleanup any garbage on retry 17:25:18 ah, so maybe build is different 17:25:26 ah, oops, it is 17:25:31 note that today we don't retry on BFV failures with attachment errors 17:25:32 we diconnect host 1 17:25:38 then go to host 2 17:25:59 do we are not a host 2 at the time we want to clean up host 1 17:26:01 https://review.openstack.org/#/c/246505/ attempts to retry when bfv fails attachment on host 1 17:26:05 s/do/so/ 17:26:33 the author has basically abandoned ^ 17:26:45 I really mean when there is a failure for some other reason, we need to sort out the volume attachments 17:27:09 i guess it depends on what we need to cleanup 17:27:15 i think of it like, 17:27:24 today we update the port's host binding details, 17:27:32 on failure we don't clean that up 17:27:34 mriedem: I think the idea was to handle what we can with a new attachment 17:27:39 we probably should...but we don't today, 17:27:49 mriedem: and move old ones to error state if needed maybe 17:27:53 with volumes i think of that like we update the vol attachment with the connector from that host 17:28:06 ildikov: having 3 vol attachments in error state sucks 17:28:12 who cleans those up? what does it mean for the volume? 17:28:31 I'm fine with just deleting the attachments 17:28:38 well, 17:28:45 i think we have a single vol attachment per vol and instance 17:28:49 and the host changes 17:28:52 per the connector 17:28:57 I think there is at least one part of johnthetubaguy's patch where we want to move it to error state and clean up later I guess 17:28:58 that's what we have with ports 17:29:08 ildikov: thats evacaute I think 17:29:20 i could see that for evacuate 17:29:26 johnthetubaguy: yeap, I think so too 17:29:26 cleanup the error attachments when the compute comes back 17:29:33 evacuate is a special sort of terrible 17:29:35 mriedem: I forgot, we are missing reserve on ports today, technically 17:29:41 johnthetubaguy: yeah i know 17:29:54 mriedem: we certainly have an agreement there 17:29:55 mriedem: they do have a plan to fix that already 17:29:57 johnthetubaguy: but we don't reserve for BFV today either, so it's similar 17:30:03 yeah, its similar 17:30:26 I hope we fix that soon, I mean the BFV reserve case 17:31:02 it would be great to clean up check_attach before introducing a new Cinder flow IMHO 17:32:05 so I think this is how I described the retry build in the spec: https://review.openstack.org/#/c/373203/13/specs/ocata/approved/cinder-new-attach-apis.rst@237 17:32:12 its basically the same thing we are desribing 17:32:20 I just wanted to avoid a window where the volume is not reserved 17:32:47 oops, wrong line 17:32:47 https://review.openstack.org/#/c/373203/13/specs/ocata/approved/cinder-new-attach-apis.rst@273 17:35:08 johnthetubaguy: is this a case when we only reserved the volume and nothing else happened due to any reason before retry? 17:35:45 I was meaning when the volume is fully attached and working on host 1, but we still need to retry the build and move to host 2 17:35:58 johnthetubaguy: in any case the attachment-create call is now create_and_or_update practically 17:36:26 johnthetubaguy: so if you call it from the compute after having a connector it reserves the volume and does everything else as well 17:36:42 I don't know what the new host will be, when host 1 is being disconnected, I think... 17:37:01 so I am calling reserve on an already attached volume, basically 17:37:09 but with a matching instance_uuid 17:37:23 to make sure it says reserved when host 1 is disconnected 17:37:36 I am just wondering if that will work with the current definition of reserver 17:37:40 reserve 17:38:08 I need to double check, it might does 17:39:34 so basically we need to call something that ensure the volume is locked, when we know retry is needed 17:39:51 I think, yes 17:40:06 right now it looks like we just skip any cleanup, but I could be wrong 17:40:17 as we need the scheduler to figure out the host first so we can have the connector, but it takes some time and we don't want to leave the volume available for that window 17:40:21 left a comment in the review in that section 17:41:04 since i haven't read the full spec since it was last updated i might be missing context, 17:41:14 mriedem: yeah, I was just digging, I couldn't see any cleanup on reschedule either 17:41:20 but really wanted at a high level that the volume is reserved (locked) in n-api 17:41:30 and the computes update the connector per compute, per retry 17:41:44 johnthetubaguy: there isn't cleanup b/c we don't retry for bfv failures 17:42:00 the only thing we cleanup is calling os-detach and os-terminate_connection 17:42:46 so... its the update the connector bit 17:43:01 we can't really do that, I thought, as that would break all our locking around shared connections 17:43:17 ildikov: did we get anywhere on deciding if they actually exist? 17:43:46 so with live migrate we create a new attachment, with evacuate we also and mark the old one as in error state and the rest we would like to just update the existing attachment? 17:44:18 johnthetubaguy: they exist as I saw from the discussions on the Cinder channel last week after the meeting 17:44:28 OK 17:44:50 I would prefer to always create a new attachment, and delete the old one, just so the detach logic is always the same 17:44:56 but maybe I am overthinking it 17:45:07 johnthetubaguy: I think there are still differences in how those connections are handled, but we don't have patrickeast around this time to ask as he had one of the examples 17:45:18 johnthetubaguy: so then nova-api and nova-compute are creating attachments? 17:45:25 i don't really like that 17:45:35 johnthetubaguy: not having exceptions is what I favor too 17:45:45 oh, true, they are in this model 17:46:00 but that would be with the server token + user token, once we get that fix in 17:46:13 not sure how that matters 17:46:17 it actually makes it more complicated 17:46:27 b/c then the user listing attachments on a volume doesn't see the ones that nova created 17:46:30 and can't clean them up 17:47:04 its just an RBAC thing, not an owner thing, but thats a separate conversation I guess 17:47:41 I don't mean RBAC, I really mean expiry in this context 17:48:02 so neutorn port bindings have some stuff that will need a service token (setting the host), but lets back out of that 17:48:17 I get its simpler to just update the connector 17:48:35 rather than create empty, delete old one, etc 17:48:46 I just like disconnect always deleting the attachment 17:48:54 that seemed simpler 17:49:13 I guess they are logically the same for the cases that matter 17:49:41 ah, no... 17:49:53 disconnect is cleanup on the host via os-brick though right? 17:49:54 shared volume, you need to delete (or update) the attachment with the lock held 17:50:00 shared host connection I mean 17:50:26 mriedem: I believe yes, at least that's my understanding 17:50:27 the lock being local to host 1 17:50:41 it's fine to cleanup from a failed connection with the lock held, 17:50:50 i don't see how that messes with the cardinality of how many attachments we have 17:51:21 I am talking about a VM that fails after the volume is attached, and has to re try the build on a new host 17:51:45 although you said thats not a thing, I am sure I saw that happen... but maybe thats some crazy local patch we had 17:51:47 sure, before we reschedule why can't we disconnect (os-brick) from host and remove the connector from the attachmnet? 17:52:20 remove connector works, rather than delete, but then it is a special case in the volume_api 17:52:47 can we update as opposed to remove? 17:52:48 this is what we cleanup on a failed attach today https://github.com/openstack/nova/blob/a58c7e5173103755f5a60f0a3ecede77e662ada3/nova/virt/block_device.py#L299 17:53:46 and for vol attach we unreserve in the compute manager here https://github.com/openstack/nova/blob/a58c7e5173103755f5a60f0a3ecede77e662ada3/nova/compute/manager.py#L4731 17:53:52 b/c we reserved in the api 17:54:01 (for attach vol, not BFV) 17:54:33 wherever we end up, i just don't like n-api creating an attachment, n-cpu updating it, maybe deleting it and creating a new attachment on reschedule, 17:54:48 in my spec, unreserve doesn't exist, that just happens when you delete an attachment 17:54:52 i'd rather see n-api reserve the volume (no attachment created), n-cpu handles CRUD ops on attachments, 17:55:00 or n-api creates attachment, n-cpu updates it, that's it 17:55:21 yeah maybe n-cpu deletes the attachment at the end if things fail 17:55:24 like unreserve 17:55:43 except live migrate I guess where we need the two attachments? 17:56:09 conductor would create the 2 attachments for live migration i'd think 17:56:39 anyway, i'm sure it's in john's spec and i just need to sit down with it for awhile 17:56:54 so I guess I don't understand why we would want reserve/unreserve as separate calls, I much prefer them just being an attachment thats in a specific state. there must be something I am missing here 17:57:11 johnthetubaguy: i agree with you there 17:57:23 but it does bake extra logic into the cinder api, 17:57:32 so i'm guessing that's why cinder people are opposed to that 17:57:37 and by cinder people i mean hemna 17:58:11 johnthetubaguy: just to confirm, create_attachment w/o a host/connector == reserve for you right? 17:58:18 mriedem: yeah 17:58:19 in the plans we don't have unreserve 17:58:32 and update_attachment + connector == attached 17:58:39 i.e. initialize_connection 17:58:40 yeah 17:58:47 ok we're on the same page then 17:59:06 ildikov: unreserve == delete attachment i think 17:59:38 mriedem: yeap, I think delete kind of contains that too in this sense 17:59:43 mriedem: yeah, in my head delete = unreserve + terminate connection (on cinder side, if required) 18:00:05 mriedem: as the current reserve creates the empty attachment it kind of covers what you described above 18:00:10 i hope the spec has a mapping of old to new world things :) 18:00:22 so we can talk the same language for people not involved in this for the last year 18:00:22 my one tries to do that 18:00:45 we're out of time 18:00:49 https://review.openstack.org/#/c/373203/13/specs/ocata/approved/cinder-new-attach-apis.rst@86 18:00:55 mriedem: we might think about what the volume state is called there though... 18:01:11 johnthetubaguy: tops 18:01:23 is that the correct british thing to say? 18:01:28 :) 18:01:37 LOL :) 18:02:03 we say lots of random rubbish to mean the same thing 18:02:18 johnthetubaguy: mriedem: should we move the discussion to the Nova spec regarding the edge cases? 18:02:19 that's english in general i believe 18:02:24 ildikov: yes 18:02:32 yeah, I am good with that 18:02:34 johnthetubaguy: mriedem: and we can argue on the Cinder API on the Cinder spec 18:03:04 that sounds logical, I need to look at that update 18:04:22 mriedem: if you can get to the remove check_attach patch before Thanksgiving that would be great as I can update the code while you're enjoying turkey :) 18:05:32 johnthetubaguy: cool, I'm trying my best to keep it in sync with the code and discussions 18:06:03 ildikov: awesome, thank you! 18:06:10 johnthetubaguy: we would like to get the Cinder part merged as soon as possible, so if you can raise any concerns on the spec or the API part of the code that would be great 18:06:47 honestly, its hard to agree to the API before understanding if it covers all the cases we need 18:06:47 having said that, it feels like we are really quite close now 18:07:42 yeap and if we can keep things simple like how mriedem described to create and update the attachment where needed then we should be good to go 18:08:13 like having the basic API calls on the Cinder side and play Lego with it in Nova as we need it sounds the way to go I think 18:08:21 as long as the cinder api is close and uses microversions we can tweak it over time too 18:08:38 being able to reset the host connection is probably the simplification bit thats useful 18:08:41 mriedem: good point 18:09:18 mriedem: it does, scottda is our person on the microversions part when we get there 18:09:22 anyway, times up, i've got to move on here, i'll try to review all things under the sun and not sleep the rest of this week 18:09:40 mriedem: thanks in advance 18:09:54 mriedem: have a nice Thanksgiving! 18:10:02 thanks 18:11:21 all right, see you all next week! 18:11:31 and on the reviews in the meantime :) 18:11:38 thanks and have a nice rest of the day! 18:11:52 #endmeeting