21:01:21 <ildikov> #startmeeting cinder nova api changes
21:01:21 <openstack> Meeting started Wed Apr  6 21:01:21 2016 UTC and is due to finish in 60 minutes.  The chair is ildikov. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:01:22 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:01:24 <openstack> The meeting name has been set to 'cinder_nova_api_changes'
21:01:27 <scottda> Good timing!
21:01:30 <scottda> scottda ildikov DuncanT ameade cFouts johnthetubaguy jaypipes takashin alaski e0ne jgriffith tbarron andrearosa hemna erlon mriedem gouthamr ebalduf patrickeast smcginnis diablo_rojo
21:01:39 <jgriffith> o/
21:01:40 <andrearosa> o/
21:01:45 <takashin> o/
21:01:45 <DuncanT> o/
21:01:48 <ildikov> scottda: thanks :)
21:01:58 <ildikov> #chair scottda
21:01:59 <openstack> Current chairs: ildikov scottda
21:02:12 <mriedem> o/
21:02:18 <smcginnis> o/
21:02:27 <diablo_rojo> Hello
21:02:38 <smcginnis> Nice, learned about the #chair command.
21:02:47 <hemna> yough
21:03:10 <ildikov> smcginnis: I learnt new things like this today too :)
21:03:24 <smcginnis> Always more to learn. ;)
21:03:39 <ildikov> etherpad: #link https://etherpad.openstack.org/p/cinder-nova-api-changes
21:03:53 <ildikov> smcginnis: I hear ya ;)
21:04:00 <scottda> I was concerned that this meeting is pretty late for some people. I think DuncanT is UTC+3, so it's 24:00 there.
21:04:09 <ildikov> so the first topic on the etherpad is the meeting time slot
21:04:19 <DuncanT> Yup, midnight here
21:04:21 <scottda> Who is furthest East, i.e. is anyone earlier than UTC-7 (PDT)?
21:04:33 <scottda> takashin: ?
21:04:34 <ildikov> takashin: what is your time zone?
21:04:46 <takashin> JST
21:04:51 <takashin> UTC+9
21:05:15 <ildikov> so it's 6am in your time zone now, right?
21:05:21 <takashin> Yes.
21:05:31 <scottda> OK, so this is about as good as it gets then.
21:05:55 <scottda> Just checking. Duncan already said he was going to hate me in the morning :)
21:06:02 <ildikov> scottda: we couldn't get much better :(
21:06:13 <DuncanT> I'll survive and just curse Scott in the morning
21:06:15 <hemna> dang, that's rough
21:06:15 <ildikov> DuncanT: sorry, it's 11pm here too, I almost feel your pain
21:06:23 <scottda> ildikov: Fair enough, I just wanted to check. We can move on....
21:06:37 <ildikov> alright
21:06:41 <andrearosa> I am on holiday ATM and it is 11pm here so I win!
21:06:55 <ildikov> andrearosa: kudos!
21:07:18 <scottda> Hopefully, we do a couple more of these before the Summit and then get to some consensus at the Summit...
21:07:19 <ildikov> so we have a few alternatives on the etherpad to track more info about attachments
21:07:29 <scottda> So we won't have to keep doing this for too terribly long.
21:07:36 <ildikov> scottda: +1
21:07:51 <ildikov> the idea about the meetings is to ensure progress
21:08:07 <ildikov> hopefully the time zone issue will give us some extra motivation
21:08:25 <scottda> So, this is a bit of work, but hemna started creating diagrams....
21:08:36 <scottda> https://www.gliffy.com/go/publish/image/10360157/L.png
21:09:08 <scottda> We don't want to just make some pictures for the fun of it, but it seems useful to figure out the flow of things for attach, detach, nova live-migration, and anything else relevant...
21:09:20 <andrearosa> wow very useful
21:09:51 <scottda> Before we jump into details about how to fix all this, it'd be good to decide all the things that need fixing at the high level first. That's  my opinion.
21:10:04 <smcginnis> scottda: +1
21:10:37 <hemna> scottda, +1
21:10:39 <hemna> yah
21:10:49 <hemna> I wanted to create another diagram like that for live migration as well
21:10:52 <ildikov> I think we could add these diagrams to developer docs as well
21:10:58 <hemna> since that has a lot of interactions with Cinder as well
21:11:08 <hemna> I guess another with detach would be helpful too
21:11:09 <scottda> Yeah, once you start in on the diagrams, there are uses everywhere.
21:11:11 <jgriffith> scottda: might be worth mentioning there's also an explanation of the cinder side in the official docs here:  http://docs.openstack.org/developer/cinder/devref/attach_detach_conventions.html
21:11:35 <jgriffith> scottda: might be good to level set on the design and how things are intended to work
21:11:46 <scottda> jgriffith: Thanks, That's very useful.
21:11:47 <ildikov> jgriffith: looks like a good fit for diagrams like the one above
21:12:22 <cFouts> o/
21:12:33 <ildikov> hemna: yeah, detach would be needed as well, it's similar regarding interaction with Cinder and os-brick
21:12:42 <hemna> yup
21:13:15 <ildikov> is it only me, or is that flow really complicated?
21:13:24 <scottda> ildikov: It's just you.
21:13:28 <scottda> just kidding
21:13:31 <hemna> :)
21:13:40 <ildikov> lol :)
21:13:42 <hemna> the flow is complicated.
21:13:48 <scottda> So, I had thought a few months back about creating a simpler API...
21:13:55 <hemna> and my diagram probably isn't the best way to show it.
21:14:03 <scottda> https://review.openstack.org/#/c/267715/
21:14:14 <smcginnis> hemna: That does capture a lot. Thanks for pulling that together.
21:14:15 <scottda> But then issues around multi-attach came up, and live migration....
21:14:47 <ildikov> smcginnis: +1
21:14:50 <hemna> smcginnis, welcome :)
21:15:00 <scottda> But this does bring up a question: Should we be thinking more globally , and see if there's a simpler API that we could create that makes most/all of these problems go away?
21:15:02 <ildikov> scottda: do you mean that those features conflict with your spec?
21:15:13 <scottda> ildikov: No, not a contradiction...
21:15:14 <smcginnis> scottda: I'm all for a simpler API.
21:15:33 <jgriffith> smcginnis: scottda can you clarify what you mean by "simpler" API
21:15:34 <scottda> Just that we are like the 5 blind men trying to describe the elephant......
21:15:40 <jgriffith> smcginnis: scottda the API is pretty simple as it is IMO
21:15:46 <scottda> Everyone looks at a different part, and sees it differently
21:15:50 <jgriffith> "nova volume-attach xxx yyyy /dev/vdb"
21:15:52 <jgriffith> how much simpler?
21:15:57 <DuncanT> jgriffith: I think we can get it down to three calls
21:16:08 <scottda> jgriffith: I mean the underlying calls, not the initial command
21:16:11 <jgriffith> DuncanT: again, please clarify what layer/api you're referring to
21:16:14 <jgriffith> scottda: thanks
21:16:16 <DuncanT> Not that API I think, the one between cinder and nova
21:16:21 <smcginnis> DuncanT: +1
21:16:22 <jgriffith> DuncanT: it's already only 3 calls
21:16:34 <hemna> jgriffith, didn't we whiteboard the 'simpler' API in one of the cinder midcycle meetups and it kinda turned into exactly what we already have today?
21:16:40 <scottda> jgriffith You and hemna and I started talking about this a couple mid-cycles ago...
21:16:46 <scottda> snap
21:16:48 <smcginnis> hemna: Hah, really. OK.
21:16:55 <scottda> hemna: Yes, in Ft. Collins last summer
21:17:03 <jgriffith> scottda: hemna correct
21:17:08 <jgriffith> I don't know how/what you want to make simpler
21:17:11 <scottda> See that spec for a basic idea of where we went with that ^^
21:17:15 <jgriffith> it's pretty basic as is today
21:17:36 <jgriffith> it's just not followed, and there are some clunky things we have going on with managing and tracking
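A minimal sketch of the existing three-call flow being discussed here, as Nova drives it, using python-cinderclient-style names (exact signatures are illustrative rather than authoritative):

    from cinderclient import client

    cinder = client.Client('2', session=sess)  # assumes an authenticated keystoneauth session

    # 1. Reserve: flip the volume to 'attaching' in the DB so nothing else
    #    (a delete, a second attach) grabs it before the compute node acts.
    cinder.volumes.reserve(volume_id)

    # 2. Initialize the connection: hand Cinder this host's connector info
    #    (iSCSI initiator, FC WWPNs, ...) and get back the target details
    #    that os-brick needs to build the local device path.
    connection_info = cinder.volumes.initialize_connection(volume_id, connector)

    # 3. Attach: record the completed attachment (instance, mountpoint).
    cinder.volumes.attach(volume_id, instance_uuid, mountpoint)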
21:17:53 <DuncanT> jgriffith: I'd get rid of reserve and its converse... not a huge win for nova but helps reduce failure windows for bare metal
21:18:07 <jgriffith> DuncanT: how come?
21:18:30 <DuncanT> jgriffith: Because things die between calls. With annoying frequency in dev systems
21:18:34 <hemna> I thought nova needed reserve
21:18:36 <jgriffith> DuncanT: I'm not familiar with what reserving the resource in the DB causes?
21:18:53 <DuncanT> jgriffith: Things stuck in 'attaching'
21:18:56 <jgriffith> DuncanT: hmm... yeah, but then you introduce races
21:18:57 <hemna> to ensure the volume was 'locked' for it to attach through the process
21:18:58 <mriedem> nova needs reserve so the volume is 'attaching' from the api before we cast to the compute to do the actual attach
21:19:08 <hemna> mriedem, +1
21:19:32 <mriedem> we don't have a reserve like that for networks, for example, so we can hit races with neutron today too
21:19:36 <mriedem> and quota issues
21:19:44 <jgriffith> DuncanT: I think the idea of something failing between reserve and initialize is another problem that should be solved independently
21:19:46 <DuncanT> jgriffith: Make it valid to call initialize_connection with do_reserve_for_me=True and bare metal is easier I think
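A rough server-side sketch of the flag DuncanT just described; do_reserve_for_me is his name, everything else here is hypothetical:

    def initialize_connection(self, context, volume, connector,
                              do_reserve_for_me=False):
        # Hypothetical: fold reserve into the same call, so a crash in the
        # window between 'reserve' and 'initialize' can't strand the volume
        # in 'attaching'. Bare-metal callers would pass do_reserve_for_me=True.
        if do_reserve_for_me:
            self._reserve_volume(context, volume)  # atomic status flip
        return self.driver.initialize_connection(volume, connector)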
21:19:59 <scottda> Yeah, but why not wait until Cinder returns the connection info before Nova even proceeds? Do we need to be async here?
21:20:01 <jgriffith> mriedem: +1 and thus my point about races... that's why that call is there
21:20:19 <jgriffith> scottda: because somebody could come in and delete the volume
21:20:21 <mriedem> does the ironic driver in nova even support volume attach?
21:20:41 <hemna> jgriffith, +1
21:20:45 <scottda> internally, we can still set to 'attaching' and then proceed to initialize connector:
21:20:53 <hemna> or another n-cpu node to put it into attaching as well
21:20:57 <hemna> (multi-attach)
21:21:06 <DuncanT> jgriffith: Either the API call wins the race against the delete, puts the volume into 'attached' and the delete fails, or the API call loses and returns 404
21:21:13 <scottda> https://www.irccloud.com/pastebin/1L3KWe0j/
21:21:18 <DuncanT> mriedem: It's being worked on
21:21:42 <jgriffith> DuncanT: ok... but personally I'd say if your resources are disappearing in the 1 second between those two calls we should fix something else up to recover from it
21:21:53 <jgriffith> DuncanT: and I don't know how you do that anyway if you lost something
21:22:04 <mriedem> scottda: that diagram assumes that smart_attach is going to be fast on the cinder side?
21:22:20 <mriedem> scottda: nova api can't be waiting for a long call to return from cinder else we timeout the request
21:22:23 <jgriffith> I don't want to take up a ton of time on this though because I'm not overly familiar or understanding the problem completely
21:22:27 <DuncanT> jgriffith: It is a small optimisation, but as I said, I managed to hit that window a fair bit, and we used to see it in the public cloud too, e.g. rabbit restarts
21:22:27 <scottda> mriedem: It does, but that's not necessarily a good assumption....
21:22:38 <hemna> mriedem, that smart_attach might take a while
21:22:51 <mriedem> scottda: right, this is why we rejected doing the get-me-a-network allocation with neutron from nova-api
21:22:54 <mriedem> b/c we can't hold up nova-api
21:22:56 <hemna> because it looks like it is doing the work of initialize_connection today, which is entirely driver dependent.
21:23:03 <DuncanT> mriedem: I wouldn't force nova to do one step rather than two, I'm thinking of bare metal and non-nova consumers
21:23:08 <scottda> Nova still has to wait for the connection info during initialize_connection...
21:23:22 <mriedem> DuncanT: ok, well, i'm not really interested in the non-nova case
21:23:25 <hemna> if smart_attach was async to nova?
21:23:31 <hemna> but man that is a lot of changes
21:23:45 <mriedem> scottda: sure, but we wait on the compute
21:23:51 <mriedem> after we already cast from nova-api
21:23:58 <scottda> And maybe not worth it. It's just an idea that gets proposed at various times.
21:24:03 <jgriffith> mriedem: scottda I'm not sure why this is *good*?
21:24:03 <DuncanT> I guess I'm in a minority though, and I don't think my proposal overlaps with the other discussion much at all, so maybe shelve it for now and let people think about it later?
21:24:16 <jgriffith> mriedem: scottda as opposed to just doing better management of what we have?
21:24:34 <DuncanT> jgriffith: initialize_connection is slow, so kicking it off earlier and picking up the result later can be a win
21:24:36 <jgriffith> mriedem: scottda or... "more efficient" better implementation of what we have
21:24:54 <jgriffith> DuncanT: why? / how?  sorry, I'm not following
21:24:56 <scottda> jgriffith: That may be the way to go. I'm just bringing this up because it had been discussed previously.
21:25:13 <jgriffith> DuncanT: no matter what you can't attach a volume w/out an iqn
21:25:21 <scottda> We can move on to discussing just fixing what is missing at the moment, as far as I'm concerned.
21:25:28 <mriedem> i've had thoughts on ways to make boot from volume better in nova, but it doesn't really have anything to do with what we need to accomplish before the summit, which is what this meeting is supposed to be about
21:25:29 <DuncanT> jgriffith: Look at EMC.... they were taking long enough for the RPC to time out when having to do FC zone setup too
21:25:29 <jgriffith> so if the backend is slow to give that for some reason... you can't do anything without it
21:25:33 <mriedem> i have a hard stop in like 30 minutes
21:25:39 <jgriffith> DuncanT: no, that's different
21:25:47 <scottda> Let's move on....
21:25:52 <DuncanT> Ok, we can come back to this later
21:25:57 <jgriffith> DuncanT: that was an API response issue with multiple clones to their backend API simultaneously
21:26:07 <jgriffith> "their" API
21:26:24 <mriedem> so i see 3 solutions in https://etherpad.openstack.org/p/cinder-nova-api-changes
21:26:27 <mriedem> for multiattach
21:26:29 <jgriffith> DuncanT: they have a limit on # simultaneous API requests on the backend
21:26:31 <mriedem> who wants to go over those?
21:26:36 <scottda> mriedem: +1
21:26:43 * jgriffith can go over his
21:26:44 <ildikov> mriedem: +1
21:27:08 <ildikov> jgriffith: the floor is yours :)
21:27:16 <jgriffith> k... thanks
21:27:38 <jgriffith> So IMO the bigger challenge with a lot of things we've been circling on is more around managing attach status
21:27:46 <mriedem> which one is this? #4?
21:27:55 <scottda> I think it's #2
21:27:56 <hemna> mriedem, #2
21:28:03 <mriedem> ok
21:28:14 <jgriffith> oh... yeah sorry
21:28:21 <jgriffith> Also to help:  #link https://review.openstack.org/#/c/284867/
21:28:46 <jgriffith> so eharney brought up a point about that being a bit iSCSI centric, we can revisit that
21:29:11 <jgriffith> but the proposal I think solves the multi-attach problem as well as some of this circling about the current API and how things work
21:29:29 <jgriffith> we already pass the connector info in to initialize_connection... but sadly we don't really do much with it
21:29:51 <jgriffith> if we took that info and built out the attachments table properly, we could just give the caller (Nova) back the attachment ID
21:30:00 <jgriffith> Nova wouldn't need to care about multi-attach etc
21:30:05 <jgriffith> Just the attachment ID
21:30:17 <jgriffith> no tracking, no state checking etc
21:30:29 <jgriffith> Same holds true on the cinder side for detach
21:30:46 <jgriffith> we don't care anymore... if we get a detach request for a volume with an attachment-ID we just act on it
21:30:55 <jgriffith> quit adding complexity
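A sketch of the shape jgriffith describes above, assuming hypothetical helper names (his actual WIP is the review linked at 21:28):

    from oslo_serialization import jsonutils

    # Cinder side: initialize_connection records a real attachments-table row
    # from the connector it already receives, and returns the row's id.
    def initialize_connection(self, context, volume, connector):
        attachment = self.db.volume_attach(
            context, {'volume_id': volume['id'],
                      'attach_status': 'attaching',
                      'connector': jsonutils.dumps(connector)})
        conn_info = self.driver.initialize_connection(volume, connector)
        conn_info['attachment_id'] = attachment['id']
        return conn_info

    # Nova side: detach then becomes something like
    #     cinder.volumes.detach(volume_id, attachment_id)
    # and Cinder acts on that one attachment -- no tracking or state
    # checking left in Nova.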
21:31:04 <scottda> Who would determine when to remove the export in your case jgriffith ?
21:31:12 <hemna> so I'm confused how initialize_connection works for multi-attach looking at the code.  re:  attachment = attachments[0]
21:31:14 <scottda> In the case where some drivers multiplex connections.
21:31:32 <ildikov> jgriffith: we still need to figure out in Nova when to call disconnect_volume
21:31:32 <jgriffith> scottda: that's always up to the consumer
21:31:50 <jgriffith> scottda: that's the point here... if it's multiplexed that's fine, but each "plex" has its own attachment-id
21:31:52 <scottda> Would Nova just call detach with an attachment_id, and then Cinder (manager? driver?) figures it out?
21:32:00 <hemna> ildikov, if we stored the host and the instance_uuid, nova can make the correct choice on its side.
21:32:12 <jgriffith> scottda: yes, that's the only way you can handle the deltas between drivers and their implementations
21:32:29 <jgriffith> hemna: sorry... which part are you confused on?
21:32:31 <ildikov> hemna: we need the back end info as well, regarding the multiplex or not cases
21:32:52 <jgriffith> hemna: ildikov wait.. back up a second
21:32:55 <scottda> jgriffith: OK, I do like that the consumer (Nova) just calls detach(attachment_id) and lets Cinder figure out when to un-export based on driver
21:32:57 <jgriffith> let me try this
21:33:00 <jgriffith> nova has node-A
21:33:05 <hemna> jgriffith, line 920 in manager.py
21:33:18 <jgriffith> we request an attach of a volume to instance Z on node-A  (Z-A)
21:33:20 <hemna> makes an assumption that the correct attachment is always the first found.
21:33:37 <jgriffith> we then request another attach of the same volume also on node-A but now instance X
21:33:44 <jgriffith> (X-A)
21:33:53 <jgriffith> we create an attachment ID for each one
21:34:03 <hemna> as attachments has a list of all attachments for that volume, which can be on any host or any instance.
21:34:15 <jgriffith> we don't care if your particular backend shares a target, creates a new one or boils a pigeon
21:34:25 <jgriffith> hemna: please let me finish
21:34:41 <hemna> ok, sorry, you asked why I was confused.
21:34:43 <jgriffith> at this point if nova is done with the volume on Instance Z
21:34:58 <jgriffith> it issues the detach using the attachment-id associated with Z-A
21:35:39 <mriedem> i don't think that's the hard part, the os-detach call to cinder that is, the hard part is figuring out if/when we call virt.driver.disconnect_volume, which comes before calling cinder os-detach
21:35:49 <jgriffith> that goes into the detach flow on Cinder.. and IF and ONLY if a driver needs to do some special magic they can do something different
21:36:03 <jgriffith> like "hey... how many attachments to this host, volume etc etc"
21:36:10 <ildikov> mriedem: yeah, my point exactly
21:36:36 <jgriffith> mriedem: I'm not sure which part you're referring to?
21:36:46 <hemna> mriedem, if cinder has the host and the instance_uuid for every entry in the volume_attachments table, nova can loop through those and decide if the volume has anything left on that n-cpu node.
21:36:48 <jgriffith> mriedem: I mean...
21:36:55 <hemna> if it doesn't, then it calls disconnect_volume.
21:36:57 <jgriffith> I am unclear on the challenge there?
21:37:11 <jgriffith> hemna: +1
21:37:12 <mriedem> jgriffith: during detach in nova, the first thing we do is disconnect the volume in the virt driver
21:37:17 <smcginnis> hemna: So Cinder would call disconnect_volume in that case, not nova, right?
21:37:22 <hemna> smcginnis, no
21:37:25 <hemna> nova does
21:37:29 <jgriffith> mriedem: right...sorry, hemna 's comment pointed it out for me
21:37:33 <mriedem> hemna: yeah we talked about this last week a bit
21:37:44 <smcginnis> hemna: Oh, right, I see.
21:37:45 <mriedem> i had some pseudo logic in that meeting for the nova code
21:37:55 <hemna> only n-cpu can, because that's the host where the volume exists (/dev/disk/by-path/<entry here>)
21:38:06 <jgriffith> mriedem: so the trick is that in my case and LVM's case I've written it such that you get a unique target for each attach
21:38:26 <jgriffith> mriedem: so even if you have the same LVM volume attached twice on a compute node it has two attachments/iqns
21:38:30 <jgriffith> mriedem: so we don't care
21:38:38 <jgriffith> you said detach... we detach
21:38:44 <jgriffith> mriedem: that's how it solves the problem
21:39:01 <ildikov> I think we said something about having a flag about the back end
21:39:08 <mriedem> so disconnecting Z-A doesn't mess up X-A
21:39:14 <jgriffith> mriedem: exactly
21:39:20 <jgriffith> mriedem: the idea is to completely decouple them
21:39:25 <mriedem> ildikov: jgriffith is saying that wouldn't be a problem with his soloution
21:39:29 <jgriffith> mriedem: make them totally independent
21:39:29 <ildikov> as when we have a target per volume then we need to call disconnect_volume regardless of how many attachments we have on the host
21:39:43 <mriedem> ildikov: the flag, if I remember correctly, was something in the connection_info about the cinder backend
21:39:53 <jgriffith> mriedem: for devices that 'can't' do multiple targets for the same volume that's fixable as well
21:40:13 <jgriffith> mriedem: you can still spoof it easy enough on the /dev/disk-by-path entry
21:40:28 <mriedem> you being cinder
21:40:29 <mriedem> ?
21:40:32 <ildikov> mriedem: yeah, that might be the one, I don't remember where we placed it
21:40:43 <hemna> ildikov, yah I think that's where it was.
21:40:44 <jgriffith> mriedem: no, that is up to Nova/Brick when they make the iscsi attachment
21:40:50 <hemna> shared/notshared flag
21:41:00 <jgriffith> mriedem: well... yeah, cinder via volume-name change
21:41:12 <mriedem> volume-name change?
21:41:18 <jgriffith> mriedem: yeah
21:41:25 <ildikov> hemna: right, I didn't like the name, but I like the flag itself I remember now :)
21:41:25 <mriedem> i'm not following
21:41:46 <jgriffith> mriedem: so in the model info instead of attaching and mounting at /dev/disk-by-path/volume-xxxxx.iqn umptysquat
21:41:49 <jgriffith> twice!!
21:42:07 <jgriffith> you do something like:  iqn umptysquat_2
21:42:10 <jgriffith> you do something like:  iqn umptysquat_3
21:42:12 <jgriffith> you do something like:  iqn umptysquat_4
21:42:14 <jgriffith> etc
21:42:35 <cFouts> no reference counting anywhere then
21:42:39 <jgriffith> nova gets a detach call... uses the attach ID to get the device path and disconnects that one
21:42:55 <jgriffith> cFouts: correct, ZERO reference counting because they're independent
21:43:01 <jgriffith> and decoupled
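A toy illustration of the per-attach target naming jgriffith is describing; the suffix scheme here is invented for illustration, not taken from his WIP:

    def target_name(volume_id, existing_attachment_count):
        # Each attach of the same volume gets its own target/iqn, so the
        # same LVM volume attached twice to one compute node shows up as
        # two distinct device paths, each detachable independently.
        base = 'iqn.2010-10.org.openstack:volume-%s' % volume_id
        if existing_attachment_count == 0:
            return base                                    # first attach (Z-A)
        return '%s_%d' % (base, existing_attachment_count + 1)  # X-A -> ..._2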
21:43:29 <hemna> so I think the question was, how to accomplish that for backends that can't do separate targets for the same volume on the same initiator
21:43:39 <mriedem> the attach id gets us the attachment which has the device path from the connector dict?
21:43:45 <jgriffith> mriedem: I haven't worked out a POC patch on the Nova side yet because I've been told that people don't like the proposal so I don't want to spend time on something for nothing :)
21:43:46 <mriedem> and we pass that connector to os-brick to disconnect?
21:44:11 <jgriffith> mriedem: it can; we have everything in the attach object
21:44:24 <jgriffith> hemna: that's the exact case that I just described?
21:44:48 <jgriffith> hemna: you just attach it using a modified name
21:44:55 <hemna> but udev is what creates those paths in /dev/disk/by-path
21:45:01 <hemna> so I'm confused
21:45:10 <jgriffith> hemna: nah, we have the ability to specify those
21:45:27 <hemna> we don't create those paths though
21:45:39 <hemna> and by we, I mean os-brick
21:45:54 <jgriffith> hemna: I can modify how that works
21:46:05 <jgriffith> anyway....
21:46:13 <jgriffith> is there even any general interest here?
21:46:23 <ildikov> scottda: I think you mentioned NFS and SMBFS on the etherpad which have detach issues as well
21:46:40 <jgriffith> the snowflakes that require only a single target are a challenge, but it should be solvable
21:46:52 <scottda> jgriffith: I think the idea sounds good. I certainly would keep it around until we have all alternatives discussed.
21:47:08 <ildikov> scottda: do we have an alternative that addresses all types of detach issues we're facing?
21:47:13 <jgriffith> scottda: well I wasn't going to burn it in the next 15 minutes :)
21:47:21 <smcginnis> jgriffith: My concern is the single target systems. But if we can get that to work, I like the simplicity.
21:47:33 <scottda> ildikov: You mean including NFS and SMB? I'm not sure. Maybe hemna helps with that...
21:47:52 <ildikov> jgriffith: I need to process the snowflakes part more to be able to have a solid opinion
21:47:59 <jgriffith> smcginnis: the single target systems are going to be a challenge no matter what, I think.  I've yet to see a proposal that I really think will work
21:48:04 <hemna> smcginnis, that's always been the problem we haven't solved yet in general, nova not knowing when it can safely call os-brick.disconnect_volume
21:48:24 <jgriffith> hemna: add some info or another check to Cinder
21:48:34 <jgriffith> I mean worst case scenario you could just do that
21:48:47 <jgriffith> hemna: cinder.api.safe-to-discon
21:48:56 <ildikov> scottda: there's a note in the Issues summary part that says "Problem exists in non-multi-attach for NFS and SMBFS volumes"
21:49:03 <hemna> jgriffith, if we stored both the host and instance_uuid in each volume_attachment table entry, nova can use that along with the 'shared' flag in the connection_info coming back from initialize_connection, to decide if it should call disconnect_volume or not.
21:49:04 <jgriffith> hemna: snowflakes can implement a check/response
21:49:05 <jgriffith> True/False
21:49:14 <jgriffith> hemna: my proposal does just that
21:49:49 <jgriffith> without the extra tracking complexity you mention
21:50:16 <ildikov> jgriffith: I think if we can have something like that shared flag Nova can figure out at detach time that would help
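Roughly the Nova-side decision hemna and ildikov are describing, in pseudo-logic; the 'shared' flag and the host/instance_uuid fields on attachments are the proposed additions, not anything that exists today:

    def safe_to_disconnect(connection_info, attachments, this_host, instance_uuid):
        # Unique target per attach (jgriffith's model): always safe to call
        # os-brick's disconnect_volume for this attachment.
        if not connection_info.get('shared'):
            return True
        # Shared target: only disconnect once this is the last attachment
        # of the volume left on this compute host.
        others = [a for a in attachments
                  if a['host'] == this_host and a['instance_uuid'] != instance_uuid]
        return not others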
21:50:23 <hemna> ok, maybe I simply don't understand the nova side of your changes then.
21:50:34 <jgriffith> hemna: https://review.openstack.org/#/c/284867/2/cinder/volume/manager.py Line 947
21:50:36 <hemna> re: /dev/disk/by-path entries being created outside of udev
21:50:48 <jgriffith> alright, fair enough
21:50:55 <jgriffith> let's hear your proposal?
21:51:26 <ildikov> 10 minutes left of the official hour
21:51:27 <hemna> mine isn't that different really.
21:51:29 <hemna> heh
21:51:34 <jgriffith> ildikov: yeah, I think it's a lot easier to add another check
21:51:39 <ildikov> hemna: can we run through yours?
21:51:45 <jgriffith> hemna: well how do you solve the problem that you said I don't solve?
21:51:51 <hemna> have os-reserve create the attachment table entry and return the attachment_id
21:51:57 <hemna> nova has it for every cinder call after that.
21:52:01 <jgriffith> hemna: uhhh
21:52:06 <hemna> including initialize_connection
21:52:09 <jgriffith> so 2 questions:
21:52:18 <hemna> can I finish please ?
21:52:22 <jgriffith> 1. why put it in reserve
21:52:39 <jgriffith> 2. how is it better to just use my proposal in a different call?
21:52:44 <jgriffith> hemna: yes, sorry
21:52:49 <hemna> thanks
21:53:14 <hemna> so I like having it returned in os-reserve, because then every nova call has the attachment_id for what it's working on.  it's clean, and explicit.
21:53:47 <hemna> that solves the issue that I see in your wip for initialize_connection not handling multi-attach (re: manager.py line 920)
21:54:10 <jgriffith> ?
21:54:14 <hemna> we still have the issue of nova needing to know when it's safe to call os-brick.disconnect_volume.
21:54:46 <hemna> but that can be overcome with the shared flag in connection_info coming back from initialize_connection, as well as having the host in the attachments table along with instance_uuid.
21:55:08 <hemna> https://review.openstack.org/#/c/284867/2/cinder/volume/manager.py  line 920
21:55:15 <hemna> attachment = attachments[0]
21:55:15 <jgriffith> hemna: so what you're proposing though just creates an empty entry in the table during reserve, then it gets updated the same places as it does in my wip, no?
21:55:25 <jgriffith> hemna: because reserve doesn't have any info (it can't)
21:55:34 <hemna> that gets the first attachment it finds for that volume, which can be on any host or against any instance_uuid.
21:55:48 <hemna> all reserve needs to do is create the volume_attachments entry
21:55:54 <hemna> and get the attachment_id and return that.
21:56:12 <jgriffith> hemna: that's better why?
21:56:14 <hemna> nova has the instance_uuid, it can pass that to os-reserve as well.
21:56:36 <mriedem> i was going to say, seems os-reserve needs the instance uuid
21:56:37 <hemna> because we have the attachment_id for initialize_connection and we can always work on the correct attachment.
21:56:42 <hemna> mriedem, yup.
21:56:49 <hemna> then the API from nova to cinder is explicit
21:57:04 <hemna> both nova and cinder know which attachment is being worked on
21:57:13 <hemna> there is no guess work.
21:57:17 <jgriffith> mriedem: hemna I'm still unclear on how this is any different?
21:57:17 <mriedem> hemna: does nova also pass the instance.host to os-reserve?
21:57:29 <hemna> it can
21:57:46 <hemna> then the attachment has both of those items already set at reserve time.
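hemna's variant, sketched from the calls above; none of these keyword arguments exist in today's cinderclient, they illustrate the proposal only:

    # os-reserve creates the attachments row up front and returns its id...
    attachment_id = cinder.volumes.reserve(
        volume_id, instance_uuid=instance_uuid, host=compute_host)

    # ...so every later call is explicit about which attachment it means:
    conn_info = cinder.volumes.initialize_connection(
        volume_id, connector, attachment_id=attachment_id)
    cinder.volumes.attach(volume_id, attachment_id=attachment_id,
                          mountpoint=mountpoint)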
21:57:52 <jgriffith> hemna: mriedem in fact all the code/impl is exactly the same.  You just turn reserve into something different that creates a dummy db entry?
21:57:55 <ildikov> maybe it should
21:57:57 <scottda> maybe we should focus on how these 2 alternatives are different....next meeting.
21:58:07 <ildikov> I don't know the calls during live migration, etc. though
21:58:42 <ildikov> scottda: +1
21:58:45 <hemna> live migration just updates the attachment and changes the host from A to B
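In that model live migration reduces to a single attachment update; a one-call sketch with a hypothetical API name:

    # Point the existing attachment at the destination host once the
    # instance has landed there; no second attachment is created.
    cinder.volumes.update_attachment(volume_id, attachment_id, host='node-B')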
21:58:58 <scottda> Perhaps disregard whether the attachment_id is created and returned to Nova in reserve() or initialize_connection and see where there is agreement.
21:59:11 <mriedem> i have to head out folks
21:59:12 <jgriffith> mriedem: hemna but it already passed *ALL* of that during the existing intialize_connection that we already have
21:59:18 <scottda> And, if possible, create more diagrams!
21:59:23 <mriedem> can we agree on some action items for next week?
21:59:36 <hemna> I can try and work on the other diagrams
21:59:42 <hemna> I'll see if I can switch to using dia
21:59:50 <hemna> since it's opensource.
21:59:52 <scottda> Perhaps some diagrams that show the proposed changes as well?
22:00:12 <mriedem> jgriffith: the only difference from my POV there is os-reserve is in nova-api before we cast to compute where we do os-initialize_connection
22:00:14 <hemna> initialize_connection doesn't have the attachment_id currently
22:00:16 <mriedem> i don't know if that makes much difference
22:01:01 <jgriffith> mriedem: sure.. but it's odd to me.  Reserve was specifically to address race conditions and only set the state in the DB to keep from losing the volume during the process
22:01:02 <ildikov> we could also evaluate the cases like live migration, shelve, etc. to see whether we have issues with any of the above listed calls
22:01:18 <jgriffith> I can't see what the advantage to changing that is
22:01:29 <mriedem> only the attachment_id if we need that later i guess
22:01:31 <hemna> jgriffith, it sets the volume to attaching and yet there is no attachment entry.
22:01:35 <mriedem> like a reservation id
22:01:39 <jgriffith> mriedem: it's not a big deal, and probably would be just fine.  Just trying to understand the advantage
22:01:56 <hemna> then nova could have the attachment_id during calls to initialize_connection
22:02:04 <hemna> which works for single and multi-attach
22:02:12 <mriedem> if there are other gaps with live migration/evacuate/shelve offload, then maybe we work those through both approaches and see if there is an advantage, like ildikov said
22:02:13 <jgriffith> hemna: ok, sure I guess
22:02:31 <hemna> ok, so diagram for detach
22:02:35 <hemna> diagram for live migration
22:02:39 <mriedem> #action hemna diagram for detach
22:02:40 <hemna> if I can get to those.
22:02:49 <mriedem> #action hemna diagram for live migration
22:02:56 <scottda> hemna: I'll help with those
22:02:58 <mriedem> #action hemna and jgriffith enter the pit one leaves
22:03:05 <hemna> lol
22:03:06 <cFouts> heh
22:03:08 <jgriffith> mriedem: LOL
22:03:10 <jgriffith> mriedem: nahhh
22:03:13 <scottda> mriedem: You didn't know about the cage at the Summit?
22:03:14 <jgriffith> hemna: can have it
22:03:20 <ildikov> mriedem: yeah, that might show some differences; if not, then we can still decide by complexity, amount of changes, any risks, etc.
22:03:30 <ildikov> mriedem: lol :)
22:03:52 <scottda> Whichever we decide, we're still going to have to explain to other Cinder and Nova reviewers...
22:03:54 <jgriffith> just change all of the api signatures... wtf, it'll be fun :)
22:03:55 <mriedem> as a cinder outsider, i'm interested in both but don't know the details or pitfalls enough either way
22:03:58 <scottda> so the diagrams will come in handy.
22:04:16 <mriedem> i.e. i'll need to be told in both cases what the nova changes would look like
22:04:24 <hemna> yup
22:04:32 <mriedem> i think i understood hemna's more last week when we were talking through it
22:04:47 <ildikov> scottda: those are handy in general, I'm already a fan!
22:04:50 <mriedem> anyway, my 2 cents
22:04:56 <mriedem> gotta go though
22:04:59 <mriedem> thanks everyone
22:05:00 <ildikov> scottda: helps much in figuring out what's going on
22:05:06 <scottda> mriedem: Thanks!
22:05:18 <scottda> I'm going to have to head soon myself....
22:05:26 <scottda> Anything else we can get done here today?
22:05:34 <ildikov> I think we're done for today
22:05:43 <andrearosa> thanks everyone
22:05:44 <ildikov> or well, my brain is at least :)
22:06:06 <ildikov> I will announce the next meeting on the ML for next week as a reminder
22:06:06 <scottda> Get some sleep ildikov andrearosa DuncanT
22:06:14 <scottda> Thanks!
22:06:15 <ildikov> we can track the action items on the etherpad
22:06:37 <ildikov> hemna: can you link the diagrams there if you haven't already?
22:06:46 <DuncanT> G'night all
22:06:58 <hemna> ildikov, I linked the attach diagram in the etherpad
22:07:08 <ildikov> hemna: coolio, thanks much
22:07:19 <scottda> Bye all
22:07:21 <ildikov> ok, then thanks everyone
22:07:43 <ildikov> have a good night/evening/afternoon/morning :)
22:07:54 <ildikov> #endmeeting