16:13:17 #startmeeting Cinder
16:13:18 Meeting started Wed Nov 13 16:13:17 2013 UTC and is due to finish in 60 minutes. The chair is DuncanT-. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:13:19 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:13:22 The meeting name has been set to 'cinder'
16:13:37 DuncanT-: good call
16:13:44 avishay isn't here yet AFAICT
16:14:09 Is Ehud Trainin here?
16:14:17 yes
16:14:21 go figure
16:14:28 #topic Fencing
16:14:44 Oh, hi Jogn, all yours ;-)
16:15:02 Jogn? John even
16:15:07 :)
16:15:25 Looks like Ehud is hanging with Dave W
16:15:41 ehudtr: The stage is all yours....
16:15:59 Following the last discussion I agree with two of your comments regarding the fencing implementation
16:16:13 I accept your comment that fencing should also take care of detaching the volumes at the Cinder level.
16:16:28 I accept your comment that it is not necessary to add a blacklist of hosts or an unfence method into Cinder.
16:17:03 I do think it should be possible to fence/force-detach a host through a new force-detach-host method, rather than trying to change the current detach-volume method to force a detachment at the storage level and calling it for each one of the volumes attached to a host.
16:17:40 for several reasons
16:17:43 In the case of NFS it is not possible to force-detach a volume.
16:18:00 In cases where it is possible, there would still be a problem once shared volumes are supported by Cinder
16:18:06 ehudtr: TBH in NFS we don't really ever detach to begin with :)
16:18:23 It is an optimization, which may be valuable for fast recovery: send 1 request rather than N (e.g. 100) requests
16:19:37 Isn't "NFS support" something the specific volume driver would be responsible for?
16:19:53 I still have the same concerns I raised previously WRT the complication and potential for errant fencing to occur
16:20:11 I think we would like to prevent access at the storage level
16:20:15 jgriffith: +1
16:20:16 My only other question is "is this a real problem"
16:20:48 anybody else have any thoughts on this?
16:21:01 is there a proposal written up somewhere?
16:21:02 live migration, maybe
16:21:04 I think it is a real problem, yes. We do something very similar to a fence in compute node startup
16:21:11 https://blueprints.launchpad.net/cinder/+spec/fencing-and-unfencing
16:21:16 jgriffith: thanks!
16:21:16 Fencing is something standard done in HA clusters like Pacemaker
16:21:32 someone reported such an issue when using ceph
16:21:46 too bad this wouldn't apply to Ceph :)
16:22:04 winston-d: did they raise a bug for that in the end? I did not see one
16:22:20 I think as OpenStack is moving to HA we need to be considering how Cinder fits into that.
16:22:21 The current option in OpenStack to do a rebuild without first fencing is a bug in my opinion
16:22:52 dosaboy: no, i don't think so.
16:22:58 jungleboyj: don't confuse this with HA impl
16:23:05 Ok
16:23:17 ehudtr: sounds like folks are in favor of moving forward on this
16:23:35 jgriffith: Ah, sorry.
16:23:42 ehudtr: My concerns, as I stated, are just how to do this cleanly while mitigating admins shooting themselves in the foot
16:23:53 as a counter example, we were looking at a way to do the exact opposite and whitelist at the IP level
16:24:13 guitarzan: I think that's kinda how the LIO target works
16:24:43 guitarzan: or how it "could" work I guess, right now we just read the connector info but it has hooks to have a whitelist
16:25:15 jgriffith: nice, I'll have to look at that
16:25:24 I think one possible way to prevent an admin from pressing the fence button too easily is enabling it only if a host is in a failed state
16:25:41 I can see the point of the idea... waiting to see code before I have a strong opinion...
16:25:48 How does cinder know a host is failed?
16:25:58 guitarzan: https://github.com/openstack/cinder/blob/master/cinder/brick/iscsi/iscsi.py#L441
16:26:24 Cinder need not know the host has failed; Nova and possibly Heat will know
16:26:33 DuncanT-: notified by Nova?
16:27:00 I agree, cinder cannot determine this itself. It needs to come from other OpenStack components.
16:27:03 so add an API call to "notify-invalid-iqns" or something of the sort?
16:27:23 and how is it cleared :)
16:27:39 nova has to then have another command to add something back in
16:27:55 honestly seems like this all needs to happen in nova first
16:28:01 Wouldn't it be a one-time transition: "Clear any attached volumes held by this compute instance"?
16:28:08 I think there's more work there than on this side (i.e. failure detection etc.)
16:28:32 caitlin56: sure, but if you blacklist a node, what happens when it comes back up and you want to add it back into your cluster
16:28:44 You have to clear it somehow
16:29:06 It's not just clearing the current attach, if I understand ehudtr correctly
16:29:07 A node is attempting to re-attach without having been in contact with nova?
16:29:22 ehudtr: it's clearing an attach and preventing that attach from being reconnected, no?
16:29:29 jgriffith: I was assuming that this would be managed by Nova and Cinder would just provide the tools.
16:29:34 thus the term "fencing"
16:29:44 jungleboyj: yes, that's what I'm getting at
16:29:58 jungleboyj: a good deal of nova work before getting to Cinder
16:30:09 and the Cinder side might not be so tough to implement
16:30:12 jgriffith: +2
16:30:20 emphasis on *might*
16:30:22 :)
16:30:25 I can definitely see the need for nova being able to tell Cinder "this guy is gone" but "don't talk to this guy" is something more dangerous.
16:30:28 jgriffith: Just wanted to emphasize that.
16:30:45 Couldn't nova use neutron to enforce that without bothering Cinder?
16:30:50 caitlin56: but I think that's what we're talking about... ehudtr ?? ^^
16:30:54 and that's nova's problem
16:31:04 caitlin56: nope
16:31:07 Yes, I agree this would be managed by Nova. The fence-host call is needed to disconnect the host at the storage level
16:31:18 I don't want Neutron mucking about with my data path
16:31:26 caitlin56: Nope, neutron doesn't usually get in the way of the storage network
16:31:42 ehudtr: but the question is: you also want to prevent the failed node from connecting again, right?
16:31:59 I think this is where the debate started last week :)
16:32:17 ehudtr: The important part is that you don't have the 'failed' node attempting to access the storage while the new node is being brought up.
16:32:50 jgriffith: I would assume that is only if a new node has taken over.
16:32:59 Such a "failed node" shouldn't be using *any* OpenStack services, right? It's not just Cinder.
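[To make the blacklist-and-clear mechanics just discussed concrete, here is a minimal Python sketch of what a Nova-driven fence operation on the Cinder side might look like: one call force-detaches every volume attached through a failed host's initiator and records that initiator in a deny list until it is explicitly cleared. Everything here (FenceManager, fence_host, attachments_for_initiator, force_detach) is a hypothetical illustration, not Cinder's actual API.]

    # Hypothetical sketch only: fence a failed compute host by force-detaching
    # every volume attached through its initiator and deny-listing that
    # initiator until it is explicitly cleared. Names are illustrative, not
    # Cinder's real API.

    class FenceManager(object):
        """Track fenced initiators and drive a whole-host force-detach."""

        def __init__(self, volume_api):
            # volume_api is assumed to expose a per-volume force-detach plus a
            # way to list attachments for an initiator (both hypothetical).
            self.volume_api = volume_api
            self._fenced_iqns = set()

        def fence_host(self, context, initiator_iqn):
            """Called by Nova (or Heat) once it decides a compute host is dead."""
            self._fenced_iqns.add(initiator_iqn)
            for attachment in self.volume_api.attachments_for_initiator(
                    context, initiator_iqn):
                self.volume_api.force_detach(context, attachment.volume_id)

        def unfence_host(self, initiator_iqn):
            """Clear the deny-list entry when the node is brought back."""
            self._fenced_iqns.discard(initiator_iqn)

        def validate_connector(self, connector):
            """Hook a target driver could call before honoring an attach."""
            if connector.get('initiator') in self._fenced_iqns:
                raise ValueError('initiator %s is fenced'
                                 % connector['initiator'])

[The "and how is it cleared" question above is why the deny list needs an explicit removal path rather than relying on detach alone.]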
16:33:04 Ok, two more minutes for this topic and then I think we should move on
16:33:22 jungleboyj: sure
16:33:29 caitlin56: haha... probably
16:33:37 caitlin56: For some services multiple accesses don't matter... for block storage it does
16:33:44 caitlin56: Depends on how it fails. In situations like this you need to cover all the bases.
16:33:46 this is where I went all wonky in the last discussion :)
16:34:04 Yes, this was part of the original suggestion, but last meeting you suggested that preventing the failed node from attempting access while a new node is created may be done in Nova. I checked and it seems this could be done in Nova only after volume attach is moved from nova-compute to nova-conductor.
16:34:23 Ok... so my proposal:
16:34:26 ehudtr:
16:34:31 1. Take a look at the nova work
16:34:38 Focus on things like failure detection
16:34:48 How you would generate a notification
16:35:03 what would be needed from cinder (if anything versus disabling the initiator files etc)
16:35:14 2. After getting things sorted in Nova
16:35:14 I know how to do failure detection with Nova
16:35:29 Great, step 1 is almost done then :)
16:35:38 hah
16:35:46 Then put together the proposal, make sure the Nova team is good with that
16:36:00 from there we can work on providing an API in Cinder to fence off initiators
16:36:09 I'd like to see what the code for that looks like though
16:36:16 and how to clear it
16:36:22 ehudtr: sound reasonable?
16:36:27 yes
16:36:38 DuncanT-: guitarzan jungleboyj caitlin56 ok ^^
16:36:43 winston-d:
16:36:46 Sounds sensible to me
16:36:54 jgriffith: Sounds good to me. Good summary to keep moving forward.
16:37:17 winston-d: seems like that works for the error case you were thinking of?
16:37:36 dosaboy: I have no idea how to make this work with Ceph but that's why you're an invaluable asset here :)
16:37:54 hemna_: you'll have to figure out FC :)
16:38:10 Ok...
16:38:10 just go yank the cable out
16:38:18 guitarzan: I'm down with that
16:38:28 guitarzan: DC Monkey.. fetch me that cable!
16:38:35 hi guys, we at Nexenta are implementing a storage-assisted volume migration; we have run into a problem: there can be multiple storage hosts connected to a single NFS driver. So there's one-to-many mapping...
16:38:52 dsfasd
16:39:01 #topic patches and release notes
16:39:07 winston-d: what's dsfasd?
16:39:10 jgriffith: it may actually be easier for ceph since it has the notion of 'watchers'
16:39:13 winston-d: lagging?
16:39:17 dosaboy: yeah :)
16:39:17 jgriffith: sorry, lagging
16:39:21 haha
16:39:22 hodos: Please wait until the any-other-business section of the meeting
16:39:23 no prob
16:39:33 So quick note on this topic
16:39:40 ok, sorry )
16:39:55 #topic patches and release notes
16:40:00 reviewers, I'd like for us, when adding a patch that's associated with a BP or a bug, to update the doc/src/index file
16:40:14 that way I don't have to go back and try to do it every milestone :)
16:40:26 Same format as what's there
16:40:30 simple summary, link
16:40:34 sound reasonable?
16:40:50 rolling release notes :)
16:41:16 * jgriffith takes silence as agreement :)
16:41:20 sounds good
16:41:25 jgriffith: Sounds reasonable.
16:41:27 or just plain lack of interest and apathy
16:41:31 kk
16:41:37 now to the hard stuff :)
16:41:45 Something for reviewers to catch I guess...
16:41:51 DuncanT-: to catch yes
16:42:02 we do agree to write a cinder dev doc, right? Make sure this is documented as well
16:42:10 not a horribly big deal but it would be helpful IMO
16:42:11 jgriffith: So just to be clear as a newbie ...
16:42:12 Might be able to get a bot to catch simple cases after a while
16:42:14 winston-d: excellent point
16:42:30 DuncanT-: hmmm... perhaps a git hook, yes
16:42:34 jgriffith: If I approve something associated with a BP I would need to go update that file with appropriate information?
16:42:41 jungleboyj: oh.. no
16:42:50 jungleboyj: so the idea is that the submitter would add it
16:43:00 when core reviews it we should look for that entry
16:43:18 if people hate the idea or think it's a waste that's ok
16:43:20 just say so
16:43:26 I don't mind doing it the way I have been
16:43:30 and if you don't approve patches without that link people should learn very quickly.
16:43:37 just don't complain if your change isn't listed :)
16:43:55 jgriffith: Ahhh, ok ... That makes more sense. Thanks for the clarification.
16:44:03 I'd say we try it for a couple of weeks and see how it works out
16:44:08 works for me
16:44:14 trial basis
16:44:18 kk
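[As an illustration of the "bot to catch simple cases" / git hook idea floated above, here is a rough commit-msg hook sketch in Python: it warns when a commit message references a blueprint or bug but the staged changes do not touch the release-notes index. The doc/src/index path comes from the discussion; the patterns and the warn-only behavior are assumptions.]

    #!/usr/bin/env python
    # Rough sketch of a commit-msg hook for the idea discussed above: warn when
    # a commit that references a blueprint or bug does not also update the
    # release-notes index file. Path and patterns are assumptions.
    import re
    import subprocess
    import sys

    INDEX_FILE = 'doc/src/index'  # path as mentioned in the meeting
    PATTERNS = (r'blueprint\s+\S+', r'(Closes|Partial|Related)-Bug:')


    def staged_files():
        out = subprocess.check_output(['git', 'diff', '--cached', '--name-only'])
        return out.decode().splitlines()


    def main(commit_msg_path):
        with open(commit_msg_path) as f:
            message = f.read()
        references_work = any(re.search(p, message, re.IGNORECASE)
                              for p in PATTERNS)
        if references_work and INDEX_FILE not in staged_files():
            sys.stderr.write('Warning: commit references a blueprint/bug but '
                             '%s was not updated with a summary and link.\n'
                             % INDEX_FILE)
            # Return non-zero here instead to make the check blocking.
        return 0


    if __name__ == '__main__':
        sys.exit(main(sys.argv[1]))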
16:44:29 #topic summit summary
16:44:36 hmmm
16:44:44 #topic summit-summary
16:44:49 come on meetbot
16:45:01 #topic summit-summary
16:45:08 I started the meeting
16:45:12 It knows I am also in another summit summary meeting and my head may explode.
16:45:23 lol
16:45:30 DuncanT-: thanks :)
16:45:33 okie
16:45:37 o-)
16:45:53 I threw a quick overview together: https://etherpad.openstack.org/p/cinder-icehouse-summary
16:46:05 of course it's handy to review the etherpads from the sessions
16:46:15 but I wanted to capture the main points in one doc
16:46:36 I *think* these are the items that we had moderate consensus on
16:46:54 the capabilities reporting maybe not so much... but I'm still pushing to go this route
16:47:15 We can always make things *harder* but I don't know that we should
16:47:24 I'd add that we seemed to agree a state machine with atomic transitions was a good route to try, re taskflow
16:47:35 DuncanT-: for sure
16:47:53 added
16:48:10 anything else glaring that I missed (that somebody will actually get to)?
16:48:26 jgriffith: i think most of us in this room agreed on those capabilities
16:48:32 Making snapshots a first-class object.
16:48:33 I left the import out intentionally for now by the way
16:48:35 My take-away from the capabilities reporting was that we couldn't agree on anything at all
16:48:45 DuncanT-: I don't think that's really true
16:48:55 I think one person didn't agree
16:49:05 jgriffith: I see no harm in adding it anyway
16:49:06 I think most of us agreed with what I've put on the list
16:49:28 winston-d: I think you may have some other ideas/adds that would be fine as well
16:49:41 anyway...
16:49:51 anything pressing to add or remove here?
16:49:53 jgriffith: O'll try to BP the stuff that redhat guy eventually explained, since it seemed valuable once I finally understood him
16:50:02 s/O'll/I'll/
16:50:03 We need more thought/info around the ACLs I think
16:50:24 DuncanT-: cool
16:50:31 DuncanT-: can't wait to see the BP
16:50:33 DuncanT-: or work off the etherpad for now
16:50:46 whichever is faster and more effective
16:50:58 if we reach consensus prior to the BP it might help :)
16:51:01 jgriffith: Etherpad might be easiest... I'll post a link when I'm done
16:51:10 sounds good
16:51:24 Everybody should feel free to put some notes and add questions to the etherpad
16:51:32 but the intent is not to open debate
16:51:42 just to focus on what's there and build the ideas up
16:51:54 and use that info to build blueprints
16:51:57 and assign :)
16:52:14 anybody want to talk more on that?
16:52:38 if not I believe hodos had some things to talk about
16:52:41 #topic open
16:52:45 * jgriffith never learns
16:52:51 #topic open
16:52:58 ok so it touches not only Nexenta
16:53:02 :-)
16:53:12 hodos: back up... what's "it"
16:53:27 afraid I ignored you earlier :)
16:53:54 so if we want to do storage-to-storage migration without routing data through Cinder
16:53:57 on NFS
16:54:11 we have 2 NFS drivers
16:54:38 hodos: our bigger priority is enabling more operations on snapshots. I think the fix required for NFS is too much to tackle by icehouse.
16:55:02 lawl, time change
16:55:06 so how does the source driver know what storage host to use on the dest
16:55:10 thingee: :)
16:55:17 thingee: you made it
16:55:29 hodos: the scheduler could help with that
16:55:33 Speaking of snapshots, I didn't hear any opposition to enabling snapshot replication. Shouldn't that be added to your list, jgriffith?
16:55:34 that is its job after all
16:55:59 yes, but when I issue a command on the source storage driver
16:56:06 I need that info
16:56:15 hodos: :)
16:56:24 u
16:56:35 hodos: frankly this is why I hate the whole "I'm going to talk to this backend directly" problem
16:56:55 update_volume_stats for the NFS driver does not provide information about the host
16:57:05 I'm not a fan of trying to implement Cinder-based replication
16:57:11 hmm
16:57:27 * jgriffith thinks it's a bad idea
16:57:31 why does the driver need to know which host? (sorry, catching up)
16:57:44 jgriffith: the alternative is inefficient replication
16:57:51 thingee: he wants to talk directly from his backend to his *other* backend
16:58:04 to *my* other backend
16:58:05 caitlin56: actually the alternative is cinder doesn't do replication
16:58:12 vito-ordaz: feel free to add any capability that you want to report, just note that the scheduler can only consume some of them (basic ones)
16:58:21 maybe hodos is not a "he"
16:58:27 I am
16:58:29 )
16:58:36 med_: fair
16:58:43 or not.
16:58:48 everyone... I apologize for being gender-specific
16:58:50 jgriffith, hodos: what's the use case?
16:59:06 thingee: migration from one backend to another
16:59:32 homogeneous
16:59:33 why does a driver have to know? shouldn't cinder just be the bridge with that knowledge?
16:59:33 yes, say the same vendor, so these backends know how to talk
16:59:35 problem is that NFS drivers can control many storage backends at the same time.
16:59:43 of another backend that meets that requirement... the scheduler will figure
16:59:47 thingee: so they can do it more cheaply
16:59:59 hodos: don't we have a shortcut in migration?
16:59:59 it sounds hard to me :)
17:00:05 so cinder says, hey you two backends, talk to each other
17:00:13 )
17:00:20 the other backend never initiates it, is my point
17:00:26 jgriffith: that's time
17:00:28 For the record, I don't think we should even implement replication in Cinder
17:00:28 got it
17:00:44 DuncanT- has to throw the switch today
17:00:46 Let the sysadmin set up replication between devices if available and create a volume-type for it
17:00:57 Right, I'm afraid we need to move channels
17:01:02 yep
17:01:02 #endmeeting
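[Closing aside on the open-discussion item hodos raised: a hedged Python sketch of how a source driver might use the destination host record the scheduler passes in to decide whether a storage-assisted copy is possible, modeled loosely on a Cinder-style driver migrate_volume hook. The 'storage_host' capability key is invented for illustration; the NFS drivers not reporting something like it is exactly the gap described above.]

    # Hedged sketch only: a source driver deciding whether it can do a
    # storage-assisted migration, modeled loosely on a driver-level
    # migrate_volume(context, volume, host) hook. The 'storage_host' and
    # 'vendor_name' capability values used below are illustrative assumptions.

    def migrate_volume(context, volume, host):
        """Return (moved, model_update), as a driver migration hook would."""
        capabilities = host.get('capabilities', {})

        # Only attempt a storage-side copy between backends that can talk to
        # each other (e.g. same vendor).
        if capabilities.get('vendor_name') != 'Nexenta':
            return False, None

        # The crux of the discussion: one NFS driver may front several storage
        # hosts, so the driver needs the destination *storage* host, not just
        # the cinder-volume service name in host['host'].
        dest_storage_host = capabilities.get('storage_host')
        if not dest_storage_host:
            # Fall back to the generic, data-copying migration path.
            return False, None

        # ...issue the vendor-specific copy from the source appliance to
        # dest_storage_host here...
        return True, None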