16:00:01 #startmeeting Cinder
16:00:02 Meeting started Wed Feb 8 16:00:01 2017 UTC and is due to finish in 60 minutes. The chair is smcginnis. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:03 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:06 The meeting name has been set to 'cinder'
16:00:08 Howdy
16:00:09 hi!
16:00:13 yough
16:00:13 hi
16:00:20 ping dulek duncant eharney geguileo winston-d e0ne jungleboyj jgriffith thingee smcginnis hemna xyang1 tbarron scottda erlon rhedlind jbernard _alastor_ bluex karthikp_ patrickeast dongwenjuan JaniceLee cFouts Thelo vivekd adrianofr mtanino yuriy_n17 karlamrhein diablo_rojo jay.xu jgregor baumann rajinir wilson-l reduxio wanghao thrawn01 chris_morrell stevemar watanabe.isao,tommylikehu
16:00:24 hi
16:00:25 hi
16:00:26 Hello.
16:00:26 mdovgal ildikov wxy viks ketonne
16:00:29 hey
16:00:31 <_alastor_> o/
16:00:35 hello)
16:00:39 o/
16:00:41 Dang it, we had a nice short agenda when I looked this morning. :)
16:00:41 o/
16:01:09 #topic Announcements
16:01:11 good news
16:01:15 I'll keep it short.
16:01:35 Summit CFP has been extended, so a few more days to get in talk proposals.
16:01:51 I believe the current deadline is the 8th.
16:02:01 #link https://etherpad.openstack.org/p/ATL-cinder-ptg-planning PTG topic planning
16:02:02 o/
16:02:24 So just around 15 hours left
16:02:26 I'll try to organize topic ideas for the PTG ahead of the event.
16:02:47 dulek: Wow, I didn't even realize today was the 8th. :D
16:03:01 Hello :)
16:03:01 Isn't it still January?
16:03:06 :)
16:03:11 smcginnis, Neither did I. :-)
16:03:13 o/
16:03:19 Technically the CFP deadline is February 9th my time. ;) Timezones are hard.
16:03:20 o/
16:03:37 dulek: You get bonus time. ;)
16:03:40 hi
16:03:49 #link https://etherpad.openstack.org/p/cinder-spec-review-tracking Review focus
16:03:54 I've started to update the etherpad.
16:04:12 Will do more I'm sure after the PTG discussions, but it's a start for Pike focus.
16:04:58 Final RC for Ocata needs to be next week. So if we have any critical bug fixes that need to make it in, make sure you are proposing backports to stable/ocata.
16:05:04 master is pike now
16:05:31 thank you igor
16:05:37 kul
16:05:46 Doesn't look like the pike schedule has been finalized yet, but that will be available here:
16:05:56 #link https://releases.openstack.org/pike/schedule.html Pike Schedule (coming soon)
16:06:09 smcginnis: ready to merge the codes for pike?
16:06:47 tommylikehu_: Yes, we can approve patches for Pike now since master is what will become pike. But we should really focus on wrapping up Ocata where needed yet.
16:06:56 let's the merge party started)
16:07:06 smcginnis: ok
16:07:13 smcginnis: Pike schedule draft: https://review.openstack.org/#/c/425254/
16:07:31 Oh, also just wanted to drop the idea of a Cinder outing/gathering in Atlanta. Maybe Tuesday night? Just something to think about for now.
16:07:31 And rendered: http://docs-draft.openstack.org/54/425254/2/check/gate-releases-docs-ubuntu-xenial/74aaf33//doc/build/html/pike/schedule.html
16:07:35 dulek: Thanks!
16:07:58 #topic A/A certification of the drivers
16:08:05 dulek: lLl yours.
16:08:09 *all
16:08:11 fail
16:08:11 I just wanted to start up the discussion around A/A support for drivers.
16:08:21 There are two points of view - vendor and community drivers.
16:08:38 Should we require something special (CI? test results and logs?) from vendors to prove that their drivers are A/A capable, or do we just trust their own internal testing?
16:08:53 dulek, isn't the first step in the drivers to change the local file locks to coordinator locks?
16:09:06 My inclination is to just trust their internal testing.
16:09:08 hemna: Yes, but I think some drivers already did that.
16:09:23 dulek, ok, there are plenty that still have the local file locks
16:09:30 and even some drivers trying to add new local locks
16:09:34 I've been -1'ing those
16:09:42 dulek: while we don't have a way to assert vendors are really supporting it, we will have to rely on that
16:09:51 hemna: It may be that sometimes just a local, thread-level lock is required.
16:10:01 hemna: Then it's fine not to use the coordination module.
16:10:07 hemna: Not all locks need to be changed
16:10:13 sure, but on API entry points.....
16:10:21 The special CI would be cool, but it would be pretty hard to have an automated suite that gives high confidence there were no issues for it
16:10:30 I'm also on the side of requiring no CI - that requires a huge amount of resources.
16:10:51 dulek: I'm ok as well, as long as we define a testing protocol
16:11:08 That vendors can follow to validate their envs
16:11:14 s/envs/drivers
16:11:25 The basic stuff would be running tempest tests
16:11:33 geguileo: Good point, would be cool to have it. We could require posting the results before accepting the patch that switches SUPPORTS_ACTIVE_ACTIVE.
16:11:34 dulek: Like we did last week
16:11:44 geguileo: I like that. Some basic guidance of what should be tested.
16:11:46 :|
16:11:57 this doesn't lend a lot of confidence in A/A
16:12:02 Then if they want to do additional testing they can, but they at least know the minimum we think should be covered.
16:12:03 furthermore, we don't yet have tests that inject failures,
16:12:04 I believe scottda_phone was looking into this
16:12:11 dulek, +1
16:12:24 geguileo: I haven't developed anything yet...
16:12:43 But I'd like a plan to do some error injection, at least manually
16:12:44 erlon: I should be working on that, but I'm enjoying some multipath fun :-(
16:12:48 I've seen a Mirantis fault-injection framework somewhere.
16:12:55 os-faults was it called
16:12:57 ?
16:13:05 dulek: That's not great for us...
16:13:11 https://github.com/openstack/os-faults
16:13:13 dulek: It basically allows you to kill the service
16:13:19 And some more actions
16:13:28 but you can't control where in the code you are
16:13:40 so you can't actually control if you are testing what you want to test
16:13:44 geguileo: Oh, right, without that tests are non-deterministic.
16:13:50 yup
16:13:59 geguileo: hmm, that is so hateful, I run miles not to touch FC
16:14:13 erlon: It's iSCSI this time
16:14:33 Before discussing the tool/library to use
16:14:39 geguileo: hmm, much better then
16:14:42 We may want to talk about what we want tested
16:15:39 I guess that's too big for this meeting.
16:15:53 geguileo: we could start with the main areas you addressed in the initial specs for A/A
16:15:53 yeah, but I think that's where we should start
16:15:54 But we should discuss it at the PTG
16:16:01 But we have this great conclusion that we should have a required test plan for vendors to run to get the A/A badge.
16:16:07 Yeah, probably better at the PTG. And/or in channel or the testing meeting.
16:16:11 scottda: +1
16:16:13 dulek: +1
16:16:28 vendors probably have ideas of what parts of their drivers could have issues
16:16:39 And we can use that feedback
16:16:46 Definitely.
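A note on the coordinator locks hemna brings up above: Cinder's coordination module wraps tooz and exposes a synchronized decorator whose lock name is a template filled in from the decorated function's arguments, whereas oslo.concurrency locks stay local to one host. A minimal sketch, assuming the templated-name form documented in cinder/coordination.py; the function and lock names here are made up for illustration:

```python
from cinder import coordination
from oslo_concurrency import lockutils


# Node-local lock: even with external=True (a file lock) it only
# serializes callers on a single host, so two active/active c-vol
# services on different nodes would not see each other's locks.
@lockutils.synchronized('example-operation', external=True)
def local_only_operation(volume):
    pass


# Distributed lock via tooz: the template is filled from the function
# arguments, giving one cluster-wide lock per volume.
@coordination.synchronized('{volume.id}-example_operation')
def cluster_safe_operation(volume):
    pass
```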
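On the fault-injection side, the os-faults library linked above works roughly as follows (a sketch based on its README; the connection details and service name are assumptions). As geguileo points out, it can kill or restart a service but cannot control where in the code the service is when it dies, which is why tests built on it are non-deterministic:

```python
import os_faults

# Deployment-specific connection details; this devstack-style config
# is only an assumption for illustration.
cloud_config = {
    'cloud_management': {
        'driver': 'devstack',
        'args': {
            'address': 'devstack.example.org',
            'username': 'stack',
        },
    },
}

cloud = os_faults.connect(cloud_config)
cloud.verify()  # check the connection actually works

# Kill one cinder-volume service mid-workload, then re-run the basic
# create/attach checks against the surviving services.
service = cloud.get_service(name='cinder-volume')
service.kill()
```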
16:16:54 Plus there should be some areas generic to all drivers.
16:17:07 i.e. stuff from geguileo's manual tests
16:17:16 It might vary between vendors too, so I think having a base set of tests, then leaving the rest up to each vendor to test whatever else they feel is important internally is good.
16:17:28 scottda: But even for the specific cases, we would want to test those cases in all drivers
16:17:42 geguileo: yes, I agree
16:17:53 smcginnis: I think even the specifics should be run by all drivers
16:18:10 smcginnis: Even if you don't have problems with that case, it's always best to be sure
16:18:47 geguileo: If we get some docs from vendors that describe their specific cases, that's fine to include them, but it isn't a perfect world. ;)
16:19:05 dulek: I wasn't looking for something so formal
16:19:31 I'm ok with: "we could have problems if you run operation x and operation y simultaneously"
16:19:51 I was going to ask if we were talking about having vendors document what they were doing.
16:20:28 This is going to be one of those features where the amount of testing and pressure that goes into it is going to depend on what the vendor's customers are pushing.
16:20:55 geguileo: Oh, right, that works for me. It's just that I don't expect all vendors will give such feedback. But it will be welcomed.
16:21:04 Well, the customers are the testers, right?
16:21:11 :)
16:21:12 lol
16:21:21 I'd really rather not make this a heavy process. I think giving guidance on what areas are interesting is good, but then just leaving it at the vendor saying, yes, we are comfortable with our driver being used in production with this.
16:21:27 scottda: Aren't you that "testing guy"?
16:21:31 :D
16:21:32 dulek: I'll try to do it for the RBD driver
16:21:35 hah
16:21:38 scottda, Maybe yours are. :-)
16:21:58 dulek, Good point.
16:22:07 Okay, with geguileo's note about RBD - we're at community drivers now.
16:22:09 We can share our experiences.
16:22:38 How do we plan to test vendorless drivers?
16:22:42 smcginnis, That makes sense. A heavyweight process won't be followed anyway.
16:22:59 dulek: gate?
16:22:59 jungleboyj: Not without a lot of effort on our part. ;)
16:23:20 * jungleboyj thinks of 3rd Party CI.
16:23:30 erlon: Could be, although I don't know how many more gates we can add without infra being upset about it.
16:23:47 at least it could be experimental
16:24:04 dulek: We could leave it to those vendors/communities. I.e., Red Hat can test and claim that Ceph is good, Linbit can validate sheepdog, etc.
16:24:22 And until they decide to do that, those just won't be marked as A/A capable.
16:24:23 dulek: hmm, yeah we can try to change the actual jobs to increase the coverage without the need to add more
16:24:27 Just an idea..
16:24:43 We kinda need Ceph as a reference, since LVM has issues with A/A
16:24:43 smcginnis: That could work. Who's our NFS vendor?
16:24:48 scottda, That assumes we come up with test cases that are automated for A/A?
16:24:52 smcginnis: +1
16:25:12 dulek: Hmm... erlon? :D
16:25:21 dulek: not sure if there's anyone else, but Hitachi has
16:25:25 smcginnis: :)
16:25:25 erlon: Hm, this would require switching the jobs to multinode, which still requires more resources.
16:25:50 eharney, Would RedHat do the testing?
16:26:10 So that's a good point that community drivers also have some vendor ownership. :)
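For reference, the SUPPORTS_ACTIVE_ACTIVE flag mentioned earlier is a class attribute on Cinder's volume driver base class that defaults to False. "Claiming support" after running the agreed test plan would amount to flipping it, along the lines of the sketch below; the driver class itself is hypothetical:

```python
from cinder.volume import driver


class ExampleVolumeDriver(driver.VolumeDriver):
    """Hypothetical driver shown only to illustrate the A/A flag."""

    # False in the base class; a vendor would flip this to True in the
    # same patch that posts their A/A test-plan results.
    SUPPORTS_ACTIVE_ACTIVE = True
```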
16:26:20 dulek: yeah, but better to have 1 multinode job than to add a single-node job plus the multinode one just to test HA
16:26:46 And those without real vendor support will be up to someone in the community (dev/user) deciding to put in the effort to test it.
16:27:04 And if no one cares, then it just doesn't get marked supported.
16:27:24 smcginnis: or leave it as A/A unsupported until someone decides to do it, right?
16:27:30 smcginnis: ok
16:27:32 Makes sense.
16:27:48 erlon: yep
16:28:01 Okay, I just wanted everyone to share initial thoughts on the topic, I think we've captured this.
16:28:02 smcginnis, Again, it will be driven by customer need.
16:28:17 We might want to brainstorm some ideas as to what we'd need for ideal testing (in the gate) and take to infra...
16:28:27 jungleboyj: Yep, exactly. If no one needs it, no use bothering with it.
16:28:32 i.e. can we have a mechanism for killing the service(s)?
16:29:22 scottda: I believe that falls into the non-determinism problem geguileo mentioned
16:29:23 We don't have a way to disable doing A/A right now, do we? It is just a 'support' statement.
16:29:25 or use Rally for stress testing.
16:29:31 erlon: Not necessarily
16:29:42 just deterministically kill a service
16:29:54 jungleboyj: right now a/a is disabled for all drivers
16:30:05 jungleboyj: you cannot run cinder in a/a with any driver
16:30:08 Have a multinode test, create, attach, etc...then kill one c-vol and re-test attach, etc
16:30:09 scottda: hmm, maybe that would be deterministic if you didn't have any operation running
16:30:17 #action team to come up with testing plan for A/A
16:30:23 erlon: Right, just basic stuff
16:30:34 Who's starting that etherpad?
16:30:35 :)
16:30:41 geguileo, Ok, thanks. Didn't realize that.
16:30:49 np
16:31:18 I can sum this up, add a PTG topic and a new etherpad to gather plans.
16:31:18 #info Up to vendors to test and claim support for a/a
16:31:29 dulek: +1 That'd be great.
16:31:56 dulek: I think I've something on the PTG agenda...
16:32:01 #info Open source drivers will need to be tested for a/a by backing vendor or someone from one of those communities
16:32:02 dulek: Please add your name
16:32:21 scottda: Sure!
16:32:59 #action dulek to create etherpad and prep for discussion at the PTG
16:33:02 Fair? ^^
16:33:11 Sure!
16:33:19 Thank you.
16:33:26 kids -> school. AFK
16:33:39 * jungleboyj waves at hemna
16:34:00 dulek: OK, are we good on this topic for now?
16:34:32 smcginnis: Yup.
16:34:39 dulek: Great, thanks
16:34:45 #topic Bulk create of volumes
16:34:53 Guess it's on xyang1 and scottda
16:35:06 * jungleboyj is having deja vu.
16:35:12 hi
16:35:16 So, we've customers already doing bulk creates....
16:35:22 especially for testing.
16:35:34 i.e. spin up 50-100 VMs with 50-100 volumes
16:35:36 Didn't we discuss this at the Austin summit?
16:35:49 smcginnis: We've discussed it, but...
16:36:01 scottda: using heat?
16:36:05 at the time, there was limited interest from vendors
16:36:06 we talked about it at the midcycle
16:36:17 smcginnis, I know we talked about it recently.
16:36:26 We know EMC/Dell, IBM, and Datera could use this...
16:36:27 <_alastor_> I have a customer spinning up 2400 VMs + Volumes, so bulk create would be very valuable
16:36:33 scottda, Is with a new vendor. :-)
16:36:44 scottda: Excuse me, but that's Dell EMC. :P
16:36:45 Hah!
16:36:55 Problems with bulk create start when some of the creations fail…
16:36:56 :)
16:37:09 dulek: Sure, there are details to work out...
16:37:21 We're just looking for a community OK to go forward with a spec
16:37:32 dulek, all problems start when something fails)
16:37:41 scottda: do you have a plan for vendors that don't support that?
16:37:49 mdovgal: :D
16:37:49 So the argument against is it's easy to script a loop to do this. The argument for is that for some storage it can all be passed down to the storage in one call and be much more efficient. Right?
16:38:04 erlon: It's a feature that doesn't have to be supported by each vendor
16:38:05 scottda: or that does not rely on a hardware capability
16:38:14 smcginnis: Correct
16:38:22 scottda: Hey, it's pretty easy to do in a generic way…
16:38:25 It's simply more efficient for some arrays
16:38:56 scottda: So if it's not supported by each vendor, that kind of implies then that we need to build in extra logic to Cinder to do the batch operation sequentially and return the results.
16:38:59 scottda: so it's just moving the 'for' loop inside Cinder
16:39:24 Nova has been living with bulk create for some time now. I think we should ask them whether it has turned out badly.
16:39:27 erlon: But some arrays can create many volumes in much less time than looping through..
16:39:39 dulek: Good question..
16:39:57 It's cases like boot-from-volume that they hate, as it moves orchestration into OpenStack. Which isn't easy.
16:40:04 scottda: well we would need to see how much more efficient that is than just calling heat or using a bash 'for'
16:40:07 erlon: for some arrays creating 10 volumes takes the same amount of time as creating 1
16:40:16 xyang1: VMAX was one of the big wins here, right? It can create many in the same amount of time it can create one?
16:40:31 Storwize as well
16:40:31 smcginnis: yes
16:40:57 <_alastor_> We have similar functionality at Datera
16:41:00 * smcginnis just thought of creating 100 boot volumes simultaneously and shuddered
16:41:01 From this point of view it seems bad that such an ability isn't exposed in Cinder.
16:41:04 xyang1: hmm, so you would be eliminating the API + scheduler time for each request
16:41:37 erlon: so the scheduler needs to handle bulk create if we decide to move forward with this
16:41:45 So, there is precedent for the feature.
16:41:48 well, IMO as long as there are people requesting it and a benefit to the approach, it's worth proceeding with the discussion/spec
16:42:04 erlon, +1
16:42:13 It is one that keeps coming up. :-)
16:42:34 Yeah, I see the benefit over just writing a for loop script. I guess I'm for this if someone's willing to put in the work.
16:43:16 smcginnis, Assuming the design will need to work for backends that can do it all at once and ones that would need to do it one at a time. Right?
16:43:47 We can certainly hash out design details in the spec...
16:43:55 jungleboyj: the default behavior will be do it one at a time and driver and override that
16:44:06 s/and/can
16:44:08 xyang1, Cool
16:44:25 scottda, +1 about spec
16:44:27 jungleboyj: Yep
16:44:49 So it sounds like we are cool with a spec?
16:45:04 I think so.
16:45:29 cool, thanks
16:45:44 scottda: Was there a BP filed for this already?
16:46:19 xyang1: BP?
16:46:29 I really need to go through the BPs. I've been neglecting launchpad for a while. :]
16:46:33 scottda: I have not filed one yet
16:46:47 OK, good. I was just wondering.
16:46:56 Great Leader is tired from taking care of all his people....
16:47:03 Hah!
16:47:08 scottda: last time I just brought it up at the meetup. it is just on the etherpad
16:47:14 More like tired of launchpad's poor interface. ;)
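To make xyang1's point above concrete (serial by default, with a driver override), one possible shape for it is sketched below. This is purely hypothetical; none of these names come from the spec, which had not been written yet, and per-volume error reporting is exactly the open problem dulek raised:

```python
class BaseDriverSketch(object):
    """Hypothetical base class illustrating the proposed default."""

    def create_volume(self, volume):
        raise NotImplementedError()

    def create_volumes(self, volumes):
        # Generic fallback: loop one at a time, so backends without a
        # native bulk call keep working with no driver changes.
        for volume in volumes:
            self.create_volume(volume)


class BulkCapableDriverSketch(BaseDriverSketch):
    """A backend like VMAX or Storwize, which can create a batch in
    roughly the time of a single create, would override the hook."""

    def create_volumes(self, volumes):
        # One array call covering the whole batch, eliminating the
        # per-volume round trips (hypothetical helper below).
        self._send_single_array_call(volumes)

    def _send_single_array_call(self, volumes):
        print('created %d volumes in one call' % len(volumes))
```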
16:47:30 smcginnis, ++
16:48:10 back
16:48:12 scottda, xyang1: We good for now?
16:48:17 It's all up to hemna, right?
16:48:18 yup
16:48:22 ha
16:48:29 :)
16:48:34 smcginnis puts on his robe at 6:30 and watches nothing but youtube when he should be doing launchpad.
16:48:35 #topic Open discussion
16:48:44 hemna, Is going to love solving that problem. ;-)
16:49:14 Swanson, While smoking his cigar and drinking brandy
16:49:24 :D
16:49:25 Hi, do we need to resubmit the 3rd party driver for pike or is it implicit?
16:49:27 oh boy
16:49:40 I mean the 3rd party driver Veritas HyperScale...
16:49:44 viks: You mean patches for new drivers?
16:49:50 Yes
16:49:53 Ah, nope, saw that patch updated.
16:50:05 viks: I put it in the etherpad for Pike already.
16:50:15 We can start looking at new drivers again.
16:50:18 Yes... I saw that too... thanks
16:50:37 Now that we're removing some, we need to add a few more to stay above 100. :D
16:50:48 Wheeee!
16:51:13 Anyone have anything else today?
16:51:17 folks, i'm going to post some refactor patches in a few moments. if it is possible, better criticize them but don't shelve them... save me from rebasing hell)
16:51:33 mdovgal: Hah, will do.
16:51:37 :-)
16:51:39 fwiw, my new cireporter script is part of third-party-ci-tools
16:51:41 anyone can run it
16:51:42 mdovgal, Love it.
16:51:44 thanks)
16:51:47 hemna: Nice.
16:52:09 hemna: I took 5 minutes to try to add it to my automated updates. Ran into some problems getting it going though.
16:52:22 hemna: what does it do?
16:52:25 scottda, smcginnis: blueprint is here now: https://blueprints.launchpad.net/cinder/+spec/bulk-create-volume-api
16:52:31 First missing PyYAML, then array, then oslo_config...
16:52:41 Just need to spend a little more time working through that.
16:52:43 it didn't work?
16:52:50 oh yah
16:52:51 Might be good to add to the requirements.txt though.
16:53:01 that repo doesn't support requirements afaik
16:53:03 Just some setup work needed on my end.
16:53:04 I can add it
16:53:20 hemna: Wouldn't hurt, at least as a clue to someone new looking at it.
16:53:25 ok I'll add that
16:53:27 xyang1: Thanks
16:53:38 smcginnis: I found another blueprint with a similar name, but that's for the driver only: https://blueprints.launchpad.net/cinder/+spec/bulk-create
16:53:39 hemna, ++
16:53:55 xyang1: Will this bp cover the bulk boot-from-volume create from nova?
16:54:13 xyang1: yes, I think that driver BP is one of our team's...
16:54:34 xyang1: I'll make sure we're all on the same page
16:54:47 scottda, Yes, that is one of your team. :-)
16:54:53 rajinir_: we should talk about that in the spec too. I am not clear what the current status of that is.
16:55:18 scottda: cool
16:55:19 xyang1: That's the typical use case, sure, the spec should cover it
16:55:32 Bring on the boot storms. :)
16:55:39 rajinir_: that is currently handled by heat?
16:55:49 bwah ha ha!
16:56:07 xyang1: Not sure, but we had customers experience performance issues with 50 instances
16:56:09 http://giphy.com/gifs/GWD5nSpiHxs3K
16:56:28 LOL
16:56:51 OK, are we done?
16:57:19 OK, thanks everyone!
16:57:21 I think that is a good note to end on. ;-)
16:57:25 ;)
16:57:25 Thanks!
16:57:33 #endmeeting