#openstack-meeting log

16:00:01 <thingee> #startmeeting cinder
16:00:01 <openstack> Meeting started Wed Jan 21 16:00:01 2015 UTC and is due to finish in 60 minutes.  The chair is thingee. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:02 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:05 <openstack> The meeting name has been set to 'cinder'
16:00:08 <thingee> hi everyone
16:00:12 <smcginnis> Hello
16:00:12 <dulek> o/
16:00:13 <rushiagr> hi!
16:00:14 <rhe00_> hi
16:00:15 <avishay> Hey
16:00:15 <thangp> hi
16:00:17 <eikke> hi
16:00:21 <eharney> hi
16:00:25 <thingee> our agenda for today:
16:00:28 <thingee> #link https://wiki.openstack.org/wiki/CinderMeetings
16:00:34 <scottda> hi
16:00:44 <xyang> hi
16:00:48 <kmartin> hello
16:01:07 <thingee> Just a reminder on third party ci
16:01:14 <tbarron> hi
16:01:15 <mtanino> hi
16:01:19 <cknight> Hi
16:01:33 <thingee> I sent an email to indvidual driver maintainers. a long email explaining this
16:01:37 <TobiasE> Hi
16:01:51 <thingee> the deadline being March 19th
16:02:03 <thingee> I also emailed the openstack dev mailing list:
16:02:05 <thingee> #link http://lists.openstack.org/pipermail/openstack-dev/2015-January/054614.html
16:02:05 <erlon> hi
16:02:08 <e0ne> hi!
16:02:34 <thingee> In addition for those interested, join the Third party ci documentation sprint starting today
16:02:37 <thingee> #link http://lists.openstack.org/pipermail/openstack-dev/2015-January/054690.html
16:02:45 <thingee> ok lets get started!
16:02:58 <thingee> #topic Sharing volume snapshots
16:03:01 <rushiagr> https://blueprints.launchpad.net/cinder/+spec/snapshot-sharing
16:03:02 <thingee> rushiagr: you're up
16:03:06 <rushiagr> #link ^
16:03:13 <thingee> #link https://blueprints.launchpad.net/cinder/+spec/snapshot-sharing
16:03:33 <rushiagr> so we left halfway last time..
16:03:59 <rushil> \o
16:04:03 <rushiagr> the general thinking was between 'umm okay' and 'no'..
16:04:43 <xyang> how is this related to the other blueprint on public snapshot?
16:04:43 <rushiagr> I dont remember if there was any strong objection on the idea.. If I am wrong, please remind me again..
16:04:48 <flip214> rushiagr: sounds sane to me... I can see the usecase. And it doesn't even suffer from the quota-hardlink problem that UNIX had.
16:04:59 <rushiagr> xyang: it's an extension of that
16:05:15 <rushiagr> xyang: but to be clear, it's not full ACL
16:05:27 <dulek> Wasn't the outcome that public and private snapshots are enough?
16:05:30 <asselin_> hi
16:05:34 <xyang> rushiagr: so is this one dependent on that?
16:06:07 <rushiagr> xyang: there was already a blueprint and spec for public snapshot, so I wasn't sure how to write that, but yes, it's going to be dependent on that
16:06:13 <avishay> So what is the usecase?  We already have shareable images, right?  What do shareable snapshots give us?  Why not shareable volumes too?
16:06:23 <rushiagr> dulek: I don't think we reached the conclusion
16:06:45 <jgriffith> rushiagr: to recap my preference was "no" on sharing, and "yes" on public snaps
16:06:53 <rushiagr> avishay: I don't know the exact requirement. But I was told 'they' wanted to share 'UAT disks' between tenants
16:07:11 <dulek> rushiagr: Yes, definitely, but public snapshots solve the usecase and are more-or-less accepted.
16:07:19 <winston-d_> rushiagr: what's UAT?
16:07:28 <dulek> rushiagr: So I just wanted to remind the idea.
16:07:34 <avishay> "they"?  sounds like a government conspiracy :P
16:07:35 <jgriffith> dulek: +1
16:07:36 <rushiagr> jgriffith: okay. I remember you saying sharing snapshots as I proposed is a saner way than ACL. That's all I remembered :)
16:07:51 <jgriffith> rushiagr: yeah, that's the good part ;)
16:08:15 <jgriffith> rushiagr: I thought about use cases and looked at code last week-end; I like public as best option
16:08:25 <thingee> Sharable snapshots makes me think there is some framework in place for cinder to be mediator for the snapshot to be shared. making a snapshot public, seems like it's a bit switch and up to the owner to communicate to who they want to tell the uuid to.
16:08:26 <rushiagr> dulek: 'they' (winks at avishay) might not want to make stuff public
16:08:57 <jgriffith> thingee: +1, clean and easy IMO
16:09:22 <jgriffith> and it makes sense in other OpenStack workflows
16:09:25 <rushiagr> just to be clear about implementation, when a user will share a snap with a tenant, it will just add a db entry. No actuall data will be generated/transferred out of the block storage until that new user creates disk out that shared snap
16:09:32 <thingee> sure, we just did it with volume types in k-1
16:09:44 <dulek> thingee: won't public snapshot be visible on cinder list? Glance shows public resources on the lists.
16:09:46 <thingee> oh I guess that was project_id
16:10:07 <rushiagr> thingee: sorry, I didn't completely understand you..
16:10:45 <thingee> dulek: I think the idea is, and rushiagr can correct me if I'm wrong, is the listing won't show the snapshots. Instead when we skip context check on the snapshot if it's public.
16:11:08 <thingee> but it's up to the owner to communicate the uuid to who they want to share it with.
16:11:19 <thingee> if it's anything other than that you're talking about ACL's
16:11:20 <rushiagr> thingee: yes, listing won't show public snapshots (or shared snapshots) by default..
16:11:21 <dulek> thingee: I don't like how inconsistent it is with other services behaviour
16:11:43 <xyang> if it is public, any tenant should be able to see it
16:11:49 <jgriffith> rushiagr: ummmm... that would be part of the feature add
16:11:56 <jgriffith> rushiagr: dulek not sure the concern?
16:12:24 <jgriffith> owner can set snapshot to "public", modify get to get all tenant and public, done
16:12:58 <jgriffith> let's try this...
16:13:03 <rushiagr> I think it's a valid use case if a tenant wants to chare a snapshot. I even think it's cleaner than creating a vol, and then transferring to another tenant.. Best would have been only sharing snapshots, and no vol transfer.. But again, personal opinions
16:13:04 <dulek> jgriffith: Yup, thats how it should be done IMO
16:13:15 <jgriffith> anybody have a good reason to NOT use public, but instead use SHARING?
16:13:36 <dulek> But thingee and rushiagr are proposing that public won't be shown on lists for other tenants.
16:13:48 <dulek> Or did I misunderstood something?
16:13:53 <thingee> dulek: I'm guessing what rushiagr wants.
16:13:57 <jgriffith> dulek: hmmm...  i didn't catch that, seems odd
16:14:05 <xyang> seems that we are mixing the two
16:14:09 <jgriffith> xyang: :)
16:14:11 <rushiagr> dulek: I'm just saying that current API won't, but a new API will
16:14:23 <jgriffith> so look... here's the thing;  Every use case is covered if we do public
16:14:32 <jgriffith> you can transfer a volume to anotehr tenant if you want
16:14:38 <jgriffith> or you can make a snapshot public
16:14:38 <winston-d_> rushiagr: what new API?
16:14:40 <rushiagr> 'show all my snaps plus all shared snaps)
16:15:00 <rushiagr> s/)/'/
16:15:02 <jgriffith> I'm still not completely clear on use cases rushiagr is interested in solving
16:15:09 <dulek> Not every, rushiagr mentioned that "they" may not want to have *public* snaps, just shared
16:15:33 <thingee> flip214: what use cases did you see with this?
16:15:43 <winston-d_> rushiagr: does any other openstack project has similar API like this?
16:15:50 <avishay> Maybe "they" should share their use case - maybe there is a better way to achieve what they want
16:15:51 <rushiagr> jgriffith: I am trying not to make an artifical example. But it's similar to volume transfer..
16:16:14 <thingee> avishay: +1
16:16:25 <thingee> here's what I'm going to propose...
16:16:27 <winston-d_> rushiagr: then use volume transfer
16:16:32 <rushiagr> avishay: they have data in block storage snapshotted.. I'm using too much 'they' I guess.
16:16:34 <thingee> rushiagr: work with Tiantian on https://review.openstack.org/#/c/125045/
16:16:35 <jgriffith> winston-d_: :) +1
16:16:47 <flip214> thingee: having a few well-known images that _all_ people can use. RHEL6, RHEL7, Ubuntu, etc.
16:16:56 <thingee> seems like the idea of just public is a preferred approach
16:16:57 <dulek> thingee: +!
16:17:00 <rushiagr> winston-d_: the problem is it's slighly bad on usability. We seem to be not caring much about the end user..
16:17:17 <thingee> right now we're all confused on what is being proposed here and guessing what you want. and I just spent 15 mins trying to figure it out
16:17:23 <thingee> also if you look at:
16:17:25 <winston-d_> rushiagr: how come?
16:17:25 <thingee> #link https://review.openstack.org/#/c/125045/
16:17:33 <thingee> I asked for use cases :)
16:18:12 <rushiagr> thingee: sorry, I was and will be busy for this whole week.
16:18:32 <rushiagr> I apologise if I don't seem to have a use case.. Totally my fault..
16:18:55 <thingee> rushiagr: ok, well you got until feb 15 for feature freeze. and no worries, we just need things better defined to make a decision
16:19:38 <thingee> anything else?
16:19:39 <rushiagr> thingee: okay. Thanks. I'll write a spec.. that would help come to a decision soon..
16:19:48 <rushiagr> thingee: nope. We can move to next topic
16:19:51 <thingee> rushiagr: work with this spec if possible https://review.openstack.org/#/c/125045/
16:19:53 <rushiagr> thanks all
16:19:58 <rushiagr> thingee: definitely
16:20:05 <thingee> #topic Additional requirements with Third Party CI
16:20:28 <thingee> so I've gotten questions from driver maintainers
16:20:46 <thingee> How long can a ci be down, before we consider the driver inactive?
16:21:36 <flip214> How about 4 weeks from first notice, or 1 week from third one or so?
16:21:40 <smcginnis> A week? Unless they are in contact for specific issues.
16:21:43 <erlon> does a non reporting CI is considered broken??
16:22:25 <thingee> erlon: after the deadline, yes. you're not reporting, you're not continuous integrating with proposed changes.
16:22:35 <flip214> I'm thinking about the case that "the" CI maintainer is on vacation for 3 weeks (not that uncommon here in Europe)
16:22:35 <thingee> defeats the whole purpose
16:22:53 <jordanP> yeah, 3 weeks of vacation happens here :)
16:22:59 <erlon> flip214: +1
16:23:03 * smcginnis moving to Europe
16:23:04 <flip214> (yes, there should be a few people knowing how to fix it, but still...)
16:23:15 <winston-d_> flip214, jordanP: i want to get a job there
16:23:27 <eikke> winston-d_: we're hiring
16:23:34 <thingee> I ask maintainers to give me an email address that forwards to a few people in the company
16:23:35 <erlon> even a weekend cannot be enough with 5 days period
16:24:03 <hemna> flip214, then someone else in the company should do their best to be a backup
16:24:16 <hemna> 3 weeks of no reporting on a driver is terrible IMHO
16:24:27 <thingee> asselin_: opinions?
16:24:31 <jgriffith> so IMHO there's no sense arguing about "I might go on vacation"
16:24:39 <jgriffith> first step is get a working system
16:24:42 <jgriffith> go from there
16:24:42 <hemna> jgriffith, +1
16:24:47 <flip214> hemna: yes, "should". In practice, especially in the summer time when lots of people are on vacation, I can foresee issues with that.
16:24:49 <avishay> i don't think there should be a hard rule.  if it's down for a week because of equipment failure and you report that you're working on it, i think that's fine. if you disappear for 3 weeks, that's not IMO.
16:25:05 <hemna> avishay, +1
16:25:17 <winston-d_> avishay: +1
16:25:20 <jgriffith> personally if you have a system up and running and you go on vacation and it breaks I don't give a crap
16:25:21 <thingee> Ok, we're agreeing no hard rule, but case by case basis?
16:25:21 <Swanson> avishay: +1
16:25:22 <hemna> flip214, make sure at least someone is a backup.
16:25:24 <jgriffith> fix it when you get back :)
16:25:24 <xyang> avishay: +1
16:25:31 <hemna> thingee, +1
16:25:49 <jgriffith> thingee: hard rule does need to be on demonstrating something functional and reliable at some point though IMO
16:25:49 <smcginnis> thingee: +1
16:26:13 <avishay> jgriffith: +1
16:26:17 <jgriffith> In other words, "prove you've set it up and done the work"
16:26:18 <jgriffith> we
16:26:20 <dulek> jgriffith: +1
16:26:28 <thingee> #agreed case by case basis on CIs not reporting will be handled
16:26:30 <jgriffith> we'll take issues with failures or the system going down as they come up
16:26:37 <jgriffith> cool :)
16:26:43 <jgriffith> just wanted to make sure I was on the same page
16:26:47 <thingee> #topic Target drivers - DRBD transport layer Nova
16:26:58 <thingee> flip214: you'll need to better explain this :)
16:27:02 <flip214> right.
16:27:14 <flip214> well, I'm trying to get a new block storage transport mechanism working.
16:27:23 <flip214> but I keep stumbling....
16:27:35 <flip214> there's parts of brick, target drivers, connector, etc.
16:28:11 <flip214> so I've got the architecural question: should I change *all* these places, or will eg. brick be removed in the next few weeks anyway?
16:28:26 <hemna> flip214, I'm working on brick
16:28:30 <hemna> it's going to be a bit
16:28:44 <hemna> it'll be my job to try and keep it in sync until we switch over.   :(
16:28:56 <hemna> flip214, for now, do the work in cinder/brick/initiator
16:28:57 <flip214> hemna: okay. how about connector? target driver? jgriffith is working on that one, I believe.
16:28:59 <hemna> that you need to do.
16:29:02 <flip214> okay.
16:29:04 <thingee> hemna: do we have an idea of it happening in K?
16:29:18 <hemna> the target side of brick isn't going into the external lib.  just the initiator side
16:29:37 <hemna> thingee, I'm going to try and get it in.   I'm churning on multi-attach at the same time.
16:29:49 <thingee> flip214: https://github.com/openstack/cinder/blob/master/cinder/volume/targets/driver.py
16:29:54 <flip214> next question: create_export() doesn't know which host will attach the volume. the volume already has the key "attached_host" but it's None.
16:29:57 <thingee> you need to define a new target driver in cinder
16:30:11 <flip214> thingee: I'm doing that.
16:30:11 <asselin_> thingee, sorry multitasking. There's a spec for CI Status that will apply to all 3rd party ci. I think we should define working as part of that.
16:30:27 <thingee> asselin_: thanks!
16:30:41 <flip214> do we want attached_host to have the host that it _will_ be attached to?
16:30:47 <thingee> asselin_: can you give me the link when you have a chance?
16:30:53 <asselin_> #link https://review.openstack.org/#/c/135170/
16:31:02 <flip214> would make my life much easier, if I would get that information.
16:31:44 <flip214> next point: for DRBD many storage hosts can be connected to one nova node at the same time. (That's the HA part of DRBD). So I'd provider_location much larger... 4kB or even more.
16:32:04 <flip214> Is there a chance for that, or is there some hard restriction that says no?
16:32:35 <flip214> Can I pass only a string in provider_location to Nova, or can that be structured data, too?
16:32:59 <xyang> the size of provider_location is 255
16:33:05 <thingee> I'm not sure I understand why provider_location matters for multiple nodes attaching
16:33:06 <flip214> xyang: now, yes.
16:33:32 <flip214> thingee: because the nova node needs to know _all_ storage nodes it should access at the same time. Many hostnames, IP addresses, TCP Ports, etc.
16:33:41 <flip214> the 255 characters are too small.
16:34:02 <flip214> If there's _no_ chance to get that enlarged, I'll have to pass the data around in some other way.
16:34:30 <hemna> can't you pass in a list to a new entry?
16:34:34 <flip214> hemna: BTW, my changes are in https://github.com/LINBIT/cinder
16:34:37 <hemna> provider_locations = []
16:34:39 <thingee> flip214: I don't have an answer for you right now. provider_location was for things like iscsi..know the ip portal, lun number, etc
16:35:05 <flip214> hemna: I don't think so, because that all gets serialized to the database, which won't know about arrays, no?
16:35:23 <xyang> flip214: do you have to save it in provider_location? can initialize_connection return that info to nova?
16:35:33 <flip214> thingee: yes, for a single connection is was enough. _multiple_ connections are one reason for using DRBD.
16:35:58 <flip214> xyang: initialize_connection in cinder, or in nova? if in nova, where would it get that information from?
16:36:03 <xyang> in cinder
16:36:37 <thingee> flip214: so it sounds like you'll need to propose a way to support this in cinder. I was just saying I don't have an immediate answer for you
16:37:19 <flip214> right. I wanted to bring that up, so that people can think about it and tell me possible solutions ... we can talk about them later on, I hope for next week.
16:37:55 <flip214> if there's some way to relay information from cinder to nova via mq without size or data structure restriction it would be ideal ... I don't know whether such a thing exists.
16:38:19 <thingee> flip214: everything is done through cinder-api.
16:38:56 <flip214> so, as summary: my questions are pre-fill "attached_host" during create_export? size of "provider_location"? connector, target driver, brick?
16:38:59 <flip214> thank you.
16:39:06 <flip214> I'm done for today.
16:39:12 <thingee> heh
16:39:20 <thingee> sorry we weren't much of help
16:39:31 <thingee> #topic TaskFlow workflows
16:39:34 <thingee> #link https://review.openstack.org/#/c/135170/
16:39:37 <flip214> that's okay, I didn't expect (only hope ;) for immediate answers.
16:39:38 <thingee> dulek: hi
16:39:41 <dulek> okay, some meetings ago I offered my help on improving reliability of cinder
16:39:51 <dulek> I wasn't pointed to any community-driven initiative, so tried to find one
16:40:00 <dulek> idea is to resume interrupted workflows when starting the services
16:40:02 <thingee> dulek: great we need help with taskflow, and it's not with persistent
16:40:15 <thingee> #link https://bugs.launchpad.net/bugs/1408763
16:40:17 <dulek> general spec is in the link thingee provided
16:40:39 <e0ne> thingee, dulek: me and vilobh are also interested in it
16:40:41 <dulek> yeah, I'm monitoring this bug
16:41:02 <dulek> so I'm asking for opinions
16:41:05 <thingee> I think jgriffith was looking at that bug for week to get us to work better with taskflow. right now it's not a positive thing imo
16:41:32 <thingee> I'll let other talks :)
16:41:45 <dulek> this should be certainly solved before persistence, I've mentioned it in API part of the spec
16:42:04 <jgriffith> dulek: that's not the issue
16:42:10 <dulek> I can take a look at that as it's a blocker and jgriffith stopped to work on that
16:42:22 <thingee> jgriffith: yes sorry...
16:42:29 <dulek> why?
16:42:46 <thingee> so the real issue is people are finding that working with taskflow hasn't been the easiest.
16:42:46 <jgriffith> dulek: it's not an issue of restarting jobs
16:43:02 <jgriffith> dulek: the problem is that the taskflow retries are issued from the TF lib
16:43:11 <jgriffith> dulek: our code isn't written to deal with that properly
16:43:22 <hemna> thingee, that's kinda always been the main issue w/ taskflow
16:43:25 <hemna> is it's complexity
16:43:28 <dulek> jgriffith: resumes can also trigger a retry
16:43:37 <jgriffith> dulek: so for example in the increment GB section....  it raises before the increment
16:43:47 <jgriffith> dulek: which IMO is going to make things "worse"
16:43:51 <jgriffith> dulek: with our current code
16:44:03 <jgriffith> dulek: we don't have clear control lines
16:44:17 <jgriffith> dulek: so we're doing a number of "unexpected" things on retries
16:44:32 <jgriffith> dulek: besides.... what's so great about resume functionality anyway :)
16:44:55 <dulek> resumes solve clean shutdown problems and increase reliability
16:45:02 <jgriffith> dulek: meh
16:45:07 <jgriffith> dulek: don't believe the hype
16:45:20 <jgriffith> dulek: regardless....
16:45:21 <e0ne> dulek: it could help
16:45:32 <dulek> if cinder service dies it will probably be started again by pacemaker and resume it's wokr
16:45:33 <e0ne> if we'll implement it
16:45:35 <jgriffith> dulek: go for it, happy to see folks improve what we have
16:45:37 <avishay> can't tell if jgriffith is being sarcastic or not
16:45:47 <dulek> avishay: :)
16:45:51 <jgriffith> avishay: little of both
16:45:54 <rushiagr> avishay: +1
16:46:10 <dulek> but jgriffith answered my main concern
16:46:39 <dulek> should I go in this direction or is this idea not worth it?
16:47:09 <e0ne> dulek: we neeed to work both with harlowja_away on it. it is really not a lot of time to be done in K
16:47:23 <dulek> I know tpatil's team can help on implementation and now e0ne is interested
16:47:48 <dulek> yeah, I'm consulting this with harlowja_away constantly
16:48:00 <dulek> so any opinions from cores?
16:48:27 <thingee> by the way, tell violbh to finish what he started with cinder :)
16:48:52 <e0ne> dulek:  harlowja_away sad that vilobh will contunie work on in in cinder
16:49:16 <thingee> #topic open discussion
16:49:27 <e0ne> thingee: i've already tried. hope, he will be available today
16:49:45 <dulek> Ok, I hope this means go on. ;)
16:50:07 <thingee> dulek: I moved on because you said you got your answer from jgriffith. jgriffith answering for me, I'll take it!
16:50:08 <e0ne> thingee: is it ok if i'll continue to work on that patch if violbh won't soon?
16:50:30 <thingee> e0ne: which? this https://blueprints.launchpad.net/cinder/+spec/cinder-state-enforcer
16:50:43 <e0ne> thingee: yes, this pne
16:50:48 <e0ne> s/pne/one
16:50:55 <thingee> e0ne: please
16:51:02 <e0ne> thingee: thanks
16:51:13 <thingee> just communicate with harlowja_away and violbh
16:51:41 <thingee> oh, so topics for mid cycle meetup
16:52:13 <thingee> add to potential topics https://etherpad.openstack.org/p/cinder-kilo-midcycle-meetup
16:52:22 <thingee> we already have some :)
16:52:35 <thingee> I think that's it for me
16:52:37 <thingee> anyone else?
16:52:52 <thingee> going
16:52:53 <flip214> is next week a meeting?
16:52:53 <thingee> going
16:53:02 <hemna> thingee, do we need to chat about brick
16:53:02 <thingee> flip214: good question, no.
16:53:09 <hemna> and the process of cutting over?
16:53:22 <hemna> I'll add that.  (in the hopes that I can get the lib done)
16:53:23 <e0ne> thingee: will we have hangouts?
16:53:36 <thingee> e0ne: we did last time. hemna was nice enough to do that.
16:53:42 <e0ne> cool!
16:53:43 <rushiagr> what's a good way to target a bug or a bp to a milestone? Can I as a cinder driver group member do that? or is it the responsibility of the PTL?
16:53:46 <thingee> e0ne: not sure about this time
16:53:56 <e0ne> hemna: pleeeez
16:54:13 <thingee> rushiagr: core or me can target it. it'll be targeted regardless after being implemented
16:54:19 <rushiagr> e.g. for bugs which have a patch which has two +1s and a +2?
16:54:42 <rushiagr> thingee: okay. Thanks :)
16:54:43 <hemna> added Cinder Agent discussion as well.
16:55:03 <thingee> ok thanks everyone!
16:55:05 <thingee> #endmeeting