16:00:02 <thingee> #startmeeting cinder
16:00:02 <openstack> Meeting started Wed Nov 26 16:00:02 2014 UTC and is due to finish in 60 minutes.  The chair is thingee. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:04 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:06 <openstack> The meeting name has been set to 'cinder'
16:00:09 <thingee> hi all
16:00:12 <tbarron> hi
16:00:12 <jaypipes> o/
16:00:16 <Yogi2> Hello
16:00:23 <cknight> Hi
16:00:28 <TobiasE> Hi
16:00:31 <kaisers1> hiya
16:00:32 <bswartz> hey
16:00:33 <xyang1> hi
16:00:33 <georgkunz> hi
16:00:35 <thingee> agenda today is small, yay!
16:00:37 <avishay> yoyoyo
16:00:38 <thingee> #link https://wiki.openstack.org/wiki/CinderMeetings
16:01:05 <thingee> just a reminder because it has been coming up a couple of times, k-1 is the only time to get your new driver in
16:01:05 <DuncanT> Hey
16:01:12 <smcginnis> o/
16:01:16 <e0ne> hi
16:01:35 <thingee> you can read more details here http://lists.openstack.org/pipermail/openstack-dev/2014-October/049512.html
16:01:41 <enikher> hey
16:02:13 <thingee> I will be sending out a reminder to all potential new driver maintainers already targeted for k-1 but making slow progress that we're aiming to merge December 18
16:02:18 <thingee> OK, let's begin
16:02:32 <TobiasE> The blueprint is https://blueprints.launchpad.net/cinder/+spec/volume-status-polling.
16:02:43 <thingee> #topic volume status poll
16:02:48 <thingee> TobiasE: you're up
16:02:51 <thingee> #link https://review.openstack.org/#/c/132225/
16:02:57 <thingee> for the cinder spec
16:02:58 <TobiasE> thanks
16:03:09 <thingee> and for the nova spec:
16:03:11 <thingee> #link https://review.openstack.org/#/c/135367/
16:03:24 <thingee> I'm assuming this is why I see a jaypipes in the audience this morning
16:03:26 <thingee> :)
16:03:30 <jaypipes> indeed :)
16:03:49 <thingee> TobiasE: go ahead
16:04:08 <TobiasE> We see some problems when running e.g. 100 attaches
16:04:38 <TobiasE> The idea is to implement async between nova and cinder
16:04:44 <avishay> TobiasE: 100 simultaneous attach operations to VMs?
16:05:02 <TobiasE> yes, high load scenarios
16:05:24 <avishay> OK
16:05:26 <enikher> But it's not the VM attachment that's failing, it's the communication towards the storage backend
16:06:02 <TobiasE> We have timeouts on RPC or HA-proxy side
16:06:10 <enikher> Since terminate and initialize_connection are calls, not casts, we face the RPC timeout here
16:06:42 <TobiasE> And then inconsistencies between backend and Cinder DB
16:07:16 <thingee> ok, so jgriffith raised the point that he would not want to see the timeouts raised. Which makes sense
16:07:52 <thingee> DuncanT would rather see timeouts raised
16:08:08 <DuncanT> API load is already an issue, so I'm not sure I'd like every attach polling. Maybe a blocking call that ends with 'Now poll' if it times out?
16:08:49 <thingee> I'm concerned you're going to see performance problems if you have a cinder volume polling, even with green threads
16:09:15 <avishay> why poll?  no mechanism for callback?
16:09:16 <thingee> the only difference is you won't have something that times out
16:09:19 <DuncanT> i.e. time out the API call slightly faster than the RPC timeout
16:09:21 <jaypipes> is there a way to query the backend storage array for how many ongoing attach requests are underway, and place the request for a new attachment into a queue and then return a polling URI to the caller?
16:09:22 <TobiasE> DuncanT: Catching Exception and then start polling?
16:09:35 <DuncanT> TobiasE: If we can
16:09:50 <thingee> jaypipes: that's what I was thinking, but no we don't have something like that
16:10:18 <DuncanT> jaypipes: That sounds plausible, but is a fairly big change
16:10:23 <flip214> a callback would be the cleanest solution, I think
* jaypipes personally has no problem with not having a timeout as long as there is some reliable way of knowing if the backend is actively working on (or has a queued) attachment request
16:10:31 <enikher> a worker queue in the cinder-volume manager is also a good idea
16:10:49 <enikher> but does not solve the issue completely
16:10:50 <DuncanT> The load a backend can cope with varies massively though....
16:10:51 <thingee> enikher: there's a patch for that, and there are state problems with it.
16:11:04 <jaypipes> DuncanT: yes, I would suspect that.
16:11:16 <DuncanT> flip214: A call back into nova?
16:11:30 <enikher> thingee: could you paste the patch?
16:11:38 <jaypipes> DuncanT, flip214: that is what neutron does for NIC attachment.
16:11:58 <thingee> enikher: https://review.openstack.org/#/c/135795/
16:12:04 <flip214> DuncanT: basically, "whereever you like" ... simply putting an URL into the request that is being called upon completion.
16:12:27 <thingee> flip214, jaypipes, TobiasE, DuncanT: makes sense to me
16:12:39 <thingee> polling is soooo four years ago
16:12:48 <avishay> yes
16:12:48 <flip214> or, perhaps, return OK if possible within one second; else say "later, will call URL"
16:13:14 <avishay> flip214: one code path is hard enough i think
16:13:15 <flip214> there are HTTP codes for that, I believe.
16:13:16 <bswartz> flip214: I like that idea, but it makes the API harder to use through a firewall, because the callback could get blocked
16:13:19 <jaypipes> flip214: yes, that is what neutron does.
16:13:27 <jaypipes> 202 Accepted.
16:13:43 <DuncanT> bswartz: This is nova talking to cinder, so if you have a firewall there you have bigger problems
16:13:47 <flip214> bswartz: in that case, the call has to be idempotent - and gets called until cinder says "done", every few seconds.
16:13:50 <thingee> DuncanT: +1
16:13:50 <jaypipes> DuncanT: right :)
16:13:54 <flip214> with all the disadvantages.
16:13:58 <bswartz> nova isn't the only thing that does cinder volume attaches
16:14:10 <bswartz> cinder is consumed by other clients too
16:14:10 <xyang1> after cinder is done with initialize_connection, the nova side still needs to discover the lun, etc., so attach is not complete just after cinder completes initialize_connection
16:14:34 <flip214> "discover"? isn't that passed from cinder to nova?
16:14:38 <DuncanT> bswartz: Fair enough. I'd like to hear those usecases if there are specific ones, please?
16:15:09 <xyang1> flip214: nova side has some work to complete after cinder returns
16:15:22 <xyang1> flip214: that's why it waits currently
16:15:32 <DuncanT> So the callback would cause that step to happen
16:15:37 <flip214> xyang1: yes, but I thought that cinder passes information like IP, LUN, etc back to nova.
16:15:42 <bswartz> block storage as a service? that's not an obvious enough use case? some people like cinder but have problems with nova and choose to use something else, or they have something preexisting that they choose to use instead of nova
16:15:45 <jaypipes> so, bottom line, is that the cinder+nova contributor communities need to settle on either supporting long polling or supporting push based notifications ala Neutron's NIC attachment APIs.
16:15:47 <DuncanT> You still need a timeout there though, to deal with stuck backends etc
16:15:51 <xyang1> flip214: nova needs to make sure the lun is visible to the host
16:16:24 <thingee> TobiasE: can we see the spec redone with callback in mind?
16:16:45 <TobiasE> Might need some help with that
16:16:47 <DuncanT> bswartz: I know the principle; if there are any concrete cases you know of then I'd like to hear about them. I want to write a bare metal attach; if somebody has already done it then it might save me making mistakes they've already avoided
16:17:02 <thingee> TobiasE: seems like flip214 has some knowledge to help. :)
16:17:05 <jaypipes> xyang1: it's the same problem domain for NIC attachment. Neutron sends information (the "port_binding" dict) to Nova when the NIC has been created inside the Neutron drivers, and Nova then uses the port_binding information to plug the VIF locally on the nova-compute node
16:17:27 <thingee> TobiasE: I would like to see things expanded with functional tests though.
16:17:35 <xyang1> jaypipes: ok, then we should take a look of that implementation
16:17:38 <bswartz> I'm not at liberty to speak about the case I'm aware of. In any case, I agree the firewall problem is an unlikely one -- just wanted to make sure it was considered.
16:17:56 <avishay> bswartz: cinder already has calls to nova, so i don't think this adds new requirements
16:18:04 <thingee> TobiasE: also bring the warnings that bswartz raised into the spec.
16:18:07 <TobiasE> thingee: Testing is essential here
16:18:08 <jaypipes> xyang1: yes, that is what I recommend as well, but it relies on the cinder+nova contrib communities getting aligned on that direction. thus, I'm here :)
16:18:30 <DuncanT> bswartz: Fair enough
16:18:31 <thingee> TobiasE: absolutely, I just meant the current state of the spec doesn't explain the testing plan well.
16:18:36 <xyang1> jaypipes: ok thanks
16:19:08 <thingee> #agreed cinder to do callback for attachments for clients to consume
16:19:23 <thingee> #action TobiasE to update current spec with cinder doing callbacks for attachments
16:19:28 <thingee> anything else?
16:19:41 <DuncanT> Are we going to allow polling as well?
16:19:55 <DuncanT> i.e. what does nova get if it calls attach a second time?
16:20:00 <rushiagr> hi!
16:20:01 <enikher> nova will timeout in live-migration
16:20:26 <xyang1> DuncanT: it still can be successful if it is already attached on the array
16:20:31 <enikher> since initialize_connection then takes longer than expected
16:20:32 <jaypipes> DuncanT: the second call should return a 409 Conflict, IMO.
16:21:10 <DuncanT> jaypipes: That makes things tricky if nova-compute got restarted or something....
16:21:11 <jaypipes> DuncanT: because you don't want to create two callbacks inside Nova. only one.
16:21:26 <thingee> enikher, DuncanT: these details can be worked on the mailing list http://lists.openstack.org/pipermail/openstack-dev/2014-November/049756.html
16:21:27 <enikher> What will happen if nova does not call back?
16:21:35 <thingee> I just wanted TobiasE to be able to move forward
16:21:44 <TobiasE> OK
16:21:58 <thingee> thanks for the help jaypipes!
16:22:02 <enikher> still we have the same timeout problem
16:22:04 <bswartz> nova still needs a timeout in case the callback never comes
16:22:08 <jaypipes> DuncanT: if nova-compute gets restarted, the callback will have either been consumed from the MQ or not. If not, then the status of the volume attachment should be ERROR, and the user should be able to resubmit an attachment request.
16:22:35 <DuncanT> jaypipes: But cinder will never get told that nova didn't successfully attach
16:22:38 <enikher> we need that live-sign to see that the backend is still working
16:22:47 <jaypipes> bswartz: cinder should be able to copy the same retry/timeout code from the neutron work.
16:23:32 <jaypipes> DuncanT: I don't see how that is a problem? Isn't the storage backend the source of truth for that type of information?
16:23:34 <flip214> enikher: then just specify that the callback takes a parameter "in_progress=<step-number>"
16:23:39 <enikher> not a neutron expert, could you paste the commit?
16:24:02 <DuncanT> jaypipes: No. There's the cinder db, and the backend, and currently they can get out of sync
16:24:10 <xyang1> jaypipes: currently the state of the volume will be changed back to 'available' if timeout happens
16:24:40 <flip214> well, I believe that the DB should always be the "should" state.
16:24:41 <xyang1> even though the array may have finished the attach operation
16:24:43 <bswartz> jaypipes: does that involve cinder storing more state? typically cinder backends try to be stateless
16:24:47 <DuncanT> jaypipes: We can solve the current problem by making cinder queue a detach if it times out
16:24:48 <flip214> and the reality has to match what's there.
16:25:04 <flip214> so in case of conflict the attach operation needs to be re-doable on nova.
16:25:05 <jaypipes> DuncanT: right, but how is that Nova's problem?
16:25:46 <DuncanT> jaypipes: If nova thinks the attach is still happening, you get in a mess. The nova bdm is yet another piece of state that gets out of sync
16:26:51 <TobiasE> jaypipes: Haven't you worked on cleaning BDM up?
16:27:00 <DuncanT> One option rather than polling or callback is just for cinder to clean up better on timeout
16:27:16 <DuncanT> And leave nova to clean itself up
16:27:22 <jaypipes> DuncanT: ++
16:27:27 <enikher> then the user has to try again?
16:27:33 <winston-d_> BDM shouldn't be considered as a place store volume attaching state, 'cos you can attach a volume to a 'stopped' instance, BDM still exists, and volume not attached.
16:27:33 <enikher> that is not good I think
16:27:38 <DuncanT> This would mean that some attach calls fail and have to be retried, but that is the cloud way sometimes
16:27:47 <jaypipes> DuncanT: if nova can call cinder to get volume metadata, and considers cinder's response authoritative, that would be ideal.
16:28:43 <flip214> DuncanT: that means that on high load, requests will fail, get requeued, and create still more load.
16:29:00 <DuncanT> enikher: Sometimes it is better for the user to retry than to over complicate code - instance startup and volume create can fail and need retrying too
16:29:29 <DuncanT> flip214: Hopefully the load spike will have passed by then
16:29:32 <bswartz> flip214: there's a point at which any system will be overwhelmed -- at that point the caller must throttle his requests or expect failure
16:29:37 <flip214> I'd prefer to go the callback route, without looking at timeouts for now. that solves the high-load issue.
16:29:52 <enikher> yes but the user does not get enough information to know that the backend is just overloaded
16:29:59 <DuncanT> But the callbacks massively increase the odds of a 'stuck' system
16:30:07 <DuncanT> Which can only be fixed by admin
16:30:09 <enikher> with very slow backends we had problems to attach 10 volumes
16:30:15 <DuncanT> Rather than something the user can simply retry
16:30:26 <enikher> so that would mean the user has to do a lot of retries
16:30:26 <DuncanT> enikher: Then buy a better backend IMO
16:30:40 <flip214> "simply retry" - based upon what information? she won't know whether the load spike is done.
16:30:49 <DuncanT> flip214: Not having any timeouts is a total non-starter
16:31:10 <DuncanT> flip214: If the backend is broken then you end up with the user in an unfixable mess
16:31:11 <enikher> Actually, what do you think about estimating the timeout?
16:31:19 <flip214> if the user can see the number of queued requests, and in which position some specific one is, she can see progress - or not.
16:31:24 <thingee> DuncanT: if this is just to fetch the initialize_connection, how do we end up in a stuck state? If nova never sets the volume to in-use, can't cinder just roll it back?
16:31:38 <enikher> if you want to have a short ha-proxy timeout, the timeout will often occur
16:31:39 <DuncanT> flip214: That is way more complex than just saying 'retry some time'
16:32:14 <DuncanT> thingee: Once you've done the prep work in order to return the connection info, how do you know when to roll back?
16:32:15 <bswartz> I agree -- you always need a timeout to deal with the case where cinder had a critical hardware failure and was forced to restart
16:32:24 <avishay> hopefully being able to scale out cinder-volume will also help with higher loads, and yes, backends need to be sane - i would open a bug for a backend that was so inefficient (if it was something the driver could fix)
16:32:28 <thingee> DuncanT: that would be up to cinder.
16:32:32 <flip214> http://ferd.ca/queues-don-t-fix-overload.html    - queues are good, but don't fix overload.
16:32:33 <enikher> flip214: the backend is still working or not?
16:32:44 <flip214> but at least they can be used to _show_ whether there's progress.
16:32:44 <DuncanT> bswartz: Or when rabbit lost a message due to restarting, or a cinder service got restarted, or whatever
16:33:11 <flip214> having retries means (still) higher load, so I don't think the user should retry.
16:33:17 <DuncanT> flip214: Exposing queues to a cloud user is a nightmare
16:33:41 <thingee> so here's where I'm going with this. If nova does an initialize_connection request, cinder does a call back. If nova disappears, cinder should roll it back, unless nova later does something with the call back and tells cinder to set the volume to in-use
16:33:52 <DuncanT> flip214: Rate limit the retries just like you have to rate limit everything else to avoid DoS
16:33:58 <enikher> flip214: estimate the timeouts? so that the polling is not done so frequently?
16:34:28 <DuncanT> thingee: If cinder has rolled it back, then the work needs to be done again, which means another callback... ad infinitum for a slow backend
16:35:07 <thingee> DuncanT: there was no mention of another call back
16:35:18 <flip214> enikher: estimating based upon what? that's another guess-game.
16:35:23 <thingee> update status is an api call, which just passes to cinder db
16:35:28 <winston-d_> thingee: wait, what happen when a user attach a volume to a stopped instance, initialize_connection() can be called but the actual attach operation can be long after.
16:35:31 <avishay> I agree with DuncanT - the spike should be over very quickly and the user can retry.  if the spike lasts a long time then it's not a spike and the cloud sucks
16:35:40 <thingee> winston-d_: exactly
16:35:40 <DuncanT> avishay++
16:36:34 <winston-d_> thingee: what i meant was in that case, cinder shouldn't rely on callback.
16:36:51 <DuncanT> So our number one support call for LVM is 'my volume is stuck in attaching/detaching' - this sounds like it will make that way worse
16:37:18 <thingee> Honestly I think if the backend is under load, cinder should rely on the call back. if it's not, it should continue behavior as normal
16:37:18 <winston-d_> guitarzan: what do you think?
16:38:02 <thingee> the majority of us apparently won't even notice different behavior because, according to the feedback in the spec, we're all using super fast solutions that don't bog down under load.
16:38:08 <DuncanT> Two paths means more testing, more bugs and more failures
16:38:28 <thingee> DuncanT: I know, extra work is a bummer.
16:38:30 <DuncanT> If most people don't use a code path, it /will/ end up weirdly broken over time
16:39:27 <thingee> My proposal went unanswered, so I'm going with that until someone responds. I'll raise it on the mailing list again though
16:39:35 <enikher> So a callback does not seem to be helpful then?
16:39:35 <thingee> this topic is dead imo for this current meeting.
16:39:49 <thingee> enikher: read my proposal I said twice in this meeting.
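The callback flow agreed above (return immediately in the style of a 202 Accepted, do the slow backend work asynchronously, then notify the caller) can be sketched roughly as below. This is an illustrative Python sketch only, not actual Cinder/Nova code; names like `VolumeAttachService` and the dict fields are invented for the example:

```python
# Hypothetical sketch of the async attach flow discussed above: the API
# returns at once ("202 Accepted" style) and a worker thread invokes a
# callback when the slow initialize_connection work finishes, instead of
# the caller blocking on an RPC call that may hit the HAProxy/RPC timeout.
import queue
import threading

class VolumeAttachService:
    def __init__(self):
        self._work = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def initialize_connection(self, volume_id, callback):
        """Queue the attach request and return immediately."""
        self._work.put((volume_id, callback))
        return {"status": "accepted", "volume_id": volume_id}  # 202-style reply

    def _run(self):
        while True:
            volume_id, callback = self._work.get()
            # Talking to the storage backend happens here; under high load
            # (e.g. 100 simultaneous attaches) this is the slow step that
            # used to trip the RPC timeout.
            connection_info = {"volume_id": volume_id, "target_lun": 0}
            callback(connection_info)  # push the result back to the caller
            self._work.task_done()

results = []
svc = VolumeAttachService()
reply = svc.initialize_connection("vol-1", results.append)
svc._work.join()  # in real life the caller would not block; shown for demo
```

As discussed, a real implementation would still need a timeout around the callback (per bswartz) and a decision on what a duplicate request returns (jaypipes suggested 409 Conflict).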
16:39:57 <thingee> #topic Over-Subscription Alternative
16:40:01 <thingee> bswartz: you're up
16:40:05 <bswartz> hey guys
16:40:10 <bswartz> #link https://review.openstack.org/#/c/129342/
16:40:19 <bswartz> I'm the lone voice of negativity here
16:40:37 <thingee> bswartz: not in this meeting :)
16:40:53 <bswartz> but I proposed an alternative to xyang's oversubscription proposal
16:41:02 <bswartz> basically I have 2 concerns
16:41:16 <xyang1> thingee: everyone said we were all in agreement on this topic before the summit:)
16:41:34 <bswartz> 1) I don't think it's the right UI for administrators to put oversubscribe ratios in the cinder.conf file -- I think it's better to make them an aspect of the volume_type
16:42:08 <avishay> bswartz: why?
16:42:27 <bswartz> 2) I think it's a bad idea to implement oversubscription by having the backends effectively lie to the scheduler about how much free space they have, I think the scheduler should know the true values and implement oversubscription itself
16:43:11 <xyang1> bswartz: I think you can only calculate over subscription ratio for a backend or a pool, not a volume type though
16:43:12 <bswartz> avishay: thick provisioning (as opposed to thin) may be a value-add option you want to sell to users who are willing to pay more
16:43:22 <DuncanT> bswartz: re 2) I sort of agree
16:43:43 <xyang1> because a volume type can be associated with multiple backends, a single backend, or a backend can support multiple volume types
16:43:57 <DuncanT> re 1) though, the problem is that if a backend fills up with bronze volumes then that doesn't help the gold volumes at all
16:44:04 <bswartz> xyang1: I attempted to answer that question in my followup comment on the review
16:44:08 <xyang1> I just can't see how the formula will work to compare capacities from a backend with a ratio of a type
16:44:19 <asselin_> \o
16:44:21 <winston-d_> bswartz: my understanding is xyang1's proposal doesn't 'lie' to scheudler, it's a fact that right now, scheduler doesn't know the oversubscription ratio of backends, even though it knows actual/virtual provisioned capacity.
16:44:24 <jaypipes> xyang1, TobiasE, enikher: https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L4453-L4526 <-- the relevant code for how NIC plugging events are handled.
16:44:41 <xyang1> jaypipes: thanks!  we'll take a look
16:44:41 <bswartz> DuncanT: so mixing gold and bronze on the same backend only works if the backend is able to enforce that the gold volumes get all the space they're promised
16:44:52 <thingee> bswartz: hasn't the backend been able to lie to cinder scheduler for a while?
16:44:54 <jaypipes> xyang1, TobiasE, enikher: it's not pretty, but it works...
16:44:56 <thingee> bswartz: infinite
16:45:01 <bswartz> I proposed that backends also report a "space_reservation" capability
16:45:02 <DuncanT> bswartz: Quite. I don't think most can do that
16:45:30 <thingee> jaypipes: thanks
16:45:40 <bswartz> DuncanT: NetApp can do that -- I'm pretty sure EMC can do that -- LVM can do it
16:46:04 <TobiasE> jaypipes: thanks
16:46:17 <thingee> bswartz: WRT to #2, hasn't the backend been able to lie to scheduler for a while now? by saying "infinite"
16:46:29 <DuncanT> bswartz: LVM can? Ok. objection withdrawn
16:46:32 <bswartz> IMO it's pretty dumb to implement thin provisioning if you don't have a way to exempt some things from the thin provisioning
16:46:33 <winston-d_> bswartz: we have 'reserved_percentage' since filter_scheduler is introduced, but it's never used. what's the difference between this and 'space_reservation'?
16:46:56 <bswartz> otherwise you're asking for disaster
16:47:02 <xyang1> bswartz: is space_reservation actually a thick lun?
16:47:14 <bswartz> xyang1: yes -- it would be thick luns
16:47:32 <bswartz> winston-d: it's important that the reservation only applies to volumes for which the "gold" promise was made
16:47:43 <xyang1> bswartz: you can still create thick luns. this proposal doesn't prevent that
16:47:48 <bswartz> a space reserve % is a blanket reserve across the whole backend
16:48:16 <jaypipes> xyang1, TobiasE, enikher: https://github.com/openstack/nova/blob/master/nova/compute/manager.py#L532 <-- the wait_for_instance_event() main method.
16:48:26 <xyang1> bswartz: there's already a used ratio that controls how much is really used
16:48:44 <flip214> jaypipes: thank you.
16:48:48 <xyang1> jaypipes: thanks!
16:48:51 <jaypipes> no problemo :)
16:49:09 <bswartz> anyways I'm not married to my proposal -- I think some middle ground can be found between what xyang proposed and what I proposed
16:49:27 <bswartz> I just wanted to raise those 2 concerns and suggest some ways to work around them
16:49:43 <thingee> bswartz: the used ratio xyang1 mentioned doesn't help?
16:49:49 <DuncanT> If you really can do per-volume reservations and get accurate answers out of the backend summing them then that sounds preferable
16:49:55 <xyang1> bswartz: so I think your driver can still calculate free space the way you described and send back to scheduler
16:50:11 <bswartz> the current proposal makes it impossible to mix "gold" and "bronze" storage on the same backend, assuming the admin wants gold to be thick and bronze to be thin
16:50:51 <DuncanT> I think I'm starting to agree with Ben here
16:51:16 <thingee> xyang1: do you have any plans to mix the different types?
16:51:18 <flip214> bswartz: it's possible. you only have to do one thin pool per thin LV ;)
16:51:44 <bswartz> flip214: are you referring to cinder pools?
16:51:53 <xyang1> bswartz: there are unresolved issues with Ben's proposal, so I'm still not sure
16:52:03 <bswartz> in principle, a backend could have 2 pools, each with different oversubscribe amounts
16:52:13 <bswartz> but how would the admin enter that info into cinder.conf?
16:52:37 <thingee> xyang1: what's what?
16:52:39 <xyang1> thingee: the over subscription ratio should really be calculated for a pool or backend
16:53:01 <bswartz> xyang1: did you read my followup comment that answers your question?
16:53:04 <xyang1> it is the ratio of virtual capacity over total capacity
16:53:19 <bswartz> I can write up a whole spec if I need to to spell out all the details
16:53:23 <xyang1> bswartz: did you just update?
16:53:27 <bswartz> but I'd rather just adjust the existing spec
16:53:33 <bswartz> xyang1: like 2 hours ago
16:53:42 <thingee> bswartz: xyang1 already replied back to you
16:53:45 * jaypipes goes off hunting for lunch...
16:53:48 <winston-d_> bswartz: the driver should expose some per-pool level config options to solve that problem - per pool overcommit amounts
16:54:31 <DuncanT> winston-d_: Tricky with dynamic pools
16:54:45 <bswartz> DuncanT: that's what I was thinking
16:55:02 <winston-d_> DuncanT: unfortunately yes.
16:55:06 <bswartz> if we can make new pools dynamically, then the driver will have to make up some value or use a default under the existing spec
16:55:20 <thingee> 5 minute warning
16:55:54 <bswartz> I just have a feeling that the scheduler is in a better position than the backends to make decisions about where and when to oversubscribe, and if the scheduler needs more data from the backends to do so, then let's implement that
16:55:55 * DuncanT would like to read some more details of bswartz' approach, if that is doable?
16:56:08 <flip214> bswartz: +1
16:56:26 <DuncanT> It feels like it better answers the questions like dynamic pools and mixing types
16:56:27 <xyang1> so the problem is scheduler gets free capacity from  backend
16:56:44 <xyang1> scheduler doesn't calculate free capacity for the backend
16:56:52 <winston-d_> bswartz: i agree with your last statement, but i thought that was what xyang1's proposal plan to do?
16:57:03 <xyang1> to do what bswartz suggested, it seems to be a big overhaul of the scheduler
16:57:42 <bswartz> winston-d: it's a step in the right direction, but I think we can do better
16:57:46 <xyang1> bswartz: so the formula you provided to calculate free space, that should be executed by the driver and report back in stats
16:57:57 <xyang1> the scheduler doesn't calculate that
16:58:05 <bswartz> just to be clear -- I'm not completely opposed to xyang's approach -- it's an improvement
16:58:11 <xyang1> available_capacity
16:58:20 <bswartz> I'm just worried that we're committing ourselves to an interface that can't be improved on later
16:58:54 <bswartz> DuncanT: I can write a whole new spec
16:58:58 <bswartz> or a wiki or something
16:59:03 <bswartz> maybe a wall of text on the ML?
16:59:16 <winston-d_> bswartz: thx, that'll be very helpful.
16:59:20 <DuncanT> bswartz: It sounds like picking any one of those would be a good idea. Spec seems the most logical
16:59:20 <xyang1> bswartz: so that is a problem because scheduler doesn't calculate available capacity for a backend
16:59:39 <thingee> xyang1: +1
17:00:00 <bswartz> xyang: looks like you and I will have to meet up again offline
17:00:08 <thingee> #endmeeting
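The oversubscription arithmetic debated in the second topic (xyang1's ratio-based proposal versus bswartz's wish for the scheduler to see true capacities) can be sketched as below. This is a rough illustration, not the actual spec: the field names approximate what a backend might report, and where the calculation runs (driver vs scheduler) was exactly the open question:

```python
# Rough sketch of the oversubscription math under discussion: a backend
# reports raw capacity numbers plus a ratio, and whoever holds the formula
# (driver or scheduler) decides whether a new request fits.  All field
# names here are illustrative assumptions.
def virtual_free_capacity(stats):
    # Apply the blanket reserved percentage across the whole backend first.
    usable = stats["total_capacity_gb"] * (1 - stats["reserved_percentage"] / 100.0)
    # Thin provisioning: compare what has been *promised* (provisioned)
    # against usable space scaled by the oversubscription ratio.
    return usable * stats["max_over_subscription_ratio"] - stats["provisioned_capacity_gb"]

def backend_fits(stats, request_gb):
    return virtual_free_capacity(stats) >= request_gb

stats = {
    "total_capacity_gb": 1000,
    "reserved_percentage": 10,         # blanket reserve across the backend
    "max_over_subscription_ratio": 2.0,
    "provisioned_capacity_gb": 1500,   # sum of all volume sizes, thin or thick
}
# usable = 900 GB; virtual free = 900 * 2.0 - 1500 = 300 GB
```

bswartz's alternative of a per-volume `space_reservation` (thick) capability would subtract reserved volumes from the thin pool before applying any ratio; as noted in the meeting, that needs its own spec.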