15:00:38 <n0ano> #startmeeting gantt
15:00:39 <openstack> Meeting started Tue Jul  8 15:00:38 2014 UTC and is due to finish in 60 minutes.  The chair is n0ano. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:40 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:42 <openstack> The meeting name has been set to 'gantt'
15:00:47 <n0ano> anyone here to talk about the scheduler?
15:00:51 <jaypipes> o/
15:00:55 <bauzas> o/
15:00:59 <bauzas> uh, \o
15:01:10 * n0ano mutters about lefties
15:01:30 <jaypipes> heh
15:02:01 <mspreitz> hi
15:02:24 <ericfriz> hi
15:02:25 <lisaZ> hi
15:02:35 <n0ano> well, we have 2 completely non-contentious items today, should be short :-)
15:02:46 <n0ano> #topic code forklift
15:02:59 <bauzas> n0ano: don't promise that kind of thing :)
15:03:07 <n0ano> note the smiley
15:03:16 * bauzas has to learn bbcode
15:03:32 <n0ano> As I read the email threads, the consensus seems to be...
15:03:42 <bauzas> so I guess it's my turn ?
15:03:47 <n0ano> 1)  Clean up `all` of the nova APIs...
15:04:00 <n0ano> 2) Split out a fully functional gantt...
15:04:12 <bauzas> 3) Take a coffee and rest
15:04:27 <n0ano> 3) Gantt immediately becomes the default scheduler
15:04:40 <bauzas> indeed
15:04:42 <n0ano> what was your take bauzas
15:05:01 <bauzas> yey, I thought about that
15:05:10 <bauzas> I can understand the worries here
15:05:44 <bauzas> so,
15:06:02 <bauzas> I think the main difference is when we split
15:06:33 <jaypipes> n0ano: it's step 1) that needs a lot of effort right now, IMO. 2) and 3) will naturally happen when the APIs are cleaned up.
15:06:35 <bauzas> not that much difference, because if Gantt code is 100% nova-scheduler, Tempest and CI would remain the same
15:07:06 <bauzas> I'm just worried about 1)
15:07:21 <bauzas> we need to get a consensus on them
15:07:31 <n0ano> jaypipes, I think 2) is harder than we think (based upon my experience the last time we tried to do the split), hence my preference (voted down) to do the split early rather than late
15:08:13 <bauzas> n0ano: I think you had difficulties because the code was not properly split
15:08:33 <n0ano> bauzas, the critical changes are the client library and the isolate db, both of which are being worked on
15:08:44 <jaypipes> n0ano: the split will be much easier if 1) is done properly and fully.
15:08:54 <bauzas> n0ano: yey, but for both of them, we don't have a full agreement over them
15:09:26 <bauzas> n0ano: take your "delete aggregate" question
15:09:36 <bauzas> n0ano: that's still something unmanaged
15:09:55 <n0ano> well, it's pretty clear to me we're not going to make Juno so, as long as we stay focused on cleaning up the interfaces, we have time to do it right
15:09:57 <bauzas> so, long story short, let's focus on 1)
15:10:30 <bauzas> n0ano: that doesn't necessarily mean we cannot work on updating Gantt as POC
15:10:42 <jaypipes> as an example, is it agreed that the new scheduler API will *fully own* the management of compute resources? I think the resource tracker should be moved entirely into the scheduler (it kind of already is with the HostStateManager) and the nova-compute workers should just call the scheduler API to update resource information
15:11:42 <bauzas> jaypipes: the rationale behind the split was to be as non-disruptive as possible
15:12:03 <n0ano> jaypipes, that sounds like a good idea to me, that should be more a code layout rather than an API change
15:12:07 <bauzas> jaypipes: hence the idea of a scheduler library
15:12:16 <jaypipes> bauzas: well, if we clean up the interface call signatures, don't you think a split would be less disruptive?
15:12:29 <jaypipes> n0ano: yes, agreed.
15:12:35 <bauzas> is johnthetubaguy over here ?
15:13:08 <n0ano> bauzas, the client lib is still crucial, both for cleanly identifying the interface and for future users of gantt
15:14:02 <bauzas> I still think the most crucial bp is how we isolate the scheduler
15:14:32 <n0ano> bauzas, +1
15:14:34 <jaypipes> n0ano: do we have a document somewhere that clearly lists all of the interfaces currently between nova-conductor and the scheduler and nova-compute and the scheduler (via the compute_node object, of course..)
15:14:36 <jaypipes> ?
15:14:48 <bauzas> I mean, we can decide that ResourceTracker would be replaced by a new one, we would still have to manage a feature compatibility with the existing filters
15:15:01 <bauzas> so we would still have to deal with aggregates or instance groups
15:15:44 <bauzas> jaypipes: the spec I wrote about isolate-sched-db is the most effective one I think
15:15:49 <n0ano> bauzas, we're not talking about a new ResourceTracker, just having the current code run in the gantt context
15:15:50 <jaypipes> k
15:16:49 <bauzas> n0ano: what do you mean by running in the gantt context ?
15:17:29 <n0ano> RT should be part of the scheduler (be it the current one or gantt), not part of nova
15:19:13 <bauzas> n0ano: well I can't see the whole interest of it
15:19:26 <bauzas> n0ano: I mean if we say that RT would be owned by Gantt
15:19:45 <bauzas> n0ano: what would be the interest of it?
15:20:04 <bauzas> n0ano: the idea is that Gantt would provide a client that other projects could use
15:20:44 <bauzas> n0ano: so, to me, it's just saying that we could have a Resource API and a Schedule API
15:21:07 <bauzas> I mean, whatever the project is, it updates the scheduler with its view of resources
15:21:22 <n0ano> and part of that client interface would be resource tracking, to me the resource API would just be a part of the scheduler api
15:21:58 <jaypipes> agree with n0ano on this.
15:22:42 <n0ano> I think we're in violent agreement, there
15:22:49 <n0ano> didn't finish
15:23:07 <jaypipes> basically, part of the scheduler API should be "get me a set of claims that represent these requested sets of resources" and another part of the API should be "update your view of the state of resource usage on these resource providers"
15:23:28 <n0ano> jaypipes, +1
15:23:38 <jaypipes> the claims piece is *currently* in the RT, not the scheduler, and that's what needs to be "cleaned up" in step 1), IMO.
15:24:19 <bauzas> jaypipes: well, I think I see a slight difference here
15:24:33 <jaypipes> the reason it's not as simple as just a layout change in the code is because the RT currently only handles claims for the nova-compute worker it is running in, whereas the moved claims code would need to handle all (or a subset) of compute nodes in the deployment
15:24:48 <bauzas> jaypipes: to me, the resources are still owned by the project, but the project just reports use of the resources to the scheduler
15:25:53 <jaypipes> bauzas: I disagree (respectfully of course!). The resources are owned by the system (the scheduler), and are rented/leased to the project. The ultimate arbiter of resource usage and claims should always be the scheduler
15:26:48 <bauzas> jaypipes: I see now your point
15:27:05 <bauzas> jaypipes: that's what we tried to provide with Climate
15:27:16 <n0ano> I think I have to agree with jaypipes on this one
15:27:35 <bauzas> jaypipes: the resources are owned by climate, but the execution is deferred to the project
15:28:34 <bauzas> so that means we agreed on the fact that nova-computes are a scheduling resource, not a nova on
15:28:36 <bauzas> one
15:29:12 <bauzas> woah, such a change, and we go back to what we discussed : compute_nodes should be a scheduler part
15:29:52 <bauzas> jaypipes: so that also means that if you want to declare a new resource, it has to go to gantt, not nova
15:29:58 <n0ano> I don't think it's that big a change, now the compute nodes send resource claim/usage to the scheduler, the way it should be done
15:30:04 <jaypipes> n0ano: so, I agree with you that the nova-scheduler/gantt split will not happen in Juno. I think we should put together a doc on the existing "interfaces" between the nova-compute, nova-conductor, and nova-scheduler pieces, and clearly show what we'd like to change regarding the call signatures. I think that would be a good first building block. I think the isolate-db-scheduler blueprint would sort of naturally follow the completion of such a set of refactoring steps.
15:30:20 <jaypipes> bauzas: yes, that is exactly correct.
15:30:57 * bauzas is thinking that jaypipes has to see the Climate APIs...
15:31:11 * jaypipes already knows the Climate APIs ;)
15:31:45 <bauzas> we defined a couple of time ago what we call a os-hosts api
15:31:48 <jaypipes> bauzas: they are on target for some things I'm talking about, but not other things... namely, the temporal aspect of the API reservations don't meet my needs.
15:31:49 <bauzas> for reserving hosts
15:32:06 <n0ano> the good news is I don't think we need to change anything we are currently doing, just expand our scope a little bit
15:32:13 <bauzas> jaypipes: I'm not saying that Climate *is* Gantt
15:32:28 <jaypipes> bauzas: oh, I know! I wasn't saying that either. :)
15:32:43 <bauzas> jaypipes: I'm just saying that the concept of reserving a resource is close to schedule a resource :)
15:32:47 <jaypipes> n0ano: s/expand/refine and expand/, but yeah.
15:33:24 <bauzas> woah, I still think that's a big change for nova, not just a refactoring thing
15:33:30 <jaypipes> bauzas: well, it kind of is, but the additional aspect of date/time period (temporal) makes it more complicated than what needs to be in gantt..
15:33:45 <n0ano> we have another topic today, let's think about what we're doing and try and clarify via email exactly what happens next
15:33:59 * bauzas is just thinking about all the discussions he had with johnthetubaguy and a certain CRUD interface for Gantt...
15:34:22 * jaypipes clearly lacks the historical context :(
15:34:36 * johnthetubaguy felt that his name was called
15:34:44 <bauzas> johnthetubaguy: good to see you here :)
15:34:47 <johnthetubaguy> sorry, was in another meeting, but out of that now
15:35:16 <n0ano> johnthetubaguy, might want to review the log but I want to move on to another topic now
15:35:17 <bauzas> johnthetubaguy: we're just discussing the sched split, and jaypipes has an interesting PoV that we agree with
15:35:44 <johnthetubaguy> n0ano: OK, will review the log
15:35:47 <n0ano> moving on...
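The two-part scheduler API that jaypipes describes in this topic ("get me a set of claims" and "update your view of the state of resource usage") could be sketched roughly as below. This is only an illustrative toy, not Gantt or nova code: the class names, method names, resource keys and the first-fit placement are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Provider:
    """Hypothetical record of one resource provider (e.g. a compute node)."""
    total: dict                      # e.g. {"vcpus": 8, "ram_mb": 16384}
    used: dict = field(default_factory=dict)

    def free(self, resource):
        return self.total.get(resource, 0) - self.used.get(resource, 0)


class SchedulerSketch:
    """Toy scheduler that owns both the resource view and the claims."""

    def __init__(self):
        self.providers = {}

    def update_resource_usage(self, name, total, used=None):
        """A provider reports its current totals and usage to the scheduler."""
        self.providers[name] = Provider(total=dict(total), used=dict(used or {}))

    def claim_resources(self, requests):
        """Return a list of (provider_name, request) claims.

        All-or-nothing: if any request cannot be placed, every claim made
        so far is released and None is returned.
        """
        claims = []
        for request in requests:
            name = self._first_fit(request)
            if name is None:
                for claimed_name, claimed in claims:
                    self._apply(claimed_name, claimed, -1)
                return None
            self._apply(name, request, +1)
            claims.append((name, request))
        return claims

    def _first_fit(self, request):
        # Assumption: plain first-fit over providers in name order.
        for name, provider in sorted(self.providers.items()):
            if all(provider.free(r) >= amount for r, amount in request.items()):
                return name
        return None

    def _apply(self, name, request, sign):
        used = self.providers[name].used
        for resource, amount in request.items():
            used[resource] = used.get(resource, 0) + sign * amount
```

The point of the sketch is the ownership question raised above: the claim logic sees every provider, rather than only the one compute node it runs on.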
15:35:56 <n0ano> #topic fair share scheduler
15:36:03 <n0ano> lisaZ, still there?
15:36:06 <lisaZ> yes
15:36:07 <lisaZ> hi
15:36:22 <lisaZ> did you receive our doc? any comments?
15:36:38 <n0ano> I read your use case, looks interesting, how do you respond to the comment that this seems very similar to AWS spot intances?
15:36:44 <n0ano> s/intances/instances
15:37:03 <lisaZ> I already answered in the ML
15:37:23 <bauzas> lisaZ: and I did an update too
15:37:25 <lisaZ> our model is similar to Amazon's spot instances model because both try to maximize the resource utilization. The main difference is the mechanism used for assigning resources to the users (the user's offer in terms of money vs the user's share). They also differ in how they release the allocated resources.
15:37:25 <lisaZ> In our model, whenever the user requires the creation of a Type 3 VM, she has to select one of the possible types of "life time" (short = 4 hours, medium = 24 hours, long = 48 hours). When the time expires, the VM is automatically released (if not explicitly released by the user).
15:37:25 <lisaZ> Instead, in Amazon, the spot instance is released whenever the spot price rises.
15:37:27 <n0ano> sorry, missed that
15:38:22 <bauzas> lisaZ: as I said, that's just a matter of stopping the instance on a certain event
15:38:41 <bauzas> lisaZ: which can be triggered either by the price or whatever else
15:39:11 <johnthetubaguy> interesting, I assumed the user's VM could get evicted, if someone else who reserved the space wanted the slot you were "optimistically" allowed to take
15:39:38 <n0ano> anyway, my concern is that, in many respects, your usage is more a data center use case rather than a cloud use case, do your needs really fit a cloud model
15:39:48 <johnthetubaguy> but having all VMs leased for a specific amount of time is more like a batch processing system, and is an interesting alternative pattern
15:41:07 <bauzas> johnthetubaguy: that's all a matter of guaranteeing your lease or not
15:41:11 <johnthetubaguy> yeah, its a grid computing view of the world, and really does change the meaning of the API, but I suspect we could do something for that use case, its just I expect the lease management will be outside of Nova for the moment
15:41:45 <johnthetubaguy> and yeah, that sounds like some of what climate was looking at, but perhaps a little different
15:42:09 <johnthetubaguy> I could see an API to request complete "slots" that is implemented by calling nova to create and delete servers
15:42:28 <lisaZ> the lease solves in part our problems
15:43:08 <lisaZ> we need a fair share algorithm which provides priorities on user requests
15:43:25 <johnthetubaguy> lisaZ: would your idea work, if it's something external to Nova, that just creates/deletes resources inside nova?
15:43:39 <lisaZ> able to queue them if the resources are not available
15:43:54 <schwicke> I think there is another difference: in Lisa's model requests which currently cannot be fulfilled because of lack of resources can be dispatched later on.
15:44:25 <schwicke> that is quite important as it helps a site to keep their resources busy
15:44:28 <bauzas> schwicke: can't see why Climate couldn't handle that
15:45:50 <schwicke> the idea of climate seems more like an advance reservation kind of scheduling, right ?
15:45:50 <lisaZ> Climate is not able to distinguish privileged users from the simple ones
15:46:20 <lisaZ> Climate covers just one use case
15:46:58 <lisaZ> but it is not enough
15:47:10 <johnthetubaguy> lisaZ: I would love if you thought about this as a separate API, and we work out what it does, then we can look at how to use bits of climate and nova to implement what you need
15:47:18 <bauzas> lisaZ: that's what I called a best-effort mode for non-privileged users
15:47:20 <lisaZ> it is missing the fair share algorithm
15:47:53 <lisaZ> correct
15:48:04 <lisaZ> from my point of view
15:48:31 <lisaZ> blazar is a component above the scheduler
15:48:40 <bauzas> johnthetubaguy: +1
15:48:47 <johnthetubaguy> I see it almost like Heat, a description of things you want to create in Nova, but then there is some scheduling to give you the capacity you are allowed, balanced with other current requests in the system, rather than scaling up/down based on load
15:49:07 <bauzas> right
15:49:07 <lisaZ> suppose to replace the filter-scheduler with our fairsharescheduler
15:49:28 <lisaZ> in this scenario all our use case are implemented
15:49:40 <lisaZ> because blazar provides the missing
15:49:51 <lisaZ> feature which is the advanced reservation
15:49:54 <johnthetubaguy> the problem is the current nova API does not really model what you want to model
15:50:07 <lisaZ> why not?
15:50:17 <johnthetubaguy> you need users to manage a queue of jobs, that may have an associated nova server
15:50:34 <lisaZ> it is just an advanced scheduler
15:50:49 <lisaZ> a true scheduler and not just a matchmaker of resources
15:51:01 <johnthetubaguy> it's also a different concept of flavor, and has automatic deleting, and I am sure other things will crop up
15:51:22 <bauzas> lisaZ: at the moment, nova-scheduler is a resource placement engine
15:51:28 <johnthetubaguy> we don't really have a scheduler, we just have a simple placement engine for requests that come in, if there is no space, the server build fails
15:51:43 <bauzas> lisaZ: if you want to involve time-based decisions, nova-scheduler is not right for it
15:51:47 <lisaZ> correct
15:51:52 <johnthetubaguy> this changes the contract quite a lot, you end up having a queued state
15:52:12 <johnthetubaguy> feels like that needs to be managed externally to Nova, if possible, at least for the moment
15:52:38 <bauzas> at least until Nova uses an external scheduler :)
15:53:16 <johnthetubaguy> bauzas: its the nova API changes that worry me, it feels like a heat feature, in a way
15:53:41 <bauzas> johnthetubaguy: indeed, that's why we moved to a separate API for Climate
15:53:59 <bauzas> johnthetubaguy: formerly, we were planning to support reservation features in Nova
15:54:07 <bauzas> johnthetubaguy: but we moved to a separate project because of the API need
15:54:19 <johnthetubaguy> right, same issue here
15:54:36 <lisaZ> let me understand: which relationship there is between nova-scheduler and Gantt? Gantt will replace definitely nova-scheduler?
15:54:49 <bauzas> lisaZ: that's the idea, yes
15:54:58 <lisaZ> ok
15:55:04 <johnthetubaguy> lisaZ: short term, its a refactoring effort, to help scale out how the projects grow
15:55:14 <bauzas> nova-scheduler will become deprecated once Gantt becomes fully operational
15:55:29 <lisaZ> our solution is between blazar and gantt
15:55:33 <n0ano> think nova-volume to cinder, same idea
15:55:45 <lisaZ> we need both
15:56:18 <bauzas> lisaZ: I'm just saying that you can probably have some concepts in Blazar that you can make use, without redoing all the wheel
15:56:43 <bauzas> Gantt is still under construction here
15:56:58 <lisaZ> that's correct in fact I don't want redoing the well ;-)
15:57:04 <lisaZ> that's correct in fact I don't want redoing the whell ;-)
15:57:14 <bauzas> probably in the future, Climate won't have any sense as a separate project - or not
15:58:18 <bauzas> johnthetubaguy: have time for discussing with jaypipes and n0ano about what we discussed earlier ?
15:58:34 <bauzas> after the meeting I mean ?
15:58:53 <n0ano> bauzas, give me about 15 min. and I can join on the nova channel
15:58:54 <johnthetubaguy> lisaZ: if we agree on the ideal API, what it means, and how users use it, I think it will be much clearer about the best way forward
15:59:00 <johnthetubaguy> bauzas: sure
15:59:07 <bauzas> johnthetubaguy: cool, n0ano: sure
15:59:41 <bauzas> lisaZ: I think the first thing is to define the API interfaces for your needs
15:59:54 <bauzas> lisaZ: I mean the contract in between your user and the system
16:00:00 <jaypipes> bauzas: unfortunately, I need to leave for the airport in a few minutes...
16:00:04 <n0ano> approaching the top of the hour, we'll have to continue this via email, let me know if you want to keep this subject on the agenda for next week.
16:00:06 <bauzas> dammit
16:00:20 <n0ano> tnx everyone
16:00:21 <lisaZ> meantime could you provide me some pointers to your new architecture?
16:00:30 <n0ano> #endmeeting
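The fair-share behaviour discussed in the second topic (requests are queued rather than failed when no resources are free, priority follows the user's share, and each VM carries a fixed lifetime) might look roughly like the toy below. It is a hedged sketch only, not lisaZ's actual scheduler: the class, the per-user share fractions, the one-slot-per-VM capacity model and the priority rule are all assumptions. Only the lifetime values (4, 24 and 48 hours) come from the discussion.

```python
import itertools

# Lease lifetimes quoted in the meeting, in hours.
LIFETIMES_HOURS = {"short": 4, "medium": 24, "long": 48}


class FairShareQueue:
    """Toy fair-share queue: users furthest below their share go first."""

    def __init__(self, shares, capacity):
        self.shares = shares                  # hypothetical: user -> share fraction
        self.capacity = capacity              # free slots, one slot per VM
        self.usage = {user: 0 for user in shares}  # slots handed out so far
        self.pending = []
        self._order = itertools.count()       # submission order, for tie-breaking

    def submit(self, user, lifetime):
        """Queue a request instead of failing it when no slot is free."""
        self.pending.append((next(self._order), user, LIFETIMES_HOURS[lifetime]))

    def _priority(self, entry):
        order, user, _hours = entry
        # Lower usage relative to share sorts first; FIFO breaks ties.
        return (self.usage[user] / self.shares[user], order)

    def dispatch(self):
        """Start queued requests while slots remain; return (user, hours) leases."""
        started = []
        while self.capacity > 0 and self.pending:
            entry = min(self.pending, key=self._priority)
            self.pending.remove(entry)
            _order, user, hours = entry
            self.usage[user] += 1
            self.capacity -= 1
            started.append((user, hours))
        return started

    def release(self, count=1):
        """Return slots to the pool, e.g. when a lease expires."""
        self.capacity += count
```

Requests left in `pending` after `dispatch()` are simply retried on the next call, which is the "queued state" that the current placement-only nova-scheduler does not have.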