15:03:34 #startmeeting scheduling
15:03:35 Meeting started Tue Sep 17 15:03:34 2013 UTC and is due to finish in 60 minutes. The chair is garyk. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:03:36 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:03:39 The meeting name has been set to 'scheduling'
15:03:55 hope that people are around to discuss
15:04:24 #topic summit sessions
15:04:50 Does anyone have any additional comments or updates to https://etherpad.openstack.org/IceHouse-Nova-Scheduler-Sessions
15:05:00 yes
15:05:21 MikeSpreitzer: ok, is that what you want to discuss later in the meeting or something else?
15:06:03 Debo and I added a topic called Smart Resource Placement, and we have added a blueprint
15:06:06 Can I start with a clarification on the whole host allocation part...
15:06:20 Yathi: thanks!
15:06:56 MikeSpreitzer: what would you like clarification on?
15:06:56 MikeSpreitzer: Sure. Unless people want to discuss something else regarding the proposed summit sessions
15:06:59 Is whole host allocation about bare metal allocation, really exclusive allocation, or is it about some bigger unit of allocation (a pool)?
15:07:28 MikeSpreitzer: It's not about bare metal. It's about allocation to host aggregates, essentially
15:08:10 host aggregates will be set aside for exclusive use by a tenant, or delegated tenants
15:08:24 It is about giving one tenant control over a whole host aggregate, right?
15:08:30 yes
15:08:33 So it is about this larger unit of allocation.
15:08:38 yep
15:09:08 Why do we want that?
15:10:01 performance and isolation may be motivations
15:10:11 There are customer requests for this type of allocation. I've heard it's for concerns about resource isolation and somewhat for security concerns, though that's questionable
15:10:12 security too
15:11:21 Performance and isolation can be delivered by requesting performance and isolation from one undivided cloud, letting that cloud decide where to place for performance and isolation.
15:12:13 Same thing for security, really.
15:13:24 that's kind of what this is doing
15:13:35 host aggregates just help the cloud decide where to place instances
15:13:52 it is allowing the tenant to run their instances on specific resources that may be reserved for that specific tenant
15:14:22 That sounds like AZ functionality.
15:15:31 in my opinion it is just another option that is available that enables the cloud provider to meet certain standards.
15:15:33 My point here is that a holistic scheduler that is aware of isolation issues could place for isolation, without having a separate feature for dividing up the cloud a priori.
15:16:41 MikeSpreitzer: that's likely the case, though how does it ensure that there remain enough spots to ensure isolation is possible?
15:16:46 I agree with you on that. but why not have the option of allocating a whole host?
15:16:59 ALaski: yes, ...
15:17:20 (thinking on my feet here...)
15:17:39 But whole host allocation is very early right now. I know it's going to be the topic of a lot of discussion, so alternative ideas are appreciated
15:17:54 OK, I'll stop here. I understand.
15:18:28 MikeSpreitzer: please feel free to take your questions or reservations to the lists or bring them up here.
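The exclusive-aggregate idea discussed above could take the shape of a scheduler filter. Below is a minimal standalone sketch, assuming a Nova-style host_passes() interface and a 'filter_tenant_id' aggregate metadata key (modeled on Nova's AggregateMultiTenancyIsolation filter, but not taken from Nova's code):

```python
# Sketch: keep tenants out of host aggregates reserved for someone else.
# The host_passes() signature mirrors Nova's scheduler-filter convention;
# this class and its inputs are illustrative assumptions, not Nova code.

class AggregateTenantIsolationFilter(object):
    """Pass a host only if none of its aggregates are reserved for a
    different tenant."""

    def host_passes(self, host_state, filter_properties):
        tenant_id = filter_properties['project_id']
        for aggregate in host_state.get('aggregates', []):
            reserved = aggregate.get('metadata', {}).get('filter_tenant_id')
            if reserved is not None and reserved != tenant_id:
                return False  # aggregate is set aside for another tenant
        return True


# Example: tenant 'acme' may use the reserved pool; tenant 'other' may not.
host = {'aggregates': [{'metadata': {'filter_tenant_id': 'acme'}}]}
f = AggregateTenantIsolationFilter()
assert f.host_passes(host, {'project_id': 'acme'})
assert not f.host_passes(host, {'project_id': 'other'})
```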
15:18:37 Next session. For multiple scheduler policies, what sort of differences are involved?
15:19:03 One point that came up at the Neutron meeting last night, and I am not sure if it is relevant here, is that people wanted to work only with etherpads at the summit and 'ban' presentations.
15:19:43 glikson you around?
15:19:51 I like the idea. I think it's good for us to think about but probably a topic for the Nova meeting
15:20:08 ok.
15:20:24 MikeSpreitzer: alex is not here to elaborate.
15:20:39 OK, I'll pursue that separately
15:20:58 Is Boris Pavlovic here?
15:20:59 I think that it enables different scheduling policies to be invoked for different requests. That is, not have one global configuration
15:21:21 MikeSpreitzer: not sure.
15:21:40 Are there any additional things we want to discuss regarding the summit sessions?
15:21:43 boris is boris_42. Doesn't look like he's here
15:22:02 he is currently driving a rally
15:22:10 I see significant overlap between the "Scheduling across Services" session proposal and the "Smart Resource Placement" session proposal.
15:22:35 Yathi: do you think there is overlap here?
15:23:19 I think that there may be room for some collaboration here.
15:23:25 Smart Resource Placement provides a generic framework to allow for complex constraints
15:23:41 Yathi: between resources of different types?
15:24:02 yes, that is part of our idea
15:24:24 Isn't that the essence of Scheduling Across Services?
15:25:36 I guess this framework is something that can be leveraged
15:25:47 to build complex constraints that run across services
15:25:51 It is in a sense, and it is something that we touched on at the last summit, but we did not make any progress with this
15:25:58 Anyway, I think I am just suggesting they go in the same session.
15:26:36 I suppose I am also suggesting the proponents talk to each other and see about a merge beforehand.
15:26:37 scheduling across services calls for an orchestration framework
15:26:55 smart scheduling provides a pluggable solver framework
15:27:07 um, anything calls for orchestration. What exactly do you mean?
15:27:08 MikeSpreitzer: agreed. that is why we are discussing this now, to try and be more efficient when it comes to the summit
15:27:58 trying to separate orchestration between services and the decision-making framework
15:28:10 that is what I meant
15:28:30 OK, no surprise there. The u-rpm proposal also has this, as does my group's running code.
15:28:32 #action consider combining "smart resource placement" and "multiple scheduler policies" to one session
15:28:40 if I understand you correctly
15:29:56 Anything else regarding summit or can we move to the resource tracking?
15:30:16 I'm done
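The multiple-scheduler-policies idea mentioned at 15:20:59 (different policies invoked per request rather than one global configuration) could look roughly like the sketch below. The dispatcher class, the 'policy' scheduler hint, and the select_destinations() name are all assumptions for illustration, not an existing Nova interface:

```python
# Sketch: pick a scheduler driver per request from a hint, instead of one
# globally configured driver. All names here are hypothetical.

class PolicyDispatchingScheduler(object):
    def __init__(self, drivers, default_policy):
        self.drivers = drivers            # policy name -> scheduler driver
        self.default_policy = default_policy

    def select_destinations(self, request_spec):
        hints = request_spec.get('scheduler_hints', {})
        policy = hints.get('policy', self.default_policy)
        driver = self.drivers.get(policy, self.drivers[self.default_policy])
        return driver.select_destinations(request_spec)
```

A request carrying scheduler_hints={'policy': 'energy_saving'} would then be placed by a different driver than the default, with no global reconfiguration.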
15:30:32 #topic resource tracking
15:30:54 I brought this up last time
15:30:54 alaski: do you want to explain your ideas? last week we touched on it but the meeting was ending
15:31:31 So my main idea is that I think it would be helpful to persist the resource tracker off of the compute node
15:31:45 And have it be remotely accessible by other components, like conductor
15:32:06 What does the tracker do?
15:32:31 My thinking being that I want to speed up scheduling, so I want to get a host from the scheduler and then consult the resource tracker quickly without having to round-trip to the compute
15:32:50 MikeSpreitzer: the resource tracker is the definitive source of what resources are available/used on a compute
15:33:08 definitive in Nova I mean
15:33:09 Really definitive, or a convenient cache?
15:33:35 As definitive as we get in Nova; it could still mismatch reality a bit
15:33:43 I would expect the hypervisor is the definitive source regarding what is actually being used now.
15:33:51 alaski: in some cases there is querying from the db, would that be replaced by interfacing with the conductor instead?
15:34:11 MikeSpreitzer: you're correct, so in that sense it is a cache
15:34:17 This is where the distinction between what I call observed and target state matters...
15:34:46 The observed state is a convenient cache of the real state, and the target state is about allocations that may or may not be in effect right now.
15:35:06 garyk: I'm not sure where the db queries are, so I don't know. But possibly
15:36:07 alaski: could this be related to the changes that boris and co are doing with the messages (I have yet to look at that code)
15:36:43 MikeSpreitzer: I have your emails flagged and need to read those thoroughly. I think we all want to move in a similar direction and need to figure out how to come together
15:36:55 Reading nova's DB, in preference to (a cache of) reads from the hypervisors, would be to get target state.
15:37:49 Yes, I am also trying to catch up on the other work here and help figure out how to bring it all together.
15:37:50 garyk: possibly, in the sense of using the same pattern for setting it up. But resource tracker and scheduler are separate entities, so it's not likely to be touched by his work
15:37:57 the complexity is being able to sync all schedulers
15:38:14 alaski: ok, understood
15:38:45 garyk: I wonder which multiplicity you are referring to. Different services, or different cells/regions/...?
15:39:33 MikeSpreitzer: I am trying to understand how the conductor(s) will manage the data and enable the scheduler(s) to access and use it
15:40:19 (I need to learn what a conductor is)
15:40:49 garyk: I am still wondering which multiplicity of scheduler you are referring to.
15:41:02 garyk: the way I'm looking at it, the conductor queries the scheduler for a host or list of hosts, then it consults the resource tracker to make sure the instance will fit on that host
15:41:25 garyk: I do not know what you meant by "the changes that boris and co are doing with the messages"… can you identify another way?
15:41:32 Right now we have to send the build to the compute host before it can fail the resource tracker check. I want it to fail faster
15:41:53 alaski: wouldn't the scheduler already check the available capacity? or are you suggesting to separate the two?
15:42:43 glikson: I think they're already separate. TBH I don't know everything that the scheduler looks at, I should dig into that a bit
15:43:12 But I know that sometimes an instance is scheduled to a host and then there's not actually enough free memory to build the instance
15:43:26 alaski: in that case it would go to rescheduling.
15:43:43 alaski: that might happen because of race conditions between schedulers, for example..
15:44:11 alaski: I have had colleagues running clouds tell me that happens for a variety of reasons; mistakes/discrepancies are possible at every level
15:44:44 garyk: right. My main concern is optimizing it so the schedule/reschedule loop can be faster
15:44:59 I think that there is an overcommit ratio that takes things like this into account (but I may be wrong)
15:45:15 glikson: yes. My understanding is that scheduling is a best-attempt, fail-fast setup. I want failure to be as fast as possible
15:45:41 alaski: I'm with you on that...
15:45:55 but every cache has lag, and there can be a nasty surprise in rare cases.
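The flow described at 15:41:02 and 15:41:32 might look like the sketch below, assuming the resource tracker were remotely readable: the conductor gets a candidate host from the scheduler and checks fit locally, so a doomed build fails before any round trip to the compute node. All names here (select_host, get_usage, NoValidHost) are hypothetical:

```python
# Sketch of a conductor-side fail-fast loop against a remotely readable
# resource tracker. Every interface named below is an assumption.

class NoValidHost(Exception):
    pass


def fits(usage, flavor):
    """Cheap local check: does the flavor fit the host's free resources?"""
    return (usage['free_ram_mb'] >= flavor['memory_mb'] and
            usage['free_disk_gb'] >= flavor['root_gb'])


def schedule_fast_fail(scheduler, tracker, request_spec, max_attempts=3):
    flavor = request_spec['instance_type']
    for _ in range(max_attempts):
        host = scheduler.select_host(request_spec)
        if fits(tracker.get_usage(host), flavor):
            return host  # safe to dispatch the build to this host
        # otherwise retry immediately -- no message to the compute node
    raise NoValidHost(request_spec)
```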
15:46:08 alaski: is this something that will work with multiple conductors (sorry, I am slow today)
15:46:41 alaski: so, are you thinking to keep that somewhere other than the DB, to keep better track of in-flight requests?
15:47:06 MikeSpreitzer: true. It's worth me looking into what can go wrong. I guess I'm thinking of a write-through type cache where lag shouldn't present itself, but I suppose it could
15:47:18 at the moment the flow is api -> scheduler -> compute node
15:47:45 only one scheduler can allocate on a compute node, I take it.
15:47:58 garyk: it would need to. Right now the resource tracker has synchronization based on being on a single compute, but moving it off the compute means we need to address synchronization another way
15:48:16 glikson: right now the resource tracker is in memory on the compute, I want it in a db or other store
15:48:20 alaski: ok
15:48:52 I have heard that when VM creation or deletion has a strange failure, a zombie can be left using memory that the scheduler does not realize exists.
15:49:35 alaski: I thought it is already using the DB.. but maybe I'm confusing it with something else.
15:49:40 MikeSpreitzer: multiple schedulers can allocate to a compute. It's racy, but known to be racy, and the resource tracker is the control point
15:50:02 alaski: didn't we just move those updates from using rpc fanout to using the DB?
15:50:15 alaski: it would be interesting to discuss the data structure for the resource tracking in more detail
15:51:30 alaski: or are you talking about the part that generates those updates, at nova-compute?
15:51:42 glikson: I think we're talking about different things. But now you have me wondering if it's sending data up to the scheduler
15:52:33 glikson: it's possible. I'm talking about the part that runs instance_claim() to claim resources
15:52:46 but it may also be populating something for the scheduler to use
15:53:07 we may be running out of time. do we want to continue with this or switch to MikeSpreitzer's mails and document? We could discuss that next week as I am not sure many of us got to read https://docs.google.com/document/d/1hQQGHId-z1A5LOipnBXFhsU3VAMQdSe-UXvL4VPY4ps/edit
15:53:37 I say we switch. I think I need to research a bit more and come up with a more solid proposal
15:53:54 alaski: ok
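One way the synchronization question raised at 15:47:58 could be answered, if the tracker's state lived in a shared database, is an atomic conditional UPDATE in place of the per-compute lock that instance_claim() relies on today. A minimal sketch; the compute_usage table and its columns are assumptions for illustration:

```python
# Sketch: make a resource claim race-safe in a shared store by updating
# only when the headroom still exists. Racing schedulers cannot both
# consume the same free RAM; the loser simply sees 0 rows updated.

import sqlalchemy as sa

def claim_ram(conn, host, ram_mb):
    """Try to reserve ram_mb on host; return False if we lost the race."""
    result = conn.execute(
        sa.text("UPDATE compute_usage "
                "SET free_ram_mb = free_ram_mb - :ram "
                "WHERE host = :host AND free_ram_mb >= :ram"),
        {"ram": ram_mb, "host": host})
    return result.rowcount == 1  # 0 means another claim got there first
```

A failed claim would send the request straight back to the scheduler, which is the fast failure alaski is after.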
15:54:05 #topic MikeSpreitzer's mail
15:54:22 MikeSpreitzer: with the few minutes left
15:54:32 we can always continue next week
15:54:52 I also had a quick question regarding the proposal to consider merging the multi-sched and smart-sched proposals, raised when I was away for a few minutes..
15:55:01 I am finding rough alignment between the u-rpm proposal and my group's work..
15:55:16 so I thought I would outline what we have worked out.
15:55:38 glikson: MikeSpreitzer suggested that we have them together as there may be some overlap
15:55:43 I have not yet roadmapped to a set of small changes, just wanted some review of the overall vision.
15:56:13 and hope to help out
15:56:14 glikson: I guess we can take it offline and discuss
15:56:24 garyk, glikson, I think garyk meant smart-sched and 'scheduling across services'
15:56:55 Yathi: yes, that is what I meant. sorry, my bad
15:56:56 garyk: I think the two are complementary -- one to introduce a new scheduler driver, and the second to have different driver configs co-exist within the same scheduler instance (regardless of which driver it is)
15:57:12 I think there is big overlap between those two session proposals and what I wrote about.
15:57:51 I think that we should try and read what you have written and then discuss it next week.
15:58:06 OK
15:58:11 I guess we could also have some time to see what we can combine (if possible)
15:58:23 #action discuss MikeSpreitzer proposal next week
15:58:41 #action check if we can merge/combine sessions
15:58:41 Yathi: ah, ok. I personally think those two are also complementary -- the optimization approach is rather orthogonal to the scope of the optimization problem to solve..
15:59:11 I thought it was said that Smart Resource Placement is also about going across services
16:00:03 Yathi: right?
16:00:21 I am sorry but I guess we will have to continue next week.
16:00:34 thanks guys
16:00:36 #endmeeting