15:08:20 #startmeeting scheduling
15:08:21 Meeting started Tue Oct 1 15:08:20 2013 UTC and is due to finish in 60 minutes. The chair is garyk. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:08:22 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:08:24 The meeting name has been set to 'scheduling'
15:08:43 agenda?
15:08:57 Not sure if you guys saw the mail I sent. I suggested we discuss an API to propose for the summit
15:09:14 And in addition to this, the Heat scheduling discussion on the list.
15:09:17 I saw the mail.
15:09:25 If I understand, it's really the same discussion.
15:09:25 So should we start with the API?
15:09:40 #topic Discuss scheduling API for summit
15:09:42 I think the API should look a lot like the Heat API — deal with this template
15:09:45 garyk, mike: +1 for API
15:09:58 but I am not sure if the scheduler API should look like Heat ....
15:10:16 If we want to make a unified decision, we need unified input
15:10:20 It needs to specify the VRT or equivalent and policy handles
15:10:27 and it should be very, very simple
15:10:29 alaski has been pushing a set of changes for the query scheduler - how does that relate to this?
15:10:51 Oh, something else I need to learn. Can you give a pointer?
15:10:57 Mike: agree about the unified input ... hence maybe VRT ...
15:11:31 debo_os: can you please elaborate on VRT
15:11:32 but the API could be very simple, which makes it easy to build incrementally and allows extensions to have complex variants
15:11:50 VRT = virtual resource topology, from Mike's jargon
15:12:12 Just to recap, last week we spoke about trying to understand the following (3 things):
15:12:21 1. a user-facing API
15:12:41 2. understanding which resources need to be tracked
15:12:47 for example, one needs to pass in groups of resources that need to be scheduled as a single entity for starters - network, compute, storage ... and pass a list of policy objects
15:12:48 3.
backend implementation
15:12:55 ok ... so we are on 1
15:13:04 debo_os: yes :)
15:13:32 I think 2 and 3 are important, but we should have a simple 1 with room for complex extensions, since this will evolve
15:13:45 maybe over 1-2 releases ...
15:13:47 debo_os: agreed.
15:13:55 I think we should consider two basic approaches to 1: (a) introduce a new service with its own API or (b) introduce a side-car to the existing Heat engine
15:14:12 do you want to explain what you are thinking and let's see if we can translate those ideas into APIs
15:14:50 (a) would be to put up a service that has an API that is similar to Heat's — you can give it a template to instantiate / update, and ask about what happened to it.
15:14:52 garyk: was that for Mike
15:14:56 personally i think that heat is too high in the application stack to be able to make the scheduling decisions
15:15:02 right
15:15:03 debo_os: it was for you
15:15:11 or not
15:15:41 ok ... so here is my simplification of the threads - I agree with Mike wrt specifying all resources upfront for the scheduler
15:16:01 I think of holistic infrastructure scheduling as a lower level thing than software orchestration preparation,
15:16:12 so an API should have the following objects: list of VRTs, list of policies and list of metadata
15:16:25 but infrastructure orchestration is downstream from holistic scheduling.
15:16:46 so in the simple variation, VRTs could be instances alone and implemented inside nova
15:17:04 in a complex variation, this thing could be built on top of nova, neutron, cinder
15:17:12 Debo: You mean a VRT would mention only VMs and be processed only inside nova?
15:17:50 in the simplest implementation, to show the API layer works with the policies
15:18:10 in the complex variant, we can do what you proposed ... specify the topology
15:18:15 Debo: I take that as agreement and elaboration
15:18:21 instances are the simplest incarnation of your topology - single node
15:18:27 oh
15:18:43 now I'm not so sure I understand you
15:19:12 ok, let's look at your topology - nodes are compute, or storage, say ....
15:19:14 debo_os: can you please give an example so it can maybe help to explain
15:19:20 By "single node" you mean something with VRT syntax that just happens to have only one resource in it?
15:19:22 ok, consider a simple web app
15:19:40 web layer (rails) = 1 VM connected to mysql (1 VM)
15:20:21 in the simplest incarnation, you can say give me 2 VMs ... in the full VRT variation, it's VM ---> VM
15:20:32 or rather ext_network --> VM --> VM
15:20:43 so you are specifying network and compute
15:20:54 (not to mention storage)
15:21:02 of course :)
15:21:18 of course not, or of course including
15:21:19 ?
15:21:28 but we can have the same API and different variations ...
15:21:45 I'm a little lost. Is this a new API for nova, a new syntax to stuff into some existing nova API, or what?
15:21:46 VRT implemented in nova boils down to asking for only compute nodes
15:22:06 Mike, this is a simple API that can both be done in nova for starters and then done as a service
15:22:09 without changing the API
15:22:19 Ah, thanks
15:22:21 and that allows you to plug in your secret sauce
15:22:37 since everyone will have their smart ways of implementing it
15:22:45 So it's a new API that takes a VRT. Start by implementing it for nova, later implement as a new service or expansion of Heat engine. Right?
15:22:50 someone will do LP based solving, someone will do nonlinear
15:22:58 yes :)
15:23:01 MikeSpreitzer: yes
15:24:00 wow ... any disagreements?
15:24:02 if we could come to an agreement on what the API would look like then it could be useful to propose that
15:24:07 Nova already has an API for creating a set of VM instances, right? Can we expand the syntax accepted there?
15:24:10 PhilD: what do you think?
15:24:46 a few of us were trying to get instance_groups in as an extension ... I think we could improve that API to have VRTs
15:24:49 the simple variation will still have - list of instances (simple VRT), list of policies, and list of metadata, right
15:24:51 Sorry, production issue came in
15:24:54 :-(
15:25:00 ok, np
15:25:41 MikeSpreitzer: the instance groups just have policies at the moment. It should be consumed by the API that we would like to propose (i think)
15:25:44 OK, I'm such a newbie I mostly read documentation. But the doc for the Nova API today includes an extension for placing a set of VMs.
15:25:46 so the API should have CRUDs for VRTs, policies and metadata?
15:25:59 the VRTs should specify the request_spec, right
15:26:07 yathi: yes
15:26:10 yes
15:26:23 I think that our goal here is to define the VRT
15:26:32 policies are parts of VRTs, you do not create policies independently
15:26:50 if we could define the API, flows and use cases then we would have a good starting point
15:26:53 ok ... so then a list of VRTs with embedded policies and metadata?
15:27:04 would that work
15:27:08 debo: yes, that's what I was thinking
15:27:10 Yeah, I think I'd need to see some examples of the VRT to really get a sense of what's being proposed
15:27:12 embedded structure sounds good - makes it clean
15:27:31 I have one in my mind but Mike might have examples
15:27:34 The wiki page I wrote about policy extension gives much of that
15:27:44 let's consider ext_network --> VM --> VM to go back to the web app
15:27:47 MikeSpreitzer: please paste the link
15:27:48 I did not go all the way to concrete syntax, but am happy to discuss that here
15:27:52 where you need 1 VM for apache and 1 VM for mysql
15:27:55 Worked-through use cases are always a good way of exploring this kind of thing IMO
15:28:09 we need to evolve the instance group API extension to consider this new thing
15:28:18 Yathi: agreed
15:28:43 debo_os: please continue with the example (we all seemed to interrupt you)
15:28:52 so VRT = {nodes, connections, policy, metadata}
15:29:05 where nodes = list of VM request specs
15:29:18 connections = list of pairs
15:29:29 https://wiki.openstack.org/wiki/Heat/PolicyExtension
15:30:19 that's my simple use case
15:30:23 The way I see it, we already have syntax (in heat) for a set of resources. Need to add only: (1) grouping, (2) policies, (3) a way to put policies on relationships
15:30:47 debo_os: would the policies not be coupled with the connections
15:30:59 that is, for some we may want affinity, for others anti-affinity etc
15:31:07 garyk: VRT level policies are here
15:31:18 connection level policies need to be in
15:31:29 sorry, should have added policy for all the types
15:31:32 exactly. You can attach a policy to a relationship between two groups/resources
15:31:36 ok, understood. that sounds logical
15:31:40 can a VRT be a hierarchy of VRTs ?
15:31:47 why not ...
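(The VRT shape debo_os spells out above — VRT = {nodes, connections, policy, metadata}, nodes as VM request specs, connections as pairs — could be sketched roughly as follows. All names here are hypothetical illustrations, not existing OpenStack code.)

```python
from dataclasses import dataclass, field

@dataclass
class Connection:
    # A connection is a pair of node names; edge-level policies can ride along.
    src: str
    dst: str
    policies: list = field(default_factory=list)

@dataclass
class VRT:
    # VRT = {nodes, connections, policy, metadata}, per the meeting's recap.
    nodes: dict = field(default_factory=dict)        # name -> VM request spec
    connections: list = field(default_factory=list)  # list of Connection pairs
    policies: list = field(default_factory=list)     # VRT-level policy handles
    metadata: dict = field(default_factory=dict)     # free-form key/value pairs

# The web-app use case from the discussion: ext_network --> VM --> VM,
# with 1 VM for apache and 1 VM for mysql.
web_app = VRT(
    nodes={"web": {"flavor": "m1.small", "image": "apache"},
           "db":  {"flavor": "m1.medium", "image": "mysql"}},
    connections=[Connection("ext_network", "web"), Connection("web", "db")],
    policies=["anti-collocation"],
)
```

The embedded structure keeps each VRT self-contained: policies and metadata travel with the topology instead of being created independently, which matches the "policies are parts of VRTs" point above.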
15:31:50 No, one VRT has a hierarchy of groups
15:31:53 then each VRT can have a VRT level policy
15:32:00 VRT is the whole you want processed at once
15:32:23 yup, kind of what we once described as ensembles
15:32:25 http://docwiki.cisco.com/wiki/Donabe_for_OpenStack .... we have an implementation of recursive containers on openstack ... hence recursive VRTs
15:32:56 So we are agreed on the idea of recursive containment
15:32:57 with full GUI ... http://www.openstack.org/summit/portland-2013/session-videos/presentation/interactive-visual-orchestration-with-curvature-and-donabe
15:33:26 We could use a syntax that is oriented around the group AKA container, primarily a tree of those.
15:33:29 garyk: +1 lots of things are similar, which is good since it means we all need something like this
15:33:41 mike: why a tree and why not graphs
15:33:46 agreed.
15:34:13 "contain" pretty much implies a tree-like shape to me.
15:34:13 we can already do graphs with neutron and openstack ...
15:34:25 sorry, nova
15:34:50 It may not seem like it to some of you, but I am actually trying to not go farther than necessary here
15:35:00 I think a tree is sufficient
15:35:04 mike: while i agree it looks like a tree ... i can also think it looks like a graph
15:35:22 Debo: acyclic, right?
15:35:22 esp for describing virtual clusters for intense workloads
15:35:56 if you spec bandwidth constraints you might want to spec a clique with max bandwidth on the edges
15:36:24 see, this is why we need an abstract API
15:36:30 Debo: Is that an answer to my question about whether the graph can have cycles?
15:36:33 in one implementation you could restrict VRTs to trees
15:36:35 yes it can
15:36:54 I mean Neutron would support it ... why not then
15:37:12 let's start with simple examples :)
15:37:13 Can you give us some use cases that require something more general than a tree?
15:37:31 i really think that we need to start with something simple.
15:37:34 sure ... in a hadoop env, you might want to define a clique
15:37:35 yes
15:37:48 that's why keeping the API to the VRT level is what I would love to see
15:37:58 if we go for complex there is no chance we are going to get it through (it should be extensible to be built on in the future)
15:38:00 since there is no end to making this API look better
15:38:14 The clique is not a problem for a tree. One vertex for the parent, one for each member.
15:38:18 I am happy if we agree to VRTs with embedded metadata and policy
15:38:19 okay, going back to the API.. what is a POLICY ?
15:38:23 members all children of the same parent
15:38:31 I have seen flavors of affinity, anti-affinity etc
15:38:40 Yes...
15:38:48 policy could be simple named policy handles implemented by whoever is providing you the scheduling
15:38:50 but do we have a generic idea of what a policy could be like
15:38:52 collocation, anti-collocation
15:39:01 yeah, so these are named objects
15:39:04 Yes, we need to define semantics
15:39:05 proximity and compute resources
15:39:12 I think some take parameters
15:39:20 mike: do we need to define semantics in the API right now
15:39:27 for example, anti-collocation to what level of granularity? Rack, machine, … ?
15:39:36 why don't we agree on the basic high level objects that the API needs
15:39:39 i think that the onus is on us to try and define the API. then provide examples, use cases and flows
15:39:39 I like the idea of named policy handles.. leaving the implementation details outside
15:40:15 so each kind of implementation of the "SMART resource placement engine" can use policies differently
15:40:22 A policy "instance" as it appears in a VRT needs only to name the policy, the thing or two to which it applies, and give the values of the relevant parameters.
15:40:30 the more robust the API the better (i know it sounds like lip service, but we really need a good base here)
15:40:30 agreed
15:41:03 garyk: trying to see if we need anything more than a list of VRTs with policy names (maybe params)
15:41:15 else the API looks simple from a 30K ft alt
15:41:37 I outlined a proposal in https://wiki.openstack.org/wiki/Heat/PolicyExtension
15:41:48 it should compile on paper (or in our case, interpret on paper)
15:41:53 we need groups, a way to apply a policy to a group, and a way to apply policies to a pair of groups
15:42:20 you could allow resources in place of groups, or not, depending on evolution tactics
15:42:33 mike: could we keep the API simple and just have VRTs with policies
15:42:39 would that break your use cases
15:43:01 then it would be really simple and the impl could be as elaborate as you want!
15:43:08 We need a way to apply a policy to a relationship
15:43:20 apply = implementation, right?
15:43:27 no,...
15:43:41 e.g., "I need 1 Gbps between A and B"
15:43:58 yes, that's a policy for the connection between A and B
15:44:08 or, for firewall, "A should be able to open a TCP connection to port 8080 on B"
15:44:12 so when you put edges in your VRT, you should have edge policy
15:44:31 hence edges =
15:44:33 right, call them edges or relationships, we need them
15:44:40 nodes =
15:44:47 VRT =
15:45:06 +1 for VRT =
15:45:07 I am just using std graph terminology
15:45:16 G=(V,E) :)
15:45:39 and we need recursive grouping
15:45:48 in the simplest case edges = [], and we can do this in Nova
15:45:59 nodes = VMs only
15:46:03 The three key ideas are: recursive grouping, relationships, and policy applied to group/element or relationship
15:46:18 Can an edge include more than 2 nodes?
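(The degenerate case debo_os names above — edges = [], nodes = VMs only — is what could be served inside Nova without changing the API shape. A rough sketch, with a hypothetical helper name; this is not an existing Nova call:)

```python
def vrt_to_instance_request(nodes, edges, policies):
    # Hypothetical helper: collapse a VRT down to a plain instance-group
    # style request. Only valid in the simplest incarnation, where there
    # are no connections to reason about (edges = [], nodes = VMs only).
    if edges:
        raise NotImplementedError("edge-aware placement needs the full engine")
    return {"instances": sorted(nodes), "policies": list(policies)}

# The "give me 2 VMs" form of the web-app example:
req = vrt_to_instance_request(nodes=["db", "web"], edges=[],
                              policies=["anti-collocation"])
```

The point of keeping one API for both variants is visible here: the caller's request shape never changes, only how much of it the backend can honor.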
15:46:22 no
15:46:38 ok, this definition will apply if you consider node = abstract node that represents another VRT
15:46:43 And we need edges to be directed, in some cases (e.g., firewall rule)
15:46:49 except that you need metadata
15:46:52 I was going to ask the same thing - can the node be a VRT
15:46:55 for abstract
15:47:12 for a node to be treated as a VRT you need ingress border nodes for a given VRT
15:47:13 so no way to secure 1 Gbps between A, B and C?
15:47:32 Doron: you need to do A-B, B-C, C-A
15:47:44 yeah...
15:47:46 I know, but one of them may fail.
15:47:54 which invalidates everything
15:48:01 if you want shared A,B,C you need a special VRT with policies that implement that
15:48:07 doron: that is why the scheduling should be done in one shot
15:48:10 and then stick this VRT as a node in the general VRT
15:48:14 Most policies are essentially about a pairwise relationship, so applying such a policy reduces to a bunch of atomic relationships
15:49:10 I'm aware of the atomic need, which sometimes
15:49:22 ends up with a need for more than a pair of nodes.
15:49:43 if you take affinity,
15:49:46 applying a dyadic policy to a group means to apply it to every pair within the group
15:49:46 so I guess we are all saying the same thing with slight changes in jargon
15:49:57 so a dict of jargon mappings would suffice :)
15:50:41 Yes, I imagine you could apply a collocation policy to a group of 7 VMs; that means all pairs are collocated
15:50:46 so, any disagreement with the simple API = VRTs = ?
15:50:57 with implementation in plugins
15:51:08 What do you mean by metadata? Is that the parameters of the policies that take parameters?
15:51:14 MikeSpreitzer: I may need some more info on your suggestion, but I can look into it later.
15:51:15 that's undefined
15:51:17 for now
15:51:22 it's defined by the implementation
15:51:41 it's a placeholder for any random attributes I guess
15:51:43 What does metadata look like? To what is it attached? Why do you want it?
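(Mike's point that a dyadic policy applied to a group reduces to atomic pairwise relationships — e.g. collocation over 7 VMs means all pairs are collocated — can be made concrete. A sketch with a hypothetical function name:)

```python
from itertools import combinations

def expand_dyadic_policy(policy, members):
    # Applying a pairwise (dyadic) policy to a group means applying it
    # to every unordered pair within the group.
    return [(policy, a, b) for a, b in combinations(members, 2)]

# Collocation over a group of 7 VMs yields C(7, 2) = 21 pairwise relationships.
pairs = expand_dyadic_policy("collocate", [f"vm{i}" for i in range(7)])
```

This is also why the clique case fits a tree: the group is one parent vertex with the members as children, and the expansion above recovers the full clique of relationships.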
15:51:49 you can think of it that way
15:51:54 it would be nice if we could get the API on paper so people could see it and think of issues and problems
15:51:55 in python, a simple dictionary ?
15:52:18 So we can attach a general dictionary to any vertex in the graph?
15:52:20 the metadata will just be key/value pairs
15:53:43 Is there convergence on this: a VRT is a graph, with policies applied to vertices and edges (which are directed), and metadata applied to vertices. A vertex can be a resource or another VRT.
15:54:41 +1
15:54:58 And the API will get all the relevant VRTs at once.
15:55:07 So it can work on all of them at once.
15:55:09 we are running out of time. debo_os, would it be possible for you to write up the API and share it with everyone so we can discuss in more detail next week
15:55:36 sure ...
15:55:42 would love to rope in mike and yathi too
15:55:48 but I take the AI
15:55:53 great. anyone else want to help debo_os write this up
15:55:56 sure. Let's agree on when/how to talk
15:56:04 would love to work with Debo on this
15:56:17 cool. i'll jump in too
15:56:23 awesome
15:56:30 this is one set of API
15:56:44 We didn't get to 2 or 3
15:56:45 garyk: you would have been there even if you hadn't volunteered :) we would have dragged you
15:56:50 what about the API to the other parts of the big vision
15:56:56 :)
15:57:01 Any quick feedback on using host aggregates to convey location structure?
15:57:16 Yathi: once we have the foundations we can try and map it to all of the use cases we can think of
15:58:22 ok
15:58:26 ok, so we have agreement wrt the API?
15:58:32 MikeSpreitzer: are you talking about a user configuration of having the aggregate report the 'proximity'
15:58:56 only the high level VRTs etc
15:58:57 ?
15:59:04 I'm getting to the questions of the other APIs. The scheduler will need location info, so how is that represented/discovered/conveyed?
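(The converged definition above — a VRT is a graph, metadata is key/value pairs on vertices, and a vertex can be a resource or another VRT — implies recursive traversal when processing a request. A minimal sketch using plain dicts; the representation and function name are invented for illustration only:)

```python
def count_leaf_resources(vertex):
    # A vertex is either a nested VRT (modelled here as a dict with a
    # "vertices" key) or a leaf resource such as a single VM request spec.
    # Recursive grouping means the walk descends through embedded VRTs.
    if isinstance(vertex, dict) and "vertices" in vertex:
        return sum(count_leaf_resources(v) for v in vertex["vertices"].values())
    return 1

# An inner VRT (the web app) embedded as one vertex of an outer VRT,
# alongside a plain load-balancer VM:
inner = {"vertices": {"web": "vm-spec", "db": "vm-spec"},
         "metadata": {"tier": "app"}}
outer = {"vertices": {"app": inner, "lb": "vm-spec"}}
```

Handing the scheduler the outer VRT hands it the whole recursive structure at once, which is what lets it work on everything in one shot rather than placing pieces piecemeal.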
15:59:34 MikeSpreitzer: that is certainly something that we need to discuss
15:59:47 we will not address how it is discovered yet..
15:59:49 i think that we are out of time. can we continue offline or next week?
15:59:55 One direction would be to define some key:value pairs to use in host aggregates, and use host aggregates to represent the structure of the datacenter
15:59:57 but represented and conveyed is something to tackle first
16:00:14 I'll be watching the ML
16:00:20 ok. great.
16:00:22 thanks guys
16:00:25 #endmeeting