15:01:05 <n0ano> #startmeeting scheduler
15:01:06 <openstack> Meeting started Tue Aug 13 15:01:05 2013 UTC and is due to finish in 60 minutes.  The chair is n0ano. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:01:07 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:01:09 <openstack> The meeting name has been set to 'scheduler'
15:01:29 <n0ano> anyone here for the scheduler meeting?
15:01:47 <PhilDay> Yep (but I have to drop at half past)
15:02:13 <jog0> o/
15:03:12 <n0ano> let's wait a minute or two and then go...
15:03:16 <jgallard> hi all
15:04:11 <n0ano> #topic Perspective for nova scheduler
15:04:25 <n0ano> I hope everyone has had a chance to look at Boris' paper
15:04:30 <n0ano> #link https://docs.google.com/document/d/1_DRv7it_mwalEZzLy5WO92TJcummpmWL4NWsWf0UWiQ/edit#heading=h.6ixj0ctv4rwu
15:05:29 <garyk> hi guys
15:05:29 <n0ano> my read is that basically he is saying that updating state info through the DB is a scaling problem, and doing RPC calls to the scheduler would solve it
15:06:25 <n0ano> my intuition is to agree with that but I believe there is a significant group that feels the DB is a more scalable solution
15:06:35 <n0ano> how do we resolve this dichotomy?
15:07:35 <PhilDay> I think there were also some open questions on how does a newly started scheduler get a full set of state
15:07:56 <jog0> wasn't a great deal of this hashed out on the ML?
15:08:26 <PhilDay> And if all of this is now held just in memory by the scheduler(s), how do we get visibility into that state
15:08:51 <n0ano> jog0, discussed on the ML -yes, resolved -I don't think so
15:09:07 <PhilDay> Could be that I missed that mlist discussion (had way too much in my inbox when I came back - must stop taking holidays)
15:09:31 <n0ano> PhilDay, good points but are those implementation details or architectural problems
15:10:09 <jog0> PhilDay: here is the thread http://lists.openstack.org/pipermail/openstack-dev/2013-July/012221.html
15:10:24 <PhilDay> If we're moving to a model of not persisting the scheduling related state in the DB, then I'd say they are architectural
15:10:39 <PhilDay> @jog0 - thanks
15:12:45 <PhilDay> I guess what I'm thinking is that in addition to "time to schedule a new VM" I'd like to see "time for a new scheduler to retrieve its state" as an explicit metric
15:12:47 <n0ano> PhilDay, would providing an API to the scheduler to access this info be sufficient or would somehow periodically syncing to the DB work
15:13:13 <jog0> there were some good ideas at the end of that thread
15:13:20 <PhilDay> I guess either
15:13:41 <PhilDay> Ok - sounds like I have some more reading to do before I can contribute intelligently ;-)
15:14:54 <PhilDay> <lurk mode on>
15:15:05 <jog0> also boris-42's paper didn't clearly show the actual issue (i.e. not sure how to reproduce their results)
15:15:17 <boris-42> jog0 hi
15:15:26 <boris-42> jog0 whatsup?
15:15:51 <jog0> I think we all agree the current scheduler has limitations; the questions are at what point exactly, and are there any good short-term fixes we can do for now, until Icehouse dev is open
15:16:07 <jog0> boris-42: see backlog
15:16:14 <boris-42> jog0 yeah you are doing great work
15:16:21 <boris-42> jog0 around removing fanout
15:17:03 <boris-42> n0ano hi
15:17:13 <n0ano> boris-42, welcome
15:17:44 <boris-42> n0ano we updated our document today
15:17:54 <boris-42> n0ano so we are not doing a fanout call to the scheduler
15:18:08 <PhilDay> As I understood last weeks discussion, any short term fix would keep the DB in place but the path for updates would change from "comp->conductor->DB" to "comp->Sched->DB"
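The short-term path change PhilDay describes could be sketched roughly as below. All names here are invented for illustration (this is not actual nova code): compute nodes cast host-state updates to the scheduler instead of routing them through the conductor, and the scheduler becomes the single writer to the DB.

```python
# Hypothetical sketch of "comp->Sched->DB" replacing "comp->conductor->DB".
# FakeDB stands in for the nova database layer; the method call stands in
# for an oslo.messaging fire-and-forget RPC cast.

class FakeDB:
    """Stands in for the nova database layer."""
    def __init__(self):
        self.host_states = {}

    def update_host_state(self, host, state):
        self.host_states[host] = state


class Scheduler:
    """Receives host-state updates over RPC and persists them."""
    def __init__(self, db):
        self.db = db

    def update_host_state(self, host, state):
        # Previously the compute node sent this through the conductor;
        # here the scheduler is the single writer to the DB.
        self.db.update_host_state(host, state)


class ComputeNode:
    def __init__(self, host, scheduler_rpc):
        self.host = host
        self.scheduler_rpc = scheduler_rpc

    def report_state(self, free_ram_mb, free_disk_gb):
        # In a real deployment this would be an async RPC cast.
        self.scheduler_rpc.update_host_state(
            self.host, {"free_ram_mb": free_ram_mb,
                        "free_disk_gb": free_disk_gb})


db = FakeDB()
sched = Scheduler(db)
ComputeNode("node-1", sched).report_state(2048, 100)
print(db.host_states["node-1"]["free_ram_mb"])  # 2048
```

The DB stays in place (addressing PhilDay's visibility concern), only the write path changes.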
15:18:13 <n0ano> must read new doc
15:18:31 <boris-42> n0ano there is not so much change
15:18:43 <boris-42> PhilDay not only this change
15:18:48 <PhilDay> Do we really think there is time to move to a non-DB model still in Havana ?
15:19:00 <boris-42> PhilDay no
15:19:07 <boris-42> PhilDay this should be done in I cycle
15:19:34 <jog0> PhilDay: things like more optimized DB queries or caching are options right now
15:19:43 <PhilDay> Ok, so if its all to be done in I - shouldn't this be a topic to be bottomed out in HK ?
15:19:57 <boris-42> PhilDay yes
15:20:04 <n0ano> PhilDay, absolutely, but doing some prep work beforehand is good
15:20:38 <boris-42> PhilDay yeah we would like to prepare 1) all code 2) benchmark results on real deployments before summit
15:20:59 <PhilDay> OK - sorry I got the wrong end of the stick from the start of this then.  I thought you were trying to drive to a conclusion on the architecture in here
15:21:12 <boris-42> nonon
15:21:13 <boris-42> =)
15:21:18 <jog0> PhilDay: thats what I thought too
15:21:18 <n0ano> boris-42, have you considered PhilDay's concern that you need a way to look at the compute node states? moving away from the DB makes that hard
15:21:47 <jog0> so I liked some of Clint's ideas for scheduling
15:21:50 <boris-42> n0ano we will put all data about HOST into DB
15:22:03 <boris-42> scheduler DB
15:22:08 <boris-42> not only compute_node table
15:22:19 <boris-42> but also data from compute_node_stats and probably from cinder
15:22:31 <boris-42> to be able to use different data from different project in our scheduler
15:22:44 <PhilDay> Just to be clear I want any query on data to be behind an API - so I'm not wedded to it being in the DB, I just want to be sure I don't lose any visibility
15:23:04 <boris-42> PhilDay visibility about what?
15:23:22 <PhilDay> The data the scheduler is using (i.e. host states, etc.)
15:23:32 <boris-42> PhilDay and?
15:23:38 <jog0> boris-42: any cross project DB stuff makes things much harder
15:23:44 <boris-42> jog0 no
15:23:49 <boris-42> jog0 it doesn't
15:23:52 <jog0> politically
15:24:01 <boris-42> jog0 our goal is to have one scheduler
15:24:07 <boris-42> that keeps all data about hosts
15:24:12 <n0ano> personally, I like an API and a back channel (for debugging when the API server fails)
15:24:17 <jog0> it becomes another contractual API to maintain
15:24:39 <boris-42> jog0 it will be much easier
15:24:42 <PhilDay> A single scheduler that can also know about Network locality (from Quantum) and Volume locality (from Cinder) ?
15:24:48 <boris-42> yeah
15:24:52 <boris-42> PhilDay yes
15:24:58 <boris-42> PhilDay and it is actually scalable
15:25:18 <boris-42> PhilDay it is very useful in a lot of cases
15:25:36 <boris-42> PhilDay for example you are running cinder and nova on each host
15:26:11 <boris-42> PhilDay and would like to schedule your instance with a block device of size 200GB and ensure that on that host you have enough free disk in cinder=)
15:26:36 <jog0> boris-42: I like the idea too, but doing it requires careful consideration to make sure it doesn't couple the assorted projects too much.
15:26:44 <jog0> also is this in the new document?
15:26:53 <boris-42> jog0 sorry not ready yet
15:27:02 <PhilDay> One other thing that's at the back of my mind (but I haven't done much thinking about it) is what it would take to plug in a third party scheduler (like say MOAB)  - having only an RPC interface might make that simpler I guess
15:27:16 <boris-42> jog0 but our goal is to finish all these things, including the doc, before summit
15:27:47 <n0ano> PhilDay, is a 3rd party scheduler really that necessary?
15:27:51 <jog0> PhilDay: that is a good question, most of this discussion is around we only have one scheduler
15:28:01 <boris-42> jog0 PhilDay it is really a huge change (not in lines but in approach). So I agree that we should really carefully discuss all these things
15:28:42 <jog0> n0ano: some people may want to use other information to schedule on, and simpler scheduler etc
15:29:07 <boris-42> PhilDay n0ano I don't see where the complexity is in our approach?
15:29:18 <n0ano> jog0, I would hope that the extensibility we've already built into the scheduler is sufficient for 99% of the users
15:29:42 <PhilDay> Not necessary, and I wouldn't do that in favour of having all of these features in Openstack - but it is something that comes up from time to time in conversation with customers wanting to build their own clouds.
15:29:48 <boris-42> PhilDay one simple scheduler that has a small set of methods (run_instance, migrate, cinder scheduler methods)
15:29:56 <boris-42> and one other method
15:30:01 <boris-42> that updates host_state
15:30:12 <boris-42> and could be called from different services
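The narrow surface boris-42 describes, one scheduler with a handful of scheduling methods plus a single `update_host_state` callable from any service, might look something like the sketch below. The method names come from the discussion, but the payloads and the cross-service placement logic are assumptions, not a settled API.

```python
# Rough sketch of a unified scheduler: nova, cinder, etc. each report
# their slice of a host's state through one update_host_state method,
# and run_instance can then filter on data from several projects at
# once (boris-42's "200GB block device" example).

class UnifiedScheduler:
    def __init__(self):
        self.host_states = {}  # host -> {service: state}

    def update_host_state(self, host, service, state):
        # Called by nova-compute, cinder-volume, ...; each service
        # contributes its own part of the host's state.
        self.host_states.setdefault(host, {})[service] = state

    def run_instance(self, ram_mb, disk_gb):
        # Pick a host with enough free RAM (nova) *and* free disk (cinder).
        for host, state in self.host_states.items():
            nova = state.get("nova", {})
            cinder = state.get("cinder", {})
            if (nova.get("free_ram_mb", 0) >= ram_mb and
                    cinder.get("free_gb", 0) >= disk_gb):
                return host
        return None


sched = UnifiedScheduler()
sched.update_host_state("host-a", "nova", {"free_ram_mb": 4096})
sched.update_host_state("host-a", "cinder", {"free_gb": 500})
sched.update_host_state("host-b", "nova", {"free_ram_mb": 8192})
print(sched.run_instance(ram_mb=2048, disk_gb=200))  # host-a
```

Note how host-b is skipped despite having more RAM, because no cinder state was reported for it; this is the cross-project coupling jog0 warns about.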
15:30:33 <PhilDay> Got to dive for another call - sorry
15:30:40 <boris-42> PhilDay good luck
15:30:41 <jog0> PhilDay: bye
15:30:54 <jog0> boris-42: in short this is a big change, huge in fact
15:30:58 <n0ano> all ARs to PhilDay :-)
15:31:04 <debo_os> sorry for joining late but just like we decided to do a separate network service, I don't see why we can't have a pluggable scheduler service
15:31:26 <n0ano> it already is, you can select from multiple schedulers right now
15:31:28 <boris-42> jog0 I agree that it is big change in approach, but small in LOCs=)
15:31:34 <boris-42> jog0 and could be done step by step
15:31:43 <jog0> LOCs don't matter in this
15:31:44 <boris-42> jog0 but first step should be done only in I cycle
15:32:01 <boris-42> jog0 I find your current job great
15:32:04 <boris-42> jog0 for H cycle
15:32:06 <jog0> there was a BP to do this a while back but it got stalled
15:32:15 <debo_os> n0ano: however the state management is not pluggable yet
15:32:27 <jog0> boris-42: I do like this proposal, I am just saying it is tricky
15:32:33 <n0ano> debo_os, hence the discussion here
15:32:49 <jog0> I would recommend drafting up an early idea and putting it to the ML along with an outline of what you think
15:32:54 <debo_os> n0ano: apologies for joining late hence might sound repetitive :)
15:32:58 <jog0> along with any history of why it was tried before
15:33:02 <boris-42> jog0 https://blueprints.launchpad.net/nova/+spec/no-db-scheduler
15:33:03 <n0ano> debo_os, NP
15:33:20 <boris-42> jog0 Ok will be done soon
15:33:44 <debo_os> in addition to the discussion ... one of my colleagues had written up a doc for the last summit and socialized it .. https://docs.google.com/document/d/1cR3Fw9QPDVnqp4pMSusMwqNuB_6t-t_neFqgXA98-Ls/edit#
15:34:13 <n0ano> boris-42, looks like your BP is mainly just a link to your doc
15:34:20 <debo_os> there was some good feedback and folks told him to get back a little later
15:34:22 <jog0> the main question is the mechanics of adding a new contractual API for all projects that wires to the scheduler
15:34:30 <boris-42> n0ano yes
15:34:33 <debo_os> its a little like boris's doc
15:34:50 <boris-42> n0ano because a lot is described in the doc
15:34:54 <debo_os> boris-42: should we try to merge the 2 proposals
15:35:07 <boris-42> debo_os as I say in email yes of course
15:35:14 <boris-42> debo_os they are really close
15:36:46 <jog0> debo_os: don't use the word orchestrator in your proposal, it's an overloaded word
15:36:52 <boris-42> =))
15:36:53 <jgallard> sorry for joining the conversation late, but, I like the idea of having a kind of Scheduler as a Service
15:37:28 <jog0> debo_os: also the doc needs an abstract/summary
15:37:34 <debo_os> jog0: agreed. I need to clean it up since my colleague wrote most of it and now left the OS world ...
15:37:37 <jog0> it's TL;DR for me, skimming the slides
15:37:37 <n0ano> ignoring current proposals I still don't see how to resolve the question - which is more scalable DB vs. RPM?
15:37:50 <debo_os> gr8 feedback
15:37:52 <boris-42> n0ano RPM
15:37:54 <n0ano> s/RPM/RPC
15:37:58 <boris-42> rpc*
15:38:10 <boris-42> for example
15:38:15 <boris-42> we have 10k nodes
15:38:29 <boris-42> we need to produce only 150 req/sec
15:38:32 <boris-42> to all schedulers
15:38:46 <boris-42> so 150/SCHEDULER_AMOUNT
15:38:47 <boris-42> in sec
15:38:55 <jog0> n0ano: IMHO neither
15:39:07 <boris-42> even if you have only 3 schedulers for 10k nodes
15:39:09 <debo_os> boris-42
15:39:22 <boris-42> you will have to process only 50 req/sec
15:39:26 <boris-42> and it is nothing
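boris-42's arithmetic checks out as a back-of-the-envelope calculation. The numbers (10k nodes, 150 req/sec, 3 schedulers) are his; the implied per-node report interval is derived here and was not stated in the meeting.

```python
# Sanity-check of the load estimate: 10k compute nodes producing 150
# host-state updates per second in total, spread evenly over the
# schedulers. 150 req/sec over 10k nodes implies each node reports
# roughly once every 67 seconds (an inferred interval, not a quoted one).

nodes = 10_000
total_req_per_sec = 150
schedulers = 3

report_interval_sec = nodes / total_req_per_sec
per_scheduler = total_req_per_sec / schedulers

print(round(report_interval_sec, 1))  # 66.7 seconds between reports per node
print(per_scheduler)                  # 50.0 req/sec per scheduler
```

50 req/sec per scheduler is indeed modest, which is the basis of the "and it is nothing" claim.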
15:39:35 <jog0> but then again we don't need to hash this out right now
15:39:44 <debo_os> boris-42: lets work to merge the 2 proposals. For starters we can add this doc for reference too
15:40:14 <boris-42> debo_os Ok it will be easier to merge it through emails then IRC chat =)
15:40:15 <n0ano> we don't have to answer now but I would like to know 'how' to come to an answer, right now we're kind of in a 'he said, she said' situation
15:40:30 <debo_os> boris-42: agreed! lets work to merge the 2 proposals over emails
15:40:32 <boris-42> n0ano we will make real benchmarks
15:40:38 <boris-42> n0ano on real deployments
15:40:48 <boris-42> n0ano will it be enough for you?
15:41:02 <boris-42> debo_os As I said "nods"
15:41:07 <n0ano> boris-42, I think we need that, measurable and reproducible would be great
15:41:39 <jog0> n0ano: ++ and any proposed idea has to show it's better than the existing one and why it's better than other options
15:41:40 <boris-42> n0ano yes we are going to create some new project so everybody will be able to reproduce these things
15:42:02 <n0ano> Well, I'm hearing some actions out of all of this:
15:42:16 <n0ano> #action boris-42 & debo_os to merge proposals
15:42:42 <n0ano> #action come up with benchmark to measure DB vs. RPC scalability
15:42:42 <boris-42> yeah
15:42:50 <boris-42> n0ano
15:42:51 <boris-42> no
15:42:57 <boris-42> we are building whole system
15:43:01 <boris-42> to test real openstack
15:43:10 <boris-42> not just this case
15:43:44 <doron> guys, if I may jump in for a sec. Is there a reason to rule out an in-memory solution?
15:43:53 <n0ano> hmmm, notice 'reproducible', if not then we're just providing anecdotal input
15:44:08 <debo_os> doron: not at all
15:44:11 <doron> ie- I agree db is problematic, but RPM will have its price.
15:44:18 <debo_os> that's why we need to define APIs 1st instead of implementation
15:44:21 <doron> (RPC)
15:44:29 <debo_os> hence boris-42 and I need to merge the proposals ....
15:44:43 <boris-42> doron didn't understand the question
15:45:05 <debo_os> doron: ideally if you run the scheduler as a service you could swap out the implementation and have an in-memory solution for the state management
15:45:22 <doron> boris-42: did you consider of storing the needed data in-memory instead of a DB?
15:45:30 <boris-42> doron we will use in memory key-value storage
15:45:38 <boris-42> doron to avoid scheduler fanout
15:45:50 <doron> boris-42: gr8. this is what I had in mind
15:46:12 <boris-42> doron so each request to update host state will be processed by only one scheduler
15:46:35 <boris-42> doron this allows us to solve the problem of too much RPC for one scheduler =)
15:46:48 <boris-42> just by adding more schedulers to the system
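The fanout-avoidance boris-42 describes could work along these lines: host state lives in a fast key-value store (memcached-like; a plain dict below), and each update is routed to exactly one scheduler. The hash-based routing is an assumption about the mechanism, the meeting only establishes that one scheduler processes each update.

```python
# Sketch: deterministic routing of host-state updates so that exactly
# one scheduler (and its key-value store) handles each host, with no
# fanout. Adding schedulers spreads the load. All names are illustrative.

import zlib

class KeyValueStore:
    """Stands in for memcached or a similar in-memory store."""
    def __init__(self):
        self.data = {}

    def set(self, key, value):
        self.data[key] = value

def route_update(host, stores):
    # Same host always hashes to the same scheduler's store, so each
    # update is processed once rather than broadcast to all schedulers.
    return stores[zlib.crc32(host.encode()) % len(stores)]

stores = [KeyValueStore() for _ in range(3)]
for host in ("node-%d" % i for i in range(10)):
    route_update(host, stores).set(host, {"free_ram_mb": 1024})

# Every host's state is stored exactly once across the three stores.
print(sum(len(s.data) for s in stores))  # 10
```

In a real deployment the stores would be shared memcached instances rather than per-process dicts, and a consistent-hashing ring would keep remapping small when schedulers are added or removed.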
15:46:55 <doron> makes sense. I'll go over your merged doc
15:46:56 <debo_os> boris-42: I guess if we define the crisp update APIs etc ... the implementation could be separated and we will have all the scaling features you want
15:47:27 <boris-42> debo_os we already implemented this part of our scheduler
15:47:46 <boris-42> debo_os switch from Nova.DB to scheduler.DB
15:48:06 <doron> I'll take a look. sorry for the noise.
15:48:17 <boris-42> doron ok I think we will publish the code soon
15:48:25 <boris-42> so I will add you as a reviewer
15:48:33 <doron> thanks!
15:48:50 <debo_os> boris-42: gr8
15:49:13 <n0ano> boris-42, now I'm confused, are you proposing that the scheduler implement its own private DB
15:49:26 <boris-42> n0ano yes
15:49:32 <n0ano> I thought it was just maintaining state in its memory
15:49:49 <boris-42> n0ano that produces fanout
15:50:00 <boris-42> n0ano and we spoke with Mike from BlueHost
15:50:32 <boris-42> n0ano and he said that it will be better to use fast key-value storage such as memcached
15:51:09 <n0ano> then it's not really a DB, it's just a backup for the internal memory storage
15:51:23 <boris-42> n0ano as I said we hadn't had time to update our docs; they were updated today
15:51:33 <boris-42> n0ano we don't need "real" DB
15:51:40 <boris-42> n0ano for temp data
15:51:48 <doron> boris-42: +1 on no need for real db.
15:51:55 <jog0> if this isn't ready for review/discussion why are we here?
15:52:15 <boris-42> jog0 I am just answering on question
15:52:29 <boris-42> s/question/questions
15:52:42 <n0ano> jog0, I thought we were farther along and I wanted the answer to how do we decide DB vs. RPC
15:53:02 <boris-42> =)
15:53:42 <boris-42> sorry guys for misunderstanding =)
15:53:51 <n0ano> so, it's getting late, looks like boris-42 & debo_os need to update the doc, when that is done we can re-visit this
15:54:04 <boris-42> n0ano it is already updated
15:54:17 <n0ano> boris-42, what about merge with debo_os ?
15:54:29 <boris-42> n0ano I mean about using DB in scheduler
15:54:34 <boris-42> n0ano sorry=)
15:54:48 <boris-42> n0ano ok by next session we will update the new combined doc
15:54:56 <n0ano> boris-42, NP
15:55:02 <boris-42> And I agree with jog0 that at this moment we should discuss Havana
15:55:03 <boris-42> work
15:55:06 <boris-42> not Icehouse
15:55:29 <n0ano> let's see where we are next week, especially with an eye to what do we need/want to do for Havana
15:55:56 <n0ano> #topic opens
15:56:07 <n0ano> any opens in the remaining few minutes?
15:57:12 <n0ano> hearing silence, I'll thank everyone
15:57:20 <n0ano> #endmeeting