15:01:00 #startmeeting scheduler
15:01:01 Meeting started Tue Nov 26 15:01:00 2013 UTC and is due to finish in 60 minutes. The chair is n0ano. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:01:03 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:01:05 The meeting name has been set to 'scheduler'
15:01:38 hi
15:01:43 bauzas, welcome (we don't bite - much :-)
15:01:56 n0ano: thanks :)
15:02:51 I sent out an agenda but most of the people that are concerned with those items aren't here yet
15:03:51 Given the US holiday this week this meeting might be a bust
15:04:04 I see Boris with memcached based
15:04:09 Yathi with instance group
15:04:09 hi all
15:04:19 collins cannot join
15:04:29 boris doesn't appear to be online and yathi hasn't said anything
15:04:38 what is "black box scheduler" ?
15:04:44 hi
15:04:50 Hi
15:05:29 a session from the summit, basically allow the system to use a `black box' scheduler, put in the data and the black box gives the scheduling answer
15:05:56 this is the same thing as "scheduling as a service" ?
15:06:04 How is BB sched different from plugging in a custom scheduler?
15:06:07 n0ano: is it related to the scheduling-as-a-service thing ?
15:06:07 is it the session we proposed ? - smart resource placement ?
15:06:14 jgallard: :)
15:06:17 :)
15:06:38 jgallard, I don't think so, saas is moving the scheduler into a separately addressable service, black box is changing the internals of the scheduler
15:06:40 garyk are you on?
15:07:23 I think a "black box" scheduler would have to be a new scheduler that's plugged in rather than filter_scheduler. There would have to be a compelling reason for a deployer to use it
15:07:25 hi, sorry, was on a call
15:07:28 Yathi, I believe the BB was from the rethinking scheduler design session
15:07:33 n0ano: ok, thanks for the clarification
15:07:53 do we have an etherpad on BB?
15:08:16 one was started at the summit, it should still be there
15:08:36 #topic black box scheduler
15:08:49 toan-tran: let me try and look up lifeless's etherpad on the scheduling
15:09:06 garyk: thanks
15:09:24 alaski, yes, I was worried about throwing out the baby with the bath water with this proposal
15:09:29 It will be good to have the link... for all the session etherpads.. I seem to have lost it
15:09:55 https://wiki.openstack.org/wiki/Summit/Icehouse/Etherpads#Nova
15:10:01 the current filter scheduler has some scaling concerns, I don't know that we have to throw it away completely to address them.
15:10:10 I do not see the words "black box" on that index
15:10:19 MikeSpreitzer: thanks
15:10:25 here is the proposal - https://etherpad.openstack.org/p/icehouse-external-scheduler
15:10:57 MikeSpreitzer, that's my interpretation, that's probably not the exact words from the session but I think it describes it better
15:11:22 What garyk posted is Robert Collins' proposal
15:11:38 that's not "black box", that's code refactoring
15:11:44 n0ano: was it about extending the resource tracker ?
15:11:46 MikeSpreitzer: yes, that is correct. it seems to be gaining momentum
15:12:01 My understanding is that the first step will be code moving
15:12:12 Then there will be discussion of how to make it into a service
15:12:33 garyk: agreed, that's the saas goal
15:12:34 garyk: neither of those is "black box", at least as the words are usually construed
15:12:44 :)
15:12:44 bauzas, it was to create a set of constraints that could be fed to an industry standard scheduler code
15:12:51 black box, if I remember about the rethinking scheduler design proposal - it is about the multiple scheduler threads
15:13:19 but can't recollect this being called black box
15:13:23 n0ano: ah, so you talk about this one ?
https://etherpad.openstack.org/p/IcehouseNovaExtensibleSchedulerMetrics
15:13:35 OK, maybe the problem is just bad wording on today's agenda
15:13:38 I think there was concern that a solver scheduler would be a black box
15:14:10 as part of the smart resource placement design session - we talked about Solver Scheduler - a constraint based solver
15:14:25 bauzas, no, that's not it either, let me look
15:14:28 a pluggable "black box" so to say!
15:14:43 Any replaceable module in a system is a black box in that sense, alaski, right? We define its interface, internals are private.
15:15:09 pluggable? plugged to what?
15:15:13 nova?
15:15:14 found it - https://etherpad.openstack.org/p/RethinkingSchedulerDesign
15:15:20 or openstack in general?
15:15:52 MikeSpreitzer: in a sense yes. But with the filter_scheduler it's easy to trace how it made its decision, a solver scheduler was considered a potential black box because there's not that same traceability
15:16:23 it's more about debugging issues when the scheduler doesn't return an answer you expect
15:16:30 ah yes, I remember that remark
15:16:37 exactly.. this issue was raised at the session
15:17:39 traceability may have to be introduced, probably with some logging if it is possible
15:17:50 But I'm not sure what to do with it. Are we to shy away from every computation that is not easy to reproduce in someone's head?
15:18:43 In my opinion, no. But it can't be the only, or default, option for Nova
15:19:00 because...?
15:19:20 because the default is used for gating code changes and traceability is a necessity
15:19:42 what exactly do you mean by "traceability"?
15:20:06 I believe even with the filter scheduler - it is a list of filters
15:20:16 so some filters will fail and log ?
15:20:30 understanding why a scheduling decision was made. If Jenkins fails a gate check because it couldn't schedule an instance, I want to know why
15:20:39 thanks
15:20:52 alaski: we have logs on every filter
15:20:59 can't that help?
15:21:41 In my group's previous work, we developed a replay framework. Problem instances can be logged completely, and replayed into a test harness for debugging purposes.
15:21:57 Essentially, a formalized kind of log that can be replayed.
15:22:14 MikeSpreitzer, but how do you know the exact state that the system was in in order to replay things
15:22:34 The log contains all the relevant information.
15:22:35 toan-tran: I'm not sure if there are logs on every filter, but they can be added. And there is a blueprint for additional logging in the scheduler being worked on
15:23:07 alaski: at least the filter_scheduler says which filter returns which hosts
15:23:34 of course, it's inside the filter that we have to add logging if we need more details
15:23:44 always remembering that logging adds overhead, we're already concerned about scheduler efficiency
15:23:50 we also have error codes, although not very detailed
15:24:03 toan-tran: right. The filter_scheduler is fine, the concern is regarding a potential new scheduler which is based on more complicated solving methods, or possibly even heuristics
15:24:34 and by fine I mean not too bad, it could certainly be better
15:24:41 alaski: agreed
15:24:54 Regardless of decision method, same inputs apply, right?
15:25:25 Would it be OK to have a variable level of logging? Full in the gate, production might be less?
15:25:34 just to be clear.. the idea is not yet to replace Filter scheduler.. provide an additional option for a scheduler driver
15:25:50 MikeSpreitzer, I think that's an absolute requirement
15:25:52 is logging really expensive? I thought the issue was mostly the time to pull the data out of the database?
15:26:06 so basically we need a framework to write the new scheduler, some steps that it must voice the state?
15:26:17 Yathi: yes.
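[Editor's note: the replay framework MikeSpreitzer describes above — log the complete scheduler input in a parseable form, then replay it into a test harness — can be sketched roughly as follows. This is an illustrative sketch only, not code from the work discussed; the record format, the `SCHED_PROBLEM` marker, and the function names are all hypothetical.]

```python
import json
import logging

LOG = logging.getLogger("scheduler.replay")

def log_problem_instance(hosts, request):
    """Record the complete scheduler input as one parseable log record.

    Because the record holds everything the decision depended on (a
    snapshot of candidate-host state plus the placement request), the
    decision can later be reproduced exactly, outside the live system.
    """
    record = {"hosts": hosts, "request": request}
    LOG.info("SCHED_PROBLEM %s", json.dumps(record, sort_keys=True))

def replay(log_line, schedule_fn):
    """Re-run a scheduling decision from a logged problem instance."""
    payload = log_line.split("SCHED_PROBLEM ", 1)[1]
    record = json.loads(payload)
    return schedule_fn(record["hosts"], record["request"])
```

The key property is that the log record is the whole problem, so any scheduler implementation can be fed the same instance for debugging, which addresses the "how do you know the exact state the system was in" question raised above.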
15:26:38 cfriesen, the logs have to be stored somewhere, we're already concerned about DB access, this would just make it worse
15:26:47 MikeSpreitzer: variable logging would be great
15:27:10 why not just stream the logs via syslog?
15:27:20 I think it is about enhancing a decision making engine to be able to clearly log which of the constraints were not satisfied
15:27:50 Yathi: getting a log of complete input is non-trivial
15:27:59 but necessary to replay and explain.
15:28:18 However, note that some serious guys do very extensive logging all the time
15:28:29 cfriesen, possible but one of the ideas is creating multiple schedulers, with multiples a single log point would be helpful although maybe I'm overthinking things
15:28:36 Do I recall correctly that Google logs a lot all the time?
15:29:16 I guess I don't have anything else to add here at this point on the logging aspect
15:30:00 log is not good, we should think about creating info objects
15:30:11 I have experience with IBM products that offer variable level of logging. Our product guys love it. I hate it when called in to debug a customer problem, they always logged too little, so it always starts with "turn up the logging to XXX and then reproduce the problem"
15:30:12 I think we have a blueprint for that
15:30:25 toan-tran, not sure I understand what you mean about objects
15:30:48 toan-tran: https://blueprints.launchpad.net/nova/+spec/record-scheduler-information though it's still under discussion
15:31:08 MikeSpreitzer, but at least that's an option vs. no or minimal logging
15:31:44 yes
15:32:12 What we did at first is to make some of our optional logging have a very precise and parseable format, put all information on scheduler problems in there.
15:32:30 well, one takeaway from this seems to be a consensus that we need to consider logging, especially variable level
15:32:39 Later the product guys got interested in non-optional binary logging of structured data, but I'm not sure how far they have taken it thus far.
15:32:50 I don't know if there is any kind of logging standard in OpenStack, anybody know?
15:33:28 openstack/common/log.py is what everything uses
15:33:29 alaski: this is what I'm talking about: https://blueprints.launchpad.net/nova/+spec/add-missing-notifications
15:34:02 I remember it had more information than the current version
15:34:22 russellb, which I believe puts everything in files on the local machine with no level capability
15:34:54 n0ano: there are level capabilities
15:35:20 and this one: https://blueprints.launchpad.net/nova/+spec/notification-compute-scheduler
15:35:20 alaski, which are settable from configuration files/run time?
15:35:27 and can use syslog
15:35:41 yes, you configure what levels you want logged
15:35:44 and where you want the logs to go
15:35:48 n0ano: you just set it explicitly
15:36:23 sounds like the infrastructure is there then, we just need to make sure all the filters use the logging services properly
15:37:05 And if we want to be able to debug scheduler decision making, "properly" means log all the relevant information at the chosen log level.
15:37:20 MikeSpreitzer, +2
15:37:28 s/+2/+1
15:38:00 Mike: +1
15:38:07 the question is, how do we find "relevant"?
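[Editor's note: the variable-level logging discussed above — DEBUG in the gate, a quieter level in production, set from configuration rather than code — maps onto standard Python logging, which is what OpenStack's logging module wraps. The sketch below is illustrative; the logger name and `apply_filter` helper are hypothetical, not Nova code.]

```python
import logging

# A named logger for scheduler filters; its effective level comes from
# configuration, so the gate can run at DEBUG while production runs at
# WARNING with no code changes.
LOG = logging.getLogger("nova.scheduler.filters")

def apply_filter(filter_name, hosts_in, hosts_out):
    """Log a filter's decision at a level appropriate to its outcome."""
    # Cheap per-filter trace, only emitted when DEBUG is enabled.
    LOG.debug("%s: %d hosts in, %d hosts out",
              filter_name, len(hosts_in), len(hosts_out))
    if not hosts_out:
        # An empty result is exactly the case people want to debug
        # (e.g. a gate failure to schedule), so it is worth logging
        # even at production log levels.
        LOG.warning("%s eliminated all %d candidate hosts",
                    filter_name, len(hosts_in))
```

This also answers the overhead concern raised earlier: when the configured level is above DEBUG, the per-filter trace costs only a level check, while the "no hosts left" case still surfaces.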
15:38:27 I only played with a global logger for the whole project, don't know if we can have a special logger for scheduling things
15:38:48 afaik, the logger is global to nova
15:38:59 bauzas, I would hope we don't need anything special, standard logging services should be fine
15:39:06 the problem of text logging is that the developer of a scheduler can write anything in the log
15:39:19 which is not necessarily meaningful to others
15:39:22 n0ano: then you're fine
15:39:35 toan-tran: that's why I talk about a precisely defined format for the scheduler problem info
15:39:47 Mike: agreed!
15:40:03 should we create a log class for that?
15:40:11 put some structure into what is logged
15:40:56 That's probably a good idea, but I think there's more immediate work before that becomes a concern
15:41:02 I agree
15:41:15 some structure is good as long as there is the freedom to add other things that aren't part of the structure
15:41:43 toan-tran: there is no need for a log class
15:42:08 you just have to explicitly define which logger name you want
15:42:15 I'm feeling that someone needs to create a BP to propose some standardized logging for the current scheduler filters
15:42:16 In my group's work, we have an internal API to the solver, and it has a simple style: input is a whole problem, output is a whole answer. It is pretty easy to do complete logging in that case.
15:43:06 We have not had to worry about alternate solvers or alternate schedulers.
15:43:09 Mike: is it possible to record the state of the system in the log?
15:43:39 MikeSpreitzer, the filter scheduler is kind of like that, input is the set of possible nodes and output is the set of acceptable nodes
15:43:39 Currently we log snapshots of the relevant state info. Alternatively the log could stream updates.
15:44:22 As alaski said, I think we have beaten this horse enough for now.
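[Editor's note: the pattern n0ano describes — the filter scheduler takes the set of possible nodes and produces the set of acceptable nodes — is what makes per-filter traceability natural: each filter narrows the candidate set, and recording the before/after counts per filter shows exactly where an instance became unschedulable. A minimal sketch, with hypothetical filter and function names:]

```python
def run_filters(filters, hosts, request):
    """Pass the candidate host list through each filter in turn.

    Returns the surviving hosts plus a trace of (filter name, hosts in,
    hosts out) tuples, so the final decision can be explained afterwards.
    """
    trace = []
    for f in filters:
        survivors = [h for h in hosts if f(h, request)]
        trace.append((f.__name__, len(hosts), len(survivors)))
        hosts = survivors
    return hosts, trace
```

With a trace like this, the "why couldn't Jenkins schedule an instance" question reduces to finding the first filter whose output count dropped to zero, which is the traceability property alaski asked for.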
15:44:55 Mike +1
15:45:06 agreed, since Yathi is here let's switch to
15:45:15 #topic instance groups
15:45:38 Yathi, do you have an update on this
15:45:58 garyk you want to say something
15:47:07 well, since no one said a word, I have a question :)
15:47:12 no major update as of now. But the plan after the summit was to continue the implementation on a simpler instance group model
15:47:15 looks like garyk got called away
15:47:25 if we intend to make it into nova
15:47:35 do we keep edge & policy?
15:47:41 a flat group model
15:47:55 we do not keep the edge
15:48:04 Yathi: +1
15:48:12 Yathi, I thought there was work needed on the V3 API, is that ongoing
15:48:19 what about policy, we don't have a policy manager either
15:48:24 ?
15:48:38 yeah I believe it is part of the plan.. to complete what was pending from Havana time..
15:49:16 work is needed for V3 API
15:49:44 do you think that will be controversial or should it be straightforward
15:49:52 * n0ano always worries about API changes
15:50:52 I've been playing with the current instance groups CLI and have some comments on usability--where do I send feedback?
15:51:01 we will sync up again with others - garyk, debo and discuss the remaining tasks
15:51:42 cfriesen: I'm just a newbie here, my guess is the mailing list
15:51:57 please send it to - the dev mailer is the best
15:52:02 but you can talk to us now too!
15:52:36 there's a bunch of stuff I ran into...like it would be nice to accept human-readable group names in the commands rather than only the full group UUID
15:52:48 +1 in general on that
15:52:53 and to me it doesn't make sense to have an "instance-group-add-members" command where the member argument is optional
15:53:05 what does that even mean?
:)
15:53:40 me steps back, waiting for someone who designed that API to answer
15:54:07 * MikeSpreitzer will eventually remember to type a slash before a command
15:54:20 sounds like no one wants to admit ownership, might need to ask that one on the dev mailing list
15:54:34 I think it is best to compile an email
15:54:56 okay, will do.
15:55:05 OK, time running down
15:55:09 #topic opens
15:55:21 sorry, i had internet problems.
15:55:31 anybody have any opens they want to raise in the few minutes we have available
15:55:39 instance group updates: have posted scheduler changes. pending api changes - debu will work on these next week
15:55:45 sorry for late update
15:55:50 garyk, NP, we didn't say too many bad things about you :-)
15:55:55 :)
15:56:09 garyk, yeah, that's what we got, pretty much WIP
15:56:40 any other opens
15:56:43 I'd like to discuss SaaS
15:56:58 well, discuss SaaS's discussion :)
15:57:03 you mean the external scheduler ?
15:57:09 yeah
15:57:17 we're running out of time
15:57:19 toan-tran, I would like to discuss it also but we'll need a full session for that
15:57:20 that might need a lot of time..
15:57:32 Yathi, no might, it will take a lot of time.
15:57:42 :)
15:57:44 that's what I'm saying :)
15:57:56 how do we organise the discussion on SaaS?
15:58:02 can we add this item as 1st for next week?
15:58:04 :)
15:58:11 +1
15:58:21 I was thinking there was a separate meeting on that point, non ?
15:58:23 no ?
15:58:36 I don't know where Collins lives
15:58:37 next week, if possible, I'd like to get Boris on board to talk about memcached, that's the most important immediate topic, we can put SaaS as the 2nd priority
15:58:46 toan-tran: he lives in NZ
15:58:49 if he lives in UTC+13
15:59:02 ...
15:59:12 n0ano: sounds good
15:59:14 ok so we need another slot
15:59:19 n0ano: ok, great :)
15:59:25 that's what lifeless proposed
15:59:30 not the scheduler meeting
15:59:36 we'll discuss further next week
15:59:40 tnx everyone
15:59:45 #endmeeting