14:59:45 #startmeeting scheduler
14:59:46 Meeting started Tue Jun 11 14:59:45 2013 UTC. The chair is n0ano. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:59:47 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:59:50 The meeting name has been set to 'scheduler'
15:00:01 show of hands, anyone here for the scheduler meeting?
15:01:13 hi guys
15:02:38 hmm, slow start today, I'm getting a lack of enthusiasm from the crowd :-)
15:03:05 I'm here - what do you want to talk about?
15:03:29 I was hoping to talk about the scaling issues that jog0 brought up last week.
15:04:04 Hi, I'm new to this group but I'll be joining in
15:04:23 pmurray, NP, welcome
15:04:42 Did jog0 have specific issues that he'd seen - or was it a general question?
15:04:55 #topic scheduler scaling
15:05:15 unfortunately, he brought it up in the last 5 min. so we don't have a lot of detail...
15:05:56 Are you here jog0?
15:05:59 the basic issue was BlueHost created a 16K node cluster, discovered the scheduler was not working and removed it in favor of a totally random scheduler
15:06:48 BlueHost have a very specific use case though - they are in effect a trad. hosting company, and so they can in effect hand-place their VMs
15:06:51 I don't believe they did a thorough analysis of what was wrong, but the guess would be the scheduler dealing with all the compute node updates.
15:06:56 this is interesting. it will be great if we can get more details on the bottleneck
15:07:35 senhuang, that was my thought, I'd like to know what is wrong to see if there's an implementation issue or something needs to be re-architected
15:07:44 So they didn't need a rich scheduler. I didn't get the impression that they spent long trying to work out the issues
15:07:55 n0ano: yes. agreed
15:08:00 Most of their effort went into DB access
15:08:24 PhilDay, I'm not that concerned with BlueHost's specific use case, if there is a scaling issue I'd like to analyze it and see what can be done.
15:08:47 So I think it would be wrong to conclude from BlueHost that there is a specific scale issue
15:09:41 PhilDay, possibly, but it is a data point, do we know of any more traditional cloud use cases that have scaled beyond what the scheduler can handle?
15:09:42 PhilDay: what is special about a trad. hosting company in terms of scheduling?
15:10:04 I did find one issue this week that I'm working on a fix for - which is that some filters really don't need to run for each instance in a request - e.g. no need to evaluate the AZ filter 100 times for 100 instances - esp. as it currently makes a db query for each host
15:10:37 So I'm looking at making filters able to declare if they need to be run for each instance or just once
15:11:17 compute_filter is another that it doesn't make sense to run multiple times
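For context on the change PhilDay describes at 15:10:37, here is a minimal sketch of what such a declaration could look like. It is illustrative only - the names (BaseHostFilter, run_filter_once_per_request, run_filter_for_index, and the simplified AZ check) are assumptions for this sketch, not a quote of Nova's code:

    # A filter whose answer cannot change between instances of the same
    # request declares run_filter_once_per_request = True; the scheduler
    # then calls it only for the first instance and reuses the result,
    # avoiding e.g. 100 AZ evaluations (each with a DB query per host)
    # for a 100-instance request.

    class BaseHostFilter(object):
        run_filter_once_per_request = False

        def host_passes(self, host_state, filter_properties):
            raise NotImplementedError()

        def run_filter_for_index(self, index):
            """True if this filter must run for the index-th instance
            of the request; run-once filters run only for the first."""
            return not (self.run_filter_once_per_request and index > 0)

    class AvailabilityZoneFilter(BaseHostFilter):
        # The requested AZ is fixed for the whole request, so one
        # evaluation per host covers every instance in it.
        run_filter_once_per_request = True

        def host_passes(self, host_state, filter_properties):
            spec = filter_properties.get('request_spec', {})
            props = spec.get('instance_properties', {})
            requested = props.get('availability_zone')
            return (requested is None or
                    requested == getattr(host_state, 'availability_zone', None))

The filter loop would consult run_filter_for_index(i) before invoking host_passes for each instance index, which is how the repeated per-instance DB queries disappear for run-once filters.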
15:11:18 PhilDay, good idea, but I'd be more concerned about compute node updates, seems like that would be an ongoing overhead that could cause scaling issues.
15:12:00 also, I thought there was an effort to create a DB-free compute node, is that still a goal or have we dropped that idea?
15:12:42 As I understand it there are two update paths (not sure why). The hosts send updates on capabilities via messages to the scheduler, but the resource counts are still updated via the DB.
15:13:10 we should really consolidate on one or the other, two paths seem silly
15:13:29 Not clear to me that there is value in the capability update messages as they stand, as they are pretty much fixed data. You can filter the rate at which they send the updates.
15:13:32 hi, sorry for joining late
15:13:44 also, not sure why capabilities are periodically sent since they are static
15:13:49 PhilDay: maybe the update is another way of heartbeat?
15:13:53 It's two different sets of data: capabilities and capacity
15:14:08 I'd prefer to remove the DB update and put everything in the message to the scheduler
15:14:09 Maybe, but it's not used as such as far as I can see
15:14:28 agreed. basically qualitative and quantitative capabilities
15:15:26 If you do it all in messages then you need some way for a scheduler to know at start-up when it has all of the data
15:15:27 but it does create an obvious scale issue, having all compute nodes send a message with static data to the scheduler seems a little silly
15:16:06 PhilDay, how would that be different from getting the same data (possibly incomplete) from the DB?
15:16:19 I could be wrong - but that was my reading of the code. For sure the host manager view of capacity is read from the DB at the start of each request
15:17:15 wouldn't be that hard to have the scheduler ignore nodes that haven't reported capacity yet
15:17:53 But when you're trying to stack to the most loaded host you could get some very wrong results during that start-up stage
15:18:35 At least with the DB you get the full scope of hosts, even if the data is stale. And stale data is handled by the retry
15:18:41 PhilDay, but is a non-optimal scheduling decision, only during start-up, that big a problem?
15:19:14 Depends on your use case I guess ;-) I'm sure there will be someone it's a problem for
15:19:56 I think we have to be wary of trying to design out scale issues that we don't know for sure exist
15:20:09 I don't know, stale (i.e. incorrect) data in some sense is even worse to me than no data at all.
15:21:18 bottom line, 2 mechanisms (message & DB) seem wrong, we should pick one and use that
15:22:58 Probably need to start with a post on openstack-dev to see if someone can explain why capabilities are sent by messages
15:23:14 PhilDay, good idea
15:23:37 #action n0ano to start thread on openstack-dev about messages to scheduler
15:24:16 Perhaps the generic host state BP is the right place to mop up any changes around this?
15:25:02 potentially, I'm interested in the area so I can look at that BP and see what's appropriate, do you have a specific link to it?
15:25:21 https://blueprints.launchpad.net/nova/+spec/generic-host-state-for-scheduler
15:25:34 tnx, I'll check it out
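To make the 15:15:26-15:18:41 trade-off concrete, here is a minimal sketch of the "messages only, ignore nodes that haven't reported capacity yet" approach suggested at 15:17:15. All names here (MessageDrivenHostManager, handle_capacity_message, the 120-second staleness window) are hypothetical, not taken from Nova:

    import time

    # The scheduler keeps the last capacity report per host and skips
    # hosts whose data is missing or older than a staleness window,
    # instead of reading capacity from the DB at the start of each
    # request. Purely a sketch of the idea discussed above.

    STALE_AFTER = 120.0  # seconds without an update before a host is ignored

    class HostState(object):
        def __init__(self, name):
            self.name = name
            self.free_ram_mb = 0
            self.last_seen = None  # None => never reported since scheduler start

        def update(self, capacity):
            self.free_ram_mb = capacity['free_ram_mb']
            self.last_seen = time.time()

    class MessageDrivenHostManager(object):
        def __init__(self):
            self.hosts = {}

        def handle_capacity_message(self, host_name, capacity):
            # Called for each periodic capacity message from a compute node.
            self.hosts.setdefault(host_name, HostState(host_name)).update(capacity)

        def schedulable_hosts(self):
            now = time.time()
            for host in self.hosts.values():
                if host.last_seen is None:              # never reported: skip at start-up
                    continue
                if now - host.last_seen > STALE_AFTER:  # report too old: skip
                    continue
                yield host

PhilDay's start-up concern still applies to this scheme: until every host has reported once, a "stack to the most loaded host" policy sees only a partial host list, whereas the DB path gives the full (if stale) list immediately, with staleness handled by the retry.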
15:26:15 we've talked about compute node capacity updates, are there any other obvious scaling issues in the scheduler we can think of?
15:28:14 then, without any empirical data (like what BlueHost was seeing), we'll have to accept what the scheduler is doing so far.
15:29:04 #topic DB free compute node
15:29:18 * n0ano this is fun, being the chair means I can set the topics :-)
15:29:42 I thought there was a goal for this at one point in time, is that still true?
15:31:45 hmm, hearing silence, I guess I'll have to bring this up on the mailing list
15:31:53 #topic opens
15:32:05 Anyone have anything new they want to bring up today?
15:32:11 join #openstack-cinder
15:32:39 haomaiwang, forgot the '/' on that join :-)
15:32:51 n0ano: the instance groups work is coming along nicely and it would be nice if we can get some help with the reviews
15:33:22 garyk, sure, have you got a pointer I can add to ping people?
15:33:47 n0ano: that would be great - https://blueprints.launchpad.net/openstack/?searchtext=instance-group-api-extension . Hopefully by the end of the week we'll have the CLI support too
15:34:23 all - if you've got the time, try and give a review on this
15:34:53 n0ano: thanks
15:36:09 I'm hearing silence so I think we'll close a little early this week (more time for reviews :-)
15:36:44 I need to run to swap the babysitter. sorry
15:37:07 OK, tnx everyone and we'll type at each other next week
15:37:12 #endmeeting