15:00:49 <n0ano> #startmeeting gantt
15:00:50 <openstack> Meeting started Tue Aug 26 15:00:49 2014 UTC and is due to finish in 60 minutes.  The chair is n0ano. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:51 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:53 <openstack> The meeting name has been set to 'gantt'
15:01:03 <n0ano> anyone here to talk about the scheduler?
15:01:04 <bauzas> \o
15:01:24 <mspreitz> yes
15:02:07 <n0ano> looks like I missed a lively meeting last week but we'll get to that later
15:02:49 <bauzas> n0ano: yeah, last week one was good to discuss
15:03:10 <jaypipes> o/
15:03:19 <n0ano> let's get started, maybe others will join in
15:03:31 <n0ano> #topic resource model for scheduler
15:03:36 <bauzas> sure
15:03:40 <bauzas> sooooo
15:03:48 <n0ano> bauzas, jaypipes you said you were going to look at this, any progress?
15:04:02 <bauzas> n0ano: I thought about the sched split yeah
15:04:18 <bauzas> n0ano: so, maybe let me explain why this action item is there
15:04:32 <bauzas> n0ano: so we could discuss about the progress after that
15:04:56 <bauzas> n0ano: so, basically, the last week discussion was about how bad the scheduler was about updating stats
15:05:47 <bauzas> atm, resources are passed to the sched every 60 secs by writing some JSON fields called "resources" into the compute_nodes DB table
15:06:19 <bauzas> long story short, we thought it was needed to provide a proper API for scheduler updates
15:06:42 <bauzas> so, the proposal is re: scheduler-lib patch and what is passed now
15:07:02 <bauzas> the idea is to make use of the next method that will be provided thanks to the patch
15:07:05 <bauzas> hold on
15:07:16 <bauzas> https://review.openstack.org/82778
15:07:29 <bauzas> https://review.openstack.org/82778 is the API proposal for scheduler updates
15:07:53 <bauzas> so, here we provide a JSON field called 'values'
15:08:45 <bauzas> based on last week discussion, we identified the need to keep that method but provide high-level Objects instead of these JSON blobs
15:08:51 <bauzas> so the plan is
15:08:56 <bauzas> 1/ merge that patch
15:09:23 <bauzas> 2/ provide a change for providing ComputeNode object instead of values JSON field into that method
15:09:50 <bauzas> that requires some work on ComputeNode object, ie. making sure that it's correct
15:10:04 <bauzas> the main pain point is the Service FK we have on that object
15:10:39 <bauzas> hence I owned a bug created by jaypipes for cleaning up CN : https://bugs.launchpad.net/nova/+bug/1357491
15:10:40 <uvirtbot> Launchpad bug 1357491 in nova "Detach service from compute_node" [Wishlist,Triaged]
15:10:50 <bauzas> sooooo
15:11:20 <n0ano> a couple of issues come to mind - 1) does this require a change to the DB (which currently holds that JSON string) and 2) how extensible is the new method (I know of changes bubbling underneath related to enhanced compute node stats)
15:11:34 <bauzas> once we have ComputeNode passed instead of an arbitrary JSON field, we should think about how to provide other objects if needed for filters
15:12:03 <bauzas> n0ano: about 1/
15:12:24 <bauzas> n0ano: there should be a change about FK on service_id which will be deleted
15:12:31 <bauzas> n0ano: apart from that, no changes on DB
15:13:08 <bauzas> n0ano: because instead of calling db.compute_update, we would issue compute_obj.save()
15:13:13 <bauzas> which is by far better
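A minimal sketch of the difference being discussed here, purely illustrative: the class, its fields, and the save() behaviour below are assumptions for the example, not Nova's actual ComputeNode object or object layer.

```python
# Illustrative sketch only: field names and save() behaviour are assumptions,
# not Nova's real ComputeNode object.

class ComputeNode:
    """Typed, versioned stand-in for the raw JSON 'values' blob."""
    VERSION = '1.0'
    fields = ('memory_mb', 'vcpus')

    def __init__(self, **kwargs):
        for name in self.fields:
            # typed fields fail fast on bad/missing input instead of letting
            # an arbitrary JSON blob corrupt the stats silently
            setattr(self, name, int(kwargs[name]))

    def save(self):
        # the real object layer would persist this to compute_nodes;
        # here we just return what would be written
        return {name: getattr(self, name) for name in self.fields}

# Before: an arbitrary JSON string; a typo like "vcpuz" slips through unnoticed.
values = '{"memory_mb": 2048, "vcpuz": 2}'

# After: construction raises up front if a required field is missing.
node = ComputeNode(memory_mb=2048, vcpus=2)
node.save()  # {'memory_mb': 2048, 'vcpus': 2}
```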
15:13:18 <n0ano> so, rather than passing a JSON string in `values' we pass a ComputeNode object (that contains that same JSON string)
15:13:48 <bauzas> n0ano: the main difference being that's not arbitrary fields but versioned and typed ones
15:14:05 <bauzas> jaypipes: your thoughts on that ?
15:14:48 <n0ano> I'm not objecting actually, it seems we're just making the API heavier weight for minimal gain but it still works and if everyone thinks it's better that's fine
15:15:59 <bauzas> n0ano: yeah, the problem is that we discovered some problems with the current situation
15:16:35 <n0ano> I think both my points don't really apply, the DB will be the same and you can extend things by changing the `resources' string in the ComputeNode object, different way of doing the same thing
15:16:37 <bauzas> n0ano: for example, with NUMA patches, ndipanov had to convert back from JSON to an object, it was a pure hack
15:17:26 <bauzas> n0ano: the extent of ComputeNode is yet to be discussed
15:17:38 <jaypipes> bauzas: sorry, on phone...
15:18:09 <n0ano> jaypipes, can you scroll back, do we match your thinking?
15:18:11 <bauzas> n0ano: from my perspective, we should say that ComputeNode could have some dependent classes
15:18:41 <bauzas> n0ano: one other point while jaypipes is on the phone, we discussed about the claim thing
15:18:59 <n0ano> just remembering that we have to consider how any changes to the ComputeNode object will be reflected in the compute_node table in the DB
15:19:03 <bauzas> n0ano: wrt a very good paper I recommend, I'm pro having an in-scheduler claim system
15:19:18 <bauzas> n0ano: that's already tied
15:19:39 <bauzas> n0ano: have you seen the link I provided about this paper, the Omega one ?
15:19:50 <n0ano> that was my concern, change the ComputeNode table implies changing the DB
15:20:05 <n0ano> haven't read that yet, I saw the link, I'll read it
15:20:33 <n0ano> in general I agree, I think the scheduler is the right central place to track resources
15:21:06 <bauzas> n0ano: http://eurosys2013.tudos.org/wp-content/uploads/2013/paper/Schwarzkopf.pdf
15:21:42 <bauzas> n0ano: that one is basically saying that an optimistic scheduler with retry features is better than a divide-and-conquer scheduler
15:22:00 <bauzas> n0ano: in terms of scalability
15:22:21 <n0ano> well, I thought we currently `had` an optimistic with retries :-)
15:22:44 <bauzas> n0ano: right but with a slight difference about the claiming thing
15:22:57 <bauzas> n0ano: in Omega, claiming is done on a transactional way
15:23:20 <bauzas> n0ano: here that's a 2-phase commit
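A rough sketch of the Omega-style optimistic claim bauzas is contrasting with 2-phase commit; the generation-counter scheme and all names below are illustrative assumptions, not code from the Omega paper or from Nova.

```python
class CellState:
    """Shared-state view of one node's free memory; 'generation' is bumped
    on every commit so a racing writer can be detected (assumed scheme)."""
    def __init__(self, free_mb):
        self.free_mb = free_mb
        self.generation = 0

def optimistic_claim(cell, want_mb, retries=3):
    """Read, decide, then commit only if nothing changed underneath us;
    on conflict, refresh and retry instead of holding a reservation open
    across two phases."""
    for _ in range(retries):
        seen_gen, seen_free = cell.generation, cell.free_mb
        if seen_free < want_mb:
            return False                     # genuinely out of space
        # ... another scheduler may commit here, bumping the generation ...
        if cell.generation == seen_gen:      # transactional commit
            cell.free_mb = seen_free - want_mb
            cell.generation += 1
            return True
        # lost the race: loop and re-read fresh state
    return False
```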
15:23:48 <bauzas> n0ano: that said, I think the most important problem for the split is about the API
15:24:00 <bauzas> n0ano: hence the work to provide clean interfaces
15:24:05 <n0ano> indeed +1
15:24:14 <bauzas> n0ano: kind of a follow-up of scheduler-lib
15:24:34 <n0ano> seems to me that just changing to the ComputeNode object shouldn't be that big a change
15:24:40 <bauzas> n0ano: think so too
15:24:59 <bauzas> n0ano: tbh, that's even part of the move-to-objects effort
15:25:21 <n0ano> so, back to mechanics, the plan is to push the current sched-lib patch and then change it to use an object - right?
15:25:26 <bauzas> +1
15:25:53 <bauzas> that still doesn't mean we agreed on how to update filters for isolate-sched-db :)
15:26:19 <bauzas> ie. I think we need to make use of that API instead of the ERT
15:26:28 <n0ano> right about now just getting the sched-lib resolved seems like a major accomplishment :-)
15:26:51 <n0ano> let's segue into the next topic
15:26:56 <bauzas> jaypipes was having some concerns about the naming of such a method :)
15:27:02 <bauzas> +1
15:27:06 <n0ano> #topic forklift status
15:27:19 <bauzas> (at least once jaypipes is freed from his phone :) )
15:27:23 <bauzas> soooo
15:27:32 <bauzas> I think we covered the first bit of the split
15:27:33 <n0ano> seems to me the isolate-sched-db is hung up on the ERT discussion, is there anyway to resolve that?
15:28:07 <bauzas> n0ano: speaking of that, I think we should not rely on ERT for providing that
15:28:18 <bauzas> n0ano: but on scheduler-lib and objects instead
15:28:41 <n0ano> if there's a way to be independent from ERT I'm +2 for that
15:29:00 <bauzas> n0ano: the way has to be designed yet still :)
15:29:49 <n0ano> well, we kind of have to, what do we do if they decide to revert ERT out?
15:30:45 <n0ano> the other option is we code to the current interfaces (e.g. we use ERT) and only change if ERT is changed
15:31:09 <bauzas> n0ano: well, I think we need to think what's a resource
15:31:23 <bauzas> n0ano: here I'm saying that a resource is a ComputeNode object
15:31:57 <bauzas> n0ano: if we want to claim things, that should be on the ComputeNode object too
15:32:26 <n0ano> bauzas, that's kind of a high level view, I would call resources many of the things encapsulated by the ComputeNode object
15:32:51 <bauzas> n0ano: that's where I disagree
15:32:54 <n0ano> I guess the question is how coarse can the resources be
15:33:15 <bauzas> n0ano: IMHO, we should provide a claim class per object
15:33:39 <bauzas> n0ano: ie. "I want to claim usage for ComputeNode"
15:33:56 <bauzas> n0ano: but I can also "claim usage for an Aggregate"
15:34:37 <n0ano> but you don't claim the entire ComputeNode, I want to claim 2G of mem from the node and 2 vCPUs and so on, hence you need a finer granularity
15:34:55 <bauzas> n0ano: the idea behind that is that the computation of the usage is done on the object itself, so it can be shared with RT until we give that to the scheduler
15:35:32 <n0ano> not following, how do I claim that 2G of mem
15:36:31 <bauzas> n0ano: well, you're probably right, it would be a 1:N dependency
15:37:04 <bauzas> n0ano: ie. a ComputeNode object could have a ClaimCPU object, a ClaimMemory etc.
15:37:23 <n0ano> claiming the ComputeNode object is simpler so I'd accept it as a start but, ultimately, I think we'll want finer control
15:38:11 <bauzas> n0ano: well, the outcome of this is to have a compute_obj.cpu.claim(usage) method but you get the idea
15:38:11 <mspreitz> n0ano: finer in what way?
15:38:54 * n0ano screams at X window, let my keyboard talk :-)
15:39:37 <n0ano> mspreitz, rather than claiming an entire compute node object claim 2G from that node and 2 vCPUs and so on
15:39:57 <mspreitz> I thought that's what bauzas is saying
15:40:18 <bauzas> n0ano: to be precise, I don't like the word "claim"
15:40:43 <bauzas> n0ano: I prefer compute_obj.make_use_of()
15:41:02 <mspreitz> bauzas: who quantifies how much usage?
15:41:09 <bauzas> so what you "claim" is a subset of the resource itself
15:41:34 <bauzas> mspreitz: atm, that's the compute node, based on request_spec
15:41:57 <bauzas> mspreitz: it will probably be the scheduler wrt request_spec in the near future
15:42:17 <mspreitz> how would the scheduler modulate what is in the request spec?
15:42:22 <n0ano> bauzas, so the compute node would call compute_obj.make_use_of() reserve resources, is that the idea
15:42:35 <bauzas> n0ano: that's from my mind, correct
15:42:37 <n0ano> bauzas, and then that would have to be sent to the scheduler via an API
15:43:09 <bauzas> n0ano: correct too, until scheduler calls directly that method
15:43:14 <n0ano> bauzas, and would you reserve multiple resources with one call or have to make a separate call for each resource
15:43:51 <bauzas> n0ano: well, you go into the details where that still WIP in my mind :)
15:44:10 <n0ano> sorry, just doing a mind dump here :-)
15:44:16 <bauzas> I'm just thinking of aggregates and instances here
15:44:43 <bauzas> or NUMATopology
15:44:44 <n0ano> there are details to be worked out but, as long as the ability to reserve specific resources is there, I'm OK with it
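A toy sketch of the compute_obj.cpu.claim(usage) shape being floated above; the class names and 1:N layout are assumptions for illustration, not an agreed design.

```python
class ResourceClaim:
    """One claimable resource dimension (memory, vCPUs, ...)."""
    def __init__(self, total):
        self.total = total
        self.used = 0

    def claim(self, amount):
        if self.used + amount > self.total:
            raise ValueError('insufficient resources')
        self.used += amount

class ComputeNode:
    """1:N layout: the node owns one claim object per resource kind, so a
    caller claims '2G of mem and 2 vCPUs' rather than the whole node."""
    def __init__(self, memory_mb, vcpus):
        self.memory = ResourceClaim(memory_mb)
        self.cpu = ResourceClaim(vcpus)

node = ComputeNode(memory_mb=8192, vcpus=8)
node.memory.claim(2048)  # claim 2G of mem from the node...
node.cpu.claim(2)        # ...and 2 vCPUs, not the entire ComputeNode
```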
15:44:51 <mspreitz> those of us who want to make joint decisions also want to make joint claims
15:45:25 <n0ano> mspreitz, hence my question about whether multiple calls are needed
15:45:39 <mspreitz> right, I did not see a clear answer
15:45:54 <bauzas> frankly, I don't think it's here yet
15:45:56 <mspreitz> I was hoping for an affirmation that this is a goal
15:46:08 <n0ano> mspreitz, I don't think we have one yet, this is just bauzas thinking for the future
15:46:29 <bauzas> jaypipes: still otp ?
15:46:53 <n0ano> mspreitz, I agree with the goal of supporting joint decisions, I don't want to do anything that would preclude that
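One way the "joint claims" mspreitz asks about could look, sketched under the assumption of simple in-memory pools (all names here are hypothetical): either every resource in the request is reserved, or none is.

```python
class Pool:
    """Minimal claimable pool; purely illustrative."""
    def __init__(self, total):
        self.total, self.used = total, 0

    def claim(self, amount):
        if self.used + amount > self.total:
            raise ValueError('insufficient')
        self.used += amount

    def release(self, amount):
        self.used -= amount

def joint_claim(requests):
    """All-or-nothing claim over several (pool, amount) pairs, so a joint
    scheduling decision never ends up half-applied."""
    granted = []
    for pool, amount in requests:
        try:
            pool.claim(amount)
        except ValueError:
            for p, a in granted:   # roll back the partial claims
                p.release(a)
            return False
        granted.append((pool, amount))
    return True
```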
15:48:27 <n0ano> well, back to immediate concerns, how do we proceed with the isolate-sched-db?
15:49:46 <bauzas> n0ano: I think we still need to see how community is thinking about ERT
15:49:51 <bauzas> n0ano: and leave the patches there
15:50:15 <bauzas> n0ano: but in the mean time, I need to carry on the move to ComputeNode for updating stats and think about the alternative
15:50:51 <n0ano> OK (maybe), I don't like treading water but I guess we can, hopefully the ERT will be decided soon (it better be)
15:50:57 <bauzas> n0ano: anyway, the spec is not merged so we can't say "eh, that was validated before so that needs to be there"
15:51:22 <bauzas> n0ano: PaulMurray is still on PTO until end of this week AFAIK
15:51:32 <n0ano> bauzas, yeah, I rattled some cages but didn't get a response, at least it hasn't been rejected
15:52:01 <bauzas> n0ano: you can voice your opinion there
15:52:04 <n0ano> really?, I was hoping ERT would be decided this week, now we have to wait until next week, sigh
15:52:12 <bauzas> https://review.openstack.org/115218 Revert patch for ERT
15:52:50 <bauzas> yay, that's the price to pay for having a dependency on such a new feature :)
15:52:55 <n0ano> yeah but nothing is going to happen until Paul gets back, that's the important thing
15:53:00 <bauzas> +1
15:53:04 <n0ano> sigh
15:53:06 <n0ano> moving on
15:53:09 <bauzas> n0ano: hence my work on ComputeNode
15:53:11 <n0ano> #topic opens
15:53:19 <n0ano> anything new anyone?
15:53:32 <bauzas> I'll probably be on PTO end of this week
15:53:43 <mspreitz> I think there's a flaky CI
15:53:49 <bauzas> and maybe the beginning of next week
15:54:01 <bauzas> mspreitz: ie. ?
15:54:05 <n0ano> isn't there a bauzas 2.0 scheduled soon :-)
15:54:14 <bauzas> n0ano: bauzas 3.0 tbp
15:54:20 <mspreitz> check-tempest-dsvm-full
15:54:34 <bauzas> n0ano: coming in theaters end of this week
15:54:45 <n0ano> congratulations and good luck
15:55:00 <n0ano> mspreitz, I would imagine that will be a topic for the nova meeting this week
15:55:09 <mspreitz> ok, thanks
15:55:30 <bauzas> mspreitz: well, you can at least check if a bug has been filed
15:55:38 <bauzas> and create it if not
15:55:42 <mspreitz> yeah, haven't had a chance to do that yet
15:55:54 <mspreitz> hope to get to it
15:55:56 <mspreitz> soon
15:56:05 <bauzas> mspreitz: that's the most important thing, because it needs to be categorized
15:56:25 <bauzas> so, rechecks can be done on that bug number and that could get a trend
15:56:41 <bauzas> mspreitz: you can also try logstash to see the frequency of your failure
15:57:09 <mspreitz> bauzas: oh, what's that?
15:57:22 <n0ano> I pretty much just blindly recheck once, only if I get a second failure on code I think is good do I look for a CI issue
15:58:17 <bauzas> dammit, I would recommend a training on the next Summit for you :)
15:58:19 <bauzas> https://wiki.openstack.org/wiki/ElasticRecheck
15:58:43 <bauzas> and in particular
15:58:43 <bauzas> https://wiki.openstack.org/wiki/GerritJenkinsGit#Test_Failures
15:58:51 <mspreitz> thanks, I'll read that again
15:59:18 <n0ano> OK, top of the hour, I'll thank everyone, talk again next week
15:59:22 <n0ano> #endmeeting