14:59:50 #startmeeting gantt
14:59:51 Meeting started Tue Feb 17 14:59:50 2015 UTC and is due to finish in 60 minutes. The chair is bauzas. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:59:52 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:59:55 The meeting name has been set to 'gantt'
15:00:04 o/
15:00:09 \o
15:00:16 hi, stepping up this time as n0ano can't make it, who's there ?
15:01:50 woah, please all don't speak at the same time :)
15:02:16 alex_xu and I were way ahead of you
15:02:26 o/
15:02:38 ;à
15:02:40 :)
15:02:46 :)
15:03:02 so, let's start, so people could come
15:03:46 #topic Remove direct nova DB/API access by Scheduler Filters
15:04:10 So any ideas on the best way to represent version of the compute node?
15:04:13 so, https://review.opernstack.org/138444/ is updated very often, thanks edleafe
15:04:23 Since RPC version doesn't seem to fit
15:04:45 bauzas: I have another update waiting to go, after the results of this meeting
15:05:14 edleafe: here I was thinking an x.y.z version is good
15:05:38 it would seem that rolling compute node updates would have already been a problem
15:05:47 and that someone would have created a solution
15:05:58 edleafe: actually, the problem is that the scheduler is still really tied to Nova
15:06:09 bauzas: true
15:06:10 edleafe: but longer term, that would be an API
15:06:28 but in general, compute nodes are rolled out in bunches, not all at once
15:06:36 edleafe: so considering that's a separate API, in general, clients provide backwards compatibility by discovering versions
15:06:44 so the issue of differing behavior should have been a problem at some time, no?
15:07:14 edleafe: here, on a longer term, we should imagine a Gantt client able to discover if the Gantt server is having the compute capability
15:07:45 edleafe: at the moment, as we don't have a discovery mechanism, that's just something we pass to the Scheduler
15:07:57 the reverse is also true
15:08:16 edleafe: reverse of what ?
15:08:31 if the gantt server is relying on outside entities behaving a certain way, it needs to be able to have a way to verify that
15:08:42 edleafe: that, I disagree :)
15:08:50 i.e., gantt server discovering its clients
15:08:54 edleafe: that's all about capabilities of a given API
15:09:18 edleafe: so, if Gantt wants to know Nova capabilities, it will run the Nova client which provides that backwards compatibility
15:09:32 so you're saying that the gantt server will never be dependent on anything outside of itself?
15:09:44 edleafe: all the compatibility checks are usually done by the clients
15:09:53 edleafe: I'm not saying that :)
15:10:21 edleafe: I'm saying that if Gantt has to depend on something else, it will use the "something else" client library for knowing the "something else" capabilities
15:10:30 bauzas: we're not at a pure client/server relationship
15:10:51 we're relying on individual compute nodes as the source of truth
15:11:00 edleafe: right, and that comes up to my point : that's just because the Scheduler is really tied
15:11:01 not the compute api
15:11:12 understood
15:11:15 edleafe: right, and it won't change
15:11:33 edleafe: meaning that the compute nodes are running a scheduler client
15:11:38 that's what I'm trying to figure out how to deal with. Not the ideal situation in the future
15:11:49 edleafe: so, we can just consider that the scheduler client is having a version
15:12:20 edleafe: as it's already the case for Juno, we know that all updates are going thru the client
15:12:38 edleafe: so bumping a client version seems reasonable
15:12:57 bauzas: bumping the scheduler client version, yes
15:13:02 but that doesn't help us here
15:13:04 edleafe: ie. computes provide their stats thru the client, the client is adding a version number to those states
15:13:06 stats
15:13:28 edleafe: so we keep the release tagging by the scheduler
15:13:36 it's adding the same version number to every stats report
15:13:43 edleafe: incorrect
15:13:56 ??
15:14:00 edleafe: computes have different scheduler client versions
15:14:14 edleafe: because the code is run by the compute node
15:14:16 I'm confused about what we expect from Gantt in the future: Gantt polls data and nova also pushes data to gantt
15:14:43 we want to poll data or push data?
15:14:48 alex_xu: the direction is very clear : Gantt won't *poll* data unless exceptional circumstances
15:15:08 alex_xu: in the future, gantt will own the data
15:15:08 alex_xu: Computes (or others) will push data to Gantt
15:15:21 So we ask compute nodes to write their scheduler client version to DB on startup; then any node sans version which hasn't sent us a message we assume is old?
15:15:33 lxsli: you got it
15:15:35 lxsli: no, we don't even need the db then
15:15:45 edleafe: no, we still need DB
15:15:46 :D
15:16:00 edleafe: because compute updates are going thru the DB now
15:16:03 bauzas: but we define an interface for syncing instance info. We call gantt to tell it which instances need updating, then we let gantt poll the updated instances
15:16:04 bauzas: can't the scheduler keep track when it gets stats reports?
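bauzas's point (15:13:04 to 15:14:14) is that the version comes from the client code each compute node runs, not from the scheduler, so hosts on different releases report different versions. A minimal sketch of that idea, assuming a hypothetical `SchedulerClient` with an `update_resource_stats(host, stats)` method and a `FakeTransport` standing in for the conductor/DB path (neither is Nova's actual API):

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

# Hypothetical version constant shipped with the client code; each compute
# node reports whichever version of the client it is actually running.
SCHEDULER_CLIENT_VERSION = "1.2.4"


@dataclass
class FakeTransport:
    """Stand-in for the conductor path; just records every payload sent."""
    sent: List[Dict[str, Any]] = field(default_factory=list)

    def send(self, payload: Dict[str, Any]) -> None:
        self.sent.append(payload)


class SchedulerClient:
    """Hypothetical scheduler client running on a compute node."""

    def __init__(self, transport: FakeTransport,
                 version: str = SCHEDULER_CLIENT_VERSION) -> None:
        self.transport = transport
        self.version = version

    def update_resource_stats(self, host: str, stats: Dict[str, Any]) -> None:
        # Copy the stats so the caller's dict is not mutated, then tag the
        # report with the version of the client that produced it.
        payload = {"host": host, "stats": dict(stats),
                   "client_version": self.version}
        self.transport.send(payload)


# Two hosts running different client releases report different versions.
transport = FakeTransport()
SchedulerClient(transport, "1.2.3").update_resource_stats("hostA", {"vcpus_used": 2})
SchedulerClient(transport).update_resource_stats("hostB", {"vcpus_used": 1})
```

Because the version travels with every report, the receiving side never has to guess which release a given host is on.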
15:16:30 Host A has client version 1.2.3, host B has 1.2.4
15:16:30 edleafe: that's an async process
15:16:40 bauzas: yes
15:16:48 edleafe: sched client updates the conductor which updates DB
15:16:52 edleafe: we need to know whether a node is old on startup, before it gets a chance to update us
15:16:54 and until a host reports a minimal version, we assume it is old
15:17:00 that's why we still need the DB afaik
15:17:17 lxsli: why do we need to know that at startup?
15:17:32 if it's old, it won't be sending updates
15:17:35 (for instances)
15:17:57 because a compute node doesn't message us until something changes on it and we need to be able to schedule to it immediately
15:18:12 when the time comes to add instance info to HostState objects, unless we've seen a minimal version for that host, we'll grab the InstanceList ourselves
15:18:18 edleafe: so you're just saying that we should just consider that if no RPC calls are coming from a host, then the host is old - that's cautious :
15:18:33 bauzas: sort of
15:18:43 we get the initial InstanceList from the DB, we want that to include whether the node is new so we don't do DB queries for new nodes
15:18:44 bauzas: I'm saying that the compute nodes are sending stats regularly
15:18:54 edleafe: I was just seeing that computes periodically update stats to DB using the client, we can tag this client
15:18:56 edleafe: not regularly - on change
15:19:01 lxsli: +1
15:19:13 bauzas: once we see a minimal version for the client, we know that we are getting instance changes
15:19:28 edleafe: that's even better if we tag those stats - just because we're adding a version
15:20:42 edleafe: anyway, the idea is the scheduler version - maybe the sync method (which I still think is a bad name, but anyway...) can just report that version
15:20:48 lxsli: if the compute node is new, it will also be sending syncs periodically
15:20:58 lxsli: so we'll know if we missed something
15:21:02 edleafe: ahhh the sync - OK, that can work
15:21:18 gosh, I really dislike the 'sync' word :/
15:21:44 sanity check? :)
15:21:44 bauzas: it's better than 'check_for_same_uuids' :)
15:22:09 bauzas: or 'are_we_in_sync'
15:22:27 OK to sum up - so we assume any node is old unless we've had a sanity check from it, and the scheduler client version contained in that is new enough
15:22:28 edleafe: anyway, I don't want to nitpick
15:22:30 lxsli: gut_check
15:22:45 lxsli: that's what I'm thinking
15:22:47 (where 'new enough' is always true right now)
15:23:08 That works for me... we could do a few extra queries in the first minute, but ^^
15:23:35 edleafe: that could work - ie. the 'sync' method is just "pass context to the scheduler, incl. version"
15:23:45 lxsli: yes, the startup will always be a little crazy until things settle down and we know what we're dealing with
15:23:57 bauzas: yep
15:24:10 +1 on it, I don't want to take too much time on that
15:24:21 +1 woohoo progress!
15:24:22 bauzas: actually, any sync at all means it's new enough :)
15:24:37 edleafe: agreed
15:24:51 edleafe: adding a version seems reasonable anyway
15:24:58 this is great, because it solves the hairy DB problem I was seeing in tests
15:25:09 eh eh
15:25:23 since the ComputeManager isn't going to be writing to the DB at startup
15:25:34 edleafe: it does :)
15:25:52 edleafe: because ComputeManager calls RT.update_resource_stats() at startup
15:26:17 bauzas: but that's already taken care of
15:26:30 edleafe: anyway, I'll leave you to update the spec
15:26:33 I don't have to mock out this new call all over the place
15:26:37 I put a comment on the spec
15:26:42 What's the next topic?
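The rule summed up at 15:22:27 (treat a host as old until a sync from it carries a new-enough scheduler client version, and for old hosts grab the InstanceList ourselves) can be sketched as below. This is a minimal illustration under assumptions: `HostVersionTracker`, `record_sync`, `needs_db_instance_query` and `MIN_VERSION` are hypothetical names, and versions are assumed to be dotted integer strings like the 1.2.3 / 1.2.4 example in the discussion.

```python
from typing import Dict, Tuple

Version = Tuple[int, ...]

# Hypothetical minimum client version that sends instance syncs. Per the
# discussion, any sync at all currently means "new enough".
MIN_VERSION: Version = (1, 0, 0)


def parse_version(text: str) -> Version:
    """Turn '1.2.3' into (1, 2, 3) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


class HostVersionTracker:
    """Remembers the client version each host reported in its last sync."""

    def __init__(self, min_version: Version = MIN_VERSION) -> None:
        self.min_version = min_version
        self._seen: Dict[str, Version] = {}

    def record_sync(self, host: str, client_version: str) -> None:
        """Called whenever a sync message arrives from a compute host."""
        self._seen[host] = parse_version(client_version)

    def is_new(self, host: str) -> bool:
        # A host we have never heard from is assumed to be old.
        version = self._seen.get(host)
        return version is not None and version >= self.min_version

    def needs_db_instance_query(self, host: str) -> bool:
        """Old hosts don't push instance changes, so fall back to the DB."""
        return not self.is_new(host)


tracker = HostVersionTracker(min_version=(1, 2, 4))
tracker.record_sync("hostA", "1.2.3")
tracker.record_sync("hostB", "1.2.4")
```

Comparing parsed integer tuples rather than raw strings avoids ranking "1.10.0" below "1.9.0". The cost lxsli mentions at 15:23:08 shows up here as extra DB queries for every host that has not yet synced after a scheduler restart.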
15:26:46 edleafe: about the spec, I had one comment about passing the user context as an argument
15:26:56 bauzas: yes
15:27:00 basically, I'm not seeing it as an added value
15:27:21 if the scheduler wants to query its DB, it doesn't need the Nova user context
15:27:22 so it's not needed for the RPC stuff to work?
15:27:39 Just looking at the other calls, they all pass in context, 'method_name', **kwargs
15:27:50 edleafe: not at all, we are even trying to reduce the number of times we're passing out a context
15:28:00 ok, then I'll take it out
15:28:02 edleafe: like in the object methods
15:28:13 edleafe: we're just passing out a context once
15:28:30 edleafe: again, I don't want to nitpick, lxsli thoughts ?
15:28:52 None
15:28:56 awesome
15:29:07 hey, if it's not needed, then I'll take it out
15:29:08 lxsli raised a NotFoundException
15:29:45 anything else on the spec?
15:29:53 ok, before we're definitely losing lxsli, we can move on :)
15:30:18 #topic Status on cleanup work
15:30:27 bauzas: need any help on your patch series?
15:30:41 edleafe: which one ? :D
15:30:53 any of them!
15:30:54 detach-service ?
15:31:08 * edleafe was thinking of detach server
15:31:11 so, detach-service-from-computenode is in quite good shape
15:31:12 ugh
15:31:13 service
15:31:18 because I have core support
15:31:26 dan went +A wild :)
15:31:35 and jaypipes kindly helped me on that one
15:31:41 I have other BPs
15:32:00 isolate-sched-db-aggregates is in good shape, ie. coding coding coding
15:32:31 and RequestSpec objectification is currently blocked because of an Image objectification patch
15:32:48 so, I'm pretty late to the planning, but I can still handle my work :)
15:32:52 bauzas: how can the rest of us help out?
15:33:07 edleafe: honestly, not a lot of things
15:33:09 I have a bit of time too
15:33:13 ok, just checking
15:33:32 edleafe: as I said, it's all about coding a BP which is quite straightforward :)
15:33:48 I'm coding my spec as WIP, so when that finally gets approved, the code should be ready, too
15:33:49 lxsli: you told me this morning about something on the RT objectification, right K
15:33:51 ?
15:33:58 edleafe: awesome
15:34:24 Yeah Paul promised we'd help with that but we've been blocked / busy
15:34:46 I think my migration object chain is pretty OK right now so I was looking for something more to do in that area
15:35:27 lxsli: I'm lost with your series, could you just give us the link ?
15:36:20 So mine is https://review.openstack.org/#/c/79324/ but I need Jay's https://review.openstack.org/#/c/152689/ before sahid's patch can merge before mine can
15:36:44 lxsli: ok thanks, I'm starring those changes
15:36:59 So that one will be some time... meanwhile looking for some more RT objectification to help with
15:37:24 lxsli: there are some patches in review about that
15:37:46 lxsli: I saw them in the pipeline from someone named Hans Lindgren
15:38:15 lxsli: here it is https://review.openstack.org/#/c/149224/
15:38:31 lxsli: you can probably help him
15:38:44 OK I'll have a look, thanks
15:39:21 any further things to discuss about the priority BPs or can I open the opens ?
15:39:50 open the opens!
15:40:04 #topic Open discussion
15:40:25 So, wazztup ?
15:41:17 bauzas: https://www.youtube.com/watch?v=tauYnVE6ykU
15:41:19 It's just us 4 here right? Maybe not too much
15:41:43 We discussed my issues already, so I'm good
15:41:46 edleafe: I was exactly on the same video but was too shy to propose it here :)
15:41:50 edleafe: I'm glad you did :)
15:42:00 bauzas: :)
15:42:06 ok, if crickets, then return
15:42:17 Early finish \o/
15:42:22 thanks all
15:42:24 #endmeeting