14:59:50 <bauzas> #startmeeting gantt
14:59:51 <openstack> Meeting started Tue Feb 17 14:59:50 2015 UTC and is due to finish in 60 minutes.  The chair is bauzas. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:59:52 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:59:55 <openstack> The meeting name has been set to 'gantt'
15:00:04 <edleafe> o/
15:00:09 <alex_xu> \o
15:00:16 <bauzas> hi, stepping up this time as n0ano can't make it, who's there ?
15:01:50 <bauzas> woah, please all don't speak at the same time :)
15:02:16 <edleafe> alex_xu and I were way ahead of you
15:02:26 <lxsli> o/
15:02:40 <bauzas> :)
15:02:46 <alex_xu> :)
15:03:02 <bauzas> so, let's start - people can join as we go
15:03:46 <bauzas> #topic Remove direct nova DB/API access by Scheduler Filters
15:04:10 <edleafe> So any ideas on the best way to represent version of the compute node?
15:04:13 <bauzas> so, https://review.openstack.org/138444/ is updated very often, thanks edleafe
15:04:23 <edleafe> Since RPC version doesn't seem to fit
15:04:45 <edleafe> bauzas: I have another update waiting to go, after the results of this meeting
15:05:14 <bauzas> edleafe: here I was thinking a x.y.z version is good
15:05:38 <edleafe> it would seem that rolling compute node updates would have already been a problem
15:05:47 <edleafe> and that someone would have created a solution
15:05:58 <bauzas> edleafe: actually, the problem is that the scheduler is still really tied to Nova
15:06:09 <edleafe> bauzas: true
15:06:10 <bauzas> edleafe: but longer term, that would be an API
15:06:28 <edleafe> but in general, compute nodes are rolled out in bunches, not all at once
15:06:36 <bauzas> edleafe: so considering that's a separate API, in general, clients provide a backwards compatibility by discovering versions
15:06:44 <edleafe> so the issue of differing behavior should have been a problem at some time, no?
15:07:14 <bauzas> edleafe: here, on a longer term, we should imagine a Gantt client able to discover if the Gantt server is having the compute capability
15:07:45 <bauzas> edleafe: at the moment, as we don't have a discovery mechanism, that's just something we pass to the Scheduler
15:07:57 <edleafe> the reverse is also true
15:08:16 <bauzas> edleafe: reverse of what ?
15:08:31 <edleafe> if the gantt server is relying on outside entities behaving a certain way, it needs to be able to have a way to verify that
15:08:42 <bauzas> edleafe: that, I disagree :)
15:08:50 <edleafe> i.e., gantt server discovering its clients
15:08:54 <bauzas> edleafe: that's all about capabilities of a given API
15:09:18 <bauzas> edleafe: so, if Gantt wants to know Nova capabilities, it will run the Nova client which provides that backwards compatibility
15:09:32 <edleafe> so you're saying that the gantt server will never be dependent on anything outside of itself?
15:09:44 <bauzas> edleafe: all the compatibility checks are usually done by the clients
15:09:53 <bauzas> edleafe: I'm not saying that :)
15:10:21 <bauzas> edleafe: I'm saying that if Gantt has to depend on something else, it will use the "something else" client library for knowing the "something else" capabilities
15:10:30 <edleafe> bauzas: we're not at a pure client/server relationship
15:10:51 <edleafe> we're relying on individual compute nodes as the source of truth
15:11:00 <bauzas> edleafe: right, and that comes up to my point : that's just because the Scheduler is really tied
15:11:01 <edleafe> not the compute api
15:11:12 <edleafe> understood
15:11:15 <bauzas> edleafe: right, and it won't change
15:11:33 <bauzas> edleafe: meaning that the compute nodes are running a scheduler client
15:11:38 <edleafe> that's what I'm trying to figure out how to deal with. Not the ideal situation in the future
15:11:49 <bauzas> edleafe: so, we can just consider that the scheduler client is having a version
15:12:20 <bauzas> edleafe: as it's already the case for Juno, we know that all updates are going thru the client
15:12:38 <bauzas> edleafe: so bumping a client version seems reasonable
15:12:57 <edleafe> bauzas: bumping the scheduler client version, yes
15:13:02 <edleafe> but that doesn't help us here
15:13:04 <bauzas> edleafe: ie. computes provide their stats thru the client, the client is adding a version number to those stats
15:13:28 <bauzas> edleafe: so we keep the release tagging by the scheduler
15:13:36 <edleafe> it's adding the same version number to every stats report
15:13:43 <bauzas> edleafe: incorrect
15:13:56 <edleafe> ??
15:14:00 <bauzas> edleafe: computes have different scheduler client versions
15:14:14 <bauzas> edleafe: because the code is run by the compute node
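[editor's note: a minimal sketch of the idea being discussed - each compute node's scheduler client stamps its own x.y.z version onto every stats report it sends, so different computes can report different versions. All names here are hypothetical illustrations, not actual Nova/Gantt APIs.]

```python
# Hypothetical sketch: the scheduler client stamps every stats report
# with its own version. Since the client code runs on each compute
# node, nodes on different code levels naturally report different
# versions. Names are illustrative, not real Nova APIs.

SCHEDULER_CLIENT_VERSION = "1.1.0"  # assumed x.y.z version string


def build_stats_report(host, stats):
    """Wrap a compute node's stats with the scheduler client version."""
    return {
        "host": host,
        "stats": stats,
        "client_version": SCHEDULER_CLIENT_VERSION,
    }


# Example: a compute node pushing its resource stats
report = build_stats_report("compute-1", {"vcpus": 8, "memory_mb": 16384})
```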
15:14:16 <alex_xu> I'm confused about what we expect of Gantt in the future - Gantt polls data, but nova also pushes data to gantt
15:14:43 <alex_xu> we want to poll data or push data?
15:14:48 <bauzas> alex_xu: the direction is very clear : Gantt won't *poll* data unless exceptional circumstances
15:15:08 <edleafe> alex_xu: in the future, gantt will own the data
15:15:08 <bauzas> alex_xu: Computes (or others) will push data to Gantt
15:15:21 <lxsli> So we ask compute nodes to write their scheduler client version to DB on startup; then any node sans version which hasn't sent us a message we assume is old?
15:15:33 <bauzas> lxsli: you got it
15:15:35 <edleafe> lxsli: no, we don't even need the db then
15:15:45 <bauzas> edleafe: no, we still need DB
15:15:46 <lxsli> :D
15:16:00 <bauzas> edleafe: because compute updates are going thru the DB now
15:16:03 <alex_xu> bauzas: but we defined an interface for syncing instance info. We call gantt to tell it which instances need updating, then we let gantt poll the updated instances
15:16:04 <edleafe> bauzas: can't the scheduler keep track when it gets stats reports?
15:16:30 <edleafe> Host A has client version 1.2.3, host B has 1.2.4
15:16:30 <bauzas> edleafe: that's an async process
15:16:40 <edleafe> bauzas: yes
15:16:48 <bauzas> edleafe: sched client updates the conductor which updates DB
15:16:52 <lxsli> edleafe: we need to know whether a node is old on startup, before it gets a chance to update us
15:16:54 <edleafe> and until a host reports a minimal version, we assume it is old
15:17:00 <lxsli> that's why we still need the DB afaik
15:17:17 <edleafe> lxsli: why do we need to know that at startup?
15:17:32 <edleafe> if it's old, it won't be sending updates
15:17:35 <edleafe> (for instances)
15:17:57 <lxsli> because a compute node doesn't message us until something changes on it and we need to be able to schedule to it immediately
15:18:12 <edleafe> when the time comes to add instance info to HostState objects, unless we've seen a minimal version for that host, we'll grab the InstanceList ourselves
15:18:18 <bauzas> edleafe: so you're just saying that we should consider that if no RPC calls are coming from a host, then the host is old - that's cautious
15:18:33 <edleafe> bauzas: sort of
15:18:43 <lxsli> we get the initial InstanceList from the DB, we want that to include whether the node is new so we don't do DB queries for new nodes
15:18:44 <edleafe> bauzas: I'm saying that the compute nodes are sending stats regularly
15:18:54 <bauzas> edleafe: I was just saying that computes periodically update stats to DB using the client, we can tag this client
15:18:56 <lxsli> edleafe: not regularly - on change
15:19:01 <bauzas> lxsli: +1
15:19:13 <edleafe> bauzas: once we see a minimal version for the client, we know that we are getting instance changes
15:19:28 <bauzas> edleafe: that's even better if we tag those stats - just because we're adding a version
15:20:42 <bauzas> edleafe: anyway, the idea is the scheduler version - maybe the sync method (which I still think it's a bad name, but anyway...) can just report that version
15:20:48 <edleafe> lxsli: if the compute node is new, it will also be sending syncs periodically
15:20:58 <edleafe> lxsli: so we'll know if we missed something
15:21:02 <lxsli> edleafe: ahhh the sync - OK, that can work
15:21:18 <bauzas> gosh, I really dislike the 'sync' word :/
15:21:44 <lxsli> sanity check? :)
15:21:44 <edleafe> bauzas: it's better than 'check_for_same_uuids' :)
15:22:09 <edleafe> bauzas: or 'are_we_in_sync'
15:22:27 <lxsli> OK to sum up - so we assume any node is old unless we've had a sanity check from it, and the scheduler client version contained in that is new enough
15:22:28 <bauzas> edleafe: anyway, I don't want to nitpick
15:22:30 <edleafe> lxsli: gut_check
15:22:45 <edleafe> lxsli: that's what I'm thinking
15:22:47 <lxsli> (where 'new enough' is always true right now)
15:23:08 <lxsli> That works for me... we could do a few extra queries in the first minute, but ^^
15:23:35 <bauzas> edleafe: that could work - ie. the 'sync' method is just "pass context to the scheduler, incl. version"
15:23:45 <edleafe> lxsli: yes, the startup will always be a little crazy until things settle down and we know what we're dealing with
15:23:57 <edleafe> bauzas: yep
15:24:10 <bauzas> +1 on it, I don't want to take too much time on that
15:24:21 <lxsli> +1 woohoo progress!
15:24:22 <edleafe> bauzas: actually, any sync at all means it's new enough :)
15:24:37 <bauzas> edleafe: agreed
15:24:51 <bauzas> edleafe: adding a version seems reasonable anyway
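[editor's note: a sketch of the behaviour the group just agreed on - a host is assumed old until the scheduler has seen a sync/sanity-check from it carrying a new-enough client version, and old hosts fall back to a DB lookup for their instance list. As edleafe notes, today "any sync at all means it's new enough", which the minimal version below reflects. All class and method names are hypothetical, not real Nova code.]

```python
# Hypothetical sketch: track which hosts have reported a sync, and
# assume any host we haven't heard from is old. Old hosts never send
# syncs, so they simply stay in the DB-fallback path forever.

MIN_VERSION = (1, 0, 0)  # "new enough" is effectively any sync today


class HostVersionTracker:
    def __init__(self):
        # host name -> version tuple reported in its last sync
        self._seen = {}

    def record_sync(self, host, version_str):
        """Remember the client version a host reported in its sync."""
        self._seen[host] = tuple(int(p) for p in version_str.split("."))

    def is_new_enough(self, host):
        """True only if we've seen a sync with a sufficient version."""
        return self._seen.get(host, (0, 0, 0)) >= MIN_VERSION
```

In this sketch the scheduler would call `is_new_enough(host)` when building a HostState: if it returns False, grab the InstanceList from the DB itself; if True, trust the incremental updates the host pushes.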
15:24:58 <edleafe> this is great, because it solves the hairy DB problem I was seeing in tests
15:25:09 <bauzas> eh eh
15:25:23 <edleafe> since the ComputeManager isn't going to be writing to the DB at startup
15:25:34 <bauzas> edleafe: it does :)
15:25:52 <bauzas> edleafe: because ComputeManager calls RT.update_resource_stats() at startup
15:26:17 <edleafe> bauzas: but that's already taken care of
15:26:30 <bauzas> edleafe: anyway, I leave you update the spec
15:26:33 <edleafe> I don't have to mock out this new call all over the place
15:26:37 <lxsli> I put a comment on the spec
15:26:42 <lxsli> What's the next topic?
15:26:46 <bauzas> edleafe: about the spec, I had one comment about passing the user context as an argument
15:26:56 <edleafe> bauzas: yes
15:27:00 <bauzas> basically, I'm not seeing it as an added value
15:27:21 <bauzas> if the scheduler wants to query its DB, it doesn't need the Nova user context
15:27:22 <edleafe> so it's not needed for the RPC stuff to work?
15:27:39 <edleafe> Just looking at the other calls, they all pass in context, 'method_name', **kwargs
15:27:50 <bauzas> edleafe: nope at all, we are even trying to reduce the number of times we're passing out a context
15:28:00 <edleafe> ok, then I'll take it out
15:28:02 <bauzas> edleafe: like in the object methods
15:28:13 <bauzas> edleafe: we're just passing out a context once
15:28:30 <bauzas> edleafe: again, I don't want to nitpick, lxsli thoughts ?
15:28:52 <lxsli> None
15:28:56 <bauzas> awesome
15:29:07 <edleafe> hey, if it's not needed, then I'll take it out
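[editor's note: a sketch of what the 'sync' method's payload could look like once the user context argument is dropped, per the discussion above - the compute side reports its instance UUIDs plus its client version, and nothing else. The function name and fields are hypothetical, not the actual spec.]

```python
# Hypothetical sketch: the sync report a compute node sends to the
# scheduler, carrying instance UUIDs and the client version but no
# user context (the scheduler queries its own DB without it).


def build_sync_report(host, instance_uuids, client_version):
    """Report which instances a host believes it runs, so the
    scheduler can reconcile its view and learn the host's version."""
    return {
        "host": host,
        "instance_uuids": sorted(instance_uuids),
        "client_version": client_version,
    }
```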
15:29:08 <bauzas> lxsli raised a NotFoundException
15:29:45 <edleafe> anything else on the spec?
15:29:53 <bauzas> ok, before we're definitely losing lxsli, we can move on :)
15:30:18 <bauzas> #topic Status on cleanup work
15:30:27 <edleafe> bauzas: need any help on your patch series?
15:30:41 <bauzas> edleafe: which one ? :D
15:30:53 <edleafe> any of them!
15:30:54 <bauzas> detach-service ?
15:31:08 * edleafe was thinking of detach server
15:31:11 <bauzas> so, detach-service-from-computenode is in a quite good shape
15:31:12 <edleafe> ugh
15:31:13 <edleafe> service
15:31:18 <bauzas> because I have core support
15:31:26 <lxsli> dan went +A wild :)
15:31:35 <bauzas> and jaypipes kindly helped me on that one
15:31:41 <bauzas> I have other BPs
15:32:00 <bauzas> isolate-sched-db-aggregates is in a good shape, ie. coding coding coding
15:32:31 <bauzas> and RequestSpec objectification is currently blocked because of an Image objectification patch
15:32:48 <bauzas> so, I'm pretty late to the planning, but I can still handle my work :)
15:32:52 <edleafe> bauzas: how can the rest of us help out?
15:33:07 <bauzas> edleafe: honestly, not a lot of things
15:33:09 <lxsli> I have a bit of time too
15:33:13 <edleafe> ok, just checking
15:33:32 <bauzas> edleafe: as I said, it's all about coding a BP which is quite straightforward :)
15:33:48 <edleafe> I'm coding my spec as WIP, so when that finally gets approved, the code should be ready, too
15:33:49 <bauzas> lxsli: you told me this morning about something on the RT objectification, right?
15:33:58 <bauzas> edleafe: awesome
15:34:24 <lxsli> Yeah Paul promised we'd help with that but we've been blocked / busy
15:34:46 <lxsli> I think my migration object chain is pretty OK right now so I was looking for something more to do in that area
15:35:27 <bauzas> lxsli: I'm lost with your series, could you just give us the link ?
15:36:20 <lxsli> So mine is https://review.openstack.org/#/c/79324/ but I need Jay's https://review.openstack.org/#/c/152689/ to merge before sahid's patch can, which must merge before mine can
15:36:44 <bauzas> lxsli: ok thanks, I'm starring those changes
15:36:59 <lxsli> So that one will be some time... meanwhile looking for some more RT objectification to help with
15:37:24 <bauzas> lxsli: there are some patches in review about that
15:37:46 <bauzas> lxsli: I saw them in the pipeline from someone named Hans Lindgren
15:38:15 <bauzas> lxsli: here it is https://review.openstack.org/#/c/149224/
15:38:31 <bauzas> lxsli: you can probably help him
15:38:44 <lxsli> OK I'll have a look, thanks
15:39:21 <bauzas> any further things to discuss about the priority BPs or can I open the opens ?
15:39:50 <edleafe> open the opens!
15:40:04 <bauzas> #topic Open discussion
15:40:25 <bauzas> So, wazztup ?
15:41:17 <edleafe> bauzas: https://www.youtube.com/watch?v=tauYnVE6ykU
15:41:19 <lxsli> It's just us 4 here right? Maybe not too much
15:41:43 <edleafe> We discussed my issues already, so I'm good
15:41:46 <bauzas> edleafe: I was exactly on the same video but was too shy to propose it here :)
15:41:50 <bauzas> edleafe: I'm glad you did :)
15:42:00 <edleafe> bauzas: :)
15:42:06 <bauzas> ok, if crickets, then return
15:42:17 <lxsli> Early finish \o/
15:42:22 <bauzas> thanks all
15:42:24 <bauzas> #endmeeting