21:00:21 <jd__> #startmeeting ceilometer
21:00:21 <openstack> Meeting started Wed Dec 18 21:00:21 2013 UTC and is due to finish in 60 minutes.  The chair is jd__. Information about MeetBot at http://wiki.debian.org/MeetBot.
21:00:22 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
21:00:24 <openstack> The meeting name has been set to 'ceilometer'
21:00:36 <dragondm> o/
21:00:39 <lsmola> hello
21:00:44 <jd__> hello everyone
21:00:51 <jd__> #link https://wiki.openstack.org/wiki/Meetings/Ceilometer
21:00:54 <llu-laptop> o/
21:01:01 <thomasem> o/
21:01:08 <eglynn> o/
21:01:30 <dhellmann> o/
21:01:33 <ildikov_> o/
21:01:38 <gordc> o/
21:02:28 <nealph> o/
21:03:08 <apmelton1> o/
21:03:15 <herndon> o/
21:03:41 <jd__> #topic Release python-ceilometerclient?
21:03:48 <jd__> I think we're good on that
21:03:56 <eglynn> yep, no need AFAIK
21:04:55 <dhellmann> +1
21:05:00 <jd__> #topic housekeeping: meeting cancellation/re-scheduling over the holiday period (eglynn)
21:05:07 <eglynn> well, our next two scheduled meetings fall on public holidays here
21:05:14 <eglynn> (in Ireland at least)
21:05:23 <eglynn> i.e. Thurs Dec 26th and Wed Jan 1st
21:05:36 <jd__> I think it's a good idea
21:05:37 <eglynn> so I'd propose we cancel the Dec 26th meeting and reschedule the Jan 1st meeting to Thurs Jan 2nd
21:05:39 <gordc> same in Canada
21:05:41 <dhellmann> yeah, I'll be on vacation those days, too
21:05:50 <herndon> +1
21:05:53 <ildikov_> +1
21:05:54 <apmelton1> +1
21:05:55 <dragondm> dhellmann: Ditto.
21:06:00 <llu-laptop> +1
21:06:04 <dragondm> +1
21:06:09 <jd__> eglynn: well Jan 2 is going to be vacation still for a lot of people including me
21:06:14 <jd__> likely we can't skip both?
21:06:16 <lsmola> +1
21:06:17 <jd__> s/can't/can/
21:06:36 * dragondm will be driving home on Jan 2.
21:06:36 <eglynn> jd__: a-ha, I didn't realize that
21:06:52 <thomasem> I'll be out the 20th thru 6th in Ireland :P
21:06:52 <eglynn> so cancel both?
21:06:59 <dhellmann> do we have anything pressing that we would need to talk about? I think it's safe to skip.
21:07:01 <eglynn> or move it to the third
21:07:08 <jd__> I think it's safe to cancel both at this point
21:07:11 <jd__> unless something big pops up
21:07:20 <eglynn> dhellmann, jd__: cool with me!
21:07:42 <jd__> #agreed skip meetings during holidays!
21:07:46 <jd__> :)
21:07:56 <eglynn> \o/
21:07:57 <jd__> I'll update the wiki to indicate the next meeting
21:08:04 <jd__> #topic use of the mongodb aggregation framework to serve out /v2/resources (eglynn)
21:08:25 <eglynn> yeah, so basically we're using an aggregation over the meter collection to construct the resources
21:08:36 <eglynn> #link https://github.com/openstack/ceilometer/blob/master/ceilometer/storage/impl_mongodb.py#L638
21:08:38 <jd__> I think I wrote that
21:08:52 <eglynn> presumably because we need to adorn the resources with first & last timestamps, meter links etc.?
21:08:58 <jd__> eglynn: yes
21:09:05 <eglynn> not that we do anything with those timestamps currently ...
21:09:09 <eglynn> but we will
21:09:19 <dhellmann> I think we expect clients to use the values
21:09:20 <eglynn> once https://bugs.launchpad.net/ceilometer/+bug/1262335 is fixed
21:09:20 <jd__> oh yeah
21:09:23 <uvirtbot> Launchpad bug 1262335 in ceilometer "first & last sample timestamps not included in the Resource representation" [Medium,Triaged]
21:09:29 <dhellmann> oh, haha
21:09:32 <eglynn> ... but I digress
21:09:48 <eglynn> the main problem is that the mongo aggregation pipeline performs sorting in-memory
21:09:57 <eglynn> in this case operating over a potentially very large collection
21:10:01 <jd__> the main problem is mongo? :p
21:10:10 <jd__> #chair eglynn
21:10:11 <eglynn> LOL :)
21:10:12 <openstack> Current chairs: eglynn jd__
21:10:41 <jd__> (sorry I'm being kicked out, so I'm letting eglynn chair just in case I go MIA)
21:10:50 <eglynn> cool
21:11:07 <eglynn> ... so the data involved is potentially large, particularly if the GET /v2/resources isn't constrained with query params
21:11:08 <dragondm> Honestly, some of that schema in impl_sqlalchemy makes me sad too...
21:11:11 <dhellmann> should we track the timestamp values as the data is recorded or something?
21:11:19 <eglynn> (e.g. to limit to a single tenant for ex.)
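(For context, a minimal sketch of the kind of pipeline under discussion; this is not the exact impl_mongodb.py code, the field names just follow ceilometer's meter collection, and the $match stage stands in for the optional narrowing eglynn mentions:)

```python
from pymongo import MongoClient

db = MongoClient()['ceilometer']  # connection details are illustrative

pipeline = [
    # optional narrowing, e.g. to a single tenant
    {'$match': {'project_id': 'some-project-id'}},
    # this $sort is the part mongo performs in memory inside the pipeline
    {'$sort': {'user_id': -1, 'project_id': -1, 'timestamp': -1}},
    # collapse the samples into one document per resource
    {'$group': {'_id': '$resource_id',
                'first_sample_timestamp': {'$min': '$timestamp'},
                'last_sample_timestamp': {'$max': '$timestamp'},
                'project_id': {'$first': '$project_id'},
                'user_id': {'$first': '$user_id'}}},
]
resources = db.meter.aggregate(pipeline)
```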
21:11:25 * jd__ pats dragondm
21:11:37 <eglynn> so in production we're seeing the following error ...
21:11:40 * dragondm feels better.
21:11:48 <eglynn> "terminating request: request heap use exceeded 10% of physical RAM"
21:12:04 <eglynn> ... sad face on the ops guy
21:12:17 <eglynn> turns out mongo is hardcoded to abort any sorts in an aggregation pipeline that'll take more than 10% of memory
21:12:26 <lsmola> hehe
21:12:30 <dhellmann> oh, wow
21:12:31 <eglynn> ... I did not know that
21:12:34 <dragondm> Lovely.
21:12:44 <eglynn> though fair enough policy I guess
21:12:53 <dragondm> Tho doing a big sort in mem is no bueno to begin with.
21:12:58 <eglynn> so question is ...
21:13:01 <eglynn> dragondm: yep
21:13:26 <eglynn> does anyone recall if the justification for the use of the aggregate framework was just convenience?
21:13:29 <eglynn> or something more?
21:13:44 <eglynn> a quick look at the fossil record on gerrit kinda implies it may have been just convenience
21:13:52 <eglynn> #link https://review.openstack.org/35297
21:13:56 <dhellmann> jd__ did that work, right?
21:14:19 <eglynn> if that was the case, /me thinking maybe we should just revert to the equivalent map-reduce for this?
21:14:29 <eglynn> coz in-memory sorting kinda gives me the shivers ;)
21:14:35 <dhellmann> makes sense to me
21:14:59 <gordc> i thought aggregate was a 'new feature' that newer mongo provided? i assume new features are supposed to be better
21:15:22 <eglynn> gordc: I think it was new in 2.0.1 or thereabouts
21:15:25 * dhellmann admires gordc's optimism
21:15:38 <gordc> :)
21:15:44 <eglynn> apparently another alternative is to avoid the need for sorting in-memory for pipelines by ensuring sufficient indices exist
21:15:48 <ildikov_> eglynn: I like the idea too; I was reading about this issue in my quick research for statistics, and it did not look good
21:15:51 <eglynn> ... or google tells me anyway
21:15:58 <eglynn> ildikov_: cool
21:16:08 <dhellmann> eglynn: oh, if we can fix it with indices we should try that, too
21:16:34 <eglynn> I think the default sort order for resources is currently user_id, project_id, timestamp all descending
21:16:56 <eglynn> ... not sure if we want the cost of maintaining all those indices on db.meter?
21:17:16 <dhellmann> those are the big query filters anyway, right?
21:17:21 <dhellmann> maybe not user
21:17:30 <dhellmann> I'm not sure why we sort on user, to be honest
21:17:40 <eglynn> yeah me neither
21:17:41 <dhellmann> esp. for resources
21:17:50 <dhellmann> samples maybe
21:18:21 <eglynn> samples by timestamp first seems more logical, or?
21:18:38 <dhellmann> resource, timestamp
21:18:50 <dhellmann> or vice versa
21:19:00 <eglynn> yeah
21:19:03 <dragondm> ^ one of those +
21:19:03 <lsmola> eglynn, for timelines, yes
21:19:11 <dhellmann> I think we definitely want indexes on resource, though
21:19:24 <eglynn> anyhoo, seems there's broad agreement to ditch the aggregation pipeline unless we can make it work *without* in-memory sorting?
21:19:45 <eglynn> (by skirting round the issue with judicious use of indices ...)
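(A sketch of that index idea, assuming the default sort order mentioned above; as I understand it mongo can only use an index for a pipeline's $sort when the sort runs before stages like $group, so this is a maybe rather than a guarantee:)

```python
from pymongo import DESCENDING, MongoClient

db = MongoClient()['ceilometer']  # connection details are illustrative

# compound index matching the default user_id, project_id, timestamp
# descending sort order; the index name is made up
db.meter.create_index([('user_id', DESCENDING),
                       ('project_id', DESCENDING),
                       ('timestamp', DESCENDING)],
                      name='resource_sort_idx')
```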
21:19:45 <dhellmann> does this same issue come up with the statistics API?
21:19:51 <dhellmann> or is the dataset small enough there?
21:20:15 <eglynn> dhellmann: I guess there would almost always be narrowing constraints in that case
21:20:29 <eglynn> dhellmann: narrow timestamp, project_id matching etc.
21:20:35 <dhellmann> yeah, I wasn't sure if they were narrow enough, but I suppose that depends on the TTL for the sample data
21:20:37 <gordc> eglynn: out of curiosity what's the rough number of records we're talking about before the in-memory stuff becomes an issue.
21:20:55 <eglynn> gordc: very good question, and I don't know the answer as yet
21:21:11 <gordc> dhellmann: ttl stuff was what i was hinting at.
21:21:13 * eglynn doesn't have direct access yet to the deployment on which it came up
21:21:26 <eglynn> ... but I'm hoping to get access soon
21:21:31 <ildikov_> dhellmann: if I remember right, statistics now works with map-reduce in mongo
21:21:47 <eglynn> (so I'll be able to do hands-on sizing etc.)
21:22:22 <dhellmann> eglynn: I wonder if we could also store the first & last timestamp right in the resource document, to skip the map reduce?
21:22:29 <eglynn> ildikov_: a-ha, yes you're correct
21:22:29 <dhellmann> ildikov_: ok, good
21:22:47 <eglynn> resources is the only case where we use an aggregation pipeline
21:23:29 <eglynn> dhellmann: and update the resource collection every time we add a sample to the meter collection?
21:23:43 <dhellmann> eglynn: yeah, mongo has an update-in-place operation
21:23:51 <dhellmann> I have no idea what that would do to efficiency
21:24:17 <eglynn> dhellmann: ... but we still need to adorn the resource with links to all the associated meters
21:24:35 <dhellmann> we did that at one point, are we not doing that any more?
21:24:38 <eglynn> dhellmann: ... so I guess we'd need to store those too in the resource collection
21:24:38 <gordc> eglynn: we do something similar for sql -- updating resource for every meter.... i noticed deadlock errors because of it
21:24:42 <dhellmann> maybe that's what this change you linked to was doing
21:24:58 <eglynn> dhellmann: ... I'm pretty sure we're still doing it
21:25:13 <eglynn> actually I think horizon depends on those links being present
21:25:28 <eglynn> lsmola: ^^^ am I recalling that correctly?
21:25:49 <lsmola> eglynn, thinking
21:26:01 <lsmola> eglynn, checking
21:26:02 <eglynn> i.e. horizon walks the resource representation at some point and grabs all the meter links
21:26:06 <dhellmann> yeah, we have $addToSet for the meter with some metadata, but not the timestamps
21:26:24 <gordc> eglynn: if i recall, we build those links on the api side...i don't think we actually associate them in the database (might be wrong here)
21:26:40 <dhellmann> impl_mongodb.py line 411
21:27:25 <lsmola> eglynn, yes I use it to get all resources for some meter
21:27:36 <lsmola> eglynn, though that is not very optimal
21:27:54 <eglynn> lsmola: yep, here for example IIUC https://github.com/openstack/horizon/blob/master/openstack_dashboard/dashboards/admin/metering/views.py#L182
21:28:07 <gordc> dhellmann: whoops never mind then, guess we do store them in mongo.
21:28:11 <dhellmann> but I guess the point is we have a couple of options and we don't need to pick the approach here
21:28:53 <lsmola> eglynn, yes
21:29:02 <eglynn> dhellmann: exactly ... so I'll proceed with some more experimentation
21:29:11 <eglynn> ... see if I can work-around simply with indices
21:29:22 <eglynn> ... otherwise use one of the approaches discussed above
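(For the record, a hedged sketch of the update-in-place option dhellmann raises, with invented field names; it assumes a mongo new enough to have the $min/$max update operators, and note jd__'s point further down about filters breaking any precomputed first/last values:)

```python
def record_sample(db, sample):
    # maintain first/last timestamps and the meter link set on the
    # resource document as each sample lands, so reads need no map-reduce
    db.resource.update(
        {'_id': sample['resource_id']},
        {'$min': {'first_sample_timestamp': sample['timestamp']},
         '$max': {'last_sample_timestamp': sample['timestamp']},
         '$addToSet': {'meter': {
             'counter_name': sample['counter_name'],
             'counter_type': sample['counter_type']}}},
        upsert=True)
```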
21:29:24 <dhellmann> sounds good -- is there a bug filed already?
21:29:42 <eglynn> nope, I wanted to pick some brains on the background first
21:29:53 <dhellmann> ok
21:30:06 <eglynn> (... the issue only came to light late yesterday)
21:30:46 * jd__ back
21:30:49 <jd__> eglynn: it was convenience
21:30:50 <eglynn> #action eglynn file a bug to capture all the background on mongo aggregation pipeline memory issues
21:30:55 <jd__> if you have any other methods go ahead
21:30:58 <jd__> it was actually convenience AND correctness
21:31:05 <jd__> <dhellmann> [22:22:22] eglynn: I wonder if we could also store the first & last timestamp right in the resource document, to skip the map reduce?
21:31:10 <eglynn> jd__: cool
21:31:15 <jd__> that won't work dhellmann, that's what it used to be
21:31:18 <jd__> but as soon as you add a filter, that breaks
21:31:31 <eglynn> a-ha, yes, good point
21:31:39 <dhellmann> jd__: ah, right, filtering on something in the metadata
21:31:40 <eglynn> there is no static first & last, right?
21:31:46 <jd__> nop
21:31:57 <eglynn> got it
21:31:58 <jd__> so I'm not sure there is an easy way to precompute those values
21:32:17 <jd__> maybe you could just add a test case that handles the fact MongoDB is going to fail for some queries
21:32:33 <jd__> return a 542 Sorry MongoDB is there.
21:32:41 <dhellmann> good point, we should make sure we're handling that anyway
21:32:41 <dragondm> Heh.
21:32:42 <apmelton1> lol jd__
21:32:42 <eglynn> LOL :)
21:32:47 <lsmola> hehe
21:33:07 <jd__> that links back to the #action eglynn has added anyway
21:33:10 <dragondm> Error 599 Mongo saaaddd..
21:33:12 <thomasem> hahaha
21:33:48 <lsmola> :-)
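(jd__'s suggestion sketched: catch the pipeline abort in the storage layer and surface a deliberate error instead of a raw traceback; the function and message are illustrative:)

```python
import pymongo.errors

def get_resources(db, pipeline):
    try:
        return db.meter.aggregate(pipeline)
    except pymongo.errors.OperationFailure as err:
        # mongo aborts pipeline sorts that would exceed 10% of physical RAM
        raise RuntimeError('resource query too large for an in-memory '
                           'mongo sort: %s' % err)
```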
21:33:49 <jd__> eglynn: all good on that?
21:34:01 <eglynn> for now I've pointed the ops guy to a work-around to partition the resource listing per-tenant in a bash for loop
21:34:11 <eglynn> so the sky is not going to fall in
21:34:24 <eglynn> ... i.e. we've time to consider this and come up with a well thought-out solution
21:34:39 <eglynn> yep, that's all I got on this, let's move on
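(Roughly the stop-gap eglynn describes, expressed with python-ceilometerclient instead of bash; the endpoint, token, and tenant list are all illustrative:)

```python
from ceilometerclient import client

cc = client.get_client('2', ceilometer_url='http://localhost:8777',
                       os_auth_token='ADMIN_TOKEN')  # illustrative auth
for tenant_id in tenant_ids:  # tenant list obtained elsewhere, e.g. keystone
    # one bounded query per tenant keeps each sort comfortably small
    resources = cc.resources.list(
        q=[{'field': 'project_id', 'op': 'eq', 'value': tenant_id}])
```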
21:34:55 <jd__> #topic promote Sandy Walsh to core? (herndon)
21:35:27 <jd__> I think we got the nomination but there were not enough votes on that
21:35:31 <herndon> yeah, I just don't know what the procedure is here, I saw a couple of +1s come through on email, but what happens?
21:35:43 <herndon> ah, ok then.
21:35:52 <jd__> well it needs at least 5 +1 from ceilometer-core devs
21:36:03 <thomasem> I was curious about that, okay.
21:36:29 <jd__> and AFAIK it only got 3
21:36:39 <herndon> yep, that's what I saw as well.
21:36:49 <eglynn> I'd like to point out that I think sandywalsh_ would make an excellent core once he's increased his review count and patch-landing activity
21:37:06 <dhellmann> eglynn: yeah, I think that was where I came up, too
21:37:27 <dhellmann> s/came/ended/
21:37:28 <jd__> agreed
21:37:58 <eglynn> yep ... so I wouldn't like this to be a discouragement to him
21:38:32 <dragondm> thomasem and I have been working to tame the review queue as well. We could use some faster patch-landing (with appropriate reviews, of course)
21:38:47 <thomasem> undoubtedly
21:38:56 <thomasem> :)
21:38:59 <eglynn> I think we could re-visit the question soon-ish if and when his stats go in the right direction
21:39:22 <dhellmann> +1
21:39:38 <herndon> ok, thanks for the explanation :)
21:39:50 <eglynn> dragondm: thanks! I noticed that ... I've been a bit remiss with reviews myself lately due to the productization work on my plate
21:39:53 <herndon> +1 on faster patch landing for events btw.
21:40:15 <jd__> I feel guilty too, I'm trying to review a lot
21:40:30 <thomasem> For visibility, there has been a ton of work behind the scenes over here - we've been working on load testing the drivers and are in the process of trying to clean up the SQLAlchemy schema and what-not.
21:40:35 <jd__> anyway don't give up, every contribution is appreciated :)
21:40:55 <dragondm> eglynn: thanks on those reviews on some of that event code, tho'.
21:41:00 <thomasem> https://etherpad.openstack.org/p/ceilometer-data-store-scale-testing
21:41:01 <eglynn> thomasem: yep, I suspect as much on the "behind the scenes" work
21:41:42 <jd__> #topic Open discussion
21:41:54 <dragondm> Yup. And there's a ton more event work coming down the pike soon, too.
21:42:10 <eglynn> cool
21:42:19 <herndon> yeah, like up to my eyeballs amount of work to do
21:42:20 <gordc> dragondm: i'm hoping it's after the holidays
21:42:23 <thomasem> eglynn, thanks; just wanted to bring more visibility to it since it's been very time consuming.
21:42:52 <ildikov_> I have just one quick question
21:43:13 <dragondm> gordc: yah. I will probably put up a WIP branch for the event pipelines soon, but I want to add more docs to it, which will probably be after the holidays.
21:43:19 <ildikov_> Do we have an agreement on how to handle the NotImplementedErrors in the db drivers?
21:43:51 <jd__> ildikov_: like we do currently, no?
21:44:02 <eglynn> ildikov_: re. David's comments on https://review.openstack.org/62157
21:44:06 <gordc> dragondm: awesome. i really like the events stuff so feel free to ping me about them.
21:44:17 <ildikov_> jd__: yes, I've seen the patch set from you
21:44:43 <dragondm> gordc: Will do. It's gonna be *cool*
21:44:43 <ildikov_> eglynn: yes, the question is originating from there
21:44:51 <eglynn> #link https://review.openstack.org/#/c/62157/3/ceilometer/storage/base.py
21:45:01 <jd__> I didn't check that review
21:45:33 <jd__> eglynn: did you see https://review.openstack.org/#/c/62636/ then?
21:45:39 <jd__> *BATTLE IS ON*
21:46:07 <jd__> i think the abstract stuff is not a good idea now, because we have a lot of drivers doing different stuff, and we might even split them by feature
21:46:08 <eglynn> a-ha, I did not
21:46:18 <ildikov_> jd__: I just wanted to be sure that it is the way that we are taking now
21:46:19 <jd__> so this was a good idea in the early days, but it's getting weird now
21:46:32 <eglynn> ok, I see the context more clearly now
21:47:05 <ildikov_> jd__: I agree with those changes
21:47:06 <eglynn> I'll review https://review.openstack.org/62636 tmrw
21:47:13 <jd__> eglynn: thumbs up
21:48:25 <ildikov_> jd__: I just wanted to make it clear before uploading the corrections
21:48:26 * dragondm wonders if we could just have a 'Not Implemented' decorator...
21:48:35 <jd__> ildikov: and it was a good idea :)
21:48:39 <dhellmann> jd__: the right approach there is to put the related methods in a base class, declaring the interface
21:48:53 <dragondm> Might make it easier to introspect which features which drivers support, for doc purposes...
21:48:55 <dhellmann> then the methods can still be abstract, but a subclass that doesn't inherit the interface doesn't have to implement them
21:48:58 <ildikov_> jd__: thanks :)
21:49:22 <jd__> dhellmann: I wouldn't see the point to have all that code duplication again
21:49:37 <dhellmann> what is duplicated?
21:49:42 <jd__> raise NotImplementedError is
21:49:58 <dhellmann> well, the idea is the drivers ought to be implementing those methods :-)
21:50:13 <jd__> well you have to implement them if it breaks ceilometer not to have them, I agree
21:50:25 <jd__> but since most parts of Ceilometer handle those cases correctly, there's no need for that in most cases
21:50:43 <jd__> I think record_metering_data is a good candidate to be abstract for example
21:50:45 <dhellmann> ok, I guess I think the real issue is the lack of feature parity in the drivers rather than the implementation
21:50:54 <jd__> because likely ceilometer will explode if your driver doesn't implement it
21:51:16 <jd__> dhellmann: yeah that's another issue :)
21:51:16 <dhellmann> but ok
21:51:26 <jd__> if it is an issue, I think we're pretty good actually on that
21:51:35 <jd__> I mean I think we deal with that the best we can
21:51:38 <dhellmann> yeah
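(The distinction jd__ draws, sketched with the abc/six idiom of the era: record_metering_data as a hard requirement, optional capabilities defaulting to NotImplementedError so every driver doesn't duplicate the same stub; the class and optional method shown are illustrative:)

```python
import abc

import six


@six.add_metaclass(abc.ABCMeta)
class Connection(object):
    """Base class for storage backends."""

    @abc.abstractmethod
    def record_metering_data(self, data):
        """Mandatory: ceilometer breaks if a driver lacks this."""

    def get_resources(self, **kwargs):
        """Optional capability; callers already handle this exception."""
        raise NotImplementedError('resource queries not supported')
```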
21:52:00 <llu-laptop> Just wondering if anyone has the same problem with IRC in this channel as me. Mine sits silent for several minutes, and then all of a sudden dozens of messages pop up at the same time. :(
21:52:20 <lsmola> nope
21:52:23 <gordc> jd__: would it make sense to have an events interface and then drivers would inherit that interface if they support events... if you support events you should probably support all the related methods.
21:52:25 <dragondm> llu-laptop: I've seen that on occasion.
21:52:35 <dhellmann> llu-laptop: I get that sometimes, too
21:52:41 <dhellmann> less today than other days
21:52:46 <dhellmann> I think it's a freenode issue
21:52:46 <dragondm> llu-laptop: er, I usually blame znc
21:53:14 <tongli> @jd__, now without these methods (throwing NotImplementedError), what happens when they get called?
21:53:31 <dhellmann> tongli: the base class implementation would raise the exception
21:53:56 <jd__> gordc: yes I think it would ultimately, especially since we could add the ability to use different drivers for different features
21:53:59 <tongli> @jd__, will the behavior be the same as before from the client perspective?
21:54:20 <jd__> tongli: yes
21:54:32 <dhellmann> jd__, gordc: right, that's what I meant by splitting the base class into different interface classes with related methods
21:54:40 <gordc> jd__: yeah, i think that'd be cleaner...and then a grouping for alarms as well
21:54:53 <tongli> @jd__, ok, in that case, it will be nice to have the changes in the patch.
21:54:55 <gordc> dhellmann: yeah, that's what i thought you meant
21:54:55 <jd__> dhellmann: ok I didn't get that, I'm all for that totally, with that perspective in mind
21:55:09 <eglynn> +1 for the splitting
21:55:15 <dhellmann> gordc: we could also explore aggregation, and have a driver object have several subobjects with the real implementations
21:55:16 <dragondm> dhellmann: Ah... yah. +1
21:55:21 <dhellmann> driver.alarms.some_method()
21:55:26 <dhellmann> driver.events.some_other_method()
21:55:27 <dhellmann> etc.
21:55:34 <dragondm> Also good.
21:55:43 <apmelton1> dhellmann: +1
21:55:44 <ildikov_> +1 for splitting
21:56:07 <dragondm> Avoids the db layer looking like the one in Nova with n hundred methods...
21:56:08 <gordc> dhellmann: that sounds cool.
21:56:15 <dhellmann> dragondm: exactly
21:56:34 <eglynn> yep, an anti-pattern if ever I saw one!
21:56:42 <dhellmann> and it solves the duplicate implementation issue, because the default for the events attribute could be an instance of a single class with NotImplemented implementations
21:57:15 <jd__> yep
21:57:21 <dragondm> Bingo. And creates larger scale 'features' (i.e. 'event', 'alarms', etc)
21:57:36 <jd__> conclusion http://persephonemagazine.com/wp-content/uploads/2013/02/himym-marshall-lily-high-five.gif
21:57:40 <dhellmann> we'd have to figure out how much state would need to be shared between the different interface classes, so each driver might also need to pass a "state" object in or something
21:58:06 <dragondm> Have the parent object pass itself to the child's __init__
21:59:08 <dhellmann> dragondm: I'd be worried about circular references with that, but that's the general idea
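(A hedged sketch, all names invented, of the composition dhellmann and dragondm converge on: per-feature sub-objects, one default stub per feature, and an explicit shared-state object rather than a parent back-reference, which sidesteps the circular-reference worry:)

```python
class DriverState(object):
    """State shared across feature objects (no parent back-reference)."""
    def __init__(self, db):
        self.db = db


class EventFeature(object):
    """Default events interface: nothing implemented."""
    def record_events(self, events):
        raise NotImplementedError('events not supported by this driver')


class AlarmFeature(object):
    """Default alarms interface: nothing implemented."""
    def get_alarms(self, **kwargs):
        raise NotImplementedError('alarms not supported by this driver')


class MongoEventFeature(EventFeature):
    def __init__(self, state):
        self._state = state

    def record_events(self, events):
        # an 'event' collection holding dict-shaped events is an assumption
        self._state.db.event.insert(events)


class MongoConnection(object):
    """Usage: conn.events.record_events(...) works, while
    conn.alarms.get_alarms() raises NotImplementedError."""
    def __init__(self, db):
        state = DriverState(db)
        self.events = MongoEventFeature(state)  # supported feature
        self.alarms = AlarmFeature()            # unsupported: default stub
```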
21:59:28 <jd__> next meeting will be the 9th January FWIW
21:59:59 <eglynn> cool, call it a night so?
22:00:00 <dhellmann> have a good break, everyone
22:00:13 <thomasem> Likewise! have fun! :)
22:00:18 <gordc> cheers
22:00:20 <jd__> yep have fun guys
22:00:28 <eglynn> yep happy not-coding days
22:00:29 <ildikov_> and have some rest :)
22:00:32 <dragondm> Merry Christmas, & Happy New Year all!
22:00:35 <lsmola> have a great holiday guys
22:00:36 <jd__> #endmeeting