15:01:04 <gordc> #startmeeting telemetry
15:01:05 <openstack> Meeting started Thu Jan 14 15:01:04 2016 UTC and is due to finish in 60 minutes.  The chair is gordc. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:01:06 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:01:08 <openstack> The meeting name has been set to 'telemetry'
15:01:13 <cdent> o/
15:01:18 <ildikov> o/
15:01:50 <_nadya_> o/
15:01:50 <liusheng> o/
15:02:37 <jd__> o/
15:03:07 <gordc> #topic recurring: roadmap items (new/old/blockers) https://wiki.openstack.org/wiki/Telemetry/RoadMap
15:03:08 <sileht> o/
15:03:28 <gordc> i sent an email regarding the items i'm tracking (that aren't complete)
15:03:46 <gordc> if there are any extras please let me know (and maybe add to etherpad)
15:04:01 <gordc> do we need any reviews on specs?
15:04:14 <gordc> (can someone approve my rolling upgrade doc spec)
15:04:29 <cdent> gordc: link?
15:04:41 <gordc> https://review.openstack.org/#/c/242216/
15:05:01 <_nadya_> we need more eyes here https://review.openstack.org/#/c/267078/
15:05:11 <_nadya_> and here https://review.openstack.org/#/c/209799/
15:05:29 <jd__> i'll check them
15:05:31 <gordc> _nadya_: what spec is first one tied to?
15:05:45 <_nadya_> to liusheng's https://review.openstack.org/#/c/209799/
15:06:08 <gordc> oh... i think liusheng was working on that...
15:06:24 <_nadya_> gordc: we sync'ed with him, don't worry :)
15:06:36 <gordc> kk
15:07:04 <liusheng> _nadya_: after the composite alarm feature :)
15:07:25 <_nadya_> liusheng: cool
15:07:51 <gordc> ok. i'll just ask if we can put a bit more focus on specs for now
15:07:53 <liusheng> _nadya_: thanks for your Poc
15:08:29 <gordc> although i think it's really only the polling cache that might require more eyes
15:08:57 <_nadya_> gordc: what do you think about transformers?
15:09:12 <gordc> redis caching?
15:09:13 <_nadya_> or discuss it later?
15:10:19 <gordc> _nadya_: ^?
15:10:32 <_nadya_> gordc: please go ahead :) I will wait for open discussion
15:10:38 <gordc> kk
15:10:45 <gordc> #topic aodh topics
15:10:59 <gordc> any items here?
15:11:12 <gordc> aodhclient patches are here: https://review.openstack.org/#/q/status:open+project:openstack/python-aodhclient+branch:master+topic:alarm
15:11:50 <gordc> anyone know why the liberty aodh integration gate is failing?
15:12:14 <jd__> no :(
15:12:29 <gordc> i don't really understand why it passes randomly...
15:12:49 <gordc> i can't tell if it's ceilometer, aodh, gnocchi, heat, or nova that's broken :(
15:13:00 <gordc> something to debug i guess
15:13:10 <idegtiarov_> hi!
15:13:48 <gordc> anyone else with aodh topic?
15:14:20 <gordc> kk, if someone does look at liberty aodh gate, please let me know
15:14:32 <gordc> #topic ceilometer topics
15:15:13 <gordc> _nadya_: you might be interested in this https://bugs.launchpad.net/ceilometer/+bug/1533787
15:15:14 <openstack> Launchpad bug 1533787 in Ceilometer "workload_partitioning inconsistently reports group members" [Critical,In progress] - Assigned to gordon chung (chungg)
15:15:14 <liusheng> gordc: FYI, the main composite patches have been basically completed, except client side
15:15:26 <gordc> liusheng: that's awesome. you have links?
15:15:45 <liusheng> gordc: https://review.openstack.org/#/q/status:open+project:openstack/aodh+branch:master+topic:bp/composite-threshold-rule-alarm
15:15:56 <gordc> liusheng: cool cool
15:15:58 <_nadya_> gordc: yep, I saw this. Probably, I need to test this
15:16:22 <gordc> liusheng: do you have plans to deprecate combination alarms?
15:16:28 <liusheng> gordc: the two patches of sileht are unrelated
15:16:36 <liusheng> gordc: just because dependency
15:16:40 <gordc> are we all agreed that we should deprecate it? or just leave it?
15:17:21 <liusheng> gordc: maybe we should keep it at least a release cycle ?
15:17:52 <gordc> keep it, deprecate in N* and remove in O*?
15:18:04 <gordc> ok.
15:18:11 <jd__> i'll try to review that too
15:18:17 <jd__> thanks liusheng
15:18:25 <_nadya_> gordc: what do you think about my results from here https://docs.google.com/document/d/1Um5RFH7KGtuD9_PlaBQtMD3pdd6fNvMRUa5bSqyfk0M/edit#heading=h.7f8sbf3dzgfh ? I'm talking mostly about how things are now, not about Redis
15:18:31 <gordc> well we can deprecate it now i think.. if not i think i need to add combination support to client
15:19:00 <gordc> _nadya_: i added comment (and some to your bug)
15:19:18 <gordc> i think we need better understanding of why
15:19:24 <liusheng> gordc: I suggest we'd better notify users before deprecation, in case we have forgotten some use cases
15:19:33 <gordc> liusheng: sure sure
15:19:57 <_nadya_> gordc: there were a lot of "out of order, drop" in logs, yes
15:20:44 <gordc> how large is your queue when you start agents?
15:20:57 <_nadya_> @gordc 0
15:21:23 <gordc> _nadya_: my main concern with redis design is it's sort of an antipattern... 'database as ipc'... that said, a lot of things are antipatterns
15:21:24 <_nadya_> gordc: I'm starting from scratch, create load only by test
15:22:13 <_nadya_> gordc: ok, let me sum everything up and create a thread in ML
15:22:14 <gordc> _nadya_: i think we need to try without the bug. maybe do a log dump for all agents too
15:22:24 <gordc> not just errors
15:22:54 <gordc> it's sort of hard to understand what's happening with a few charts and no understanding of what happened in between
15:22:55 <_nadya_> gordc: I'm thinking about returning transformers back to polling agents
15:23:26 <gordc> _nadya_: why's that?
15:24:13 <_nadya_> gordc: transformations will be done at each agent, won't they?
15:24:40 <gordc> cdent: ^ since you did this work
15:24:57 <cdent> I'm listening
15:25:02 <gordc> _nadya_: currently transformations are done at each agent?
15:25:11 <cdent> but don't understand the problem that reverting would fix?
15:25:56 <_nadya_> cdent: I'm investigating how transformers work in coordination mode for notification agents
15:26:21 <_nadya_> cdent: and now I have results that we lost a lot of delta metrics under high load
15:26:35 <jd__> +1 antipattern
15:26:53 <_nadya_> gordc: on each notification agent, not compute agent
15:26:55 <cdent> jd__: ?
15:26:56 <jd__> you're not solving the right problems from what I read
15:27:12 <jd__> cdent: [16:21:23]  <gordc>	_nadya_: my main concern with redis design is it's sort of an antipattern... 'database as ipc'... that said, a lot of things are antipatterns
15:27:36 <_nadya_> jd__: forget about Redis, let's concentrate on what we have now
15:27:46 <gordc> _nadya_: you are losing deltas because if the agents don't register to same group, they will start fighting for samples and we will not be able to calculate deltas (all the time)
15:28:17 <gordc> which is why we need to look at why sometimes the agents can't see each other
15:28:19 <jd__> don't compute delta may be a solution?
15:28:24 * jd__ runs
15:28:29 <_nadya_> jd__: +1
15:28:33 * gordc grabs brick
15:28:37 <jd__> :(
15:29:02 <gordc> _nadya_: we need to look at what's happening in middle and not just why you have end results you have
15:29:18 <_nadya_> jd__: tbh, I'd keep only one value-based transformers
15:29:20 <gordc> i'm pretty sure you see same thing as me where the members aren't all there
15:30:27 <_nadya_> gordc: will let you know about results
15:30:53 * cdent is still confused
15:30:54 <gordc> _nadya_: can you drop full logs somewhere?
15:31:11 <gordc> cdent: about?
15:31:42 <_nadya_> gordc: yep, I'll collect them. from notification agents, rabbit?
15:31:42 <cdent> As far as I can hear it sounds like a thing that never worked so well in the first place is now demonstrated to not work well with the transformers moved, so there's thought to move the transformers back
15:31:53 <cdent> yet it sounds to me like that's not addressing the actual problem
15:32:03 <cdent> just removing a variable that may be independent
15:32:04 <gordc> _nadya_: both? i'm thinking the segfault is something else.
15:32:23 <jd__> I like what cdent says even if I don't know if it's true
15:32:46 <cdent> I don't know if it is either, thus the confusion
15:33:02 <_nadya_> cdent: why we still have it? why can't we remove transformers support if it doesn't work?
15:33:17 <gordc> cdent: what's 'the thing that doesn't work so well'? transformers?
15:33:49 <_nadya_> cdent: we have a super-distributed stuff for notification agents to support transformers
15:33:50 <gordc> _nadya_: well i think we need to figure out why it doesn't work
15:34:05 <cdent> what gordc just said
15:34:15 <_nadya_> lol
15:34:16 <cdent> rather than just killing it, find out why
15:34:38 <cdent> because it _sounds_ like it is masking some problem that we should know about
15:34:53 <gordc> cdent: agree
15:35:26 <_nadya_> ok, I'll continue with investigations then and will ping you both gordc and cdent
15:35:38 <gordc> _nadya_: kk
15:36:16 <gordc> i would say main thing is to make sure the agents are all in group.
15:36:43 <cdent> If we've got evidence that tooz is being confused lately, we need to make sure that's proper before going much further
15:37:33 <_nadya_> not sure that tooz is to blame
15:37:42 <_nadya_> kk, let's move on
15:38:44 <gordc> _nadya_: basically, if the membership is messed up, all the transformations are messed up. same would apply whether transformations are done at polling or notification
15:38:55 <gordc> _nadya_: i'll put together some questions so we can debug further
15:39:08 <_nadya_> gordc: ok, great
15:39:12 <gordc> #topic gnocchi topics
15:39:26 <gordc> jd__: happy times in gnocchi?
15:39:36 <jd__> oh yeah
15:39:48 <jd__> we'd need a few more reviews on current patches
15:40:01 <jd__> but we're in pretty good shape!
15:40:09 <jd__> awesome improvement, good performance increase
15:40:17 <gordc> jd__: 1000%
15:40:26 <jd__> I'll probably do some benchmarks in the next weeks, I'm pretty sure we beat things like InfluxDB now :p
15:40:34 <gordc> performance increase. people like unverifiable numbers
15:40:36 <jd__> I'm adding some request features these days
15:40:41 <jd__> gordc: exactly
15:40:56 <jd__> I'll pop up some random big numbers
15:41:01 <jd__> we also have a good doc update
15:41:06 <jd__> http://gnocchi.xyz/
15:41:07 <gordc> jd__: i think llu said he just got access to his lab so we can test across a few nodes
15:41:22 <jd__> gordc: awesome
15:41:30 <jd__> let me know if you need a hand
15:41:49 <gordc> we'll ping you if we run into anything... and we can add docs/bugs from that
15:42:45 <jd__> 👍
15:42:47 <gordc> #topic open discussion
15:43:39 <gordc> i should mention, everyone should have gotten or be getting their ATC emails
15:43:43 <gordc> so check your junk
15:43:50 <gordc> ... mail
15:44:03 * cdent reports gordc
15:44:43 <gordc> *shrugs* regular checkups at the doctor is good practice
15:44:54 <jd__> rofl
15:44:56 <jd__> I registered
15:45:13 <jd__> that means I should be there so… behave.
15:45:15 <cdent> me too
15:45:16 <jd__> (or not)
15:45:53 <liamji> hello Gordon, I have updated the bp spec about enabling the LBaaS v2 based on the comments. Could you review it if you have time? https://review.openstack.org/#/c/244139/ thanks
15:45:55 <gordc> this is all logged... so...lol
15:46:09 <gordc> liamji: cool! yes, i remember.
15:46:22 <liamji> gordc: thanks : )
15:46:26 <gordc> everyone take a look at that.
15:46:29 <jd__> it's important that it's logged
15:46:38 <gordc> jd__: get out.
15:46:39 <jd__> I want my descendants to be able to read it
15:47:18 <gordc> regardless, if you don't have atc let me know.
15:47:27 <gordc> liamji: have you started coding sa well?
15:47:32 <gordc> as well*
15:48:50 <gordc> ok anything else?
15:48:55 <gordc> last few seconds
15:49:42 <liamji> gordc: yes, I have started to code
15:49:55 <gordc> liamji: cool cool. sounds good.
15:50:00 <gordc> let's close this meeting
15:50:05 <gordc> thanks everyone
15:50:09 <gordc> #endmeeting