15:01:04 #startmeeting telemetry
15:01:05 Meeting started Thu Jan 14 15:01:04 2016 UTC and is due to finish in 60 minutes. The chair is gordc. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:01:06 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:01:08 The meeting name has been set to 'telemetry'
15:01:13 o/
15:01:18 o/
15:01:50 <_nadya_> o/
15:01:50 o/
15:02:37 o/
15:03:07 #topic recurring: roadmap items (new/old/blockers) https://wiki.openstack.org/wiki/Telemetry/RoadMap
15:03:08 o/
15:03:28 i sent an email regarding the items i'm tracking (that aren't complete)
15:03:46 if there are any extras please let me know (and maybe add to etherpad)
15:04:01 do we need any reviews on specs?
15:04:14 (can someone approve my rolling upgrade doc spec)
15:04:29 gordc: link?
15:04:41 https://review.openstack.org/#/c/242216/
15:05:01 <_nadya_> we need more eyes here https://review.openstack.org/#/c/267078/
15:05:11 <_nadya_> and here https://review.openstack.org/#/c/209799/
15:05:29 i'll check them
15:05:31 _nadya_: what spec is first one tied to?
15:05:45 <_nadya_> to liusheng's https://review.openstack.org/#/c/209799/
15:06:08 oh... i think liusheng was working on that...
15:06:24 <_nadya_> gordc: we sync'ed with him, don't worry :)
15:06:36 kk
15:07:04 _nadya_: after the composite alarm feature :)
15:07:25 <_nadya_> liusheng: cool
15:07:51 ok. i'll just ask if we can put a bit more focus on specs for now
15:07:53 _nadya_: thanks for your PoC
15:08:29 although i think it's really only the polling cache that might require more eyes
15:08:57 <_nadya_> gordc: what do you think about transformers?
15:09:12 redis caching?
15:09:13 <_nadya_> or discuss it later?
15:10:19 _nadya_: ^?
15:10:32 <_nadya_> gordc: please go ahead :) I will wait for open discussion
15:10:38 kk
15:10:45 #topic aodh topics
15:10:59 any items here?
15:11:12 aodhclient patches are here: https://review.openstack.org/#/q/status:open+project:openstack/python-aodhclient+branch:master+topic:alarm
15:11:50 anyone know why the liberty aodh integration gate is failing?
15:12:14 no :(
15:12:29 i don't really understand why it passes randomly...
15:12:49 i can't tell if it's ceilometer, aodh, gnocchi, heat, or nova that's broken :(
15:13:00 something to debug i guess
15:13:10 hi!
15:13:48 anyone else with aodh topic?
15:14:20 kk, if someone does look at liberty aodh gate, please let me know
15:14:32 #topic ceilometer topics
15:15:13 _nadya_: you might be interested in this https://bugs.launchpad.net/ceilometer/+bug/1533787
15:15:14 Launchpad bug 1533787 in Ceilometer "workload_partitioning inconsistently reports group members" [Critical,In progress] - Assigned to gordon chung (chungg)
15:15:14 gordc: FYI, the main composite patches have been basically completed, except client side
15:15:26 liusheng: that's awesome. you have links?
15:15:45 gordc: https://review.openstack.org/#/q/status:open+project:openstack/aodh+branch:master+topic:bp/composite-threshold-rule-alarm
15:15:56 liusheng: cool cool
15:15:58 <_nadya_> gordc: yep, I saw this. Probably, I need to test this
15:16:22 liusheng: do you have plans to deprecate combination alarms?
15:16:28 gordc: the two patches of sileht are unrelated
15:16:36 gordc: just because of the dependency
15:16:40 are we all agreed that we should deprecate it? or just leave it?
15:17:21 gordc: maybe we should keep it at least a release cycle?
15:17:52 keep it, deprecate in N* and remove in O*?
15:18:04 ok.
15:18:11 i'll try to review that too
15:18:17 thanks liusheng
15:18:25 <_nadya_> gordc: what do you think about my results from here https://docs.google.com/document/d/1Um5RFH7KGtuD9_PlaBQtMD3pdd6fNvMRUa5bSqyfk0M/edit#heading=h.7f8sbf3dzgfh ? I'm talking mostly about how things are now, not about Redis
15:18:31 well we can deprecate it now i think.. if not i think i need to add combination support to client
15:19:00 _nadya_: i added comment (and some to your bug)
15:19:18 i think we need better understanding of why
15:19:24 gordc: I suggest we'd better notify users before deprecation, in case we have forgotten some use cases
15:19:33 liusheng: sure sure
15:19:57 <_nadya_> gordc: there were a lot of "out of order, drop" in logs, yes
15:20:44 how large is your queue when you start agents?
15:20:57 <_nadya_> @gordc 0
15:21:23 _nadya_: my main concern with redis design is it's sort of an antipattern... 'database as ipc'... that said, a lot of things are antipatterns
15:21:24 <_nadya_> gordc: I'm starting from scratch, create load only by test
15:22:13 <_nadya_> gordc: ok, let me sum everything up and create a thread in ML
15:22:14 _nadya_: i think we need to try without the bug. maybe do a log dump for all agents too
15:22:24 not just errors
15:22:54 it's sort of hard to understand what's happening with a few charts and no understanding what happened in between
15:22:55 <_nadya_> gordc: I'm thinking about returning transformers back to polling agents
15:23:26 _nadya_: why's that?
15:24:13 <_nadya_> gordc: transformations will be done at each agent, won't they?
15:24:40 cdent: ^ since you did this work
15:24:57 I'm listening
15:25:02 _nadya_: currently transformation are done at each agent?
15:25:11 but don't understand the problem that reverting would fix?
15:25:56 <_nadya_> cdent: I'm investigating how transformers work in coordination mode for notification agents
15:26:21 <_nadya_> cdent: and now I have results that we lost a lot of delta metrics under high load
15:26:35 +1 antipattern
15:26:53 <_nadya_> gordc: on each notification agent, not compute agent
15:26:55 jd__: ?
15:26:56 you're not solving the right problems from what I read
15:27:12 cdent: [16:21:23] _nadya_: my main concern with redis design is it's sort of an antipattern... 'database as ipc'... that said, a lot of things are antipatterns
15:27:36 <_nadya_> jd__: forget about Redis, let's concentrate on what we have now
15:27:46 _nadya_: you are losing deltas because if the agents don't register to same group, they will start fighting for samples and we will not be able to calculate deltas (all the time)
15:28:17 which is why we need to look at why sometimes the agents can't see each other
15:28:19 don't compute delta may be a solution?
15:28:24 * jd__ runs
15:28:29 <_nadya_> jd__: +1
15:28:33 * gordc grabs brick
15:28:37 :(
15:29:02 _nadya_: we need to look at what's happening in middle and not just why you have end results you have
15:29:18 <_nadya_> jd__: tbh, I'd keep only one value-based transformers
15:29:20 i'm pretty sure you see same thing as me where the members aren't all there
15:30:27 <_nadya_> gordc: will let you know about results
15:30:53 * cdent is still confused
15:30:54 _nadya_: can you drop full logs somewhere?
15:31:11 cdent: about?
15:31:42 <_nadya_> gordc: yep, I'll collect them. from notification agents, rabbit?
15:31:42 As far as I can hear it sounds like a thing that never worked so well in the first place is now demonstrated to not work well with the transformers moved, so there's thought to move the transformers back
15:31:53 yet it sounds to me like that's not addressing the actual problem
15:32:03 just removing a variable that may be independent
15:32:04 _nadya_: both? i'm thinking the segfault is something else.
15:32:23 I like what cdent says even if I don't know if it's true
15:32:46 I don't know if it is either, thus the confusion
15:33:02 <_nadya_> cdent: why do we still have it? why can't we remove transformers support if it doesn't work?
15:33:17 cdent: what's 'the thing that don't work so well' transformers?
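A toy model of the failure described at 15:27:46 (and again at 15:38:44) may help: a delta transformer needs every sample for a resource to reach the same agent, so if agents disagree about group membership, two of them end up consuming the same partition queue and split its samples between them. Everything below is a hypothetical simplification, not Ceilometer's or tooz's actual code: the `NUM_QUEUES` constant, the modulo assignment standing in for a hash ring, strict round-robin delivery standing in for competing consumers on a message queue, and one resource per queue.

```python
from collections import defaultdict
from itertools import cycle

NUM_QUEUES = 4  # hypothetical number of partition queues


def owned_queues(agent, members):
    """Queues `agent` thinks it owns, given ITS OWN view of the group.

    Simple modulo assignment over the sorted member list; a stand-in for a
    real hash ring.
    """
    ordered = sorted(members)
    return {q for q in range(NUM_QUEUES) if ordered[q % len(ordered)] == agent}


def consumers_per_queue(views):
    """Map queue -> agents listening on it, given each agent's membership view."""
    consumers = defaultdict(list)
    for agent, members in sorted(views.items()):
        for q in sorted(owned_queues(agent, members)):
            consumers[q].append(agent)
    return consumers


def compute_deltas(samples, queue, views):
    """Deltas produced for one resource whose samples land on `queue`.

    Competing consumers are modelled as strict round-robin delivery; each
    agent keeps its own 'previous value' state, as a per-agent delta
    transformer would.
    """
    consumers = consumers_per_queue(views)[queue]
    if not consumers:
        return []  # nobody listens: samples are never transformed
    previous = {}  # agent -> last value that agent saw
    deltas = []
    turn = cycle(consumers)
    for value in samples:
        agent = next(turn)
        if agent in previous:
            deltas.append(value - previous[agent])
        previous[agent] = value
    return deltas


if __name__ == "__main__":
    samples = [10, 15, 25, 40]  # cumulative counter for one resource
    consistent = {"a": ["a", "b"], "b": ["a", "b"]}
    split = {"a": ["a"], "b": ["a", "b"]}  # agent a cannot see b
    print(compute_deltas(samples, 0, consistent))  # [5, 10, 15]
    print(compute_deltas(samples, 1, split))       # [15, 25]
```

With a consistent view, queue 0 has a single consumer and yields the true deltas [5, 10, 15]. When agent `a` cannot see `b`, both consume queue 1, and the same four samples yield only [15, 25]: wrong values and one delta missing. This matches the point at 15:38:44 that a broken membership breaks transformations wherever they run.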
15:33:49 <_nadya_> cdent: we have a super-distributed stuff for notification agents to support transformers
15:33:50 _nadya_: well i think we need to figure out why it doesn't work
15:34:05 what gordc just said
15:34:15 <_nadya_> lol
15:34:16 rather than just killing it, find out why
15:34:38 because it _sounds_ like it is masking some problem that we should know about
15:34:53 cdent: agree
15:35:26 <_nadya_> ok, I'll continue with investigations then and will ping you both gordc and cdent
15:35:38 _nadya_: kk
15:36:16 i would say main thing is to make sure the agents are all in group.
15:36:43 If we've got evidence that tooz is being confused lately, we need to make sure that's proper before going much further
15:37:33 <_nadya_> not sure that tooz is to blame
15:37:42 <_nadya_> kk, let's move on
15:38:44 _nadya_: basically, if the membership is messed up, all the transformations are messed up. same would apply whether transformations are done at polling or notification
15:38:55 _nadya_: i'll put together some questions so we can debug further
15:39:08 <_nadya_> gordc: ok, great
15:39:12 #topic gnocchi topics
15:39:26 jd__: happy times in gnocchi?
15:39:36 oh yeah
15:39:48 we'd need a few more reviews on current patches
15:40:01 but we're in pretty good shape!
15:40:09 awesome improvement, good performance increase
15:40:17 jd__: 1000%
15:40:26 I'll probably do some benchmarks in the next weeks, I'm pretty sure we beat things like InfluxDB now :p
15:40:34 performance increase. people like unverifiable numbers
15:40:36 I'm adding some request features these days
15:40:41 gordc: exactly
15:40:56 I'll pop up some random big numbers
15:41:01 we also have a good doc update
15:41:06 http://gnocchi.xyz/
15:41:07 jd__: i think llu said he just got access to his lab so we can test across a few nodes
15:41:22 gordc: awesome
15:41:30 let me know if you need a hand
15:41:49 we'll ping you if we run into anything... and we can add docs/bugs from that
15:42:45 👍
15:42:47 #topic open discussion
15:43:39 i should mention, everyone should have gotten or be getting their ATC emails
15:43:43 so check your junk
15:43:50 ... mail
15:44:03 * cdent reports gordc
15:44:43 *shrugs* regular checkups at the doctor are good practice
15:44:54 rofl
15:44:56 I registered
15:45:13 that means I should be there so… behave.
15:45:15 me too
15:45:16 (or not)
15:45:53 hello Gordon, I have updated the bp spec about enabling the LBaaS v2 based on the comments. Could you review it if you have time? https://review.openstack.org/#/c/244139/ thanks
15:45:55 this is all logged... so...lol
15:46:09 liamji: cool! yes, i remember.
15:46:22 gordc: thanks : )
15:46:26 everyone take a look at that.
15:46:29 it's important that it's logged
15:46:38 jd__: get out.
15:46:39 I want my descendants to be able to read it
15:47:18 regardless, if you don't have atc let me know.
15:47:27 liamji: have you started coding sa well?
15:47:32 as well*
15:48:50 ok anything else?
15:48:55 last few seconds
15:49:42 gordc: yes, I have started to code
15:49:55 liamji: cool cool. sounds good.
15:50:00 let's close this meeting
15:50:05 thanks everyone
15:50:09 #endmeeting