15:00:32 #startmeeting telemetry
15:00:33 Meeting started Thu Feb 25 15:00:32 2016 UTC and is due to finish in 60 minutes. The chair is gordc. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:34 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:36 The meeting name has been set to 'telemetry'
15:00:38 o/
15:00:51 o/
15:01:03 o/
15:01:06 o/
15:01:10 o/
15:01:50 <_nadya_> o/
15:02:09 o/
15:02:44 ok let's start, i think some people are on PTO
15:02:57 #topic recurring: roadmap items (new/old/blockers) https://wiki.openstack.org/wiki/Telemetry/RoadMap
15:03:12 we're basically up for time on features for Mitaka
15:03:14 o/
15:03:23 the items we were tracking last week seem to be ok
15:03:56 i'll run through each in the subtopics
15:04:03 but any pressing concerns?
15:04:29 m-3 is next week so basically all features now will need to be very very small
15:05:18 cool. let's move to the projects
15:05:29 #topic aodh topics
15:05:44 right now we're tracking composite alarms for Mitaka
15:05:48 main patch is in
15:05:57 we just need approval on api and client
15:06:01 gordc: thanks
15:06:20 https://review.openstack.org/#/c/257722/
15:06:44 https://review.openstack.org/#/c/284022/
15:07:02 if we can get reviews and get that merged that'd be great
15:07:35 * gordc nudges sileht
15:08:03 has anyone looked at switching ceilometerclient to aodhclient in heat?
15:08:03 it seems the jenkins has been broken :(
15:08:23 liusheng: yeah, it's being fixed
15:08:31 liusheng, gordc I will check that one last time
15:08:37 we have some time next week to merge as well.
15:09:00 gord, sileht, cool, thanks!
15:09:07 i'll try taking a look at porting ceilometerclient to aodhclient in heat... i'm guessing it won't make it for Mitaka though
15:09:27 liusheng: the problem we found in our working day is fixed. Now it is the second one :)
15:09:35 gordc: can we get an exception for it?
15:09:47 ildikov: in heat?
15:09:57 gordc: yes
15:10:17 ildikov: i'll give them a ping and see.
15:10:25 i'm hoping it's an easy swap
15:10:45 gordc: me too, this is why I asked
15:11:18 #action see if heat will allow aodhclient FFE
15:11:42 aside from that, i think Aodh is what it is?
15:11:49 r-mibu: did you have success with tempest?
15:12:10 i believe pradk mentioned whatever we have now is conflicting with what exists in tempest repo
15:12:14 you mean running test with plugins?
15:12:23 r-mibu: correct
15:12:38 right, so I'll fix the id of test that may fix the bug
15:12:51 but didn't check yet
15:13:04 will do by this week :)
15:13:05 r-mibu: do you have time next 2 weeks?
15:13:08 ok
15:13:21 yep, other stuffs done :)
15:13:22 i'm going to make tempest stuff FFE since it's only tests.
15:13:31 anyone have concerns?
15:13:31 ok
15:14:01 docs...
15:14:19 as you pointed in review
15:14:37 r-mibu: ok. let's try to get it working as soon as possible. but i'm ok with cutting m-3 and merging tempest stuff in an rc build unless someone has an issue
15:14:55 r-mibu: docs for tempest tests?
15:15:01 I will run a config guide update for Aodh
15:15:04 docs for aodh
15:15:20 but, yes, that's not big problem for m-3
15:15:24 ildikov: cool cool. thanks for tracking that
15:15:29 I think we will need to look into the Alarming section of the Admin Guide in OS Manuals
15:15:57 ildikov: definitely. we can still make changes to docs after m-3 correct?
15:15:58 I'm not sure how much that part is outdated, so prolly pretty much :)
15:16:19 gordc: sure, we have some time during the stabilization period
15:16:26 ildikov: awesome
15:16:50 we need to do docs for all of aodh, our dev docs are non-existent too
15:16:59 gordc: of course sooner the better, but still we will have a better picture right after m-3 regarding what made it and what to document
15:17:07 ildikov: sounds good
15:17:16 anything else for aodh?
15:17:31 test :(
15:18:01 anyone see me?
can't read any messages :(
15:18:32 llu-laptop: I can see you :)
15:18:33 llu-laptop: i can see your message
15:18:33 llu-laptop: I see your messages
15:18:40 I can see your messages llu-laptop
15:18:57 and he's gone. lol
15:19:05 let's move on for now
15:19:14 #topic ceilometer topics
15:19:27 we have two items here to get merged
15:19:56 ityaptin's patch for minimising nova-api load: https://review.openstack.org/#/c/284322/
15:20:14 i think that needs a docimpact since we added a new option
15:20:48 and liamji's patch for neutron v2: https://review.openstack.org/#/c/277434/
15:21:24 r-mibu: same FFE for tempest in ceilometer
15:21:48 got it
15:21:57 gordc neutron v2 is failing due to gate issues?
15:22:17 we all happy? let's hold off on any other features
15:22:27 gordc, what about event transformers FFE?
15:22:29 neelashah: yeah, we need to fix a gnocchiclient issue first and we should be ok
15:22:41 gordc - ok, thanks
15:22:59 idegtiarov: i have some concerns but we can talk about that now
15:23:22 #topic event bracketer transformer
15:23:26 #link https://review.openstack.org/#/c/266488/
15:23:30 great what is your main concern?
15:23:50 i don't understand why latency is an event.
15:24:04 and why it needs to be calculated inline/stream
15:24:38 also, the code seems to be very inflexible.
15:24:40 it is event that could be published to - - tosamplenotifier:// and will be stored as sample
15:25:45 idegtiarov: and it has real-time requirement because?
15:26:50 gordc, if you need to get alarm based for example on latency_time event/sample it is
15:26:59 <_nadya_> gordc: quick question about alarms transformer
15:26:59 <_nadya_> gordc: events, sorry
15:26:59 <_nadya_> #link https://review.openstack.org/#/c/266488/5 perhaps we can start with instances only in Mitaka?
15:27:40 idegtiarov: but the alarm scenario is handled by the timeout mechanism in aodh no?
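A minimal sketch of the in-stream "bracketing" idea under discussion, assuming a hypothetical event shape (a dict with `event_type`, `resource_id` and a `generated` datetime) and a hypothetical meter name `instance.boot.latency`; it is not the code in review 266488. It pairs a start event with the matching end event for the same resource and emits one latency sample, which also makes the stateful part explicit: pending starts are held in process memory until their end event arrives.

```python
from datetime import datetime


class LatencyBracketer:
    """Pair a start event with the matching end event for the same resource.

    Hypothetical event shape: a dict with "event_type", "resource_id" and
    "generated" (a datetime). Pending starts are cached in this process only.
    """

    def __init__(self, start_type, end_type, meter_name):
        self.start_type = start_type
        self.end_type = end_type
        self.meter_name = meter_name
        self._pending = {}  # resource_id -> datetime of the start event

    def process(self, event):
        """Return a latency sample dict when a bracket closes, otherwise None."""
        resource_id = event["resource_id"]
        if event["event_type"] == self.start_type:
            self._pending[resource_id] = event["generated"]
            return None
        if event["event_type"] == self.end_type:
            started = self._pending.pop(resource_id, None)
            if started is None:
                # End event without a cached start: the start was lost, or it
                # was handled by a different worker process.
                return None
            seconds = (event["generated"] - started).total_seconds()
            return {"name": self.meter_name, "resource_id": resource_id,
                    "volume": seconds, "unit": "s"}
        return None  # unrelated event type


if __name__ == "__main__":
    bracketer = LatencyBracketer("compute.instance.create.start",
                                 "compute.instance.create.end",
                                 "instance.boot.latency")  # hypothetical meter name
    bracketer.process({"event_type": "compute.instance.create.start",
                       "resource_id": "vm-1",
                       "generated": datetime(2016, 2, 25, 15, 0, 0)})
    sample = bracketer.process({"event_type": "compute.instance.create.end",
                                "resource_id": "vm-1",
                                "generated": datetime(2016, 2, 25, 15, 0, 42)})
    print(sample["volume"])  # 42.0
```

Because the pending-start cache is local to one process, this sketch only pairs events correctly if both notifications for a resource reach the same worker, and a start whose end event never arrives stays in the cache indefinitely.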
15:28:53 gordc, as an example we could be interested why instances booting longer than 10 minutes and create alarm for that case
15:29:57 the main idea is have tool for event transformation and store it as event/sample
15:30:05 <_nadya_> can we convert only to samples? I agree that "latency" is mostly about sample
15:30:17 shouldn't timeouts be done by event alarm?
15:30:42 we will can when https://review.openstack.org/#/c/227106/ will be merged
15:31:17 <_nadya_> it looks it's ready to be
15:31:21 i agree gordc - alarming logics can be put in aodh rather than ceilometer itself
15:32:04 if creation time of instance can be meter, i'm ok
15:32:39 _nadya_: yeah. i think it's definitely a measurement. i'm wondering how many measurements it is though.
15:32:44 r-mibu, it is not alarming logic it is logic of event transformation that could be used for statistics of booting instances or alarming based on new samples/event
15:32:55 we will really only have one latency measurement per resource (you can only ever create once)
15:33:25 is this better as a query feature in events api
15:34:29 usually that's how most BI tools work. you calculate the data from specific log records
15:34:45 much better as for me because it is rather expensive to index events traits and we already have event_type indexing in mongodb so api requests for new events will be pretty fast
15:35:17 <_nadya_> gordc: it is only one for now, right. But we can have latency for many different resources: instances, volumes and so on. In M* we may start with instances only
15:36:38 is the 'resource' == host? because it doesn't matter how many different resources you'll have, it still just one entry for each instance/volume/etc... no?
15:37:39 for me, i think this functionality is better as post-storage work, i don't really see the real-time requirement of it.
that's my main point
15:38:00 gordc: I tend to agree with you on this point
15:38:46 adding new logic might affect event processing and having data in workers make difficult in HA/multi-worker
15:38:52 <_nadya_> It looks so great to have alarm: "look, your instances start to boot more than 10 minutes"
15:39:00 it is not only about booting time, but for example instances update
15:39:06 so just referencing stacktach and how they were planning to implement alarms, i believe they also do these calculations in post
15:39:59 r-mibu: yeah, it definitely complicates stuff having a global cache shared across workers/systems...
15:40:15 although notification agent is/should be smart enough to redirect to common queues.
15:40:57 <_nadya_> dunno, we already have "online" mechanism for transformers
15:41:44 _nadya_: the instance booting time issues is more an alert in definition, also I would assume it gets interesting when it happens with all of them not just one
15:41:45 _nadya_: so the alarm comment i believe we want to have it handled by Aodh
15:41:54 you are defining the rule there already.
15:42:10 IIUC, if we emit measurements on events latency, these measurements sparse in timeline, if we have alarms on these measurements, the state of alarms maybe always "insufficient data"
15:42:33 ildikov, not only alerts but could be used for statistics of booting vms
15:42:56 idegtiarov: that can be a post operation/query as well
15:43:04 ildikov: +
15:43:10 not now
15:43:22 even not with ceilo api
15:43:24 idegtiarov: but it could be, if someone worked on it ;)
15:43:35 gordc: +1 :)
15:44:04 idegtiarov: i understand your use case, but it sometimes won't work since we cannot make sure that ceil receive set of start and end message
15:44:08 <_nadya_> gordc: store events in time series storage and have post processing? in Gnocchi?
15:44:22 idegtiarov: i had this topic for making events more useful for BI at last summit?
i just didn't do anything, so it's kinda my fault (but i won't admit it)
15:45:59 r-mibu: right, the potential latency in MQ may cause weird results from real-time pov
15:46:08 <_nadya_> r-mibu: I hear this very often. But actually, doesn't it mean that we cannot provide a reliable billing? We lose notifications about instances, sorry
15:46:18 r-mibu when we do not receive end event it should be alarm otherwise we will have and could used if we need such data
15:46:47 _nadya_: i'm not sure if we need gnocchi specifically. gnocchi i think is continuous measurements over time
15:47:09 latency seems to be measurements in set time
15:47:48 i assume for billing purpose operator will check db record as well, otherwise I will boot many instances on that system :)
15:47:49 <_nadya_> gordc: so...what storage will be used for events? sql?
15:48:40 for billing we usually have some freedom I think regarding messages and it usually does not have to be real-time
15:49:03 <_nadya_> r-mibu: :)
15:49:32 _nadya_: existing storage: sql/elasticsearch.
15:49:36 idegtiarov: yep, and that can be done in aodh + events storage
15:49:49 <_nadya_> let's move on, I see community point here
15:49:55 actually to do the same operation on not indexing data will be a big issue for big event collection
15:50:01 _nadya_: in theory, this should be doable in elasticsearch.
15:50:35 _nadya_: i also believe stacktach offers some mechanisms to handle related events (i don't know status of all that though)
15:50:40 <_nadya_> gordc: I don't like it is external, not ceilometer-core. But perhaps I need to think more
15:50:56 what's external?
15:51:32 <_nadya_> gordc: that this statistics should be calculated outside ceilometer, in external system
15:51:47 idegtiarov: but we index event_type and all the traits...
15:52:37 we do not index traits
15:53:15 ceilometer gathers data, normalises and transforms.
gnocchi does a lot of stuff 'outside ceilometer' but it's still our project
15:54:45 idegtiarov: https://github.com/openstack/ceilometer/blob/master/ceilometer/storage/sqlalchemy/models.py#L294
15:55:05 i don't understand, it seems index'd. it's a primary key
15:55:39 if not, it should be.
15:55:48 sileht: you have anything for gnocchi?
15:56:16 * sileht is reading backlog
15:56:17 gordc, I mean in mongodb
15:56:32 sileht: no backlog. just asking if we can leave gnocchi topics :)
15:57:00 idegtiarov: we probably should? or not use mongodb :P
15:57:09 :P
15:57:14 gordc, oh I have released gnocchiclient 2.2.0 and I will start working on gnocchi dispatcher for batching measurements
15:57:30 tbh, it seems like i'm not the only person who has issues so maybe we should punt it for Mitaka
15:57:58 o no :(
15:58:12 idegtiarov, gordc why not having both indexes, the new one and the old one ?
15:58:20 gordc: agreed
15:58:26 idegtiarov: let's move this to main channel post meeting
15:58:35 #topic gnocchi topics
15:58:35 k
15:59:00 sileht: i had a question, do we want to make the gnocchi dispatcher use new batching support
15:59:15 gordc, why not ?
15:59:19 for mitaka?
15:59:31 i just want to know if we should track it
15:59:41 someone is going to yell soon
15:59:42 I'm a bit lost on where we are on the roadmap
15:59:54 sileht: :) i'll ask in main channel
16:00:05 thanks everyone
16:00:09 #endmeeting
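For comparison, a minimal sketch of the post-storage alternative raised in the discussion: computing latency at query time from stored events rather than in the notification stream. It uses the standard-library `sqlite3` module and a hypothetical single-table schema; Ceilometer's real SQL backend stores event types and traits in separate tables, so this is illustrative only. A `LEFT JOIN` keeps resources whose end event was never received and reports `None` for them instead of silently dropping them.

```python
import sqlite3

# Hypothetical, simplified schema: one row per event. Ceilometer's actual SQL
# backend stores event types and traits in separate tables.
SCHEMA = """
CREATE TABLE event (
    event_type  TEXT NOT NULL,
    resource_id TEXT NOT NULL,
    generated   REAL NOT NULL  -- seconds since the epoch
);
CREATE INDEX ix_event_type_resource ON event (event_type, resource_id);
"""

# Pair every start event with the end event for the same resource.
# LEFT JOIN keeps starts with no matching end; their latency comes back NULL.
LATENCY_QUERY = """
SELECT s.resource_id, e.generated - s.generated AS latency
FROM event AS s
LEFT JOIN event AS e
  ON e.resource_id = s.resource_id AND e.event_type = ?
WHERE s.event_type = ?
ORDER BY s.resource_id
"""


def latencies(conn, start_type, end_type):
    """Return {resource_id: latency in seconds, or None if no end event is stored}.

    Assumes at most one start and one end event per resource.
    """
    rows = conn.execute(LATENCY_QUERY, (end_type, start_type))
    return {resource_id: latency for resource_id, latency in rows}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO event VALUES (?, ?, ?)", [
        ("compute.instance.create.start", "vm-1", 100.0),
        ("compute.instance.create.end", "vm-1", 142.0),
        ("compute.instance.create.start", "vm-2", 120.0),  # end never received
    ])
    print(latencies(conn, "compute.instance.create.start",
                    "compute.instance.create.end"))
    # {'vm-1': 42.0, 'vm-2': None}
```

Since the calculation runs over whatever has been stored, no per-worker state is needed, and the composite index on `(event_type, resource_id)` is what keeps the self-join cheap as the event collection grows.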