15:00:10 #startmeeting ceilometer
15:00:11 Meeting started Thu Nov 27 15:00:10 2014 UTC and is due to finish in 60 minutes. The chair is eglynn_. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:12 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:15 The meeting name has been set to 'ceilometer'
15:00:39 Happy Thanksgiving y'all :)
15:00:51 who's all here for the ceilo meeting?
15:00:56 o/
15:01:04 o/
15:01:07 <_elena_> o/
15:01:50 o/
15:01:57 o/
15:02:00 #topic "Kilo-1 blueprints"
15:02:09 #link https://launchpad.net/ceilometer/+milestone/kilo-1
15:02:39 o/
15:02:45 o/
15:02:49 a "blocked" blueprint is one for which the corresponding specs review hasn't yet landed
15:02:50 o/
15:03:07 but we expect it to do so, and the implementation work is progressing in parallel
15:03:44 does anyone have any BPs other than the 10 listed in https://launchpad.net/ceilometer/+milestone/kilo-1 that they hope to get in for k1?
15:04:20 gordc: e.g. are you thinking about k1 for the notification agent co-ordination stuff?
15:04:49 eglynn_: working on it now... i think i might need sileht's listener-pool patch but i'm not exactly sure yet
15:05:01 figuring stuff out as i go along...
15:05:06 gordc: cool enough, thanks!
15:05:20 kilo-1 is Dec 18 BTW
15:05:43 i'm going to say maybe
15:06:24 gordc: coolness, we can add it to kilo-2 say for now and then bump forward to kilo-1 if all goes well?
15:07:29 sure.
15:07:35 cool
15:07:39 sorry. in between meetings. :|
15:07:46 np!
15:07:48 anything else on kilo-1?
15:08:08 nothing for me. i have the elasticsearch patch but i don't know how to test it currently.
15:08:44 gordc: elasticsearch also possibly in play for k1, or more likely k2/3?
15:09:13 probably same boat as notification... maybe k-1, probably k-2
15:09:19 cool
15:09:29 #topic first stable/juno release
15:09:35 #link https://wiki.openstack.org/wiki/StableBranchRelease#Planned_stable.2Fjuno_releases
15:09:50 this was planned for today, but pushed back to Dec 4th
15:09:52 o/ sorry for the delay, folks :)
15:10:05 I've started doing backport for likely looking bugs
15:10:10 *backports
15:10:48 but if you've a particular bugfix you've landed for k1 that you think is a good backport candidate
15:11:04 please tag with "juno-backport-potential"
15:11:12 eglynn_, ok, thanks
15:11:40 #topic "TSDaaS/gnocchi status"
15:11:57 jd__: anything you want to bring to the table on gnocchi?
15:12:16 #link https://review.openstack.org/#/q/status:open+project:stackforge/gnocchi,n,z
15:12:31 progress on refining the RBAC policies
15:12:52 also a nice patch exposing the back_window in the API
15:13:54 eglynn_, after I fix the merge-agents chain, I'll start working on the opentsdb driver
15:14:20 and a new stackforge repo with the co-processor deletion thing for it :)
15:14:28 I hope to start tomorrow
15:14:31 DinaBelova: cool :)
15:14:41 DinaBelova: BTW I heard yahoo were evaluating opentsdb for their own internal metrics storage (... just a datapoint showing wide adoption)
15:14:59 hehe, nice :)
15:15:07 DinaBelova: co-processor deletion == retention logic ?
15:15:13 eglynn_, yes
15:15:16 cool
15:15:19 as I said, it'll be some Java code
15:15:23 but in a separate repo
15:15:27 so as not to mess things up for everyone
15:16:02 <_nadya_> DinaBelova: what about testing? any ideas :)?
15:16:09 _nadya_, well :)
15:16:30 DinaBelova: could this be something that might be contrib'd back to the OpenTSDB folks?
15:16:38 what about landing the dispatcher :p ?
15:16:39 DinaBelova: ... as opposed to maintaining a separate repo
15:16:44 _nadya_, currently I wonder how to make integration tests possible
15:16:53 if it's possible at all for our OpenStack gate
15:16:58 o/
15:17:09 yeah I think you have a good summary eglynn_
15:17:12 thanks
15:17:17 jd__: np :)
15:17:20 eglynn_, probably that'll be the solution :)
15:17:21 <_nadya_> DinaBelova: TBH, I think it's impossible... We need at least HBase in devstack
15:17:26 sileht: good point, I was blocked on glance not setting the user_id on samples
15:17:35 let's see if that'll be possible as a part of their retention param impl
15:17:43 sileht: I need to give that dispatcher patch another run-through
15:17:44 eglynn_, I have removed that check
15:17:46 _nadya_, yeah, I suppose so
15:17:54 although
15:17:59 sileht: coolness, I'd imagine it's good to fly in that case
15:18:06 sileht: will look again after this meeting
15:18:17 _nadya_, I still hope to see the HBase devstack change being merged
15:18:44 eglynn_, and then we can fix the gnocchi DB schema when we land the first glance resource into gnocchi
15:19:05 sileht: cool, that sounds reasonable
15:19:32 since we have created_by_* now I'm less reluctant to allow null project_id/user_id
15:19:55 jd__: yeah, good point
15:20:58 jd__: but we do have existing metrics with null project_id/user_id, how to handle them?
15:21:16 I think that replies to your question llu-laptop, no?
15:22:12 we have snmp/ipmi and also metrics from SDN like opendaylight & opencontrail
15:22:33 jd__, Should I accept null project_id too in the dispatcher?
15:22:48 DinaBelova: Also, a co-processor may be a useful approach for Ceilometer HBase storage time-to-live.
15:22:50 sileht: if we change the schema accordingly yeah
15:22:58 ityaptin, yes, for sure
15:23:43 so the idea would be that the created_by_* gives a fallback on which to evaluate RBAC rules if the primary project *or* user ID is null, amiright?
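(A rough, hedged sketch of the created_by_* fallback eglynn_ summarises just above, not gnocchi's actual policy code: the attribute names created_by_project_id / created_by_user_id are inferred from the "created_by_*" shorthand in the discussion, and rbac_scope() is a hypothetical helper.)

    # Sketch only: illustrates falling back to the created_by_* attributes
    # when the primary project/user IDs are null, as they are for snmp/ipmi
    # meters or SDN metrics from opendaylight/opencontrail.
    def rbac_scope(resource):
        """Pick the project/user IDs to evaluate RBAC rules against."""
        project_id = resource.get('project_id') or resource.get('created_by_project_id')
        user_id = resource.get('user_id') or resource.get('created_by_user_id')
        return project_id, user_id

    # Example: a metric with no owning project still yields a scope to check
    # policy against (that of the service identity which created it).
    print(rbac_scope({'project_id': None,
                      'user_id': None,
                      'created_by_project_id': 'ceilometer-service-project',
                      'created_by_user_id': 'ceilometer-service-user'}))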
15:25:00 jd__: I'm resurrecting the Influx driver, was planning to start with a ceilo-specs proposal
15:25:07 jd__: ... which brings up the question of whether the specs "process" applies in gnocchi-land?
15:25:31 eglynn_: maybe, but for now I'd say, don't lose time with that?
15:25:33 eglynn_, I got the same question a day ago from nellysmitt
15:25:34 :)
15:25:45 sorry, maybe i'm out of context here. I think we are talking about the dispatcher to handle null project/user_id, like https://review.openstack.org/#/c/98798/66/gnocchi/ceilometer/dispatcher.py#Ln94
15:25:50 like, what of interest could we discuss on an Influx spec?
15:25:53 she was wondering where to start her relationship with gnocchi
15:26:04 bugs/blueprints
15:26:09 a single place to work in
15:26:29 so in the future it'll probably be useful
15:26:50 jd__: fair point :) ... I was thinking of using a spec to get all my ideas straight on mapping gnocchi semantics to influx features
15:26:51 I always have a todo list
15:27:06 DinaBelova: just send her on IRC and talk to me? :O
15:27:08 :)
15:27:19 jd__, it was late :)
15:27:28 but anyway, I told her it's A GOOD IDEA :)
15:27:30 eglynn_: well if you think you need to write something, yeah why not, I guess it depends on how you work and your degree of confidence
15:27:35 sorry for caps
15:27:55 eglynn_: I just don't think it's a requirement for now… I prefer to see people spending time writing code rather than specs :)
15:28:05 DinaBelova: ack :)
15:28:06 jd__: yeah, understood
15:28:46 llu-laptop: yes, sileht said he has, or is going to, relax that requirement to have (s['project_id'] and s['user_id'])
15:29:54 llu-laptop: see https://review.openstack.org/#/c/98798/66..67/gnocchi/ceilometer/dispatcher.py
15:30:12 llu-laptop: now only the project_id must be set
15:31:01 move on?
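(A minimal sketch of the relaxed dispatcher check discussed above, assuming only that project_id must be set while user_id may be null; it is not the actual code under review in https://review.openstack.org/#/c/98798/, and the helper name is hypothetical.)

    # Sketch only: per the discussion, requiring both project_id and user_id
    # dropped glance samples (no user_id), so only project_id is enforced here.
    def accepted_samples(samples):
        """Yield samples the gnocchi dispatcher can index as resources."""
        for s in samples:
            if s.get('project_id') is None:
                # could be relaxed too, once the schema allows a null project_id
                continue
            yield s

    # e.g. a glance image sample with no user_id now passes:
    print(list(accepted_samples([{'project_id': 'p1', 'user_id': None}])))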
15:31:26 #topic "Too many identical nova notifications"
15:31:31 #link https://bugs.launchpad.net/ceilometer/+bug/1396257
15:31:33 Launchpad bug 1396257 in ceilometer "Redundant 'instance' samples from nova notifications" [Medium,Confirmed]
15:31:49 Hi!
15:31:54 In the last 2 weeks many people have complained that the "instance" statistics show unexpected patterns.
15:32:00 Now we collect every nova notification with event_type "compute.instance.*" as an "instance" sample. This behavior generates many unexpected samples which badly affect the 'instance' statistics.
15:32:01 so is the key point that we're now triggering off *both* the start & end events?
15:32:11 During the following workflow: create instance, suspend, add floating IP, terminate instance, nova sends 28 notifications.
15:32:15 We collect them all!
15:32:27 There are 16 compute.instance.update messages with minor task states which, I think, are not important for metering.
15:32:51 I prepared a paste with some data from these messages
15:32:52 jd__: do you remember the motivation for including both start and end events here? https://review.openstack.org/#/c/38485/7/ceilometer/compute/notifications.py
15:33:04 http://paste.openstack.org/show/139591/
15:33:10 I transformed the message dictionary into a set of (key, value) tuples and printed the difference between the current message and the union of all previous ones at every step. Of course, I didn't print unique unnecessary fields like timestamp, message_id and payload.audit_period_ending.
15:33:43 eglynn_: more data points
15:34:18 jd__: but no extra value, if start & end always come together?
15:34:35 jd__: was the idea to allow "bracketing" as stacktach does?
15:34:47 eglynn_, the problem is not only in start-end
15:34:47 eglynn_: they are close but not together
15:34:56 but also in the update notifications
15:34:59 jd__: e.g. to bound the time taken to spin up the instance
15:35:07 that are almost identical in fact
15:35:09 eglynn_: the idea is to have as many data points as we can, not trying to do anything fancy
15:35:23 regardless of whether you specify .end or .start, we'll get the notifications in events.
15:35:24 eglynn_: that'd be more a job for the event-API part I'd say
15:35:58 events. takes every single notification we get... from the info priority.
15:36:10 DinaBelova: yeah, but we changed the policy for start|end (previously we just triggered compute.instance.create.start and compute.instance.delete.end)
15:36:23 sorry, the opposite!
15:36:27 eglynn_, a-ha, I did not know this
15:36:33 compute.instance.create.end and compute.instance.delete.start
15:36:57 so the idea was not to bill for time when the instance is being deleted, or before it's fully usable
15:37:01 yeah and then people started to think the duration of sampling was the duration of the instance uptime
15:37:11 I changed that so it confuses more people and they stop doing that
15:37:17 eglynn_, I am basically not against start/end for creation and deletion, or any other process like this
15:37:26 😈
15:37:39 I'm kind of going crazy with the updates for different tasks from nova
15:37:56 and they are almost meaningless...
15:38:09 ityaptin: so the other question is the tightly bunched compute.instance.update notifications
15:38:12 because if creation/deletion was ok, we'll see *.end
15:38:50 DinaBelova: I dunno if it's possible to distinguish between a single isolated update event, and a tight sequence of them that's part of the same workflow
15:38:50 I think we don't need update notifications
15:39:05 maybe add a filter to ignore samples with the same key value within a short time?... too hacky?
15:39:10 They're minor updates which are not important for us
15:39:11 ityaptin: what about the changes in resource metadata?
15:39:30 gordc: yeah, sounds a bit fragile
15:39:34 gordc, kind of... we have no guarantee of some timeout to squash all of them
15:39:42 gordc: sounds risky
15:40:00 sounds so risky it might work? :) j/k
15:40:44 it's either that or maybe services are flooding the 'info' priority as we do with logging
15:40:59 what if one notification hits one agent, and another lands in another agent?
15:41:00 gordc, lol
15:41:18 eglynn: we collect metadata from instance.create.*, instance.terminate.* and something else.
15:41:19 in that case, the timeout doesn't work either
15:41:35 llu-laptop: my coordination will fix that... assuming it works. lol
15:41:53 gordc: this discussion feeds into your bug about completely dropping "existence" samples
15:42:14 ityaptin: I'm thinking about "isolated" compute.instance.update notifications
15:42:15 eglynn_: yeah i was going to bring that up too...
15:42:38 ityaptin: (i.e. after the instance is spun up, when some attribute is updated subsequently)
15:42:43 eglynn_: i assume these volume=1 metrics make no sense in gnocchi?
15:42:48 gordc: agreed, that seems the right path
15:42:54 eglynn: yep, I think about it too.
15:43:07 llu-laptop: dropping 'existence' meters?
15:43:19 gordc: yes
15:43:25 gordc: good point, certainly pre-aggregating a sequence of 1s is kinda wasted effort
15:43:34 jd__: ^^^ amiright?
15:43:37 :) cool cool.
15:44:02 * jd__ reads backlog
15:44:05 yeah, i think we just need a migration plan... and the ability to alarm on events (not sure that's critical)
15:44:29 gordc: but that would cause the user to use a different set of APIs to retrieve those events instead of metering?
15:44:39 volume=1 metrics make no sense in gnocchi
15:45:02 in Ceilometer it's just used to track the existence of a resource
15:45:09 that's what the Gnocchi indexer of resources is for
15:45:14 llu-laptop: yeah... or do we want the api to convert events into samples? back to the same issue of too many hits then.
15:45:17 and is much more efficient…
15:46:00 jd__: so would we need a deleted_at attribute on the gnocchi generic resource to capture the end of its lifespan?
15:46:23 jd__: ... i.e. how to capture the end of the sequence of 1s for a particular resource
15:46:38 eglynn_: or you can look at events.
15:47:04 eglynn_: there are started_at, ended_at, if that's not enough we can add more fields
15:47:22 or you can look at gordc's answer
15:47:24 ;)
15:48:18 eglynn_: i'll switch my bug to a bp and we can discuss deprecating meters on the side.
15:48:29 jd__: a-ha, k ... forgot about those resource fields
15:49:03 jd__: ... do we currently set that in the ceilo dispatcher?
15:49:17 jd__: ... or is the assumption that we'll special-case the *.delete.start notification handling?
15:49:44 gordc: cool, sounds like we need more discussion/thought on this one
15:49:44 I think we update them in the dispatcher
15:49:55 but the best thing to do will be to use the events later, yeah
15:52:47 so doing it in the dispatcher would require checking the event_type in the sample?
15:54:13 i.e. checking if the sample was generated from a resource deletion notification
15:55:02 sounds a bit unreliable if we don't get those notifications for all resources
15:55:06 ... or if the deletion events don't follow a predictable naming pattern
15:55:22 anyhoo, not going to solve it here
15:55:35 that's a good reason to have a schema :)
15:55:49 yeah
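(A hedged illustration of the trade-off discussed in this topic: the old narrow triggers, compute.instance.create.end and compute.instance.delete.start, versus the current compute.instance.* wildcard that turns every notification into an "instance" sample. This is not ceilometer's actual compute/notifications.py, and the WORKFLOW list is a rough, partial reconstruction of ityaptin's create/suspend/add-floating-ip/terminate workflow, not the actual contents of the paste.)

    import fnmatch

    NARROW = ['compute.instance.create.end', 'compute.instance.delete.start']
    BROAD = ['compute.instance.*']

    # Illustrative event types only; the real workflow emits 28 notifications.
    WORKFLOW = (['compute.instance.create.start', 'compute.instance.create.end',
                 'compute.instance.suspend.start', 'compute.instance.suspend.end',
                 'compute.instance.delete.start', 'compute.instance.delete.end']
                + ['compute.instance.update'] * 16)  # the minor task-state updates

    def matches(event_type, patterns):
        return any(fnmatch.fnmatch(event_type, p) for p in patterns)

    # Two 'instance' samples per lifecycle with the narrow patterns...
    print(sum(matches(e, NARROW) for e in WORKFLOW))   # -> 2
    # ...versus one sample per notification with the wildcard.
    print(sum(matches(e, BROAD) for e in WORKFLOW))    # -> 22 here (28 in the real paste)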
15:56:07 #topic "Reminder on dates/location for mid-cycle"
15:56:07 changing the routing pattern would also be a good optimization
15:56:14 like we could subscribe to only some notifications
15:57:09 yeah, so currently we see them all, but discard ones that don't match a handler's declared event_types?
15:58:12 running out of time
15:58:18 just a reminder to update https://etherpad.openstack.org/p/CeilometerKiloMidCycle if you plan to attend
15:58:35 6 names up so far
15:58:40 we'll make a call next week on whether we have quorum
15:59:11 #topic "Open discussion"
15:59:30 a minute left if anyone was anything to raise?
15:59:36 *has anything
16:00:05 nope for me
16:00:11 k, let's call it a wrap ... thanks for your time! :)
16:00:23 #endmeeting ceilometer