15:01:04 <jd__> #startmeeting ceilometer
15:01:04 <openstack> Meeting started Thu Jan  9 15:01:04 2014 UTC and is due to finish in 60 minutes.  The chair is jd__. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:01:05 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:01:08 <openstack> The meeting name has been set to 'ceilometer'
15:01:10 <thomasem> o/
15:01:13 <terriyu> o/
15:01:16 <herndon_> o/
15:01:19 <eglynn> o/
15:01:20 <nprivalova> o/
15:01:24 <DanD> o/
15:01:26 <llu-laptop> o/
15:01:26 <ildikov> o/
15:01:29 <lsmola_> o/
15:01:43 <ildikov> Happy New Year! :)
15:01:48 <jd__> happy new year folks
15:01:48 <dhellmann> o/
15:01:53 <rhochmuth> o/
15:01:58 <lsmola_> happy new year
15:02:01 <thomasem> Happy new year! :)
15:02:25 <terriyu> Happy New Year to all
15:02:38 <jd__> #topic Tempest integration (nprivalova)
15:02:42 <jd__> nprivalova: floor is yours
15:02:52 <nprivalova> hi all!
15:03:55 <nprivalova> First of all I'd like to draw your attention to our tempest tests :)
15:04:15 <nprivalova> We have a big problem with code duplication
15:04:48 <eglynn> duplication within the tempest code tree?
15:05:02 <nprivalova> now I guess we've got the situation under control, but the question is how to track the blueprints
15:05:22 <nprivalova> eglynn, yes. We had several competing variants of the client, for example
15:06:00 <nprivalova> from the mailing list I got only one suggestion: to create a spreadsheet with changes
15:06:26 <nprivalova> but I think it is not much better than just having an etherpad
15:06:28 <eglynn> nprivalova: yeah, the requirement to have a tempest-specific client is awkward
15:06:31 <eglynn> but I guess there's a good reason behind it
15:06:44 <sileht> o/
15:06:47 <eglynn> (to avoid bugs in the API being masked by compensating bugs in the client library)
15:07:03 <eglynn> (... at least that was my understanding of the motivation)
15:08:02 <nprivalova> yes, I'm not against a client in Tempest. But we had several competing change requests implementing it in Tempest
15:08:14 <jd__> eglynn: I understand that as well
15:08:26 <eglynn> where would this spreadsheet live?
15:08:29 <eglynn> on github?
15:08:45 <dhellmann> wiki page?
15:09:05 <nprivalova> that's my question :) Nova guys use the following https://docs.google.com/spreadsheet/ccc?key=0AmYuZ6T4IJETdEVNTWlYVUVOWURmOERSZ0VGc1BBQWc&usp=drive_web#gid=0
15:09:29 <dhellmann> wow, that's a long list
15:10:00 <nprivalova> yes, it doesn't look great
15:10:17 <nprivalova> right now we just have https://etherpad.openstack.org/p/ceilometer-test-plan
15:10:23 <thomasem> D=
15:11:02 <jd__> whatever works
15:11:24 <jd__> pick it and other will follow :)
15:11:26 <nprivalova> but my question is: how do we make people look at it? People just come to tempest and start creating tests that are already being implemented by others
15:11:49 <jd__> nprivalova: there's no magic bullet, just yell at them, -1 and point to the original patch? :)
15:12:03 <jd__> you can't force people to look for something unfortunately
15:12:26 <dhellmann> an email to the mailing list announcing it would help, but I agree with jd__ that you should -1 patches that duplicate effort
15:12:37 <eglynn> hmmm, I seem to remember the tempest folks at one point suggesting the use of an initially empty WIP patch as a mutex
15:12:46 <eglynn> (to avoid duplication of effort)
15:12:54 <dhellmann> I guess we expect to have enough tests that we don't want to open bugs or blueprints for them individually?
15:13:00 <jd__> eglynn: that only works if people look at Gerrit first
15:13:17 <eglynn> nprivalova: ask dkranz about that mutex idea, I can't recall the details
15:13:20 <jd__> while we're at it, I'd like to emphasize something on Tempest testing
15:13:21 <eglynn> jd__: true that
15:13:31 <nprivalova> ok. I hope that at least core team is aware of our document :)
15:13:38 <jd__> I'd like that people writing Tempest tests be _very_ careful about how they test things, avoiding race conditions
15:14:00 <jd__> typically it is NOT safe to do things like "spawn a VM and check that there's a meter", since you don't know how much time is needed to have a meter
15:14:19 <eglynn> we don't want ceilo to become the new neutron ;)
15:14:20 <jd__> writing tests that work 95% of the time is a terrible idea and is going to make us look like Neutron
15:14:28 <nprivalova> JFYI, our client is https://review.openstack.org/#/c/55276/
15:14:31 * jd__ high fives eglynn's great mind
15:14:39 <eglynn> LOL :)
15:14:51 <jd__> so *PLEASE* be careful about what you write
15:15:04 <nprivalova> jd__, please come and review us :)
15:15:05 <jd__> and add a synchronization mechanism in Ceilometer if needed to make tests *RELIABLE*
15:15:27 <jd__> nprivalova: I wish I had enough time, but it's getting harder to follow everything :)
15:15:57 <jd__> so I'm just throwing that out there for now, that's all I can do, and if it was obvious to all of you, wonderful ;)
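
For illustration, a minimal sketch of the race-free pattern jd__ is asking for: poll with a deadline instead of asserting immediately after the spawn. `get_samples` here stands in for whatever client call the test uses; it is an assumption, not the actual Tempest API.

    import time

    def wait_for_meter(get_samples, meter_name, timeout=120, interval=5):
        # Poll until at least one sample for meter_name appears, or time out.
        # Avoids the race in "spawn a VM and immediately assert the meter
        # exists": the meter may take an unknown amount of time to show up.
        deadline = time.time() + timeout
        while time.time() < deadline:
            if get_samples(meter_name):
                return
            time.sleep(interval)
        raise AssertionError("no %s sample within %ss" % (meter_name, timeout))
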
15:16:06 <dhellmann> nprivalova: we talked at one point about having ceilometer-core added to reviews, are the test authors doing that?
15:16:24 <jd__> dhellmann: I know I'm on reviews, don't know about ceilometer-core
15:16:26 <nprivalova> but we definitely need a core reviewer from the ceilometer team for tempest tests. I hope we will have more tests soon
15:16:40 * dhellmann has 280 unread threads related to code reviews in his mailbox
15:16:43 <nprivalova> dhellmann, yes
15:17:20 <dhellmann> nprivalova: good, thanks
15:18:08 <nprivalova> my goal for this meeting was just to remind you about tempest, so that's it for now, we can move on. I will ping you at the next meeting
15:18:17 <nprivalova> :)
15:18:24 <jd__> cool
15:18:35 <jd__> moving on
15:18:42 <jd__> #topic Batch notification processing (herndon)
15:18:50 <dhellmann> it might be good to have tempest status as a recurring topic
15:19:00 <dhellmann> like the client lib
15:19:07 <eglynn> dhellmann: +1
15:19:11 <nprivalova> dhellmann, yep. I've added it to agenda
15:19:44 <jd__> I'll keep it on the agenda
15:19:46 <jd__> herndon_: around?
15:19:49 <herndon_> yep
15:19:55 <jd__> herndon_: enlighten us
15:20:00 <herndon_> just didn't want to interrupt
15:20:19 <jd__> :)
15:20:20 <herndon_> so, I'd like to chat about this email thread that happened before the holidays: http://lists.openstack.org/pipermail/openstack-dev/2013-December/022789.html
15:20:48 <herndon_> It seems like for the most part people liked the idea of consuming notifications in batches.
15:21:03 <herndon_> I believe DanD had some concerns, wanted to give him an opportunity to voice those if he's around
15:21:22 <DanD> am here
15:21:35 <herndon_> care to elaborate a bit?
15:21:51 <jd__> herndon_: I think the end result is "please write a patch", no?
15:21:56 <jd__> herndon_: or is anything unclear?
15:22:07 <herndon_> unclear, ie, which patch to write
15:22:12 <DanD> my concern with the message q based approach is that we then don't get caching/batch consumption for other collection methods
15:22:25 <DanD> we already have api post
15:22:32 <DanD> discussions on ipmi, snmp
15:22:38 <jd__> herndon_: change the prototype of data recording to accept multiple records to be able to batch INSERTs first, then the RPC part
15:22:40 <DanD> those should also support this
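
A hypothetical sketch of what jd__'s suggestion could look like on the collector side; `record_metering_data` is the real per-sample entry point in Ceilometer's storage drivers, but the buffered/batched variant below is an assumption, not existing code.

    class BatchingDispatcher(object):
        """Buffer samples and flush them to storage in one call,
        turning N reads : N writes into N reads : 1 write."""

        def __init__(self, storage_conn, batch_size=100):
            self.conn = storage_conn
            self.batch_size = batch_size
            self.buffer = []

        def record_metering_data(self, sample):
            self.buffer.append(sample)
            if len(self.buffer) >= self.batch_size:
                self.flush()

        def flush(self):
            if self.buffer:
                # assumes the driver grows a list-accepting entry point;
                # today the drivers take one sample per call
                self.conn.record_metering_data_batch(self.buffer)
                self.buffer = []
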
15:23:00 <herndon_> sandywalsh proposed putting the code into ceilometer, not oslo, as a first step. This will likely have a faster turn around time
15:23:11 <jd__> which code?
15:23:15 <sandywalsh> yep, in the collector
15:23:19 <eglynn> would the batching only apply to samples derived from notifications, or also to metering messages derived from polling?
15:23:44 <sandywalsh> optionally turned on but could be shared by all consumers (sample/event/etc)
15:23:46 <DanD> if we put it in the processing pipeline then it could apply to all collection
15:23:59 <herndon_> It could do both I suppose. I'm mostly interested in notifications (and specifically, events), but it would be good to be able to batch everything.
15:24:09 * eglynn is thinking about timeliness for alarm triggering
15:24:18 <thomasem> As I understand it, it'd apply to anything that consumes notifications? instead of a single message in and a single message out, it'd transact in lists...
15:24:26 <DanD> you need to be able to turn it off as well
15:24:26 <sandywalsh> eglynn, it would be optionally turned on
15:24:44 <herndon_> DanD - the problem with batching up http posted data is that there's no way to tell the client that something went wrong... I think that data must be inserted directly.
15:24:50 <sandywalsh> DanD, not sure if the pipeline is the right place ... perhaps before it?
15:25:14 <herndon_> with AMQP, we have the opportunity to hold on to the message until it is committed to the storage back end.
15:25:17 <DanD> sandy, that could work as well
15:25:33 <llu-laptop> could we have another publisher for batch?
15:25:53 <llu-laptop> herndon_: how? AMQP doesn't interact with the storage backend in ceilometer
15:25:54 <sandywalsh> yeah, it's really only beneficial for queue based inputs ... wouldn't be good for other forms like http
15:26:32 <herndon_> llu-laptop: the collector wouldn't ack anything in the batch until the batch is committed. so if something goes wrong, all of those messages just get requeued
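
Sketched at the raw AMQP level with the pika client (0.x-era callback signature), rather than the oslo code Ceilometer actually uses; `store_batch` is a hypothetical stand-in for the bulk storage write:

    import pika

    BATCH_SIZE = 100
    batch = []  # (delivery_tag, body) pairs held un-acked until committed

    def store_batch(bodies):
        raise NotImplementedError  # hypothetical bulk write to the backend

    def on_message(channel, method, properties, body):
        batch.append((method.delivery_tag, body))
        if len(batch) < BATCH_SIZE:
            return
        last_tag = batch[-1][0]
        try:
            store_batch([b for _, b in batch])
            # one cumulative ack settles everything up to last_tag
            channel.basic_ack(delivery_tag=last_tag, multiple=True)
        except Exception:
            # nothing was acked, so the whole batch is simply requeued
            channel.basic_nack(delivery_tag=last_tag, multiple=True, requeue=True)
        del batch[:]

    connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = connection.channel()
    channel.basic_qos(prefetch_count=BATCH_SIZE)
    channel.basic_consume(on_message, queue="notifications.info")
    channel.start_consuming()
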
15:26:36 <eglynn> would we need selectivity for batching metering messages for some meters but not others?
15:26:50 <eglynn> (i.e. suppress batching for cpu_util samples)
15:27:00 <DanD> there are also udp messages that do not need the reliability but could have huge volumes
15:27:18 <llu-laptop> agree with eglynn for selectively batch
15:27:57 <herndon_> eglynn: I don't understand...
15:28:23 <eglynn> herndon_: I'd want to ensure that cpu_util samples hit the DB immediately on receipt
15:28:29 <sandywalsh> eglynn, if samples are coming from events, then timeliness can't be that big a concern anyway. There's always latency there.
15:28:30 <rhochmuth> batching for http is useful if the incoming http rate is reasonably high. you can batch the incoming requests based on size/time and do async responses back to the client after the request is completely processed
15:28:47 <sandywalsh> eglynn, I would assume you'd want UDP for things like CPU samples?
15:28:48 <eglynn> herndon_: since, say, autoscaling actions may be driven off alarms based on cpu_util
15:28:55 <herndon_> ok, I see
15:29:14 <eglynn> sandywalsh: well out-of-the-box AMQP is used for everything
15:29:32 <eglynn> sandywalsh: ... but yeah, could configure the pipeline.yaml that way
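
That per-meter selectivity might look something like this in pipeline.yaml; the layout is simplified and the `batch` publisher parameter is purely hypothetical, included only to show the shape of the knob being discussed:

    -
        name: cpu_util_pipeline
        meters:
            - "cpu_util"
        publishers:
            - rpc://?batch=0      # hypothetical: no batching, alarming needs timeliness
    -
        name: default_pipeline
        meters:
            - "*"
        publishers:
            - rpc://?batch=100    # hypothetical: up to 100 samples per write
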
15:29:55 <herndon_> but the metering data doesn't come in on the notification topic, right?
15:30:07 <sandywalsh> eglynn, yeah, perhaps we should reconsider using the queue for samples as default
15:30:19 <sandywalsh> herndon_, hmm, good point
15:30:44 <eglynn> herndon_:  yeah that's why I asked whether it just applied to notifications
15:30:51 <herndon_> thinking for starters, let's just batch all of the notifications and build up a more generally useful batching mechanism. Then we can decide how to use it elsewhere?
15:30:53 <ildikov> herndon_: metering data has its own metering topic, if I remember the configuration file correctly
15:31:14 <ildikov> herndon_: unless someone sets it to the notification topic, if that is allowed
15:31:47 <sandywalsh> ildikov, perfect
15:32:00 <sandywalsh> (as it should be :)
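
The topic split ildikov is describing, roughly as it appears in ceilometer.conf; the option names below are from memory and worth double-checking against the sample config:

    [DEFAULT]
    # topic the rpc publisher sends metering data on
    metering_topic = metering
    # topic(s) the notification agent listens to for service notifications
    notification_topics = notifications
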
15:32:07 <jd__> batching notification = sending an array of notification in one AMQP message?
15:32:28 <dragondm> no, consuming a batch of notifications off the queue
15:32:41 <jd__> ok so it's just on the reader side
15:32:46 <thomasem> jd__, yeah
15:32:47 <sandywalsh> no, it means holding them so we can batch write them to storage so it's not 1 read: 1 write
15:32:48 <herndon_> yeah
15:32:56 <jd__> that's what I understand from the list thread but this discussion confused me
15:33:09 <sandywalsh> N queue reads: 1 disk write
15:33:42 <herndon_> Sounds like it's time to write a BP and start writing the code
15:33:45 <rhochmuth> one ack per batch; the sequence number acks everything prior
15:33:49 <jd__> herndon_: my point ;)
15:33:58 <DanD> back to my original question :), do we want to support batch writes/caching for all sources of data, not just message queues?
15:34:15 <thomasem> rhochmuth, Are we able to specify messages that failed in the bulk ack scheme so they either get dead-lettered or pushed back onto the queue?
15:34:20 <jd__> DanD: other source of data being..?
15:34:25 <llu-laptop> DanD: in storge side?
15:34:26 <sandywalsh> DanD, my vote is start with message queues
15:34:35 <thomasem> Something like bulk ack up to message 1001, except for message 24 and 32
15:34:49 <rhochmuth> thomasem, i don't think so,
15:34:55 <herndon_> thomasem: if you "nack" single messages before the greater "ack", then yes
15:35:05 <thomasem> Gotcha
15:35:05 <thomasem> cool
15:35:20 <thomasem> I knew we had talked about that previously but the details slipped my mind. :) Thanks!
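
The scheme thomasem and herndon_ just walked through, again in pika terms: nack the failures individually first, then one cumulative ack covers the rest. `is_valid` is a hypothetical per-message check.

    def settle_batch(channel, batch):
        # batch: list of (delivery_tag, message) in delivery order
        last_tag = batch[-1][0]
        for tag, message in batch:
            if not is_valid(message):  # hypothetical per-message check
                # nack before the cumulative ack; requeue=False dead-letters it,
                # requeue=True would push it back onto the queue instead
                channel.basic_nack(delivery_tag=tag, multiple=False, requeue=False)
        # settles every still-unsettled message up to and including last_tag
        channel.basic_ack(delivery_tag=last_tag, multiple=True)
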
15:35:22 <sandywalsh> rhochmuth, I don't think bulk ack() would work in a multi collector scenario either
15:35:23 <nprivalova> should we implement our own TCP?
15:35:31 <thomasem> sandywalsh, why?
15:35:42 <jd__> nprivalova: I didn't read that
15:35:43 <herndon_> DanD - batching is not going to take place in oslo, so theoretically we could create UDP batches.
15:35:55 <nprivalova> jd__, :)
15:36:17 <dragondm> sandywalsh: afaik a bulk ack only acks messages for that client.
15:36:17 <herndon_> nprivalova - care to fill us in on what that means?
15:36:21 <DanD> so we are talking about batching on the ceilometer queue, not the service queue?
15:36:29 <thomasem> ^^ dragondm
15:36:30 <thomasem> :)
15:36:37 <thomasem> That's how it's supposed to work
15:36:41 <sandywalsh> dragondm, that was my concern ... if that's the case, we're ok in the happy-day scenario
15:36:43 <herndon_> dragondm: that's correct
15:37:00 <sandywalsh> dragondm, otherwise, we have to individually ack for thomasem's use-case
15:37:25 <sandywalsh> either way, not a biggie
15:37:36 <thomasem> I think the message tag used to ack is specific to the client
15:37:37 <thomasem> in some way
15:37:47 <herndon_> it's specific to the channel.
15:37:50 <thomasem> So, consumer A won't affect consumer B
15:37:56 <thomasem> when acking or nacking
15:38:01 <herndon_> anyway, I think we're done, I'll write up the BP and send it out.
15:38:04 <dragondm> yup.
15:38:14 <sandywalsh> I think the big issue is eglynn's concern and if they're on different queues we're golden. Just need to support batching on a per-queue basis
15:38:42 <sandywalsh> so, then it's just like jd__ said, submit branch ... sounds like a worthy effort
15:38:46 <eglynn> herndon_: ... means a warning against re-inventing the wheel I'd guess
15:38:46 <eglynn> sandywalsh: yep, agreed (IIUC)
15:38:54 <eglynn> +1
15:39:06 <jd__> moving on then :)
15:39:07 <nprivalova> herndon_, we need to send ack and nack, a 'sliding window'. all of this reminds me of a TCP protocol implementation. Just an association :)
15:39:09 <thomasem> neat
15:39:32 <jd__> #topic Release python-ceilometerclient?
15:39:47 <eglynn> I'm guessing herndon_ wants to wait for https://review.openstack.org/54582/
15:39:50 <thomasem> Haha, support for the event API is coming, promise.
15:39:54 <herndon_> yes please!
15:39:54 <thomasem> :P
15:40:25 <jd__> fair enough
15:40:32 <herndon_> eglynn likes it, so it MUST be ready to go :)
15:40:38 <eglynn> herndon_: is that the only patch you're waiting on landing?
15:40:43 <jd__> let's wait for the next meeting to see if we can do it, unless eglynn handles the burden before then
15:40:48 <herndon_> for the client? yes
15:41:07 <eglynn> herndon_: I'll cut the client release once that patch lands
15:41:20 <eglynn> jd__: cool enough
15:41:34 <jd__> #topic Open discussion
15:41:41 <jd__> there's some time left if you want to raise anything
15:41:53 <yfujioka> Hello, I posted two blueprints.
15:42:05 <hvprash_> Sorry, I'm asking this again; I haven't gotten any direction so far on how to proceed. Is the Active/Active Ceilometer central agent fully supported in Havana, or is that something planned for Icehouse? Can someone provide some input, any docs?
15:42:06 <llu-laptop> sorry for the noise, but I need some reviews for https://review.openstack.org/#/q/status:open+project:openstack/ceilometer+branch:master+topic:bp/support-resources-pipeline-item,n,z ; it's been there for some time
15:42:12 <yfujioka> Those are https://blueprints.launchpad.net/ceilometer/+spec/monitoring-network and https://blueprints.launchpad.net/ceilometer/+spec/monitoring-network-from-opendaylight.
15:42:19 <yfujioka> I'd appreciate your comments.
15:42:28 <Akhil> We have started work on supporting VMware vSphere in Ceilometer. We've come up with the technical aspects of the implementation, like mapping meters from Ceilometer to the counters available in vSphere. The corresponding blueprint is https://wiki.openstack.org/wiki/Ceilometer/blueprints/vmware-vcenter-server. It would be good if it gets reviewed by others.
15:43:08 <llu-laptop> hvprash_: havana doesn't have an active/active central agent
15:43:37 <yfujioka> And I posted patches for those BPs.
15:43:41 <hvprash_> any proposed blueprint? or is it under the central agent improvement bp?
15:43:57 <nprivalova> I've started a POC for aggregation and rolling_up. If you are interested in it, please take a look: https://review.openstack.org/#/c/65681/ . I'm working on a detailed description in the bp
15:44:50 <eglynn> IIRC we chatted before about a mid-cycle meet-up for the euro-ceilo devs at FOSDEM
15:44:57 <eglynn> that still on the cards?
15:45:02 <eglynn> sadly, /me won't be able to make it to FOSDEM this year :(
15:45:04 <yfujioka> Those BPs are not under the central agent improvement bp.
15:45:25 <yfujioka> Should I move them under the central agent improvement bp?
15:45:36 <jd__> hvprash_: icehouse we hope
15:45:48 <llu-laptop> hvprash_: that's what the central agent improvement bp was created for, it's an umbrella
15:45:50 <hvprash_> jd__: can you refer me to the bp?
15:46:09 <jd__> eglynn: that sucks! how is that possible :(
15:46:15 <eglynn> ... but if there's a quorum of other folks attending, it might be worth organizing something
15:46:16 <hvprash_> llu-laptop, ah ok !
15:46:29 <jd__> hvprash_: https://blueprints.launchpad.net/ceilometer/+spec/central-agent-improvement
15:46:54 <jd__> eglynn: I'll send a mail on -dev
15:47:23 <hvprash_> jd__, assuming that also includes the unification of the ceilometer central agent, like multiple datacenter sources etc
15:47:32 <jd__> eglynn: should we move FOSDEM to Dublin next year for ya? ;)
15:47:52 <eglynn> jd__: now you're talkin'!
15:47:58 <absubram_> jd__: thanks for the pointer to the central agent improvement bp.. is this being actively targeted for icehouse-3?
15:48:06 <jd__> eglynn: that'll just change the beer's color.
15:48:22 <jd__> absubram_: it is, fingers crossed
15:48:26 <eglynn> jd__: yeah, we don't do fruit in our beer!
15:48:35 <thomasem> No, just pure happiness.
15:48:37 <nprivalova> ah, one question from me! I need your input on https://review.openstack.org/#/c/52670/7/ceilometer/api/controllers/v2.py
15:48:44 <dragondm> thomasem: yup :>
15:49:03 <nprivalova> dhellmann^^
15:49:06 <absubram_> thanks jd__
15:49:26 <Akhil> eglynn: Is there a formal process for getting the technical implementation of a BP verified?
15:49:43 <jd__> Akhil: git review
15:50:01 <eglynn> yep, wot he said ...
15:51:17 <jd__> sileht: going at FOSDEM this year btw?
15:51:47 <Akhil> eglynn: Which is after the code is written and we post a review. Is there any platform for discussing architecture before coding, or is BP approval enough?
15:51:48 <dhellmann> nprivalova: I'll take a look
15:51:59 <sileht> jd__, I don't think so
15:52:47 <dragondm> Akhil: Just ML and IRC chats...
15:52:49 <eglynn> Akhil: BP approval is usually enough, but if it's controversial it's best to also raise it on the ML and/or IRC to give folks a chance to object before you do all the implementation work
15:52:53 <nprivalova> dhellmann, thanks
15:53:02 <hvprash_> jd__, do you mind if I sync offline with you to get some insight into this BP? not sure if all the cases are captured in it, or it might be my lack of understanding
15:53:38 <absubram_> jd__: llu-laptop: just a quick request.. if there is a plan to add active/active HA support via this central agent improvement bp.. can we add that to the whiteboard in the bp?
15:54:17 <Akhil> eglynn: Thanks!!! We'll send out a mail with the points of concern.
15:54:28 <eglynn> Akhil: cool
15:54:37 <jd__> hvprash_: we can discuss it on openstack-dev I think
15:54:49 <hvprash_> jd__, that works.. thx a lot
15:54:59 <jd__> so who's going to FOSDEM in the end? probably nijaba and me, that's it?
15:55:08 * dhellmann can't make it
15:55:20 <jd__> not sure if it's worth organizing something if I'm coding alone
15:55:27 <thomasem> Awwww
15:55:28 <jd__> :-)
15:55:33 <thomasem> You can put a camera on yourself
15:55:53 <thomasem> Or telepresence robots
15:56:18 * nealph votes for robots
15:56:19 * dhellmann would watch a live-stream of jd__ coding
15:56:20 <terriyu> is that telemetry?
15:57:57 <jd__> heh
15:58:13 <thomasem> If we were to hold up little gauges to the camera, perhaps.
16:00:07 <jd__> that ends it dudes
16:00:09 <jd__> #endmeeting