15:00:18 #startmeeting ceilometer
15:00:19 Meeting started Thu Feb 6 15:00:18 2014 UTC and is due to finish in 60 minutes. The chair is jd__. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:20 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:22 The meeting name has been set to 'ceilometer'
15:00:40 o/
15:00:41 #link https://wiki.openstack.org/wiki/Meetings/Ceilometer
15:00:42 hi, @jd__
15:00:44 o/
15:00:47 hi everyone
15:00:50 o/
15:00:53 o/
15:02:04 o/
15:02:24 #topic Milestone status icehouse-3
15:02:39 #link https://launchpad.net/ceilometer/+milestone/icehouse-3
15:02:53 so a lot of things are started, but it'd be great to finish ASAP
15:02:57 we still need approval for this patch: https://review.openstack.org/#/c/62157/
15:03:03 otherwise we'll be caught in the gate storm
15:03:33 ildikov_: yeah I'll try to take a look at it
15:03:40 ildikov_: i may have time to review tomorrow as well.
15:03:50 jd__: thanks
15:04:13 otherwise not much to add on my part yet
15:04:35 thanks guys, it would be really good if we could go on with the statistics bp and also have the patch sets of the complex query landed in i-3
15:04:38 anything else about one of your blueprints?
15:04:59 I'm still confused about aggregation
15:05:17 not sure whether I should continue or not
15:05:52 nprivalova: do you have a requirement on that?
15:05:52 nprivalova: did we come to any conclusion on the overlapping periods issue I raised?
15:05:52 o/
15:05:59 o/
15:06:30 nprivalova: ... i.e. the question of whether aggregation can be helpful in the common case of periods that overlap
15:06:31 * jd__ dodges the issue
15:06:46 eglynn: we agreed that it is not for alarming
15:07:17 #link https://blueprints.launchpad.net/ceilometer/+spec/base-aggregation
15:07:45 nprivalova: k, then the question really is the potential benefit for the other common cases of recurring statistics queries
15:08:10 nprivalova: ... if we can detect when the same query constraints recur
15:08:13 yep, I agree. I saw a comment about the billing use case
15:08:29 nprivalova: ... and match the actual query constraints to the pre-aggregated values
15:08:49 anyway, I think we may continue with the meeting :)
15:08:53 ok
15:09:00 #topic Tempest integration
15:09:06 wassup on that?
15:09:23 we have the following
15:09:25 https://review.openstack.org/#/q/status:open+project:openstack/tempest+branch:master+topic:bp/add-basic-ceilometer-tests,n,z
15:09:46 so the notifications part is done
15:09:52 but we have a bug :)
15:10:29 #link https://bugs.launchpad.net/ceilometer/+bug/1274607
15:10:31 Launchpad bug 1274607 in ceilometer "ceilometer-agent-notification is broken without eventlet monkey patching" [Critical,In progress]
15:11:01 yep, so that's why we have only -1 from Jenkins
15:11:16 fair enough, that one should be resolved soon fortunately
15:11:18 I'm testing the fix
15:11:48 #topic Release python-ceilometerclient?
15:11:57 no need this AFAIK
15:12:07 *for this
15:12:12 ok :)
15:12:15 #topic Polling-on-demand discussion (ityaptin)
15:12:28 ityaptin: enlighten us
15:12:46 about pollsters on demand. The use cases for this feature are tests and debugging.
15:13:06 #link https://review.openstack.org/#/c/66551/
15:13:12 (nprivalova: the fix works if you have https://review.openstack.org/#/c/71124/)
15:13:34 so the purpose of this is to trigger polling for tests ... could the same be achieved by simply configuring the test with a v. short pipeline interval?
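(An aside on the bug linked at 15:10:29: the general pattern behind this class of failure -- shown below as a minimal illustration, not the actual fix under review in 71124 -- is that eventlet monkey patching has to happen before any other module of the service is imported.)

    # Minimal illustration of the usual eventlet pattern; not the actual
    # ceilometer-agent-notification fix. Monkey patching must run before
    # anything else is imported, otherwise modules imported earlier keep
    # references to the unpatched (blocking) standard library.
    import eventlet
    eventlet.monkey_patch()

    # ...only now import the rest of the service and start it...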
15:13:37 * dhellmann apologizes for being late
15:14:13 https://blueprints.launchpad.net/ceilometer/+spec/run-all-pollsters-on-demand
15:14:29 And there is a proposal to turn on this feature only with a 'debug' flag, because somebody could DoS ceilometer by triggering polling.
15:14:31 dhellmann: you're… not fired!
15:14:37 * dhellmann whew!
15:14:50 i.e. the test needs to precipitate events that happen relatively infrequently (i.e. polling cycles with the boilerplate pipeline.yaml)
15:15:10 ... so one approach would be simply to make these events more frequent in the test scenario
15:15:11 ityaptin: how does the flag get set? the DoS issue was a concern when i read the bp
15:15:12 fyi: I have added this to devstack: CEILOMETER_PIPELINE_INTERVAL=10
15:15:22 the problem is that polling != having a sample anyway, there's no guarantee that samples are going to be available N seconds after being polled
15:15:27 is this for tempest tests or unit tests?
15:15:28 nothing's synchronous
15:15:31 dhellmann: tempest
15:15:36 @eglynn, that still won't be the same, I would think.
15:15:40 perhaps we can just set a different value for devstack-gate
15:15:57 DoS concern? I doubt that, it's a feature available on RPC
15:16:07 sure the admin can DoS himself, but well.. he's admin
15:16:11 tongli: not exactly equivalent, but perhaps a close enough analogue?
15:16:31 I think it's not only for tempest. When I install devstack it is useful just to check that the pollsters work ok, without waiting for the interval
15:16:47 nprivalova: agreed
15:16:50 gordc: For example - a debug option
15:16:56 nprivalova: ... but the test has to wait anyway for some "ingestion" lag
15:17:05 @eglynn, I think it will be nice to hit the enter key and then expect the code to hit the breakpoint.
15:17:38 eglynn, agree, for example the swift account size is done by an async swift task
15:17:48 @eglynn, @ityaptin, or you use the new notification alarm.
15:18:01 eglynn, so you have to wait until swift has updated the value because ceilometer polls it
15:18:12 because/before
15:18:15 which will simply trigger it as soon as a notification is present on the bus.
15:18:16 * jd__ has no problem with that feature
15:18:30 if we're going to have a special test mode, it seems like it makes the most sense to make that a separate executable that runs the polling one time and exits
15:18:41 actually the question was about the default value for the debug flag :)
15:18:42 rather than adding a test mode to the main service
15:19:03 dhellmann: agreed
15:19:05 dhellmann: ... that sounds reasonable to me
15:19:05 dhellmann: if that would be synchronous, that'd be better
15:19:09 tongli: If we want to test pollsters it is not suitable
15:19:23 dhellmann FTW
15:19:24 @ityaptin, true.
15:19:29 jd__: yeah, just refactor the code that runs the polling pipelines so it can be called from a console script
15:19:45 dhellmann: I vote for that definitely, because that would be much better for Tempest
15:19:47 that wouldn't do anything for testing the collector
15:20:04 do we need some way to have the collector notify tests when data is available?
15:20:15 what I don't know is if it's reasonable to use that trick in tempest?
15:20:28 dhellmann: that'd be great
15:20:41 jd__: good point, we wouldn't really be testing the polling service
15:20:48 but we could have separate tests for that
15:21:18 if the point is to show that the service works and the pollsters work, do they have to run in the same test to know they work?
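(A rough sketch of the refactor suggested at 15:18:30/15:19:29, with hypothetical names -- poll_all, periodic_main and oneshot_main are illustrative placeholders, not real ceilometer entry points: the long-running agent and a one-shot console script would share the same polling function.)

    import time


    def poll_all():
        """Run every configured pollster once and publish the samples.

        Placeholder for the code the polling agent already runs on its
        timer today; the point of the refactor is that this function is
        callable from more than one entry point.
        """


    def periodic_main(interval=600):
        """What the long-running agent does: poll on a fixed interval."""
        while True:
            poll_all()
            time.sleep(interval)


    def oneshot_main():
        """What a one-shot console script would do: poll once,
        synchronously, then exit -- handy for tempest and for debugging
        a deployment."""
        poll_all()


    if __name__ == '__main__':
        oneshot_main()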
15:21:22 or we can also use the API method as in the current patch if it's synchronous, i.e. the GET /pollsters returns only when all pollsters have run
15:21:24 we may set configs only in devstack
15:21:37 there is no way to hack smth in tempest
15:21:48 having a callback on the collector is another issue, I don't have a solution yet but we can think about something else later I guess
15:21:49 in general I wonder how does tempest handle asserting that other asynchronous tasks have completed?
15:21:50 nprivalova: ah, so we have to set devstack to configure ceilometer for tempest?
15:21:57 such as spinning up an instance
15:22:04 or a volume becoming available?
15:22:07 dhellmann: AFAIK, yes
15:22:09 eglynn: I see a lot of polling for status and timing out in the errors in the recheck list
15:22:16 nprivalova: ok
15:22:18 eglynn: by waiting and timing out, which has the potential to make Ceilometer the new Neutron :/
15:23:08 yeah I guess
15:23:22 maybe we should move it to the mailing list?
15:23:32 are we emitting any sort of notification of having received data?
15:23:38 ... /me is made a bit nervous by making big changes to the ceilo execution path for testing
15:23:43 could we write the test to watch the log for a specific message, or to listen for a notification?
15:23:52 eglynn: yeah
15:24:01 ... in the sense that we end up testing something other than what actually runs in prod
15:24:02 sending a notification when we receive notifications?
15:24:28 jd__: otherwise I guess the test would call the api over and over until it got the data it wanted?
15:24:32 Ceilometer inception
15:24:41 oh no :)
15:24:53 dhellmann: yeah… polling and timing out :(
15:24:53 yeah so in prod it's not the extra notification being emitted that has value, it's these data being visible in the API
15:25:29 eglynn: sure, I'm just trying to figure out how to write the test with the least polling
15:25:36 maybe polling is the best thing we can do
15:25:41 ... I dunno, suppose we did something funky with mongo replication
15:25:44 I think so for now
15:25:54 polling would certainly be simplest
15:25:55 ... and the data stopped being visible from a secondary replica
15:26:01 ... but the tests still pass
15:26:03 now the question is, is it acceptable to have a different path for polling (a request to the API) rather than the regular timer in terms of testing
15:26:17 eglynn: but our tests aren't for mongo, they're for our code
15:26:24 notifications are another question. Now we are speaking only about polling
15:26:51 dhellmann: I'm thinking of our mongo storage driver doing some replication-aware logic that has the potential to be broken
15:26:53 nprivalova: what I was hinting at was having ceilometer send a notification that the test could listen for to know when data had arrived, instead of polling the API
15:27:14 eglynn: if we have to put replication logic in our driver, then we'd have to test for it -- we don't have anything like that now, right?
15:27:49 dhellmann: nope, we don't ... that was just an off-the-cuff example of something that could break
15:27:56 eglynn: ok
15:28:01 I think this is going too far?
15:28:14 @dhellmann, I am working on the notification alarm, if that is what you asked.
15:28:24 I think my previous question is a good one, can I haz a cheese^W^Wyour opinion?
15:28:27 dhellmann: and might not be caught by a test that just asserted for a special notification that the collector had seen the incoming metering message
15:28:45 @dhellmann, when a notification appears, you can make something happen,
15:28:45 jd__: ok, I think we're talking about 2 different things
15:28:56 tongli: good point, just a sec
15:28:56 s/collector/notification agent/
15:29:09 jd__: I was talking about how the test would know when ceilometer's collector had received data
15:29:19 let us write to the mailing list again because honestly I don't see any solution now
15:29:48 nprivalova: good idea
15:29:51 dhellmann: I know, but that's a different topic than the one we're discussing
15:29:59 sorry, I thought we had moved on
15:30:02 dhellmann: so I'd like to have an answer on the first point, first :)
15:30:16 * dhellmann wonders when jd__ became such a stickler ;-)
15:30:17 which is having a different path used to poll the data
15:30:20 and please take a look at the notification tests in tempest, because we need to be sure that the tests are correct
15:30:26 lol
15:30:35 I think it's a mistake to build something in for testing that is too different from something that would be useful in production
15:30:49 we have a periodic polling loop, so we need a test that shows that we poll periodically
15:30:49 dhellmann: +1
15:30:57 agreed
15:31:02 if we have an API to trigger polling, then we need a *separate* test to show that the api triggers polling
15:31:16 * jd__ hits the channel with his mallet
15:31:19 so we might as well just test for the code we have now, since we can't avoid it
15:31:51 if, as nprivalova says, we have to use the devstack configuration, then we will need to adjust the polling interval there to something relatively small and use that for the test
15:32:06 * jd__ nods
15:32:12 yep ... my suggestion exactly
15:32:17 as far as the notification of notifications received is concerned, I think it's something we should think about
15:32:19 alternately, if we could have the test adjust that interval -- maybe by starting a second copy of the service? -- then we could do all of this in tempest
15:32:25 but probably not here and now :)
15:33:15 jd__: for notification of notifications, we might be able to use the alarm trigger feature, but that is using some production code to test other production code
15:33:25 indeed
15:33:28 so it might be better conceptually to just have the test poll the API looking for the data
15:33:47 as long as the polling is "smart" enough, is that approach really that bad?
15:33:50 that would be good enough for now anyway
15:33:54 which is less elegant, in some sense, but more "correct" from a testing standpoint
15:33:57 eglynn: we'll see?
15:34:04 eglynn: nah, it just feels a little heavy-handed
15:34:28 by "smart" I mean say using a reasonably adaptive/backed-off intra-poll delay
15:34:43 it's tempest, you can hammer the API
15:34:43 eglynn: right
15:34:47 haha
15:34:56 "adaptive", tsss :)
15:35:01 LOL :)
15:35:04 GIVE ME THE DAMN DATA YOU API
15:35:12 that's how we should do it
15:35:29 shall we move on gentlemen?
15:35:29 * dhellmann opens a blueprint to change the API to allow queries in all caps
15:35:39 unfortunately we should commit it to devstack first :)
15:35:41 * jd__ puts his mallet away
15:35:54 devstack already has a CEILOMETER_PIPELINE_INTERVAL configuration variable, so we just have to set it in gate-devstack
15:36:10 (and gentlewomen)
15:36:26 nprivalova: would that be a problem?
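(A self-contained sketch of the "reasonably adaptive/backed-off intra-poll delay" mentioned at 15:34:28, assuming a generic fetch callable that a tempest test would point at the ceilometer API; wait_for_samples and its parameters are illustrative, not existing tempest or ceilometer helpers. Setting CEILOMETER_PIPELINE_INTERVAL to a small value in the devstack-gate configuration, as discussed above, keeps the number of iterations low.)

    import time


    def wait_for_samples(fetch, timeout=120.0, initial_delay=1.0,
                         backoff=1.5, max_delay=15.0):
        """Poll `fetch` until it returns a non-empty result or `timeout` expires.

        `fetch` is any zero-argument callable; in a tempest test it would
        query the ceilometer API for the expected samples. The delay between
        polls grows by `backoff` on each attempt, capped at `max_delay`.
        """
        deadline = time.time() + timeout
        delay = initial_delay
        while time.time() < deadline:
            result = fetch()
            if result:
                return result
            time.sleep(min(delay, max_delay))
            delay *= backoff
        raise AssertionError('no samples became visible within %ss' % timeout)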
15:36:35 sileht: I will work on this
15:36:39 good point sileht
15:36:53 sileht saves us from over-engineering
15:37:18 #topic Work with metadata discussion
15:37:18 jd__, I don't know :) maybe you have the power to commit everything to everywhere
15:37:31 it's me again
15:37:41 nprivalova: I may or may not have some super power :D
15:38:08 The long story short:
15:38:17 When a user requests meters or resources, their metadata is flattened.
15:38:21 On the other hand, when a meter or resource is stored to the db, its metadata is flattened too.
15:38:28 These two processes are independent and now two different flatten functions exist.
15:38:33 We decided to keep only one of them (related bug #link https://bugs.launchpad.net/ceilometer/+bug/1268618).
15:38:35 Launchpad bug 1268618 in ceilometer "similar flatten dict methods exists" [Medium,In progress]
15:38:45 After some discussions with the team I decided to use dict_to_keyval everywhere. The reason is that this func allows the user to create queries on lists and doesn't contain bugs.
15:38:55 So the question: the API layer is the only place where recursive_keypairs is used, and this function contains a bug.
15:39:16 The perfect solution is to change recursive_keypairs=>dict_to_keyval in the API, but the outputs of these funcs are different
15:39:20 You may take a look here #link https://review.openstack.org/#/c/67704/4/ceilometer/api/controllers/v2.py
15:39:29 Is it absolutely forbidden to make any changes in the API output? We may postpone changing recursive_keypairs=>dict_to_keyval in the API, but maybe we can fix the bug in recursive_keypairs and fix all our wrong tests?
15:40:05 nprivalova: what's the bug in recursive_keypairs?
15:40:20 well it would be forbidden I'd say to make changes that could break existing API callers
15:40:37 yes, changing the return format would require an API version bump
15:40:45 should I fix the bug but simulate it again in the API to keep the behaviour?
15:40:50 #link https://wiki.openstack.org/wiki/APIChangeGuidelines
15:40:53 which isn't out of the question, but is probably not something we want to do at this point in the cycle
15:41:18 * jd__ shakes in fear of APIv3
15:41:35 #link https://bugs.launchpad.net/ceilometer/+bug/1268628
15:41:37 Launchpad bug 1268628 in ceilometer "recursive_keypairs doesn't throw 'separator' param to next iteration" [Undecided,In progress]
15:41:50 nprivalova: i guess your fix is good then. i actually don't like how we're outputting some odd formatting... but it will change output to fix it.
15:41:50 nprivalova: ah
15:42:39 since the consensus is to not change output i think we need to keep your patch in to keep output consistent as before.
15:42:59 yep, just wanted to clear that up
15:43:20 cool
15:43:28 I like it when we all agree
15:43:46 #topic Open discussion
15:43:48 and one more cr https://review.openstack.org/#/c/68583/
15:43:53 nprivalova: you've no idea how much it bothers me seeing 'a.b:c:d' keys. lol i'll review the patch again
15:44:15 gordc: cool :)
15:44:41 anyone know if the ctrl+c problem was fixed or not?
15:44:59 tongli: where? in devstack?
15:45:06 yes.
15:45:20 I do not think that is specific to devstack though.
15:45:33 tongli, https://review.openstack.org/#/c/70338/ this one is missing, I think, for the CTRL+C issue
15:45:51 tongli: oh, I'm not alone :) Today I faced it several times with devstack-master
15:46:04 tongli: it's patched. the oslo sync code just got merged.
15:46:20 ok. good.
15:47:11 do you have some bug-scrub procedure?
15:48:10 nprivalova: ... do you mean triaging the bug queue?
15:48:17 ... or squashing the actual bugs with a concerted effort at fixing?
15:48:33 ... such as a "bug-squashing day"
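(Background for the flattening discussion at 15:38: an illustrative flatten helper -- not ceilometer's recursive_keypairs or dict_to_keyval -- showing why the separator has to be forwarded on every recursive call. Dropping it mid-recursion, the omission described in bug 1268628, makes nested levels fall back to the default separator and is a plausible source of mixed keys like the 'a.b:c:d' mentioned at 15:43:53.)

    def flatten(nested, separator='.', prefix=''):
        """Flatten nested metadata into single-level keys joined by `separator`."""
        flat = {}
        for key, value in nested.items():
            path = prefix + separator + str(key) if prefix else str(key)
            if isinstance(value, dict):
                # Forward the caller's separator; omitting it here would
                # silently mix separators in deeply nested metadata.
                flat.update(flatten(value, separator=separator, prefix=path))
            else:
                flat[path] = value
        return flat

    # e.g. flatten({'a': {'b': {'c': 1}}}, separator=':') == {'a:b:c': 1}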
15:49:04 I meant clean-up
15:49:18 and triaging, yes
15:49:40 nova had a few of these days in the past,
15:49:53 a clean-up that ends with a neater, prioritized queue ... but not necessarily with fixed bugs, right?
15:50:16 Hi, I am a noob in ceilometer .... was reading the code ..... any place I can help to start learning about the code?
15:50:33 hi, ddutta!
15:50:39 ddutta: try fixing a bug?
15:50:46 and the same with bps. I just found a bug with 'Confirmed' state that was fixed half a year ago :)
15:50:59 btw I found a trivial typo too :) https://review.openstack.org/#/c/71431/
15:51:37 dhellmann: hi ... would love to do something here as my interests are in streaming data mining and machine learning :) ...
15:52:22 ddutta: you've seen http://docs.openstack.org/developer/ceilometer/ right?
15:52:22 nprivalova: ... the newer bugs seem to be triaged fairly rapidly in general, but it seems like we may need to do a periodic trawl of the older ones for dupes/stales etc.
15:52:32 will take on some simple bugs for starters to get more code and design insight ...
15:52:36 nprivalova: which bug was that? i occasionally run through bugs to clean them up a bit... i tend to let jenkins switch bug status so i guess it missed it in this case.
15:52:53 dhellmann: yes I started to read those
15:53:52 ddutta: i tend to throw breakpoints in code i'm interested in and step through... probably doesn't work for everyone but works for me.
15:54:13 ddutta: +2 on that patch, good eye
15:54:20 gordc: ah, ok. it was https://bugs.launchpad.net/ceilometer/+bug/1217412 . We've changed the status
15:54:21 Launchpad bug 1217412 in ceilometer "HBase DB driver losing historical resource metadata" [Medium,Fix released]
15:55:18 dhellmann: thx .... on to the bugs now
15:55:27 gordc: good idea ....
15:55:57 nprivalova: ah, yeah. that status wasn't updated by the build... i guess anyone can change the status so if you notice anything feel free to make updates.
15:57:28 time to wrap up guys
15:57:41 feel free to continue in #openstack-ceilometer :)
15:57:48 happy hacking!
15:57:50 #endmeeting