15:00:03 #startmeeting monasca
15:00:03 Meeting started Wed Mar 30 15:00:03 2016 UTC and is due to finish in 60 minutes. The chair is rhochmuth. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:04 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:07 o/
15:00:07 The meeting name has been set to 'monasca'
15:00:10 o/
15:00:15 o/
15:00:20 Agenda is at https://etherpad.openstack.org/p/monasca-team-meeting-agenda
15:00:26 Agenda for Wednesday March 30, 2016 (15:00 UTC)
15:00:26 1. Non-periodic metrics and aggregate functions, are all suitable?
15:00:27 2. Anyone else seeing https://bugs.launchpad.net/monasca/+bug/1559165?
15:00:27 3. Anyone seeing periodic monasca-api hangs (java)?
15:00:27 4. Compression support for POSTing metrics?
15:00:27 Launchpad bug 1559165 in Monasca "monasca-api log flooded with jetty parser errors (java api)" [Undecided,New]
15:00:27 5. Non-periodic/periodic metrics - update
15:00:27 o/
15:00:27 6. Grafana 2 horizon update https://review.openstack.org/#/c/295983/
15:00:27 o/
15:00:27 Needed for devstack integration
15:00:31 hi everyone
15:00:32 o/
15:00:48 o/
15:00:54 good morning
15:00:55 morning
15:01:05 looks like a few agenda items are up today
15:01:13 o/
15:01:16 but we should hopefully get through them relatively quickly
15:01:16 o/
15:01:22 and then can have other discussion
15:01:31 cool
15:01:38 so first topic
15:01:47 #topic Non-periodic metrics
15:02:00 tomasz: Is that you?
15:02:08 yes, actually it was a question: non-periodic metrics and aggregate functions, are all suitable?
15:02:47 are you referring to the case of applying a statistic function, like avg, to a non-periodic metric
15:02:54 yeah
15:03:11 something like: does avg make sense for metrics that are non-periodic
15:03:40 it does in our case
15:03:41 i was hoping not to do anything in this area, but now that you've brought it up, there are probably items worth addressing
15:03:47 i think it is ok
15:03:59 but, the question is what the average should be
15:04:14 for a metric that you haven't received in a long time
15:04:22 bklei: good to know
15:04:33 should it be the average of the last value
15:04:39 o/
15:04:39 or should it be 0
15:04:43 or NaN
15:05:07 I am not sure how it works in thresh right now... does it calculate the average from all measurements or just the last few
15:05:10 personally, i'm ok addressing the non-periodic metrics in just the threshold engine for now
15:05:22 it doesn't make sense without a time range specified.
15:05:58 in the threshold engine it should just be the last value
15:06:34 so, if the value hasn't been sent within the period, the value is just the last value that was sent
15:06:42 that was the interpretation that i had
15:07:24 so, let's say normally a periodic metric is sent every 30 seconds approximately
15:07:46 if it is a non-periodic metric then the assumption is that the value for the metric is just the last value
15:08:22 slogan: if the time period had to be specified, that would include modifications in the UI as well, not to mention other components... that's not so trivial I guess
15:08:36 But average could be useful if a range is specified? Consider rainfall. Answers are completely different if averaged by day, month, year, season.
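[editor's note] The question raised above -- what an average should report for a non-periodic metric that has gone quiet -- comes down to the window chosen and the fill policy (last value, 0, or NaN). A minimal illustrative sketch, not monasca code; all names here are made up:

```python
# Editor's sketch: the average over a window depends on the fill policy chosen
# for a non-periodic metric.  Purely illustrative, not monasca-thresh code.
from statistics import mean

def window_average(measurements, window_start, window_end, fill="last"):
    """measurements: sorted list of (timestamp, value); window is [start, end)."""
    in_window = [v for t, v in measurements if window_start <= t < window_end]
    if in_window:
        return mean(in_window)
    earlier = [v for t, v in measurements if t < window_start]
    if fill == "last" and earlier:
        return earlier[-1]        # carry the last received value forward
    if fill == "zero":
        return 0.0
    return float("nan")           # no data -> "undetermined"

data = [(0, 4.0)]                 # one sporadic measurement, then silence
print(window_average(data, 60, 120, fill="last"))   # 4.0
print(window_average(data, 60, 120, fill="zero"))   # 0.0
print(window_average(data, 60, 120, fill="nan"))    # nan
```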
15:08:47 nod
15:09:09 roland: so, right now let's stay with what we have and keep this topic open, in case someone (either Fujitsu or someone else) comes up with a decent idea of how to implement it right
15:09:16 and basically I agree with slogan
15:09:25 but again, doing all that right now is too much
15:09:37 makes sense
15:09:50 that works for me
15:10:00 works for me too
15:10:20 could lead to some interesting insights in a UI to be able to compute an average on the fly by dialing a period knob
15:11:26 ok, are we good
15:11:35 is it time to move on?
15:11:56 for me it is, that answers I guess the community feeling about that
15:12:10 thanks tomasz
15:12:13 #topic Anyone else seeing https://bugs.launchpad.net/monasca/+bug/1559165?
15:12:14 Launchpad bug 1559165 in Monasca "monasca-api log flooded with jetty parser errors (java api)" [Undecided,New]
15:12:16 that's me
15:12:26 stop bringing bugs
15:12:30 just a quick question => build from stable/mitaka ?
15:12:35 since we updated to the latest monasca-api (java) -- logs flooded with
15:12:36 probably user error
15:12:37 org.eclipse.jetty.http.HttpParser: Parsing Exception: java.lang.IllegalStateException: too much data after closed for HttpChannelOverHttp@6f19e5fb{r=1,a=IDLE,uri=-}
15:12:39 :)
15:12:39 * slogan runs
15:12:58 lost my train of thought, that was a joke
15:13:06 we hadn't brought in an api since 12/10, but i've narrowed it to a change between 12/10 and 1/19
15:13:14 not that it narrows it down much
15:13:18 still coming back to your question tomasz
15:13:29 i can keep digging -- but is helion seeing this?
15:13:37 or anyone else using latest monasca-api?
15:13:39 bklei: we are not aware of any issues, and have been doing a lot of testing
15:13:41 (java)
15:13:52 we are using the latest java monasca-api in helion
15:13:52 we are in the middle of updating monasca-* to stable/mitaka
15:13:53 i'm talking FLOODING the logs
15:13:56 but the api works
15:14:00 and are getting ready for a release
15:14:07 so have been doing a lot of testing
15:14:17 so I'd have to ask Artur tomorrow about that, till now nothing like that has been spotted
15:14:29 ok, maybe we're special at TWC, will keep digging
15:14:35 i'll check with our testers, but they would have told us if they saw something
15:14:40 k, thx
15:14:46 you guys are always special IMHO
15:14:52 short bus
15:15:08 might be interesting to see if the exception handling causes a retry or is just informative
15:15:25 i'll also update my devstack env with a java build to see what happens
15:15:27 you say things still work, no apparent loss of data or functionality?
15:15:31 from what i read, it could be a bad client call -- sending extra stuff
15:15:38 but the old monasca-api didn't care
15:15:49 exactly -- seems that it works fine
15:16:07 maybe something that pushes data to monasca was changed in a way that is causing that problem
15:16:15 that is a possibility
15:16:21 could be, and the new api is just more sensitive
15:16:23 we continue to improve the input validation
15:16:30 after all monasca-api is a server that monasca-agent (mainly or only) sends data to, right?
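[editor's note] One plausible shape of such a "bad client call", in line with the Stack Overflow thread linked a little later in the discussion: a POST that declares a Content-Length shorter than the body it actually sends, so the trailing bytes arrive after the request has ended. This is a hypothetical sketch, not a confirmed reproduction of bug 1559165; host, port, path and token are placeholders:

```python
# Editor's sketch of a deliberately badly written client (hypothetical, not a
# confirmed reproduction of the bug): the declared Content-Length is shorter
# than the body, so the trailing bytes land on the connection after the request
# has "ended", which Jetty can report as "too much data after closed".
import http.client

body = (b'[{"name": "client_test", "dimensions": {"host": "h1"}, '
        b'"timestamp": 1459350000000, "value": 1.0}]')

conn = http.client.HTTPConnection("localhost", 8070)   # placeholder monasca-api endpoint
conn.putrequest("POST", "/v2.0/metrics")
conn.putheader("Content-Type", "application/json")
conn.putheader("X-Auth-Token", "REPLACE_ME")
conn.putheader("Content-Length", str(len(body) - 20))  # deliberately too short
conn.endheaders()
conn.send(body)                                         # extra 20 bytes are "data after close"
try:
    resp = conn.getresponse()
    print(resp.status, resp.read())
except (http.client.HTTPException, OSError) as exc:    # server may simply drop the connection
    print("connection error:", exc)
finally:
    conn.close()
```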
15:16:34 although, i would have expected a better error message
15:16:45 yeah, and some custom scripts/crons
15:17:03 the log that you've supplied doesn't seem very informative
15:17:11 which isn't expected
15:17:18 that's all that's there -- and lots of it
15:17:36 i'll keep digging and report back
15:17:43 ok, thanks
15:17:53 or let me know if you see/figure it out
15:17:53 i'll ask around, and do some tests too
15:17:55 thx
15:18:14 maybe this lends some clues: http://stackoverflow.com/questions/29527803/eliminating-or-understanding-jetty-9s-illegalstateexception-too-much-data-aft
15:19:04 yeah, that's the page that talks about a badly written client :)
15:19:10 perhaps an issue with closing the http connection, hard to say
15:19:19 so what is the client in this case?
15:19:30 primarily monasca-agent
15:19:44 but also some cron-driven scripts/POSTs
15:19:45 ok
15:19:50 aha
15:20:20 i can shut different clients off in my sandbox, that's a good idea i hadn't thought of, to narrow it down
15:20:24 should be isolatable with some tracing I suppose
15:20:32 yes
15:20:37 divide and conquer
15:20:40 :)
15:20:55 LOG_LEVEL=All and go for it ]:->
15:21:14 yup
15:21:15 that does seem the first place to look, since no one else has seen it
15:21:45 ok, i think we can move on
15:21:57 ok, thanks bklei
15:22:04 tomasz you were asking a question
15:22:22 about stable/mitaka
15:22:35 it was to bklei: did the bug start to happen on a stable/mitaka build of monasca-api
15:23:12 yeah, we brought in a new java api about a week ago, and hadn't done that since 12/10, that's when it began
15:23:21 sounds like it was well before stable/mitaka as that was only created last Friday
15:23:48 we built it locally with master at 2016-03-17 23:14
15:24:04 yeah, so we are now updating to this branch as well, so I'll ask Artur who's doing the upgrade to take a look at the logs for monasca-api
15:24:19 specifically I guess it is requests.log, right ?
15:25:23 oh -- in our environment, /var/log/monasca/monasca-api.log
15:26:11 #topic Compression support for POSTing metrics?
15:26:27 started the next topic while the previous one wraps up
15:26:36 did we skip a topic -- api hang?
15:26:48 #topic Anyone seeing periodic monasca-api hangs (java)?
15:26:50 oops
15:26:53 i guess we did :)
15:26:54 we've seen this happen 3 times this week, anyone else see this in testing?
15:27:16 no
15:27:19 again...nope :(
15:27:25 nothing in the logs, process unresponsive, doesn't respond to thread dump, just dead
15:27:54 we're in the process of trying to get better debug msgs by leaving verbose logging on, hope to catch it
15:28:05 just wondering if anyone else has noticed this
15:28:21 we haven't ever hit any problems like this
15:28:34 k
15:28:42 and this is also with the master build you mentioned ?
15:29:00 exactly -- hadn't seen it prior to that upgrade
15:29:15 possibly a related problem then
15:29:19 to your previous one
15:29:27 is memory growing
15:29:34 are file handles being used up
15:29:37 and not released
15:29:45 are vertica and kafka up
15:30:05 just throwing out the usual places to look
15:30:08 will validate that next time (file descriptors, etc) -- yeah the whole stack was still up and healthy
15:30:53 ok, moving on i guess
15:31:14 i guess, i don't have any great suggestions
15:31:31 we just did a huge amount of scale testing and haven't seen any issues like this occur
15:31:36 i hadn't thought of the FD check, that's helpful
15:31:41 just one question: does this hanging seem to happen right away, or does it take some time ?
15:31:46 over time
15:31:52 mhm
15:32:21 i'll check around and see if i come up with more ideas
15:32:25 thx
15:32:44 we had an issue with kafka using a lot of fd's, but not sure if that would impact you
15:33:08 easy way to check that would be to look at /proc/fds perhaps
15:33:09 hmm, can look there, we haven't updated kafka in a while in our env
15:33:14 see if it is growing
15:33:17 well file descriptors are limited per user (soft and hard), so if the process owner reached the limit
15:33:18 +1
15:33:27 that would cause a problem only for the given application
15:33:34 as far as I know how limits work :D
15:33:53 #topic Compression support for POSTing metrics?
15:34:08 this is more a question -- have we considered this?
15:34:19 is this gzip compression of http payloads?
15:34:21 you mean something like gzip and stuff ?
15:34:29 slogan: :P
15:34:42 one of the primary consumers of monasca here at TWC asked about zipping up data to POST
15:34:52 like zlib
15:35:00 I got flustered yesterday by horizon doing this, I was trying to sniff the traffic and see what was being sent... :-(
15:35:42 brain lapse on my http
15:35:46 did the customer ask only to zip traffic toward monasca-api ? or also data that is sent via the kafka queue ?
15:35:55 because I think that would also be possible
15:36:13 just straight to the monasca endpoint
15:36:17 but are you asking about just applying a compression encoding
15:36:24 yes
15:36:58 are we sure the compression/decompression time might not become a bottleneck somehow?
15:37:07 not a huge issue/request, just throwing it out there
15:37:08 there isn't anything in the monasca-api to accept different encodings
15:37:25 that's what i figured, and we've got bigger fish to fry atm
15:37:27 so, it could involve code changes
15:37:43 not necessarily code, but decorators and annotations
15:37:47 yeah
15:38:02 i'm assuming dropwizard could handle it automatically
15:38:04 if enabled
15:38:11 and not sure about falcon
15:38:24 so, it could be low-hanging fruit to enable this
15:38:49 would this be something you'd want a config knob for, or perhaps something in the API so it could be enabled/disabled by a client?
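[editor's note] For reference, this is roughly what the client side of a gzip-encoded metrics POST could look like if Content-Encoding: gzip were accepted -- per the discussion above, monasca-api does not currently accept alternative encodings, so this is a hypothetical sketch with a placeholder endpoint and token:

```python
# Editor's sketch (hypothetical): gzip-compressing a metrics POST body and
# labelling it with Content-Encoding: gzip.  The API would have to be taught
# to accept this; endpoint and token below are placeholders.
import gzip
import json
import requests

metrics = [{"name": "client_test", "dimensions": {"host": "h1"},
            "timestamp": 1459350000000, "value": 1.0}]
payload = gzip.compress(json.dumps(metrics).encode("utf-8"))

resp = requests.post(
    "http://localhost:8070/v2.0/metrics",          # placeholder monasca-api endpoint
    data=payload,
    headers={"Content-Type": "application/json",
             "Content-Encoding": "gzip",
             "X-Auth-Token": "REPLACE_ME"},
)
print(resp.status_code, resp.text)
```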
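[editor's note] And, to act on the file-descriptor suggestion from the API-hang discussion above: a quick sketch that watches a process's FD count (e.g. the java monasca-api) against its soft limit via /proc. It assumes Linux, a pgrep-matchable process name, and permission to read that process's /proc entries:

```python
# Editor's sketch: periodically compare a process's open file descriptors with
# its soft limit to see whether the count keeps growing (possible FD leak).
# Linux-only; looking the pid up with pgrep is a convenience assumption.
import os
import subprocess
import time

def fd_count(pid):
    return len(os.listdir("/proc/%d/fd" % pid))

def soft_fd_limit(pid):
    with open("/proc/%d/limits" % pid) as limits:
        for line in limits:
            if line.startswith("Max open files"):
                return int(line.split()[3])   # soft limit column

pid = int(subprocess.check_output(["pgrep", "-f", "monasca-api"]).split()[0])
for _ in range(10):
    print("open fds: %d (soft limit: %s)" % (fd_count(pid), soft_fd_limit(pid)))
    time.sleep(60)
```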
15:38:57 i have to drop and take my kid to the dentist, thx for discussing the topic(s)
15:39:11 bye bklei
15:39:23 in falcon I did a quick search and it looks like it should be possible directly on gunicorn (or simply the WSGI server)
15:39:33 that makes sense
15:40:14 since bklei is gone
15:40:17 #topic Non-periodic/periodic metrics - update
15:40:24 that's me
15:40:55 just wanted to give an update on the latest patch sets, that would allow reviewers to grasp the concept I am trying to introduce there better
15:41:37 so: following roland's and craig's advice, I modified the schema for the SubAlarm table by adding a sporadic field (boolean, false by default)
15:42:10 that handles the case where the topology is restarted somehow and thresh is able to recognize sub alarms for which the sent metrics were sparse
15:42:18 since most of them will be periodic
15:42:22 cool
15:42:28 https://review.openstack.org/#/c/292758
15:42:52 i'll start reviewing and testing it out
15:43:14 this change contains code that pulls that information from the metric and saves it in the SubAlarm right before it is persisted (either mysql or ORM)
15:43:28 I've added comparing metric definitions between the metric and the sub alarm
15:43:40 but I am not sure if that accomplishes multitenant support
15:43:51 meaning to say in the metric definition there is no information about the tenant
15:44:20 so I'd still need to figure out how to get that information in order to mark only the specific SubAlarm as sporadic, not all of them for a given Alarm
15:45:01 so that's more or less it for now
15:45:08 ok, i'll let craig know what is going on
15:45:18 and i'll need to dive deeper into the code
15:45:26 to help out/comment at this point
15:45:32 I will try to put up references to either the mysql schema or a mysql migration script that would add the sporadic field to the existing schema
15:45:37 possibly as a gist
15:46:05 yes, i'll need that
15:46:18 you just added a single column or multiple
15:46:23 a single column
15:46:50 I dug a bit through the tables and it looks like SubAlarm is the best place to put it
15:47:20 that sounds right
15:47:47 before that: one functional requirement should also be there (at least I hope I implemented it right)
15:48:01 meaning to say: once an alarm for a sporadic metric enters the OK/ALARM state
15:48:13 it won't ever go back to the UNDETERMINED state
15:48:25 that is my understanding too
15:48:44 that's what we all agreed upon :)
15:49:32 should we move on?
15:49:35 in the meantime there are some useful log outputs (debug or trace) I've added that allow tracking what's going on with a Metric, Alarm or SubAlarm between different bolts
15:49:39 yes, that's all
15:49:40 :)
15:49:55 ok, thanks tomasz, i'll start getting into it
15:50:01 #topic Grafana 2 horizon update https://review.openstack.org/#/c/295983/
15:50:10 rbak: u there?
15:50:10 That's me
15:50:34 Just a reminder that this patch exists. I had one question, but no reviews so far.
15:50:58 I can't really integrate grafana 2 into devstack unless this gets merged.
15:51:30 looks like shinya tested it
15:52:15 does this work in the old monasca-vagrant environment ?
15:52:23 i was wondering how to test?
15:52:44 It defaults to the old grafana, so you'd have to configure it yourself.
15:52:44 +1
15:52:53 You'd also need to build grafana.
15:53:05 If shinya says it is ready, i'm ready to merge
15:53:10 up to you shinya
15:53:22 Works for me. I just didn't want to merge without input.
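[editor's note] A toy sketch (not monasca-thresh code, names made up) of the transition rule tomasz describes above for sporadic sub-alarms: once one has reached OK or ALARM, a quiet period no longer drops it back to UNDETERMINED, while a periodic one still falls back:

```python
# Editor's sketch of the sporadic transition rule described above; illustrative
# only, not the actual monasca-thresh state machine.
def next_state(current, evaluated, sporadic):
    """evaluated is "OK" or "ALARM" when measurements arrived this period, else None."""
    if evaluated is not None:
        return evaluated
    if sporadic and current in ("OK", "ALARM"):
        return current            # sparse metrics are expected to go quiet
    return "UNDETERMINED"

assert next_state("OK", None, sporadic=True) == "OK"             # stays put
assert next_state("OK", None, sporadic=False) == "UNDETERMINED"  # periodic falls back
```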
15:53:35 Yes I tested with grafana2
15:54:18 sounds like i should +2, unless anyone else wants to take a close look
15:54:19 I configured grafana2
15:54:30 i would like to see this added to the DevStack env ASAP
15:54:39 then i won't need to worry about monasca-vagrant anymore
15:54:47 if that works for everyone
15:56:38 works for me, I just +1'd it
15:56:39 +1
15:56:56 sorry for the delay, but if anyone else wants to review further please let me know in the next 5 seconds,
15:57:07 else, i'm ready to merge it
15:57:19 time's up!
15:57:33 ok, it is merged
15:57:36 thanks
15:57:45 I'll get back to devstack today then
15:57:49 if you need any help with the devstack, please let me know
15:57:52 will do
15:58:11 well, it is almost top of the hour again
15:58:17 seems like we could go on again
15:58:23 any questions in closing
15:58:27 https://review.openstack.org/#/c/297039/, Roland, if you would take a look at this change that would be great, you've added the v3 implementation, which I adjusted in several places, would be great if you could check that
15:58:32 mine :D
15:58:32 so grafana2 will be a part of the mitaka release, right?
15:58:57 grafana 2 won't be part of the mitaka release as it isn't an official openstack project
15:59:12 but it will work and be integrated into the DevStack plugin shortly
15:59:19 tomasz: i'll review your code
15:59:23 ok. thx
15:59:26 roland: thx
15:59:31 rhochmuth: just a ping: we need to settle on talks in Austin, or not. Soon :-)
15:59:36 thanks & bye
15:59:47 yes slogan
15:59:53 i'm still waiting to hear back
16:00:05 bye
16:00:07 from the event about rooms?
16:00:07 bye hosanai
16:00:13 thanks shinya
16:00:14 cheers and nice day...or evening...depends on the time zone you guys are in... (laughing....)
16:00:35 #endmeeting