#openstack-meeting-3 log

15:02:27 <rhochmuth> #startmeeting monasca
15:02:28 <openstack> Meeting started Wed Mar  9 15:02:27 2016 UTC and is due to finish in 60 minutes.  The chair is rhochmuth. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:02:29 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:02:31 <openstack> The meeting name has been set to 'monasca'
15:02:35 <rhochmuth> o/
15:02:43 <tgraichen> o/
15:02:43 <slogan_r_> morning
15:02:43 <ho_away> hi
15:02:44 <bklei> o/
15:02:48 <rhochmuth> running a little late this morning
15:02:50 <shinya_kwbt> o/
15:02:50 <rhochmuth> sorry
15:03:10 <slogan_r_> it's 7 am here, late is good :-)
15:03:55 <rhochmuth> #topic summit
15:04:18 <rhochmuth> meant to mention that the agenda is posted at https://etherpad.openstack.org/p/monasca-team-meeting-agenda
15:04:26 <rhochmuth> Agenda for Wednesday March 9, 2016 (15:00 UTC)
15:04:26 <rhochmuth> 1.	Austin Summit Monasca Sessions
15:04:26 <rhochmuth> 2.	Monasca Agent discussion from previous week with SAP regarding overriding dimensions
15:04:26 <rhochmuth> 3.	Stale alarms for metrics that don't exist anymore (deleted vms) -- how should we address?
15:04:26 <rhochmuth> 4.	Multiple metrics per http request on statistics and measurements resource.
15:04:27 <rhochmuth> 1.	See review at https://review.openstack.org/#/c/289675/
15:04:27 <rhochmuth> 5.	Brief status update for Anomaly Detection
15:04:46 <rhochmuth> So, it looks like we had a decent number of Monasca-related sessions
15:04:48 <rhochmuth> accepted
15:04:56 <rhochmuth> Thanks everyone
15:05:22 <jobrs> hi
15:05:28 <rhochmuth> hi jobrs
15:05:43 <jobrs> here for topic 2.
15:05:50 <rhochmuth> i've put in for four design summit sessions at the summit
15:06:08 <rhochmuth> assuming that occurs, we should be able to have some discussions there
15:06:37 <ho_away> nice!
15:06:56 <rhochmuth> so, i'll keep you posted on what i get, and then we can adjust the agenda as we get closer
15:07:45 <rhochmuth> the summary is overall planning/status, discussion on new features and performance, logging api/implementation, and monasca/networking/broadview
15:08:05 <rhochmuth> if we run over, then we can find open spots to discuss more
15:08:16 <rhochmuth> does that sound reasonable?
15:08:32 <bklei> sounds good to me
15:08:58 <slogan_r_> sounds good here
15:09:05 <ho_away> if possible, i would like to add anomaly detection
15:09:18 <ho_away> in new features?
15:09:30 <rhochmuth> ho_away: if you'll be attending then i'll request another spot for that topic
15:09:41 <rhochmuth> unfortunately, the folks from bristol won't be there
15:09:54 <rhochmuth> so, i was worried about general attendance
15:10:14 <rhochmuth> but i'll request another spot, and we can discuss that topic in detail with whoever attends
15:10:20 <ho_away> i know, i didn't get permission for it yet but i will go there.
15:10:37 <rhochmuth> i'll definitely be up for a discussion and planning on that topic
15:10:58 <ho_away> after i get permission i will let you know thanks!
15:11:03 <rhochmuth> ok, thanks
15:11:14 <rhochmuth> #topic agent
15:11:30 <rhochmuth> jobrs: this is a carry over from last week
15:11:36 <rhochmuth> i guess we broke you
15:11:44 <jobrs> yep
15:11:53 <rhochmuth> so, do you have a proposal
15:12:09 <jobrs> I shared some links last time, they are in the logs
15:12:19 <rhochmuth> we could add a parameter to adjust the behaviour
15:12:29 <rhochmuth> keep the old or the new
15:12:41 <rhochmuth> based on a parameter
15:12:48 <rhochmuth> would that be acceptable?
15:12:59 <jobrs> I would be happy already if we would have a shared view on what the 'service' dimension is good for
15:13:14 <jobrs> a) openstack service
15:13:19 <jobrs> b) technical service
15:13:50 <jobrs> a) means that plugins for generic components/services do not know the answer
15:14:17 <jobrs> b) means that plugins for generic components set it themselves
15:14:33 <jobrs> but b) also means that there is no standard dimension for openstack services as registered in the ks catalog
15:14:53 <rhochmuth> This is what we've been trying to do
15:14:57 <jobrs> b) also means that we have quite some redundancy between component and service - what is the difference at all?
15:15:24 <jobrs> to me the previous behavior was more consistent
15:15:44 <jobrs> have 'user' set 'service' if it is a generic component
15:16:04 <jobrs> and not do something like component='mysql', service='mysql',process='mysqld'
15:16:06 <rhochmuth> service = compute, networking, …, when the entity being monitored corresponds to a specific openstack component, such as nova-api
15:16:16 <jobrs> exactly
15:16:23 <rhochmuth> not done yet
15:16:40 <jobrs> it is for monasca at least
15:17:00 <rhochmuth> then service = mysql, rabbitmq, …, if it is a "shared" service, unless the shared component isn't really being shared
15:17:23 <rhochmuth> it is often the case that mysql and rabbitmq are shared across many openstack services
15:17:38 <rhochmuth> but it can be the case that it is deployed 1-to-1 with a service
15:17:39 <jobrs> agreed, and in that case I believe it is fair that the one configuring the agent is taking care to set the --service parameter on monasca-setup
15:18:22 <rhochmuth> So, component should correspond to the specific "process" that is being monitored
15:18:28 <jobrs> in any case the plugin cannot possibly know
15:18:29 <rhochmuth> so, for Nova
15:18:39 <rhochmuth> service=compute, component = nova-api, ...
15:18:49 <rhochmuth> For "mysql"
15:18:52 <rhochmuth> service = mysql
15:18:56 <rhochmuth> component = mysqld
15:19:14 <rhochmuth> In some cases, the values are the same
15:19:21 <jobrs> there is a process dimension
15:19:56 <jobrs> the plugin cannot know for what service mysql is used
15:20:11 <jobrs> same with apache
15:20:36 <jobrs> etc., so plugins should IMHO set dimensions sparingly
15:20:45 <rhochmuth> But, it can be overridden
15:20:46 <jobrs> or at least not override 'service'
15:20:57 <jobrs> no, it cannot - no longer
15:21:17 <jobrs> the order was reversed
15:21:32 <rhochmuth> got it
15:21:44 <rhochmuth> i think we just coded to what worked for our deployment
15:22:03 <jobrs> same with us :-)
15:22:22 <rhochmuth> we ran into a problem because there were plugins that were not setting the service dimension
15:22:39 <rhochmuth> this created problems in the ui
15:23:01 <rhochmuth> so, in that case what we wanted was to always supply a service dimension = "uncategorized"
15:23:01 <jobrs> sure, but this can be fixed when configuring the agent, not?
15:23:30 <bklei> i'm concerned about automatically changing the default dimensions -- if we do that i'd prefer the change be config file driven so old behavior continues to work
15:23:49 <jobrs> maybe this belongs to the UI layer? not sure
15:23:56 <bklei> we've run into issues with dimension changes and bloat
15:24:30 <jobrs> us too, that is a big issue in my opinion
15:24:32 <rhochmuth> so, my proposal is to restore the old behaviour, and then create an option to enable the new behaviour
15:24:52 <bklei> +1
15:24:53 <rhochmuth> i'm trying to get rbrandt
15:25:15 <rhochmuth> he's not around
15:25:16 <jobrs> adds complexity
15:25:24 <Christian____> there is an "old" bug: A metrics graph becomes not to appear when adding/deleting a dimension https://bugs.launchpad.net/monasca/+bug/1485859
15:25:24 <openstack> Launchpad bug 1485859 in Monasca "A metrics graph becomes not to appear when adding/deleting a dimension." [Undecided,Triaged]
15:25:51 <Christian____> that means changing dimensions is not a good idea...
15:26:18 <jobrs> that is what I meant with big issue. you cannot force people not to add dimensions, that is what they are good for
15:26:46 <rhochmuth> what we wanted to have happen was to always have a dimension of service=uncategorized
15:26:57 <rhochmuth> if one wasn't supplied
15:27:27 <jobrs> to me this looks like a presentation-layer problem
15:27:29 <rhochmuth> i'm not exactly sure at this point why that ended up modifying the default behaviour
15:28:01 <rhochmuth> so, i'll check with rbrandt and come up with a proposal to fix
15:28:10 <rhochmuth> if that sounds ok
15:29:17 <rhochmuth> it isn't a presentation layer problem, btw
15:29:33 <rhochmuth> the problem occurs when searching for metrics and alarms
15:29:51 <rhochmuth> we don't support being able to get metrics and alarms that don't have a supplied dimension
15:29:51 <jobrs> I was just talking of the specific case of the service dimension
15:30:05 <bklei> agreed -- if this is the issue rbak found -- you end up with metrics you can't query for without merging
15:30:13 <jobrs> other than that I believe it is a bigger issue which will not be solved by the default service domain value
15:30:37 <jobrs> +1
15:30:38 <rhochmuth> so, you can't search for the absence of a dimension
15:30:48 <rhochmuth> today
15:30:55 <rhochmuth> and there is no way to do that in some databases
15:30:58 <rhochmuth> like influxdb
15:31:38 <rhochmuth> so, we wanted to supply a default dimension = uncategorized everywhere
15:31:53 <rhochmuth> but that broke the original behaviour
15:31:55 <jobrs> but it is not just "service"
15:32:09 <rhochmuth> yes, it applies to any dimension
15:32:36 <jobrs> so this is what I do not like about the option, it does not really fix the problem (for us)
15:32:51 <rhochmuth> but from a ui perspective we usually only group by hostname, service,
15:33:00 <rhochmuth> why?
15:33:29 <rhochmuth> why doesn't it work?
15:33:41 <jobrs> we have other dimensions
15:34:09 <jobrs> e.g. in kubernetes: namespace, ressource_controller, ...
15:34:37 <jobrs> the great thing about dimensions is that you can have your own ones
15:35:25 <Christian____> Roland: Is the number of dimensions fixed (and should not be changed)? All not used dimensions will have default value "uncategorized"?
15:37:38 <rhochmuth> Besides going back to the old behaviour, which doesn't fix the problem I'm trying to address, is there a specific proposal that we can implement
15:37:55 <rhochmuth> We have a problem
15:38:10 <rhochmuth> I'm just looking for a specific way to resolve this at this point that is implementable
15:38:22 <jobrs> sure
15:38:47 <jobrs> unfortunately I am not an influxdb expert
15:38:48 <tgraichen> but in your case, can't we just add service=uncategorized in the monasca agent in case the service is not set from the plugin and not via agent config?
15:39:33 <rhochmuth> ok, i'll look into that
15:39:53 <rhochmuth> i don't have an answer right now
15:39:54 <slogan_r_> that would imply the db defaults to that value when items added or removed, correct? Is that the case now?
15:39:56 <jobrs> the global configuration is overridden by the plugins
15:40:05 <rhochmuth> but, if it is possible, we'll try to do that
15:40:15 <jobrs> my proposal would be to make merge-metrics a default
15:40:20 <slogan_r_> er, added or updated
15:40:40 <rhochmuth> ok, i like that idea
15:41:00 <jobrs> äh, ...my default
15:41:05 <rhochmuth> not sure why we didn't do that, but i'll investigate and try to get back to that
15:41:48 <jobrs> so you tell the API which dimensions should be expanded
15:41:58 <jobrs> and the remaining ones are merged
15:42:05 <jobrs> that gives some stability
15:42:51 <jobrs> rbak's grafana plugin is exposing this behavior to the user, so it is possible it seems
15:43:31 <jobrs> if the api would support it, too, then the db-driver could do optimizations to reduce the number of queries (mid-term)
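The merge idea jobrs describes above (name the dimensions to expand, merge everything else) can be illustrated with a minimal sketch. This is not the Monasca API or the grafana plugin's code: the function name `merge_series`, the `expand` parameter, and the shape of the series dictionaries are all assumptions made for illustration.

```python
from collections import defaultdict


def merge_series(series, expand):
    """Group series by the dimensions listed in `expand`; merge the rest.

    `series` is a list of dicts with a "dimensions" dict and a
    "measurements" list of (timestamp, value) tuples. This shape is a
    hypothetical stand-in, not the real API response format.
    """
    groups = defaultdict(list)
    for s in series:
        # Only the expanded dimensions form the grouping key, so series that
        # differ in any other dimension end up in the same group.
        key = tuple((d, s["dimensions"].get(d)) for d in expand)
        groups[key].extend(s["measurements"])
    # Sort each merged group by timestamp so it reads as one series.
    return {key: sorted(points) for key, points in groups.items()}


series = [
    {"dimensions": {"hostname": "h1", "namespace": "a"}, "measurements": [(2, 1.0)]},
    {"dimensions": {"hostname": "h1"}, "measurements": [(1, 3.0)]},
    {"dimensions": {"hostname": "h2", "namespace": "a"}, "measurements": [(1, 5.0)]},
]
merged = merge_series(series, expand=["hostname"])
```

With `expand=["hostname"]`, the two `h1` series collapse into one even though only one of them carries a `namespace` dimension, which is the stability property mentioned above: adding or removing an unrelated dimension does not split the result.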
15:46:05 <rhochmuth> ok, i'll look at the code and work with rbrandt and see what we come up with
15:46:09 <rhochmuth> sound good?
15:46:51 <jobrs> sounds great
15:46:55 <jobrs> thank you
15:47:06 <rhochmuth> ok, thanks
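The precedence question discussed in this topic can be sketched roughly as follows. This is only an illustration under stated assumptions, not monasca-agent code: `merge_dimensions`, `config_overrides_plugin`, and `default_service` are hypothetical names. It combines the two proposals from the discussion: a switch that selects which side wins, and a fallback of service=uncategorized applied only when neither the plugin nor the agent config sets a service.

```python
def merge_dimensions(plugin_dims, config_dims, config_overrides_plugin=True,
                     default_service="uncategorized"):
    """Combine plugin-supplied and agent-config dimensions.

    All names here are hypothetical. With config_overrides_plugin=True the
    values from the agent configuration win on conflict; with False the
    plugin's values win.
    """
    if config_overrides_plugin:
        merged = {**plugin_dims, **config_dims}
    else:
        merged = {**config_dims, **plugin_dims}
    # Fall back to a default only if nobody supplied a service, so every
    # metric can still be found by searching on the service dimension.
    merged.setdefault("service", default_service)
    return merged


plugin = {"component": "mysqld", "service": "mysql"}
config = {"service": "compute"}
config_wins = merge_dimensions(plugin, config)
plugin_wins = merge_dimensions(plugin, config, config_overrides_plugin=False)
no_service = merge_dimensions({"component": "apache"}, {})
```

Because the default is applied last and only when the key is missing, it never overrides a service set by either side.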
15:47:11 <rhochmuth> switching topics
15:47:19 <rhochmuth> #topic stale alarms
15:47:28 <bklei> that's me
15:47:34 <rhochmuth> sorry time's up
15:47:38 <rhochmuth> just kidding
15:47:41 <bklei> anyone else encounter this? :)
15:47:44 <ho_away> lol
15:47:47 <rhochmuth> yes,
15:47:48 <bklei> so here's the scenario
15:48:08 <bklei> the overview page in horizon, when tracking vms with alarm defs
15:48:08 <rhochmuth> well, this is one of the top issues we've been looking at
15:48:17 <bklei> goes gray after a vm goes away
15:48:25 <bklei> and requires a manual alarm delete step
15:48:48 <bklei> so we could have a prune process specific to this, but i wonder if there's a better idea/solution
15:48:54 <bklei> to handle stale stuff in the UI
15:49:01 <rhochmuth> there is a better way
15:49:09 <bklei> good, i'm all ears
15:49:27 <rhochmuth> but first let me say that we are solving this in our helion distribution using a script and cron job
15:49:36 <rhochmuth> that is a short-term solution
15:49:48 <rhochmuth> because the better solution is harder
15:49:50 <bklei> right, and we could do the same here -- we'd welcome you sharing ^^
15:50:07 <rhochmuth> i'll try and get that script open-sourced somewhere
15:50:26 <bklei> gracias, and there's discussion about a more elegant solution?
15:50:27 <rhochmuth> the way we were intending was to use the Events API
15:50:34 <bklei> maybe discussion in austin?
15:50:46 <rhochmuth> Sure, we can discuss
15:50:48 <bklei> aah, so  lifecycle trigger
15:50:50 <bklei> i like that
15:50:53 <rhochmuth> Correct
15:51:02 <rhochmuth> the Events API would receive all VM lifecycle events
15:51:36 <bklei> that's clean -- so we can do the cron/script as a workaround till then
15:51:40 <rhochmuth> and the events engine would have a handler associated with it to delete the alarm, when the VM end event occurs
15:52:06 <rhochmuth> The only alternative right now is a cron/script
15:52:26 <rhochmuth> the script actually invokes the nova api to determine the VMs that have been deleted
15:52:36 <rhochmuth> so it is not purely time based
15:52:38 <slogan_r_> that would work
15:52:40 <bklei> nice
15:53:06 <rhochmuth> the problem with the events api/engine is that it is not getting any development time right now
15:53:15 <rhochmuth> so, it could be a long wait
15:53:25 <rhochmuth> for that to be delivered
15:53:28 <slogan_r_> so, I presume these events events are defined by nova and they are publishing to OSLO?
15:53:34 <rhochmuth> correct
15:53:40 <slogan_r_> s/events events/events/
15:53:48 <rhochmuth> openstack notification, VM lifecycle events
15:53:56 <rhochmuth> there is a wiki that describes them all
15:54:02 <rhochmuth> but i don't have the link
15:54:06 <rhochmuth> right now
15:55:10 <rhochmuth> so, is that enough of a discussion
15:55:12 <rhochmuth> on that topic
15:55:14 <bklei> thx
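The selection step of the cron/script workaround described above might look something like the sketch below. The actual script was not shared in this meeting, so this is a guess at the core logic only. The function name `find_stale_alarms`, the `resource_id` dimension used to identify a VM, and the alarm dictionary shape are assumptions. Fetching the active VM list from nova and deleting the alarms are left out on purpose.

```python
def find_stale_alarms(alarms, active_vm_ids, vm_dimension="resource_id"):
    """Return ids of alarms whose VMs no longer exist.

    `alarms` is a list of dicts with an "id" and a "metrics" list whose
    entries carry a "dimensions" dict (assumed shape). `active_vm_ids` is
    the set of VM ids the compute service currently reports.
    """
    stale = []
    for alarm in alarms:
        vm_ids = {
            m["dimensions"][vm_dimension]
            for m in alarm.get("metrics", [])
            if vm_dimension in m.get("dimensions", {})
        }
        # Alarms not tied to any VM are left alone; an alarm is stale only
        # when every VM it refers to is gone.
        if vm_ids and vm_ids.isdisjoint(active_vm_ids):
            stale.append(alarm["id"])
    return stale


alarms = [
    {"id": "a1", "metrics": [{"name": "cpu", "dimensions": {"resource_id": "vm-1"}}]},
    {"id": "a2", "metrics": [{"name": "cpu", "dimensions": {"resource_id": "vm-2"}}]},
    {"id": "a3", "metrics": [{"name": "disk", "dimensions": {"hostname": "h1"}}]},
]
stale_ids = find_stale_alarms(alarms, active_vm_ids={"vm-1"})
```

Driving the check from the list of existing VMs rather than from alarm age matches the point made above that the script is not purely time based.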
15:55:23 <rhochmuth> we are running out of time
15:55:34 <rhochmuth> #topic multiple metrics
15:55:39 <rhochmuth> So, i posted some code
15:55:42 <rhochmuth> it isn't complete
15:55:58 <rhochmuth> if bklei and rbak could look at it that would be good
15:56:26 <rhochmuth> what i would like to do is add a query parameter, "multiple_metrics" or something similar
15:56:37 <bklei> yeah, looks good to me if we add the parm to differentiate behavior
15:56:45 <rhochmuth> to enable returning multiple metrics in a single measurements or statistics resource
15:56:53 <bklei> we'll definitely be using that
15:56:57 <rbak> I'll take a look as soon as the meeting is done.
15:56:59 <rhochmuth> this will probably improve your overall query performance 10X at least
15:57:01 <rhochmuth> if not more
15:57:08 <rhochmuth> i'm actually hoping for 100X
15:57:30 <rhochmuth> i don't think this is possible for influxdb using an in-database query
15:57:43 <bklei> gonna be freaking awesome -- in vertica land :)
15:57:43 <rhochmuth> so, please take a look
15:58:27 <rhochmuth> I think we are out of time for anomaly detection
15:58:43 <ho_away> ok, next week :-)
15:58:48 <rhochmuth> sorry ho_away
15:58:52 <rhochmuth> you are up first next week
15:59:03 <rhochmuth> i'll also touch-base with luis
15:59:04 <ho_away> thanks!
15:59:29 <ho_away> yeah, i will send you email to have a meeting
15:59:31 <rhochmuth> jobrs: will get back to you
15:59:36 <bklei> thx for hosting rhochmuth!
15:59:39 <rhochmuth> thx ho_away
15:59:45 <rhochmuth> thanks everyone
15:59:49 <tgraichen> bye
15:59:51 <jobrs> thank you, looking forward to multiple metrics, too
15:59:54 <slogan_r_> later
15:59:56 <ho_away> thanks Roland
16:00:04 <Christian____> bye and thx
16:00:16 <shinya_kwbt> bye
16:00:16 <rhochmuth> #endmeeting