#openstack-monasca log

13:00:49 <witek> #startmeeting monasca
13:00:49 <openstack> Meeting started Tue Dec 10 13:00:49 2019 UTC and is due to finish in 60 minutes.  The chair is witek. Information about MeetBot at http://wiki.debian.org/MeetBot.
13:00:50 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
13:00:52 <openstack> The meeting name has been set to 'monasca'
13:01:00 <witek> hello everyone
13:05:11 <witek> anyone around?
13:08:58 <chaconpiza> Hi, sorry for coming late.
13:09:12 <witek> hi Martin
13:09:19 <chaconpiza> Dobek took a day-off
13:09:46 <witek> let's start then
13:09:49 <chaconpiza> sure
13:09:52 <witek> agenda:
13:09:56 <witek> https://etherpad.openstack.org/p/monasca-team-meeting-agenda
13:10:20 <witek> #topic Prometheus plugin update
13:10:28 <witek> https://github.com/stackhpc/stackhpc-monasca-agent-plugins
13:11:21 <witek> dougsz has published his extension for Prometheus plugin
13:11:37 <witek> I went through the readme today
13:11:45 <dougsz> hey, sorry i'm late
13:11:51 <witek> hi Doug
13:11:57 <witek> nice work
13:12:13 <dougsz> ah, thanks, just a test really at the moment, but it seems feasible
13:12:59 <dougsz> the main motivation was to use the Ceph Prometheus endpoint to replace the Monasca Agent Ceph collector
13:13:14 <dougsz> but should hopefully work for everything
13:13:41 <dougsz> I was wondering about the best approach to merge it into Monasca Agent
13:13:42 <witek> right, Ceph has added Prometheus instrumentation
13:14:02 <dougsz> The current Prometheus Monasca Agent plugin is heavily geared to k8s
13:14:11 <dougsz> which is probably not what a lot of people want
13:14:44 <witek> but it also allows static configuration, right?
13:14:55 <dougsz> yeah, in a very basic sense
13:15:09 <dougsz> I almost wonder if it should be the prometheus-k8s plugin
13:15:33 <dougsz> and then have a vanilla prometheus plugin like the prototype
13:15:42 <dougsz> Alternative is to have all functionality in one plugin
13:16:00 <witek> are there that many conflicting points?
13:16:52 <dougsz> There is some whitelisting capability but only for the k8s thing
13:16:59 <dougsz> which would need unifying
13:17:45 <witek> I see
13:17:46 <dougsz> `monasca.io/whitelist`
13:17:51 <dougsz> slightly weird naming
13:19:06 <dougsz> and the metric types thing is for k8s
13:19:16 <dougsz> but could be useful for general endpoints
13:19:19 <witek> I thought, `auto_detect_endpoints` could control if we use static endpoints configuration or K8s auto detection
13:19:52 <dougsz> yeah - I mean it works
13:20:38 <dougsz> but it seems like a chunk of the k8s specific config needs to be pulled down and made available to the non-k8s bit
13:20:59 <dougsz> and you have to worry about backwards compatible option naming etc.
13:21:24 <dougsz> hence my thoughts about making a new plugin
13:21:47 <witek> so you'd prefer to rename the existing one to prometheus-k8s, and publish the prototype-based one as prometheus?
13:22:13 <dougsz> something like that
13:22:50 <dougsz> probably best not to rename the existing one though for backwards compatibility
13:23:05 <witek> fine for me, if that's much easier to implement
13:23:37 <dougsz> thanks - it's much easier for me too, given limited time
13:25:34 <witek> the plugin name is not that important after all, we could document both options in the same section and point to two different configuration files
13:26:07 <dougsz> that's true - I can push something to that effect
13:26:18 <witek> chaconpiza: any thoughts?
13:27:12 <chaconpiza> just to get the context... the goal of this new plugin is to handle the new version of Ceph?
13:28:29 <witek> new Ceph versions offer Prometheus instrumentation, so yes, we can monitor Ceph with this new Prometheus plugin
13:28:53 <witek> but also any other Prometheus endpoints
13:29:15 <dougsz> chaconpiza: Ceph was the motivation (because the existing plugin parses the Ceph CLI which changes from release to release)
13:29:36 <chaconpiza> yes, I remember.
13:29:46 <dougsz> but yeah, as witek says, to make it easier to use Prometheus endpoints in general without running another Monasca service to compute rates etc.
13:30:04 <chaconpiza> then it sounds good as a long-term solution
13:31:21 <witek> I'm a little worried about performance with larger setups, where we'd like to scrape several endpoints and define multiple aggregations
13:31:59 <witek> in the long term aggregation on Monasca server would be better
13:32:36 <witek> but that's more work, so I'm happy with this plugin
13:33:08 <dougsz> One advantage of this is that the whitelist can greatly reduce the amount of data going into Monasca
13:33:31 <witek> right, that's very useful
13:33:37 <dougsz> Many Prometheus endpoints produce vast amounts of data at scale (eg. Ceph, 1M scrapes for 10 node cluster)
13:34:33 <dougsz> But I agree, people may want to the agent to be lightweight and shift the compute burden to the central Monasca deployment
13:34:46 <dougsz> *want the
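[Editor's sketch] The whitelist and agent-side rate computation discussed above can be illustrated roughly as follows. This is a minimal sketch, not the code of the stackhpc plugin: the function names, the simplified parser, and the sample metric names are assumptions made for illustration only.

```python
import re

# Matches sample lines such as: ceph_osd_op_r{ceph_daemon="osd.0"} 1234
# (simplified; a real parser would also handle escaping and timestamps)
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{.*\})?\s+(\S+)')


def parse_samples(text):
    """Parse Prometheus text exposition into {(name, labels): value}."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match:
            name, labels, value = match.groups()
            samples[(name, labels or '')] = float(value)
    return samples


def whitelist(samples, allowed_names):
    """Keep only the metrics whose name is explicitly allowed."""
    return {k: v for k, v in samples.items() if k[0] in allowed_names}


def rates(previous, current, interval_s):
    """Turn counter values from two scrapes into per-second rates."""
    out = {}
    for key, value in current.items():
        # Skip new series and counter resets (value went backwards).
        if key in previous and value >= previous[key]:
            out[key] = (value - previous[key]) / interval_s
    return out


# Hypothetical scrapes taken 30 seconds apart.
scrape_1 = 'ceph_osd_op_r{ceph_daemon="osd.0"} 100\nceph_health_status 0\n'
scrape_2 = 'ceph_osd_op_r{ceph_daemon="osd.0"} 160\nceph_health_status 0\n'
allowed = {"ceph_osd_op_r"}
prev = whitelist(parse_samples(scrape_1), allowed)
curr = whitelist(parse_samples(scrape_2), allowed)
print(rates(prev, curr, 30.0))
```

Filtering before computing rates means only the whitelisted series are kept between scrapes, which is where the reduction in data sent to Monasca would come from.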
13:34:52 <witek> do you remember this component?
13:34:56 <witek> https://github.com/monasca/monasca-aggregator
13:35:07 <dougsz> yeah
13:35:31 <witek> we could implement it in a scalable way with Faust
13:36:03 <dougsz> That's a good idea
13:36:45 <witek> so many nice things we can do :)
13:37:12 <witek> I was reading through the doc of your Prom plugin
13:37:29 <witek> the `counter` section is somewhat misleading
13:38:06 <dougsz> any feedback much appreciated :)
13:38:36 <witek> will you be proposing it upstream in the near future?
13:39:00 <dougsz> yeah, I will do that, probably the best place to discuss it
13:39:17 <witek> very nice, thanks a lot!
13:39:44 <witek> can we move on?
13:39:56 <dougsz> please do, thanks
13:40:02 <witek> #topic review
13:40:18 <witek> we've made some progress on reviews this week
13:40:41 <witek> the merging of DevStack plugin landed
13:41:10 <witek> also, the change updating ELK has been updated
13:41:48 <witek> here is our board:
13:41:51 <witek> https://storyboard.openstack.org/#!/board/190
13:42:14 <witek> I started looking at periodic notifications
13:42:25 <witek> https://storyboard.openstack.org/#!/story/2006837
13:42:38 <witek> the changes have been up for much too long already
13:43:11 <witek> another one needing attention is IPv6 support:
13:43:17 <witek> https://review.opendev.org/673274
13:43:46 <witek> Adrian has submitted a new change deleting the old plugin from monasca-log-api:
13:43:52 <witek> https://review.opendev.org/690527
13:44:23 <witek> do you have some more reviews you'd like to mention?
13:45:07 <dougsz> None from me
13:45:10 <witek> #topic new bugs
13:45:20 <witek> we have one new bug report this week
13:45:25 <witek> https://storyboard.openstack.org/#!/story/2006984
13:45:37 <witek> it's about upgrading the DB schema
13:46:04 <dougsz> yes, that's mine, I will push a fix soon
13:46:23 <witek> nice, thanks dougsz
13:46:42 <dougsz> I should have spotted that really - I think the alembic step just needs to query existing plugins which are configured and skip deleting the associated types
13:47:23 <dougsz> Worth knowing about if anyone is upgrading anytime soon
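[Editor's sketch] The fix described above (query which notification types are still configured, then skip deleting those) might look roughly like this. It is a sketch under stated assumptions, not the actual Monasca migration: it uses an in-memory SQLite database instead of alembic, and the table names, column names, and type names are illustrative assumptions.

```python
import sqlite3


def delete_unused_types(conn, candidates):
    """Delete candidate notification types unless a method still uses them."""
    # Assumed schema: notification_method.type refers to
    # notification_method_type.name.
    in_use = {row[0] for row in conn.execute(
        "SELECT DISTINCT type FROM notification_method")}
    removable = [t for t in candidates if t not in in_use]
    conn.executemany(
        "DELETE FROM notification_method_type WHERE name = ?",
        [(t,) for t in removable])
    return removable


# Hypothetical pre-upgrade state: three types, one of them still in use.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notification_method_type (name TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE notification_method (id TEXT, type TEXT)")
conn.executemany("INSERT INTO notification_method_type VALUES (?)",
                 [("EMAIL",), ("WEBHOOK",), ("PAGERDUTY",)])
conn.execute("INSERT INTO notification_method VALUES ('abc', 'WEBHOOK')")

removed = delete_unused_types(conn, ["EMAIL", "WEBHOOK", "PAGERDUTY"])
print(removed)
```

In this example the WEBHOOK type survives because one configured notification method still refers to it.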
13:48:01 <witek> that's from Queens to any other version?
13:48:19 <dougsz> yeah, we did Queens -> Rocky -> Stein and hit it then
13:48:56 <witek> do you know at which step?
13:49:13 <witek> I mean -> R or -> S
13:49:37 <dougsz> I *think* it is in Rocky where we got rid of built in notification types
13:50:34 <dougsz> No stein actually
13:51:25 <witek> ok, thanks, we're upgrading to Rocky but I think we didn't hit it
13:52:06 <witek> #topic AOB
13:52:16 <chaconpiza> In a pre-production environment we suddenly saw a big increase in memory consumption from InfluxDB 1.3.4
13:52:31 <chaconpiza> We noticed that the monasca-metric-agent was configured with the wrong Nova URL for the "http_status" metric.
13:53:05 <chaconpiza> So it was producing the metrics with a long string in the "value_meta", in addition to "1" as the metric "value".
13:53:21 <chaconpiza> we are wondering whether InfluxDB has trouble processing a large number of points with this long "value_meta"
13:53:34 <dougsz> chaconpiza: Issue with the detection plugin not configuring the URL correctly?
13:54:09 <chaconpiza> On our dev machines the detection plugin works well and ends up with the correct URLs for keystone, nova, cinder, etc.
13:54:30 <dougsz> value_meta limit is ~2kb right? the main issue I have seen is with two many unique dimensions
13:54:31 <chaconpiza> I am not sure how the pre-production team got it
13:54:41 <dougsz> *s/two/too
13:54:58 <chaconpiza> because of the cardinality?
13:55:01 <dougsz> yeah
13:55:31 <chaconpiza> We are simulating the metric-agent with: https://github.com/monasca/monasca-perf/blob/master/scale_perf
13:55:38 <chaconpiza> agent_simulator.py but setting a long string in the "value_meta".
13:55:52 <chaconpiza> in order to reproduce the issue
13:58:12 <chaconpiza> we will keep you informed in case we can break influxdb because of big "value_meta"
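[Editor's sketch] The two quantities discussed above, the size of "value_meta" and the number of unique dimension sets (cardinality), can be measured separately on simulated metrics. This is a minimal sketch, not agent_simulator.py: the helper names are hypothetical, the metric layout follows the fields named in the discussion, and the 2048 limit is an assumption taken from the "~2kb" remark.

```python
import json
import time

# Assumption: limit taken from the "~2kb" mentioned in the discussion.
VALUE_META_LIMIT = 2048


def build_metric(name, value, dimensions, value_meta):
    """Build one metric in the layout discussed above."""
    return {
        "name": name,
        "dimensions": dimensions,
        "timestamp": int(time.time() * 1000),  # milliseconds
        "value": value,
        "value_meta": value_meta,
    }


def value_meta_size(metric):
    """Size of the serialized value_meta in characters."""
    return len(json.dumps(metric["value_meta"]))


def series_cardinality(metrics):
    """Count unique (name, dimensions) combinations."""
    return len({
        (m["name"], tuple(sorted(m["dimensions"].items())))
        for m in metrics
    })


# Ten points for two hosts, each carrying a long error string.
metrics = [
    build_metric(
        "http_status", 1,
        {"service": "compute", "hostname": "host-%d" % (i % 2)},
        {"error": "x" * 1500},
    )
    for i in range(10)
]
print(series_cardinality(metrics), value_meta_size(metrics[0]))
```

In this sketch the long "value_meta" enlarges every point but leaves the number of series unchanged, so the two effects can be varied independently when trying to reproduce the memory growth.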
13:58:44 <dougsz> thanks, it's probably worth investigating InfluxDB 1.7 as 1.3 has a security issue where dimension values are leaked across tenants
13:59:11 <witek> I think it would be better to improve the auto-detection script to configure the agent as needed
14:00:04 <chaconpiza> so far all the metric-agent configurations have been fixed manually and the agents restarted
14:00:56 <chaconpiza> in pre-prod a single tenant is being used
14:01:13 <witek> OK, please keep us updated
14:01:22 <witek> the time is over
14:01:30 <witek> thanks for the discussions
14:01:43 <dougsz> thanks all, bye
14:01:48 <chaconpiza> thanks
14:01:52 <witek> thanks, bye
14:01:55 <witek> #endmeeting