15:01:33 <witek> #startmeeting monasca
15:01:33 <openstack> Meeting started Wed Feb 20 15:01:33 2019 UTC and is due to finish in 60 minutes.  The chair is witek. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:01:34 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:01:36 <openstack> The meeting name has been set to 'monasca'
15:01:45 <witek> hello everyone
15:01:48 <koji_n> hello
15:01:54 <witek> hi koji_n
15:02:37 <Dobroslaw> Hi
15:02:38 <mohankumar> hi everyone
15:02:50 <witek> hi Dobroslaw and mohankumar
15:03:12 <witek> Courtesy Monasca meeting reminder in #openstack-monasca: witek, jayahn, iurygregory, ezpz, igorn, haad, sc, joadavis, akiraY, tobiajo, dougsz_, fouadben, amofakhar, aagate, haruki, kaiokmo, pandiyan, charana, guilhermesp, chaconpiza, toabctl
15:03:24 <dougsz> hi all
15:03:26 <sc> here I am
15:03:34 <witek> agenda:
15:03:38 <witek> https://etherpad.openstack.org/p/monasca-team-meeting-agenda
15:03:46 <witek> #topic reviews
15:03:58 <witek> I have just one new one:
15:04:03 <witek> https://review.openstack.org/636337
15:04:50 <joadavis> more tests are appreciated. will have to read that one
15:05:47 <witek> I have also updated the alembic DevStack change to install the missing jira dependency for monasca-notification
15:05:53 <witek> https://review.openstack.org/622361
15:06:40 <witek> any other reviews to draw attention to?
15:07:23 <dougsz> There was this one: https://review.openstack.org/#/c/637190/
15:08:01 <witek> oh yes, thanks dougsz
15:08:23 <Dobroslaw> dougsz: thx for comments
15:08:35 <Dobroslaw> michal is still thinking about it
15:09:19 <Dobroslaw> he said that the API gladly accepts metrics with old timestamps, but the persister is not so happy about them
15:09:29 <Dobroslaw> he will look into it more
15:09:36 <dougsz> Sounds good, great to have another contributor.
15:09:52 <dougsz> thanks Dobroslaw
15:10:06 <Dobroslaw> yep, he is learning quite fast
15:10:33 <witek> the persister will drop messages based on the retention period
15:12:05 <Dobroslaw> shouldn't they be dropped already in the API?
15:13:26 <mohankumar> witek: quick question, does the persister have a retention policy? Or are you referring to the InfluxDB retention policies?
15:13:37 <witek> we don't control retention policy in the API
15:13:54 <witek> mohankumar: InfluxDB retention period
15:14:11 <mohankumar> witek : okay
15:14:26 <Dobroslaw> ok
15:14:52 <witek> Dobroslaw: we could do that, but that's additional logic which would have to run on every message
15:15:12 <Dobroslaw> hmmm, yea
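For reference, the per-message filtering being discussed could look like the minimal Python sketch below; the retention setting and envelope layout are assumptions, though Monasca metric timestamps are in milliseconds:

    import time

    RETENTION_HOURS = 24 * 7  # assumed to mirror the InfluxDB retention period

    def is_within_retention(metric_envelope):
        """Return False for metrics older than the retention period.

        Hypothetical helper; per the discussion above, the drop currently
        happens at the persister/InfluxDB level, not in the API.
        """
        timestamp_ms = metric_envelope['metric']['timestamp']
        age_seconds = time.time() - timestamp_ms / 1000.0
        return age_seconds <= RETENTION_HOURS * 3600

    # Dropping in the API would add a call like this for every message:
    # batch = [m for m in batch if is_within_retention(m)]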
15:16:40 <witek> let's discuss it further in review
15:16:45 <Dobroslaw> ok
15:17:01 <witek> can we move on to the next topic?
15:17:28 <witek> #topic Reliance Jio deployment
15:17:41 <mayankkapoor> Hi all
15:17:45 <witek> hi mayankkapoor
15:17:45 <mayankkapoor> Sorry it's been a while
15:17:49 <mayankkapoor> Wanted to give a quick update
15:18:03 <mayankkapoor> I've mentioned the status of the deployment in the meeting agenda
15:18:32 <witek> thanks, great to hear that
15:18:32 <mayankkapoor> Deployed across 352 bare-metals at the moment (single openstack cluster), working fine, a few issues we're working through as they come up
15:19:16 <mayankkapoor> Any specific items I should talk about?
15:19:49 <joadavis> Are there any monasca metrics you are finding particularly useful?
15:20:01 <witek> you've deployed with Docker Swarm, would it be possible to share configuration?
15:20:47 <mayankkapoor> @joadavis: We've started with CPU and RAM mainly. We have built a custom console for our users, and we're showing these in that UI.
15:20:52 <openstackgerrit> weizj proposed openstack/python-monascaclient master: Update hacking version  https://review.openstack.org/627725
15:21:50 <mayankkapoor> @witek: Sure, no problem. How do you recommend we share the config? github.com?
15:22:14 <witek> yes, github would be great
15:22:28 <mayankkapoor> Ok let me upload to github.com and send the link here
15:22:37 <witek> great, thanks
15:22:38 <Dobroslaw> mayankkapoor: any monasca components working slower or more unexpectedly than others?
15:22:56 <witek> have you managed to improve persister performance?
15:23:51 <mayankkapoor> @Dobroslaw: We had some issues with timeouts on InfluxDB, but that was mainly due to bombarding InfluxDB with large batch sizes (50k) and lots of persisters (5).
15:24:25 <mayankkapoor> Now we're using 5k batch size and 10 persisters. mohankumar works with me and can confirm latest config
15:25:02 <mayankkapoor> @witek: Things are working fine with persisters and InfluxDB now with 5k batch and 10 persisters
15:25:16 <mayankkapoor> Still early days though
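For context, a minimal sketch of the batched InfluxDB write the persisters perform, using the influxdb Python client; the 5000 figure mirrors the batch size above, while the host and database name are illustrative:

    from influxdb import InfluxDBClient

    # illustrative connection details
    client = InfluxDBClient(host='influxdb', port=8086, database='mon')

    def flush(points):
        # batch_size makes the client split the point list into multiple
        # HTTP requests; a single 50k-point request was timing out here
        client.write_points(points, batch_size=5000, time_precision='ms')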
15:25:39 <dougsz> mayankkapoor: How many metrics a second do you ingest?
15:25:54 <mayankkapoor> We've also built our own monitoring scripts for errors/warnings we got, which we'll share in github.com repo
15:26:42 <mohankumar> witek: hi, I'm concerned about DB write speed. We use the API POST to write into the DB; a clustered DB would help scale the DB and improve performance
15:27:01 <mayankkapoor> @dougsz: I think I'll need some help calculating the exact number. 352 bare-metals with monasca-agent sending libvirt plugin data every 30 seconds, roughly 5 VMs on each bare-metal.
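A rough back-of-the-envelope for that, as Python; only the per-VM metric count is an assumption, since it depends on which libvirt plugin checks are enabled:

    hosts = 352
    vms_per_host = 5
    interval_s = 30
    metrics_per_vm = 20  # assumed; the libvirt plugin emits tens of measurements per VM

    rate = hosts * vms_per_host * metrics_per_vm / interval_s
    print(round(rate), 'metrics/s')  # ~1173 metrics/s under these assumptions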
15:27:21 <joadavis> If I had more expertise, I'd love to share the Cassandra setup with the docker based installs. :/
15:28:54 <joadavis> we use Cassandra in a nicely clustered setup, but we still install using our own methods
15:29:17 <witek> mohankumar: InfluxDB can ingest >1,000,000 measurements/s, the bottleneck is the persister
15:29:31 <witek> in particular the old Kafka consumer
15:29:55 <witek> I hope I can provide the new implementation soon
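The new implementation mentioned here isn't shown in the log; purely as an illustration, a batched consume loop with confluent-kafka is one way to cut the per-message overhead of the legacy consumer (topic, group id and batch size are made up):

    from confluent_kafka import Consumer

    consumer = Consumer({
        'bootstrap.servers': 'kafka:9092',
        'group.id': '1_metrics',       # illustrative group id
        'enable.auto.commit': False,
    })
    consumer.subscribe(['metrics'])

    while True:
        # consume() fetches up to a full batch in one call instead of
        # looping message by message as the legacy consumer does
        messages = consumer.consume(num_messages=5000, timeout=1.0)
        if not messages:
            continue
        payloads = [m.value() for m in messages if m.error() is None]
        # ... convert payloads to InfluxDB points and flush ...
        consumer.commit(asynchronous=False)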
15:30:00 <mohankumar> witek: I'm getting an InfluxDB timeout error if I increase the batch size
15:30:10 <mayankkapoor> @dougsz: We can share some data from our Kafka-admin page to give you some idea about TPS
15:31:19 <dougsz> thanks mayankkapoor, i'm always interested to hear about performance at scale
15:31:46 <mayankkapoor> Main thing we did for this setup was use GlusterFS for HA across three docker worker VMs
15:31:53 <mayankkapoor> This was a huge risk
15:32:12 <mohankumar> witek: just to add, the persister gives an InfluxDB timeout error if I increase the batch size
15:32:17 <mayankkapoor> However, we reasoned that we're not running active-active containers, so it might be OK
15:33:29 <mayankkapoor> So when a stateful container dies, it respawns on another node and it has access to the same data it had previously
15:34:31 <mayankkapoor> We tested each component, MySQL, InfluxDB and the monasca containers, individually for HA with GlusterFS. Then we proceeded with the prod deployment.
15:34:39 <witek> so you add HA to InfluxDB that way
15:34:43 <mayankkapoor> Yup
15:34:50 <witek> do you know what the performance impact is?
15:35:18 <joadavis> cool
15:35:27 <mayankkapoor> Hmm no we don't yet, haven't gotten around to testing a setup without GlusterFS and comparing
15:36:00 <dougsz> mayankkapoor: Are you using rdma transport for Gluster share?
15:36:43 <mayankkapoor> @dougsz: Hmm need to check. We're using a GlusterFS replicated volume with 3 replicas, and we haven't changed any of the defaults.
15:37:38 <dougsz> Cool, there is also nufa which is quite neat - if your gluster storage is local to the docker workers it can write directly to a local drive
15:38:00 <dougsz> This might be a useful reference, we use it for HPC activities: https://github.com/stackhpc/ansible-role-gluster-cluster
15:38:38 <mayankkapoor> Based on reading the GlusterFS docs, RDMA transport needs to be enabled. So no, we haven't enabled RDMA yet.
15:39:12 <mayankkapoor> Yeah our gluster storage is local to the worker VMs
15:39:22 <mayankkapoor> Hmm wait, we're using Ceph
15:39:27 <mayankkapoor> So not local
15:39:31 <mayankkapoor> Used Ceph RBD
15:39:44 <dougsz> Ah ok
15:40:49 <mayankkapoor> So we'll share our deployment doc and config on github.com for review
15:41:49 <witek> you mentioned problems with network metrics, do you mean standard system plugin metrics?
15:43:37 <mayankkapoor> @witek: We've disabled the following in the libvirt plugin: ping_check=false vm_ping_check_enable=false vm_network_check_enable=false
15:44:19 <witek> i see, thanks
15:44:27 <mayankkapoor> If we enable vm_network_check_enable, it loads our OpenContrail controllers too much. We tested load balancing on our OpenContrail controllers, and that worked fine
15:45:07 <mayankkapoor> The current hypothesis we need to test is that monasca-agent gets some unexpected response from the SDN controllers and keeps querying them rapidly
15:45:20 <mayankkapoor> Haven't gotten around to checking this at the moment.
15:45:48 <mayankkapoor> Rather than every 30 sec, it queries faster
15:46:07 <witek> please report a bug if you can confirm that
15:46:19 <mayankkapoor> sure
15:47:37 <witek> what OpenStack distribution do you use?
15:49:12 <mayankkapoor> The Monasca deployment is for an Ocata cloud. However, we have 7 production clouds with between 100 and 500 bare-metals each, and staging environments for each. Various versions of openstack; the oldest are on Liberty, the latest on Pike.
15:49:16 <mohankumar> witek : Ocata
15:49:54 <witek> which OS?
15:50:13 <mayankkapoor> Ubuntu 16.04
15:50:42 <witek> meaning, Ubuntu agent packages would be handy
15:51:32 <mayankkapoor> Yes. However, we use the Mirantis distribution of openstack (they provide L4 support for us). So getting these bundled in the OS is a bit challenging for us.
15:53:09 <witek> thanks for the update
15:53:21 <mayankkapoor> You're welcome, and thanks for the great work on Monasca
15:53:23 <witek> it's great to hear your feedback
15:53:46 <mayankkapoor> We'll try to figure out how we can contribute further.
15:53:57 <mohankumar> question: does current monasca help me get VM disk usage if I'm using Ceph storage? I can see bare-metal (compute node) disk usage, but not from the VMs. If I enable the monasca ceph plugin as per the document https://github.com/openstack/monasca-agent/blob/master/docs/Plugins.md#ceph, I'll get only Ceph cluster metrics, not per-VM ones. Is there any way in current monasca to get disk usage from each VM with Ceph storage?
15:55:15 <mohankumar> I hope this question fits the agenda line we have :)
15:55:31 <witek> I'm afraid I cannot answer that right now
15:56:04 <witek> I'll check
15:56:52 <mohankumar> witek: sure, thanks.
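On mohankumar's question, one quick way to see which per-VM disk metrics a deployment already has is to list metric definitions through the Monasca API. A sketch with requests; the endpoint, token and the vm.io.* metric name are assumptions to verify against your agent config:

    import requests

    MONASCA_URL = 'http://monasca-api:8070/v2.0'   # assumed API endpoint
    HEADERS = {'X-Auth-Token': 'TOKEN'}            # valid keystone token

    # List metric definitions whose names suggest per-VM disk I/O; the
    # libvirt plugin should emit vm.io.* metrics regardless of the
    # backing store (worth verifying for Ceph-backed VMs).
    resp = requests.get(MONASCA_URL + '/metrics',
                        headers=HEADERS,
                        params={'name': 'vm.io.read_bytes_sec'})
    resp.raise_for_status()
    for metric in resp.json()['elements']:
        print(metric['name'], metric['dimensions'])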
15:57:25 <witek> good support for Ceph is important, so if there's anything missing, we should think about closing the gaps
15:57:56 <dougsz> We've got an update for Luminous support in the pipeline
15:58:18 <dougsz> The existing plugin can't handle the newer releases
15:58:40 <witek> thanks again for the update
15:58:43 <witek> I have to finish the meeting soon
15:58:51 <witek> #topic backlog
15:58:55 <witek> short update
15:58:57 <dougsz> yep, thanks mohankumar
15:59:06 <witek> I have added two stories to the backlog
15:59:44 <witek> in particular running the Python 3 unit tests for monasca-agent
16:00:01 <witek> we still don't run them
16:00:14 <witek> all from me
16:00:19 <witek> thanks for joining
16:00:23 <witek> see you next time
16:00:24 <Dobroslaw> one info
16:00:24 <Dobroslaw> the API docker image is now pushed from zuul on master; need to wait for tagging to see if it will be pushed with the proper tag, and then I will replace the first image on github.com/monasca/monasca-docker with this one
16:00:26 <witek> #endmeeting