15:00:49 <srwilkers> #startmeeting openstack-helm
15:00:50 <openstack> Meeting started Tue Nov 14 15:00:49 2017 UTC and is due to finish in 60 minutes.  The chair is srwilkers. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:51 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:53 <openstack> The meeting name has been set to 'openstack_helm'
15:01:29 <srwilkers> #topic rollcall
15:01:33 <korzen_> hello
15:01:37 <srwilkers> o/
15:01:39 <jayahn> o/
15:02:01 <mateuszb> o/
15:02:23 <srwilkers> here's the agenda:  https://etherpad.openstack.org/p/openstack-helm-meeting-2017-11-14
15:02:33 <srwilkers> we'll give it a few minutes to see if anyone else comes along or wants to add to it
15:03:32 <v1k0d3n> hey all! o/\o
15:03:41 <srwilkers> hey v1k0d3n \o/
15:03:43 <v1k0d3n> o/*\o
15:04:56 <srwilkers> alright, seems we've got a good list to start with
15:05:11 <srwilkers> #topic: graph drawing in documentation
15:05:14 <srwilkers> all you jayahn
15:05:35 <jayahn> I just wanted to have a graph drawing capability in doc. :)
15:05:51 <jayahn> i did my best to copy the example configuration.
15:06:04 <srwilkers> i agree -- pictures are a great way to share things
15:06:46 <jayahn> not sure if i did all the necessary stuff to set up "sphinxcontrib-blockdiag". if anyone can give more feedback, please do. :)
15:06:50 <srwilkers> admittedly i dont know enough about getting it enabled to tell if that's the right way or not
15:07:04 <srwilkers> lamt is really good at that sort of stuff
15:07:19 <lamt> o/
15:07:28 <jayahn> ah. lamt is here. :)
15:07:42 <jayahn> https://review.openstack.org/#/c/519653/
15:08:43 <jayahn> pls leave any feedback on ps. :)
15:08:46 <srwilkers> ill poke lamt and make sure he gives some feedback there
15:08:52 <srwilkers> anything else on this jayahn?
15:08:57 <jayahn> nope
15:09:05 <srwilkers> sweet
15:09:09 <srwilkers> #topic fluent-logging
15:09:17 <srwilkers> i like this one -- take it away jayahn
15:09:40 <jayahn> we are almost done with putting in fluent-based logging.
15:10:00 <jayahn> this will be the first one; it will be followed by next steps.
15:10:16 <jayahn> however, is the openstack-helm-infra gate working properly?
15:10:45 <srwilkers> yeah, it's working -- i can work with you and provide feedback to do what we need to run it properly
15:11:02 <jayahn> okay. that would be great.
15:11:17 <srwilkers> once things are tidied up, ill make sure everythings documented appropriately so adding new services is easy
15:11:29 <jayahn> can you do it through review? or would it be better to set up a separate time?
15:11:50 <srwilkers> id be happy to do it through review
15:11:52 <srwilkers> if that's okay
15:11:55 <jayahn> okay. great!
15:12:15 <srwilkers> also, wanted to get everyone's thoughts, and yours, on something with the fluent-logging stuff
15:12:42 <jayahn> fyi, we have checked that the new version of fluent-bit supports the kubernetes plugin and some experimental kafka output. that will be our very next thing to do after this is merged.
15:13:10 <jayahn> okay. srwilkers shoot it. :)
15:13:40 <srwilkers> would it be worth considering handling the parsers and fluentbit configurations via values.yaml?  reason i ask is because while it works out of the box, its very opinionated in that it's expecting you'll only ever use the json logging driver for docker
15:14:02 <srwilkers> ie, it only gets log events via tailing /var/log/whatever and /var/lib/docker/containers/whatever
15:14:44 <srwilkers> it's not something i want us to spin the wheels on with the current patchset, because some of your work and mateuszb's work really depends on the fluent-logging stuff being finished
15:14:53 <srwilkers> but might be worth considering as an enhancement down the road
15:15:25 <jayahn> okay. we will surely include your idea in our enhancement tasks.
15:15:33 <jayahn> i will talk to sungil about this.
15:15:47 <srwilkers> awesome :) i can draw up some pictures and throw some roughed up ideas your way too if that helps
15:16:04 <seungkyua> right all docker container logs are json type. We need to change that.
15:16:30 <seungkyua> and kubernetes logs in /var/log/xxx
15:16:44 <srwilkers> but really jayahn -- it's great work. :)
15:16:48 <jayahn> as you said, the current ps will be just the first of all the waves coming after. let's make this base thing work, then continuously enhance it
15:16:54 <srwilkers> i agree
15:17:16 <srwilkers> thats all i had
15:17:36 <jayahn> seungkyua is our senior developer. he agrees with you, srwilkers.
15:17:39 <jayahn> :)
15:17:50 <srwilkers> nice -- pleasure to meet you seungkyua o/
15:18:07 <seungkyua> nice to meet you.
15:18:22 <jayahn> if he says yes.. it means yes for us :)
15:18:23 <srwilkers> #topic default alert list spec
15:18:26 <srwilkers> awesome :)
15:18:31 <seungkyua> this is my first time in online chat. :)
15:18:36 <jayahn> ah. this is very very early draft.
15:19:00 <jayahn> just want to get everyone's opinion on "what is the best way to write it".
15:19:41 <mateuszb> I'll take a look at it tomorrow jayahn :)
15:20:02 <jayahn> I think we should first define the "alert/alarm definition", things like what we would like to alert on: cpu idle, cpu percent, etc.
15:20:16 <jayahn> but not defining actual trigger threshold in this spec.
15:20:17 <srwilkers> yeah, i was going to say your input would be awesome mateuszb since this touches what you've been working against
15:20:29 <v1k0d3n> jayahn: wrt the fluent work, i can take a look at it too...we've been doing a lot of this recently for some internal demos. would be nice to get this into upstream.
15:20:32 <srwilkers> jayahn: yeah i agree
15:20:46 <srwilkers> v1k0d3n: thatd be awesome :)
15:20:50 <jayahn> v1k0d3n, awesome :)
15:21:24 <jayahn> thanks mateuszb. i will add more alert definitions this week. your feedback would be really helpful
15:22:55 <srwilkers> anything else on this one?
15:22:59 <jayahn> i think korzen_ only has the first half of the meeting time. let's turn it over to him now. :)
15:23:09 <srwilkers> sounds good
15:23:17 <korzen_> yes
15:23:26 <srwilkers> #topic multi namespace support for entrypoint
15:23:44 <korzen_> so I would like to highlight that multiple namespace support is done in the PS
15:23:52 <srwilkers> i just workflowed it :)
15:23:55 <korzen_> #link https://review.openstack.org/#/c/510810 Support services in different namespaces
15:24:09 <korzen_> #link https://review.openstack.org/#/c/511515/  Add jobs and daemonsets namespace support
15:24:27 <korzen_> once they are merged, the full solution is enabled
15:24:57 <korzen_> so we can add cross namespace dependencies for services via endpoints, and for jobs and daemonsets in the dependencies section
15:25:19 <korzen_> I am testing it in a use-case where every service has its own infra
15:25:31 <v1k0d3n> nice work over there korzen_ ! :) great to see this added.
15:25:36 <srwilkers> nice korzen_ :)
15:25:49 <korzen_> like the keystone namespace would have its own mariadb, rabbitmq, ingress, etc.
15:25:56 <korzen_> ceph would be common
15:26:08 <v1k0d3n> yeah that's awesome. always been the goal...
15:26:08 <korzen_> thx ;)
15:26:20 <v1k0d3n> you guys made it reality. :)
15:26:55 <korzen_> glad to see that it is appreciated ;)
15:27:39 <korzen_> I guess that is all for multiple namespace
15:27:45 <korzen_> for RBAC
15:27:45 <srwilkers> #topic RBAC support
15:28:01 <korzen_> #link https://review.openstack.org/#/c/464630  RBAC authorization support
15:28:22 <korzen_> this one is huge, but it contains all the RBAC rules that are needed to run
15:28:39 <korzen_> I wanted to get portdirect review on that one
15:29:02 <korzen_> all necessary details are included in agenda
15:29:49 <korzen_> I will also test it for the multiple namespace use-case in the following days
15:30:20 <korzen_> but the example with ceph and ceph-config made this PS ready for multiple namespaces
15:30:25 <jayahn> korzen_ we will try to review this RBAC one as well
15:30:36 <srwilkers> ill get portdirect to look at it today and provide his feedback
15:31:10 <korzen_> ok, I need to run
15:31:15 <korzen_> it is all from my side
15:31:16 <srwilkers> later korzen_ :)
15:31:26 <srwilkers> #topic log based alerting approaches
15:31:27 <jayahn> thanks korzen_
15:31:27 <korzen_> bye
15:31:29 <srwilkers> take it away mateuszb
15:31:41 <mateuszb> I've got a couple of patchsets in review regarding log-based alarms
15:31:55 <mateuszb> I've grouped them into 2 categories depending on the approach:
15:32:02 <mateuszb> 1. Based on ElastAlert:
15:32:07 <mateuszb> ElastAlert chart: https://review.openstack.org/#/c/516629/
15:32:11 <mateuszb> Nagios: passive check for DB errors: https://review.openstack.org/#/c/518543/
15:32:14 <mateuszb> Pushing notifications from ElastAlert to Nagios: https://review.openstack.org/#/c/518711/
15:32:34 <mateuszb> and 2. Based on fluent-plugin-prometheus:
15:32:43 <mateuszb> Gathering DB errors count using fluent-plugin-prometheus: https://review.openstack.org/#/c/514938/
15:32:51 <mateuszb> Example log-based alert in Prometheus: https://review.openstack.org/#/c/515061/
15:32:54 <mateuszb> Nagios: Prometheus check for DB errors in logs: https://review.openstack.org/#/c/519318/
15:33:28 <mateuszb> So I've verified that both of the solutions are ready to be integrated with Nagios (I wasn't so sure about ElastAlert+Nagios, but it works well)
15:33:29 <jayahn> these are really beautifully categorized examples. :)
15:34:00 <mateuszb> I'm leaving it as it is until the decision is made which of the two approaches we choose (I'd vote for ElastAlert as it's precisely designed for log-based alerting - with a lot of configuration capabilities in place)
15:34:08 <mateuszb> So any comments and votes are welcome :)
15:36:14 <mateuszb> that's all from me
15:37:02 <jayahn> ElastAlert seems like a great tool to use. +1 on that.
15:37:02 <jayahn> however, since we will probably use prometheus alert manager for metric-based alerts, it would be good to use a single solution for all alerts. so +1 on fluent-plugin-prometheus. :)
15:37:26 <jayahn> i will do some discussion with my team members, and will leave our feedback.
15:37:34 <mateuszb> You're not helping ;)
15:37:40 <mateuszb> Ok that would be great
15:37:42 <jayahn> yeap. i know. :(
15:37:45 <srwilkers> yeah, im a bit torn on this.  i feel like ive introduced some confusion with nagios, as it was meant to be pitched as a deadmans switch for things like backing prometheus with ceph
15:38:12 <srwilkers> but elastalert is able to fire off alerts independently right?  it doesnt need nagios or alertmanager?
15:38:33 <mateuszb> No, it doesn't need nagios and alertmanager
15:38:58 <srwilkers> okay, that makes me feel better.
15:39:22 <mateuszb> It fires off alerts independently - but there is a possibility to execute a script, which in turn executes the passive check to Nagios
15:39:35 <srwilkers> cool :)
15:40:10 <jayahn> I would like to compare the alert templates in both solutions, i mean how flexible each one is for setting alert patterns.
15:40:30 <srwilkers> yeah, that's something to consider for sure
15:40:39 <srwilkers> but great work all around on this stuff mateuszb
15:41:07 <mateuszb> Well, I may prepare a list of what's needed to add an additional alert in both cases
15:41:57 <mateuszb> to make things faster, I'll write it on slack tomorrow
15:42:01 <srwilkers> sounds good :)
15:42:08 <mateuszb> in order not to wait until the next meeting :)
15:42:18 <jayahn> great!
15:43:53 <srwilkers> anything else?
15:44:07 <mateuszb> no, that's all. Thanks
15:44:15 <srwilkers> #topic reviews needed
15:45:07 <srwilkers> Cell service:  https://review.openstack.org/#/c/516810/
15:45:10 <jayahn> cell service and nova placement are two essential things for ocata. these are almost ready. pls do a final review on these. :)
15:45:27 <srwilkers> :)
15:46:06 <jayahn> FYI, per portdirect's request, we will upstream a separate "value override to make ocata work", probably as new mvp values.
15:46:51 <jayahn> Neutron: Correct section name for linuxbridge bridge_mappings config: https://review.openstack.org/#/c/518503/
15:46:52 <srwilkers> jayahn: that'd be awesome.
15:47:13 <jayahn> we have been testing vlan-based provider-network w/ linuxbridge
15:47:21 <jayahn> to support some legacy openstack envs.
15:47:36 <jayahn> this is one of a few things we are fixing while doing that.
15:47:56 <jayahn> i think it is rather straight-forward. pls review. :)
15:48:04 <srwilkers> just workflowed it
15:48:15 <jayahn> thanks
15:49:03 <jayahn> that is all
15:49:40 <srwilkers> awesome :)
15:49:58 <srwilkers> any other last minute items?
15:50:22 <srwilkers> otherwise we can take the open discussion to the openstack-helm channel -- im getting rushed out of my conference room :)
15:50:37 <jayahn> bye
15:50:49 <srwilkers> :)
15:50:52 <mateuszb> bye
15:50:53 <srwilkers> thanks for coming everyone
15:50:57 <srwilkers> #endmeeting