15:00:37 #startmeeting openstack-helm
15:00:38 Meeting started Tue Oct 17 15:00:37 2017 UTC and is due to finish in 60 minutes. The chair is mattmceuen. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:39 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:41 The meeting name has been set to 'openstack_helm'
15:00:49 #topic rollcall
15:00:57 o/
15:01:03 o/
15:01:09 o/
15:01:17 o/
15:01:18 Here's the agenda, all -- I'll give you a couple mins to add topics: https://etherpad.openstack.org/p/openstack-helm-meeting-2017-10-17
15:03:13 o/
15:03:17 o/
15:03:55 #topic update to official project status
15:04:25 Just an update -- we submitted OSH for TC governance last week
15:04:27 https://etherpad.openstack.org/p/openstack-helm-meeting-2017-10-17
15:04:32 o/
15:04:35 yeap.
15:04:53 Review is ongoing, but it's looking very positive. Seven +1 roll-call votes so far. :)
15:05:42 That's really all I have to say about that. Hopefully good progress next week. Any other thoughts on the topic?
15:06:19 #topic log-based alerting
15:06:21 nothing other than congrats mattmceuen
15:06:30 :D
15:06:32 Thanks @portdirect :) back at you man
15:06:42 I've been investigating different methods for sending alerts based on logs
15:06:54 That was discussed during the last AT&T and Intel meeting
15:07:08 yeah, really happy to see this mattmceuen. great seeing you leading the charge.
15:07:16 Before I proceed with an implementation I'd like to know your opinion ;)
15:07:31 As for now I'm aware of 2 approaches:
15:07:36 1. Using ElastAlert
15:07:55 2.
Using the Prometheus plugin for fluentd (https://github.com/kazegusuri/fluent-plugin-prometheus)
15:08:24 The idea behind both of them is the same: search for specified patterns (for example "Can't connect to MySQL") in container logs
15:08:35 and if the pattern occurs - fire an alert
15:09:01 I'd rather be in favor of using ElastAlert as it's easier to configure and more intuitive for me
15:09:17 It periodically queries ElasticSearch to retrieve a given pattern
15:09:39 It has a highly modular approach - each rule is a separate file
15:09:59 I will look at ElastAlert. Just a question here. What is our purpose here? There can probably be many methods to do log-based alarms. Do we want to say "there is a reference implementation on log-based alerts"?
15:09:59 You can configure timeouts, thresholds and so on out of the box. It would, however, require an additional chart to be implemented
15:10:22 How does that compare to the Prometheus plugin, mateuszb?
15:10:58 Does it also support a modular approach, or is it more monolithic?
15:11:05 jayahn - that's a good question :) I'd like to know what the requirements from the AT&T side are
15:11:21 mateuszb: that's not too important here :)
15:11:42 what does the community want - hopefully we align internally with that
15:11:46 important stuff would be "what alerts we want to have". this can be a shared list. how to implement it will highly depend on what tools you use.
15:11:49 Agree.
15:12:00 I would like srwilkers as our LMA SME to weigh in as well
15:12:05 ++
15:12:06 so, IMHO, defining the basic set of alerts would be valuable.
15:12:35 i'd need to look at the differences between elastalert and the prometheus plugin, but i'd honestly prefer to handle it via prometheus if the end result is the same
15:13:10 srwilkers: how is the prometheus ps progressing?
15:13:17 I agree.
I can imagine that for the beginning we can start with alerts which inform about DB/rabbitmq connection issues in specified pods
15:13:23 it's pretty close to being out of WIP is it not?
15:13:26 portdirect: its out of WIP and ready for some initial reviews
15:13:36 w00t
15:13:49 im currently working on getting jobs set up for the controller manager and scheduler, but thats roughly another ~10m worth of work
15:14:03 once that's done, i'll be happy with where it stands
15:14:33 would it make sense to take this as a homework assignment: srwilkers and jayahn to get acquainted w/ ElastAlert, and mateuszb maybe you could do a comparative example of a few rules, implemented by both tools, that you could share?
15:14:43 if we know what alerts we want to create, then we can use that list to evaluate how "a specific alert implementation tool" can satisfy our requirements. that said, initiating an effort to define a basic set of alerts would also be valuable.
15:14:44 The result would be the same (or at least as for now I don't know of any limitations)
15:14:53 that sounds like an awesome idea mattmceuen
15:15:23 if you call this homework, im going to be a typical CS student and wait until the last minute ;)
15:15:33 and presumably there is nothing precluding both from being implemented?
15:15:34 Sorry, I meant to say "ice cream"
15:15:40 if they serve diff use cases
15:16:05 Ok, so I'll prepare 2 patchsets (one for ElastAlert and one for the fluentd-prometheus plugin)
15:16:07 portdirect: nope, there's not. like jayahn said, it'd just be a matter of determining if we want to choose a reference or not
15:16:25 we should say "pizza". a typical CS student can do anything for pizza.
15:16:29 @jayahn that is another good homework assignment :) do we have any list in progress around specific alerts needed?
15:16:37 ha!
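As a concrete illustration of the ElastAlert half of the comparison discussed above, a minimal rule file for the "Can't connect to MySQL" example might look like the following. This is a sketch only: the index pattern, field name, and thresholds are assumptions, not anything agreed in the meeting.

```yaml
# Hypothetical ElastAlert rule: fire when "Can't connect to MySQL"
# appears in container logs more than 3 times within 5 minutes.
# One rule per file, matching ElastAlert's modular layout.
name: mysql-connection-errors
type: frequency
index: logstash-*
num_events: 3
timeframe:
  minutes: 5
filter:
  - query:
      query_string:
        query: 'log: "Can''t connect to MySQL"'
alert:
  - debug
```

ElastAlert periodically runs this query against Elasticsearch and fires the configured alerter when the threshold is crossed, which is the behaviour mateuszb describes above.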
15:16:40 mateuszb: in regards to the fluentd plugin, it'd be nice to make sure it works with the fluentbit+fluentd setup that's in review currently
15:17:03 i can write up some initial docs regarding alerts needed
15:17:06 srwilkers: I'll do that on top of the fluentbit+fluentd PS
15:17:28 srwilkers: in fact, that is a requirement for the fluentd-prometheus plugin to work
15:17:45 jayahn mateuszb awesome, thank you!
15:18:04 because it forces us to use only one fluentd instance per cluster, which I think will be true when the fluentd+fluentbit PS is merged
15:18:33 jayahn that would be great
15:19:06 #action mateuszb will prep comparison of ElastAlert and fluentd-prometheus
15:19:34 #action jayahn will take a stab at initial alerting needs documentation
15:19:41 okay
15:19:44 * portdirect notes a mech eng student only works for alcohol
15:19:52 :)
15:20:05 Good discussion guys, anything else on this topic?
15:20:17 * jayahn a korean student regardless of major only works for alcohol
15:20:21 that's all from my side
15:20:23 ahahaha
15:20:43 @jayahn lol
15:20:54 #topic cw's RFC on storage class name
15:21:06 I'm not sure we have cw?
15:21:14 doesnt seem so
15:21:30 this was just to rename the general storage class right?
15:21:44 Well let's get him some opinions offline. He has an RFC out for whether we should use "general" or some other name.
15:21:48 yup
15:21:56 I have some feedback in there: https://review.openstack.org/#/c/511975/
15:22:14 thx portdirect, copy/paste was failing me ;-)
15:22:51 I'll leave it at that for now -- if you'd like to see the discussion or weigh in, please visit the review!
15:23:24 #topic Official project docs move (timing and process)
15:23:52 After we become an official project, I think we'll want to move our docs here, correct? https://docs.openstack.org/openstack-helm/latest/
15:23:58 (away from readthedocs)
15:24:31 yup
15:24:47 I think this should be very simple to do
15:24:49 Will we deprecate the readthedocs at that point, or leave it up?
Maintain it?
15:25:23 I know lamt has insight into what needs to be done, but i think a theme change and a single ps to infra
15:25:32 I vote for retiring the readthedocs
15:25:48 though it would be possible to have both
15:25:55 and they would stay in sync
15:25:56 i also vote for retiring the readthedocs once we move to the official doc
15:25:57 i'd rather retire them than maintain them
15:26:02 but seems silly to have them in two places
15:26:03 +1
15:27:02 #action mattmceuen to discuss readthedocs->docs.openstack.org with lamt
15:27:36 Alrighty: reviews time
15:27:41 #topic reviews needed
15:27:48 once we move to docs.openstack.org, i can ask official translation to take the openstack-helm project.
15:27:59 great point jay
15:28:07 i talked with srwilkers on the review.
15:28:07 awesome
15:28:38 yep :)
15:28:39 jayahn: long as you promise not to write nasty things about us in korean ;)
15:28:58 save any nasty comments for english please :-D
15:29:19 nothing to add. personally, i would like to have both prometheus and fluent working in our env. asap. :) .. let's get it done.
15:29:26 ha ha..
15:29:39 yes
15:29:47 jayahn: +1
15:29:50 you have google translation. so i will never write nasty stuff in korean.
15:30:10 Here's the review for fluent: https://review.openstack.org/#/c/507023/
15:30:21 however, we know how to avoid google translation, but still understand each other. tweaked version of korean writing. lol
15:30:29 ;)
15:30:37 in regards to that review
15:31:09 im happy with it after discussing it with jayahn. takeaway was that we can leave kafka as an opportunity for a future addition to OSH
15:31:17 +2
15:31:17 potentially something for a new contributor to try their hand at
15:31:22 yes.
15:31:23 excellent
15:31:31 that was also the one with some fun stuff in the dep check was it not?
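For contrast with the ElastAlert approach, the fluentd-prometheus half of the comparison mateuszb agreed to prepare might look roughly like the following, layered on the fluent-logging chart's fluentd config. This is a sketch: the tag, field, and metric names are illustrative assumptions.

```
# Hypothetical fluentd config using fluent-plugin-prometheus:
# grep container logs for the MySQL pattern, then expose a counter
# that Prometheus can scrape and alert on.
<filter kube.**>
  @type grep
  <regexp>
    key log
    pattern /Can't connect to MySQL/
  </regexp>
</filter>
<filter kube.**>
  @type prometheus
  <metric>
    name fluentd_mysql_connection_errors_total
    type counter
    desc Count of MySQL connection errors seen in container logs
  </metric>
</filter>
```

A Prometheus alerting rule would then fire on something like `increase(fluentd_mysql_connection_errors_total[5m]) > 3`, which is why this approach depends on the single-fluentd-per-cluster topology noted above.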
15:32:12 ah yes: https://review.openstack.org/#/c/507023/9/fluent-logging/templates/deployment-fluentd.yaml
15:32:37 we should probably make a blueprint for extending the dep checking model to account for conditionals
15:32:47 as this is starting to turn up more and more
15:32:51 esp in neutron
15:32:58 very good point
15:32:59 * portdirect why is it always neutron?
15:33:00 portdirect: i agree
15:33:05 +1
15:33:21 there are some things that are best handled via overrides, but conditionals may not be the best place for them
15:33:30 took me chewing on that a bit to feel that way
15:33:47 well, not all conditionals i'll say
15:33:51 I agree - I've had a few chats with alanmeadows on this
15:34:12 and will try and summarise what we were mulling over and throw it out there
15:34:47 * portdirect notes that he should bring up something else in any other business
15:34:52 #action portdirect to create a blueprint for extending the OSH dependency checking model to account for conditionals
15:35:06 before other business
15:35:20 any other outstanding reviews we need more eyeballs on?
15:36:00 (also thanks pete :) )
15:36:01 https://review.openstack.org/#/c/457754/
15:36:02 nope, im good
15:36:15 https://review.openstack.org/#/c/507628/
15:36:33 ^^ both of these ceph ones are right up on my list of things we want
15:36:46 i think the disk targeting has a bit to go
15:36:51 but is getting close
15:37:10 That is great to hear
15:37:12 nice. ill take a gander and see what's going on there
15:37:36 Team: good work on reviews this week, I think we're going in a good direction there.
15:37:45 #topic other business
15:37:50 take it away portdirect
15:38:14 so - (and I'm gonna make a blueprint for this)
15:38:33 we have a need to be able to run multiple configs of some daemonsets
15:38:46 ie nova-compute and ovs-agent
15:39:19 is "multiple" on the order of per-node, or a constant?
15:39:46 we can currently achieve this through some footwork using the manifests: true/false, and multiple deployments of a chart, but I'd like a cleaner solution
15:40:15 give me big-O notation, portdirect!
15:40:20 multiple for us could mean a lot of things unfortunately - from groups of hardware to canary nodes
15:41:02 so I'm thinking of extending the to_oslo_config logic for daemonsets, and the daemonset manifests themselves, to allow this
15:41:19 so an example config would be as follows:
15:43:06 * srwilkers waits on the edge of his seat
15:43:23 he probably got distracted while he was typing
15:43:49 https://www.irccloud.com/pastebin/Fi1hpGbz/
15:44:07 so most compute nodes would be debug=true
15:44:24 ones labeled with `arbitary_label` would be debug=false
15:44:55 and if its hostname was `hostname` it would be debug=false regardless of what labels etc were applied
15:45:25 thoughts before I do the initial write-up/proposal?
15:46:08 hmm
15:46:32 interesting - am I correct in assuming we would continue not to use this for e.g. disk targeting? Or do you see it being used for that?
15:46:57 disk targeting could potentially benefit from this
15:47:12 though thats solving a slightly diff problem
15:47:52 my only concern is that "node_groups" and "nodes" are only meaningful if you already know what they mean
15:48:11 this would need to be documented
15:48:57 Would it make sense to put them under something (or call them something) more descriptive, like (but better than) per_thing_configs.node_group
15:49:42 good point
15:49:46 so like this?
15:50:08 https://www.irccloud.com/pastebin/FetsDzFf/
15:51:10 that is a good point, but not the point I was trying to make :)
15:51:18 i suppose i'd need to see what's written up and get the line of thinking behind it
15:51:18 lol
15:51:51 will do - just putting it out there, for us internally this will be required
15:52:02 I meant literally an id like "per_thing_conf" that made clear what a "node_group" or a "nodes" is -- what's the context
15:52:09 but would be great to have a solution that works well upstream
15:52:18 I really like the flexibility
15:52:24 gotcha
15:52:45 i will follow up on portdirect's upcoming write-up on this.
15:53:03 +1 same, looking forward to it
15:53:05 and. with my thinking hat.
15:53:37 Alright guys - any other topics?
15:53:42 I'm good
15:53:49 oh, unrelated
15:53:56 but TC voting is happening currently
15:53:59 dont forget to vote!
15:54:06 thanks for the reminder!
15:54:47 Ok team - good meeting. Have a great day, see you in the chat room!
15:54:48 theres this dude YamSaple
15:55:05 whatever you do dont vote for him.
15:55:13 haha
15:55:27 we don't want to steal time away from him distracting you in our chat room portdirect -- excellent point
15:55:27 ;)
15:55:48 #endmeeting
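For reference, the per-label/per-host daemonset override behaviour portdirect described (default debug=true, debug=false for labeled nodes, and a hostname match winning over labels) might be expressed in a values tree roughly like the following. The pastebin contents were not preserved in this log, so every key name here is an illustrative assumption, not the actual proposal.

```yaml
# Hypothetical values layout for per-group daemonset configs.
conf:
  nova:
    DEFAULT:
      debug: true            # default applied to most compute nodes
  overrides:
    nova_compute:            # the daemonset being specialised
      labels:
        - label: arbitary_label
          conf:
            nova:
              DEFAULT:
                debug: false # nodes carrying this label
      hosts:
        - name: hostname
          conf:
            nova:
              DEFAULT:
                debug: false # exact hostname match wins over labels
```

Per the discussion, these structures would need clear naming and documentation so "labels" and "hosts" (or "node_groups" and "nodes") are meaningful without prior context.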