15:00:41 #startmeeting openstack-helm
15:00:42 Meeting started Tue Jun 19 15:00:41 2018 UTC and is due to finish in 60 minutes. The chair is mattmceuen. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:43 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:45 #topic rollcall
15:00:45 The meeting name has been set to 'openstack_helm'
15:01:08 GM all
15:01:20 Here's our agenda for today's OpenStack-Helm meeting: https://etherpad.openstack.org/p/openstack-helm-meeting-2018-06-19
15:01:43 o/
15:01:49 Please add anything to it you'd like to discuss, PSs needing review, etc!
15:01:50 o/
15:01:52 o/
15:02:10 o/
15:02:14 o/
15:03:33 mattmceuen: radeks is from my team; we're going to be slowly attending more and doing more with osh, as I've spoken about before.
15:03:42 Welcome radeks!
15:03:54 Glad to have you with us :)
15:04:01 o/
15:04:07 Alrighty - first off, do we have Robert Choi in the house?
15:04:17 (o/ roman_g!)
15:04:44 He's got the first agenda item, but we can come back to him if he's not on yet.
15:05:02 #topic Doc Update Touchpoint
15:05:24 For anyone new around here, our documentation is a key area we want to enhance, grow, and groom
15:05:25 from reading the etherpad, i'm not sure if he or jayahn are here today?
15:05:40 +++ to that mattmceuen :)
15:05:57 oh, maybe I misunderstood "next week" -- maybe Robert meant next next week :)
15:06:30 We've been capturing some enhancements that should be made to the Multinode Install guide here: https://etherpad.openstack.org/p/openstack-helm-multinode-doc
15:07:05 i am here, but stuck in a different meeting and task.
15:07:08 I've started implementing a number of bits in this PS: https://etherpad.openstack.org/p/openstack-helm-multinode-doc
15:07:11 Hey jayahn!
15:07:13 we meant next week's irc meeting.
15:07:59 I saw the PS from Matt, will review today/tomorrow. One thing I would add is an answer to the question
15:08:02 Cool, thanks for clarifying :) for now I'll just paste in your agenda note for general awareness at the end, since that's good to be aware of
15:08:10 ^ jayahn
15:08:16 "What to do next once installation finishes?"
15:08:16 Thanks in advance roman_g
15:08:27 Wrong link mattmceuen ?
15:09:07 roman_g: use openstack - and be happy?
15:09:12 o/ running late
15:09:27 portdirect: hehe
15:09:28 whoops
15:09:40 portdirect: Matt: add a link to https://docs.openstack.org/openstack-helm/latest/install/developer/exercise-the-cloud.html to the bottom of the Multinode install page.
15:09:41 Someday I will master the art of copy and paste
15:09:42 https://review.openstack.org/#/c/576342/
15:11:12 @sigit in Slack was asking on #openstack-helm on Thursday where he could get an openrc file
15:11:28 and that exercise-the-cloud.html is the answer
15:12:18 So for that one roman_g, I think we could potentially copy a use-it script into the multinode install batch of scripts, rather than cross-linking from the multinode guide back to the dev scripts.
15:13:00 or just move this file from the /developer/ subdir up one level to /install/
15:13:01 roman_g: we just need to get better at refactoring what we have
15:13:07 However, the use-it script makes some assumptions around network setup, etc... not sure if that's a good idea for a multinode setup or not.
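For reference, an openrc of roughly this shape is what exercise-the-cloud.html walks through sourcing before running openstack CLI commands; this is only a sketch -- the endpoint and credentials below are illustrative placeholders, not values taken from the guide:

    # Hypothetical admin openrc for an OSH deployment; adjust the keystone
    # endpoint and credentials to match your own cluster and secrets.
    export OS_AUTH_URL=http://keystone.openstack.svc.cluster.local/v3
    export OS_REGION_NAME=RegionOne
    export OS_IDENTITY_API_VERSION=3
    export OS_USERNAME=admin
    export OS_PASSWORD=changeme
    export OS_PROJECT_NAME=admin
    export OS_USER_DOMAIN_NAME=default
    export OS_PROJECT_DOMAIN_NAME=default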
What do you think portdirect?
15:13:12 for example, the openrc is at the bottom of here: https://docs.openstack.org/openstack-helm/latest/install/developer/exercise-the-cloud.html
15:13:19 mattmceuen: totally agree
15:13:47 at the end of the multinode guide - we should have a 'real' openstack deployment
15:14:08 I suppose we could leave the scripts as-is and then just say at the bottom of the multinode guide something like, "for examples of how to exercise your new OSH cloud, please see the developer guide"
15:14:22 which means the dev-kick-the-tyres script won't be relevant in 99% of cases.
15:14:57 we should find actual openstack docs on cloud use - and link there
15:15:05 if they don't exist, let's make them
15:15:26 as it should be the same regardless of deployment system: osh, osa, kolla, tripleo etc
15:16:43 ++
15:17:56 I added that to the etherpad
15:18:07 rwellum, I haven't gone through your additions to the etherpad yet in detail
15:18:19 Do you want to talk through them in brief here?
15:19:19 Yeah - for the most part it's digging out 'things' from the various playbooks that I was taking for granted when running the AIO
15:20:24 I mainly have issues with ceph - I still think some fundamentals are missing
15:20:36 What are you working toward with that - do you see the assumptions making it back into the multinode guide? Or something standalone?
15:21:20 I am a little on the fence here - because taking the guide away from executing the gate scripts is quite a big step.
15:21:34 Maybe a third guide is in order?
15:21:34 i'm not sure i quite agree
15:22:14 in that once you had k8s set up, with hosts able to resolve k8s services dns, and ceph-common on the host
15:22:30 what more is/was required, other than building the charts?
15:22:45 Nothing - agreed
15:23:07 Yeah, I regressed :( - I don't know why ceph is acting up for me now.
15:23:31 so from this point on: https://docs.openstack.org/openstack-helm/latest/install/multinode.html#deploy-openstack-helm
15:23:46 the guide should be totally agnostic of k8s deployment tooling
15:23:55 provided the above criteria are met
15:24:21 So after the PS above, the next PS I'll put in will split out the AIO setup entirely
15:24:44 There are a few assumptions regarding number of nodes etc in the multinode scripts, I believe - or?
15:24:51 nice - i think getting this in - even if we just have a stub for the above points - will really help
15:25:03 The idea is that the new multinode guide will still link to it as an example way to set up a dev-grade k8s cluster, but also link out to more legit ways to set up a k8s cluster for prod use
15:25:16 ++
15:26:02 ok, I can buy into that
15:26:30 That may help us refer out to good ways to stand up clusters independently of the rest of the guide, and avoid guide sprawl
15:27:14 agreed - we are already at risk of that - eg the multiple places we tell you how to set up sudo :)
15:27:16 I'll take a look at your material in there rwellum and chew on it as well with an eye toward how it can best fit in
15:28:16 So speaking of the troubleshooting doc - we also have an etherpad for that one: https://etherpad.openstack.org/p/openstack-helm-troubleshooting
15:28:25 Ok - I also will document how I create the k8s cluster - as a potential example - but I'm also on the fence, because really - we don't need another k8s deployer :)
15:28:49 And it's as portdirect says - osh is agnostic
15:29:22 Def don't spend too much time on anything that you don't think would be valuable to others, though rwellum - some things are always going to be operator-specific
15:29:34 rwellum: can we not just link to kubeadm, kubespray and other community projects?
15:29:42 portdirect: that would be the sane thing to do
15:30:12 link to kubernetes - the hard way ;)
15:30:30 HAHA
15:30:38 ^^ actually we 100% should
15:30:40 i think it's important we reduce how much kubernetes-specific documentation/support we offer
15:30:55 the hard way is how i learned
15:31:02 it's actually pretty good
15:31:02 as virtually every deployment tool is based on it
15:31:08 same here
15:31:29 Yeah, agreed
15:31:30 Yup, I'm planning on linky-ing in the next PS - if you have any recommendations for good installers / guides etc for prod use, please add them to the multinode etherpad!
15:31:30 it's nearly a production install.
15:32:05 So for troubleshooting: https://etherpad.openstack.org/p/openstack-helm-troubleshooting
15:32:30 At the very very bottom I stashed a couple of errors I ran into during my recent multinode install adventure
15:32:48 Symptom: one MDS started fine, but another died, complaining in the logs about a feature flags incompatibility.
15:32:48 Cause: This error was caused by an old version of `docker.io/ceph/daemon:tag-build-master-luminous-ubuntu-16.04` being cached on one node, which was incompatible with a newer version on another node.
15:32:48 Resolution: Pull the updated docker image on the node with the sad mds
15:32:48 Error message: mds.mds-ceph-mds-65bb45dffc-qfqnr handle_mds_map mdsmap compatset compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table,9=file layout v2} not writeable with daemon features compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=file layout v2}, killing myself
15:32:48 Symptom: kube-system ingress (host networking) is running and openstack ingress (non-host networking) is failing
15:32:48 Cause: The default calico pod subnet conflicts with a preexisting subnet in this environment
15:32:49 Resolution: in the multinode-vars.yaml file, override the default via `kubernetes_cluster_pod_subnet: 10.25.0.0/16`
15:32:49 Error message: Readiness probe failed: Get http://192.168.23.131:10254/healthz: dial tcp 192.168.23.131:10254: getsockopt: connection refused
15:33:13 well, i don't know if i'd say a production installation necessarily, but i like that it gives people exposure to what's going on, instead of just throwing kubeadm at the wall and getting a cluster
15:33:16 but that's just me
15:33:23 IRC rendered that sadly -- that's two different errors. But the point is that I think this might be a straightforward approach to recording and helping people solve common issues
15:33:33 There's a 'WHAT'S NEXT!!' placeholder in that guide too that I added - my subtle way of asking a core who knows osh to add to it.
15:33:36 Symptom / Cause / Resolution / Error Message
15:33:37 +1 mattmceuen
15:33:48 short & sweet & googleable
15:34:06 sounds good
15:34:17 if we can get a nice start on this
15:34:23 Interested in feedback - is that what a troubleshooting guide should look like?
15:34:32 then we should try to answer questions via irc/slack as PSs
15:34:46 so they are kept as reference
15:34:48 I like that
15:35:02 i find that 90% of support is answering the same things
15:35:07 My thought for the TS guide is that before someone opens a bug on osh, we guide them to this guide.
15:35:20 ++
15:35:24 and it's my bad for not documenting them, but this would lower the barrier to that loads mattmceuen
15:36:12 Cool, I will add those couple of errors as a start into the TS guide, we squint at it, and then continually add to it
15:36:53 Alright - anything else on the Doc front before moving along?
15:37:11 Good discussion guys, and appreciate all the attention on this since last week
15:37:31 Alrighty srwilkers you're up!
15:37:37 #topic Logging Updates
15:37:38 cool
15:37:49 fluentbit sidecar for ceph-mon and ceph-osd: https://review.openstack.org/#/c/575832/
15:37:49 change to logging.conf for openstack services: https://review.openstack.org/#/c/576001/
15:38:16 i've proposed adding fluentbit sidecars to the ceph-mon and ceph-osd charts, to allow us to gather the logs that get placed in /var/log/ceph in those pods
15:38:57 My question here srwilkers was why not make it a default?
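To illustrate the sidecar approach being discussed (a sketch only -- the actual chart changes are in the PS above), a fluentbit sidecar for ceph essentially amounts to a tail input on /var/log/ceph plus a forward output to the fluentd aggregator; the tag and the aggregator address below are assumptions, not taken from the chart:

    # Minimal fluentbit sketch: tail ceph logs and forward them to fluentd.
    # The aggregator service name/port are assumed placeholders.
    [INPUT]
        Name    tail
        Path    /var/log/ceph/*.log
        Tag     ceph.*

    [OUTPUT]
        Name    forward
        Match   ceph.*
        Host    fluentd-logging.osh-infra.svc.cluster.local
        Port    24224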
15:39:37 as that creates a hard dependency on fluentd
15:39:37 rwellum: it's not set as a default currently, as we don't deploy fluentd in the single or multinode gates for openstack-helm
15:39:44 and what portdirect said
15:40:04 and we want these charts to be composable simply
15:40:11 it follows what we've done with the prometheus exporters tied to things like rabbitmq and mariadb -- we leave them disabled by default so as not to create dependencies or assumptions
15:40:43 but for those who want to use it, it provides additional insight into ceph logged events
15:41:08 also leaves the door open for people to add alternate log aggregators
15:41:23 Ok, makes sense
15:42:38 in addition, we can also use the tags on the logged events to possibly add some sane fluentd filters in the future if we want
15:43:14 srwilkers for the logging.conf change - looks awesome; would it prevent openstack logs from going to stdout by default? I.e. breaking kubectl logs?
15:43:47 no. you can define multiple handlers
15:43:53 neat
15:44:12 There's a difference between fluentbit and fluentd though, right?
15:44:20 but the cool thing with that change is that for any version >= ocata, we can use the fluent formatters
15:44:31 that will be really great then
15:44:44 srwilkers: I'll get some new images published with that over the next 48 hours
15:44:45 rwellum: functionality is the same, but the big difference is that fluentbit has a much smaller resource footprint than fluentd
15:45:12 Yeah, was confused because you said fluentbit sidecars
15:45:14 so we use fluentbit for the sidecar, then forward the messages to a fluentd serving as an aggregator
15:45:27 Yeah, makes sense
15:45:37 I'll go read the PS again
15:45:54 but the fluent formatter and handler for the openstack services makes me happy, as we can send the logs directly to fluentd
15:46:07 instead of stdout > fluentbit > fluentd
15:46:18 this also gets us something we've wanted for a while
15:46:56 using the fluent formatter and handler gets us full stacktraces when they're raised
15:47:04 woo hoo!!
15:47:18 can also get us tags for things like the project name, host name, etc
15:47:26 stacktraces - the most important part of logging :)
15:47:46 and since the services get a unique tag, we can define filters in fluentd in the future
15:47:58 ie: do this for nova, do this for neutron, etc
15:48:51 can also take the recent updates to the fluent-logging chart and create multiple indices in elasticsearch, and use the project tags to create indices per openstack service if your heart so desired
15:49:06 but as mentioned, you do need at least ocata to use that formatter
15:49:29 but even if you don't, i still feel this gives an operator greater control over what they want to see and how
15:49:46 Have to bow out a little early today - will continue with deployment this afternoon and will bug the IRC :)
15:50:01 Thanks rwellum! Looking forward to seeing the success and/or fallout! :)
15:50:11 that's it for me
15:50:17 Thanks srwilkers - that's awesome
15:50:30 Looking forward to playing with that :)
15:50:38 #topic topics for next time
15:50:49 Just one thing to get in front of y'all -- a request for a discussion next week
15:50:49 Support multiple versions of Rally - let's have some time to think about it and discuss again next week.
15:51:11 There is a little more discussion / brainstorming in the agenda: https://etherpad.openstack.org/p/openstack-helm-meeting-2018-06-19
15:51:25 #topic Roundtable
15:51:39 We have 9 minutes left - anything else you would like to discuss?
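As a reference for the logging.conf change covered under the logging topic above, a rough sketch of an oslo.log config that keeps a stdout handler while also sending records straight to fluentd via the fluent handler and formatter (this needs Ocata or newer plus the fluent-logger python package; the tag and fluentd address are assumptions, and the actual defaults live in the PS linked above):

    # Sketch of a logging.conf with both stdout and fluentd handlers.
    [loggers]
    keys = root

    [handlers]
    keys = stdout, fluent

    [formatters]
    keys = default, fluent

    [logger_root]
    level = INFO
    handlers = stdout, fluent

    [handler_stdout]
    class = StreamHandler
    args = (sys.stdout,)
    formatter = default

    [handler_fluent]
    # fluent.handler.FluentHandler(tag, host, port) -- tag and host are assumed here
    class = fluent.handler.FluentHandler
    args = ('openstack.nova', 'fluentd-logging.osh-infra.svc.cluster.local', 24224)
    formatter = fluent

    [formatter_default]
    format = %(asctime)s %(levelname)s %(name)s %(message)s

    [formatter_fluent]
    class = oslo_log.formatters.FluentFormatter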
15:52:31 Or also - any PS urgently needing review?
15:53:12 oh, this is pretty cool: https://review.openstack.org/#/c/570658/
15:53:29 to be honest, I'm kinda surprised it works at all
15:53:37 but would be really nice to have
15:53:46 I'm not sure how we could gate it
15:53:46 https://twitter.com/IanFromATT/status/1008815392016551941
15:54:07 and also really uncomfortable with some of the things it does
15:54:14 I sense a third party gate hooked up to a macbook portdirect
15:54:28 https://review.openstack.org/#/c/575157/
15:54:38 as messing with a mac deployment (ie deploying homebrew etc) is kinda bad i think
15:54:55 as most people won't want to redeploy their mac to clean up an env...
15:55:05 but with some changes, could be super valuable
15:55:06 portdirect: yeah, i'm not a huge fan of that one
15:56:28 Alright guys - unless there's anything else, I can give you a full 3 minutes back
15:56:46 Thanks!
15:56:49 #endmeeting