15:00:50 #startmeeting openstack-helm
15:00:50 Meeting started Tue May 29 15:00:50 2018 UTC and is due to finish in 60 minutes. The chair is mattmceuen. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:51 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:54 The meeting name has been set to 'openstack_helm'
15:00:56 #topic rollcall
15:01:02 GM all!
15:01:05 good morning
15:01:13 hi
15:01:16 o/
15:01:19 hello
15:01:21 hihi!
15:01:23 Here's the agenda for today: https://etherpad.openstack.org/p/openstack-helm-meeting-2018-05-29
15:01:28 Welcome
15:01:34 oops
15:01:38 Please add in anything else you'd like to discuss in there
15:01:57 Hey d|k :) you have given up your anonymity!
15:02:15 o/
15:02:24 Hey portdirect! Aren't you on vacation?
15:02:29 Yup
15:02:58 In a mountain chalet on dodgy 3g :)
15:02:59 o/
15:03:05 Nice
15:03:27 hey roman_g tdoc piotrrr lamt
15:03:34 o/
15:03:40 hey rwellum
15:04:03 Hey mattmceuen
15:04:31 portdirect just make sure not to work too hard in this team meeting, just provide color commentary and lob virtual rotten tomatoes
15:05:20 Full disclosure: I am brushing off all kinds of mental and technological cobwebs this morning following the long (USA) weekend, following the Vancouver summit
15:05:43 First thing
15:05:51 #topic Quick Summit Recap
15:06:06 A lot of good things were set in motion for OSH in Vancouver
15:06:42 Met a lot of folks f2f which was excellent as well, and a number of imminent or potential OSH users
15:07:21 Props on all the sessions, I attended most of them, good stuff. (I would be one of those.)
15:07:22 I won't give a rundown of them all here, since it's not all public - but I will be reaching out to them to get them pulled into our team
15:07:38 Awesome - thanks for attending tdoc
15:07:59 Glad to hear you thought they were valuable, since obviously I'm slightly biased :-D
15:08:41 Yeah, I think it gave us a good impression of where the project is and is heading...
15:08:50 yup, good stuff indeed, +1
15:10:02 Also there was a lot of discussion around Airship in the summit, which was great from an OSH perspective, as Airship is a consumer of OSH. As well as one platform to run it on.
15:10:36 summit recap from youtube viewer: good (not great) presentations - listeners seemed to be a bit frustrated. But I hope that you attracted at least some attention to Airship, since that is one of the goals. Overall, looking at other videos and comparing to previous years, you can see that there is way less hype (which is good), and more work in the direction of supporting and stabilizing the platform.
15:11:29 Thoughts on why people seemed frustrated?
15:11:37 Agree with your assessment roman_g that there is overall less hype and more delivery focus in the presentations
15:11:58 Yes, was the frustration specific to the OSH presentations, or do you mean overall with the summit roman_g?
15:12:23 OSH/Airship.
15:13:24 Thoughts on why people seemed frustrated? - I would say that not all people who were coming to your sessions were the target audience.
15:13:35 Ok - I still have a few related sessions I need to catch up on via youtube, and I'll keep an ear out for that
15:13:48 That's what I understood from the questions that were asked.
15:14:07 Same, good info roman_g
15:14:25 The sessions I was at in person had a good vibe with a lot of Q&A that extended far past the end of the presentation, which was great to see
15:14:39 that's very good
15:15:00 But I'll re-watch those too in case I missed some things -- there were a lot of notes to be taken
15:15:05 Thanks roman_g
15:15:18 Any other summit tidbits to share?
15:15:40 want to visit one of them )
15:15:45 as a speaker
15:16:02 Re the workshop I attended..
15:16:05 +1
15:17:15 I could help out a bit with the folks sitting around me - most had no or very little k8s experience.
I did think that running a set of scripts one after another was a bit counterintuitive - because on one hand it showed the power of osh, on the other hand it didn't teach much about what was going on.
15:17:43 Agree
15:18:00 But I don't have a good alternative tbh.
15:18:01 Yeah, that's a hard thing to balance
15:18:12 Was wrestling with the same internal monologue rwellum
15:18:37 Unfortunately the scenarios I had slides for changed at the last minute (while on stage)
15:18:49 Yeah I noticed that portdirect
15:18:53 So had to freestyle the whole thing.
15:18:55 You had to pivot
15:19:04 My impression is that some people (like me), might not necessarily care much about Airship, but do care about using OSH. And they want to use it to deploy OS on top of their, already existing, kubernetes clusters. So, I can imagine that such people might be a bit frustrated with yet another large box of moving parts being introduced into the picture (Airship). Sure, as far as I understand airship
15:19:06 will not be a hard requirement for OSH, but I would say it would make sense to make a clearer distinction between the two, where possible.
15:19:24 Airship has a nice four-line "stand up the stack" experience, which is awesome for showing that the thing exists and works, but really doesn't give hands-on at all. OTOH you don't want to make it too deep or else newbies may get left behind
15:19:54 So I think the script approach is at least one good middle ground way to demo the product and peel back the curtain just a bit
15:20:08 Many of the people around me were confused when we got to the stage where we had to run 'make'.
15:20:16 For example
15:20:47 Motto: run `watch -n1 "kubectl -n xxxx get pods"` for each namespace in tmux screens, and provide more useful info on what is happening during installation and other phases.
15:20:55 piotrrr yep, no requirement to use airship for OSH at all.
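A minimal sketch of the `watch` tip above, assuming `tmux`, `watch` and `kubectl` are on the PATH. The function name `tmux_watch_commands`, the session name `osh-watch` and the example namespaces are illustrative choices, not part of OSH. It only builds the command lines, one tmux window per namespace, so they can be reviewed before anything is run.

```python
import shlex


def tmux_watch_commands(namespaces, session="osh-watch", interval=1):
    """Build tmux commands: one window per namespace running watch + kubectl."""
    commands = []
    for index, namespace in enumerate(namespaces):
        # Same command as the tip: watch -n1 "kubectl -n <ns> get pods"
        kubectl = f"kubectl -n {shlex.quote(namespace)} get pods"
        watch = f"watch -n{interval} {shlex.quote(kubectl)}"
        if index == 0:
            # The first namespace creates the detached session (hypothetical name).
            commands.append(
                ["tmux", "new-session", "-d", "-s", session, "-n", namespace, watch]
            )
        else:
            # Later namespaces get their own window in that session.
            commands.append(
                ["tmux", "new-window", "-t", f"{session}:", "-n", namespace, watch]
            )
    return commands
```

Each returned list can be passed to `subprocess.run(cmd, check=True)`, after which `tmux attach -t osh-watch` shows the windows. Keeping command construction separate from execution makes it easy to print the commands first.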
15:21:17 rwellum: all the vms had the wrong things provisioned on them :(
15:21:21 Oh well
15:21:24 roman_g: +1
15:21:28 also good to have some monitoring of docker pulling images, but it's impossible right now
15:21:32 rwellum: that was a bit unclear in the instructions, people were running the make in the wrong dir.
15:21:47 tdoc_: me too - took a while to figure out
15:22:01 portdirect: :(
15:22:27 it's because I had some prior knowledge that I was able to follow...
15:22:39 Yeah I'm really sorry
15:22:39 Overall I think most people I spoke to were just very happy to play with a live demo. So that was good.
15:22:54 but unfortunately in my instances the db charts had troubles, so got stuck there.
15:22:59 Yes I will say that for having the wrong things provisioned on the VMs -- way to roll with the punches portdirect :) made for a still valuable workshop
15:23:01 I had to figure it out with over 100 eyes on me
15:23:13 50 ppl
15:23:24 tdoc_: I think although there were plenty of VMs - there were some latency issues maybe?
15:23:25 300 eyes?
15:23:27 ?
15:23:35 yup
15:23:59 I think some people had issues with containers/pods getting stuck for some reason
15:24:00 not complaining, given the conditions etc, it was still worthwhile..
15:24:11 good
15:24:44 Alrighty - moving on to our next topic:
15:24:51 #topic Storyboard
15:25:06 So we had reasons to move to Storyboard before
15:25:12 But now we are truly motivated
15:25:24 I have been told that once we migrate to Storyboard, we can have
15:25:26 ...
15:25:30 Honey Badger Stickers
15:25:52 %^$*!!!
15:25:54 (which is of course the OSH mascot)
15:25:55 I know right
15:25:57 is that the condition for stickers?
15:26:01 It is
15:26:14 I will volunteer then - I want stickers
15:26:17 slightly tongue-in-cheekly, but that's the agreement :)
15:27:08 What's the status of the migration?
15:27:12 Who is leading this atm?
15:27:20 The biggest challenges with migration will be
15:27:20 1) communicating it to everyone
15:27:20 2) using the new storyboard-friendly git commit headers
15:27:25 rwellum
15:27:43 He has done a POC of the migration that he shared a couple weeks back for feedback
15:28:01 rwellum, have any concerns been raised?
15:28:01 Yeah the POC is still up for everyone to look at and play with
15:28:09 No not to me.
15:28:26 So let's just pull the trigger?
15:28:30 There are other teams I spoke to at the Summit that are holding back for various reasons.
15:28:31 It sounds like the migration itself is small potatoes, and the trigger can be pulled whenever we're ready
15:28:41 Yeah - it's all ready I think.
15:28:44 rwellum: what like?
15:28:48 honey badger don't care about holding back
15:29:06 Do we want to set a target of e.g. next monday so we can communicate?
15:29:19 Also can you update the docs, to point to storyboard etc?
15:29:19 Like Cinder for example, they are so embedded in the old way and the sample migration took days to run and didn't complete.
15:29:29 portdirect yup
15:29:45 interesting rwellum
15:30:09 But I think for OSH - less worries as still new, you guys are writing the process newly.
15:30:26 Bad English but ykwim
15:30:50 Yeah, we don't have that much stuff to pack up and take over.
15:31:08 Yeah so if we target next monday, I'll contact the infra team and ask them to initiate the next step.
15:31:49 Good. Yeah, the biggest constructive criticism I've received re: OSH is that we could use a better community roadmap so it's easy to see where the project is going and to volunteer for work items. We've been making good strides but the storyboard migration is a great opportunity to get that in good shape.
15:32:12 ++
15:32:28 Excellent - let's do that, let me know if Monday turns out to be a bad day for any reason. I'll plan on sending some comm out in the ML
15:32:48 Ok I will.
15:33:09 once you guys switch, will that mean new bugs can't be filed in launchpad?
15:33:37 ie will it be clear to end users at what point they should use which tracker?
15:33:40 It definitely means they /shouldn't/ be; will we be able to actually disable launchpad?
15:33:49 That's the idea tdoc_
15:34:00 The disable part - that's one of the things I want to check with infra
15:34:08 I'll report back.
15:34:13 excellent
15:34:25 Also think it would be good to add a 'low hanging fruit' project group - for simple things to pass onto new users.
15:34:31 +1
15:34:34 Simple bugs etc.
15:34:55 +1
15:35:08 that's important for me
15:35:10 We would like to cut our 1.0 release in the next couple months, and identifying e.g. low-hanging doc updates as well would be good
15:35:13 +1
15:35:28 Yup - that would be a great addition too
15:35:30 imo
15:35:38 rwellum: do you have the bandwidth to take a stab at getting a low hanging list up?
15:35:45 Yes I will do that.
15:35:55 Awesome :D
15:36:00 Thx dude
15:36:06 np
15:36:46 Next topic:
15:36:56 #topic Creating a set of guidelines which would help contributors troubleshoot issues, e.g. stuck containers
15:37:10 piotrrr, want to speak to this one?
15:38:57 Adding some solid operational docs is definitely something we want to do as part of our 1.0 release
15:39:15 yes, so we're just starting with OSH, and we're running into all kinds of different issues. Stuck containers/pods etc. We have no know-how on how to troubleshoot those issues. If we're running into such problems, other contributors/operators might be too.
15:39:22 It would be good to capture a list of topics to speak to (and then create storyboard items for!) -- this is a good one
15:39:47 +1 piotrrr
15:39:50 I think it would be nice to have. I've been at the point where I see a bunch of pods in init state and wondering what to do.
15:39:51 So, my question would be whether the OSH community would like to collab on creating a doc with tips/hints for troubleshooting OSH and OS running on top of it
15:40:10 Yes!
15:40:37 And this is where new users add huge value :)
15:40:45 tdoc_: they were docker-pulling? )) you can't monitor progress of that unfortunately
15:41:09 It's often unclear to me which pod is waiting for which other pod to complete.
15:41:16 We have a troubleshooting doc already, we should all get into the habit a little bit more of adding things into it after we fix them!
15:41:22 I know to go check the init containers, then eventually the kubelet logs - but this is prob not intuitive for new k8s users
15:41:41 +1
15:41:44 +1.
15:42:02 Is there any good "general" kubernetes troubleshooting guide out there that we can refer to for good "technique"?
15:42:23 Just the k8s docs that I'm aware of
15:42:30 haven't seen that
15:42:34 Though they are quite thin
15:42:53 but having dashboard open helps a bit
15:43:04 I have some debug steps from kolla-k8s - some would apply, I can look at the troubleshooting guide and see if any can help.
15:43:23 mattmceuen: in the workshop, how many people were hung up on ceph ns activation?
15:43:28 Tip #1 :) we have LMA user interfaces - good one roman_g
15:44:00 I think at least 3-5 folks portdirect
15:44:08 that step was easy to miss for whatever reason
15:44:12 ok, how do we want to start? Maybe creating an etherpad where everyone from the OSH team could braindump their approaches/hints for troubleshooting. We can organize those into public docs later on.
15:44:22 apache airflow dag dashboard?
15:44:40 roman_g: not airship
15:44:58 The actual lma stack from osh-infra.
15:45:06 https://etherpad.openstack.org/p/openstack-helm-troubleshooting
15:45:20 portdirect: ah, yep
15:45:25 ^ let's use that to jot down ideas as they come to us (and troubleshooting steps as we do them)
15:45:37 mattmceuen: pin to channel topic here & in slack?
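One possible starting point for the "check the init containers" step mentioned above, as a rough sketch and not an OSH tool. It assumes `kubectl` is configured for the cluster; the helper names `fetch_pods` and `blocked_init_containers` are made up for illustration. It reads the JSON from `kubectl get pods -o json` and lists init containers that have not completed, which is usually the first hint about what a pod is waiting on.

```python
import json
import subprocess


def fetch_pods(namespace):
    """Return the parsed output of `kubectl -n <namespace> get pods -o json`."""
    result = subprocess.run(
        ["kubectl", "-n", namespace, "get", "pods", "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


def blocked_init_containers(pod_list):
    """List (pod, init container, reason) for init containers that are not done."""
    blocked = []
    for pod in pod_list.get("items", []):
        pod_name = pod["metadata"]["name"]
        statuses = pod.get("status", {}).get("initContainerStatuses", [])
        for status in statuses:
            state = status.get("state", {})
            if "terminated" in state:
                if state["terminated"].get("exitCode") == 0:
                    continue  # completed successfully, not blocking anything
                reason = state["terminated"].get("reason", "terminated")
            elif "waiting" in state:
                reason = state["waiting"].get("reason", "waiting")
            elif "running" in state:
                reason = "running"
            else:
                reason = "unknown"
            blocked.append((pod_name, status["name"], reason))
    return blocked
```

For example, `blocked_init_containers(fetch_pods("openstack"))` returns one tuple per unfinished init container; `kubectl -n openstack logs <pod> -c <init-container>` on one of those entries is then the natural next step.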
15:45:58 Then we can turn them into storyboard things and doc updates at our convenience
15:46:01 Good idea
15:47:06 thanks piotrrr for bringing this up, let's revisit next week and see how the etherpadding is going
15:47:11 https://github.com/openstack/kolla-kubernetes/blob/master/doc/source/deployment-guide.rst - look at the ts guide at the end. I did most of that.
15:47:48 sounds good, thanks
15:48:04 I'll add to the etherpad
15:48:26 (I would be happy to help turn the notes into proper docs later on)
15:48:43 That would be awesome, thank you!
15:49:08 rwellum there is a lot of good stuff in there that could be adapted to OSH
15:49:48 Yeah it's all k8s
15:49:59 Ok - we have another item tdoc_ wanted to bring up
15:50:03 #topic Roundtable
15:50:39 yeah, so I brought this up in the irc chan... Having some DNS issues with rabbitmq.
15:51:08 It seems in my case the rabbitmq-server won't start because it can't lookup its hostname.
15:51:19 I haven't had a chance to catch up on the full conversation yet; you're still seeing the issue tdoc_?
15:51:55 That DNS record does not exist yet, because the readiness/liveness probes don't pass yet... So chicken-egg.
15:52:16 It's odd that the pod cannot resolve itself though
15:52:17 I need to add this service.alpha.kubernetes.io/tolerate-unready-endpoints to make it work
15:52:30 I think you noted it's not an issue in the gates; do you know what the difference between your environment & the gates might be?
15:53:24 What version of k8s are you running?
15:53:39 1.10.2
15:54:10 And you've now tried with both kube-dns and coredns?
15:54:15 I'm somewhat assuming it does not come up in the gates, but not familiar with that environment myself yet...
15:54:23 yup, tried both
15:54:44 Hmm, this is an odd outlier :/
15:55:00 Are you on very slow machines?
15:55:15 As far as I've understood the docs though, k8s is not supposed to expose dns records for headless services until the probes indicate ready.
15:55:27 Correct
15:55:47 Though the pod should be able to resolve itself
15:55:52 yeah, all my stuff is running in our local openstack cloud, so in VMs which might not be the fastest...
15:56:10 It complains about rabbitmq-rabbitmq-0.rabbitmq-dsv-7b1733.openstack.svc.cluster.local
15:56:29 Which I think is the record for the service
15:56:37 From the 1st rabbit pod?
15:56:42 yup
15:56:58 Can you paste the full logs?
15:57:07 Please paste them in the OSH chat
15:57:16 So we can keep this going since we're almost out of time
15:57:19 hmm, i'm not sure I have those handy right now
15:57:42 No worries, if you can share them when you have them handy that would be helpful
15:57:44 it's something like ERROR: epmd .... that hostname.... domain not found....
15:57:54 (sorry best I can do right now)
15:58:15 We'll get it figured out
15:58:48 Alright, with two minutes left - any final discussion points?
15:59:44 Alright - thanks for a great meeting all
15:59:51 See you in #openstack-helm
15:59:54 #endmeeting
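For reference, a sketch of what the workaround from the roundtable might look like as a manifest, written here as a Python dict dumped to JSON. Only the annotation key, the service name `rabbitmq-dsv-7b1733` and the `openstack` namespace come from the discussion. The helper name `headless_service`, the selector label and the port are assumptions for illustration, and this is not necessarily how the OSH rabbitmq chart defines its service. Later Kubernetes releases express the same intent with the `spec.publishNotReadyAddresses` field.

```python
import json


def headless_service(name, namespace, selector, port):
    """Build a headless Service that publishes DNS records for unready pods."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {
            "name": name,
            "namespace": namespace,
            "annotations": {
                # Annotation quoted in the meeting; the value must be a string.
                "service.alpha.kubernetes.io/tolerate-unready-endpoints": "true",
            },
        },
        "spec": {
            "clusterIP": "None",  # "None" is what makes the service headless
            "selector": selector,  # assumed label, adjust to the real chart
            "ports": [{"name": "amqp", "port": port}],
        },
    }


# Hypothetical selector; 5672 is the standard AMQP port.
manifest = headless_service(
    "rabbitmq-dsv-7b1733", "openstack", {"application": "rabbitmq"}, 5672
)
print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied with `kubectl apply -f`, since kubectl accepts JSON as well as YAML manifests.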