15:00:30 <mattmceuen> #startmeeting openstack-helm
15:00:31 <openstack> Meeting started Tue Dec 12 15:00:30 2017 UTC and is due to finish in 60 minutes.  The chair is mattmceuen. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:32 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:34 <openstack> The meeting name has been set to 'openstack_helm'
15:00:39 <mattmceuen> #topic Rollcall
15:00:44 <mattmceuen> GM everyone!
15:00:53 <raymaika> o/
15:00:53 <srwilkers> o/
15:00:57 <srwilkers> its morning
15:01:02 <srwilkers> wont say if its good yet or not
15:01:22 <mattmceuen> Here's the agenda - I'll give folks a couple mins to fill it out https://etherpad.openstack.org/p/openstack-helm-meeting-2017-12-12
15:02:01 <mattmceuen> srwilkers:  it's gonna be a great day man
15:02:15 <portdirect> o/
15:02:23 <srwilkers> our optimistic captain at the helm would never steer us wrong
15:02:43 <mattmceuen> #titanic
15:02:57 <portdirect> too soon mattmceuen
15:02:59 <portdirect> too soon
15:03:05 <srwilkers> oof
15:03:28 <srwilkers> im more of a hunt for red october kinda guy
15:03:38 <mattmceuen> alrighty let's get this show on the road
15:03:54 <mattmceuen> #topic Dependencies on non-default images
15:04:34 <mattmceuen> Let me lay this out for y'all.  I want to accomplish two things here:
15:04:35 <mattmceuen> 1) Come up with a general principle for us OSH engineers to apply
15:04:35 <mattmceuen> 2) Come up with a tactical plan for Hyunsun's PS
15:05:07 <mattmceuen> First the problem statement:  Hyunsun has a PS that has a feature (lbaas plugin), which is turned off by default
15:05:37 <mattmceuen> The feature doesn't work with the default neutron kolla 3.2.0 image that we have configured
15:06:05 <mattmceuen> It doesn't cause any issues unless you turn the feature on, but if you turn it on, you also have to switch out the image to support the feature
15:06:43 <mattmceuen> That's something that either needs to be documented very well (with a reference to an image you can use which supports the feature), or, we should apply a "feature must wait till the default images support it" rule
15:07:05 <mattmceuen> So that's the RFC for you all.  I've heard both opinions.
15:07:58 <mattmceuen> I am personally leaning toward "don't merge until the default image supports"
15:08:21 <mattmceuen> Perhaps leaving the door open for "... unless there is a really special circumstance we haven't thought of yet"
15:08:41 <portdirect> Agreed, though we should confirm if the 4.0.0 image works ok with newton
15:08:45 <mattmceuen> Otherwise we could end up with spaghetti dependencies
15:08:56 <mattmceuen> you're skipping ahead portdirect!
15:09:07 <portdirect> lol - I'll be quiet ;)
15:09:36 <mattmceuen> Any dissenting or reaffirming opinions?
15:09:39 <srwilkers> im skeptical of having a default image that doesnt work.  as it shouldn't be considered the default at that point
15:09:48 <mattmceuen> Everyone on xmas vacation already? :-D
15:10:10 <srwilkers> its nothing but a placeholder then
15:10:31 <portdirect> ++
15:10:41 <mattmceuen> Alrighty:
15:11:01 <mattmceuen> #agreed features should not be merged until they are supported by the default images, even if they're turned off by default
15:11:11 <mattmceuen> Next:  let's get tactical
15:11:48 <mattmceuen> lbaas is supposed to be supported since kilo, and Hyunsun has or will file a bug with kolla for not supporting
15:12:17 <portdirect> I suspect that will not yield joy as newton is eol.
15:12:20 <mattmceuen> We could potentially swap in the kolla 4.0.0 image just where it's needed, or swap in a loki image if it supports it out of the box
15:12:20 <srwilkers> in the past, we've had issues getting kolla to provide fixes to images that don't work with the charts we're building
15:12:25 <srwilkers> plus what portdirect said
15:12:54 <portdirect> I think the first step would be to see if a 4.0.0 image works with 3.0.3 - if it does great
15:13:01 <srwilkers> portdirect: ++
15:13:09 <mattmceuen> Agree - I will pass that on to Hyunsun.  Thanks guys.
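The image swap discussed above would amount to a per-deployment values override for the neutron chart. A rough sketch of the shape (the key names and plugin alias here are illustrative assumptions, not the chart's actual schema; check the chart's values.yaml):

```yaml
# Hypothetical values override: enable the lbaas plugin and swap only
# the affected image to a kolla 4.0.0 build, leaving the rest on the
# default images. All keys below are illustrative.
images:
  neutron_server: docker.io/kolla/ubuntu-source-neutron-server:4.0.0
conf:
  neutron:
    service_plugins: router,lbaasv2
```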
15:13:23 <portdirect> on that ps as well while we are here
15:13:51 <srwilkers> tlam__: late to the party!
15:13:52 <portdirect> we will need to test it a bit - as lbaas agent with haproxy used to be prone to leaving zombie processes about
15:13:58 <tlam__> o/ sorry was running late
15:14:09 <portdirect> so we may need an init system in that pod to reap them...
15:14:11 <mattmceuen> (the PS by the way is https://review.openstack.org/#/c/522162/)
15:14:14 * portdirect finished rant
15:14:35 * srwilkers thinks the patchset needs more cowbell
15:14:41 <mattmceuen> shortest rant I've heard out of you yet portdirect, you've been refining your style
15:15:09 <mattmceuen> Yup Hyunsun affirmed the zombie apocalypse this morning
15:15:10 <portdirect> I just want LBaaS in :D its a great thing for us to use with magnum :)
15:15:31 <mattmceuen> amen
15:15:33 <portdirect> Kubernetes on Openstack on Kubernetes awaits :D
15:15:47 * srwilkers groans
15:15:50 <mattmceuen> turtles all the way down
15:15:53 * portdirect dares not abbreviate that.
15:16:07 <mattmceuen> Next:
15:16:14 <mattmceuen> #topic Fluentd Chart
15:16:18 <mattmceuen> Take it away srwilkers!
15:16:44 <srwilkers> the patchset in question is: https://review.openstack.org/#/c/519548/
15:17:12 <srwilkers> sungil and jayahn have done a great job at getting this work started, and i feel bad that it's moved destinations twice as we've worked to get osh-infra sorted
15:18:00 <srwilkers> i think the work's almost there, but it might need some tweaking to really shine.  i think the charts need to be separated to appropriately handle rbac for both services without getting too confusing
15:18:51 <mattmceuen> do we have jayahn?
15:18:54 <srwilkers> i also think the configuration files need to be defined in values.yaml to allow for customization of the filters and matches for complex use cases
15:18:56 <srwilkers> i dont think we do :(
15:19:00 <portdirect> how come split for rbac?
15:19:32 <portdirect> would it not just be a rbac-fluent.yaml, and rbac-fluentbit.yaml ?
15:19:54 <srwilkers> the helm-toolkit function names the entrypoints by release
15:19:59 <portdirect> totally agree on moving configs to values.
15:20:09 <srwilkers> so splitting them out in the way you mentioned results in duplicate names
15:20:28 <portdirect> but the entrypoint service account would be the same for both
15:21:26 <srwilkers> okay, thats a misunderstanding on my part then
15:21:55 <portdirect> though it does touch on tins rbac work - and how much simpler that will make things - can we add that to parking lot
15:22:27 <mattmceuen> yup
15:22:34 <mattmceuen> Can the values file configurability be done in a follow-on PS?
15:23:18 <srwilkers> it could be.  the prime value add there in my mind is that we could then configure fluentd to capture the logs running in the osh-infra gates
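Moving the fluentd configuration into values.yaml, as proposed above, could look roughly like the following. The structure is a sketch under assumed key names (not the merged chart's schema); the point is that operators can override sources, filters, and matches without patching templates:

```yaml
# Sketch: exposing the fluentd pipeline in values.yaml so filters and
# matches can be customized for complex use cases. Keys are illustrative.
conf:
  fluentd:
    template: |
      <source>
        @type forward
        port 24224
      </source>
      <filter kube.**>
        @type kubernetes_metadata
      </filter>
      <match **>
        @type elasticsearch
        host elasticsearch-logging
        port 9200
      </match>
```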
15:24:21 <mattmceuen> Cool - I'm  looking forward to getting the great work to date merged if possible
15:24:54 <mattmceuen> So where did you land srwilkers - do you think we need to split the fluentd chart after all?
15:26:30 <srwilkers> its my opinion that it'd make things cleaner and i dont think the collector and aggregator need to be coupled in the same chart, but thats just my opinion
15:26:40 <srwilkers> im not entirely stuck on it
15:27:36 <mattmceuen> Would that be overly difficult to change later if we went down the single-chart path today?
15:27:40 <srwilkers> nah
15:28:04 <MarkBaker> o/
15:28:10 <srwilkers> hey MarkBaker
15:28:15 <mattmceuen> awesome - I am in git'er merged mode as the holidays approach :-D
15:28:21 <mattmceuen> GM MarkBaker!
15:28:36 <srwilkers> let me make sure nothing else needs to be cleaned up in that patchset then should be good to go
15:28:38 <alanmeadows> Egg Nog in one hand, +2 mouse in the other -- sounds dangerous.
15:28:56 <mattmceuen> alanmeadows same hand
15:29:03 <alanmeadows> nice
15:29:08 <portdirect> why does `merica not understand the benefits of mulled wine?
15:29:21 <mattmceuen> sounds like a cultural learning opportunity
15:29:24 <mattmceuen> srwilkers you keep the talking stick
15:29:28 <MarkBaker> alanmeadows, drinks egg nogs that require 2 hands?
15:29:29 <mattmceuen> #topic Prometheus 2.0
15:29:51 <mattmceuen> alanmeadows is a legit pro at egg nog
15:29:58 <alanmeadows> It comes in a stein
15:30:07 <srwilkers> so prometheus 2.0 was released a bit ago.  it brought some benefits im happy to see
15:30:21 <srwilkers> the storage layer was drastically reworked to improve performance and reduce resource consumption
15:31:00 <srwilkers> it also changed the rules format from gotpl to yaml, which makes me especially happy
15:31:21 * portdirect does happy dance
15:31:37 <srwilkers> ive got a patchset to change the prometheus chart in osh-infra to use prometheus 2.0 by default
15:32:00 <srwilkers> there are a few other items i want to get merged first before looking to merge it, but it works currently
15:32:14 <alanmeadows> That would be fantastic, one primary concern surrounding prometheus up until this point was its resource consumption
15:32:19 <srwilkers> one of the new storage features added was the ability to snapshot the time series database
15:32:49 <srwilkers> alanmeadows: yeah, i've had a few instances running at home and it wasnt uncommon for prometheus to fall over after chewing through resources
15:33:28 <srwilkers> i was curious if there was appetite for including a cron job in the prometheus chart for snapshotting the database at configured intervals
15:35:15 <portdirect> srwilkers: what would the objective of the cron job be? backup?
15:35:23 <alanmeadows> Beyond that, we should think about how we might trigger that action as well, and how we might apply the same approach to things like mariadb - preupgrade actions across all of these data warehouses
15:35:29 <srwilkers> portdirect: yep
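For reference, Prometheus 2.0 exposes snapshotting through its admin API (the server must be started with --web.enable-admin-api; snapshots are written under the data directory's snapshots/ subdirectory). A backup cron job as discussed could be as simple as the sketch below; names, image, and schedule are illustrative:

```yaml
# Sketch: CronJob that triggers a TSDB snapshot via the Prometheus 2.0
# admin API at a configured interval. Requires --web.enable-admin-api.
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: prometheus-tsdb-snapshot
spec:
  schedule: "0 */6 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: snapshot
              image: docker.io/byrnedo/alpine-curl:latest
              args:
                - -XPOST
                - http://prometheus.monitoring.svc:9090/api/v1/admin/tsdb/snapshot
```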
15:35:48 <mattmceuen> so prometheus can have multiple servers replicating the same data in case one goes down.  Would we be using it that way?
15:35:51 <srwilkers> alanmeadows: also agree
15:35:54 <portdirect> we could really do with a `helm fire-hook foo`
15:36:02 <portdirect> that operates the same way test does
15:36:09 <alanmeadows> yes
15:36:21 <alanmeadows> really the ask would just be make 'test' arbitrary
15:37:01 <portdirect> should we look into the feasibility of making a ps for that?
15:37:17 <portdirect> ohh github - s/ps/pr
15:37:22 <mattmceuen> I like that idea
15:37:23 <srwilkers> i think that'd make sense
15:37:26 <alanmeadows> It satisfies two outstanding asks, being able to break tests apart into impacting vs non-impacting
15:37:37 <alanmeadows> and arbitrary actions like backups/snapshots/reversions/... ?
15:38:14 <portdirect> i think so
15:38:28 <portdirect> give us a new hammer, and we'll find nails...
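The "make test arbitrary" idea can be pictured against today's Helm mechanics: a backup action declared as a test-style hook, fired on demand. The `helm fire-hook` subcommand discussed above is hypothetical (only `helm test` exists today); the pod below is a sketch of the pattern it would generalize:

```yaml
# Sketch: a pod annotated as a helm test hook, used as an on-demand
# backup action via `helm test <release>`. A hypothetical
# `helm fire-hook backup <release>` would make the hook name arbitrary.
apiVersion: v1
kind: Pod
metadata:
  name: "{{ .Release.Name }}-mariadb-backup"
  annotations:
    helm.sh/hook: test-success
spec:
  restartPolicy: Never
  containers:
    - name: backup
      image: docker.io/mariadb:10.1
      command: ["/bin/sh", "-c", "mysqldump --all-databases > /backup/dump.sql"]
```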
15:38:45 <srwilkers> :)
15:38:52 <alanmeadows> or just hit things
15:38:59 <srwilkers> that too
15:39:06 <mattmceuen> Any other prom bits you want to cover now srwilkers?  I'm looking fw to 2.0
15:39:08 <srwilkers> but that concludes my points there
15:39:12 <srwilkers> nope, thats it for me
15:39:30 <mattmceuen> cool.  portdirect get ready
15:39:44 <mattmceuen> #topic The Future of Ceph!
15:39:58 <alanmeadows> Is this, the Ceph of Tomorrow, Today?
15:40:10 <mattmceuen> give us a glimpse of this amazing future, technologist
15:40:17 <portdirect> its the ceph of the future, tomorrow.
15:40:28 * alanmeadows sips some nog.
15:40:44 <portdirect> at kubecon i had some good chats with the ceph peeps re the various ceph-chart efforts
15:41:14 <portdirect> and I think (ok hope) that we all have the same desire for there to be one well maintained chart that deploys ceph
15:41:29 <portdirect> rather than the 3 or so versions I know of today.
15:41:51 <mattmceuen> variety is the spice of maintenance
15:42:00 <portdirect> the chart used by ceph/ceph-helm is actually a fork of ours, which in turn is a chartified version of seb-hans work
15:42:40 <portdirect> I put a summary of the steps that we hashed out to get to a single chart in the etherpad
15:42:54 <portdirect> for the sake of meeting logging I'll paste them here
15:44:14 <portdirect> As ceph goes much further than just OpenStack, it makes sense for this to be hosted either by Ceph, or in K8s/charts
15:44:15 <portdirect> ceph/ceph-helm is based on the osh ceph chart from approx 3 months ago
15:44:15 <portdirect> We met with the ceph maintainers (core team) at kubecon and discussed their desires/issues with both of our charts and came up with the following proposals:
15:44:15 <portdirect> 1) Split Keystone endpoint creation out of the ceph chart and into its own thing (that would live in OSH)
15:44:15 <portdirect> 2) Merge the healthchecks from OSH into Ceph-Helm
15:44:15 <portdirect> 3) Merge the luminous support from Ceph-Helm into OSH
15:44:15 <portdirect> 4) Update the loopback device creation scripts from bash to ansible
15:44:16 <portdirect> 5) Combine the disc targeting efforts from both OSH and Ceph-Helm into a single effort that brings the reliability of RH's approach together with the OSD-by-bus-id work from OSH
15:44:16 <portdirect> 6) The Ceph-Helm chart will then be moved/mirrored to k8s/charts
15:44:17 <portdirect> 7) At this point, add OSH gates to experimentally use the Ceph-Helm chart
15:44:17 <portdirect> 8) Once stabilised and we have confidence, deprecate the OSH ceph chart
15:44:45 <portdirect> the order is obviously somewhat flexible - but as a general outline how does this seem?
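Step 4 above (moving loopback device creation from bash to ansible) might look roughly like the tasks below; the file path, size, and variable name are illustrative assumptions:

```yaml
# Sketch: ansible tasks creating a loopback block device for OSD gate
# testing. Path and size are illustrative.
- name: create sparse backing file for the OSD
  command: truncate -s 10G /var/lib/openstack-helm/ceph-osd.img
  args:
    creates: /var/lib/openstack-helm/ceph-osd.img

- name: attach backing file to a free loop device
  command: losetup --find --show /var/lib/openstack-helm/ceph-osd.img
  register: osd_loop_device
```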
15:47:16 <mattmceuen> digesting...
15:47:23 <alanmeadows> What is the destination, for example in #2 -- ceph/ceph-helm or K8s/charts?
15:47:45 <portdirect> ceph/ceph-helm
15:48:08 <alanmeadows> is this mishmash of combining in various targets before aligning on one target because this spans a large period of time?
15:48:12 <portdirect> and then once the majority of big changes are done we move to k8s/charts
15:48:36 <portdirect> i would like us at 7 by eoy
15:48:49 <alanmeadows> i.e. #2 does work in ceph-helm, #3 in osh
15:48:50 <portdirect> and 8 in the first two weeks of next
15:49:34 <portdirect> yup - I have merge rights in ceph/ceph-helm to facilitate this moving faster
15:49:44 <jayahn> Hi late
15:49:51 <mattmceuen> hey jayahn!
15:50:32 <mattmceuen> portdirect:  s/disc/disk/ and I then I like the plan
15:50:52 <srwilkers> hey jayahn
15:50:55 <jayahn> Just fell asleep. :)
15:51:16 <jayahn> While waiting for the meeting
15:51:38 <srwilkers> just curious portdirect, as i havent paid much attention to the ceph work.  does the luminous support include enabling the built-in prometheus metrics exporter via ceph-mgr?
15:52:08 <srwilkers> as that makes the ceph-exporter work something we can drop once that's accomplished i think
15:53:17 <portdirect> srwilkers: it does :D
15:53:27 <srwilkers> nice :)
15:54:00 <mattmceuen> I think your plan is the plan portdirect, unless there are any other thoughts
15:54:37 <alanmeadows> It gets us to a unified chart the community owns, I'm all good
15:54:43 <mattmceuen> t minus 5 mins
15:55:22 <mattmceuen> and still agenda items - may have to punt till next week.  alanmeadows, will yours fit in 5?
15:55:25 <jayahn> unified ceph chart. sounds really good to me
15:56:37 <portdirect> if we can fit in alanmeadows's topic that would be great
15:56:43 <mattmceuen> #topic Holistic etcd approach
15:57:19 <mattmceuen> to quote alanmeadows:     Holistic etcd approach
15:57:19 <mattmceuen> Various charts trying to use etcd, can we (and should we) unify an approach, or let etcds sprinkle the cloud?
15:57:19 <mattmceuen> e.g. https://review.openstack.org/#/c/525752/
15:57:19 <mattmceuen> Rabbit would likely follow in approach at some point
15:57:19 <mattmceuen> Calico ....
15:57:37 <alanmeadows> I see a few different etcds popping up
15:58:12 <alanmeadows> This seems like we need to tackle this one if nothing else to be cognizant of what we're doing
15:59:03 <portdirect> agreed - I'd like us to get a solid etcd chart that we can use
15:59:04 <mattmceuen> Start with a spec of one etcd chart to rule them all?
15:59:29 <alanmeadows> I think so, with a few harder needs in mind
15:59:40 <alanmeadows> not just resiliency but backups, disaster recovery, and so on
15:59:44 <mattmceuen> Let's let this marinate and continue to discuss next time - we're out of time friends
15:59:51 <mattmceuen> thanks everyone
16:00:07 <mattmceuen> see y'all in the #openstack-helm !
16:00:10 <mattmceuen> #endmeeting