14:07:23 <slagle> #startmeeting TripleO Edge Squad Meeting
14:07:24 <openstack> Meeting started Thu Jan 10 14:07:23 2019 UTC and is due to finish in 60 minutes.  The chair is slagle. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:07:25 <slagle> ping slagle, csatari, jaosorior, owalsh, fultonj, gfidente, hjensas, jtomasek, bogdando, dtantsur, rbrady, d0ugal, toure, abishop
14:07:26 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:07:27 <slagle> #info remove or update your nick from the Meeting Template on the etherpad if you want (or don't want) to be ping'd for the start of the meeting
14:07:29 <openstack> The meeting name has been set to 'tripleo_edge_squad_meeting'
14:07:31 <dtantsur> \o
14:07:32 <jaosorior> o/
14:07:33 <slagle> #link https://etherpad.openstack.org/p/tripleo-edge-squad-status
14:07:34 <gfidente> o/
14:07:35 <d0ugal> o/
14:07:39 <fultonj> o/
14:07:48 <slagle> hello!
14:07:52 <bogdando> o/
14:08:03 <slagle> Anyone can use the #link, #action, #help, #idea, #agreed, and #info commands, not just the moderator!
14:08:05 <slagle> #topic Agenda
14:08:07 <slagle> * Review past action items
14:08:09 <slagle> * Goals/action items for the week
14:08:10 <csatari> o/
14:08:31 <abishop> o/
14:08:41 <slagle> #topic Review past action items
14:08:53 <slagle> (sorry for the late start btw)
14:09:10 <d0ugal> np
14:09:17 * slagle fultonj/slagle get help (perhaps from ci team) for ci inventory for second node for https://review.openstack.org/#/q/topic:bp/split-controlplane-ci (slagle, 14:13:30)
14:09:26 <fultonj> progress!
14:09:33 <fultonj> waiting for results of recheck of...
14:09:36 <fultonj> #link https://review.openstack.org/#/c/615988/
14:09:41 <fultonj> (15 hours!)
14:10:06 <fultonj> thanks to bogdando slagle and panda for help with improvements of...
14:10:09 <fultonj> #link https://review.openstack.org/#/c/617368/
14:10:29 <owalsh> o/
14:10:33 <slagle> np. it's looking pretty good
14:10:34 <fultonj> work should continue but i'll be in brno next week
14:10:57 <toure> o/
14:11:19 <slagle> fultonj: did you double check that 615988 is in the zuul queue?
14:11:23 <fultonj> so the multinode job is triggering both standalone deployments, but they're failing and the coordination (including sharing of control plane data) needs improvement
14:11:34 <fultonj> slagle: i don't see it in zuul queue
14:11:44 <fultonj> suppose i could recheck again
14:11:57 <slagle> fultonj: i think it needs to be rebased to clear the zuul merge conflict
14:12:07 <slagle> should just be able to do it from the gerrit ui
14:12:16 <openstackgerrit> John Fulton proposed openstack-infra/tripleo-ci master: WIP/DNM: Add split-controlplane standalone non-voting job definition  https://review.openstack.org/615988
14:12:27 <fultonj> rebased
14:12:48 <slagle> ok i see it in check now
14:12:53 <slagle> thanks!
14:13:05 <slagle> will continue to keep an eye on it
14:13:21 <fultonj> thanks
14:13:40 * slagle slagle send scaling doc to ML
14:13:47 <openstackgerrit> Dougal Matthews proposed openstack/tripleo-common master: Workflow for provisioning bare metals without Nova  https://review.openstack.org/576856
14:14:03 <slagle> i don't think i did that...
14:14:29 <fultonj> i read scaling doc, looked good. many many parts.
14:14:58 <slagle> ok i'll send it out. i don't see any further comments on it
14:15:14 <slagle> #action slagle send scaling doc to ML
14:15:26 * slagle owalsh continue investigating nova AZ config
14:15:52 <owalsh> slagle: working on it right now
14:16:04 <slagle> cool thanks!
14:16:10 <fultonj> owalsh nice work on #link https://review.gerrithub.io/c/redhat-openstack/infrared/+/435504
14:17:15 <owalsh> fultonj: thanks, not sure how accurate the traffic shaping is but it gets us something resembling a DCN deployment
14:17:52 <owalsh> must look at routed networks next
14:19:01 * slagle slagle document the split-controlplane deployment from a central undercloud (not using standalone)
14:19:01 <fultonj> has anyone tried split control plane deployments outside of standalone and gotten multi-ceph?
14:19:22 <slagle> i didn't get to the docs but will try again this week
14:19:42 <slagle> fultonj: i could try that in my environment. multiple ceph clusters you mean? 1 per DCN?
14:19:49 <fultonj> slagle: yes
14:20:02 <fultonj> e.g. do two DCNs and put a ceph on each of them
14:20:07 <openstackgerrit> Merged openstack/tripleo-validations master: Fix ansible lint errors  https://review.openstack.org/628953
14:20:13 <slagle> ok, i was running through my environment for the docs, so i can try that
14:20:26 <fultonj> i'd like to try it too
14:20:34 <vkmc> EmilienM, hey o/ can we merge this? https://review.openstack.org/#/c/629485/1 and https://review.openstack.org/#/c/629484/
14:20:38 <openstackgerrit> Gael Chamoulaud proposed openstack/tripleo-validations stable/rocky: Fix ansible lint errors  https://review.openstack.org/629871
14:20:38 <slagle> #action slagle document the split-controlplane deployment from a central undercloud (not using standalone)
14:20:43 <fultonj> slagle: if you have a wip of your doc i could try with 3 VMs
14:20:53 <slagle> #action slagle/fultonj try multi-ceph deployments with DCN
14:20:56 <openstackgerrit> Quique Llorente proposed openstack/tripleo-quickstart-extras master: Rewrite nodepool-setup for the zuul repro  https://review.openstack.org/629872
14:21:00 <slagle> fultonj: will hopefully have something soon
14:21:08 <openstackgerrit> Gael Chamoulaud proposed openstack/tripleo-validations master: Update the bugs link to storyboard  https://review.openstack.org/627838
14:21:20 * slagle slagle update roles patch https://review.openstack.org/#/c/626563/
14:21:21 <EmilienM> vkmc: maybe :P
14:21:40 <openstackgerrit> Gael Chamoulaud proposed openstack/tripleo-validations master: Fix the misspelling of "interface"  https://review.openstack.org/628845
14:21:46 <fultonj> any thoughts on putting glance api into roles patch?
14:21:47 * vkmc sends alfajores to EmilienM
14:21:49 <vkmc> what about now?
14:21:53 * EmilienM +A
14:21:56 <fultonj> slagle abishop gfidente ^ ?
14:22:34 <fultonj> looks like omit for now
14:22:39 <fultonj> and add later
14:22:47 <fultonj> i guess that's all there is to say
14:22:49 <fultonj> for now
14:22:54 <abishop> fultonj: my comments are noted in https://review.openstack.org/626563
14:23:01 <gfidente> fultonj abishop I realized we might have a problem there where we need CephExternal role for GlanceApi
14:23:10 <slagle> yea, was just catching up on the comments about GlanceAPI on the review from this morning
14:23:20 <gfidente> for glanceapi only I mean
14:23:48 <gfidente> because cinder or nova instead won't use CephExternal but the other Ceph services as they need to be pointed to the local Ceph cluster
14:23:49 <slagle> abishop: do you want to document somewhere how you got this to work?
14:23:55 <abishop> gfidente: yeah, this needs more thought, and probably input from glance folks
14:24:05 <slagle> just your poc i mean. so other folks can try it
14:24:14 <owalsh> we don't want this on all distributed computes though, *maybe* on the HCI nodes
14:25:36 <openstackgerrit> Quique Llorente proposed openstack/tripleo-quickstart-extras master: Rewrite nodepool-setup for the zuul repro  https://review.openstack.org/629872
14:25:38 <fultonj> so ceph is there but doesn't get configured unless you -e environments/ceph-ansible/...
14:25:43 <abishop> slagle: can do that, but I'm pondering the best place/method for describing what I did (could just add doc notes to my PoC?)
14:25:57 <slagle> abishop: sure, just etherpad or whereever
14:26:00 <owalsh> but for larger sites probably need an "edge controller" role that includes glance api cache, nova-conductor, mq, and db
14:26:36 <vkmc> EmilienM++
14:26:38 <abishop> slagle: sure, will put together something, which we can move to a better spot as necessary
14:26:53 <slagle> thanks
14:27:02 <fultonj> owalsh: ack, "edge controller" role would be a new thing
14:27:11 <gfidente> slagle abishop fultonj basically what is concerning me is that we need both Ceph and CephExternal services enabled on the same role
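The colocation gfidente is describing would mean a single roles_data.yaml role carrying both the local Ceph services and CephExternal, with only glance consuming the external (central) cluster. A hypothetical fragment to illustrate the shape of the problem; the role name and exact service list are assumptions, not an existing role:

    - name: DistributedComputeHCI            # hypothetical role name
      ServicesDefault:
        - OS::TripleO::Services::NovaCompute
        - OS::TripleO::Services::CinderVolume
        - OS::TripleO::Services::CephMon       # local edge Ceph cluster
        - OS::TripleO::Services::CephOSD
        - OS::TripleO::Services::CephClient
        - OS::TripleO::Services::GlanceApi
        - OS::TripleO::Services::CephExternal  # central Ceph, needed only for glance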
14:27:18 <slagle> #action abishop document GlanceAPI at the edge for DCN
14:27:23 <abishop> and I don't think we want to introduce another db
14:27:31 <owalsh> abishop: that's for multicell
14:27:46 <abishop> owalsh: ah
14:28:24 <owalsh> could have another role too with just the glance api cache, e.g. for sites with limited bandwidth (otherwise it's kinda pointless, not going to save much by caching the images for 3 nodes)
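A rough sketch of what owalsh's "edge controller" role could look like as a roles_data.yaml entry; the role name is hypothetical, only the glance API/cache, nova-conductor, messaging, and database pieces come from the description above, and the usual base services are omitted:

    - name: EdgeController                       # hypothetical role name
      description: Edge site controller for larger DCN sites (sketch)
      ServicesDefault:
        - OS::TripleO::Services::GlanceApi       # with caching enabled
        - OS::TripleO::Services::NovaConductor
        - OS::TripleO::Services::OsloMessagingRpc
        - OS::TripleO::Services::OsloMessagingNotify
        - OS::TripleO::Services::MySQL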
14:28:36 <fultonj> gfidente: can you elaborate on ceph-external and ceph-internal?
14:29:14 <abishop> fultonj: I'm guessing "ceph" for edge site, and "ceph-external" would be ceph at central site
14:29:18 <slagle> owalsh: does nova-compute not cache images on its own?
14:29:27 <owalsh> slagle: yea
14:29:28 <gfidente> fultonj so might end up in a situation where, if we colocate glance api with cinder-volume or nova-compute
14:29:36 <openstackgerrit> Gael Chamoulaud proposed openstack/tripleo-validations master: Fix the containerized location of ceilometer ceilometer.conf  https://review.openstack.org/625083
14:29:48 <gfidente> fultonj in which case glance needs to be pointed to the central ceph cluster, while cinder and nova do not
14:29:51 <abishop> slagle, owalsh: yeah, that would mitigate need for glance doing caching at edge
14:29:57 <owalsh> slagle: and if we bump the TCP windows then nova-compute *should* be able to download the images reasonably fast over high latency
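One possible way to bump the TCP window sizes owalsh mentions is TripleO's ExtraSysctlSettings parameter; the parameter is real, but the specific values below are illustrative assumptions rather than figures from the discussion:

    parameter_defaults:
      ExtraSysctlSettings:
        net.core.rmem_max:
          value: 16777216
        net.core.wmem_max:
          value: 16777216
        net.ipv4.tcp_rmem:
          value: "4096 87380 16777216"
        net.ipv4.tcp_wmem:
          value: "4096 65536 16777216"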
14:30:10 <fultonj> glance -> swift | nova/cinder -> ceph
14:30:23 <gfidente> owalsh is nova compute keeping the cached images after guests using such an image are deleted?
14:30:27 <fultonj> glance caches to ephemeral local disk on edge
14:30:47 <gfidente> fultonj yeah if glance in the edge goes to swift, then yes it all works
14:30:55 <owalsh> gfidente: not sure why it cleans up the images, I'll check
14:30:57 <gfidente> fultonj the problem is if in central site we use ceph as well
14:31:02 <abishop> fultonj: ack, that's what I do in my PoC (glance uses swift, which is enabled in central site)
14:31:06 <owalsh> s/why/when
14:31:13 <gfidente> abishop that explains it then
14:31:21 <gfidente> abishop glance/swift in central
14:31:27 <fultonj> in addition to it being in Alan's PoC, that setup is documented
14:31:31 <fultonj> #link https://docs.openstack.org/tripleo-docs/latest/install/containers_deployment/standalone.html#example-2-nodes-2-nic-using-remote-compute-with-tenant-and-provider-networks
14:31:39 <fultonj> (which was one of the AIs)
14:32:51 <fultonj> e.g. "GlanceBackend: swift | GlanceCacheEnabled: true"
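As an environment file, the edge glance settings fultonj quotes would look roughly like this; GlanceBackend and GlanceCacheEnabled come straight from the line above, the cache size line is an assumed example:

    parameter_defaults:
      GlanceBackend: swift
      GlanceCacheEnabled: true
      # optional: cap the local cache (bytes); value here is an assumption
      GlanceImageCacheMaxSize: 10737418240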
14:33:00 <slagle> owalsh: can't you also pre-cache images for nova-compute?
14:33:13 <slagle> and tell it to keep them around
14:33:21 <owalsh> slagle: not sure... saw something about that but I couldn't see an implementation
14:33:33 <fultonj> slagle: so you're saying, why have glance cache if nova can cache?
14:33:43 <slagle> fultonj: kind of wondering :)
14:33:47 <fultonj> good question
14:33:59 <owalsh> fultonj: and why cache if downloading is fast (enough) and bandwidth isn't constrained
14:34:00 <slagle> https://blueprints.launchpad.net/nova/+spec/nova-image-cache-management-2
14:34:25 <slagle> 2012 :). probably done and since deprecated by now
14:34:29 <slagle> but we should look into it
14:34:39 <csatari> Here is a spec for the image caching in Glance: https://review.openstack.org/#/c/619638/
14:34:43 <fultonj> VNF images are big
14:34:47 <fultonj> tend to be
14:35:39 <owalsh> fultonj: cache still has to fetch it so it depends on how many compute nodes would be using it
14:35:47 <fultonj> if they break you want to bring them up quickly, so i heard people asking for a solution to that problem
14:36:00 <gfidente> slagle well first because every nova node has its own cache, second because I think it deletes cached images when all guests using them are gone
14:36:02 <fultonj> yes, our assumption is that one of them would be running
14:36:05 <fultonj> only the second one is fast
14:36:12 <fultonj> (to boot)
14:36:32 <slagle> gfidente: this spec/bp seems to indicate otherwise. but investigation is needed
14:37:16 <owalsh> not necessarily faster - e.g. if the center->edge link is 10gbit/s it might be the disk on the glance cache that limits it
14:37:31 <slagle> #action continue investigating glance and nova image caching status/options
14:37:32 <gfidente> slagle doesn't look like it ever got implemented
14:37:44 <owalsh> although assuming the central glance is busier, it gives us a way to scale out glance
14:38:07 <gfidente> owalsh yeah scales as well
14:38:25 <gfidente> owalsh slagle but we could consider it optional
14:38:57 <owalsh> ack, similar with nova cells which is why I'd combine the two in an edge controller role
14:38:58 <openstackgerrit> Merged openstack/tripleo-heat-templates master: Ensure /var/lib/iscsi actually exists before mounting it  https://review.openstack.org/629132
14:39:10 <slagle> gfidente: there is also remove_unused_base_images which is a boolean
14:39:24 <gfidente> slagle yeah that I saw
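For reference, remove_unused_base_images is a nova.conf option; a hedged sketch of setting it through role-specific ExtraConfig hieradata follows (the config group, hiera key, and age value are assumptions to illustrate, not a tested recipe):

    parameter_defaults:
      ComputeExtraConfig:
        nova::config::nova_config:
          DEFAULT/remove_unused_base_images:
            value: false      # keep cached base images on edge computes
          DEFAULT/remove_unused_original_minimum_age_seconds:
            value: 86400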
14:39:44 <slagle> ok, well if anyone has time to dig into this a bit, i added an action item above
14:39:46 <owalsh> and maybe an edge controller + compute role, and maybe just an edge glance cache + compute role, plus all of these with ceph too... but that's getting a bit ridiculous
14:40:21 <fultonj> owalsh: i don't think pulling from central will work in all cases so we should have some kind of localization answer for those who ask for it
14:41:10 <owalsh> fultonj: ack, definitely has some use cases
14:42:29 <slagle> #topic Goals/action items for the week
14:43:10 * fultonj will be out next week
14:43:21 <slagle> one thing i wanted to mention is that the CFP for the summit ends January 23
14:43:27 <slagle> ~2 weeks
14:43:36 <slagle> if we want to submit anything tripleo/edge related
14:44:05 <slagle> we could brainstorm some ideas on the squad etherpad
14:44:25 <openstackgerrit> Dan Prince proposed openstack/tripleo-heat-templates master: WIP: flatten the neutron service configurations  https://review.openstack.org/629664
14:45:45 <fultonj> slagle: split control plane deployment demo w/ multiceph for summit?
14:46:13 <slagle> fultonj: sure :)
14:47:49 <slagle> anything else?
14:48:26 <slagle> #endmeeting