14:07:23 #startmeeting TripleO Edge Squad Meeting
14:07:24 Meeting started Thu Jan 10 14:07:23 2019 UTC and is due to finish in 60 minutes. The chair is slagle. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:07:25 ping slagle, csatari, jaosorior, owalsh, fultonj, gfidente, hjensas, jtomasek, bogdando, dtantsur, rbrady, d0ugal, toure, abishop
14:07:26 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:07:27 #info remove or update your nick from the Meeting Template on the etherpad if you want (or don't want) to be ping'd for the start of the meeting
14:07:29 The meeting name has been set to 'tripleo_edge_squad_meeting'
14:07:31 \o
14:07:32 o/
14:07:33 #link https://etherpad.openstack.org/p/tripleo-edge-squad-status
14:07:34 o/
14:07:35 o/
14:07:39 o/
14:07:48 hello!
14:07:52 o/
14:08:03 Anyone can use the #link, #action, #help, #idea, #agreed, and #info commands, not just the moderator!
14:08:05 #topic Agenda
14:08:07 * Review past action items
14:08:09 * Goals/action items for the week
14:08:10 o/
14:08:31 o/
14:08:41 #topic Review past action items
14:08:53 (sorry for the late start btw)
14:09:10 np
14:09:17 * slagle fultonj/slagle get help (perhaps from ci team) for ci inventory for second node for https://review.openstack.org/#/q/topic:bp/split-controlplane-ci (slagle, 14:13:30)
14:09:26 progress!
14:09:33 waiting for results of recheck of...
14:09:36 #link https://review.openstack.org/#/c/615988/
14:09:41 (15 hours!)
14:10:06 thanks to bogdando slagle and panda for help with improvements of...
14:10:09 #link https://review.openstack.org/#/c/617368/
14:10:29 o/
14:10:33 np. it's looking pretty good
14:10:34 work should continue but i'll be in brno next week
14:10:57 o/
14:11:19 fultonj: did you double check that 615988 is in the zuul queue?
14:11:23 so multinode job is triggering both standalone deployments but they're failing and coordination (including sharing of control plane data) needs improvement
14:11:34 slagle: i don't see it in zuul queue
14:11:44 suppose i could recheck again
14:11:57 fultonj: i think it needs to be rebased to clear the zuul merge conflict
14:12:07 should just be able to do it from the gerrit ui
14:12:16 John Fulton proposed openstack-infra/tripleo-ci master: WIP/DNM: Add split-controlplane standalone non-voting job definition https://review.openstack.org/615988
14:12:27 rebased
14:12:48 ok i see it in check now
14:12:53 thanks!
14:13:05 will continue to keep an eye on it
14:13:21 thanks
14:13:40 * slagle slagle send scaling doc to ML
14:13:47 Dougal Matthews proposed openstack/tripleo-common master: Workflow for provisioning bare metals without Nova https://review.openstack.org/576856
14:14:03 i don't think i did that...
14:14:29 i read scaling doc, looked good. many many parts.
14:14:58 ok i'll send it out. i don't see any further comments on it
14:15:14 #action slagle send scaling doc to ML
14:15:26 * slagle owalsh continue investigating nova AZ config
14:15:52 slagle: working on it right now
14:16:04 cool thanks!
14:16:10 owalsh nice work on #link https://review.gerrithub.io/c/redhat-openstack/infrared/+/435504
14:17:15 fultonj: thanks, not sure how accurate the traffic shaping is but it gets us something resembling a DCN deployment
14:17:52 must look at routed networks next
14:19:01 * slagle slagle document the split-controlplane deployment from a central undercloud (not using standalone)
14:19:01 has anyone tried split control plane deployments outside of standalone and gotten multi-ceph?
14:19:22 i didn't get to the docs but will try again this week
14:19:42 fultonj: i could try that in my environment. multiple ceph clusters you mean? 1 per DCN?
14:19:49 slagle: yes
14:20:02 e.g. do two DCNs and put a ceph on one of each
14:20:07 Merged openstack/tripleo-validations master: Fix ansible lint errors https://review.openstack.org/628953
14:20:13 ok, i was running through my environment for the docs, so i can try that
14:20:26 i'd like to try it too
14:20:34 EmilienM, hey o/ can we merge this? https://review.openstack.org/#/c/629485/1 and https://review.openstack.org/#/c/629484/
14:20:38 Gael Chamoulaud proposed openstack/tripleo-validations stable/rocky: Fix ansible lint errors https://review.openstack.org/629871
14:20:38 #action slagle document the split-controlplane deployment from a central undercloud (not using standalone)
14:20:43 slagle: if you have a wip of your doc i could try with 3 VMs
14:20:53 #action slagle/fultonj try multi-ceph deployments with DCN
14:20:56 Quique Llorente proposed openstack/tripleo-quickstart-extras master: Rewrite nodepool-setup for the zuul repro https://review.openstack.org/629872
14:21:00 fultonj: will hopefully have something soon
14:21:08 Gael Chamoulaud proposed openstack/tripleo-validations master: Update the bugs link to storyboard https://review.openstack.org/627838
14:21:20 * slagle slagle update roles patch https://review.openstack.org/#/c/626563/
14:21:21 vkmc: maybe :P
14:21:40 Gael Chamoulaud proposed openstack/tripleo-validations master: Fix the misspelling of "interface" https://review.openstack.org/628845
14:21:46 any thoughts on putting glance api into roles patch?
14:21:47 * vkmc sends alfajores to EmilienM
14:21:49 what about now?
14:21:53 * EmilienM +A
14:21:56 slagle abishop gfidente ^ ?
14:22:34 looks like omit for now
14:22:39 and add later
14:22:47 i guess that's all there is to say
14:22:49 for now
14:22:54 fultonj: my comments are noted in https://review.openstack.org/626563
14:23:01 fultonj abishop I realized we might have a problem there where we need CephExternal role for GlanceApi
14:23:10 yea, was just catching up on the comments about GlanceAPI on the review from this morning
14:23:20 for glanceapi only I mean
14:23:48 because cinder or nova instead won't use CephExternal but the other Ceph services as they need to be pointed to the local Ceph cluster
14:23:49 abishop: do you want to document somewhere how you got this to work?
14:23:55 gfidente: yeah, this needs more thought, and probably input from glance folks
14:24:05 just your poc i mean. so other folks can try it
14:24:14 we don't want this on all distributed computes though, *maybe* on the HCI nodes
14:25:36 Quique Llorente proposed openstack/tripleo-quickstart-extras master: Rewrite nodepool-setup for the zuul repro https://review.openstack.org/629872
14:25:38 so ceph is there but doesn't get configured unless you -e environments/ceph-ansible/...
14:25:43 slagle: can do that, but I'm pondering the best place/method for describing what I did (could just add doc notes to my PoC?)
14:25:57 abishop: sure, just etherpad or wherever
14:26:00 but for larger sites probably need an "edge controller" role that includes glance api cache, nova-conductor, mq, and db
14:26:36 EmilienM++
14:26:38 slagle: sure, will put together something, which we can move to a better spot as necessary
14:26:53 thanks
14:27:02 owalsh: ack, "edge controller" role would be a new thing
14:27:11 slagle abishop fultonj basically what is concerning me is that we need both Ceph and CephExternal services enabled on the same role
14:27:18 #action abishop document GlanceAPI at the edge for DCN
14:27:23 and I don't think we want to introduce another db
14:27:31 abishop: that's for multicell
14:27:46 owalsh: ah
14:28:24 could have another role too with just the glance api cache, e.g. for sites with limited bandwidth (otherwise it's kinda pointless, not going to save much by caching the images for 3 nodes)
14:28:36 gfidente: can you elaborate on ceph-external and ceph-internal?
14:29:14 fultonj: I'm guessing "ceph" for edge site, and "ceph-external" would be ceph at central site
14:29:18 owalsh: does nova-compute not cache images on its own?
14:29:27 slagle: yea
14:29:28 fultonj so might end up in a situation where, if we colocate glance api with cinder-volume or nova-compute
14:29:36 Gael Chamoulaud proposed openstack/tripleo-validations master: Fix the containerized location of ceilometer ceilometer.conf https://review.openstack.org/625083
14:29:48 fultonj in which case glance needs to be pointed to the central ceph cluster, while cinder and nova do not
14:29:51 slagle, owalsh: yeah, that would mitigate need for glance doing caching at edge
14:29:57 slagle: and if we bump the TCP windows then nova-compute *should* be able to download the images reasonably fast over high latency
14:30:10 glance -> swift | nova/cinder -> ceph
14:30:23 owalsh is nova compute keeping the image cache after guests using such an image are deleted?
14:30:27 glance caches to ephemeral local disk on edge
14:30:47 fultonj yeah if glance in the edge goes to swift, then yes it all works
14:30:55 gfidente: not sure why it cleans up the images, I'll check
14:30:57 fultonj the problem is if in central site we use ceph as well
14:31:02 fultonj: ack, that's what I do in my PoC (glance uses swift, which is enabled in central site)
14:31:06 s/why/when
14:31:13 abishop that explains it then
14:31:21 abishop glance/swift in central
14:31:27 in addition to it being in Alan's POC, that setup is documented
14:31:31 #link https://docs.openstack.org/tripleo-docs/latest/install/containers_deployment/standalone.html#example-2-nodes-2-nic-using-remote-compute-with-tenant-and-provider-networks
14:31:39 (which was one of the AIs)
14:32:51 e.g. "GlanceBackend: swift | GlanceCacheEnabled: true"
14:33:00 owalsh: can't you also pre-cache images for nova-compute?
14:33:13 and tell it to keep them around
14:33:21 slagle: not sure... saw something about that but I couldn't see an implementation
14:33:33 slagle: so you're saying, why have glance cache if nova can cache?
14:33:43 fultonj: kind of wondering :)
14:33:47 good question
14:33:59 fultonj: and why cache if downloading is fast (enough) and bandwidth isn't constrained
14:34:00 https://blueprints.launchpad.net/nova/+spec/nova-image-cache-management-2
14:34:25 2012 :). probably done and since been deprecated by now
14:34:29 but we should look into it
14:34:39 Here is a spec for the image caching in Glance: https://review.openstack.org/#/c/619638/
14:34:43 VNF images are big
14:34:47 tend to be
14:35:39 fultonj: cache still has to fetch it so it depends on how many compute nodes would be using it
14:35:47 if they break you want to bring them up quickly, so i heard people asking for a solution to that problem
14:36:00 slagle well first because every nova node has its own cache, second because I think it deletes cached images when all guests using it are gone
14:36:02 yes, our assumption is that one of them would be running
14:36:05 only the second one is fast
14:36:12 (to boot)
14:36:32 gfidente: this spec/bp seems to indicate otherwise. but investigation is needed
14:37:16 not necessarily faster - e.g. if the center->edge link is 10gbit/s it might be the disk on the glance cache that limits it
14:37:31 #action continue investigating glance and nova image caching status/options
14:37:32 slagle doesn't look like it ever got implemented
14:37:44 although assuming the central glance is busier, it gives us a way to scale out glance
14:38:07 owalsh yeah scales as well
14:38:25 owalsh slagle but we could consider it optional
14:38:57 ack, similar with nova cells which is why I'd combine the two in an edge controller role
14:38:58 Merged openstack/tripleo-heat-templates master: Ensure /var/lib/iscsi actually exists before mounting it https://review.openstack.org/629132
14:39:10 gfidente: there is also remove_unused_base_images which is a boolean
14:39:24 slagle yeah that I saw
14:39:44 ok, well if anyone has time to dig into this a bit, i added an action item above
14:39:46 and maybe an edge controller + compute role, and maybe just an edge glance cache + compute role, plus all of these with ceph too... but that's getting a bit ridiculous
14:40:21 owalsh: i don't think pulling from central will work in all cases so we should have some kind of localization answer for those who ask for it
14:41:10 fultonj: ack, definitely has some use cases
14:42:29 #topic Goals/action items for the week
14:43:10 * fultonj will be out next week
14:43:21 one thing i wanted to mention is that the CFP for the summit ends January 23
14:43:27 ~2 weeks
14:43:36 if we want to submit anything tripleo/edge related
14:44:05 we could brainstorm some ideas on the squad etherpad
14:44:25 Dan Prince proposed openstack/tripleo-heat-templates master: WIP: flatten the neutron service configurations https://review.openstack.org/629664
14:45:45 slagle: split control plane deployment demo w/ multiceph for summit?
14:46:13 fultonj: sure :)
14:47:49 anything else?
14:48:26 #endmeeting