18:01:12 #startmeeting container-networking
18:01:14 Meeting started Thu Nov 19 18:01:12 2015 UTC and is due to finish in 60 minutes. The chair is daneyon_. Information about MeetBot at http://wiki.debian.org/MeetBot.
18:01:15 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
18:01:18 The meeting name has been set to 'container_networking'
18:01:21 Agenda
18:01:26 #link https://wiki.openstack.org/wiki/Meetings/Containers#Agenda
18:01:38 I'll give everyone a minute to review the agenda
18:02:17 #topic roll call
18:02:36 o/
18:02:38 o/
18:02:51 hi
18:03:17 Thank you hongbin Tango gus for joining
18:03:19 gus: Pretty early for you?
18:03:26 o/
18:03:40 Tango: yep, 5am - so use small words ;)
18:03:52 Thank you eghobo_ for joining
18:03:58 o/
18:03:58 #topic Flannel host-gw patch
18:04:03 #link https://review.openstack.org/#/c/241866/
18:04:09 skipped roll call?
18:04:20 oh, lagged
18:04:23 I think the community has settled on option 2 of the patch
18:04:49 which means the patch has to be refactored, since it implements option 3
18:05:11 Thank you adrian_otto for joining
18:05:17 So I tried out the host-gw backend on our atomic-5 image but couldn't get it to work.
18:05:29 No connectivity between the containers
18:05:39 anyone willing to refactor the host-gw patch to implement option 2?
18:05:46 Wonder if there is anything else we need to configure
18:06:21 daneyon_: I would be interested to carry it forward.
18:06:32 Tango you may need to open up the neutron anti-spoofing iptables rules (enabled by default)
18:06:47 Tango that's great
18:07:19 daneyon_: Does that prevent rules from being added to iptables? Because I saw no change in the NAT table.
18:07:21 Tango: yep, it needs to "spoof" IP, but not MAC.
18:07:29 If you haven't done so already, can you leave a comment in the review letting everyone know you are going to carry it forward using option 2?
18:07:49 daneyon_: Sure, will do that.
18:08:31 Tango I have yet to play with the host-gw mode, but I would not be surprised if neutron's anti-spoofing is blocking. Ping me offline to discuss further.
18:08:46 i'll move the agenda forward
18:08:48 daneyon_: ok, sounds good.
18:08:55 #topic Latest on Swarm CNM patches
18:08:59 #link https://review.openstack.org/#/c/245286/2
18:09:03 #link https://review.openstack.org/#/c/244978/
18:09:07 #link https://review.openstack.org/#/c/244848/
18:09:20 good news, 1 of the 3 merged... yay!!!!
18:09:30 and the other 2 are well on their way.
18:09:56 adrian_otto if you have time it would be much appreciated if you could review the final 2.
18:10:11 i have addressed comments from previous patch sets
18:10:16 and i think they are ready to go
18:10:34 otherwise we need another core to +2 and +1 workflow
18:10:57 then we'll have the flannel network-driver implemented for Swarm :-)
18:11:24 unless there are any questions, I'll move the agenda forward.
18:11:28 daneyon_: I will make time to review them, thanks
18:11:43 adrian_otto thank you sir.
18:11:46 #topic Flannel for Mesos Update
18:11:59 #link https://bugs.launchpad.net/magnum/+bug/1516767
18:11:59 Launchpad bug 1516767 in Magnum "Add etcd support in mesos bay" [Undecided,New]
18:12:12 * gus is new to this one and the bug doesn't give background - why are we considering flannel for mesos?
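Relating to the host-gw connectivity issue and the Neutron anti-spoofing point raised earlier in the log: a minimal sketch, assuming the openstacksdk client, a clouds.yaml cloud named "devstack", and a flannel CIDR of 10.100.0.0/16 (all illustrative, not from the meeting), of how the bay nodes' ports could be given allowed-address-pairs so container-subnet traffic is permitted without also allowing MAC spoofing.

    # Sketch only: relax Neutron anti-spoofing for the flannel host-gw backend
    # by adding the container CIDR as an allowed address pair on each node port.
    # Cloud name, server selection, and CIDR are assumptions for illustration.
    import openstack

    conn = openstack.connect(cloud="devstack")     # assumed clouds.yaml entry
    FLANNEL_CIDR = "10.100.0.0/16"                 # assumed flannel network CIDR

    for server in conn.compute.servers():          # in practice: only the bay nodes
        for port in conn.network.ports(device_id=server.id):
            pairs = list(port.allowed_address_pairs or [])
            pairs.append({"ip_address": FLANNEL_CIDR})
            # Permits "spoofed" source IPs from the container range while keeping
            # the port's MAC check intact (IP spoofing, but not MAC).
            conn.network.update_port(port, allowed_address_pairs=pairs)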
18:12:22 everyone take a moment to check out the link
18:13:02 hongbin and i discussed not implementing the flannel driver for mesos until etcd is supported and has a chance to land
18:13:21 Yes, it is
18:13:31 My understanding is that mesos uses host IPs (or things NATted to look like host IPs), and so doesn't need any additional routing help since the hosts already know how to reach each other.
18:13:32 otherwise we have to make considerable changes to how magnum implements k/v stores
18:14:10 gus, that is correct. mesos is using "legacy" docker networking... containers get nat'd by the host (vm in our case)
18:14:24 ok, so where does flannel come into it?
18:14:35 this mesos issue brings up an important point of the Magnum CNM...
18:14:58 Not every network-driver needs to be implemented across all bay types
18:15:32 I agree
18:15:38 right - I don't think there's any need for an additional network driver for mesos?
18:15:42 +1
18:15:45 If flannel doesn't fit into mesos, we don't have to support it
18:16:10 Some network-drivers may provide significant value to a particular bay type, but not to others. We should be supportive of this instead of trying to jam a particular networking implementation across all bay types
18:16:14 thoughts?
18:16:23 Agree
18:16:42 It may not make sense to fill out the whole matrix
18:17:05 I hadn't realised network drivers were being considered *across* bay types. Yes, I strongly suggest network drivers should only be relevant/useful within specific bay types.
18:17:39 gus just as with swarm, an operator/user may want to use flannel to realize the container network within the cluster... For example, if they do not want container IPs nat'd.
18:18:07 So that raises the question: how do we communicate which driver is supported where?
18:18:10 daneyon_: except then you aren't using mesos, right?
18:18:13 this is what the Magnum CNM is all about: providing a choice of container networking implementations, while using sensible defaults for simplicity
18:18:30 mesos does not yet support the Magnum CNM
18:19:06 we were planning on using flannel as the 1st implementation of the CNM for mesos, but then we hit the etcd issue
18:19:21 the idea is to have what we as a community agree is the best default choice for networking for each bay type, and a way that alternate choices can be selected when needs justify a departure from the default.
18:19:56 consistent application of one networking type for all bays is a non-goal.
18:20:04 this caused me to think about our general approach to implementing the CNM... in that every net-driver does not necessarily need to cover every bay type (e.g. mesos).
18:20:53 Tango a net-driver support matrix is also what hongbin and i discussed
18:21:13 any takers to create the matrix on the wiki?
18:21:38 daneyon_: You would be the best candidate :)
18:22:03 #action Tango to implement option 2 in the flannel host-gw patch https://review.openstack.org/#/c/241866/
18:22:14 Tango agreed
18:22:31 #action danehans to create a network-driver support matrix
18:23:40 gus the matrix would show N/A or "not supported" for flannel on mesos
18:23:56 daneyon_: yep, makes sense.
18:24:05 adrian_otto thx for sharing
18:24:24 so we are in agreement on this topic
18:24:35 thanks for the discussion.
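As a rough illustration of the network-driver support matrix action item above, a hypothetical sketch in Python; the bay types, driver names, and defaults are placeholders, not the matrix that will go on the wiki.

    # Hypothetical sketch of a per-bay-type network-driver support matrix.
    # Entries are illustrative placeholders, not the agreed Magnum matrix.
    SUPPORTED_NETWORK_DRIVERS = {
        "kubernetes": {"default": "flannel", "supported": ["flannel"]},
        "swarm":      {"default": "docker",  "supported": ["docker", "flannel"]},
        "mesos":      {"default": "docker",  "supported": ["docker"]},  # flannel N/A until etcd lands
    }


    def validate_network_driver(bay_type, driver):
        """Raise if the requested driver is not supported for the given bay type."""
        entry = SUPPORTED_NETWORK_DRIVERS.get(bay_type)
        if entry is None:
            raise ValueError("unknown bay type: %s" % bay_type)
        if driver not in entry["supported"]:
            raise ValueError("network driver '%s' is not supported for bay type '%s' "
                             "(supported: %s)" % (driver, bay_type,
                                                  ", ".join(entry["supported"])))


    validate_network_driver("swarm", "flannel")    # passes
    # validate_network_driver("mesos", "flannel")  # would raise: N/A cell in the matrix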
18:24:47 #topic Open Discussion
18:25:05 I have a topic to kick it off and it includes Tango
18:25:28 Tango ran into an issue that requires the flannel pkg to be upgraded
18:25:55 however our version of atomic (fc21) does not support the required flannel pkg.
18:26:44 A question: Why not put flannel in a container?
18:26:46 we can do 1 of 2 things: 1. live with the bug (vxlan does not work) or 2. update the image so we can support a newer version of atomic and the required flannel pkg
18:26:50 thoughts?
18:27:15 hongbin not a bad idea
18:27:23 has anyone tried running flannel in a container?
18:27:44 you'd need to use --net=host
18:27:49 in my experience working on the kolla project, some services are a PITA to containerize
18:28:12 it may be easier updating the atomic version, which i think needs to be done no matter what
18:28:16 hongbin: I am going to try it for one of my real deployments and will share the experience later
18:28:23 and containerizing flannel could be a follow-on
18:28:47 Some update: I saw kojipkgs has a new update for flannel. I tried building the packages but they all failed, even with the fc22 base.
18:28:47 eghobo_ that would be great to know
18:29:02 eghobo_: thx
18:29:09 I would urge us not to just containerize flannel as a point solution, but could we containerize the full scope of what we run on bay nodes?
18:29:23 it might be much easier to do image maintenance that way
18:29:30 i have a feeling that adrian_otto is correct about using --net=host, but that should not be an issue since the tenant owns the nova vm
18:30:09 it could potentially make bay nodes a bit more secure as well
18:30:19 adrian_otto agreed. and containerizing flannel would be just 1 of the work streams of the BP.
18:30:24 adrian_otto: we cannot put kubelet in a container yet
18:30:34 We have a BP to containerize k8s into containers
18:30:37 #link https://blueprints.launchpad.net/magnum/+spec/run-kube-as-container
18:30:37 adrian_otto: I've certainly run all these bits in the past in containers, with liberal use of --net=host and --privileged.
18:30:46 is anyone interested in creating this BP and kicking off the implementation?
18:31:07 eghobo_: I think it's possible with some elbow grease
18:31:42 Would the BP cover just flannel, or other services as well?
18:31:45 gus: I talked with Kube and Docker folks and volumes are not going to work
18:32:16 eghobo_: depending on where you mount things from, yes
18:32:17 Tango I think we just leverage the BP hongbin shared
18:32:17 as Tim K said, only 90% of functionality will work
18:33:00 I think it would be nice to expand the language in that BP, so it's not only k8s... but i'm not hung up on it.
18:33:03 eghobo_: do you know exactly what won't work?
18:33:32 it's kind of a question of what you actually _want_ to run on the outside - you could run an entire nested docker for example, but you probably don't want to.
18:33:59 as with the network-driver stuff, even if we can't containerize all services, let's containerize what makes sense.
18:34:26 +1
18:34:56 note flannel is also implemented in go, so "upgrading" flannel is about as easy as installing a container - you just need to install a single file.
18:35:04 does anyone want to lead the effort of containerizing the services, maybe starting off with flannel?
18:35:17 great, let's proceed on that track, and apply what we learn.
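A minimal sketch of the "flannel in a container" idea discussed above, using the Docker SDK for Python with the --net=host and --privileged points that came up. The image tag, binary path, etcd endpoint, and volume mount are assumptions for illustration, not a tested Magnum change.

    # Sketch only: run flanneld in a container with host networking and
    # privileges, per the discussion above. Image, flags, and paths are
    # illustrative assumptions.
    import docker

    client = docker.from_env()
    client.containers.run(
        "quay.io/coreos/flannel:v0.5.5",                    # hypothetical image/tag
        ["/opt/bin/flanneld", "--etcd-endpoints=http://127.0.0.1:2379"],
        name="flanneld",
        network_mode="host",     # flanneld must manage the host's network namespace
        privileged=True,         # needed to create routes / the vxlan device
        volumes={"/run/flannel": {"bind": "/run/flannel", "mode": "rw"}},  # subnet.env for docker
        detach=True,
        restart_policy={"Name": "always"},
    )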
18:35:33 daneyon_: wanghua is the owner of the BP
18:35:46 daneyon_: He said he was interested in working on that
18:36:13 #action danehans check with wanghua on implementation status of https://blueprints.launchpad.net/magnum/+spec/run-kube-as-container
18:37:28 i still think we need to update the atomic image to a newer release, but that's a bit outside the scope of networking.
18:38:06 any other discussion topics?
18:39:03 I am checking the logs for the failed flannel builds. We may need to get help on IRC.
18:39:40 gus that is correct re: the single flanneld bin. however, we have been using pkgs from the distro provider
18:40:33 yeah sure, just highlighting that we might not need to do that. We're only supporting a single architecture, so just downloading our own flannel binary somewhere is quite feasible.
18:40:47 Tango it was my understanding that the newer flannel src rpms will not build w/o going to atomic f22 or newer
18:41:10 daneyon_: I tried building it against fc22, but that also failed
18:41:25 oh, and re: running everything in containers: I think mesos slaves only have support for a single "special" docker container (--docker_mesos_image) - it will assume it can garbage collect all the others.
18:41:40 it seems like we are running an old-school version of atomic, and no matter what we do with flannel we should maintain the distro image
18:41:55 or drop it and just focus on coreos, ubuntu, etc.
18:42:34 gus: there is a bunch of images at https://hub.docker.com/u/mesoscloud/
18:42:57 but I have no idea whether they work or not :(
18:43:22 eghobo: yep, the mesos slave is usually just the single service. But if we start putting various cluster-admin pieces in containers that might no longer be true.
18:43:23 gus agreed. Since it's a broader discussion topic, we just need to address the design change (from pkgs to src) with the community.
18:44:30 i took an action to check with wanghua on the status of the BP. If it's stalled, we will need someone to drive the BP forward
18:44:56 or update the atomic image to f22, update the flannel pkg, etc.
18:45:34 or we just live with the bug :-( Not so great, since vxlan provides better performance than UDP and we don't have the host-gw option implemented yet.
18:45:48 I am following up on the image
18:45:56 thanks Tango
18:46:51 What's the appetite for using a complex neutron setup underneath?
18:47:06 gus: Can you elaborate?
18:47:13 i think whether we containerize or not, there will be some services (at least for the foreseeable future) that cannot be containerized, and we will have to support them directly on the OS.
18:47:20 I'm thinking of something like a route per host - ie: doing all of flannel's work in neutron itself.
18:47:35 (More like what k8s does on GCE)
18:48:02 Would that be a BP or patches done in neutron?
18:48:11 gus: every bay gets a neutron router and a shared network; a float is created so you can ssh to the nodes from outside
18:48:16 downside: In a typical neutron setup using linuxbridge or ovs (without DVR), it won't scale well in practice since the traffic would all go up through the network node.
18:48:16 i think it's pretty basic
18:48:47 we are using neutron lbaas, but that is needed for HA of the api, etcd, etc.
18:49:10 daneyon_: right, it's basic currently.
Given the questions raised in the host-gw discussion about scaling beyond a single L2 network, I was just thinking of what that future might look like.
18:49:52 gus i'm open to hearing other neutron network designs. Feel free to start the discussion on the ML
18:50:17 daneyon_: ack.
18:50:37 gus: Just to confirm: do you know kuryr?
18:50:45 hongbin: yes.
18:50:55 note: we also need to communicate with the kuryr team.
18:51:02 gus: And we plan to have a kuryr net-driver
18:51:07 so we are in sync
18:51:26 any other discussion?
18:51:38 if not i'll close the meeting
18:51:54 hongbin: right - if it's just a matter of routes, we can probably assemble it using the current neutron api - but it would have some serious implications depending on how neutron itself was configured.
18:52:08 daneyon_: (nothing else from me)
18:52:15 ok
18:52:29 thanks everyone for joining and for the great discussion
18:52:38 #endmeeting
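For reference on the route-per-host idea floated above ("doing all of flannel's work in neutron itself", assembled with the current neutron API), a rough sketch using openstacksdk. The subnet name and the node-to-container-subnet mapping are made-up examples, and DHCP-delivered host routes only reach instances when they refresh their leases; this is an illustration of the approach, not a proposed patch.

    # Sketch only: publish per-node container subnets as Neutron host routes on
    # the bay subnet, instead of having flanneld program routes on each node.
    # Cloud name, subnet name, and the node -> subnet mapping are illustrative.
    import openstack

    conn = openstack.connect(cloud="devstack")                 # assumed clouds.yaml entry
    bay_subnet = conn.network.find_subnet("bay-fixed-subnet")  # assumed subnet name

    # node IP -> that node's container subnet (what flannel host-gw would allocate)
    node_subnets = {
        "10.0.0.4": "10.100.10.0/24",
        "10.0.0.5": "10.100.20.0/24",
    }

    routes = [{"destination": cidr, "nexthop": node_ip}
              for node_ip, cidr in node_subnets.items()]

    # Delivered to instances via DHCP option 121; no per-node flanneld required,
    # but it inherits whatever scaling limits the underlying Neutron setup has.
    conn.network.update_subnet(bay_subnet, host_routes=routes)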