18:01:37 <daneyon_> #startmeeting container-networking
18:01:38 <openstack> Meeting started Thu Jan  7 18:01:37 2016 UTC and is due to finish in 60 minutes.  The chair is daneyon_. Information about MeetBot at http://wiki.debian.org/MeetBot.
18:01:39 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
18:01:42 <openstack> The meeting name has been set to 'container_networking'
18:01:46 <daneyon_> Agenda
18:01:51 <daneyon_> #link https://wiki.openstack.org/wiki/Meetings/Containers#Agenda
18:02:02 <daneyon_> i'll give everyone a few minutes to review the agenda.
18:02:08 <daneyon_> #topic roll call
18:02:15 <dane_leblanc_> o/
18:02:34 <Tango> o/
18:03:15 <daneyon_> Thank you dane_leblanc_ Tango for joining.
18:03:30 <dane_leblanc_> daneyon: The agenda link is for containers, not container networking?
18:04:29 <daneyon_> you will need to scroll down to hit the subteam agenda
18:04:29 <dane_leblanc_> #link https://wiki.openstack.org/wiki/Meetings/Containers#Container_Networking_Subteam_Meeting
18:04:35 <daneyon_> that's it, thx
18:05:22 <daneyon_> #topic Flannel host-gw patch
18:05:29 <daneyon_> #link https://review.openstack.org/#/c/241866/
18:06:09 <Tango> So I got to spend some more time on this in December before the break
18:06:11 <daneyon_> Tango have you been able to spend any time on this patch?
18:06:26 <daneyon_> how is it coming along?
18:06:46 <Tango> I debugged and found the problem:  an extra route on the local host causing the response to be trapped
18:07:06 <Tango> So the workaround is to delete this extra route, and everything works very well
18:07:28 <Tango> I notified Angus, he is looking into replicating the problem to fix the code
18:07:38 <daneyon_> so basically each flanneld host creates a route to other flanneld hosts for the flannel subnet that sits behind the host, correct?
18:08:05 <Tango> Yes, there is one route in the routing table for each other flannel host
18:08:20 <Tango> this allows the packet to reach the target host without encapsulation
18:08:28 <daneyon_> right
18:08:46 <Tango> The performance looks very good:  on a 10 GBits/sec network, I am getting 6 GBits/sec
18:08:50 <Tango> with hostgw
18:09:02 <daneyon_> so adding routes to the other flannel subnets with the next-hop of each flannel host would be expected.
18:09:13 <daneyon_> is the route you're referring to some other route that is being added?
18:09:25 <daneyon_> nice!
18:09:30 <Tango> For vxlan, I get 1.7 Gbits/sec
18:09:42 <dane_leblanc_> Wow, significant difference
18:09:52 <daneyon_> i would expect a big difference
18:09:54 <Tango> And for udp, it's pretty bad:  0.385 GBits/sec
18:10:14 <dane_leblanc_> Is this for packets of 500 bytes or so?
18:10:14 <Tango> So this does answer the cost of encapsulation
18:10:22 <hongbin> o/
18:10:30 <Tango> The MTU is set at 1500
18:10:42 <Tango> I use iperf3 to run the test
18:10:50 <daneyon_> i think the flannel vxlan will be a much better option when magnum has support for ironic hosts and the cloud provider has vxlan hw in the ironic hosts
18:10:57 <eghobo> Tango: we got the same numbers at our infra ;)
18:11:14 <daneyon_> hi hongbin thanks for joining
18:11:24 <Tango> eghobo: great, correlation
18:11:47 <daneyon_> eghobo for vxlan, hostgw mode or both?
18:12:13 <eghobo> udp, vxlan
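The iperf3 figures Tango quotes above (10 Gbits/sec link, MTU 1500) can be put side by side with a little arithmetic. This is only a sketch: the numbers are the ones reported in this log, while the function and constant names are made up for illustration.

```python
# Throughput figures as reported in the meeting (Gbits/sec, iperf3, MTU 1500).
LINK_GBITS = 10.0
MEASURED_GBITS = {"host-gw": 6.0, "vxlan": 1.7, "udp": 0.385}


def link_utilization(measured, link):
    """Fraction of the raw link rate that each backend achieved."""
    return {backend: rate / link for backend, rate in measured.items()}


def speedup_over(measured, baseline):
    """Throughput of each backend relative to the chosen baseline backend."""
    base = measured[baseline]
    return {backend: rate / base for backend, rate in measured.items()}


if __name__ == "__main__":
    for backend, share in link_utilization(MEASURED_GBITS, LINK_GBITS).items():
        print(f"{backend}: {share:.1%} of the link")
    for backend, factor in speedup_over(MEASURED_GBITS, "udp").items():
        print(f"{backend}: {factor:.1f}x the udp backend")
```

The gap between host-gw and the two encapsulating backends is the cost of encapsulation that Tango refers to.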
18:12:25 <Tango> So I am picking up the patch that Angus started, with the work around for now until the bug is fixed in Flannel
18:12:46 <daneyon_> Tango could you push the latest patch set so others can test drive?
18:13:01 <eghobo> i need OpenStack changes for hostgw, but no changes are allowed during the holidays
18:13:19 <Tango> yep, working on that right now, should have it shortly
18:13:27 <dane_leblanc_> Tango: the bug description lists the host-gw option as unconditional. Should the description be changed to what's listed as option (2)?
18:14:03 <Tango> I will change the option so that the user can specify any option:  udp, vxlan, hostgw
18:14:19 <daneyon_> Tango +1 re dane_leblanc_ updating the commit message to state option 2
18:14:25 <dane_leblanc_> Tango: Sounds great.
18:14:38 <daneyon_> Tango I am still confused on the extra route issue
18:15:02 <daneyon_> I understand how hostgw mode works, what was the problem with the routes being added?
18:15:14 <Tango> So there is one route for every other flannel host:  if there are n hosts, you would see n-1 routes in the table
18:15:43 <Tango> The problem is that there is one route for the local host itself, which is not needed and not correct
18:15:58 <daneyon_> right, these routes are how the flannel host selects the correct hop to send packets.
18:16:00 <Tango> so the number of routes is n instead of n-1
18:16:12 <daneyon_> ah
18:16:16 <daneyon_> i see now
18:16:41 <Tango> The local route confuses the response packet
18:16:47 <daneyon_> so, that must be an issue when using hostgw in a standalone setup too
18:17:30 <Tango> probably
18:17:51 <daneyon_> is there an issue that has been created in upstream flannel? If not, I think it's a good idea to create one and add a link to it in the review
18:18:12 <daneyon_> Tango thanks for pushing through the issue
18:18:26 <Tango> One question I want to check with everyone is, what should the default backend option be:  udp, vxlan, or hostgw
18:18:36 <daneyon_> I think the hostgw option will make a lot of magnum users :-)
18:19:01 <hongbin> Yes, if it works
18:19:04 <daneyon_> I say leave the default as-is
18:19:09 <Tango> Angus was suggesting hostgw since it's the best, but this assumes all the nodes are on the same L2 network
18:19:22 <Tango> later when we add more advanced networking, it may break
18:19:23 <daneyon_> when the changes bake for a while and we get feedback, we can then change the default
18:19:36 <Tango> udp is the most general
18:20:00 <daneyon_> and hostgw is a good solution for a small-med size cluster... not for large though
18:20:07 <Tango> true
18:20:32 <Tango> so maybe we leave the default as udp, and in the user guide, give guidance on what to use
18:20:33 <dane_leblanc_> host-gw would eventually be a good option when we get Kuryr integrated, right?
18:20:47 <daneyon_> and when M supports ironic nodes and a cloud provider has vxlan hw, I would expect to see vxlan be a solid option for balancing scale and perf
18:21:13 <Tango> dane_leblanc_: maybe not, since kuryr would allow connecting between different networks
18:21:34 <daneyon_> dane_leblanc_ kuryr would be a completely different network-driver with its own options
18:22:01 <dane_leblanc_> daneyon: I see.
18:22:23 <daneyon_> or kuryr would fall under the libnetwork driver and pass a label to specify which libnetwork driver (kuryr,calico,weave, etc.) to use
18:23:45 <daneyon_> eg --network-droiver=libnetwork, --label=libnetwork-driver=kuryr or --label=libnetwork-driver=overlay etc..
18:24:08 <daneyon_> s/droiver/driver
18:25:10 <daneyon_> #action Tango to update https://review.openstack.org/#/c/241866/ to include wip patch set and updated commit msg to indicate option 2
18:25:26 <daneyon_> any other discussion about the hostgw patch?
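The extra-route problem Tango describes in this topic (n routes in the table instead of n-1) can be sketched as a small check. Everything here is hypothetical: the host addresses, the subnets and the function names are illustrative assumptions, and this is not how flannel itself manages routes.

```python
# Hypothetical three-node cluster: host IP -> flannel subnet behind that host.
SUBNET_BY_HOST = {
    "10.0.0.3": "10.100.1.0/24",
    "10.0.0.4": "10.100.2.0/24",
    "10.0.0.5": "10.100.3.0/24",
}


def expected_routes(local_host, subnet_by_host):
    """host-gw should install one route per *other* host: subnet -> next hop."""
    return {
        subnet: host
        for host, subnet in subnet_by_host.items()
        if host != local_host
    }


def extra_routes(local_host, subnet_by_host, installed):
    """Installed routes that are not among the n-1 expected ones."""
    expected = expected_routes(local_host, subnet_by_host)
    return {
        subnet: hop
        for subnet, hop in installed.items()
        if expected.get(subnet) != hop
    }


if __name__ == "__main__":
    local = "10.0.0.3"
    # Simulate the reported bug: a route exists for every host, local one included.
    installed = {subnet: host for host, subnet in SUBNET_BY_HOST.items()}
    print("expected route count:", len(expected_routes(local, SUBNET_BY_HOST)))
    print("routes to delete:", extra_routes(local, SUBNET_BY_HOST, installed))
```

In this toy setup the only route flagged for deletion is the one for the local host's own subnet, which matches the workaround described above.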
18:25:51 <daneyon_> #topic Review Action Items
18:26:00 * daneyon_ danehans to address the 2-daemon approach on the kube irc and provide add'l info through Magnum ML
18:26:22 <daneyon_> I sent a msg to the ML before the holiday break.
18:27:19 <daneyon_> Brendan Burns from kube said containerizing etcd, flannel and kube services was done for documentation and support convenience.
18:28:26 <Tango> Do they recommend this approach?
18:28:28 <daneyon_> ATM i think it's still best to run flannel and etcd on the host instead of a container. The 2 docker daemon solution overcomplicates things.
18:28:41 <hongbin> +1
18:28:59 <hongbin> It is not the common practice in CoreOS as well
18:29:06 <daneyon_> Tango they don't recommend it or recommend against it. They are taking a Switzerland approach ;-)
18:29:26 <daneyon_> I think it's up to us to make the call
18:29:56 <Tango> ok, I guess the reason we were thinking of doing this is to simplify things, so if it's more complicated, then maybe not
18:30:16 <daneyon_> However, if we don't containerize flannel, we need to update the flannel pkg in our images so vxlan works again
18:30:50 <hongbin> daneyon_: I think Tango built a new image already?
18:30:57 <Tango> So I have been working on using diskimagebuilder to build new images
18:31:06 <daneyon_> oh, good
18:31:24 <daneyon_> any details you can share Tango on your DIB progress?
18:31:24 <Tango> I uploaded a new one:  fedora-21-7.qcow2
18:31:44 <Tango> This has k8s 1.1, docker 1.9.1, flannel 0.5.5
18:31:56 <daneyon_> is this still an atomic image or simply f21?
18:32:10 <Tango> The image is fedora only without atomic
18:32:13 <Tango> f21
18:32:17 <daneyon_> ok
18:32:40 <Tango> I skip atomic to make it easier to work with, you can run apt-get install
18:32:40 <daneyon_> how is it coming along making it an "official" image?
18:32:55 <daneyon_> yeah, atomic is a PITA
18:33:16 <dane_leblanc_> Yay for apt-get
18:33:25 <Tango> Since we move to k8s 1.1, we need more testing on the API
18:33:25 <daneyon_> i think we still need to support a micro os, but I would like that to be coreos and forget atomic
18:33:28 <hongbin> simply because the heat templates were developed by a guy from Red Hat :)
18:33:37 <hongbin> at the very beginning
18:33:39 <daneyon_> not a pressing need, but something more long-term
18:34:02 <daneyon_> as long as we have at least 1 OS that is well supported by Magnum and is easy to use, maintain, etc..
18:34:04 <hongbin> CoreOS is almost ready
18:34:21 <hongbin> I have patches that are under review
18:34:50 <hongbin> #link https://review.openstack.org/#/q/status:open+project:openstack/magnum+branch:master+topic:bp/coreos-k8s-bay
18:35:06 <daneyon_> hongbin right on. larsks did a great job, now we need to continue tailoring the templates to meet our needs.
18:35:21 <daneyon_> nice hongbin
18:35:23 <hongbin> daneyon_: agreed
18:35:40 <hongbin> I am all for moving away from Atomic
18:35:58 <Tango> Good to hear agreement on atomic
18:36:20 <daneyon_> Tango can you post a link to the f21 image so we have it recorded?
18:36:26 <eghobo> hongbin: +1 but I still think we need Ubuntu as well
18:36:32 <Tango> https://fedorapeople.org/groups/magnum/fedora-21-7.qcow2
18:36:46 <daneyon_> #link https://fedorapeople.org/groups/magnum/fedora-21-7.qcow2
18:36:55 <daneyon_> Tango thanks
18:37:14 <hongbin> eghobo: want a BP for ubuntu?
18:37:28 <daneyon_> eghobo I am all for supporting Ubuntu or any other add'l distro as long as it's well supported by the community
18:37:37 <daneyon_> the matrix of support can get out of hand
18:38:01 <eghobo> hongbin: I think a BP exists, Tango?
18:38:04 <Tango> I am setting up DIB to choose fedora, ubuntu, and I guess coreOS now that we are close to getting it working
18:38:05 <daneyon_> i would much rather have a solid solution on 1 distro than breakage on multiple distros
18:38:55 <dane_leblanc_> What image should be used as default for upstream gate testing? Whatever is smallest?
18:39:05 <Tango> +1, but we do have to show users how to create their own image
18:39:25 <hongbin> dane_leblanc_: The gate is using Atomic
18:39:56 <Tango> dane_leblanc_: I am also trying to get one of the minimal images to work, either fedora or ubuntu
18:40:10 <Tango> this would solve the size problem
18:40:22 <daneyon_> yeah, i think f21 minimal is key to the gate
18:40:30 <daneyon_> otherwise it's a pretty big image
18:41:27 <daneyon_> #topic Future Meetings
18:41:46 <daneyon_> I know we discussed this topic briefly before the holiday break
18:42:27 <daneyon_> Unless the group disagrees, I think we can move our discussions back to the general magnum meetings.
18:42:51 <daneyon_> If the group would like to continue the sub team, I would like to ask someone to chair the meetings.
18:43:09 <daneyon_> thoughts?
18:43:36 <hongbin> daneyon_: you are not available to chair this meeting?
18:43:55 <daneyon_> hongbin I'm divided
18:44:08 <hongbin> np from me to move it back if you want
18:44:12 <dane_leblanc_> daneyon: I would agree with moving back to using just the container meeting. Doesn't seem to be too much network-specific stuff to discuss
18:44:30 <daneyon_> I am being pulled in a few different directions and I don't feel i have the necessary time to lead the sub team
18:44:52 <Tango> Sounds reasonable, we can resume if there is not enough time in the general meeting
18:45:19 <daneyon_> OK
18:45:54 <daneyon_> Then I will consider that an agreement and let Adrian know that we are moving our discussion back to the general meeting.
18:46:36 <daneyon_> I think these meetings have been helpful over the last 6 months.
18:46:52 <daneyon_> I appreciate everyone's involvement in magnum networking.
18:46:59 <Tango> Seems like we have an agreement, and Adrian can kick us back out if we take too much time in the general meeting
18:47:01 <daneyon_> we are headed in the right direction.
18:47:30 <daneyon_> If we can get the hostgw patch merged and add a few network-drivers, then I will be very :-)
18:48:14 <Tango> I do want to mention an observation that we may want to keep an eye on
18:48:28 <daneyon_> #agreed Move the subteam meeting back to the general magnum meeting #link https://wiki.openstack.org/wiki/Meetings/Containers#Weekly_Containers_Team_Meeting
18:48:41 <daneyon_> Tango go ahead
18:48:51 <Tango> There seems to be divergence between the Docker community and Kubernetes community with regard to networking
18:48:58 <daneyon_> agreed
18:49:06 <Tango> This would complicate things for Magnum
18:49:15 <daneyon_> one of the reasons why i chose --network-driver instead of --libnetwork-driver
18:49:28 <Tango> although it does give Magnum a chance to be agnostic and give user choices
18:49:56 <Tango> I'm not sure how things will shake out, but at the moment it's very confusing
18:50:01 <daneyon_> i feel that I was prepared for the goog<>docker war over networking.
18:51:02 <Tango> kuryr seems to lean toward docker libnetwork for now
18:51:06 <daneyon_> for example contiv is a container networking solution that has support for kube and docker libnetwork
18:51:19 <daneyon_> i believe calico too
18:52:25 <daneyon_> when using --network-driver=calico with a k8s bay type, the M templates will need to make sure the correct DIR's, bins, config files, etc.. are orchestrated
18:52:41 <daneyon_> the same when using --network-driver=calico with a swarm bay type
18:53:23 <Tango> anyway, a lot of activities on networking ahead
18:53:28 <daneyon_> in general, as we add drivers we will see a lot more heat templates or conditions in the jinja/heat conditional support templates
18:54:00 <daneyon_> we need to get conditional logic in heat templates or implement the jinja layer that was discussed at the last mid cycle
18:54:22 <daneyon_> I believe Rob Pothier is working with the Heat community to add conditional support
18:54:32 <Tango> daneyon_: Is there a BP yet on adding kuryr as a driver?
18:54:41 <daneyon_> IMO this will be huge for Magnum to support a minimal # of templates
18:54:48 <daneyon_> not that i know of Tango
18:55:01 <daneyon_> the last we discussed, one of 2 things needed to happen
18:55:10 <daneyon_> 1. Magnum add support for ironic hosts
18:55:23 <daneyon_> 2. kuryr add support for nested VM's
18:56:00 <daneyon_> as we discussed in the past, I would expect someone from the K team to take the lead on adding the K driver to M
18:56:13 <daneyon_> Same goes with any other driver that gets added.
18:56:33 <daneyon_> If support lags in the future for a particular driver, then it gets pulled
18:56:54 <daneyon_> and M only includes drivers that are actively maintained
18:57:11 <daneyon_> otherwise we go down the same path and pain of our distro support
18:57:18 <hongbin> sounds good
18:57:29 <daneyon_> we have just a few mins left
18:57:35 <daneyon_> #topic Open Discussion
18:57:44 <daneyon_> any open discussion?
18:57:47 <daneyon_> quick?
18:57:48 <daneyon_> :-)
18:58:35 <daneyon_> Otherwise I'll send a msg to Adrian and the mailer that the sub team is merging back to the general community.
18:59:04 <daneyon_> I'll take that as no open discussion
18:59:22 <daneyon_> thanks again for everyone's support!!!
18:59:33 <Tango> Thanks daneyon_ for hosting
18:59:34 <daneyon_> and Happy New Year
18:59:38 <daneyon_> yw
18:59:40 <dane_leblanc_> Happy New Year!
18:59:49 <daneyon_> take care everyone and have a great day.
18:59:56 <daneyon_> #endmeeting