17:23:12 <flaviof> #startmeeting ovn-community-development-discussion
17:23:13 <openstack> Meeting started Thu May 21 17:23:12 2020 UTC and is due to finish in 60 minutes.  The chair is flaviof. Information about MeetBot at http://wiki.debian.org/MeetBot.
17:23:14 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
17:23:16 <openstack> The meeting name has been set to 'ovn_community_development_discussion'
17:23:27 <flaviof> well then, I know that much. ;)
17:23:33 <zhouhan> thx flaviof
17:23:39 <flaviof> anyone want to go first?
17:23:47 <zhouhan> I can go first
17:24:51 <zhouhan> I sent a fix for the dp_hash issue. imaximets: could you take a look: #link https://patchwork.ozlabs.org/project/openvswitch/patch/1589527067-91901-1-git-send-email-hzhou@ovn.org/
17:25:14 <zhouhan> I did some reviews, mostly for numans's I-P
17:25:40 <zhouhan> I had a question on _lore_'s patch for the SRC_IP_POLICY
17:25:57 <_lore_> zhouhan: sure
17:26:38 <zhouhan> by GW router, do you mean the non-distributed GW router, or a distributed router with a distributed gateway port?
17:26:57 <_lore_> 'distributed router with a distributed gateway port'
17:27:23 <_lore_> gw router port on a given chassis
17:27:39 <zhouhan> _lore_: ok, then the ARP is sent from which component to that router?
17:27:44 <_lore_> yes
17:27:52 <_lore_> for non-FIP case
17:28:04 <_lore_> this is what we want to avoid
17:28:36 <_lore_> for this reason we need to change reg1/eth.src after table=9,10,11
17:28:49 <zhouhan> Sorry, my question was: from which component is the ARP sent to the distributed router with DGP?
17:29:04 <_lore_> zhouhan: I did not get you
17:29:18 <zhouhan> _lore_: I am still not clear about the scenario, before going to the solution.
17:29:39 <zhouhan> _lore_: Let's discuss in the email offline :)
17:29:44 <zhouhan> That's my update
17:30:15 <_lore_> zhouhan: sure, but the scenario is this one:
17:30:45 <_lore_> the chassis where we have the FIP has a direct connection to the underlay network using a localnet port
17:31:03 <_lore_> so we want to send the ARP out to that port
17:31:16 <_lore_> agree?
17:32:06 <zhouhan> well, depends on the logical topology. I am not sure what's the source and destination, and the logical components connecting them.
17:32:27 <_lore_> local chassis has a direct connection to the ToR
17:32:32 <_lore_> switch
17:33:29 <_lore_> OpenStack wants the possibility to send traffic directly, avoiding the tunnel, since the chassis has a direct connection to the external world
17:33:50 <zhouhan> _lore_: yes, if there is just a single localnet logical switch connecting a VIF and the TOR, then it should work, without worrying about logical routers and distributed gateway ports. But I guess your scenario is more complex than that.
17:34:40 <_lore_> on the local chassis you mean?
17:34:49 <zhouhan> _lore_: I wasn't sure if it is for the typical k8s scenario or openstack. If it is for openstack, maybe I have some clue now.
17:35:18 <_lore_> I think k8s does not use FIP so far, just OpenStack
17:35:29 <_lore_> I guess
17:35:30 <_lore_> not sure
17:36:47 <zhouhan> _lore_: let's see if someone else wants to report. After that we can continue. Or discuss offline.
17:36:47 <_lore_> the goal is: if you have a FIP associated to a given logical switch port you want to send traffic directly rather than going through the tunnel to the gw router
17:36:53 <_lore_> ack
17:37:07 * _lore_ is on mute
17:37:34 <zhouhan> so, anyone else?
17:38:00 <_lore_> it seems not :)
17:38:10 <zhouhan> ok, let's continue
17:38:11 <panda> I can go last
17:38:21 <zhouhan> ok, panda please go ahead
17:38:34 <panda> zhouhan: thanks.
17:38:45 <panda> mine is not an update, but a presentation, I'd like to start contributing to the project. I already studied the architecture and I'm now studying the code
17:39:02 <panda> I plan to propose a patch on the documentation with the list of tasks that helped me get started. But I'll have some questions for the mailing list.
17:39:23 <zhouhan> panda: welcome!
17:39:24 <panda> In the meantime I'm looking for low hanging fruit bugs or tasks to give a direction to my studies. If you have anything to propose I'd like to hear
17:39:43 <panda> _lore_ already helped me to bootstrap, and I have a long-term task from him.
17:39:56 <panda> zhouhan: thanks :)
17:40:10 <flaviof> welcome panda!
17:40:18 <panda> flaviof: thanks!
17:40:25 <flaviof> I can go in next, really quick.
17:40:44 <flaviof> I have not been doing a lot on core OVN, but have been implementing a cool functionality in Openstack that is based on OVN.
17:41:03 <flaviof> It is called port forwarding. For folks who don't know, it uses OVN load balancers to carve out a single FIP into multiple internal VMs based on proto+port.
17:41:18 <flaviof> Have POC running great and now moving onto integration tests.
17:41:18 <flaviof> #link https://review.opendev.org/#/c/723863/23/doc/source/ovn/port_forwarding.rst Port forwarding functionality from ML2/OVN
17:41:32 <flaviof> If any of you have interest in that or any other OVN related integration matters with Openstack, please do not be shy to say hi!
17:41:39 <flaviof> Including you, panda ! ;)
17:41:49 <flaviof> That is all from me.
17:42:05 <zhouhan> flaviof: very cool!
17:42:32 <flaviof> zhouhan: thanks. It mostly works because of people like you, so thanks to _you_!
17:43:21 <panda> flaviof: interesting :)
17:44:20 <flaviof> anyone got something he/she wants to say here?
17:45:48 <_lore_> zhouhan: do you think we can proceed?
17:45:57 <zhouhan> _lore_: sure
17:46:01 <_lore_> :)
17:46:03 * panda will reserve the questions for the mailing list.
17:46:35 <zhouhan> _lore_: firstly, how does the normal IP traffic work?
17:46:54 <_lore_> non-FIP?
17:47:18 <zhouhan> _lore_: in the email you said IP traffic works as expected but just ARP doesn't work
17:47:20 <_lore_> e.g. for external world? going through the gw router
17:47:27 <_lore_> ah ok
17:47:48 <_lore_> normal FIP traffic is going through the localnet port
17:48:06 <_lore_> like IP using FIP as src IP
17:48:36 <zhouhan> logically, it is going through the LR, and SNAT is done by the LR, right?
17:48:52 <_lore_> yes, locally
17:49:07 <_lore_> s/locally/logically
17:49:45 <zhouhan> when the VM sends the packets to external, the nexthop is the LR, and LR's next hop is the external GW (on the TOR)
17:50:00 <_lore_> yes
17:50:49 <zhouhan> now the ARP is for the LR's IP, why should it be sent out through localnet?
17:51:06 <zhouhan> or do you mean the ARP from LR to the TOR's IP?
17:51:56 <_lore_> nope, the ARP has src IP the FIP
17:52:04 <_lore_> not the LR external IP
17:52:29 <zhouhan> So you mean the ARP from LR to the TOR, right?
17:53:40 <_lore_> yes
17:54:03 <_lore_> let's say your VM is pinging 1.1.1.1
17:55:08 <_lore_> the network between the logical router and the external world is 172.16.0.0/24 and you have associated the FIP 172.16.0.100 to the VM
17:56:06 <_lore_> you want the system to send an ARP req to the gw of the network using 172.16.0.100 as src IP and dnat_snat external mac as src mac
17:56:41 <_lore_> sending the ARP using the localnet port on the chassis
17:57:18 <_lore_> the scenario is a little bit tricky, I agree :)
17:57:32 <zhouhan> In the logical pipeline, the IP packet from VM should first hit the LR, which then triggers the ARP to the external GW IP. Now which packet is observed on the tunnel?
17:57:35 <flaviof> _lore_: if you don't mind, also add the next hop (TOR's) mac address in your example.
17:57:47 <_lore_> flaviof: sure
17:58:04 <_lore_> zhouhan: this is the point
17:58:09 <_lore_> no packet on the tunnel
17:58:30 <_lore_> the local logical router pipeline manages the ARP
17:58:51 <_lore_> without sending the packet to gw router
17:59:15 <_lore_> flaviof: let's say the next hop is 172.16.0.254
17:59:24 <zhouhan> _lore_: but you said the problem is some packets were seen on the tunnel, and the patch is to avoid that, right? My question is, which packet was on the tunnel? The IP packet? Or just ARP packets?
17:59:42 <_lore_> zhouhan: before the commit that introduced the issue you reported
18:00:07 <_lore_> with the patch I sent this week no packets are sent to the tunnel
18:00:16 <zhouhan> _lore_: yes I am talking about the original patch, not the later one.
18:00:34 <_lore_> zhouhan: even with the original one no packets are sent to the tunnel
18:00:42 <zhouhan> _lore_: still trying to understand the original problem :)
18:00:57 <_lore_> you are right, I have not been so clear :)
18:01:05 <_lore_> the original case is:
18:01:48 <_lore_> in the scenario I described before the packet for 172.16.0.254 is sent to the gw router and the gw router is sending the ARP
18:01:51 <_lore_> right?
18:03:03 <_lore_> then when the ARP reply arrives the packets start flowing
18:03:47 <_lore_> this is the behaviour before the offending commit
18:04:43 <zhouhan> ok, do you mean when ARP reply arrives at the GW node, the IP packets start being sent through local chassis directly?
18:05:21 <_lore_> correct
18:05:47 <_lore_> this is the original FIP behaviour
18:06:19 <_lore_> with the offending commit or the last patch the ARP is sent by the local node and not by the GW
18:06:19 <zhouhan> So before the ARP is sent, which packet is sent through the tunnel to the GW node? The IP packet or the ARP packet?
18:06:47 <_lore_> the first IP packet that triggers the ARP
18:07:01 <zhouhan> ok, that's clear now. Thanks
18:07:01 <_lore_> just this packet
18:07:25 <zhouhan> And all these nodes are on same L2 (e.g. under the TOR), right?
18:07:30 <_lore_> yes
18:07:47 <_lore_> the issue is that when the ARP reply arrives, this first IP packet is re-injected, but on the GW
18:08:20 <_lore_> while the second is sent by the local device so the ToR is confused
18:08:30 <_lore_> are we on the same page now?
18:08:43 <zhouhan> Yes, I think so.
18:08:46 <_lore_> ok, cool
18:08:48 <flaviof> +1 ;)
18:08:57 <_lore_> sorry for not being so clear
18:08:58 <zhouhan> So the actual problem is the reinjection that confuses TOR
18:09:00 <_lore_> anyway
18:09:09 <_lore_> yes
18:09:46 <zhouhan> If there is no reinjection (sacrifice the first packet), then there is no real problem, but it can be optimized to avoid the tunnel for the first packet.
18:10:04 <_lore_> yes, I think so
18:10:10 <_lore_> but I am not 100% sure
18:10:33 <zhouhan> I see. Let me revisit your patch. Thanks for the explanation!
18:10:57 <_lore_> sure, thanks to you for being so patient :)
18:11:26 <_lore_> another possible solution could be to add nat info to the port_binding table
18:11:42 <_lore_> but the issue is we have no access to the db in pinctrl thread
18:12:53 <_lore_> so we came up with adding a new stage to the logical router pipeline in order to overwrite reg1/eth.src just for FIP
18:13:19 <_lore_> doing so we can manage even ARP and first IP packet locally
18:14:25 <_lore_> lastly, this week I added the possibility to attach strace or perf to ovn-scale-test
18:14:25 <zhouhan> "doing so" you mean the current patch, right?
18:14:32 <_lore_> zhouhan: yes
18:15:06 <zhouhan> _lore_: cool. I will take a look. Thanks!
18:15:16 <_lore_> zhouhan: basically in the last patch I reverted the offending commit and added this new stage
18:16:08 <_lore_> I think now we are all on the same page :)
18:16:52 <zhouhan> _lore_: yes, I think so.
18:17:07 <zhouhan> flaviof: are we still in the meeting?
18:17:21 <flaviof> yes. but we can end if you think we should
18:17:31 <flaviof> _lore_: that is the table called S_ROUTER_IN_IP_SRC_POLICY, right?
18:17:41 <_lore_> flaviof: right
18:17:55 <_lore_> maybe the name is not the best one :)
18:18:12 <flaviof> ack. just wanted to mention it here to have a quick way to search for it. This discussion is an integral part of it. ;)
18:18:23 <flaviof> good discussion. Thank you both for doing it here. Anything else to talk about or shall we call it a meeting?
18:19:43 <zhouhan> flaviof: I think maybe we are done
18:19:49 <_lore_> I guess si
18:19:51 <_lore_> *so
18:19:59 <flaviof> yeah. si ! ;)
18:20:04 <zhouhan> bye everyone :)
18:20:08 <flaviof> bye all
18:20:10 <_lore_> si == yes in Italian :)
18:20:16 <flaviof> <3
18:20:19 <flaviof> #endmeeting