17:14:56 <mmichelson> #startmeeting ovn-community-development-discussion
17:14:57 <openstack> Meeting started Thu May 14 17:14:56 2020 UTC and is due to finish in 60 minutes.  The chair is mmichelson. Information about MeetBot at http://wiki.debian.org/MeetBot.
17:14:58 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
17:15:00 <openstack> The meeting name has been set to 'ovn_community_development_discussion'
17:15:34 <mmichelson> Just as a reminder, we're coming up on hard freeze. Thanks to everyone for mentioning which patches they want included in 20.06
17:15:46 <mmichelson> I'm planning to spend time today and tomorrow to help with the review effort on those patches
17:16:02 <mmichelson> I plan to create the 20.06 branch on Monday
17:16:59 <mmichelson> As far as my activity goes, after doing the GROUP_MOD message split patch, I took on some of the smaller issues that have been reported as of late. Thanks to everyone who has helped review those patches
17:17:08 <mmichelson> And that's all from me.
17:17:17 <blp> I have a quick update.
17:17:41 <blp> I rebased the OVN DDlog code against master earlier this week. All the tests pass.
17:17:49 <blp> ryzhyk has been working on performance.
17:18:22 <blp> That's all I have for the moment, unless there are questions.
17:18:39 <ryzhyk> Yes, I am making progress on performance, but we need scale tests.
17:18:48 <mmichelson> blp, ryzhyk awesome!
17:19:26 <ryzhyk> So far I've been using Han's old scale test log, but it is getting increasingly irrelevant with all the changes to OVN since it was created.
17:19:41 <ryzhyk> (that's it from me)
17:20:18 <numans> ryzhyk, if you want to explore this one - https://github.com/dceara/ovn-heater
17:20:18 <numans> for the scale testing.
17:20:52 <ryzhyk> numans: thanks!
17:21:02 <numans> But we are planning to run a scale test with the ddlog changes. We need to figure out a way to compile ovn-northd-ddlog in the container images. Once that is done, it should be straightforward.
17:21:39 <numans> I can go real quick.
17:21:44 <blp> Yeah, we'll do some preliminary work and then we can figure out how to do that.
17:22:13 <numans> Ok. sounds good.
17:22:37 <numans> I did some reviews this week. And I spent much of the time refactoring/reworking patches 1 and 2 of my I-P patch series
17:22:46 <numans> Thanks to dceara for the reviews.
17:23:02 <numans> I'll continue to do that and am planning to submit v6 by tomorrow.
17:23:13 <numans> I appreciate more reviewers joining in :).
17:23:19 <numans> That's it from me.
17:24:14 <dceara> hi all
17:25:09 <dceara> numans, ryzhyk I can hack ovn-heater to compile ovn-northd-ddlog for the scale test container images. I just need to know what branches to use
17:25:33 <zhouhan> numans: sorry that I didn't get time to review your I-P patches last week. I will resume this week.
17:25:51 <numans> zhouhan, thanks.
17:26:04 <blp> dceara: We're in a little bit of a transition at the moment, we'll get back to you on that.
17:26:11 <dceara> blp, sure
17:27:38 <zhouhan> can someone pin the link to the meeting logs?
17:28:26 <mmichelson> zhouhan, I don't understand what you mean by "pin"
17:28:35 <flaviof> zhouhan: http://eavesdrop.openstack.org/meetings/ovn_community_development_discussion/2020/
17:28:46 <numans> flaviof, you forgot to use the #link :)
17:28:51 <flaviof> LOL
17:29:10 <zhouhan> mmichelson: I meant, pin in this IRC channel, like the "FAQ: http://docs.openvswitch.org/en/latest/faq/"
17:29:23 <flaviof> there is a handy link to that dir in ovn.org too
17:29:24 <zhouhan> thanks flaviof
17:29:29 <mmichelson> ah ok
17:30:14 <zhouhan> May I go next?
17:30:33 <mmichelson> #topic Open vSwitch, a Linux Foundation Collaborative Project || FAQ: http://docs.openvswitch.org/en/latest/faq/ || Hyper-V meeting Tues 10:00 Pacific || OVN meeting Thurs 10:15 am US Pacific || Use ovs-discuss@openvswitch.org for questions if you don't get an answer here. || OVN weekly meeting logs can be found at: http://eavesdrop.openstack.org/meetings/ovn_community_development_discussion/
17:30:50 <mmichelson> oh crap it's /topic isn't it
17:31:00 <mmichelson> And I don't have permission
17:31:02 <mmichelson> zhouhan, go ahead
17:31:06 <zhouhan> Firstly I have a question regarding the OVS FAQ on the compatibility
17:31:23 <blp> I can change the topic.
17:31:29 <zhouhan> 2.11.x	3.10 to 4.18
17:31:29 <zhouhan> 2.12.x	3.10 to 5.0
17:31:29 <zhouhan> 2.14.x	3.10 to 5.5
17:31:41 <zhouhan> It didn't mention 2.13, why is that?
17:31:56 <zhouhan> blp: do you know?
17:32:10 <flaviof> blp++
17:32:25 <blp> zhouhan: Probably just overlooked.
17:33:10 <zhouhan> We tried 2.13 compiling with 5.4, it has a message: configure: error: Linux kernel in /lib/modules/5.4.0-31.generic.x86_64/build is version 5.4.0, but version newer than 5.0.x is not supported (please refer to the FAQ for advice)
17:33:27 <zhouhan> So it seems it's not an oversight, but on purpose ...
17:33:58 <zhouhan> It's confusing though
17:34:15 <zhouhan> I continued debugging the problem of: deferred action limit reached, drop record action
17:34:49 <blp> zhouhan: It looks like 2.13.x supports the same versions as 2.12.x.
17:35:06 <zhouhan> ok, thanks blp
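The kernel ranges quoted above can be turned into a small lookup. This is only a sketch: the table copies the three FAQ rows pasted in the meeting, and the 2.13 entry is an assumption taken from blp's statement that 2.13.x supports the same versions as 2.12.x. The names `SUPPORTED_KERNELS` and `kernel_supported` are hypothetical and are not part of OVS.

```python
# Supported Linux kernel ranges per OVS release series, as quoted in the meeting.
# The "2.13" row is an assumption based on blp's remark that it matches 2.12.x.
SUPPORTED_KERNELS = {
    "2.11": ((3, 10), (4, 18)),
    "2.12": ((3, 10), (5, 0)),
    "2.13": ((3, 10), (5, 0)),
    "2.14": ((3, 10), (5, 5)),
}


def kernel_supported(ovs_series, kernel_version):
    """Return True if kernel_version (e.g. "5.4.0") falls in the range for ovs_series."""
    # Only the major and minor numbers are compared.
    major, minor = (int(part) for part in kernel_version.split(".")[:2])
    low, high = SUPPORTED_KERNELS[ovs_series]
    return low <= (major, minor) <= high


print(kernel_supported("2.13", "5.4.0"))  # False
print(kernel_supported("2.14", "5.4.0"))  # True
```

Under these assumptions the 5.4 kernel is outside the 2.13 range, which is consistent with the configure error quoted above.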
17:35:45 <zhouhan> I think I made some progress on the endless recirc problem. The issue was that there is slowpath required for the actions while there is also a group action which requires dp_hash + recirc
17:36:21 <blp> OK, I sent a patch to update the FAQ.
17:36:39 <blp> zhouhan: Oh that's a little awkward. Do you have a lead on a fix?
17:36:43 <zhouhan> Whenever this combination comes, it tries to execute dp_hash in userspace, and only does recirc in kernel
17:37:45 <zhouhan> However, the userspace hash generated is not carried for injecting the packet to datapath, so after recirc back, the upcall doesn't have dp_hash value, again.
17:38:13 <blp> Oh. Have you figured out why we don't pass the dp_hash back?
17:38:14 <zhouhan> So when it hits the same group action, it generates recirc and dp_hash actions again
17:38:17 <blp> Is that the easy fix?
17:38:59 <zhouhan> At this time, the recirc_id generated is the same as the older one because all the metadata in state is the same, which caused the loop
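The loop zhouhan describes can be illustrated with a toy model. This is a sketch only and does not reproduce OVS code: `recirc_id_for`, `simulate`, the state dictionary, the hash value and the limit of 10 are all hypothetical. The flag `carry_hash` stands for any fix that makes a dp_hash value available after recirculation, regardless of where the hash is computed.

```python
import itertools

DEFERRED_ACTION_LIMIT = 10  # hypothetical small limit, for illustration only

_recirc_ids = {}
_next_id = itertools.count(1)


def recirc_id_for(state):
    """Return the same recirc id whenever the translation state is identical."""
    key = tuple(sorted(state.items()))
    if key not in _recirc_ids:
        _recirc_ids[key] = next(_next_id)
    return _recirc_ids[key]


def simulate(carry_hash, limit=DEFERRED_ACTION_LIMIT):
    """Model repeated slow-path passes over the same group action."""
    state = {"table": 10, "group": 1}  # identical metadata on every pass
    dp_hash = None
    recirc_ids = []
    for _ in range(limit):
        if dp_hash is None:
            # No dp_hash in the upcall: emit hash + recirc for the group action.
            recirc_ids.append(recirc_id_for(state))
            computed = 0x1234  # hash computed during this pass
            # If the hash is not carried along, the next pass starts without it.
            dp_hash = computed if carry_hash else None
        else:
            # dp_hash is present, so a bucket can be selected and the loop ends.
            return ("bucket", dp_hash % 2, recirc_ids)
    return ("limit reached", None, recirc_ids)


print(simulate(carry_hash=False))  # same recirc id on every pass until the limit
print(simulate(carry_hash=True))   # one recirculation, then a bucket is chosen
```

With `carry_hash=False` every pass produces the same recirc id, which mirrors the repeated recirculation described above.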
17:39:19 <zhouhan> blp: I haven't got time on the fix yet.
17:39:28 <zhouhan> blp: I think there are two options
17:39:30 <blp> Userspace does know how to put dp_hash into a flow, see odp_flow_key_from_flow__().
17:39:49 <zhouhan> thanks for the pointer!
17:40:06 <blp> It will do so if it detects recirc support in the datapath. Perhaps it's not being detected properly?
17:40:24 <blp> I think we log whether that feature is detected as supported. Check the log, if that's the problem then we should fix the detection logic.
17:40:24 <zhouhan> I think this is one option. The other option is always do dp_hash in datapath instead of "trying to help" in userspace
17:40:49 <blp> I think there is some reason why we do that, although I don't recall what it was.
17:41:32 <zhouhan> I wonder if dp_hash anyway requires recirc in datapath, why would it help by doing dp_hash in userspace
17:41:54 * numans says bye and disappears.
17:41:57 <blp> What's the reason that it gets slow-pathed to begin with?
17:41:58 <zhouhan> The other question is why is there slowpath required in the first place
17:42:02 * blp waves at numans
17:42:10 <zhouhan> blp: exactly
17:42:22 <blp> "trace" should explain the reason for slow-pathing.
17:43:06 <zhouhan> blp: the scenario is ping LRP's IP. The LR is replying ICMP by simply setting fields, and I can't tell why slowpath is needed
17:43:35 <blp> Some fields aren't supported for setting in the datapath, that could be the reason.
17:44:02 <blp> But ofproto/trace should say. For example, if it's because of fields that the datapath can't set, it should say something about unsupported actions.
17:44:15 <imaximets> blp, zhouhan: the execute.hash that is passed back to the datapath can only be set from upcall->hash, i.e. the hash that was received from the datapath during the upcall.  Userspace never passes a dp_hash that it calculated itself to the datapath; it only returns the same hash that was calculated by the datapath itself before the upcall.
17:44:32 <zhouhan> blp: ofproto/trace only says that slowpath is needed, but doesn't say which action requires it.
17:45:17 <blp> OK, it should be possible to figure it out. Do you have the trace handy?
17:45:23 <zhouhan> imaximets: is it possible to pass it back (i.e. is it a small fix?)
17:45:58 <zhouhan> Final flow: icmp,reg11=0xe,reg12=0x18,reg14=0xa,reg15=0x11,tun_id=0xa0011ff0002,tun_src=10.172.66.12,tun_dst=10.78.211.43,tun_ipv6_src=::,tun_ipv6_dst=::,tun_gbp_id=0,tun_gbp_flags=0,tun_tos=0,tun_ttl=0,tun_erspan_ver=0,tun_flags=csum|key,metadata=0xff0002,in_port=457,vlan_tci=0x0000,dl_src=aa:aa:bb:00:01:02,dl_dst=aa:aa:bb:00:03:01,nw_src=10.227.183.232,nw_dst=10.9.0.1,nw_tos=0,nw_ecn=0,nw_ttl=53,icmp_type=8,icmp_code=0
17:45:58 <zhouhan> Megaflow: recirc_id=0,eth,icmp,tun_id=0xa0011ff0002,tun_src=10.172.66.12,tun_dst=10.78.211.43,tun_tos=0,tun_flags=-df+csum+key,in_port=457,vlan_tci=0x0000/0x1000,dl_src=aa:aa:bb:00:01:02,dl_dst=aa:aa:bb:00:03:01,nw_src=10.227.183.232,nw_dst=10.9.0.1,nw_ttl=53,nw_frag=no,icmp_type=0x8/0xff,icmp_code=0x0/0xff
17:45:58 <zhouhan> Datapath actions: ct_clear,ct_clear,ct_clear,set(eth(src=aa:aa:aa:00:01:01,dst=aa:aa:aa:00:00:01)),set(ipv4(src=10.9.0.1,dst=10.227.183.232,ttl=254)),set(icmp(type=0,code=0)),hash(l4(0)),recirc(0x3)
17:46:00 <zhouhan> This flow is handled by the userspace slow path because it:
17:46:02 <zhouhan> - Uses action(s) not supported by datapath.
17:46:08 <blp> It's not a good idea to pass a userspace-calculated hash back to the kernel because the kernel would calculate a different value.
17:46:11 <zhouhan> blp: This is the last part of trace
17:46:26 <imaximets> zhouhan, I'm not sure.  It might be possible to check if we have no upcall->hash, but have dp_hash and pass dp_hash to execute.hash instead, but I'm not sure.
17:46:59 <zhouhan> blp: imaximets: Then do you think it is a better idea to always do dp_hash in datapath?
17:47:31 <imaximets> zhouhan, I think, yes, it's better than 'datapath hash' is calculated by datapath.
17:47:47 <zhouhan> (sorry this might have taken too long, in case someone else wants to update)
17:48:07 <blp> It's the set actions for ICMP that are doing it. The datapath doesn't know how to change ICMP.
17:48:13 <imaximets> s/than/when/
17:49:02 <zhouhan> blp: is it the setting of ICMP fields that requires slowpath?
17:49:08 <blp> zhouhan: yes
17:49:14 <zhouhan> blp: ok, thanks!
17:49:44 <zhouhan> blp: it would be better if trace can just point this out :)
17:49:56 <blp> zhouhan: yes
17:50:22 <zhouhan> other than this, I was involved in some discussions and also trying to fix some bugs in ovn.
17:51:25 <zhouhan> One of the discussions was about ARP flows exploding in LRs. I think I can work out a configurable way to disable static ARP resolution in LRs, which would solve the issue for ovn-k8s.
17:51:38 <zhouhan> That's my update :)
17:52:00 <_lore_> zhouhan: regarding the gateway flow issue, IIRC this chunk was to distribute non-DVR traffic
17:52:43 <zhouhan> _lore_: I got your point, but why was a different priority needed?
17:53:13 <_lore_> I need to get back to it since I can't recall the details now
17:53:26 <zhouhan> _lore_: ok, thanks!
17:53:44 <zhouhan> _lore_: It seems an optimization, right?
17:53:49 <_lore_> nope
17:54:14 <_lore_> let's say you have FIP 192.168.1.1
17:54:33 <_lore_> this is to distribute traffic for 192.168.1.0/24 IIRC
17:54:47 <_lore_> but I will check
17:54:49 <zhouhan> _lore_: ok, I was thinking we could revert it, if it is an optimization and if we couldn't solve the route priority problem before the release
17:55:11 <_lore_> what is the issue you are facing?
17:55:47 <zhouhan> _lore_: If you see my example, the /16 route is overriding the /24 route, which is wrong
17:56:00 <_lore_> ack
17:56:19 <_lore_> I will look into it
17:56:26 <zhouhan> thanks _lore_
17:56:33 <_lore_> yw :)
17:57:55 <_lore_> zhouhan: do you have a unit-test for it?
17:59:18 <zhouhan> _lore_: here is what I did in a sandbox:
17:59:23 <zhouhan> ovn-nbctl lr-add lr1
17:59:23 <zhouhan> ovn-nbctl lrp-add lr1 lrp1 aa:aa:aa:aa:aa:01 192.168.0.1/24
17:59:23 <zhouhan> ovn-nbctl lrp-add lr1 lrp2 aa:aa:aa:aa:aa:02 192.168.100.1/24
17:59:25 <zhouhan> ovn-nbctl lr-route-add lr1 10.0.0.0/24 192.168.0.2
17:59:27 <zhouhan> ovn-nbctl lr-route-add lr1 10.0.0.0/16 192.168.100.2
17:59:29 <zhouhan> ovn-sbctl lflow-list
17:59:31 <zhouhan> ovn-nbctl --help | grep gateway
17:59:33 <zhouhan> ovn-sbctl show
17:59:35 <zhouhan> ovn-nbctl lrp-set-gateway-chassis lrp1 chassis-1
17:59:37 <zhouhan> ovn-sbctl lflow-list
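The behaviour zhouhan expects for the two static routes in the sandbox commands is ordinary longest-prefix matching, where the /24 wins over the /16 for any destination both cover. The sketch below shows that expectation using Python's standard `ipaddress` module. It does not model OVN logical flows or their priorities, and `ROUTES` and `lookup` are hypothetical names.

```python
import ipaddress

# The two static routes from the sandbox commands: (prefix, next hop).
ROUTES = [
    (ipaddress.ip_network("10.0.0.0/24"), "192.168.0.2"),
    (ipaddress.ip_network("10.0.0.0/16"), "192.168.100.2"),
]


def lookup(dst):
    """Return the next hop of the most specific route that contains dst."""
    addr = ipaddress.ip_address(dst)
    matches = [(net, hop) for net, hop in ROUTES if addr in net]
    if not matches:
        return None
    # Longest prefix match: the route with the largest prefix length wins.
    return max(matches, key=lambda match: match[0].prefixlen)[1]


print(lookup("10.0.0.5"))  # 192.168.0.2 (covered by both routes, /24 wins)
print(lookup("10.0.1.5"))  # 192.168.100.2 (only the /16 matches)
```

A unit test for the reported issue could check the same property against the generated logical flows: the flow for the /24 route should take precedence over the flow for the /16 route.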
17:59:46 <_lore_> ok, thx
18:00:46 <_lore_> I think we should add some unit tests for it
18:02:13 <blp> _lore_: please!
18:02:40 <_lore_> will do :)
18:05:06 <mmichelson> Anyone else want to take a turn at the mic?
18:06:02 <flaviof> bye all!
18:09:29 <blp> bye!
18:10:03 <dceara> bye!
18:11:20 <mmichelson> #endmeeting