17:14:56 #startmeeting ovn-community-development-discussion
17:14:57 Meeting started Thu May 14 17:14:56 2020 UTC and is due to finish in 60 minutes. The chair is mmichelson. Information about MeetBot at http://wiki.debian.org/MeetBot.
17:14:58 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
17:15:00 The meeting name has been set to 'ovn_community_development_discussion'
17:15:34 Just as a reminder, we're coming up on hard freeze. Thanks to everyone for mentioning which patches they want included in 20.06
17:15:46 I'm planning to spend time today and tomorrow to help with the review effort on those patches
17:16:02 I plan to create the 20.06 branch on Monday
17:16:59 As far as my activity goes, after doing the GROUP_MOD message split patch, I took on some of the smaller issues that have been reported as of late. Thanks to everyone who has helped review those patches
17:17:08 And that's all from me.
17:17:17 I have a quick update.
17:17:41 I rebased the OVN DDlog code against master earlier this week. All the tests pass.
17:17:49 ryzhyk has been working on performance.
17:18:22 That's all I have for the moment, unless there are questions.
17:18:39 Yes, I am making progress on performance, but we need scale tests.
17:18:48 blp, ryzhyk awesome!
17:19:26 So far I've been using Han's old scale test log, but it is getting increasingly irrelevant with all the changes to OVN since it was created.
17:19:41 (that's it from me)
17:20:18 ryzhyk, if you want, explore this one - https://github.com/dceara/ovn-heater
17:20:18 for the scale testing.
17:20:52 numans: thanks!
17:21:02 But we are planning to run a scale test with the ddlog changes. We need to figure out a way to compile ovn-northd-ddlog in the container images. Once that is done, it should be straightforward.
17:21:39 I can go real quick.
17:21:44 Yeah, we'll do some preliminary work and then we can figure out how to do that.
17:22:13 Ok. sounds good.
17:22:37 I did some reviews this week.
And I spent much of the time refactoring/reworking patches 1 and 2 of my I-P patch series
17:22:46 Thanks to dceara for the reviews.
17:23:02 I'll continue with that and plan to submit v6 by tomorrow.
17:23:13 I appreciate more reviewers joining in :).
17:23:19 That's it from me.
17:24:14 hi all
17:25:09 numans, ryzhyk I can hack ovn-heater to compile ovn-northd-ddlog for the scale test container images. I just need to know what branches to use
17:25:33 numans: sorry that I didn't get time to review your I-P patches last week. I will resume this week.
17:25:51 zhouhan, thanks.
17:26:04 dceara: We're in a little bit of a transition at the moment, we'll get back to you on that.
17:26:11 blp, sure
17:27:38 can someone pin the link to the meeting logs?
17:28:26 zhouhan, I don't understand what you mean by "pin"
17:28:35 zhouhan: http://eavesdrop.openstack.org/meetings/ovn_community_development_discussion/2020/
17:28:46 flaviof, you forgot to use the #link :)
17:28:51 LOL
17:29:10 mmichelson: I meant, pin in this IRC channel, like the "FAQ: http://docs.openvswitch.org/en/latest/faq/"
17:29:23 there is a handy link to that dir in ovn.org too
17:29:24 thanks flaviof
17:29:29 ah ok
17:30:14 May I go next?
17:30:33 #topic Open vSwitch, a Linux Foundation Collaborative Project || FAQ: http://docs.openvswitch.org/en/latest/faq/ || Hyper-V meeting Tues 10:00 Pacific || OVN meeting Thurs 10:15 am US Pacific || Use ovs-discuss@openvswitch.org for questions if you don't get an answer here. || OVN weekly meeting logs can be found at: http://eavesdrop.openstack.org/meetings/ovn_community_development_discussion/
17:30:50 oh crap it's /topic isn't it
17:31:00 And I don't have permission
17:31:02 zhouhan, go ahead
17:31:06 Firstly I have a question regarding the OVS FAQ on the compatibility
17:31:23 I can change the topic.
17:31:29 2.11.x 3.10 to 4.18
17:31:29 2.12.x 3.10 to 5.0
17:31:29 2.14.x 3.10 to 5.5
17:31:41 It didn't mention 2.13, why is that?
17:31:56 blp: do you know?
17:32:10 blp++
17:32:25 zhouhan: Probably just overlooked.
17:33:10 We tried compiling 2.13 with 5.4, and it fails with this message: configure: error: Linux kernel in /lib/modules/5.4.0-31.generic.x86_64/build is version 5.4.0, but version newer than 5.0.x is not supported (please refer to the FAQ for advice)
17:33:27 So it seems it's not an oversight, but on purpose ...
17:33:58 It's confusing though
17:34:15 I continued debugging the problem of: deferred action limit reached, drop record action
17:34:49 zhouhan: It looks like 2.13.x supports the same versions as 2.12.x.
17:35:06 ok, thanks blp
17:35:45 I think I made some progress on the endless recirc problem. The issue was that slowpath is required for the actions while there is also a group action which requires dp_hash + recirc
17:36:21 OK, I sent a patch to update the FAQ.
17:36:39 zhouhan: Oh that's a little awkward. Do you have a lead on a fix?
17:36:43 Whenever this combination comes up, it tries to execute dp_hash in userspace, and only does recirc in the kernel
17:37:45 However, the hash generated in userspace is not carried along when injecting the packet into the datapath, so after recirc back, the upcall doesn't have a dp_hash value, again.
17:38:13 Oh. Have you figured out why we don't pass the dp_hash back?
17:38:14 So when it hits the same group action, it generates recirc and dp_hash actions again
17:38:17 Is that the easy fix?
17:38:59 At this time, the recirc_id generated is the same as the older one because all the metadata in the state is the same, which causes the loop
17:39:19 blp: I haven't had time to work on the fix yet.
17:39:28 blp: I think there are two options
17:39:30 Userspace does know how to put dp_hash into a flow, see odp_flow_key_from_flow__().
17:39:49 thanks for the pointer!
17:40:06 It will do so if it detects recirc support in the datapath. Perhaps it's not being detected properly?
17:40:24 I think we log whether that feature is detected as supported.
Check the log, if that's the problem then we should fix the detection logic.
17:40:24 I think this is one option. The other option is to always do dp_hash in the datapath instead of "trying to help" in userspace
17:40:49 I think there is some reason why we do that, although I don't recall what it was.
17:41:32 I wonder, if dp_hash requires recirc in the datapath anyway, how does doing dp_hash in userspace help
17:41:54 * numans says bye and disappears.
17:41:57 What's the reason that it gets slow-pathed to begin with?
17:41:58 The other question is why slowpath is required in the first place
17:42:02 * blp waves at numans
17:42:10 blp: exactly
17:42:22 "trace" should explain the reason for slow-pathing.
17:43:06 blp: the scenario is pinging the LRP's IP. The LR replies to the ICMP by simply setting fields, and I can't tell why slowpath is needed
17:43:35 Some fields aren't supported for setting in the datapath, that could be the reason.
17:44:02 But ofproto/trace should say. For example, if it's because of fields that the datapath can't set, it should say something about unsupported actions.
17:44:15 blp, zhouhan: execute.hash that is passed back to the datapath can only be set from upcall->hash, i.e. the hash that was received from the datapath during the upcall. Userspace never passes a dp_hash that it calculated itself to the datapath, it only returns the same hash that the datapath calculated before the upcall.
17:44:32 blp: ofproto/trace only says that slowpath is needed, but doesn't say which action requires it.
17:45:17 OK, it should be possible to figure it out. Do you have the trace handy?
17:45:23 imaximets: is it possible to pass it back (i.e. is it a small fix?)
17:45:58 Final flow: icmp,reg11=0xe,reg12=0x18,reg14=0xa,reg15=0x11,tun_id=0xa0011ff0002,tun_src=10.172.66.12,tun_dst=10.78.211.43,tun_ipv6_src=::,tun_ipv6_dst=::,tun_gbp_id=0,tun_gbp_flags=0,tun_tos=0,tun_ttl=0,tun_erspan_ver=0,tun_flags=csum|key,metadata=0xff0002,in_port=457,vlan_tci=0x0000,dl_src=aa:aa:bb:00:01:02,dl_dst=aa:aa:bb:00:03:01,nw_src=10.227.183.232,nw_dst=10.9.0.1,nw_tos=0,nw_ecn=0,nw_ttl=53,icmp_type=8,icmp_code=0
17:45:58 Megaflow: recirc_id=0,eth,icmp,tun_id=0xa0011ff0002,tun_src=10.172.66.12,tun_dst=10.78.211.43,tun_tos=0,tun_flags=-df+csum+key,in_port=457,vlan_tci=0x0000/0x1000,dl_src=aa:aa:bb:00:01:02,dl_dst=aa:aa:bb:00:03:01,nw_src=10.227.183.232,nw_dst=10.9.0.1,nw_ttl=53,nw_frag=no,icmp_type=0x8/0xff,icmp_code=0x0/0xff
17:45:58 Datapath actions: ct_clear,ct_clear,ct_clear,set(eth(src=aa:aa:aa:00:01:01,dst=aa:aa:aa:00:00:01)),set(ipv4(src=10.9.0.1,dst=10.227.183.232,ttl=254)),set(icmp(type=0,code=0)),hash(l4(0)),recirc(0x3)
17:46:00 This flow is handled by the userspace slow path because it:
17:46:02 - Uses action(s) not supported by datapath.
17:46:08 It's not a good idea to pass a userspace-calculated hash back to the kernel because the kernel would calculate a different value.
17:46:11 blp: This is the last part of the trace
17:46:26 zhouhan, I'm not sure. It might be possible to check if we have no upcall->hash, but have dp_hash and pass dp_hash to execute.hash instead, but I'm not sure.
17:46:59 blp: imaximets: Then do you think it is a better idea to always do dp_hash in the datapath?
17:47:31 zhouhan, I think, yes, it's better than 'datapath hash' is calculated by datapath.
17:47:47 (sorry this might have taken too long, in case someone else wants to update)
17:48:07 It's the set actions for ICMP that are doing it. The datapath doesn't know how to change ICMP.
17:48:13 s/than/when/
17:49:02 blp: is it the setting of ICMP fields that requires slowpath?
17:49:08 zhouhan: yes
17:49:14 blp: ok, thanks!
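The recirculation loop discussed above can be illustrated with a toy model. This is only a sketch under stated assumptions, not OVS code: `ToyTranslator`, `run`, the metadata tuple, the hash value `0x1234`, and the `limit` parameter are all hypothetical names and values invented for illustration. It models just two points from the discussion: the same frozen state maps to the same recirc ID, and the group action asks for hash + recirc again whenever the packet comes back without a dp_hash.

```python
from itertools import count


class ToyTranslator:
    """Toy stand-in for userspace flow translation (hypothetical, not OVS code)."""

    def __init__(self):
        self._ids = {}
        self._next_id = count(1)

    def recirc_id(self, frozen_state):
        # Identical frozen state always maps to the same recirculation ID.
        if frozen_state not in self._ids:
            self._ids[frozen_state] = next(self._next_id)
        return self._ids[frozen_state]

    def translate(self, metadata, dp_hash):
        # A group action needs dp_hash; without one, ask for hash + recirc.
        if dp_hash is None:
            return ("hash+recirc", self.recirc_id(metadata))
        return ("select_bucket", dp_hash % 2)


def run(carry_hash, limit=10):
    """Follow one packet; `limit` stands in for the deferred action limit."""
    xlate = ToyTranslator()
    metadata = ("reg14=0xa", "reg15=0x11")  # illustrative values only
    dp_hash = None
    trail = []
    for _ in range(limit):
        action, value = xlate.translate(metadata, dp_hash)
        trail.append((action, value))
        if action == "select_bucket":
            return trail
        # The packet recirculates; the hash is only present if it was carried.
        dp_hash = 0x1234 if carry_hash else None
    trail.append(("drop", "limit reached"))
    return trail


print(run(carry_hash=False, limit=3))
print(run(carry_hash=True))
```

With `carry_hash=False` every pass produces the same recirc ID and the packet is dropped once the limit is hit; with `carry_hash=True` the second pass can select a bucket. Which side should compute the hash is exactly the open question in the discussion, and the model takes no position on it.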
17:49:44 blp: it would be better if trace can just point this out :)
17:49:56 zhouhan: yes
17:50:22 other than this, I was involved in some discussions and also trying to fix some bugs in ovn.
17:51:25 One of the discussions was about ARP flows exploding in LRs. I think I can work out a way to configurably disable static ARP resolution in LRs, which would solve the issue for ovn-k8s.
17:51:38 That's my update :)
17:52:00 <_lore_> zhouhan: regarding the gateway flow issue, IIRC this chunk was to distribute non DVR traffic
17:52:43 _lore_: I got your point, but why was a different priority needed?
17:53:13 <_lore_> I need to get back to it since I can't recall the details now
17:53:26 _lore_: ok, thanks!
17:53:44 _lore_: It seems to be an optimization, right?
17:53:49 <_lore_> nope
17:54:14 <_lore_> let's say you have FIP 192.168.1.1
17:54:33 <_lore_> this is to distribute traffic for 192.168.1.0/24 IIRC
17:54:47 <_lore_> but I will check
17:54:49 _lore_: ok, I was thinking we could revert it, if it is an optimization and if we couldn't solve the route priority problem before the release
17:55:11 <_lore_> what is the issue you are facing?
17:55:47 _lore_: If you see my example, the /16 route is overriding the /24 route, which is wrong
17:56:00 <_lore_> ack
17:56:19 <_lore_> I will look into it
17:56:26 thanks _lore_
17:56:33 <_lore_> yw :)
17:57:55 <_lore_> zhouhan: do you have a unit-test for it?
17:59:18 _lore_: here is what I did in a sandbox:
17:59:23 ovn-nbctl lr-add lr1
17:59:23 ovn-nbctl lrp-add lr1 lrp1 aa:aa:aa:aa:aa:01 192.168.0.1/24
17:59:23 ovn-nbctl lrp-add lr1 lrp2 aa:aa:aa:aa:aa:02 192.168.100.1/24
17:59:25 ovn-nbctl lr-route-add lr1 10.0.0.0/24 192.168.0.2
17:59:27 ovn-nbctl lr-route-add lr1 10.0.0.0/16 192.168.100.2
17:59:29 ovn-sbctl lflow-list
17:59:31 ovn-nbctl --help | grep gateway
17:59:33 ovn-sbctl show
17:59:35 ovn-nbctl lrp-set-gateway-chassis lrp1 chassis-1
17:59:37 ovn-sbctl lflow-list
17:59:46 <_lore_> ok, thx
18:00:46 <_lore_> I think we should add a unit test for it
18:02:13 _lore_: please!
18:02:40 <_lore_> will do :)
18:05:06 Anyone else want to take a turn at the mic?
18:06:02 bye all!
18:09:29 bye!
18:10:03 bye!
18:11:20 #endmeeting
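
For reference, the behavior the route priority discussion expects (the /24 route winning over the /16 route for addresses that both cover) is ordinary longest-prefix matching. The sketch below uses the two routes from the sandbox session; `pick_route` is a hypothetical helper written for illustration and says nothing about how ovn-northd actually assigns logical flow priorities.

```python
import ipaddress


def pick_route(routes, dst):
    """Return (prefix, nexthop) of the longest-prefix match for dst, or None."""
    addr = ipaddress.ip_address(dst)
    matches = []
    for prefix, nexthop in routes:
        net = ipaddress.ip_network(prefix)
        if addr in net:
            matches.append((net, nexthop))
    if not matches:
        return None
    # The most specific (longest) prefix must win.
    net, nexthop = max(matches, key=lambda m: m[0].prefixlen)
    return str(net), nexthop


# Routes taken from the sandbox commands in the log.
routes = [
    ("10.0.0.0/24", "192.168.0.2"),
    ("10.0.0.0/16", "192.168.100.2"),
]

print(pick_route(routes, "10.0.0.5"))  # ('10.0.0.0/24', '192.168.0.2')
print(pick_route(routes, "10.0.1.5"))  # ('10.0.0.0/16', '192.168.100.2')
```

A unit test for the reported issue could assert the same outcome: traffic to 10.0.0.5 goes to 192.168.0.2 even when lrp1 has a gateway chassis set.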