#openvswitch log

17:16:42 <mmichelson> #startmeeting ovn-community-development-discussion
17:16:43 <openstack> Meeting started Thu May  7 17:16:42 2020 UTC and is due to finish in 60 minutes.  The chair is mmichelson. Information about MeetBot at http://wiki.debian.org/MeetBot.
17:16:44 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
17:16:46 <openstack> The meeting name has been set to 'ovn_community_development_discussion'
17:16:49 <blp> Ah, let me restart the one thing I said.
17:16:51 <blp> I have good news, I have all the tests passing with the ddlog implementaiton of ovn-northd.
17:16:58 <numans> that's great.
17:17:00 <zhouhan> blp: wonderful
17:17:02 <mmichelson> blp, Awesome!
17:17:10 <blp> I also rebased against OVN master yesterday,
17:17:23 <blp> which made some tests fail because there are new features,
17:17:29 <blp> but it shouldn't take long to impelemnt them.
17:17:36 <_lore_> hi all, sorry to be late
17:17:38 <numans> blp++
17:17:56 <blp> Then, we'll work on performance. ryzhyk will probably be the real source of power there.
17:18:41 <flaviof> hi all
17:19:10 <mmichelson> blp, OK, with regards to performance, do you already have some known problems or is it a matter of running tests and tweaking based on how they go?
17:19:26 <blp> mmichelson: Haven't even tried anything performance wise yet.
17:19:36 <blp> I just assume there'll need some work.
17:20:00 <mmichelson> Got it
17:20:18 <blp> At the very least we need to run benchmarks to compare performance.
17:22:20 <numans> blp, We have a test setup for performance testing. I think we can trigger a run with your branch. I think dceara has plans to do that.
17:23:03 <mmichelson> Yeah, and with an up-to-date ovn-northd-ddlog implementation, this should be real interesting.
17:23:19 <blp> I'll keep everyone up-to-date.
17:23:38 <blp> That's all I've got.
17:23:48 <_lore_> can I go next?
17:24:20 <mmichelson> go for it!
17:24:31 <_lore_> this week I worked on locking feature of ovn-northd and I spotted this issue
17:24:52 <_lore_> I have a cluster with 3 ovn-northd, one active and the other in standby
17:25:29 <_lore_> if I put the interface on active northd down the cluster does not recover
17:25:52 <_lore_> I mean, the other 2 remain in standby
17:26:11 <_lore_> I think because the active does not relase the lock
17:26:28 <_lore_> is there any feature on server side to take care of it?
17:27:05 <zhouhan> ovsdb server named lock should release when the connection lost is detected
17:27:42 <_lore_> zhouhan: is the detection done on client side?
17:27:53 <zhouhan> _lore_: it is on the server side
17:28:05 <blp> When the server detects that a connection has broken, that releases any locks that the connection held.
17:28:07 <_lore_> ok, so for some reason it does not work in my case
17:28:20 <zhouhan> _lore_: is the probe disabled?
17:28:45 <_lore_> need to check, it is default in ovn-fake-multinode
17:28:52 <_lore_> numans: ..^
17:30:26 <_lore_> I will double check
17:30:54 <_lore_> anyway I added some unixctl commands to check the connection status on ovn-northd
17:31:03 <_lore_> I will post the patch later today
17:31:18 <numans> _lore_, I think it configures to 180 seconds.
17:31:45 <_lore_> ack, thx
17:32:24 <_lore_> one question, what happen if the connection is flapping?
17:33:10 <blp> From the point of view of the probe, either it succeeds or fails.
17:33:23 <mmichelson> Sounds like it's at the mercy of the probe interval. So if the probe interval is short, then flapping may cause a premature release of the lock
17:33:47 <blp> If one particular database connection is flapping, and the probe sees a failure, then it means that a backup will get the connection.
17:33:55 <blp> the lock, i mean
17:34:02 <blp> If that backup is more reliable, then that's a good thing.
17:34:26 <_lore_> ok, thx
17:34:29 <_lore_> I will try
17:34:50 <_lore_> that's all from my side
17:35:18 <zhouhan> May I go next
17:35:37 <zhouhan> I continued working on the dp_hash fastpath v.s. slowpath inconsistency problem. I back ported the upstream kernel fix to OVS tree and it solved the inconsistency problem for same flow.
17:35:52 <zhouhan> However, it doesn’t solve the "same 5-tuple same bucket” requirement, because the dp_hash generated by socket is random.
17:36:39 <mmichelson> Would numan's patch for specifying hash fields help with that sort of consistency?
17:36:46 <mmichelson> Or is this related to that?
17:36:53 <zhouhan> Here is the backport patch: https://patchwork.ozlabs.org/project/openvswitch/patch/1588554154-30608-1-git-send-email-hzhou@ovn.org/
17:37:07 <numans> zhouhan, Yesterday I tested and with Fedora 32 with 5.6.8+ kernel and with ovs master, I noticed that the same bucket is selected.
17:37:38 <zhouhan> mmichelson: yes, numan's patch enables changing to "hash" instead of "dp_hash", which solves the requirement. However, it would be a performance degrade compares to dp_hash.
17:37:39 <numans> with distro kernel datapath module
17:38:03 <numans> zhouhan, I'll retest again to be sure.
17:38:16 <zhouhan> numans: yes, in some environment it is not reproduced, even with old kernel, I noticed that as well.
17:38:30 <numans> Ok.
17:39:10 <zhouhan> blp: do you know what's the general practice of backporting kernel patch? I thought the author would usually backport to OVS tree, but it seems that doesn;t always happend
17:39:40 <zhouhan> For the dp_hash consistency problem, here is more discussion: https://mail.openvswitch.org/pipermail/ovs-discuss/2020-May/050001.html
17:39:40 <blp> Sometimes authors backport, but when they don't, someone else has to do it.
17:40:00 <numans> blp, Will using selection method=hash in group flows result in performance degradation when compared to dp_hash ?
17:40:01 <zhouhan> blp: ok, could you help review the backport patch?
17:40:26 <blp> The VMware team eventually catches up with these things, but they don't necessarily do it quickly, so if there's any particular patch you need then you have to either ask someone politely or do it yourself.
17:40:45 <zhouhan> blp: I see, thanks.
17:40:59 <mmichelson> blp, who would a good "someone" be for these sorts of things?
17:41:07 <blp> Greg
17:41:23 <blp> Greg Rose
17:42:08 <flaviof> let me drop a link format for that here
17:42:09 <flaviof> #link https://mail.openvswitch.org/pipermail/ovs-discuss/2020-May/050001.html discussion on dp_hash consistency problem
17:42:21 <mmichelson> flaviof, thanks!
17:42:23 <flaviof> is this is too disruptive, lmk and I will not offe it again ;)
17:42:28 <zhouhan> flaviof: thanks!
17:42:44 <flaviof> np
17:43:01 <zhouhan> I noticed another constraint of dp_hash, which is the 64 hash value limit
17:43:21 <blp> zhouhan: It looks like only dp_hash uses recirculation. I think that method=hash is going to send a lot of packets to userspace, hurting performance.
17:43:22 <mmichelson> I thought it was 256?
17:43:24 <zhouhan> So in LB case, if the backend number is high, it automatically fall back to "hash"
17:43:56 <zhouhan> blp: exactly, that's also my concern.
17:44:11 <zhouhan> mmichelson: it is 64 and I verified that :)
17:44:22 <mmichelson> zhouhan, ok I guess I was mistaken
17:44:41 <zhouhan> In ECMP use case, it may be ok, since usually there won't be so many nexthops.
17:44:51 <blp> The constnats I see in the source imply 256 bucket max.
17:45:05 <blp> #define MAX_SELECT_GROUP_HASH_VALUES 256
17:45:28 <mmichelson> That's what I was basing my question on, too.
17:45:49 <zhouhan> I am pretty sure, let me find the code I saw
17:46:11 <zhouhan> I will send it later, let me continue
17:46:16 <mmichelson> I'm glad we're having this talk because when my turn comes around, I have more stuff regarding groups and buckets to discuss.
17:46:25 <numans> ovn-k8s adds a lot of backends to the load balancer. Some times > 64.
17:46:27 <_lore_> zhouhan: sorry to jump in, what probe interval are you refering to since after 300s it does not recover yet
17:47:32 <zhouhan> I found another more problem of dp_hash in ECMP testing, when pinging a LRP's IP. The packet is dropped because of "deferred action limit reached, drop recirc action"
17:47:55 <zhouhan> This suggests endless recirc in the datapath. I have no idea yet why would this happen.
17:48:21 <zhouhan> There is datapath flow like: recirc_id(0x1ef492),tunnel(tun_id=0xa0011ff0002,src=10.172.66.12,dst=10.78.211.43,flags(-df+csum+key)),in_port(2),eth(),eth_type(0x0800),ipv4(frag=no), packets:460, bytes:45080, used:6.278s, actions:hash(l4(0)),recirc(0x1ef492)
17:48:35 <blp> The probe interval is the idle time before a probe is sent. An additional interval is allowed without a reply before disconnecting. If the interval is 180 s then 360 s are required to disconnect.
17:49:03 <zhouhan> The recirc_id is same between match and action, so it is recirced back to same entry.
17:49:11 <zhouhan> blp: any suggestion how to debug this?
17:49:11 <_lore_> blp: ack, thx
17:50:39 <blp> zhouhan: My suggestion is always ofproto/trace ;-)
17:51:29 <zhouhan> blp: with dp_hash, if I give the hash input (any value), the trace would should expected actions without dropping anything.
17:52:15 <zhouhan> blp: without hash provided, it always goes to "no live bucket". I thought there might be some extra tricks needed here to debug.
17:52:58 <zhouhan> I wonder if it is related to how the value of "dp_hash" is generated by the datapath.
17:53:18 <zhouhan> maybe we can discuss this offline, if no immediate hint
17:53:43 <zhouhan> Apart from this, I did some code review, and also some minor patches.
17:54:32 <zhouhan> One of them is a fix to the manpage generation. #link: https://patchwork.ozlabs.org/project/openvswitch/patch/1588354161-17113-1-git-send-email-hzhou@ovn.org/
17:54:39 <blp> Hmm.
17:55:01 <zhouhan> blp: could you take a look at this small change? Without this the ovn-architecture manpage is not generated properly.
17:55:17 <numans> zhouhan, thanks for looking into the I-P patches.
17:55:21 <numans> and dceara too
17:55:34 <zhouhan> numans: I will continue reviewing this week.
17:55:48 <numans> thanks.
17:55:53 <dceara> numans, me too, i just looked at the first patch until now
17:56:09 <zhouhan> One of the patches I reviewed worth discussing is the multi-localnet ports on same LS.
17:57:09 <zhouhan> I finally got the whole point after discussion in the review. I think I am fine with the approach, although there might be some modeling incompleteness in OVN by using a single LS to model multiple L2 networks.
17:57:38 <zhouhan> That's all that I want to report :)
17:57:39 <numans> I saw the discussion between you and ihrachys. I didn't get the chance to look into the patches yet.
17:57:47 <numans> I can go real real quick.
17:58:02 <numans> I reviewed few patches.
17:58:15 <numans> And I worked on the load balancer selection fields support. Thanks to Han, Mark and Maciej for the reviews.
17:58:23 <numans> I applied that patch to master.
17:58:32 <blp> zhouhan: The question in my mind is, why would it recirc to the same bucket? It's very odd. Can you get that out of the trace?
17:59:14 <numans> I also worked on a small feature to mark a packet from each VIF of the logical port if the mark is set by CMS.
17:59:21 <numans> #link  https://patchwork.ozlabs.org/project/openvswitch/patch/20200501184810.1082602-1-numans@ovn.org/
17:59:31 <numans> This feature is required by ovn-k8s.
17:59:40 <numans> That's it from me.
17:59:58 <mmichelson> I'll jump in real quick
18:00:39 <mmichelson> I was working on an issue reported by Girish ages ago where he was attempting to set up a large load balancer. The problem is the GROUP_MOD request sent by ovn-controller exceeded the max size of an openflow message.
18:00:41 <zhouhan> blp: the trace never showed recirc back to same ID. The trace shows either "no live bucket" and stop, or (if I provide a hash value like dp_hash(0x1/0xf)), it will just finish normally and finally output the packet.
18:01:12 <zhouhan> blp: or is there anything I missed for how to do the trace in this case?
18:01:24 <mmichelson> I wrote up a patch to split the request into an ADD, followed by one or more INSERT_BUCKETS commands. This way the entire group can be communicated to ovs-vswitchd
18:02:17 <mmichelson> This introduces two problems.
18:02:19 <mmichelson> 1) If you try to run `ovs-ofctl dump-groups br-int` you get an openflow error, presumably because it's attempting to communicate the entire group in a single OF response instead of using multipart.
18:02:50 <mmichelson> And more importantly 2) There are messages in the ovs-vswitchd log saying that too many hash values are required, so it falls back to "default hash method" from dp_hash
18:03:10 <mmichelson> I'm curious about what this "default hash method" is, since it's not documented in the group documentation in ovs-ofctl
18:03:39 <mmichelson> My assumption is that this means a super slow userspace hashing method, but it's not clear what the hash is based on.
18:04:34 <numans> mmichelson, I think default hash method is selection_method=hash with no hash fields.
18:04:36 <numans> #link https://github.com/openvswitch/ovs/blob/master/ofproto/ofproto-dpif-xlate.c#L4553
18:04:41 <numans> see this function.
18:05:00 <numans> I mean no hash fields specified by the controller
18:05:12 <zhouhan> blp: mmichelson: My bad for the MAX dp_hash values. It is 256 instead of 64. So I guess what I tested was 256. :(
18:05:33 <mmichelson> numans, I had seen that, but I hadn't followed what method that actually used for hashing. I guess I can dig a little deeper
18:06:05 <numans> Ok. I think it uses 5-tuple hash.
18:06:14 <mmichelson> all right
18:07:03 <mmichelson> OK, I'm about 75% through writing a test for it in the test suite. I'll go ahead and post the patch when I'm done with it. I suspect it won't make 20.06
18:07:11 <mmichelson> That's fine though.
18:07:36 <mmichelson> Speaking of, I haven't seen any responses to my e-mail from Friday about which patches people want reviewed for inclusion in 20.06.
18:07:54 <numans> mmichelson, I've few patches. I reply to your email.
18:08:01 <mmichelson> numans, thanks
18:08:09 <mmichelson> And that's all from me.
18:08:32 <dceara> I have a short update too
18:09:39 <dceara> I sent a v5 of the IDL fix for monitor_cond_since when disconnects happen. zhouhan, I don't know how easy it is to scale test this in your scenario but I was wondering if you could try it out as you saw the issue on a lot of nodes
18:09:55 <dceara> #link https://patchwork.ozlabs.org/project/openvswitch/list/?series=175319 monitor_cond_since fix
18:10:47 <dceara> zhouhan, also a question, re _lore_'s ovn-scale-test PR, I can merge it but I wasn't sure if you're ok with reworking the ACL/PG parts as a follow up PR
18:11:03 <dceara> #link https://github.com/ovn-org/ovn-scale-test/pull/193 network_policy PR in ovn-scale-test
18:11:08 <zhouhan> blp: mmichelson: I take it back. I found what I tested is due to this line: "if (group_setup_dp_hash_table(group, 64))"
18:11:40 <mmichelson> zhouhan, what a roller coaster ride!
18:11:50 <dceara> That's it from me, thanks
18:12:13 <zhouhan> It was hardcoded to 64 when no selection method is specified. So with numan's fix, it should use 256 because dp_hash is specified with the OF1.5 :)
18:12:40 <zhouhan> dceara: sure, I can take a second look for the PR.
18:13:15 <zhouhan> dceara: and thanks a lot for the monitor_cond_since fix! I will review it.
18:13:41 <dceara> zhouhan, np, now I managed to add a test for it too so that should give more confidence
18:13:52 <zhouhan> dceara: for the testing, I am not sure. We only saw it once, but we will keep watching.
18:14:36 <dceara> zhouhan, ok, with our scale test setup I see the issue often but just on a few nodes
18:14:55 <dceara> <10 out of 200 usually
18:15:25 <zhouhan> dceara: that's good, then it will be easy to verify the fix at least :)
18:15:59 <dceara> zhouhan, with the fix, we don't hit the bug anymore, just saying :)
18:16:39 <flaviof> May I go next?
18:17:29 <flaviof> zhouhan your comment on man page made me recall a modest progress I got working for ovn (with Russell's help):
18:17:37 <flaviof> #link https://docs.ovn.org/en/latest/ref/index.html OVN readthedocs: reference guide
18:17:49 <flaviof> That is rebuilt out of a webhook from ovn repo, so it should always be up to date.
18:19:30 <flaviof> also...
18:19:47 <flaviof> I noticed a change in behavior for acls. The acls for allow-related action do not conntrack packets to the logical router.
18:20:15 <numans> flaviof++
18:20:20 <zhouhan> flaviof: Actually I checked that one, too :) However it is not up to date.
18:20:42 <zhouhan> flaviof: maybe it is from a stable branch?
18:21:02 <flaviof> oh, it should now... the webhook was not working until ~1week ago
18:21:20 <blp> mmichelson: The default hash method is a symmetric L4 hash.
18:21:23 <flaviof> did you check after that?
18:21:27 <zhouhan> The recent change made by blp in ovn-architecture doc is not there.
18:21:33 <zhouhan> I checked just now :)
18:21:48 <flaviof> oh, ok. sorry, that part needs to be address still. my bad.
18:22:14 <flaviof> the webhook does not yet do the "make docs" target.
18:23:06 <zhouhan> flaviof: ok, I see.
18:23:20 <flaviof> will work on that next. ;)
18:23:49 <flaviof> anyways, on the acl behavior subject.
18:24:03 <flaviof> A few months back, I wrote a little blog covering ovsdbapp as an alternative
18:24:03 <flaviof> to do what russellb wrote in an old gist with shell commands:
18:24:20 <flaviof> #link https://gist.github.com/russellb/4ab0a9641f12f8ac66fdd6822ee7789e russellb/ovn-test-icmp-reproducer.sh
18:24:20 <flaviof> #link https://github.com/flavio-fernandes/ovsdbapp_playground/blob/a9e780ce7ad57215b2200eba14c515482be84d63/scripts/step2_create_logical_ports.py russellb's equivalent in ovsdbapp
18:24:31 <zhouhan> blp: I think the default for userspace datapath is "symmetric L4 hash", but for kernel datapath it is the hash from skb.
18:25:20 <flaviof> The acls for allow-related action do not conntrack packets to the logical router.
18:25:20 <flaviof> In order to make it work, I needed to add new explicit rules:
18:25:20 <flaviof> #link https://github.com/flavio-fernandes/ovsdbapp_playground/commit/a9e780ce7ad57215b2200eba14c515482be84d63 acl rules changes to make
18:25:20 <flaviof> Is this a known and expected behavior? No worries if you do not know; just thought of
18:25:21 <flaviof> bringing it up here in case any of you have made changes in the last month or so that could have changed this behavior.
18:25:39 <flaviof> that is all. sorry for the overlapp.
18:25:44 <numans> flaviof, you mean you're not able to ping the router ip unless an ACL is added ?
18:25:59 <blp> zhouhan: I don't see anything datapath dependent in ofproto-dpif-xlate.c for this. Look at pick_select_group() and pick_default_select_group().
18:26:23 <flaviof> numans: no, sorry. I can ping the rtr just fine, using just the 'allow-related' rule.
18:26:50 <numans> flaviof, ok
18:27:08 <flaviof> numans: but now it seems that the conntrack logic does not take packets to rtr into account. This may be a fad
18:27:52 <numans> ok.
18:28:18 <flaviof> argh... sorry. I _used_ to be able to ping router by having the  'allow-related' rule.  Now that is no longer working.
18:28:18 <numans> flaviof,  can you report this in discuss ML so tht we can discuss there ?
18:28:28 <flaviof> numans: will do. thanks.
18:28:39 * numans signs off.
18:28:44 <numans> Bye everryone
18:28:49 <mmichelson> Bye!
18:28:53 <mmichelson> #endmeeting