17:19:32 <numans> #startmeeting ovn_community_development_discussion
17:19:33 <openstack> Meeting started Thu Jul  2 17:19:32 2020 UTC and is due to finish in 60 minutes.  The chair is numans. Information about MeetBot at http://wiki.debian.org/MeetBot.
17:19:34 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
17:19:37 <openstack> The meeting name has been set to 'ovn_community_development_discussion'
17:19:41 <numans> Hello
17:19:49 <panda> o/
17:19:59 <numans> Who wants to go first?
17:20:32 <numans> Ok. I can go quick.
17:20:44 <numans> I did some code reviews this week and a few small patches.
17:21:09 <numans> This week I mostly worked on exploring ways to optimize conntrack usage in OVN
17:21:31 <numans> so that we don't send all packets to conntrack just because there is one ACL with allow-related.
17:21:53 <numans> Let's see how that goes.
17:22:03 <_lore_> hi all
17:22:17 <numans> I plan to look into dceara's lflow patches tomorrow.
17:22:25 <numans> _lore_, Hi
17:22:31 <numans> That's it from me.
17:22:56 <numans> If someone wants to go next, please go ahead.
17:24:49 <dceara> I can go next
17:26:00 <dceara> Thanks numans for reviewing the Port_Group split v2 I sent earlier this week.
17:26:11 <numans> dceara, welcome
17:27:01 <dceara> While working on it I was thinking it might be useful to have a way to incrementally process Port_Group.ports updates. E.g. when a port is added to or deleted from a port group. However, I don't see an easy way to implement it right now.
17:28:26 <dceara> Complementary to numans' investigation into conntrack, I'm also looking into the possibility for the CMS to configure explicit rules to have part of the traffic skip conntrack. The end goal would be to be able to combine stateful and stateless ACLs in the same logical switch.
17:29:24 <dceara> And today I sent new revisions of the DNAT lflow reduction patches and also a v8 of the IDL recovery patch.
17:29:37 <dceara> That's it on my side, thanks.
17:30:34 <mmichelson> Sorry, I just got back in.
17:31:09 <mmichelson> I guess I can go next
17:31:13 <numans> mmichelson, no worries. We started a few minutes ago.
17:31:43 <mmichelson> To run counter to what dceara is doing, I'm looking to add some new conntrack usage in the case where ECMP routes are used. The idea is that if traffic is received over an ECMP route, then we want to send reply traffic out the same route.
17:32:33 <mmichelson> Based on numans' and dceara's changes now being in master and 20.06, I plan next week to tag 20.06.1.
17:32:54 <mmichelson> And I believe that's all for me to report.
17:33:46 <mmichelson> Do we have anybody else that wishes to share?
17:33:53 <imaximets> I could.
17:34:24 <imaximets> A quick update on stable releases for those who do not read the mailing list carefully.
17:35:15 <imaximets> We discussed a little bit with Ben and, apparently, now I will prepare stable OVS releases.
17:35:57 <numans> imaximets++
17:36:03 <imaximets> The target date for releasing new stables on all branches starting from 2.5 is not decided yet, but it should be sometime soon.
17:36:40 <imaximets> Technically, I'm waiting for patches to recommend new stable DPDK releases that should be very soon.
17:37:11 <imaximets> After that I'll prepare patches and tag releases after review/merge.
17:37:44 <imaximets> For the main OVS release we're now at the date of soft freeze.
17:37:56 <imaximets> But, I think we need an official email for that.
17:38:33 <imaximets> I wanted to catch blp here to ask, but it seems I will write an email to him to coordinate the dates.
17:38:52 <mmichelson> imaximets, I was just about to ask if blp was going to be the one to send out that official email :)
17:39:00 <mmichelson> But I guess you'll work that out with him
17:39:06 <flaviof> imaximets++ Thanks a lot for doing this!
17:39:14 <imaximets> mmichelson, yes, I think so.
17:40:03 <imaximets> For the dates, my assumption is that the branch should be created sometime around July 17 (wrapping to the end of the week, as it was last time)
17:40:26 <imaximets> But we will discuss that on the list, I suppose.
17:40:58 <imaximets> From the non release related things:
17:41:15 <imaximets> I sent out stream record/replay patch set.
17:42:01 <imaximets> I talked about it a long time ago.  To summarize: it allows recording all the data on the streams and replaying it afterwards.
17:42:35 <imaximets> More details here:
17:42:41 <imaximets> #link https://patchwork.ozlabs.org/project/openvswitch/list/?series=186549
17:42:54 <imaximets> And that's it from my side.
17:42:55 <dceara> imaximets: The record/replay functionality seems really cool! I plan to try it out next week.
17:44:22 <imaximets> dceara, thanks!
17:45:06 <zhouhan> imaximets: This is cool. I will try to find some time to review it, too.
17:45:32 <imaximets> zhouhan, cool!
17:46:02 <zhouhan> imaximets: could you also review dceara's IDL patch. I acked it.
17:46:57 <imaximets> zhouhan, yes.  v7 looked fine, so I asked dceara to post v8.  Will look at v8 closely.
17:47:10 <zhouhan> mmichelson: Question on ECMP with conntrack, even if we can use conntrack for reply in OVN, maybe it doesn't help for traffic initiated from OVN to external network. The ECMP implementation on external routers (physical) would decide the return path.
17:49:48 <mmichelson> zhouhan, Yes, that's a possibility. In the case I'm working to fix, each ECMP route is going to a separate physical router, and we know that the far end routes back to those same physical routers.
17:51:19 <zhouhan> mmichelson: I see. I guess it helps for debugging. Is there any other reason for that?
17:52:43 <zhouhan> numans: dceara: Could you explain the idea of how to avoid conntrack for ACLs? Does it require that the stateless ACLs have higher priority than stateful ACLs?
17:53:12 <mmichelson> zhouhan, I don't understand what you mean. It's to ensure the return traffic traverses the same return path.
17:54:34 <zhouhan> mmichelson: I mean, why is it required to be on the same return path? One obvious reason I can think of is to make troubleshooting easier, but I think there might be other benefits you are thinking about.
17:55:55 <zhouhan> mmichelson: or, is it because the physical routers have stateful features enabled, so it is mandatory to ensure it returns on same path?
17:56:23 <mmichelson> zhouhan, https://bugzilla.redhat.com/show_bug.cgi?id=1849683 is the bug report that describes the problem in more detail. One cited reason in the report is ensuring that connection-oriented traffic follows the same path. it likely is due to stateful features enabled in the router as you mention
17:56:23 <openstack> bugzilla.redhat.com bug 1849683 in OVN "[RFE] Add support for stateful next hop (ECMP bypass)" [Urgent,New] - Assigned to mmichels
17:56:52 <zhouhan> mmichelson: got it, thanks
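The ECMP behaviour mmichelson describes (reply traffic leaves over the same route the original traffic arrived on) can be sketched roughly as follows. This is only an illustrative model, not OVN code: the class and method names are hypothetical, and a plain dictionary stands in for the per-connection state that conntrack would hold.

```python
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class FiveTuple:
    src: str
    dst: str
    proto: str
    sport: int
    dport: int

    def reply(self):
        """Return the tuple a reply packet for this flow would carry."""
        return FiveTuple(self.dst, self.src, self.proto, self.dport, self.sport)


class EcmpRouter:
    """Hypothetical model of a router with several equal-cost next hops."""

    def __init__(self, next_hops):
        self.next_hops = list(next_hops)
        # Stand-in for conntrack state: original flow -> next hop it came from.
        self.conn_to_hop = {}

    def receive_from(self, pkt, via_hop):
        """Record which ECMP member delivered the first packet of a flow."""
        self.conn_to_hop[pkt] = via_hop

    def pick_next_hop(self, pkt):
        """Use the remembered hop for replies, otherwise hash across members."""
        saved = self.conn_to_hop.get(pkt.reply())
        if saved is not None:
            return saved
        index = zlib.crc32(repr(pkt).encode()) % len(self.next_hops)
        return self.next_hops[index]
```

With this model, a reply to a flow that arrived via "r2" is always sent back via "r2", while flows with no recorded state fall back to ordinary hashing.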
17:57:05 <dceara> zhouhan: What I was looking at (for conntrack) was that if the CMS decides that, for example, "UDP traffic doesn't need stateful ACLs" we could add a filter in pre-ACL tables, set a bit in a register and based on that decide later if we send the packet to conntrack or not and if we match on ct_state or not.
17:57:47 <dceara> zhouhan: this would allow the CMS to decide what traffic should be stateful firewalled and what traffic should be stateless firewalled
17:58:05 <numans> zhouhan, right now even if one ACL with allow-related is present we send the traffic to conntrack (both in the ingress and egress pipeline of the ls)
17:58:59 <numans> zhouhan, datapath testing done by our ovs team and dceara shows significant performance drops because of conntrack in general being slow
17:59:16 <zhouhan> dceara: it means you will do the actual ACL matching for stateless ACLs in "pre-ACL" table, otherwise, how do you know if the packet should have the bit set or not?
18:00:12 <zhouhan> numans: yes, I understand the motivation, but I'm just wondering about the "how" part.
18:01:10 <numans> zhouhan, that's the challenging part. I did a PoC by adding a few actions like this:
18:01:13 <numans> table=10(ls_in_stateful     ), priority=100  , match=(reg0[1] == 1 && ip4), action=(ct_commit(ct_label=0/1); next; ct_save(ip4.src,ip4.dst); )
18:01:22 <numans> ct_save would actually translate to learn action
18:01:49 <numans> and another ovn action - ct_is_saved(..) to check if we need to send the traffic to conntrack or not for the reply traffic.
18:02:06 <numans> zhouhan, I'm just doing a PoC now. Honestly, I don't know if it's worth it or not.
18:02:37 <numans> and also planning to add a new acl type - allow-reply
18:02:40 <dceara> zhouhan: It would be hierarchical in a way, in pre-acl we'd say, if traffic matches a pattern (e.g., is UDP) set a bit. Then in the ACL tables rules that have "allow", e.g. "match=udp.dport==4242" will translate to two openflows: 1) if register bit is 0 add match on ct_state 2) if register bit is 1 don't add match on ct_state.
18:02:54 <numans> for ACLs of type allow-related, we would still keep the present implementation.
18:03:25 <dceara> zhouhan: I'll try to send out an rfc soon so we can discuss about it on the mailing list.
18:03:38 <zhouhan> dceara: I see. So pre-ACL checks more general rule, like is it UDP. That makes sense.
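The two-stage idea dceara outlines (a general pre-ACL match sets a register bit, and each "allow" ACL expands into one flow per bit value) might look roughly like this. It is a simplified sketch, not the actual implementation: the function names and the dictionary-based packet and flow representation are assumptions, and the value of ct_state itself is not modelled.

```python
def pre_acl_skip_bit(pkt, stateless_protos):
    """Pre-ACL stage: set the bit when the packet matches the CMS's general pattern."""
    return pkt["proto"] in stateless_protos


def expand_allow_acl(match):
    """Expand one 'allow' ACL into two flows, one per value of the register bit."""
    return [
        # Bit is 0: packet went through conntrack, so the flow also matches ct_state.
        {"skip_ct": False, "match": match, "match_ct_state": True},
        # Bit is 1: packet skipped conntrack, so there is no ct_state match.
        {"skip_ct": True, "match": match, "match_ct_state": False},
    ]


def evaluate(pkt, flows, stateless_protos):
    """Return (allowed, sent_to_conntrack) for a packet."""
    skip_ct = pre_acl_skip_bit(pkt, stateless_protos)
    allowed = any(
        flow["skip_ct"] == skip_ct and flow["match"](pkt) for flow in flows
    )
    return allowed, not skip_ct


# Example: the CMS declares UDP stateless; the ACL allows dport 4242.
flows = expand_allow_acl(lambda p: p["dport"] == 4242)
print(evaluate({"proto": "udp", "dport": 4242}, flows, {"udp"}))
print(evaluate({"proto": "tcp", "dport": 4242}, flows, {"udp"}))
```

In this example the UDP packet is allowed without touching conntrack, while the TCP packet to the same port is allowed after going through conntrack.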
18:03:52 <numans> dceara, The approach I'm taking might get in your way. I'm determining whether we need to send the packet to conntrack or not in the pre-acl stage.
18:04:05 <numans> but not sure :)
18:04:23 <dceara> numans: I call dibs on the highest priority in table pre-acl :)
18:04:36 <numans> dceara, :)
18:04:46 <dceara> numans: joking, we can definitely find a way to have both work together
18:05:05 <numans> dceara, yeah provided the approach I'm taking makes sense.
18:05:25 <numans> Of course, there are a few concerns with the approach I'm taking.
18:05:30 <zhouhan> dceara: and it implies that the "general" rules have higher priority than anything else, e.g. if there is an ACL without TCP/UDP, but only IP and ports.
18:06:33 <zhouhan> dceara: that ACL will be handled partially as stateful, even if it has "allow-related"
18:06:49 <zhouhan> dceara: for UDP packet, it will not be tracked.
18:07:07 <dceara> zhouhan: I was actually thinking only of acls with action "allow"
18:07:37 <dceara> zhouhan: Those get changed "under the hood" to allow-related right now if there's at least one allow-related ACL configured on the LS.
18:07:48 <numans> dceara, how would you skip conntrack in the egress pipeline?
18:07:58 <zhouhan> dceara: I mean, in the pre-ACL stage you already marked the bit to skip conntrack, so even if the packet matches an "allow-related" rule, it won't get processed as stateful.
18:08:40 <mmichelson> That sounds like a conflicting configuration
18:09:48 <zhouhan> mmichelson: yes, it would be. But if we consider the "general" rules as high priority, overriding the specific rules, it might still work. (if documented well enough)
18:09:55 <dceara> zhouhan: Yes, but as mmichelson said, it would be the job of the CMS to come up with a good general match.
18:10:23 <zhouhan> ok, I think we are on the same page :)
18:10:34 <dceara> :)
18:10:59 <zhouhan> numans: Regarding "learn" action, there were problems of flow explosion.
18:11:34 <numans> zhouhan, I also thought of translating ct_save to controller action.
18:11:39 <numans> Maybe it can be done that way.
18:11:51 <numans> zhouhan, but right now I'm just exploring if it is possible to do or not
18:12:08 <numans> and not worrying about flow explosion or hard timeout of the flows.
18:12:35 <zhouhan> numans: I am not sure about the details of your approach yet, no comment. It is great if you already thought about those problems :)
18:13:14 <numans> zhouhan, Honestly I haven't thought much :). I'm thinking to tackle that as step 2.
18:13:26 <numans> zhouhan, and throw away the code/approach if it can't solve it
18:13:33 <zhouhan> numans: understand :)
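The ct_save / ct_is_saved idea and the concerns raised about it (flow explosion, hard timeouts) can be pictured with a small model. The log does not spell out the semantics of these proposed actions, so everything below is an assumption: one learned entry per (src, dst) pair, a fixed hard timeout, and a lookup that checks the swapped pair for reply traffic.

```python
class LearnedPairs:
    """Hypothetical stand-in for flows installed by a learn-style action."""

    def __init__(self, hard_timeout):
        self.hard_timeout = hard_timeout
        self.expiry = {}  # (src, dst) -> time at which the entry expires

    def ct_save(self, src, dst, now):
        """Install (or refresh) one entry for this address pair."""
        self.expiry[(src, dst)] = now + self.hard_timeout

    def ct_is_saved(self, src, dst, now):
        """Check a reply packet, whose src/dst are swapped relative to the saved pair."""
        key = (dst, src)
        deadline = self.expiry.get(key)
        if deadline is None:
            return False
        if now >= deadline:
            # Hard timeout reached: the entry is gone regardless of traffic.
            del self.expiry[key]
            return False
        return True

    def __len__(self):
        return len(self.expiry)
```

The table grows by one entry per distinct address pair, which is the flow explosion concern zhouhan raises; the hard timeout only bounds how long each entry lives, not how many exist at once.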
18:14:41 <zhouhan> I don't have much to update this week except some quick reviews. I will continue reviewing dceara's IP_INPUT flow reducing patches.
18:15:12 <zhouhan> I will probably not attend the meetings for the next 2 weeks.
18:15:16 <zhouhan> (that's my update)
18:15:36 <mmichelson> Hopefully you're doing something fun the next 2 weeks
18:15:37 <dceara> zhouhan: thanks for the reviews!
18:15:47 <flaviof> zhouhan: you will be missed
18:16:01 <zhouhan> mmichelson: hopefully (PTO :)
18:16:15 <numans> zhouhan, have a good PTO
18:16:21 <zhouhan> flaviof: will miss you all!
18:16:21 <flaviof> +1
18:16:35 <zhouhan> numans: thx!
18:16:37 <mmichelson> OK, so anybody else, or have we reached the end of the meeting?
18:16:45 <imaximets> quick question
18:17:09 <imaximets> zhouhan, what about the raft issue you mentioned a few weeks ago?
18:17:21 <imaximets> zhouhan, have you managed to find a root cause?
18:17:31 <zhouhan> imaximets: I didn't reproduce it (didn't try hard either)
18:17:36 * dceara is curious too about that
18:17:44 <imaximets> zhouhan, ack.
18:18:59 <mmichelson> All right, thanks everybody.
18:20:24 <imaximets> numans, I guess, you should stop the meeting as you started it. :)
18:20:35 <numans> imaximets, ack.
18:20:38 <numans> waiting for the signal
18:20:40 <numans> #endmeeting