18:15:50 <mmichelson> #startmeeting ovn_community_development_discussion
18:15:51 <openstack> Meeting started Thu Nov 12 18:15:50 2020 UTC and is due to finish in 60 minutes.  The chair is mmichelson. Information about MeetBot at http://wiki.debian.org/MeetBot.
18:15:53 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
18:15:55 <openstack> The meeting name has been set to 'ovn_community_development_discussion'
18:16:03 <flaviof> hi everybody!
18:16:11 <numans_> Hi
18:16:24 <_lore_> hi all
18:16:28 <mdgray> Hi
18:16:52 <mmichelson> I guess I can go ahead and get things going
18:17:34 <mmichelson> First off, I sent an email last week announcing that tomorrow (13 November) is the soft freeze for OVN 20.12. So if you have any patches you want included in 20.12, please do your best to get it submitted to the mailing list by tomorrow.
18:18:07 <mmichelson> As far as work is concerned, I got my SNAT CT zone selection patch submitted yesterday. I submitted a v2 today that addresses 0-day robot's problems.
18:18:36 <mmichelson> I'll be doing reviews today and tomorrow to try to reduce the review load after soft freeze
18:18:49 <mmichelson> And that's all from me, I suppose.
18:19:07 <flaviof> mmichelson: you fixed the robot or did the robot complained about something you had to fix?
18:19:25 <mmichelson> flaviof, the robot complained about something I had to fix.
18:19:37 <blp> hi
18:19:49 <mmichelson> flaviof, there were some minor checkpatch problems, but there was also a complaint about losing const-ness in a function call I was making
18:19:59 <mmichelson> It wasn't reported by gcc locally, so that was strange. But it's fixed in v2
18:20:04 <flaviof> mmichelson: ack. ++
18:20:27 <blp> Must be different compiler versions.
18:21:11 <mmichelson> blp, that's what I assume
18:22:01 <blp> OK, I'll go next then.
18:22:04 <flaviof> mmichelson blp are we planning to have ovn-nrthd-ddlog in ovn 20.12?
18:22:19 <mdgray> mmichelson: I had a similar issue a while back. Passed in travis as well - compiler versions for sure.
18:22:40 <blp> I am continuing to iterate on the ddlog series. I'd love to have it in 20.12. I'd assume that it would be OK after the soft freeze because it's been cooking for some time, but I'd accept a contrary ruling of course.
18:24:26 <numans> blp, If I may, there are a couple of issues I found with my ddlog testing.
18:24:33 <mmichelson> blp, FYI we did some scale testing with DDLog. We encountered some issues after a while. We think it has something to do with probe intervals. I'll let numans expand on it
18:24:46 <blp> numans: Please explain
18:25:24 <numans> 1. There is compilation issue with the travis CI. If you could please take a look. Looks like make dist has few issues. (not a big issue :)) - https://travis-ci.com/github/numansiddique/ovn/jobs/434778314
18:26:07 <numans> 2. Here are the results with ddlog with 500 fake nodes - https://imgur.com/JJpsMhv   vs no ddlog - https://imgur.com/gmbhfoa
18:26:28 <numans> blp, In the logs I see that ovn-northd-ddlog is sending probe interval every 1 second, even if I disable probe interval  - ovn-nbctl set NB_Global . options:northd_probe_interval=0
18:26:41 <numans> And I think as scale increases, ovn-northd-ddlog is dropping the connection to sb db due to 1 second probe interval.
18:26:48 <numans> I think its happening because of this - https://github.com/blp/ovs-reviews/blob/ddlog9/northd/ovn-northd-ddlog.c#L893
18:28:08 <numans> I'm planning to hack the probe interval thing and run again to rule out if the issue is because of constant reconnections to the sb db.
18:28:48 <blp> Ah. OK. Thanks for those reports. I'll fix them for v6.
18:29:12 <blp> Probably simple to fix.
18:29:48 <numans> blp, In the northd-ddlog logs I observed this - 2020-11-10T13:51:32.186Z|01351|timeval|WARN|Unreasonably long 335427ms poll interval (214ms user, 1ms system)
18:30:00 <numans> blp, not sure if it is related to the probe interval.
18:30:03 <numans> blp, welcome.
18:30:31 <flaviof> numans: nice graphs! Is the probe interval on the non-ddlog version test set to 0?
18:31:06 <numans> flaviof, I think with ddlog, the north_probe_interval configured on the db is not taking effect.
18:31:07 <blp> Wow, that's a really long main loop. The difference between elapsed and user+system is suspicious though.
18:32:28 <numans> blp, Also with really huge SB DB, we are hitting this issue - https://github.com/ovn-org/ovn/commit/1e59feea933610b28fd4442243162ce35595cfee with ddlog too
18:32:37 <numans> blp, again related to probe interval.
18:32:48 <numans> You may want to consider the same for ddlog too.
18:33:17 <numans> other than the ddlog testing, I worked on my hairpin patches
18:33:21 <numans> and submitted v4.
18:33:27 <numans> I also did some code reviews.
18:33:31 <numans> That's it from me.
18:33:56 <blp> numans: Thanks for pointing that issue out. I'll look at fixing it too.
18:34:04 <numans> blp, thanks.
18:35:54 <numans> If some one wants to go next.
18:35:57 <_lore_> can I go next?
18:36:01 <_lore_> pretty fast
18:36:11 <numans> sure
18:36:21 <mmichelson> go for it
18:36:26 <_lore_> this week I posted a native implementation for bfd in ovn
18:36:33 <_lore_> (actually today)
18:36:51 <_lore_> it is not complete yet, but I wanted to post it to collect feedbacks
18:37:28 <_lore_> we decided to implement it in ovn since the ovs one seems not so suitable to our needs, we want to enable it on ovn entities, like logical router ports
18:37:50 <blp> Makes sense.
18:38:01 <_lore_> moreover we would enstablish a connection between a logical router port and multiple external entities
18:38:12 <_lore_> not belonging to ovn "domain"
18:38:14 <blp> Does it make sense to try to share code between the implementations?
18:38:23 <blp> (I don't have a guess at that.)
18:38:32 <_lore_> I looked at ovs implementation
18:38:52 <_lore_> and it relies on ovs stuff (netdev, interface, ecc.)
18:38:57 <_lore_> so it is not so easy
18:39:17 <_lore_> maybe we can do as follow-up series
18:40:14 <_lore_> I implemented basic functionalities
18:40:25 <_lore_> tx/rx, state machine
18:40:33 <blp> I think that any code sharing would have to be factoring out a common library and using it both places. I don't have any idea whether that would be productive.
18:41:00 <_lore_> yes, I agree
18:41:21 <_lore_> when the code is more or less ready let's see how much we can share
18:41:27 <_lore_> agree?
18:42:02 <blp> yes
18:42:06 <_lore_> ok, cool
18:42:14 <_lore_> so I will continue to work on my implementation
18:42:21 <_lore_> that's all from my side, thx
18:42:53 <flaviof> _lore_ native in ovn meaning the bfd packets are generated/handled by the ovn-controller process?
18:43:01 <_lore_> yes
18:43:06 <_lore_> flaviof: correct
18:43:11 <flaviof> _lore_ cool beans.
18:43:21 <blp> How big is this likely going to need to scale? Hundreds or thousands of packets/sec might use a lot of CPU%.
18:43:41 <_lore_> blp: I do not think so
18:43:55 <_lore_> but even in ovs packets are sent in userspace
18:44:11 <_lore_> as upcall
18:44:59 <blp> Yes, of course. This takes one more hop across IPC, which could be an additional bottleneck.
18:45:56 <_lore_> I agree
18:46:29 <_lore_> let's sync with osp/osh folks on it
18:46:57 <_lore_> I think the main use-case is lb health-check
18:47:06 <_lore_> for ecmp routing
18:47:48 <blp> OK. Some network virtualization systems end up running BFD (etc.) in a full mesh across all chassis. That is expensive.
18:48:02 <blp> If it's just for a limited number of connections, it will scale better.
18:48:27 <_lore_> I do not have any exact number now
18:48:41 <_lore_> I will sync with osh folks and get back on it
18:48:57 <blp> What is osh?
18:49:17 <_lore_> open-shift
18:49:19 <_lore_> :)
18:49:32 <flaviof> g-osh! ;)
18:50:01 <blp> Ah. I did not know the abbrev.
18:50:14 <blp> And osp?
18:50:45 <imaximets> blp, OpenStack Platform, IIUC.
18:50:50 <numans> openstack
18:50:52 <_lore_> yes
18:50:59 <blp> Thanks.
18:51:02 <_lore_> sorry for the confusion :(
18:51:12 <blp> Anyone else?
18:51:19 <blp> I have my homework list, I think.
18:51:26 <imaximets> I have a quick update.
18:51:53 <imaximets> There is a recurring issue with the size of sb db.
18:52:15 <imaximets> So, this week I tried to prepare PoC patches to combine lflows.
18:52:45 <imaximets> There are lots of lflows that differs only in logical datapath column, so they could be combined to save space.
18:53:27 <imaximets> by introducing Logical_Datapath_Group table or something similar.
18:54:08 <imaximets> northd changes was kind of straightforward, but I found it really hard to make changes in ovn-controller.
18:54:10 <blp> Oh, interesting. Or the logical_datapath column could be made a set: empty -> all datapaths, otherwise all the listed datapaths.
18:54:25 <imaximets> blp, yep.
18:54:33 <blp> ovn-controller makes me sad right now.
18:54:37 <blp> It's complicated.
18:55:01 <imaximets> the issue with ovn-controller is that it works directly with database without any abstraction, so it's hard to modify schema in general.
18:55:36 <imaximets> the second big issue is that almost all parts of lflow processing depends on fact that lflow to datapath is 1:1 relation.
18:56:03 <blp> eventually it might make sense to apply ddlog to ovn-controller, but it does not solve any problems immediately
18:56:23 <imaximets> dceara said that he could help reducing dependencies from the logical datapath, at least we could eliminate dependency in match expression parsing code.
18:56:58 <imaximets> So, I'll wait for him to do that before continuing my efforts.
18:57:49 <blp> I guess another option would be to add a match on datapath to the expression logic.
18:58:08 <blp> And then allow the logical_datapath field to be empty.
18:58:52 <blp> A match on a port would have a prerequisite on the logical datapath that contains the port, I guess.
18:59:15 <imaximets> but will this reduce number of lflows in sb db?
18:59:41 <numans> blp, there are many lflows like match(1), action=next; which imaximets wants to reduce is what I understand.
18:59:58 <numans> this is just one example. there could be many
19:00:13 <imaximets> numans, there are lots of different flows. :)
19:00:16 <blp> Sure it would. That kind of flow would naturally work on any logical datapath. No prerequisites would be generated, since it doesn't match on a port.
19:00:29 <zhouhan> blp: I am thinking the other way around. I think putting everything in match condition as a string makes it hard for optimization in ovn-controller because we can't get any metadata before parsing the logical flow
19:00:58 <blp> Oh, i guess the other issue is that it's hard to just get the flows a chassis needs. Currently that's pretty easy.
19:01:08 <zhouhan> blp: yes
19:01:19 <numans> that's a good point.
19:01:43 <numans> blp, in the case of large scale deployments we have noticed that disabling conditional monitoring helps.
19:01:44 <imaximets> this will hit performance of ovn-controller.
19:02:04 <blp> numans: Is that because of CPU% on the dbserver?
19:02:08 <numans> otherwise ovsdb-server takes lot of cpu cycles to handle conditional monitoring
19:02:11 <numans> blp, yes
19:02:14 <zhouhan> blp: even today, because port exists only in match, we lose some chance to use that to filter out the lflows that doesn't need to be parsed by the chassis
19:02:31 <blp> OK.
19:03:23 <imaximets> Anyway ovn-controller needs some refactoring and untangling.
19:04:06 <mmichelson> that it does
19:04:10 <imaximets> Once this done I'll be able to continue with lflow combining.
19:05:02 <imaximets> In any way of implementation it should reduce size of sb db by factor of number of logical switches in some cases.
19:05:42 <imaximets> That's it from my side.
19:06:06 <mmichelson> Anybody else?
19:06:08 <flaviof> may I?
19:06:47 * flaviof thinking yes.... ;)
19:06:51 <flaviof> As mentioned in the ovs-dev ML, I'm wrapping up the "fair meters" changes, now using a bool column in the NB Meters table. TY blp and dceara for all the support!
19:07:06 <flaviof> #link https://mail.openvswitch.org/pipermail/ovs-dev/2020-November/377277.html the current plan on implementing fair meters
19:07:06 <flaviof> #link https://patchwork.ozlabs.org/project/ovn/patch/20201107213913.GC2753733@ovn.org/ blp generously explaining fair meters changes for ovn-northd-ddlog
19:07:15 <flaviof> I should be done soon, but want to wait for ddlog's merges first.
19:07:29 <flaviof> Besides ACL logs, I see only  one other potential user of Meters table in NB:
19:07:29 <flaviof> #link https://github.com/ovn-org/ovn-kubernetes/blob/061d543f363d48d2a4f0fc1df95fea2c6129e3cf/go-controller/pkg/ovn/ovn.go#L435  controller_event
19:07:42 <blp> flaviof: Do you want any support for me at the moment on the fair meters changes?
19:07:43 <flaviof> There is nothing in northd that deal with that, so this is looking easier than I thought. I hope I'm not wrong. ;)
19:07:44 <flaviof> That is all from me.
19:08:08 <flaviof> blp maybe. But let me try it first. I'll let you know!
19:09:01 <numans> blp, one question on the ddlog, I notice that northd-ddlog is using update method instead of update2/update3. is that intentional ? Just curious.
19:09:30 <numans> i mean the jsonrpc updates from the ovsdb-server.
19:09:39 <blp> numans: someone else wrote that code and I haven't revisited it. I'll take a look.
19:09:56 <numans> blp, ack. thanks.
19:11:06 <blp> I already got in the coreutils doc update that Dumitru pointed out was needed. https://lists.gnu.org/archive/html/coreutils/2020-11/msg00014.html
19:11:56 <imaximets> blp, side question, there is a concern that ddlog doesn't use ovsdb-idl, but re-implements it.  Maybe it's better to use ovsdb-idl instead?  That might be easier to maintain.
19:12:42 <blp> The reason that ddlog doesn't use the idl is that it needs the deltas, which the idl hides. We could extend the idl to expose the deltas; it's a valid option.
19:12:51 <blp> If you like, I can explore that path.
19:13:20 <numans> blp, that could be useful to ovn-controller as well.
19:13:22 <imaximets> it would be great.  At least to check.
19:13:34 <blp> I'll explore.
19:13:36 <zhouhan> blp: for deltas do you mean something different from change-tracking?
19:13:53 <imaximets> blp, thanks.
19:14:08 <zhouhan> blp: or improvements on the IDL change-tracking?
19:14:18 <blp> Changes and deltas are the same. I don't know whether the idl change-tracking code exposes all the information I need. That will be one of the things I check.
19:14:39 <numans> I think idl change tracking doesn't expose what column changed.
19:14:54 <zhouhan> blp: ok. IDL change-tracking currently doesn't provide old data, if a row is modified.
19:15:13 <zhouhan> numans: yes, that's another one
19:16:00 <numans> python ovs idl provides a callback for any updates. and the callback function can figure out what columns changed.
19:16:19 <blp> Old data is essential, because ddlog needs to be passed the row that was deleted and the row that was added.
19:16:34 <blp> Maybe the API can be adapted. I will see.
19:17:23 <zhouhan> one more thing, For set/map columns, it would be better to get just what's changed, because there can be huge data in a column, such as address-set, static-route, etc.
19:17:46 * numans says bye.
19:17:49 <blp> DDlog isn't prepared to consume anything other than full row updates at the moment.
19:18:10 <zhouhan> blp: ok, I guess that might be needed
19:18:29 <blp> If there's a bottleneck, then there are various ways we could deal with it.
19:18:48 <blp> For example, we could break a table down into multiple relations.
19:18:57 <blp> I don't think that's a problem yet though.
19:19:31 <zhouhan> for scenarios like: add a member in a group for policy enforcement - we don't want all the related flows got recomputed
19:19:47 <zhouhan> if the group is huge, e.g. thousands of members
19:21:08 <zhouhan> or, in a deployment we could have a single router, with ten thousands of routes. If any of the routes change, it is just one row change in the DB, but the row contains all the routes.
19:21:49 <blp> It's likely that in that particular situation what's changing is a northbound group membership. That would normally just translate to a single southbound group membership change. If it actually affected thousands of flows, that's a problem in the flow table design.
19:21:58 <mmichelson> zhouhan, that sounds like it could be solved by blp's idea of breaking into multiple relations
19:22:26 <blp> If there is indeed a problem here, then we will come up with a way to systematically break up the tables into multiple relations.
19:22:33 <zhouhan> mmichelson: oh, yes, if that's discussed already
19:22:33 <mmichelson> Actually, no it wouldn't. It just would shift where the burden is some.
19:22:40 <mmichelson> Hmm
19:23:07 <blp> If a change to nb will cause a large change to sb, then there's no way to avoid computation at least linear in the change to the sb.
19:23:21 <blp> Otherwise, it can probably be avoided.
19:23:52 <blp> I have a meeting with my boss in 5 minutes.
19:24:05 <blp> Well, probably more like 15, he's usually late.
19:24:09 <zhouhan> If we wont to solve this by schema change, there will be many places to change ...
19:24:43 <blp> The "breaking into multiple relations" appraoch I am talking about would not require any change to the schemas, only to ovn-northd-ddlog.
19:25:01 <zhouhan> blp: oh, I misunderstood then
19:25:05 <blp> (schemas? schemata?)
19:25:21 <mmichelson> yeah I misunderstood as well
19:25:35 <blp> But I don't want to start down that path until we know there is a problem
19:25:47 <blp> and it might make more sense to look at ovn-controller first
19:27:25 <zhouhan> blp: if the "breaking into multiple relations" is in ddlog, then still we will need to solve the problem of observing what's really changed in the huge column
19:27:54 <flaviof> zhouhan I thought you were referring to columns that have a long list of values. Like ports in a port_group? Ins't that something what could be turned into a separate table? Forgive me if I'm not understanding it right.
19:28:07 <blp> That's not a problem, it's just a matter of writing code.
19:28:13 <blp> we're good at writing code
19:28:21 <flaviof> blp++
19:28:23 <zhouhan> blp: sure :)
19:28:52 <zhouhan> it's the change-tracking enhancement I was talking about
19:29:08 <blp> flaviof: That's what I was talking about.
19:29:12 <zhouhan> flaviof: yes that's another approach, but not what blp mentioned
19:29:19 <zhouhan> :D
19:29:23 <flaviof> ack. TY
19:29:25 <blp> OK folks, I need to run. Talk to you next week!
19:29:37 <zhouhan> thanks, bye
19:29:41 <flaviof> bye all!
19:29:46 <imaximets> bye
19:29:57 <mmichelson> bye!
19:30:02 <mmichelson> #endmeeting