14:01:15 <ralonsoh> #startmeeting neutron_drivers
14:01:15 <opendevmeet> Meeting started Fri Jul 21 14:01:15 2023 UTC and is due to finish in 60 minutes.  The chair is ralonsoh. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:01:15 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:01:15 <opendevmeet> The meeting name has been set to 'neutron_drivers'
14:01:17 <felixhuettner[m]> o/
14:01:18 <ralonsoh> hello all
14:01:21 <obondarev> hi
14:01:26 <slaweq> hi
14:01:30 <racosta> hi
14:01:37 <haleyb> hi
14:01:53 <ralonsoh> this is the agenda of the meeting: https://wiki.openstack.org/wiki/Meetings/NeutronDrivers
14:02:05 <ralonsoh> let's give 30 seconds more
14:02:52 <ralonsoh> ok, let's start
14:02:56 <ralonsoh> the topic we have is
14:03:04 <ralonsoh> #link https://bugs.launchpad.net/neutron/+bug/2027742
14:03:09 <ralonsoh> [RFE] unmanaged dynamic router resources - OVN
14:03:12 <ralonsoh> racosta, please
14:03:27 <racosta> First of all, thank you very much for taking the time to discuss this RFE.
14:03:37 <racosta> This is an opensource project with global impact, and I try to contribute as much as possible with the development of the solution and with experience/test results.
14:03:44 <racosta> As I mentioned in the RFE information and mailing list messages, I don't know of any other option that allows learning and reinjecting routes at the SDN level (BGP with default gw/static routes does not do the same thing). https://lists.openstack.org/pipermail/openstack-discuss/2023-July/034377.html
14:03:51 <racosta> Why is it important? -> high availability
14:04:01 <racosta> In the proposed RFE, all the basic resources are created by OpenStack Neutron, including the tenant router. What do I need to add for the interconnect to work? The LRP in the TS network.
14:04:09 <racosta> The LRP of the tenant's router connects with the TS (this transit switch behaves like a provider), as shown in the figure: https://drive.google.com/file/d/1lBP7MdvukRlQIE1yWU3K3jWGqnAGGQd3/view?usp=sharing
14:04:20 <racosta> I tested an OVN interconnect integrated with 3 OpenStack installations and it worked very well. The TS is not known to Neutron and nothing happens to it. The point here is the "learned" Static Routes and LRPs (used to connect with the TS).
14:04:34 <racosta> I understand the security arguments that Rodolfo raised in the thread, but in this case, this LRP and these routes are created solely and exclusively in the tenant's network (with no overlap or problems with other tenants' subnets).
14:04:56 <racosta> Why should we make db_sync flexible? Even if other ways of managing TS are applied in Neutron, the static routes (which is the main point for the interconnect to work), would have to be made more flexible, and the tenant router would need to skip static routes learned from db_sync, for example. There are other points here about adding TS support as a network, scalability is the biggest one.
14:05:09 <racosta> While it is possible to create the TS in Neutron itself (with a new RFE), this would mean having to add a new router in the tenant project and create static routes for that project. Don't you have problems with scaling on OpenStack? I mean, Neutron's networking backend does not have infinite resources.
14:05:23 <racosta> The big problem with this approach is that with large scale deployments (thousands of tenants), the SDN can reach its limit and we will face timeout problems. The most classic one I've faced with OpenStack Ussuri (which has lower limits) is "network-vif-plugged", caused by OVN's delay in transitioning the port events from the chassis to the OVSDB Southbound, so Nova ends up failing (even with configured network-vif-plugged timeouts of 300 seconds).
14:06:16 <racosta> what do you think about this RFE?
14:07:13 <ralonsoh> first of all, is everyone aware of the goal of this RFE? Do you have questions about it?
14:08:04 <slaweq> I read today the thread on the ML
14:08:35 <slaweq> IIUC it's about interconnecting 2 different OpenStack clusters
14:08:43 <slaweq> using L3
14:08:53 <slaweq> is that correct?
14:08:57 <racosta> Yep, two or more OpenStack clusters.
14:09:02 <obondarev> not only OpenStack
14:09:06 <mlavalle> that's my understanding as well
14:09:09 <ralonsoh> using OVN IC, that creates an IC Transit Switch and requires OVN routers
14:09:15 <slaweq> ok
14:09:18 <mlavalle> and yes, connect more than OpenStack
14:09:20 <obondarev> also connect openstack to smth else, right?
14:09:32 <slaweq> and this reminded me of old things which we had proposed a long time ago:
14:09:38 <racosta> Yes, I tested it integrated with ovn-kube.
14:09:41 <slaweq> Old spec https://specs.openstack.org/openstack/neutron-specs/specs/stein/neutron-interconnection.html
14:09:54 <slaweq> old API-REF https://review.opendev.org/c/openstack/neutron-lib/+/626871 - was later removed with https://review.opendev.org/c/openstack/neutron-lib/+/626871 when we archived this stadium project,
14:10:09 <slaweq> and stadium project  https://github.com/openstack-archive/neutron-interconnection/tree/1c5fbf56ff05f503745f7de7041f3e3e258d4f73
14:10:30 <slaweq> did you maybe look at that spec and API? Maybe we could somehow build on that to implement it with ovn-ic now?
14:10:34 <felixhuettner[m]> if i got it correctly the point of the RFE here is to not interfere with the resources created by/for OVN-IC inside the local ovn cluster
14:10:53 <mlavalle> I have questions regarding ovn-kube: has this approach already been implemented with ovn-kube?
14:10:55 <felixhuettner[m]> i did not understand that ic should be integrated in neutron (or maybe just as a future topic)
14:11:03 <racosta> I saw that slaweq but it wasn't continued.
14:11:45 <slaweq> racosta yes, it was proposed by guys from Orange but later they stopped working on it, at least u/s and there was nobody interested in that later
14:12:31 <racosta> That's the point Felix, the goal is for Neutron not to interfere with the ovn-ic.
14:12:51 <slaweq> I'm just mentioning it now, as maybe it would be useful - making it somehow part of neutron would mean that neutron will know about those resources so there may not be an interference problem anymore :)
14:13:40 <felixhuettner[m]> that might also be an option (and would even make it more easily useable)
14:13:43 <racosta> mlavalle, to integrate with ovn-kube I used this doc: https://github.com/kubeovn/kube-ovn/blob/v1.11.0/docs/cluster-interconnection.md
14:13:59 <felixhuettner[m]> although i would only see the "usage" of this transit switch within neutron
14:14:04 <felixhuettner[m]> not the management of it
14:14:14 <felixhuettner[m]> otherwise we have one neutron ruling over other neutrons
14:14:30 <felixhuettner[m]> which at least in our case is not what we want
14:15:07 <ralonsoh> how is that? each openstack cloud will create its resources (TS, routers, etc)
14:15:28 <felixhuettner[m]> no, the transit switch is owned by ovn-interconnect (in a separate database)
14:15:32 <racosta> Yes Felix, remote LSP learned via ovn-ic should not be managed by Neutron.
14:15:42 <felixhuettner[m]> and then "instantiated" in each individual ovn deployment
14:15:55 <felixhuettner[m]> and i meant the transit switch within the ic databases
14:16:23 <ralonsoh> hold on, this IC belongs to another OVN deployment?
14:16:42 <ralonsoh> the IC database belongs to each cloud/OVN cluster
14:16:53 <slaweq> I think I need to read more about ovn-ic because I never knew about this :) Because of that I may say totally stupid things regarding this old spec - sorry for that
14:16:55 <ralonsoh> each one has its own IC NB/IC SB database
14:17:01 <felixhuettner[m]> no
14:17:07 <felixhuettner[m]> there is one central IC NB+SB
14:17:19 <felixhuettner[m]> and all OVN deployments have their individual normal NB+SB
14:17:32 <felixhuettner[m]> and the ovn-ic daemon connects to the IC-NB+IC-SB and the normal NB+SB
14:17:52 <felixhuettner[m]> and these IC NB+SB contain the information about the existence of this transit switch and who is connected there
14:18:07 <ralonsoh> right right, that's correct
14:18:20 <racosta> no worries slaweq, I had already seen it and even opened a thread in March mentioning this spec, but the case here is a little different.
14:18:37 <ralonsoh> so let's go to the NB resources case, what is the problem here
14:18:42 <slaweq> racosta now I see :)
14:19:06 <felixhuettner[m]> so the ovn-ic daemon will sync information from the IC-NB+IC-SB to the normal NB
14:19:16 <felixhuettner[m]> this includes the transit switch (a logical switch)
14:19:31 <felixhuettner[m]> the routes to other routers (logical router static routes)
14:19:51 <felixhuettner[m]> and the ports of other devices on the transit switch (logical switch ports)
14:20:12 <racosta> local and the remote LSPs on the transit switch
14:20:15 <ralonsoh> and can these resources be tagged somehow from the IC controller?
14:20:25 <ralonsoh> I mean the local NB resources
14:20:35 <felixhuettner[m]> they are tagged iirc
14:20:45 <felixhuettner[m]> let me check
14:20:47 <ralonsoh> so we have a way to recognize them, right
14:21:22 <felixhuettner[m]> so logical switches have other_config:interconn-ts=***
14:21:38 <ralonsoh> and routers?
14:21:41 <racosta> If they are not natively tagged, you can add an interconnect tag
14:21:57 <ralonsoh> NAT rules in OVN
14:22:15 <racosta> The routers are Neutron native routers...
14:22:20 <ralonsoh> no
14:22:21 <ralonsoh> no
14:22:28 <felixhuettner[m]> static routes have external_ids:ic-learned-route=xxx
14:22:38 <ralonsoh> you said that you DON'T want to create resources from Neutron
14:22:52 <ralonsoh> felixhuettner[m], ok
14:23:07 <ralonsoh> and the problem is the learned static routes, right?
14:23:15 <felixhuettner[m]> and the logical switch ports
14:23:19 <ralonsoh> that you don't have a way to correlate to the routers
14:23:20 <felixhuettner[m]> and i think they have type=remote
14:23:23 <felixhuettner[m]> but i'm not sure
14:23:54 <felixhuettner[m]> but i guess we could exclude all resources with these tags from the sync
14:24:16 <racosta> Of course we can correlate ralonsoh, the LRP connected to the TS is linked with the Neutron router.
14:24:21 <ralonsoh> that's the point, if you can correlate all of them, you can create these rules in the sync tool
14:24:37 <felixhuettner[m]> +1
14:24:53 <ralonsoh> racosta, ok, I've been trying to get a reply to this question for a whole week
14:24:59 <ralonsoh> let me ask it again
14:25:18 <ralonsoh> you are proposing a strategy here where the resources are NOT created in Neutron
14:25:21 <ralonsoh> right?
14:26:44 <racosta> Yes, interconnect LRP only (and learned static routes). Everything else is from Neutron DB and will be linked with TS via LRP
14:28:08 <racosta> We can define a tag to identify this LRP and handle it in the sync tool.
14:28:25 <ralonsoh> ok, I have no idea what you are proposing, at all. I'm totally disoriented
14:28:35 <ralonsoh> I think you are mixing Neutron DB and OVN DB
14:28:53 <ralonsoh> in any case, at this point I'll stop writing and let other people collaborate
14:29:29 <felixhuettner[m]> aah, i missed that connection
14:29:36 <felixhuettner[m]> so above we were at this point:
14:29:48 <felixhuettner[m]> there is a transit switch replicated to the normal northbound
14:29:59 <felixhuettner[m]> and there is a normal router created by neutron in the standard way
14:30:21 <ralonsoh> ^ that's what I was expecting, finally
14:30:28 <ralonsoh> so this must be very clear in the spec
14:30:37 <felixhuettner[m]> and the needed LRP and LSP to connect the router to the transit switch were still missing
14:30:42 <ralonsoh> what is created from the IC controller
14:30:46 <felixhuettner[m]> and that is not created by ovn-ic
14:30:46 <ralonsoh> and what manually in Neutron
14:30:57 <felixhuettner[m]> and not created by neutron
14:31:04 <felixhuettner[m]> and i think this causes all this confusion
14:31:08 <ralonsoh> yes
14:31:13 <ralonsoh> ok, I'll stop now
14:31:14 <racosta> yep. TS - dynamic ovn managed. Logical_Router - Neutron managed. We need to connect this.
14:31:27 <slaweq> ralonsoh speaking about spec - I think that this needs spec with well described problem statement and proposed solution (maybe with examples)
14:31:44 <mlavalle> and a nice set of diagrams
14:31:50 <ralonsoh> +1
14:31:52 <slaweq> mlavalle++
14:32:02 <haleyb> yes, i like pictures too
14:32:18 <felixhuettner[m]> i agree as well, the amount of confusion it has created is quite large already :)
14:32:19 <obondarev> so after all, what are neutron changes? Are there a lot?
14:32:22 <mlavalle> depicting the components and who is responsible for managing each (i.e. OVN-IC vs Neutron)
14:33:15 <racosta> I like them too haleyb, I tried to include one but the openstack list blocks images...
14:33:43 <obondarev> my impression was that the only change is to skip some resources (not delete) during db sync, is it correct?
14:34:01 <felixhuettner[m]> yes (from my perspective)
14:34:09 <felixhuettner[m]> just defining which is not easy
14:34:12 <mlavalle> what is really positive about this proposal is that we have a feature in the underlying sdn backend that we are not leveraging in Neutron and we should explore how to do it, especially since other CMSs are already doing it
14:34:24 <felixhuettner[m]> racosta: should we work together on this spec?
14:34:46 <felixhuettner[m]> (although i'll be on vacation soonish)
14:35:08 <ralonsoh> just a heads-up: any spec will be approved for the next release (C)
14:35:10 <racosta> Of course Felix, that would be nice.
14:35:13 <haleyb> from reading the bug and listening here, i'm still not sure how this IC thing learns routes, etc - is it just sync or getting neutron events? how does it keep up to date? just something to make clear
14:35:58 <felixhuettner[m]> it reads the routes from one cluster's NB and puts them in the central IC-SB
14:36:10 <felixhuettner[m]> then the other cluster takes them from the IC-SB and adds them to that cluster's NB
14:37:41 <racosta> Yes, there is an ovn-ic daemon that monitors the NB DB and replicates to other elements of the interconnect domain (another OpenStack, for example) - but this "learned" route is limited to the scope of the TS.
14:38:58 <haleyb> ok, so it's watching OVN events after an initial sync
14:39:12 <felixhuettner[m]> +1
14:39:15 <racosta> In practice, a subnet behind an OpenStack router can communicate with another subnet behind a router on another OpenStack (using L3).
14:40:18 <ralonsoh> (I would expect good documentation of how to create this IC between clusters)
14:40:23 <ralonsoh> any other question?
14:40:41 <ralonsoh> so let's vote for this RFE
14:40:51 <ralonsoh> +1 (plus a spec)
14:41:29 <ralonsoh> folks?
14:41:35 <slaweq> +1 for RFE as I think it's a great thing to have the possibility to interconnect different clusters together
14:41:54 <mlavalle> let's go for a spec
14:42:01 <mlavalle> +1
14:42:12 <ralonsoh> obondarev, ?
14:42:16 <mlavalle> the good news is that we have time to be thorough with it
14:42:20 <obondarev> +1
14:42:33 <ralonsoh> so perfect, the RFE is approved. I'll comment that in the LP bug
14:42:34 <haleyb> +1
14:42:37 <ralonsoh> sorry
14:42:43 <ralonsoh> haleyb, I missed you!
14:42:44 <ralonsoh> sorry
14:42:57 <haleyb> i'll go back to my corner :)
14:43:02 <ralonsoh> my bad...
14:43:13 <ralonsoh> anything else you want to comment?
14:43:27 <haleyb> no, will wait for spec
14:43:42 <ralonsoh> thank you all for attending this meeting. Have a nice weekend
14:43:55 <ralonsoh> (I'll be on PTO next week, just a heads-up)
14:43:59 <felixhuettner[m]> thank you, have a nice weekend
14:44:01 <obondarev> o/
14:44:02 <ralonsoh> #endmeeting