14:00:30 <haleyb> #startmeeting neutron_drivers
14:00:30 <opendevmeet> Meeting started Fri Nov 10 14:00:30 2023 UTC and is due to finish in 60 minutes.  The chair is haleyb. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:30 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:30 <opendevmeet> The meeting name has been set to 'neutron_drivers'
14:00:38 <slaweq> o/
14:00:53 <obondarev_> o/
14:00:59 <mlavalle> o/
14:01:14 <mlavalle> haleyb: veterans day?
14:01:34 <haleyb> mlavalle: yes, and i didn't even know until yesterday
14:02:00 <mlavalle> it's in our calendar, but it is not a holiday for us
14:02:41 <lajoskatona> o/
14:02:54 <haleyb> jlibosva, racosta_: are you around? you have the agenda items today
14:02:59 <jlibosva> o/
14:03:01 <jlibosva> I am
14:03:06 <racosta_> o/
14:03:43 <jlibosva> shall I start?
14:03:45 <haleyb> ok, i think we have quorum
14:03:57 <racosta_> please, go ahead jlibosva.
14:04:50 <jlibosva> I'm not familiar with the format of this meeting so I guess once we enter the "on demand agenda" I shall start?
14:05:09 <mlavalle> yeah go ahead
14:05:16 <mlavalle> we can improvise a bit
14:05:51 <haleyb> jlibosva: right, we can go right to on-demand
14:06:14 <haleyb> #topic on-demand
14:06:22 <haleyb> :)
14:07:00 <jlibosva> ok, thanks :) so I was told to bring this topic here, it's related to ports and its bindings
14:07:02 <jlibosva> #link https://review.opendev.org/c/openstack/neutron/+/892815
14:07:34 <jlibosva> so we can have multiple mech drivers configured and based on the vif type the driver binds the port on a host
14:08:00 <jlibosva> the vif type allows PUT actions (Updated) in the API layer so it's currently possible to change the type while a port is bound
14:09:05 <obondarev> but that will not be a correct change, right?
14:09:17 <jlibosva> now since the API uses stateless resources - i.e. we don't know if the updated port is bound or not until we look at the DB, which is a layer below the API - we can't forbid the update operation on the port there. So I made the patch fail the type change at the DB layer if the port is bound
14:09:55 <jlibosva> obondarev: right
14:10:26 <jlibosva> there was a bug in Nova too: if a user changed the type, networking went down. From the Nova perspective this should not be an allowed operation either. And I agree
14:10:45 <obondarev> so it's not an API change to me, just handling an incorrect request
14:10:47 <jlibosva> or - if the type is changed we would need to re-bind the port
14:11:26 <jlibosva> yes, I agree with obondarev. the state a port is in should not allow the change
14:11:49 <jlibosva> there are some concerns on the review that would be good to discuss here
14:12:58 <jlibosva> Rodolfo and Lajos had some concerns and I see Lajos just dropped and Rodolfo is on PTO
14:14:26 <obondarev> so what happens currently when someone updates vif_type of a bound port?
14:14:57 <haleyb> this is vnic_type right?
14:15:18 <slaweq> why do we want to allow changing that on unbound ports? Maybe it would be better (and easier) to simply forbid that on the API level for all types of ports, wdyt?
14:15:28 <jlibosva> yes, the attribute is vnic_type, sorry
14:16:09 <jlibosva> slaweq: the only scenario I could think of would be that someone wants to create a port (like SR-IOV) and forgets to set the type. so instead of deleting the resource and re-creating one can just update, if it's not bound
14:16:32 <obondarev> just like any other update I think
14:16:41 <slaweq> but in such case user can easily delete and then create port again
14:16:42 <obondarev> not save 1 API call at least :)
14:16:48 <obondarev> to*
14:17:02 <jlibosva> as for what happens, I think there will be inconsistency when mechanism drivers or agents query that attribute
14:18:36 <lajoskatona> sorry seems I have some network issue
14:23:04 <haleyb> are there any other questions?
14:23:13 <mlavalle> so let me attempt to summarize the issue: over the years, we, the Neutron community, allowed a certain specific case of a port update and now we realize we shouldn't have
14:23:35 <jlibosva> so I guess there are two ways to approach it? 1) disallow updating the attribute on the API level 2) always check on the DB level whether the port is bound when the attribute is updated (the linked patch)
14:24:08 <lajoskatona1> I think I have now working network, sorry for popping up and down
14:24:15 <obondarev> can 2 also include api-ref note?
14:24:24 <haleyb> and if 2) does it need to be discoverable with an extension
14:24:48 <lajoskatona1> yes my concern was to add an extension to show the users that hey the api behaves differently
14:24:52 <mlavalle> so we want to fix the behavior but we worry about changing the behavior. Let's make that behavior discoverable and maybe optional
14:25:08 <lajoskatona1> yes, good summary
14:25:20 <obondarev> but could anyone have used it for any good reason?
14:25:23 <lajoskatona1> I am not sure we need to be that rigid and add an extension even for a bug fix
14:25:37 <lajoskatona1> perhaps not necessary and better to fix it
14:25:50 <mlavalle> obondarev: it doesn't seem logical, but we just don't know
14:26:04 <slaweq> I'm not sure we need an extension for that - we will just prevent users from doing something that can only lead to bad things in the end, so IMO a release note would be enough in this case
14:26:10 <lajoskatona1> so my goal was to discuss it and have consensus before doing anything unchangeable
14:26:28 <obondarev> agree with slaweq
14:27:04 <mlavalle> I would be ok with just fixing it. I was not advocating a position, just weighing the alternatives
14:27:05 <lajoskatona1> as I remember from ralonsoh's comment he had the concern
14:27:21 <jlibosva> #link https://bugs.launchpad.net/nova/+bug/1981813
14:27:28 <jlibosva> there is the nova portion and how it was discovered
14:27:43 <haleyb> right, if the change results in an unusable resource i would think an error is appropriate
14:28:48 <haleyb> in my opinion
14:28:57 <obondarev> +
14:28:58 <lajoskatona1> +1
14:29:04 <mlavalle> +1
14:29:22 <haleyb> i was going to ask for a vote, but there it is
14:29:26 <lajoskatona1> and if there is no API extension we can backport it
14:29:41 <lajoskatona1> as I see from the nova bug the fix was backported
14:30:26 <mlavalle> teah, that's a plus
14:30:31 <mlavalle> yeah
14:31:05 <haleyb> so we proceed as a bug fix and don't require an extension
14:31:09 <haleyb> slaweq: opinion?
14:31:34 <slaweq> I'm good with that, no extension needed IMO
14:32:17 <haleyb> ok, i'll take the +1's as consensus on moving forward without an extension
14:32:20 <slaweq> regarding backport - I think we should maybe ask stable core team for opinion also
14:32:25 <slaweq> but personally I don't see reason why we shouldn't backport it
14:32:34 <haleyb> good discussion on the topic
14:32:38 <jlibosva> thanks everyone
14:32:57 <mlavalle> jlibosva: thanks for bringing it up :-)
14:33:06 <jlibosva> the credit goes to lajoskatona1 :)
14:33:30 <lajoskatona1> I stopped the thing, sorry for that
14:33:32 <mlavalle> yeah, but you also took the time to chat with us today
14:33:52 <jlibosva> that's always a pleasure
14:33:57 <haleyb> slaweq: it seems like the nova-related change was backported, so i would +1 a backport
14:34:08 <haleyb> of course that was a CVE
14:34:13 <slaweq> good for me then
14:34:23 <mlavalle> yeah, otherwise the whole thing is useless
14:34:36 <lajoskatona1> +1 for backport with re-no
14:35:00 <jlibosva> well, afaiu Nova doesn't need the Neutron patch. They basically just made Nova aware that the type can be changed in Neutron, which they didn't account for before
14:35:13 <jlibosva> just to make it clear :)
14:35:24 <haleyb> jlibosva: so just to confirm - change is good, but add a release note, and you will update the api-ref?
14:35:32 <jlibosva> haleyb: yesir
14:36:16 <haleyb> ack, thanks
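A minimal sketch of option 2) from the discussion above: rejecting a vnic_type change once a port is bound. This is not the code from the linked review. The function name, the exception name, and the choice to treat 'binding_failed' as unbound are assumptions made for illustration; the attribute keys are assumed to match the ones used in the Neutron port API.

```python
# Hypothetical sketch, not the patch under review.
# Attribute keys are assumed to match the Neutron port API.
VNIC_TYPE = "binding:vnic_type"
VIF_TYPE = "binding:vif_type"

# Assumption: a port whose vif_type is one of these is treated as not bound.
UNBOUND_VIF_TYPES = {"unbound", "binding_failed"}


class PortBoundError(Exception):
    """Raised when vnic_type is changed on a bound port (hypothetical name)."""


def validate_vnic_type_update(stored_port, requested_update):
    """Reject a vnic_type change if the stored port is already bound.

    stored_port is the port as read from the DB; requested_update is the
    body of the PUT request. Both are plain dicts in this sketch.
    """
    new_type = requested_update.get(VNIC_TYPE)
    if new_type is None or new_type == stored_port.get(VNIC_TYPE):
        return  # vnic_type is not being changed, nothing to check
    if stored_port.get(VIF_TYPE, "unbound") not in UNBOUND_VIF_TYPES:
        raise PortBoundError(
            "vnic_type cannot be changed while the port is bound")
```

The check needs the stored port, which is why it sits below the API layer in this sketch: the request body alone does not say whether the port is bound.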
14:36:31 <haleyb> racosta_: ok, we can move on to your items
14:36:45 <racosta_> ok, thanks.
14:36:53 <racosta_> Although I added the same topics I presented at the PTG, I believe a formal ack is required for RFEs in the drivers meeting.
14:37:07 <racosta_> # RFE: BGP peer connect mode. I think we're on track in this one.
14:37:14 <racosta_> #link https://review.opendev.org/c/openstack/neutron-specs/+/899210
14:37:38 <racosta_> It is basically a new configurable parameter to keep the connection with the BGP peer 'active' and avoid unnecessary disconnections if it enters an idle state on the switch side.
14:37:59 <haleyb> #link https://bugs.launchpad.net/neutron/+bug/2006145
14:39:56 <racosta_> It's a trivial change on the n-d-r side, and will be configurable to allow compatibility with anyone who uses the default passive default.
14:40:06 <racosta_> *passive mode
14:40:21 <racosta_> Any questions or comments on this?
14:41:20 <lajoskatona1> nothing from me
14:42:09 <haleyb> not from me, i think it fixes a valid issue
14:42:16 <mlavalle> +1
14:42:30 <haleyb> +1
14:42:36 <obondarev> no questions
14:42:45 <obondarev> +1
14:42:46 <slaweq> IIRC obondarev had some concerns in the spec review, right?
14:43:07 <obondarev> ah, did I? Let me check
14:43:24 <slaweq> no, I think I mixed it up with some other spec, sorry :)
14:43:34 <obondarev> ok, no worries :)
14:44:32 <haleyb> slaweq, lajoskatona1: votes? to make it official
14:44:43 <slaweq> +1
14:44:46 <lajoskatona1> +1
14:45:07 <haleyb> ok, thanks, i've marked it rfe-approved
14:45:39 <racosta_> ok, thanks. we can move on to the next
14:45:49 <racosta_> #RFE: BGP speaker peer sessions resilient. This one is related to RMQ/Infra failures.
14:46:05 <racosta_> # LP: https://bugs.launchpad.net/neutron/+bug/2006145
14:46:09 <racosta_> #link https://review.opendev.org/c/openstack/neutron-specs/+/899209
14:46:30 <racosta_> It's a little more complex. The goal is to introduce new speaker cache logic so the DRAgent can keep the speaker settings and the BGP peer sessions in case of RPC exceptions, and/or until communication via RPC is reestablished. Basically: a new config option 'speaker_cache_timeout'.
14:47:18 <racosta_> In the RFE proposal, the cache timeout time is configurable and can be adjusted according to the time it actually takes for the RMQ to respond correctly again (transient time after RMQ/Infra comes back - cluster convergence, mysql issues, etc.)
14:47:53 <racosta_> There would be another way to do it, as Felix suggested.
14:48:04 <racosta_> It could be implemented as long as we noticed when the RMQ was offline and online, but I didn't find how to obtain this information via oslo_messaging (it would only be possible to experience timeouts in this case).
14:49:38 <racosta_> I think the cache timeout solves transient RMQ issues.
14:49:50 <racosta_> Any questions or comments on this?
14:51:03 <haleyb> i think mine were answered in the bug
14:51:13 <racosta_> yeah
14:53:25 <lajoskatona1> If I understand correctly, in case of such an RMQ issue the agent removes the BGP settings?
14:54:11 <obondarev> I'm a bit confused, https://bugs.launchpad.net/neutron/+bug/2006145 does not sound like an RFE, but rather like a bug description
14:54:29 <racosta_> yes, it can remove the complete BGP speaker config if the RPC returns an empty value
14:54:38 <obondarev> and it was already marked as approved
14:55:49 <lajoskatona1> I like the idea to keep these settings till the agent can connect again (if it is below the timeout)
14:55:58 <lajoskatona1> so +1 from me to the proposal
14:56:38 <mlavalle> +1 from me as well
14:57:05 <haleyb> obondarev: i maybe should have marked it rfe-triaged
14:57:40 <racosta_> this was an old thread in the original bug obondarev, issue or RFE? IMO this is a BUG, but as there was no consensus I proposed as RFE...
14:58:16 <haleyb> and i believe the reason was that adding the config option was necessary
14:58:30 <obondarev> I see, so there are 2 specs for a single RFE https://bugs.launchpad.net/neutron/+bug/2006145?
14:59:00 <obondarev> https://review.opendev.org/c/openstack/neutron-specs/+/899209 and https://review.opendev.org/c/openstack/neutron-specs/+/899210
14:59:47 <racosta_> no no, they are different cases.
15:00:13 <obondarev> ah, sorry, my bad
15:00:50 <racosta_> no worries, but the two are related to the n-d-r (BGP sessions).
15:01:19 <obondarev> I just had 2 same tabs opened :)
15:02:09 <obondarev> I'm ok with this RFE and spec, +1
15:02:19 <haleyb> +1 from me too
15:02:23 <slaweq> +1 from me too
15:02:28 <lajoskatona1> +1
15:02:33 <haleyb> ok, thanks
15:02:44 <mlavalle> +1
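A rough sketch of the speaker cache idea described above, assuming the agent keeps the last good speaker configuration and falls back to it while RPC calls fail or come back empty, until the timeout expires. Only the option name 'speaker_cache_timeout' comes from the discussion; the class, the function, and the injected fetch callable are hypothetical and do not reflect the spec or the neutron-dynamic-routing code.

```python
import time


class SpeakerCache:
    """Keeps the last good speaker config for a limited time (hypothetical)."""

    def __init__(self, timeout, clock=time.monotonic):
        self._timeout = timeout  # stands in for 'speaker_cache_timeout'
        self._clock = clock      # injectable so the sketch can be tested
        self._config = None
        self._stored_at = None

    def store(self, config):
        self._config = config
        self._stored_at = self._clock()

    def get(self):
        """Return the cached config, or None if absent or expired."""
        if self._config is None:
            return None
        if self._clock() - self._stored_at > self._timeout:
            return None
        return self._config


def resolve_speaker_config(fetch, cache):
    """Fetch the speaker config, falling back to the cache on failure.

    fetch is any callable standing in for the RPC call. An exception or an
    empty result is treated as a transient failure in this sketch.
    """
    try:
        config = fetch()
    except Exception:
        config = None
    if config:
        cache.store(config)
        return config
    return cache.get()
```

With this shape, a failed or empty answer does not refresh the cache, so the speaker settings are dropped only after the server has been unreachable for longer than the timeout.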
15:03:20 <haleyb> and since that was all on the agenda, and we're over time, we are done
15:03:31 <slaweq> o/
15:03:32 <haleyb> thanks everyone for attending, have a good weekend
15:03:38 <haleyb> #endmeeting