14:00:07 <slaweq> #startmeeting neutron_drivers
14:00:08 <openstack> Meeting started Fri Nov 13 14:00:07 2020 UTC and is due to finish in 60 minutes.  The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:09 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:12 <openstack> The meeting name has been set to 'neutron_drivers'
14:00:15 <mlavalle> o/
14:00:20 <lajoskatona> Hi
14:00:20 <slaweq> welcome everyone!
14:00:26 <obondarev> hi
14:01:15 <slaweq> lets wait few more minutes for other people to join
14:01:32 <ralonsoh> hi
14:02:44 <slaweq> ping haleyb: njohnston amotoki yamamoto
14:02:54 <slaweq> maybe they will join us soon
14:04:05 <mlavalle> don't pay them today
14:04:17 <obondarev> :D
14:05:13 <slaweq> LOL
14:05:35 <haleyb> hi, sorry i'm late
14:07:35 <slaweq> ok, lets start
14:07:42 <slaweq> we quorum already
14:07:57 <slaweq> and we have only one topic in the on demand agenda for today
14:08:01 <slaweq> no new RFEs to discuss
14:08:05 <slaweq> #topi On Demand
14:08:08 <slaweq> #topic On Demand
14:08:23 <slaweq> obondarev: topic was added by You so please go on :)
14:09:07 <obondarev> yep, it's about https://bugs.launchpad.net/neutron/+bug/1887523
14:09:08 <openstack> Launchpad bug 1887523 in neutron "Deadlock detection code can be stale" [High,In progress]
14:09:10 <obondarev> so let me briefly describe my point
14:09:42 <obondarev> I think the bug with it's current title, description and suggested approach is not quite correct
14:10:06 <obondarev> "Deadlock detection code can be stale" - don't think so
14:10:24 <obondarev> "neutron has it's own implementation of it which is missing a bunch of deadlocks" - not True
14:11:11 <obondarev> so indeed neutron has a few decorators on top of oslo.db one
14:11:13 <lajoskatona> yeah perhaps "neutron has its own extra decorators around oslo decorator"
14:11:30 <obondarev> but they were added for a reason
14:11:36 <obondarev> for many reasons in fact
14:12:05 <obondarev> neutron had a long history of fighting against DB errors in concurrent scenarios
14:12:29 <obondarev> git blame may shed some light on it: https://github.com/openstack/neutron/blame/stable/rocky/neutron/db/api.py
14:12:36 <obondarev> just a few examples
14:12:45 <obondarev> https://bugs.launchpad.net/neutron/+bug/1596075
14:12:47 <openstack> Launchpad bug 1596075 in neutron "Neutron confused about overlapping subnet creation" [High,Fix released] - Assigned to Kevin Benton (kevinbenton)
14:12:52 <obondarev> https://bugs.launchpad.net/neutron/+bug/1612798
14:12:53 <openstack> Launchpad bug 1612798 in neutron "Move db retry logic closer to where DB error occur" [Critical,Fix released] - Assigned to Kevin Benton (kevinbenton)
14:13:13 <obondarev> https://github.com/openstack/neutron/commit/948461c8b2fbeb30e4fa3a43cc523cff76327d4e
14:13:31 <obondarev> most were quite a tricky ones
14:14:00 <obondarev> so I don't think moving back to oslo.db decorator is the right thing to do
14:14:24 <obondarev> there were not much work done in oslo.db: https://github.com/openstack/oslo.db/commits/master/oslo_db/api.py
14:14:36 <mlavalle> you mean lately?
14:14:43 <obondarev> right
14:14:45 <obondarev> thanks
14:15:05 <obondarev> in fact oslo.db retries were started by neutron folks)
14:15:11 <mlavalle> LOL
14:15:11 <obondarev> IIRC
14:15:29 <obondarev> and in fact neutron retry logic now handles more cases
14:15:48 <slaweq> but from what is written in the bug description we are missing some of deadlocks and not handling them properly
14:15:50 <obondarev> some more info could be found here: https://github.com/openstack/neutron/blob/master/doc/source/contributor/internals/retries.rst
14:16:20 <obondarev> slaweq, correct, so I think the bug should be about quota deadlocks
14:16:29 <slaweq> maybe we should change this LP to something like "compare our implementation with oslo db and update our where it's needed"
14:16:29 <obondarev> not about bad neutron retry logic
14:17:05 <mlavalle> mhhhhh....
14:17:06 <obondarev> at least the bug should clearly show where neutron retries are bad
14:17:33 <lajoskatona> yeah that 's perhaps missing
14:17:43 <obondarev> as if we just start replacing retry_if_session_inactive - we may got regressions
14:17:53 <mlavalle> in principle I might agree wioth you obondarev. However, I want to point out that Mohammed is the CEO of a big operator
14:17:54 <obondarev> wjich might be hard to spot
14:18:00 <lajoskatona> if I undrstand well from summit generally there are scaling issues with neutron and one thing was that db issues can be behind that
14:18:09 <obondarev> even with Loki service plugin
14:18:48 <mlavalle> so at the very least I would like to get more input from mnaser and hear more about his point of view
14:18:51 <obondarev> so in the bug comment and proposed patch I agree that quota retries have issues
14:19:25 <ralonsoh> let me remember you that we are still migrating to the new DB engine facade
14:19:35 <ralonsoh> that will remove subtransactions
14:19:37 <mlavalle> that is also true
14:19:42 <obondarev> but please let's not just replace retry_if_session_inactive for oslo-db.wrap_db_retries all over neutron
14:19:47 <ralonsoh> and other issues related to mixing both facades
14:20:18 <ralonsoh> so I recommend start investigating this after completing the migration
14:20:37 <obondarev> ralonsoh, so there is no evidence what's exactly wrong with quota deadlock issue, right?
14:20:53 <ralonsoh> I can't tell, sorry
14:20:59 <ralonsoh> but
14:21:26 <ralonsoh> with the new facade at least we know each thread has one single transaction without subtransactions
14:21:33 <ralonsoh> and the context is unique
14:21:36 <ralonsoh> (per thread)
14:21:45 <ralonsoh> e.g.:  https://review.opendev.org/#/c/715315/20/neutron/tests/functional/services/portforwarding/test_port_forwarding.py
14:21:45 <patchbot> patch 715315 - neutron - Finish the new DB engine facade migration - 20 patch sets
14:22:24 <obondarev> great, so I'd suggest we put this bug on hold for now, and after new facade is there - try to reproduce and investigate
14:22:36 <ralonsoh> +1 to this
14:22:41 <mlavalle> I propose three steps:
14:23:09 <mlavalle> 1) finish the migration to the new db engine facade, as indicated by ralonsoh and obondarev
14:23:27 <mlavalle> 2) Limit this bug, for the time being, to the quota issue
14:23:58 <mlavalle> 3) Seek more input from mnaser. He may giove us some good insights, given his operational experience
14:24:06 <obondarev> +1
14:24:16 <slaweq> mlavalle++
14:24:22 <lajoskatona> sounds good
14:24:24 <ralonsoh> +1 to this proposal
14:24:47 <slaweq> and also if there will be another similar issues to what we have now with quota, lets treat them separately as regular bugs
14:24:50 * mnaser is happy to chime in -- just ping me via email/ml (cc'd directly) or any other way :)
14:25:05 <ralonsoh> we can always increase the scope of 2) to other code places
14:25:14 <mlavalle> thanks mnaser !
14:25:18 <njohnston> makes sense to me
14:26:04 <slaweq> ok, so I think we have agreement on that for now
14:26:15 <slaweq> I will sum it up in the LP comment after the meeting
14:26:29 <lajoskatona> thanks slaweq
14:26:30 <obondarev> thanks slaweq
14:27:50 <slaweq> so I think this topic is done for today, right?
14:27:57 <obondarev> +
14:27:57 <mlavalle> I think so
14:28:14 <slaweq> do You have anything else You want to discuss today? if not, I will give You 30 minutes back :)
14:28:34 <mlavalle> yaaay, weekened is now a bit closer!
14:28:41 <slaweq> LOL
14:28:50 <slaweq> for me it's almost there :)
14:28:53 <lajoskatona> it's time :-)
14:28:56 <obondarev> same here)
14:28:58 <slaweq> ok, thx for attending the meeting
14:29:01 <ralonsoh> bye! enjoy the weekend
14:29:06 <slaweq> and have a great weekend!
14:29:06 <njohnston> o/
14:29:07 <obondarev> bye!
14:29:08 <lajoskatona> Bye
14:29:09 <mlavalle> o/
14:29:09 <slaweq> #endmeeting