#openstack-meeting-3 log

18:00:57 <SumitNaiksatam> #startmeeting networking_policy
18:00:58 <openstack> Meeting started Thu Dec  8 18:00:57 2016 UTC and is due to finish in 60 minutes.  The chair is SumitNaiksatam. Information about MeetBot at http://wiki.debian.org/MeetBot.
18:00:59 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
18:01:01 <openstack> The meeting name has been set to 'networking_policy'
18:01:26 <SumitNaiksatam> #info agenda https://wiki.openstack.org/wiki/Meetings/GroupBasedPolicy#Dec_8th_2016
18:01:37 <rkukura> hi
18:01:38 <SumitNaiksatam> #topic Remove expunge_all calls
18:01:43 <SumitNaiksatam> #link https://review.openstack.org/#/c/399772/
18:01:44 <tbachman> :(
18:01:48 <SumitNaiksatam> i just rebased it
18:01:50 <SumitNaiksatam> tbachman: :-)
18:02:03 <tbachman> SumitNaiksatam: I just got back to this mid-yesterday
18:02:03 <SumitNaiksatam> songole: any more data points at your end?
18:02:08 <SumitNaiksatam> tbachman: np
18:02:55 <songole> Regarding sharing across threads..
18:03:11 <songole> we use getsession method provided by neutron
18:03:32 <songole> would that be an issue? we are talking about green threads here, right?
18:04:06 <tbachman> songole: in my mind, it *shouldn’t*, but I have yet to get through this code
18:04:37 <tbachman> I believe get_session is supposed to provide an interface that ensures each green thread gets its own session
18:04:44 <tbachman> but I haven’t gotten thru this myself just yet
18:04:49 * tbachman is doubting :)
18:05:05 <songole> ok
18:05:35 <songole> regarding the specific patch, we ran tests locally with the patch
18:05:48 <SumitNaiksatam> i dont thinkg neutron does anything there, i think this has to have been implemented in oslo_db
18:06:04 <rkukura> songole: have you verified that each thread does its own getsession call, and that neither the session itself nor any of the DB objects associated with the sesssion are shared across threads?
18:06:04 <songole> and we hit the db exception which sumit and I debugged for lbaas v1
18:06:31 <songole> rkukura: yes. each thread does its own getsession
18:06:44 <rkukura> songole: thanks
18:07:13 <tbachman> okay, now I’m confused
18:07:21 <tbachman> you’re hitting this w/o the expunge?
18:07:30 <tbachman> (i.e. the DetachedInstanceError exception)
18:08:23 <SumitNaiksatam> tbachman: i share your confusion ;-)
18:09:26 <SumitNaiksatam> songole: you hit the exception with or without the expunge_all ?
18:10:00 <SumitNaiksatam> tbachman: thanks for referring to the exact exception, just so that we know we are all talking about the same thing :-)
18:10:01 <songole> I didn't see it myself. I just got an email from india team regarding this.
18:10:09 <songole> I will confirm
18:10:47 <SumitNaiksatam> songole: i hope its with the expunge_all call, otherwise it flies in the face of most of our analysis so far :-(
18:11:38 <songole> SumitNaiksatam: I understand.. I will confirm..
18:12:52 <tbachman> SumitNaiksatam: I’m hoping to have a better handle on this tomorrow
18:12:59 <SumitNaiksatam> tbachman: on the side perhaps we can continue to rebase/recheck your patch to see if we see any consistent pattern of failures
18:13:07 <SumitNaiksatam> i will try to keep an eye as well
18:13:11 <tbachman> I realize that we’re short on time here, so unfortunately I’m not sure what to do here.
18:13:49 <SumitNaiksatam> i think there are several changes being made on the NFP side, and perhaps some of the exceptions we were seeing earlier on this patch were not related and might have gone away
18:14:09 <SumitNaiksatam> tbachman: yeah same here, hence the offline email thread last week
18:14:43 <tbachman> SumitNaiksatam: maybe we can address this again offline tomorrow?
18:14:49 <tbachman> for at least the core GBP stuff?
18:14:49 <SumitNaiksatam> tbachman: sue
18:14:51 <SumitNaiksatam> *sure
18:15:07 <tbachman> If we’re lucky, any findings there might give us clues for NFP as well (if there are any)
18:16:10 <SumitNaiksatam> songole: so you are confirming that your usage of threads in NFP is session-safe i.e. sessions are not getting shared threads?
18:16:46 <songole> SumitNaiksatam: From our understanding, they are thread safe
18:16:53 <SumitNaiksatam> *shared across
18:17:26 <songole> And, the lb db exception, i was seeing even without nfp
18:18:35 <SumitNaiksatam> songole: oh, but i thought you were earlier saying that was not the case with the lb exception
18:19:08 <SumitNaiksatam> songole: so the lb exception occured with or without GBP in the mix?
18:19:29 <songole> with GBP and with or without nfp
18:19:54 <SumitNaiksatam> songole: okay, so the expunge_all is still relevant
18:20:12 <SumitNaiksatam> because without GBP, it wont be
18:20:42 * tbachman would like to expunge expunge_all
18:21:03 <SumitNaiksatam> tbachman: yay to that
18:21:20 <SumitNaiksatam> tbachman: songole: any other data point we need to discuss?
18:21:33 <tbachman> SumitNaiksatam: nothing from me (for now)
18:22:07 <SumitNaiksatam> okay, so lets assume that NFP is indeed getting one session per thread
18:22:42 <SumitNaiksatam> and sessions are not leaking across threads
18:23:22 <SumitNaiksatam> can we analyze the exceptions we see henceforth on tbachman’s with that assumption?
18:23:49 <SumitNaiksatam> tbachman’s patch i meant
18:24:22 <SumitNaiksatam> i think we got into the session leaking discussion because we were seeing some NFP exceptions which we could not explain
18:24:29 <tbachman> SumitNaiksatam: FWIW, in the original bug report from Ivar, the problem seemed to be state that wasn’t being created
18:24:30 <rkukura> Don’t forget that the DB objects stay associated with the session, so its important not to leak them across threads either.
18:24:34 <tbachman> the UTs were failing b/c state wasn’t there
18:24:40 <SumitNaiksatam> and attirbuted it to either the removal of the expunge_all call or the session leaking
18:25:51 <tbachman> (e.g. assertEqual for a configuration that’s read back, but the configuration doesn’t exist)
18:26:26 <tbachman> we should make sure that’s what the NFP failures were seeing (I believe they were, but don’t have the data in front of me atm)
18:27:07 <SumitNaiksatam> tbachman: you are referring to this bug report, right: #link https://bugs.launchpad.net/group-based-policy/+bug/1404412
18:27:07 <openstack> Launchpad bug 1404412 in Group Based Policy "Session's persistent objects across transactions create inconsistency" [High,Fix released] - Assigned to Ivar Lazzaro (mmaleckk)
18:27:08 <SumitNaiksatam> ?
18:27:20 <tbachman> SumitNaiksatam: ack
18:27:45 <SumitNaiksatam> what i recall was that the problem was of stale state
18:28:01 <SumitNaiksatam> and there are not threading issues with the UTs (at least not that i am aware off)
18:28:22 <SumitNaiksatam> doing the expunge_all was cleaning the in-session state and forcing reload from the db
18:29:51 <tbachman> SumitNaiksatam: one theory I had was that tox runs UTs in parallel, and somehow the two threads were affecting the same DB state
18:29:53 <tbachman> but that’s a shaky theory at best
18:30:07 <tbachman> so, it wasn’t the sequential transactions per-se
18:30:08 <SumitNaiksatam> tbachman: hmmm, i would be doubtful about that
18:30:28 <SumitNaiksatam> tbachman: it might very well have been a bug at that time though
18:30:31 <songole> sorry, networking issues. reconnected
18:30:35 <tbachman> SumitNaiksatam: ack
18:30:37 <tbachman> yeah
18:30:46 <SumitNaiksatam> tbachman: and hence that problem does not manifest any more now
18:31:01 <tbachman> SumitNaiksatam: ack. But there are still a few things that don’t  make sense to me
18:31:01 <SumitNaiksatam> tbachman: because you are not seeing any issues with the UTs after removing the call
18:31:03 <SumitNaiksatam> songole: np
18:31:13 <tbachman> for example, changing it to expire should have had the same effect, were that the case
18:31:15 <tbachman> but it didn't.
18:31:42 <SumitNaiksatam> tbachman: hmmm, didnt think of that, but makes sense
18:32:26 <SumitNaiksatam> rkukura: to your earlier point about sharing db objects across threads, if that is a culprit is some way with regards to the NFP errors, it would be difficult to track
18:32:58 <SumitNaiksatam> rkukura: however, i don’t understand why it would manifest after removing the expunge_all call
18:33:23 <SumitNaiksatam> *is some way - in some way
18:33:34 <rkukura> SumitNaiksatam: Agreed it might be difficult to track, but the DB objects use the session when you try to follow associations, etc.
18:34:42 <tbachman> I think there are event logs we can turn on here, but I’m guessing it would be a complete overkill of info
18:35:52 <SumitNaiksatam> rkukura: answering my own question, if db objects were being used across threads, and removing the expunge_all call led to an exception, i think it means that it actually revealed an incorrect use of the db objects (across threads), no?
18:36:20 <tbachman> SumitNaiksatam: since the DB objects are tied to sessions, and sessions should only be used by a single thread, I’d say yes :)
18:36:27 <SumitNaiksatam> tbachman: okay
18:36:31 <rkukura> SumitNaiksatam: Since expunge_all() dissaciates the DB objects from the session, it seems like it could be preventing the errors
18:36:33 * tbachman believes that’s what rkukura was getting at
18:36:44 <rkukura> dissassociates
18:37:07 <tbachman> keep in mind that the DB objects can only be used in this state if no backrefs are accessed
18:37:15 <tbachman> otherwise we get the exception
18:37:22 <SumitNaiksatam> tbachman: ah okay
18:37:43 <tbachman> So, if the NFP code is doing this and not accessing backrefs, then we won’t see the exceptions in their code
18:38:06 <SumitNaiksatam> tbachman: but that is regardless of the expunge_all, right?
18:38:24 <tbachman> SumitNaiksatam: actually, without the expunge_all, the object is still tied to the session
18:38:27 <tbachman> and the backref will work
18:38:30 <tbachman> and worse
18:38:35 <SumitNaiksatam> tbachman: ah okay
18:39:14 <tbachman> If two contexts are sharing the same DB object, and mutating it, then there are issues
18:40:04 <tbachman> SumitNaiksatam: I will also say that I’m still worried that my understanding of this is limited
18:40:09 <rkukura> I don’t think a DB object can be associated with two sessions at once
18:40:15 <SumitNaiksatam> tbachman: by contexts you mean?
18:40:38 <rkukura> I read context as session, but maybe tbachman meant thread
18:40:43 <tbachman> yeah
18:40:57 * tbachman hates the terminology here
18:41:06 <tbachman> context here meant thread
18:41:07 <tbachman> sorry about that
18:41:09 <SumitNaiksatam> but how can the db object with multiple sessions?
18:41:19 <SumitNaiksatam> *be associated with
18:41:31 <rkukura> SumitNaiksatam: I think “multiple sessions” was my mistaken interprettation of tbachman
18:41:40 <tbachman> rkukura: ack
18:41:48 <tbachman> I meant multiple threads sharing the same DB object
18:41:49 <SumitNaiksatam> ah sorry
18:41:57 <tbachman> SumitNaiksatam: my bad :)
18:42:07 <SumitNaiksatam> tbachman: no my bad, didnt read your comment
18:42:11 * tbachman is barely monolingual :)
18:42:26 <SumitNaiksatam> tbachman: lol
18:42:31 <SumitNaiksatam> one more item on the agenda
18:42:32 <rkukura> My point was that the DB objects (until expunged) stay associated with the session, so if they are shared across threads, they can cause the session to be shared across threads.
18:43:04 <SumitNaiksatam> rkukura: but that is not a good usage pattern, right?
18:43:12 <rkukura> SumitNaiksatam: Correct
18:44:21 <rkukura> I’d also be somewhat concerned about continuing to use DB objects in the same thread after the transaction in which they were obtained is commited, and especially if a new transaction on the thread’s session gets started. Those old DB objects might become involved in the new transaction.
18:44:52 <SumitNaiksatam> rkukura: agree, but i believe that pattern exists in several places in neutron code
18:45:00 <SumitNaiksatam> so we cant change that
18:45:31 <tbachman> It might be something we should emphasize for new code
18:45:42 <tbachman> (in GBP of course)
18:46:18 <SumitNaiksatam> as for our NFP code, removing the expunge_all call somehow  is leading stale state being read within the threads, which was not earlier the case because (implicitly) the assumption for expunge was being made
18:46:26 <SumitNaiksatam> *leading to
18:46:34 <rkukura> Or we just call expunge_all in the right places to dissassociate the DB objects from the sessions they came from ;)
18:47:06 <SumitNaiksatam> rkukura: oh that can get really messy, i am sure you were indeed joking :-)
18:47:31 <SumitNaiksatam> one more item on the agenda
18:47:46 <SumitNaiksatam> #topic Configuring extension drivers from policy drivers
18:47:51 <rkukura> SumitNaiksatam: Not really - I think it might explain why expunge_all seems to solve some problems, and maybe cause others
18:48:44 <SumitNaiksatam> rkukura: yes, it might explain, but i dont think we want to solve that by calling expunge in a different place
18:48:47 <SumitNaiksatam> #link https://review.openstack.org/#/c/407985/
18:49:35 <SumitNaiksatam> in the above review you mentioned something to the effect of aggregating the driver extension i have proposed in the patch with another one
18:50:34 <SumitNaiksatam> rkukura: this was ostensibly to avoid the need (and potential error) in specifying multiple driver extensions in the configraion
18:50:45 <rkukura> right
18:51:08 <rkukura> Or configuring extension meant for one driver with the other driver.
18:51:09 <SumitNaiksatam> i was thinking what if we allowed the policy driver to specify the driver extensions
18:52:18 <SumitNaiksatam> so if a particular policy driver always knows that it needs certaing driver extensions, it can internally specify that
18:52:49 <SumitNaiksatam> that would avoid the need for any additional extension driver configuration for that particular policy driver
18:52:56 <rkukura> SumitNaiksatam: That may be a reasonable idea, but I’m a bit worried about it not being consistent with the way EDs work in ML2.
18:53:14 <tbachman> FWIW, here’s the log of a failure in the NFP gatefor the last recheck on the expunge_all (search for “Traceback”): http://logs.openstack.org/72/399772/3/check/gate-group-based-policy-dsvm-nfp-ubuntu-xenial-nv/67fd5b7/logs/q-svc.txt.gz
18:53:44 <SumitNaiksatam> tbachman: thanks
18:54:02 <SumitNaiksatam> rkukura: but the GBP framework can diverge where relevant, right?
18:54:05 <rkukura> First question is whether we want users to be able to choose whether or not to enable these extensions. If so, we should make them individual EDs that are not automatically pulled in by the PD/MD.
18:54:53 <SumitNaiksatam> rkukura: there are two aspects to this, one is how they are implemented, and second how they are configured
18:55:33 <SumitNaiksatam> rkukura: if they are implemented as separate extensions/drivers, it leaves the possibility open for configuring them individually
18:55:49 <rkukura> I think it would be reasonable for a deployer to decide they want to use the aim_mapping PD, but only expose the standard GBP API. But I don’t know if the PD code will even work if the EDs are not enabled.
18:56:49 <SumitNaiksatam> rkukura: specificall as for the AIM PD (and its reliance on the ml2+, etc) i think there is enough going on there that we should assume that it wont work without the EDs
18:57:37 <SumitNaiksatam> rkukura: so in such cases why not let the PD decide that it always needs certain EDs
18:58:09 <SumitNaiksatam> * or at least certain EDs
18:58:23 <rkukura> But then why package each (required) feature as a separate extension and ED? Wouldn’t it be a lot simpler to just have one extension and one ED for all of them?
18:58:46 <SumitNaiksatam> rkukura:  so that some other PD could potentially consume them in a different combination
18:59:27 <rkukura> SumitNaiksatam: maybe that makes sense
18:59:41 <SumitNaiksatam> okay we reached the hour
18:59:53 <SumitNaiksatam> thanks all for your time
18:59:58 <SumitNaiksatam> bye!
19:00:01 <SumitNaiksatam> #endmeeting