18:01:06 #startmeeting networking_policy
18:01:07 Meeting started Thu Mar 23 18:01:06 2017 UTC and is due to finish in 60 minutes. The chair is SumitNaiksatam. Information about MeetBot at http://wiki.debian.org/MeetBot.
18:01:08 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
18:01:11 The meeting name has been set to 'networking_policy'
18:01:26 hi SumitNaiksatam
18:01:29 hi all
18:01:32 #info agenda https://wiki.openstack.org/wiki/Meetings/GroupBasedPolicy#March_23rd_2017
18:01:37 rkukura: hi
18:01:38 hi
18:02:01 so most of the newton sync patches merged over the past few days
18:02:06 thanks all for the work and the reviews
18:02:06 SumitNaiksatam: congrats!
18:02:14 * igordcard claps
18:02:20 * tbachman knows that was no small effort
18:02:25 tbachman: but still facing some niggling issues :-)
18:02:28 :)
18:02:36 igordcard: have to apologize to you since it created more work for you
18:02:47 so lets go to QoS first
18:03:04 #topic QoS via NSP patch
18:03:04 SumitNaiksatam: it's ok I invested on the wrong week to do it :p
18:03:06 #link https://review.openstack.org/#/c/426436
18:03:15 igordcard: :-(
18:03:28 igordcard: so per your latest comment, this is in good shape now?
18:03:59 SumitNaiksatam: it looks like it is, at least by comparing the latest nfp gate failures with the nfp gate failures of one of the last merged patch
18:04:10 igordcard: nice
18:04:28 I am disabling QoS entirely on the aim gate, is this ok?
18:04:36 igordcard: i was planning to look at it before the meeting, but got distracted with something else
18:04:43 igordcard: oh
18:04:59 igordcard: does it break if you dont?
18:05:24 SumitNaiksatam: yeah and the errors weren't very explicit
18:05:43 SumitNaiksatam: I believe it was mentioned back in the time that aim didn't have to support qos?
18:06:01 igordcard: yes, aim doesnt have to support
18:06:10 igordcard: but i would be curious to know why it failed
18:06:32 igordcard: so when you say you disabled qos, you mean which configuration?
18:06:35 #link http://logs.openstack.org/36/426436/20/check/gate-group-based-policy-dsvm-aim-ubuntu-xenial-nv/fd4ebeb/console.html#_2017-03-21_22_10_08_099496
18:07:04 I’m part way through reviewing that patch, and thought the devstack config looked reasonable
18:07:11 #link https://review.openstack.org/#/c/426436/22/devstack/override-defaults
18:07:17 rkukura: great
18:07:37 igordcard: okay got it
18:07:41 It does not add the QoS extension driver for AIM, but does otherwise
18:07:41 igordcard: that looks fine to me
18:07:49 rkukura: yeah, that is fine for now
18:08:14 rkukura: so great, rkukura i was going to request you to review, but looks like you are already on it
18:08:38 yes
18:08:48 so if nothing major, lets try to merge it before it diverges more
18:08:56 igordcard: we would have to backport to stable/newton
18:09:12 and then the question is if we should backport it to stable/mitaka as well
18:09:16 only comment so far other than ripping out the clean_session stuff had to do with the exception text
18:09:45 i ah
18:09:48 *ah
18:10:02 I’m not sure I understand the original text either
18:10:02 so looks like it will need another rebase :-(
18:10:42 * rkukura will be back in about 2 minutes
18:10:57 great, feel free to leave all the comments there and I'll fix and rebase on the next patchset
18:11:24 igordcard: thanks
18:11:49 #topic NFP patches
18:12:10 the other big thing I had were the NFP patches
18:12:33 oh and there is songole right on cue :-)
18:12:38 Hi
18:12:49 Wrong timing :)
18:12:55 songole: lol
18:13:06 lol
18:13:09 * igordcard thanks all and gracefully leaves to get home
18:13:11 songole: so you and hemanth are mostly shepherding the NFP patches
18:13:20 igordcard: thanks a bunch for taking the time to join!
18:13:27 igordcard: good night!
18:13:32 :)
18:13:50 songole: a disruptive patch just merged
18:14:14 What is it? Qos?
18:14:17 disruptive in the sense that it requires a rebase for other patches
18:14:24 songole: no, QoS not merged yet
18:14:43 Ah.
18:14:48 oh, I should have mentioned this in the bigger context - we are completely eliminating all the “clean_session” stuff
18:15:39 #link https://review.openstack.org/#/c/448885/
18:16:09 so this eliminates the use of the clean_session flag
18:16:22 but it causes merge conflicting with the existing patches
18:16:39 so if you see conflicts thats the first thing you need to take care of
18:18:11 ok
18:18:12 songole: other than that, how are we doing on the NFP patches?
18:18:41 songole: there were some patches which tried to fix the NFP gate job, and which we merged, but the gate job is still broken
18:18:42 we are facing a few issues with lbaasv2
18:18:48 songole: ah okay
18:18:56 songole: do we need to discuss here?
18:19:12 in the base mode where we used to use the namespace lb implementation
18:19:21 without having to launch a VM for lb
18:19:41 songole: ah, but you can do that any more?
18:19:45 *cant
18:20:01 looks like it. default is octavia
18:20:08 bummer!!!
18:20:10 which spins up a vm
18:20:16 hmmm
18:20:25 okay i see the difficulty now
18:20:43 LBaa(!OS)S
18:20:56 tbachman: :-) :-(
18:21:02 lol
18:21:15 mixed feelings
18:21:32 songole: so there is no way to adapt the old driver to v2?
18:21:34 so, it may take sometime to get the tests running
18:21:43 just to validate the gate
18:21:52 I think you can still use LBaaSv2, but it’s deprecated
18:22:05 tbachman: i think you mean v1
18:22:07 * rkukura finally back
18:22:08 ah
18:22:09 k
18:22:15 and v1 is totally out of newton
18:22:16 it will finally hit us at some point
18:22:17 hence the issue
18:22:37 ash said there might be a way to use namespace in v2
18:22:48 songole: okay
18:23:08 songole: that said should the tests work on mitaka?
18:23:11 * tbachman was confused
18:23:20 tbachman: np, i know what you meant
18:23:29 mitaka should be good
18:23:46 the issue is only with newton
18:24:08 but nfp tests are failing on mitaka for a different reason..
18:24:17 songole: so what i am suggesting is that to some small extent we can at least retroactively validate against stable/mitaka (but this will be after the backport, so master is already merged by then)
18:24:25 songole: ah okay
18:24:51 songole: what happens if you dont launch the service instance in newton?
18:25:05 songole: can we fake the responses?
18:25:54 songole: just so that we can derive benefit from all the other things in that gate test
18:26:03 Will explore the idea
18:26:14 songole: right now since the whole job fails we cant tell what is broken
18:26:52 songole: okay thanks
18:26:52 got it
18:27:04 songole: anything else you want to bring up on the NFP patches today?
18:27:42 nothing more..
18:27:47 songole: okay things
18:27:54 #topic Open Discussion
18:28:19 so we are seeing some perplexing DB issues when running newton (with aim_mapping driver)
18:28:31 which are not noticed in the gate
18:28:55 * tbachman listens in
18:29:00 so if you hit any weirdness, get in touch with me, might save you some time
18:29:22 tbachman: we think that the session is somehow leaking across threads
18:29:36 SumitNaiksatam: sounds familiar ;)
18:29:46 tbachman: again :-) :-(
18:29:52 I was always a little nervous when we backed out the expunge_all
18:30:09 b/c I wasn’t convinced we no longer had a root cause
18:30:31 tbachman: i think the expunge all would have created even more problems
18:30:37 tbachman: but thanks for bringing that up
18:30:45 SumitNaiksatam: ack. but it was something that let us know that there might be a problem there
18:30:50 (i.e. that it was a possibility)
18:30:54 tbachman: i had lost track of the fact that neutron does some expunging in newton
18:31:06 SumitNaiksatam: do you have any logs of the more recent failures?
18:31:13 tbachman: initially i had patched that but then i let it be there
18:31:57 tbachman: these are in the deployment, i think jishnu refreshed the fab
18:32:05 SumitNaiksatam: ack
18:32:16 tbachman: will send you once we are able to reproduce them again
18:32:21 SumitNaiksatam: thx!
18:32:27 * rkukura back in moment
18:32:33 * tbachman may be working on the same problem with Jishnu in parallel
18:32:34 i also ran into a very weird issue in stable/mitaka
18:32:40 tbachman: ah right
18:32:49 the issue is complicated to explain
18:33:20 but basically I was seeing this exception - “ResourceClosedError: This transaction is closed”
18:33:29 i know exactly where it is happening
18:33:34 but not why!
18:33:38 i have a workaround
18:33:43 SumitNaiksatam: are you able to bring up a working devstack on newton consistently?
18:33:47 but need to get the root cause
18:34:05 songole: the devstack installation is happening in every gate job run
18:34:08 songole: so yes
18:34:17 songole: i tried last week on my system
18:34:20 not this well
18:34:24 *week
18:34:44 SumitNaiksatam: this is only on Mitaka?
18:34:48 right.
18:34:58 tbachman: and yes, this later issue on mitaka
18:35:10 tbachman: the session sharing issue on newton
18:36:08 SumitNaiksatam: so this isn’t related to the problem you were seeing with the newton sync
18:36:12 (b/c it’s mitaka)
18:36:37 (i.e. caused by the addition of the context manager)
18:37:05 * rkukura is back
18:37:07 tbachman: no
18:37:26 tbachman: yes, mitaka issue has probably been around
18:37:39 tbachman: its just that we didnt catch it
18:37:51 tbachman: the issue is that all the DB operations are committed
18:38:14 however, when the outer most transaction context exits, the exception is thrown
18:38:20 (i am referring to the resource closed)
18:39:19 * tbachman wishes he had slayed that dragon when he first encountered it :(
18:39:21 this is the trace from that: #link https://www.pastiebin.com/58cd90f6bce04
18:39:49 tbachman: i dont know if its just one dragon
18:40:12 and for the newton issue, this is the trace: #link https://www.pastiebin.com/58d16e63140f9
18:40:30 we dont have a lead on the newton issue
18:40:46 on the mitaka issue, like i said, i have a workaround, but its not committed in code
18:41:09 the newton issue gets triggered only when we use a heat template to exercise the system
18:41:47 rkukura: and you are looking at this, #link https://www.pastiebin.com/58d01638c9db3
18:42:19 so three DB issues, when the aim drivers are used
18:42:43 just want to make sure everyone is aware in case you hit them
18:42:46 I’ve looked at the log, but that’s about as far as I’ve got. Seems it retries and succeeds.
18:43:02 rkukura: yeah there is no functional issue there
18:43:14 rkukura: i think this might be on account of the expunge that neutron is doing
18:43:50 SumitNaiksatam: Where is that expunge?
18:45:10 rkukura: so i recall that before the extension attributes get processed, the expunge gets call
18:45:13 *called
18:45:41 i am really not comfortable with that expunge but i did not patch it since it was not breaking things in the gate tests
18:45:45 right - I’ll look and see if that could be related
18:45:53 rkukura: just do a grep and you will see it
18:45:59 i think its in a couple of places
18:46:05 it’s weird — how come we don’t see the parent class implementation in delete_policy_target?
18:46:08 (in the trace)
18:46:19 this is the neutron code that i am referring to, not GBP
18:46:24 (the one for stable/mitaka)
18:46:58 tbachman: its difficult to read that trace in the way things get called on account of the decorators involved
18:47:05 yeah
18:47:30 tbachman: you will flip even more if i tell you how the workaround works
18:47:35 :)
18:48:16 anyway, not a very happy place right now!
18:48:28 :(
18:48:36 * tbachman hands SumitNaiksatam a snickers
18:48:40 tbachman: :-)
18:48:55 tbachman: so if you are running tempest tests, be warned
18:49:04 SumitNaiksatam: thx
18:49:22 not that its very helpful just saying that :-)
18:49:44 we might have to pool our collective wisdom at some point to get past these DB issues
18:50:19 alrighty, if not else for today, we can stop here
18:50:34 i wanted to suggest something small
18:50:48 on a much smaller scale :)
18:50:52 annak: yes please, and thanks for all the newton patches!
18:51:05 to move to neutron_lib pep8 factory
18:51:09 instead of neutron
18:51:17 annak: okay
18:51:30 i think that's what we're supposed to do.
18:51:40 annak: sure then we should
18:51:44 ok :) i'll do that
18:51:49 annak: i havent explored that
18:51:53 annak: great, thanks!
18:52:34 annak: did you see any issues using newton with the vmw backend?
18:52:42 i mean GBP newton
18:53:22 the vmw plugin is still very minimal, and no, no GBP-specific issues
18:53:35 lots of backend-specific issues :)
18:53:41 annak: hmmm, okay
18:53:54 annak: we are seeing these issues when we have concurrent operations
18:54:36 I am still far from testing anything under load, but thanks, i'll keep that in mind
18:54:44 annak: so may be if you have a test suite which exercises things in parallel, perhaps you can report back your experience
18:54:47 annak: yeah
18:55:10 okay, thanks all for joining
18:55:11 bye
18:55:22 bye!
18:55:25 #endmeeting