15:00:33 #startmeeting distributed_virtual_router
15:00:34 Meeting started Wed Oct 8 15:00:33 2014 UTC and is due to finish in 60 minutes. The chair is Swami. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:36 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:38 The meeting name has been set to 'distributed_virtual_router'
15:01:14 #info RC1 cut happened last Thursday
15:01:29 If anyone is testing the DVR code please make sure that you are testing the RC1 code.
15:01:46 #topic Bugs
15:02:02 There are couple of high bugs that we are currently working on
15:02:34 #link https://bugs.launchpad.net/neutron/+bug/1377241
15:02:36 Launchpad bug 1377241 in neutron "Lock wait timeout on delete port for DVR" [High,In progress]
15:03:34 I am working on the above bug and I have posted a patch for review. It is still a WIP. Please take a look at it and provide your comments.
15:03:45 sure Swami
15:03:58 #link https://review.openstack.org/#/c/124849/
15:04:37 viveknarasimhan: yes this has the lock for the csnat delete and also I made a minor change for the gateway-clear to call the csnat-delete only when the gateway is associated with the current network id.
15:05:11 There was some issue with the upstream test_requirements, I could not run tox last night since it was broken, I will check it again today and see if it works.
15:05:53 This lockwait bug exposes other areas where there might be a timing issue.
15:06:09 Swami: unit tests need more setup now
15:06:18 ok Swami
15:06:50 Because of the router_interface_delete and gateway_clear not ordered, there is more timing issues.
15:07:09 This was the reason that we are also seeing the DBDuplicateError for Snat agent binding.
15:07:34 #link https://bugs.launchpad.net/neutron/+bug/1378468
15:07:35 Launchpad bug 1378468 in neutron "DBDuplicateError found sometimes when router_interface_delete issued with DVR" [Undecided,In progress]
15:07:43 Swami: other tests also. as of 10/6
15:08:03 ChuckC: can you elaborate on the setup please.
15:08:31 You need to install postgresql postgresql-contrib postgresql-clien
15:08:50 and postgresql-server-dev-9.3
15:09:18 Should it be done manually
15:09:29 to run tests other than unit tests, more setup is needed, but I think devstack will handle that
15:09:31 will it not pull it automatically from the requirements.txt file
15:09:56 Right now devstack is broken, it is not able to pull all the information.
15:10:12 It was giving me some errors on mysql.conf not found or something.
15:10:25 did you try
15:10:30 run_tests.sh instead of tox?
15:10:33 These are not in requirements.txt, but I think devstack will cover them once a fix merges (let me look)
15:10:53 viveknarasimhan: I just ran tox
15:11:26 viveknarasimhan: last night I ran both, after cleaning up my existing .venv and .tox
15:11:33 viveknarasimhan: but did not succeed.
15:12:17 Ok I will again today by manually installing the packages that chuck mentioned.
15:12:26 Ok
15:12:38 coming back to our discussion.
15:12:46 when you say Tox failure i assume its not the old proxy issue we discussed
15:12:52 For the above bug with DBDuplicateError I have posted a patch.
15:13:08 #link https://review.openstack.org/#/c/126793/
15:13:29 viveknarasimhan: yes not the old proxy, since i was running it from my home yesterday.
15:13:35 ok
15:13:58 the review is https://review.openstack.org/#/c/126175, but it's for gate, not devstack
15:14:12 In order to work around this timing issue I went back to introducing the hints, and based on the hints, if it is router_interface_action I will not call the schedule_snat.
15:15:22 I am not sure if this is the optimal solution for this timing issue, but in the short term, this helps by preventing any calls to schedule_router.
15:15:53 ok i will look closer
15:16:07 viveknarasimhan: Also ChuckC is working on a patch that Carl Baldwin started for removing the rpc from the delete_port, can you review it.
15:16:16 https://review.openstack.org/#/c/122880
15:16:37 thanks, Swami
15:16:42 i see that is WIP
15:16:45 ChuckC: You might have to create a bug on the launchpad for this issue, I did see that it did not have a bug id on it.
15:16:47 but, yes i will review that as well
15:17:11 Swami: I don't really have context for submitting a bug (I don't know the symptoms)
15:17:31 ChuckC: no problem
15:17:41 I need some help here, since I'll need to test the fix somehow also
15:17:48 I will file a bug on that for splitting the rpc from db transactions for delete_port
15:18:03 Once I file the bug I will let you know.
15:18:12 thanks Swami
15:18:35 By the way last couple of days I have been testing your patch along with my fix for the DB lockwait timeout, I did not see any issues.
15:18:39 viveknarasimhan: I don't think it's WIP any more
15:19:17 Swami: great!
15:19:27 ChuckC: Ok , will review it
15:19:40 viveknarasimhan: With both my patches that I mentioned above, I still see some errors related to dhcp when I run the clean up script.
15:19:49 I sent you an email about this.
15:20:21 There is a "KeyError network_id" in dhcp_rpc.py. I already filed a bug upstream on this, and someone is working on it.
15:20:49 #link https://bugs.launchpad.net/bugs/1378508
15:20:50 Launchpad bug 1378508 in neutron "KeyError in DHPC RPC when port_update happens.- this is seen when a delete_port event occurs " [Undecided,New]
15:20:57 I couldn't spend time on the DHCP bug today
15:21:10 due to interview calls and VLAN/FLAT bridge overlapping troubleshooting
15:21:12 This is not related to DVR so I did not tag it as l3-dvr-backlog.
15:21:22 i will look at both your reviews
15:21:34 but DHCP might be a different problem you have uncovered
15:21:40 unrelated to the patch..
15:21:48 Also I do see some "SAWarning" messages in the logs
15:22:03 But these are not critical.
15:22:25 But I am not sure what are the side effects of these warning messages,
15:23:12 I also see some log messages such as "Will not send event port_delete_end for network 3db2c093-c033-4b18-860b-2691892aaea7: no agent available. Payload: {'port_id': u'c7f99dd3-1523-4128-85d2-ec15ff0f7f2e'}"
15:24:35 those are audit logging messages
15:24:37 All these are related to "timing" issues that is my hunch
15:24:50 the xxx_delete_start and xxx_delete_end where xxx represents the resource
15:25:03 Why those messages are shown as red.(error)
15:25:44 These are not seen on all logs, but it occurs sometimes.
15:26:12 Like yesterday I ran the regression testing for clean up around 20 times and I did see these errors couple of times.
15:27:40 getting back to the bugs
15:27:48 #link https://bugs.launchpad.net/neutron/+bug/1376325
15:27:50 Launchpad bug 1376325 in neutron "Cannot enable DVR and IPv6 simultaneously" [Medium,New]
15:28:00 This bug is related to IPv6 configuration with DVR
15:28:19 we did not promote support of IPv6
15:28:27 so i am not sure why this is considered bug
15:28:36 this would be a feature
15:28:42 Upstream does support IPv6, but our l3-agent kind of does not like it.
15:28:43 I was just thinking about that one, and if there was an easy fix to add the ip rule correctly
15:28:57 haleyb: hi
15:29:23 Yes this week we don't have rajeev and mike, they are on vacation, so we are limited by resources
15:29:52 I will ask rajeev to take a look at it.
15:30:20 hi swami. i'll see if i have time to look further, it's obvious what is broken, just not sure if a simple fix will get it working. I'm not looking at optimizing things
15:30:25 haleyb: I am not sure about the level of that bug.
15:31:13 haleyb: sure take a look at it and get an estimate on that work,
15:31:22 medium or low, armando had marked as medium
15:31:31 Once we wrap up the critical ones we can come back to these backlog items.
15:32:33 I think that's all I had for the bugs
15:33:09 Do you guys have any other topic to discuss today.
15:33:31 #topic DVR Backlog
15:33:57 viveknarasimhan: I have not filed bugs for all the backlog items yet
15:34:11 Please file the bugs
15:34:12 viveknarasimhan: I will do it today as per our earlier discussion
15:34:32 sure.. tag them as l3-dvr-backlog only if failure happens with DVR enabled
15:34:37 viveknarasimhan: Once I file the bugs I will let you know and if I have missed anything you can let me know.
15:35:31 #topic DVR-Documentation
15:35:47 There are couple of bugs filed for DVR-Documentation.
15:35:54 ok sure
15:36:09 Armando wanted us to work on those after we complete the bugs for the Juno RC.
15:36:35 I have asked Vinod to take a look at it and it seems Vinod was ok with starting with one of the doc bugs.
15:36:53 I have also spoken to Edgar on this to get the ownership of the DVR documentation
15:37:19 Vivek, may be if you and me can help Vinod we can complete the documentation for the DVR.
15:37:29 i had a discussion with Vinod today
15:37:38 he will be able to address doc bugs for L2
15:37:55 carl_baldwin: hi
15:37:57 but for l3 extensions, we felt it will be good if you could guide
15:38:19 sure, I can help vinod if he have any questions
15:38:52 Let me see how these bugs end up and then take up the doc work.
15:39:23 #topic Open Discussion
15:39:26 Swami: sorry to be late.
15:39:36 carl_baldwin: no worries
15:39:45 I just went over the bugs.
15:39:51 carl_baldwin: Swami has volunteered to submit a bug for 122880
15:39:57 Most of our discussion was with the DB lockwait timeout issue.
15:40:33 Swami: 122880 didn’t help with that in your testing, right?
15:40:47 carl_baldwin: Yes it did not help
15:40:57 But it is good to have it.
15:41:23 Adding a lock to the transaction that does the "csnat_port_delete" helps a bit.
15:41:41 I have posted a couple of patches upstream as WIP for review.
15:42:06 #link https://review.openstack.org/#/c/124849/
15:42:20 #link https://review.openstack.org/#/c/126793/
15:42:40 With both these patches I don't see the "Internal Server Error" any more.
15:43:00 Swami: Have you seen my latest comment on https://review.openstack.org/#/c/105855/ ?
15:43:04 Please take a look at it and let me know your thoughts.
15:43:22 Swami: I’ll have a look.
15:44:08 carl_baldwin: no I have been heads down on the lockwait issue, I will take a look at it.
15:44:22 Maybe it can wait until mrsmith is back.
15:44:55 no problem I will check it out and fix it, if it is minor one.
15:45:29 i have question for bhailey:
15:45:30 I’m not sure it is minor but you can probably judge that better.
15:45:52 viveknarasimhan: i'm here
15:45:52 viveknarasimhan: do you mean haleyb ?
15:46:02 yes
15:46:07 for review 123911
15:47:00 i see PS11 came back full circle to be PS8
15:47:42 haleyb: please let us know the self.dvr_agent attribute error issue not popping out
15:47:44 Yes, removing that self.dvr_agent check broke the tests, since it immediately calls _report_state() and blows up
15:47:50 on report_state running earlier than rpc_loop
15:48:17 also pls feel free to let us know
15:48:25 any help that might be required on 123911
15:48:57 Any other topics
15:49:14 if not we can end this meeting
15:49:15 viveknarasimhan: i think it's ready to go now, if you just want to review and give a +1/-1 as needed
15:49:26 PS 11 looks ok to me
15:49:34 will give +1 later today
15:49:51 can you post the link in here for reference.
15:50:10 https://review.openstack.org/#/c/123911/8..11/neutron/plugins/openvswitch/agent/ovs_dvr_neutron_agent.py
15:50:25 viveknarasimhan: thanks
15:51:06 thanks everyone for joining the meeting
15:51:14 see you all next week
15:51:17 bye
15:51:20 good day swami, carl, brian , chuck
15:51:25 #endmeeting