15:00:33 <Swami> #startmeeting distributed_virtual_router
15:00:34 <openstack> Meeting started Wed Oct  8 15:00:33 2014 UTC and is due to finish in 60 minutes.  The chair is Swami. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:36 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:38 <openstack> The meeting name has been set to 'distributed_virtual_router'
15:01:14 <Swami> #info RC1 cut happened last Thursday
15:01:29 <Swami> If anyone is testing the DVR code please make sure that you are testing the RC1 code.
15:01:46 <Swami> #topic Bugs
15:02:02 <Swami> There are a couple of high-priority bugs that we are currently working on.
15:02:34 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1377241
15:02:36 <uvirtbot> Launchpad bug 1377241 in neutron "Lock wait timeout on delete port for DVR" [High,In progress]
15:03:34 <Swami> I am working on the above bug and I have posted a patch for review. It is still a WIP. Please take a look at it and provide your comments.
15:03:45 <viveknarasimhan> sure Swami
15:03:58 <Swami> #link https://review.openstack.org/#/c/124849/
15:04:37 <Swami> viveknarasimhan: yes this has the lock for the csnat delete and also I made a minor change for the gateway-clear to call the csnat-delete only when the gateway is associated with the current network id.
15:05:11 <Swami> There was some issue with the upstream test_requirements. I could not run tox last night since it was broken; I will check it again today and see if it works.
15:05:53 <Swami> This lockwait bug exposes other areas where there might be a timing issue.
15:06:09 <ChuckC> Swami: unit tests need more setup now
15:06:18 <viveknarasimhan> ok Swami
15:06:50 <Swami> Because router_interface_delete and gateway_clear are not ordered, there are more timing issues.
15:07:09 <Swami> This is the reason that we are also seeing the DBDuplicateError for the SNAT agent binding.
15:07:34 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1378468
15:07:35 <uvirtbot> Launchpad bug 1378468 in neutron "DBDuplicateError found sometimes when router_interface_delete issued with DVR" [Undecided,In progress]
15:07:43 <ChuckC> Swami: other tests also.  as of 10/6
15:08:03 <Swami> ChuckC: can you elaborate on the setup please.
15:08:31 <ChuckC> You need to install postgresql postgresql-contrib postgresql-client
15:08:50 <ChuckC> and postgresql-server-dev-9.3
15:09:18 <Swami> Should it be done manually?
15:09:29 <ChuckC> to run tests other than unit tests, more setup is needed, but I think devstack will handle that
15:09:31 <Swami> Will it not be pulled in automatically from the requirements.txt file?
15:09:56 <Swami> Right now devstack is broken, it is not able to pull all the information.
15:10:12 <Swami> It was giving me some errors on mysql.conf not found or something.
15:10:25 <viveknarasimhan> did you try
15:10:30 <viveknarasimhan> run_tests.sh instead of tox?
15:10:33 <ChuckC> These are not in requirements.txt, but I think devstack will cover them once a fix merges (let me look)
15:10:53 <ChuckC> viveknarasimhan: I just ran tox
15:11:26 <Swami> viveknarasimhan: last night I ran both, after cleaning up my existing .venv and .tox
15:11:33 <Swami> viveknarasimhan: but did not succeed.
15:12:17 <Swami> Ok, I will try again today after manually installing the packages that Chuck mentioned.
15:12:26 <viveknarasimhan> Ok
15:12:38 <Swami> coming back to our discussion.
15:12:46 <viveknarasimhan> when you say tox failure i assume it's not the old proxy issue we discussed
15:12:52 <Swami> For the above bug with DBDuplicateError I have posted a patch.
15:13:08 <Swami> #link https://review.openstack.org/#/c/126793/
15:13:29 <Swami> viveknarasimhan: yes not the old proxy, since i was running it from my home yesterday.
15:13:35 <viveknarasimhan> ok
15:13:58 <ChuckC> the review is https://review.openstack.org/#/c/126175, but it's for gate, not devstack
15:14:12 <Swami> In order to work around this timing issue I went back to introducing the hints, and based on the hints, if it is router_interface_action I will not call the schedule_snat.
15:15:22 <Swami> I am not sure if this is the optimal solution for this timing issue, but in the short term, this helps by preventing any calls to schedule_router.
15:15:53 <viveknarasimhan> ok i will look closer
15:16:07 <Swami> viveknarasimhan: Also ChuckC is working on a patch that Carl Baldwin started for removing the rpc from the delete_port, can you review it?
15:16:16 <ChuckC> https://review.openstack.org/#/c/122880
15:16:37 <ChuckC> thanks, Swami
15:16:42 <viveknarasimhan> i see that is WIP
15:16:45 <Swami> ChuckC: You might have to file a bug on Launchpad for this issue; I saw that the patch does not have a bug ID on it.
15:16:47 <viveknarasimhan> but, yes i will review that as well
15:17:11 <ChuckC> Swami: I don't really have context for submitting a bug (I don't know the symptoms)
15:17:31 <Swami> ChuckC: no problem
15:17:41 <ChuckC> I need some help here, since I'll need to test the fix somehow also
15:17:48 <Swami> I will file a bug on that for splitting the rpc from db transactions for delete_port
15:18:03 <Swami> Once I file the bug I will let you know.
15:18:12 <ChuckC> thanks Swami
15:18:35 <Swami> By the way, for the last couple of days I have been testing your patch along with my fix for the DB lockwait timeout, and I did not see any issues.
15:18:39 <ChuckC> viveknarasimhan: I don't think it's WIP any more
15:19:17 <ChuckC> Swami: great!
15:19:27 <viveknarasimhan> ChuckC: Ok , will review it
15:19:40 <Swami> viveknarasimhan: With both my patches that I mentioned above, I still see some errors related to dhcp when I run the clean up script.
15:19:49 <Swami> I sent you an email about this.
15:20:21 <Swami> There is a "KeyError network_id" in dhcp_rpc.py. I already filed a bug upstream on this, and someone is working on it.
15:20:49 <Swami> #link https://bugs.launchpad.net/bugs/1378508
15:20:50 <uvirtbot> Launchpad bug 1378508 in neutron "KeyError in DHPC RPC when port_update happens.- this is seen when a delete_port event occurs " [Undecided,New]
15:20:57 <viveknarasimhan> I couldn't spend time on the DHCP bug today
15:21:10 <viveknarasimhan> due to interview calls and VLAN/FLAT bridge overlapping troubleshooting
15:21:12 <Swami> This is not related to DVR so I did not tag it as l3-dvr-backlog.
15:21:22 <viveknarasimhan> i will look at both your reviews
15:21:34 <viveknarasimhan> but DHCP might be a different problem you have uncovered
15:21:40 <viveknarasimhan> unrelated to the patch..
15:21:48 <Swami> Also I do see some "SAWarning" messages in the logs
15:22:03 <Swami> But these are not critical.
15:22:25 <Swami> But I am not sure what the side effects of these warning messages are.
15:23:12 <Swami> I also see some log messages such as "Will not send event port_delete_end for network 3db2c093-c033-4b18-860b-2691892aaea7: no agent available. Payload: {'port_id': u'c7f99dd3-1523-4128-85d2-ec15ff0f7f2e'}"
15:24:35 <viveknarasimhan> those are audit logging messages
15:24:37 <Swami> My hunch is that all of these are related to "timing" issues.
15:24:50 <viveknarasimhan> the xxx_delete_start and xxx_delete_end where xxx represents the resource
15:25:03 <Swami> Why are those messages shown in red (as errors)?
15:25:44 <Swami> These are not seen in all logs, but they occur sometimes.
15:26:12 <Swami> For example, yesterday I ran the cleanup regression testing around 20 times and saw these errors a couple of times.
15:27:40 <Swami> getting back to the bugs
15:27:48 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1376325
15:27:50 <uvirtbot> Launchpad bug 1376325 in neutron "Cannot enable DVR and IPv6 simultaneously" [Medium,New]
15:28:00 <Swami> This bug is related to IPv6 configuration with DVR
15:28:19 <viveknarasimhan> we did not promote support of IPv6
15:28:27 <viveknarasimhan> so i am not sure why this is considered a bug
15:28:36 <viveknarasimhan> this would be a feature
15:28:42 <Swami> Upstream does support IPv6, but our l3-agent kind of does not like it.
15:28:43 <haleyb> I was just thinking about that one, and if there was an easy fix to add the ip rule correctly
15:28:57 <Swami> haleyb: hi
15:29:23 <Swami> Yes, this week we don't have rajeev and mike, they are on vacation, so we are limited on resources.
15:29:52 <Swami> I will ask rajeev to take a look at it.
15:30:20 <haleyb> hi swami.  i'll see if i have time to look further, it's obvious what is broken, just not sure if a simple fix will get it working.  I'm not looking at optimizing things
15:30:25 <Swami> haleyb: I am not sure about the level of that bug.
15:31:13 <Swami> haleyb: sure, take a look at it and get an estimate on that work.
15:31:22 <haleyb> medium or low, armando had marked as medium
15:31:31 <Swami> Once we wrap up the critical ones we can come back to these backlog items.
15:32:33 <Swami> I think that's all I had for the bugs
15:33:09 <Swami> Do you guys have any other topics to discuss today?
15:33:31 <Swami> #topic DVR Backlog
15:33:57 <Swami> viveknarasimhan: I have not filed bugs for all the backlog items yet
15:34:11 <viveknarasimhan> Please file the bugs
15:34:12 <Swami> viveknarasimhan: I will do it today as per our earlier discussion
15:34:32 <viveknarasimhan> sure.. tag them as l3-dvr-backlog only if failure happens with DVR enabled
15:34:37 <Swami> viveknarasimhan: Once I file the bugs I will let you know and if I have missed anything you can let me know.
15:35:31 <Swami> #topic DVR-Documentation
15:35:47 <Swami> There are a couple of bugs filed for DVR-Documentation.
15:35:54 <viveknarasimhan> ok sure
15:36:09 <Swami> Armando wanted us to work on those after we complete the bugs for the Juno RC.
15:36:35 <Swami> I have asked Vinod to take a look at it and it seems Vinod was ok with starting with one of the doc bugs.
15:36:53 <Swami> I have also spoken to Edgar on this to get the ownership of the DVR documentation
15:37:19 <Swami> Vivek, maybe if you and I help Vinod, we can complete the documentation for DVR.
15:37:29 <viveknarasimhan> i had a discussion with Vinod today
15:37:38 <viveknarasimhan> he will be able to address doc bugs for L2
15:37:55 <Swami> carl_baldwin: hi
15:37:57 <viveknarasimhan> but for l3 extensions, we felt it will be good if you could guide
15:38:19 <Swami> sure, I can help vinod if he has any questions
15:38:52 <Swami> Let me see how these bugs end up and then take up the doc work.
15:39:23 <Swami> #topic Open Discussion
15:39:26 <carl_baldwin> Swami: sorry to be late.
15:39:36 <Swami> carl_baldwin: no worries
15:39:45 <Swami> I just went over the bugs.
15:39:51 <ChuckC> carl_baldwin: Swami has volunteered to submit a bug for 122880
15:39:57 <Swami> Most of our discussion was about the DB lockwait timeout issue.
15:40:33 <carl_baldwin> Swami: 122880 didn’t help with that in your testing, right?
15:40:47 <Swami> carl_baldwin: Yes it did not help
15:40:57 <Swami> But it is good to have it.
15:41:23 <Swami> Adding a lock to the transaction that does the "csnat_port_delete" helps a bit.
15:41:41 <Swami> I have posted a couple of patches upstream as WIP for review.
15:42:06 <Swami> #link https://review.openstack.org/#/c/124849/
15:42:20 <Swami> #link https://review.openstack.org/#/c/126793/
15:42:40 <Swami> With both these patches I don't see the "Internal Server Error" any more.
15:43:00 <carl_baldwin> Swami: Have you seen my latest comment on https://review.openstack.org/#/c/105855/ ?
15:43:04 <Swami> Please take a look at it and let me know your thoughts.
15:43:22 <carl_baldwin> Swami: I’ll have a look.
15:44:08 <Swami> carl_baldwin: no, I have been heads down on the lockwait issue; I will take a look at it.
15:44:22 <carl_baldwin> Maybe it can wait until mrsmith is back.
15:44:55 <Swami> no problem, I will check it out and fix it if it is a minor one.
15:45:29 <viveknarasimhan> i have a question for bhailey:
15:45:30 <carl_baldwin> I’m not sure it is minor but you can probably judge that better.
15:45:52 <haleyb> viveknarasimhan: i'm here
15:45:52 <carl_baldwin> viveknarasimhan: do you mean haleyb ?
15:46:02 <viveknarasimhan> yes
15:46:07 <viveknarasimhan> for review 123911
15:47:00 <viveknarasimhan> i see PS11  came back full circle to be PS8
15:47:42 <viveknarasimhan> haleyb: please let us know whether the self.dvr_agent attribute error issue is no longer popping up
15:47:44 <haleyb> Yes, removing that self.dvr_agent check broke the tests, since it immediately calls _report_state() and blows up
15:47:50 <viveknarasimhan> on report_state running earlier than rpc_loop
15:48:17 <viveknarasimhan> also pls feel free to let us know
15:48:25 <viveknarasimhan> any help that might be required on 123911
15:48:57 <Swami> Any other topics?
15:49:14 <Swami> If not, we can end this meeting.
15:49:15 <haleyb> viveknarasimhan: i think it's ready to go now, if you just want to review and give a +1/-1 as needed
15:49:26 <viveknarasimhan> PS 11 looks ok to me
15:49:34 <viveknarasimhan> will give +1 later today
15:49:51 <Swami> can you post the link in here for reference?
15:50:10 <viveknarasimhan> https://review.openstack.org/#/c/123911/8..11/neutron/plugins/openvswitch/agent/ovs_dvr_neutron_agent.py
15:50:25 <Swami> viveknarasimhan: thanks
15:51:06 <Swami> thanks everyone for joining the meeting
15:51:14 <Swami> see you all next week
15:51:17 <Swami> bye
15:51:20 <viveknarasimhan> good day swami, carl, brian , chuck
15:51:25 <Swami> #endmeeting