15:01:46 #startmeeting neutron_dvr
15:01:47 Meeting started Wed Sep 14 15:01:46 2016 UTC and is due to finish in 60 minutes. The chair is haleyb. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:01:48 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:01:51 The meeting name has been set to 'neutron_dvr'
15:01:54 #chair Swami
15:01:55 Current chairs: Swami haleyb
15:02:24 #topic Announcements
15:03:03 RC1 is this week
15:03:16 haleyb: great.
15:04:20 yes, we've come a long way i think, lots of changes, bug count decreasing
15:04:41 that said, we still have a few bugs to try and get fixed
15:05:06 haleyb: agree
15:05:17 #topic Bugs
15:05:26 haleyb: thanks
15:05:35 There is no new bugs filed this week.
15:05:51 But some of the bugs have been closed recently.
15:06:21 Let us go over the list provided by John here.
15:06:24 https://bugs.launchpad.net/neutron/+bug/1580648
15:06:27 Launchpad bug 1580648 in neutron "Two HA routers in master state during functional test" [Undecided,Opinion]
15:07:01 There was a change merged that i thought fixed this, but apparently not
15:07:04 There is a patch that has merged, but John mentioned that would fix the problem and I did see a note by Ann that he could still reproduce this issue.
15:07:24 So this is still in plate and to be watched out for.
15:07:48 I've checked this bug today, it is still reproduced with latest code, although I consider this as keepalived limitation
15:08:04 I put a comment on the bug report
15:08:25 akamyshnikova__: so do you want to close the bug or do some more triaging.
15:09:16 akamyshnikova__: let me know your thoughts on this, while we move forward.
15:09:19 akamyshnikova__: is it something fixed in a later keepalived version?
15:10:44 Swami, I can look through keepalived last changes if it is fixed, but I'm not sure.
15:11:06 akamyshnikova__: ok thanks, let us wait then.
15:11:11 The next in the list is
15:11:13 https://bugs.launchpad.net/neutron/+bug/1602614
15:11:14 Launchpad bug 1602614 in neutron "DVR + L3 HA loss during failover is higher that it is expected" [High,Fix released] - Assigned to Carl Baldwin (carl-baldwin)
15:11:32 This bug has been fixed with the l2pop patch that merged recently.
15:11:43 Thanks to anilvenkata for his efforts.
15:12:36 The next one in the list is
15:12:39 #link https://bugs.launchpad.net/neutron/+bug/1602320
15:12:40 Launchpad bug 1602320 in neutron "ha + distributed router: keepalived process kill vrrp child process" [Medium,Fix released] - Assigned to He Qing (tsinghe-7)
15:13:01 This patch had also merged recently and so this bug can be marked as fixed.
15:13:25 These are all the updates from the L3+HA+DVR team.
15:14:25 Now let us come back to the existing bugs and let us talk about the critical ones.
15:14:41 #link https://bugs.launchpad.net/neutron/+bug/1612192
15:14:41 Launchpad bug 1612192 in neutron "L3 DVR: Unable to complete operation on subnet" [Critical,Confirmed]
15:15:15 haleyb: last week you mentioned if we can drop the severity on this bug. Do you have any input on this.
15:15:57 Swami: i am going to lower, that string has not been seen by logstash since 9/8
15:16:23 haleyb: ok, that would be great.
15:16:26 haleyb: thanks
15:16:49 Does the same thing apply for this bug too.
15:16:52 #link https://bugs.launchpad.net/neutron/+bug/1612804
15:16:53 Launchpad bug 1612804 in neutron "test_shelve_instance fails with sshtimeout" [Critical,Confirmed]
15:17:53 looking
15:18:31 still see some
15:20:00 ok, then we should still get to see what is causing these tests to fail. Are these test failures only seen in the multinode job or also on the single node check jobs.
15:20:30 gate-tempest-dsvm-multinode-full-ubuntu-xenial and some others
15:20:45 haleyb: thanks
15:20:59 haleyb: so let us monitor this bug and find out the root cause.
15:21:04 some have nothing to do with neutron, but if we're not working it could be seen
15:21:38 i will see if it's in something other than the check queue
15:21:50 haleyb: ok,
15:22:06 haleyb: I knew that these are some of the vulnerable tests that have failed in the past.
15:22:25 haleyb: But so far no clues on why it is failing.
15:22:53 eliminating check queue dropped to 9 in past two days
15:23:26 haleyb: good that it dropped.
15:23:27 all others in experimental queue
15:24:26 haleyb: ok
15:24:45 #link https://bugs.launchpad.net/neutron/+bug/1593354
15:24:46 Launchpad bug 1593354 in neutron "SNAT HA failed because of missing nat rule in snat namespace iptable" [Undecided,New]
15:25:16 John has already triaged this bug and have mentioned that it is not reproduceable in Master but the bug was reported in mitaka.
15:25:42 I don't think john had triaged this in mitaka yet. So let us wait for his input on this.
15:26:31 #link https://bugs.launchpad.net/neutron/+bug/1571676
15:26:32 Launchpad bug 1571676 in neutron "After binding a floating IP to VM, the static route can't work in DVR." [Undecided,In progress] - Assigned to Swaminathan Vasudevan (swaminathan-vasudevan)
15:26:43 I just reopened this bug since it was outdated.
15:26:48 I do have a patch for review.
15:27:14 #link https://review.openstack.org/#/c/308068/
15:28:23 haleyb: can you take a look at it.
15:28:55 Swami: yes, will take a look, i remember we couldn't merge before because >1 tenant shares the namespace
15:29:27 haleyb: yes you are right. Now each router has its own routing table and the extra routes will be addressed in that table.
15:30:09 I initially wanted to have it on top of the fast-path exit, then I decided to move to on top so that it can be backported.
15:30:37 the next one in the list is.
15:30:40 #link https://bugs.launchpad.net/neutron/+bug/1619312
15:30:42 Launchpad bug 1619312 in neutron "dvr: can't migrate legacy router to DVR" [High,Confirmed] - Assigned to Brian Haley (brian-haley)
15:31:20 i got some good info from armando and kevin on a direction, will work on patch today
15:31:35 has to do with transaction guard code merged a little while back
15:31:58 haleyb: yes I did see your discussion with kevin on this guarded transact
15:32:08 s/transact/transaction
15:32:49 #link https://bugs.launchpad.net/neutron/+bug/1577488
15:32:50 Launchpad bug 1577488 in neutron "[RFE]"Fast exit" for compute node egress flows when using DVR" [Wishlist,In progress] - Assigned to Swaminathan Vasudevan (swaminathan-vasudevan)
15:33:18 haleyb: the patches for this RFE is ready for review.
15:33:44 #link https://review.openstack.org/#/c/283757/ ( Server side patch)
15:34:04 #link https://review.openstack.org/#/c/355062/ - Agent side patch.
15:34:12 Swami: yes, will look
15:34:26 haleyb: thanks
15:34:53 That's all I had for bugs today.
15:35:42 Any other bugs from anyone?
15:36:51 #topic Gate failures
15:38:37 so there were some fixes recently in a couple of areas that seemed to have helped the dvr jobs
15:38:46 multiple in the dhcp-agent
15:39:01 haleyb: has all the fixes in the dhcp merged.
15:39:35 but a really ugly one in the infra code that could have caused two tunnels with the same VNI to be configured - one overcloud, one under
15:39:52 haleyb: that is not good.
15:40:06 Swami: a lot of minor dhcp changes have merged, so exceptions are down
15:41:12 Swami: yes, the VNI one was interesting and would have randomly affected multinode jobs where we saw a dhcp failure, packet was sometimes going the wrong way from what i saw in the review
15:41:31 haleyb: thanks
15:42:02 haleyb: how far are we from making the multinode job voting
15:42:40 http://grafana.openstack.org/dashboard/db/neutron-failure-rate shows the latest, but i'm still confused as to why there are multiple jobs with the same name, think its the xenial change-over
15:43:48 Swami: the dvr-multinode failure rate is ~5% in the gate queue now, that might still be high to change it to voting
15:44:25 haleyb: yes
15:45:09 haleyb: may be if we need to prioritize the bugs we need to prioritize on the basis of gate failures and so we can make it voting soon.
15:46:25 Swami: yes, we just need another look at the failures to see if it's really a dvr issue
15:47:26 haleyb: make sense
15:48:10 Swami: "we" means you or me typically too :)
15:48:29 haleyb: yes understood.
15:49:01 haleyb: since I am done with the fast-path-exit I might have some bandwidth to help you. Please let me know.
15:50:08 Swami: if you want to go through logstash looking for a gate failure on that job it would help
15:50:41 haleyb: sure will do
15:51:39 #topic Stable backports
15:52:14 https://review.openstack.org/#/c/363970/
15:52:23 More of a note - keep tagging things as backport potential, or just cherry-pick
15:52:31 haleyb: need a +2 on this and related one for liberty.
15:52:44 haleyb: sure will do
15:52:56 Swami: will look, hadn't seen that one
15:53:08 haleyb: thanks
15:53:43 #topic Open discussion
15:53:59 haleyb: not related to neutron backport, but the nova migration patch that just got merged is a candidate for mitaka backport and I have the patch up for review.
15:54:15 https://review.openstack.org/#/c/367646/
15:54:52 Swami: had we already packported the neutron change?
15:54:57 There was a comment in there about the requirement to backport. If possible can you add in your comment.
15:55:23 haleyb: I though neutron change merged in mitaka.
15:55:29 s/though/thought
15:55:40 it was so long ago i can't remember...
15:55:47 ;-)
15:56:08 Yes it was merged at the end of mitaka cycle, but we could'nt merge the nova patch and it just got merged.
15:56:43 great
15:57:46 That's all i had, if there's anything you need to target at RC1 just ping a core to help out
15:58:51 #endmeeting