15:02:52 <haleyb> #startmeeting neutron_l3
15:02:53 <openstack> Meeting started Thu Sep 21 15:02:52 2017 UTC and is due to finish in 60 minutes.  The chair is haleyb. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:02:54 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:02:56 <Swami> hi
15:02:57 <openstack> The meeting name has been set to 'neutron_l3'
15:03:06 <haleyb> #chair Swami
15:03:07 <openstack> Current chairs: Swami haleyb
15:04:16 <haleyb> #topic Announcements
15:04:56 <haleyb> Hope people had a productive PTG and have recovered by now
15:05:16 <Swami> mlavalle will not be able to attend today's meeting, since he has a conflict
15:05:20 <Swami> haleyb: sure
15:06:33 <Swami> haleyb: I hope mlavalle might have sent out a report of the PTG update
15:07:04 <haleyb> i don't remember seeing it yet, i'll look again
15:07:38 <Swami> haleyb: I thought mlavalle mentioned that he would send it out in a day or two. Sorry, I have not seen it either.
15:07:41 <haleyb> I guess the one thing we got out of the PTG was more L3 bugs, at least in all these corner cases of router migration and such
15:08:42 <haleyb> I had no other announcements, might as well move to bugs
15:08:48 <haleyb> #topic Bugs
15:09:14 <Swami> haleyb: thanks, let us go over the dvr bugs
15:09:36 <Swami> haleyb: yes agreed
15:10:19 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1718585
15:10:21 <openstack> Launchpad bug 1718585 in neutron "set floatingip status to DOWN during creation" [Undecided,Opinion] - Assigned to venkata anil (anil-venkata)
15:10:43 <Swami> This bug has been filed by anil-venkata
15:11:13 <Swami> It seems that he is asking to change the behavior of floatingip status report.
15:11:14 <haleyb> i thought we set status to ERROR by default, only ACTIVE if succeeded
15:11:46 <Swami> haleyb: That was my opinion, but I need to recheck and I have not paid attention to the status.
15:12:18 <Swami> haleyb: But in the case of migration of floatingIP or router associated with a floatingip keeping a new status for floatingip would be tedious.
15:13:11 <Swami> We always say that when a VM migrates or a floatingIP migrates there should not be any down time, then why do we need to change the state.
15:14:09 <Swami> I agree that during the initial floatingIP setup there should be some state that determines if it is ready to be consumed or not
15:14:12 <haleyb> yeah, we should only change on target host if it failed, otherwise the state could be flaky if the new host updated before the old host
15:14:46 <haleyb> since one could be tearing down while other is building
15:14:49 <Swami> haleyb: but the floatingip state is not tied to host.
15:15:42 <haleyb> but the agent reports the state, that's what i'm getting at
15:16:24 <Swami> haleyb: There are currently three floatingIP states and I did see an additional state that is defined in the agent as 'NOCHANGE'
15:17:56 <Swami> haleyb: maybe we can see what makes sense to handle all these timing issues with floatingip.
15:18:04 <Swami> More discussion needed on this.
15:18:31 <Swami> The next one in the list is
15:18:34 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1718345
15:18:35 <openstack> Launchpad bug 1718345 in neutron "ml2_distributed_port_bindings not cleared after migration from DVR" [Undecided,New]
15:19:27 <Swami> I have to check the code path to see why the ml2 port binding is not being cleared when the router is migrated.
15:20:02 <Swami> The port binding is actually done when ensure_port_binding is called.
15:20:17 <Swami> But we need to see if the router migration takes a similar path or not.
15:20:25 <haleyb> looks like a bug, since we found all these other cases with router ports i'm not surprised to find something else
15:20:48 <Swami> The original design was to move the legacy to dvr and not to move the dvr to legacy. So there may be some corner cases here.
15:21:08 <Swami> which we have not addressed.
15:21:27 <haleyb> agreed, we've just never noticed since no one typically does this
15:21:30 <Swami> I will triage this and see what is missing in here.
15:22:00 <haleyb> great, thanks
15:22:15 <Swami> The next in the list is
15:22:18 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1717302
15:22:19 <openstack> Launchpad bug 1717302 in neutron "Tempest floatingip scenario tests failing on DVR Multinode setup with HA" [Undecided,New]
15:23:30 <Swami> I posted a patch to address this issue
15:23:45 <Swami> #link https://review.openstack.org/#/c/505324/
15:24:11 <Swami> but still I was seeing a couple of tests failing for east-west communication.
15:24:18 <haleyb> Can you add a closes-bug to that next update?  to tie it to the bug?
15:24:45 <Swami> haleyb: This is not directly related to this bug, this was a migration patch that I included to test the case here.
15:25:01 <Swami> haleyb: But still I am seeing east-west with fip failing.
15:25:07 <Swami> haleyb: not sure what is causing this.
15:25:35 <Swami> haleyb: The log trace still shows that an IP cannot be assigned to the qg- interface, which does not exist.
15:26:07 <haleyb> Swami: i will look at the dvr scheduler part of that change again, i think that's where i didn't look as closely
15:26:17 <haleyb> but the qg- error is still strange
15:26:34 <Swami> Also the unknown factor here is, in the logs, I can see that it is trying to ARP on an IP for qg- interface in the qrouter namespace. I am not sure if this is log noise or something related to the failure.
15:27:07 <reedip_> hi
15:27:18 <haleyb> reedip_: hi
15:27:25 <Swami> reedip_: hi
15:27:42 <Swami> haleyb: we will keep debugging this issue with the patch.
15:27:56 <reedip_> just joining in , please continue, I will put up my point in an open discussion if possible :)
15:28:10 <Swami> reedip_: we are discussing the dvr bugs.
15:28:22 <Swami> reedip_: If you have one related to the FWaaS, you can post it now
15:28:58 <reedip_> Swami : yep , but I think it's more related to the FWaaS than DVR actually .
15:29:36 <Swami> ok I will go ahead and bring in that bug for discussion
15:29:39 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1716401
15:29:40 <openstack> Launchpad bug 1716401 in neutron "FWaaS: Ip tables rules do not get updated in case of distributed virtual routers (DVR)" [Undecided,New] - Assigned to Reedip (reedip-banerjee)
15:29:51 <reedip_> oh heheheh :)
15:29:55 <reedip_> yeah thats the one :)
15:30:01 <Swami> reedip_: can you discuss it now
15:30:35 <reedip_> I understand that FWaaS still hasn't looped in the HA part of a router, so it's a separate discussion :)
15:31:00 <Swami> reedip_: I read through the bug description and added my comments. I also wanted to talk to sridhar during the PTG, but he was busy.
15:31:18 <Swami> reedip_: let us keep the ha part apart and work with non-ha first.
15:31:33 <reedip_> Swami : well we can discuss it now :)
15:32:23 <Swami> reedip_: go ahead
15:32:51 <Swami> reedip_: if you can point me in the bug, where your code is handling the router_update scenario, then I can check it out.
15:32:57 <reedip_> Swami : as far as I know, the DVR code included the DVR and DVR_SNAT option for deployment in the Compute and the Controller node .
15:33:15 <reedip_> I do not know much about DVR, so just starting to look at it from the bug's point of view
15:33:34 <reedip_> but FWaaS has not considered DVR_SNAT as of now
15:33:57 <Swami> reedip_: I am sure when we originally designed this both were considered.
15:34:40 <Swami> reedip_: in the case of floatingip either residing in dvr_snat node or in dvr node, the rules were configured on the 'rfp' port of the router namespace and cleared when floatingip was removed.
15:35:20 <Swami> reedip_: this is only for north-south. In the case of dvr_snat you additionally need to set up the rules on the qg- interface in the snat_namespace.
15:35:34 <reedip_> Swami : ok ...
15:36:29 <Swami> reedip_: The only behavior change that we introduced in dvr is creating the fipnamespace along with the gateway create on external network.
15:36:32 <reedip_> Swami : I need to read the arch for DVR, so maybe I can put this issue up later ? I will look into the points that you have mentioned
15:36:59 <Swami> reedip_: ok, ping me on the IRC channel or add your comments on the bug description and we can take it from there.
15:37:06 <Swami> reedip_: hope this helps.
15:37:38 <reedip_> Sure Swami. By the way, is there any doc where I can find some DVR info ?
15:37:49 <reedip_> I mean the arch for DVR ?
15:38:08 <Swami> reedip_: All our docs were in the google. Ping me offline and I can point you to the dvr docs.
15:38:27 <Swami> The next bug in the list is
15:38:30 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1707003
15:38:32 <openstack> Launchpad bug 1707003 in neutron "gate-tempest-dsvm-neutron-dvr-ha-multinode-full-ubuntu-xenial-nv job has a very high failure rate" [High,Confirmed] - Assigned to Brian Haley (brian-haley)
15:38:32 <reedip_> ping you offline ?
15:38:34 <reedip_> ok :)
15:39:17 <Swami> haleyb: any update from grafana on this issue? Are we still seeing the grenade failures?
15:39:48 <haleyb> refreshing page now...
15:39:56 <Swami> haleyb: sorry wrong post.
15:39:59 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1713927
15:40:00 <openstack> Launchpad bug 1713927 in neutron "gate-grenade-dsvm-neutron-dvr-multinode-ubuntu-xenial fails constantly" [High,In progress] - Assigned to Brian Haley (brian-haley)
15:40:05 <Swami> I meant this bug.
15:40:36 <haleyb> grafana still has the job just under 20%, let me look at other bug
15:41:32 <haleyb> oh, that bug :)  I just updated my fip host patch this morning, had to go through the logic yet again
15:42:28 <Swami> haleyb: For the grafana issue the patch that you are working on is not required. Since we already have the agent side fix, that should have brought it down.
15:42:29 <haleyb> https://review.openstack.org/#/c/500143/ is the patch
15:42:56 <haleyb> right, it's just the last remaining known issue to fix and backport
15:42:59 <Swami> haleyb: thanks for the patch link.
15:43:20 <Swami> haleyb: The one thing I have seen in the grenade failures is something unrelated, like volume failures, etc.
15:43:44 <haleyb> i will look at some of the failures to see if they are related, i've seen the "can't ssh" failures randomly, and there is another patch for that
15:44:35 <Swami> haleyb: ok sounds good.
15:44:38 <Swami> Let us move on
15:44:41 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1717597
15:44:42 <openstack> Launchpad bug 1717597 in neutron "Bad arping call in DVR centralized floating IP code" [High,In progress] - Assigned to Brian Haley (brian-haley)
15:44:42 <haleyb> https://review.openstack.org/#/c/500384/ was the patch
15:45:01 <Swami> haleyb: ok thanks
15:45:07 <haleyb> i think that patch just got approved - for arping
15:45:26 <Swami> #link https://review.openstack.org/#/c/504252/
15:45:29 <Swami> link to the patch.
15:45:36 <Swami> haleyb: yes it should merge today.
15:46:02 <haleyb> then i'll cherry-pick
15:46:25 <Swami> haleyb: thanks
15:46:28 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1716829
15:46:29 <openstack> Launchpad bug 1716829 in neutron "Centralized floatingips not configured right with DVR and HA" [High,In progress] - Assigned to Brian Haley (brian-haley)
15:46:49 <Swami> #link https://review.openstack.org/#/c/503530/
15:46:56 <Swami> This needs a workflow
15:48:04 <Swami> I think this can even merge before the fip['host'] patch, so there are not really many dependencies here.
15:48:04 <haleyb> the parent is the fip host patch, so it can't merge yet
15:48:19 <Swami> haleyb: ok
15:49:02 <haleyb> strange how my comments appear in the latest +2, blame that on gerrit
15:50:08 <Swami> haleyb: sure
15:50:27 <Swami> haleyb: I think we have discussed almost all the bugs.
15:50:55 <Swami> haleyb: There are still some gate failure bugs, which are DB related: StaleDataError and failed to bind a port.
15:51:13 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1716321
15:51:14 <openstack> Launchpad bug 1716321 in neutron "StaleDataError: UPDATE statement on table 'standardattributes' expected to update 1 row(s); 0 were matched." [Undecided,New]
15:51:40 <Swami> I think we fixed this one, but we need to check whether it is occurring again.
15:51:45 <haleyb> Swami: related to bugs, there was a dvr change to master recently that i think needed a backport to pike, but i can't find the master review now
15:52:04 <Swami> haleyb: do you remember what the patch was about?
15:52:35 <Swami> haleyb: Is that the DVR - HA migration with router device owner change?
15:53:13 <haleyb> link?  i'll eventually find it but remember adding a comment in it the other day
15:54:46 <Swami> never mind that patch has already been cherry picked.
15:55:22 <haleyb> i have a tab open somewhere, i'll ping you later
15:55:46 <Swami> #link https://review.openstack.org/#/c/494376/ is this patch
15:56:36 <Swami> I just cherry-picked this one, while I type.
15:56:57 <haleyb> i can't remember, i'll find it eventually :)
15:57:18 <Swami> haleyb: ok I will go through my committed changes and cherry-pick if any
15:57:23 <haleyb> https://bugs.launchpad.net/neutron/+bug/1718369 was the only other recent L3 bug, and I marked it as needing more info since the release wasn't given
15:57:24 <openstack> Launchpad bug 1718369 in neutron "DBDeadlock occurs when delete router_gateway" [Undecided,Incomplete]
15:58:28 <Swami> ok i will keep an eye on it.
15:58:32 <haleyb> we're about out of time
15:58:37 <haleyb> #topic Open Discussion
15:58:39 <Swami> I think we have come to the top of the hour
15:58:48 <haleyb> any topics for the last minute?
15:58:57 <Swami> I think we need to resolve the tempest issue.
15:59:23 <Swami> haleyb: can you check with anil_venkata if he can run the tempest test in house to see the failure on vrrp?
15:59:38 <haleyb> Swami: which bug?
16:00:26 <haleyb> time is up, ping me on #neutron channel
16:00:30 <haleyb> #endmeeting