15:00:21 <mlavalle> #startmeeting neutron_l3
15:00:27 <openstack> Meeting started Thu May 31 15:00:21 2018 UTC and is due to finish in 60 minutes.  The chair is mlavalle. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:29 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:31 <openstack> The meeting name has been set to 'neutron_l3'
15:00:35 <haleyb> hi
15:00:38 <mlavalle> Hi there!
15:00:38 <njohnston> o/
15:01:10 <mlavalle> #topic Announcements
15:01:52 <mlavalle> Just a reminder that the Rocky-2 milestone will be next week
15:02:15 <mlavalle> Any other announcements from the team?
15:03:01 <mlavalle> ok then, let's move on
15:03:05 <mlavalle> #topic Bugs
15:03:37 <mlavalle> First one for today is https://bugs.launchpad.net/neutron/+bug/1766701
15:03:38 <openstack> Launchpad bug 1766701 in neutron "Trunk Tests are failing often in dvr-multinode scenario job" [High,Confirmed] - Assigned to Miguel Lavalle (minsel)
15:03:51 <mlavalle> I've been working on this one
15:05:26 <mlavalle> At least one of the failures is due to a change introduced recently to an interface command
15:05:52 <mlavalle> I have a tentative fix proposed to it: https://review.openstack.org/#/c/571043
15:06:08 <mlavalle> I managed so far to change the nature of the error
15:06:22 <mlavalle> but still need to do some additional work on it
15:06:35 <mlavalle> any comments?
15:07:13 <haleyb> nice catch on the }
15:07:39 <mlavalle> well, I still need to really fix it
15:07:45 <haleyb> did you want to merge that change regardless?
15:08:08 <haleyb> or keep looking and do it all at once?
15:09:17 <mlavalle> I want to look at it further
15:09:33 <mlavalle> I'll remove the WIP later
15:09:42 <haleyb> ack
15:10:43 <mlavalle> Next one is https://bugs.launchpad.net/neutron/+bug/1756301
15:10:45 <openstack> Launchpad bug 1756301 in neutron "Tempest DVR HA multimode tests fails due to no FIP connectivity" [High,Confirmed] - Assigned to Miguel Lavalle (minsel)
15:11:02 <mlavalle> I spent some time with it earlier today
15:11:36 <mlavalle> I can see the failures in several cases.
15:12:00 <mlavalle> I am digging right now in http://logs.openstack.org/93/565593/4/check/neutron-tempest-dvr-ha-multinode-full/1c1b813/job-output.txt.gz
15:12:51 <mlavalle> Looking at https://bugs.launchpad.net/neutron/+bug/1717302
15:12:53 <openstack> Launchpad bug 1717302 in neutron "Tempest floatingip scenario tests failing on DVR Multinode setup with HA" [High,Confirmed] - Assigned to Brian Haley (brian-haley)
15:13:06 <mlavalle> don't you think we can consolidate this two bugs?
15:13:11 <mlavalle> these^^^^
15:13:33 <haleyb> probably as they are both dvr ha multinode
15:14:07 <mlavalle> and it's essentially the same report about floating ips
15:15:39 <mlavalle> The latter one has more debugging notes. So I will keep this one, assign it to me and mark the other one as a duplicate
15:15:43 <mlavalle> makes sense?
15:15:46 <haleyb> +1
15:16:34 <Swami_> hi
15:16:43 <mlavalle> Done
15:16:49 <mlavalle> hey Swami_
15:16:54 <Swami_> hi
15:17:48 <mlavalle> do you have any bugs to report Swami_?
15:18:07 <Swami_> mlavalle: yes I do have one, that needs some attention.
15:18:11 <Swami_> Let me pull up the bug
15:19:30 <Swami_> #link https://bugs.launchpad.net/neutron/+bug/1773999
15:19:32 <openstack> Launchpad bug 1773999 in neutron "Allowed Address Pairs doesn’t work after neutron-port update" [Undecided,In progress] - Assigned to Boris (boris-maeck)
15:20:04 <Swami_> We might have to revert one of the patches that we pushed in to address this issue.
15:20:07 <Swami_> #link https://review.openstack.org/#/c/550676/
15:20:27 <Swami_> mlavalle: I need to have a conversation on this issue.
15:21:03 <mlavalle> you mean a high bandwidth conversation?
15:21:07 <mlavalle> or here?
15:21:37 <Swami_> Changing the ARP entry NUD does not help for this bug, since if we have a temporary state in the ARP table and traffic is then routed between the subnets, the ARP entry goes back to incomplete.
15:21:58 <Swami_> mlavalle: probably here, but if it takes longer then we can move to the channel.
15:22:09 <mlavalle> let's carry on here
15:22:23 <mlavalle> that way haleyb and njohnston can chime in
15:22:48 <Swami_> mlavalle: The optimal way to address this issue is only by polling on the GARP message for a given IP in the router namespace and then add a permanent entry for the allowed_address_pair.
15:23:10 <Swami_> But this would have a hit in performance.
15:23:34 <mlavalle> yeah
15:23:58 <Swami_> Since it would have a hit in performance, should we make it configurable, so that people who want to use this feature can turn it on or off?
15:24:41 <Swami_> This all boils down to the design constraint of DVR not being able to forward the ARP request outside of the given node.
15:25:09 <haleyb> what is meant by 'polling'?  i need to look at the bug
15:25:51 <Swami_> mlavalle: haleyb: I meant we need to sniff all the packets that are coming to the router-namespace and filter them by the GARP type and the IP address registered to filter for.
15:25:59 <Swami_> haleyb: That is what I meant for polling.
15:26:18 <Swami_> haleyb: Is there any other way to address this issue?
15:27:46 <Swami_> mlavalle: Do you think we need to file an RFE for this change request?
15:28:20 <mlavalle> that is probably a good idea
15:28:49 <mlavalle> so we can iterate on the problem and the possible solution
15:28:53 <Swami_> mlavalle: ok let me go ahead and file an RFE to discuss further on this approach.
15:29:12 <mlavalle> should we do the revert right away?
15:29:22 <haleyb> i don't know how we'd easily watch for gARPs.  i guess i'd like to know how the router is trying to confirm the ARP and failing
15:30:32 <Swami_> haleyb: Yes, right now the router gets this GARP message and the ARP table is updated as temporary, since permanent entries are not changeable.
15:31:52 <Swami_> haleyb: So when we have a VM residing on a different subnet trying to ping this vrrp ip, the arp message from the VM reaches the router, the router sees the ARP table but since the entry is temporary it is trying to confirm by re-arping to the IP to confirm the MAC. Since we don't allow the ARP traffic to pass out, the ARP entry becomes invalid after a few seconds.
15:32:30 <haleyb> Swami_: i guess writing it up is the first step, then maybe we can think of ways to fix it
15:32:43 <Swami_> haleyb: ok, I will write it up.
15:33:02 <mlavalle> this is the status after merging https://review.openstack.org/#/c/550676/
15:33:04 <Swami_> haleyb: mlavalle: I will create an RFE and then add in my thoughts on how to approach this.
15:33:05 <mlavalle> right?
15:33:49 <Swami_> mlavalle: yes you are right.
15:33:56 <mlavalle> ok
15:34:08 <mlavalle> yeah a write up will help
15:34:23 <haleyb> so that change merged in master/queens/pike :(
15:34:25 <Swami_> mlavalle: ok thanks, will do.
15:34:31 <Swami_> haleyb: yes.
15:35:13 <mlavalle> how about the revert?
15:35:52 <Swami_> mlavalle: A revert should remove the existing change.
15:36:12 <mlavalle> oh I know, but is that a more desirable position?
15:36:42 <Swami_> mlavalle: But the allowed_address_pair issue will still persist and the long term solution would be the new approach that I mentioned.
15:37:13 <mlavalle> ok, so it seems that we don't need to revert right away
15:37:23 <mlavalle> am I correct?
15:38:10 <Swami_> mlavalle: The reason someone had asked to revert is because, previously at least it was able to pass traffic if a customer manually did a port update, since port update will push in a new arp table update if MAC had changed and this patch broke it.
15:39:14 <mlavalle> so reverting leaves us in a better position then
15:39:14 <Swami_> mlavalle: So let us first revert it and think about the new proposal.
15:39:22 <Swami_> mlavalle: yes
15:39:33 <mlavalle> yeah, that is what I was trying to get at
15:39:47 <mlavalle> ok with you haleyb?
15:40:09 <haleyb> sure
15:40:25 <mlavalle> Swami_: will you propose the revert?
15:40:44 <Swami_> mlavalle: Yes there is already a patch. https://review.openstack.org/#/c/570941/
15:40:52 <mlavalle> Thanks
15:41:14 <Swami_> mlavalle: haleyb: Another question about RFE.
15:41:37 <mlavalle> not from me, I will wait for the RFE
15:41:51 <haleyb> mlavalle: just wonder if revert should be done from original change so context is all there
15:41:58 <Swami_> mlavalle: haleyb: I will also file an RFE to make the FloatingIP path to IPv6, since we don't have IPv6 support right now for the fast path exit.
15:42:24 <mlavalle> haleyb: that is a good point. let's preserve history
15:42:29 <haleyb> Swami_: ok
15:42:44 <Swami_> haleyb: ok, I will revert from the original patch. I will push in one.
15:43:16 <mlavalle> Swami_: that is this person's first contribution
15:43:28 <Swami_> mlavalle: Yes
15:43:41 <mlavalle> let me leave a comment in his patch proposing that he does the revert from the original patch
15:43:52 <mlavalle> I want to be welcoming ;-)
15:44:02 <mlavalle> makes sense?
15:44:04 <Swami_> mlavalle: Sure, nice.
15:44:10 <haleyb> and they can revert the stable ones the same way
15:44:32 <mlavalle> yeah, he might be interested in making a contribution
15:44:49 <Swami_> mlavalle: he is a customer that I am working with.
15:46:05 <mlavalle> ok done. I left a comment there
15:46:14 <Swami_> mlavalle: thanks
15:47:09 <Swami_> mlavalle: haleyb: I will also be filing an RFE for the IPv6 support for the FloatingIP namespace for DVR routers, since we don't have radpd and link local address in IPv6 yet.
15:47:38 <mlavalle> Swami_: ack
15:47:52 <Swami_> mlavalle: ok thanks
15:48:14 <mlavalle> I also have a request for Swami_ and / or haleyb. Would you help triaging https://bugs.launchpad.net/neutron/+bug/1773286?
15:48:15 <openstack> Launchpad bug 1773286 in neutron "In some specific case with dvr mode I found the l2pop flows is incomplete." [Undecided,New]
15:48:19 <haleyb> Swami_: ack, and there might be some old patches around to help make that work
15:48:43 <Swami_> mlavalle: yes, I think I know this issue.
15:48:49 <haleyb> mlavalle: yes, saw that and he replied with the bad code, haven't looked yet
15:49:22 <Swami_> mlavalle: The line of code pointed to in this bug was bypassed, since we were seeing an error log in Jenkins.
15:49:55 <mlavalle> ok, cool. Please leave some comments in the bug
15:49:58 <Swami_> mlavalle: But if it is eating up the l2pop, we can live with the error log.
15:50:19 <Swami_> mlavalle: will do.
15:50:41 <mlavalle> The other petition that we have is to see if anyone has seen the following:
15:50:45 <mlavalle> http://logs.openstack.org/87/564887/6/check/neutron-tempest-dvr-ha-multinode-full/6296f64/logs/subnode-2/screen-q-l3.txt.gz?level=ERROR
15:50:48 <Swami_> haleyb: can you please point me to the old patches for the radpd for the fip namespace.
15:51:33 <mlavalle> This came up in the last CI meeting
15:51:58 <haleyb> Swami_: i'll find them, don't think it was with ra proxy (i think that's what you meant)
15:52:13 <Swami_> haleyb: thanks
15:52:36 <Swami_> mlavalle: Let me check the code path to see why we are seeing this gateway route update error.
15:53:01 <mlavalle> if you find something relevant, please file a bug
15:53:11 <Swami_> mlavalle: sure will do.
15:53:21 <mlavalle> Thank you!
15:53:44 <mlavalle> Any other bugs we should discuss today?
15:54:47 <mlavalle> I take that as no
15:54:56 <mlavalle> #topic Open Agenda
15:55:09 <mlavalle> anything we should discuss today?
15:55:30 <Swami_> mlavalle: I don't have any.
15:56:07 <Swami_> haleyb: You missed the good food and drinks at the summit
15:56:10 <mlavalle> haleyb: do you know that Swami_ and Ryan Tidwell attended the Summit?
15:56:40 <haleyb> darn, hope everyone had a good time
15:56:45 <mlavalle> also Mark McClain
15:57:11 <mlavalle> haleyb: rest assured you were missed ;-0
15:57:25 <haleyb> any more sea plane rides?
15:57:40 <Swami_> mlavalle: I did a seaplane ride.
15:57:53 <Swami_> haleyb: it was fun getting on a seaplane ride.
15:58:07 <mlavalle> I didn't do the sea plane this time around, but I went ziplining in the mountains
15:58:13 <mlavalle> 5 zip lines
15:58:13 <Swami_> mlavalle: haleyb: It was a 1 hour ride.
15:58:21 <Swami_> mlavalle: That would be fun
15:58:33 <haleyb> you guys have all the fun :)
15:58:34 <mlavalle> the longest one is almost a third of a mile long
15:58:37 <Swami_> mlavalle: Did you do it at Grouse Mountain?
15:58:50 <mlavalle> Exactly, Grouse Mountain
15:59:17 <mlavalle> ok, time is up
15:59:22 <mlavalle> Nice talking to you
15:59:27 <mlavalle> #endmeeting