15:00:20 #startmeeting neutron_l3
15:00:21 Meeting started Thu Aug 31 15:00:20 2017 UTC and is due to finish in 60 minutes. The chair is mlavalle. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:22 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:25 The meeting name has been set to 'neutron_l3'
15:00:32 hi
15:00:38 hi
15:01:14 #chair haleyb Swami
15:01:14 Current chairs: Swami haleyb mlavalle
15:01:31 #topic Announcements
15:01:53 Pike is being released this week
15:02:31 so we will start the whole thing again and move to Queens now :-)
15:03:32 We are a little more than one week away from the PTG
15:04:41 I arrive Sunday 10th at around 2:30pm. Staying at the event hotel (Renaissance) until Saturday morning
15:05:12 mlavalle: I am arriving late in the evening on Sunday and will be flying back on Friday evening.
15:05:25 i'll be there Sunday as well
15:05:42 Any other announcements from the team?
15:06:30 nothing from me.
15:07:10 I had a quick chat a couple of weeks ago with carl_baldwin. He is going to join us for dinner Monday or Tuesday
15:07:26 mlavalle: great!
15:07:42 He'll drive all the way from Fort Collins
15:07:55 ok, moving on
15:08:00 #topic Bugs
15:08:14 Swami: go ahead, as usual
15:08:17 mlavalle: thanks
15:08:34 We had a critical bug in the grenade job that was filed yesterday.
15:08:54 #link https://bugs.launchpad.net/neutron/+bug/1713927
15:08:55 Launchpad bug 1713927 in neutron "gate-grenade-dsvm-neutron-dvr-multinode-ubuntu-xenial fails constantly" [Critical,Confirmed]
15:09:39 In case not everyone is aware of the problem: it seems that somehow the server is providing a floatingip that is not bound to a host.
15:10:03 The agent configures the floatingip without checking for the host.
15:10:22 So the floatingip is residing on two different nodes.
15:11:00 This happens only when a neutron-server is restarted after some time interval.
At that time, when a full-sync happens, we sneak in this duplicate floatingip.
15:11:20 Swami: kevin did put a reproducer in the bug this morning, and i had a discussion with kuba earlier, think we might have a workaround
15:11:51 haleyb: I did see that jakub had posted a patch.
15:12:11 yes, but we also think we need this revert https://review.openstack.org/#/c/499263/
15:12:11 haleyb: Yes, I also read kevin's reproduction steps.
15:12:32 haleyb: yesterday I was trying to reproduce by restarting the agent. I could not reproduce it.
15:13:23 haleyb: As I mentioned there are two problems.
15:13:45 haleyb: mlavalle: From the server side, the notification should only be sent to the host that is hosting the floatingip.
15:14:13 haleyb: mlavalle: on the client side it should check for host or dest_host before configuring the floatingip.
15:14:53 haleyb: mlavalle: I have not figured out what is happening on the server side yet, but on the client side we may have a better solution.
15:15:00 Swami: i think with the revert and jakub's change the agent side would be fixed, fixing the check queue
15:15:12 haleyb: mlavalle: Yes I agree.
15:15:32 so the host fix we merged last week was really a symptom?
15:15:59 haleyb: mlavalle: But still the check that we have under 'get_external_device_interface_name'
15:16:11 haleyb: mlavalle: may not be in the right place.
15:16:25 haleyb: mlavalle: we should have this check before configuring the floatingip.
15:17:12 #link https://github.com/openstack/neutron/blob/master/neutron/agent/l3/dvr_local_router.py#L103
15:17:31 we should have this check at this line, so that no floatingips are configured.
15:17:45 and only floatingips that are associated with that host will be configured.
15:18:31 mlavalle: To your question. Yes, it is a symptom that we saw last week, which was sending floatingips without a host.
15:18:44 but the agent shouldn't have sent, right?
15:18:47 makes sense, thanks
15:19:20 haleyb: Yes, you mean the 'server' shouldn't have sent in the first place.
15:20:11 Swami: right, sent by accident, or because 'host' not set
15:20:40 haleyb: Yes.
15:21:27 haleyb: mlavalle: The place where we check for the 'host' in the server has three different conditions, so we probably have a leak in one of those checks.
15:22:23 Swami: so let's do the revert in stable/pike and/or master until we can fix correctly
15:22:37 haleyb: mlavalle: Probably the fast approach is to fix the agent side to handle the fips based on the host and then we can take a look at the server side.
15:23:27 haleyb: mlavalle: I think we should fix the agent first and no need to revert at this point.
15:24:18 Swami: i think without the pike revert we can't land anything, as it's now broken so grenade in master can't pass (from what I understood)
15:25:22 we can ask jakub in #neutron after meeting, but that's what i understood
15:25:38 haleyb: Ok, if that is the case then we can decide on reverting.
15:27:17 haleyb: mlavalle: So the plan is, let us check with jakub and see how his patch fares.
15:28:03 haleyb: mlavalle: If it works and the grenade job allows it to merge, then we can go ahead and push this patch and need not revert. Otherwise we should revert.
15:28:05 grenade job is now doing pike->queens as of yesterday, so that changed things
15:28:31 haleyb: what is that? I don't get it.
15:28:50 yeah, the grenade equation chenged
15:28:55 grenade configures "old" version, then upgrades to "new" version
15:28:57 changed
15:29:06 so if pike is broken, then we can't upgrade
15:29:40 i think the revert fixes pike, then jakub's change fixes master enough to make progress
15:30:08 haleyb: ok makes sense.
15:31:22 haleyb: meanwhile I will try to see what is happening in the server side logic.
15:32:08 Let us move on.
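The agent-side check discussed above (configure a floatingip only when its 'host' or 'dest_host' matches the local node) could look roughly like the following. This is a minimal sketch, not the actual neutron code: the function name filter_fips_for_host, the plain-dict floatingip shape, and the sample host names are assumptions made for illustration.

```python
def filter_fips_for_host(floating_ips, agent_host):
    """Return only the floating IPs that belong on agent_host.

    Hypothetical helper: each floating IP is assumed to be a dict that may
    carry a 'host' key (current binding) and a 'dest_host' key (migration
    target). Entries with neither key matching are skipped, so a floating
    IP that arrives without a host is never configured locally.
    """
    local_fips = []
    for fip in floating_ips:
        bound_host = fip.get("host")
        migrating_to = fip.get("dest_host")
        if agent_host in (bound_host, migrating_to):
            local_fips.append(fip)
    return local_fips


if __name__ == "__main__":
    # Sample data only; addresses come from the documentation range.
    sample = [
        {"floating_ip_address": "203.0.113.10", "host": "compute-1"},
        {"floating_ip_address": "203.0.113.11", "host": None},
        {"floating_ip_address": "203.0.113.12", "host": "compute-2",
         "dest_host": "compute-1"},
    ]
    for fip in filter_fips_for_host(sample, "compute-1"):
        print(fip["floating_ip_address"])
```

Filtering before any configuration step means an unbound floatingip delivered during a full sync is dropped on every node instead of being configured on more than one.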
15:32:46 #link https://bugs.launchpad.net/neutron/+bug/1712913
15:32:48 Launchpad bug 1712913 in neutron "Update DVR router port cause error with QoS rules" [Undecided,In progress] - Assigned to Slawek Kaplonski (slaweq)
15:33:00 This is another bug that was filed against DVR.
15:33:40 But by looking at the bug description and the logs, it seems that it might also happen with legacy routers.
15:34:16 The issue is with the ovs_agent. There is a patch currently up for review.
15:34:39 #link https://review.openstack.org/#/c/498598/
15:35:12 Please take a look at the patch, after we fix this critical issue and when the gate is happy.
15:35:32 ok
15:35:48 The next bug in the list is
15:35:50 #link https://bugs.launchpad.net/neutron/+bug/1712795
15:35:52 Launchpad bug 1712795 in neutron "Fail to startup neutron-l3-agent" [Undecided,New]
15:36:49 This bug seems to be incomplete. I don't see any issue in neutron-l3-agent processing routers. I have asked a couple of questions about the branch and how it can be reproduced, but have not received any information yet.
15:36:57 Until then we can mark it as incomplete.
15:37:34 There are two other bugs that were filed against the multinode failure.
15:37:56 Since we have already discussed the multinode failure, we can discuss where we stand next week.
15:38:05 mlavalle: back to you.
15:38:18 Swami: Thanks
15:38:26 Good discussion
15:38:48 I don't have any major bugs to discuss
15:38:56 #topic Open Agenda
15:39:33 Since we have that critical DVR bug breaking the gate, let's move to Opem Sgenda
15:39:36 Agenda
15:40:07 Any other topics we should discuss today?
15:40:34 mlavalle: No, I will keep working on the fix.
15:40:58 https://www.eventbrite.com/e/drinks-with-rdo-at-the-ptg-tickets-37396477872
15:41:05 RDO release party at PTG :)
15:41:23 i think that's open to all
15:41:51 But I think you need to register. I will right now
15:42:13 ok guys, thanks for attending
15:42:21 #endmeeting