15:03:03 #startmeeting neutron_l3
15:03:03 Meeting started Thu Sep 24 15:03:03 2015 UTC and is due to finish in 60 minutes. The chair is mlavalle. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:03:04 SergeyLukjanov: Thanks!
15:03:05 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:03:07 The meeting name has been set to 'neutron_l3'
15:03:14 SergeyLukjanov: thanks!
15:03:18 hi
15:03:21 * regXboi still notes having multiple chairs is a good idea - cough, cough
15:03:28 morning all
15:03:40 #topic Announcements
15:04:05 so, we cut RC1 yesterday.... lots of last time activity, at least for me
15:04:13 mlavalle: yay!
15:04:16 hi
15:04:27 That also means that Mitaka is open!
15:04:39 ++
15:04:40 carl_baldwin: +1
15:04:58 also we should know who is the new PTL soon
15:05:05 that will be tomorrow
15:05:15 if you haven't voted - go do so :)
15:05:25 yes, please vote.....
15:05:38 note: I don't care who you vote for :) just vote
15:06:00 any other announcements?
15:06:09 Remember the last vote was won my one single vote.
15:06:35 s/my/by/
15:07:03 that's true, so if you haven't voted, you might elect the ptl
15:07:41 ok, moving along.....
15:07:48 #topic Bugs
15:08:01 so pinging myself....
15:08:21 the good news today is that we worked through our critical bugs last week
15:08:36 haven't shown up in at least 7 days
15:09:04 we also worked through a lot of high importance bugs, so today I have two to highlight
15:09:12 Yay!
15:09:24 https://bugs.launchpad.net/neutron/+bug/1365473
15:09:25 Launchpad bug 1365473 in neutron "Unable to create a router that's both HA and distributed" [High,In progress] - Assigned to Assaf Muller (amuller)
15:09:48 I noticed adduarte is in the meeting and he has been involved with this one.... any comments?
15:10:08 assaf was working on it review
15:10:44 testing shows us in good shape. but it does need code review
15:10:45 I talked with amuller yesterday about this.
We’re still hoping to merge this before Liberty final. I’m going to mark it an RC2 candidate.
15:11:05 carl_baldwin: sounds good
15:11:12 yeah, it got assigned to amuller 17 hours ago.....
15:11:14 carl_baldwin: +1
15:11:19 and retesting after any new changes from assaf
15:11:37 about an hour of testing
15:12:51 adduarte: Thanks for all of your work here.
15:13:08 ok, next up is https://bugs.launchpad.net/neutron/+bug/1494351
15:13:09 Launchpad bug 1494351 in neutron "Observed StaleDataError in gate-neutron-dsvm-api tests if reference IPAM driver is used" [High,In progress] - Assigned to Pavel Bondar (pasha117)
15:13:31 pavel_bondar was working on it.... any updates?
15:13:36 mlavalle: fyi - I have two others to throw in the hopper when you are done
15:13:38 unfortunately I did not have free cycles to work on that during last week
15:13:54 it is pretty clear now what the fix should be
15:14:06 but I need about 2 free days to implement and test
15:14:17 * mlavalle put the bugs regXboi is talking about in the agenda....
15:14:27 * regXboi goes and looks :)
15:14:27 * mlavalle maybe?
15:15:07 pavel_bondar: that's cool, thanks for the update. Any help needed?
15:15:22 * regXboi no - these are ones I'm just seeing now :( - I think carl_baldwin bumped them up and I forgot to add them - apologies
15:15:53 regXboi: Sorry
15:16:07 mlavalle: it is clear what to do, so I think I am ok here
15:16:09 carl_baldwin: my fault, I should have added them
15:16:28 pavel_bondar: thanks... moving on
15:16:39 mlavalle: the first is https://bugs.launchpad.net/neutron/+bug/1486795
15:16:40 Launchpad bug 1486795 in neutron "DVR: create or update port by using notify specific host rather than fanout" [High,In progress] - Assigned to shihanzhang (shihanzhang)
15:17:07 this got bumped up by carl_baldwin to High on 9/18 - there is a request in the bug for a patch set review
15:17:08 pavel_bondar: I’ll get to that one today. It had dropped off my radar.
:(
15:17:32 ( https://review.openstack.org/221209 ) so maybe we can make this a RC-2 target as well?
15:17:33 * regXboi hopes
15:17:41 carl_baldwin: you reviewed it on 9/21. we are actually waiting for the next revision
15:18:05 regXboi: Possibly. It needs work and I wouldn’t call it a release blocker.
15:18:19 carl_baldwin: looking at that patch set, it needs more work :(
15:18:32 carl_baldwin, so I don't think it's RC-2 ready now
15:18:40 regXboi: yes, carl_baldwin indicated that the code needs cleanup
15:19:23 the last one is https://bugs.launchpad.net/neutron/+bug/1486828 and I don't see a patch set (which means I think launchpad missed it)
15:19:24 Launchpad bug 1486828 in neutron "L3: Notify specific agent rather than fanout when associating floatingip" [High,In progress] - Assigned to changzhi (changzhi)
15:19:50 regXboi: we were actually talking about the same 2 bugs. both in the agenda
15:20:04 mlavalle: ok - cool :)
15:20:13 I got your back man :-)
15:20:29 good, because I'd have lost my head this week if it wasn't attached to my shoulders :)
15:20:51 regXboi: There is a patch. Let me find...
15:21:15 carl_baldwin: yes, I remember seeing it - didn't realize that launchpad had missed it
15:21:16 #link https://review.openstack.org/#/c/215136/
15:21:45 thanks - I've updated launchpad
15:21:49 carl_baldwin: I'll update the bug with the patchset
15:21:56 mlavalle: Thanks
15:21:57 mlavalle: I got your back this time :)
15:22:02 :-)
15:22:31 yeah this one had me worried
15:22:44 the test failures didn't look like I could just call them "unrelated"
15:23:23 one, question, is https://bugs.launchpad.net/neutron/+bug/1494351 required for Liberty?
15:23:24 Launchpad bug 1494351 in neutron "Observed StaleDataError in gate-neutron-dsvm-api tests if reference IPAM driver is used" [High,In progress] - Assigned to Pavel Bondar (pasha117)
15:24:10 pavel_bondar: It’d be nice but I wouldn’t hold up the release for it.
15:24:18 Hi, sorry to come in late.
15:24:48 carl_baldwin: ok, then try to get some time to fix it sooner
15:25:31 the other good news is that we worked through 2 medium importance bugs that were marked for RC-1 by regXboi.... so, all in all, great teamwork
15:25:42 +1
15:25:52 yes - thanks to everybody that pitched in once they were found :)
15:26:19 any other bugs to discuss?
15:26:47 ok, moving on......
15:26:48 mlavalle: I think we are good for now - the backlog is up a bit, and I need to spend some time with it for next week
15:27:09 #topic Router Networks
15:27:40 carl_baldwin: I think this is the spec you want to discuss https://review.openstack.org/#/c/225384/
15:27:59 I put it in the agenda
15:28:11 mlavalle: Yes. I wanted to point it out. It is collecting feedback.
15:28:58 It introduces two very poorly named new entities to help model L3 networking.
15:29:31 I think it is probably best to just take the discussion to the review. Maybe we’ll have more to discuss next week.
15:29:47 I will review soon, hopefully tomorrow.
15:30:10 Please take a look and pitch in with feedback
15:30:24 moving on.....
15:30:32 #topic DVR
15:30:42 mlavalle: hi
15:30:47 Swami: the floor is all yours
15:31:04 I am working on fixing some error logs that I have been seeing in the l3-agent logs.
15:31:11 regXboi: and haleyb: too
15:31:23 I have also added a couple of bugs related to those fixes.
15:31:40 carl_baldwin: thanks for reminding me
15:31:46 #link https://review.openstack.org/#/c/225319/
15:32:06 #link https://review.openstack.org/#/c/225514/
15:32:27 #link https://review.openstack.org/#/c/225523/
15:32:46 #link https://review.openstack.org/#/c/227008/
15:33:18 I have questions about that last one, but will put them in the review
15:33:27 There was one other patch I have submitted to revert it.
This one I had a discussion with carl_baldwin yesterday regarding the static routes being added to snat_namespace instead of the qr-namespace
15:33:32 Swami: can you speculate how many of those might be causing DVR jobs failures in the pipelines?
15:33:44 #link https://review.openstack.org/#/c/227045/
15:34:18 regXboi: As of today they are causing the DVR jobs to fail, but the intermittent failures may be due to these issues.
15:34:49 s/they are/they are not
15:35:13 Swami: are there logstash queries that we can use to check the pipelines?
15:35:41 regXboi: I looked at the logtrace from my patches for any traces that have happened.
15:36:03 regXboi: I don't have a logstash query at this time.
15:36:23 haleyb: thanks
15:36:31 ok, I ask because the dvr jobs are running at between 25 and 50% failure rates for the last week :(
15:36:43 and that's not good
15:36:54 regXboi: last two days on 20 and 21 the DVR job failure was high.
15:37:28 regXboi: I was not sure what was causing that spike for last two days, it came down yesterday.
15:37:32 Swami: I guess my main concern with some of the log message "squashing" is that is there a bug underneath? Hiding the message just makes it worse
15:37:37 Does anyone know what caused that spike?
15:38:06 haleyb: +1 squashing messages without fixing bugs is not the way to go
15:38:23 haleyb: The issue that I am seeing with these log messages, is either the port or the namespace is concurrently deleted, while other one is trying to add the data.
15:38:27 Swami: I do not know - that's why I was looking for logstash queries to check to see if they were related
15:38:29 for example, if we get a "namespace does not exist" error was it supposed to be there?
15:38:42 regXboi: haleyb: I agree.
15:39:01 haleyb: This is the reason for the patch revert that I included.
15:39:48 Swami: yes, concurrency with DVR is a bit tricky
15:39:48 haleyb: it is trying to add a static route to the snat-namespace, when the snat-namespace has been already deleted or not in existence.
15:39:54 on the DVR job failures, I need to finish this last O(n) perf issue that I'm chasing and then I was going to go at the problem from the FIP side
15:40:30 regXboi: Most of the DVR jobs inconsistent failure is due to the "ssh timeout" or not able to reach with the FloatingIP.
15:40:31 to see *why* the FIPs aren't available as those are *most* of what I've seen when DVR fails
15:40:40 Swami: exactly
15:40:59 Swami: on the SNAT namespace, that gets back at the conversation you and carl_baldwin had yesterday regarding do we need to add routes there, what was the conclusion there?
15:41:06 regXboi: The best option is that we might have to add some test cases to see if the packet can reach from one namespace to another namespace.
15:41:36 Swami: I think we may have some additional options to follow
15:42:51 I still need to come to my own conclusion. At first I thought it makes sense for all of the dvr router namespaces to have essentially the same routing tables. But, the static routes may not be used in the snat namespace.
15:43:33 sorry got disconnected.
15:44:14 regXboi: haleyb: what would be the best option to handle the concurrency problem with the DVR?
15:45:08 regXboi, haleyb maybe we need to continue this discussion in the Neutron channel?
15:45:18 mlavalle: +1 - let's move on
15:45:28 ok.....
15:45:33 yes....
15:45:43 #topic BGP dynamic routing
15:45:52 hi
15:45:55 sorry got disconnected.
15:45:57 hi
15:46:13 Swami_: let's continue that conversation in the Neutron channel
15:46:24 mlavalle: sure
15:47:01 tidwellr: floor is all yours.....
15:47:13 just plugging along, the dynamic routing agent is in pretty solid shape
15:47:38 feel free to review (let me get the links)
15:48:06 https://review.openstack.org/#/c/207607/
15:48:11 https://review.openstack.org/#/c/207625/
15:48:23 https://review.openstack.org/#/c/207635/
15:48:34 these are in good shape
15:49:07 we still have the service plugin and API code to polish up https://review.openstack.org/#/c/201621/
15:49:14 tidwellr: great! Let’s get them back in to our merge queues.
15:49:39 since Mitaka is now open, I need to do a rebase
15:49:58 https://review.openstack.org/#/c/201621/ is still WIP
15:50:10 tidwellr: I’d say only rebase if you hit a merge conflict.
15:50:30 well, we've got migrations in the wrong folders and stuff like that
15:51:23 I'm looking to break down https://review.openstack.org/#/c/201621/, it's rather large
15:51:48 tidwellr: ok
15:51:49 anyway, that's where we're at
15:52:04 tidwellr: thanks for the update.... let's move on
15:52:12 #topic DNS
15:52:57 so regXboi filed a bug last Friday, where internal dns queries were causing poor performance in ports gets
15:53:21 we were able to propose a fix for it, that made it in RC-1: https://review.openstack.org/#/c/226581/
15:53:26 :)
15:53:40 essentially eliminated all the overhead db queries for subnets
15:53:54 thanks regXboi, great catch
15:54:01 you are welcome
15:54:07 mlavalle: Thanks for the quick response on that.
15:54:58 as far as external DNS (https://review.openstack.org/#/c/212213/) we are at a point where the functionality for floating ips is working fine
15:55:53 Here http://paste.openstack.org/show/471874/ I walk through the creation of a fip with dns_name and dns_domain and show the impact on the designate database and I can dig the A and PTR pointers
15:56:44 mlavalle: nice!
15:56:44 We’ve got a talk we need to get ready for
15:56:48 I reviewed this with Kiall, mugsie and the rest of the Designate team, and I am going to polish it over the next 2 days.
after that, we are good to show it in Tokyo
15:57:37 and next week i'll move on to vm ports on external networks and will start bugging the infra team to get neutron + designate in the gate
15:57:56 that's my update for today
15:58:32 #topic Open Discussion
15:58:45 any other topics to bring up?
15:58:45 under the O(n) heading - I've got one more I'm chasing for router scheduling - ovs_add_port
16:00:01 ok, guys, thanks for attending, time is over. And keep up the great work!
16:00:09 #endmeeting