15:00:20 <carl_baldwin> #startmeeting neutron_l3
15:00:21 <openstack> Meeting started Thu Mar 13 15:00:20 2014 UTC and is due to finish in 60 minutes.  The chair is carl_baldwin. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:22 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:24 <openstack> The meeting name has been set to 'neutron_l3'
15:00:28 <carl_baldwin> #topic Announcements
15:00:37 <carl_baldwin> #link https://wiki.openstack.org/wiki/Meetings/Neutron-L3-Subteam
15:01:07 <carl_baldwin> First, I didn’t plan for daylight saving time when I chose the time for this meeting.
15:01:23 <carl_baldwin> The time shift has caused a bit of a problem for me.
15:01:46 <carl_baldwin> I'd like to suggest a couple of possible meeting times.  Both Thursday.
15:02:15 <carl_baldwin> The first is an hour earlier.  I know that makes things very early for a few in Western US.
15:02:29 <ajo> For me it's actually better, +1
15:02:48 <carl_baldwin> The second is two hours later which could be difficult for others in other parts of the world.
15:03:21 <safchain> for me both are ok
15:03:28 <safchain> Hi all btw
15:03:34 <carl_baldwin> safchain: hi
15:03:41 <ajo> safchain, hi :)
15:04:54 <carl_baldwin> I'll wait a few days for others who might be reading the meeting logs to chime in.  Ping me on IRC or email with any concerns.  I'll announce the meeting time before next week.  And, I'll consider the next shift in daylight saving time.  ;)
15:05:15 <ajo> sure, thanks carl_baldwin
15:05:19 <carl_baldwin> #topic l3-high-availability
15:05:32 <carl_baldwin> safchain: Anything to report?
15:05:54 <safchain> currently I'm working on the conntrackd integration,
15:06:14 <safchain> Assaf's patch has to be reworked a bit to support multicast
15:07:13 <carl_baldwin> I need to review again.  Anything new on the FFE?  I fear that it didn't happen.
15:07:18 <safchain> I don't know if all of you have tested the patches
15:07:35 <safchain> carl_baldwin, no news on the FFE
15:08:22 <ajo> I couldn't test yet, safchain, but I will try to allocate some time for it.
15:08:34 <carl_baldwin> Okay.  The sub team page has links and information about reviewing and testing but I'll admit I've not yet tested.
15:08:44 <safchain> carl_baldwin, I think this is almost everything for me, just need more feedback with functional tests
15:09:13 <ajo> safchain, do you have some functional test examples?
15:09:25 <ajo> I could get some people on our team to provide feedback on that.
15:09:27 <carl_baldwin> Okay, I am looking forward to running it.  I need a multi host development setup soon anyway.
15:09:56 <carl_baldwin> #link https://docs.google.com/document/d/1P2OnlKAGMeSZTbGENNAKOse6B2TRXJ8keUMVvtUCUSM/edit#
15:10:03 <safchain> ajo, I will add some test use cases to the doc.
15:10:20 <carl_baldwin> ajo: ^ This is the doc.
15:11:07 <ajo> Thanks. I meant to ask whether we already have some kind of initial functional test written for this. I will keep a link to this doc for manual testing.
15:11:37 <safchain> ajo, not yet
15:11:58 <ajo> ok, it's not easy
15:12:08 <safchain> ajo, but tempest tests should work with HA enabled
15:12:19 <ajo> ok, that's a good start
15:12:59 <carl_baldwin> safchain: anything else?
15:13:13 <safchain> It's ok for me
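For readers who want to test the HA patches: the work pairs VRRP failover (via keepalived) with conntrackd for connection-state sync, and VRRP advertisements travel over multicast, which is why the multicast rework safchain mentions is needed. A minimal sketch, in the spirit of what the patches generate per router namespace, is below; the interface names, VRID, and priority are illustrative assumptions, not the patch set's real template.

    # A hedged sketch of generating a per-router keepalived VRRP config.
    # All instances start as BACKUP and VRRP elects a master; the router's
    # internal port carries the virtual IP.  Names/values are hypothetical.
    KEEPALIVED_TEMPLATE = """\
    vrrp_instance %(name)s {
        state BACKUP
        interface %(ha_iface)s
        virtual_router_id %(vrid)d
        priority %(priority)d
        advert_int 2
        virtual_ipaddress {
            %(cidr)s dev %(qr_iface)s
        }
    }
    """

    def build_keepalived_conf(name, ha_iface, qr_iface, cidr, vrid=1, priority=50):
        return KEEPALIVED_TEMPLATE % locals()

    print(build_keepalived_conf('VR_1', 'ha-7a3f2c10', 'qr-4e8f11aa', '10.0.0.1/24'))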
15:13:16 <carl_baldwin> #topic neutron-ovs-dvr
15:14:05 <carl_baldwin> Doesn't look like Swami is around.
15:14:30 <carl_baldwin> Swami is still working on detailing changes to L3.
15:14:41 <carl_baldwin> The doc for L2 is up.  Could use more review.
15:14:57 <carl_baldwin> #link  https://docs.google.com/document/d/1depasJSnGZPOnRLxEC_PYsVLcGVFXZLqP52RFTe21BE/edit#heading=h.5w7clq272tji
15:15:18 <safchain> Sure, I plan to review it by the end of the week
15:15:53 <carl_baldwin> Integrating this with the HA L3 work, and HA DHCP, was also discussed.
15:16:38 <carl_baldwin> safchain: great
15:16:46 <Sudhakar_> hi all...
15:17:01 <ajo> hi Sudhakar_
15:17:03 <carl_baldwin> Sudhakar_: hi
15:17:08 <safchain> carl_baldwin, yes I'll try to ping swami after reviewing the doc
15:17:17 <Sudhakar_> carl_baldwin, is there a doc about HA DHCP?
15:17:19 <safchain> hi Sudhakar_
15:17:26 <Sudhakar_> hi ajo...
15:17:42 <carl_baldwin> Sudhakar_: I don't think there is a doc yet about it.  Only some initial discussion expressing interest in starting that work.
15:17:43 <Sudhakar_> hi carl
15:18:10 <Sudhakar_> Did Swami initiate the discussion?
15:18:22 <carl_baldwin> Sudhakar_: yes
15:18:30 <Sudhakar_> Ok. I have some context then..
15:18:44 <Sudhakar_> I am Swami's colleague ..based out of India
15:19:12 <Sudhakar_> basically we were thinking of an agent monitoring service... which can be used to monitor different agents...
15:19:22 <Sudhakar_> typically useful for L3 and DHCP when we have multiple network nodes (NNs)
15:20:03 <ajo> Sudhakar_, something like rpcdaemon?
15:20:19 <Sudhakar_> not exactly..
15:20:41 <Sudhakar_> a thread which can be started from the plugin itself...
15:20:49 <Sudhakar_> and act based on the agent report_states...
15:21:21 <ajo> Sudhakar_, what kind of actions?
15:22:03 <Sudhakar_> for ex: if a DHCP agent hosting a particular network goes down... and we have another active DHCP agent in the cloud...
15:22:37 <Sudhakar_> the agent monitor detects that this DHCP agent went down and triggers rescheduling of the network's DHCP onto the other agent..
15:22:54 <ajo> A few weeks ago, I was proposing that daemon agents could provide status via a status file -> init.d "status", but it could be complementary.
15:23:08 <ajo> aha, it makes sense Sudhakar_
15:23:21 <Sudhakar_> currently we have agent_down_time configuration which will help us decide on rescheduling...
15:23:28 <carl_baldwin> Sudhakar_: Do you have any document describing this that we could review offline?
15:23:33 <Sudhakar_> we could have another parameter altogether to avoid mixing things up..
15:23:48 <safchain> Sudhakar_, It seems there is something like that for LBaaS
15:23:49 <ajo> yes, a document on those ideas would be interesting,
15:23:57 <Sudhakar_> we are refining the doc... will publish it for review soon..
15:24:16 <carl_baldwin> Actually, I made a mistake above.  I said HA DHCP where I should have said, more precisely, distributed DHCP.
15:24:36 <carl_baldwin> Sudhakar_: Great.
15:24:39 <ajo> aha carl_baldwin, the one based on OpenFlow rules?
15:24:46 <Sudhakar_> Ok ..:)
15:25:12 <Sudhakar_> Distributed DHCP was another thought...but i don't have much idea on that yet...
15:25:22 <carl_baldwin> ajo: OpenFlow rules could play a part but that did not come up explicitly.
15:25:33 <ajo> understood
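To make Sudhakar_'s proposal concrete, a minimal sketch of a plugin-side monitor thread that watches agent report_state heartbeats and reschedules DHCP for networks hosted on dead agents. The helper names (get_dhcp_agents, networks_on_agent, reschedule_network) and the intervals are hypothetical, not the real plugin API; AGENT_DOWN_TIME mirrors the existing agent_down_time config option he mentions.

    import threading
    import time

    CHECK_INTERVAL = 10      # seconds between sweeps (illustrative)
    AGENT_DOWN_TIME = 75     # mirrors the agent_down_time config option

    def monitor_dhcp_agents(plugin):
        """Sweep agent heartbeats; evacuate networks from dead DHCP agents."""
        while True:
            now = time.time()
            for agent in plugin.get_dhcp_agents():               # hypothetical
                if now - agent['heartbeat_timestamp'] > AGENT_DOWN_TIME:
                    for net in plugin.networks_on_agent(agent):  # hypothetical
                        # bind the network's DHCP to another live agent
                        plugin.reschedule_network(net, exclude=agent)
            time.sleep(CHECK_INTERVAL)

    def start_agent_monitor(plugin):
        t = threading.Thread(target=monitor_dhcp_agents, args=(plugin,))
        t.daemon = True   # dies with the plugin process
        t.start()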
15:26:01 <carl_baldwin> #topic l3-agent-consolidation
15:26:36 <carl_baldwin> This work is up for review but the bp was pushed out of Icehouse.
15:26:52 <carl_baldwin> yamahata: anything to add?
15:27:10 <yamahata> carl_baldwin: nothing new this week.
15:27:28 <carl_baldwin> #topic bgp-dynamic-routing
15:27:45 <carl_baldwin> #link https://blueprints.launchpad.net/neutron/+spec/bgp-dynamic-routing
15:27:52 <carl_baldwin> #link https://blueprints.launchpad.net/neutron/+spec/neutron-bgp-mpls-vpn
15:28:10 <carl_baldwin> nextone92: are you around?
15:28:59 <carl_baldwin> I spent some time reviewing the bgp-mpls bp this week and made some notes.
15:29:38 <carl_baldwin> It looks like a few key people aren't around this week to discuss.  So, I'll try again next week.
15:30:02 <carl_baldwin> #topic DNS lookup of instances
15:30:32 <carl_baldwin> Really quick, I’m almost done writing a blueprint for this.  Then, I need to get it reviewed internally before I can post it.
15:30:42 <carl_baldwin> I hope to have more to report on this next week.
15:30:51 <ajo> sounds interesting, thanks carl_baldwin
15:30:57 <carl_baldwin> #topic Agent Performance with Wrapper Overhead
15:31:07 <carl_baldwin> #link https://etherpad.openstack.org/p/neutron-agent-exec-performance
15:31:31 <carl_baldwin> This has come up on the ML this week.  I have worked on it some, so I created this etherpad.
15:32:06 <rossella_> carl_baldwin: nice summary on the etherpad
15:32:18 <ajo> yes, thanks carl_baldwin  :)
15:32:39 <carl_baldwin> rossella_: ajo: thanks
15:32:59 <carl_baldwin> So, there are a number of potential ways to tackle the problem.
15:33:36 <carl_baldwin> I'm wondering what could be done for Icehouse.
15:33:57 <nextone92> carl_baldwin - sorry I'm so late to join the meeting
15:34:03 <Sudhakar_> carl_baldwin, thanks for putting up the doc. Looking forward to this..
15:34:06 <ajo> Yes, I had that thought too carl_baldwin
15:34:09 <rossella_> Icehouse is now
15:34:19 <rossella_> we can't do much
15:34:23 <Swami> Carl: Sorry I am late today
15:34:35 <safchain> yes, I will have a look at this etherpad
15:34:37 <carl_baldwin> Swami: nextone92: Hi
15:34:59 <Swami> Carl: hi
15:35:04 <ajo> Yuriy's idea (privileged agent) doesn't look bad from the point of view of keeping everything in Python. But it looks like it requires more changes to Neutron. Too bad we're at the end of the cycle.
15:35:10 <carl_baldwin> rossella_: I fear you are right.  There isn't much unless we can find bugs that could be fixed in short order.
15:35:24 <YorikSar> o/
15:35:31 <YorikSar> I'm that Yuriy.
15:35:40 <ajo> Hi YorikSar! :)
15:35:51 <YorikSar> I don't think it'll become very intrusive
15:35:57 <carl_baldwin> ajo: YorikSar: My thinking is similar.  It may be a very good long term solution.
15:36:25 <carl_baldwin> YorikSar: I noticed your additions to the etherpad only this morning so I have not had a chance to review them.
15:36:35 <YorikSar> We basically need to replace execute() calls with something like rootwrap.client.execute()
15:36:40 <ajo> I'm just worried about, for example, memory consumption. We must keep all instances tied tightly... to avoid leaking "agents"
15:37:08 <YorikSar> ajo: They can kill themselves by timeout.
15:37:27 <YorikSar> Then we won't leak them.
15:37:35 <ajo> And at client exit
15:37:50 <ajo> Maybe, for the ones running inside a netns: kill by timeout
15:37:51 <YorikSar> ajo: Yeah. Which can end up basically the same.
15:38:03 <ajo> the system-wide ones: kill by client exit + longer timeout
15:38:51 <ajo> carl_baldwin, do you think this approach could have the potential to be backported to Icehouse if it's tackled from now to the start of Juno?
15:40:11 <YorikSar> ajo: I'm thinking about trying to push this to oslo.rootwrap, actually. So backporting will be minimal, but it'll be another feature.
15:40:24 <carl_baldwin> ajo: I don't think it adds features and it wouldn't change the database.  So, I think there might be hope for it.
15:40:45 <ajo> carl_baldwin, do we have a bug filed for this?
15:40:51 <carl_baldwin> ... not a new feature from the user perspective.  More of an implementation detail.
15:41:09 <ajo> Yes, we're killing a CPU-eating bug....
15:41:31 <YorikSar> carl_baldwin: Oh, yes. Agree.
15:41:49 <carl_baldwin> It is a significant implementation detail though.
15:41:59 <ajo> yes, I agree carl_baldwin
15:42:00 <carl_baldwin> I don't think there is one overarching bug for this.
15:42:22 <carl_baldwin> I have filed detailed bugs for some of the individual problems that I've found and fixed.
15:42:50 <ajo> carl_baldwin, I can file a bug with the details
15:42:56 <carl_baldwin> ajo: Great.
15:42:58 <ajo> (basically, the start of the latest mail thread)
15:43:08 <ajo> #action file bug about the rootwrap overhead problem.
15:43:15 <ajo> is it done this way?
15:43:33 <ajo> sorry, I'm almost new to meetings
15:44:02 <haleyb> carl_baldwin: perhaps for Icehouse all we can do is continue chipping away at unnecessary calls, and maybe get your priority change in?  my $.02
15:44:06 <carl_baldwin> ajo: I think you need to mention your handle after action.  But, yes.  Everyone should feel free to add their own action items.
15:44:31 <ajo> #action ajo file bug about the rootwrap overhead problem.
15:44:36 <Swami> Is that even possible for Icehouse at this time?
15:44:42 <rossella_> haleyb: +1
15:44:56 <YorikSar> I'm going to work on a POC for that agent soon, btw.
15:45:10 <carl_baldwin> haleyb: Yes.  I'm hoping to wrap up that priority change this week as a bug fix.
15:45:11 <YorikSar> It's going to be interesting stuff to code :)
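As a rough picture of what YorikSar's rootwrap.client.execute() could look like from the agent side: a long-lived privileged daemon listens on a Unix socket, so each command skips the per-call sudo plus Python interpreter startup. The socket path, wire format, and idle-timeout behaviour below are all invented for illustration, not the planned oslo.rootwrap interface.

    import json
    import socket

    SOCK_PATH = '/var/run/neutron/rootwrap.sock'   # hypothetical path

    def client_execute(cmd):
        """Send one command list (e.g. ['ip', 'link', 'show']) to the
        privileged daemon and return its (returncode, stdout, stderr).

        The daemon would apply the usual rootwrap filters before running
        the command, and could exit after an idle timeout so leaked
        instances clean themselves up -- ajo's concern above.
        """
        s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        s.connect(SOCK_PATH)
        s.sendall(json.dumps(cmd).encode('utf-8') + b'\n')
        reply = s.makefile().readline()
        s.close()
        return tuple(json.loads(reply))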
15:45:39 <ajo> maybe, for Icehouse, I could try to spend some time reducing the current rootwrap to a Python subset, and get a C++ translation we can use.
15:45:46 <carl_baldwin> Swami: I imagine there is little that can be done for Icehouse.  Only bug fixes, and I imagine that significant changes will not be accepted.
15:46:04 <Swami> Yes that's my thought as well.
15:46:07 <ajo> (an automated one), but I'm unsure about the auditability of such a solution. That might require some investigation.
15:46:18 <carl_baldwin> ajo: It might be worth a try.  That is something I'm not very familiar with though.
15:47:07 <ajo> carl_baldwin: maybe it's not much work, <1 week. I could try to allocate the time for that with my manager...
15:47:30 <ajo> I have found speed improvements of >50x with the C++ translation, but the Python subset is rather reduced.
15:48:07 <carl_baldwin> ajo: Remember that we need to reduce start up time and not necessarily execution speed.
15:48:23 <ajo> Yes, that's greatly reduced, let me look for some numbers I had.
15:48:36 <carl_baldwin> ajo: sounds good.
15:48:51 <carl_baldwin> There are updates to "sudo" and "ip" that can help at scale.  These fall outside the scope of the OpenStack release.
15:49:14 <YorikSar> I wouldn't actually call switching to some subset of Python "staying with Python". It'd still be some other language.
15:49:30 <carl_baldwin> Is there any documentation existing in openstack about tuning at the OS level?
15:49:51 <YorikSar> But it might be worth it to compare our approaches and probably come up with some benchmark.
15:49:54 <ajo> 1 sec. getting the numbers
15:50:00 <carl_baldwin> If so, I thought we could add some information from the etherpad to that document.  If not, it could be created.
15:50:23 <mwagner_lap> carl_baldwin, not sure if there are any docs on tuning at the OS level
15:50:31 <ajo> http://fpaste.org/85068/25818139/
15:50:49 <mwagner_lap> assuming you are talking about the neutron server itself
15:50:52 <ajo> [majopela@redcylon ~]$ time python test.py
15:50:53 <ajo> real	0m0.094s
15:50:58 <ajo> [majopela@redcylon ~]$ time ./test
15:50:58 <ajo> real	0m0.004s
15:51:37 <carl_baldwin> #action carl_baldwin will look for OS level tuning documentation and either augment it or create it.
15:52:20 <ajo> carl_baldwin, there is an "iproute" patch, and a "sudo" patch, could you add them to the etherpad?
15:52:46 <carl_baldwin> FWIW, my efforts at consolidating system calls to run multiple calls under a single wrapper invocation have shown that it is extremely challenging with little reward.
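For context on what that consolidation can look like in practice: ip(8) has a batch mode that runs many subcommands from one file, so the wrapper startup cost is paid once. A hedged sketch follows, with execute() standing in for the agent's wrapped-exec helper; error handling is the hard part, since one failing line can leave the batch half-applied, which is consistent with the "little reward" observation above.

    import os
    import tempfile

    def ip_batch(commands, execute):
        """Run several ip(8) subcommands under a single wrapper invocation.

        commands: e.g. ['link set dev tap1 up',
                        'addr add 10.0.0.2/24 dev tap1']
        """
        fd, path = tempfile.mkstemp()
        try:
            with os.fdopen(fd, 'w') as f:
                f.write('\n'.join(commands) + '\n')
            # one sudo/rootwrap startup for N commands; if any line fails,
            # working out which ones were applied is the painful part
            return execute(['ip', '-batch', path], run_as_root=True)
        finally:
            os.unlink(path)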
15:53:18 <carl_baldwin> ajo: I believe those patches are referenced from the etherpad.
15:53:28 <ajo> ah, thanks carl_baldwin
15:53:48 <ajo> you're right , [3] and [2]
15:53:57 <carl_baldwin> ajo: Some of them are rather indirect.  I'll fix that.
15:54:24 <carl_baldwin> #action carl_baldwin will fix references to patches to make them easier to spot and follow.
15:54:40 <ajo> carl_baldwin, doing it as we talk, :)
15:54:49 <carl_baldwin> ajo: cool, thanks.
15:56:09 <carl_baldwin> So, ajo and YorikSar: we'll be looking forward to seeing what you come up with.  Keep the etherpad up and we'll collaborate there.
15:56:43 <YorikSar> ok
15:56:50 <carl_baldwin> Anything else?
15:57:12 <carl_baldwin> #topic General Discussion
15:57:48 <ajo> carl_baldwin,
15:58:01 <ajo> I've seen neighbour table overflow messages from the kernel,
15:58:09 <ajo> when I start lots of networks,
15:58:13 <ajo> have you seen this before?
15:58:36 <ajo> lots (>100)
15:58:43 <safchain> ajo, which plugin/agent?
15:58:51 <haleyb> ipv6 error?  I think we've seen that too
15:58:54 <ajo> normal neutron-l3-agent
15:58:59 <ajo> with ipv4
15:59:24 <ajo> and openvswitch
15:59:27 <carl_baldwin> I believe that we have seen it but I did not work on that issue directly.  So, I cannot offer the solution.
15:59:39 <ajo> It's in my todo list
15:59:54 <ajo> I tried to tune the ARP garbage collection settings on the kernel
16:00:03 <ajo> but, I'm not sure if it's namespace related
16:00:28 <haleyb> ajo: found my notes - yes, we found it, and the solution is to increase the table size - gc_thresh*
16:00:31 <carl_baldwin> I've got a hard stop at the hour.  Feel free to continue discussion in the neutron room or here if no one has this room.
16:00:52 <carl_baldwin> Thank you all who came and participated.
16:01:05 <safchain> thx carl_baldwin
16:01:08 <haleyb> ajo: neighbor table is shared between all namespaces
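For reference, the fix haleyb describes is raising the kernel's neighbour-cache garbage-collection thresholds, which are global (shared across all namespaces, as he notes). A sketch with purely illustrative values; size them to the expected number of ports and networks:

    import subprocess

    # gc_thresh1: below this many entries, no GC runs at all
    # gc_thresh2: soft maximum (entries above it are aggressively reclaimed)
    # gc_thresh3: hard maximum -- exceeding it produces the
    #             "neighbour table overflow" messages ajo is seeing
    GC_THRESH = {
        'net.ipv4.neigh.default.gc_thresh1': 1024,
        'net.ipv4.neigh.default.gc_thresh2': 4096,
        'net.ipv4.neigh.default.gc_thresh3': 8192,
    }

    for knob, value in sorted(GC_THRESH.items()):
        subprocess.check_call(['sysctl', '-w', '%s=%d' % (knob, value)])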
16:01:09 <carl_baldwin> Please review the meeting logs and get back to me about the potential time change for this meeting.
16:01:17 <Sudhakar_> thanks carl
16:01:22 <carl_baldwin> Bye!
16:01:23 <carl_baldwin> #endmeeting