15:04:43 <Swami> #startmeeting distributed_virtual_router
15:04:44 <openstack> Meeting started Wed Aug 20 15:04:43 2014 UTC and is due to finish in 60 minutes.  The chair is Swami. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:04:45 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:04:47 <openstack> The meeting name has been set to 'distributed_virtual_router'
15:04:51 <Swami> Bhooshan: hi
15:04:55 <pcm_> hi
15:05:09 <Rajeev> Swami: Hi
15:05:22 <Swami> Rajeev: hi
15:05:29 <Swami> pcm_: hi
15:06:06 <Swami> #info Feature Proposal freeze is this week
15:06:33 <Swami> #topic Agenda
15:06:42 <Swami> 1. DVR Update
15:07:17 <Swami> DVR is currently under test.
15:07:52 <Swami> The code has been completely merged.
15:08:20 <Swami> DVR team is currently focused on testing, fixing bugs.
15:09:00 <Swami> #topic Horizon
15:09:20 <Bhooshan> Completed code changes for enhancement of Horizon to support DVR. Patchset4 has been uploaded.
15:09:23 <Swami> Bhooshan: I saw your patch yesterday
15:09:41 <Bhooshan> As per the last meeting, the checkbox has been replaced with a drop-down menu.
15:09:53 <Swami> Bhooshan: I was not able to test your code, I will try to test it today.
15:10:15 <Bhooshan> swami: fine
15:10:17 <Bhooshan> Administrator can choose "Use server default", “Distributed” and “Centralized” from dropdown menu.
15:10:39 <Swami> Bhooshan: Thanks for accommodating those change requests
15:10:51 <Bhooshan> "Use server default" won’t carry “distributed” flag along with the REST request.
15:11:12 <Swami> Bhooshan: Good to know.
15:11:21 <Bhooshan> Now I am writing unit test cases for the DVR support and addressing the comments.
15:11:41 <Swami> Bhooshan: The only other pending item is the "Router update" or "Edit" tab
15:12:11 <Swami> Amotoki mentioned that it can be done by Juno 3.
15:12:38 <Bhooshan> Amotoki replied for this in one mail
15:12:45 <Swami> If you are pretty much occupied on this initial effort, can I ask amotoki to work on the edit tab.
15:13:03 <Bhooshan> update router will go as different patch
15:13:38 <amotoki> Swami: Bhooshan: I am okay with either case.
15:13:53 <Swami> amotoki: thanks for jumping in.
15:13:57 <Bhooshan> ok, let wait till we will this patchset
15:14:11 <Swami> So let us keep both as separate patches.
15:14:21 <Bhooshan> till we merge this patchset
15:14:39 <Swami> Bhooshan you can focus on wrapping up the current work that you are doing along with amotoki's work
15:14:39 <Bhooshan> I am planning to finish unit tests by tomorrow
15:14:49 <Bhooshan> fine
15:15:04 <Swami> I will request amotoki to put in another patch for the router edit tab.
15:15:14 <Bhooshan> Ok.
15:15:29 <Swami> amotoki: are you ok with this proposal
15:15:42 <amotoki> Swami: yes. no problem.
15:15:55 <Swami> amotoki: Thanks for your help.
15:16:39 <Bhooshan> Amotoki: Thank you for all your help
15:16:55 <Swami> Bhooshan: can you add your patch link into the irc
15:17:19 <Swami> Just for people to review
15:17:28 <Bhooshan> https://review.openstack.org/112583
15:18:06 <Swami> DVR folks please review the horizon UI pages for DVR.
15:18:11 <amotoki> Swami: is "Edit router" for admin panel in your mind? "Edit Router" in project panel provides not so much value.
15:18:52 <Swami> amotoki: Yes edit router is only for "admin" at this time to allow an admin to update a router from a legacy to "distributed".
15:20:04 <Swami> amotoki: is it clear
15:20:15 <amotoki> Swami: In my understanding, Juno supports only updating a router from legacy to distributed. update from dvr to legacy is not supported.
15:20:40 <Swami> amotoki: Yes your understanding is right.
15:20:52 <amotoki> Swami: thanks. it is now clear to me.
15:21:02 <Swami> ok
15:21:07 <Swami> Let us move on to the next topic
15:21:14 <amotoki> there are many patches related to dvr and i cannot track the status completely :-(
15:21:15 <Swami> #topic Bugs
15:21:30 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1358718
15:21:31 <uvirtbot> Launchpad bug 1358718 in neutron "duplicate ping packets from dhcp namespace when pinging across DVR subnet  VMs" [Medium,New]
15:21:47 <Swami> I have posted the link to the DVR bug list
15:22:33 <Swami> Most of the bugs have an assignee
15:22:48 <Swami> A couple of new bugs popped up yesterday.
15:23:35 <Swami> One of the bugs requires some discussion
15:23:35 <carl_baldwin> I think our High importance bugs look like they’re under control.  I’m having difficulty keeping up with the Medium.
15:24:00 <Swami> #link https://bugs.launchpad.net/neutron/+bug/1358998
15:24:01 <uvirtbot> Launchpad bug 1358998 in neutron ""No L3 agents can host the router..." traces for DVR" [Medium,New]
15:24:14 <Swami> This was one of the bugs that was filed by armando yesterday
15:24:14 <carl_baldwin> If you notice that there has not been enough reviewer attention on a bug fix, could you ping me?
15:24:39 <Swami> carl_baldwin: is there a specific bug
15:25:05 <Swami> carl_baldwin: got it I will let you know if there is a patch that is waiting for review.
15:25:13 <carl_baldwin> Swami: no.  No specific bug.  Thanks.
15:25:42 <Swami> carl_baldwin: sure will do.
15:26:16 <Swami> The bug that I posted the link to above is caused by FWaaS failing in the gate.
15:26:41 <Swami> The problem is that our router namespaces are not getting created, so the firewall raises an error.
15:27:24 <Swami> The current firewall does not support DVR and the routers configured are all DVR, so the behavior is odd.
15:28:04 <carl_baldwin> Swami: You mean that the bug is the cause for FWaaS failing?
15:28:07 <Swami> In dvr we only create an IR if there is a VM on the network or in the case of service node we create an IR if there is a gateway and interface attached to the IR.
15:28:32 <Swami> carl_baldwin: Yes that's what armando mentioned to me yesterday.
15:28:49 <carl_baldwin> Swami: Thanks for the clarification.
15:29:08 <Swami> The log message by itself is kind of misleading; we might have to correct the log message.
15:29:23 <carl_baldwin> After looking at this further, I wonder if Medium is appropriate for this bug.
15:29:41 <Swami> When there is no VM in the network, the scheduler comes back and says that there are no valid L3 agents right now.
15:30:27 <Swami> The main reason I brought up this topic is because armando and I had concerns with the way the IRs are currently created.
15:30:44 <Swami> I just wanted to get some feedback from the team on how we can handle this situation.
15:31:10 <Swami> There are two options
15:31:57 <Swami> Option 1: Fix the FWaaS code to handle DVR routers in the gate testing so that it never waits for the IR, and when the IR is created FWaaS should apply the rules. In this case the L3 scheduler would need to provide information to the FWaaS agent when the IR is created.
15:32:38 <Swami> Option 2: We should create IRs irrespective of VM availability.
15:32:51 <Swami> What is your take on this.
15:33:33 <Swami> armando: ping
15:33:50 <Rajeev> Swami: thanks for putting out the options. Don't like 2
15:34:25 <Swami> Rajeev: Thanks for your input
15:34:36 <Swami> armando seems to align on option 2.
15:35:07 <Rajeev> Swami: is 2 for only single node scenario ?
15:35:25 <carl_baldwin> Swami: Could I get a little more detail on option 2?  In what case would we create an IR where we don’t now?
15:36:03 <Swami> No he was ok for multi node as well.
15:36:31 * chuckC_ wonders if armax gets notified when his handle appears
15:36:33 <Rajeev> Swami: Is FWaaS the only reason for 2 ?
15:36:45 <armax> chuckC_: hi
15:36:55 <Swami> carl_baldwin: Armando is suggesting that we go ahead and create IRs, similar to centralized routers, on all nodes where there are active L3 agents, instead of checking for a DVR-related port to pop up.
15:36:57 <chuckC_> armax: guess not!
15:37:19 <armax> yeah, armax would do the trick
15:37:23 <Swami> armax: are you there
15:37:32 <armax> yes
15:37:46 <armax> hang on let me read through
15:37:52 <carl_baldwin> Swami: So, all compute nodes would host all IRs?
15:37:58 <Swami> armax: we are just going through the two options that we were discussing regarding the IR creation
15:38:25 <Swami> carl_baldwin: Yes
15:38:27 <armax> Swami: right, I am still on the fence as to what option is best
15:39:10 <Rajeev> Swami: I still want to know what the motivation is for these options ?
15:39:12 <carl_baldwin> Swami: armax:  That just won’t scale.  Network namespaces are not as resource intensive as VMs but they are not free.  We cannot load a compute host with 1000s of NSs.
15:39:14 <armax> Swami: the fact that we create namespaces under some (not well documented) circumstances
15:39:17 <WormMan> I think all compute nodes having every router would be messy and complicated. We run nova-network MultiHost and 64 tenants/vlans is ba enough there
15:39:25 <WormMan> bad enough
15:39:27 <armax> makes it difficult to understand when and where namespaces should or should not be
15:39:41 <armax> option 2 is only better to rule out any logic error
15:39:44 <Swami> carl_baldwin: agreed, that was my original comment on that option.
15:39:47 <armax> if namespaces are always supposed to be there
15:40:41 <armax> then the presence of namespace or lack thereof will tell immediately if there’s a symptom or not
15:40:59 <armax> carl_baldwin: agreed that performance may be an issue
15:41:26 <carl_baldwin> armax: Performance will be an issue.  A big one.
15:41:28 <armax> it’s not a matter of namespace handles on the hosts
15:41:41 <armax> carl_baldwin: but also the notifications that would result in getting those namespaces there
15:42:25 <Rajeev> control plane traffic after the Namespaces are created would be high too
15:42:52 <Rajeev> because updates will get directed to all nodes
15:42:53 <Swami> So are we all in an agreement that Option 2 is ruled out.
15:43:07 <carl_baldwin> +1
15:43:32 <Swami> Ok, now let us come back to the Option 1.
15:43:38 <Rajeev> +1 would feel better if I knew why we are bringing these options
15:44:20 <Rajeev> Is there any other reason than FWaaS and simplicity ?
15:44:32 <Swami> So in order to fix this issue for now, we have to fix the current FWaaS tempest test suite that tries to configure FWaaS for distributed routers
15:45:18 <Rajeev> In that case 2 sounds like an overkill
15:45:21 <carl_baldwin> Is the fwaas agent the same as the l3 agent?
15:45:53 <Swami> Yes it is using the l3 agent, there is no separate agent for it.
15:47:04 <armax> when looking at the fwaas test errors yesterday I noticed that the test might pass based on a dirty state of the L3 agent
15:47:24 <Swami> rajeev: Yes in terms of simplicity option 2 would be more simple, but for performance reasons we have to move away from option 2.
15:48:10 <Swami> armax: Is that a trivial change.
15:48:16 <armax> Swami: I haven’t root caused it completely…but the fact that we make the transition of the firewall state from PENDING_CREATE to CREATE by looking also at the actual namespaces on the agent’s node is a bit weird
15:49:07 <Swami> armax: I think it is because they are dependent on the router, so they have a check in there.
15:49:07 <armax> Rajeev, Swami: let’s put at rest option 2. The only reason I brought it up was because we found so many issues with namespace placement that I was advocating for option 2, only as a first step and then walk back to address the performance issues
15:49:36 <armax> but that ship has sailed in my opinion, it seems we got a good handle on how to deal with snat/ir namespaces etc.
15:49:40 <carl_baldwin> armax: That is a bit weird.
15:50:01 <Rajeev> armax: understood. thanks for explaining.
15:50:02 <armax> so let’s stick with option 1
15:50:13 <Swami> agreed.
15:50:21 <armax> and perhaps let’s document somewhere in the code where/when the placement happens
15:50:33 <armax> because just reading the code it’s not super clear
15:50:54 <carl_baldwin> armax: +1 we need to be clear about this.
15:50:58 <Swami> armax: carl_baldwin: I have initiated a thread with the FWaaS team on this issue and probably we might fix this behavior for the DVR with their current implementation.
15:51:01 <Rajeev> +1
15:51:54 <armax> Swami: the other reason why I brought option 2 up was because of this b
15:51:56 <armax> bug
15:52:10 <armax> #link https://bugs.launchpad.net/neutron/+bug/1358998
15:52:11 <uvirtbot> Launchpad bug 1358998 in neutron ""No L3 agents can host the router..." traces for DVR" [Medium,New]
15:52:14 <armax> not a bug per se...
15:52:34 <armax> but the fact that we’re looking at a failure in router placement is confusing to the eye of the admin
15:52:54 <armax> anyhow, let’s go back to fwaas/dvr
15:52:56 <Swami> armax: yes that's what led us to this discussion here in the IRC
15:53:13 <armax> gotcha
15:53:14 <carl_baldwin> armax: I haven’t totally wrapped my head around the problem.  It could just be a misleading log message.
15:53:23 <armax> carl_baldwin: it is indeed
15:53:33 <carl_baldwin> Or, maybe it is a bigger problem?
15:53:51 <armax> carl_baldwin: right…we can’t tell for sure and it depends on each case
15:54:02 <carl_baldwin> Hence my inability to make up my mind about the bug’s Importance.  :)
15:54:39 <Swami> armax: carl_baldwin: The log messages require some clean up, there are some misleading logs. We will try to clean them up as part of the snat fixes.
15:54:41 <armax> hence the whole story around option 2; if we had chosen that path initially only to improve it afterwards we would’ve been able to tell
15:55:27 <Swami> ok.
15:55:39 <Swami> #topic DVR migration patch
15:56:17 <Swami> Mike is currently working on the migration patch, but he is also having challenges in migrating a legacy router with multiple subnets to a DVR router.
15:56:56 <mrsmith> Swami: yes - a couple probs
15:57:10 <Swami> If VMs are scattered across multiple subnets for a legacy router, there is no clean way to migrate the legacy router and create IRs on all the respective compute nodes.
15:57:29 <mrsmith> the current code doesn't handle multiple subnets well - so there is a fix for that in progress
15:58:07 <mrsmith> it comes back to our previous discussion on IRs on CNs and checking for VMs being present or not
15:58:36 <Swami> mrsmith: thanks for the update
15:58:37 <mrsmith> I keep hitting db errors related to open sessions and not rolling back previous sessions
15:58:57 <mrsmith> hopefully I will have more progress today
15:59:02 <mrsmith> Swami: np
15:59:15 <Swami> #topic Open Discussion
15:59:24 <mrsmith> quick - time low
15:59:36 <Swami> Any other items that we need to discuss
16:00:00 <Swami> Ok folks, thanks for joining the meeting.
16:00:13 <Swami> If we have any other topic we can discuss tomorrow in the L3 meeting.
16:00:18 <carl_baldwin> mrsmith: Is there any way to fix this in stages?  Maybe throw an exception if there are multiple subnets in the first patch and follow on with other patches?
16:00:18 <Swami> bye
16:00:36 <mrsmith> carl_baldwin - yes possible
16:00:43 <mrsmith> I have considered that
16:00:48 <Swami> carl_baldwin: that might be a possibility. We will try to explore more on this today and give an update.
16:00:57 <mrsmith> k
16:01:01 <carl_baldwin> mrsmith: Swami: thanks
16:01:07 <Swami> sorry we are at the end of hour.
16:01:07 <Swami> bye
16:01:11 <Swami> #endmeeting