15:00:23 <carl_baldwin> #startmeeting neutron_l3
15:00:24 <openstack> Meeting started Thu Jul 31 15:00:23 2014 UTC and is due to finish in 60 minutes.  The chair is carl_baldwin. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:25 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:28 <openstack> The meeting name has been set to 'neutron_l3'
15:00:31 <carl_baldwin> #topic Announcements
15:00:37 <kevinbenton> o/
15:00:37 <carl_baldwin> #link https://wiki.openstack.org/wiki/Meetings/Neutron-L3-Subteam
15:00:53 <mrsmith> howdy carl_baldwin
15:01:07 <carl_baldwin> Already a week into Juno-3.  Things are moving fast.
15:01:14 <safchain> hi
15:01:29 <carl_baldwin> juno-3 is targeted for September 4th.
15:01:38 <carl_baldwin> #link https://wiki.openstack.org/wiki/Juno_Release_Schedule
15:02:26 <carl_baldwin> Also, the initial DVR implementation has been merged.  This should enable broader testing.
15:02:46 <carl_baldwin> The infra patches to enable the experimental job have also merged, I think.
15:02:52 <armax> carl_baldwin: the first experimental job is running as we speak
15:03:12 <armax> http://status.openstack.org/zuul/
15:03:18 <Swami> great.
15:03:25 <carl_baldwin> armax: Great.  I was just going to look this morning.
15:03:33 <armax> https://jenkins06.openstack.org/job/check-tempest-dsvm-neutron-dvr/1/console
15:03:59 <armax> for review 108177,10
15:04:49 <carl_baldwin> Does this require an explicit “check experimental” to run?
15:04:59 <armax> yes
15:05:18 <armax> I did post the comment ‘check experimental’ explicitly
15:05:37 <carl_baldwin> #topic neutron-ovs-dvr
15:06:15 <carl_baldwin> all: Make use of this job on our DVR-related patches.
15:06:49 * carl_baldwin goes to run ‘check experimental’ on his reviews
15:07:07 <carl_baldwin> Swami: Anything to report?
15:07:22 <Swami> carl_baldwin: hi
15:07:43 <Swami> we had a couple of issues that we wanted to discuss
15:07:52 <Swami> This is related to migration
15:08:52 <carl_baldwin> You have the floor.
15:08:55 <Swami> The first question that we have is: for a "router-migration", can we make use of "admin_state_up/admin_state_down" first, before issuing a router-update?
15:09:32 <carl_baldwin> Do you mean to require that a router be in admin_state_down before migration?
15:09:43 <armax> Swami: I think it’s sensible
15:09:46 <Swami> The reason is, when the admin issues these commands, the existing state of the routers is cleaned up and then we can move or migrate the routers to the new agent.
15:10:00 <Swami> carl_baldwin: Yes
15:10:22 <mrsmith> it comes down to the admin running 3 commands or one
15:10:29 <carl_baldwin> Swami: I don’t see a problem with that.  It should be documented.
15:10:38 <armax> admin_state_down/up are there for a reason
15:10:57 <Swami> We were initially debating whether the admin would be OK with issuing two commands for a migration: first to set the 'admin_state', and next to do an update.
15:11:11 <viveknarasimhan> could we do this internally
15:11:13 <armax> we could flip the state automatically in the migration process, but I’d vote to be more explicit
15:11:19 <viveknarasimhan> without the admin explicitly running 3 commands.
15:11:33 <viveknarasimhan> does he need to know 3 commands to be executed in a certain order?
15:11:38 <yyywu> i vote for explicit, this is a one-time thing, right?
15:11:58 <armax> the workflow usually goes like this: you warn your tenant of a maintenance
15:12:02 <armax> you bring the router down
15:12:04 <armax> you migrate
15:12:08 <carl_baldwin> Can the admin run them one after the other with no delay?
15:12:13 <armax> you bring it up (and hope that everything works)
15:12:24 <armax> then go back to tenant and tell him that everything is okay :)
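For context, the explicit workflow armax outlines maps onto existing router-update calls; below is a minimal sketch using python-neutronclient, where the credentials, the router ID, and the exact form of the distributed flag are assumptions, since the migration support is still under review:

    # Sketch only: explicit centralized -> distributed migration as discussed.
    # Credentials and router_id are placeholders; the 'distributed' flag
    # assumes the DVR migration path currently under review.
    from neutronclient.v2_0 import client

    neutron = client.Client(username='admin', password='secret',
                            tenant_name='admin',
                            auth_url='http://controller:5000/v2.0')
    router_id = 'ROUTER_UUID'

    # 1. Warn the tenant, then take the router administratively down.
    neutron.update_router(router_id, {'router': {'admin_state_up': False}})

    # 2. Migrate the router (centralized -> distributed).
    neutron.update_router(router_id, {'router': {'distributed': True}})

    # 3. Bring the router back up and confirm with the tenant.
    neutron.update_router(router_id, {'router': {'admin_state_up': True}})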
15:12:28 <Swami> viveknarasimhan: I agree with you and that is the reason we wanted consensus from all of us before proceeding.
15:12:37 <carl_baldwin> Or, does the admin need to wait on something before being allowed to run the migration?
15:13:11 <Swami> carl_baldwin: we need to check it out.
15:13:34 <Swami> So if we all agree with armax: this is how it has to be done.
15:13:39 <carl_baldwin> I ask because it could increase router downtime.  But, it is a one time migration and can be planned downtime.
15:13:56 <viveknarasimhan> if some command in the 3 fails
15:13:59 <viveknarasimhan> is there a way to rollback
15:14:04 <mrsmith> I vote for explicit - it is more straightforward
15:14:07 <viveknarasimhan> or does he need to recreate the centralized router again?
15:14:07 <Swami> We will document that the "admin" needs to first bring down the router, migrate the router and then tell the tenant to use it.
15:14:44 <armax> carl_baldwin: i imagine that it’s better being explicit
15:14:44 <Swami> any questions or concerns there?
15:15:15 <armax> we might want to give ourselves some room between the router going down and the migration
15:15:30 <armax> if we do everything in one shot there’s a risk something gets scheduled on that router in between
15:15:49 <yyywu> armax, agreed
15:15:54 <Swami> armax: agreed
15:16:00 <armax> scheduled as in something happens  to that router
15:16:01 <carl_baldwin> I don’t think I’m concerned.
15:16:12 <armax> unlikely, but you never know
15:16:18 <viveknarasimhan> if a problem happens, it could happen in 3 step process as well
15:16:19 <yamamoto_> explicit sounds less surprising from the POV of admins
15:16:27 <yyywu> and rollback is part of migration failure case, right?
15:17:12 <Swami> yyywu: when do you think that rollback should happen
15:17:46 <yyywu> Swami: I am thinking that if a migration failure happens, rollback should kick in.
15:17:47 <Swami> right now we are not targeting "rollback" but we can flag a "migration-error" if something odd happens.
15:18:14 <armax> rollback can’t really happen if we don’t implement the distributed->centralized path
15:18:21 <yyywu> Swami: i think we can live with it.
15:18:44 <armax> a recovery procedure would be to destroy and recreate the router (with all the interfaces and gateway associated with it)
15:18:45 <Swami> armax: you are right.
15:19:10 <carl_baldwin> Anything else on migration / admin_state?
15:19:31 <Swami> carl_baldwin: admin_state is done.
15:19:41 <Swami> Next question is on the VM migration.
15:20:33 <Swami> How will VM migration be handled during router conversion?
15:20:41 <Swami> There are two cases.
15:21:15 <Swami> One: the admin would like to use the same compute nodes, so they will not disturb the VMs, but restart the l3-agent with DVR mode enabled.
15:22:29 <Swami> The other case is where the admin wants to move all their VMs to a greenfield deployment of DVR-enabled nodes. So they bring up new compute nodes with DVR-enabled L3 agents. In this case the VM migration is out of scope for the dvr team.
15:22:47 <carl_baldwin> The first is the only scenario I had in mind.
15:23:25 <mrsmith> in the first case the l3-agent would need to be updated as well as ovs
15:23:41 <carl_baldwin> Is this live migration in the second case?
15:24:04 <Swami> carl_baldwin: Ok, if we only target the first scenario, then we will go through the use cases.
15:24:17 <yyywu> one question, during router conversion, could nova initiate vm migration?
15:24:27 <Swami> carl_baldwin: yes it is a kind of live migration.
15:25:06 <Swami> yyywu: nova has no idea of router conversion; I don't think nova will be aware of the router changes.
15:25:16 <carl_baldwin> I think we should consider the second case out of scope.
15:25:39 <armax> Swami: prior to doing the migration every compute host needs to run l2 (with dvr enabled) and l3 agents
15:25:44 <armax> correct?
15:25:54 <viveknarasimhan> correct armax
15:26:14 <Swami> armax: agreed.
15:26:31 <Swami> So the admin issues a "router_admin_state_down".
15:26:34 <armax> does it make sense to keep the compute host disabled as well during the migration?
15:26:48 <Swami> Then the admin prepares the compute node for migration.
15:27:12 <Swami> And the admin updates the router for migration.
15:27:33 <armax> my understanding was that during a planned upgrade the admin would deploy the right services with the right configs
15:27:35 <Swami> This is for the "case 1" where we use the existing compute nodes.
15:27:38 <armax> on the elements of the cloud
15:28:24 <armax> but router migration should probably be a step right after the upgrade is complete
15:28:47 <armax> not 100% true in every case
15:29:07 <armax> especially if the default router type is ‘centralized'
15:30:01 <Swami> carl_baldwin: armax: viveknarasimhan: mrsmith: are we all in agreement?
15:30:25 <mrsmith> Swami: on VM migration?
15:30:33 <mrsmith> focus on case 1 ?
15:30:45 <carl_baldwin> I think so.  I’m not keen on adding the second case to our scope.
15:31:05 <viveknarasimhan> i agree. we will try to get case 1 fully covered
15:31:07 <carl_baldwin> Maybe in kilo if there is demand.
15:31:14 <viveknarasimhan> case 2 looks bit complex
15:31:16 <armax> makes sense
15:31:34 <carl_baldwin> Swami: anything else?
15:31:39 <armax> even though moving a vm to a new host
15:31:41 <Swami> mrsmith: carl_baldwin's reply should have answered your question on VM migration. We should reduce our scope to Case 1, the one we discussed.
15:31:48 <armax> does look pretty much like a scheduling event
15:31:56 <armax> so dvr should handle it just as well
15:32:08 <Swami> carl_baldwin: that's all from me.
15:32:48 <carl_baldwin> Swami: thanks.  Let’s keep pounding on the DVR code and fixing bugs.  We’ve already got some fixes done and a few more on the way.  Great job!
15:33:02 <yamamoto_> i have a small dvr question
15:33:11 <carl_baldwin> yamamoto_: go ahead.
15:33:17 <Swami> carl_baldwin: np
15:33:17 <yamamoto_> see https://review.openstack.org/#/c/110188/
15:33:30 <yamamoto_> it's about ofagent but ovs-agent looks same
15:33:49 <yamamoto_> isn't it a problem for unbind_port_from_dvr?
15:34:43 <carl_baldwin> yamamoto_: this will take a bit to look into.  Do you mind if we take the question to the neutron room?
15:35:00 <yamamoto_> np.  i just wanted the dvr folks to know.
15:35:26 <carl_baldwin> Now is a good time to grab them in the neutron room.
15:35:31 <carl_baldwin> #topic l3-high-availability
15:35:37 <carl_baldwin> safchain: armax:  Any update here?
15:35:51 <safchain> hi
15:35:55 <armax> carl_baldwin: going through the review bits
15:36:19 <armax> armax: I need to allocate more time to this though
15:36:22 <safchain> I addressed comments and reworked base classes
15:36:36 <aleksandr_null> Hi guys, sorry I'm late, tried to find correct meeting room :)
15:36:44 <amuller> Working on l3 agent functional testing: https://review.openstack.org/#/c/109860/
15:36:47 <carl_baldwin> I said I’d review last week and did not.  But, now with the bulk of DVR merged, I have some review cycles.
15:36:48 <amuller> Something basic for starters
15:37:04 <safchain> currently rebasing the scheduler part
15:37:19 <amuller> The l3 agent patch itself: https://review.openstack.org/#/c/70700/ - Adds HA routers to the functional tests
15:37:54 <amuller> Once that's working I'll be able to respond to reviewer comments and refactor the HA code in the l3 agent so that it isn't as obtrusive
15:38:33 <carl_baldwin> amuller: do you have a timeline for getting that working?
15:39:16 <amuller> the base patch that adds the functional tests is working
15:39:49 <amuller> the ha additions in the l3 ha agent patch aren't... I figure I need 2-3 days working on that and I'll start pushing new patchsets that change the code itself and not the functional tests
15:40:18 <carl_baldwin> amuller: Thanks.  I need to catch up on the progress.  I’ll review today.
15:40:25 <carl_baldwin> Anything else?
15:40:38 <safchain> ok for me
15:40:49 <amuller> all good
15:40:50 <carl_baldwin> Thanks
15:40:56 <carl_baldwin> #topic bgp-dynamic-routing
15:41:01 <carl_baldwin> devvesa_: hi
15:41:06 <devvesa_> hi
15:41:22 <devvesa_> sorry, i was out last week
15:41:59 <carl_baldwin> Anything to report?
15:42:30 <devvesa_> keep working on it, i am close to pushing a WIP patch soon
15:42:50 <devvesa_> so you can start reviewing it
15:43:13 <carl_baldwin> devvesa_: That’d be great.
15:43:27 <carl_baldwin> Be sure to ping me when you post it and I’ll have a look.
15:43:41 <devvesa_> ok, great
15:44:00 <carl_baldwin> devvesa_: Anything else?
15:44:14 <devvesa_> no, nothing else for the moment
15:44:39 <carl_baldwin> devvesa_: thanks
15:44:44 <devvesa_> thanks carl
15:45:10 <carl_baldwin> All of the other usual topics are deferred to Kilo.  I’ll defer discussion for now.
15:45:28 <carl_baldwin> #topic reschedule-routers-on-dead-agents
15:45:36 <carl_baldwin> kevinbenton: hi, this one is yours.
15:46:13 <kevinbenton> topic title is pretty self-explanatory. i would like routers to be taken off of dead agents so they can be automatically rescheduled
15:46:29 <kevinbenton> here is one approach https://review.openstack.org/#/c/110893/
15:46:59 <amuller> So, the L3 HA blueprint solves the same problem
15:47:09 <kevinbenton> this needs to be in icehouse
15:47:14 <kevinbenton> IMO
15:47:21 <kevinbenton> so i was hoping for a bugfix
15:47:33 <armax> kevinbenton, amuller I think the two overlap
15:47:47 <armax> and I see kevinbenton’s approach also as a contingency plan
15:47:51 <amuller> I was under the impression that people use pacemaker and other solutions currently
15:47:54 <carl_baldwin> This might cross the line from bug fix to feature.  Might be hard to get into Icehouse.
15:48:19 <armax> that mitigates the need of relying on external elements to the fail-over process
15:48:29 <carl_baldwin> amuller: We’ve toyed around with a pacemaker solution.  A colleague gave a talk at the Atlanta summit.
15:49:01 <aleksandr_null> amuller: I could confirm it from Mirantis Fuel perspective
15:49:06 <amuller> We use a Pacemaker based solution in RH OpenStack as well
15:49:11 <amuller> to solve L3 HA issue
15:49:31 <kevinbenton> it’s annoying to have to use an external process to do something as simple as rescheduling
15:49:34 <aleksandr_null> we're using pm/crm to manage l3 agents and forcing rescheduling of routers. Of course with some downtime ;(
15:49:49 <carl_baldwin> In our testing, we found it very easy to get in to situations where nodes start shooting themselves.  It turned out to be somewhat difficult to get right.
15:50:21 <amuller> kevinbenton: I agree, but I'm really conflicted about whether something like this should be merged... Since L3 HA is the consensus on how to do it, I'd be really careful with making the code any more complicated pre L3 HA
15:50:40 <amuller> L3 HA == VRRP blueprint
15:51:03 <kevinbenton> i don’t see how this is the same really
15:51:16 <kevinbenton> it just does what can be done with the existing API
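To illustrate what "rescheduling with the existing API" can look like from outside neutron-server, here is a minimal watchdog sketch using python-neutronclient; it is not the patch linked above (which does the equivalent inside the server), and the credentials and the naive choice of target agent are assumptions:

    # Sketch only: move routers off L3 agents reported as dead, using the
    # existing agent-scheduler API.  Not the patch under review; note it
    # does not clean up router namespaces left behind on the dead node.
    from neutronclient.v2_0 import client

    neutron = client.Client(username='admin', password='secret',
                            tenant_name='admin',
                            auth_url='http://controller:5000/v2.0')

    agents = neutron.list_agents(agent_type='L3 agent')['agents']
    dead = [a for a in agents if not a['alive']]
    alive = [a for a in agents if a['alive'] and a['admin_state_up']]

    if alive:
        for agent in dead:
            routers = neutron.list_routers_on_l3_agent(agent['id'])['routers']
            for router in routers:
                # Unbind from the dead agent, rebind to a live one.
                neutron.remove_router_from_l3_agent(agent['id'], router['id'])
                neutron.add_router_to_l3_agent(alive[0]['id'],
                                               {'router_id': router['id']})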
15:51:21 <armax> I think kevinbenton’s proposal targets non HA deployments
15:51:31 <armax> granted we want to minimize potential code conflicts
15:51:49 <armax> so let’s see how the two develop and make a call later on when the code is more mature
15:51:56 <kevinbenton> armax: right, and i don’t think there would be
15:51:58 <aleksandr_null> From my point of view VRRP+cn_sync looks easier than rescheduling, in terms of the technologies used. But it's not so easy to implement.
15:52:12 <kevinbenton> aleksandr_null: what?
15:52:16 <amuller> I know that Rackspace use something similar to your proposal Kevin
15:52:20 <kevinbenton> aleksandr_null: did you see my patch?
15:52:23 <armax> I’d see L3HA the canonical way of doing things
15:52:29 <kevinbenton> it’s like 10 lines
15:52:30 <amuller> they monitor the RPC bus and reschedule routers as needed
15:52:37 <aleksandr_null> kevinbenton: Will take a look, of course.
15:52:48 <aleksandr_null> armax: +1
15:53:10 <armax> that said, there are situations where L3HA as a solution won’t be available
15:53:11 <kevinbenton> armax: yes, l3ha is definitely the way to move forward, but I’m trying to address an issue in icehouse
15:53:19 <kevinbenton> if possible
15:53:28 <armax> now, people might have come up with their own solutions
15:53:37 <armax> homegrown and painful
15:53:50 <armax> I think kevinbenton is trying to see whether some of that pain can be taken away :)
15:53:56 <kevinbenton> it’s embarrassing that a node goes down and we just throw our hands up
15:53:57 <aleksandr_null> amuller: But what happens if something goes wrong with communications inside the cloud? MQ fails from time to time; of course it's out of scope, but VRRP would handle that autonomously
15:54:08 <carl_baldwin> I’m concerned that simply rescheduling will not be enough.  A pacemaker/corosync type solution would shoot the dead node.  This solution would not.  With the agent down, there is no one left around to clean up the old router.
15:54:53 <armax> I’d promote that effort, but I’d reserve judgment on whether it’s icehouse/juno material until the code is complete
15:54:57 <amuller> kevinbenton: If the code doesn't end up being more complicated after it's properly tested, and it properly solves the problem, then it's safe enough to merge as is, but I have a gut feeling you'll find it's gonna end up a lot more complicated
15:54:59 <armax> kevinbenton: how far off are you?
15:55:17 <carl_baldwin> In many situations, the old routers could still be plumbed and moving traffic.
15:55:42 <kevinbenton> carl_baldwin: do your compute nodes frequently lose connectivity to the neutron server?
15:56:30 <aleksandr_null> kevinbenton: it depends on the architecture of the cluster. We had a situation where a customer just disabled the mgmt/comm network.
15:56:36 <carl_baldwin> kevinbenton: There are many reasons an agent can be considered dead.
15:56:38 <aleksandr_null> For a while.
15:56:58 <kevinbenton> armax: i have the basic patch there, but it doesn’t address zombie nodes like carl_baldwin mentioned
15:57:14 <kevinbenton> aleksandr_null, carl_baldwin: how does the openvswitch agent handle a broken management network?
15:57:34 <kevinbenton> we have to assume that’s down too then, right?
15:57:43 <aleksandr_null> yep.
15:58:13 <carl_baldwin> kevinbenton: yes, the agent goes inactive.
15:58:41 <kevinbenton> well then yes, i wasn’t aware we supported headless operational modes
15:59:03 <kevinbenton> my patch is pointless and this isn’t a problem that can be solved from the neutron server
16:00:04 <kevinbenton> because it doesn’t actually know if these routers are online or not
16:00:22 <aleksandr_null> IMHO this could only be solved by using autonomous solutions like vrrp; I don't have any other solutions related to RPC/MQ because they couldn't work autonomously =(
16:01:41 <carl_baldwin> kevinbenton: It is still something that needs to be addressed.  Our pacemaker / corosync solutions have not been as great as we’d hoped.
16:01:44 <armax> kevinbenton’s solution obviously needs cooperation between servers and agents
16:02:13 <armax> kevinbenton: saying it’s pointless is a bit harsh
16:02:14 <armax> :)
16:02:18 <aleksandr_null> of course a mgmt network outage is an extraordinary case. The rescheduling that kevinbenton suggests may be improved by trying to monitor neighboring nodes and/or the mgmt net, and if something happens with the mgmt net then don't do anything. Just a suggestion.
16:02:33 <armax> every solution has tradeoffs
16:02:45 <carl_baldwin> armax: +1
16:02:51 <aleksandr_null> +1
16:02:55 <armax> the larger question is: do we want to provide some degree of built-in functionality?
16:03:07 <armax> with any of the cons that may have?
16:03:18 <armax> external pcm/cm also has issues
16:03:37 <aleksandr_null> carl_baldwin: Completely agree. pm/crm does almost the same as Kevin suggested and also wouldn't work well if the mgmt network is down.
16:03:46 <armax> so long as kevin’s proposal is not disruptive to the current effort for L3HA
16:03:52 <aleksandr_null> corosync cluster will just split up.
16:03:59 <amuller> kevinbenton: We'd appreciate any contributions to the L3HA efforts :)
16:04:02 <armax> I’d like to have the option to decide whether to take it or not
16:04:25 <aleksandr_null> I could test it in corosync environment and without it
16:04:26 <armax> amuller: indeed, would kevin’s time be best put to L3HA?
16:04:31 <armax> kevinbenton: that’s a question for kevin :)
16:04:41 <amuller> kevinbenton: testing/reviewing would be awesome, and there's loose ends also
16:04:44 <armax> he might have a hidden customer requirement ;)
16:04:49 <kevinbenton> armax: that’s not going to happen right now. I need a solution for icehouse
16:04:54 <kevinbenton> armax: not hidden :-)
16:05:03 <armax> kevinbenton: right, you know what  I mean
16:05:47 <armax> kevinbenton: I think it makes sense if you keep on working on this, let’s revise the progress in a week
16:05:54 <carl_baldwin> I’m glad the discussion is opened.  HA will be a hard nut to crack.  Is this something we want to add to the permanent agenda?
16:06:05 <carl_baldwin> I just noticed we’re over time.  Anyone else waiting for the room?
16:06:09 <armax> and see how far we got
16:06:14 <armax> waaaay over time
16:06:32 * carl_baldwin is really sorry about going over time if someone is waiting for the room.
16:06:50 <aleksandr_null> looks like nobody :)
16:06:53 <armax> carl_baldwin: they would have kicked us out
16:06:56 <armax> :)
16:07:00 <armax> bye everyone
16:07:06 <yamamoto_> bye
16:07:10 <kevinbenton> bye
16:07:18 <aleksandr_null> bye guys!
16:07:25 <carl_baldwin> I’ve got to run.  I’d like to discuss rescheduling routers more.  I’ll keep it in the agenda near HA.
16:07:45 <carl_baldwin> I’ll also get some of our guys with experience with our HA solution on kevinbenton’s review to provide insight.
16:07:52 <carl_baldwin> Thanks all
16:08:06 <carl_baldwin> #endmeeting