14:01:12 <liuyulong> #startmeeting neutron_l3
14:01:13 <openstack> Meeting started Wed Jan  8 14:01:12 2020 UTC and is due to finish in 60 minutes.  The chair is liuyulong. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:01:14 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:01:16 <openstack> The meeting name has been set to 'neutron_l3'
14:01:47 <liuyulong> #chair liuyulong_
14:01:48 <openstack> Current chairs: liuyulong liuyulong_
14:02:09 <liuyulong> Happy new year everyone!
14:03:26 <liuyulong> #topic Announcements
14:03:51 <liuyulong> #link https://launchpad.net/neutron/+milestone/ussuri-2
14:04:27 <liuyulong> Expected: 2020-02-12
14:05:46 <liuyulong> There will be about 10 days holidays for Chinese New Year this month.
14:06:56 <liuyulong> Some people may not be online, so time is short...
14:07:14 <haleyb> hi
14:07:30 <liuyulong> hi
14:08:09 <liuyulong> #link https://bugs.launchpad.net/neutron/+bug/1858419
14:08:09 <openstack> Launchpad bug 1858419 in neutron "Docs needed for tunables at large scale" [Undecided,Confirmed]
14:08:38 <liuyulong> Slawek asked me something in mail about this large scale cloud.
14:08:56 <liuyulong> #link https://bugs.launchpad.net/neutron/+bug/1858419/comments/1
14:09:11 <liuyulong> Allow me to say something here
14:09:22 <liuyulong> This could be a really long story.
14:10:00 <liuyulong> Config option tuning offers a lot of choices.
14:10:42 <liuyulong> But neutron itself still has some architectural defects, which may not be resolvable by configuration.
14:10:44 <slaweq> hi
14:10:48 <slaweq> sorry for being late
14:11:27 <liuyulong> As you may see in comment #1, we did some local work on neutron itself.
14:11:51 <slaweq> liuyulong: I know that we can't solve everything by config options
14:11:52 <liuyulong> (Some of them were discussed during the Shanghai PTG.)
14:12:53 <slaweq> but it's rather more about identifying options which are crucial for large scale, and adding notes for some options, e.g. "setting this to a high value may have an impact at large scale because it will put a huge load on rabbitmq" (it's just an example for a non-existing option now :))
14:14:04 <liuyulong> Yes, we can start in such way.
14:14:40 <liuyulong> Anyway, I will share some of the config tuning used in our cloud deployment.
14:15:09 <slaweq> liuyulong: thx a lot
14:15:52 <liuyulong> OK, let's move on.
14:15:55 <liuyulong> #topic Bugs
14:16:09 <liuyulong> #link http://lists.openstack.org/pipermail/openstack-discuss/2020-January/011831.html
14:16:38 <liuyulong> And this I guess:
14:16:42 <liuyulong> #link http://lists.openstack.org/pipermail/openstack-discuss/2019-December/011766.html
14:17:20 <liuyulong> May be also this:
14:17:22 <liuyulong> #link http://lists.openstack.org/pipermail/openstack-discuss/2019-December/011751.html
14:17:34 <liuyulong> OK, first one:
14:17:49 <liuyulong> #link https://bugs.launchpad.net/neutron/+bug/1858086
14:17:49 <openstack> Launchpad bug 1858086 in neutron "qrouter's local link route cannot be restored " [Medium,Confirmed]
14:18:16 <liuyulong> This looks like a gap in the API's user input validation.
14:19:02 <liuyulong> We should not allow users to add a route whose destination CIDR overlaps a subnet attached to the router.
14:19:45 <liuyulong> There are too many potential risks for DVR-related traffic.
14:19:50 <haleyb> yes, i thought i was reading that wrong but how can you add a route to a local subnet via a non-local IP ?
14:21:42 <liuyulong> It is router route-add action?
14:22:19 <liuyulong> Not the subnet static route, right?
14:24:42 <slaweq> it's "extra-route" but I'm not sure what action is called on server side for it
14:24:56 <slaweq> on client's side You do "neutron router-update --extra-route"
14:26:38 <liuyulong> Yes, "openstack router set --route destination=<subnet>,gateway=<ip-address>"
14:28:17 <liuyulong> Such overlap should not be allowed.
14:30:44 <liuyulong> This is obvious: when you add an IP address to your host, the system adds an on-link route for its subnet.
14:32:19 <liuyulong> That means "this subnet is directly accessible"; overriding it does not make sense in most scenarios.
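To illustrate the on-link route point above: the connected subnet is derived from the interface address itself, so a route destination inside it is already covered by the kernel's connected route. A small Python illustration (not neutron code):

```python
import ipaddress

# When 10.0.0.5/24 is configured on a router interface, the kernel
# installs an on-link (connected) route for the whole subnet; the
# connected network is implied by the interface address itself.
iface = ipaddress.ip_interface("10.0.0.5/24")
print(iface.network)  # 10.0.0.0/24 -- directly reachable, no gateway needed
```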
14:33:04 <liuyulong> By the way, the bug reporter said neutron does not restore that route automatically.
14:33:11 <haleyb> i would tend to agree, actually surprised it didn't throw an exception when adding it
14:33:52 <liuyulong> This can be another view of the bug: neutron does not manage such on-link routes in the qrouter namespace even though the subnet is directly accessible.
14:35:28 <liuyulong> So, I think it's OK to terminate it at the very beginning of API.
14:35:48 <slaweq> sounds good for me
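The validation agreed on above could be sketched as follows. This is a hypothetical helper using the stdlib `ipaddress` module, not the actual neutron server-side code:

```python
import ipaddress

def validate_extra_routes(routes, router_cidrs):
    # Reject any extra route whose destination CIDR overlaps a subnet
    # directly attached to the router (hypothetical validation helper).
    for route in routes:
        dest = ipaddress.ip_network(route["destination"])
        for cidr in router_cidrs:
            if dest.overlaps(ipaddress.ip_network(cidr)):
                raise ValueError(
                    "destination %s overlaps attached subnet %s"
                    % (dest, cidr))

# A destination outside the attached subnets passes:
validate_extra_routes(
    [{"destination": "172.16.0.0/24", "gateway": "10.0.0.254"}],
    ["10.0.0.0/24"])

# A destination inside an attached subnet (e.g. 10.0.0.0/26 against
# 10.0.0.0/24) would raise ValueError at the API layer.
```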
14:35:58 <liuyulong> OK, next one.
14:36:01 <liuyulong> #link https://bugs.launchpad.net/neutron/+bug/1857422
14:36:01 <openstack> Launchpad bug 1857422 in neutron "neutron-keepalived-state-change and keeplived cannot be cleanup for those routers which is deleted during l3-agent died" [Undecided,New]
14:38:19 <liuyulong> Firstly, because the L3 agent was dead, the "delete RPC" was not processed; this could be why the processes remained.
14:40:06 <haleyb> if i'm remembering correctly, the l3-agent should clean up the namespace(s) at the end of its sync, but is it just not cleaning keepalived stuff because it didn't know that the associated router was ha ?
14:40:37 <liuyulong> But we did encounter similar phenomena in our own deployment even when the L3 agent was alive. The "neutron-keepalived-state-change" and "radvd" processes sometimes remained after routers were deleted.
14:42:14 <haleyb> is this the same thing?
14:42:29 <liuyulong> haleyb, I'm not sure, maybe the user's L3 agent was down too long to re-process the delete RPC.
14:43:29 <liuyulong> haleyb, no, just some similar phenomena.
14:43:34 <haleyb> right, if for example it didn't get the RPC, that's when the resources get orphaned?
14:44:40 <liuyulong> Yes, according to the "reproduction steps" in the bug description.
14:46:44 <haleyb> i guess it seems like a valid bug
14:47:32 <liuyulong> If we need to cover this situation, the L3 agent may need a persistent cache to identify which routers were deleted during its downtime, and then start the delete procedure for those stale routers.
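One possible shape for such a persistent cache, sketched under the assumption that a plain JSON file on local disk is acceptable (a real agent would also need locking and atomic writes; this is illustrative, not neutron code):

```python
import json
import os

class RouterCache:
    """Persist the set of router IDs this agent hosts, so that after a
    restart the agent can diff against the server's current list and
    clean up routers deleted while it was down (illustrative sketch)."""

    def __init__(self, path):
        self.path = path
        if os.path.exists(path):
            with open(path) as f:
                self.router_ids = set(json.load(f))
        else:
            self.router_ids = set()

    def save(self):
        # Store as a sorted list since JSON has no set type.
        with open(self.path, "w") as f:
            json.dump(sorted(self.router_ids), f)

    def stale_routers(self, server_router_ids):
        # Routers we hosted before that the server no longer knows
        # about must have been deleted during our downtime.
        return self.router_ids - set(server_router_ids)
```

On startup the agent would compute `stale_routers(...)` from the server's full sync response and run the normal router-delete path for each stale ID before processing new work.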
14:50:02 <liuyulong> And I still have a question: will the router namespace, metadata proxy and radvd processes remain too? Or just neutron-keepalived-state-change and keepalived?
14:50:55 <haleyb> at the end of sync, the l3-agent should have cleaned the router namespace
14:51:05 <haleyb> initial sync at startup that is
14:53:50 <liuyulong> And +1 to Miguel's comment, if this is not seen in the production environment, then it is contrived. : ) https://bugs.launchpad.net/neutron/+bug/1857422/comments/2
14:53:50 <openstack> Launchpad bug 1857422 in neutron "neutron-keepalived-state-change and keeplived cannot be cleanup for those routers which is deleted during l3-agent died" [Undecided,New]
14:54:51 <liuyulong> Last one:
14:54:54 <liuyulong> #link https://bugs.launchpad.net/neutron/+bug/1856839
14:54:54 <openstack> Launchpad bug 1856839 in neutron "[L3] router processing time increase if there are large set ports" [Medium,In progress] - Assigned to LIU Yulong (dragon889)
14:55:13 <liuyulong> Code is here: https://review.opendev.org/701077
14:55:29 <liuyulong> It is an optimization for large scale cloud. : )
14:57:14 <slaweq> I would also like to ask You for review https://review.opendev.org/#/c/700011/ if You will have some time
14:58:16 <liuyulong> We are running out of time, maybe you can leave comments in gerrit.
14:59:18 <liuyulong> Alright, let's end here.
14:59:26 <liuyulong> #endmeeting