14:00:47 #startmeeting neutron_l3
14:00:48 Meeting started Wed Sep 23 14:00:47 2020 UTC and is due to finish in 60 minutes. The chair is liuyulong. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:00:49 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:00:51 The meeting name has been set to 'neutron_l3'
14:01:23 hi
14:02:33 hi
14:03:50 No announcements from me today, so maybe we can go directly to the Bug section to cut the meeting time.
14:04:44 hi
14:04:49 OK, no objection, : )
14:04:59 #topic Bugs
14:05:06 ralonsoh, hi
14:05:16 #link http://lists.openstack.org/pipermail/openstack-discuss/2020-September/017254.html
14:05:23 #link http://lists.openstack.org/pipermail/openstack-discuss/2020-September/017432.html
14:05:41 These are the bug lists from our deputy.
14:05:58 First one
14:05:59 #link https://bugs.launchpad.net/neutron/+bug/1895950
14:06:02 Launchpad bug 1895950 in neutron "keepalived can't perform failover if the l3 agent is down" [Medium,Won't fix]
14:06:57 I don't understand why you marked it as won't fix
14:07:05 I replied to this last week, the L3 agent should be alive during the HA router state change.
14:07:15 IMO if we move bringing the interfaces up to the neutron-state-change-monitor process it should work
14:07:22 After the patch https://review.opendev.org/#/c/707406/
14:07:56 but that is a regression introduced by this patch
14:07:59 isn't it?
14:08:23 small but IMHO still a regression
14:09:02 Because the running state-agent process can not do that work if you do not re-spawn it.
14:09:33 isn't it respawned if you restart the L3 agent?
14:10:49 no
14:10:56 I'm not sure, but from my experience, the state change process will run as it is.
14:11:08 if the keepalived-state-change process is running, it is not rebooted
14:11:28 but if reload_cfg is enabled, then we'll send SIGHUP
14:12:01 (reload_cfg is false when restarting l3 agent)
14:12:08 so no, we don't restart it
14:12:10 ok, so maybe we can add bringing interfaces up/down to the state-change process and keep it in the l3 agent for 1 cycle
14:12:16 It reloads the config options, not the python process.
14:12:17 later remove it from the l3 agent
14:12:26 or maybe 2 cycles
14:12:34 and add e.g. a release note about that
14:12:36 idk
14:13:03 one question: if the l3 agent is down, how will this host become master?
14:13:13 keepalived can still be running
14:13:17 and it can failover
14:13:30 but the l3 agent will not bring interfaces up on the new master node
14:13:39 yeah, that was my question
14:13:41 The DB state updating still needs the L3 agent alive.
14:13:45 ^
14:13:58 I know that
14:14:19 but still IMO it would be better to have a working dataplane even when the L3 agent is down for some reason
14:14:32 Actually the L3 agent must run during HA router failover, it is designed that way. (not by me, but it is) : )
14:15:04 liuyulong: before your patch even?
14:15:55 No, I mean the HA state change workflow has something related to the L3 agent. It needs the L3 agent to do some work.
14:16:33 Not the gateway, but something like RA, DB state, config state and so on.
14:17:28 But, it's fine to add the gateway UP action to the state-change process.
14:17:32 I'm fine with it.
14:17:32 ok, let's keep this bug as won't fix for now
14:18:01 and maybe check/update the docs to be clear about that there
14:21:44 Sorry, bad connection
14:21:56 #link https://bugs.launchpad.net/neutron/+bug/1894843
14:21:59 Launchpad bug 1894843 in neutron "[dvr_snat] Router update deletes rfp interface from qrouter even when VM port is present on this host" [Medium,New]
14:23:04 I have no idea why "dvr_snat" is set on every hypervisor. Should it be "dvr"?
14:24:09 dvr_snat should be only on network controllers
14:24:12 we are using dvr_snat e.g. in our gates
14:24:26 L3 agent in "dvr_snat" with mixed compute service does not work fine from my personal experience.
14:24:26 and that can possibly cause some failures in dvr multinode jobs maybe
14:24:34 (idk for sure but just guessing)
14:25:41 IMO, this should be documented well, users should not deploy their cloud like this.
14:26:47 IMO, there are not many agent mode checks for "dvr_snat" during the router processing.
14:27:45 We have consensus that "dvr_snat" is for those centralized network nodes (functions) which can not be distributed.
14:29:21 So, my advice for this bug/user is to change the config options.
14:29:30 agree
14:30:04 The final cloud deployment should be in two scenarios:
14:30:47 1. their compute nodes have access to the external network (internet), so the compute node sets the L3 agent mode to "dvr".
14:31:26 2. compute node can not reach the Internet, set the agent mode to "dvr_no_external"
14:32:14 3. centralized network nodes should run on dedicated physical hosts, and the L3 agent mode is "dvr_snat".
14:35:01 OK, no more bugs from me
14:36:36 OK, let's move on
14:36:37 #topic On demand agenda
14:38:23 nothing from me
14:38:24 #link https://bugs.launchpad.net/neutron/+bug/1895972
14:38:35 liuyulong: Error: Could not gather data from Launchpad for bug #1895972 (https://launchpad.net/bugs/1895972). The error has been logged
14:39:04 Another gap is filling... Congrats!
14:39:44 this feature is ongoing but yes!
14:41:56 There are C works, so it is one example of a fullstack development process for an OVN feature.
14:42:26 Python works are not started.
14:42:44 #link https://review.opendev.org/#/c/738551/
14:43:04 slaweq, hi, I've replied to the comments.
14:43:43 I've tested it in my local devstack environment for a while.
14:43:52 and what is happening with https://review.opendev.org/#/c/731446
14:43:53 ?
14:44:07 superseded by yours, I think so
14:44:15 I cannot say I covered every case, but those I noticed and experienced.
14:44:51 ralonsoh, yep, it has 2 closes bugs.
14:46:54 liuyulong: ok, I will check that
14:47:33 But with some deep thinking, after these flows refactor or redirect (some works else), IMO the entire flow structure may have a chance to be redesigned someday.
14:48:00 It could be a long story. Just forget it. : )
14:48:09 Oh, I wonder why https://review.opendev.org/#/c/731446 is not approved too
14:50:10 would you give an explanation?
14:50:39 I will tell Li YaJie later
14:51:14 Please take a look at the inline comments in gerrit, and the meeting LOG here. : )
14:51:31 OK, no more talks from me now.
14:51:42 I will leave 1 or 2 mins here.
14:52:45 https://review.opendev.org/#/c/743661/
14:52:58 my patch for ecmp
14:53:52 I really hope it can be 'merged'
14:55:37 It's feature freeze now, IMO, it should be moved to the next dev cycle.
14:55:58 Wait...
14:57:07 ..so when is the next dev cycle?
14:57:16 #link http://eavesdrop.openstack.org/meetings/networking/2020/networking.2020-09-15-14.00.log.html#l-13
14:57:21 #link https://launchpad.net/neutron/+milestone/victoria-3
14:57:59 If this was not in the V-3 list, it will not be merged for now.
14:58:17 Sorry, I cannot open launchpad.net for now.
14:58:18 ZhuXiaoYu: yes, we are in the RC-1 week now
14:58:38 so we can merge this patch after rc-1 is released and the stable/victoria branch has been created
14:59:25 got it, really thx for telling me that, it's helpful
14:59:34 I will start another round of review of the spec https://review.opendev.org/#/c/729532 this week.
15:00:11 Time is up.
15:00:15 Thank you guys.
15:00:16 Bye
15:00:21 Bye
15:00:25 thx
15:00:26 #endmeeting
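
Note on the agent-mode advice given for bug 1894843: the three deployment cases listed under the Bugs topic map a node's role to an L3 agent mode. The following is a minimal sketch of that mapping, not code from Neutron. The `NodeRole` enum and both helper functions are hypothetical names introduced for illustration; only the mode strings ("dvr", "dvr_no_external", "dvr_snat") come from the discussion, and the sketch assumes the mode is set through the `agent_mode` option in the `[DEFAULT]` section of l3_agent.ini.

```python
from enum import Enum


class NodeRole(Enum):
    """Hypothetical node roles, mirroring the three cases in the log."""
    COMPUTE_WITH_EXTERNAL = "compute node that can reach the external network"
    COMPUTE_NO_EXTERNAL = "compute node that cannot reach the Internet"
    NETWORK = "dedicated centralized network node"


# Mode strings as quoted in the meeting; the mapping itself is the
# advice given in the discussion, not an enforced Neutron rule.
AGENT_MODE_BY_ROLE = {
    NodeRole.COMPUTE_WITH_EXTERNAL: "dvr",
    NodeRole.COMPUTE_NO_EXTERNAL: "dvr_no_external",
    NodeRole.NETWORK: "dvr_snat",
}


def l3_agent_mode(role):
    """Return the L3 agent mode suggested in the meeting for a node role."""
    return AGENT_MODE_BY_ROLE[role]


def render_l3_agent_ini(role):
    """Render a minimal l3_agent.ini fragment (assumed layout)."""
    return "[DEFAULT]\nagent_mode = {}\n".format(l3_agent_mode(role))


if __name__ == "__main__":
    for role in NodeRole:
        print("# {}".format(role.value))
        print(render_l3_agent_ini(role))
```

Under this mapping, "dvr_snat" is never chosen for a hypervisor, which is the configuration change suggested to the bug reporter.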