15:00:56 <slaweq> #startmeeting neutron_ci
15:00:57 <openstack> Meeting started Tue Apr  6 15:00:56 2021 UTC and is due to finish in 60 minutes.  The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:58 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:58 <slaweq> hi
15:01:00 <openstack> The meeting name has been set to 'neutron_ci'
15:01:15 <ralonsoh> hi
15:01:19 <bcafarel> o/
15:01:22 <lajoskatona> Hi
15:02:11 <slaweq> ok, let's start
15:02:19 <slaweq> Grafana dashboard: http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:02:27 <slaweq> Please open now :)
15:04:24 <slaweq> #topic Actions from previous meetings
15:04:38 <slaweq> ralonsoh to check failed qos scenario test
15:05:01 <ralonsoh> no, sorry, I just started. I was busy with the py38/FTs timeouts
15:05:07 <slaweq> ralonsoh: sure
15:05:09 <slaweq> np
15:05:17 <slaweq> can I assign it to You for next week too?
15:05:20 <ralonsoh> sure
15:05:23 <slaweq> #action ralonsoh to check failed qos scenario test
15:05:24 <slaweq> thx
15:05:28 <slaweq> next one
15:05:30 <slaweq> ralonsoh to check https://bugs.launchpad.net/neutron/+bug/1921866
15:05:32 <openstack> Launchpad bug 1917793 in neutron "duplicate for #1921866 [HA] keepalived_state_change does not finish "handle_initial_state"execution" [Critical,Confirmed] - Assigned to Rodolfo Alonso (rodolfo-alonso-hernandez)
15:05:55 <ralonsoh> I pushed a patch to mitigate it
15:05:57 <ralonsoh> one sec
15:06:12 <ralonsoh> #link https://review.opendev.org/c/openstack/neutron/+/779024
15:06:32 <ralonsoh> (already merged)
15:06:43 <slaweq> thx, so we should be good with that one :)
15:07:01 <slaweq> next one then
15:07:03 <slaweq> slaweq to check failed start metadata proxy issue
15:07:08 <slaweq> Bug https://bugs.launchpad.net/neutron/+bug/1922684
15:07:09 <openstack> Launchpad bug 1922684 in neutron "Functional dhcp agent tests fails to spawn metadata proxy" [High,Confirmed] - Assigned to Slawek Kaplonski (slaweq)
15:07:20 <slaweq> and proposed fix https://review.opendev.org/c/openstack/neutron/+/784903
15:07:40 <slaweq> ralonsoh: I saw You had some questions about it
15:07:48 <ralonsoh> thanks
15:08:04 <ralonsoh> we can discuss it in the patch
15:08:19 <slaweq> let me try to quickly explain it here
15:08:22 <ralonsoh> sure
15:09:39 <slaweq> first of all, You can easily reproduce it if You raise exceptions.ProcessExecutionError somewhere in the fill_dhcp_udp_checksums() method in https://github.com/openstack/neutron/blob/58c9912be0ce5d9bf9eb9e1c44b87cdf90aab452/neutron/agent/linux/dhcp.py#L1762
15:09:53 <slaweq> this is what happens really in those failed tests
15:10:04 <slaweq> so an exception is raised during the iptables-restore command
15:10:17 <slaweq> and this is handled properly by dhcp driver
15:11:22 <slaweq> but when it tries to call the setup() method again https://github.com/openstack/neutron/blob/58c9912be0ce5d9bf9eb9e1c44b87cdf90aab452/neutron/agent/linux/dhcp.py#L1664 it fails on ensure_device_is_ready: https://github.com/openstack/neutron/blob/58c9912be0ce5d9bf9eb9e1c44b87cdf90aab452/neutron/agent/linux/dhcp.py#L1692
15:11:44 <slaweq> it happens like that because in the test we prepare the network object with a port already in it
15:12:05 <slaweq> and that "fake" port is used in the first call of the setup() method
15:12:17 <slaweq> exactly here https://github.com/openstack/neutron/blob/58c9912be0ce5d9bf9eb9e1c44b87cdf90aab452/neutron/agent/linux/dhcp.py#L1667
15:12:36 <slaweq> so "port" is exactly what the test expects it to be
15:13:11 <slaweq> but in that test we also mock get_dhcp_port() from the plugin RPC API class
15:13:22 <slaweq> so in the first call of the setup() method it will:
15:13:41 <slaweq> 1. get the correct port in https://github.com/openstack/neutron/blob/58c9912be0ce5d9bf9eb9e1c44b87cdf90aab452/neutron/agent/linux/dhcp.py#L1667
15:14:05 <slaweq> 2. update network.ports[0] to be a mock instead of the port which was returned in 1)
15:14:14 <slaweq> 3. fail on the iptables call
15:14:24 <slaweq> and now the second call of setup():
15:14:33 <slaweq> 1. get the wrong (mock) port in https://github.com/openstack/neutron/blob/58c9912be0ce5d9bf9eb9e1c44b87cdf90aab452/neutron/agent/linux/dhcp.py#L1667
15:14:47 <slaweq> 2. fail at https://github.com/openstack/neutron/blob/58c9912be0ce5d9bf9eb9e1c44b87cdf90aab452/neutron/agent/linux/dhcp.py#L1692
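A minimal, self-contained sketch of the failure pattern described above. It assumes a heavily simplified stand-in for the DHCP driver: `FakeDriver`, `Network`, `setup()`, `enable()` and the local `ProcessExecutionError` are hypothetical names, not Neutron's actual code. It only shows how a mocked RPC call that overwrites `network.ports[0]` during the first `setup()` makes the retry pick up a Mock instead of the prepared port.

```python
from unittest import mock


class ProcessExecutionError(Exception):
    """Hypothetical stand-in for neutron's exceptions.ProcessExecutionError."""


class Network:
    def __init__(self, ports):
        self.ports = ports


class FakeDriver:
    """Hypothetical, heavily simplified DHCP driver used only for illustration."""

    def __init__(self, network, plugin):
        self.network = network
        self.plugin = plugin
        self.calls = 0

    def setup(self):
        # 1. Use whatever port is currently stored on the network object.
        port = self.network.ports[0]
        # Stand-in for ensure_device_is_ready(): a usable port has a device name.
        if not isinstance(port, dict) or "device" not in port:
            raise RuntimeError("device not ready for port %r" % (port,))
        # 2. Refresh the port from the (possibly mocked) plugin RPC API.
        self.network.ports[0] = self.plugin.get_dhcp_port()
        # 3. Fail the first time, like the iptables-restore call in the job.
        self.calls += 1
        if self.calls == 1:
            raise ProcessExecutionError("iptables-restore failed")
        return port

    def enable(self):
        try:
            return self.setup()
        except ProcessExecutionError:
            # The error is handled and setup() is retried once.
            return self.setup()
```

With `plugin = mock.Mock()`, `get_dhcp_port()` returns a Mock, so `FakeDriver(Network([{"device": "tap-1234"}]), plugin).enable()` raises `RuntimeError` on the retry. One possible test-side remedy (not necessarily what the proposed fix does) is `plugin.get_dhcp_port.return_value = prepared_port`, which makes the retry see the same port again.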
15:15:11 <slaweq> I'm not sure if that's clear to You now
15:15:15 <ralonsoh> ok, I'll check it locally, I still don't get it
15:15:44 <slaweq> ok
15:15:51 <slaweq> we can continue in the review later
15:16:12 <slaweq> that's all regarding actions from last week
15:16:18 <slaweq> let's move on
15:16:22 <slaweq> #topic Stadium projects
15:16:29 <slaweq> lajoskatona: any updates?
15:16:39 <slaweq> except midonet as it's not stadium project anymore ;)
15:16:40 <lajoskatona> nothing to tell the truth
15:17:14 <lajoskatona> as I saw this morning, things are going in, so no issues, at least as far as I checked
15:17:25 <slaweq> ok, thx for taking care of it
15:17:31 <slaweq> #topic Stable branches
15:17:40 <slaweq> bcafarel: any updates?
15:17:51 <slaweq> except the issue with py2 (again) in older branches
15:18:03 <bcafarel> the main issue, which I spoiled in the previous meeting, is indeed the py2 bug
15:18:19 <bcafarel> as it breaks everything up to and including ussuri, the list of OK branches got short :)
15:18:19 <slaweq> is there any LP for that bug already?
15:18:39 <bcafarel> I had opened one for neutron, but closed it as dup (gmann opened one for devstack)
15:18:46 <bcafarel> https://bugs.launchpad.net/devstack/+bug/1922736
15:18:47 <openstack> Launchpad bug 1922736 in devstack "Stable stein|train py2 devstack based jobs are broken on py2 interpreter" [Critical,Confirmed]
15:18:53 <bcafarel> as it is a rather generic issue, not just for us
15:19:42 <slaweq> thx bcafarel
15:20:14 * slaweq wonders when we will need to stop testing all py2 branches in u/s
15:20:35 <bcafarel> well, train still had both IIRC
15:20:57 <bcafarel> so expect a few other "oh yes, we should cap this one too"
15:21:08 <slaweq> :)
15:21:28 <slaweq> something else, easier regarding the stable branches
15:21:43 <slaweq> we need to update our grafana dashboards to include stable/wallaby
15:21:49 <slaweq> bcafarel: will You take care of it?
15:22:33 <bcafarel> sigh, sorry, I pushed a doc update to note this as a release step and then forgot about actually doing it
15:22:40 <slaweq> LOL
15:22:48 <bcafarel> slaweq: let's add it as a topic for next week so I do not keep forgetting :)
15:22:54 <slaweq> thx
15:23:14 <slaweq> #action bcafarel to update grafana dashboards with stable/wallaby
15:23:24 <slaweq> ok, next topic
15:23:26 <slaweq> #topic Grafana
15:23:56 <slaweq> here things look pretty OK this week IMO
15:24:02 <slaweq> I don't see any major issues
15:24:42 <ralonsoh> well, py38 and FTs were a bit unstable, too many timeouts
15:24:52 <slaweq> ralonsoh: true
15:25:11 <slaweq> but You proposed some patches to address at least the py38 issues, right?
15:25:19 <ralonsoh> https://review.opendev.org/c/openstack/neutron/+/784771
15:25:21 <ralonsoh> and for FTs
15:25:25 <ralonsoh> https://review.opendev.org/c/openstack/neutron/+/784771
15:25:38 <ralonsoh> sorry: https://review.opendev.org/c/openstack/neutron/+/784889
15:26:12 <slaweq> ok, both are approved already
15:26:22 <slaweq> let's see if it will be better with those patches merged
15:27:00 <bcafarel> seeing the times for the offline_migration tests it should help
15:27:30 <ralonsoh> mysql tests take around 10 mins, all of them
15:27:52 <slaweq> hopefully
15:27:57 <ralonsoh> I'm trying to merge them into one single test, to avoid executing the migration again and again
15:29:02 <slaweq> ++
15:29:45 <slaweq> ok, let's talk about some specific issues
15:29:51 <slaweq> #topic functional
15:29:59 <slaweq> I found one new issue for today
15:30:05 <slaweq> https://78bb45d7d79a62b0c924-1d8800dfbc4b22202783e69a87ac00ba.ssl.cf1.rackcdn.com/783647/6/check/neutron-functional-with-uwsgi/83ffba0/testr_results.html
15:30:10 <slaweq> it's a failed test_get_egress_min_bw_for_port
15:30:27 <ralonsoh> ft1.22: neutron.tests.functional.agent.common.test_ovs_lib.BaseOVSTestCase.test_get_egress_min_bw_for_port testtools.testresult.real._StringException: Traceback (most recent call last):
15:30:27 <ralonsoh> File "/home/zuul/src/opendev.org/openstack/neutron/neutron/common/utils.py", line 708, in wait_until_true
15:30:29 <ralonsoh> eventlet.sleep(sleep)
15:30:31 <ralonsoh> File "/home/zuul/src/opendev.org/openstack/neutron/.tox/dsvm-functional/lib/python3.8/site-packages/eventlet/greenthread.py", line 36, in sleep
15:30:34 <ralonsoh> hub.switch()
15:30:38 <ralonsoh> File "/home/zuul/src/opendev.org/openstack/neutron/.tox/dsvm-functional/lib/python3.8/site-packages/eventlet/hubs/hub.py", line 313, in switch
15:30:41 <ralonsoh> return self.greenlet.switch()
15:30:43 <ralonsoh> eventlet.timeout.Timeout: 5 seconds
15:30:45 <ralonsoh> During handling of the above exception, another exception occurred:
15:30:47 <ralonsoh> Traceback (most recent call last):
15:30:49 <ralonsoh> File "/home/zuul/src/opendev.org/openstack/neutron/neutron/tests/functional/agent/common/test_ovs_lib.py", line 158, in _check_value
15:30:52 <ralonsoh> common_utils.wait_until_true(part_check_value, timeout=5, sleep=1)
15:30:54 <ralonsoh> File "/home/zuul/src/opendev.org/openstack/neutron/neutron/common/utils.py", line 713, in wait_until_true
15:30:57 <ralonsoh> raise WaitTimeout(_("Timed out after %d seconds") % timeout)
15:30:59 <ralonsoh> neutron.common.utils.WaitTimeout: Timed out after 5 seconds
15:31:01 <ralonsoh> During handling of the above exception, another exception occurred:
15:31:03 <ralonsoh> Traceback (most recent call last):
15:31:05 <ralonsoh> File "/home/zuul/src/opendev.org/openstack/neutron/neutron/tests/base.py", line 183, in func
15:31:09 <ralonsoh> return f(self, *args, **kwargs)
15:31:11 <ralonsoh> File "/home/zuul/src/opendev.org/openstack/neutron/neutron/tests/functional/agent/common/test_ovs_lib.py", line 452, in test_get_egress_min_bw_for_port
15:31:14 <ralonsoh> self._check_value(2800, self.ovs.get_egress_min_bw_for_port,
15:31:16 <ralonsoh> File "/home/zuul/src/opendev.org/openstack/neutron/neutron/tests/functional/agent/common/test_ovs_lib.py", line 160, in _check_value
15:31:19 <ralonsoh> self.fail('Expected value: %s, retrieved value: %s' %
15:31:21 <ralonsoh> File "/home/zuul/src/opendev.org/openstack/neutron/.tox/dsvm-functional/lib/python3.8/site-packages/unittest2/case.py", line 690, in fail
15:31:24 <ralonsoh> raise self.failureException(msg)
15:31:26 <ralonsoh> AssertionError: Expected value: 2800, retrieved value: 1700
15:31:28 <ralonsoh> 
15:31:30 <ralonsoh> sorry!!!
15:31:32 <ralonsoh> what I wanted to point out is the retrieved value, 1700
15:31:34 <ralonsoh> this could be due to an overloaded host
15:31:48 <slaweq> :)
15:31:57 <slaweq> wrong copy paste ;P
15:33:01 <slaweq> ralonsoh: but how can an overloaded host impact that?
15:33:29 <ralonsoh> because it cannot transmit at the requested speed
15:33:45 <slaweq> but it's not checking actual bandwidth
15:33:56 <slaweq> it's just checking what is set in ovs IMO
15:34:20 <ralonsoh> sorry! you are right
15:34:33 <ralonsoh> ok, indeed this is an error
15:34:43 <slaweq> https://github.com/openstack/neutron/blob/58c9912be0ce5d9bf9eb9e1c44b87cdf90aab452/neutron/tests/functional/agent/common/test_ovs_lib.py#L452
15:34:48 <slaweq> it failed in that line
15:35:00 <slaweq> so just "update_minimum_bandwidth_queue()"
15:35:12 <slaweq> and then wait up to 5 seconds until it is actually set
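The check that failed is a poll-until-timeout loop (`wait_until_true(part_check_value, timeout=5, sleep=1)` in the traceback). A minimal sketch of that pattern follows; it uses plain `time.sleep` instead of the eventlet-based timeout Neutron's helper uses, and `check_value()` is a hypothetical simplification of the test's `_check_value()`.

```python
import time


class WaitTimeout(Exception):
    """Raised when the polled condition does not become true in time."""


def wait_until_true(predicate, timeout=5, sleep=1):
    # Poll predicate() every `sleep` seconds until it returns True
    # or `timeout` seconds have elapsed.
    deadline = time.monotonic() + timeout
    while not predicate():
        if time.monotonic() >= deadline:
            raise WaitTimeout("Timed out after %d seconds" % timeout)
        time.sleep(sleep)


def check_value(expected, getter, *args, timeout=5, sleep=1):
    """Hypothetical _check_value()-style helper: wait until getter(*args)
    returns `expected`, otherwise fail reporting the retrieved value."""
    try:
        wait_until_true(lambda: getter(*args) == expected,
                        timeout=timeout, sleep=sleep)
    except WaitTimeout:
        raise AssertionError("Expected value: %s, retrieved value: %s"
                             % (expected, getter(*args)))
```

Polling only tolerates a value that is slow to appear; if something else keeps the stored value at 1700, every poll sees 1700 and the check fails exactly as in the log.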
15:35:42 <ralonsoh> this is the most trivial check
15:35:56 <slaweq> but maybe we should use different ports in each test
15:36:11 <slaweq> as now it seems that 1700 was set in a different test: https://github.com/openstack/neutron/blob/58c9912be0ce5d9bf9eb9e1c44b87cdf90aab452/neutron/tests/functional/agent/common/test_ovs_lib.py#L374
15:36:28 <ralonsoh> we do, we are generating a new port uuid per test
15:36:41 <slaweq> so where did 1700 come from?
15:38:12 <ralonsoh> oops, the queue number
15:38:25 <ralonsoh> maybe we need to make the queue number random
15:38:28 <ralonsoh> I'll check it
15:38:32 <slaweq> thx
15:38:50 <slaweq> queue number is always 1
15:39:00 <slaweq> it may be that there is race between those tests
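A toy model of the suspected race, assuming (as the discussion suspects, not yet confirmed) that parallel tests share one OVSDB and that the queue row is effectively identified by the queue number. The dict-based table and the `set_min_bw()`/`get_min_bw()` helpers are hypothetical and do not reflect how `ovs_lib` actually stores queues; the random queue number follows ralonsoh's suggestion, and the range used is arbitrary.

```python
import random
import uuid

# Simulated shared queue table: queue number -> QoS settings.
queues = {}


def set_min_bw(port_id, queue_num, min_kbps):
    # Hypothetical helper: every caller using the same queue_num
    # writes to the same row, whatever its port_id is.
    queues[queue_num] = {"port": port_id, "min_kbps": min_kbps}


def get_min_bw(queue_num):
    return queues[queue_num]["min_kbps"]


port_a, port_b = str(uuid.uuid4()), str(uuid.uuid4())

# Tests A and B use fresh port UUIDs but the same queue number (1).
set_min_bw(port_a, 1, 2800)  # test A writes its value
set_min_bw(port_b, 1, 1700)  # test B, running in parallel, overwrites it
print(get_min_bw(1))  # 1700: test A reads test B's value

# Possible mitigation: a random queue number per test.
queue_a, queue_b = random.sample(range(2, 65536), 2)
set_min_bw(port_a, queue_a, 2800)
set_min_bw(port_b, queue_b, 1700)
print(get_min_bw(queue_a))  # 2800
```

A per-process counter would not be enough here, because parallel test workers run in separate processes and would each start from the same value; random numbers make collisions unlikely rather than impossible.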
15:40:00 <slaweq> #action ralonsoh to check failed test_get_egress_min_bw_for_port functional test
15:40:29 <slaweq> ok, that's basically all I had for today
15:40:40 <slaweq> I really didn't find many new issues in our jobs this week
15:40:53 <bcafarel> not complaining that you did not :)
15:41:01 <lajoskatona> +1
15:41:09 <slaweq> one last thing from me for today
15:41:14 <slaweq> https://review.opendev.org/q/topic:secure-rbac+project:openstack/neutron+status:open
15:41:20 <slaweq> please review those patches
15:41:33 <slaweq> I'm pushing new UT for API policies
15:41:39 <bcafarel> slaweq++ nice
15:41:44 <slaweq> (and finding new bugs all the time :/)
15:41:54 <slaweq> so those tests are useful IMO
15:42:10 <slaweq> I know that those patches are huge but please review them :)
15:42:54 <slaweq> and that's all I have for today
15:43:04 <slaweq> do You have anything else You want to talk about today?
15:43:21 <ralonsoh> https://bugs.launchpad.net/neutron/+bug/1915341
15:43:22 <openstack> Launchpad bug 1915341 in neutron "neutron-linuxbridge-agent not starting due to nf_tables rules" [Critical,New]
15:43:30 <ralonsoh> but this could be discussed in the PTG
15:43:40 <ralonsoh> in a nutshell: this problem is related to the nft API
15:44:00 <ralonsoh> if they use legacy ebtables (same as in our CI), the problem is gone
15:44:19 <ralonsoh> I'm trying to fix it for both legacy ebtables and ebtables-nft (the new API)
15:44:25 <lajoskatona> so this is why I can't reproduce it?
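Since the bug only shows up with the nft-based ebtables, it helps to know which backend a host uses. A minimal sketch, assuming the nftables-based `ebtables` shipped with iptables 1.8+ reports `(nf_tables)` in its `--version` output; the version strings used below are illustrative, so verify the exact format on the target distro.

```python
import shutil
import subprocess


def ebtables_backend_from_version(version_output):
    """Classify an `ebtables --version` string as 'nft' or 'legacy'.

    Assumption: the nft-based binary mentions "nf_tables" in its version
    string; anything else is treated as the legacy implementation.
    """
    return "nft" if "nf_tables" in version_output else "legacy"


def detect_ebtables_backend():
    """Return 'nft', 'legacy', or None if ebtables is not installed."""
    binary = shutil.which("ebtables")
    if binary is None:
        return None
    out = subprocess.run([binary, "--version"], capture_output=True,
                         text=True, check=False)
    return ebtables_backend_from_version(out.stdout + out.stderr)
```

For example, `ebtables_backend_from_version("ebtables 1.8.7 (nf_tables)")` returns `"nft"`.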
15:44:32 <ralonsoh> probably
15:44:41 <ralonsoh> you can force the new api
15:44:42 <ralonsoh> one sec
15:44:58 <ralonsoh> https://review.opendev.org/c/openstack/neutron/+/775413/11/roles/nftables/tasks/main.yaml
15:45:08 <ralonsoh> this is the patch I'm using to test it
15:45:41 <ralonsoh> but this is just a heads-up, we'll talk about the future of linux bridge and nft in the PTG
15:45:43 <lajoskatona> thanks, I'll check it
15:45:44 <ralonsoh> I'll add a topic
15:45:48 <ralonsoh> (that's all)
15:45:57 <slaweq> thx for topic proposal
15:46:33 <slaweq> I already added something about linuxbridge agent to the etherpad
15:46:47 <slaweq> but please add Your notes to it too :)
15:47:27 <slaweq> ralonsoh: regarding bug https://bugs.launchpad.net/neutron/+bug/1915341 do You think we should have a note about it somewhere in our docs?
15:47:28 <openstack> Launchpad bug 1915341 in neutron "neutron-linuxbridge-agent not starting due to nf_tables rules" [Critical,New]
15:47:51 <ralonsoh> slaweq, yes, we should add this in the documentation
15:47:59 <ralonsoh> I'll do it
15:48:07 <slaweq> ralonsoh++ thx a lot
15:48:26 <slaweq> #action ralonsoh to update LB installation guide with info about legacy ebtables
15:49:28 <slaweq> with that I think we can finish today's meeting
15:49:47 <slaweq> thx for attending
15:49:51 <bcafarel> o/
15:49:53 <ralonsoh> bye
15:49:54 <slaweq> o/
15:49:56 <slaweq> #endmeeting