15:00:04 <slaweq> #startmeeting neutron_ci
15:00:04 <openstack> Meeting started Wed Jan 29 15:00:04 2020 UTC and is due to finish in 60 minutes.  The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:06 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:06 <slaweq> hi
15:00:08 <openstack> The meeting name has been set to 'neutron_ci'
15:00:46 <njohnston> o/
15:01:01 <slaweq> welcome to the CI meeting at the new hour and in the new room
15:01:03 <slaweq> :)
15:01:20 <bcafarel> o/
15:01:45 <ralonsoh> hi
15:02:29 <slaweq> ok, lets start now
15:02:45 <slaweq> Grafana dashboard: http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:02:51 <slaweq> please open now :)
15:02:58 <slaweq> #topic Actions from previous meetings
15:03:11 <slaweq> ralonsoh to increase log level for ovsdbapp in fullstack/functional jobs
15:03:20 <ralonsoh> one sec
15:03:32 <ralonsoh> #link https://review.opendev.org/#/c/703791/
15:03:47 <ralonsoh> it's still failing ...
15:04:00 <slaweq> ahh, this is the patch for that :)
15:04:01 <ralonsoh> I need to review how to configure properly the ml2 plugin in zuul
15:04:11 <slaweq> I commented it today again
15:04:34 <slaweq> but now I think that it will not work like that
15:04:48 <slaweq> as You need to set the proper config options in the test's setUp method
15:05:17 <slaweq> probably somewhere in https://github.com/openstack/neutron/blob/master/neutron/tests/functional/base.py
15:05:22 <slaweq> for functional tests
15:05:40 <slaweq> in those jobs we are not using config files at all
15:05:48 <ralonsoh> you are right
15:05:51 <ralonsoh> not in the FTs
15:06:06 <ralonsoh> but I should configure like this in fullstack
15:06:08 <slaweq> and in fullstack it is similar
15:06:27 <ralonsoh> ok, I'll check it later today
15:06:28 <slaweq> it's here: https://github.com/openstack/neutron/blob/master/neutron/tests/fullstack/resources/config.py
15:06:43 <ralonsoh> you are right
15:06:44 <slaweq> sorry that I didn't write it earlier
15:06:55 <ralonsoh> the agent is configured there
15:06:56 <ralonsoh> thanks!!
15:07:00 <slaweq> idk why I missed that and tried to fix Your patch in "Your way" :)
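For reference, the pattern discussed above — overriding config options in the test's setUp() instead of reading config files — looks roughly like the following. This is a minimal stdlib sketch: FakeConf is a stand-in for oslo.config's cfg.CONF (which provides set_override in neutron's real tests), and the option name is purely illustrative.

```python
import unittest


class FakeConf:
    """Minimal stand-in for oslo.config's cfg.CONF (illustrative only)."""

    def __init__(self):
        self._opts = {"debug": False}  # hypothetical option and default
        self._saved = {}

    def set_override(self, name, value):
        # Remember the original value so cleanup can restore it later.
        self._saved.setdefault(name, self._opts.get(name))
        self._opts[name] = value

    def clear_overrides(self):
        self._opts.update(self._saved)
        self._saved.clear()


CONF = FakeConf()


class ExampleFunctionalTestCase(unittest.TestCase):
    def setUp(self):
        super().setUp()
        # Functional/fullstack jobs do not read config files, so any
        # option a test needs must be overridden here, in setUp().
        CONF.set_override("debug", True)
        self.addCleanup(CONF.clear_overrides)

    def test_option_is_overridden(self):
        self.assertTrue(CONF._opts["debug"])
```

The addCleanup call mirrors what neutron tests do with clear_override: the override lives only for the duration of the test, so one test's config cannot leak into the next.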
15:07:49 <slaweq> ok, next one was:
15:07:51 <slaweq> slaweq to open bug for issue with get_dp_id in os_ken
15:08:01 <slaweq> I reported it here  https://bugs.launchpad.net/neutron/+bug/1861269
15:08:01 <openstack> Launchpad bug 1861269 in neutron "Functional tests failing due to failure with getting datapath ID from ovs" [High,Confirmed]
15:08:10 <ralonsoh> slaweq, I think I have a possible solution for this
15:08:11 <slaweq> but I didn't assign it to myself
15:08:22 <ralonsoh> #link https://review.opendev.org/#/c/704397/
15:08:30 <ralonsoh> I need to justify it
15:08:47 <ralonsoh> but the point is, although we have multithreading because of os-ken
15:09:02 <bcafarel> funny, usually in this kind of review the "fix" is add a sleep, not remove one :)
15:09:02 <ralonsoh> the ovs agent code should not give the GIL to other tasks
15:09:21 <ralonsoh> that means: do not use sleep, which will stop the thread execution
15:09:40 <ralonsoh> if other threads are not expecting the GIL then those threads won't give it back
15:10:03 <ralonsoh> I've rechecked this patch several times, no errors in FT and fullstack (related)
15:10:03 <slaweq> ralonsoh: makes sense IMO
15:10:17 <ralonsoh> I'll add a proper explanation in the patch
15:10:30 <slaweq> bcafarel: LOL, that's true, usually we need to add sleep to "fix" something :)
15:10:34 <ralonsoh> hehehhehe
15:10:44 <slaweq> ralonsoh: please link Your patch to this bug also
15:10:48 <ralonsoh> sure
15:11:04 <slaweq> thx ralonsoh
15:11:20 <bcafarel> wow that line is old, it comes directly from "Introduce Ryu based OpenFlow implementation"
15:11:28 <ralonsoh> exactly!
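The behavior ralonsoh is describing — a sleep call cooperatively handing execution to other tasks, so removing it keeps the current task running — can be sketched with stdlib asyncio. Neutron's agents actually use eventlet greenthreads, but the cooperative-yield mechanics are analogous, and all names below are illustrative.

```python
import asyncio

order = []


async def worker(name, yields):
    for i in range(2):
        order.append(f"{name}{i}")
        if yields:
            # sleep(0) suspends this task and lets any other ready task
            # run -- the cooperative analogue of eventlet.sleep(0).
            await asyncio.sleep(0)


async def main():
    # Both workers yield, so their steps interleave: a0, b0, a1, b1.
    await asyncio.gather(worker("a", yields=True), worker("b", yields=True))


asyncio.run(main())
# With yields=False each worker would instead run to completion before
# the other gets a chance to start.
```

This is why a sleep in the middle of the agent's critical path matters: each call is an invitation for every other waiting task to run first.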
15:12:31 <slaweq> ok, next one
15:12:33 <slaweq> slaweq to try to skip cleaning up neutron resources in fullstack job
15:12:53 <slaweq> I tried it locally and it was fast, about 4-5 seconds per test
15:13:03 <slaweq> so I think it's not worth doing
15:13:17 <slaweq> and I didn't send any patch
15:13:42 <ralonsoh> yeah, not much (and we might introduce new errors)
15:13:49 <slaweq> ralonsoh: exactly
15:14:06 <slaweq> so the risk of unpredictable side effects is too high IMO
15:14:46 <slaweq> ok, that was all from last week
15:14:52 <slaweq> #topic Stadium projects
15:15:17 <slaweq> as we talked yesterday, we finished dropping py2 support in Neutron
15:15:21 <slaweq> \o/
15:15:33 <ralonsoh> fantastic
15:15:37 <slaweq> so lets use etherpad https://etherpad.openstack.org/p/neutron-train-zuulv3-py27drop to track migration to zuul v3
15:16:16 <slaweq> but I have one more thing about dropping py2 support
15:16:19 <slaweq> there is patch https://review.opendev.org/#/c/704257/
15:16:24 <slaweq> for neutron-tempest-plugin
15:16:45 <slaweq> it's not working properly for Rocky jobs
15:17:32 <slaweq> and I have a question: shouldn't we first make a tag of the neutron-tempest-plugin repo and use that tag for rocky with py27
15:17:44 <slaweq> and then go with this patch to drop py27 completely?
15:17:50 <ralonsoh> exactly
15:18:02 <slaweq> or how will it work with rocky after we merge it?
15:18:14 <njohnston> that makes sense to me
15:18:16 <ralonsoh> we need to tag it first to use it in rocky tests
15:18:18 <slaweq> gmann: njohnston: am I missing something here?
15:19:30 <njohnston> I am not completely sure - gmann has been very active in this area, I believe he has a plan, but I am not sure of all the details
15:20:19 <slaweq> ok, I will ask about it in review
15:21:14 <slaweq> ok, njohnston any other updates about zuulv3 migration?
15:22:02 <njohnston> I don't have any updates; it has been accepted as an official goal for the V cycle, so we are way ahead of schedule, but it will be good to finish up because of all the reasons
15:22:32 <njohnston> there are only 3 or 4 stadium projects that have changes left
15:22:40 <slaweq> not too many
15:22:56 <njohnston> bcafarel is working on bgpvpn https://review.opendev.org/#/c/703601/
15:22:59 <bcafarel> stadium-wise there is also amotoki's question on vpnaas failing on rocky and moving to use neutron-tempest-plugin there
15:23:38 <slaweq> bcafarel: do You have any patch with failure example?
15:24:00 <bcafarel> for vpnaas? https://review.opendev.org/#/c/590569/
15:24:56 <bcafarel> also for bgpvpn I was wondering: there is an install job which does not run any tests (as per its name), should we migrate it or just drop it? I think other tests cover the "is it installable?" part
15:25:41 <njohnston> neutron-dynamic-routing also has bcafarel's magic touch https://review.opendev.org/#/c/703582/ ; I don't see anything zuulv3 related for networking-odl, networking-midonet, neutron-vpnaas
15:27:22 <njohnston> thats it for me
15:27:32 <slaweq> thx njohnston
15:27:41 <slaweq> speaking about this vpnaas rocky issue
15:27:59 <slaweq> am I understanding correctly that if we pinned the tempest used for rocky then it would be fine?
15:28:33 <slaweq> or should we use for the rocky branch a job defined in the neutron-tempest-plugin repo (like for master now)?
15:29:56 <njohnston> I'll defer on that to gmann
15:30:24 <slaweq> ok, I will talk with him about it
15:30:41 <slaweq> #action slaweq to talk with gmann about vpnaas jobs on rocky
15:30:53 * slaweq starts hating rocky branch now
15:31:02 <bcafarel> :)
15:31:18 * njohnston welcomes slaweq to the club
15:31:20 <bcafarel> "old but not enough" branch
15:31:40 <slaweq> lol, that's true
15:31:57 <njohnston> I have had too many backports that go "train: green; stein: green; rocky: RED; queens: green"
15:31:58 <slaweq> good news is that it's just few more weeks and it will be EM
15:32:55 <slaweq> ok, lets move on
15:33:09 <slaweq> or do You have anything else related to stadium for today?
15:33:20 <njohnston> nope, nothing else
15:34:09 <slaweq> ok, so lets move on
15:34:11 <slaweq> #topic Grafana
15:34:20 <slaweq> #link http://grafana.openstack.org/d/Hj5IHcSmz/neutron-failure-rate?orgId=1
15:34:52 <slaweq> from what I can say, we are much better with scenario jobs now
15:35:08 <slaweq> but our biggest problems are fullstack/functional jobs
15:35:12 <slaweq> and grenade jobs
15:35:57 <slaweq> and the biggest issue from those is functional job
15:36:05 <njohnston> yep
15:37:54 <slaweq> and I think that we are missing some ovn related job on the dashboard now
15:38:05 <slaweq> I will check that and update dashboard if needed
15:38:20 <slaweq> #action slaweq to update grafana dashboard with missing jobs
15:38:42 <bcafarel> are all the jobs in? I think I saw some reviews on functional ovn (at least)
15:39:10 <slaweq> but functional tests will be run together with our "old" functional job I think
15:39:22 <bcafarel> ah ok :)
15:39:51 <slaweq> anything else related to grafana for today?
15:40:29 <slaweq> ok, so let's move on then
15:40:40 <slaweq> #topic fullstack/functional
15:40:53 <slaweq> I have few examples of failures in functional job
15:41:10 <slaweq> first     again ovsdbapp command timeouts:
15:41:12 <slaweq> https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_abb/704530/4/gate/neutron-functional/abbc532/testr_results.html
15:41:14 <slaweq> https://ce2d3847c3fd9d644a91-e099c5c03695c7198c297e75ec3f8d05.ssl.cf2.rackcdn.com/704240/3/gate/neutron-functional/ab28823/testr_results.html
15:41:21 <slaweq> but I know ralonsoh is on it already
15:41:38 <slaweq> so it's just to point to the new examples of this issue
15:42:01 <ralonsoh> yes, let's see if we can have more information with the patch uploaded
15:42:26 <ralonsoh> but those are the main problems we see in FT and fullstack
15:42:31 <ralonsoh> 1) ovsdb timeouts
15:42:40 <ralonsoh> 2) the os-ken datapath timeout
15:42:46 <ralonsoh> 3) pyroute timeouts
15:42:57 <ralonsoh> (did I say "timeout" before?)
15:43:11 <slaweq> lol
15:43:26 <slaweq> yeah, timeouts are our biggest nightmare now :/
15:43:29 <njohnston> lol
15:43:52 <slaweq> but it seems logical that removing "sleep" from code may solve timeouts :P
15:44:33 <slaweq> ok, next one then (this one is new for me, at least I don't remember anything like that)
15:44:38 <slaweq> failure in neutron.tests.functional.agent.linux.test_linuxbridge_arp_protect.LinuxBridgeARPSpoofTestCase.test_arp_correct_protection_allowed_address_pairs
15:44:46 <slaweq> https://4131d9f319da782ce250-f8398ccf7503ce4fb23659d29292afec.ssl.cf2.rackcdn.com/694568/16/check/neutron-functional/4fe1d73/controller/logs/dsvm-functional-logs/neutron.tests.functional.agent.linux.test_linuxbridge_arp_protect.LinuxBridgeARPSpoofTestCase.test_arp_correct_protection_allowed_address_pairs.txt
15:44:59 <slaweq> there are errors like     2020-01-29 09:11:10.172 22333 ERROR ovsdbapp.backend.ovs_idl.vlog [-] tcp:127.0.0.1:6640: error parsing update: ovsdb error: Modify non-existing row: ovs.db.error.Error: ovsdb error: Modify non-existing row
15:45:26 <ralonsoh> I have no idea in this one
15:45:46 <ralonsoh> how, a Linux Bridge test, is hitting an OVS error??
15:45:50 * njohnston looks for otherwiseguy
15:45:51 <bcafarel> some race condition in test because of our timeout friend?
15:45:52 <slaweq> me neither, but I wonder why ovsdbapp is used in those Linuxbridge tests
15:46:00 <ralonsoh> that's the point
15:46:03 <ralonsoh> or LB or OVS
15:46:17 <bcafarel> oh
15:47:01 <slaweq> and here is error in test: https://4131d9f319da782ce250-f8398ccf7503ce4fb23659d29292afec.ssl.cf2.rackcdn.com/694568/16/check/neutron-functional/4fe1d73/testr_results.html
15:47:15 <slaweq> so it seems that it failed on preparation of test env
15:48:18 <ralonsoh> slaweq, that seems an error from a previous test
15:48:29 <ralonsoh> and a blocked greenlet thread
15:48:40 <ralonsoh> maybe it's too late
15:48:54 <ralonsoh> but the use of greenthreads, IMO, was not a good option
15:49:24 <ralonsoh> (remember CPython does NOT have truly parallel multithreading, because of the GIL)
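ralonsoh's caveat about greenthreads can also be demonstrated: in cooperative concurrency, a task that blocks without yielding stalls every other task. Below is a stdlib asyncio sketch of that failure mode — again only an analogy for eventlet greenthreads, with illustrative names.

```python
import asyncio
import time

events = []


async def blocking_task():
    events.append("block-start")
    # A truly blocking call (no await) never returns control to the
    # event loop, so no other task can run until it finishes.
    time.sleep(0.05)
    events.append("block-end")


async def other_task():
    events.append("other")


async def main():
    await asyncio.gather(blocking_task(), other_task())


asyncio.run(main())
# other_task only runs after blocking_task has finished completely,
# even though both were scheduled "concurrently".
```

This is the shape of a blocked-greenlet hang: one task holding the interpreter without yielding looks, from the outside, like every other task timing out.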
15:49:28 <slaweq> but that's the only failed test in this job
15:49:35 <ralonsoh> I know...
15:50:56 <slaweq> let's see if we get more such issues; nobody saw it before, so maybe it will never happen again ;)
15:51:09 * slaweq doesn't even believe himself
15:51:19 <ralonsoh> hahahah
15:51:36 <njohnston> :-D
15:51:55 <slaweq> and I have one more, like: https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_00e/701733/24/check/neutron-functional/00ede7c/testr_results.html
15:52:01 * bcafarel is not betting on it either
15:52:08 <slaweq> and this one I think I saw at least twice this week
15:52:19 <ralonsoh> no no
15:52:23 <ralonsoh> this is not a problem
15:52:35 <slaweq> no?
15:52:37 <slaweq> why?
15:52:40 <ralonsoh> that was related to an error in the OVN functional tests
15:52:46 <ralonsoh> but that's solved now
15:52:48 <ralonsoh> one sec
15:53:02 <ralonsoh> (also I pushed a DNM patch to test this)
15:53:07 <bcafarel> a sorting order issue right?
15:53:19 <slaweq> ahh, so it's related to ovn migration, right?
15:53:36 <ralonsoh> https://review.opendev.org/#/c/701733/24..26
15:53:43 <ralonsoh> please, read the diff and you'll understand
15:53:55 <ralonsoh> --> https://review.opendev.org/#/c/701733/24..26/neutron/tests/functional/plugins/ml2/drivers/ovn/mech_driver/ovsdb/test_maintenance.py
15:55:35 <ralonsoh> BTW, that problem in FTs was tested in https://review.opendev.org/#/c/704376
15:56:11 <slaweq> so it was trying to use test_extensions.setup_extensions_middleware(sg_mgr) as security groups api, instead of "normal" one, right?
15:56:42 <ralonsoh> exactly
15:56:51 <ralonsoh> that was needed in networking-ovn
15:56:57 <ralonsoh> but NOT in neutron repo
15:57:05 <ralonsoh> if you are using the basetest class
15:57:09 <slaweq> ok, good that it's not "yet another new issue with functional tests" :)
15:57:16 <ralonsoh> nonono
15:57:21 <slaweq> thx ralonsoh :)
15:57:24 <ralonsoh> yw!
15:57:55 <slaweq> so maybe something similar will be needed to fix failures like https://19dc65f6cfdf56a6f70b-c96c299047b55dcdeaefef8e344ceab6.ssl.cf5.rackcdn.com/702397/7/check/neutron-functional/8ca93bc/testr_results.html
15:58:07 <slaweq> I saw it also only in ovn related patches
15:58:30 <slaweq> and it seems that the needed route is simply not loaded in neutron
15:58:37 <ralonsoh> yes, first we need "part 1" patch
15:58:45 <ralonsoh> then we will handle "part 2"
15:58:51 <slaweq> ok
15:58:53 <slaweq> thx
15:58:59 <slaweq> so those 2 from my list are fine then
15:59:19 <slaweq> for fullstack tests I saw such failure:
15:59:21 <slaweq> https://7994d6b1b4a3fac76e83-9707ce74906f3f341f743e6035ad1064.ssl.cf5.rackcdn.com/704397/2/check/neutron-fullstack/d44b482/controller/logs/dsvm-fullstack-logs/TestPlacementBandwidthReport.test_configurations_are_synced_towards_placement_Open-vSwitch-agent_/neutron-server--2020-01-29--00-20-32-648918_log.txt
15:59:28 <slaweq> it's issue with connection to placement service
16:00:00 <slaweq> I will ask rubasov and lajoskatona tomorrow to take a look at it
16:00:05 <slaweq> maybe they can help with this
16:00:16 <slaweq> and we are out of time now
16:00:22 <slaweq> thx for the meeting guys
16:00:26 <ralonsoh> they have experience on this
16:00:29 <slaweq> see You around
16:00:29 <ralonsoh> bye!!
16:00:32 <slaweq> o/
16:00:34 <slaweq> #endmeeting