16:00:16 <slaweq> #startmeeting neutron_ci
16:00:16 <openstack> Meeting started Tue Jun 18 16:00:16 2019 UTC and is due to finish in 60 minutes.  The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:17 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:20 <openstack> The meeting name has been set to 'neutron_ci'
16:00:21 <slaweq> hi (again)
16:00:22 <mlavalle> o/
16:00:27 <njohnston> o/
16:00:55 <slaweq> Agenda for the meeting is on https://github.com/openstack/neutron/tree/master/neutron
16:01:04 <slaweq> #undo
16:01:12 <slaweq> Agenda for the meeting is on https://etherpad.openstack.org/p/neutron-ci-meetings
16:01:12 <bcafarel> hi
16:01:14 <ralonsoh> hi
16:01:54 <slaweq> Grafana dashboard: http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:02:09 <slaweq> #topic Actions from previous meetings
16:02:19 <slaweq> we have only one action from previous week
16:02:26 <slaweq> mlavalle to debug neutron-tempest-plugin-dvr-multinode-scenario failures (bug 1830763) reproducing most common failure: test_connectivity_through_2_routers
16:02:27 <openstack> bug 1830763 in neutron "Debug neutron-tempest-plugin-dvr-multinode-scenario failures" [High,Confirmed] https://launchpad.net/bugs/1830763 - Assigned to Miguel Lavalle (minsel)
16:02:38 <mlavalle> I am currently working on this
16:03:13 <slaweq> we recently had a similar (maybe the same) issue in our D/S CI on a release based on Stein
16:03:15 <mlavalle> just 10 minutes ago, I was able to get a successful execution of that test case in my local environment, with a pdb break at the end of the test
16:04:23 <bcafarel> sorry, I may have a spotty connection, will try to follow
16:04:24 <mlavalle> and it is easy to reproduce a failure, where I will also set a pdb break before it tears down. so now I will be able to run tcpdump and compare iptables to see where the problem is
16:04:34 <mlavalle> that's the status at this point
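For context, a minimal self-contained sketch of the debugging approach described above: pause the test with pdb just before any teardown runs, so tcpdump and iptables can be inspected while the topology still exists (the test body here is an illustrative stand-in, not the real tempest code):

    import pdb


    def check_connectivity():
        # stand-in for the real connectivity check between the two routers
        return True


    def test_connectivity_through_2_routers():
        assert check_connectivity()
        # While execution is stopped here, run tcpdump and iptables-save
        # in another shell against the still-existing router namespaces,
        # and diff the output of a passing run against a failing one.
        pdb.set_trace()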
16:05:11 <slaweq> great - I may ping You about this during the week to see if You found something and if we can use it in our d/s :)
16:05:24 <mlavalle> yeap, cool
16:05:33 <slaweq> thx a lot for working on it mlavalle :)
16:05:46 <slaweq> #action mlavalle to debug neutron-tempest-plugin-dvr-multinode-scenario failures (bug 1830763) reproducing most common failure: test_connectivity_through_2_routers
16:05:47 <openstack> bug 1830763 in neutron "Debug neutron-tempest-plugin-dvr-multinode-scenario failures" [High,Confirmed] https://launchpad.net/bugs/1830763 - Assigned to Miguel Lavalle (minsel)
16:05:57 <slaweq> I will assign it for next week just as a reminder, ok?
16:06:03 <mlavalle> of course
16:06:05 <slaweq> thx
16:06:20 <mlavalle> keep the pressure on that guy
16:06:29 <slaweq> sure :)
16:06:47 <slaweq> anything else You want to add/ask regarding actions from last week?
16:07:04 <mlavalle> not me
16:07:26 <slaweq> ok, let's move on then
16:07:30 <slaweq> #topic Stadium projects
16:07:39 <slaweq> Python 3 migration
16:07:41 <slaweq> Stadium projects etherpad: https://etherpad.openstack.org/p/neutron_stadium_python3_status
16:08:33 <njohnston> Train-2 is when I was planning on getting really serious about this, so I am gearing up to work on it later this week
16:08:46 <slaweq> thx njohnston
16:09:03 <slaweq> I also started sending some patches and I will try to continue this during the next weeks
16:09:47 <slaweq> actually, looking at the etherpad, there are not many projects left to switch
16:10:08 <mlavalle> \o/
16:10:24 <slaweq> this week I will push neutron-fwaas patches
16:10:45 <slaweq> #action slaweq to send patch to switch neutron-fwaas to python 3
16:10:56 <slaweq> ^^ this way I will have more pressure to do it :)
16:11:15 <mlavalle> yeah let's also keep pressure on that other lazy guy
16:11:21 <slaweq> LOL
16:11:26 <njohnston> +1
16:11:43 <slaweq> but I think I'm the only lazy guy here :P
16:12:00 <slaweq> ok, let's move on then
16:12:02 <slaweq> tempest-plugins migration
16:12:04 <slaweq> Etherpad: https://etherpad.openstack.org/p/neutron_stadium_move_to_tempest_plugin_repo
16:13:15 <njohnston> I'll take another look at fwaas later this week as well
16:13:32 <slaweq> here we are good with networking-sfc (thx bcafarel) and almost done with networking-bgpvpn
16:14:03 <slaweq> we still need to do vpnaas, fwaas and networking-dynamic-routing
16:14:11 <mlavalle> I have slacked off with vpnaas
16:14:19 <mlavalle> hoping to get back to it soon
16:14:49 <slaweq> sure mlavalle - that is not urgent for sure - more like "nice to have" :)
16:15:15 <slaweq> I would rather finish the python 3 transition first if I had to choose
16:16:14 <slaweq> anything else anyone wants to add regarding stadium projects?
16:16:24 <mlavalle> not me
16:16:45 <njohnston> I wonder if once we're done we could double-check the migration by adding a DNM change that deletes the python 2.7 version and see how all our jobs do
16:17:26 <slaweq> the python 2.7 version of what? the python binaries on the host?
16:17:31 <njohnston> yes
16:18:01 <njohnston> just to make absolutely sure we have everything
16:18:09 <slaweq> njohnston: yes, that is a good idea and we can also do it for neutron jobs
16:19:06 <slaweq> njohnston: can You add such a note at the top of the etherpad maybe?
16:19:13 <slaweq> so we don't forget about it
16:19:36 <njohnston> sure thing!
16:19:40 <slaweq> thx njohnston
16:19:45 <slaweq> ok, let's move on
16:19:48 <slaweq> next topic
16:19:50 <slaweq> #topic Grafana
16:19:52 <slaweq> http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:20:37 <slaweq> during the last couple of days our check queue was broken (neutron-tempest-plugin-scenario-linuxbridge job) so there is not much in the gate queue graphs
16:21:10 <slaweq> now it's getting back to normal thx to the fix from liuyulong
16:21:37 <slaweq> also rally job was failing quite often
16:21:49 <mlavalle> because of server creation?
16:21:55 <slaweq> yes, and that should be now fixed with https://review.opendev.org/#/c/665614/
16:22:16 <slaweq> and it's going back to low numbers today
16:23:00 <mlavalle> cool
16:23:24 <slaweq> from other things, the networking-ovn job is almost switched to voting - njohnston can You update the dashboard when it is merged in neutron?
16:23:35 <njohnston> will do
16:23:45 <slaweq> thx a lot
16:23:49 <njohnston> #action njohnston update dashboard when ovn job becomes voting
16:24:28 <slaweq> and last note from me about grafana - functional and fullstack jobs are still quite unstable :/
16:25:06 <slaweq> I have a couple of examples ready for the later part of the meeting :)
16:25:32 <slaweq> do You see anything else strange/worth discussing on grafana?
16:25:53 <mlavalle> not from me
16:26:48 <slaweq> ok, so let's talk about these unstable jobs then
16:26:50 <slaweq> #topic fullstack/functional
16:26:54 <slaweq> first functional tests
16:27:16 <slaweq> the most frequent failures during the last week were in neutron.tests.functional.agent.linux.test_ip_lib.IpMonitorTestCase tests, like:
16:27:22 <slaweq> http://logs.openstack.org/36/662236/3/check/neutron-functional/3214fdd/testr_results.html.gz
16:27:32 <ralonsoh> sorry for that
16:27:33 <ralonsoh> https://review.opendev.org/#/c/664889/
16:27:35 <slaweq> but that should be fixed with patch from ralonsoh https://review.opendev.org/#/c/664889/
16:27:46 <slaweq> ralonsoh: You don't need to be sorry :)
16:28:00 <slaweq> thx for quick fix
16:28:35 <slaweq> other than that I saw some failures which happened once (at least I didn't find more examples)
16:28:44 <slaweq> e.g. tests from module neutron.tests.functional.test_server
16:28:50 <slaweq> http://logs.openstack.org/12/640812/2/check/neutron-functional/35dc53f/testr_results.html.gz
16:30:27 <slaweq> looking at logs from those tests, I see http://logs.openstack.org/12/640812/2/check/neutron-functional/35dc53f/controller/logs/dsvm-functional-logs/neutron.tests.functional.test_server.TestWsgiServer.test_restart_wsgi_on_sighup_multiple_workers.txt.gz#_2019-06-17_20_37_09_085
16:30:43 <slaweq> do You know if it is normal to send SIG_UNBLOCK there?
16:31:35 <mlavalle> I don't know
16:31:42 <slaweq> ok, in another test (which passed) it's exactly the same
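For context on the question above: SIG_UNBLOCK is not a signal that gets sent; it is an action for the thread signal mask, and unblocking signals is routine in oslo.service-style servers. A minimal illustration of what such a log line usually corresponds to:

    import signal

    # SIG_UNBLOCK removes entries from the set of blocked signals so that
    # pending signals, e.g. the SIGHUP used to trigger a worker restart,
    # can be delivered to the process again.
    signal.pthread_sigmask(signal.SIG_UNBLOCK, {signal.SIGHUP, signal.SIGTERM})

Seeing the same line in both passing and failing runs, as noted above, is consistent with it being routine.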
16:32:45 <slaweq> any ideas about why it could fail?
16:34:19 <mlavalle> hard to say
16:34:30 <mlavalle> without digging deeper
16:34:39 <slaweq> I know :/
16:34:49 <slaweq> if that happens again, I will open a bug for it
16:35:21 <mlavalle> where is the signal coming from?
16:35:46 <slaweq> it's from test: https://github.com/openstack/neutron/blob/master/neutron/tests/functional/test_server.py#L133
16:35:58 <slaweq> it failed exactly at L159
16:36:57 <mlavalle> ahh, it's waiting
16:37:09 <slaweq> yes, 5 seconds
16:37:16 <slaweq> should be enough IMO
16:37:29 <mlavalle> yes, looks like plenty
16:37:46 <ralonsoh> it takes 8 secs to stop the process
16:37:55 <ralonsoh> from 37:09 to 37:17
16:38:25 <slaweq> ralonsoh: so do You think it could be "just" an overloaded node?
16:38:35 <ralonsoh> maybe, I'll take a look at this one
16:38:45 <ralonsoh> I'll open a low priority bug, just to track it
16:38:46 <slaweq> thx ralonsoh
16:38:47 <ralonsoh> ok?
16:38:53 <slaweq> sure, that is a good idea
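For context, the pattern under discussion is: send SIGHUP, then poll with a fixed timeout for the workers to restart. A minimal self-contained sketch (the 5 second budget matches the test; the forked child is an illustrative stand-in for the real WSGI worker, so on a node that needs ~8 seconds to restart, the poll below would time out and fail):

    import os
    import signal
    import time


    def wait_until_true(predicate, timeout=5, sleep=0.5):
        # poll until predicate() is true or the timeout expires
        deadline = time.time() + timeout
        while time.time() < deadline:
            if predicate():
                return
            time.sleep(sleep)
        raise RuntimeError('timed out waiting for condition')


    pid = os.fork()
    if pid == 0:
        # child: exit cleanly when SIGHUP arrives
        signal.signal(signal.SIGHUP, lambda *args: os._exit(0))
        time.sleep(60)
        os._exit(1)

    time.sleep(0.2)  # give the child time to install its handler
    os.kill(pid, signal.SIGHUP)
    wait_until_true(lambda: os.waitpid(pid, os.WNOHANG)[0] == pid)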
16:39:16 <slaweq> next one on my list is: neutron.tests.functional.agent.linux.test_keepalived.KeepalivedManagerTestCase
16:39:23 <slaweq> http://logs.openstack.org/48/665548/4/check/neutron-functional/f6d6447/testr_results.html.gz
16:40:20 <ralonsoh> not only this one but I've seen this error in other tests
16:40:29 <ralonsoh> the namespace is not created
16:40:38 <slaweq> logs from this one are in:
16:40:40 <slaweq> http://logs.openstack.org/48/665548/4/check/neutron-functional/f6d6447/controller/logs/dsvm-functional-logs/neutron.tests.functional.agent.linux.test_keepalived.KeepalivedManagerTestCase.test_keepalived_spawns_conflicting_pid_base_process.txt.gz#_2019-06-17_10_02_30_829
16:42:02 <ralonsoh> Just a guess, but maybe it's something in pyroute, I'll review the latest patches
16:42:31 <slaweq> so I will also report it as a bug to track those issues
16:42:41 <slaweq> ok?
16:42:44 <ralonsoh> +1
16:43:14 <slaweq> what is strange is that many commands in this namespace were executed earlier: http://logs.openstack.org/48/665548/4/check/neutron-functional/f6d6447/controller/logs/journal_log.txt.gz#_Jun_17_10_01_41
16:44:12 <slaweq> I will open a bug and try to look deeper into this
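One way to narrow this down while reproducing, assuming neutron's ip_lib helpers (a sketch, not the current test code): assert that the namespace still exists right before keepalived is spawned, so the test fails with a precise message instead of a generic command error:

    from neutron.agent.linux import ip_lib

    # 'namespace' is the namespace the test created earlier; failing fast
    # here would pinpoint whether it disappeared before the spawn.
    if not ip_lib.network_namespace_exists(namespace):
        raise RuntimeError('namespace %s disappeared before keepalived '
                           'spawn' % namespace)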
16:44:35 <slaweq> #action slaweq to open bug related to missing namespace issue in functional tests
16:44:56 <slaweq> #action ralonsoh to open bug related to failed test_server functional tests
16:45:16 <slaweq> and last one on my list
16:45:18 <slaweq> neutron.tests.functional.agent.linux.test_linuxbridge_arp_protect.LinuxBridgeARPSpoofTestCase
16:45:30 <slaweq> http://logs.openstack.org/40/665640/4/check/neutron-functional-python27/e1df845/testr_results.html.gz
16:46:08 <slaweq> in the logs for this test there is not much http://logs.openstack.org/40/665640/4/check/neutron-functional-python27/e1df845/controller/logs/dsvm-functional-logs/neutron.tests.functional.agent.linux.test_linuxbridge_arp_protect.LinuxBridgeARPSpoofTestCase.test_arp_protection_dead_reference_removal.txt.gz
16:46:10 <slaweq> :/
16:46:42 <slaweq> and there should be more there
16:48:18 <ralonsoh> slaweq, maybe (I need to check those test cases better) we should execute them in temp namespaces
16:48:31 <ralonsoh> slaweq, I don't think we are doing this now
16:48:38 <ralonsoh> (if it's possible)
16:48:52 <slaweq> ralonsoh: but You mean to run "tox" in a namespace?
16:48:57 <ralonsoh> no no
16:49:07 <ralonsoh> like in the ip_lib commands
16:49:15 <ralonsoh> create everything in a namespace
16:49:24 <ralonsoh> a temp one, to avoid interference
16:49:31 <slaweq> ahh, ok
16:49:36 <ralonsoh> it's the best way to isolate a test case
16:49:39 <slaweq> yes, we should probably
16:49:42 <ralonsoh> (if possible)
16:49:49 <ralonsoh> ok I'll take this one
16:49:51 <mlavalle> that makes sense
16:50:01 <slaweq> but are You talking about these linuxbridge arp tests? or the previous one?
16:50:11 <ralonsoh> LB arp
16:50:22 <slaweq> yes, that we should run in namespaces if possible IMO
16:50:59 <slaweq> ralonsoh: so You will take care of it, right?
16:51:00 <ralonsoh> I'll create a bug for this
16:51:02 <ralonsoh> yes
16:51:19 <slaweq> great, thx a lot
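A minimal sketch of the isolation ralonsoh proposes, assuming the NamespaceFixture pattern from neutron.tests.common.net_helpers (the fixture API and base class are assumptions, not verified against the current tree):

    from neutron.tests.common import net_helpers
    from neutron.tests.functional import base as functional_base


    class LinuxBridgeARPSpoofTestCase(functional_base.BaseSudoTestCase):
        def setUp(self):
            super(LinuxBridgeARPSpoofTestCase, self).setUp()
            # every test gets its own throwaway network namespace, so
            # devices left behind by other tests cannot interfere; the
            # fixture removes the namespace on cleanup
            self.namespace = self.useFixture(
                net_helpers.NamespaceFixture()).name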
16:51:31 <slaweq> so that would be all regarding functional tests from me
16:51:43 <slaweq> do You have anything else?
16:51:47 <mlavalle> not me
16:52:13 <slaweq> ok, let's quickly go through fullstack issues
16:52:22 <slaweq> first one (I spotted it at least twice)
16:52:24 <slaweq> neutron.tests.fullstack.test_l3_agent.TestLegacyL3Agent.test_north_south_traffic
16:52:29 <slaweq> http://logs.openstack.org/11/662111/21/check/neutron-fullstack/c0f8029/testr_results.html.gz
16:52:31 <slaweq> http://logs.openstack.org/29/664629/1/check/neutron-fullstack/e99aa1c/testr_results.html.gz
16:53:52 <slaweq> there are a lot of errors in the ovs agent logs: http://logs.openstack.org/11/662111/21/check/neutron-fullstack/c0f8029/controller/logs/dsvm-fullstack-logs/TestLegacyL3Agent.test_north_south_traffic/neutron-openvswitch-agent--2019-06-18--01-52-58-123455_log.txt.gz?level=ERROR
16:55:36 <haleyb> CRITICAL neutron [req-c0ac0a6c-e65a-46c3-a123-82239299ec08 - - - - -] Unhandled error: RuntimeError: No datapath_id on bridge br-ethaa0e64ff9
16:55:56 <slaweq> and the same errors in the second case http://logs.openstack.org/11/662111/21/check/neutron-fullstack/c0f8029/controller/logs/dsvm-fullstack-logs/TestLegacyL3Agent.test_north_south_traffic/neutron-openvswitch-agent--2019-06-18--01-52-58-123455_log.txt.gz?level=ERROR
16:56:06 <slaweq> sorry, this is the same link as above
16:56:42 <haleyb> http://logs.openstack.org/11/662111/21/check/neutron-fullstack/c0f8029/controller/logs/dsvm-fullstack-logs/TestLegacyL3Agent.test_north_south_traffic/neutron-openvswitch-agent--2019-06-18--01-52-58-123455_log.txt.gz?#_2019-06-18_01_53_52_665
16:56:46 <slaweq> http://logs.openstack.org/29/664629/1/check/neutron-fullstack/e99aa1c/controller/logs/dsvm-fullstack-logs/TestLegacyL3Agent.test_north_south_traffic/neutron-openvswitch-agent--2019-06-18--02-04-44-968982_log.txt.gz?level=ERROR
16:56:56 <slaweq> that is the log from the second example
16:57:00 <slaweq> and the same error
16:57:03 <haleyb> so right before that the bridge was recreated
16:57:37 <slaweq> does anyone want to volunteer to check this issue deeper?
16:57:52 <slaweq> looks like that can be a "real" bug, not only a test issue
16:58:03 <slaweq> *might be
16:58:44 <mlavalle> too much on my plate right now. maybe next week
16:58:55 <slaweq> ok, so I will report this bug for now and we will see later
16:59:02 <mlavalle> +1
16:59:11 <slaweq> mlavalle: sure, I also will not have time to work on it this week
16:59:34 <slaweq> #action slaweq to report bug regarding failing neutron.tests.fullstack.test_l3_agent.TestLegacyL3Agent.test_north_south_traffic tests
16:59:44 <slaweq> ok, and I think we are out of time now
16:59:48 <slaweq> thx for attending
16:59:51 <mlavalle> o/
16:59:52 <slaweq> see You next week
16:59:54 <slaweq> o/
16:59:56 <njohnsto_> o/
16:59:56 <slaweq> #endmeeting