16:00:04 <slaweq> #startmeeting neutron_ci
16:00:09 <openstack> Meeting started Tue May 28 16:00:04 2019 UTC and is due to finish in 60 minutes.  The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:10 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:11 <njohnston_> o/
16:00:12 <openstack> The meeting name has been set to 'neutron_ci'
16:00:15 <slaweq> hi
16:00:19 <mlavalle> o/
16:01:19 <ralonsoh> hi
16:01:29 <slaweq> ok, lets start
16:01:37 <slaweq> Grafana dashboard: http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:01:44 <slaweq> please open for later :)
16:01:59 <slaweq> #topic Actions from previous meetings
16:02:09 <slaweq> #undo
16:02:10 <openstack> Removing item from minutes: #topic Actions from previous meetings
16:02:14 <slaweq> sorry, I forgot
16:02:18 <mlavalle> LOL
16:02:21 <slaweq> agenda for today's meeting
16:02:23 <slaweq> #link https://etherpad.openstack.org/p/neutron-ci-meetings
16:02:28 <slaweq> and now lets start
16:02:31 <slaweq> #topic Actions from previous meetings
16:02:42 <slaweq> mlavalle to continue debugging reasons of neutron-tempest-plugin-dvr-multinode-scenario failures
16:03:00 <mlavalle> I didn't devote as much time as I wanted but I made some progress
16:03:58 <mlavalle> looking at many patches, one common test case that fails is test_connectivity_through_2_routers
16:04:12 <mlavalle> so I filed a bug: https://bugs.launchpad.net/neutron/+bug/1830763
16:04:13 <openstack> Launchpad bug 1830763 in neutron "Debug neutron-tempest-plugin-dvr-multinode-scenario failures" [High,Confirmed] - Assigned to Miguel Lavalle (minsel)
16:04:19 <mlavalle> assigned it to myself
16:04:37 <slaweq> it's a test written by me :/
16:04:46 <mlavalle> and added a Kibana query: http://logstash.openstack.org/#/dashboard/file/logstash.json?query=message:%5C%22test_connectivity_through_2_routers%5C%22%20AND%20build_status:%5C%22FAILURE%5C%22%20AND%20build_branch:%5C%22master%5C%22%20AND%20build_name:%5C%22neutron-tempest-plugin-dvr-multinode-scenario%5C%22%20AND%20project:%5C%22openstack%2Fneutron%5C%22
16:05:06 <mlavalle> so I will be focusing on this one over the next few days
16:05:21 <mlavalle> that's all I have to say about this
16:05:40 <slaweq> ok, thx for update mlavalle
16:05:56 <slaweq> do You mind if I assign it to You as an action for next week?
16:06:08 <mlavalle> please do
16:06:19 <slaweq> #action mlavalle to debug neutron-tempest-plugin-dvr-multinode-scenario failures (bug 1830763)
16:06:20 <openstack> bug 1830763 in neutron "Debug neutron-tempest-plugin-dvr-multinode-scenario failures" [High,Confirmed] https://launchpad.net/bugs/1830763 - Assigned to Miguel Lavalle (minsel)
16:06:21 <slaweq> thx
16:06:35 <slaweq> ok, so next one
16:06:36 <slaweq> mlavalle to talk with nova folks about slow responses for metadata requests
16:06:46 <mlavalle> so I decided not to
16:07:09 <mlavalle> I analyzed some logs and the code in detail
16:07:29 <mlavalle> and after a conversation with slaweq, we decided that the problem doesn't seem to be on the Nova side
16:07:39 <slaweq> yes
16:07:44 <slaweq> I agree :)
16:09:50 <slaweq> mlavalle: do You want to explain what You found in logs there?
16:10:19 <mlavalle> correlating the code with the logs, we found that the time elapsed
16:10:42 <mlavalle> between sending the request for keypairs to Nova and getting the response was less than 2 secs
16:11:06 <mlavalle> that's it in a nutshell
16:11:44 <slaweq> but from the VM's PoV this request (probably) takes more than 10 seconds and that's why it fails
16:11:57 <mlavalle> yeap
16:12:32 <slaweq> some time ago I started a patch to add a zuul role to fetch the journal log: https://review.opendev.org/#/c/643733/
16:12:33 <patchbot> patch 643733 - zuul/zuul-jobs - Add role to fetch journal log from test node - 3 patch sets
16:12:42 <slaweq> but I never had time to work on this
16:13:02 <slaweq> I respun this patch today as it may help with this issue also
16:13:15 <slaweq> because e.g. haproxy logs are probably in the journal log
16:13:46 <mlavalle> yeah
16:13:50 <mlavalle> good idea
16:13:56 <slaweq> so I will assign this to myself as an action for next week :)
16:14:14 <slaweq> #action slaweq to continue work on fetch-journal-log zuul role
16:14:27 <slaweq> that way I will force myself to spend some time on it :)
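For context, the change under review is an Ansible role in zuul-jobs; below is a rough Python sketch of what such a role effectively does on the test node (the file path, options and function name are illustrative only, not taken from the patch):

    # Rough sketch of grabbing the systemd journal so it can be attached
    # to CI job artifacts; the real change is an Ansible role in zuul-jobs.
    import gzip
    import subprocess

    def export_journal(dest="/tmp/journal.log.gz"):
        # journalctl dumps the full journal (including units such as
        # haproxy that log to journald) to stdout.
        out = subprocess.run(
            ["journalctl", "--no-pager", "-o", "short-precise"],
            capture_output=True, text=True, check=True,
        ).stdout
        with gzip.open(dest, "wt") as f:
            f.write(out)
        return dest

    if __name__ == "__main__":
        print(export_journal())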
16:14:43 <slaweq> ok, lets move forward
16:14:46 <slaweq> next one
16:14:48 <slaweq> slaweq to reopen bug related to failures of neutron.tests.fullstack.test_l3_agent.TestHAL3Agent.test_ha_router_restart_agents_no_packet_lost
16:14:56 <slaweq> Done, bug https://bugs.launchpad.net/neutron/+bug/1798475 reopened
16:14:58 <openstack> Launchpad bug 1798475 in neutron "Fullstack test test_ha_router_restart_agents_no_packet_lost failing" [High,Confirmed] - Assigned to LIU Yulong (dragon889)
16:15:05 <slaweq> I also sent a patch to mark this test as unstable again https://review.opendev.org/#/c/660592/
16:15:05 <patchbot> patch 660592 - neutron - Mark fullstack test_ha_router_restart_agents_no_pa... - 1 patch set
16:15:16 <slaweq> please check this patch if You have some time :)
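For reference, marking a fullstack test as unstable is typically just a decorator on the test method; a minimal sketch, assuming neutron's unstable_test helper from neutron.tests.base (which skips the test with the given reason when it fails) — the surrounding class here is a stand-in, not copied from the patch:

    # Sketch only; in the real tree the decorator would be added to the
    # existing test method in neutron/tests/fullstack/test_l3_agent.py.
    from neutron.tests import base


    class TestHAL3Agent(base.BaseTestCase):  # stand-in for the real fullstack class

        @base.unstable_test("bug 1798475")
        def test_ha_router_restart_agents_no_packet_lost(self):
            ...  # existing test body unchanged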
16:15:33 <slaweq> and the last one was:
16:15:35 <slaweq> ralonsoh to propose patch with additional logging to help debug https://bugs.launchpad.net/neutron/+bug/1799555
16:15:36 <openstack> Launchpad bug 1799555 in neutron "Fullstack test neutron.tests.fullstack.test_dhcp_agent.TestDhcpAgentHA.test_reschedule_network_on_new_agent timeout" [High,Confirmed] - Assigned to Rodolfo Alonso (rodolfo-alonso-hernandez)
16:16:08 <ralonsoh> https://review.opendev.org/#/c/660785/
16:16:09 <patchbot> patch 660785 - neutron - Add debug information to AutoScheduler and BaseSch... - 4 patch sets
16:16:42 <slaweq> thx ralonsoh
16:16:52 <slaweq> I will review it tonight or tomorrow morning
16:17:00 <ralonsoh> thanks!
16:17:03 <mlavalle> it's also in my pile
16:17:16 <njohnston> +1
16:17:45 <slaweq> ok
16:17:49 <slaweq> that's all from last week
16:17:53 <slaweq> questions/comments?
16:18:09 <mlavalle> none from me
16:18:46 <slaweq> ok, lets move on then
16:18:48 <slaweq> #topic Stadium projects
16:18:54 <slaweq> Python 3 migration
16:19:02 <slaweq> Stadium projects etherpad: https://etherpad.openstack.org/p/neutron_stadium_python3_status
16:19:09 <slaweq> njohnston: any update on this?
16:19:51 <njohnston> nothing at present, no.
16:20:43 <slaweq> I think that thanks to tidwellr we have migrated neutron-dynamic-routing to py3
16:20:50 <slaweq> it's done in https://review.opendev.org/#/c/657409/
16:20:51 <patchbot> patch 657409 - neutron-dynamic-routing - Convert CI jobs to python 3 (MERGED) - 8 patch sets
16:21:00 <slaweq> so I will update etherpad
16:21:04 <njohnston> excellent
16:21:54 <mlavalle> Great
16:22:17 <slaweq> I will try to pick up one of the projects this week if I have a couple of free minutes :)
16:22:37 <slaweq> ok, next thing related to stadium projects
16:22:39 <slaweq> tempest-plugins migration
16:22:43 <slaweq> Etherpad: https://etherpad.openstack.org/p/neutron_stadium_move_to_tempest_plugin_repo
16:23:05 <slaweq> for bgpvpn we merged first patch https://review.opendev.org/#/c/652991/
16:23:06 <patchbot> patch 652991 - neutron-tempest-plugin - Rehome tempest tests from networking-bgpvpn repo (MERGED) - 14 patch sets
16:23:17 <slaweq> and I have a question regarding this patch
16:23:45 <slaweq> I unfortunately added the neutron-tempest-plugin-bgpvpn-bagpipe job to the "neutron-tempest-plugin-jobs" template
16:23:55 <slaweq> thus it's now run on every neutron patch
16:24:01 <slaweq> do we want it like that?
16:24:13 <slaweq> I think that it shouldn't be in this template, right?
16:24:18 <mlavalle> I don't think so
16:24:37 <slaweq> yes, I thought that when I realized today that it's running in the neutron gate too
16:24:38 <njohnston> I agree, I don't think so
16:24:40 <slaweq> so I will change that
16:24:56 <slaweq> #action slaweq to remove neutron-tempest-plugin-bgpvpn-bagpipe from "neutron-tempest-plugin-jobs" template
16:25:18 <slaweq> apart from that, there is also a second patch for the bgpvpn project: https://review.opendev.org/#/c/657793/
16:25:19 <patchbot> patch 657793 - networking-bgpvpn - Rehome tempest tests to neutron-tempest-plugin repo - 1 patch set
16:25:27 <slaweq> please review it if You have some time
16:25:38 <slaweq> especially mlavalle as You probably have +2 power in this repo :)
16:25:56 <mlavalle> I do, indeed
16:26:01 <slaweq> ths
16:26:03 <slaweq> *thx
16:26:22 <njohnston> the tests for the migration of neutron-fwaas tempest tests are still failing, but I am concerned that that is because fwaas is broken for other reasons, as slaweq noticed in the work to move the neutron-fwaas-fullstack job to zuulv3 https://review.opendev.org/644526
16:26:23 <patchbot> patch 644526 - neutron-fwaas - Switch neutron-fwaas-fullstack job to zuulv3 syntax - 20 patch sets
16:27:10 <slaweq> njohnston: but what I noticed in my patch is that the fullstack job in the fwaas repo is broken
16:27:14 <mlavalle> njohnston: I intend to send an email to Sridhar and xgerman
16:27:23 <slaweq> all other jobs are working fine there
16:27:30 <mlavalle> about their involvement with fwaas
16:27:49 <njohnston> I don't even have a working email address for xgerman since he left Rackspace
16:27:54 <mlavalle> after they respond, we can proceed to send a plea for help to the general ML
16:28:09 <mlavalle> I have a way to find it
16:28:15 <njohnston> ok cool
16:28:33 <slaweq> mlavalle has got his secret PTL's tools to find it :P
16:28:33 <njohnston> anyway I will keep digging to see if I can find the root cause of the issue at least
16:28:58 * mlavalle wishes that was the case, LOL
16:29:11 <slaweq> :)
16:29:18 <haleyb> i have german's email if you need it
16:29:27 <mlavalle> cool
16:29:28 <slaweq> ok, njohnston if You need any help with these zuul issues, please ping me
16:29:45 <slaweq> ha, so haleyb is this secret PTL's tool :P
16:29:55 <haleyb> :)
16:30:30 <slaweq> ok, so moving on
16:30:40 <slaweq> there is also networking-sfc project
16:30:46 <slaweq> and first patch is merged https://review.opendev.org/#/c/653012
16:30:47 <patchbot> patch 653012 - neutron-tempest-plugin - Migrate networking-sfc tests to neutron-tempest-pl... (MERGED) - 10 patch sets
16:31:03 <slaweq> bcafarel said that second patch https://review.opendev.org/#/c/653747 is also ready for review
16:31:03 <patchbot> patch 653747 - networking-sfc - Complete move of networking-sfc tempest tests to t... - 21 patch sets
16:31:12 <slaweq> so please add it to Your review list :)
16:31:39 <slaweq> especially people who have +2 in this repo
16:32:32 <slaweq> ok
16:32:43 <mlavalle> which I do
16:32:44 <slaweq> any other questions/comments related to stadium projects?
16:33:04 <mlavalle> just to say I didn't make much progress with vpnaas this week
16:33:20 <mlavalle> I'll try again over the next few days
16:33:29 <slaweq> sure, no rush :)
16:33:53 <slaweq> we don't have any deadline for this
16:34:28 <slaweq> (but You can turn on hulk some day if it takes too long ;D)
16:34:49 * mlavalle shudders just thinking about it
16:35:14 <slaweq> LOL
16:35:23 <slaweq> ok, lets move on
16:35:31 <slaweq> #topic Grafana
16:35:37 <slaweq> #link http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:37:07 <slaweq> I don't see any urgent issues there
16:37:09 <mlavalle> not too bad
16:37:15 <slaweq> all looks "as usual"
16:37:21 <slaweq> and indeed not very bad :)
16:37:34 <mlavalle> it clearly shows the effect of the long weekend in the US
16:37:46 <slaweq> we still have some failures in tempest jobs and in functional/fullstack jobs but nothing very bad
16:38:04 <mlavalle> mostly functional, the way I see it
16:38:10 <njohnston> is that what happened to the unit tests graph in the gate queue - disproportionate impact of job failures due to low volume?
16:39:09 <slaweq> njohnston: I think so - look, the highest failure rate is when it was run only 3 times
16:39:23 <njohnston> makes sense
16:39:27 <slaweq> so it could even be some DNM patch with broken tests :)
16:39:49 <slaweq> lets see how it goes over the next couple of days
16:40:53 <slaweq> ok, lets move on then
16:40:58 <slaweq> #topic fullstack/functional
16:41:16 <slaweq> I was looking at some recent failed jobs
16:41:31 <slaweq> and I found a fullstack failure on neutron.tests.fullstack.test_l3_agent.TestHAL3Agent.test_ha_router_restart_agents_no_packet_lost
16:41:39 <slaweq> which I already proposed to mark as unstable again
16:41:48 <slaweq> and bug is reopened for this
16:42:39 <slaweq> but also liuyulong|away have patch which may help for this https://review.opendev.org/#/c/660905/
16:42:40 <patchbot> patch 660905 - neutron - Set neutron-keepalived-state-change proctitle - 1 patch set
16:42:59 <slaweq> so I think that we don't need to talk about this failed test too much here
16:43:20 <mlavalle> ok
16:43:22 <slaweq> regarding functional tests I found 2 issues
16:43:29 <slaweq> one in neutron.tests.functional.test_server.TestWsgiServer.test_restart_wsgi_on_sighup_multiple_workers:
16:43:35 <slaweq> http://logs.openstack.org/69/655969/3/check/neutron-functional/2a0533c/testr_results.html.gz
16:45:28 <slaweq> according to logstash it happened twice in the last week
16:45:36 <slaweq> so it's not too much for now
16:45:41 <mlavalle> yeah
16:45:51 <slaweq> IMO let's keep an eye on it and see how it goes
16:45:56 <slaweq> what do You think?
16:46:16 <mlavalle> yes
16:46:47 <slaweq> and the second issue, which I found at least twice in the last couple of days
16:46:55 <slaweq> neutron.tests.functional.agent.test_l2_ovs_agent.TestOVSAgent.test_assert_br_phys_patch_port_ofports_dont_change
16:47:01 <slaweq> http://logs.openstack.org/87/658787/6/check/neutron-functional-python27/100ec44/testr_results.html.gz
16:47:03 <slaweq> http://logs.openstack.org/05/660905/1/check/neutron-functional/f48d9de/testr_results.html.gz
16:48:21 <slaweq> so, this one is also not happening very often for now
16:48:32 <slaweq> lets also keep an eye on it and we will see how it goes
16:48:34 <slaweq> ok?
16:48:37 <mlavalle> cool
16:48:49 <slaweq> ok
16:48:57 <slaweq> other than that I think it's good
16:49:31 <slaweq> regarding tempest/scenario tests the issue which we have is well known (ssh problems) and we already talked about it
16:49:41 <mlavalle> yeap
16:49:55 <slaweq> so, I have one last topic for today
16:49:59 <slaweq> #topic Open discussion
16:50:12 <slaweq> I wanted to ask about one thing here
16:50:36 <slaweq> recently I found that we don't have any API/scenario tests for port forwarding in neutron-tempest-plugin repo
16:50:57 <mlavalle> only scenario?
16:51:05 <slaweq> so we have e.g. some functional tests of course but we are missing end-to-end tests
16:51:18 <slaweq> so I started adding such tests in neutron-tempest-plugin repo
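A rough sketch of the kind of API test being added here; the base class, extension alias, and client/helper method names below are assumptions based on the usual neutron-tempest-plugin layout, not copied from the actual patches:

    from neutron_tempest_plugin.api import base


    class PortForwardingAPITest(base.BaseNetworkTest):

        # extension alias assumed for illustration
        required_extensions = ['router', 'floating-ip-port-forwarding']

        def test_create_port_forwarding(self):
            # create_port/create_floatingip and self.network are assumed
            # helpers/resources provided by the plugin's base test class
            port = self.create_port(self.network)
            fip = self.create_floatingip()
            pf = self.client.create_port_forwarding(
                fip['id'],
                internal_port_id=port['id'],
                internal_ip_address=port['fixed_ips'][0]['ip_address'],
                internal_port=22, external_port=2222,
                protocol='tcp')['port_forwarding']
            self.assertEqual(2222, pf['external_port'])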
16:51:31 <slaweq> but I wanted to ask if we can do something to avoid such things in future
16:51:52 <slaweq> I proposed some small update to reviewers guide https://review.opendev.org/661770
16:51:53 <patchbot> patch 661770 - neutron - Add short info about tempest API/scenario tests to... - 2 patch sets
16:51:58 <slaweq> so please check it
16:52:11 <slaweq> but maybe there is something else what we can do also
16:52:15 <slaweq> what do You think?
16:52:32 <mlavalle> Nice!
16:52:35 <mlavalle> Thanks!
16:52:35 <njohnston> yeah, earlier today I noticed that it doesn't look like we have any kind of testing for vlan trunking that makes sure it works when instances get migrated
16:52:54 <slaweq> njohnston: yep, so it's second thing :/
16:52:59 <njohnston> it should be a criterion for completion of the feature
16:53:09 <njohnston> slaweq++
16:53:32 <slaweq> njohnston: I agree, that's why I added this note to reviewers guide
16:53:41 <slaweq> but maybe it should also be written somewhere else?
16:53:55 <slaweq> I don't know TBH :)
16:56:09 <mlavalle> I think that's enough
16:56:26 <slaweq> ok, thx mlavalle and njohnston for opinions :)
16:56:34 <haleyb> i had one question
16:56:39 <slaweq> and we should all remember about this during reviews
16:56:43 <slaweq> haleyb: sure
16:57:12 <haleyb> i might have actually thought of an answer, but i'm trying to fix one of the OVN periodic jobs
16:57:15 <haleyb> https://review.opendev.org/#/c/661065/
16:57:16 <patchbot> patch 661065 - neutron - Fix OVS build issue on Fedora - 1 patch set
16:57:29 <haleyb> but they only run on the master branch
16:57:48 <haleyb> i was wondering if there was a way to trigger that job on any change
16:58:08 <haleyb> or should i just add it to a test patch in the regular job run
16:58:29 <slaweq> haleyb: You can add it to the check queue in the zuul config file
16:58:36 <slaweq> and then it will be run on any patch
16:58:55 <haleyb> slaweq: yes, that's what i thought too just a second ago
16:59:04 <haleyb> we never much look at those periodic jobs
16:59:22 <slaweq> it's here https://github.com/openstack/networking-ovn/blob/master/zuul.d/project.yaml
16:59:45 <slaweq> haleyb: for neutron I usually look before the CI meeting to check that they are not failing too much
16:59:55 <slaweq> but other than that, never :)
17:00:13 <haleyb> :(
17:00:17 <slaweq> ok, it's time to end meeting
17:00:17 <haleyb> time is up
17:00:21 <slaweq> #endmeeting