15:01:15 <slaweq_> #startmeeting neutron_ci
15:01:16 <openstack> Meeting started Wed Oct  7 15:01:15 2020 UTC and is due to finish in 60 minutes.  The chair is slaweq_. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:01:17 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:01:18 <slaweq_> hi
15:01:20 <openstack> The meeting name has been set to 'neutron_ci'
15:01:23 <lajoskatona> Hi
15:01:45 <bcafarel> o/
15:02:08 <ralonsoh> hi
15:02:32 <slaweq_> Grafana dashboard: http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:02:35 <slaweq_> Please open now :)
15:03:27 <slaweq_> #topic Actions from previous meetings
15:03:35 <slaweq_> bcafarel to update our grafana dashboards for stable branches
15:04:09 <bcafarel> in progress, not sent yet (I wanted to check jobs listed there)
15:04:18 <slaweq_> ok, thx bcafarel
15:04:24 <slaweq_> I will assign it to You for next week
15:04:30 <slaweq_> just to remember about it
15:04:33 <slaweq_> ok?
15:04:42 <bcafarel> sounds good, also to have reviewers if it gets forgotten
15:04:46 <slaweq_> #action bcafarel to update our grafana dashboards for stable branches
15:04:53 <slaweq_> thx a lot
15:05:05 <slaweq_> ok, next one
15:05:07 <slaweq_> ralonsoh to report a bug and check failing openstack-tox-py36-with-ovsdbapp-master periodic job
15:05:29 <ralonsoh> I sent a patch to try to solve it
15:05:31 <ralonsoh> one sec
15:05:45 <ralonsoh> (should be on the etherpad)
15:06:30 <slaweq_> I don't see it on etherpad
15:06:35 <ralonsoh> https://review.opendev.org/#/c/755256/
15:06:57 <ralonsoh> avoid monkey patching processutils
15:07:21 <ralonsoh> well, use the original current_thread _active
15:07:35 <ralonsoh> but we'll need a new version of oslo.concurrency
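The fix ralonsoh describes hinges on the private thread registry the stdlib keeps: `threading.current_thread()` resolves the caller through `threading._active`, and eventlet's monkey patching swaps the whole `threading` module out, so patched and unpatched code can end up consulting different registries. A minimal stdlib-only sketch of that lookup (the eventlet interaction itself is an assumption and is not reproduced here):

```python
import threading

def current_thread_is_registered():
    """Check that current_thread() and the private _active registry agree.

    threading._active maps thread idents to thread objects, and
    current_thread() resolves the caller through it; pinning the
    *original* (unpatched) registry is roughly what the patch above
    relies on.
    """
    t = threading.current_thread()
    return threading._active.get(t.ident) is t

# In a plain (un-monkey-patched) interpreter the two always agree.
print(current_thread_is_registered())
```

In a monkey-patched process, code holding a reference to the original module keeps seeing the original registry, which is why mixing patched and unpatched callers can misbehave.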
15:07:51 <slaweq_> and it seems that it helped
15:08:01 <ralonsoh> at least locally
15:08:12 <ralonsoh> but I can't say that in the CI
15:08:13 <slaweq_> https://zuul.openstack.org/buildset/aa6cb9d44d1a49368494071338c7415e
15:08:16 <slaweq_> :)
15:08:18 <slaweq_> it helped
15:08:39 <ralonsoh> ahhh ok, this is another problem
15:08:41 <ralonsoh> sorry
15:09:01 <ralonsoh> #link https://review.opendev.org/#/c/749537/
15:09:04 <ralonsoh> this is the patch
15:09:06 <ralonsoh> sorry again
15:09:20 <slaweq_> :)
15:09:24 <slaweq_> no need to be sorry
15:09:29 <slaweq_> good that it's fixed :)
15:09:34 <slaweq_> thx ralonsoh
15:09:40 <slaweq_> and thx otherwiseguy
15:10:19 <slaweq_> ok, so I think we can move on to the next topics
15:10:22 <slaweq_> #topic Switch to Ubuntu Focal
15:10:29 <slaweq_> Etherpad: https://etherpad.opendev.org/p/neutron-victoria-switch_to_focal
15:10:40 <slaweq_> we still have some stadium projects to check/change
15:10:49 <slaweq_> but I didn't have time this week
15:10:57 <slaweq_> do You have any other updates on that?
15:11:03 <ralonsoh> no
15:11:50 <lajoskatona> no
15:12:38 <bcafarel> https://review.opendev.org/#/c/754068/ longing for second +2 for sfc :)
15:13:02 <bcafarel> otherwise the topic:migrate-to-focal list looks good for us
15:13:15 <slaweq_> bcafarel: I already gave +2 :)
15:13:21 <slaweq_> so I can't help with that one now
15:13:28 <slaweq_> ralonsoh: lajoskatona but You can ;)
15:13:32 <ralonsoh> sure
15:13:43 <lajoskatona> done :-)
15:14:38 <slaweq_> thx
15:14:51 <bcafarel> thanks :)
15:15:03 <lajoskatona> May I ask a slightly related question: do we need this anymore: https://review.opendev.org/755721  ?
15:15:37 <slaweq_> lajoskatona: nope
15:15:45 <slaweq_> it was an issue with pypi mirror
15:16:08 <lajoskatona> slaweq_: yeah that's why I asked :-) I'll abandon it then
15:16:11 <slaweq_> and I think ralonsoh fixed it on devstack by capping setuptools version
15:16:21 <ralonsoh> but that was rejected
15:16:28 <ralonsoh> the problem was in the pypi server
15:16:35 <slaweq_> ralonsoh: ahh, ok
15:16:39 <ralonsoh> admins talked to pypi folks to solve that
15:16:45 <slaweq_> most important is that the problem is fixed now :)
15:16:49 <ralonsoh> yes
15:16:53 <slaweq_> thx ralonsoh and lajoskatona for taking care of it :)
15:18:05 <slaweq_> ok
15:18:06 <lajoskatona> no problem
15:18:11 <slaweq_> regarding standardizing on zuul v3
15:18:37 <slaweq_> we merged networking-odl patch https://review.opendev.org/#/c/725647/
15:18:50 <slaweq_> so the last one missing is  https://review.opendev.org/#/c/729591/ for neutron
15:19:30 <slaweq_> and it just failed again, at least functional tests job: https://40f71fdb4a17c8b8e33a-40a7733116b3138073a0fe5a58665a17.ssl.cf5.rackcdn.com/729591/21/check/neutron-functional-with-uwsgi/aace04f/testr_results.html
15:19:31 <tosky> which received its fair share of rechecks
15:19:33 <slaweq_> :/
15:20:57 <ralonsoh> slaweq_, that's the other related problem I was talking about this morning
15:21:06 <ralonsoh> now we don't fail in the OVN method
15:21:17 <ralonsoh> but in the "old_method" --> L3 plugin
15:21:25 <ralonsoh> I need to check if this is related
15:21:40 <ralonsoh> I'll talk to otherwiseguy
15:21:49 <slaweq_> ralonsoh: ok
15:22:01 <tosky> please remember to vote also on the networking-odl backport for stable/victoria: https://review.opendev.org/#/c/756324/
15:22:38 <slaweq_> tosky: I already did
15:22:45 <slaweq_> I think we need bcafarel's vote also
15:23:07 <tosky> yeah, another stable core
15:23:12 <tosky> or neutron stable core
15:23:37 <bcafarel> reviewed and W+1 :)
15:24:09 <slaweq_> thx
15:24:47 <slaweq_> so I think we can move on to the next topic now
15:24:50 <slaweq_> #topic Stable branches
15:25:01 <slaweq_> Ussuri dashboard: http://grafana.openstack.org/d/pM54U-Kiz/neutron-failure-rate-previous-stable-release?orgId=1
15:25:04 <slaweq_> Train dashboard: http://grafana.openstack.org/d/dCFVU-Kik/neutron-failure-rate-older-stable-release?orgId=1
15:25:48 <bcafarel> one thing I remember now on stable dashboards, we will also need a victoria template for neutron-tempest-plugin
15:25:57 <bcafarel> and switch neutron stable/victoria to it
15:26:13 <slaweq_> bcafarel: yes, true
15:26:18 <slaweq_> I will do this template
15:26:29 <slaweq_> thx for the reminder
15:26:42 <slaweq_> #action slaweq to make neutron-tempest-plugin victoria template
15:26:55 <bcafarel> np, I remembered when my test dashboard came up empty for them
15:29:52 <slaweq_> btw. I have one new issue in stable/train
15:29:54 <slaweq_> https://bugs.launchpad.net/neutron/+bug/1898748
15:29:55 <openstack> Launchpad bug 1898748 in neutron "[stable/train] Creation of the QoS policy takes ages" [Critical,New]
15:30:06 <slaweq_> did You see it already maybe?
15:30:14 <ralonsoh> no
15:30:26 <slaweq_> it seems that it breaks devstack gate for stable/train :/
15:31:09 <bcafarel> I don't think I saw it either
15:31:26 <slaweq_> is there anyone who wants to check that maybe?
15:32:16 <slaweq_> if not, I will try to check that
15:32:21 <ralonsoh> I'll try to take a look at this error tomorrow
15:32:29 <slaweq_> thx ralonsoh :)
15:33:03 <slaweq_> ok, lets move on
15:33:08 <slaweq_> #topic Grafana
15:33:13 <slaweq_> http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:34:47 <slaweq_> IMO the worst of the voting jobs is neutron-functional-with-uwsgi now
15:34:57 <slaweq_> and we have a couple of issues there
15:35:22 <slaweq_> and also most of the ovn based jobs are failing 100% of the time
15:36:33 <slaweq_> anything else You have regarding grafana in general?
15:36:43 <slaweq_> or should we move on to the specific job types?
15:37:37 <bcafarel> nothing from me
15:37:49 <slaweq_> ok, so lets move on
15:37:57 <slaweq_> #topic functional/fullstack
15:38:17 <slaweq_> I reported today https://bugs.launchpad.net/neutron/+bug/1898859
15:38:18 <openstack> Launchpad bug 1898859 in neutron "Functional test neutron.tests.functional.agent.linux.test_keepalived.KeepalivedManagerTestCase.test_keepalived_spawns_conflicting_pid_vrrp_subprocess is failing" [High,Confirmed]
15:38:33 <slaweq_> as I saw it at least twice recently
15:38:51 <slaweq_> IIRC we already saw it in the past too but I wasn't sure if we had a bug reported for that already
15:38:59 <ralonsoh> related to the ns deletion
15:39:07 <ralonsoh> https://review.opendev.org/#/c/754938/
15:39:15 <ralonsoh> please, review ^^
15:40:32 <slaweq_> ahh, right
15:40:35 <slaweq_> now I remember :)
15:40:53 <slaweq_> so I will mark https://bugs.launchpad.net/neutron/+bug/1898859 as duplicate of https://bugs.launchpad.net/neutron/+bug/1838793
15:40:55 <openstack> Launchpad bug 1898859 in neutron "Functional test neutron.tests.functional.agent.linux.test_keepalived.KeepalivedManagerTestCase.test_keepalived_spawns_conflicting_pid_vrrp_subprocess is failing" [High,Confirmed]
15:40:55 <ralonsoh> I think you can join both LP bugs
15:40:56 <openstack> Launchpad bug 1838793 in neutron ""KeepalivedManagerTestCase" tests failing during namespace deletion" [High,Confirmed] - Assigned to Rodolfo Alonso (rodolfo-alonso-hernandez)
15:40:58 <ralonsoh> yes
15:41:23 <slaweq_> lajoskatona: can You check that patch from ralonsoh?
15:41:36 <slaweq_> I hope it will help us a bit with this functional tests job :)
15:41:59 <lajoskatona> slaweq_: sure, I checked it in the past, so I have some background :-)
15:42:08 <slaweq_> lajoskatona: thx a lot
15:42:31 <slaweq_> and for other issues with functional tests I know that ralonsoh told me that he will open LPs
15:42:47 <ralonsoh> the one related to the agents
15:42:56 <ralonsoh> test_agent_show
15:45:01 <slaweq_> yes, did You report it already?
15:45:23 <ralonsoh> not yet
15:45:40 <ralonsoh> I'm still investigating the error
15:45:47 <slaweq_> k
15:47:13 <slaweq_> ok, lets move on then
15:47:15 <slaweq_> #topic Tempest/Scenario
15:47:35 <slaweq_> first, I reported today bug: https://bugs.launchpad.net/neutron/+bug/1898862
15:47:37 <openstack> Launchpad bug 1898862 in neutron "Job neutron-ovn-tempest-ovs-release-ipv6-only is failing 100% of times" [High,Confirmed]
15:48:02 <slaweq_> because neutron-ovn-tempest-ovs-release-ipv6-only is failing 100% of the time and usually (or even always) there are 9 tests failing there
15:48:11 <slaweq_> so it's very reproducible
15:48:46 <slaweq_> I will try to ping lucasgomes or jlibosva to take a look at that one
15:49:18 <slaweq_> there is also ovn related issue https://bugs.launchpad.net/neutron/+bug/1885900
15:49:19 <openstack> Launchpad bug 1885900 in neutron "test_trunk_subport_lifecycle is failing in ovn based jobs" [Critical,Confirmed] - Assigned to Lucas Alvares Gomes (lucasagomes)
15:49:22 <slaweq_> which I saw today again
15:50:33 <slaweq_> and we still have some random ssh authentication failures
15:50:37 <slaweq_> like e.g. https://3b00945aa0cfe70597e9-73e59f2d88a36c349deccf374592c99f.ssl.cf5.rackcdn.com/755752/3/gate/neutron-tempest-linuxbridge/4bbc7f9/testr_results.html
15:50:43 <slaweq_> or https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_807/750166/5/gate/neutron-tempest-plugin-scenario-linuxbridge/8073d8a/testr_results.html
15:51:01 <slaweq_> and in those cases there is no "pattern", like always the same tests or always the same backend
15:51:06 <slaweq_> it happens everywhere
15:51:35 <slaweq_> and I tend to think that this is the issue which ralonsoh found some time ago in our d/s CI
15:51:45 <slaweq_> with paramiko and some race condition
15:51:47 <ralonsoh> with paramiko
15:51:49 <ralonsoh> yes
15:51:59 <slaweq_> I couldn't reproduce that locally
15:52:08 <slaweq_> but some race is there IMO
15:52:28 <ralonsoh> once paramiko tries to log into a VM before the keys are installed, the SSH connection stays impossible even after the keys are in place
15:52:55 <slaweq_> maybe we can try to check the console log first to see if the ssh key was configured already
15:52:59 <slaweq_> before ssh to the instance
15:53:42 <slaweq_> if that fails for any reason (e.g. a custom guest OS which doesn't log things like cirros does), we can always try ssh at the end
15:53:47 <slaweq_> as "fallback" option
15:53:53 <slaweq_> wdyt?
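To make the console-log check slaweq proposes concrete, here is a rough stdlib-only sketch of the polling helper. `get_console_log` and the marker string are hypothetical stand-ins (a real version would wrap nova's get-console-output API and look for something like the cirros SSH-host-key banner); on timeout the caller just falls back to plain SSH retries, as discussed:

```python
import time

def wait_for_marker_in_console(get_console_log, marker,
                               timeout=120, interval=5,
                               clock=time.time, sleep=time.sleep):
    """Poll the instance console log until `marker` appears.

    get_console_log: hypothetical callable returning the console output
    captured so far (e.g. a wrapper around nova's get-console-output API).
    Returns True once the marker shows up, False on timeout -- in which
    case the test should still attempt SSH anyway (the "fallback" path),
    so guest images that log nothing useful are not broken by the check.
    """
    deadline = clock() + timeout
    while clock() <= deadline:
        if marker in get_console_log():
            return True
        sleep(interval)
    return False
```

A scenario test would call this right before opening the paramiko connection; only when it returns True (or after the fallback timeout) would the SSH attempt start, which avoids the first-attempt-before-key-injection race described above.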
15:54:20 <ralonsoh> it's worth trying
15:54:23 <slaweq_> we can maybe propose that first in neutron-tempest-plugin
15:54:27 <bcafarel> worth a try
15:54:31 <slaweq_> and if that works, then propose it to tempest too
15:54:52 <slaweq_> ok, I will give it a try
15:55:03 <ralonsoh> (I was doing the opposite: reviewing the paramiko code)
15:55:13 <slaweq_> #action slaweq to propose patch to check console log before ssh to instance
15:55:40 <slaweq_> ralonsoh: if You find an issue on paramiko's side, we can always revert the workaround from neutron-tempest-plugin :)
15:55:47 <ralonsoh> of course
15:56:28 <slaweq_> ok, I have one more issue related to ovn jobs: https://bugs.launchpad.net/neutron/+bug/1898863
15:56:29 <openstack> Launchpad bug 1898863 in neutron "OVN based scenario jobs failing 100% of times" [Critical,Confirmed]
15:56:39 <slaweq_> did You see that before?
15:56:57 <bcafarel> on dstat??
15:57:01 <slaweq_> yes
15:57:07 <slaweq_> but I saw it only on ovn based jobs
15:57:09 <slaweq_> :/
15:57:15 <ralonsoh> no sorry, that's new to me
15:57:44 <slaweq_> ok, anyone wants to take a look at that?
15:58:03 <slaweq_> if not then it's also fine for now as it affects "only" non-voting jobs
15:58:44 <ralonsoh> https://bugs.launchpad.net/ubuntu/+source/dstat/+bug/1866619
15:58:46 <openstack> Launchpad bug 1866619 in dstat (Ubuntu) "OverflowError when machine suspends and resumes after a longer while" [Undecided,Confirmed]
15:58:52 <ralonsoh> DistroRelease: Ubuntu 20.04
15:59:23 <slaweq_> so we will probably need to disable dstat as a temporary workaround
15:59:28 <slaweq_> thx ralonsoh
15:59:31 <ralonsoh> yes
16:00:03 <slaweq_> ok
16:00:09 <slaweq_> we are out of time today
16:00:14 <slaweq_> thx for attending the meeting
16:00:16 <slaweq_> o/
16:00:17 <ralonsoh> bye!
16:00:19 <slaweq_> #endmeeting