15:00:16 <slaweq> #startmeeting neutron_ci
15:00:17 <openstack> Meeting started Tue Nov 24 15:00:16 2020 UTC and is due to finish in 60 minutes.  The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:18 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:20 <openstack> The meeting name has been set to 'neutron_ci'
15:00:47 * bcafarel finishes his coffee just in time
15:00:51 <gmann> slaweq: lajoskatona how can I help you?
15:00:51 <slaweq> welcome (again)
15:00:56 <slaweq> gmann: hi
15:01:16 <gmann> please ping me the link, I will review the tempest one.
15:01:27 <slaweq> gmann: we were just talking about the patch
15:01:29 <slaweq> https://review.opendev.org/c/openstack/tempest/+/743695
15:01:42 <slaweq> if You have some time to review it, that would be great :)
15:01:47 <gmann> slaweq: ack, will check today
15:01:48 <lajoskatona> Hi
15:01:52 <slaweq> thx a lot gmann
15:01:59 <gmann> np!
15:02:05 <lajoskatona> gmann: Hi, I send it
15:02:15 <lajoskatona> slaweq was quicker
15:02:41 <gmann> lajoskatona: sure, i will review it today
15:02:49 <lajoskatona> gmann: thanks
15:03:06 <slaweq> ok, lets go with our ci meeting now :)
15:03:09 <slaweq> Grafana dashboard: http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:03:17 <slaweq> and agenda is on etherpad https://etherpad.opendev.org/p/neutron-ci-meetings
15:03:30 <slaweq> #topic Actions from previous meetings
15:03:36 <slaweq> first one was:
15:03:39 <slaweq> slaweq to report bug regarding errors 500 in ovn functional tests
15:03:49 <slaweq> it already was reported: https://bugs.launchpad.net/neutron/+bug/1903008
15:03:52 <openstack> Launchpad bug 1903008 in neutron "Create network failed during functional test" [High,Confirmed]
15:04:26 <slaweq> and we are actually waiting for ralonsoh's patch with engine facade migration first
15:04:33 <slaweq> so this is "on hold" for now
15:04:44 <slaweq> and next one was:
15:04:46 <slaweq> ralonsoh will decrease number of test workers in scenario jobs
15:04:52 <ralonsoh> merged
15:04:59 <slaweq> fast :)
15:05:41 <ralonsoh> https://review.opendev.org/c/openstack/neutron-tempest-plugin/+/763051
15:05:48 <ralonsoh> sorry, I didn't find it
15:06:05 <slaweq> I hope it will make those jobs more stable
15:06:57 <bcafarel> crossing fingers
15:07:15 <slaweq> thx ralonsoh :)
15:07:18 <slaweq> ok, lets move on
15:07:21 <slaweq> #topic Stadium projects
15:07:33 <slaweq> anything regarding stadium to discuss today?
15:08:09 <bcafarel> small stable/stadium update, https://bugs.launchpad.net/neutron/+bug/1903689/comments/5
15:08:11 <openstack> Launchpad bug 1903689 in neutron "[stable/ussuri] Functional job fails - AttributeError: module 'neutron_lib.constants' has no attribute 'DEVICE_OWNER_DISTRIBUTED'" [Medium,In progress] - Assigned to Bernard Cafarelli (bcafarel)
15:08:11 <lajoskatona> nothing special, perhaps this one: https://review.opendev.org/c/openstack/networking-odl/+/763210
15:09:05 <bcafarel> basically, adding neutron to upper-constraints needs to be done manually when creating a new stable branch (maybe add it to a list of steps for that?)
15:09:15 <bcafarel> I will send patches for train to victoria (forgot to do it yesterday) to catch up
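For reference, the manual step bcafarel mentions is a one-line pin in upper-constraints.txt on the matching stable branch of openstack/requirements; the exact version below is illustrative only, not the real pin for any particular branch:

```
neutron===17.0.0
```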
15:10:19 <slaweq> bcafarel: can You also check https://docs.openstack.org/neutron/latest/contributor/policies/release-checklist.html if it is up to date?
15:10:27 <slaweq> and maybe update with this info if needed
15:10:51 <bcafarel> ooh nice, I wondered if we had something like that
15:11:19 <bcafarel> slaweq: will do, and check other stuff I think of (adding branch tempest template, remove *master* jobs, etc)
15:11:33 <slaweq> bcafarel++ thx a lot
15:12:00 <slaweq> #action bcafarel to fix stable branches upper-constraints in stadium projects
15:12:11 <slaweq> #action bcafarel to check and update doc https://docs.openstack.org/neutron/latest/contributor/policies/release-checklist.html
15:12:29 <slaweq> ^^ just to not forget (that bcafarel volunteered for that :P)
15:12:40 <bcafarel> :)
15:13:18 <slaweq> lajoskatona: and regarding Your patch, I already +2'd it
15:13:28 <slaweq> so You need e.g. ralonsoh to check that
15:13:37 <lajoskatona> slaweq: thanks, just some advertisement for more attention :-)
15:14:11 <ralonsoh> np
15:15:09 <slaweq> ok, next topic
15:15:11 <slaweq> #topic Stable branches
15:15:15 <slaweq> Victoria dashboard: http://grafana.openstack.org/d/pM54U-Kiz/neutron-failure-rate-previous-stable-release?orgId=1
15:15:17 <slaweq> Ussuri dashboard: http://grafana.openstack.org/d/dCFVU-Kik/neutron-failure-rate-older-stable-release?orgId=1
15:16:11 <bcafarel> my unread backlog for stable is not too bad, so I'd say stable branches are good
15:16:19 <bcafarel> (well except still pending https://bugs.launchpad.net/neutron/+bug/1903531 )
15:16:22 <openstack> Launchpad bug 1903531 in neutron "Update of neutron-server breaks compatibility to previous neutron-agent version" [Critical,Confirmed] - Assigned to Slawek Kaplonski (slaweq)
15:16:56 <slaweq> sorry, correct links:
15:16:58 <slaweq> Victoria dashboard: https://grafana.opendev.org/d/HUCHup2Gz/neutron-failure-rate-previous-stable-release?orgId=1
15:17:00 <slaweq> Ussuri dashboard: https://grafana.opendev.org/d/smqHXphMk/neutron-failure-rate-older-stable-release?orgId=1
15:18:30 <slaweq> thx bcafarel - that is a good topic to discuss
15:18:40 <slaweq> and I forgot about it on previous meeting
15:18:54 <slaweq> I don't really know what to do with it now :/
15:20:12 <slaweq> the problem is that IIUC the fix should go to the agent's side
15:20:24 <slaweq> and if the agent is already updated, there will be no issue at all
15:20:33 <slaweq> is my understanding correct?
15:21:33 <bcafarel> checking that original commit again
15:22:50 <bcafarel> slaweq: so have a fix in the agent to handle both types, and note that agents should be updated first for this bug?
15:23:25 <slaweq> the problem is that the officially supported update path is that the server should always be updated first
15:23:34 <slaweq> as it should handle compatibility with older agents
15:23:37 <lajoskatona> what I can't see is what happens if the revert is merged
15:23:38 <slaweq> not vice versa
15:23:50 <lajoskatona> how that affects these deployments
15:24:08 <slaweq> lajoskatona: when we revert that change, someone who already updated to e.g. 15.3 will have the same issue again
15:24:14 <slaweq> but in the opposite direction
15:24:29 <ralonsoh> exactly, they will experience the same problem
15:24:31 <slaweq> because their server will send the (ip, mac) tuple
15:24:32 <lajoskatona> ok, so the fix would be better
15:24:40 <slaweq> but how to fix it?
15:24:41 <ralonsoh> because they have already rebooted the agents
15:25:09 <ralonsoh> send a patch handling both possible RPC responses
15:25:18 <ralonsoh> (IP) or (IP, MAC)
15:25:25 <slaweq> ralonsoh: but that patch needs to be on Agent's side, right
15:25:27 <slaweq> ?
15:25:44 <ralonsoh> in both, if I'm not wrong
15:25:50 <ralonsoh> this is something sent by the server
15:28:10 <slaweq> ralonsoh: yes, it is sent by the server
15:28:22 <slaweq> but how do You want the server to send 2 things?
15:29:18 <ralonsoh> no, if the server is updated, it should send (IP,MAC)
15:29:56 <slaweq> yes
15:30:07 <ralonsoh> but, TBH, for those deployments not updated
15:30:09 <slaweq> so the agent should be changed so that it can handle both cases
15:30:18 <ralonsoh> if they follow the update procedures
15:30:23 <ralonsoh> first the server, then the agents
15:30:40 <ralonsoh> if we don't revert the original patch, then when the server is updated
15:30:47 <ralonsoh> the RPC will send  (IP,MAC)
15:30:52 <ralonsoh> and the agents won't understand this
15:31:13 <slaweq> yes, that's the problem
15:31:51 <ralonsoh> so maybe we should just revert the patch in stable releases
15:32:24 <slaweq> but if we revert it in stable branches, then for deployments which already updated to the latest Train (or Stein) the issue will be the same
15:32:36 <slaweq> the updated server will again send just the IP
15:32:52 <slaweq> and the agent will expect (IP, MAC) as it will not have the revert yet
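For context, the agent-side fix ralonsoh suggests amounts to normalizing both RPC payload shapes before use. Below is a minimal, purely illustrative sketch, assuming the old payload is a bare IP string and the new one an (ip, mac) pair; the function name and payloads are hypothetical, not the actual neutron code:

```python
def normalize_endpoint(entry):
    """Return an (ip, mac) pair for either RPC payload shape.

    Old servers send a bare IP string; patched servers send an
    (ip, mac) tuple. mac is None when the server did not include one,
    so callers can fall back to the pre-change behaviour.
    """
    if isinstance(entry, (tuple, list)) and len(entry) == 2:
        ip, mac = entry
        return ip, mac
    return entry, None


# Both payload shapes are accepted, whichever server version sent them.
for payload in ["192.0.2.10", ("192.0.2.11", "fa:16:3e:aa:bb:cc")]:
    print(normalize_endpoint(payload))
```

An agent carrying a shim like this works against both old and new servers, which is what would make the "fix on the agent side, update agents first" note workable for this bug.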
15:33:22 <bcafarel> I guess it will be a limited number of deployments - if we cannot have a fix soon it may be the "not so bad" option
15:33:51 <bcafarel> fix it for people that have not updated yet, with the cost of additional trouble for those (hopefully few) who did
15:34:20 <slaweq> I think that will be better but I'm not 100% sure about that
15:34:29 <slaweq> ok, I will try to play with it a bit more
15:34:49 <slaweq> and lets discuss that at the drivers meeting on Friday and decide there what to do with it
15:35:42 <slaweq> are You ok with this plan?
15:35:47 <ralonsoh> perfect
15:35:49 <bcafarel> sounds good
15:35:51 <slaweq> ok
15:36:15 <slaweq> we may also say that e.g. 15.3 is "broken" and maybe remove it from pypi if possible
15:36:27 <slaweq> so no new people will update to that version
15:36:34 <ralonsoh> that's also an option
15:36:39 <slaweq> I will ask release team for that
15:36:47 <bcafarel> +1 that would be good in the meantime
15:36:52 <slaweq> #action slaweq to explore options to fix https://bugs.launchpad.net/neutron/+bug/1903531
15:36:53 <openstack> Launchpad bug 1903531 in neutron "Update of neutron-server breaks compatibility to previous neutron-agent version" [Critical,Confirmed] - Assigned to Slawek Kaplonski (slaweq)
15:37:11 <slaweq> ok, lets move on now
15:37:13 <slaweq> #topic Grafana
15:37:17 <slaweq> http://grafana.openstack.org/dashboard/db/neutron-failure-rate
15:37:44 <slaweq> overall I think it looks not so bad this week
15:38:21 <ralonsoh> it's getting better, yes
15:39:10 <slaweq> looking e.g. at https://grafana.opendev.org/d/PfjNuthGz/neutron-failure-rate?viewPanel=20&orgId=1
15:39:20 <slaweq> all except the ovn job look pretty good this week
15:39:44 <slaweq> and there are far fewer ssh authentication failures recently IMO
15:40:13 <bcafarel> that's nice
15:40:41 <slaweq> regarding specific jobs
15:40:49 <slaweq> #topic Tempest/Scenario
15:41:07 <slaweq> I was looking at various failures from last week today
15:41:15 <slaweq> and I didn't find many new issues
15:41:28 <slaweq> I just found 2 examples of SSH failure in ovn jobs:
15:41:32 <slaweq> https://7513009f5bff8f76e461-f83d06667d580e000031601b82c71a43.ssl.cf2.rackcdn.com/763246/1/gate/neutron-ovn-tempest-ovs-release/acf920f/testr_results.html
15:41:36 <slaweq> https://8c281a85ffc729001c78-68bc071a5cbea1ed39a41226592204b6.ssl.cf1.rackcdn.com/763777/1/check/neutron-ovn-tempest-ovs-release-ipv6-only/3e09df5/testr_results.html
15:41:45 <slaweq> I didn't report it yet on LP
15:41:47 <slaweq> but I will
15:43:21 <slaweq> but really https://7513009f5bff8f76e461-f83d06667d580e000031601b82c71a43.ssl.cf2.rackcdn.com/763246/1/gate/neutron-ovn-tempest-ovs-release/acf920f/testr_results.html is probably the known issue with some race in paramiko
15:43:31 <slaweq> but there was no console output there
15:43:50 <ralonsoh> did you push the patch to tempest?
15:44:07 <slaweq> tempest patch is merged
15:44:09 <ralonsoh> that one waiting for the VM output to mitigate the paramiko problem
15:44:12 <ralonsoh> ooook
15:44:30 <slaweq> https://review.opendev.org/c/openstack/tempest/+/761964
15:44:57 <slaweq> but in that case it was waiting for more than 10 minutes, checking console output
15:45:01 <slaweq> and it failed later :/
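As background on the mitigation: the merged tempest change 761964 waits on the VM console output before declaring an SSH failure, per the discussion above. A generic retry-with-timeout pattern around paramiko, shown purely as a sketch (host, user, and key path are placeholders, and this is not the tempest implementation), is the usual way such boot races are papered over:

```python
import time

import paramiko


def ssh_with_retries(host, username, key_path, attempts=10, delay=5):
    """Keep retrying the SSH connection to ride out slow boots and
    transient paramiko errors; raise only after the last attempt."""
    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    for attempt in range(1, attempts + 1):
        try:
            client.connect(host, username=username,
                           key_filename=key_path, timeout=10)
            return client
        except (paramiko.SSHException, OSError):
            if attempt == attempts:
                raise
            time.sleep(delay)
```

The failure slaweq describes (over 10 minutes of console polling, then a late failure) suggests the race is not only in connection setup, which is why the missing console output makes debugging harder.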
15:47:21 <slaweq> I don't have any other examples of the failures in tempest jobs for this week
15:47:28 <slaweq> lets move on
15:47:34 <slaweq> #topic Rally jobs
15:47:58 <slaweq> I found a few cases today with a failure like: https://zuul.opendev.org/t/openstack/build/be642647ac1e4f5993a65e5f3f91a7a5 in the rally job
15:48:06 <slaweq> do You maybe know if that is a known issue?
15:48:45 <ralonsoh> no
15:48:50 <slaweq> :/
15:49:09 <slaweq> I will report that against rally as it doesn't seem to be an issue in neutron really
15:50:17 <slaweq> #action slaweq to report bug against rally
15:51:15 <slaweq> and that's all I had for today
15:51:29 <slaweq> do You want to talk about anything else regarding CI today?
15:51:38 <haleyb> i had one question, kind-of related to CI
15:52:11 <haleyb> I've been randomly working on fixing issues with using IPv6 addresses for tunnel endpoints
15:52:22 <haleyb> and i sent out a WIP at https://review.opendev.org/c/openstack/neutron/+/760668
15:52:42 <haleyb> i was wondering if something like that should just be in one of the existing CI jobs
15:53:00 <haleyb> truly making things ipv6-only
15:53:52 <slaweq> haleyb: isn't it like that in the tempest-ipv6-only job?
15:53:55 <slaweq> https://zuul.opendev.org/t/openstack/build/7d267a79a9ef4a6ab3413619a09bf0aa
15:54:08 <haleyb> i don't think it does the tunnel, does it?
15:55:02 <haleyb> i just added that TUNNEL_IP_VERSION to devstack, it actually hasn't merged yet
15:55:23 <slaweq> maybe You can then change that tempest-ipv6-only job
15:55:33 <slaweq> as it is intended to be ipv6-only :)
15:56:59 <haleyb> slaweq: yes, i thought about that too, just didn't want to break everyone that inherits from it
15:57:19 <haleyb> but maybe no one will notice with the new gerrit :)
15:57:39 <slaweq> haleyb: if You don't want to break anything for other projects You can propose a new job like neutron-tempest-ipv6-only
15:57:48 <ralonsoh> but slaweq is right, according to the playbooks, "ipv6-only-deployments-verification" should "Verify the IPv6-only deployments"
15:57:58 <slaweq> which will inherit from tempest-ipv6-only and will also set this one var
15:58:01 <ralonsoh> and this is executed in tempest ipv6
15:58:13 <slaweq> then we can run only this new job in our queue
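For illustration, the child job slaweq describes would only be a few lines of Zuul configuration; a sketch assuming the devstack_localrc plumbing used by devstack-based jobs, with TUNNEL_IP_VERSION coming from haleyb's not-yet-merged devstack patch:

```yaml
# Hypothetical zuul.d entry: inherit everything from tempest-ipv6-only
# and additionally switch tunnel endpoints to IPv6.
- job:
    name: neutron-tempest-ipv6-only
    parent: tempest-ipv6-only
    vars:
      devstack_localrc:
        TUNNEL_IP_VERSION: 6
```

Running this child job only in neutron's own queues avoids breaking the other projects that inherit from tempest-ipv6-only.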
15:58:37 <slaweq> haleyb: can You sync with tempest folks what would be better for them?
15:59:22 <haleyb> slaweq: sure, i can propose something there and ask them for comments
15:59:28 <slaweq> haleyb++ thx
15:59:41 <slaweq> ok, we are running out of time now
15:59:45 <slaweq> thx for attending the meeting
15:59:49 <slaweq> and see You online
15:59:50 <ralonsoh> bye
15:59:50 <bcafarel> o/
15:59:52 <slaweq> #endmeeting