16:00:43 <slaweq> #startmeeting neutron_ci
16:00:43 <openstack> Meeting started Tue Apr 10 16:00:43 2018 UTC and is due to finish in 60 minutes.  The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:45 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:46 <slaweq> hi
16:00:48 <mlavalle> o/
16:00:48 <openstack> The meeting name has been set to 'neutron_ci'
16:00:49 <ihrachys> o
16:01:15 <slaweq> #topic Actions from previous meetings
16:01:39 <slaweq> jlibosva, haleyb: are You around?
16:01:43 <haleyb> o/
16:02:23 <jlibosva> o/
16:02:37 <slaweq> ok, so I think we can start now
16:02:50 <slaweq> first action from previous week:
16:02:56 <slaweq> haleyb to check router migrations issue
16:03:43 <haleyb> i haven't reproduced the issue yet, seemed to work fine manually, so $more_testing=1
16:04:38 <slaweq> ok
16:04:58 <slaweq> #action haleyb to continue testing why router migration tests fail
16:05:11 <slaweq> next one was
16:05:12 <slaweq> slaweq will check difference between neutron-tempest-multinode-full and neutron-tempest-dvr-ha-multinode-full
16:05:26 <slaweq> so I compared those jobs definitions: https://pastebin.com/zQTHJ1Rg
16:05:49 <slaweq> Basically the difference is that in neutron-tempest-multinode-full the L3 agent runs only on the main node (the subnode is compute only), while in neutron-tempest-dvr-ha-multinode-full the L3 agent on the subnodes runs in dvr_snat mode
16:06:32 <slaweq> AFAIR ihrachys wants to check if we need both of those jobs - IMO we should have both of them
16:07:22 <slaweq> ah, one more thing - there is also the flag DEVSTACK_GATE_TLSPROXY set in one of those jobs but I have no idea what it is for :)
16:08:25 <ihrachys> I think tlsproxy is on by default. afair it's some nova thing.
16:09:24 <ihrachys> can't even find it mentioned in latest devstack-gate so maybe completely irrelevant
16:09:41 <ihrachys> slaweq, as for the dvr vs. legacy, from what I understand we have the tempest-full job for legacy too. do we need that one then?
16:10:23 <slaweq> tempest-full is singlenode
16:10:27 <slaweq> this one is multinode
16:10:47 <slaweq> no?
16:11:27 <slaweq> but maybe we don't need the singlenode job then as the multinode one should cover it as well
16:11:31 <slaweq> what do You think?
16:12:14 <ihrachys> yeah that's what I was alluding to.
16:12:36 <ihrachys> tempest-full may come from tempest repo though. we also have full-py3 from the same source
16:13:17 <ihrachys> but I guess I don't feel strongly about it anymore since there are legit differences we may want to keep between jobs we maintain.
16:13:24 <ihrachys> thanks for checking slaweq
16:13:44 <slaweq> You're welcome ihrachys
16:13:58 <slaweq> so we will leave it as it is for now, right?
16:15:10 <mlavalle> I think so, slaweq
16:15:17 <slaweq> ok, fine
16:15:32 <slaweq> next action then
16:15:39 <slaweq> mlavalle to take a look at why rally jobs are taking so long
16:15:49 <mlavalle> I didn't have time last week
16:16:20 <slaweq> ok, but I'm not sure if there is something really wrong with it now - let's talk about it later :)
16:16:31 <slaweq> moving on
16:16:33 <slaweq> next
16:16:33 <slaweq> slaweq will add openstack-tox-lower-constraints to grafana dashboard
16:16:44 <slaweq> patch merged today https://review.openstack.org/#/c/559162/
16:17:10 <slaweq> and the last one from last week:
16:17:10 <slaweq> slaweq will check old gate-failure bugs
16:17:23 <slaweq> I didn't have time to check them during last week - sorry
16:17:28 <slaweq> #action slaweq will check old gate-failure bugs
16:17:33 <ihrachys> do we have a document capturing rationale for the low-constraints job?
16:18:16 <slaweq> ihrachys: are You asking about the job?
16:18:51 <ihrachys> it's just first time I hear about it. I was not very involved lately.
16:19:07 <ihrachys> wanted to understand what the job is for. but maybe I can google myself.
16:19:19 <slaweq> I think there was some email about it a few weeks ago
16:19:43 <ihrachys> ok I will take a look, nevermind
16:19:54 <mlavalle> yes, there is a message in the ML explaining the rationale
16:20:30 <slaweq> thx, I can't find it now
16:20:48 <slaweq> but I think the rationale was given in the commit message of the patch which added it to the neutron repo
16:21:01 <slaweq> so You will find it easily there
16:21:04 <ihrachys> ok ok, sorry, I shouldn't have asked dumb questions
16:21:22 <slaweq> ihrachys: it wasn't dumb question for sure :)
16:21:44 <slaweq> ok, moving on to next topic
16:21:45 <slaweq> #topic Grafana
16:21:52 <slaweq> http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:22:46 <slaweq> I was checking today last few days there
16:23:43 <slaweq> there was some spike yesterday but it was similar across all jobs IMO
16:25:04 <slaweq> the same jobs are still at a quite high failure rate but I think there is nothing "new" that would require special attention
16:25:35 <slaweq> any thoughts?
16:25:43 <ihrachys> there is periodic failure it seems
16:25:46 <ihrachys> with neutron-lib-master
16:25:59 <ihrachys> and looks like for py35 only
16:26:04 <slaweq> yes, I forgot about that one now
16:26:14 <slaweq> but I have it prepared for later discussion :)
16:26:29 <ihrachys> http://logs.openstack.org/periodic/git.openstack.org/openstack/neutron/master/openstack-tox-py35-with-neutron-lib-master/66d328a/testr_results.html.gz
16:26:40 <ihrachys> ok nevermind then
16:26:43 <slaweq> so IMHO it is due to patch https://review.openstack.org/#/c/541766/
16:26:50 <slaweq> ihrachys: we can talk about it now
16:27:19 <slaweq> I checked failures for this job since April 6 and there are exactly the same errors there each time
16:27:46 <ihrachys> yeah looks like related
16:28:07 <slaweq> it looks to me like it's because of this patch: https://review.openstack.org/#/c/541766/ - maybe there is something missing in neutron now
16:28:29 <slaweq> I can try to check it or at least I will report a bug for that
16:28:46 <slaweq> or maybe someone else wants to fix it? :)
16:29:29 <slaweq> I can assign an action to one of You. Any volunteers? :)
16:29:40 <jlibosva> if everybody is busy, I can have a look
16:29:55 <haleyb> or maybe ping yamahata ?
16:30:36 <slaweq> I think yamahata is not available now
16:30:52 <slaweq> so jlibosva can You at least report a bug and ask yamahata to check it?
16:30:58 <yamahata> yamahata: I'm here. what's the issue?
16:31:04 <mlavalle> yamahata is based out of California
16:31:05 <slaweq> hi yamahata :)
16:31:25 <slaweq> mlavalle: I didn't know that
16:31:38 <jlibosva> me neither :)
16:31:49 <slaweq> yamahata: for a few days we have had a failure in the periodic job: http://logs.openstack.org/periodic/git.openstack.org/openstack/neutron/master/openstack-tox-py35-with-neutron-lib-master/66d328a/testr_results.html.gz
16:31:59 <slaweq> it looks like it is related to patch https://review.openstack.org/#/c/541766/
16:32:11 <slaweq> do You have any idea why it's failing?
16:32:28 <ihrachys> from what I understand, a fix would be test only, and it would just capture any new arguments and ignore them when matching mock call args
16:32:42 * yamahata opening links
16:33:00 <yamahata> let me check it.
16:34:28 <yamahata> now callback registration accepts a priority, priority_group.DEFAULT_PRIORITY=555000
16:34:52 <yamahata> so the function call check needs to be updated to include the priority.
16:35:34 <ihrachys> yamahata, no. if we do that, unit tests will fail with old neutron-lib
16:35:52 <yamahata> Oh, right.
16:35:53 <ihrachys> instead of explicitly matching, we should ignore the argument value (if it's present)
16:36:23 <slaweq> or we should release a new neutron-lib version and update the test as yamahata proposes :)
16:36:34 <ihrachys> the test goal is to validate that subscription happened for the right callback / event. it has nothing to do with priorities, so we should ignore it.
16:36:59 <slaweq> ihrachys: right
16:37:08 <ihrachys> slaweq, meaning, old queens neutron gate will be incompatible with newer neutron-lib? not ideal.
16:37:44 <slaweq> but does the old queens neutron gate use the newest neutron-lib too?
16:37:55 <ihrachys> I prefer the test to be liberal / compatible with different versions. we CAN then release / bump requirements / force checking priority, but what's the point
16:38:09 <slaweq> I agree with You
16:38:14 <ihrachys> slaweq, it doesn't; but we don't block anyone from updating neutron-lib
16:38:24 <slaweq> I was just curious :) thx
16:38:41 <ihrachys> it's not like we have <=XXX in requirements
16:38:47 <ihrachys> so anything fresh is fair game
16:39:00 <slaweq> right
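A minimal sketch of the test-only approach ihrachys describes above (the helper and the example values are illustrative, not neutron's actual test code): match only the callback/resource/event when inspecting the mocked subscribe() calls and ignore any extra arguments such as the new priority, so the same test passes with both old and new neutron-lib:

    from unittest import mock


    def assert_subscribed(subscribe_mock, callback, resource, event):
        # Pass if any subscribe() call starts with (callback, resource, event).
        # Extra positional or keyword arguments (e.g. priority) are ignored,
        # so the check behaves the same with old and new neutron-lib.
        for args, _kwargs in subscribe_mock.call_args_list:
            if tuple(args[:3]) == (callback, resource, event):
                return
        raise AssertionError(
            'subscribe(%r, %r, %r, ...) not found in %s'
            % (callback, resource, event, subscribe_mock.call_args_list))


    # Quick self-check with made-up values: both call styles satisfy the assertion.
    subscribe = mock.Mock()
    subscribe('my_callback', 'router', 'after_create')                    # old neutron-lib
    subscribe('my_callback', 'router', 'after_create', priority=555000)   # new neutron-lib
    assert_subscribed(subscribe, 'my_callback', 'router', 'after_create')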
16:39:09 <slaweq> so do we have any volunteer who will fix this test?
16:39:31 <mlavalle> I thought yamahata would fix it
16:39:32 <yamahata> let me cook a first patch to address test_register
16:39:47 <yamahata> the test__init__ failure looks like a different issue
16:39:59 <slaweq> ok, thx yamahata
16:40:22 <yamahata> anyway I'll look into test__init__ too.
16:40:43 <slaweq> #action yamahata to fix issues with openstack-tox-py35-with-neutron-lib-master periodic job
16:40:47 <mlavalle> thanks yamahata!
16:41:15 <slaweq> so I think that periodic jobs are fine and we can go to the next topic
16:41:17 <slaweq> #topic Fullstack
16:41:55 <slaweq> I was looking today at fullstack issues from the last 2 days
16:42:20 <slaweq> I found one POST_FAILURE: http://logs.openstack.org/14/559414/2/check/neutron-fullstack/1cbd1b0/job-output.txt.gz#_2018-04-09_21_06_31_611265 which looks like some issue with infra
16:42:57 <slaweq> and fullstack tests failed at least twice for patch https://review.openstack.org/#/c/499908/ which looks to me like it is related to the patch itself
16:43:25 <slaweq> did You find any other fullstack failures in the last days?
16:44:09 <ihrachys> I wasn't paying attention so no
16:44:25 <mlavalle> I didn't see any
16:45:06 <jlibosva> me neither
16:45:10 <slaweq> ok, so I will try to keep an eye on it and check if something new happens :)
16:45:18 <slaweq> next topic then
16:45:19 <slaweq> #topic Scenarios
16:45:37 <slaweq> here we still have the same "big players" :)
16:45:49 <slaweq> neutron-tempest-plugin-dvr-multinode-scenario is still failing at 100%
16:46:09 <slaweq> but what is interesting, there was a moment when it was a little below 100% on April 6 and then it went back to 100%
16:46:38 <jlibosva> are those just the migration tests?
16:46:51 <jlibosva> perhaps we should mark them with our favorite unstable_test decorator
16:46:59 <slaweq> in most cases the failed tests are the migration ones
16:47:13 <slaweq> sometimes there is also test_trunk_subport_lifecycle which fails
16:47:23 <jlibosva> right, I still have that on my plate
16:47:36 <slaweq> I was checking a few jobs and didn't find other issues
16:49:02 <slaweq> haleyb: as You are checking those migration failures, do You think it is worth marking them as unstable for now?
16:49:33 <haleyb> slaweq: yes, i suppose we could
16:49:43 <slaweq> ok, will You do it then?
16:50:22 <haleyb> yes
16:50:26 <slaweq> thx
16:50:48 <slaweq> #action haleyb will mark router migration tests as unstable
16:51:02 <slaweq> thx jlibosva, good idea :)
16:51:16 <ihrachys> do we do the same with trunk then or is it a different beast?
16:51:48 <slaweq> I think we can do the same with it for now
16:51:55 <slaweq> jlibosva: do You agree?
16:51:56 <jlibosva> I'd suggest doing it one by one
16:52:10 <jlibosva> let's see how the failure rate goes without considering router migration
16:52:17 <ihrachys> ok
16:52:20 <jlibosva> if it's still high, let's cross trunk out too
16:52:20 <slaweq> ++
16:52:35 <slaweq> ok
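For context on the unstable_test decorator mentioned above, a rough sketch of the idea only (the real helper lives in neutron's test tree; its import path and exact behaviour may differ): a decorated test that fails is turned into a skip carrying the given reason, so a known-flaky test stops failing the job while the underlying bug is investigated:

    import functools
    import unittest


    def unstable_test(reason):
        # Decorator: run the test normally, but convert any failure into a
        # skip that records why the test is considered unstable.
        def decorator(f):
            @functools.wraps(f)
            def inner(self, *args, **kwargs):
                try:
                    return f(self, *args, **kwargs)
                except unittest.SkipTest:
                    raise
                except Exception:
                    raise unittest.SkipTest(
                        'Test marked as unstable, skipping on failure: %s' % reason)
            return inner
        return decorator


    # Hypothetical usage on a router migration scenario test:
    # @unstable_test("router migration fails intermittently, under investigation")
    # def test_from_legacy_to_dvr(self):
    #     ...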
16:52:49 <slaweq> ok, so moving on
16:52:53 <slaweq> #topic Rally
16:53:15 <jlibosva> eh, I had one item I wanted to bring up re scenarios
16:53:22 <slaweq> sorry jlibosva
16:53:29 <slaweq> #topic scenarios
16:53:37 <slaweq> go on then :)
16:53:47 <jlibosva> so I've been watching the ovsfw tempest job and it seems to me that it just mirrors the failures of the other tempest jobs
16:54:08 <jlibosva> that said, I think the failures are not related to the fact it uses the ovsfw driver. I wanted to know opinions about making it voting in the check queue
16:55:15 <slaweq> I agree, it has been quite stable for a few weeks
16:55:20 <ihrachys> would it make sense, long term, to have ovsfw for multinode dvr-ha job?
16:55:24 <ihrachys> and then have just one
16:55:42 <jlibosva> I think long term it would make sense to make ovsfw the default
16:55:45 <ihrachys> I understand that ovsfw is stable already so we can have it enabled and plan for ovsfw for dvr-ha
16:56:02 <ihrachys> jlibosva, for all jobs?
16:56:03 <jlibosva> which will lead to having ovsfw in all current tempest tests
16:56:11 <jlibosva> for devstack, yes, for all jobs
16:56:48 <jlibosva> I think that should be the goal, then we can deprecate iptables hybrid driver. I don't think it makes sense to maintain both
16:57:04 <jlibosva> iptables driver will stay for LB of course
16:57:21 <jlibosva> but that's a long term :)
16:57:37 <jlibosva> so for now, we can make the ovsfw voting
16:57:59 <mlavalle> is it failing around ~15%?
16:57:59 <ihrachys> but there's no migration path from hybrid so how can you deprecate
16:58:22 <ihrachys> agreed to make it voting and deal with wider plans separately
16:58:34 <jlibosva> there is since Pike
16:58:46 <jlibosva> that leaves hybrid plugging behind
16:58:54 <slaweq> mlavalle: it is usually around 10-15%
16:59:03 <jlibosva> and with multiple port bindings being planned for rocky, we'll have a way to deal with it too
16:59:42 <slaweq> ok, maybe up to 25% sometimes but it follows other jobs also
16:59:53 <slaweq> I think we are out of time now
16:59:56 <ihrachys> jlibosva, ok looks like I was cryo-frozen for a while lol
17:00:06 <slaweq> #endmeeting