16:00:16 <slaweq> #startmeeting neutron_ci
16:00:17 <openstack> Meeting started Tue Nov  6 16:00:16 2018 UTC and is due to finish in 60 minutes.  The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:19 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:21 <openstack> The meeting name has been set to 'neutron_ci'
16:00:23 <slaweq> welcome to another meeting :)
16:00:50 <mlavalle> o/
16:01:00 <bcafarel> hi again :)
16:01:03 * mlavalle had to re-start system
16:01:18 <mlavalle> made it back on time :-)
16:01:26 <slaweq> haleyb: are You around for the CI meeting?
16:01:47 <haleyb> slaweq: yes, i'm here, just on phone at same time with someone
16:01:55 <slaweq> hongbin: are You around for the CI meeting?
16:01:59 <slaweq> haleyb: sure, no problem :)
16:02:22 <hongbin> o/
16:02:35 <slaweq> welcome hongbin :)
16:02:47 <slaweq> I think we can start as njohnston_ is not available today
16:02:49 <slaweq> #topic Actions from previous meetings
16:03:00 <slaweq> mlavalle to continue debugging issue with not reachable FIP in scenario jobs
16:03:10 <mlavalle> I am working on it
16:03:18 <mlavalle> couldn't reproduce locally
16:03:33 <mlavalle> at this moment comparing logs between good run and bad run
16:04:27 <slaweq> ok, if You need any help, ping me :)
16:04:45 <mlavalle> do you want the bug?
16:04:53 <slaweq> yes, please
16:05:01 <mlavalle> take it then
16:06:55 <slaweq> so mlavalle should I work on it now?
16:07:09 <mlavalle> if you want the bug, go ahead
16:07:16 <mlavalle> and work on it
16:07:24 <mlavalle> I was planning to work on it today and tomorrow
16:07:32 <mlavalle> but then I am leaving for Berlin
16:07:41 <slaweq> so please work on it for those 2 days if You can
16:07:46 <slaweq> and I can continue later
16:07:48 <slaweq> :)
16:07:50 <mlavalle> yes, of course
16:07:54 <slaweq> ok for You?
16:07:57 <mlavalle> yes
16:08:07 <slaweq> great
16:08:17 <slaweq> so let's make an action about that
16:08:34 <slaweq> #action mlavalle/slaweq to continue debugging issue with not reachable FIP in scenario jobs
16:08:46 <slaweq> thx mlavalle :)
16:08:51 <slaweq> ok, let's go to the next one
16:08:51 <mlavalle> thank you
16:08:55 <slaweq> slaweq to check if failing test_ha_router_namespace_has_ipv6_forwarding_disabled is related to bug https://bugs.launchpad.net/neutron/+bug/1798475
16:08:55 <openstack> Launchpad bug 1798475 in neutron "Fullstack test test_ha_router_restart_agents_no_packet_lost failing" [High,Confirmed]
16:09:04 <slaweq> so I checked it and it looks like a different bug
16:09:14 <slaweq> I reported it here: https://bugs.launchpad.net/neutron/+bug/1801930
16:09:14 <openstack> Launchpad bug 1801930 in neutron "Functional test test_ha_router_namespace_has_ipv6_forwarding_disabled failing quite often" [High,In progress] - Assigned to Slawek Kaplonski (slaweq)
16:09:30 <slaweq> and I even pushed patch which (I hope) will fix that: https://review.openstack.org/615893
16:10:04 <slaweq> I couldn't reproduce this issue locally but I was checking the logs of such a failed test and it's pretty clear to me that it is a race issue
16:10:33 <slaweq> so please add it to Your review list if You can :)
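
For context on the race described here: the test read the namespace's net.ipv6.conf.all.forwarding value once, right after the HA router was created, while the L3 agent applies that sysctl asynchronously. A minimal sketch of the polling approach, assuming neutron's real common_utils.wait_until_true helper; the test method and the _get_ipv6_forwarding_state helper are illustrative names, not the exact patch:

    from neutron.common import utils as common_utils

    def _wait_until_ipv6_forwarding_disabled(self, router):
        # Poll instead of asserting once: a single read taken right after
        # router creation races with the agent writing the sysctl (in the
        # failed runs the value flipped about half a second too late).
        def _forwarding_is_disabled():
            # illustrative helper reading net.ipv6.conf.all.forwarding
            # inside the router namespace
            return not self._get_ipv6_forwarding_state(router)

        common_utils.wait_until_true(_forwarding_is_disabled, timeout=10)
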
16:10:44 <haleyb> there is another ipv6 issue i'm working on that might be related, probably next on your list :)
16:11:20 <slaweq> haleyb: are You talking about: * haleyb to check issue with failing FIP transition to down state ?
16:11:36 <haleyb> no, that's a different one :o
16:12:07 <haleyb> https://bugs.launchpad.net/neutron/+bug/1787919
16:12:07 <openstack> Launchpad bug 1787919 in neutron "Upgrade router to L3 HA broke IPv6" [High,In progress] - Assigned to Brian Haley (brian-haley)
16:12:11 <slaweq> haleyb: so I don't have others on my list
16:12:24 <slaweq> ahh, this one wasn't discussed at the CI meeting before
16:12:25 <haleyb> don't know whether that's related to yours, but same system
16:12:36 <haleyb> maybe just in l3 meeting
16:13:03 <haleyb> slaweq: i will look at your change, might be unrelated
16:13:22 <slaweq> from the description of the bug it looks like this may or may not be related :)
16:13:51 <slaweq> but in the logs which I found, this forwarding was set about half a second after the test's check was done
16:14:13 <haleyb> right, that's why it rang a bell for me when i saw your bug :)
16:14:46 <slaweq> thx haleyb :)
16:14:59 <slaweq> ok, let's move on
16:15:01 <slaweq> next action
16:15:03 <slaweq> njohnston rename existing neutron-functional job to neutron-functional-python27 and switch neutron-functional to be py3
16:15:31 <slaweq> njohnston_ is not here today so I think I will just reassign it to the next meeting
16:15:37 <slaweq> #action njohnston rename existing neutron-functional job to neutron-functional-python27 and switch neutron-functional to be py3
16:15:52 <slaweq> njohnston make py3 etherpad
16:15:57 <slaweq> that is the next one
16:16:07 <slaweq> do You
16:16:40 <bcafarel> side note https://review.openstack.org/#/c/577383/ is almost there on that functional job reshuffle
16:17:25 <slaweq> bcafarel: thx for info
16:18:11 <slaweq> so You are still working on it, right?
16:19:01 <bcafarel> yes I will send an update with the missing piece (neutron-functional-python27)
16:19:09 <slaweq> thx bcafarel
16:19:21 <slaweq> ok, let's assign the action about the etherpad to njohnston_ again for next week
16:19:25 <slaweq> #action njohnston make py3 etherpad
16:19:33 <slaweq> next one:
16:19:35 <slaweq> njohnston check if grenade is ready for py3
16:19:41 <slaweq> same here
16:19:46 <slaweq> #action njohnston check if grenade is ready for py3
16:19:55 <slaweq> next one was:
16:19:58 <slaweq> slaweq to check Fullstack tests fails because process is not killed properly (bug 1798472)
16:19:58 <openstack> bug 1798472 in neutron "Fullstack tests fails because process is not killed properly" [High,Confirmed] https://launchpad.net/bugs/1798472
16:20:05 <slaweq> and I didn't have time to get to this one yet
16:20:15 <slaweq> I will add it again to myself
16:20:20 <slaweq> #action slaweq to check Fullstack tests fails because process is not killed properly (bug 1798472)
16:20:28 <slaweq> next:
16:20:30 <slaweq> mlavalle to check bug 1798475
16:20:30 <openstack> bug 1798475 in neutron "Fullstack test test_ha_router_restart_agents_no_packet_lost failing" [High,Confirmed] https://launchpad.net/bugs/1798475
16:21:07 <slaweq> any info mlavalle about this one?
16:21:10 <mlavalle> I was going to try that one after the FIP one
16:21:29 <slaweq> ok
16:21:58 <slaweq> #action mlavalle to check bug 1798475
16:21:58 <openstack> bug 1798475 in neutron "Fullstack test test_ha_router_restart_agents_no_packet_lost failing" [High,Confirmed] https://launchpad.net/bugs/1798475
16:22:11 <slaweq> and the last one was:
16:22:12 <slaweq> slaweq to check issue with openstack-tox-py35-with-oslo-master periodic job
16:22:25 <slaweq> it's now fixed on the oslo.service side with: https://review.openstack.org/#/c/614642/
16:22:50 <slaweq> so we are fine with this periodic job again
16:23:22 <slaweq> anything to add? any other things from the previous week You want to discuss?
16:24:19 <mlavalle> slaweq: just one point
16:24:37 <mlavalle> regarding the previous bug assigned to me
16:24:55 <mlavalle> if at some point it becomes urgent, please let me know and I'll change priorities
16:25:03 <mlavalle> or we can assign to someone else
16:25:21 <slaweq> sure, I will keep that in mind, thx
16:25:48 <mlavalle> :-)
16:25:58 <slaweq> ok, let's move on then to the next topic
16:26:05 <slaweq> #topic Python 3
16:26:33 <slaweq> I don't think we made much progress on it since last week
16:26:53 <slaweq> we have this patch from bcafarel who is working on switching functional tests to python3
16:27:06 <slaweq> and we have a few action items assigned to njohnston_
16:27:21 <slaweq> so I don't have anything else to discuss about it today
16:27:29 <slaweq> do You have something to bring up here?
16:28:17 <mlavalle> nothing from me
16:28:42 <slaweq> ok, let's move on then
16:28:44 <slaweq> #topic Grafana
16:28:53 <slaweq> #link http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:30:51 <slaweq> the graphs look quite "normal"
16:31:01 <slaweq> do You want to talk about something specific?
16:31:47 * mlavalle waiting for the graphs to render
16:32:51 <mlavalle> Yeah, they look good
16:33:01 <mlavalle> there was a little spike in functional
16:33:14 <slaweq> in gate queue?
16:33:26 <mlavalle> yeah, yesterday
16:33:27 <slaweq> it was on the weekend and there were very few runs then
16:33:45 <mlavalle> yeah, I see that
16:33:52 <mlavalle> nothing else grabs my attention
16:33:57 <slaweq> good :)
16:33:58 <mlavalle> so I think we are good
16:34:17 <slaweq> so let's talk about the functional and fullstack jobs now :)
16:34:23 <mlavalle> ok
16:34:24 <slaweq> #topic fullstack/functional
16:34:48 <slaweq> regarding functional tests, we have identified 2 issues:
16:35:08 <slaweq> one is this ipv6 forwarding issue https://bugs.launchpad.net/neutron/+bug/1801930 and I hope it will be fixed soon
16:35:08 <openstack> Launchpad bug 1801930 in neutron "Functional test test_ha_router_namespace_has_ipv6_forwarding_disabled failing quite often" [High,In progress] - Assigned to Slawek Kaplonski (slaweq)
16:35:18 <slaweq> and the second one is related to the db migrations
16:35:28 <slaweq> we had to reopen bug: https://bugs.launchpad.net/neutron/+bug/1687027
16:35:28 <openstack> Launchpad bug 1687027 in neutron "test_walk_versions tests fail with "IndexError: tuple index out of range" after timeout" [High,Confirmed] - Assigned to Slawek Kaplonski (slaweq)
16:35:36 <slaweq> because it still happens
16:35:49 <slaweq> it's not happening as often as it was
16:36:14 <slaweq> but sometimes even 600 seconds is not enough, and that leads me to think that it's something more than just a timeout
16:36:33 <mlavalle> so some are caused by timeout
16:36:34 <slaweq> unfortunately there are no logs from those tests in the job logs
16:36:42 <mlavalle> but some by something else
16:36:46 <hongbin> yes, it is possibly something that is hanging
16:37:00 <slaweq> hanging or not running at all, yes :/
16:37:32 <munimeha1> hi
16:38:15 <slaweq> hi munimeha1
16:38:35 <slaweq> but the problem is that if You go to the results of a functional job, like: http://logs.openstack.org/88/555088/29/check/neutron-functional/0b00b31/logs/testr_results.html.gz
16:38:51 <slaweq> there are no logs from those tests in http://logs.openstack.org/88/555088/29/check/neutron-functional/0b00b31/logs/dsvm-functional-logs/
16:39:03 <slaweq> so we don't know anything about what happened there :/
16:39:14 <slaweq> I think that we should first check why those tests aren't logged there
16:39:18 <mlavalle> is that true for other projects?
16:40:02 <slaweq> what are You asking about exactly?
16:40:12 <munimeha1> can we evaluate this patch for ci https://review.openstack.org/#/c/603501/
16:40:30 <munimeha1> Do we need to add any gate or anything
16:40:46 <slaweq> munimeha1: can we discuss it in Open agenda?
16:40:54 <munimeha1> thanks
16:40:56 <mlavalle> munimeha1: probably you are in the wrong meeting. the Neutron meeting ended more than 1 hour ago
16:41:05 <munimeha1> ok
16:41:19 <mlavalle> and I raised your patch in that meeting
16:41:29 <mlavalle> please check the logs, blueprints section
16:42:00 <munimeha1> thanks
16:42:17 <mlavalle> slaweq: I was wondering if the lack of logs is a consequence of the way we set up this test
16:42:25 <slaweq> mlavalle: I don't know
16:42:29 <slaweq> we should check that
16:42:43 <slaweq> but there are logs for some other tests there
16:42:46 <mlavalle> that's why I said, what do other projects do in terms of this setup?
16:42:56 <slaweq> we should check that probably
16:43:11 <slaweq> is there any volunteer to check that maybe?
16:43:45 <mlavalle> I would volunteer with the caveat that I won't get to it until after Berlin
16:43:57 <slaweq> if not, I will assign it to myself but I don't know if I will have time for that
16:44:17 <mlavalle> best thing, assign it to you
16:44:22 <slaweq> mlavalle: You already have a lot of things on Your todo list and You have the summit so I will assign it to myself :)
16:44:33 <slaweq> but thx for volunteering :)
16:44:45 <mlavalle> if by the week after Berlin you haven't gotten to it, we can discuss again
16:44:54 <slaweq> #action slaweq to check why db_migration functional tests don't have logs
16:45:01 <slaweq> mlavalle: ok, thx
16:45:37 <slaweq> ok, regarding fullstack tests we had two issues which are already assigned to me and mlavalle so there is nothing else to discuss here I guess
16:45:46 <slaweq> let's move to the next topic
16:45:48 <slaweq> #topic Tempest/Scenario
16:46:20 <slaweq> regarding tempest jobs, I recently started thinking about one thing
16:46:48 <slaweq> I see quite a few failures completely unrelated to neutron, e.g. cinder volume issues
16:46:55 <slaweq> so I have a question for You
16:47:24 <slaweq> what do You think about creating some blacklist of tests, like cinder.volume or maybe some others, which we will not run in our gate?
16:47:48 <mlavalle> mhhhhh
16:47:49 <slaweq> I know that neutron and nova are quite related to each other so we can't do that
16:48:07 <slaweq> but for example cinder is not related to neutron at all IMO
16:48:10 <hongbin> the problem is how to track the list
16:48:15 <mlavalle> yeah
16:48:24 <mlavalle> the tracking problem worries me
16:49:03 <mlavalle> if we can have a process whereby we can revisit what we blacklist
16:49:15 <mlavalle> I would consider it
16:49:21 <slaweq> I was thinking about something quite generic, like a blacklist of all tempest.api.volume for example
16:49:51 <slaweq> as those tests are not testing anything related to neutron (maybe except ssh connectivity to an instance, but that is tested in many other tests as well)
16:49:53 <mlavalle> how difficult would it be to put together a list of the specific tests failing?
16:50:17 <mlavalle> is cinder mainly the problem?
16:50:30 <slaweq> mlavalle: TBH I don't know how difficult it would be
16:50:44 <hongbin> for me, it is more convenient to track the list in Launchpad even if it is not related to neutron
16:50:44 <slaweq> I can try to make such a list if I find failed tests
16:50:52 <mlavalle> what I am thinking, before taking the step of blacklisting
16:51:12 <mlavalle> is, if we can put together a list relatively easily and it is mostly cinder
16:51:23 <mlavalle> I can discuss with their team that list
16:51:28 <mlavalle> and see what they think
16:51:43 <mlavalle> they might say, just remove them from your queues
16:51:46 <hongbin> it could be something else, like an error on shelving/unshelving a vm
16:51:48 <slaweq> mlavalle: and yes, from what I see in our job results, it's most often some cinder related issues (except our own issues of course)
16:52:28 <slaweq> hongbin: yes, but shelve/unshelve is nova test and I'm not talking about that one here
16:52:40 <hongbin> slaweq: ok
16:52:44 <mlavalle> yes, we are not going to blacklist nova
16:52:58 <slaweq> I'm talking about tests which are failing because of a volume in ERROR state, for example
16:52:59 <bcafarel> only that pesky volume attach test and a few friends of it
16:53:26 <mlavalle> slaweq: send me an email with the list and I'll discuss it next week in Berlin
16:53:36 <mlavalle> if it is easy to put together
16:53:42 <hongbin> is there a bug opened in cinder about that?
16:53:45 <slaweq> ok, so I will make a list of such failing tests and will send it to You
16:53:51 <slaweq> hongbin: I don't know TBH
16:54:02 <mlavalle> hongbin: good point. we need to make sure that is the case
16:54:10 <hongbin> slaweq: IMO, we should open bugs whenever we see such a failure
16:54:38 <slaweq> hongbin: I agree, I will do that if I spot it next time
16:54:42 <bcafarel> I know of https://bugs.launchpad.net/cinder/+bug/1796708 at least
16:54:42 <openstack> Launchpad bug 1796708 in Cinder "VolumesExtendTest.test_volume_extend_when_volume_has_snapshot intermittently fails with "Extend volume failed.: VolumeNotDeactivated: Volume volume-5514a6ad-abbb-46b3-a464-d73cc67e55af was not deactivated in time."" [Undecided,New]
16:55:07 <mlavalle> yeah, that is an old friend of ours
16:55:46 <hongbin> slaweq: i will do that as well, then we need a way to track the list of opened bugs in other projects that affect neutron
16:56:00 <slaweq> hongbin: thx
16:56:08 <mlavalle> let's all do that
16:56:22 <slaweq> sounds good
16:56:38 <slaweq> we can create an etherpad to track such failures
16:56:41 <mlavalle> slaweq: but thanks for bringing the issue up
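
To make the blacklist idea concrete: tempest run accepted a --blacklist-file option at the time, taking a file of test-name regexes to skip. A sketch of what the neutron gate could pass, under the assumption that the jobs get wired to use such a file; the single entry mirrors the tempest.api.volume example above, and anything broader should go through the review with the cinder team that mlavalle proposes:

    # blacklist.txt - candidate exclusions for neutron gate jobs (illustrative)
    # used as: tempest run --blacklist-file blacklist.txt ...
    ^tempest\.api\.volume\..*
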
16:57:41 <slaweq> ok, so that's all from me for today :)
16:57:45 <slaweq> #topic Open discussion
16:58:04 <slaweq> munimeha1: do You want to discuss something related to CI?
16:59:18 <slaweq> ok, I have one more thing which I forgot at the beginning :)
16:59:37 <slaweq> I want to welcome to these meetings our new CI lieutenant: hongbin :)
16:59:48 <slaweq> sorry that I didn't do that at the beginning :)
17:00:05 <mlavalle> yaay! welcome!
17:00:07 <hongbin> slaweq: haha, np, look forward to working in the CI team
17:00:17 <slaweq> :)
17:00:22 <slaweq> ok, I think we have to finish now
17:00:26 <slaweq> thx for attending
17:00:27 <bcafarel> better a late announcement than never, welcome new lieutenant hongbin :)
17:00:28 <slaweq> o/
17:00:38 <slaweq> #endmeeting