16:00:35 <slaweq> #startmeeting neutron_ci
16:00:36 <openstack> Meeting started Tue Jul 24 16:00:35 2018 UTC and is due to finish in 60 minutes.  The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:37 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:40 <openstack> The meeting name has been set to 'neutron_ci'
16:00:41 <slaweq> hi
16:01:17 <mlavalle> o/
16:01:34 <slaweq> hi mlavalle
16:01:38 <slaweq> njohnston: are You around?
16:02:31 <slaweq> I think that there will be only 2 of us today mlavalle :)
16:02:51 <mlavalle> slaweq: always nice to talk to you :-)
16:03:04 <slaweq> so let's start, maybe someone will join later
16:03:12 <slaweq> thx :)
16:03:28 <slaweq> and it's also very nice to talk to You :)
16:03:40 <slaweq> so let's do it quickly
16:03:42 <slaweq> #topic Actions from previous meetings
16:03:50 <slaweq> njohnston to see if we can set the default time period on the grafana dashboard to now-7d
16:04:04 <slaweq> I have no idea if he did something with that
16:04:29 <slaweq> but it's not urgent for sure, so we can wait until next week
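(Aside - a minimal sketch, assuming the dashboard is exported as raw Grafana JSON, of what setting the default window to the last 7 days amounts to; the grafyaml definition in project-config may express this differently, and the file name here is hypothetical:)

    # bump the default time window on an exported dashboard to now-7d
    jq '.time = {from: "now-7d", to: "now"}' neutron-failure-rate.json > neutron-failure-rate-7d.json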
16:04:40 <slaweq> next one
16:04:43 <slaweq> slaweq to talk about issue with test_create_server_with_scheduler_hint_group_anti_affinity with nova-neutron liaison
16:05:01 <mlavalle> did you get a hold of him?
16:05:04 <slaweq> I talked with Sean last week, they were aware of this issue
16:05:22 <slaweq> Sean told me that patch  https://review.openstack.org/#/c/583347/ should solve this
16:05:23 <patchbot> patch 583347 - nova - Update RequestSpec.instance_uuid during scheduling (MERGED)
16:05:27 <slaweq> and this patch was merged today
16:05:37 <mlavalle> ah, that's cool
16:05:37 <slaweq> so I hope it will be better now :)
16:06:08 <mlavalle> Great!
16:06:08 <slaweq> because it happened quite often in multinode jobs recently
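(Aside - a minimal sketch of what that tempest test exercises, with hypothetical resource names: boot a server into a server group that carries an anti-affinity policy, passed as a scheduler hint:)

    # hypothetical names; the group UUID comes from the first command's output
    openstack server group create --policy anti-affinity demo-anti-affinity
    openstack server create --image cirros --flavor m1.tiny \
        --network private --hint group=<server-group-uuid> demo-server-1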
16:06:15 <slaweq> and last action was:
16:06:17 <slaweq> * mlavalle/haleyb to check dvr (dvr-ha) env and shelve/unshelve server
16:06:32 <mlavalle> I didn't have time to do that
16:06:47 <slaweq> I guess haleyb also didn't have time before his PTO :)
16:07:02 <slaweq> can I assign it to You for next week also?
16:07:07 <mlavalle> I'll try to follow up at the end of the week
16:07:15 <mlavalle> yes, assign it to me
16:07:29 <slaweq> #action mlavalle to check dvr (dvr-ha) env and shelve/unshelve server
16:07:31 <slaweq> thx
16:07:55 <slaweq> I found that it happened once or twice in a non-DVR job as well
16:08:16 <slaweq> http://logs.openstack.org/31/584431/2/check/tempest-full/e789052/testr_results.html.gz
16:08:22 <mlavalle> ahhh
16:08:35 <slaweq> sorry
16:08:40 <slaweq> ignore this link please
16:08:55 <slaweq> it's the same test that failed, but I checked now and the reason is different there
16:08:59 <slaweq> so it's not the same issue
16:09:41 <mlavalle> ack
16:09:49 <slaweq> but there is another one: http://logs.openstack.org/26/575326/13/check/neutron-tempest-multinode-full/6886894/logs/testr_results.html.gz
16:10:02 <slaweq> at first glance this looks like a similar issue
16:10:08 <slaweq> and it's not dvr
16:11:07 <slaweq> but in the neutron-tempest-dvr-ha-multinode-full job it happens very often - I found 16 examples when checking patches from the last few days
16:11:11 <mlavalle> yeah, looks similar
16:11:38 <slaweq> ok, if You need any help with that, ping me :)
16:11:48 <slaweq> I'm not a DVR expert but maybe I can try to help
16:12:09 <slaweq> moving on to next topic?
16:12:32 <mlavalle> ok
16:13:04 <slaweq> #topic Grafana
16:13:21 <slaweq> http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:14:09 <mlavalle> SLOOOOOOOOOOOOW
16:14:17 <slaweq> yes
16:14:32 <slaweq> but generally speaking it looks quite good this week
16:14:50 <slaweq> almost all jobs are below a 20% failure rate
16:15:24 * mlavalle notices that in the 7-day view, there is a gap
16:16:01 <slaweq> yes, but it's only for gate jobs
16:16:12 <slaweq> for the check queue there is no such gap
16:16:19 <slaweq> sorry, it is there too
16:16:41 <slaweq> but I know that there were some issues with infra recently
16:16:50 <slaweq> the logs server was down or something like that
16:16:57 <slaweq> so maybe that's the reason
16:18:06 <slaweq> https://wiki.openstack.org/wiki/Infrastructure_Status
16:18:18 <slaweq> it was 2018-07-19
16:18:24 <mlavalle> yeah, that may be the reason
16:19:25 <mlavalle> other than the gap, it seems our CI is behaving well
16:19:34 <slaweq> yep, quite good imo
16:19:57 <slaweq> so today I went through the CI job results for the latest neutron patches
16:20:08 <slaweq> I checked the whole first page from gerrit :)
16:20:22 <slaweq> and I checked what the reasons for the failures were
16:20:32 <slaweq> I think we can now talk about some of them
16:20:40 <slaweq> #topic Scenarios
16:20:43 * mlavalle thanks slaweq for doing that
16:21:14 <slaweq> most of those issues are in fact related to scenario jobs
16:21:53 <slaweq> I found some issues with unit tests and others too, but when checking them it looked like the failures were related to the patches they ran on
16:22:09 <slaweq> so let's go through tempest jobs then
16:22:29 <mlavalle> ok
16:22:31 <slaweq> first job is neutron-tempest-multinode-full which is failing quite often
16:23:06 <slaweq> in most cases it failed because of the affinity test issue mentioned at the beginning of the meeting
16:23:25 <mlavalle> so we should see a marked improvement soon
16:23:37 <slaweq> here yes, I hope so
16:24:09 <slaweq> other than that there were issues with volumes, live migration, tagging devices and shelve/unshelve of instances
16:24:23 <slaweq> so except for this shelve issue - not related to neutron
16:24:45 <slaweq> about the issue with tagging devices, like: http://logs.openstack.org/73/565773/3/check/neutron-tempest-multinode-full/6748276/logs/testr_results.html.gz
16:24:49 <mlavalle> and this shelve issue is the one you assigned to me above, right?
16:24:54 <slaweq> right :)
16:25:00 <mlavalle> ok
16:25:25 <slaweq> about the tagging issue, my patch to tempest was merged and I found in the logs that it's a volume that isn't removed from the metadata
16:25:43 <slaweq> I even sent an email about it a few days ago to ask for some help from the Nova and Cinder side
16:25:53 <mlavalle> I saw it
16:26:05 <slaweq> I hope they will fix this issue as it looks like a real bug, not only a test issue
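(For context - the tagging test reads the tagged-device list back from the metadata service inside the guest, roughly like this, assuming the standard metadata endpoint is reachable:)

    # run inside the guest; the 'devices' list should match the tagged NICs/volumes
    curl -s http://169.254.169.254/openstack/latest/meta_data.json | python -m json.tool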
16:27:28 <slaweq> and that's basically all about this job - the issues are not related to neutron
16:27:38 <mlavalle> \o/
16:27:39 <slaweq> next job is neutron-tempest-dvr-ha-multinode-full
16:27:56 <slaweq> here I also found some issues with volumes and tagging devices
16:28:12 <slaweq> also really a lot of failures of the shelve/unshelve tests
16:28:27 <slaweq> and one test which may be related to neutron: http://logs.openstack.org/48/580548/2/check/neutron-tempest-dvr-ha-multinode-full/dc195b7/logs/testr_results.html.gz
16:28:49 <slaweq> it looks like there was an issue with SSH to the FIP here
16:32:01 <slaweq> from the instance's console log it looks like there wasn't even a fixed IP configured on it
16:32:36 <mlavalle> mhhh
16:35:48 <slaweq> it will require some more debugging IMO
16:36:02 <slaweq> I don't see anything wrong that could be related to this issue
16:36:33 <slaweq> nova properly started and then unpaused the instance, so neutron should have sent the notification that the port is active
16:36:44 <slaweq> so the L2 agent (probably) did its job right
16:37:25 <slaweq> do You see anything related to this issue there maybe?
16:37:57 <mlavalle> no
16:38:45 <mlavalle> do you want me to spend some time on it?
16:38:51 <slaweq> I will try to investigate it more if I have some time for it
16:39:11 <slaweq> You are already very busy so I will take care of it
16:39:37 <mlavalle> thanks
16:40:09 <slaweq> #action slaweq to check why there was no network connection in neutron-tempest-dvr-ha-multinode-full test
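(A hedged sketch of the kind of manual checks such debugging could start with, using a hypothetical server UUID:)

    # did neutron report the port as ACTIVE, and did the guest ever get its fixed IP?
    openstack port list --device-id <server-uuid>
    openstack server show <server-uuid> -c status -c addresses
    openstack console log show <server-uuid> | grep -iE 'eth0|dhcp'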
16:40:43 <slaweq> from other jobs, there was also one issue with SSH to an instance in neutron-tempest-iptables_hybrid: http://logs.openstack.org/07/574907/20/gate/neutron-tempest-iptables_hybrid/1594a88/logs/testr_results.html.gz
16:42:37 <slaweq> but here it might be related to volumes as it's a "boot from volume" test or something like that
16:42:52 <slaweq> I will also check whether that is related to neutron or not
16:42:58 <mlavalle> ack
16:43:17 <slaweq> #action slaweq to check neutron-tempest-iptables_hybrid ssh connection issue
16:43:30 <slaweq> in other tests I found only some failures related to volumes
16:43:42 <slaweq> and one timeout in neutron-tempest-plugin-dvr-multinode-scenario
16:43:53 <slaweq> http://logs.openstack.org/83/584383/2/check/neutron-tempest-plugin-dvr-multinode-scenario/2e3af27/job-output.txt.gz
16:44:17 <slaweq> but as this timeout happened only once, I don't think we need to do anything about it now
16:44:26 <mlavalle> ok
16:44:29 <slaweq> and that's all from my side for today
16:44:42 <mlavalle> so, all in all, we are in good shape, aren't we?
16:44:49 <slaweq> basically I think that we are quite good now with CI :)
16:44:56 <mlavalle> Great
16:45:10 <mlavalle> good job slaweq
16:45:20 <slaweq> I also think that there have been a bit fewer rechecks on patches recently
16:45:41 <slaweq> and patches are merged a bit faster now IMO
16:45:52 <slaweq> thx mlavalle but it's not because of me :)
16:46:13 <slaweq> it's all team's work :)
16:46:16 <mlavalle> you help a lot
16:46:28 <slaweq> thx :)
16:46:40 <slaweq> I don't have anything else for today
16:46:53 <slaweq> do You want to talk about something else maybe?
16:46:59 <mlavalle> me neither, let's go and try to nail this release ;-)
16:47:14 <slaweq> sure
16:47:16 <slaweq> thx a lot
16:47:24 <slaweq> #endmeeting