16:00:31 <slaweq> #startmeeting neutron_ci
16:00:32 <openstack> Meeting started Tue Jul 31 16:00:31 2018 UTC and is due to finish in 60 minutes.  The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:34 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:37 <slaweq> welcome back
16:00:37 <openstack> The meeting name has been set to 'neutron_ci'
16:00:39 <mlavalle> o/
16:00:43 <slaweq> to the neutron CI meeting this time :)
16:00:59 <njohnston> o/
16:01:14 <slaweq> haleyb: around?
16:01:45 <slaweq> ok, let's start
16:01:48 <openstack> slaweq: Error: Can't start another meeting, one is in progress.  Use #endmeeting first.
16:02:07 <njohnston> you already started :-)
16:02:10 <slaweq> sorry
16:02:15 <slaweq> wrong copy paste
16:02:17 <slaweq> #topic Actions from previous meetings
16:02:31 <slaweq> mlavalle to check dvr (dvr-ha) env and shelve/unshelve server
16:02:51 <mlavalle> I didn't have time. Sorry ;-(
16:03:00 <slaweq> no problem :)
16:03:01 <mlavalle> will catch up this week
16:03:23 <slaweq> we have bigger priorities now :)
16:03:35 <slaweq> #action mlavalle to check dvr (dvr-ha) env and shelve/unshelve server
16:03:46 <slaweq> next one was on me
16:03:48 <slaweq> * slaweq to check why there was no network connection in neutron-tempest-dvr-ha-multinode-full test
16:04:14 <slaweq> I briefly checked the logs and I didn't find anything wrong there
16:04:33 <slaweq> it was only one such issue so far; if it repeats, I will report a bug and we will dig into it more then
16:04:41 <slaweq> fine for You?
16:05:32 <mlavalle> yeap
16:05:35 <mlavalle> no problem
16:05:44 <slaweq> thx
16:05:54 <slaweq> last thing from last week is
16:05:55 <slaweq> * slaweq to check neutron-tempest-iptables_hybrid ssh connection issue
16:06:13 <slaweq> I was checking that. It doesn't look like an issue with Neutron. This failure happens randomly in many jobs, not only neutron ones.
16:06:36 <slaweq> I proposed a patch for tempest: https://review.openstack.org/#/c/587221/ - once it's merged, I can check the console log of the instance and maybe that will help me understand what is going on there
16:06:52 <slaweq> the patch for tempest is in the gate now, so I hope it will be merged soon
16:07:00 <mlavalle> cool
16:07:19 <slaweq> ok, so moving on to
16:07:21 <slaweq> #topic Grafana
16:07:41 <slaweq> thx njohnston we have dashboards for stable releases now
16:07:51 <slaweq> * stable/queens: http://grafana.openstack.org/d/pM54U-Kiz/neutron-failure-rate-previous-stable-release
16:08:04 <slaweq> * stable/pike: http://grafana.openstack.org/d/dCFVU-Kik/neutron-failure-rate-older-stable-release
16:08:10 <mlavalle> Pretty cool!
16:08:39 <slaweq> but as I was looking at them today, I think they need some tweaking
16:08:51 <slaweq> as some job names might be different in e.g. the Pike branch
16:09:11 <slaweq> and it would be good to mark jobs that are non-voting, e.g. fullstack for pike
16:09:29 <slaweq> njohnston: should I do it or will You check that?
16:09:33 <njohnston> makes sense, I can get that going
16:09:41 <slaweq> thx njohnston
16:10:06 <slaweq> #action njohnston to tweak stable branches dashboards
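(For reference, these dashboards are grafyaml definitions kept in openstack-infra/project-config. A rough sketch of the kind of tweak discussed here, with a stable job name corrected and a non-voting job flagged in the panel title; the graphite metric path below is illustrative, not copied from the real file:

    dashboard:
      title: Neutron Failure Rate (Older Stable Release)
      rows:
        - title: Fullstack (non-voting on pike)
          height: 250px
          panels:
            - title: neutron-fullstack failure rate (non-voting)
              type: graph
              targets:
                # Illustrative graphite path; the real one follows zuul's
                # pipeline/project/job metric hierarchy.
                - target: asPercent(stats_counts.zuul.PATH.neutron-fullstack.FAILURE, sumSeries(stats_counts.zuul.PATH.neutron-fullstack.{SUCCESS,FAILURE}))

)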
16:10:26 <slaweq> ok, let's check dashboard for master branch:
16:10:28 <slaweq> http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:12:07 <slaweq> btw njohnston, did You change something in the dashboard? It now opens with the last 7 days period by default
16:12:08 <slaweq> :)
16:12:15 <njohnston> yes I did :-)
16:12:51 <mlavalle> I like it
16:12:51 <mlavalle> you get to where you want to go faster
16:13:03 <slaweq> thx a lot njohnston :)
16:13:38 <njohnston> an easy fix :-)
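(The default time window change njohnston mentions amounts to a dashboard-level time block; a minimal sketch, assuming grafyaml passes Grafana's time settings through unchanged:

    dashboard:
      title: Neutron Failure Rate
      time:
        from: now-7d
        to: now

)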
16:14:35 <mlavalle> what's with the gaps in the gate queue? This is the second week we see this
16:14:53 <slaweq> yes, and I have no idea why there are such gaps there
16:15:07 <slaweq> maybe we should ask someone from the infra team about that?
16:15:15 <mlavalle> yeah
16:16:19 <slaweq> I will try to talk with them about it
16:16:45 <slaweq> #action slaweq to talk with infra about gaps in grafana graphs
16:18:12 <slaweq> speaking about graphs, I found that around the 26th-27th there were a couple of failures due to some problems with package installation
16:18:41 <slaweq> which might explain spike e.g. in fullstack gate graph
16:19:22 <mlavalle> how about the peaks in functional?
16:19:47 <mlavalle> can they be explained by the package installation?
16:19:49 <slaweq> in gate queue?
16:20:10 <mlavalle> yeah
16:20:14 <slaweq> I don't think so, but it was the weekend
16:20:25 <slaweq> so maybe there weren't too many patches in the queue?
16:20:28 <mlavalle> yes, weekend
16:20:37 <mlavalle> that could be
16:20:42 <slaweq> I checked most of the patches from the last few days
16:21:24 <slaweq> but if something was rechecked, I won't see the past failures in gerrit, right?
16:21:38 <slaweq> and I based this only on results visible in gerrit
16:22:03 <slaweq> based on that I found a few different issues which we can discuss now
16:22:09 <mlavalle> ok
16:22:19 <slaweq> so let's start with
16:22:21 <slaweq> #topic Functional
16:22:50 <slaweq> I found two different failures which don't look related to the patches they were running on
16:23:07 <slaweq> first is neutron.tests.functional.db.migrations, e.g.:     * http://logs.openstack.org/50/533850/34/check/neutron-functional/30ee01c/logs/testr_results.html.gz
16:23:19 <slaweq> and that happens from time to time
16:23:36 <slaweq> I should probably open a bug for that
16:26:01 <slaweq> do You think that reporting a bug is a good idea?
16:26:06 <mlavalle> yes
16:26:09 <slaweq> k
16:26:35 <slaweq> #action slaweq to report a bug with failing neutron.tests.functional.db.migrations functional tests
16:26:48 <slaweq> and next one related to functional is:
16:26:54 <slaweq> * neutron.tests.functional.agent.linux.test_netlink_lib.NetlinkLibTestCase.test_list_entries:
16:27:02 <slaweq> http://logs.openstack.org/44/587244/2/check/neutron-functional/ec774d6/logs/testr_results.html.gz
16:27:09 <slaweq> have You seen it before maybe?
16:27:39 * mlavalle looking
16:28:09 <mlavalle> No, I haven't
16:28:32 <mlavalle> let's be on the lookout for this one
16:28:41 <slaweq> yep, I agree
16:28:58 <slaweq> and those were the issues which I found in functional tests
16:29:31 <slaweq> apart from that, most of what I found was again in tempest related jobs
16:29:47 <slaweq> fortunately most of the issues are in non-voting jobs :)
16:30:06 <slaweq> so let's talk about those jobs
16:30:08 <slaweq> #topic Scenarios
16:30:20 <slaweq> first short info
16:30:48 <slaweq> failures of tempest.api.compute.servers.test_device_tagging.TaggedAttachmentsTest.test_tagged_attachment - are probably related to a bug in nova
16:31:12 <slaweq> mriedem made patch https://review.openstack.org/#/c/586292/ to disable this failing part temporarily
16:31:22 <slaweq> so our jobs should be good now with this :)
16:32:21 <mlavalle> that's good
16:32:34 <mlavalle> we were seeing that failure a lot last week
16:32:35 <slaweq> today I checked the results of the 50 most recently updated neutron patches from gerrit and I made a list of issues which I found there
16:32:50 <slaweq> yes mlavalle, should be better now :)
16:33:03 <slaweq> from other jobs, what I found this week is:
16:33:26 <slaweq> neutron-tempest-dvr job, one failure of tempest.api.network.test_networks.NetworksTest.test_create_delete_subnet_with_gw_and_allocation_pools
16:33:35 <slaweq> http://logs.openstack.org/01/584601/2/check/neutron-tempest-dvr/65f95f5/logs/testr_results.html.gz
16:34:00 <slaweq> this looks like it may be some issue in the tempest delete_resource method
16:34:11 <slaweq> I will check that this week
16:34:47 <slaweq> #action slaweq to check tempest delete_resource method
16:35:27 <slaweq> in the neutron-tempest-plugin-scenario-linuxbridge job, one issue:
16:35:41 <slaweq> http://logs.openstack.org/53/586253/1/check/neutron-tempest-plugin-scenario-linuxbridge/ec36ec6/testr_results.html.gz
16:36:13 <slaweq> it is some issue with connection to FIP
16:36:59 <mlavalle> timeouts
16:38:03 <slaweq> yes, looks like that
16:38:05 <slaweq> again
16:38:26 <slaweq> but it looks like it took more than 700 seconds to boot
16:38:32 <slaweq> which is insane IMO
16:38:50 <mlavalle> indeed
16:39:42 <slaweq> and now I remember that some time ago I found that in many jobs (maybe all) we have virt_type=qemu
16:39:46 <slaweq> http://logs.openstack.org/53/586253/1/check/neutron-tempest-plugin-scenario-linuxbridge/ec36ec6/controller/logs/etc/nova/nova_conf.txt.gz
16:39:55 <slaweq> do You know if that could be changed maybe?
16:40:00 <slaweq> or why it is like that?
16:40:11 <mlavalle> no, I don't know
16:40:22 <slaweq> changing to kvm should speed it up a bit
16:40:30 <mlavalle> yeah
16:40:33 <slaweq> I will also ask the infra team about that
16:40:53 <slaweq> #action slaweq to check with infra if virt_type=kvm is possible in gate jobs
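(For context, a minimal sketch of what the change would amount to, assuming the gate VMs expose hardware virtualization via /dev/kvm; if they don't, qemu full emulation is the only option, which would explain guests taking 700+ seconds to boot:

    # devstack local.conf (what a job override would amount to)
    [[local|localrc]]
    LIBVIRT_TYPE=kvm

    # which devstack writes into nova.conf as:
    [libvirt]
    virt_type = kvm

)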
16:41:24 <slaweq> other issues which I found are from non-voting jobs
16:41:31 <slaweq> first is: neutron-tempest-dvr-ha-multinode-full (non-voting)
16:41:51 <slaweq> here the most common issue is     * tempest.api.compute.volumes.test_attach_volume.AttachVolumeShelveTestJSON.test_{attach|detach}_volume_shelved_or_offload_server
16:42:32 <slaweq> which I found happened 19 times in the last few days
16:43:07 <slaweq> so mlavalle we should definitely check that :)
16:43:14 <mlavalle> yes
16:43:15 <slaweq> and it's assigned to You as an action
16:43:17 <slaweq> :)
16:43:26 <mlavalle> I know
16:43:37 <slaweq> I know that You know :D
16:43:43 <mlavalle> LOL
16:43:44 <slaweq> just saying :P
16:44:01 <slaweq> I also found a failure in the tempest.api.compute.admin.test_live_migration.LiveMigrationRemoteConsolesV26Test test
16:44:08 <slaweq> http://logs.openstack.org/59/582659/2/check/neutron-tempest-dvr-ha-multinode-full/38f4cd1/logs/testr_results.html.gz
16:44:20 <slaweq> it's related to the latest changes with multiple port bindings
16:44:35 <slaweq> but it's already fixed on Nova: https://bugs.launchpad.net/nova/+bug/1783917
16:44:35 <openstack> Launchpad bug 1783917 in OpenStack Compute (nova) "live migration fails with NovaException: Unsupported VIF type unbound convert '_nova_to_osvif_vif_unbound'" [High,Fix released] - Assigned to Matt Riedemann (mriedem)
16:44:57 <slaweq> just wanted to mention that there was such an issue :)
16:45:22 <slaweq> and the last one in this job is something new (at least to me)
16:45:29 <slaweq> it's failure in tempest.scenario.test_security_groups_basic_ops.TestSecurityGroupsBasicOps.test_in_tenant_traffic
16:45:38 <slaweq> and it has hit at least 3 times recently:
16:45:47 <slaweq> * http://logs.openstack.org/63/577463/4/check/neutron-tempest-dvr-ha-multinode-full/612982d/logs/testr_results.html.gz
16:45:49 <slaweq> * http://logs.openstack.org/88/555088/21/check/neutron-tempest-dvr-ha-multinode-full/48acc8b/logs/testr_results.html.gz
16:45:51 <slaweq> * http://logs.openstack.org/14/529814/25/check/neutron-tempest-dvr-ha-multinode-full/499a621/logs/testr_results.html.gz
16:46:26 <slaweq> and here there is no console output from the instance again :/
16:47:14 <slaweq> so for now I think I will report this as a bug and try to send a patch to tempest to add the console log in such cases
16:47:38 <mlavalle> ok
16:47:43 <slaweq> then we will be able to check something more maybe
16:48:01 <slaweq> #action slaweq to report a bug with tempest.scenario.test_security_groups_basic_ops.TestSecurityGroupsBasicOps.test_in_tenant_traffic
16:48:25 <slaweq> #action slaweq to send patch to add logging of console output in tests like tempest.scenario.test_security_groups_basic_ops.TestSecurityGroupsBasicOps.test_in_tenant_traffic
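(As a rough illustration of what that tempest patch could look like - a sketch only, in the spirit of tempest's scenario manager; the helper name and log messages are hypothetical, though get_console_output() itself is a real compute client call:

    from oslo_log import log as logging

    LOG = logging.getLogger(__name__)

    # Assumed to live in a tempest test class where self.servers_client
    # is the compute servers client.
    def _log_console_output_on_failure(self, servers):
        """Dump guest console logs to help debug connectivity failures."""
        for server in servers:
            try:
                output = self.servers_client.get_console_output(
                    server['id'])['output']
                LOG.debug('Console output for server %s:\n%s',
                          server['id'], output)
            except Exception:
                LOG.exception('Failed to fetch console log for server %s',
                              server['id'])

)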
16:49:15 <slaweq> the issue with test_attach_volume_shelved_or_offload_server also appeared a few times in neutron-tempest-multinode-full
16:49:18 <slaweq> like:
16:49:25 <slaweq> * http://logs.openstack.org/17/584217/5/check/neutron-tempest-multinode-full/6da0342/logs/testr_results.html.gz
16:49:26 <slaweq> * http://logs.openstack.org/99/587099/1/check/neutron-tempest-multinode-full/3c85bf1/logs/testr_results.html.gz
16:49:33 <slaweq> so it's not only dvr related
16:49:45 <slaweq> but it fails more often in the dvr job
16:50:17 <slaweq> and that is basically everything which I have prepared for today
16:50:23 <slaweq> anything else to add?
16:50:32 <njohnston> I have a quick question related to the python3 community goal... we have the tempest-full-py3 job, but what other tempest jobs do we want to have py3 jobs?  Anything voting, or are there some that are not as useful?  Thinking of neutron-tempest-[dvr,linuxbridge,plugin-api,iptables_hybrid,*scenario*].  I assume the answer is 'everything' but wanted to check.
16:50:32 <mlavalle> but what these failures have in common is that it is always a multinode job
16:51:16 <slaweq> mlavalle: right, I didn't see it in a singlenode job
16:51:54 <manjeets> I have the same question as njohnston asked
16:51:59 <slaweq> njohnston: hmm, I don't know if running all jobs with both py2 and py3 wouldn't be too much
16:52:24 <manjeets> so the goal is to run all tempest jobs with py3 as well?
16:52:48 <slaweq> I think that goal is to replace all py27 jobs with py3 :)
16:52:48 <mlavalle> njohnston: yeah, maybe we should check with the TC guys
16:53:19 <njohnston> so here is the goal text: https://governance.openstack.org/tc/goals/pike/python35.html
16:53:21 <slaweq> as a first step maybe adding such jobs to the experimental queue and checking them "on-demand" would be enough
16:53:39 <njohnston> For projects with integration tests in any form:
16:53:39 <njohnston> -    All of the integration tests must pass when a service is running under python 3.5.
16:53:39 <njohnston> -    Voting check and gate jobs are present to run all of the project’s integration tests under python 3.5 to avoid regressions.
16:54:24 <njohnston> The "all of the integration tests" wording seems pretty clear, but I wanted to make sure it passes the common-sense test, so that we don't abuse the testing infrastructure
16:54:30 <mlavalle> njohnston: yeah, it seems you are right. I'll read the text slowly and maybe ask some questions to the TC guys. I'll get back to you
16:54:33 <manjeets> all integration tests includes all tempest jobs, I guess?
16:54:48 <njohnston> manjeets: That is my reading of it, yes
16:55:06 <mlavalle> I'll get confirmation from the TC
16:55:11 <njohnston> Thank you mlavalle!
16:55:13 <slaweq> thx mlavalle
16:55:54 <slaweq> #action mlavalle will check with tc what jobs should be running with py3
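(For reference, a minimal sketch of how a py3 variant of an existing tempest job could be added in neutron's .zuul.yaml, assuming the devstack-based zuul v3 job layout; the job name here is hypothetical, while USE_PYTHON3 is the standard devstack knob:

    - job:
        name: neutron-tempest-linuxbridge-py3
        parent: neutron-tempest-linuxbridge
        vars:
          devstack_localrc:
            USE_PYTHON3: true

Such variants could start in the experimental queue, as suggested above, before being made voting.)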
16:56:02 <manjeets> ++
16:57:05 <slaweq> ok, so I think we are done now
16:57:09 <njohnston> thanks all
16:57:10 <slaweq> thx for attending
16:57:13 <mlavalle> o/
16:57:23 <slaweq> have a nice week and see You
16:57:26 <slaweq> #endmeeting