16:00:29 <slaweq> #startmeeting neutron_ci
16:00:30 <openstack> Meeting started Tue Oct 15 16:00:29 2019 UTC and is due to finish in 60 minutes.  The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:31 <slaweq> hi
16:00:32 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:33 <ralonsoh> hi
16:00:34 <openstack> The meeting name has been set to 'neutron_ci'
16:00:55 * slaweq is on last meeting today \o/
16:00:56 <njohnston> o/
16:02:15 <slaweq> ok, I think we can start as bcafarel will not be available today
16:02:23 <slaweq> Grafana dashboard: http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:02:27 <slaweq> please open it now
16:02:36 <slaweq> #topic Actions from previous meetings
16:02:44 <slaweq> njohnston Update the neutron-tempest-plugin dashboard in grafana
16:03:20 <njohnston> so the change has been submitted
16:03:37 <njohnston> #link https://review.opendev.org/687686
16:04:09 <njohnston> waiting for another +2
16:04:24 <ralonsoh> (you have my +1)
16:04:37 <slaweq> thx njohnston
16:04:38 <njohnston> It is dependent on this change by slaweq https://review.opendev.org/#/c/685214/
16:05:00 <slaweq> ahh, yes, I wanted to ask for a review of it today
16:05:03 <ralonsoh> ^ this one, IMO, is ready to be merged
16:05:04 <slaweq> njohnston: can You?
16:05:06 <slaweq> :)
16:05:16 <njohnston> slaweq: your wish is my command. +W
16:05:35 <slaweq> LOL
16:05:37 <slaweq> thx
16:05:50 <slaweq> ok, so let's move on then
16:05:55 <slaweq> ralonsoh to check root cause of ssh issue in https://storage.gra1.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_9c2/664646/6/check/tempest-ipv6-only/9c2f68f/testr_results.html.gz
16:06:15 <ralonsoh> slaweq, sorry but I didn't find the error
16:06:46 <ralonsoh> I still don't know why the IP is not given correctly to the VM
16:06:54 <ralonsoh> and I don't see any DHCP message
16:07:13 <ralonsoh> it's like this VM has no connectivity to the DHCP server
16:07:28 <slaweq> but it's a single-node job, right?
16:07:33 <ralonsoh> yes
16:07:51 <slaweq> so both ports (vm and dhcp) should be plugged into br-int
16:07:56 <ralonsoh> yes
16:08:30 <ralonsoh> I can give it another try
16:08:33 <slaweq> there can be only two reasons IMO - either the port wasn't wired properly or the firewall rules were not configured on it properly
16:08:39 <slaweq> nono ralonsoh
16:08:41 <ralonsoh> but I spent one whole morning on this
16:08:51 <slaweq> if there is nothing in the logs then You can't do anything :)
16:08:58 <slaweq> don't waste Your time on it
16:09:50 <clarkb> well usually that is an indication we need to improve our logging
16:10:07 <slaweq> clarkb: yes, I just think about the same
16:10:09 <clarkb> but ya update logging to cover likely cases then move on and wait for it to hit again (can also update elastic-recheck to track it for you)
16:10:18 <ralonsoh> clarkb, and that's usually the next action
16:11:23 <slaweq> ralonsoh: but do You think we can add some additional debug logs to our code to know something more about it?
16:11:46 <ralonsoh> slaweq, I'll check it and if possible, I'll submit a patch
16:11:57 <slaweq> tbh we should be able to see if ports were configured by the ovs agent
16:12:22 <slaweq> and in fact IMO it was configured, otherwise neutron would not report to nova that the port is UP
16:12:32 <slaweq> and nova would not unpause the vm, right?
16:12:47 <ralonsoh> slaweq, yes and we see it in the ovs logs
16:13:08 <slaweq> so IMO more likely this was some issue with openflow rules in br-int
16:13:36 <ralonsoh> slaweq, this is something we maybe should log better in debug/testing mode
16:14:01 <slaweq> ralonsoh: is there any way to maybe log all OF rules from bridges (br-int) every e.g. 1 second?
16:14:22 <ralonsoh> slaweq, uffff be careful
16:14:23 <slaweq> something similar to e.g. dstat: https://storage.gra1.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_9c2/664646/6/check/tempest-ipv6-only/9c2f68f/controller/logs/screen-dstat.txt.gz
16:14:35 <slaweq> ralonsoh: LOL, I know but that would help us a lot
16:14:35 <ralonsoh> this could be too much for the logs
16:14:45 <ralonsoh> but yes, we can at least track the changes
16:15:03 <slaweq> ralonsoh: right, timestamp and what flow rule was added/removed
16:15:26 <ralonsoh> exactly but I need to check if ovs-ofctl allows this kind of monitoring
16:15:29 <slaweq> do You think this would be possible?
16:15:38 <ralonsoh> I need to check it first
16:15:46 <ralonsoh> like ovs-vsctl monitor
16:15:56 <slaweq> ralonsoh: exactly
16:16:00 <slaweq> will You do this?
16:16:04 <ralonsoh> (sure)
16:16:08 <slaweq> thx a lot
16:16:39 <slaweq> #action ralonsoh to check if there is any possibility to do something like ovsdb-monitor for openflows
16:17:05 <slaweq> thx ralonsoh for taking care of it
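A minimal sketch of the kind of flow-change logging discussed above, assuming a standalone helper run on the test node next to dstat (the script and all names in it are illustrative, not existing Neutron tooling); it polls "ovs-ofctl dump-flows" once per interval and prints timestamped added/removed rules:

    import re
    import subprocess
    import time

    # Per-flow counters change on every poll and would make the diff noisy,
    # so strip them before comparing.
    _VOLATILE = re.compile(
        r'(duration|n_packets|n_bytes|idle_age|hard_age)=[^,\s]*,?\s*')


    def dump_flows(bridge):
        """Return the bridge's OpenFlow rules with volatile fields stripped."""
        out = subprocess.check_output(
            ['ovs-ofctl', 'dump-flows', bridge], universal_newlines=True)
        # The first line is the "NXST_FLOW reply" header; the rest are flows.
        return {_VOLATILE.sub('', line.strip()) for line in out.splitlines()[1:]}


    def monitor(bridge, interval=1.0):
        """Print added/removed flows once per interval, dstat-style."""
        previous = dump_flows(bridge)
        while True:
            time.sleep(interval)
            current = dump_flows(bridge)
            timestamp = time.strftime('%H:%M:%S')
            for flow in sorted(current - previous):
                print('%s ADDED   %s' % (timestamp, flow))
            for flow in sorted(previous - current):
                print('%s REMOVED %s' % (timestamp, flow))
            previous = current


    if __name__ == '__main__':
        monitor('br-int')

If ovs-ofctl itself offers a usable monitor mode, which is exactly what the action item above is meant to check, that would likely be preferable to polling like this.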
16:17:09 <slaweq> let's move on
16:17:18 <slaweq> next was
16:17:19 <slaweq> slaweq to prepare etherpad to track dropping py27 jobs from ci
16:17:26 <slaweq> Etherpad is there https://etherpad.openstack.org/p/neutron_drop_python_2_status but it's not ready yet
16:17:39 <slaweq> I will prepare a list of jobs to remove in each repo
16:18:10 <slaweq> I know we already started it for some repos like networking-ovn and neutron
16:18:24 <slaweq> this should be much faster than the switch to python 3 :)
16:19:08 <njohnston> just one note on this
16:19:15 <slaweq> njohnston: sure
16:19:27 <njohnston> there are some projects out there that are being more aggressive in deprecating python 2
16:19:39 <njohnston> for example, by starting the process of removing six
16:20:13 <njohnston> that level of effort should be discussed and planned out especially for projects that get imported by other projects like neutron/neutron-lib
16:20:38 <njohnston> so while it may seem tempting I just wanted to note for posterity's sake that such moves should be held off, at this time
16:20:39 <slaweq> njohnston: personally I don't think we should do it now
16:20:57 <slaweq> 1. because of backports to stable branches which still support py2
16:21:12 <njohnston> the particular project I saw with the six removal was a widely used oslo project, and so everything I said obviously goes double for oslo
16:21:52 <slaweq> is there any "common goal" to remove such pieces of code during ussuri also?
16:22:02 <bnemec> There's discussion underway: https://etherpad.openstack.org/p/drop-python2-support
16:22:07 <bnemec> Oh, you already saw that.
16:22:09 <slaweq> or does dropping py2 support just mean that we will not test it in u/s ci?
16:22:41 <bnemec> No wait, the etherpad above is neutron-specific. So that tc one is still relevant here.
16:22:58 <slaweq> bnemec: yes, thx for link
16:23:05 <slaweq> I will read it later
16:23:19 <njohnston> thanks bnemec, that link was my next thing to say :-)
16:24:03 <njohnston> so to sum up: the stadium is complicated, deprecating py2 is complicated, so let's do our usual neutron thing and overcommunicate about it all :-)
16:24:16 <bnemec> lol
16:25:16 <slaweq> njohnston: what do You mean by "overcommunicate"? :)
16:28:46 <slaweq> hmm, I think we lost njohnston now
16:28:56 <slaweq> maybe we can continue with other topics then
16:28:58 <slaweq> #topic Stadium projects
16:29:07 <njohnston> I am here.  Sorry, I thought you were saying that tongue-in-cheek
16:29:41 <slaweq> ahh, no
16:29:53 <slaweq> sorry, I really wanted to know what You meant exactly
16:30:07 <njohnston> I just meant that we'll talk about it in all our meetings and coordinate about it verbosely
16:30:16 <slaweq> isn't preparing some etherpad and syncing weekly about it enough?
16:30:20 <slaweq> ahh, ok
16:30:33 <slaweq> thx, so that's something which I want to do with this :)
16:31:33 * njohnston is done
16:31:36 <slaweq> regarding stadium projects, I think we can remove the python3 migration from this agenda as we did for the neutron meeting
16:31:41 <njohnston> agreed
16:31:54 <slaweq> so we still have tempest-plugins migration
16:32:01 <slaweq> and there is no news about it
16:32:26 <slaweq> anything else You want to discuss regarding stadium projects' ci?
16:33:06 <slaweq> ahh, I have one more small thing
16:33:10 <slaweq> please review https://review.opendev.org/#/c/685213/ - it's needed for Train and for stadium projects
16:34:07 <slaweq> if there is nothing else related to stadium, let's move on to the next topic
16:34:09 <slaweq> #topic Grafana
16:34:19 <slaweq> #link http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:35:24 <slaweq> it's a bit strange that we have so little data for the gate queue
16:35:54 <slaweq> did we really merge only a few patches in neutron last week?
16:36:01 <ralonsoh> not many merged last week
16:36:05 <ralonsoh> merges
16:36:50 <slaweq> hmm, it could be like that really
16:37:48 <slaweq> in the check queue our biggest problems IMO are still the functional and fullstack jobs
16:38:21 <njohnston> yep
16:38:30 <slaweq> I have a couple of examples of new failures there
16:39:05 <slaweq> on the good side, I think our (voting) scenario jobs have been working quite well recently
16:39:43 <njohnston> agreed
16:40:28 <slaweq> ok, so let's talk about those functional/fullstack failures
16:40:38 <slaweq> #topic fullstack/functional
16:40:45 <slaweq> first fullstack
16:41:05 <slaweq> ralonsoh: I saw 2 new failures of our old friend: neutron.tests.fullstack.test_dhcp_agent.TestDhcpAgentHA.test_reschedule_network_on_new_agent
16:41:11 <slaweq> https://2f600b8d6843c7d64afe-bbbb707b755a08f42bfac9929d4d55b4.ssl.cf2.rackcdn.com/688439/3/check/neutron-fullstack/119e702/testr_results.html.gz
16:41:18 <slaweq> and
16:41:20 <slaweq> https://e5965e413edbbc117465-bfd96e490b07511790a1eb1aa4beb29d.ssl.cf2.rackcdn.com/665467/37/check/neutron-fullstack/4437eb3/testr_results.html.gz
16:41:40 <slaweq> IIRC You added some extra logs some time ago to debug this when it happens again, right?
16:42:10 <ralonsoh> slaweq, yes. I'll add this to my pile for tomorrow
16:42:23 <slaweq> ralonsoh: thx, I can help with this if You want
16:43:34 <slaweq> #action ralonsoh to investigate failed fullstack tests for dhcp agent rescheduling
16:43:51 <slaweq> ok, next one
16:43:52 <slaweq> neutron.tests.fullstack.test_qos.TestDscpMarkingQoSOvs
16:43:57 <slaweq> https://openstack.fortnebula.com:13808/v1/AUTH_e8fd161dc34c421a979a9e6421f823e9/zuul_opendev_logs_7c2/684457/1/check/neutron-fullstack/7c2c5d8/testr_results.html.gz
16:44:09 <slaweq> It seems like there was only one icmp packet sent and the test failed: https://openstack.fortnebula.com:13808/v1/AUTH_e8fd161dc34c421a979a9e6421f823e9/zuul_opendev_logs_7c2/684457/1/check/neutron-fullstack/7c2c5d8/controller/logs/dsvm-fullstack-logs/TestDscpMarkingQoSOvs.test_dscp_marking_packets.txt.gz
16:44:21 <slaweq> so as it is "my" test I will debug this
16:44:24 <slaweq> ok for You?
16:44:29 <ralonsoh> ok
16:44:32 <njohnston> +1
16:44:35 <slaweq> :)
16:44:48 <slaweq> #action slaweq to investigate failed neutron.tests.fullstack.test_qos.TestDscpMarkingQoSOvs
16:45:01 <slaweq> now functional tests
16:45:04 <slaweq> neutron.tests.functional.agent.l2.extensions.test_ovs_agent_qos_extension.TestOVSAgentQosExtension.test_port_creation_with_dscp_marking(egress)
16:45:09 <slaweq> https://storage.bhs1.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_692/688439/3/check/neutron-functional/69288a9/testr_results.html.gz
16:45:21 <slaweq> this one is interesting IMO
16:45:26 <slaweq> in test logs: https://storage.bhs1.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_692/688439/3/check/neutron-functional/69288a9/controller/logs/dsvm-functional-logs/neutron.tests.functional.agent.l2.extensions.test_ovs_agent_qos_extension.TestOVSAgentQosExtension.test_port_creation_with_dscp_marking_egress_.txt.gz
16:45:38 <slaweq> there is a lot of "EVENT OVSNeutronAgentOSKenApp->ofctl_service GetDatapathRequest send_event" log lines
16:45:59 <slaweq> at first glance it looks to me like something was looping there infinitely
16:46:09 <njohnston> that's quite unusual
16:46:38 <slaweq> njohnston: yep
16:47:25 <slaweq> anyone want to investigate that?
16:47:34 <ralonsoh> sorry not now
16:47:34 <slaweq> if not, I can assign it to myself
16:47:41 <ralonsoh> I have enough backlog
16:47:45 <slaweq> ralonsoh: sure, I know You are overloaded :)
16:47:57 <slaweq> ok, I will try to check that
16:48:26 <slaweq> #action slaweq to check strange "EVENT OVSNeutronAgentOSKenApp->ofctl_service GetDatapathRequest send_event" log lines in neutron.tests.functional.agent.l2.extensions.test_ovs_agent_qos_extension.TestOVSAgentQosExtension.test_port_creation_with_dscp_marking
16:48:42 <slaweq> ok, next one
16:48:51 <slaweq> this one might be interesting for njohnston :)
16:48:59 <slaweq> failed db migration tests again
16:49:01 <slaweq> https://storage.bhs1.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_692/688439/3/check/neutron-functional/69288a9/controller/logs/dsvm-functional-logs/neutron.tests.functional.agent.l2.extensions.test_ovs_agent_qos_extension.TestOVSAgentQosExtension.test_port_creation_with_dscp_marking_egress_.txt.gz
16:49:10 <slaweq> but this time it's not due to a timeout or a slow node
16:49:32 <slaweq> sorry, wrong link
16:50:01 <slaweq> argh, I don't have correct link for this one now
16:50:03 <slaweq> sorry
16:50:17 <njohnston> slaweq: np
16:50:21 <slaweq> if I find it somewhere, I will ping You
16:50:28 <njohnston> slaweq: sure thing
16:50:46 <njohnston> slaweq: also as a lead on the ofctl error from the last bug, check out https://bugzilla.redhat.com/show_bug.cgi?id=1382372
16:50:46 <openstack> bugzilla.redhat.com bug 1382372 in openstack-neutron "Selinux is blocking ovs-vswitchd during functional tests" [Urgent,Closed: currentrelease] - Assigned to twilson
16:51:02 <njohnston> it's been seen before, as an selinux issue
16:51:44 <slaweq> njohnston: but u/s ci is running on ubuntu and I don't think selinux is enabled there
16:52:54 <njohnston> darn bugzilla, bamboozled again
16:53:17 <slaweq> lol
16:53:58 <slaweq> ok, let's move on to the last one:
16:54:00 <slaweq> neutron.tests.functional.agent.test_firewall.FirewallTestCase
16:54:02 <slaweq> https://08b11cea7395c153ac8e-9514c1b1570a8e9931b2a7d3207ef22f.ssl.cf2.rackcdn.com/684457/1/check/neutron-functional/06e9d0c/testr_results.html.gz
16:55:07 <ralonsoh> the flows were not applied?? maybe
16:55:23 <ralonsoh> (we need a flow monitor)
16:55:36 <slaweq> ralonsoh: yes
16:55:48 <slaweq> otherwise it may be hard to say what happened there
16:56:13 <slaweq> but in this case it's not only "br-int" that we'd need to monitor
16:56:20 <ralonsoh> I know
16:56:21 <slaweq> each test has got its own bridge
16:56:36 <ralonsoh> every test case should be able to deploy its own monitor
16:56:42 <ralonsoh> in any bridge
16:56:53 <ralonsoh> I'll put extra effort on this
16:57:02 <slaweq> maybe we can do some simple decorator to run tests and in case of failure dump bridges from bridge used in the test somehow?
16:57:19 <ralonsoh> this could be useful too
16:57:21 <njohnston> that would be very useful
16:57:23 <slaweq> *dump flows
16:57:40 <slaweq> anyone want to do it maybe?
16:57:47 <slaweq> if not, I can
16:57:49 <slaweq> :)
16:57:56 <ralonsoh> I can check it
16:57:57 <slaweq> at least add it to my backlog
16:58:01 <slaweq> ok, thx ralonsoh
16:58:04 <ralonsoh> along with the flow monitor
16:58:27 <slaweq> #action ralonsoh to try to log flows at the end of a failed functional test
16:58:41 <slaweq> ralonsoh: ^^ it's just as a reminder, it's not urgent for sure :)
16:58:52 <njohnston> I'll also see if I can clear enough time to take a look
16:58:59 <slaweq> thx njohnston
16:59:03 <ralonsoh> thanks!
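A rough illustration of the decorator idea slaweq mentions above, purely as a sketch: the dump_flows_on_failure name and the self.bridge.br_name attribute are assumptions for the example, not existing Neutron test code. On a test failure it dumps the flows of the bridge under test before re-raising:

    import functools
    import logging
    import subprocess

    LOG = logging.getLogger(__name__)


    def dump_flows_on_failure(test_method):
        """Log the flows of the bridge under test when the test fails.

        Assumes the test instance exposes its bridge name as
        self.bridge.br_name; the real attribute may differ per test class.
        """
        @functools.wraps(test_method)
        def wrapper(self, *args, **kwargs):
            try:
                return test_method(self, *args, **kwargs)
            except Exception:
                try:
                    flows = subprocess.check_output(
                        ['ovs-ofctl', 'dump-flows', self.bridge.br_name],
                        universal_newlines=True)
                    LOG.error('Flows on %s at failure:\n%s',
                              self.bridge.br_name, flows)
                except Exception:
                    LOG.exception('Could not dump flows for debugging')
                raise
        return wrapper

Applied as @dump_flows_on_failure on individual test methods (or wired into a base test class), it would only add output when a test actually fails.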
16:59:20 <slaweq> ok, I think we have to finish now
16:59:25 <slaweq> one last thing
16:59:31 <slaweq> please add https://review.opendev.org/#/c/685705/ to Your review pile
16:59:42 <slaweq> thx for attending
16:59:51 <slaweq> and have a great day/evening :)
16:59:53 <slaweq> bye
16:59:57 <ralonsoh> bye!
16:59:57 <slaweq> #endmeeting