16:00:42 <ihrachys> #startmeeting neutron_ci
16:00:43 <openstack> Meeting started Tue Apr  4 16:00:42 2017 UTC and is due to finish in 60 minutes.  The chair is ihrachys. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:44 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:46 <openstack> The meeting name has been set to 'neutron_ci'
16:00:51 <haleyb> hi
16:01:06 <ihrachys> hi everyone, thanks for joining
16:01:19 <ihrachys> we will start with our tradition - reviewing action items from prev meeting
16:01:26 <ihrachys> #topic Action items from prev meeting
16:01:33 <ihrachys> huh, "ihrachys fix e-r bot not reporting in irc channel"
16:01:44 * ihrachys turns red
16:01:52 <ihrachys> no, it's not tackled
16:02:04 <ihrachys> I wonder if history shows I can't tackle it in due time :)
16:02:55 <ihrachys> I will repeat it for the next time, but you know...
16:02:56 <ihrachys> #action ihrachys fix e-r bot not reporting in irc channel
16:03:11 <ihrachys> if someone wants to help tracking it, you are welcome
16:03:15 <ihrachys> next was "mlavalle to fix the grafana board to include gate-tempest-dsvm-neutron-dvr-multinode-full-ubuntu-xenial-nv"
16:03:35 <mlavalle> I submitted this patchset: https://review.openstack.org/#/c/452294/
16:03:35 <ihrachys> seems like it merged: https://review.openstack.org/#/c/452294/
16:03:38 <ihrachys> mlavalle, good job
16:03:43 <mlavalle> and it got merged last night
16:03:54 <mlavalle> :-)
16:04:04 <mlavalle> Thanks for the reviews :-)
16:04:06 <ihrachys> now, let's have a look how the dashboard looks like now
16:04:26 <ihrachys> should be here: http://grafana.openstack.org/dashboard/db/neutron-failure-rate?panelId=8&fullscreen
16:04:26 <reedip> o\ /o
16:05:41 <ihrachys> 35% failure rate it seems
16:06:05 <mlavalle> yeap about that
16:06:28 <ihrachys> that job is non-voting
16:07:36 <ihrachys> mlavalle, haleyb: is there any plan in l3 team to make it the gate job that could replace non-dvr/non-multinode flavours?
16:07:53 <mlavalle> not that I'm aware of
16:08:01 <ihrachys> I remember there was a plan to make ha+dvr the gate setup, but at this point it seems dim
16:08:05 <mlavalle> but haleyb may have a plan
16:08:18 <mlavalle> if he is not on-line....
16:08:28 <haleyb> ihrachys: yes, the plan was to get the ha+dvr change merged
16:08:32 <mlavalle> I will bring this up during the next L3 team meeting on Thursday
16:08:47 <ihrachys> haleyb, I think devstack-gate piece landed, no?
16:09:05 <ihrachys> this one: https://review.openstack.org/#/c/383827/
16:09:32 <ihrachys> there is a test patch from anil here: https://review.openstack.org/#/c/383833/ but I don't know what's the state there
16:09:40 <ihrachys> haleyb, are you in touch with anil?
16:10:14 <haleyb> ihrachys: no, but we will add this to the list of items at the L3 meeting
16:10:20 <ihrachys> ok cool, thanks
16:10:37 <ihrachys> ok next one was on me
16:10:39 <ihrachys> "ihrachys to report bugs for fullstack race in ovs agent when calling to enable_connection_uri"
16:10:44 <ihrachys> there are mixed news here
16:10:55 <ihrachys> I haven't reported bugs just yet but...
16:11:33 <ihrachys> I was working on some OSP Ocata test failures, and while reading through logs, I spotted that we see the same duplicate messages in logs when setting manager
16:11:53 <ihrachys> so it's not only fullstack issue
16:12:03 <ihrachys> the env I see it is an actual multinode deployment
16:12:16 <ihrachys> for reference, this is the error I talk about: http://logs.openstack.org/98/446598/1/check/gate-neutron-dsvm-fullstack-ubuntu-xenial/2e0f93e/logs/dsvm-fullstack-logs/TestOvsConnectivitySameNetworkOnOvsBridgeControllerStop.test_controller_timeout_does_not_break_connectivity_sigkill_GRE-and-l2pop,openflow-native_ovsdb-cli_/neutron-openvswitch-agent--2017-03-16--16-06-05-730632.txt.gz?level=TRACE
16:12:57 <ihrachys> apparently the code that sets managers for the native ovsdb driver is racy when executed by two agents
16:13:18 <ihrachys> which can of course happen because we deploy multiple agents on a single node
16:13:34 <ihrachys> and each of them uses its own copy of ovsdb connection
16:13:59 <ihrachys> so, just a heads up; and it's still on me to report the bug
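The duplicate set-manager race described above can be sketched in miniature. This is a toy model, not neutron's actual ovsdb code — the `FakeOvsdb` class and the manager URI are illustrative; the point is that without an idempotent guard, two agents on one node can both try to install the same manager entry:

```python
import threading

class FakeOvsdb:
    """Toy stand-in for the OVSDB Manager table (assumption: the real
    code goes through the native ovsdb driver, not a class like this)."""
    def __init__(self):
        self._lock = threading.Lock()
        self.managers = set()

    def enable_connection_uri(self, uri):
        # Without the lock + membership check, two agents calling this
        # concurrently both see "no manager yet" and both insert one,
        # producing the duplicate-manager errors seen in the logs.
        with self._lock:
            if uri in self.managers:  # idempotent guard
                return False
            self.managers.add(uri)
            return True

db = FakeOvsdb()
results = []

def agent():
    # Each agent uses its own copy of the ovsdb connection but targets
    # the same database, so both race to set the same manager.
    results.append(db.enable_connection_uri("ptcp:6640:127.0.0.1"))

threads = [threading.Thread(target=agent) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# Exactly one agent "wins"; the other finds the manager already set.
```

With the guard serialized under a lock, the second caller is a no-op instead of an error — one plausible shape for a fix, under the assumptions above.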
16:14:10 <ihrachys> #action ihrachys to report bugs for fullstack race in ovs agent when calling to enable_connection_uri
16:14:37 <ihrachys> #action haleyb or mlavalle to report back on ha+dvr plan after l3 meeting
16:15:16 <ihrachys> there was also a long standing action item on jlibosva to prepare py3 transition plan for Pike
16:15:27 <ihrachys> I doubt it's ready though since jlibosva was offline for a while
16:15:38 <ihrachys> jlibosva, but that's your chance to surprise everyone
16:15:38 <jlibosva> indeed
16:15:44 <jlibosva> no surprise
16:15:47 <jlibosva> :(
16:15:57 <ihrachys> that's ok, I would be really surprised
16:16:08 <ihrachys> #action jlibosva to prepare py3 transition plan for Pike
16:16:20 <ihrachys> we can walk through it the next week
16:16:43 <ihrachys> #topic Patches in review
16:16:59 <ihrachys> manjeets's patch to add gate-failure bugs to neutron review board seems stuck: https://review.openstack.org/#/c/439114/
16:17:13 <ihrachys> clarkb, I know you +2d it. who could be the 2nd person to review it?
16:17:37 <manjeets> ihrachys, I posted it on infra channel but did not get attention
16:17:50 <manjeets> maybe we need to find out
16:19:10 <ihrachys> ok I guess we will need to chase them somehow
16:19:18 <ihrachys> #action ihrachys to chase infra to review https://review.openstack.org/#/c/439114/
16:19:19 <manjeets> jeremy stanley
16:19:39 <manjeets> don't know his irc handle
16:19:55 <ihrachys> I am also aware of this set of backports to fix scenario jobs in lbaas: https://review.openstack.org/#/q/I5d41652a85cfb91646bb48d38eedbe97741a97c2,n,z (mitaka seems broken but I probably won't have time till EOL to fix it)
16:20:00 <ihrachys> manjeets, I think it's fungi
16:20:40 <ihrachys> I also revised a bit how we disable dvr tests for dvrskip scenario jobs here: https://review.openstack.org/#/c/453212/
16:21:36 <fungi> yep, that's me
16:21:43 <ihrachys> dasanind has a patch fixing sporadic tempest failure because of missing project_id on first API call: https://review.openstack.org/#/c/447781/ Now that it has a functional test, it should probably be ready, though I am still to look at the test.
16:21:50 <clarkb> ihrachys: probably fungi or pabelanger
16:21:55 <ihrachys> fungi, wonder if you could push https://review.openstack.org/#/c/439114/
16:22:42 <fungi> looking into it now
16:22:42 <dasanind> ihrachys: I am getting a tempest test failure for https://review.openstack.org/#/c/447781/
16:22:50 <dasanind> http://logs.openstack.org/81/447781/6/check/gate-tempest-dsvm-neutron-linuxbridge-ubuntu-xenial/c1962e8/logs/testr_results.html.gz
16:23:55 <ihrachys> dasanind, is it stable? or just a single failure? because the test is for Cinder API, and doesn't look neutron-related from what I can see from 30 secs of log inspection
16:24:03 <fungi> manjeets: ihrachys: clarkb: i've approved 439114 now
16:24:12 <ihrachys> fungi, thanks a lot!
16:24:12 <manjeets> thanks fungi
16:24:19 <dasanind> ihrachys: it's just a single failure
16:24:19 <fungi> any time
16:24:52 <ihrachys> dasanind, ok, then it's probably something else. you are of course advised to report a bug if there is no bug that tracks the failure for cinder just yet.
16:25:27 <dasanind> ihrachys: will do
16:26:02 <ihrachys> another failure that we track is fullstack being broken by kevinbenton's change in provisioning blocks where we now require some ml2 driver to deliver the dhcp block for a dhcp enabled port to transition to ACTIVE
16:26:08 <ihrachys> more info here: http://lists.openstack.org/pipermail/openstack-dev/2017-March/114796.html
16:26:20 <ihrachys> jlibosva has a patch for fullstack here: https://review.openstack.org/451704
16:26:44 <ihrachys> it may need some additional investigation on why we seem to need to set agent_down_time=10 to make it pass
16:27:12 <jlibosva> I think it might not be needed but I don't understand why the tests fail on my env if I don't set the agent down time
16:27:24 <jlibosva> I'll wait for CI results and will try to remove that
16:27:35 <ihrachys> jlibosva, wonder what happens if we post a patch on top cleaning those up
16:28:16 <jlibosva> ihrachys: I went through patches that added dhcp tests and the agent_down_time is just to lower waiting when agent goes down in a failover test
16:28:27 <jlibosva> for HA
16:29:31 <ihrachys> jlibosva, but we don't failover in those tests that you touched do we?
16:31:46 <ihrachys> ok we will probably track it offline
16:31:58 <ihrachys> speaking of other patches
16:32:10 <jlibosva> ihrachys: nope
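The agent_down_time discussion above boils down to how fast the server declares an agent dead. A toy model (illustrative, not neutron's real implementation; the default of 75 seconds matches neutron's `agent_down_time` config option):

```python
# Toy model of the agent-liveness check: an agent is reported down once
# its last heartbeat is older than agent_down_time seconds.
def agent_is_down(seconds_since_heartbeat, agent_down_time=75):
    return seconds_since_heartbeat > agent_down_time

# With the default 75s, an HA failover test waits over a minute before
# the server notices a killed agent; lowering agent_down_time to 10
# makes the same test detect the failure within seconds.
print(agent_is_down(30))                      # default window: still alive
print(agent_is_down(30, agent_down_time=10))  # lowered window: reported down
```

This is consistent with jlibosva's point: the lowered value only speeds up failure detection in failover/HA tests, so tests that never kill an agent shouldn't need it.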
16:32:48 <ihrachys> my attention was brought to https://review.openstack.org/#/c/421155/ that fixes dvr tests for multinode setups (I expect it to affect our new ha+dvr job)
16:33:04 <ihrachys> I marked the bug as gate-failure for that matter
16:33:08 <ihrachys> to ease tracking it
16:33:33 <ihrachys> there is some back and forth there in comments about where to fix it first - tempest or neutron (seems like the test is duplicated)
16:33:47 <ihrachys> anyhoo, I am glad to see it got attention from some :)
16:35:32 <ihrachys> there seems to be a proposal to add a job using ryu master against neutron: https://review.openstack.org/#/c/445262/
16:36:00 <ihrachys> not sure why it's added in gate and not e.g. periodic
16:37:56 <ihrachys> ok I left a comment there
16:38:12 <ihrachys> there is also that long standing patch from jlibosva that documents how rechecks should be approached in gate: https://review.openstack.org/#/c/426829/
16:38:23 <ihrachys> mlavalle, I wonder if your WIP is still needed there
16:38:43 <mlavalle> I can remove it
16:38:51 <mlavalle> I don't think it is useful anymore
16:39:14 <ihrachys> yeah the patch seems to take a lot of time to get in
16:39:31 <mlavalle> I'll just merge it now
16:39:36 <ihrachys> I wonder if it's ok for me to just push it, or I better seek +W from e.g. kevinbenton
16:39:52 <ihrachys> since it's policy thing
16:40:07 <mlavalle> yeah, I think it would be good to get Kevinbenton's blessing
16:40:34 <mlavalle> I'll just remove the -1
16:40:44 <mlavalle> Done
16:41:56 <ihrachys> thanks
16:42:39 <ihrachys> I am not aware of any other patches. have I missed anything?
16:43:35 <ihrachys> otherwiseguy, how close are we to pull the trigger on ovsdbapp switch? https://review.openstack.org/#/c/438087/
16:46:51 <ihrachys> ok I guess otherwiseguy is offline
16:47:08 <otherwiseguy> oh hi
16:47:23 <ihrachys> o/
16:47:31 <ihrachys> how's ovsdbapp doing?
16:49:41 <ihrachys> ok otherwiseguy says he has some connectivity issues
16:49:58 <ihrachys> #topic Bugs
16:50:33 <ihrachys> there seems to be nothing actionable in the list that we haven't discussed already
16:50:36 <ihrachys> #link https://bugs.launchpad.net/neutron/+bugs?field.tag=gate-failure
16:51:04 <ihrachys> I am still not clear why we track e.g. vpnaas bugs when it's not even a stadium subproject under the neutron component
16:51:44 <ihrachys> let's discuss something else
16:51:50 <ihrachys> #topic Open discussion
16:52:08 <ihrachys> infra seems to switch the whole gate that uses ubuntu xenial to UCA: http://lists.openstack.org/pipermail/openstack-dev/2017-April/114912.html
16:52:28 <ihrachys> that's ubuntu cloud archive, a repo that contains new versions of libvirt, openvswitch and such
16:52:28 <clarkb> well I am posing the question :)
16:52:42 <clarkb> I think newer ovs (2.6.1 compared to 2.5.0) helps neutron?
16:53:12 <ihrachys> clarkb, I think fullstack may make use of 2.6.1 so that we can stop compilation for kernel modules
16:53:17 <jlibosva> clarkb: would the image get also the newer kernel?
16:53:28 <clarkb> jlibosva: no UCA doesn't have newer kernels in it
16:53:30 <ihrachys> right, question is, are images built with UCA on?
16:53:45 <ihrachys> or it's enabled after the fact
16:53:54 <jlibosva> ah, would be useful for fullstack
16:53:54 <clarkb> and they wouldn't be built with UCA on (most likely not at least, that specific detail isn't completely 100% settled)
16:54:01 <jlibosva> but it may help the disabled functional tests
16:54:04 <clarkb> ihrachys: even if it was enabled during image builds we wouldn't get newer kernels
16:54:16 <ihrachys> clarkb, is it possible to get an image with newer kernel too? otherwise we will still compile it seems.
16:54:37 <ihrachys> for functional, the only benefit is we will be able to reenable two tests
16:55:03 <clarkb> ihrachys: we could possibly do hardware enablement but unlike UCA I think ubuntu/canonical says not to use hardware enablement on servers
16:55:30 <jlibosva> we could also precompile the kernel module and fetch it from reliable storage instead of compiling the same all the time
16:55:36 <ihrachys> clarkb, sorry, what's hardware enablement?
16:56:01 <clarkb> ihrachys: it's a separate thing that ubuntu does, where they publish newer kernels for LTS so that your new shiny laptop with silly new peripheral design will work
16:56:06 <ihrachys> jlibosva, the module should match kernel; if kernel is updated by ubuntu, we are screwed
16:56:12 <ihrachys> jlibosva, it may not load
16:56:23 <jlibosva> ihrachys: but we won't get newer kernel
16:56:40 <ihrachys> jlibosva, not new enough; but they can still update it for CVE or whatnot
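The mismatch ihrachys is worried about can be sketched as a version check. This is a hypothetical helper, not real tooling: a kernel module's "vermagic" string records the kernel release it was built against, and the module loader refuses a module whose vermagic doesn't match the running kernel, which is why a precompiled module breaks after any distro kernel update:

```python
# Hypothetical helper: compare a module's vermagic against the running
# kernel release (what `modinfo -F vermagic` vs `uname -r` would show).
def module_matches_kernel(vermagic, running_kernel):
    # vermagic looks like "4.4.0-66-generic SMP mod_unload modversions";
    # the first token is the kernel release string.
    return vermagic.split()[0] == running_kernel

# A module precompiled before a CVE kernel update no longer matches:
print(module_matches_kernel("4.4.0-66-generic SMP mod_unload", "4.4.0-66-generic"))  # True
print(module_matches_kernel("4.4.0-66-generic SMP mod_unload", "4.4.0-70-generic"))  # False
```

So jlibosva's "precompile once, fetch from storage" idea would need one prebuilt module per kernel release the CI images might be running, not a single artifact.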
16:56:41 <clarkb> I'm sort of confused why a new kernel is necessary
16:56:54 <jlibosva> there is a bug in kernel datapath for local vxlan traffic
16:57:06 <ihrachys> clarkb, openvswitch kernel pieces contain a fix that is needed for some fullstack tunneling feature
16:57:19 <clarkb> i see, has that been filed against ubuntu?
16:57:23 <ihrachys> jlibosva should really document that somewhere
16:57:35 <jlibosva> there was a bug, let me search
16:57:42 <ihrachys> #action jlibosva document current openvswitch requirements for fullstack/functional in TESTING.rst
16:58:03 <clarkb> I think if we want to talk about newer kernels that avenue for that would be the hardware enablement kernels and that would be separate from any use of UCA
16:58:08 <ihrachys> clarkb, one consideration when switching should also be revising https://review.openstack.org/#/c/402940/4/reference/project-testing-interface.rst not to give wrong message to consumers
16:58:22 <ihrachys> the way the document is worded now suggests that you can safely deploy from LTS bits
16:58:37 <ihrachys> clarkb, ack
16:58:41 <jlibosva> clarkb: this one https://bugs.launchpad.net/kernel/+bug/1627095
16:58:41 <openstack> Launchpad bug 1627095 in linux "Request to backport fix for local VxLAN" [Undecided,New]
16:58:41 <clarkb> ihrachys: well that's what openstack has stated it will support
16:58:47 <clarkb> ihrachys: so if it's not the case we should work to fix that
16:59:30 <ihrachys> clarkb, question is, should we work on it retroactively once bugs are revealed, or maintain a job that proves it still works, even if not too stable?
17:00:04 <clarkb> ihrachys: I think working with the distros to keep a functioning useable "openstack" is likely ideal. I don't know how practical that is in reality though
17:00:23 <clarkb> ihrachys: our users are deploying on these distros, if they don't work then our users will be sad (like me!))
17:00:29 <jlibosva> we're out of time
17:00:39 <ihrachys> right. anyhow would make sense to update docs based on UCA decision.
17:00:49 <ihrachys> ok time indeed
17:00:51 <ihrachys> thanks folks
17:00:53 <ihrachys> #endmeeting