16:00:55 <ihrachys> #startmeeting neutron_ci
16:00:56 <openstack> Meeting started Tue Mar 14 16:00:55 2017 UTC and is due to finish in 60 minutes.  The chair is ihrachys. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:58 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:58 <manjeets> hi
16:01:00 <openstack> The meeting name has been set to 'neutron_ci'
16:01:07 <ihrachys> hi everyone
16:01:21 <jlibosva> hi
16:01:43 <dasanind> hi
16:02:10 * ihrachys waves back at manjeets, jlibosva, and dasanind
16:02:11 <ihrachys> #topic Action items from previous meeting
16:02:26 <ihrachys> first was: "ihrachys fix e-r bot not reporting in irc channel"
16:03:15 <ihrachys> I haven't gotten to that one just yet, need to talk to mtreinish I guess about why it's not reporting, I suspect the configuration is too limiting. lemme repeat the action for next week.
16:03:21 <ihrachys> #action ihrachys fix e-r bot not reporting in irc channel
16:03:32 <ihrachys> next was "ihrachys to clean up dsvm-scenario flavor handling from gate-hook"
16:03:35 <ihrachys> that happened
16:03:49 <ihrachys> we landed https://review.openstack.org/#/c/442758/
16:05:03 <ihrachys> beyond cleanup, there is some failure in the ovs scenario job that started showing up after the latest hook rework
16:05:08 <ihrachys> we will discuss later
16:05:22 <ihrachys> next was "ihrachys to walk thru list of open gate failure bugs and give them love"
16:05:30 <ihrachys> I did, closed some bugs that didn't seem relevant
16:05:39 <ihrachys> ok next was "ihrachys to chase down armax on d-g local.conf breakage assessment for stadium"
16:06:31 <ihrachys> armax posted some patches since then: https://review.openstack.org/442884 for client, https://review.openstack.org/442890 for fwaas
16:06:55 <ihrachys> both were backported into stable branches
16:07:32 <ihrachys> there is also ongoing work for sfc gate: https://review.openstack.org/#/c/445037/ and https://review.openstack.org/#/c/442882/
16:07:45 <ihrachys> I assume that's all there is
16:08:10 <ihrachys> ok next was "haleyb and mlavalle to investigate what makes the dvr gate job fail at a 25% rate"
16:08:14 <clarkb> ihrachys: e-r reports that dhcp lease failures are causing problems
16:08:34 <clarkb> not sure if that is on your radar or not
16:08:53 <ihrachys> clarkb: it wasn't. is there a bug?
16:09:08 <mlavalle> hi
16:09:17 <clarkb> ihrachys: yes its top of e-r list http://status.openstack.org/elastic-recheck/index.html
16:09:29 <ihrachys> mlavalle: hey. we were looking for an update on the dvr job failure rate that was at 25% last week
16:09:43 <clarkb> ihrachys: the bug is over a year old and "fixed" in nova net, I think we must be rematching errors from syslog? against neutron jobs
16:09:54 <clarkb> ihrachys: so the bug there may not be the most up to date
16:10:13 <mlavalle> ihrachys: I couldn't make progress on that last week :-(
16:10:26 <ihrachys> clarkb: yeah, seems like catching generic message
16:10:39 <ihrachys> mlavalle: ok then I leave the action on you
16:10:47 <ihrachys> #action haleyb and mlavalle to investigate what makes the dvr gate job fail at a 25% rate
16:10:53 <mlavalle> ihrachys: thanks. I was sick a few days last week
16:11:01 <ihrachys> clarkb: thanks for bringing it up, we will have a look
16:11:10 <ihrachys> mlavalle: np, get well
16:11:11 <clarkb> thanks
16:11:45 <ihrachys> #action ihrachys explore why bug 1532809 bubbled to the top of the e-r list
16:11:45 <openstack> bug 1532809 in OpenStack Compute (nova) liberty "Gate failures when DHCP lease cannot be acquired" [High,In progress] https://launchpad.net/bugs/1532809 - Assigned to Sean Dague (sdague)
16:11:59 <ihrachys> next was "ajo to chase down fullstack 100% failure rate due to test_dscp_qos_policy_rule_lifecycle failures"
16:12:06 <ihrachys> also jlibosva was going to help ajo
16:12:11 <ihrachys> jlibosva: any progress?
16:12:15 <jlibosva> ihrachys: yes
16:12:21 <manjeets> I may sound dumb asking this question, but can someone explain a bit what e-r is?
16:12:25 <jlibosva> I started looking at it like an hour ago :)
16:12:39 <jlibosva> It got broken by https://review.openstack.org/#/c/380329
16:12:48 <jlibosva> I suspect that delete_dscp doesn't work with native driver
16:12:59 <ihrachys> manjeets: elastic-recheck, it's a tool that captures failure patterns in logs and reports them in gerrit, on the elastic-recheck web UI, and in irc
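For a concrete picture of what e-r matches on: each signature is a small YAML file (conventionally named after the Launchpad bug it tracks) whose query field holds an Elasticsearch/Lucene expression run against the indexed gate logs. A hypothetical sketch, not the real signature for any bug mentioned here:

    # queries/1532809.yaml -- hypothetical contents
    query: >
      message:"failed to acquire DHCP lease" AND
      tags:"syslog"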
16:13:10 <jlibosva> I'm just trying a local run and I'm about to send a simple patch to verify the fullstack fix
16:13:22 <jlibosva> https://review.openstack.org/445560
16:13:22 <manjeets> ok, thanks ihrachys
16:13:49 <ihrachys> jlibosva: niiice
16:14:03 <jlibosva> I need to make sure that's really it, I didn't give it much love yet
16:14:09 <ihrachys> jlibosva: I wonder why we have different API for drivers
16:14:14 <jlibosva> yeah ...
16:14:30 <ihrachys> still this is a nice step to check if that fixes the issue
16:14:38 <ihrachys> if so, we can look at making them consistent
16:15:18 <jlibosva> there is also a constant failure in securitygroups for linuxbridge
16:15:24 <jlibosva> I didn't get a chance to look at that yet
16:15:59 <ihrachys> gotcha
16:16:24 <jlibosva> can be an AI for me till the next meeting ;)
16:17:12 <ihrachys> #action jlibosva fix delete_dscp for native driver: https://review.openstack.org/445560
16:17:27 <ihrachys> #action jlibosva to fix remaining fullstack failures in securitygroups for linuxbridge
16:18:12 <ihrachys> we are getting closer to a stable functional job (it shows a normal failure rate now), let's do the same for fullstack
16:18:17 <ihrachys> next AI was "ajo to restore and merge patch raising ovsdb native timeout: https://review.openstack.org/#/c/425623/"
16:18:28 <ihrachys> we restored it, but now I am not sure
16:18:47 <ihrachys> as seen on http://grafana.openstack.org/dashboard/db/neutron-failure-rate?panelId=7&fullscreen functional is back to normal
16:18:57 <ihrachys> and not the old normal, but actual normal
16:19:10 <ihrachys> its failure rate is less than that of the -api job, which is considered rather stable and is voting
16:19:52 <ihrachys> so I am not sure we should land the timeout bump
16:19:57 <ihrachys> thoughts?
16:20:16 <jlibosva> I would go for it
16:20:31 <jlibosva> as the worst that can happen is that some actions might take longer before failing
16:20:46 <jlibosva> if transaction is successful, it won't be longer
16:22:13 <ihrachys> but doesn't it open the door wider for performance regressions?
16:24:33 <jlibosva> you mean that if we make a change that creates a longer transaction, it would fail for us in the gate, while with a higher timeout it would pass?
16:24:58 <ihrachys> yeah. we will catch it a lot later when we pile up more regressions on top
16:25:09 <ihrachys> at the point when it may be harder to unravel things
16:26:07 <jlibosva> maybe we can close the bug and discuss this on the review with pros and cons
16:26:27 <ihrachys> yeah
16:26:40 <ihrachys> I also want to get back to the question of voting for the job in several weeks
16:26:45 <ihrachys> assuming we prove it's stable
16:27:41 <ihrachys> ok let's discuss further in gerrit
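For reference, the change under discussion is essentially a one-line config bump. A minimal sketch, assuming the relevant knob is the OVS agent's ovsdb_timeout option (section name and values illustrative only, not the actual patch):

    # openvswitch_agent.ini -- illustrative only
    [ovs]
    # seconds the native OVSDB interface waits for a transaction before giving up
    ovsdb_timeout = 30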
16:27:45 <ihrachys> next AI was "anilvenkata_afk to track inclusion of HA+DVR patch for devstack-gate"
16:27:55 <jlibosva> ihrachys: you mean putting functional to gate queue?
16:28:02 <ihrachys> jlibosva: yes
16:28:04 <jlibosva> or talking about fullstack
16:28:08 <jlibosva> ok
16:28:09 <ihrachys> we said we want it stable and voting?
16:28:22 <ihrachys> maybe just check, we'll see
16:28:23 <ihrachys> but voting
16:28:39 <ihrachys> ehm, sorry, I mixed things up :)
16:28:43 <ihrachys> yes, gate queue
16:28:50 <jlibosva> got it :)
16:28:51 <ihrachys> for fullstack, it will be voting in check first
16:29:18 <ihrachys> for that we need to show weeks of steady stability
16:29:24 <ihrachys> only time will tell if we can do that
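Flipping a job to voting in check is a small project-config change to the zuul layout; a hedged sketch (job name and current flag are assumptions, not copied from the real layout):

    # zuul/layout.yaml -- hypothetical excerpt
    jobs:
      - name: gate-neutron-dsvm-fullstack-ubuntu-xenial
        voting: false  # flip to true once weeks of steady stability are shown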
16:29:34 <ihrachys> ok back to HA+DVR job
16:29:43 <ihrachys> the patch is still sitting there: https://review.openstack.org/#/c/383827/
16:30:05 <ihrachys> we may want clarkb and other infra gurus to chime in
16:30:22 <ihrachys> I see clarkb +2d it in the past
16:30:44 <clarkb> I can rereview
16:31:22 <ihrachys> thanks!
16:31:49 <ihrachys> #action anilvenkata to follow up on HA+DVR job patches
16:32:18 <ihrachys> there was a test failure in the new ha-dvr job, probably because of the new topology used
16:32:34 <ihrachys> I assume Anil will look at it once we have infra side done
16:33:50 <ihrachys> though actually, since devstack-gate is part of the same queue, we don't need to wait, we can start fixing the test runs with depends-on patches
16:34:02 <ihrachys> gotta check with Anil on his plans
16:34:31 <ihrachys> and, that was it for action items from the previous week
16:34:36 <ihrachys> let's move on
16:34:44 <ihrachys> #topic Action items from PTG
16:34:53 <ihrachys> at the prev meeting, we covered most of them
16:35:12 <ihrachys> the only thing that was left behind is python3 strategy
16:35:32 <ihrachys> during the PTG, we decided a bunch of jobs should transition to py3
16:35:52 <ihrachys> I believe functional and fullstack were on the short list
16:36:03 <ihrachys> also we need some tempest job switched to py3
16:36:16 <ihrachys> jlibosva: what's the plan?
16:37:00 <jlibosva> yes, I believe functional, fullstack and full-tempest make the most sense. I don't think we have any blueprint or bug where we can track the effort
16:37:09 <jlibosva> so I'm gonna create some
16:37:21 <jlibosva> and same for tempest split
16:37:44 <ihrachys> jlibosva: which full-tempest?
16:37:49 <ihrachys> I think we have several
16:38:17 <ihrachys> there is one all-in-one job, and two multinode ones
16:38:25 <ihrachys> (which are really 2-node)
16:39:32 <jlibosva> but none of the multinode ones are voting, are they?
16:40:02 <ihrachys> yeah
16:40:06 <ihrachys> and that's a shame
16:40:16 <ihrachys> but consider that the single node one is part of the integrated gate
16:40:31 <ihrachys> so it wouldn't be a neutron-only decision to switch it
16:40:39 <ihrachys> and I suspect it wouldn't go smoothly
16:41:01 <jlibosva> hmm, I need to update myself on that, I see we already have some py35 tempest: http://logs.openstack.org/29/380329/24/check/gate-tempest-dsvm-py35-ubuntu-xenial/4ed1870/logs/testr_results.html.gz
16:41:35 <ihrachys> jlibosva: it's not working
16:41:46 <ihrachys> because swift is not really compatible
16:41:55 <ihrachys> and devstack attempts to execute it as py3
16:42:29 <ihrachys> I think that's going to be tackled with https://review.openstack.org/#/c/439108/
16:42:35 <ihrachys> but so far it doesn't move anywhere
16:43:13 <jlibosva> so what does the job run?
16:44:26 <ihrachys> it runs everything that devstack determines to be py3 compatible under py3
16:44:38 <ihrachys> and there was a hack in devstack that enforced py3 for swift
16:44:47 <ihrachys> that was honestly totally wrong
16:45:07 <ihrachys> because they have py3 SyntaxErrors in their code, not to mention it was never tried
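As a rough illustration of the devstack knobs involved (a sketch only; the variable and helper names are assumptions, not the exact gate configuration):

    # local.conf fragment -- illustrative only
    [[local|localrc]]
    USE_PYTHON3=True
    PYTHON3_VERSION=3.5
    # keep services with known py3 problems (e.g. swift) on python2
    disable_python3_package swift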
16:46:16 <ihrachys> anyhoo, let's leave the AI on you and move on
16:46:33 <ihrachys> #action jlibosva to figure out the plan for py3 gate transition and report back
16:47:14 <ihrachys> there will also be some CI-related work once we get to switch to lib/neutron: https://review.openstack.org/#/q/topic:new-neutron-devstack-in-gate and mod_wsgi
16:47:28 <jlibosva> yep
16:47:44 <ihrachys> one other thing that I wanted to raise is https://review.openstack.org/#/c/439114/
16:47:58 <ihrachys> manjeets: what's the status of the dashboard patch? are you going to respin it?
16:48:10 <manjeets> ihrachys, yes
16:48:25 <ihrachys> I was thinking, maybe we should squash that with the existing neutron review dashboard
16:48:35 <ihrachys> having a single tool may make more sense
16:48:42 <manjeets> yes, that makes sense, since some of the existing methods can be reused as well
16:48:42 <ihrachys> thoughts?
16:49:03 <manjeets> just creating a new section for all gate patches in the existing dashboard
16:49:05 <manjeets> ??
16:49:08 <ihrachys> yea
16:49:13 <manjeets> ++
16:49:45 <ihrachys> #action manjeets respin https://review.openstack.org/#/c/439114/ to include gate-failure reviews into existing dashboard
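For a sense of what the squash could look like, here is a hypothetical fragment in the gerrit-dash-creator .dash format used for review dashboards; the section title and query are made up and not the content of 439114:

    [dashboard]
    title = Neutron Review Dashboard
    foreach = project:openstack/neutron status:open

    [section "Gate failure fixes"]
    query = topic:gate-failure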
16:50:40 <ihrachys> another thing I wanted to note is that we made some progress on the path to indexing functional and fullstack logs in logstash
16:50:45 <ihrachys> https://review.openstack.org/#/q/topic:index-func-logs
16:51:00 <ihrachys> we now generate a -index.txt file with all INFO+ messages in each gate run
16:51:16 <jlibosva> ihrachys: good job :)
16:51:18 <ihrachys> the only missing bit is actually updating logstash config to index it
16:51:29 <ihrachys> I hope to get infra review it today
16:51:41 <ihrachys> clarkb: maybe you have +2 there too? system-config ^
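The indexing step itself would be a small addition to the log client configuration in system-config; roughly along these lines (file path, key names, and tag are all assumptions, not the actual change under review):

    # modules/openstack_project/files/logstash/jenkins-log-client.yaml -- hypothetical excerpt
    source-files:
      - name: logs/dsvm-functional-index.txt
        tags:
          - console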
16:52:35 <ihrachys> speaking of more logs collected, another stalled piece of work is collecting info on mlock-consuming processes: https://review.openstack.org/#/q/topic:collect-mlock-stats-in-gate
16:53:00 <ihrachys> I haven't seen oom-killers lately. is it just my perception?
16:53:28 * jlibosva hasn't checked the status
16:53:42 <clarkb> ihrachys: ya I can look at that too
16:54:56 <ihrachys> thanks
16:56:22 <ihrachys> ok any more topics to raise this time?
16:57:42 <jlibosva> not from me
16:58:07 <manjeets> not from me either atm
16:58:11 <ihrachys> ok let's call it a day
16:58:13 <ihrachys> thanks everyone
16:58:15 <manjeets> thanks
16:58:16 <jlibosva> thanks!
16:58:16 <ihrachys> #endmeeting