16:00:26 <ihrachys> #startmeeting neutron_ci
16:00:26 <openstack> Meeting started Tue Feb 28 16:00:26 2017 UTC and is due to finish in 60 minutes.  The chair is ihrachys. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:28 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:30 <openstack> The meeting name has been set to 'neutron_ci'
16:00:35 <mlavalle> o/
16:01:00 <dasm> o/
16:01:50 <ihrachys> I think kevinbenton and jlibosva won't join today for different reasons but nevertheless it's worth running through outstanding actions
16:01:59 <ihrachys> #link https://wiki.openstack.org/wiki/Meetings/NeutronCI Agenda
16:02:04 <ihrachys> #topic Action items from previous meeting
16:02:09 <dasanind> o/
16:02:28 <ihrachys> "ihrachys to look at e-r bot for openstack-neutron channel"
16:02:40 <ihrachys> this landed: https://review.openstack.org/#/c/433828/
16:02:50 <ihrachys> has anyone seen the bot reporting anything in the channel since then? :)
16:03:21 <ihrachys> doesn't seem like it did a single time
16:03:36 <ihrachys> gotta give it some time and see if it's some issue in configuration, or just no positive hits
16:03:58 <ihrachys> #action ihrachys to monitor e-r irc bot reporting in the channel
16:04:18 <ihrachys> "manjeets to polish the dashboard script and propose it for neutron/tools/"
16:04:37 <ihrachys> manjeets: I know you proposed the patch for neutron tree but armax asked to move it to another place
16:04:55 <ihrachys> has it happened so far?
16:04:58 <manjeets> 0/
16:05:06 <ihrachys> (the neutron patch was https://review.openstack.org/#/c/433893/)
16:05:12 <manjeets> I'll move it today; I was busy with other stuff yesterday
16:05:26 <ihrachys> ok cool, thanks for working on it
16:05:40 <ihrachys> #action manjeets to repropose the CI dashboard script for reviewday
16:05:58 <ihrachys> next item was "jlibosva to try reducing parallelization for scenario tests"
16:06:23 <ihrachys> this landed: https://review.openstack.org/#/c/434866/
16:07:26 <ihrachys> I believe the scenario job was still not particularly stable due to the qos tests, so we also landed https://review.openstack.org/#/c/437011/ which temporarily disables the test that measures bandwidth
16:07:41 <ihrachys> ajo was going to rework the test once more
16:08:09 <ihrachys> at this point the failure rate for the job is: ovs ~ 40% and linuxbridge ~ 100%
16:09:00 <ihrachys> seems like the lb trunk test fails consistently due to missing connectivity: http://logs.openstack.org/69/438669/1/check/gate-tempest-dsvm-neutron-scenario-linuxbridge-ubuntu-xenial-nv/a16519c/testr_results.html.gz
16:09:09 <ihrachys> could be a legit failure, will need to follow up with armax on the matter
16:09:28 <ihrachys> #action ihrachys to follow up with armax on why trunk connectivity test fails for lb scenario job
16:09:48 <ihrachys> ok next was "ihrachys to look at getting more info from kernel about ram-locked memory segments"
16:10:03 <ihrachys> there are a bunch of patches on review
16:10:25 <ihrachys> one is enabling the needed logging in peakmem_tracker service: https://review.openstack.org/#/c/434470/ (needs some small rework)
16:10:54 <ihrachys> btw we enabled the peakmem_tracker service in all gates yesterday, to help with memory consumption / oom-killer debugging: https://review.openstack.org/#/c/434511/
16:11:27 <ihrachys> since the first patch renames peakmem_tracker, this patch also enables the service with the new name: https://review.openstack.org/434474
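(For context on how this service gets turned on in a devstack-based run: it is typically a one-line local.conf change. A minimal sketch, assuming the service still answers to the peakmem_tracker name; the rename patch above would change it.)

    # local.conf fragment for a devstack run
    [[local|localrc]]
    # track peak memory usage during the run; "peakmem_tracker" is the
    # pre-rename service name, so adjust once the rename patch lands
    enable_service peakmem_tracker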
16:11:43 <dasm> ihrachys: according to johndperkins, kibana isn't showing oom-killer issues since Feb 23
16:11:51 <ihrachys> I hope that will give us an answer if any process blocks huge chunks of memory from swapping
16:12:08 <ihrachys> dasm: oh nice. do we still see libvirtd crashes?
16:12:33 <dasm> i don't know. i didn't look into that
16:13:23 <ihrachys> there is an etherpad tracking oom-killer from infra side: https://etherpad.openstack.org/p/OOM_Taskforce
16:13:52 <clarkb> yes I think libvirt crashes are still happening
16:13:53 <dasm> although the lack of oom-killer problems could be connected to lower usage of infra during the PTG.
16:14:10 <clarkb> dims had one pulled up yesterday
16:14:50 * electrocucaracha wonders if <beslemon> discovered something
16:15:00 <ihrachys> clarkb: could it be indeed correlated with the lower utilization of the cloud as dasm suggests?
16:15:20 <dims> clarkb : i was looking through libvirt logs for 2nd and 3rd items in the rechecks list - http://status.openstack.org/elastic-recheck/ - not much luck
16:15:24 <clarkb> ihrachys: maybe? the other angle people were looking at is that OOMs seem to happen more often on Rax, which is Xen based, so it's potentially related to that
16:16:04 <ihrachys> clarkb: I assume that would require some close work with the owners of the cloud. do we have those relationships?
16:16:42 <clarkb> ihrachys: for rackspace johnthetubaguy is probably a good contact particularly for compute related things?
16:16:57 <electrocucaracha> clarkb: the other thing that beslemon mentioned to me was the recent update of their centos images
16:17:16 <ihrachys> electrocucaracha: who is beslemon? I'm probably missing some context.
16:17:26 <clarkb> infra runs its own images too that should update daily
16:17:45 <electrocucaracha> ihrachys: she is a Racker (a perf engineer for RPC)
16:18:22 <electrocucaracha> ihrachys: she was helping us (me, dasm, johndperkins) last week to discover something
16:18:37 <ihrachys> ok. if she works on any of that, it could make sense for her to join the meeting and update on her findings.
16:19:21 <electrocucaracha> ihrachys: well, she only has limited time for that assignment, but I'm going to ask her to share her findings
16:19:45 <ihrachys> in related news, several projects have noticed instability in their scenario jobs because of broken (or disabled) nested virtualization in some clouds. I heard that from octavia.
16:20:00 <ihrachys> neutron also experienced some slowdown and timeouts for some jobs
16:20:16 <ihrachys> dumping cpu flags of the allocated nodes could give some clue: https://review.openstack.org/#/c/433949/
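(A quick illustration of the kind of check a cpu-flag dump enables; this is a generic sketch, not what the linked patch does.)

    # Does this node expose hardware virtualization extensions?
    # vmx = Intel VT-x, svm = AMD-V; if neither flag shows up, nested
    # virt is unavailable and qemu falls back to slow software emulation.
    if egrep -q 'vmx|svm' /proc/cpuinfo; then
        echo "hardware virtualization flags present"
    else
        echo "no vmx/svm flags: expect slow, emulated guests"
    fi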
16:20:54 <ihrachys> electrocucaracha: ack. at least an email or something, otherwise we work in isolation and don't benefit from multiple eyes looking at the same thing from the same angle.
16:21:14 <electrocucaracha> ihrachys: +1
16:21:45 <ihrachys> ok these were all action items from the previous meeting, moving on
16:21:58 <ihrachys> #topic PTG update
16:22:08 <ihrachys> some CI matters were covered during the PTG the previous week
16:22:48 <ihrachys> some points were captured in the etherpad: https://etherpad.openstack.org/p/neutron-ptg-pike-final lines 15-38
16:23:11 <ihrachys> some things were also captured by kevinbenton in his report email: http://lists.openstack.org/pipermail/openstack-dev/2017-February/113032.html
16:23:34 <ihrachys> we will need to follow up on those, so that at least the tasks in our CI scope don't slip through the cracks
16:23:49 <ihrachys> I will do that this week, and we will run through the items next week
16:24:07 <ihrachys> #action ihrachys to follow up on PTG working items related to CI and present next week
16:24:40 <ihrachys> tl;dr there are a lot of specific work items on gate stability and also on reshaping the gate (removing jobs, adding new ones, ...)
16:25:23 <ihrachys> any specific questions about PTG?
16:25:55 <ihrachys> (we will discuss it in detail next week, but if you have anything time sensitive)
16:26:40 <ihrachys> ok moving on
16:26:48 <ihrachys> #topic Known gate issues
16:26:59 <ihrachys> #link https://goo.gl/8vigPl Open bugs
16:27:10 <ihrachys> #link https://bugs.launchpad.net/neutron/+bug/1627106 ovsdb native timeouts
16:27:10 <openstack> Launchpad bug 1627106 in neutron "TimeoutException while executing tests adding bridge using OVSDB native" [Critical,Triaged] - Assigned to Miguel Angel Ajo (mangelajo)
16:27:20 <ihrachys> this did not get much progress per se
16:27:36 <ihrachys> but there are some developments that should help us to isolate some of its impact
16:27:57 <ihrachys> specifically, otherwiseguy is working on splitting the ovsdb code into a separate project: https://review.openstack.org/#/c/438080/
16:28:12 <ihrachys> at which point some of the unstable tests that cover the code will move into the new repo
16:28:26 <ihrachys> which will offload some of the impact from neutron tree into the new tree
16:28:38 <ihrachys> and hopefully will reduce impact on integrated gate.
16:28:58 <ihrachys> some may say it's just a shift of responsibility. it indeed is.
16:29:28 <ihrachys> otherwiseguy also had plans to work on a native eventlet state machine for the library once we get initial integration of it with neutron.
16:29:58 <ihrachys> he is hopeful that replacing the existing solution, which is based on native threads, with something more integrated with eventlet may squash some of the bugs we experience.
16:30:03 <ihrachys> only time will tell.
16:30:36 <ihrachys> for other bugs in the list, I gotta walk thru them and see if they need some love
16:30:53 <ihrachys> #action ihrachys to walk thru list of open gate failure bugs and give them love
16:31:24 <ihrachys> any other known gate failure that would benefit from the discussion here?
16:32:48 <ihrachys> ok let's move on
16:32:55 <ihrachys> #topic Gate hook rework
16:33:19 <ihrachys> some of you may have noticed the gate breakage that hit us on Friday due to the change in devstack-gate regarding local.conf handling
16:33:39 <ihrachys> I think the breaking change was https://review.openstack.org/#/c/430857/
16:33:56 <ihrachys> this was fixed with https://review.openstack.org/#/q/Ibe640a584add3acc89520a2bbb25b6f4c5818e1b,n,z
16:34:21 <ihrachys> though Sean Dague then raised a point that the way we use devstack-gate in our gate hook is not sustainable and may break in the future
16:34:36 <ihrachys> there is a WIP patch to fix that here: https://review.openstack.org/#/c/438682/
16:34:40 <ihrachys> we will need to backport it too
16:35:08 <ihrachys> I suspect that other repos that were affected by the initial d-g change and patched around it to pass the gate may also need to follow up with a better fix
16:35:27 <ihrachys> I believe armax was going to at least assess the impact on other stadium repos
16:35:45 <ihrachys> #action armax to assess impact of d-g change on stadium gate hooks
16:35:58 <ihrachys> I will shape Sean's patch today to make it ready to merge
16:36:17 <ihrachys> #action ihrachys to prepare https://review.openstack.org/#/c/438682/ for merge, then backport
16:36:37 <ihrachys> stadium projects are advised to take another look at their gate setup
16:37:02 <ihrachys> go ask me or sdague about details if you are lost
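(For projects reworking their hooks: the usual sustainable pattern is to feed settings to devstack through devstack-gate's documented knobs rather than editing its internals or the generated local.conf. A rough gate_hook.sh sketch along those lines, assuming the standard DEVSTACK_LOCAL_CONFIG mechanism; check https://review.openstack.org/#/c/438682/ for what neutron actually ends up doing.)

    #!/bin/bash
    # gate_hook.sh sketch: append extra settings through the supported
    # devstack-gate interface instead of patching local.conf directly
    export DEVSTACK_LOCAL_CONFIG+=$'\n'"Q_AGENT=openvswitch"
    export DEVSTACK_LOCAL_CONFIG+=$'\n'"enable_service peakmem_tracker"
    # then hand control back to the stock devstack-gate wrapper
    $BASE/new/devstack-gate/devstack-vm-gate.sh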
16:37:22 <ihrachys> #topic lib/neutron for devstack-gate
16:37:46 <ihrachys> One final thing: a heads-up that several folks are working on switching the gate to lib/neutron (from lib/neutron-legacy)
16:38:00 <ihrachys> the result can be seen in https://review.openstack.org/436798 and the list of dependent patches
16:38:48 <ihrachys> once the patches are in better shape and merged, we may need to do some more validation work with the gates of other projects to make sure the switch won't break anything
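(For anyone exercising those patches locally, the visible difference is mostly in service names, since lib/neutron drops the old q-* aliases. The new-style names below are assumed from the in-flight patches, so verify against https://review.openstack.org/436798.)

    # local.conf fragment
    [[local|localrc]]
    # old lib/neutron-legacy style:
    #   enable_service q-svc q-agt q-dhcp q-l3 q-meta
    # new lib/neutron style (assumed names, verify against the patches):
    enable_service neutron-api neutron-agent neutron-dhcp neutron-l3 neutron-metadata-agent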
16:39:09 <ihrachys> #topic Open Discussion
16:39:25 <ihrachys> anything anyone?
16:39:44 <ihrachys> I will merely note that the patch that disables ovs compilation for the functional job is in the gate: https://review.openstack.org/437041
16:40:48 <ihrachys> electrocucaracha: any updates on memory consumption tracking work you looked at a while ago?
16:42:50 <ihrachys> ok I believe we lost him :)
16:42:59 <ihrachys> ok folks thanks for joining and bearing with me
16:43:05 <mlavalle> thanks!
16:43:07 <manjeets> thanks
16:43:08 <ihrachys> I hope that the next meetings will be more active and better attended :)
16:43:11 <ihrachys> #endmeeting