16:00:18 <slaweq> #startmeeting neutron_ci
16:00:19 <openstack> Meeting started Tue Nov 27 16:00:18 2018 UTC and is due to finish in 60 minutes.  The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:20 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:22 <slaweq> hi
16:00:23 <openstack> The meeting name has been set to 'neutron_ci'
16:00:38 <bcafarel> o/
16:01:58 <slaweq> let's wait a few minutes for the others
16:02:06 <hongbin> o/
16:02:12 <slaweq> I pinged them on openstack-neutron channel
16:02:17 <njohnston> o/
16:03:48 <mlavalle> sorry, I thought it was 1 hour from now
16:04:01 <mlavalle> not used to winter time yet
16:04:07 <slaweq> :)
16:04:09 <bcafarel> :)
16:04:14 <slaweq> ok, so lets start
16:04:21 <slaweq> #topic Actions from previous meetings
16:04:32 <slaweq> mlavalle to continue tracking not reachable FIP in trunk tests
16:04:38 <mlavalle> yes
16:04:56 <mlavalle> that entails merging https://review.openstack.org/#/c/618750
16:05:04 <slaweq> this was added only to not forget about it IIRC, as we first want to get my patch merged
16:05:09 <mlavalle> (I just approved it)
16:05:09 <slaweq> right mlavalle :)
16:05:13 <slaweq> thx
16:05:24 <mlavalle> and then looking at the effects for some days
16:05:29 <slaweq> so I will add this action for next week too to remember it, ok?
16:05:33 <mlavalle> yes
16:05:37 <slaweq> #action mlavalle to continue tracking not reachable FIP in trunk tests
16:05:39 <slaweq> thx
16:05:44 <slaweq> that was quick :)
16:05:52 <slaweq> next one
16:05:54 <slaweq> slaweq to check which experimental jobs can be removed
16:05:55 <mlavalle> I was actually going to ping you....
16:06:10 <slaweq> why?
16:06:36 <mlavalle> do you have a pointer to a traceback of the failure that the patch^^ is supposed to fix
16:06:40 <mlavalle> ?
16:06:52 <slaweq> sure
16:06:59 <slaweq> there are a lot of such issues recently
16:07:25 <mlavalle> I want to see if I find it in the trunk test failure
16:07:34 <slaweq> e.g. http://logs.openstack.org/23/619923/2/check/neutron-tempest-dvr-ha-multinode-full/e356b9a/logs/subnode-2/screen-q-l3.txt.gz?level=ERROR
16:07:51 <slaweq> and it happens in different tests, not only in trunk
16:07:59 <mlavalle> exactly, that is what I was looking for
16:08:00 <slaweq> it's a general problem with FIP connectivity
16:08:16 <mlavalle> thanks
16:08:20 <slaweq> yw
16:08:44 <slaweq> ok, so going back to not needed experimental jobs
16:08:56 <haleyb> slaweq: sorry, i need to help a repair man here, just assign me some tasks :)
16:08:56 <slaweq> I did patch to remove some of them: https://review.openstack.org/619719
16:09:10 <slaweq> haleyb: sure - that we can definitely do
16:09:12 <mlavalle> we are all repair men here
16:09:16 <mlavalle> haleyb: ^^^^
16:09:19 <slaweq> #action haleyb takes all this week :D
16:09:27 <njohnston> lol
16:10:18 <slaweq> mlavalle: please take a look at this patch https://review.openstack.org/619719 - it has already +2 from haleyb
16:10:38 <mlavalle> slaweq: added to the pile
16:10:42 <slaweq> mlavalle: thx
16:10:50 <slaweq> ok, moving on
16:10:58 <slaweq> next one was: slaweq to start migrating neutron CI jobs to zuul v3 syntax
16:11:10 <slaweq> I opened bug for that https://bugs.launchpad.net/neutron/+bug/1804844
16:11:10 <openstack> Launchpad bug 1804844 in neutron "CI jobs definitions should be migrated to Zuul v3 syntax" [Low,Confirmed] - Assigned to Slawek Kaplonski (slaweq)
16:11:22 <slaweq> And I pushed first patch for functional tests but it’s WIP now: https://review.openstack.org/#/c/619742/
16:11:45 <njohnston> thanks for working on that
16:11:54 <slaweq> so if someone wants to work on migration for some job, please feel free to do it and push patch related to this bug
16:12:19 <slaweq> it is in fact a lot of patches to do, but I thought that one bug to track them all would be enough
16:12:44 <mlavalle> good idea
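For illustration, a minimal sketch of what a natively-defined Zuul v3 job could look like; the parent job, timeout and variables below are assumptions for the example, not the content of the patch under review:

```yaml
# Sketch only: parent, timeout and vars are assumed values, not the
# actual job definition proposed in https://review.openstack.org/619742.
- job:
    name: neutron-functional
    parent: devstack-tox-functional
    timeout: 7800
    required-projects:
      - openstack/neutron
    vars:
      tox_envlist: dsvm-functional
    irrelevant-files:
      - ^.*\.rst$
      - ^doc/.*$
      - ^releasenotes/.*$
```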
16:13:47 <slaweq> ok, next one
16:13:48 <slaweq> njohnston to switch neutron to use integrated-gate-py35 with grenade-py3 job instead of our neutron-grenade job
16:14:45 <slaweq> njohnston: any update on this one?
16:14:59 <njohnston> So the grenade-py3 job is already in check and gate queue.  I am watching it for a few runs
16:15:35 <njohnston> Just for due diligence, then I'll push up a change to disable neutron-grenade.
16:16:27 <slaweq> where is it in the gate already?
16:16:37 <slaweq> I don't see it
16:16:42 <njohnston> it is inherited from one of the templates we include
16:17:17 <njohnston> but if you look at any neutron job in zuul.openstack.org you'll see grenade-py3
16:18:30 <slaweq> ahh, ok
16:18:33 <slaweq> I see it now
16:18:55 <slaweq> so we only need to remove neutron-grenade job now and we will be done with this, right?
16:19:00 <njohnston> yep!
16:19:07 <slaweq> good
16:19:12 <slaweq> will You do it this week?
16:19:40 <njohnston> I should have the change up within the hour
16:19:45 <slaweq> #action njohnston to remove neutron-grenade job from neutron's CI queues
16:19:47 <slaweq> thx njohnston
16:19:51 <njohnston> just waiting for the job I am watching to finish
16:20:00 <slaweq> ok
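As a rough sketch of how a job can run without being listed explicitly in neutron's own queues, a hypothetical project stanza; the integrated-gate-py35 template is the one mentioned above, the rest is illustrative:

```yaml
# Sketch: the integrated-gate-py35 template carries grenade-py3 with it,
# so the job runs in check and gate even though it is not listed here.
- project:
    templates:
      - integrated-gate-py35
    check:
      jobs:
        - neutron-functional
    gate:
      jobs:
        - neutron-functional
```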
16:20:07 <slaweq> so lets move on to the next one
16:20:12 <slaweq> slaweq to check bug 1798475
16:20:13 <openstack> bug 1798475 in neutron "Fullstack test test_ha_router_restart_agents_no_packet_lost failing" [High,Confirmed] https://launchpad.net/bugs/1798475
16:20:23 <slaweq> I sent patch to store all journal logs in fullstack results: https://review.openstack.org/#/c/619935/
16:20:34 <slaweq> I hope this will help to debug this issue as we will be able to see what keepalived is doing then.
16:20:45 <mlavalle> I'll review it today
16:21:27 <slaweq> in the future, when jobs are migrated to the zuulv3 format, I think this can be added as a role and applied to all jobs, as it can be helpful with keepalived or dnsmasq logs
16:21:28 <njohnston> it's a great idea regardless
16:21:45 <mlavalle> yeap
16:21:45 <slaweq> but for now I want it only in the fullstack job as a first step
16:22:41 <slaweq> #action slaweq to continue debugging bug 1798475 when journal log will be available in fullstack tests
16:22:43 <openstack> bug 1798475 in neutron "Fullstack test test_ha_router_restart_agents_no_packet_lost failing" [High,Confirmed] https://launchpad.net/bugs/1798475
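As a sketch of the kind of reusable role mentioned above, a hypothetical post-run task that dumps the systemd journal into the collected logs; the task name and destination path are assumptions, not the content of the actual patch:

```yaml
# Hypothetical post-run task; the destination path is an assumption and
# would need to match whatever the log collection role expects.
- name: Dump the systemd journal into the job logs
  become: yes
  shell: "journalctl -o short-precise > {{ ansible_user_dir }}/logs/journal.txt"
  args:
    executable: /bin/bash
```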
16:22:54 <slaweq> ok, lets move on
16:23:02 <slaweq> slaweq to check why db_migration functional tests don't have logs
16:23:09 <slaweq> patch https://review.openstack.org/619266
16:23:20 <slaweq> it's merged already
16:23:49 <slaweq> so now we should have logs from all functional tests in job results
16:24:04 <slaweq> next one was:
16:24:07 <slaweq> njohnston to remove neutron-fullstack-python36 from grafana dashboard
16:24:46 <njohnston> One side note on the removal of the neutron-grenade job; that job is actually in the check and gate queue for the grenade project so I'll push a change in grenade to remove those first, and use a Depends-On to make sure that goes through before the neutron change
16:25:21 <njohnston> Regarding neutron-fullstack-python36, I remember adding it, but when I went to project-config I could find no reference to it.  So that is a no-op.
16:26:08 <slaweq> ahh, that's good
16:26:10 <slaweq> so it's done :)
16:26:15 <slaweq> thx njohnston for checking it
16:26:32 <slaweq> ok, so that was all actions for today
16:26:39 <mlavalle> fwiw
16:27:07 <slaweq> anything else to add or can we move on?
16:27:23 <mlavalle> nothing from me
16:27:39 <slaweq> ok, so next topic then
16:27:40 <slaweq> #topic Python 3
16:27:51 <slaweq> njohnston: bcafarel any updates from You?
16:28:29 <bcafarel> from the previous week, not much I think
16:28:46 <bcafarel> slaweq: except someone digging into functional tests for py3
16:29:15 <slaweq> ok, about those functional tests, it is a real problem
16:29:31 <njohnston> nothing from me because of PTO
16:29:55 <slaweq> I pushed a DNM patch today to test those tests with less output: https://review.openstack.org/#/c/620271/
16:30:01 <slaweq> and indeed it was better
16:30:05 <slaweq> but not perfect
16:30:38 <slaweq> I also talked with mtreinish about it and he told me that it's a known issue with stestr and too much output from tests
16:30:50 <bcafarel> :/
16:30:56 <njohnston> :-[
16:31:20 <slaweq> so based on his comments I think the only workaround for this is to somehow make our tests produce less output on stdout/stderr
16:31:50 <slaweq> also in my DNM patch I had 3 tests failing: http://logs.openstack.org/71/620271/2/check/neutron-functional/a7fd8ea/logs/testr_results.html.gz
16:32:03 <slaweq> it looks to me like it's related to the issue with SIGHUP
16:32:20 <slaweq> so I'm not sure if we shouldn't skip those tests or mark them as unstable for now
16:33:35 <slaweq> I will try this DNM patch once again, but with those 3 tests marked as unstable, to check how it behaves then
16:33:42 <slaweq> and we will see then
16:34:24 <slaweq> if anyone has some idea how to fix/workaround this problem, that would be great
16:34:44 <slaweq> patch to switch functional tests to py3 is here: https://review.openstack.org/#/c/577383/
16:34:46 <bcafarel> sounds good, we do have https://bugs.launchpad.net/neutron/+bug/1780139 open for the SIGHUP issue
16:34:47 <openstack> Launchpad bug 1780139 in neutron "Sending SIGHUP to neutron-server process causes it to hang" [Undecided,Triaged] - Assigned to Bernard Cafarelli (bcafarel)
16:36:32 <slaweq> so thats all from me about py3
16:36:51 <slaweq> njohnston: do You know how many other jobs we still should switch to py3?
16:37:34 <bcafarel> slaweq: maybe worth going through https://bugs.launchpad.net/cinder/+bug/1728640 and see if we can grab some ideas, like this "Make test logging setup fixture disable future setup"
16:37:35 <openstack> Launchpad bug 1728640 in Cinder "py35 unit test subunit.parser failures" [Critical,Fix released] - Assigned to Sean McGinnis (sean-mcginnis)
16:38:11 <slaweq> yes, that is a very similar issue to what we have with functional tests now :)
16:38:26 <slaweq> I will check that this week
16:38:39 <njohnston> I believe the multinode grenade jobs still need to be switched, at a minimum; grenade-py3 does not relieve us of those sadly
16:38:45 <njohnston> I'll have to check the etherpad
16:38:46 <slaweq> #action slaweq to continue fixing funtional-py3 tests
16:39:02 <slaweq> ok, thx njohnston
16:39:06 <njohnston> #action njohnston to research py3 conversion for neutron grenade multinode jobs
16:39:15 <slaweq> I will also check neutron-tempest-plugin jobs then
16:39:31 <slaweq> #action slaweq to convert neutron-tempest-plugin jobs to py3
16:40:17 <slaweq> ok, can we go on to the next topic then?
16:40:24 <mlavalle> I think so
16:40:26 <njohnston> go ahead
16:40:30 <slaweq> #topic Grafana
16:40:37 <slaweq> #link http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:41:55 <slaweq> the gate queue wasn't busy last week as there were not too many people with +2 power available :)
16:42:20 <mlavalle> yeap
16:42:38 <slaweq> We have neutron-tempest-dvr-ha-multinode-full and neutron-tempest-plugin-dvr-multinode-scenario failing at 100% again
16:43:10 <slaweq> but from what I was checking, it's very often this issue with the snat namespace, which should be fixed by https://review.openstack.org/#/c/618750/
16:43:19 <slaweq> so we should be better next week I hope
16:43:59 <slaweq> Apart from that, I again spotted a couple of issues with cinder backups, like:
16:44:01 <slaweq> http://logs.openstack.org/64/617364/19/check/tempest-slow/18519dc/testr_results.html.gz
16:44:03 <mlavalle> yeah, let's track the effect of that
16:44:03 <slaweq> http://logs.openstack.org/87/609587/11/check/tempest-multinode-full/2a5c5a1/testr_results.html.gz
16:44:21 <slaweq> I will report this as a cinder bug today
16:44:56 <mlavalle> slaweq: and I know I have an email from you with cinder failures
16:45:07 <mlavalle> I will talk to Jay and Sean this week
16:45:38 <slaweq> apart from that, we still have occasional failures in functional tests (db-migrations timeout) and fullstack tests (mostly this issue with keepalived), and I'm trying to find out what is going on with both of them
16:45:49 <slaweq> thx mlavalle :)
16:46:07 <slaweq> one more thing related to grafana
16:46:09 <slaweq> We should add 2 new jobs to grafana:
16:46:11 <slaweq> networking-ovn-tempest-dsvm-ovs-release
16:46:13 <slaweq> tempest-slow
16:46:17 <slaweq> any volunteer for that? :)
16:46:40 <njohnston> sure
16:46:45 <slaweq> thx njohnston :)
16:47:00 <njohnston> #action njohnston add tempest-slow and networking-ovn-tempest-dsvm-ovs-release to grafana
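For reference, a hypothetical grafyaml snippet for adding one of the new jobs to the neutron dashboard in project-config; the graphite metric paths are illustrative placeholders and would have to match the real statsd namespace used by Zuul:

```yaml
# Sketch only: the metric paths below are placeholders, not the real namespace.
dashboard:
  title: Neutron Failure Rate
  rows:
    - title: Integrated Tempest Failure Rates (Check queue)
      panels:
        - title: Failure Rates
          type: graph
          targets:
            - target: >-
                alias(asPercent(
                  transformNull(stats_counts.zuul.pipeline.check.job.tempest-slow.FAILURE),
                  transformNull(sum(stats_counts.zuul.pipeline.check.job.tempest-slow.{SUCCESS,FAILURE}))),
                'tempest-slow')
```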
16:47:23 <slaweq> ok, lets move on then
16:47:29 <slaweq> #topic Tempest/Scenario
16:47:52 <slaweq> I found out today that we have the neutron-tempest-dvr job in our queues
16:48:00 <slaweq> and it looks that it is single node dvr job
16:48:13 <slaweq> is it intentional? do we want to keep it like that?
16:48:33 <slaweq> It looks the same as neutron-tempest-dvr-ha-multinode-full job in fact
16:48:47 <njohnston> ISTR some discussion about this a long time ago, like in the newton timeframe
16:48:48 <slaweq> only difference is that this multinode job is non-voting
16:49:05 <njohnston> I think the goal was for the multinode job to end up being the voting one
16:49:21 <mlavalle> yes, I think I have the same recollection
16:49:32 <mlavalle> we can discuss in the L3 meeting
16:49:36 <njohnston> +1
16:49:42 <slaweq> njohnston: it is not possible to have the multinode job voting now ;)
16:50:00 <slaweq> ok, mlavalle please then add this to L3 meeting agenda if You can
16:50:05 <hongbin> is the multinode job stable enough?
16:50:06 <mlavalle> yes
16:50:15 <mlavalle> hongbin: not even close
16:50:32 <slaweq> #action mlavalle to discuss about neutron-tempest-dvr job in L3 meeting
16:50:43 <slaweq> hongbin: it depends what You mean by stable
16:51:00 <slaweq> it's very stable now as it is at 100% failure all the time :P
16:51:26 <hongbin> slaweq: if it doesn't block merging too much after it becomes voting, then it is fine
16:52:07 <slaweq> hongbin: it will block everything currently, but I agree that we should focus on stabilizing it
16:52:21 <slaweq> and we have been working on it for some time
16:52:36 <hongbin> ack
16:53:03 <slaweq> ok, lets move on then
16:53:05 <slaweq> #topic Periodic
16:53:23 <slaweq> I just want to mention that we still have neutron-tempest-postgres-full failing all the time
16:53:29 <slaweq> but it's nova issue
16:53:35 <slaweq> bug reported: https://bugs.launchpad.net/nova/+bug/1804271
16:53:38 <openstack> Launchpad bug 1804271 in OpenStack Compute (nova) "nova-api is broken in postgresql jobs" [High,In progress] - Assigned to Matt Riedemann (mriedem)
16:53:41 <slaweq> Fix in progress: https://review.openstack.org/#/c/619061/
16:53:53 <slaweq> so we should be good once this is merged
16:54:05 <mriedem> slaweq: here is a tip,
16:54:17 <mriedem> show up in the nova channel and ask that another core look at that already +2ed fix for the postgres job
16:54:33 <mriedem> i would, but i've already spent some review request karma today
16:54:41 <slaweq> mriedem: ok, I will :)
16:54:44 <slaweq> thx
16:55:31 <slaweq> last topic then
16:55:33 <slaweq> #topic Open discussion
16:55:46 <slaweq> anyone wants to discuss about anything?
16:56:00 <hongbin> i have one
16:56:08 <slaweq> go on hongbin
16:56:27 <hongbin> i don't like the long list of extensions in the zuul jobs, so i proposed a patch: https://review.openstack.org/#/c/619642/
16:56:46 <hongbin> i want to know if this is what you guys prefer to do?
16:57:02 <hongbin> or it is not a good idea
16:57:25 <slaweq> yes, IMO it is easier to read in a diff
16:57:30 <bcafarel> it certainly better fits the screen
16:58:04 <njohnston> Would it be possible to use reusable snippets like we do with *tempest-irrelevant-files now?
16:58:14 <hongbin> yes, it will possibly fix the frequent merge conflicts between patches
16:58:38 <slaweq> hongbin: njohnston: great ideas
16:58:59 <hongbin> njohnston: i am not sure, because the list of extensions looks different between jobs
16:59:08 <slaweq> hongbin: not all jobs
16:59:23 <slaweq> You can define snippets "per branch" and reuse them if necessary
16:59:33 <slaweq> at least for master branch it should be fine
16:59:52 <hongbin> yes, we can possibly consolidate the stable branch list
16:59:58 <hongbin> i will look into that
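A minimal sketch of the anchor/alias approach discussed above; the variable name and extension values are made up for the example:

```yaml
# Sketch: define the extension list once with a YAML anchor and reuse it
# via an alias in other jobs in the same file; names/values illustrative.
- job:
    name: neutron-tempest-plugin-api
    vars:
      network_api_extensions: &api_extensions
        - address-scope
        - agent
        - allowed-address-pairs
      devstack_localrc:
        NETWORK_API_EXTENSIONS: "{{ network_api_extensions | join(',') }}"

- job:
    name: neutron-tempest-plugin-scenario
    vars:
      network_api_extensions: *api_extensions
      devstack_localrc:
        NETWORK_API_EXTENSIONS: "{{ network_api_extensions | join(',') }}"
```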
17:00:16 <slaweq> ok, we have to finish now
17:00:20 <slaweq> thx for attending
17:00:23 <slaweq> #endmeeting