16:00:36 <slaweq> #startmeeting neutron_ci
16:00:37 <openstack> Meeting started Tue Oct 23 16:00:36 2018 UTC and is due to finish in 60 minutes.  The chair is slaweq. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:38 <slaweq> hi
16:00:39 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
16:00:42 <openstack> The meeting name has been set to 'neutron_ci'
16:00:44 <mlavalle> o/
16:00:57 <ralonsoh> o/
16:01:42 <slaweq> #topic Actions from previous meetings
16:01:48 <slaweq> lets start
16:01:56 <slaweq> slaweq to continue checking how jobs will run on Bionic nodes
16:02:08 <slaweq> I didn't have time to work on this last week
16:02:15 <slaweq> sorry for that
16:02:30 <slaweq> I will assign it to myself for next week also
16:02:35 <slaweq> #action slaweq to continue checking how jobs will run on Bionic nodes
16:02:51 <slaweq> next one:
16:02:53 <slaweq> mlavalle to continue debugging issue with not reachable FIP in scenario jobs
16:03:07 <mlavalle> I went over a couple more failures
16:03:14 <mlavalle> and left notes in the bug
16:03:32 <mlavalle> I created an environment to test it locally
16:03:37 <njohnston> o/
16:03:46 <mlavalle> so I am running it
16:03:54 <mlavalle> in my local environment
16:04:05 <slaweq> and were You able to reproduce it locally?
16:04:12 <mlavalle> not yet
16:04:20 <mlavalle> I'll try today and tomorrow
16:04:43 <mlavalle> if I don't succeed then I will try to debug in zuul
16:04:53 <mlavalle> following slaweq's recipe
16:05:10 <slaweq> ok, thx for the update and for working on this mlavalle
16:05:24 <slaweq> #action mlavalle to continue debugging issue with not reachable FIP in scenario jobs
16:05:34 <slaweq> ok, next one is
16:05:35 <slaweq> * mlavalle to send an email about moving tempest plugins from stadium to separate repo
16:05:42 <mlavalle> I did earlier today
16:06:05 <mlavalle> http://lists.openstack.org/pipermail/openstack-dev/2018-October/135977.html
16:06:40 <slaweq> thx mlavalle
16:07:07 <slaweq> let's wait for responses from the stadium projects now
16:07:16 <slaweq> next one was:
16:07:18 <slaweq> * slaweq to report a bug about issue with process ending in fullstack tests
16:07:24 <slaweq> Done: https://bugs.launchpad.net/neutron/+bug/1798472
16:07:24 <openstack> Launchpad bug 1798472 in neutron "Fullstack tests fails because process is not killed properly" [High,Confirmed]
16:07:45 <slaweq> I only reported it but I didn't have a chance to work on this
16:08:00 <slaweq> so it's free to take if someone wants :)
16:08:22 <slaweq> it's maybe not "low-hanging-fruit" but may be interesting to debug ;)
16:08:30 <mlavalle> I might take it after I finish debugging the previous to-do that I have
16:08:47 <slaweq> thx mlavalle
16:09:01 <slaweq> ok, next one:
16:09:03 <slaweq> * haleyb to check issue with failing FIP transition to down state
16:09:14 <slaweq> I think haleyb is not around today
16:09:19 <haleyb> i just got here
16:09:26 <slaweq> ohh, hi haleyb :)
16:09:59 <haleyb> i looked into it, the test still seems good, don't know why the associated port gets into BUILD state, still looking
16:10:52 <slaweq> so the port to which the FIP is associated is in BUILD state?
16:11:19 <haleyb> right, so fip stays in ACTIVE (from my memory)
16:12:22 <haleyb> so i'll continue the fight
16:13:58 <slaweq> can it be maybe related to https://review.openstack.org/#/c/606827/ somehow?
16:14:43 <slaweq> there is info that the port may get stuck in DOWN state, but I don't know, maybe it's something similar?
16:15:10 <haleyb> i'll look, but in the original issue there is no live migration, it just removes the device from the VM
16:15:21 <slaweq> ahh, ok
16:15:28 <slaweq> so maybe that other issue then
16:15:58 <slaweq> from what I know, a port stays in BUILD if the L2 agent doesn't send info that the port is ACTIVE (or DOWN)
16:16:12 <slaweq> so You should check in the L2 agent's logs if the port was wired properly
16:16:23 <slaweq> and dig from there IMHO
16:16:36 <haleyb> sure, will look there
16:16:42 <haleyb> thanks!
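As a side note on the mechanism slaweq describes above (a port stays in BUILD until the L2 agent reports the wiring result back to the server), below is a minimal sketch of how one could watch a port's status transition with openstacksdk while reproducing the failure. The cloud name and port id are hypothetical placeholders, and this is only an illustration of the status behaviour, not part of the meeting's tooling.

```python
# Illustrative sketch (hypothetical values): poll a port's status to watch
# the BUILD -> ACTIVE/DOWN transition that the L2 agent is responsible for.
import time

import openstack

conn = openstack.connect(cloud="devstack")  # assumes a clouds.yaml entry named "devstack"
port_id = "11111111-2222-3333-4444-555555555555"  # hypothetical port id

for _ in range(30):
    port = conn.network.get_port(port_id)
    print(port.status)
    if port.status != "BUILD":
        # The server only moves the port out of BUILD after the L2 agent
        # reports the wiring result, so a port stuck in BUILD usually
        # points at the agent's logs on the host binding the port.
        break
    time.sleep(10)
```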
16:17:02 <slaweq> ok, move on then
16:17:06 <slaweq> next was:
16:17:09 <slaweq> slaweq to add e-r query for know fullstack issue (when bug will be reported)
16:17:14 <slaweq> Done: https://review.openstack.org/#/c/611529/
16:17:20 <slaweq> patch is merged already
16:18:03 <slaweq> and that's all for actions from previous week
16:18:15 <slaweq> does anyone want to add anything?
16:18:29 <mlavalle> not me
16:18:50 <slaweq> ok, lets move on then
16:18:57 <slaweq> #topic Grafana
16:19:04 <slaweq> #link http://grafana.openstack.org/dashboard/db/neutron-failure-rate
16:20:58 <slaweq> there was a spike during the weekend on some jobs, but as it was similar on many different jobs I suspect it's maybe related to some infra issue
16:21:14 <slaweq> I don't know about anything on our side which could cause such issue
16:21:57 <slaweq> anything specific You want to discuss now?
16:22:06 <mlavalle> no, I'm good
16:22:17 <slaweq> or can we go on to talk about the different jobs as usual?
16:22:28 <mlavalle> yes, let's do that
16:22:55 <slaweq> #topic fullstack/functional
16:23:13 <slaweq> speaking about fullstack, I found 2 issues recently
16:23:27 <slaweq> one was mentioned before: * https://bugs.launchpad.net/neutron/+bug/1798472
16:23:28 <openstack> Launchpad bug 1798472 in neutron "Fullstack tests fails because process is not killed properly" [High,Confirmed]
16:23:35 <slaweq> and second is * https://bugs.launchpad.net/neutron/+bug/1798475
16:23:36 <openstack> Launchpad bug 1798475 in neutron "Fullstack test test_ha_router_restart_agents_no_packet_lost failing" [High,Confirmed]
16:24:09 <slaweq> where it looks like a restart of L3 agents with HA routers again causes some packet loss sometimes
16:24:12 <mlavalle> They are in haleyb's bug report
16:24:26 <slaweq> mlavalle: yes, both are there
16:26:17 <slaweq> currently it's not hitting us very often, but if this second issue becomes very annoying we can mark this test as unstable
16:26:29 <slaweq> but I would like to try to fix it first maybe :)
16:26:57 <slaweq> and speaking about functional tests, we also have still some issues there
16:27:36 <slaweq> I still saw sometimes that the db migration tests are failing, even with the timeout set to 300 seconds
16:28:01 <slaweq> I checked in the logs that those tests can take up to 400 seconds (I found such numbers at least)
16:28:18 <slaweq> so I proposed another patch: https://review.openstack.org/#/c/612505/
16:28:42 <slaweq> I know that it's not the best "solution" but I don't think we can do anything else with it :/
16:28:57 <slaweq> so please check that patch
16:29:09 <mlavalle> I just +2ed it
16:29:13 <slaweq> thx mlavalle
16:29:24 <mlavalle> and I will push it when zuul returns green
16:29:28 <slaweq> and second issue which I found from time to time is failing test: neutron.tests.functional.agent.l3.test_ha_router.LinuxBridgeL3HATestCase.test_ha_router_namespace_has_ipv6_forwarding_disabled
16:29:54 <slaweq> like e.g. in http://logs.openstack.org/37/610737/2/gate/neutron-functional/286402b/logs/testr_results.html.gz
16:30:27 <slaweq> I think that it may be related to failing fullstack test: https://bugs.launchpad.net/neutron/+bug/1798475 but that should be checked in logs first
16:30:27 <openstack> Launchpad bug 1798475 in neutron "Fullstack test test_ha_router_restart_agents_no_packet_lost failing" [High,Confirmed]
16:30:44 <slaweq> and I will check that this week
16:31:12 <slaweq> #action slaweq to check if failing test_ha_router_namespace_has_ipv6_forwarding_disabled is related to bug https://bugs.launchpad.net/neutron/+bug/1798475
16:31:50 <slaweq> and that's all about fullstack/functional for today on my side
16:31:55 <slaweq> do You want to add something?
16:31:58 <mlavalle> +1
16:32:11 <mlavalle> no
16:32:46 <slaweq> ok, so next topic
16:32:48 <slaweq> #topic Tempest/Scenario
16:32:58 <slaweq> I was looking at tempest jobs this week
16:33:30 <slaweq> and I found out that the neutron-tempest-xxx jobs hit their timeout from time to time
16:33:40 <slaweq> it hits jobs like:
16:33:48 <slaweq> neutron-tempest-iptables_hybrid
16:33:54 <slaweq> neutron-tempest-linuxbridge
16:34:02 <slaweq> and
16:34:04 <slaweq> neutron-tempest-dvr
16:34:19 <slaweq> all of them have the timeout configured to 7800 seconds (2h 10m)
16:34:32 <slaweq> maybe we should increase it?
16:34:53 <mlavalle> we can give it a try
16:35:08 <slaweq> I can imagine that sometimes a job can land on a cloud node which is under heavy load, and tests run slower there
16:35:22 <slaweq> so I think it's better to wait e.g. 3h for results than to recheck
16:35:51 <slaweq> and from what I was checking, neutron related tests aren't at the top of the longest ones in those jobs :)
16:36:34 <slaweq> so mlavalle, would it be good to add 1h to those timeouts?
16:36:44 <mlavalle> yes
16:36:50 <slaweq> ok, I will do that
16:36:52 <njohnston> +1
16:37:02 <slaweq> #action slaweq to increase neutron-tempest jobs timeouts
16:37:16 <slaweq> I hope it will make jobs at least a bit more stable
16:37:40 <slaweq> that's all from me about scenario/tempest jobs for this week
16:37:46 <slaweq> do You have anything to add?
16:37:49 <mlavalle> nope
16:38:07 <slaweq> so last topic for today is
16:38:09 <slaweq> #topic Open discussion
16:38:22 <slaweq> and I wanted to ask here about a few things
16:38:42 <slaweq> first of all: we currently have 2 quite big topics related to CI jobs:
16:38:57 <slaweq> running jobs with python 3
16:38:59 <slaweq> and second
16:39:08 <slaweq> use of Bionic instead of Xenial
16:39:30 <njohnston> indeed
16:39:32 <slaweq> what do You think about adding topics about those two things to the agenda of this meeting?
16:39:42 <slaweq> to track progress on both of them weekly?
16:39:50 <njohnston> sure, I think that is a good idea
16:39:54 <mlavalle> that's a good proposal
16:40:35 <slaweq> ok, so I will include it in next meeting then, thx :)
16:40:42 <slaweq> and also I have one more question
16:40:48 <slaweq> related to patch https://review.openstack.org/#/c/573933/
16:40:59 <slaweq> it's been waiting for review for a long time
16:41:17 <slaweq> but I'm not sure if we really want to maintain something like that in our repo
16:41:26 <slaweq> so I wanted to ask You about opinions about it
16:41:57 <slaweq> if You didn't see it yet, please add it to Your review list and check it this week maybe
16:41:58 <mlavalle> I'll look at it slowly and leave my comments
16:42:05 <slaweq> thx mlavalle
16:42:26 <njohnston> Since it's meant for developers to run by hand, it seems to me it's more of a dev tool than a CI tool at least at this point
16:42:38 <njohnston> I'll have to run it and see what he's trying to do
16:42:56 <slaweq> njohnston: yes, for me it's some kind of UT framework for bash
16:44:01 <slaweq> and I'm not sure it is something that should land in our repo, but I would like to know others' opinions about it :)
16:44:29 <slaweq> ok, and that's all from my side for today
16:44:39 <mlavalle> cool
16:44:39 <slaweq> anything else You want to discuss?
16:44:43 <mlavalle> not from me
16:44:51 * mlavalle is going to rest for a while
16:44:59 <ralonsoh> get well soon
16:45:13 <slaweq> ok, thx for attending guys
16:45:17 <mlavalle> thanks!
16:45:20 <ralonsoh> bye
16:45:26 <slaweq> and feel better soon mlavalle :)
16:45:32 <mlavalle> thanks!
16:45:33 <slaweq> #endmeeting