15:00:23 <anteaya> #startmeeting third-party
15:00:25 <openstack> Meeting started Mon Mar 30 15:00:23 2015 UTC and is due to finish in 60 minutes.  The chair is anteaya. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:27 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:29 <openstack> The meeting name has been set to 'third_party'
15:00:32 <anteaya> hello
15:00:38 <ameade> hey
15:00:48 <anteaya> raise your hand if you are here for the third party meeting
15:00:50 <anteaya> ameade: hello
15:00:59 <anteaya> ameade: how is your work coming along?
15:01:08 <kaisers1> hi
15:01:13 <anteaya> kaisers1: hello
15:01:17 <ameade> anteaya: very well thanks :)
15:01:24 <anteaya> ameade: I'm glad to hear that
15:01:30 <anteaya> ameade: what is your current status?
15:01:37 <patrickeast> hi everyone
15:01:44 <anteaya> patrickeast: hi patrickeast
15:02:47 <ameade> anteaya: have been posting on Cinder for most of our drivers for months now but have been setting up another system for our Fibre Channel stuff which has been interesting
15:03:00 <anteaya> yes, I've been following along
15:03:11 <anteaya> which is why I am giving you priority right now at the meeting
15:03:20 <anteaya> unless you prefer that I don't
15:03:55 <ameade> just hanging out in case I can help other folks :)
15:04:21 <anteaya> oh okay
15:04:26 <anteaya> that is good too
15:04:27 <anteaya> welcome
15:04:33 <anteaya> how is everyone today?
15:05:09 <asselin> hi
15:05:14 <anteaya> hey asselin
15:05:26 <anteaya> does anyone have anything they would like to discuss?
15:06:00 <asselin> ameade, I posted our fc cinder testing notes back a few meetings ago
15:06:06 <asselin> ameade, were you there?
15:06:24 <ameade> asselin: I don't think so
15:07:30 <ameade> asselin: Do you do one job at a time against an FC switch or have you turned zoning off?
15:07:59 <asselin> ameade, we have zoning off
15:08:37 <ameade> gotcha, i'm currently running one at a time with single concurrency, but to scale i'm going to have to turn it off too
15:08:58 <ameade> i do think I found a bug with the synchronization in the brocade code though
15:10:17 <asselin> ameade, please share / submit a bug report. we're planning to test with the zone manager eventually
15:10:37 <ameade> asselin: yeah definitely, i'm going to dig into it once i get this other hardware setup
15:12:27 <anteaya> asselin: thanks for bringing this up
15:12:36 <anteaya> is there more to discuss here?
15:13:35 <anteaya> does anyone have anything else they would like to discuss?
15:13:49 <kaisers1> quick question regarding a failure i have
15:13:54 <kaisers1> pasting, sec
15:13:54 <asselin> ameade, http://eavesdrop.openstack.org/meetings/third_party/2015/third_party.2015-02-23-15.01.log.html
15:14:04 <anteaya> there are no quick questions, just questions
15:14:12 <asselin> #link my fibre channel cinder notes http://eavesdrop.openstack.org/meetings/third_party/2015/third_party.2015-02-23-15.01.log.html
15:14:19 <kaisers1> http://paste.openstack.org/show/197567/
15:14:32 <kaisers1> a failure in the cinder tests,
15:15:02 <anteaya> kaisers1: and let's have a question to go with the paste
15:15:06 <kaisers1> i'm just looking into this. Nova cannot create a floating ip, do i read that correctly?
15:15:14 <kaisers1> sorry, not the fastest typing
15:15:15 <ameade> asselin: perfect, thank you...I am gonna have to automate the PCI passthrough stuff soon
15:15:36 <asselin> kaisers1, this test is failing for us too. I disabled it and didn't look into it yet. test_minimum_basic_scenario
15:15:51 <kaisers1> asselin: oh, ok
15:16:01 <krtaylor> hi everybody, sorry I am late
15:16:11 <asselin> ameade, FYI there could be a better way to do it directly via nova pci passthrough.
15:16:15 <kaisers1> asselin: what kind of disabling do you use? don't want to clash with thingees requirements...
15:16:21 <anteaya> kaisers1: I get one hit returned on that error: http://www.gossamer-threads.com/lists/openstack/dev/23837
15:16:45 <kaisers1> anteaya: thanks for the hint!
15:16:47 <asselin> kaisers1, I use regex exclusions.
15:16:59 <kaisers1> asselin: that allowed after fridays email? :)
15:17:05 <anteaya> kaisers1: so it looks like a timing issue
15:17:10 <kaisers1> s/email/emails/
15:17:25 <anteaya> kaisers1: do you have a link to the email you reference?
15:17:27 <asselin> kaisers1, I'm not too concerned with thingee's requirements. There are bugs that need to get fixed.
15:17:37 <kaisers1> ok, thanks for the feedback :)
15:17:41 <anteaya> kaisers1: it helps if you go to lists.openstack.org and bring one back
15:17:50 <kaisers1> ok, sec
15:17:54 <anteaya> thank you
15:17:57 <rhe00> asselin: I am using your FC passthrough scripts and it is very reliable. I tried the nova flavor method as well but could not get it to work reliable with nodepool.
15:18:20 <rhe00> reliably
15:18:24 <kaisers1> one more thing from last week. The weird ResponseNotReady issue was solved
15:18:28 <asselin> thingee wants to include all 'volume' tests. We disable those that don't work and investigate them outside of CI.
15:18:39 <asselin> rhe00, great to know!
15:18:41 <anteaya> asselin: thanks for clarifying
15:18:58 <anteaya> as some would take your above comment to mean that _they_ don't need to worry about requirements
15:18:59 <kaisers1> asselin: ok, will probably go similar
15:19:03 <anteaya> which is not the case
15:19:22 <asselin> kaisers1, it's nice to know other drivers have issues. Likely means it's a real bug in the tempest test or cinder itself.
15:19:48 <kaisers1> asselin: I found a bug in tempest on friday :)
15:19:49 <asselin> kaisers1, and not necessarily the drivers
15:19:56 <asselin> kaisers1, link?
15:20:00 <kaisers1> asselin: that was fixed quickly, sec
15:20:57 <kaisers1> link takes a moment, please continue
15:21:20 <anteaya> I'm waiting on your link for the bug and for the mailing list post
15:21:28 <anteaya> unless someone else can find and post them first
15:21:40 <kaisers1> bug: https://bugs.launchpad.net/tempest/+bug/1437328
15:21:41 <openstack> Launchpad bug 1437328 in tempest "No networks found in fixed_network.py (list index out of range)" [High,Fix released] - Assigned to Matthew Treinish (treinish)
15:21:41 <anteaya> I don't have enough information myself to attempt a search
15:21:55 <anteaya> #link https://bugs.launchpad.net/tempest/+bug/1437328
15:21:57 <asselin> ok, the issue I'm running into is: test_volume_upload is causing the jenkins slave to freeze
15:22:05 <asselin> intermittently
15:22:16 <anteaya> asselin: do you have any artifacts to share?
15:22:52 <asselin> anteaya, all I have is the nova console log error message: "BUG: soft lockup - CPU#1 stuck for 22s"
15:23:09 <anteaya> nice
15:23:14 <ameade> i dont think i've hit these fwiw
15:23:42 <anteaya> asselin: I don't know that we have been seeing that in infra
15:23:53 <anteaya> that error message doesn't ring a bell with me
15:24:01 <anteaya> when did it begin?
15:24:46 <asselin> anteaya, it's been around for a while....we used to exclude that test, but now want to run it (obviously)
15:25:04 <anteaya> that is looking like a linux kernel bug
15:25:14 <anteaya> #link https://bugzilla.redhat.com/show_bug.cgi?id=442920
15:25:18 <openstack> bugzilla.redhat.com bug 442920 in kernel "BUG: soft lockup - CPU#0 stuck for 61s!" [Low,Closed: insufficient_data] - Assigned to kernel-maint
15:25:42 <kaisers1> anteaya: This is the mail link: http://lists.openstack.org/pipermail/openstack-dev/2015-March/059990.html . This goes along with some prior discussion about how tests should not be skipped via regexp. I'm not sure if that was on irc or the mailing list
15:25:42 <asselin> anteaya, yes likely either on the bare metal or nodepool image.....
15:26:00 <anteaya> #link http://lists.openstack.org/pipermail/openstack-dev/2015-March/059990.html
15:26:09 <anteaya> kaisers1: thank you
15:26:22 <anteaya> asselin: interesting
15:26:32 <asselin> anteaya, I'm trying to upgrade the blade's linux kernel to 3.16 (from 3.13) but then fc passthrough breaks
15:26:42 <anteaya> :(
15:27:10 <anteaya> just for the sake of conversation what happens if you downgrade the kernel
15:27:22 <anteaya> meaning it is just 3.13 that is causing that error
15:27:22 <asselin> anteaya, didn't try
15:27:27 * wznoinsk joins late
15:27:32 <anteaya> it might be a dumb idea
15:28:19 <krtaylor> asselin, unfortunately, that message could be a lot of different problems
15:28:29 <anteaya> asselin: here is a thread on an ubuntu forum
15:28:32 <asselin> anteaya, it's not a dumb idea. It's possible there's a regression in one of the minor release versions
15:28:33 <anteaya> #link http://ubuntuforums.org/showthread.php?t=1757773
15:28:44 <anteaya> asselin: okay well let me know if you decide to try it
15:29:00 <anteaya> asselin: you are using ubuntu, yes?
15:29:08 <asselin> anteaya, yes
15:29:17 <asselin> krtaylor, you've seen these before?
15:29:33 <anteaya> asselin: the thread seems to be suggesting a bios upgrade
15:29:36 <krtaylor> asselin, yes, early on, not in a while
15:29:47 <anteaya> asselin: I'm not sure how blades work, would a bios upgrade apply?
15:30:15 <asselin> krtaylor, which kernel versions do you use for the bare metal & jenkins slaves?
15:30:29 <asselin> anteaya, perhaps I can check to see if there's an update...
15:30:35 <krtaylor> anteaya, yes, we also saw this with hardware related problems with bringup
15:30:49 <anteaya> asselin: it might be a first step rather than downgrading the kernel
15:31:13 <anteaya> has anyone else seen this issue?
15:31:14 <krtaylor> asselin, we are not doing bare metal (yet), we run in 1st level guests just like upstream infra
15:31:50 <asselin> krtaylor, so who runs the hosts?
15:32:14 <krtaylor> asselin, but on our hosts we run a base F20/F21
15:32:51 <anteaya> asselin: let me know when you're done
15:32:59 <asselin> anteaya, i'm done
15:33:33 <anteaya> asselin: okay
15:33:59 <anteaya> kaisers1: so going back to the mailing list thread, are you one of the people who participated in the conversation?
15:35:01 <anteaya> kaisers1: did we lose you?
15:35:14 <kaisers1> anteaya: sorry, colleague asked something
15:35:27 <anteaya> are you one of the folks who participated in the thread?
15:35:44 <kaisers1> anteaya: I did not write but due to the added tests from friday our ci has two failures
15:35:52 <anteaya> okay
15:35:58 <anteaya> which failures?
15:36:07 <kaisers1> one was the exception i posted earlier
15:36:15 <anteaya> okay and the other?
15:36:29 <kaisers1> a permission issue with runtime snapshots
15:36:41 <anteaya> do you have a stack trace handy or no?
15:36:42 <kaisers1> That may relate to our driver
15:36:46 <anteaya> ah
15:36:55 <kaisers1> that's why i'm looking into it myself
15:36:57 <anteaya> well that is good you are running the tests to find that
15:37:00 <anteaya> okay great
15:37:43 <anteaya> so you have about 48 hours to get them fixed, yes?
15:38:18 <kaisers1> that's what i think. Although we're still in trunk, our CI was working ok at the deadline, just the new tests are now creating pressure
15:38:24 <anteaya> yes
15:38:35 <anteaya> and from what I read you have 48 hours
15:38:46 <anteaya> are you in a position to meet that deadline?
15:38:53 <kaisers1> 04.06., bit more than 48hours?
15:38:59 <anteaya> give or take
15:39:05 <kaisers1> ok :)
15:39:09 <anteaya> 5 days before 04/06
15:39:14 <anteaya> which is April 1
15:39:18 <anteaya> read the email
15:39:18 <kaisers1> yep
15:39:32 <kaisers1> yes, i did read that
15:39:32 <anteaya> #info you must have a CI reporting and stable
15:39:34 <anteaya> for five days prior to 4/6.
15:39:39 <anteaya> good
15:39:46 <rhe00> anteaya: I think that is only for drivers that were pulled because they didn't have a working CI at the deadline
15:39:47 <anteaya> so stay in communication
15:39:49 <kaisers1> yes
15:39:59 <anteaya> rhe00: correct
15:40:10 <anteaya> rhe00: that is also how I read that
15:40:10 <kaisers1> that's what i gathered, too. Nevertheless i want the failures to be fixed asap anyways
15:40:18 <anteaya> kaisers1: good attitude
15:40:20 <asselin> kaisers1, I think we need to get this clarified at the cinder meeting. Personally I think you should be able to have select exclusions with e.g. bugs submitted to track them.
15:40:30 <anteaya> asselin: good point
15:40:35 <rhe00> asselin: +1
15:40:40 <anteaya> has someone added an agenda item to the cinder meeting yet?
15:40:42 <kaisers1> that would be fine
15:41:13 <anteaya> someone needs to drive it for the conversation to happen
15:42:01 <kaisers1> yep. My thought was first to write a mail for clarification
15:42:13 <kaisers1> Otherwise there will be an agenda bullet added...
15:42:22 <asselin> I can post to the mailing list & ask in cinder channel. Given the deadline, probably best to do it today.
15:42:29 <anteaya> asselin: I agree
15:42:48 <anteaya> asselin: did you want to take an action item on that?
15:42:53 <asselin> sure.
15:43:25 <asselin> also I remember now some discussions in the cinder channel about it. I'll have to look that up again. I think there were clarifications that didn't make it to the mailing list.
15:43:35 <anteaya> #action asselin to post to the mailing list and ask in cinder channel to clarify select exclusions for cinder tests if they have bugs tracking them
15:43:44 <anteaya> asselin: does that sound reasonable?
15:43:49 <asselin> sure
15:43:52 <anteaya> thank you
15:44:13 <anteaya> any more on cinder tests?
15:44:31 <kaisers1> i'm done regarding that for today
15:44:36 <anteaya> kaisers1: thanks
15:44:44 <anteaya> I appreciate you bringing that up
15:44:59 <kaisers1> anteaya: pleasure
15:45:00 <anteaya> discussing issues before a deadline is much nicer than afterwards
15:45:17 <anteaya> does anyone have anything else they would like to discuss today?
15:45:51 <wznoinsk> hi all
15:45:56 <anteaya> wznoinsk: hello
15:46:06 <anteaya> wznoinsk: did you have anything you would like to discuss?
15:46:13 <wznoinsk> is anyone seeing problems with python-eventlet(greenthread) in their CIs when using testr?
15:46:44 <krtaylor> wznoinsk, we have in the past, what are you seeing?
15:46:49 <kaisers1> wznoinsk: Not currently but saw something like that some time ago
15:46:51 <anteaya> wznoinsk: have you a paste?
15:46:53 <wznoinsk> started only recently and happens in both containers and baremetal
15:47:29 <wznoinsk> http://pastebin.com/Dk2x3yMV
15:48:07 * krtaylor looks
15:48:13 <wznoinsk> I haven't yet tested whether it's related to our networking on dpdk, but it looks pretty generic (eventlet/threading like)
15:48:43 <anteaya> wznoinsk: the error contains a suggestion
15:49:01 <anteaya> wznoinsk: have you tried evaluating if implementing the suggestion is reasonable?
15:49:37 <wznoinsk> I wouldn't be able to change the code (not quickly), if nobody's seen that/similar recently I'll keep digging myself
15:49:46 <krtaylor> wznoinsk,  this looks like a race, you are running parallel tests I presume?
15:50:13 <anteaya> eventlet is not well liked right now, mostly because of the python2/3 disparity
15:50:40 <anteaya> wznoinsk: beyond that I haven't personally come across eventlet errors specific to third party cis
15:50:41 <wznoinsk> krtaylor: yes, parallel tests, but I think the error is happening on the software threads (greenthreads) and (I would imagine) that the same would happen in the production env
15:51:35 <wznoinsk> anteaya: I thought I'd share it here as when I boot vms by hand it's ok, only testr gives me that grief here
15:51:47 <anteaya> wznoinsk: of course, yes
15:51:54 <anteaya> wznoinsk: always good to share
15:52:02 <anteaya> boot vms by hand?
15:52:19 <wznoinsk> ='nova boot' or horizon
15:52:34 <anteaya> oh
15:53:16 <wznoinsk> thanks for having a look anyways
15:53:24 <anteaya> what command are you running that this is part of the output?
15:53:44 <wznoinsk> testr run tempest.scenario.test_network_basic_ops.TestNetworkBasicOps.test_network_basic_ops
15:54:01 <anteaya> is it just that test that fails?
15:54:08 <anteaya> or any test run with testr?
15:54:53 <anteaya> have you tried running just one test?
15:54:56 <wznoinsk> apparently it sometimes passes, sometimes fails due to the above error (depending when it actually happens), krtaylor is probably right about some race condition
15:55:20 <anteaya> is it the same test that fails every time?
15:55:32 <anteaya> you can run it with until failure to flush out races
15:55:40 <wznoinsk> happens for any tempest.scenario.test_network_basic_ops.TestNetworkBasicOps.*, I'm about to check other scenario tests
15:56:11 <anteaya> #info testr run --parallel --concurrency N --until-failure
15:56:19 <anteaya> wznoinsk: okay
15:56:24 <wznoinsk> I'll give it a go, thanks
15:56:27 <anteaya> would be good to get a comparison
15:56:33 <anteaya> thanks for asking, I hope you find the issue
15:56:48 <anteaya> anything more here?
15:57:41 <anteaya> anyone have anything else?
15:57:46 <anteaya> 2 minutes left
15:58:20 <anteaya> okay let's wrap up
15:58:24 <anteaya> thanks everyone
15:58:35 <kaisers1> thanks & bye
15:58:36 <anteaya> enjoy the rest of your <time-of-day>
15:58:42 <anteaya> see you next week
15:58:46 <anteaya> #endmeeting