15:00:23 #startmeeting third-party
15:00:25 Meeting started Mon Mar 30 15:00:23 2015 UTC and is due to finish in 60 minutes. The chair is anteaya. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:00:27 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:00:29 The meeting name has been set to 'third_party'
15:00:32 hello
15:00:38 hey
15:00:48 raise your hand if you are here for the third party meeting
15:00:50 ameade: hello
15:00:59 ameade: how is your work coming along?
15:01:08 hi
15:01:13 kaisers1: hello
15:01:17 anteaya: very well thanks :)
15:01:24 ameade: I'm glad to hear that
15:01:30 ameade: what is your current status?
15:01:37 hi everyone
15:01:44 patrickeast: hi patrickeast
15:02:47 anteaya: have been posting on Cinder for most of our drivers for months now but have been setting up another system for our Fibre Channel stuff which has been interesting
15:03:00 yes, I've been following along
15:03:11 which is why I am giving you priority right now at the meeting
15:03:20 unless you prefer that I don't
15:03:55 just hanging out in case I can help other folks :)
15:04:21 oh okay
15:04:26 that is good too
15:04:27 welcome
15:04:33 how is everyone today?
15:05:09 hi
15:05:14 hey asselin
15:05:26 does anyone have anything they would like to discuss?
15:06:00 ameade, I posted our fc cinder testing notes back a few meetings ago
15:06:06 ameade, were you there?
15:06:24 asselin: I don't think so
15:07:30 asselin: Do you do one job at a time against an FC switch or have you turned zoning off?
15:07:59 ameade, we have zoning off
15:08:37 gotcha, i'm currently running one at a time with single concurrency, but to scale i'm going to have to turn it off too
15:08:58 i do think I found a bug with the synchronization in the brocade code though
15:10:17 ameade, please share / submit a bug report. we're planning to test with the zone manager eventually
15:10:37 asselin: yeah definitely, i'm going to dig into it once i get this other hardware set up
15:12:27 asselin: thanks for bringing this up
15:12:36 is there more to discuss here?
15:13:35 does anyone have anything else they would like to discuss?
15:13:49 quick question regarding a failure i have
15:13:54 pasting, sec
15:13:54 ameade, http://eavesdrop.openstack.org/meetings/third_party/2015/third_party.2015-02-23-15.01.log.html
15:14:04 there are no quick questions, just questions
15:14:12 #link my fibre channel cinder notes http://eavesdrop.openstack.org/meetings/third_party/2015/third_party.2015-02-23-15.01.log.html
15:14:19 http://paste.openstack.org/show/197567/
15:14:32 a failure in the cinder tests,
15:15:02 kaisers1: and let's have a question to go with the paste
15:15:06 i'm just looking into this. Nova cannot create a floating ip, do i read that correctly?
15:15:14 sorry, not the fastest typing
15:15:15 asselin: perfect, thank you...I am gonna have to automate the PCI passthrough stuff soon
15:15:36 kaisers1, this test is failing for us too. I disabled it and didn't look into it yet. test_minimum_basic_scenario
15:15:51 asselin: oh, ok
15:16:01 hi everybody, sorry I am late
15:16:11 ameade, FYI there could be a better way to do it directly via nova pci passthrough.
15:16:15 asselin: what kind of disabling do you use? don't want to clash with thingee's requirements...
15:16:21 kaisers1: I get one hit returned on that error: http://www.gossamer-threads.com/lists/openstack/dev/23837
15:16:45 anteaya: thanks for the hint!
15:16:47 kaisers1, I use regex exclusions.
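
[A minimal sketch of the regex-exclusion approach asselin mentions above, assuming a testr-based tempest run like the ones discussed in this meeting. The filter pattern and the excluded test names (taken from tests named elsewhere in this log) are illustrative only, not anyone's actual CI configuration:

    # Hedged example: run the volume/scenario tempest tests but skip two tests
    # under investigation, using a negative lookahead in the testr filter regex.
    # The excluded test names are placeholders for whatever bugs are being tracked.
    testr run --parallel \
      '(?!.*(test_minimum_basic_scenario|test_volume_upload))(^tempest\.(api\.volume|scenario))'
]
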
15:16:59 asselin: that allowed after friday's email? :)
15:17:05 kaisers1: so it looks like a timing issue
15:17:10 s/email/emails/
15:17:25 kaisers1: do you have a link to the email you reference?
15:17:27 kaisers1, I'm not too concerned with thingee's requirements. There are bugs that need to get fixed.
15:17:37 ok, thanks for the feedback :)
15:17:41 kaisers1: it helps if you go to lists.openstack.org and bring one back
15:17:50 ok, sec
15:17:54 thank you
15:17:57 asselin: I am using your FC passthrough scripts and it is very reliable. I tried the nova flavor method as well but could not get it to work reliable with nodepool.
15:18:20 reliably
15:18:24 one more thing from last week. The weird ResponseNotReady issue was solved
15:18:28 thingee wants to include all 'volume' tests. We disable those that don't work and investigate them outside of CI.
15:18:39 rhe00, great to know!
15:18:41 asselin: thanks for clarifying
15:18:58 as some would take your above comment to mean that _they_ don't need to worry about requirements
15:18:59 asselin: ok, will probably go similar
15:19:03 which is not the case
15:19:22 kaisers1, it's nice to know other drivers have issues. Likely means it's a real bug in the tempest test or cinder itself.
15:19:48 asselin: I found a bug in tempest on friday :)
15:19:49 kaisers1, and not necessarily the drivers
15:19:56 kaisers1, link?
15:20:00 asselin: that was fixed quickly, sec
15:20:57 link takes a moment, please continue
15:21:20 I'm waiting on your link for the bug and for the mailing list post
15:21:28 unless someone else can find and post them first
15:21:40 bug: https://bugs.launchpad.net/tempest/+bug/1437328
15:21:41 Launchpad bug 1437328 in tempest "No networks found in fixed_network.py (list index out of range)" [High,Fix released] - Assigned to Matthew Treinish (treinish)
15:21:41 I don't have enough information myself to attempt a search
15:21:55 #link https://bugs.launchpad.net/tempest/+bug/1437328
15:21:57 ok, the issue I'm running into is: test_volume_upload is causing the jenkins slave to freeze
15:22:05 intermittently
15:22:16 asselin: do you have any artifacts to share?
15:22:52 anteaya, all I have is the nova console log error message: "BUG: soft lockup - CPU#1 stuck for 22s"
15:23:09 nice
15:23:14 i don't think i've hit these fwiw
15:23:42 asselin: I don't know as we have been seeing that in infra
15:23:53 that error message doesn't ring a bell with me
15:24:01 when did it begin?
15:24:46 anteaya, it's been around for a while....we used to exclude that test, but now want to run it (obviously)
15:25:04 that is looking like a linux kernel bug
15:25:14 #link https://bugzilla.redhat.com/show_bug.cgi?id=442920
15:25:18 bugzilla.redhat.com bug 442920 in kernel "BUG: soft lockup - CPU#0 stuck for 61s!" [Low,Closed: insufficient_data] - Assigned to kernel-maint
15:25:42 anteaya: This is the mail link: http://lists.openstack.org/pipermail/openstack-dev/2015-March/059990.html . This goes along with some prior discussion that tests should not be skipped via regexp. I'm not sure if that was on irc or the mailing list
15:25:42 anteaya, yes likely either on the bare metal or nodepool image.....
15:26:00 #link http://lists.openstack.org/pipermail/openstack-dev/2015-March/059990.html
15:26:09 kaisers1: thank you
15:26:22 asselin: interesting
15:26:32 anteaya, I'm trying to upgrade the blade's linux kernel to 3.16 (from 3.13) but then fc passthrough breaks
15:26:42 :(
15:27:10 just for the sake of conversation what happens if you downgrade the kernel
15:27:22 meaning it is just 3.13 that is causing that error
15:27:22 anteaya, didn't try
15:27:27 * wznoinsk joins late
15:27:32 it might be a dumb idea
15:28:19 asselin, unfortunately, that message could be a lot of different problems
15:28:29 asselin: here is a thread on an ubuntu forum
15:28:32 anteaya, it's not a dumb idea. It's possible there's a regression in one of the minor release versions
15:28:33 #link http://ubuntuforums.org/showthread.php?t=1757773
15:28:44 asselin: okay well let me know if you decide to try it
15:29:00 asselin: you are using ubuntu, yes?
15:29:08 anteaya, yes
15:29:17 krtaylor, you've seen these before?
15:29:33 asselin: the thread seems to be suggesting a bios upgrade
15:29:36 asselin, yes, early on, not in a while
15:29:47 asselin: I'm not sure how blades work, would a bios upgrade apply?
15:30:15 krtaylor, which kernel versions do you use for the bare metal & jenkins slaves?
15:30:29 anteaya, perhaps I can check to see if there's an update...
15:30:35 anteaya, yes, we also saw this with hardware related problems with bringup
15:30:49 asselin: it might be a first step rather than downgrading the kernel
15:31:13 has anyone else seen this issue?
15:31:14 asselin, we are not doing bare metal (yet), we run in 1st level guests just like upstream infra
15:31:50 krtaylor, so who runs the hosts?
15:32:14 asselin, but on our hosts we run a base F20/F21
15:32:51 asselin: let me know when you're done
15:32:59 anteaya, i'm done
15:33:33 asselin: okay
15:33:59 kaisers1: so going back to the mailing list thread, are you one of the people who participated in the conversation?
15:35:01 kaisers1: did we lose you?
15:35:14 anteaya: sorry, a colleague asked something
15:35:27 are you one of the folks who participated in the thread?
15:35:44 anteaya: I did not write in the thread, but due to the added tests from friday our ci has two failures
15:35:52 okay
15:35:58 which failures?
15:36:07 one was the exception i posted earlier
15:36:15 okay and the other?
15:36:29 a permission issue with runtime snapshots
15:36:41 do you have a stack trace handy or no?
15:36:42 That may relate to our driver
15:36:46 ah
15:36:55 that's why i'm looking into it myself
15:36:57 well that is good you are running the tests to find that
15:37:00 okay great
15:37:43 so you have about 48 hours to get them fixed, yes?
15:38:18 that's what i think. Although we're still in trunk, our CI was working ok at the deadline, just the new tests are now creating pressure
15:38:24 yes
15:38:35 and from what I read you have 48 hours
15:38:46 are you in a position to meet that deadline?
15:38:53 04.06., a bit more than 48 hours?
15:38:59 give or take
15:39:05 ok :)
15:39:09 5 days before 04/06
15:39:14 which is April 1
15:39:18 read the email
15:39:18 yep
15:39:32 yes, i did read that
15:39:32 #info you must have a CI reporting and stable
15:39:34 for five days prior to 4/6.
15:39:39 good
15:39:46 anteaya: I think that is only for drivers that were pulled because they didn't have a working CI at the deadline
15:39:47 so stay in communication
15:39:49 yes
15:39:59 rhe00: correct
15:40:10 rhe00: that is also how I read that
15:40:10 that's what i gathered, too. Nevertheless i want the failures to be fixed asap anyway
15:40:18 kaisers1: good attitude
15:40:20 kaisers1, I think we need to get this clarified at the cinder meeting. Personally I think you should be able to have select exclusions with e.g. bugs submitted to track them.
15:40:30 asselin: good point
15:40:35 asselin: +1
15:40:40 has someone added an agenda item to the cinder meeting yet?
15:40:42 that would be fine
15:41:13 someone needs to drive it for the conversation to happen
15:42:01 yep. My thought was first to write a mail for clarification
15:42:13 Otherwise there will be an agenda bullet added...
15:42:22 I can post to the mailing list & ask in cinder channel. Given the deadline, probably best to do it today.
15:42:29 asselin: I agree
15:42:48 asselin: did you want to take an action item on that?
15:42:53 sure.
15:43:25 also I remember now some discussions in the cinder channel about it. I'll have to look that up again. I think there were clarifications that didn't make it to the mailing list.
15:43:35 #action asselin to post to the mailing list and ask in cinder channel to clarify select exclusions for cinder tests if they have bugs tracking them
15:43:44 asselin: does that sound reasonable?
15:43:49 sure
15:43:52 thank you
15:44:13 any more on cinder tests?
15:44:31 i'm done regarding that for today
15:44:36 kaisers1: thanks
15:44:44 I appreciate you bringing that up
15:44:59 anteaya: pleasure
15:45:00 discussing issues before a deadline is much nicer than afterwards
15:45:17 does anyone have anything else they would like to discuss today?
15:45:51 hi all
15:45:56 wznoinsk: hello
15:46:06 wznoinsk: did you have anything you would like to discuss?
15:46:13 is anyone seeing problems with python-eventlet (greenthread) in their CIs when using testr?
15:46:44 wznoinsk, we have in the past, what are you seeing?
15:46:49 wznoinsk: Not currently but saw something like that some time ago
15:46:51 wznoinsk: have you a paste?
15:46:53 started only recently and happens in both containers and baremetal
15:47:29 http://pastebin.com/Dk2x3yMV
15:48:07 * krtaylor looks
15:48:13 yet, I've not yet tested whether it's related to our networking on dpdk but looks pretty generic (eventlet/threading like)
15:48:43 wznoinsk: the error contains a suggestion
15:49:01 wznoinsk: have you tried evaluating if implementing the suggestion is reasonable?
15:49:37 I wouldn't be able to change the code (not quickly), if nobody's seen that/similar recently I'll keep digging myself
15:49:46 wznoinsk, this looks like a race, you are running parallel tests I presume?
15:50:13 eventlet is not well liked right now, mostly because of the python2/3 disparity
15:50:40 wznoinsk: beyond that I haven't personally come across eventlet errors specific to third party CIs
15:50:41 krtaylor: yes, parallel tests, but I think the error is happening on the software threads (greenthreads) and (I would imagine) that the same would happen in the production env
15:51:35 anteaya: I thought I'd share it here as when I boot vms by hand it's ok, only testr gives me that grief here
15:51:47 wznoinsk: of course, yes
15:51:54 wznoinsk: always good to share
15:52:02 boot vms by hand?
15:52:19 ='nova boot' or horizon
15:52:34 oh
15:53:16 thanks for having a look anyways
15:53:24 what command are you running that this is part of the output?
15:53:44 testr run tempest.scenario.test_network_basic_ops.TestNetworkBasicOps.test_network_basic_ops
15:54:01 is it just that test that fails?
15:54:08 or any test run with testr?
15:54:53 have you tried running just one test?
15:54:56 apparently it sometimes passes, sometimes fails due to the above error (depending when it actually happens), krtaylor is probably right about some race condition
15:55:20 is it the same test that fails every time?
15:55:32 you can run it with --until-failure to flush out races
15:55:40 happens for any tempest.scenario.test_network_basic_ops.TestNetworkBasicOps.*, I'm about to check other scenario tests
15:56:11 #info testr run --parallel --concurrency N --until-failure
15:56:19 wznoinsk: okay
15:56:24 I'll give it a go, thanks
15:56:27 would be good to get a comparison
15:56:33 thanks for asking, I hope you find the issue
15:56:48 anything more here?
15:57:41 anyone have anything else?
15:57:46 2 minutes left
15:58:20 okay let's wrap up
15:58:24 thanks everyone
15:58:35 thanks & bye
15:58:36 enjoy the rest of your
15:58:42 see you next week
15:58:46 #endmeeting
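
[A concrete form of the #info command noted at 15:56:11, using the test id wznoinsk pasted earlier; the concurrency value is an arbitrary illustration, not a recommendation from the meeting:

    # Repeat the flaky scenario test in parallel until it fails, to flush out
    # the suspected race; a concurrency of 4 is only an example value.
    testr run --parallel --concurrency 4 --until-failure \
      tempest.scenario.test_network_basic_ops.TestNetworkBasicOps.test_network_basic_ops
]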