13:01:02 <esberglu> #startmeeting powervm_driver_meeting
13:01:03 <openstack> Meeting started Tue Aug  8 13:01:02 2017 UTC and is due to finish in 60 minutes.  The chair is esberglu. Information about MeetBot at http://wiki.debian.org/MeetBot.
13:01:04 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
13:01:06 <openstack> The meeting name has been set to 'powervm_driver_meeting'
13:01:17 <mdrabe> o/
13:01:46 <edmondsw> o/
13:02:54 <esberglu> #topic In Tree Driver
13:02:57 <esberglu> #link https://etherpad.openstack.org/p/powervm-in-tree-todos
13:03:19 <esberglu> I don't think there is anything new IT
13:03:25 <edmondsw> right
13:03:49 <esberglu> #topic Out Of Tree Driver
13:04:58 <edmondsw> thorst please check 5645
13:05:29 <thorst> edmondsw: yes sir.
13:05:40 <edmondsw> I think that's all we've got going OOT at the moment
13:06:18 <esberglu> #topic PCI Passthrough
13:06:53 <esberglu> Anything new here?
13:07:06 <edmondsw> I don't think we've made any progress here yet. efried is finishing up some auth work and then we can start to make progress
13:07:39 <efried> o/
13:07:50 <efried> Yeah, what edmondsw said.
13:08:49 <esberglu> #topic PowerVM CI
13:09:22 <esberglu> Tested the devstack-generated tempest.conf one last time for all runs last night, all looked good
13:09:31 <edmondsw> great
13:09:36 <esberglu> Got the +2 from edmondsw, anyone else want to look before I merge?
13:10:19 <esberglu> Tempest bugs are getting worked through
13:10:22 <edmondsw> do we need to be opening a LP bug about those 2 tests having the same id?
13:10:30 <efried> esberglu I don't need to look again.
13:10:38 <esberglu> edmondsw: I think that it is intentional for those 2
13:10:39 <efried> If it's tested and edmondsw is happy, I'm happy.
13:10:54 <esberglu> They are the same test, just different microversions
13:11:10 <edmondsw> I'd rather we weren't having to skip a couple new tests, but that seems a small price to pay to get this in
13:11:22 <edmondsw> I hope there's a todo to figure that out and get those unskipped?
13:11:49 <esberglu> edmondsw: Yeah I was going to add it to the list once I merged
13:11:50 <edmondsw> yeah, I know it's kinda the same test... still thought they should probably have different ids but maybe not
13:12:06 <edmondsw> esberglu I'd go ahead and add it just to make sure we don't forget :)
13:12:22 <esberglu> I can disable the 2.48 version of the tests by setting the max_microversion
13:12:37 <edmondsw> I'd rather not
13:12:38 <esberglu> But I'm not familiar enough with compute microversions to know if that's really what we want
13:12:47 <esberglu> I didn't think so either
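(For context: capping the microversion as esberglu describes would be a one-line tempest.conf change. This is a sketch only — 2.47 is shown as a hypothetical cap just below the 2.48 tests in question:)

```ini
[compute]
min_microversion = 2.1
# Tests requiring a microversion above this cap are skipped,
# which would disable the 2.48 variants
max_microversion = 2.47
```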
13:12:55 <efried> Can I get some background here?
13:13:21 <efried> Two different tests testing the same thing over different microversions of the API ought to have different UUIDs.  I very much doubt that was intentional.
13:13:46 <efried> And we should be able to handle both microversions in our env.  If we can't, and that's passing in the world at large, it's our bug.
13:14:05 <edmondsw> efried check 5598
13:14:15 <esberglu> https://github.com/openstack/tempest/blob/master/tempest/api/compute/admin/test_server_diagnostics.py
13:14:52 <esberglu> I'm guessing whoever made the V248 test there just copied the original test case and didn't change the ID
13:14:54 <edmondsw> efried I expect efried is right, but I didn't look at how the test is actually written... is it one method, so one id, but run twice somehow?
13:15:12 <edmondsw> esberglu ah in that case it does sound like a bug
13:15:14 <efried> esberglu I suspect that's what happened.
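(For context: tempest tags each test with a UUID via its `idempotent_id` decorator, and copy-pasting a test without regenerating the UUID produces exactly the duplicate discussed above. A minimal self-contained sketch — the decorator here mimics `tempest.lib.decorators.idempotent_id`, the class/test names echo test_server_diagnostics.py, and the UUID value is illustrative, not the real one:)

```python
def idempotent_id(test_id):
    """Mimic tempest.lib.decorators.idempotent_id: tag a test with a UUID."""
    def decorator(f):
        f.test_id = test_id
        return f
    return decorator

class ServerDiagnosticsTest:
    @idempotent_id('11111111-2222-3333-4444-555555555555')  # illustrative UUID
    def test_get_server_diagnostics(self):
        pass

class ServerDiagnosticsV248Test:
    # Copied from the pre-2.48 test; the UUID was not regenerated
    @idempotent_id('11111111-2222-3333-4444-555555555555')
    def test_get_server_diagnostics(self):
        pass

def find_duplicate_ids(classes):
    """Scan test classes for idempotent_ids used more than once."""
    seen, dupes = {}, set()
    for cls in classes:
        for name, attr in vars(cls).items():
            tid = getattr(attr, 'test_id', None)
            if tid is not None:
                if tid in seen:
                    dupes.add(tid)
                seen[tid] = (cls.__name__, name)
    return dupes
```

(tempest itself ships a check-uuid tool that catches this kind of collision, which is presumably where an LP bug fix would land.)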
13:15:29 <esberglu> Anyways I can look into it
13:15:29 <edmondsw> esberglu open the LP bug... worst case they reject it
13:15:38 <edmondsw> tx
13:15:38 <esberglu> Yep
13:15:49 <esberglu> Other bugs...
13:16:03 <esberglu> There was a bug in tempest where the REST requests would timeout
13:16:24 <esberglu> efried made a loop to see if it was permanent or temporary
13:16:25 <esberglu> https://review.openstack.org/#/c/491003/
13:16:36 <esberglu> With that patched in we are no longer seeing that timeout
13:17:00 <esberglu> But we still need to find out what's causing the timeout and make a long term solution
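(The retry loop being discussed lives in review 491003; the sketch below is a generic illustration of the approach, not the actual patch. The helper name and parameters are hypothetical. Note it logs on every retry, matching efried's later point that the log message lets you count how often the retry actually fires:)

```python
import time

def request_with_retry(send, retries=3, delay=2):
    """Retry a REST call that intermittently times out.

    Hypothetical helper, not the code under review: `send` is any
    zero-argument callable that raises TimeoutError on a timeout.
    """
    for attempt in range(1, retries + 1):
        try:
            return send()
        except TimeoutError:
            if attempt == retries:
                raise
            # Log so the number of retries per test can be counted later
            print('request timed out (attempt %d); retrying' % attempt)
            time.sleep(delay)
```

(If the very next try always succeeds, as efried suspects, the counts would point at a transient server-side condition rather than a hard failure.)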
13:17:12 <edmondsw> ++
13:17:29 <esberglu> hsien got to the bottom of the internal server error 500s
13:17:37 <efried> oh, do tell
13:18:22 <edmondsw> sweet
13:18:35 <edmondsw> 5657
13:18:49 <esberglu> There was an issue where the VIOS busy rc wasn't being honored and retried
13:19:09 <efried> btw, that loop fixup should have logged a message when we hit it.  We should look for that log message and see how many times it hits per test.  I suspect the very next try went through.  Which probably means it's a threading problem at the server side of that call.
13:19:49 <esberglu> efried: Will do
13:20:45 <efried> esberglu Another experiment that might be worthwhile is knocking our threading level down.  It's possible we're just timing out due to load.
13:21:08 <efried> Though... it seems like it would always hit on one or more of the same three or four tests, nah?
13:21:30 <esberglu> efried: Yeah same handful of tests
13:22:19 <edmondsw> esberglu you also had something about discover_hosts on the agenda?
13:22:29 <edmondsw> did we get that all straight?
13:22:37 <edmondsw> looks like the CI has been better
13:22:41 <esberglu> edmondsw: Was just going to say that our fix is working there
13:22:52 <edmondsw> awesome
13:22:53 <esberglu> Yep with that and efried's retry loop success rates are up
13:23:14 <esberglu> hsien's fix is +2 so should be in soon, then I will update the systems
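(For context: the discover_hosts issue refers to nova cells v2 host discovery. New computes must be mapped to a cell either manually with `nova-manage cell_v2 discover_hosts` or via the scheduler's periodic discovery. A nova.conf sketch — the interval value is illustrative:)

```ini
[scheduler]
# Have the scheduler run host discovery periodically (seconds);
# -1 (the default) disables it
discover_hosts_in_cells_interval = 300
```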
13:23:34 <efried> edmondsw It needs to be noted that the retry loop is in tempest code, not our code.
13:23:58 <efried> So it's not a long-term fix (unless we can make the case that it should be submitted to tempest itself).
13:24:22 <edmondsw> efried right, we need to figure out what's going on there and how to fix it permanently
13:24:32 <efried> Yeah, cause I don't think it's a good idea for us to be running long-term with a tempest patch.
13:24:38 <edmondsw> ++
13:24:42 <esberglu> ++
13:24:56 <edmondsw> that on the todo list, esberglu?
13:25:08 <edmondsw> at the top? :)
13:25:44 <esberglu> edmondsw: I need to do an update of the list after the meeting but yeah it will be
13:25:50 <edmondsw> cool
13:25:57 <edmondsw> I was going to ask about http://184.172.12.213/92/474892/6/check/nova-in-tree-pvm/2922a78/
13:26:19 <edmondsw> I'm pretty sure I've seen that kind of failure before... but can't remember where it ended up
13:26:57 <esberglu> edmondsw: Yeah I saw that. I think when I removed a bunch of tests from the skip list with the networking api extension change some may have introduced new issues
13:27:19 <esberglu> I know we have had those before, can't remember what our solution was
13:27:20 <edmondsw> ok, that makes sense. cuz I thought we'd fixed that, but it was probably with a skip
13:28:04 <esberglu> edmondsw: IIRC it's an issue with tests interfering with each other
13:29:02 <esberglu> That's all for CI
13:29:07 <esberglu> #topic Driver Testing
13:29:33 <esberglu> Any progress?
13:30:03 <edmondsw> I opened RTC stories for testing
13:30:40 <edmondsw> I ordered them such that we'd validate vSCSI, FC, and LPM with the OOT driver before coming back to iSCSI
13:30:47 <edmondsw> give us some time to do the dev work on iSCSI
13:31:23 <edmondsw> don't see jay1_ on to discuss further
13:31:30 <edmondsw> chhavi fyi ^
13:33:02 <esberglu> #topic Open Discussion
13:33:11 <esberglu> Any last words?
13:33:30 <edmondsw> I finally got devstack working! ;)
13:33:43 <edmondsw> so there are a bunch of additions to https://etherpad.openstack.org/p/powervm_stacking_issues
13:33:47 <esberglu> Woohoo!
13:34:26 <edmondsw> that last one was really weird... hope that's really the fix, and it wasn't just coincidence that it worked after that
13:34:53 <edmondsw> I'm pretty sure it's legit
13:35:09 <edmondsw> that's it from me
13:35:29 <esberglu> Thanks for joining
13:35:32 <esberglu> #endmeeting