14:01:25 #startmeeting powervm_driver_meeting
14:01:26 Meeting started Tue Feb 20 14:01:25 2018 UTC and is due to finish in 60 minutes. The chair is esberglu. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:01:27 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:01:30 The meeting name has been set to 'powervm_driver_meeting'
14:03:11 edmondsw: efried: You guys around? Sorry, didn't realize the recurring meeting notices were done
14:03:30 oh, yeah, I forgot because no calendar entry.
14:03:35 But I'm here.
14:04:22 #topic In-tree Driver
14:04:59 The bp looked good to me, I think edmondsw is gonna add the interface hotplug stuff still?
14:06:02 Only other open IT item for me is snapshot
14:06:37 Hoping we can get vSCSI, snapshot and localdisk in pretty early in the cycle
14:07:04 Because the migration stuff is gonna require a lot more work, both coding and testing
14:07:09 And getting CI up to speed
14:09:06 yuh
14:09:09 #topic Out-of-tree Driver
14:09:31 edmondsw and I learned things about MagicMock and one of our test cases yesterday.
14:10:08 about the former: it doesn't blow up when you try to iterate on it (Mock does). And about the latter: we weren't testing what we thought we were testing. Which happens a lot.
14:12:09 efried: You mean we have a bunch of tests not doing what we thought?
14:12:40 Probably. But at least this one for sure.
14:13:18 Interested in the recap?
14:13:44 efried: Yeah
14:14:20 This was for driver capabilities. We've been setting has_imagecache dynamically, because whether the driver supports it depends on which disk driver we're using.
14:14:25 Localdisk supports it, the others don't.
14:14:52 So we had a test that said assertFalse(driver.capabilities['has_imagecache'])
14:15:18 which seems reasonable enough
14:15:42 The dynamic setting used to be code in the driver that did something like:
14:15:42 self.capabilities.update(disk_drv.capabilities)
14:16:05 i.e. "take the stuff in the disk_drv.capabilities dict and overwrite/add it into self.capabilities"
14:16:42 edmondsw wasn't happy with the "add" part, because it meant we were pulling capability keys into our compute driver that aren't part of the ComputeDriver interface.
14:17:11 yep
14:17:14 So to get rid of that, he changed it to be explicit about just pulling in has_imagecache:
14:17:14 self.capabilities['has_imagecache'] = disk_drv.capabilities['has_imagecache']
14:17:27 Whereupon the above assertion started to fail.
14:17:40 And when he changed it to assertTrue, it passed.
14:17:50 Which should not happen.
14:19:34 If we were testing this properly, both of those modes of assignment should have been equivalent in terms of the has_imagecache key. So it should have been the same before & after. Which in fact should have been a signal that the code change was valid.
14:20:34 But the code change is valid, so we realized it was a problem with our test
14:20:57 Well, in broad terms, yeah. But we couldn't suss out what the problem could possibly be.
14:21:21 The first thing I suspected was that one or both of those .capabilities was a Mock. But that didn't make sense, because you get an exception if you try to [key access] or .update a Mock.
14:21:31 So either code path should have raised an exception.
14:22:03 But it turns out that the disk_drv.capabilities dict was a *Magic*Mock
14:23:14 And MagicMock, in contrast with Mock, has default impls for magic methods. Like __getitem__ (returns a mock) and __iter__ (yields as if an empty list).
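
A minimal, runnable sketch of the Mock vs. MagicMock behavior described above. The dict contents and variable names are illustrative only, not the actual nova-powervm driver or test code:

    # Illustrative sketch only -- not the actual nova-powervm code.
    from unittest import mock

    capabilities = {'has_imagecache': False}

    # Plain Mock: no magic-method defaults, so both code paths blow up.
    plain = mock.Mock()
    try:
        capabilities.update(plain)       # can't iterate a Mock
    except TypeError:
        pass
    try:
        plain['has_imagecache']          # Mock has no __getitem__
    except TypeError:
        pass

    # MagicMock: magic methods are stubbed, so neither path raises.
    magic = mock.MagicMock()
    capabilities.update(magic)           # __iter__ acts like an empty list; dict unchanged
    assert capabilities['has_imagecache'] is False

    capabilities['has_imagecache'] = magic['has_imagecache']  # __getitem__ returns a mock
    assert bool(capabilities['has_imagecache']) is True       # and a mock is truthy
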
14:24:27 So when we were doing .update(), we were updating the driver's capabilities - where has_imagecache is defaulted to False - from an empty list of disk driver capabilities. Which left it False.
14:24:48 But when we did the assignment, we were assigning a mock to the value. And a mock evaluates to True.
14:25:34 Ah okay I see
14:26:39 I'm not seeing where in the code the capabilities are a MagicMock vs Mock
14:28:41 Yeah, that was tricky to track down. The driver is a real driver, which we were doing on purpose so we could get "real" values in the capabilities dict
14:28:42 BUT
14:29:02 the setUp for this test class is installing a fixture that actually mocks the disk driver constructor
14:29:37 So we're getting a real compute driver, but when it goes to create the disk driver, that mock patch takes over and we get a MagicMock.
14:30:29 Okay I follow
14:32:23 What's next?
14:32:40 efried: How do we fix it?
14:32:53 Don't let the disk driver get mocked.
14:33:24 I suggested moving the test case up to the first class in the suite, which doesn't do any of the fixture stuff.
14:33:57 And also adding test cases where we use different disk drivers to make sure it's True only for LocalStorage.
14:34:22 And also adding assertions to ensure that the extraneous keys aren't being created.
14:34:57 Sounds good
14:35:10 #topic Device Passthrough
14:35:12 IOW, what I'm sure edmondsw thought was going to be an ace across the net is rapidly metastasizing
14:35:58 Yep sounds that way
14:35:59 No news from me on dev passthrough. We're frantically writing & reviewing specs ahead of the PTG.
14:36:08 I abandoned that resource class affinity spec.
14:36:21 And put up this new one (unrelated) last night: https://review.openstack.org/#/c/546009/
14:38:24 #topic PowerVM CI
14:38:31 I believe I'm still waiting on prompts from edmondsw et al for next steps on dev passthrough.
14:38:38 sorry, I'm a step behind you all day.
14:38:45 np
14:39:23 So the big thing for CI for queens is getting multinode jobs so we can do migration testing
14:39:39 I haven't looked into that at all really in the past, so I need to do some research there
14:39:40 Yeah. Big fun times.
14:39:55 I can tell you this: I will be of absolutely no help.
14:40:21 I have managed to go my entire career without ever migrating a partition
14:40:47 Huh, would have bet against that :)
14:41:22 The other thing for CI I wanted to bring up is that stale vopt issue we were seeing
14:41:46 Since fixing up the post-stack cleaner, that has gone WAY down. However it did hit a handful of times in the last week
14:42:21 So we either need to re-explore that vopt naming change or try to figure out where the vopts are coming from now
14:42:22 interesting.
14:42:54 Did you manually clean up (or check) any stale vopts that were lying around from before we fixed the cleaner?
14:43:08 efried: Yes
14:43:52 So we're somehow still ending up with stale ones.
14:44:31 Yep, looks that way
14:44:36 I wonder if it could be timing issues with multiple tests (possibly even from multiple patches) running at once.
14:45:05 Like, the offending vopt isn't actually stale; it's just in the process of being created/mapped or unmapped/deleted when the other test hits this code path.
14:45:18 ...and it's still the naming issue.
14:45:43 efried: If that were the case I don't think the post-stack cleaner update would have cut down the occurrences so much
14:46:13 No, I'm saying both issues that we talked about were problems. The post-stack cleaner took care of most of them, but not all.
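
Going back to the capabilities test fix suggested earlier (real, unmocked disk drivers; has_imagecache True only for the local disk driver; no extraneous keys): a minimal sketch of what those test cases might look like. The classes below are hypothetical stand-ins, not the actual nova or nova-powervm compute and disk driver classes:

    # Hypothetical stand-ins only -- not the real nova-powervm classes.
    import unittest

    class LocalDisk(object):
        capabilities = {'has_imagecache': True}

    class SSPDisk(object):
        capabilities = {'has_imagecache': False}

    class ComputeDriver(object):
        def __init__(self, disk_driver_cls):
            # Defaults per the (hypothetical) compute driver interface.
            self.capabilities = {'has_imagecache': False,
                                 'supports_migrate': True}
            disk_drv = disk_driver_cls()
            # Pull in only the key we care about, rather than update()-ing
            # in every key the disk driver happens to define.
            self.capabilities['has_imagecache'] = (
                disk_drv.capabilities['has_imagecache'])

    class TestCapabilities(unittest.TestCase):
        """Uses real disk drivers -- no fixture mocking the constructor."""

        def test_imagecache_per_disk_driver(self):
            self.assertTrue(
                ComputeDriver(LocalDisk).capabilities['has_imagecache'])
            self.assertFalse(
                ComputeDriver(SSPDisk).capabilities['has_imagecache'])

        def test_no_extraneous_keys(self):
            # Only keys from the compute driver interface should be present.
            self.assertEqual(
                {'has_imagecache', 'supports_migrate'},
                set(ComputeDriver(LocalDisk).capabilities))
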
14:46:19 Unless... is it possible the cleaner fix didn't get deployed to whatever server hit the problem?
14:46:38 efried: Nope, the latest of that gets pulled every run
14:47:50 How often is this happening? Did we just see this once in a little blip and never again?
14:49:31 7 times since I left on wednesday
14:49:48 Where it was a handful a day before the cleaner fix
14:51:09 Either way, needs further debugging. I will look into it and let you know if I get stuck
14:51:12 Spread out across multiple nodes?
14:51:19 efried: Yep
14:51:23 I was thinking we could reinstate the naming fix and see if it clears up entirely.
14:52:30 efried: And if it doesn't, that points to a possible timing issue
14:52:36 I'm on board with that
14:53:11 ight. will look for that review. I need to un-abandon the renaming patch, eh?
14:53:14 * efried looks...
14:53:48 Done: https://review.openstack.org/#/c/540453/
14:54:40 efried: Cool I'll put up the change to patch it in right after this
14:55:01 efried: That's all for me today, anything else?
14:55:42 don't think so.
14:55:49 #endmeeting