14:01:30 <edmondsw> #startmeeting PowerVM Driver Meeting
14:01:32 <openstack> Meeting started Tue Apr 10 14:01:30 2018 UTC and is due to finish in 60 minutes.  The chair is edmondsw. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:01:33 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
14:01:35 <openstack> The meeting name has been set to 'powervm_driver_meeting'
14:01:40 <edmondsw> #link agenda: https://etherpad.openstack.org/p/powervm_driver_meeting_agenda
14:01:52 <edmondsw> #topic In-Tree Driver
14:02:04 <edmondsw> #link https://etherpad.openstack.org/p/powervm-in-tree-todos
14:02:30 <edmondsw> esberglu update on IT status?
14:03:00 <esberglu> edmondsw: Everything before localdisk is ready for core review
14:03:36 <esberglu> efried: I've responded to all your comments on localdisk except
14:03:38 <esberglu> https://review.openstack.org/#/c/549300/18/nova/virt/powervm/disk/localdisk.py@122
14:03:44 <esberglu> Wasn't sure exactly what you meant there
14:04:45 <edmondsw> hotplug merged, so I've removed that from the todo etherpad
14:04:46 <esberglu> jichenjc left a few more comments that I haven't hit yet
14:05:04 <efried> shall we talk about that comment now?
14:05:13 <edmondsw> go ahead
14:06:14 <efried> We're building a ftsk (FeedTask), which is supposed to have a list of VIOSes in it as getters rather than wrappers, which is supposed to allow us to defer the retrieval of the VIOS(es) until we want to do the actual work, which is supposed to minimize the window for conflicts.
14:06:42 <efried> But we're doing things here that are eliminating those benefits.
14:08:23 <efried> First off, there's only one VIOS we care about, and we already know which one it is (self._vios_uuid or whatever).  So using build_active_vio_feed_task - which goes out and tries to figure out which of all the VIOSes are "active" (RMC up) and stuffs all of those into the ftsk - will only *hopefully* include that VIOS, and may very well include the other(s) that we don't care about.
14:08:51 <edmondsw> agree that we only care about one VIOS
14:08:52 <efried> Second, L137 accesses the .wrapper @property, which prefetches the wrappers in the ftsk, so we're not getting the benefit of deferring that fetch.
14:09:24 <edmondsw> __init__ only sets self._vios_uuid, it does not cache the vios_w, so we do need a way to get vios_w
14:09:48 <edmondsw> and we do need to get it to make sure we have the latest info there, right?
14:09:58 <efried> The main benefit I forgot to mention is running the subtasks in parallel across the VIOSes.  Which is n/a here since there is (should be) only one VIOS we care about.
14:10:12 <esberglu> efried: So what you're proposing is that instead of adding the rm_func to the stg_ftsk we would just call tsk_map.remove_maps
14:10:20 <esberglu> Directly after find_maps?
14:10:35 <edmondsw> so do we need a different stg_ftsk that only retrieves one vios, or do we need to get the vios without a feedtask?
14:10:48 <efried> I'm saying using a ftsk at all in this method is overkill.
14:10:51 <efried> unnecessary.
14:11:08 <efried> esberglu: Let me look; it's possible remove_maps already returns the maps that get removed.
14:11:27 <edmondsw> are feedtasks only relevant when you're dealing with lists, and not singletons?
14:11:39 <edmondsw> feed = list?
14:12:07 <efried> ...which means we could have probably extracted those results out of the ftsk after execute.
14:13:07 <efried> edmondsw: No, that's not the only advantage.  Doing multiple operations, reverting stuff, etc. (FeedTask is a derivative of TaskFlow)
14:13:33 <efried> ...yup, remove_maps already returns the list of maps removed.
14:13:58 <efried> I haven't looked, but I suspect the current code is an artifact of slot manager garbage from OOT, and we're going to have to re-complexify it later when we put that shit back in.
14:14:05 <efried> but for now, we can make this way simpler.
14:14:20 <esberglu> efried: So rip out the stg_ftsk & rm_func stuff, rip out find_maps
14:14:30 <esberglu> And just have
14:14:32 <esberglu> vios_w = stg_ftsk.wrapper_tasks[self._vios_uuid].wrapper
14:14:53 <edmondsw> you just said rip out stg_ftsk, so that won't work
14:15:14 <esberglu> Oh right
14:15:16 <efried> No stg_ftsk.  Retrieve the VIOS wrapper afresh based on self.vios_uuid
14:15:25 <edmondsw> yep
14:15:26 <efried> ...using the SCSI xag
14:15:31 <efried> VIO_SMAP
14:15:53 <edmondsw> and then tsk_map.remove_maps on it
14:15:59 <edmondsw> and done
14:16:09 <esberglu> Okay got it
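For reference, the simplified path agreed above (fetch the one VIOS afresh, remove the maps, update) might look roughly like this in the driver. This is a sketch only, assuming current pypowervm APIs; `lpar_uuid` and `match_func` stand in for whatever the method already has in scope, and it is not runnable outside a live PowerVM environment:

```python
from pypowervm import const as pvm_const
from pypowervm.tasks import scsi_mapper as tsk_map
from pypowervm.wrappers import virtual_io_server as pvm_vios

# Fetch only the one VIOS we care about, with just the SCSI-mapping xag,
# instead of building a FeedTask over all "active" VIOSes.
vios_w = pvm_vios.VIOS.get(
    self._adapter, uuid=self._vios_uuid, xag=[pvm_const.XAG.VIO_SMAP])
# remove_maps returns the list of mappings it removed, so no separate
# find_maps pass is needed.
removed_maps = tsk_map.remove_maps(vios_w, lpar_uuid, match_func=match_func)
vios_w.update()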
14:16:45 <edmondsw> esberglu for vscsi, we already did this right?  "Add a follow on to use pypowervm 1.1.12 for wwpns"
14:16:53 <edmondsw> so I'm removing that from TODO etherpad
14:17:02 <esberglu> edmondsw: Yeah
14:17:25 <edmondsw> has anything else merged other than netw hotplug,
14:17:37 <edmondsw> or does anything have comments we need to address, other than localdisk?
14:17:43 <esberglu> edmondsw: Nope, nothing has been reviewed, still a few things ahead of us in runways
14:17:52 <edmondsw> yep
14:18:02 <edmondsw> any updates on migrate/resize?
14:18:27 <esberglu> Gonna finish up localdisk today and get it ready for review, then jump back into that
14:18:36 <edmondsw> cool
14:18:36 <esberglu> Ready for core review
14:18:51 <edmondsw> ok, anything else IT?
14:19:08 <esberglu> Are there any system requirements for SDE installs? My install failed
14:19:23 <esberglu> And I need that to test localdisk snapshot
14:19:53 <efried> (esberglu: Just noticed gaffe in the commit message)
14:20:14 <edmondsw> esberglu thinking... but check with seroyer
14:21:01 <edmondsw> I think there are some local disk size requirements?
14:22:17 <esberglu> edmondsw: I'll ask and try again, might also see if anyone can loan me a system for a couple days
14:22:37 <edmondsw> #topic Out-of-Tree Driver
14:22:47 <edmondsw> #link https://etherpad.openstack.org/p/powervm-oot-todos
14:23:05 <edmondsw> I've got a meeting setup with the PowerVC folks to talk about the volume refactoring
14:23:19 <edmondsw> and get everyone on the same page there
14:23:49 <edmondsw> I'd talked to gfm about this, and he was onboard, but some others on his team are freaking out
14:23:52 <edmondsw> so need to calm them down
14:24:15 <edmondsw> I've been working with chhavi__ quite a bit on iscsi
14:24:23 <edmondsw> I think we're making progress there
14:24:59 <edmondsw> I need to ping burgerk about https://review.openstack.org/#/c/428433/ again
14:25:14 <edmondsw> #action edmondsw to ping burgerk about config drive UUID
14:25:43 <edmondsw> I also need to start writing code for MSP support
14:26:40 <edmondsw> efried I think the pypowervm support is already there for that, though obviously untested
14:27:04 <edmondsw> efried I will probably be proposing a change to at least the docstring, though, since it says name where it actually needs IPs
14:27:31 <edmondsw> and the arg is badly named as well... I'd love to rename it, but that would break backward compat
14:27:35 <efried> "MSP support"?
14:27:40 <efried> What arg?
14:27:42 <efried> What docstring?
14:27:44 <edmondsw> do you think that's ok since it didn't work before?
14:27:45 <efried> What's going on here??
14:27:48 <edmondsw> one sec
14:28:14 <edmondsw> https://github.com/powervm/pypowervm/blob/master/pypowervm/tasks/migration.py#L52
14:28:28 <edmondsw> dest_msp_name and src_msp_name should actually be lists of IP addresses
14:28:31 <edmondsw> not names
14:28:39 <edmondsw> MSP = mover service partition
14:29:05 <edmondsw> specifying IPs allows you to dictate which interfaces are used for LPM
14:29:45 <edmondsw> new for NovaLink, but HMC has had this... presumably the pypowervm code was copied from HMC support
14:30:06 <edmondsw> efried make more sense now?
14:30:47 <efried> I thought the "lists of" thing was something new coming down the pipe.
14:30:55 <efried> And... you're saying those args don't work at all today?
14:31:37 <edmondsw> efried NovaLink didn't support those in REST until the changes Nicolas has just now been working on
14:31:44 <edmondsw> so they couldn't have worked (for NovaLink) before
14:32:00 <efried> OIC, we just copied that method from k2operator or whatever?
14:32:06 <edmondsw> I assume, yes
14:32:44 <efried> was REST just ignoring any values passed down there?
14:33:04 <efried> cause if so, we can't remove/rename them.  If it was erroring, then maybe we can get away with it.
14:33:24 <edmondsw> efried right, I have to check with Nicolas on that
14:33:42 <edmondsw> until I know otherwise, I'm assuming we have to leave them and just cleanup the docstring
14:33:48 <efried> Well...
14:34:11 <efried> If they can now be lists, we should probably accept (python) lists, and convert 'em to comma-delimited (or whatever) within the method.
14:34:32 <edmondsw> yes
14:34:48 <edmondsw> I don't mean there would only be a docstring change... just that I wouldn't rename the args unless they were erroring before
14:35:10 <efried> Dig.
14:35:37 <edmondsw> anything else to discuss OOT?
14:36:09 <esberglu> nope
14:36:11 <edmondsw> #topic Device Passthrough
14:36:15 <edmondsw> efried you're up
14:36:35 <efried> I started working on granular.  Some pretty intricate algorithms happening there.
14:36:55 <efried> Got grudging agreement from jaypipes that the spec as written is the way we should go (rather than switching to separate-by-default)
14:37:18 <edmondsw> cool
14:37:26 <efried> he still has to convince Dan, but I think since the path of least resistance is what we've got, it'll just fall off.
14:37:56 <efried> In case you're interested in looking at the code: https://review.openstack.org/#/c/517757/
14:38:04 <efried> I need to fix tests, but the general idea is there.
14:38:20 <edmondsw> I'm interested, but won't have time
14:38:21 <efried> At this point I've given up waiting for Jay to finish nrp-in-alloc-cands before I do that.
14:38:29 <edmondsw> :)
14:38:38 <efried> So whichever one of us wins, the other has to figure out how to integrate granular+NRP.
14:39:33 <efried> There's a new #openstack-placement channel you may wish to join.
14:39:43 <edmondsw> efried ah, tx for the heads up
14:40:21 <efried> upt stuff is mostly merged.  I think my runway expires tomorrow.  But the stuff that's left is pretty nonessential - if it doesn't get in, it's not the end of the world.
14:40:34 <edmondsw> so the last important one did merge?
14:40:49 <efried> I think so.  Lemme double check.
14:41:24 <efried> yeah.  The pending ones are nice-to-have, but we can get by without 'em if we need.
14:42:02 <efried> https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:master+topic:bp/update-provider-tree
14:42:17 <edmondsw> efried so are we ready to start making changes in our driver?
14:42:35 <efried> Yes, if someone else wants to do it.  I'm going to be heads down on granular and reviewing other placement stuff for a while yet.
14:42:49 <efried> also note that we can't get any mileage out of actual trees until Jay's thing is done.
14:43:29 <efried> We can do single-provider stuff with traits, but we won't be able to do anything with child providers for GPUs etc.
14:43:43 <edmondsw> so that'll probably wait a bit longer then, because I have too many other things on my plate right now as well
14:44:00 <efried> We could hack that together with custom resource classes, one per GPU, inventory of 1.  But that would be an interim solution.
14:44:22 <efried> If we get to the end of Rocky and the NRP work still isn't finished, we may have to do that.
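The interim hack described above could be sketched like this — a toy illustration only; the `CUSTOM_GPU_<index>` class-name spelling is hypothetical:

```python
def interim_gpu_inventory(drc_indexes):
    """Interim modeling: a single provider carrying one custom resource
    class per GPU, each with inventory of 1, so a specific device can be
    requested by asking for its class (class naming is hypothetical)."""
    return {'CUSTOM_GPU_%s' % idx: {'total': 1} for idx in drc_indexes}
```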
14:44:29 <edmondsw> k
14:44:42 <edmondsw> anything else?
14:45:18 <efried> Well, on your side, have we gotten any further figuring out how we want to map/represent/filter supported devices?
14:45:37 <edmondsw> I think it's pretty much what we'd talked about before
14:45:55 <edmondsw> provider per adapter
14:46:28 <edmondsw> so we can allow selection by unique id if need be
14:47:19 <edmondsw> for PCI, representation will use PCI vendor/device IDs
14:47:32 <efried> hm, then I wonder if we actually want to model it with custom resource classes.
14:47:41 <efried> Nah, the cores will freak about that.
14:47:50 <edmondsw> the custom bit will be the unique id
14:48:13 <efried> Right, but we need to use traits for that.
14:48:26 <edmondsw> so we could use a common provider and have custom traits?
14:48:34 <edmondsw> if so, great
14:48:44 <efried> no, if it's a common provider, it would need to be distinct RCs.
14:48:50 <efried> Separate providers, traits.
14:49:13 <efried> otherwise there's no way to know which trait belongs to which device.
14:49:44 <edmondsw> I did not follow that
14:49:45 <efried> btw, RP names are freeform - no char restrictions - so we can do whatever tf we want with them.
14:50:18 <efried> meaning that we can use the DRC name (or whatever) for the RP name, and not have to do any weird mapping.
14:50:28 <efried> Sorry, okay, lemme back up.
14:51:00 <efried> Traits are on providers, not resources.
14:51:44 <esberglu> efried: edmondsw: Sorry to butt in, but I've got to present on CI in a few minutes
14:51:46 <esberglu> Multinode CI status: Have working multinode stack within staging, updated prep_devstack to handle control and compute
14:51:58 <esberglu> Still seeing a few errors there
14:52:06 <edmondsw> efried yeah, let's give esberglu a few min on CI and we can continue later
14:52:13 <edmondsw> #topic PowerVM CI
14:52:17 <edmondsw> #link https://etherpad.openstack.org/p/powervm_ci_todos
14:52:27 <esberglu> Next up is getting zuul/nodepool to work with multinode
14:52:37 <esberglu> And figuring out the tempest failures
14:52:43 <esberglu> That's pretty much all I have
14:53:09 <edmondsw> esberglu tempest failures?
14:53:31 <edmondsw> is that specific to multinode, or in general?
14:53:53 <esberglu> Seeing cold mig tempest failures (not all, just a few tests)
14:53:56 <esberglu> On OOT
14:54:10 <edmondsw> ok
14:54:33 <esberglu> Gotta run
14:54:43 <edmondsw> I need to run as well
14:54:58 <edmondsw> #topic Open Discussion
14:55:01 <edmondsw> anything quick here?
14:55:02 <efried> edmondsw: If we want to have all of our devices in the same provider, and have them all with the same generic resource class (e.g. "GPU"), it doesn't help us to have all the traits that represent all the devices on that provider, because when you pick one off, you don't know which trait goes with which inventory item.  And we don't want to be editing traits on the fly to indicate that kind of thing.  So if we want all our devices in one provider, we'd need distinct
14:55:02 <efried> custom resource classes (e.g. "GPU_<drc_index>") and we kinda lose the ability to request different types (e.g. based on vendor/product IDs).
14:56:03 <efried> So what we want is one RP per device, with the provider name equating to a unique identifier we can correlate back to the real device, and traits on the RP marking the type (vendor/product IDs, capabilities, whatever).
14:56:20 <efried> each RP has inventory 1 of the generic resource class (e.g. "GPU")
14:56:47 <efried> If that's still murky, hmu later and we can talk it through s'more.
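A toy sketch of the shape efried describes — one resource provider per device, a generic resource class, traits carrying type info. Every name here is illustrative; the trait spellings are not real os-traits names:

```python
def gpu_provider(drc_name, vendor_id, product_id):
    """One RP per GPU: the RP name is the device's unique identifier
    (RP names are freeform, so the DRC name can be used directly),
    inventory is 1 of the generic 'GPU' class, and traits carry the
    vendor/product type information."""
    return {
        'name': drc_name,
        'inventories': {'GPU': {'total': 1}},
        'traits': {
            'CUSTOM_GPU_VENDOR_%s' % vendor_id,
            'CUSTOM_GPU_PRODUCT_%s' % product_id,
        },
    }
```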
14:56:55 <edmondsw> so we can use the common/generic RC
14:57:02 <edmondsw> but need custom RP
14:57:15 <efried> We were going to want to do that to some extent anyway.
14:57:16 <edmondsw> that's what I was hoping
14:57:29 <efried> Theoretically we could group like devices
14:57:42 <efried> but then we lose the ability to target a *specific* device.
14:57:47 <efried> which I gather is something we still want.
14:57:52 <edmondsw> I think so
14:57:53 <efried> even though it's not very cloudy.
14:58:22 <edmondsw> well... there are different definitions of cloud
14:58:31 <edmondsw> I think you're falling into the nova definition trap :)
14:58:40 <edmondsw> s/nova/certain nova cores/
14:58:55 <edmondsw> #endmeeting