17:00:35 <tjones> #startmeeting vmwareapi
17:00:36 <openstack> Meeting started Wed Jun  4 17:00:35 2014 UTC and is due to finish in 60 minutes.  The chair is tjones. Information about MeetBot at http://wiki.debian.org/MeetBot.
17:00:37 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
17:00:39 <openstack> The meeting name has been set to 'vmwareapi'
17:01:06 <tjones> hi folks - who is here today?
17:01:15 <browne> me
17:01:17 <arnaud> hi
17:01:32 <vuil> hi
17:01:40 <garyk> eitaj
17:01:46 <mdbooth> hi
17:01:50 <tjones> garyk: ?
17:02:10 <garyk> eitah is hello in south african slang
17:02:17 <tjones> lol - cool
17:02:22 <tjones> ok lets get started
17:02:39 <tjones> today is nova bug day - that means we should be trying to fix/review bugs today instead of feature work
17:03:43 <tjones> still i want to get a sense of where we are for features in the meeting today.  I will be sending out an email on the ML showing our status with refactor specifically
17:03:58 <garyk> tjones: sounds good
17:04:00 <tjones> so lets talk about approved BP 1st
17:04:09 <tjones> #topic approved BP status
17:04:10 <tjones> https://blueprints.launchpad.net/openstack?searchtext=vmware
17:04:36 <tjones> phase 1 of spawn refactor has a +2 from matt.
17:04:41 <tjones> so we are almost done with that
17:04:49 <garyk> ahhh, is that why it is arining outside
17:04:53 <garyk> raining
17:04:59 <tjones> vui - want to talk about phase 2?  (lol gary)
17:05:27 <garyk> my concerns with the refactoring are that backporting patches is really difficult
17:05:45 <vuil> the patches are just about done.
17:06:06 <vuil> I am in the rebase/reorg phase, to arrange them into self contained patches to post.
17:06:25 <vuil> hopefully by today if no complications come up
17:06:31 <mdbooth> garyk: Indeed. Incidentally, that's also a reason to try to break down patches into small chunks.
17:06:33 <tjones> vuil: once done with that please send me an email and i'll use that as part of the message to the ML
17:06:46 <tjones> garyk: is there any possibility of backporting the refactor?
17:06:49 <vuil> will do.
17:07:00 <garyk> mdbooth: i am not sure that small chunks will help when the whole code base is updated.
17:07:18 <garyk> tjones: i am not sure about that. it may be an idea well worth exploring
17:07:54 <garyk> tjones: maybe when we are done we can consider it and write a mail to the stable team if relevant
17:08:04 <mdbooth> garyk: Well, the whole codebase needs to be updated :) Small chunks makes it easier to see what we did.
17:08:08 <garyk> i think that we have hijacked the refactoring explanations by vui, sorry
17:08:24 <tjones> lets take this to the open discussion part of the meeting
17:08:29 <garyk> ok, thanks
17:08:30 <tjones> vuil: anything else on phase 2?
17:09:02 <vuil> not really, was thinking about the backports, but let's take that in the open disc.
17:09:06 <tjones> ok
17:09:18 <tjones> how about oslo?  blocked on phase 2 i suspect
17:09:51 <vuil> yeah, once the spawn refactor work goes up, it will be easier to continue on the oslo bit.
17:09:54 * mdbooth almost has concrete plans for an updated api, btw
17:10:24 <tjones> mdbooth: lets take that in open discussion too :-D
17:10:25 <vuil> mdbooth: share when done
17:10:35 <tjones> last approved BP is hot plug - garyk?
17:10:51 <garyk> tjones: it has been in review for months :)
17:11:08 <garyk> i rebased and updated the code after the summit...
17:11:22 <tjones> garyk: thought you said there was something to do last week??  if not we need to get on the reviews
17:11:54 <tjones> #action for all to review https://review.openstack.org/59365 and https://review.openstack.org/#/c/91005/
17:12:18 <garyk> tjones: not that i am aware of. rado wanted me to consider an idea he had. i am thinking about that but i do not think it should block what is posted
17:12:25 <tjones> please take a look at those reviews (tomorrow after bug day)
17:12:36 <tjones> #topic BP in review
17:12:37 <tjones> https://review.openstack.org/#/q/status:open+project:openstack/nova-specs+message:vmware,n,z
17:13:17 <tjones> anyone here want to discuss a BP in this list?  There are a number needing update, but not sure if those folks are attending
17:13:54 <garyk> tjones: i have posted the spec for the ephemeral disk support
17:13:58 <garyk> will add the code tomorrow
17:14:03 <tjones> garyk: great!  thanks
17:15:14 <tjones> ok assuming no other BP discussion needed.
17:15:31 <tjones> #topic bugs
17:15:50 <tjones> our 50 bugs http://tinyurl.com/p28mz43
17:16:29 <tjones> of those the ones which are not assigned http://tinyurl.com/kkyw9c4
17:16:39 <tjones> 18 of them
17:16:46 <tjones> this is the perfect day to work on this list
17:17:16 <tjones> anyone have a bug to discuss or should we go to open discussion since we have a lot to talk about?
17:17:22 <tjones> going once.....
17:17:40 <tjones> #topic open discussion
17:17:41 <tjones> GO
17:18:03 <mdbooth> iSCSI
17:18:05 <tjones> i think we wanted to talk about backporting, api, and iSCSI
17:18:15 <tjones> mdbooth: want to start?
17:18:16 <garyk> tjones: when you send the mail to the list about the refactor can you please add in the bug lists
17:18:23 <tjones> garyk: will do
17:18:28 <garyk> tjones: thanks
17:18:44 <mdbooth> iSCSI problem is hard, because it seems to require cluster config
17:18:45 <arnaud> mdbooth, I have posted a patch for the second issue (the first one being the auth that I think you are fixing)
17:19:04 <mdbooth> arnaud: What's the second issue?
17:19:14 <arnaud> https://review.openstack.org/#/c/97612/
17:19:46 <mdbooth> arnaud: That's probably good, but unfortunately it doesn't solve the problem
17:19:57 <arnaud> hmm
17:20:03 <arnaud> that solves a part of the problem
17:20:13 <arnaud> the fact that you have powered off hosts
17:20:19 <arnaud> is not specific to ISCSI
17:20:26 <mdbooth> iSCSI disks are rdm devices
17:20:35 <mdbooth> rdm devices need to be present on all hosts
17:20:41 <arnaud> yes
17:20:50 <arnaud> I agree with that
17:20:54 <vuil> that is the 'third' problem I guess, that is hard
17:21:07 <mdbooth> Right
17:21:15 <arnaud> we have been looking at the vMotion problem
17:21:21 <arnaud> 'third' problem
17:21:36 <mdbooth> It's not just a vMotion problem
17:21:42 <vuil> We tried to establish that DRS will not auto vmotion to a host that cannot see the device, which I believe to be the case
17:22:05 <mdbooth> If the vm is taken down, it might not be able to come up again if it can't get on the host with its rdm device
17:22:21 <arnaud> same for a new host
17:22:31 <mdbooth> yup
17:22:59 <mdbooth> Anyway, auth is easy to add
17:23:25 <mdbooth> But iSCSI is kinda broken until we come up with a solution to this
17:23:31 <mdbooth> How much do we care?
17:23:57 <arnaud> we are looking at it so yes we care
17:23:58 <arnaud> :)
17:24:29 <mdbooth> Are there any other examples of config which needs to be on all hosts?
17:24:40 * mdbooth can't think of any
17:25:20 <mdbooth> I think fixing this may require a new db table
17:25:44 <garyk> mdbooth: a new db stable specific to a virt driver will be problematic
17:25:51 <vuil> what do you mean by config that needs to be on all host here?
17:25:55 <mdbooth> garyk: I guessed as much :(
17:25:57 <vuil> list of rdm devices?
17:26:03 <mdbooth> vuil: Yes
17:26:41 <garyk> mdbooth: there may be system metadata for hosts. i am not sure.
17:27:06 <vuil> I think this does not have to be exposed.
17:27:25 <vuil> It is somewhat analogous to a cluster with non-homogeneous hosts...
17:27:27 <arnaud> your reasoning with the table is to store this config in a table and every time we scan we update the table, correct?
17:27:43 <vuil> with some host not having enough resources to accept a VM, so it doesn't
17:27:51 <mdbooth> arnaud: Yeah
17:28:49 <mdbooth> vuil: We don't need to expose it. I just think we need to store some state.
17:29:12 <mdbooth> I don't think it's unreasonable for a driver to store persistent state.
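For readers following along, a minimal sketch of the kind of driver-local persistent state mdbooth is describing. This is purely hypothetical: the table name, columns, and the idea of using a database table at all are illustrative, and as garyk notes below, a driver-specific table may not be acceptable in Nova.

    from sqlalchemy import Column, Integer, String, UniqueConstraint
    from sqlalchemy.orm import declarative_base

    Base = declarative_base()

    # Hypothetical 'golden' list of iSCSI send targets per cluster. Nothing like
    # this exists in the driver; it only illustrates the shape of the state being
    # discussed: which targets every host in the cluster should know about.
    class VMwareIscsiTarget(Base):
        __tablename__ = 'vmware_iscsi_targets'
        __table_args__ = (UniqueConstraint('cluster_ref', 'address', 'port', 'iqn'),)

        id = Column(Integer, primary_key=True)
        cluster_ref = Column(String(255), nullable=False)  # vCenter cluster moref
        address = Column(String(255), nullable=False)      # target portal IP/host
        port = Column(Integer, default=3260)
        iqn = Column(String(255), nullable=False)          # target IQN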
17:29:36 <garyk> i do not understand something here - cinder is responsible for the volume right?
17:29:47 <mdbooth> garyk: Yes.
17:30:03 <mdbooth> garyk: Nova is responsible for making it available to the vm.
17:30:06 <garyk> so cinder should be aware of where the vm is running
17:30:22 <garyk> then it can either perform the operation if possible or fail if not.
17:30:35 <garyk> maybe i just do not understand the problems. sorry
17:30:37 <mdbooth> garyk: No. The vm is running in the 'cluster'. Cinder should not have an internal view of the cluster.
17:30:58 <garyk> mdbooth: cinder should. maybe the problem should be addressed there.
17:31:24 <mdbooth> garyk: The VM may also move outside of openstack's control.
17:31:38 <mdbooth> Because of HA or DRS, for eg.
17:31:38 <garyk> mdbooth: why?
17:31:56 <garyk> but if cinder was aware of that then it would not be a problem - for example with the vmdk driver
17:31:57 <mdbooth> Or explicit vMotion by an admin.
17:31:58 <vuil> mdbooth: purpose of the peristent state is...?
17:32:31 <vuil> so we know which hosts need to rescan for new devices?
17:32:32 <mdbooth> vuil: To store which targets needs to be configured.
17:32:49 <mdbooth> Poss also rescan.
17:33:21 <vuil> A rescan is needed to discover new targets. A rescan discovers all discoverable targets
17:33:23 <mdbooth> I haven't thought it through in detail, but I'm pretty sure doing it without persistent state would be a pain.
17:33:40 <vuil> so in theory there is no need to track how many there are to discover
17:33:42 <mdbooth> vuil: The target must also be added to the hba
17:33:54 <mdbooth> When a new host joins the cluster, it won't have any configured targets
17:34:01 <mdbooth> Or when a new target is added to host a
17:34:06 <mdbooth> it won't also be added to host b
17:34:15 <mdbooth> So you need to track both
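A rough sketch of the per-host work being described (add the send target to the software iSCSI HBA, then rescan), written against pyVmomi for illustration; the real driver would go through the vmwareapi session / oslo.vmware rather than pyVmomi directly, and the function name is made up.

    from pyVmomi import vim

    def add_target_and_rescan(host, address, port=3260):
        """Make one ESX host aware of a new iSCSI send target (illustrative only)."""
        storage = host.configManager.storageSystem
        # Find the software iSCSI adapter on this host.
        hba = next(h for h in host.config.storageDevice.hostBusAdapter
                   if isinstance(h, vim.host.InternetScsiHba))
        target = vim.host.InternetScsiHba.SendTarget(address=address, port=port)
        storage.AddInternetScsiSendTargets(iScsiHbaDevice=hba.device,
                                           targets=[target])
        # The rescan is what actually discovers the LUN; it can take many seconds.
        storage.RescanHba(hba.device)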
17:34:16 <vuil> Yeah sabari and I had a discussion about this.
17:35:21 <vuil> One option is to wait till a new volume is added, and do the rescan
17:35:28 <mdbooth> In theory you could scrape this from existing vm config, but that wouldn't be pretty or cheap
17:35:39 <mdbooth> So technically it would be a denormalisation
17:36:11 <mdbooth> Either way, you're going to want a reference golden state which is replicated to all hosts
17:36:13 <arnaud> vuil, you mean rescan each host every time we add a volume to 1 host?
17:36:22 <mdbooth> Not just rescan
17:36:29 <mdbooth> Also add new targets as required
17:36:31 <mdbooth> Or remove them
17:36:43 <vuil> yeah, I am assuming that's part of it.
17:36:59 <vuil> but essentially whatever we are doing for existing hosts, do the same for new one.
17:37:04 <vuil> kinda punt the new host problem.
17:37:26 <vuil> it doesn't participate in iscsi/rdm until a new volume is added
17:37:31 <arnaud> the new host can be specified in the doc
17:37:39 <arnaud> when you add a new host to your cluster, you need to scan
17:38:04 <vuil> when a new volume is created and added to an instance, that already requires a rescan
17:38:17 <mdbooth> Is there any vmware feature we can use for this?
17:38:28 <mdbooth> It doesn't sound like a problem which is unique to us
17:38:41 <mdbooth> Maybe some kind of profile
17:38:53 <mdbooth> Then we could modify the profile
17:40:32 <mdbooth> Without that, I think we can do this if we store targets and paths in some kind of persistent storage
17:40:49 <arnaud> tbh, I think that we rescan each host every time and when we attach we make sure that we take a host that is powered on and we specify in the doc that when we add a new host, we need to scan
17:40:54 <arnaud> the problem is solved
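What arnaud is proposing, sketched in the same illustrative pyVmomi style: on every attach, rescan the software iSCSI HBA of every powered-on host in the cluster, and leave freshly added hosts to a documented manual rescan. Again, the helper name is hypothetical.

    from pyVmomi import vim

    def rescan_cluster_iscsi_hbas(cluster):
        """Rescan the software iSCSI HBA on every powered-on host (illustrative)."""
        for host in cluster.host:
            if host.runtime.powerState != 'poweredOn':
                continue  # skip powered-off/standby hosts
            storage = host.configManager.storageSystem
            for hba in host.config.storageDevice.hostBusAdapter:
                if isinstance(hba, vim.host.InternetScsiHba):
                    storage.RescanHba(hba.device)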
17:40:59 <mdbooth> Rescan is pretty expensive, btw
17:41:07 <mdbooth> Many seconds
17:41:36 <vuil> yeah, but that needs to be done at minimum for one host
17:42:39 <garyk> the patch that arnaud added should have the correct host for the instance
17:42:47 <garyk> would that not suffice?
17:43:01 <mdbooth> garyk: No, because it won't continue to work if the vm moves for any reason.
17:43:19 <arnaud> not enough for the vmotion case and the new host case
17:43:28 <mdbooth> And that move will not necessarily involve openstack.
17:43:58 <garyk> it just seems like an edge case that the admin will move it to a host that does not see the correct device
17:44:06 <vuil> I am leaning towards advocating that vmotion-enablement be done out of band by some admin action
17:44:32 <mdbooth> vuil: Or DRS, or HA
17:44:40 <vuil> by that I mean one can asynchronously rescan the hba, and the VMs will be eventually vmotionable.
17:44:58 <garyk> my understanding would be that all of the hosts should be able to see the same devices. if not then it seems to be a setup issue
17:45:03 <vuil> DRS should not try and fail to vmotion the VM to a host that cannot see the device
17:45:21 <mdbooth> garyk: It absolutely is a setup issue. The setup needs to be done by Nova.
17:45:28 <vuil> so until some other hosts discovers the iscsi devices, the VM is essentially pinned (but temporarily)
17:45:49 <garyk> mdbooth: why is the setup done by nova?
17:46:06 <mdbooth> vuil: That's fine. But we still need to ensure that the config is propagated eventually.
17:46:20 <garyk> i would think that it is done out of band by someone who wants to increase the capacity of their cloud - they just add another host to the cluster
17:47:34 <mdbooth> I think: config stored persistently somewhere (where?). Config on target host updated synchronously. Config on all other hosts updated asynchronously. Scan detects new hosts and auto adds all targets.
17:48:12 <mdbooth> garyk: iscsi volumes can come and go. e.g. cinder create ... creates one.
17:48:24 <mdbooth> garyk: It's not feasible for an admin.
17:49:01 <mdbooth> Scan process could also be responsible for async update.
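Pulling those pieces together, a very rough outline of the flow mdbooth is sketching: persist the target list, configure the attaching host synchronously, reconcile the rest of the cluster asynchronously, and let a scan bring late-joining hosts up to date. Every name here is hypothetical, and the in-memory dict stands in for whatever persistent store is eventually chosen.

    import eventlet

    _golden_targets = {}  # cluster moref -> set of (address, port, iqn) tuples

    def _configure_host(host, target):
        """Hypothetical: add the send target to the host's HBA and rescan it."""

    def attach_iscsi_volume(cluster, attach_host, target):
        # 1. Persist the desired ('golden') target list for the cluster.
        _golden_targets.setdefault(cluster._moId, set()).add(target)
        # 2. Synchronously configure the host the volume is being attached to.
        _configure_host(attach_host, target)
        # 3. Asynchronously bring the rest of the cluster up to date.
        for host in cluster.host:
            if host != attach_host:
                eventlet.spawn_n(_configure_host, host, target)

    def reconcile_new_host(cluster, host):
        # Run when a scan notices a host joining the cluster: replay all known
        # targets so the host becomes a valid vMotion/HA/DRS destination.
        for target in _golden_targets.get(cluster._moId, set()):
            _configure_host(host, target)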
17:49:05 <garyk> it is starting to sound like cinder should be responsible for this
17:49:15 <mdbooth> Cinder can't be responsible for this.
17:49:21 <garyk> why?
17:49:23 <mdbooth> Cinder doesn't control the cluster.
17:49:27 <arnaud> yeah, because cinder doesn't know about the hosts
17:49:29 <arnaud> in the cluster
17:49:36 <arnaud> it cannot trigger the rescan of the targets
17:49:37 <garyk> but cinder can add this support?
17:49:52 <mdbooth> garyk: Cinder in this case is an iscsi provider.
17:49:57 <garyk> i just feel that we are trying to solve the problem in the wrong place
17:50:03 <mdbooth> It knows nothing of what is consuming the iscsi volumes.
17:50:15 <mdbooth> This is a consumption problem, not a provision problem.
17:50:19 <arnaud> agreed
17:50:42 <garyk> it receives a request to provide a resource. why can it not be 'clever' about that provisioning
17:51:16 <mdbooth> Shall we continue this on the list? 9 mins for other topics.
17:51:24 <arnaud> the iscsi logic in cinder should not be aware of vmware
17:51:27 <vuil> because the cinder driver is not VC aware
17:51:33 <tjones> mdbooth: we could also move to openstack-vmware to continue
17:51:37 <tjones> in real-time
17:51:42 <mdbooth> Ok
17:52:19 <garyk> ok, i was not aware that the cinder driver was not aware of vmware
17:52:24 <vuil> lets do that
17:52:33 <arnaud> garyk it's not the vmdk driver in cinder
17:52:38 <arnaud> it's the lvm iscsi driver
17:52:44 <garyk> i am currently working on the esx deprecation.
17:53:01 <garyk> there are a number of issues there. i will send a mail to the list at some stage or another
17:53:09 <tjones> the other topics i am aware of are backporting due to the refactor.  I think we decided that once phase 1 is complete, gary will ask the stable core if we can backport the refactor.  the other issue was the api changes mdbooth was mentioning
17:53:46 <tjones> mdbooth: do you want to discuss this more here or was it just a heads up?
17:53:46 <garyk> what api changes?
17:53:47 <arnaud> quick question: what is the advantage of backporting the refactor?
17:53:57 <vuil> there is some churn in phase one, but the main upheaval happens in phase 2/3
17:54:06 <vuil> facilitate backports
17:54:06 <garyk> arnaud: we will need to add in bug fixes to the stable branch.
17:54:10 <mdbooth> tjones: We need agreement to move forward. I've dropped it because I don't know how to proceed.
17:54:25 <mdbooth> refactor related: https://review.openstack.org/#/c/97170/
17:54:31 <mdbooth> Different refactor
17:54:42 <mdbooth> Quick opinions on this?
17:54:51 <mdbooth> I similarly dislike vim_util, btw
17:55:00 <garyk> i posted mine on the review
17:55:14 <mdbooth> garyk: Yeah, want to move code changes to another patch.
17:55:16 <arnaud> mdbooth +1 I could not agree more with this patch
17:55:19 <vuil> seems fine, and fairly orthogonal to the current refactor work
17:55:47 <garyk> mdbooth: why move them to another patch when you can do them on this one?
17:56:11 <tjones> yes this and the power_off are orphans of phase 1 - but i hesitate to link them and block phase 2
17:56:23 <mdbooth> garyk: Because if you're reviewing it, it's simpler to see: this patch moves code around, that patch changes the code.
17:56:30 <mdbooth> The 2 things are reviewed differently.
17:56:47 <mdbooth> Also, if you're backporting, you can more easily pick out code changes.
17:56:55 <tjones> cheaper to review a "move only" patch
17:57:03 <mdbooth> Otherwise you're left manually scanning big chunks of very similar code.
17:57:10 <garyk> if so then just address the log comments
17:57:30 <mdbooth> garyk: That said, I agreed with all your comments. Will do another patch.
17:57:41 <mdbooth> i.e. separate patch.
17:57:57 <garyk> no, those should be done in this one
17:58:57 <tjones> ok 2 minutes - i think we still have some chatting to do, so lets move over to openstack-vmware to continue
17:59:06 <garyk> those are in the process of being changed in https://review.openstack.org/91352 .
18:00:05 <tjones> gotta end now
18:00:13 <tjones> moving to openstack-vmware
18:00:13 <mdbooth> k
18:00:16 <tjones> #endmeeting