18:04:27 <dmitryme> #startmeeting Savanna
18:04:28 <openstack> Meeting started Wed May  8 18:04:27 2013 UTC.  The chair is dmitryme. Information about MeetBot at http://wiki.debian.org/MeetBot.
18:04:29 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
18:04:32 <openstack> The meeting name has been set to 'savanna'
18:04:49 <dmitryme> Here is our agenda for today:
18:05:10 <dmitryme> #info 1. Savanna 0.1.1 is released
18:05:19 <dmitryme> #info 2. The docs are moved to readthedocs.org
18:05:23 <dmitryme> #info 3. We continue to discuss Pluggable Provisioning Mechanism for phase 2
18:05:36 <dmitryme> and that is all the news we have for today
18:05:56 <dmitryme> In more details:
18:06:15 <dmitryme> As I said we've just released a new version
18:06:29 <dmitryme> it contains a number of fixes and enhancements
18:07:03 <dmitryme> you can see the full list by the following links:
18:07:09 <dmitryme> #link https://launchpad.net/savanna/0.1/0.1.1a1
18:07:14 <dmitryme> #link https://launchpad.net/savanna/0.1/0.1.1a2
18:07:25 <dmitryme> #link https://launchpad.net/savanna/0.1/0.1.1
18:07:51 <rnirmal> cool thanks for the update dmitryme
18:08:25 <dmitryme> Ok, as for the docs, they were moved to the http://savanna.readthedocs.org/
18:08:59 <dmitryme> we've also updated the wiki and launchpad, they both reference readthedocs as the main location
18:10:13 <dmitryme> And we continue our discussion on the Pluggable Provisioning Mechanism
18:11:00 <dmitryme> I will not retell it all here :-), just take a look at the mailing list archive if you're interested:
18:11:05 <dmitryme> #link https://lists.launchpad.net/savanna-all/
18:11:49 <jmaron> do you want to take up some of that discussion here, or continue on email?
18:11:55 <dmitryme> that is pretty much all we wanted to announce today
18:12:06 <rnirmal> is there any specific agenda for today or open discussion ? or I suppose discuss on the pluggable provisioning part
18:12:10 <dmitryme> sure, I guess that is the place
18:12:24 <dmitryme> we have agenda for discussion
18:12:26 <dmitryme> ou
18:12:34 <dmitryme> we have NO agenda for discussion
18:13:02 <rnirmal> ah ok.. cool so maybe if we want to talk about some of the points for pluggable provisioning
18:13:20 <dmitryme> sure, why not
18:13:58 <dmitryme> feel free to ask about anything that concerns you, we will try to answer everything
18:14:31 <rnirmal> I suppose exec_resource_action has been the most discussed without any conclusion
18:14:54 <jmaron> I responded with some additional info/context to the email
18:15:02 <jmaron> does that response clear things up?
18:16:03 <ruhe> jmaron, can you please provide an example use case for ambari?
18:16:37 <rnirmal> jmaron: some of the issues I see with that are... what do the responses look like.. those seem like they would be specific per plugin
18:18:10 <rnirmal> just at an api interaction level. is the expectation that the user makes a POST/PUT call that gets passed to the exec_resource_action
18:18:38 <jmaron> exactly. they would be. the idea here is that you're an experienced ambari administrator with existing scripting capabilities.  but since you're provisioning clusters dynamically you do not want to keep modifying the host and port etc to communicate directly with ambari.  savanna can act as a "gateway" so that you continue to interact with the savanna server but calls go to the current hadoop cluster(s)
18:19:58 <jmaron> I haven't thought thru this completely, and I'm not a security expert, but there also seems to be a capability for having the savanna server sitting in a DMZ fronting clusters that exist within an enterprise that doesn't want those resources exposed directly
18:20:31 <ruhe> ok, i see. thanks
18:20:37 <rnirmal> from the savanna standpoint, performing the get_management_urls() call to return that information seems a better fit for savanna than having to pass calls through to the provider
18:21:37 <jmaron> in a dynamic cluster environment, especially the analytics case, those URIs, though available, will be fairly transient
18:22:05 <jmaron> this is simply a convenience for those scenarios
18:22:22 <rnirmal> sure agree with that... but wouldn't it be as easy to get the latest URI for the cluster and perform those operations as you would with Ambari today
18:22:40 <rnirmal> instead of having them interlaced with savanna
18:22:44 <dmitryme> Jon, to me Savanna sitting in a DMZ sounds like a useful use case
18:23:25 <rnirmal> also pardon my ignorance... I haven't really used ambari to comment on it properly.. I'm just looking at it from a savanna generic service standpoint
18:24:21 <jmaron> rnirmal, in a manual interaction, yes - that would be feasible.  but what about an automated scenario (monitoring scripts)?  the pass thru capability enables that much more readily
18:24:27 <jmaron> and there is the DMZ use case as well
18:24:55 <jmaron> where the actual management URIs aren't exposed to the end users (analysts)
18:25:04 <dmitryme> as for automation - actually an automated client can query get_management_urls and take the one with a specific name
18:25:25 <dmitryme> I mean that could be easily automated
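[Editor's note: a minimal sketch of the automation dmitryme describes here, i.e. a client querying get_management_urls and picking the entry with a specific name. The function names and the shape of the returned data are assumptions for illustration, not the actual Savanna API.]

```python
# Hypothetical: a plugin returns named management endpoints for a cluster,
# and an automated client selects one by its well-known name.

def get_management_urls(cluster):
    # Stand-in for the real plugin call discussed in the meeting.
    return cluster["management_urls"]

def url_for(cluster, name):
    """Pick the management URL with the given well-known name."""
    for entry in get_management_urls(cluster):
        if entry["name"] == name:
            return entry["url"]
    raise KeyError("no management URL named %r" % name)

cluster = {
    "management_urls": [
        {"name": "ambari", "url": "http://10.0.0.5:8080"},
        {"name": "namenode-ui", "url": "http://10.0.0.6:50070"},
    ]
}
print(url_for(cluster, "ambari"))  # http://10.0.0.5:8080
```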
18:25:26 <rnirmal> understandable for a case where the URI's need not be exposed to the end user
18:26:04 <jmaron> one other capability:
18:26:13 <dmitryme> Yes as I said, I agree with security usecase
18:27:00 <jmaron> the plugin could actually interpret the URI requested as a request to consolidate info from multiple hadoop clusters (each managed separately)
18:27:35 <jmaron> a feature that could be easily enabled if the plugin is allowed to process the URI and the response
18:28:58 <rnirmal> but that goes out of scope for the savanna api... since it would be operating on a specific cluster. I understand the plugin could support it, we also need to think how it's going to be exposed in savanna.
18:29:49 <jmaron> this isn't a UI targeted feature.  end users querying for such information are targeting specific providers with REST invocations
18:30:22 <rnirmal> so something like extensions to the base savanna api ?
18:30:36 <jmaron> in this particular case savanna is simply a REST "gateway"
18:31:02 <jspeidel> basically savanna would just need to expose a new endpoint such as 'hadoop'
18:31:15 <rnirmal> yeah not worried about the UI... just the savanna api part
18:31:20 <jspeidel> all requests to this endpoint would be passed through to the appropriate provider
18:32:24 <dmitryme> as for me, I think my team and I need time to consider all pros and cons
18:32:38 <ruhe> we need to compose pros and cons of this approach, compared to simple api call which would return management url
18:32:42 <dmitryme> generally pros are what Jon said right now
18:32:56 <jmaron> GET /v1/{tenant-id}/hadoop/{provider_id}/clusters/c1/services/HDFS/components/DATANODE would be a request that would be passed to given {provider_id}
18:33:34 <dmitryme> the main cons - we're not sure if that will be a "popular"
18:33:34 <jmaron> so URI from {provider_id} on would be interpreted by plugin
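[Editor's note: a hedged sketch of the pass-through routing jmaron outlines above: Savanna exposes /v1/{tenant-id}/hadoop/{provider_id}/..., and everything after {provider_id} is handed to the plugin untouched. All names (exec_resource_action, the fake plugin) are illustrative, not the real implementation.]

```python
# Hypothetical pass-through "gateway": Savanna only splits the path and
# forwards the trailing URI to the selected plugin.

def route_passthrough(path, plugins):
    """Split the request path and forward the remainder to the plugin."""
    parts = path.strip("/").split("/")
    # Expected layout: v1 / {tenant-id} / hadoop / {provider_id} / <rest...>
    if len(parts) < 4 or parts[0] != "v1" or parts[2] != "hadoop":
        raise ValueError("not a pass-through request: %s" % path)
    provider_id = parts[3]
    rest = "/" + "/".join(parts[4:])
    # The plugin interprets the trailing URI (the exec_resource_action idea
    # from the mailing list) and calls its own management REST API.
    return plugins[provider_id].exec_resource_action("GET", rest)

class FakeAmbariPlugin:
    def exec_resource_action(self, method, uri):
        return {"forwarded": uri, "method": method}

plugins = {"ambari": FakeAmbariPlugin()}
resp = route_passthrough(
    "/v1/tenant-1/hadoop/ambari/clusters/c1/services/HDFS", plugins)
print(resp["forwarded"])  # /clusters/c1/services/HDFS
```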
18:33:43 <dmitryme> "popular" feature
18:34:09 <jmaron> it's not "sexy", but it would support admin tasks
18:34:25 <jmaron> and some security scenarios
18:34:29 <rnirmal> dmitryme: you beat me to it... I was going to say is it a case that is applicable for more than just one plugin
18:34:53 <jspeidel> most hadoop providers have a rest api
18:35:07 <jspeidel> for monitoring and management
18:35:33 <dmitryme> by "popular" I mean popularity within end users
18:35:35 <jspeidel> this provides flexibility to the user without complicating the savanna api
18:36:01 <jspeidel> currently, there is great demand for the Ambari REST api's for HDP users
18:36:26 <jspeidel> I would assume the same would hold for Cloudera, etc
18:36:27 <jmaron> and if we're concerned about perception, there is no exposure to end users via UI elements etc.  It is an admin feature
18:37:32 <dmitryme> ok guys, your points sounds reasonable, just give us some time to consider that
18:37:40 <jmaron> ok.  thanks!
18:37:49 <rnirmal> jmaron: I wouldn't be too opposed to it, if it's proposed as an admin feature  ;) .. the perception holds
18:38:09 <ruhe> if i have a script to manage cluster (hdp or cdh) and i have a cluster i want to manage through rest api. I have two choices here: 1. update cluster name in the script to work with it. 2. update management url to work with it. So I don't see a difference between returning management URL or passing through such requests
18:38:26 <rnirmal> jmaron: another question... so it's a pass thru rest call ?
18:38:39 <rnirmal> well n/m
18:38:54 <rnirmal> it still has to be passed to the plugin right
18:39:15 <jspeidel> one  difference is that the user would need to resolve the public ip addr and port of each management server prior to invoking the api
18:39:32 <jspeidel> instead of just providing a cluster name
18:39:46 <jmaron> ruhe:  there's a third option - you don't have to modify the script at all.  you continue to make your requests to savanna
18:39:48 <jspeidel> and having savanna/plugin resolve the cluster management server
18:40:45 <jspeidel> also, savanna could streamline security as mentioned earlier instead of the user having to obtain management specific user credentials
18:41:03 <rnirmal> other than the proxy part.. I'm still not seeing too many differences between the two approaches.
18:41:34 <rnirmal> sorry benefits one over the other
18:42:02 <rnirmal> I suppose lets doc the pros/cons and get back to it
18:42:19 <ruhe> agree
18:42:45 <jmaron> I'm not sure how you can argue with the DMZ/security proxy use case.  but yes - let's think about it some more...
18:44:06 <ruhe> DMZ is a good case of course
18:45:58 <jspeidel> I can't really think of any cons for providing this and haven't seen any mentioned yet
18:47:37 <jmaron> on to…you seem to have a concern with an "execute" command with a list of prompt responses to handle situations in which there is an interactive session?  we're concerned with writing that capability in the plugin since it makes environmental assumptions (i.e. OS, SSH availability) in the plugin which we feel are unwise…
18:47:57 <rnirmal> with multiple providers, handling the requests/responses could make the savanna api complicated.
18:47:57 <jmaron> "forces the plugin to make.."
18:48:26 <jmaron> only one plugin would handle the request
18:48:35 <jspeidel> rnirmal: the savanna api would only need to add a single 'hadoop' endpoint
18:48:53 <jmaron> there's nothing complicated about the api.  all it means from savanna is exposing a context root
18:50:15 <ruhe> Dmitry pointed out a couple of cons today on the mailing list: supporting the exec_resource_action() call will require a significant amount of work. It will include HTTP proxy functionality, error handling, etc.
18:50:41 <dmitryme> Jon, as for interactive execute. No matter where this code will be, it will need to handle OS differences.
18:50:53 <jmaron> I think that's a misunderstanding?  the plugin is making the REST invocation
18:51:36 <dmitryme> On the other side, at this time we think about working mainly with RHEL and Ubuntu
18:51:37 <jspeidel> dmitryme: that is why this should be provided as a service to the plugin
18:52:00 <jmaron> right - but that abstraction is precisely the sort of service we expect of the controller
18:52:01 <jspeidel> the hadoop provider should focus on hadoop
18:52:11 <jspeidel> not low level connection details
18:52:15 <ruhe> jmaron, it seems to me that each plugin will end up with its own rest client implementation
18:52:40 <jspeidel> they each already have their own REST APIs
18:52:42 <dmitryme> and will require its own set of utilities, which is not good
18:52:48 <jmaron> ruhe:  unless the controller provides HTTP client as a service
18:53:09 <dmitryme> we want Savanna to keep only API common for different plugins
18:53:31 <ruhe> looking at the cloudera rest api python client - it's a substantial amount of code
18:54:17 <jmaron> in our view, the plugin should only deal with hadoop provisioning and be abstracted from environmental concerns.  any leakage of the environment into the plugin is going to make the whole thing very brittle
18:54:34 <jspeidel> ruhe: not sure I understand.  what does it matter how much code is in the cloudera python client?
18:56:26 <rnirmal> yeah that shouldn't matter... it will just be a dependency and not actually live within the savanna codebase.
18:57:40 <jspeidel> we are simply proposing making access to provider api's easier for a savanna user by providing a savanna context root
18:57:48 <jspeidel> it is not a dependency
18:58:32 <jspeidel> all you would need to do is pass the request to the provider and the provider would execute the rest call against the correct management server
18:59:03 <rnirmal> well if they plan on using their python sdk to interface with the rest api then it is... but that's a specific implementation detail
18:59:25 <jspeidel> yes, it is really just an http call right
18:59:56 <rnirmal> yup
19:00:16 <rnirmal> well think times up..
19:00:27 <dmitryme> Jon, actually commands passed over SSH will always be environment-dependent, even if we implement interactive execute inside Savanna
19:01:04 <dmitryme> I mean, you will have a list of commands and env variables dependent on the OS you run on
19:01:24 <jspeidel> that is really no different than providing the ability to copy files, is it?
19:01:37 <dmitryme> in broader case, you might even run on non-bash shell
19:02:06 <jmaron> right.  it seems to me you're making our argument...
19:02:16 <jmaron> the controller should abstract those details
19:02:23 <jmaron> and allow plugins to simply execute
19:02:51 <jspeidel> otherwise every hadoop provider will need to deal with these VM provider level details
19:02:55 <jmaron> since it would be a mistake to have plugins assume bash, or ssh availability, or ftp availability
19:03:37 <jmaron> the plugins should request a service (e.g. execute command on host) and not have to worry about the execution details
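[Editor's note: a sketch of the controller-side abstraction jmaron argues for: the plugin asks for "execute command on host" and the controller hides the transport (SSH, WinRM, etc.). All class and method names here are illustrative assumptions, not part of Savanna.]

```python
# Hypothetical execution service: plugins express intent, the controller
# chooses the mechanism, so plugins never assume bash or SSH availability.

import abc

class RemoteExecutor(abc.ABC):
    """Controller-provided service; plugins never see the transport."""

    @abc.abstractmethod
    def execute(self, host, command):
        ...

class FakeSSHExecutor(RemoteExecutor):
    """Test double standing in for an SSH-backed implementation."""

    def __init__(self):
        self.calls = []

    def execute(self, host, command):
        self.calls.append((host, command))
        return 0  # pretend the command succeeded

def start_datanode(executor, host):
    # A hadoop plugin only requests the action; on a Windows deployment the
    # controller would swap in a different RemoteExecutor implementation.
    return executor.execute(host, "service hadoop-datanode start")

ex = FakeSSHExecutor()
start_datanode(ex, "node-1")
print(ex.calls)  # [('node-1', 'service hadoop-datanode start')]
```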
19:04:49 <ruhe> agree, plugin should not deal with OS details.
19:04:56 <jmaron> imagine an openstack deployment on windows....
19:06:30 <jmaron> the plugin should still work
19:06:58 <ruhe> yep, I too think that controller should take control of that
19:06:59 <dmitryme> ok, I guess we
19:07:00 <jmaron> but savanna would have an execution toolkit for supporting the same functionality on windows
19:07:06 <dmitryme> we're out of time
19:07:12 <jspeidel> the vm plugin would know how to deal with windows in this case
19:08:02 <jmaron> "vm plugin"  - VM provisioning (as opposed to hadoop plugin) (just to be clear)
19:08:32 <jspeidel> yes
19:08:47 <jspeidel> it would know how to deal with the underlying vm's
19:09:51 <rnirmal> yeah that will be something that needs to be built. that's a whole other topic
19:09:53 <jmaron> and the controller would still support "execute on host".  the plugin would not know that the execution is occurring on a windows VM
19:10:02 <ruhe> we only need to carefully pick the right tool for this task. do you have suggestions?
19:10:30 <jmaron> task?
19:11:03 <ruhe> provide OS-abstract functions such as install, execute
19:12:43 <jmaron> I have no suggestions off the top of my head.  I'm just making the architectural argument that these abstractions are important to a resilient successful software product
19:13:34 <ruhe> ok, I agree with your argument, just wondering what would be the right tool
19:14:43 <rnirmal> ruhe: you mean like a cross platform tool to do it?
19:15:30 <ruhe> yes. something like puppet or chef
19:15:39 <ruhe> but simpler :)
19:16:12 <rnirmal> yeah also need to look at heat a little more.. maybe something we can leverage from there.
19:16:54 <rnirmal> anyways.. shall we end today's meeting... just seems like going into a long tail of conversations that can be carried over to #savanna
19:17:35 <dmitryme2> yep, let's continue the discussion in emails
19:18:23 <dmitryme> #endmeeting