18:04:27 #startmeeting Savanna
18:04:28 Meeting started Wed May 8 18:04:27 2013 UTC. The chair is dmitryme. Information about MeetBot at http://wiki.debian.org/MeetBot.
18:04:29 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
18:04:32 The meeting name has been set to 'savanna'
18:04:49 Here is our agenda for today:
18:05:10 #info 1. Savanna 0.1.1 is released
18:05:19 #info 2. The docs have been moved to readthedocs.org
18:05:23 #info 3. We continue to discuss the Pluggable Provisioning Mechanism for phase 2
18:05:36 and that is all the news we have for today
18:05:56 In more detail:
18:06:15 As I said, we've just released a new version
18:06:29 it contains a number of fixes and enhancements
18:07:03 you can see the full list at the following links:
18:07:09 #link https://launchpad.net/savanna/0.1/0.1.1a1
18:07:14 #link https://launchpad.net/savanna/0.1/0.1.1a2
18:07:25 #link https://launchpad.net/savanna/0.1/0.1.1
18:07:51 cool, thanks for the update dmitryme
18:08:25 Ok, as for the docs, they have been moved to http://savanna.readthedocs.org/
18:08:59 we've also updated the wiki and launchpad; they both reference readthedocs as the main location
18:10:13 And we continue our discussion on the Pluggable Provisioning Mechanism
18:11:00 I will not retell it all here :-), just take a look at the mailing archive if you're interested:
18:11:05 #link https://lists.launchpad.net/savanna-all/
18:11:49 do you want to take up some of that discussion here, or continue over email?
18:11:55 that is pretty much all we wanted to announce today
18:12:06 is there any specific agenda for today or open discussion? or I suppose we discuss the pluggable provisioning part
18:12:10 sure, I guess this is the place
18:12:24 we have agenda for discussion
18:12:26 ou
18:12:34 we have NO agenda for discussion
18:13:02 ah ok.. cool, so maybe we want to talk about some of the points for pluggable provisioning
18:13:20 sure, why not
18:13:58 feel free to ask anything that concerns you, we will try to answer everything
18:14:31 I suppose exec_resource_action has been the most discussed without any conclusion
18:14:54 I responded with some additional info/context to the email
18:15:02 does that response clear things up?
18:16:03 jmaron, can you please provide an example use case for ambari?
18:16:37 jmaron: some of the issues I see with that are... what do the responses look like.. those seem like they would be specific per plugin
18:18:10 just at an api interaction level. is the expectation that the user makes a POST/PUT call that gets passed to exec_resource_action?
18:18:38 exactly. they would be. the idea here is that you're an experienced ambari administrator with existing scripting capabilities. but since you're provisioning clusters dynamically, you do not want to keep modifying the host and port etc. to communicate directly with ambari. savanna can act as a "gateway" so that you continue to interact with the savanna server but calls go to the current hadoop cluster(s)
18:19:58 I haven't thought this through completely, and I'm not a security expert, but there also seems to be a capability for having the savanna server sit in a DMZ, fronting clusters that exist within an enterprise that doesn't want those resources exposed directly
18:20:31 ok, I see. thanks
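To make the "gateway" idea above concrete, here is a minimal sketch of such a pass-through, assuming a hypothetical plugin hook named exec_resource_action and illustrative cluster attributes (management_ip, management_port, admin_user, admin_password); none of these names are taken from the actual Savanna code:

```python
import requests


class AmbariPlugin(object):
    """Illustrative plugin; not the real Savanna plugin interface."""

    def exec_resource_action(self, cluster, method, path, body=None):
        # Only the plugin tracks the (transient) host/port of the
        # cluster's Ambari server; callers keep talking to Savanna.
        url = "http://%s:%s/api/v1/%s" % (cluster.management_ip,
                                          cluster.management_port, path)
        resp = requests.request(method, url, data=body,
                                auth=(cluster.admin_user,
                                      cluster.admin_password))
        return resp.status_code, resp.text
```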
18:20:37 from the savanna standpoint, performing the get_management_urls() call to return that information seems more apt for savanna than having to pass calls through to the provider
18:21:37 in a dynamic cluster environment, especially the analytics case, those URIs, though available, will be fairly transient
18:22:05 this is simply a convenience for those scenarios
18:22:22 sure, agree with that... but wouldn't it be as easy to get the latest URI for the cluster and perform those operations as you would with Ambari today
18:22:40 instead of having them interlaced with savanna
18:22:44 Jon, to me Savanna sitting in a DMZ sounds like a useful use case
18:23:25 also pardon my ignorance... I haven't really used ambari enough to comment on it properly.. I'm just looking at it from a savanna generic service standpoint
18:24:21 rnirmal, in a manual interaction, yes - that would be feasible. but what about an automated scenario (monitoring scripts)? the pass-through capability enables that much more readily
18:24:27 and there is the DMZ use case as well
18:24:55 where the actual management URIs aren't exposed to the end users (analysts)
18:25:04 as for automation - actually an automated client can query get_management_urls and take the one with a specific name
18:25:25 I mean, that could be easily automated
18:25:26 understandable for a case where the URIs need not be exposed to the end user
18:26:04 one other capability:
18:26:13 Yes, as I said, I agree with the security use case
18:27:00 the plugin could actually interpret the URI requested as a request to consolidate info from multiple hadoop clusters (each managed separately)
18:27:35 a feature that could be easily enabled if the plugin is allowed to process the URI and the response
18:28:58 but that goes out of scope for the savanna api... since it would be operating on a specific cluster. I understand the plugin could support it, but we also need to think about how it's going to be exposed in savanna.
18:29:49 this isn't a UI-targeted feature. end users querying for such information are targeting specific providers with REST invocations
18:30:22 so something like extensions to the base savanna api?
18:30:36 in this particular case savanna is simply a REST "gateway"
18:31:02 basically savanna would just need to expose a new endpoint such as 'hadoop'
18:31:15 yeah, not worried about the UI... just the savanna api part
18:31:20 all requests to this endpoint would be passed through to the appropriate provider
18:32:24 as for me, I think my team and I need time to consider all the pros and cons
18:32:38 we need to compose the pros and cons of this approach, compared to a simple api call which would return the management url
18:32:42 generally the pros are what Jon said just now
18:32:56 GET /v1/{tenant-id}/hadoop/{provider_id}/clusters/c1/services/HDFS/components/DATANODE would be a request that would be passed to the given {provider_id}
18:33:34 the main con - we're not sure if that will be a "popular"
18:33:34 so the URI from {provider_id} onward would be interpreted by the plugin
18:33:43 "popular" feature
18:34:09 it's not "sexy", but it would support admin tasks
18:34:25 and some security scenarios
18:34:29 dmitryme: you beat me to it... I was going to say, is it a case that is applicable to more than just one plugin
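The routing side of that proposal could look roughly like the following sketch, built around the example URI quoted above. Flask is used here purely for illustration, and plugin_registry is an assumed lookup table; neither reflects Savanna's actual WSGI layer:

```python
from flask import Flask, request

app = Flask(__name__)
plugin_registry = {}  # provider_id -> plugin instance; filled at startup


@app.route('/v1/<tenant_id>/hadoop/<provider_id>/<path:resource>',
           methods=['GET', 'POST', 'PUT', 'DELETE'])
def hadoop_gateway(tenant_id, provider_id, resource):
    # Everything after {provider_id} is opaque to Savanna and is handed
    # to the plugin for interpretation, e.g.
    # clusters/c1/services/HDFS/components/DATANODE
    plugin = plugin_registry[provider_id]
    status, body = plugin.exec_resource_action(
        request.method, resource, request.get_data())
    return body, status
```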
18:34:53 most hadoop providers have a rest api
18:35:07 for monitoring and management
18:35:33 by "popular" I mean popularity with end users
18:35:35 this provides flexibility to the user without complicating the savanna api
18:36:01 currently, there is great demand for the Ambari REST APIs among HDP users
18:36:26 I would assume the same would hold for Cloudera, etc.
18:36:27 and if we're concerned about perception, there is no exposure to end users via UI elements etc. It is an admin feature
18:37:32 ok guys, your points sound reasonable, just give us some time to consider that
18:37:40 ok. thanks!
18:37:49 jmaron: I wouldn't be too opposed to it, if it's proposed as an admin feature ;) .. the perception holds
18:38:09 if I have a script to manage a cluster (hdp or cdh) and I have a cluster I want to manage through the rest api, I have two choices here: 1. update the cluster name in the script to work with it. 2. update the management url to work with it. So I don't see a difference between returning the management URL and passing through such requests
18:38:26 jmaron: another question... so it's a pass-through rest call?
18:38:39 well n/m
18:38:54 it still has to be passed to the plugin, right
18:39:15 one difference is that the user would need to resolve the public ip addr and port of each management server prior to invoking the api
18:39:32 instead of just providing a cluster name
18:39:46 ruhe: there's a third option - you don't have to modify the script at all. you continue to make your requests to savanna
18:39:48 and have savanna/the plugin resolve the cluster management server
18:40:45 also, savanna could streamline security as mentioned earlier, instead of the user having to obtain management-specific user credentials
18:41:03 other than the proxy part.. I'm still not seeing too many differences between the two approaches.
18:41:34 sorry, benefits of one over the other
18:42:02 I suppose let's doc the pros/cons and get back to it
18:42:19 agree
18:42:45 I'm not sure how you can argue with the DMZ/security proxy use case. but yes - let's think about it some more...
18:44:06 DMZ is a good case of course
18:45:58 I can't really think of any cons to providing this and haven't seen any mentioned yet
18:47:37 on to… you seem to have a concern with an "execute" command taking a list of prompt responses to handle situations in which there is an interactive session? we're concerned with writing that capability in the plugin since it makes environmental assumptions (i.e. OS, SSH availability) in the plugin, which we feel is unwise…
18:47:57 with multiple providers, handling the requests/responses could make the savanna api complicated.
18:47:57 "forces the plugin to make.."
18:48:26 only one plugin would handle the request
18:48:35 rnirmal: the savanna api would only need to add a single 'hadoop' endpoint
18:48:53 there's nothing complicated about the api. all it means for savanna is exposing a context root
18:50:15 Dmitry pointed out a couple of cons today on the mailing list: supporting the exec_resource_action() call will require a significant amount of work. It will include HTTP proxy functionality, error handling, etc.
18:50:41 Jon, as for interactive execute: no matter where this code lives, it will need to handle OS differences.
18:50:53 I think that's a misunderstanding? the plugin is making the REST invocation
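For contrast, here is a rough sketch of the alternative flow dmitryme describes, in which an automated script first asks Savanna for the management URL and then calls Ambari directly. Only get_management_urls() comes from the discussion; the endpoint path, port, and JSON field names are assumptions:

```python
import requests

SAVANNA = "http://savanna.example:8386/v1/tenant1"  # assumed endpoint


def management_url(cluster_id, name="ambari"):
    """Hypothetical client-side counterpart of get_management_urls()."""
    urls = requests.get("%s/clusters/%s/management-urls"
                        % (SAVANNA, cluster_id)).json()
    return next(u["url"] for u in urls if u["name"] == name)


# A monitoring script re-resolves the transient URL on every run and
# then calls Ambari directly, bypassing Savanna for the actual request:
ambari = management_url("c1")
datanode = requests.get(ambari +
                        "/clusters/c1/services/HDFS/components/DATANODE")
```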
18:51:36 On the other hand, at this time we are thinking of working mainly with RHEL and Ubuntu
18:51:37 dmitryme: that is why this should be provided as a service to the plugin
18:52:00 right - but that abstraction is precisely the sort of service we expect from the controller
18:52:01 the hadoop provider should focus on hadoop
18:52:11 not low-level connection details
18:52:15 jmaron, it seems to me that each plugin will end up with its own rest client implementation
18:52:40 each already has its own REST API
18:52:42 and will require its own set of utilities, which is not good
18:52:48 ruhe: unless the controller provides an HTTP client as a service
18:53:09 we want Savanna to keep only the API common to different plugins
18:53:31 looking at the cloudera rest api python client - it's a considerable amount of code
18:54:17 in our view, the plugin should only deal with hadoop provisioning and be abstracted from environmental concerns. any leakage of the environment into the plugin is going to make the whole thing very brittle
18:54:34 ruhe: not sure I understand. what does it matter how much code is in the cloudera python client?
18:56:26 yeah, that shouldn't matter... it will just be a dependency and not actually live within the savanna codebase.
18:57:40 we are simply proposing making access to provider apis easier for a savanna user by providing a savanna context root
18:57:48 it is not a dependency
18:58:32 all you would need to do is pass the request to the provider, and the provider would execute the rest call against the correct management server
18:59:03 well, if they plan on using their python sdk to interface with the rest api then it is... but that's a specific implementation detail
18:59:25 yes, it is really just an http call, right
18:59:56 yup
19:00:16 well, I think time's up..
19:00:27 Jon, actually commands passed over SSH will always be environment-dependent, even if we implement interactive execute inside Savanna
19:01:04 I mean, you will have a list of commands and env variables dependent on the OS you run on
19:01:24 that is really no different than providing the ability to copy files, is it?
19:01:37 in the broader case, you might even run on a non-bash shell
19:02:06 right. it seems to me you're making our argument...
19:02:16 the controller should abstract those details
19:02:23 and allow plugins to simply execute
19:02:51 otherwise every hadoop provider will need to deal with these VM provider level details
19:02:55 since it would be a mistake to have plugins assume bash, or ssh availability, or ftp availability
19:03:37 the plugins should request a service (e.g. execute command on host) and not have to worry about the execution details
19:04:49 agree, the plugin should not deal with OS details.
19:04:56 imagine an openstack deployment on windows....
19:06:30 the plugin should still work
19:06:58 yep, I too think that the controller should take care of that
19:06:59 ok, I guess we
19:07:00 but savanna would have an execution toolkit supporting the same functionality on windows
19:07:06 we're out of time
19:07:12 the vm plugin would know how to deal with windows in this case
19:08:02 "vm plugin" - VM provisioning (as opposed to the hadoop plugin) (just to be clear)
19:08:32 yes
19:08:47 it would know how to deal with the underlying vms
19:09:51 yeah, that will be something that needs to be built. that's a whole other topic
19:09:53 and the controller would still support "execute on host". the plugin would not know that the execution is occurring on a windows VM
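One possible shape for that "execute on host" service, as a sketch only: the interface keeps plugins transport- and OS-agnostic, while an SSH-backed implementation (here via paramiko, one option among many) hides the environmental details. All class names are hypothetical:

```python
import paramiko  # one possible transport; hidden behind the interface


class RemoteExecutor(object):
    """What a hadoop plugin sees: no OS, shell, or transport details."""

    def execute(self, host, command, responses=None):
        """Run `command` on `host`; `responses` answers interactive prompts."""
        raise NotImplementedError


class SshExecutor(RemoteExecutor):
    """Linux/SSH implementation; a WinRM-backed twin could serve the
    Windows deployment mentioned above without plugins noticing."""

    def execute(self, host, command, responses=None):
        client = paramiko.SSHClient()
        client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
        client.connect(host)
        stdin, stdout, stderr = client.exec_command(command)
        for answer in responses or []:
            stdin.write(answer + "\n")  # feed the interactive session
        stdin.flush()
        return stdout.read(), stderr.read()
```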
19:10:02 we only need to carefully pick the right tool for this task. do you have suggestions?
19:10:30 task?
19:11:03 providing OS-abstract functions such as install and execute
19:12:43 I have no suggestions off the top of my head. I'm just making the architectural argument that these abstractions are important to a resilient, successful software product
19:13:34 ok, I agree with your argument, just wondering what would be the right tool
19:14:43 ruhe: you mean like a cross-platform tool to do it?
19:15:30 yes. something like puppet or chef
19:15:39 but simpler :)
19:16:12 yeah, we also need to look at heat a little more.. maybe there's something we can leverage there.
19:16:54 anyway.. shall we end today's meeting... it just seems to be going into a long tail of conversations that can be carried over to #savanna
19:17:35 yep, let's continue the discussion in emails
19:18:23 #endmeeting