18:06:59 #startmeeting savanna
18:07:00 Meeting started Thu Aug 1 18:06:59 2013 UTC and is due to finish in 60 minutes. The chair is SergeyLukjanov. Information about MeetBot at http://wiki.debian.org/MeetBot.
18:07:01 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
18:07:03 The meeting name has been set to 'savanna'
18:07:21 #topic Agenda
18:07:21 #info News / updates
18:07:21 #info Action items from the last meeting
18:07:21 #info General discussion
18:07:28 #topic News / updates
18:07:52 who's around besides me and Sergey? ;)
18:08:00 i am
18:08:04 me too
18:08:05 #info Savanna 0.2.1 has been released with a bunch of bug fixes and improvements
18:08:10 me here
18:08:21 cool
18:08:24 #link http://lists.openstack.org/pipermail/openstack-dev/2013-July/012747.html
18:08:37 ^^ announcement of the 0.2.1 release with details
18:08:44 #link https://wiki.openstack.org/wiki/Savanna/ReleaseNotes/0.2.1
18:08:48 ^^ release notes
18:08:48 congrats on getting it out!
18:09:13 the HDP plugin has been postponed to be polished and released in the 0.2.2 version
18:09:30 I mean the HDP plugin release
18:10:19 #info active work has started on both the EDP and scaling components
18:10:44 #info we are now working on implementing the conductor abstraction
18:11:05 aignatov_, akuznetsov, are there any updates on the EDP side?
18:11:24 ok, my updates about EDP: I mostly worked this week on Oozie service integration into the vanilla plugin
18:11:30 and Nadya too, sorry
18:11:32 it's done i think
18:11:53 the first version of the REST API for EDP was merged to trunk
18:12:10 the Oozie integration code is already merged
18:12:32 I also added dib elements for the Oozie installation
18:12:42 this week I worked on a workflow.xml helper.
Initial version will be on review tomorrow
18:12:47 #link https://review.openstack.org/#/c/39671/
18:13:01 we also plan to interact with the cluster via Oozie and have created a simple REST client for Oozie
18:13:11 got a reasonable comment from matt already))
18:13:21 it is also merged to trunk
18:13:32 mattf, thx, I will do the wget and tar installation
18:13:43 also waiting for Ivan's comments
18:13:45 yeah, minor stuff. you should plow forward.
18:14:32 I also uploaded the compiled oozie.tar.gz library to our CDN
18:14:32 tmckayrh, do you have any updates?
18:14:57 there are some differences in workflow.xml between hive, pig and mapreduce jobs, so I think it would be useful to write helpers for all of these
18:15:12 it's here
18:15:16 #link http://a8e0dce84b3f00ed7910-a5806ff0396addabb148d230fde09b7b.r31.cf1.rackcdn.com/oozie-3.3.2.tar.gz
18:15:23 I am working on learning SqlAlchemy/Flask usage in savanna as a precursor to supporting Swift in the Job Source component
18:15:42 I added some notes to the Savanna EDP API draft etherpad yesterday.
18:16:02 There may be a mistake in the sequence diagrams having to do with job code retrieval
18:16:12 could someone post a link to the etherpad here?)
18:16:23 Also, I wonder if we should change terminology slightly
18:16:27 tmckayrh, we are now working on rewriting the db-related code to make it more consistent; btw, we'll help to port changes if you have questions
18:16:29 yes, hold on...
18:16:47 #link https://etherpad.openstack.org/savanna_API_draft_EDP_extensions
18:17:19 tmckayrh, thx
18:17:45 <_crobertsrh> I'm working on the UI for Job Sources, commenting out the actual API calls until they are ready to go. It seems to be going well so far, but I have yet to bring anything very "dynamic" to the UI.
18:18:19 tmckayrh, if you have questions about swift integration please post them in irc. I have dealt with swift in savanna
18:18:52 tmckayrh, _crobertsrh, folks, I remember that you had several concerns yesterday?
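[Editor's sketch] The workflow.xml helper discussed above might look something like the following. This is purely illustrative — the names and structure are assumptions, not the helper actually under review — but it shows the point made at 18:14:57: the Oozie action element differs per job type (pig, hive, map-reduce) while the surrounding workflow-app boilerplate stays the same.

```python
import xml.etree.ElementTree as ET

# Hypothetical sketch, NOT the actual Savanna helper.
# Oozie job types map to different action element tags.
ACTION_TAGS = {'pig': 'pig', 'hive': 'hive', 'mapreduce': 'map-reduce'}


def build_workflow_xml(job_type, script_path=None):
    """Return a minimal Oozie workflow.xml string for the given job type."""
    if job_type not in ACTION_TAGS:
        raise ValueError('unsupported job type: %s' % job_type)
    app = ET.Element('workflow-app', {'name': 'job-wf',
                                      'xmlns': 'uri:oozie:workflow:0.2'})
    ET.SubElement(app, 'start', {'to': 'job-node'})
    action = ET.SubElement(app, 'action', {'name': 'job-node'})
    body = ET.SubElement(action, ACTION_TAGS[job_type])
    # common cluster parameters, resolved by Oozie at run time
    ET.SubElement(body, 'job-tracker').text = '${jobTracker}'
    ET.SubElement(body, 'name-node').text = '${nameNode}'
    if script_path and job_type in ('pig', 'hive'):
        # script-based jobs reference their script file; jar jobs would not
        ET.SubElement(body, 'script').text = script_path
    ET.SubElement(action, 'ok', {'to': 'end'})
    ET.SubElement(action, 'error', {'to': 'fail'})
    ET.SubElement(app, 'kill', {'name': 'fail'})
    ET.SubElement(app, 'end', {'name': 'end'})
    return ET.tostring(app, encoding='unicode')
```

A per-type helper like this keeps the hive/pig/mapreduce differences in one table instead of three near-duplicate templates.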
18:18:56 my terminology question is as follows: "Job Source" means a storage facility for jobs. But "source" often means "code" as well, so "job source" could be ambiguous.
18:19:09 I have a little question about that draft. I hope I'm not missing anything. You create a job execution by specifying a cluster ID, so why doesn't the GET method specify the used cluster?
18:19:30 about Job Source naming and scm
18:19:49 for pig and hive the source and the code will be the same
18:20:46 So this is where it starts to get confusing :) Suppose we have Pig and Hive jobs stored in a git repository
18:21:22 The "Job Source" is the git repository, as I see it. The Job Source component manages information about the git repository and how to access it.
18:21:35 Files in the git are "job source code"
18:21:43 benl__, do I understand you correctly that we need to have cluster_id in the job?
18:22:04 I meant in the job execution
18:22:22 benl__, thx, it's a good point
18:22:38 :)
18:22:39 So, for example, maybe "Job Source Component" could be "Job Depot Component" or similar.
18:22:40 I added the cluster_id to the job execution object
18:23:06 akuznetsov, thx
18:23:10 thx
18:24:16 SergeyLukjanov, the other issue we chatted about is that the draft API had "Hive" as a type in the Job Source Object example, but I changed it to "Swift". It seems to me that "type" for that object should be git, svn, mercurial, swift, hdfs, internal (for the savanna db), etc.
18:24:27 what do you think about "Job Origin"?
18:24:30 the type describes the storage facility, not what is stored there
18:24:46 I like Job Origin
18:24:51 <_crobertsrh> Job Origin works for me
18:25:15 akuznetsov, aignatov_, Nadya_?
18:25:47 #vote Rename "Job Source" to "Job Origin"
18:25:55 it's not working O_O
18:26:13 #startvote Rename "Job Source" to "Job Origin"
18:26:14 Unable to parse vote topic and options.
18:26:19 So would we rename "Job Source Component" to "Job Origin Component" as well as the classes defined there?
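[Editor's sketch] benl__'s question and akuznetsov's fix can be illustrated in miniature: once the job execution object carries a cluster_id, a GET on the execution can report which cluster ran the job. The field names below are hypothetical, not the draft API's exact schema.

```python
# Illustrative only: a job execution record after akuznetsov's change.
# Field names are assumptions, not the actual Savanna EDP draft schema.
job_execution = {
    'id': 'a1b2c3',
    'job_id': 'wordcount-1',
    'cluster_id': 'cluster-42',  # newly added: which cluster runs the job
    'status': 'PENDING',
}


def get_job_execution_cluster(execution):
    # With cluster_id stored on the execution, GET can return the used
    # cluster directly instead of leaving it unspecified.
    return execution['cluster_id']
```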
18:26:23 tmckayrh, possibly we should have two types in the Job Source component: one is the type of storage where the job is kept and the second is the type of the job, like Hive, Pig etc.
18:27:10 akuznetsov, could be. A single repo could store multiple types, though, so maybe a list? My git can contain anything :)
18:27:24 #startvote Rename "Job Source" to …? "Job Depot", "Job Origin"
18:27:25 Begin voting on: Rename "Job Source" to …? Valid vote options are , Job, Depot, Job, Origin, .
18:27:26 Vote using '#vote OPTION'. Only your last vote counts.
18:27:28 I'm thinking the facility type is helpful for finding plugins, etc.
18:27:32 oooops
18:27:35 #undo
18:27:36 Removing item from minutes:
18:27:50 #endvote
18:27:51 Voted on "Rename "Job Source" to …?" Results are
18:28:03 let's not use voting :)
18:28:11 what about Job Home?
18:28:12 * tmckayrh laughs
18:28:23 Home is also good.
18:28:31 #startvote Rename "Job Source" to …? JobDepot, JobOrigin, JobHome
18:28:32 Begin voting on: Rename "Job Source" to …? Valid vote options are JobDepot, JobOrigin, JobHome.
18:28:33 Vote using '#vote OPTION'. Only your last vote counts.
18:28:33 so, there are two things here: "Job Source", where the actual code resides, and "Job Storage", where we'll keep compiled jar files, Pig scripts, etc, etc
18:28:35 to me, origin is better
18:28:47 looks like it's working well
18:28:52 #vote JobOrigin
18:29:00 <_crobertsrh> #vote JobOrigin
18:29:02 #vote JobOrigin
18:29:04 #vote JobOrigin
18:29:07 #vote JobHome
18:29:19 #vote JobStorage
18:29:20 ruhe: JobStorage is not a valid option. Valid options are JobDepot, JobOrigin, JobHome.
18:29:24 ruhe, I'm just thinking about the same question)
18:29:25 #vote JobOrigin
18:29:26 I removed type from the Job Source object and added two fields: storage_type and job_type
18:29:35 #vote JobHome
18:29:51 #vote JobDepot
18:30:12 #endvote
18:30:13 Voted on "Rename "Job Source" to …?"
Results are
18:30:14 JobDepot (1): akuznetsov
18:30:15 JobOrigin (5): tmckayrh, NikitaKonovalov, dmitryme, _crobertsrh, SergeyLukjanov
18:30:16 JobHome (2): Nadya_, aignatov_
18:30:35 + JobStorage: ruhe
18:30:53 ok, let it be JobOrigin)
18:30:57 looks like we'll name it Job Origin :)
18:31:57 tmckayrh, what about SCM?
18:31:59 so please check my understanding. Based on ^^, the Job Origin Component will manage the registration of Job Origins. We can define Job Origins there. It can also interact with plugins for particular origin types so that raw, executable job script code can be retrieved from a particular Job Origin (which may be an scm storing jobs of multiple types like Pig, Hive, jar, oozie, etc)
18:32:42 benl__, do akuznetsov's changes resolve your question about cluster_id in the job execution?
18:33:07 yeah, looks good now :)
18:33:47 should JobOrigin manage different storages for the jobs, like Swift, Gluster, HDFS etc?
18:34:16 SergeyLukjanov, I was wondering yesterday if we should define each supported SCM as its own job origin type. So, rather than have a single type "SCM" we would have "git, mercurial, etc" as distinct job origin types.
18:34:51 ruhe, I thought that was the intention.
18:35:01 tmckayrh: +1
18:35:12 <_crobertsrh> I think that would make it easier UI-wise to know which fields need to be displayed for a given type. +1
18:35:21 I would prefer different types
18:35:39 dmitryme, tmckayrh, +1 for having different types
18:37:07 the second question: should it also support different build systems: mvn, gradle, ant, make … ?
18:37:21 ruhe_, lein...
18:37:40 I should update the sequence diagram to reflect this thinking: on execute() the Job Manager passes "job id" and "job origin id" to the Job Origin component, which then handles interacting with plugins to retrieve the actual code. Agreed?
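[Editor's sketch] The consensus above — each backend (git, mercurial, swift, hdfs, ...) is a distinct job origin type resolved to its own plugin, and on execute() the Job Origin component retrieves the code for the Job Manager — might look like this. All class and function names are hypothetical, not actual Savanna interfaces.

```python
# Hypothetical sketch of per-type origin plugins; NOT actual Savanna code.
# Distinct origin types (git, swift, ...) rather than one generic "scm" type,
# as agreed in the meeting; the type keys the plugin lookup.

class OriginPlugin(object):
    def fetch(self, origin, job_id):
        raise NotImplementedError


class GitOriginPlugin(OriginPlugin):
    def fetch(self, origin, job_id):
        # a real implementation would clone/pull and read the job file
        return 'code of %s from git repo %s' % (job_id, origin['url'])


class SwiftOriginPlugin(OriginPlugin):
    def fetch(self, origin, job_id):
        # a real implementation would GET the object from the container
        return 'code of %s from swift container %s' % (job_id,
                                                       origin['container'])


# registry keyed by distinct origin type
PLUGINS = {'git': GitOriginPlugin(), 'swift': SwiftOriginPlugin()}


def retrieve_job_code(origin, job_id):
    """What the Job Origin component would do when the Job Manager passes
    (job id, job origin id) on execute(): dispatch to the right plugin."""
    return PLUGINS[origin['type']].fetch(origin, job_id)
```

Distinct types also give the UI what _crobertsrh asked for: the type alone determines which fields (url vs. container, etc.) to display.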
18:38:35 tmckayrh, agreed
18:38:40 ruhe_, I think this support is valuable, but not for this development phase
18:38:40 +1
18:38:46 ruhe_, how does the Job Origin component interact with the build?
18:39:10 tmckayrh, we will process all pig, hive and jar jobs through oozie, right?
18:39:36 Nadya_, yes, we seemed to have consensus on that. I think it's a great idea.
18:40:02 Nadya_, hmm, that brings up a good question....
18:40:08 ruhe_, Job Origin can download job code for execution from a build server
18:40:14 tmckayrh, yes, great:) just to clarify
18:40:35 If we autowrap scripts in Oozie, does that happen before the job is stored, or after we retrieve it but before it is submitted to the cluster for execution?
18:41:10 Maybe it's better to wrap on the fly, so that a Hive job can still be a Hive job in case the external storage is used by something other than savanna
18:41:24 in other words, wrap it late, just before submission
18:41:27 tmckayrh, I'm working on this now. For each job we will create its own workflow.xml
18:41:38 i agree with this idea
18:41:45 tmckayrh, it will happen after downloading the job code and before job execution
18:41:46 on the fly, I mean
18:41:50 tmckayrh, on the fly, yes
18:41:52 +1
18:42:15 tmckayrh, akuznetsov, just to clarify: JobOrigin doesn't deal with java source code, or any other type of source code. It just works with compiled binaries, right?
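[Editor's sketch] The "wrap late" decision reduces to a three-step pipeline: fetch the raw code from the Job Origin, generate the per-job workflow.xml on the fly, then submit to Oozie. The function names and stand-in callables below are hypothetical; they only demonstrate the agreed ordering.

```python
# Hypothetical sketch of the "wrap late" flow; NOT actual Savanna code.

def run_job(job, origin, cluster, fetch_code, wrap_in_workflow, oozie_submit):
    """Execute a job: retrieve raw code, wrap it on the fly, then submit.

    Wrapping happens after download and just before submission, so a Hive
    script kept in external storage remains a plain Hive script there.
    """
    raw_code = fetch_code(origin, job['id'])            # 1. from Job Origin
    workflow = wrap_in_workflow(job['type'], raw_code)  # 2. per-job workflow.xml
    return oozie_submit(cluster, workflow)              # 3. hand off to Oozie


# tiny stand-ins that record the call order
trace = []
result = run_job(
    {'id': 'j1', 'type': 'hive'}, {'type': 'swift'}, 'cluster-1',
    fetch_code=lambda o, j: trace.append('fetch') or 'SELECT 1;',
    wrap_in_workflow=lambda t, c: trace.append('wrap') or '<workflow-app/>',
    oozie_submit=lambda cl, wf: trace.append('submit') or 'oozie-job-id',
)
```

Because the wrapper is applied only at submission time, the end user never has to know about Oozie, as noted in the discussion.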
18:42:35 I don't know the answer to that :)
18:42:36 in this case the end user will not need to know about oozie
18:43:26 I was thinking of it as uncompiled code, I admit
18:43:32 ruhe_, +1
18:43:55 ruhe_, yes, not in this phase; possibly in the future we'll add this functionality
18:44:04 compilation from source code is a nice feature, but it should be dealt with by another component
18:44:13 ruhe_, i also agree with you, it's too complex to build code at this stage of EDP development
18:44:25 I think there should be a separate component for job sources (binaries, I mean)
18:44:50 * tmckayrh has been writing Python too long
18:45:22 So the user must compile against machines similar to the ones in the cluster?
18:45:35 ruhe_, also, in the case of pig and hive we don't need compilation
18:45:54 i think we can create job compilation and building on top of a well-designed JobOrigin - the component which stores binaries and pig scripts, not code, for now
18:46:18 benl__, jars are not OS-dependent
18:46:44 akuznetsov, sometimes we do need it; actually, in most cases people use UDFs for Pig scripts. A UDF is written in java and compiled to a jar
18:46:52 but that is true in the case of C jobs
18:47:10 Yeah, I was referring to C jobs
18:47:12 yes, benl, pig and hive scripts are also translated to hadoop jobs with known structures
18:47:16 benl_, SergeyLukjanov, what about the java version?
18:47:32 benl__, in most cases the user needs to compile java, and since it is a cross-platform language it is not a problem to build a jar on windows and run it on linux
18:47:52 dmitryme, benl__, why would someone write jobs in C?
just wondering
18:48:03 Nadya_: I think with Java it is simpler: you just need to find a JDK of the given version
18:48:22 benl__, dmitryme, this is not a typical use case for Hadoop
18:48:28 ruhe_: I don't know, but there is Hadoop Pipes for some reason
18:48:28 I was thinking of "streaming" jobs
18:48:44 let's start with the pre-compiled jobs
18:48:51 I guess for faster execution
18:49:14 it's ok for the 0.3 version to postpone source compilation etc.
18:49:22 benl_, for the streaming API users often use a scripting language like python or perl
18:50:30 Okay, thanks
18:51:32 are there any other questions to discuss or concerns around the EDP topic?
18:51:53 okay, so Job Origins will only store "binary" jobs which can be wrapped in Oozie for now (not that I really need to care about that, I think, in implementing the API)
18:52:17 tmckayrh, I think it'll be ok for now
18:52:34 okay, thanks. Very helpful meeting for me!
18:52:39 +1 tmckayrh
18:52:44 agreed
18:52:49 we should design "source" jobs management in the future
18:53:08 do we have a BP for the EDP dashboard?
18:53:23 maybe we should add a blueprint for "source" management, too
18:53:27 I think not
18:53:34 ruhe_, you mean horizon?
18:53:37 (to ruhe)
18:53:40 yes
18:54:02 tmckayrh, please create a new one
18:54:03 not yet
18:54:05 tmckayrh, yes, it'll be good to create a bp for it
18:54:12 _crobertsrh, would you do that?
18:54:16 okay
18:54:49 #action tmckayrh will make a blueprint for an uncompiled source code management component
18:55:38 #action _crobertsrh create a blueprint for the EDP Horizon dashboard (since Chad is working on the UI part)
18:55:41 I think we lost crobertsrh but we can assign it
18:56:21 I think that's all around EDP
18:56:30 #topic Action items from the last meeting
18:56:37 there are two action items
18:56:43 Sergey to create an issue to cover image creation for the HDP plugin
18:56:43 aignatov to create a blueprint for ubuntu packaging
18:56:48 done
18:56:53 #link https://bugs.launchpad.net/savanna/+bug/1206249
18:56:58 #link https://blueprints.launchpad.net/savanna/+spec/savanna-in-ubuntu
18:57:20 and we have several minutes before the end of the meeting :)
18:57:25 #topic General discussion
18:57:28 can I still add actions?
18:57:34 sure
18:57:35 yep, sure
18:58:16 #action tmckayrh will update the sequence diagrams and etherpad to show the proper flow and the rename of Job Source Component to Job Origin Component
18:58:26 I suppose the blueprint name needs to change too?
18:58:31 Is that easy to do?
18:58:43 it's easy
18:59:03 but you might not have permissions, i'm not sure
18:59:21 that's what I was wondering.
18:59:40 tmckayrh, just ping me if you have any questions
18:59:43 If it's alright with everyone I'd like to have a go at the PKI bp. I assume you expect it to be used as middleware in the same way it's done in Swift?
18:59:54 tmckayrh, I have permissions; you can assign this action to me
19:00:00 okay
19:00:23 #action akuznetsov will rename the Job Source Component to Job Origin Component
19:00:25 thanks!
19:00:34 benl__, PKI tokens are not working with Savanna now
19:00:37 oops, I meant in the blueprint
19:00:52 benl__, we didn't dig into it
19:01:05 benl__, feel free to investigate/fix it
19:01:41 we're out of time for today...
19:01:47 Alright
19:01:49 let's move to the #savanna channel
19:01:57 thanks folks!
19:02:00 #info JFYI you can always use the openstack-dev@lists.openstack.org mailing list and the #savanna irc channel to find us and ask your questions
19:02:08 #endmeeting