18:06:59 <SergeyLukjanov> #startmeeting savanna
18:07:00 <openstack> Meeting started Thu Aug  1 18:06:59 2013 UTC and is due to finish in 60 minutes.  The chair is SergeyLukjanov. Information about MeetBot at http://wiki.debian.org/MeetBot.
18:07:01 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
18:07:03 <openstack> The meeting name has been set to 'savanna'
18:07:21 <SergeyLukjanov> #topic Agenda
18:07:21 <SergeyLukjanov> #info News / updates
18:07:21 <SergeyLukjanov> #info Action items from the last meeting
18:07:21 <SergeyLukjanov> #info General discussion
18:07:28 <SergeyLukjanov> #topic News / updates
18:07:52 <aignatov_> who's around besides me and Sergey? ;)
18:08:00 <mattf> i am
18:08:04 <tmckayrh> me too
18:08:05 <SergeyLukjanov> #info Savanna 0.2.1 has been released with a bunch of bug fixes and improvements
18:08:10 <dmitryme> me here
18:08:21 <aignatov_> cool
18:08:24 <SergeyLukjanov> #link http://lists.openstack.org/pipermail/openstack-dev/2013-July/012747.html
18:08:37 <SergeyLukjanov> ^^ announcement of 0.2.1 release with details
18:08:44 <SergeyLukjanov> #link https://wiki.openstack.org/wiki/Savanna/ReleaseNotes/0.2.1
18:08:48 <SergeyLukjanov> ^^ release notes
18:08:48 <mattf> congrats on getting it out!
18:09:13 <SergeyLukjanov> the HDP plugin release has been postponed so that it can be polished and shipped in 0.2.2
18:10:19 <SergeyLukjanov> #info active work has started on both the EDP and scaling components
18:10:44 <SergeyLukjanov> #info we are now working on implementing the conductor abstraction
18:11:05 <SergeyLukjanov> aignatov_, akuznetsov, are there any updates on EDP side?
18:11:24 <aignatov_> ok, my updates about EDP. I mostly worked this week on Oozie service integration into the vanilla plugin
18:11:30 <SergeyLukjanov> and Nadya too, sorry
18:11:32 <aignatov_> it's done i think
18:11:53 <akuznetsov> the first version of the REST API for EDP was merged to trunk
18:12:10 <aignatov_> Oozie integration code is already merged
18:12:32 <aignatov_> also I posted dib elements for the oozie installation
18:12:42 <Nadya_> this week I worked on the workflow.xml helper. The initial version will be up for review tomorrow
18:12:47 <aignatov_> #link https://review.openstack.org/#/c/39671/
18:13:01 <akuznetsov> also we plan to interact with the cluster via Oozie, and have created a simple REST client for it
18:13:11 <aignatov_> got a reasonable comment from matt already :)
18:13:21 <akuznetsov> it is also merged to trunk
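The "simple REST client for Oozie" mentioned above boils down to posting a Hadoop-style properties document to Oozie's jobs endpoint. Here is a minimal sketch in Python, assuming Oozie's standard v1 REST API; the function names are illustrative, not Savanna's actual client:

```python
import json
import urllib.request
import xml.sax.saxutils as su


def build_oozie_conf(props):
    """Build the XML <configuration> document that Oozie's REST API
    expects in the body of POST /oozie/v1/jobs.

    `props` is a dict of Hadoop/Oozie properties, e.g.
    {"user.name": "hadoop", "oozie.wf.application.path": "hdfs://..."}.
    """
    parts = ["<configuration>"]
    for name, value in sorted(props.items()):
        parts.append(
            "<property><name>%s</name><value>%s</value></property>"
            % (su.escape(name), su.escape(str(value))))
    parts.append("</configuration>")
    return "".join(parts)


def submit_workflow(oozie_url, props):
    """Submit (but don't start) a workflow; returns the Oozie job id.

    Uses only the stdlib; a real client would add auth, timeouts and
    error handling.
    """
    req = urllib.request.Request(
        oozie_url.rstrip("/") + "/v1/jobs",
        data=build_oozie_conf(props).encode("utf-8"),
        headers={"Content-Type": "application/xml;charset=UTF-8"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))["id"]
```

Starting the submitted job would then be a separate PUT to `/v1/job/<id>?action=start`, per the same API.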
18:13:32 <aignatov_> mattf, thx, I will do wget and tar installation
18:13:43 <aignatov_> also waiting for Ivan's comments
18:13:45 <mattf> yeah, minor stuff. you should plow forward.
18:14:32 <aignatov_> also I uploaded the compiled oozie.tar.gz to our CDN
18:14:32 <SergeyLukjanov> tmckayrh, do you have any updates?
18:14:57 <Nadya_> there are some differences in workflow.xml between hive, pig and mapreduce jobs, so I think it would be useful to write helpers for all of them
18:15:12 <aignatov_> it's here
18:15:16 <aignatov_> #link http://a8e0dce84b3f00ed7910-a5806ff0396addabb148d230fde09b7b.r31.cf1.rackcdn.com/oozie-3.3.2.tar.gz
18:15:23 <tmckayrh> I am working on learning SQLAlchemy/Flask usage in savanna as a precursor to supporting Swift in the Job Source component
18:15:42 <tmckayrh> I added some notes to the Savanna EDP api draft etherpad yesterday.
18:16:02 <tmckayrh> There may be a mistake in the sequence diagrams having to do with job code retrieval
18:16:12 <aignatov_> could someone post a link here to etherpad?)
18:16:23 <tmckayrh> Also, I wonder if we should change terminology slightly
18:16:27 <SergeyLukjanov> tmckayrh, we are now working on rewriting the db-related code to make it more consistent; btw we'll help port your changes if you have questions
18:16:29 <tmckayrh> yes, hold on...
18:16:47 <tmckayrh> #link https://etherpad.openstack.org/savanna_API_draft_EDP_extensions
18:17:19 <aignatov_> tmckayrh, thx
18:17:45 <_crobertsrh> I'm working on the UI for Job Sources, commenting out the actual api calls until they are ready to go. It seems to be going well so far, but I have yet to bring anything very "dynamic" to the UI
18:18:19 <Nadya_> tmckayrh, if you have questions about swift integration please ask in irc. I've dealt with swift in savanna
18:18:52 <SergeyLukjanov> tmckayrh, _crobertsrh, folks, I remember you had several concerns yesterday?
18:18:56 <tmckayrh> my terminology question is as follows:  "Job Source" means a storage facility for jobs.  But "source" often means "code" as well.  So "job source" could be ambiguous.
18:19:09 <benl__> I have a little question about that draft. I hope I'm not missing anything. You create a job execution by specifying a cluster ID, so why doesn't the GET method return the cluster that was used?
18:19:30 <SergeyLukjanov> about Job Source naming and scm
18:19:49 <akuznetsov> for pig and hive, source and code will be the same
18:20:46 <tmckayrh> So this is where it starts to get confusing :)   Suppose we have Pig and Hive jobs stored in a git repository
18:21:22 <tmckayrh> The "Job Source" is the git, as I see it.  The Job Source component manages information about the git repository and how to access it.
18:21:35 <tmckayrh> Files in the git are "job source code"
18:21:43 <SergeyLukjanov> benl__, do I understand correctly that we need to have cluster_id in the job?
18:22:04 <benl__> I meant job execution
18:22:22 <SergeyLukjanov> benl__, thx, it's a good point
18:22:38 <benl__> :)
18:22:39 <tmckayrh> So for example, maybe "Job Source Component" could be "Job Depot Component" or similar.
18:22:40 <akuznetsov> I added the cluster_id to the job execution object
18:23:06 <SergeyLukjanov> akuznetsov, thx
18:23:10 <aignatov_> thx
18:24:16 <tmckayrh> SergeyLukjanov, the other issue we chatted about is that the draft API had "Hive" as a type in the Job Source Object example, but I changed it to "Swift".  It seems to me that "type" for that object should be git, svn, mercurial, swift, hdfs, internal (for the savanna db), etc.
18:24:27 <SergeyLukjanov> what do you think about "Job Origin"?
18:24:30 <tmckayrh> type describes the storage facility, not what is stored there
18:24:46 <tmckayrh> I like Job Origin
18:24:51 <_crobertsrh> Job Origin works for me
18:25:15 <SergeyLukjanov> akuznetsov, aignatov_, Nadya_?
18:25:47 <SergeyLukjanov> #vote Rename "Job Source" to "Job Origin"
18:25:55 <SergeyLukjanov> it's not working O_O
18:26:13 <SergeyLukjanov> #startvote Rename "Job Source" to "Job Origin"
18:26:14 <openstack> Unable to parse vote topic and options.
18:26:19 <tmckayrh> So would we rename "Job Source Component" to "Job Origin Component" as well as the classes defined there?
18:26:23 <akuznetsov> tmckayrh, possibly we should have two types in the Job Source component: one for where the job is stored, and a second for the type of job, like Hive, Pig, etc.
18:27:10 <tmckayrh> akuznetsov, could be.  A single repo could store multiple types, though, so maybe a list?  My git can contain anything :)
18:27:24 <SergeyLukjanov> #startvote Rename "Job Source" to …? "Job Depot", "Job Origin"
18:27:25 <openstack> Begin voting on: Rename "Job Source" to …? Valid vote options are , Job, Depot, Job, Origin, .
18:27:26 <openstack> Vote using '#vote OPTION'. Only your last vote counts.
18:27:28 <tmckayrh> I'm thinking the facility type is helpful for finding plugins, etc.
18:27:32 <SergeyLukjanov> oooops
18:27:35 <SergeyLukjanov> #undo
18:27:36 <openstack> Removing item from minutes: <ircmeeting.items.Link object at 0x22d9110>
18:27:50 <SergeyLukjanov> #endvote
18:27:51 <openstack> Voted on "Rename "Job Source" to …?" Results are
18:28:03 <SergeyLukjanov> let's not use voting :)
18:28:11 <Nadya_> what about Job Home?
18:28:12 * tmckayrh laughs
18:28:23 <tmckayrh> Home is also good.
18:28:31 <SergeyLukjanov> #startvote Rename "Job Source" to …? JobDepot, JobOrigin, JobHome
18:28:32 <openstack> Begin voting on: Rename "Job Source" to …? Valid vote options are JobDepot, JobOrigin, JobHome.
18:28:33 <openstack> Vote using '#vote OPTION'. Only your last vote counts.
18:28:33 <ruhe> so, there are two things here: "Job Source", where the actual code resides, and "Job Storage", where we'll keep compiled jar files, Pig scripts, etc.
18:28:35 <dmitryme> to me, origin is better
18:28:47 <SergeyLukjanov> looks like it's working well
18:28:52 <dmitryme> #vote JobOrigin
18:29:00 <_crobertsrh> #vote JobOrigin
18:29:02 <NikitaKonovalov> #vote JobOrigin
18:29:04 <SergeyLukjanov> #vote JobOrigin
18:29:07 <Nadya_> #vote JobHome
18:29:19 <ruhe> #vote JobStorage
18:29:20 <openstack> ruhe: JobStorage is not a valid option. Valid options are JobDepot, JobOrigin, JobHome.
18:29:24 <aignatov_> ruhe, I'm just thinking about the same question)
18:29:25 <tmckayrh> #vote JobOrigin
18:29:26 <akuznetsov> I removed type from the Job Source object and added two fields: storage_type and job_type
18:29:35 <aignatov_> #vote JobHome
18:29:51 <akuznetsov> #vote JobDepot
18:30:12 <SergeyLukjanov> #endvote
18:30:13 <openstack> Voted on "Rename "Job Source" to …?" Results are
18:30:14 <openstack> JobDepot (1): akuznetsov
18:30:15 <openstack> JobOrigin (5): tmckayrh, NikitaKonovalov, dmitryme, _crobertsrh, SergeyLukjanov
18:30:16 <openstack> JobHome (2): Nadya_, aignatov_
18:30:35 <SergeyLukjanov> + JobStorage: ruhe
18:30:53 <aignatov_> ok, let it be JobOrigin)
18:30:57 <SergeyLukjanov> looks like we name it Job Origin :)
18:31:57 <SergeyLukjanov> tmckayrh, what about SCM?
18:31:59 <tmckayrh> so please check my understanding. Based on ^^, the Job Origin Component will manage the registration of Job Origins. We can define Job Origins there. It can also interact with plugins for particular origin types, so that raw, executable job script code can be retrieved from a particular Job Origin (which may be an SCM storing jobs of multiple types like Pig, Hive, jar, oozie, etc.)
18:32:42 <aignatov_> benl__, do akuznetsov's changes resolve your question about cluster_id in job execution?
18:33:07 <benl__> yeah, looks good now :)
18:33:47 <ruhe> should JobOrigin manage different storages for the jobs, like Swift, Gluster, HDFS etc?
18:34:16 <tmckayrh> SergeyLukjanov, I was wondering yesterday if we should define each supported SCM as its own job origin type. So, rather than have a single type "SCM" we would have "git, mercurial, etc" as distinct job origin types.
18:34:51 <tmckayrh> ruhe, I thought that was the intention.
18:35:01 <dmitryme> tmckayrh: +1
18:35:12 <_crobertsrh> I think that would make it easier UI-wise to know which fields need to be displayed for a given type.  +1
18:35:21 <dmitryme> I would prefer different types
18:35:39 <SergeyLukjanov> dmitryme, tmckayrh, +1 for having different types
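The per-type idea agreed above could be realized as a small plugin registry, with one plugin class per origin type instead of a single generic "SCM" type. This is a hypothetical illustration of the design being discussed, not Savanna code; all names are made up:

```python
# Registry mapping an origin type name ("git", "swift", ...) to the
# plugin class that knows how to fetch job code from that storage kind.
ORIGIN_PLUGINS = {}


def register_origin(origin_type):
    """Class decorator that registers a plugin for one origin type."""
    def wrap(cls):
        ORIGIN_PLUGINS[origin_type] = cls
        return cls
    return wrap


@register_origin("git")
class GitOrigin:
    def __init__(self, url):
        self.url = url

    def fetch(self, path):
        # would clone/archive the repo and return the file contents
        raise NotImplementedError


@register_origin("swift")
class SwiftOrigin:
    def __init__(self, container):
        self.container = container

    def fetch(self, path):
        # would download the object via the Swift API
        raise NotImplementedError


def get_origin(origin_type, **kwargs):
    """Instantiate the plugin for an origin type; KeyError if unsupported."""
    return ORIGIN_PLUGINS[origin_type](**kwargs)
```

Distinct types also give the UI exactly what _crobertsrh notes below: each plugin can declare which fields (url, container, credentials, ...) its registration form needs.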
18:37:07 <ruhe_> the second question: should it also support different build systems: mvn, gradle, ant, make … ?
18:37:21 <SergeyLukjanov> ruhe_, lein...
18:37:40 <tmckayrh> I should update the sequence diagram to reflect this thinking: on execute(), the Job Manager passes "job id" and "job origin id" to the Job Origin component, which then handles interacting with plugins to retrieve the actual code. Agreed?
18:38:35 <ruhe_> tmckayrh, agree
18:38:40 <akuznetsov> ruhe_, I think this support is valuable, but not for this development phase
18:38:40 <aignatov_> +1
18:38:46 <tmckayrh> ruhe_, how does the Job Origin component interact with the build system?
18:39:10 <Nadya_> tmckayrh, we will process all pig, hive, and jar jobs through oozie, right?
18:39:36 <tmckayrh> Nadya_, yes, we seemed to have consensus on that.  I think it's a great idea.
18:40:02 <tmckayrh> Nadya_, hmm, that brings up a good question....
18:40:08 <akuznetsov> ruhe_, Job Origin can download job code for execution from the build server
18:40:14 <Nadya_> tmckayrh, yes, great:) just to clarify
18:40:35 <tmckayrh> If we autowrap scripts in Oozie, does that happen before the job is stored, or after we retrieve it but before it is submitted to the cluster for execution?
18:41:10 <tmckayrh> Maybe it's better to wrap on the fly, so that a Hive job can still be a Hive job in case external storage is used by something other than savanna
18:41:24 <tmckayrh> in other words, wrap it late, just before submission
18:41:27 <Nadya_> tmckayrh, I'm working on this now. For each job we will create its own workflow.xml
18:41:38 <aignatov_> i agree with this idea
18:41:45 <akuznetsov> tmckayrh, it will happen after downloading the job code and before job execution
18:41:46 <aignatov_> on the fly I mean
18:41:50 <Nadya_> tmckayrh, on the fly, yes
18:41:52 <tmckayrh> +1
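Wrapping "on the fly" means Savanna would generate a workflow.xml around the raw script just before submission, so a Hive or Pig job stays a plain script in storage. A rough sketch of such a helper for a Pig job, following Oozie's standard workflow-app schema; the function name and layout are illustrative, not the helper Nadya_ is writing:

```python
def wrap_pig_job(job_name, script_name):
    """Return a minimal Oozie workflow.xml wrapping a single Pig script.

    Generated at submission time, never stored alongside the script.
    A real helper would escape the inputs and emit hive/map-reduce
    actions for the other job types.
    """
    return """<workflow-app xmlns="uri:oozie:workflow:0.2" name="%(name)s">
  <start to="pig-node"/>
  <action name="pig-node">
    <pig>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <script>%(script)s</script>
    </pig>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>Pig job failed</message>
  </kill>
  <end name="end"/>
</workflow-app>
""" % {"name": job_name, "script": script_name}
```

The `${jobTracker}` and `${nameNode}` placeholders are resolved by Oozie from the job properties supplied at submission, so the same wrapper works on any cluster.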
18:42:15 <ruhe_> tmckayrh, akuznetsov, just to clarify: JobOrigin doesn't deal with java source code, or any other type of source code. It just works with compiled binaries, right?
18:42:35 <tmckayrh> I don't know the answer to that :)
18:42:36 <akuznetsov> in this case the end user will not need to know about oozie
18:43:26 <tmckayrh> I was thinking of it as uncompiled code, I admit
18:43:32 <Nadya_> ruhe_, +1
18:43:55 <akuznetsov> ruhe_, yes, not in this phase; possibly in the future we'll add this functionality
18:44:04 <ruhe_> compilation from source code is a nice feature, but it should be dealt with by another component
18:44:13 <aignatov_> ruhe_, I also agree with you; it's too complex to build code at this stage of EDP development
18:44:25 <Nadya_> I think there should be a separate component for job sources (binaries I mean)
18:44:50 * tmckayrh has been writing Python too long
18:45:22 <benl__> So the user must compile against machines similar to the ones in the cluster?
18:45:35 <akuznetsov> ruhe_, also, in the case of pig and hive we don't need compilation
18:45:54 <aignatov_> I think we can create job compilation and building on top of a well-designed JobOrigin: the component which stores binaries and pig scripts, not source code, for now
18:46:18 <SergeyLukjanov> benl__, jars are not OS-dependent
18:46:44 <ruhe_> akuznetsov, sometimes we do need it; actually in most cases people use UDFs in Pig scripts. A UDF is written in java and compiled to a jar
18:46:52 <dmitryme> but that is true in the case of C jobs
18:47:10 <benl__> Yeah, I was referring to C jobs
18:47:12 <aignatov_> yes, benl__, pig and hive scripts are also translated to hadoop jobs with known structures
18:47:16 <Nadya_> benl__, SergeyLukjanov, what about the java version?
18:47:32 <akuznetsov> benl__, in most cases the user needs to compile java, and it is a cross-platform language, so it is not a problem to build a jar on windows and run it on linux
18:47:52 <ruhe_> dmitryme, benl__, why would someone write jobs in C? just wondering
18:48:03 <dmitryme> Nadya_: I think with Java it is simpler: you just need to find JDK of the given version
18:48:22 <akuznetsov> benl__, dmitryme, this is not a typical use case for Hadoop
18:48:28 <dmitryme> ruhe_: I don't know, but there is Hadoop Pipes for some reason here
18:48:28 <benl__> I was thinking of "streaming" jobs
18:48:44 <SergeyLukjanov> let's start from the pre-compiled jobs
18:48:51 <dmitryme> I guess for faster execution
18:49:14 <SergeyLukjanov> it's ok for the 0.3 version to postpone source compilation etc.
18:49:22 <akuznetsov> benl__, for the streaming API users often use a scripting language like python or perl
18:50:30 <benl__> Okay, thanks
18:51:32 <SergeyLukjanov> are there any other questions to discuss or concerns around the EDP topic?
18:51:53 <tmckayrh> okay, so Job Origins will only store "binary" jobs which can be wrapped in Oozie for now (not that I really need to care about that, I think, in implementing the API)
18:52:17 <SergeyLukjanov> tmckayrh, I think it'll be ok for now
18:52:34 <tmckayrh> okay, thanks.  Very helpful meeting for me!
18:52:39 <aignatov_> +1 tmckayrh
18:52:44 <Nadya_> agreed
18:52:49 <SergeyLukjanov> we should design "source" jobs management in the future
18:53:08 <ruhe_> do we have a BP for EDP dashboard?
18:53:23 <tmckayrh> maybe we should add a blueprint for "source" management, too
18:53:27 <SergeyLukjanov> I think nope
18:53:34 <akuznetsov> ruhe_, you mean horizon?
18:53:37 <SergeyLukjanov> (to ruhe)
18:53:40 <ruhe_> yes
18:54:02 <aignatov_> tmckayrh, please create the new one
18:54:03 <akuznetsov> not yet
18:54:05 <SergeyLukjanov> tmckayrh, yes, it'll be good to create bp for it
18:54:12 <ruhe_> _crobertsrh, would you do that?
18:54:16 <tmckayrh> okay
18:54:49 <tmckayrh> #action tmckayrh will make a blueprint for an uncompiled source code management component
18:55:38 <ruhe_> #action _crobertsrh create blueprint for EDP Horizon dashboard (since Robert is working on UI part)
18:55:41 <tmckayrh> I think we lost crobertsrh but we can assign it
18:55:50 <ruhe_> sorry (*Chad)
18:56:21 <SergeyLukjanov> I think that's all around EDP
18:56:30 <SergeyLukjanov> #topic Action items from the last meeting
18:56:37 <SergeyLukjanov> there are two action items
18:56:43 <SergeyLukjanov> Sergey to create an issue to cover image creation for HDP plugin
18:56:43 <SergeyLukjanov> aignatov to create a blueprint for ubuntu packaging
18:56:48 <aignatov_> done
18:56:53 <SergeyLukjanov> #link https://bugs.launchpad.net/savanna/+bug/1206249
18:56:58 <SergeyLukjanov> #link https://blueprints.launchpad.net/savanna/+spec/savanna-in-ubuntu
18:57:20 <SergeyLukjanov> and we have several minutes before the end of meeting :)
18:57:25 <SergeyLukjanov> #topic General discussion
18:57:28 <tmckayrh> can I add actions still?
18:57:34 <aignatov_> sure
18:57:35 <SergeyLukjanov> yep, sure
18:58:16 <tmckayrh> #action tmckayrh will update sequence diagrams and etherpad to show proper flow and the rename of Job Source Component to Job Origin Component
18:58:26 <tmckayrh> I suppose the blueprint name needs to change too?
18:58:31 <tmckayrh> Is that easy to do?
18:58:43 <ruhe_> it's easy
18:59:03 <ruhe_> but you might not have permissions. i'm not sure
18:59:21 <tmckayrh> that's what I was wondering.
18:59:40 <SergeyLukjanov> tmckayrh, just ping me if you will have any questions
18:59:43 <benl__> If it's alright with everyone, I'd like to have a go at the PKI bp. I assume you expect it to be used as middleware in the same way it's done in Swift?
18:59:54 <akuznetsov> tmckayrh, I have the permissions, you can assign this action to me
19:00:00 <tmckayrh> okay
19:00:23 <tmckayrh> #action akuznetsov will rename the Job Source Component to Job Origin Component
19:00:25 <tmckayrh> thanks!
19:00:34 <SergeyLukjanov> benl__, PKI tokens are not working with Savanna right now
19:00:37 <tmckayrh> oops, I meant in the blueprint
19:00:52 <SergeyLukjanov> benl__, we haven't dug into it
19:01:05 <SergeyLukjanov> benl__, feel free to investigate/fix it
19:01:41 <SergeyLukjanov> we're out of time for today...
19:01:47 <benl__> Alright
19:01:49 <SergeyLukjanov> let's move to the #savanna channel
19:01:57 <SergeyLukjanov> thanks folks!
19:02:00 <SergeyLukjanov> #info JFYI you can always use the openstack-dev@lists.openstack.org mailing list and the #savanna irc channel to find us and ask your questions
19:02:08 <SergeyLukjanov> #endmeeting