18:00:30 <SergeyLukjanov> #startmeeting sahara
18:00:31 <openstack> Meeting started Thu Feb 18 18:00:30 2016 UTC and is due to finish in 60 minutes.  The chair is SergeyLukjanov. Information about MeetBot at http://wiki.debian.org/MeetBot.
18:00:32 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
18:00:34 <openstack> The meeting name has been set to 'sahara'
18:00:35 <huichun> 😀
18:00:36 <crobertsrh> hello/
18:00:38 <NikitaKonovalov> o/
18:00:45 <elmiko> heyo/
18:00:47 <esikachev> hi!
18:00:52 <vgridnev> hi
18:00:55 <mionkin> hello :)
18:01:00 <SergeyLukjanov> #link https://wiki.openstack.org/wiki/Meetings/SaharaAgenda
18:01:08 <SergeyLukjanov> #topic News / updates
18:01:16 <SergeyLukjanov> client release is on review
18:01:22 <apavlov> hi
18:01:59 <SergeyLukjanov> https://review.openstack.org/#/c/280893/
18:02:13 <vgridnev> SergeyLukjanov, are we going to have an additional one for health checks?
18:02:17 <crobertsrh> UI rework is still up for review.  As vgridnev pointed out this morning, the integration tests are broken in those patches.  I may need a bit of help to sort out what tweaks are required.
18:02:32 <SergeyLukjanov> vgridnev yup
18:02:39 <elmiko> i've been adding more to the api v2 work, and also looking into a few more security issues
18:02:55 <elmiko> speaking of which, SergeyLukjanov did you ever see that security bug i logged?
18:03:08 <vgridnev> SergeyLukjanov, great
18:03:18 <SergeyLukjanov> elmiko not yet :) could you please re-send the link?
18:03:25 <elmiko> yup, 1sec
18:03:52 <elmiko> SergeyLukjanov: https://bugs.launchpad.net/sahara/+bug/1541122
18:03:53 <openstack> elmiko: Error: malone bug 1541122 not found
18:04:26 <SergeyLukjanov> elmiko sad
18:04:42 <elmiko> SergeyLukjanov: what's sad is that i have a few more of these to post :/
18:04:51 <SergeyLukjanov> yup
18:05:10 <elmiko> but, we need to fix em =)
18:05:12 <SergeyLukjanov> I think we should fix it as part of adding a castellan-based secure store to sahara
18:05:28 <SergeyLukjanov> and ensure that all of this stuff is configurable through UI
18:05:31 <SergeyLukjanov> API
18:05:41 <SergeyLukjanov> to make users able to specify some password
18:05:49 <elmiko> interesting thought
18:06:06 <elmiko> and i agree, it will help operators to have more access to these values
18:06:47 <elmiko> but, to begin with, we can fix up their hardcoded usage
18:06:58 <SergeyLukjanov> agree
18:08:40 <SergeyLukjanov> any other news?
18:08:48 <SergeyLukjanov> #topic API v2 progress
18:08:54 <SergeyLukjanov> elmiko your turn :)
18:08:59 <SergeyLukjanov> #link https://review.openstack.org/#/c/273316/
18:09:03 <SergeyLukjanov> #link https://wiki.openstack.org/wiki/Sahara/api-v2
18:09:09 <elmiko> thanks
18:09:15 <elmiko> we need more reviews on the initial commit for v2
18:09:31 <elmiko> also, i am adding more items to the wiki
18:09:43 <elmiko> i have some local patches that depend on the initial commit, should i start posting these?
18:09:55 <elmiko> i thought maybe i should wait until the first merges
18:10:33 <elmiko> if there are no opinions, i'm just going to start posting them =D
18:11:23 <SergeyLukjanov> yeah, probably wait for the first merges
18:11:27 <SergeyLukjanov> to avoid rebase hell
18:11:35 <elmiko> yup, that was my thought too
18:11:49 <elmiko> so, everyone go review https://review.openstack.org/#/c/273316/ !!!
18:11:54 <elmiko> please =D
18:12:05 <elmiko> that's all from me
18:12:19 <SergeyLukjanov> elmiko thx!
18:12:27 <SergeyLukjanov> #topic Open discussion
18:12:52 <rickflare> good afternoon everyone
18:13:25 <rickflare> I wanted to simply start out by thanking everyone who has continued to provide support for me!
18:13:33 <vgridnev> Also I would appreciate it if someone would review the health checks stuff https://review.openstack.org/#/q/status:open++branch:master+topic:bp/cluster-verification
18:13:39 <huichun> vgridnev  thx for your helpful review comments about the is_engine_implement function, I never noticed it before, really wanted
18:13:54 <huichun> I will update suspend EDP patch
18:13:56 <vgridnev> huichun, np
18:14:09 <vgridnev> ok, thanks!
18:14:17 <rickflare> I have had several items that I feel could be very helpful to this project
18:14:17 <crobertsrh> vgridnev:  I will give them a review today or tomorrow at the latest.
18:15:05 <huichun> SergeyLukjanov:  I have an idea to replace the current Oozie engine with Luigi
18:15:23 <huichun> Guys, what do you think about Oozie?
18:16:08 <huichun> https://github.com/spotify/luigi
18:16:48 <huichun> Oozie needs tomcat, requires writing XML, and needs extra jar files for running jobs
18:17:11 <elmiko> huichun: i have not used luigi, but i don't see why we wouldn't consider a proposal to add it as well as the OozieEngine
18:17:27 <elmiko> oh, you want to replace with luigi?
18:17:41 <crobertsrh> I have not experienced Luigi either
18:18:06 <NikitaKonovalov> need to have a look at it
18:18:10 <elmiko> seems nice, and it's python 1
18:18:12 <elmiko> er +1
18:18:23 <huichun> elmiko:  you mean add Luigi as an EDP engine like Oozie?
18:18:28 <huichun> Luigi is Python
18:18:40 <elmiko> huichun: that's what i thought, but i missed that you said "replace" oozie
18:18:52 <NikitaKonovalov> huichun: which Job types are supported?
18:19:00 <huichun> Yes, my first thought is replacing
18:20:35 <elmiko> i don't have a strong opinion either way, but i can see a few courses of action:
18:20:37 <huichun> NikitaKonovalov:  Luigi supports everything Oozie can do: batch jobs and workflow management with dependency resolution
18:20:45 <elmiko> 1. write a spec, so we can debate on review
18:21:00 <elmiko> 2. make an option to allow either oozie or luigi, so we can keep backwards compat
18:21:10 <elmiko> 3. plan a future migration away from oozie, if we desire
18:21:32 <elmiko> does that sound reasonable?
18:21:46 <vgridnev> agreed with elmiko
18:21:46 <crobertsrh> +1 elmiko:  fine plan
18:22:05 <huichun> elmiko: yes, it's my original idea currently
18:22:12 <elmiko> huichun: great!
18:23:07 <huichun> elmiko:  I've just been working on the EDP engine parts in Sahara, did a lot of research on other workflow engines, and came up with this idea
18:23:26 <elmiko> cool, thanks for bringing it up =)
18:24:07 <elmiko> huichun: i think it's great if we can keep up to date with new technologies that might improve our experience.
18:24:07 <SergeyLukjanov> I've not seen luigi as well
18:24:10 * tosky reads Luigi
18:24:16 <tosky> what is this thing called like me?
18:24:22 <elmiko> tosky: https://github.com/spotify/luigi
18:24:28 <elmiko> hehe
18:24:30 <tosky> oh, thanks, I missed that
18:24:37 <elmiko> np
18:24:38 <crobertsrh> tosky, we plan to just have you launch jobs for all of our users.  We figured you have the spare time.
18:24:45 <elmiko> yup, what crobertsrh said
18:24:59 <elmiko> just tosky operating on 1000s of nodes, manually running jobs
18:25:09 <tosky> '-_-
18:25:14 <elmiko> LOL
18:25:15 <rickflare> so guys i have stood up a few clusters
18:25:21 <rickflare> close to 100 or so nodes
18:25:22 <huichun> lol
18:25:34 <rickflare> and one of the biggest issues I see is cluster persistence
18:26:12 <rickflare> when running hadoop it is not uncommon to have data nodes go down for a number of possible reasons
18:26:38 <rickflare> the horizon interface really needs a method to restart services
18:27:04 <rickflare> very much in the manner in which cloudera manager and custom dev ops tools would
18:27:21 <SergeyLukjanov> sorry folks, I need to go earlier today
18:27:24 <SergeyLukjanov> #chair elmiko
18:27:25 <openstack> Current chairs: SergeyLukjanov elmiko
18:27:36 <elmiko> i think this is an interesting idea, makes me wonder if we will have much overlap with reproducing cloudera manager functionality
18:27:40 <elmiko> SergeyLukjanov: no worries
18:27:53 <rickflare> elmiko you will somewhat
18:28:13 <elmiko> also, vgridnev, is this something that we might lead to eventually with the cluster health checks?
18:28:20 <crobertsrh> I think it might be a nice addition since we're already adding health checks, service restarts might be a nice thing to have
18:28:21 <rickflare> however for the vanilla distributions it would be awesome
18:28:28 <elmiko> rickflare: agreed
18:28:40 <rickflare> restarts would be amazing
18:28:42 <elmiko> vanilla is tough though, because we are the only support mechanism for it
18:28:55 <rickflare> Taz and I are willing to help with that
18:29:03 <elmiko> cool
18:29:08 <rickflare> we have tons and I do mean tons of experience managing hadoop
18:29:11 <crobertsrh> great
18:29:22 <elmiko> i'd say take a look at the cluster health check stuff and maybe propose a spec about doing service restarts
18:29:35 <rickflare> ok
18:29:37 <elmiko> we can certainly help fill in the details about the specifics of how sahara works
18:29:42 <elmiko> it sounds like a fine idea
18:29:50 <rickflare> this kind of leads into my second idea
18:29:57 <rickflare> that is centered around security
18:30:21 <vgridnev> elmiko, I think that we can do some kind of auto scaling ideas or / and some kind of restarts
18:30:36 <rickflare> some clusters are going to need updates, e.g. kernel patching, glibc patches, etc.
18:30:46 <elmiko> vgridnev: yea, i think there might be nice integration between health checks and service restarts
18:30:57 <rickflare> I would like to propose a hardened version of some of the images
18:31:07 <elmiko> rickflare: sounds great
18:31:16 <rickflare> like a vanilla that uses hadoop with kerberos
18:31:18 <elmiko> like, building on the centos images?
18:31:25 <elmiko> ooh, my favorite topic
18:31:27 <rickflare> elmiko exactly
18:31:33 <elmiko> we have talked about kerb integration before
18:31:54 <elmiko> i have a document i should share with you, we had a big session about it in vancouver (i think, or was that paris)
18:32:06 <rickflare> i'd like to see openscap integrated and perhaps even drop devops tools like saltstack or puppet into the clusters
18:32:10 <elmiko> there are several questions surrounding kerb integration though
18:32:22 <elmiko> that's an interesting thought
18:32:27 <rickflare> this way if users need to customize anything in the cluster they can
18:32:34 <elmiko> hmm
18:32:51 <elmiko> you may want to propose this as an idea on the ML, it's a big topic
18:32:52 <rickflare> for us the biggest hurdle for pushing sahara will be our ability to control and push security
18:32:57 <elmiko> right
18:32:58 <rickflare> i know
18:33:23 <elmiko> but, if you create your own images, you could certainly run puppet outside the cluster and update images in the cluster with it, no?
18:33:24 <rickflare> and I figured I'd start here because you guys have been so receptive
18:33:44 <rickflare> yes you can
18:33:48 <rickflare> and we would be fine with that
18:33:54 <elmiko> but...
18:33:56 <elmiko> ;)
18:34:06 <rickflare> but customizing the images is really what we are aiming for
18:34:14 <elmiko> hmm
18:34:14 <rickflare> at least providing more secure instances
18:34:20 <rickflare> than what we have now
18:34:22 <elmiko> look at the work done on the image validation specs
18:34:23 <rickflare> like using ssl
18:34:30 <rickflare> within hadoop
18:34:33 <rickflare> etc
18:34:38 <elmiko> we are in the process of improving how we create images and deploy them
18:34:39 <rickflare> for the status pages
18:34:42 <elmiko> you may find it interesting
18:34:53 <rickflare> i've been looking at sahara image elements
18:34:58 <elmiko> also, for ssl/kerb within the cluster we have a few options
18:35:00 <crobertsrh> Yeah, the new image creation bits might simplify things a bit
18:35:03 <rickflare> if that is what you are referring to
18:35:26 <elmiko> we could do something like adding a KDC such as dogtag/ipa into the cluster and allow it to handle all kerb and tls stuff
18:35:34 <rickflare> YES
18:35:37 <rickflare> freeipa
18:35:40 <rickflare> would be amazing
18:35:46 <elmiko> but, we need to have sahara controlling the internal kdc to add users as necessary
18:35:49 <elmiko> OR
18:36:10 <elmiko> we could use something like apache knox to segregate a cluster, and use an external kdc to do authN
18:36:10 <rickflare> yes and keystone should control the internal kdc
18:36:13 <rickflare> in most cases
18:36:15 <elmiko> no
18:36:22 <elmiko> i disagree here
18:36:25 <rickflare> ok
18:36:48 <elmiko> i'm not sure we want to cross the streams of a kerb-backed keystone with users in the sahara cluster
18:37:00 <elmiko> i mean we *could* but i'm not convinced it's the best method
18:37:11 <rickflare> so either have an external kdc or just have users manage the internal one that gets created
18:37:24 <elmiko> not users, we would let sahara manage the internal kdcs
18:37:36 <rickflare> k
18:37:39 <rickflare> k
18:37:48 <elmiko> it would be like an ephemeral kdc
18:37:56 <elmiko> living as long as the cluster
18:38:29 <elmiko> now, otoh, if we want to do something like an external kdc managed by the operator and that is also backing keystone, we might want to investigate using apache know
18:38:32 <elmiko> *knox
18:39:01 <rickflare> ok
18:39:05 <elmiko> i just think the identity management will get really unmanageable if you need to back keystone and have it control access to sahara clusters
18:39:09 <rickflare> i think having both would be great
18:39:15 <elmiko> it just seems like that will be complicated
18:39:22 <rickflare> this way it can plug into existing domains if needed
18:39:58 <rickflare> I think to start then having it external makes the most sense
18:40:03 <crobertsrh> This is sounding familiarly complex
18:40:19 <elmiko> yup
18:40:21 <elmiko> it is complex
18:40:27 <rickflare> because most environments will have some form of ldap or kdc
18:40:41 <rickflare> just being able to plug in would be a great start IMO
18:41:05 <elmiko> yea, it would be awesome
18:41:19 <elmiko> but it's tough to wrangle what sahara knows about identity with what the kdc will want
18:41:22 <rickflare> but to start off the service control is by far the biggest issue in the short term
18:41:28 <elmiko> remember, we only know what keystone tells us
18:42:00 <rickflare> having my cluster crap out and the only solution being a rebuild will be a tough sell for folks who cannot regenerate data that has been ingested
18:42:27 <elmiko> i can see that
18:42:48 <elmiko> rickflare: pm me your email address, i'll send you a slide deck i made on secure sahara ideas
18:42:57 <rickflare> the ability to quickly restart all services will be a massive improvement
18:43:03 <elmiko> if anyone is interested pm me as well
18:43:09 <elmiko> agred
18:43:15 <elmiko> agreed, even
18:43:24 <rickflare> Taz and ysm
18:43:32 <rickflare> please message elmiko
18:43:47 <rickflare> sorry ryusk
18:43:47 <NikitaKonovalov> elmiko, please forward that to me
18:44:26 <rickflare> also after working with tmckay on spark for some time
18:44:42 <rickflare> log reporting from the batch jobs could be improved
18:44:49 <elmiko> NikitaKonovalov: sent
18:45:02 <elmiko> yea, we've been talking about how to improve logging
18:45:06 <rickflare> we spent quite a bit of time troubleshooting main class path errors
18:45:22 <elmiko> imo, i'd like to see something where we use zaqar to publish logs from the cluster nodes
18:45:39 <rickflare> might be cool to have logstash or elasticsearch stood up in an image that is a part of the cluster
18:45:45 <rickflare> that one can monitor output
18:45:58 <rickflare> zabbix or ganglia also come to mind
18:46:24 <elmiko> right, or for ultimate dogfooding
18:46:38 <elmiko> imagine sahara using a sahara cluster to process its own log data
18:46:41 <elmiko> *BAM*
18:46:57 <crobertsrh> trippy man
18:47:01 <elmiko> hehe
18:47:37 <elmiko> rickflare, huichun, thanks for bringing up all these new ideas
18:47:47 <rickflare> absolutely
18:47:49 <elmiko> i hope there are some posts to the ML for us to argue over ;)
18:47:53 <crobertsrh> Yeah, good stuff for sure
18:47:54 <rickflare> thank you guys for always being so awesome
18:47:56 <rickflare> and helpful
18:48:02 * elmiko blushes
18:48:07 <rickflare> really makes working on this fun
18:48:10 <rickflare> honestly
18:48:16 <elmiko> we like to have fun =D
18:48:46 <elmiko> ok, anything else? or should we gain 10 minutes of our day back?
18:49:05 <rickflare> just looked at the slides and at first glance
18:49:12 <rickflare> this is exactly what im talking about
18:49:15 <elmiko> =D
18:49:20 <crobertsrh> nice
18:49:22 <crobertsrh> +1 for #endmeeting
18:49:29 <elmiko> going once
18:49:31 <elmiko> ...
18:49:33 <rickflare> sold!
18:49:33 <elmiko> twice
18:49:35 <elmiko> ...
18:49:41 <elmiko> sold!
18:49:44 <elmiko> thanks everyone!
18:49:47 <elmiko> #endmeeting