18:02:52 <SergeyLukjanov> #startmeeting sahara
18:02:53 <openstack> Meeting started Thu Aug 21 18:02:52 2014 UTC and is due to finish in 60 minutes.  The chair is SergeyLukjanov. Information about MeetBot at http://wiki.debian.org/MeetBot.
18:02:54 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
18:02:57 <tmckay> o/
18:02:57 <openstack> The meeting name has been set to 'sahara'
18:02:58 <openstack> alazarev: Error: Can't start another meeting, one is in progress.  Use #endmeeting first.
18:03:00 <alazarev> oops
18:03:01 <SergeyLukjanov> :)
18:03:07 <SergeyLukjanov> #chair alazarev
18:03:08 <openstack> Current chairs: SergeyLukjanov alazarev
18:03:22 <SergeyLukjanov> alazarev, please
18:03:38 <alazarev> ok, let’s start
18:04:32 <alazarev> #link https://wiki.openstack.org/wiki/Meetings/SaharaAgenda#Next_meetings
18:04:52 <alazarev> #topic sahara@horizon status (croberts, NikitaKonovalov)
18:05:06 <NikitaKonovalov> Ok
18:05:20 <aignatov> alazarev: you are the chair! ;) something new :)
18:05:38 <dmitryme> NikitaKonovalov: nice status in horizon :-)
18:05:46 <aignatov> lol
18:05:54 <NikitaKonovalov> The change allowing the dashboard to work with Bootstrap 3 is merged
18:05:56 <tmckay> yeah, Ok sounds great
18:06:00 <alazarev> aignatov: yes, and Sergey mentioned this 2 mins ago :)
18:06:03 <NikitaKonovalov> finally
18:06:20 <elmiko> alazarev: surprise!
18:06:38 <SergeyLukjanov> alazarev, aignatov, my inet connection is extremely bad today and I'm receiving messages in 1-min batches ;)
18:06:49 <NikitaKonovalov> there is still a bugfix for template copying on review
18:07:01 <SergeyLukjanov> NikitaKonovalov, that's awesome, so we won't have several version dropdowns anymore :)
18:07:04 <NikitaKonovalov> but that's very small and I hope it will land faster
18:07:39 <NikitaKonovalov> SergeyLukjanov: Yes, the dropdowns now appear in the right places only
18:08:01 <alazarev_> hopefully other patches will land faster
18:08:07 <NikitaKonovalov> so that's pretty much all from horizon
18:08:38 <tosky> alazarev: I have a request related to horizon and sahara; should I talk about this at the end (open discussion)?
18:08:47 <alazarev_> tosky: sure
18:08:53 <SergeyLukjanov> #chair alazarev_
18:08:54 <openstack> Current chairs: SergeyLukjanov alazarev alazarev_
18:09:00 <alazarev_> #topic News / updates
18:09:55 <elmiko> i've been working on getting a prototype of the swift/auth stuff running. it's going well, i'm currently testing the proxy user creation and trust delegation. next step will be authenticating JobBinaries using the proxy.
18:10:17 <alazarev> I’ve done the series about plugin SPI -> EDP engine
18:10:47 <SergeyLukjanov> I'm working on some oslo.* upgrades and going to push new patch sets soon
18:10:51 <tmckay> I poked around with Oozie security; small discussion during open topics on "code fix" vs. "doc fix instead"
18:11:03 <tmckay> and catching up on lots of reviews :)
18:13:22 <alazarev> anyone else?
18:13:39 <alazarev> ok, let’s move on
18:14:02 <alazarev> no action items from the previous meeting
18:14:37 <alazarev> #topic Juno release
18:14:39 <alazarev> #link https://wiki.openstack.org/wiki/Juno_Release_Schedule
18:15:00 <alazarev> SergeyLukjanov: update from you?
18:15:43 <SergeyLukjanov> nope
18:16:02 <SergeyLukjanov> oops
18:16:07 <SergeyLukjanov> one moment
18:16:14 <SergeyLukjanov> we have some unassigned stuff
18:16:19 <elmiko> i'm concerned about the schedule for getting all the swift/auth patches together. ideally, when should the patches be in for inclusion in Juno?
18:16:27 <SergeyLukjanov> #link https://launchpad.net/sahara/+milestone/juno-3
18:16:52 <SergeyLukjanov> we need a volunteer to verify and close https://blueprints.launchpad.net/sahara/+spec/swift-url-proto-cleanup-deprecations
18:17:24 <alazarev> SergeyLukjanov: I can do it
18:17:25 <SergeyLukjanov> elmiko, so, we have two weeks to land sahara side patches
18:17:29 <SergeyLukjanov> alazarev, ok, thx
18:17:54 <SergeyLukjanov> alazarev, assigned to you
18:18:02 <alazarev> #action https://blueprints.launchpad.net/sahara/+spec/swift-url-proto-cleanup-deprecations (alazarev)
18:18:18 <elmiko> SergeyLukjanov: ok, i think it will be close. i'm hopeful to get this all concluded, but i know the testing and integration with the gate may be cumbersome.
18:18:22 <SergeyLukjanov> elmiko, do you have estimates on the time needed to get this work done?
18:18:58 <elmiko> i'm hoping to have a full prototype by the end of next week. then we will need to coordinate about how to test the proxy domains in the gate.
18:19:13 <dmitryme> elmiko: also consider putting your patch for early review
18:19:18 <elmiko> hopefully, the patches will be up for review by the end of next week.
18:19:24 <elmiko> dmitryme: thanks, i will
18:20:05 <alazarev> ok, let’s continue
18:20:34 <alazarev> #topic Heat engine backward compatibility
18:21:20 <alazarev> https://bugs.launchpad.net/sahara/+bug/1336525 broke backward compatibility for heat engine
18:21:21 <uvirtbot> Launchpad bug 1336525 in sahara "[DOC][HEAT] Document note about ec2-user" [High,Fix released]
18:21:38 <alazarev> do we want to simply document this?
18:22:49 <SergeyLukjanov> yup
18:22:59 <dmitryme> alazarev: it is worth noting that it is not the only change which broke backward compatibility, it is just the first one
18:23:02 <SergeyLukjanov> we need to make progress on heat engine
18:23:20 <aignatov> SergeyLukjanov:
18:23:30 <aignatov> what do you mean?
18:23:35 <SergeyLukjanov> and heat was a beta quality for Icehouse release, so, it's ok
18:24:00 <SergeyLukjanov> aignatov, I mean that if we need to improve heat engine we could break backward compat for it with Icehouse
18:24:01 <alazarev> ok, we have https://bugs.launchpad.net/sahara/+bug/1357135 to document that
18:24:02 <uvirtbot> Launchpad bug 1357135 in sahara "Document changes from #1336525 in upgrade notes" [Undecided,New]
18:24:05 <alazarev> #link https://bugs.launchpad.net/sahara/+bug/1357135
18:24:12 <tosky> in fact it's not the default choice for installation, if I remember the documentation correctly
18:24:17 <alazarev> #topic Anti-affinity semantics
18:24:23 <dmitryme> SergeyLukjanov: I think you meant ‘heat ENGINE is of beta quality” :-)
18:24:33 <SergeyLukjanov> dmitryme, sure ;)
18:24:48 <alazarev> and a related question about anti-affinity
18:25:02 <SergeyLukjanov> so, here we have a much more interesting issue - we'd like to update a-a to work using server groups
18:25:05 <SergeyLukjanov> and I like this idea
18:25:11 <alazarev> server groups are the openstack way to do it
18:25:14 <SergeyLukjanov> but it changes a bit the semantics of a-a
18:25:28 <alazarev> but we started anti affinity before server groups were introduced
18:25:34 <SergeyLukjanov> alazarev, please describe the difference in how it'll work
18:26:03 <alazarev> server groups now have a restriction - each instance can be in only one server group
18:26:54 <alazarev> so, if a nodegroup has several processes, we can’t do anti-affinity for each process independently
18:27:37 <alazarev> proposed solution - make one server group per cluster and assign it to all instances that have at least one affected process
18:28:39 <alazarev> I made this for heat engine only: https://review.openstack.org/#/c/112159/
18:29:13 <alazarev> because anti-affinity was completely broken in the heat engine
18:29:36 <aignatov> not completely :)
18:29:37 <alazarev> so, now we need to decide further plans
18:30:18 <alazarev> aignatov: would you recommend the current behavior for production? ;)
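A minimal sketch of the scheme alazarev describes: one anti-affinity server group per cluster, joined only by instances that run an affected process. The attribute names (node_processes, anti_affinity, etc.) are illustrative, not Sahara's actual model; see the review linked above for the real heat-engine patch.

    # Sketch only; 'nova' is a python-novaclient Client.
    def boot_cluster(nova, cluster, node_groups):
        # one anti-affinity server group per cluster
        group = nova.server_groups.create(name=cluster.name + '-aa',
                                          policies=['anti-affinity'])
        for ng in node_groups:
            # only instances with at least one affected process join the group
            affected = set(ng.node_processes) & set(cluster.anti_affinity)
            hints = {'group': group.id} if affected else {}
            for idx in range(ng.count):
                nova.servers.create(name='%s-%s-%d' % (cluster.name, ng.name, idx),
                                    image=ng.image_id, flavor=ng.flavor_id,
                                    scheduler_hints=hints)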
18:30:59 <alazarev> so, the first question: are we going to support the direct engine for a long time?
18:31:07 <SergeyLukjanov> so, I think that we should migrate to server groups in both direct and heat engines
18:31:11 <SergeyLukjanov> for consistency
18:31:16 <dmitryme> I would look at it from a different angle: we didn’t promise any specific algorithm for a-a in the documentation. We just promised that it would work. With this change we do not break that promise, we just change the algorithm. So it is not a backward-incompatible change
18:31:29 <SergeyLukjanov> it doesn't break backward compat
18:31:36 <SergeyLukjanov> and we have no docs about how a-a works
18:31:38 <SergeyLukjanov> dmitryme, +
18:31:43 <alazarev> yeap
18:32:02 <aignatov> I agree with all thoughts, but direct engine should be changed too IMO
18:32:09 <alazarev> but migration to server groups in the direct engine will break backward compatibility
18:32:22 <dmitryme> alazarev: why?
18:32:42 <alazarev> dmitryme: none of the ‘old’ instances are in the group
18:32:46 <SergeyLukjanov> ++ for updating direct engine
18:33:03 <SergeyLukjanov> alazarev, but it won't break removal of old clusters
18:33:07 <dmitryme> alazarev: aha, and you can’t assign an existing instance to a group, right?
18:33:13 <SergeyLukjanov> we'll add note to the upgrade docs
18:33:37 <SergeyLukjanov> I think it'll be ok
18:33:47 <SergeyLukjanov> tmckay, elmiko, any thoughts?
18:33:49 <alazarev> dmitryme: Sahara needs knowledge of the ‘old’ way to do that
18:34:07 <dmitryme> alazarev: I see
18:34:11 <alazarev> SergeyLukjanov: removal is OK, it will only break scaling
18:34:27 <elmiko> i think best not to break backward compat, if possible
18:34:46 <SergeyLukjanov> alazarev, it could be easily checked by looking for server groups while scaling
18:35:19 <alazarev> if we are going to remove direct in L, for example, I would vote for ‘do not change direct, make the engines work differently’
18:35:42 <elmiko> is it possible to migrate "old" style instances to server groups?
18:36:14 <alazarev> elmiko: technically - yes, practically - I don’t know
18:36:26 <tmckay> I'm not too familiar with how server groups work, but based on reading, I think I agree with dmitryme -- no broken promises.
18:37:17 <tmckay> alazarev, good point -- if the engines work differently, that seems okay to me.  Especially if direct is on its way out.
18:37:39 <alazarev> tmckay: the only difference in behavior is that instances from different anti-affinity groups will now be on different hosts (or the cluster will fail if there are not enough hosts)
18:38:10 <SergeyLukjanov> tmckay, alazarev, I'd like to avoid a situation where users use a-a and it magically works differently
18:38:21 <SergeyLukjanov> so, IMO it's better to upgrade both engines
18:39:10 <alazarev> SergeyLukjanov: we don’t have an engine switch policy, all clusters need to be recreated during a switch
18:39:38 <alazarev> SergeyLukjanov: so, there’s a very low probability that anyone cares
18:39:40 <SergeyLukjanov> alazarev, I'm talking about two installations
18:41:19 <alazarev> I thought about it a little bit more; I think it is possible to write ‘upgrade’ code that will add existing VMs to the group
18:41:42 <alazarev> in this case we can keep backward compatibility
18:42:05 <alazarev> do we have plans to test backward compatibility?
18:42:11 <elmiko> +1 to the 'upgrade' method
18:42:59 <alazarev> ok, will try to do ‘upgrade’ way
18:43:00 <SergeyLukjanov> +1 for upgrading while scaling for example
18:43:06 <alazarev> #topic Open discussion
18:43:15 <dmitryme> alazarev: but the old algorithm can assign two VMs to a single node, while they need to be added to a group by the new algorithm
18:43:15 <elmiko> SergeyLukjanov: that makes a lot of sense
18:43:39 <alazarev> dmitryme: this is a nova problem, not Sahara’s :)
18:43:58 <dmitryme> alazarev: I am just not sure it will work
18:44:05 <SergeyLukjanov> I have an idea of adding versions to the engines to make validation aware of new/old clusters in the case of the heat engine
18:44:24 <SergeyLukjanov> I'll propose the lp issue with a description and implementation early next week
18:44:25 <elmiko> +1 to versioned engines
18:44:32 <alazarev> SergeyLukjanov: +2 on versions
18:44:37 <dmitryme> +1 for versions
18:45:19 <elmiko> i have a couple questions about when to produce errors for the swift/auth stuff
18:45:21 <alazarev> #agreed Add Sahara version as cluster property
18:45:31 <SergeyLukjanov> #agreed upgrade both direct and heat engines to use server groups (try to upgrade cluster while scaling, add notes to upgrade doc)
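A rough sketch of the agreed approach, with hypothetical names (engine_version, aa_group, cluster.extra); the real design will come with the LP issue SergeyLukjanov mentions above.

    # Stamp clusters with an engine version; on the next scaling operation,
    # upgrade old clusters to server-group a-a (best effort).
    CURRENT_ENGINE_VERSION = 2  # versions >= 2 use server-group a-a

    def ensure_engine_version(nova, cluster):
        if cluster.extra.get('engine_version', 1) >= CURRENT_ENGINE_VERSION:
            return
        # create the missing server group so newly scaled instances join it
        group = nova.server_groups.create(name=cluster.name + '-aa',
                                          policies=['anti-affinity'])
        cluster.extra['aa_group'] = group.id
        cluster.extra['engine_version'] = CURRENT_ENGINE_VERSION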
18:45:56 <tmckay> folks, quick opinions on https://review.openstack.org/#/c/115113/.  In short, there is no way really to hide configs/args in Oozie (including swift stuff, old or new).  So, there are 2 choices: secgroup for public networks, and iptables on the Oozie server host for finer control on public/private networks.  I think documentation, leaving the solution up to admins, is the way to go.  I had proposed optional iptables rules applied by Sahara
18:46:15 <tmckay> So, doc only, or iptables option, rules set by Sahara with ssh?
18:46:32 <elmiko> i like doc + iptables option
18:46:32 <alazarev> maybe we could add the provisioning engine and stuff like that too? e.g. to disable scaling with the other engine
18:47:16 <tmckay> I think iptables option is not hard
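For reference, the optional rule tmckay proposes would amount to something like the following, pushed over ssh; 'remote' stands in for Sahara's ssh helper, and the subnet/port values are illustrative.

    # Drop traffic to the Oozie port unless it comes from the cluster's own
    # subnet (11000 is Oozie's default port).
    def lock_down_oozie(remote, cluster_subnet, oozie_port=11000):
        remote.execute_command(
            'sudo iptables -A INPUT -p tcp --dport %d ! -s %s -j DROP'
            % (oozie_port, cluster_subnet))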
18:47:20 <alazarev> tmckay: I think users must do it manually via security groups; the ability to specify a secgroup is already merged
18:47:39 <dmitryme> I like alazarev
18:47:46 <alazarev> tmckay: +1 on document that
18:47:51 <dmitryme> I like alazarev’s point
18:47:59 <alazarev> dmitryme: (hug)
18:48:07 <dmitryme> alazarev: :-)
18:48:34 <tmckay> alazarev, I was talking to elmiko yesterday: with lots of users in the same tenant, anyone who can launch an instance can get on the private network and get to Oozie, so you may want iptables.  But anyone in the same tenant can get to swift, too.
18:48:35 <elmiko> lol
18:48:50 <tmckay> If Oozie screened jobs by user, it would be fine.  But it doesn't.
18:49:33 <alazarev> tmckay: users in the same tenant are visible to each other by definition, I don’t see a problem here
18:50:08 <alazarev> tmckay: if they need privacy - they can use separate tenants
18:50:15 <tmckay> okay, fine with me.
18:50:32 <dmitryme> tmckay: I agree with alazarev, user from the same tenant can simply snapshot/terminate instances of another user
18:50:45 <tmckay> documentation on secgroups, with a note recommending leaving out the oozie port
18:51:11 * tmckay still thinks Oozie security could be better
18:51:29 <alazarev> tmckay: I would say not only oozie, hadoop has plenty of sensitive stuff
18:51:37 <elmiko> even with users in the same tenant, you wouldn't want them to get access to each other's creds.
18:51:39 <tmckay> agreed
18:51:54 <tmckay> elmiko, apparently, we don't care
18:52:02 <tmckay> same tenant is one big happy family
18:52:25 <tmckay> you better trust all the people in your tenant
18:52:27 <elmiko> Alice and Bob are working on a project in tenant_A, Bob also has access to tenant_B, if Alice gets Bob's creds, she can now access tenant_B
18:52:28 <tmckay> :)
18:52:37 <alazarev> ‘happy’ is questionable… but one family for sure :)
18:52:41 <dmitryme> elmiko: very good point
18:52:47 <tmckay> elmiko, doh
18:53:29 <elmiko> i just like having a binary option for the admin to lock down the Oozie port with iptables. i think it's a simple option.
18:53:41 <alazarev> will this be solved with trusts? we don’t store passwords anymore
18:53:55 <tmckay> elmiko, more complicated, because Alice can also read hadoop logs
18:54:09 <elmiko> alazarev: it will
18:54:18 <tmckay> elmiko, I wonder if the job configs are in the hadoop logs, too
18:54:32 <tmckay> in which case locking down the oozie server doesn't matter
18:54:33 <alazarev> elmiko: what’s the problem in this case?
18:54:40 <elmiko> tmckay: i think only what we pass through workflow.xml
18:54:58 <elmiko> alazarev: if the user is only using the trust methodology, no problem.
18:55:21 <elmiko> what should we do in a situation where sahara is configured to use proxy, but the user still inputs creds for a data source?
18:55:39 <elmiko> drop the creds, use them, send a warning, ?
18:55:57 <tmckay> drop them, I think
18:56:03 <dmitryme> elmiko: I would say “throw an exception”, to be explicit
18:56:20 <elmiko> dmitryme: reject the form, essentially
18:56:21 <dmitryme> I mean, return an error, in terms of API
18:56:28 <dmitryme> elmiko: yep
18:56:58 <dmitryme> I just hate implicit things
18:57:00 <elmiko> ok, so if sahara is configured for proxy domains, then it will be a non-issue as no user creds will ever be used.
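A sketch of the explicit-error behavior agreed here; the flag name use_domain_for_proxy_users and the exception class are assumptions, not the merged API.

    class InvalidCredentialsError(Exception):
        pass

    def validate_data_source_credentials(conf, data):
        # reject user-supplied swift creds when proxy domains are enabled
        creds = data.get('credentials') or {}
        if conf.use_domain_for_proxy_users and (creds.get('user') or
                                                creds.get('password')):
            raise InvalidCredentialsError(
                'explicit swift credentials are not allowed when proxy '
                'domains are enabled')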
18:57:09 <tmckay> elmiko, back up.  is there any way to guard against the Bob and Alice scenario, if workflows can be seen?
18:57:16 <elmiko> dmitryme: agreed. explicit > implicit
18:57:44 <tmckay> cause Alice can get a trust in another tenant, right?
18:57:52 <elmiko> tmckay: it's still a problem if anyone can access a user's creds through the workflow
18:58:12 <elmiko> tmckay: oh that, yea the trust is between Alice and the proxy user
18:58:17 <alazarev> do we need creds in workflow?
18:58:27 <elmiko> alazarev: unfortunately yes
18:58:53 <elmiko> well, we need to get them to hadoop somehow. if we take oozie out of the mix, then we don't need the workflow.
18:58:56 <elmiko> as i understand it
18:59:01 <tmckay> elmiko, so you would still recommend an iptables option (or at least a doc note)?  Part of it for me hinges on what is visible in the hadoop logs
18:59:05 <alazarev> elmiko: for swift? until swift is patched?
18:59:11 <dmitryme> it is my understanding that these will be the creds of the proxy user, which will have access to the given tenant only by means of the trust
18:59:21 <elmiko> dmitryme: yes
18:59:48 <elmiko> alazarev: currently, we pass the user's creds through the workflow. when the swift/auth patch goes through, we will only be passing creds for the proxy user
19:00:13 <tmckay> elmiko, okay, so you can't cross tenants in that case.  The worst you can do is get auth in the current tenant, which you already have, right?
19:00:17 <dmitryme> so, if somebody from the same tenant steals them, he will not be able to access any other tenant Bob has access to
19:00:27 <elmiko> tmckay: yea
19:00:33 <elmiko> dmitryme: yea
19:00:48 <tmckay> alright, so public oozie access is still the major issue
19:00:49 <elmiko> the trust is scoped only to the user and project that the swift container is in
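The delegation elmiko describes maps to a keystone v3 trust, roughly as below; the role name and the helper function are assumptions.

    from keystoneclient.v3 import client as keystone

    def delegate_to_proxy(user_session, trustor_id, proxy_user_id, project_id):
        # trust from the job's user (trustor) to the proxy user (trustee),
        # scoped to the single project holding the swift container
        ks = keystone.Client(session=user_session)
        return ks.trusts.create(trustor_user=trustor_id,
                                trustee_user=proxy_user_id,
                                project=project_id,
                                role_names=['Member'],  # assumed role
                                impersonation=True)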
19:01:22 <tosky> if I may say one thing before closing...
19:01:30 <elmiko> in the future we might be able to use something like Barbican and asymmetric keys from the instances to completely remove creds from the process
19:01:45 <tmckay> tosky, sure
19:02:07 <alazarev> elmiko: yeap, somewhere in M cycle :)
19:02:11 <elmiko> alazarev: LOL
19:02:12 <tosky> one of our interns was working on horizon selenium tests (starting from https://blueprints.launchpad.net/horizon/+spec/selenium-integration-testing) with the goal of horizon/sahara tests
19:02:23 <amitgandhinz> do you mind wrapping up, thanks =P we have our poppy meeting starting now....
19:02:36 <alazarev> oops
19:02:37 <tmckay> move to the other channel
19:02:39 <dmitryme> tosky: lets go to #sahara
19:02:40 <elmiko> amitgandhinz: sorry
19:02:41 <alazarev> #endmeeting