18:02:52 #startmeeting sahara
18:02:53 Meeting started Thu Aug 21 18:02:52 2014 UTC and is due to finish in 60 minutes. The chair is SergeyLukjanov. Information about MeetBot at http://wiki.debian.org/MeetBot.
18:02:54 Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
18:02:57 o/
18:02:57 The meeting name has been set to 'sahara'
18:02:58 alazarev: Error: Can't start another meeting, one is in progress. Use #endmeeting first.
18:03:00 oops
18:03:01 :)
18:03:07 #chair alazarev
18:03:08 Current chairs: SergeyLukjanov alazarev
18:03:22 alazarev, please
18:03:38 ok, let's start
18:04:32 #link https://wiki.openstack.org/wiki/Meetings/SaharaAgenda#Next_meetings
18:04:52 #topic sahara@horizon status (croberts, NikitaKonovalov)
18:05:06 Ok
18:05:20 alazarev: you are the chair! ;) something new :)
18:05:38 NikitaKonovalov: nice status in horizon :-)
18:05:46 lol
18:05:54 The change allowing the dashboard to work with Bootstrap 3 is merged
18:05:56 yeah, Ok sounds great
18:06:00 aignatov: yes, and Sergey told us about this 2 mins ago :)
18:06:03 finally
18:06:20 alazarev: surprise!
18:06:38 alazarev, aignatov, my inet connection is extremely bad today and I'm receiving messages in 1-minute batches ;)
18:06:49 there is still a bugfix for copying templates on review
18:07:01 NikitaKonovalov, that's awesome, so we won't have several version dropdowns anymore :)
18:07:04 but that's very small and I hope it will land faster
18:07:39 SergeyLukjanov: Yes, the dropdowns now appear in the right places only
18:08:01 hopefully other patches will land faster
18:08:07 so that's pretty much all from horizon
18:08:38 alazarev: I have a request related to horizon and sahara, should I talk about this at the end (open discussion)?
18:08:47 tosky: sure
18:08:53 #chair alazarev_
18:08:54 Current chairs: SergeyLukjanov alazarev alazarev_
18:09:00 #topic News / updates
18:09:55 i've been working on getting a prototype of the swift/auth stuff running. it's going well, i'm currently testing the proxy user creation and trust delegation. next step will be authenticating JobBinaries using the proxy.
18:10:17 I've done a series about plugin SPI -> EDP engine
18:10:47 I'm working on some oslo.* upgrades and going to push new patch sets soon
18:10:51 I poked around with Oozie security, small discussion during open topics on "code" vs "doc fix instead"
18:11:03 and catching up on lots of reviews :)
18:13:22 anyone else?
18:13:39 ok, let's move on
18:14:02 no action items from the previous meeting
18:14:22 #topic https://wiki.openstack.org/wiki/Juno_Release_Schedule
18:14:37 #topic Juno release
18:14:39 #link https://wiki.openstack.org/wiki/Juno_Release_Schedule
18:15:00 SergeyLukjanov: update from you?
18:15:43 nope
18:16:02 oops
18:16:07 one moment
18:16:14 we have some unassigned stuff
18:16:19 i'm concerned about the schedule for getting all the swift/auth patches together. ideally, when should the patches be in for inclusion in Juno?
18:16:27 #link https://launchpad.net/sahara/+milestone/juno-3
18:16:52 we need a volunteer to verify and close https://blueprints.launchpad.net/sahara/+spec/swift-url-proto-cleanup-deprecations
18:17:24 SergeyLukjanov: I can do it
18:17:25 elmiko, so, we have two weeks to land sahara-side patches
18:17:29 alazarev, ok, thx
18:17:54 alazarev, assigned to you
18:18:02 #action https://blueprints.launchpad.net/sahara/+spec/swift-url-proto-cleanup-deprecations (alazarev)
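
(Context for the swift/auth patches being scheduled here: a minimal sketch of what the proxy-user creation and trust delegation described in the updates above could look like, assuming python-keystoneclient v3 and a pre-existing proxy domain; every name below is an illustrative assumption, not the code under review.)

```python
# Illustrative sketch only: create a short-lived proxy user in a dedicated
# domain and delegate access to the job's project via a Keystone trust.
# Assumes python-keystoneclient v3; all names here are hypothetical.
from keystoneclient.auth.identity import v3
from keystoneclient import session
from keystoneclient.v3 import client as keystone_client

auth = v3.Password(auth_url='http://keystone:5000/v3',
                   username='sahara', password='secret',
                   project_name='services',
                   user_domain_name='default',
                   project_domain_name='default')
keystone = keystone_client.Client(session=session.Session(auth=auth))

# Proxy users live in their own domain so they never pollute user domains.
proxy_domain = keystone.domains.find(name='sahara_proxy')
proxy_user = keystone.users.create(name='job-proxy-42',
                                   domain=proxy_domain,
                                   password='randomly-generated')

# The job owner (trustor) delegates a role on the project to the proxy user
# (trustee), which can then fetch Swift objects on the job's behalf.  Note
# that a trust can only be created by the trustor, so in practice this call
# would be made with the job owner's own session, not the service one.
trust = keystone.trusts.create(trustor_user='job-owner-id',
                               trustee_user=proxy_user,
                               project='job-project-id',
                               role_names=['Member'],
                               impersonation=True)
```
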
18:18:18 SergeyLukjanov: ok, i think it will be close. i'm hopeful to get this all concluded, but i know the testing and integration with the gate may be cumbersome.
18:18:22 elmiko, do you have an estimate of the time needed to get this work done?
18:18:58 i'm hoping to have a full prototype by the end of next week. then we will need to coordinate about how to test the proxy domains in the gate.
18:19:13 elmiko: also consider putting your patch up for early review
18:19:18 hopefully, the patches will be up for review by the end of next week.
18:19:24 dmitryme: thanks, i will
18:20:05 ok, let's continue
18:20:34 #topic Heat engine backward compatibility
18:21:20 https://bugs.launchpad.net/sahara/+bug/1336525 broke backward compatibility for the heat engine
18:21:21 Launchpad bug 1336525 in sahara "[DOC][HEAT] Document note about ec2-user" [High,Fix released]
18:21:38 do we want to simply document this?
18:22:49 yup
18:22:59 alazarev: it is worth noting that it is not the only change which broke backward compatibility, it is just the first one
18:23:02 we need to make progress on the heat engine
18:23:20 SergeyLukjanov:
18:23:30 what do you mean?
18:23:35 and heat was beta quality for the Icehouse release, so it's ok
18:24:00 aignatov, I mean that if we need to improve the heat engine we could break backward compat for it with Icehouse
18:24:01 ok, we have https://bugs.launchpad.net/sahara/+bug/1357135 to document that
18:24:02 Launchpad bug 1357135 in sahara "Document changes from #1336525 in upgrade notes" [Undecided,New]
18:24:05 #link https://bugs.launchpad.net/sahara/+bug/1357135
18:24:12 it's not the default choice for installation in fact, if I remember the documentation
18:24:17 #topic Anti-affinity semantics
18:24:23 SergeyLukjanov: I think you meant 'heat ENGINE is of beta quality' :-)
18:24:33 dmitryme, sure ;)
18:24:48 and a related question about anti-affinity
18:25:02 so, here we have a much more interesting issue - we'd like to update a-a to work using server groups
18:25:05 and I like this idea
18:25:11 server groups are the openstack way to do it
18:25:14 but it changes the semantics of a-a a bit
18:25:28 but we started anti-affinity before server groups were introduced
18:25:34 alazarev, please describe the difference in how it will work
18:26:03 server groups now have a restriction - each instance can be in only one server group
18:26:54 so, if a node group has several processes we can't do anti-affinity independently
18:27:37 proposed solution - make one server group per cluster and assign it to all instances that have at least one affected process
18:28:39 I made this for the heat engine only: https://review.openstack.org/#/c/112159/
18:29:13 because anti-affinity was broken for the heat engine at all
18:29:36 not at all :)
18:29:37 so, now we need to decide further plans
18:30:18 aignatov: would you recommend the way it works for production? ;)
18:30:59 so, the first question: are we going to support the direct engine for a long time?
18:31:07 so, I think that we should migrate to server groups in both direct and heat engines
18:31:11 for consistency
18:31:16 I would review at a different angle: we didn't promise any specific algorithm for a-a in the documentation. We just promised that it will work. With this change we do not break the promise, we just change the algorithm. So it is not a backward incompatible change
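
(A rough sketch of the server-group approach proposed above — one anti-affinity group per cluster, attached to every instance that runs an affected process — assuming python-novaclient with the os-server-groups API; the identifiers are illustrative, not the code in review 112159.)

```python
# Illustrative sketch of "one server group per cluster": every instance that
# runs at least one anti-affinity process is booted with the same group hint,
# so Nova schedules them onto different hosts (or errors out if it cannot).
from novaclient import client as nova_client

nova = nova_client.Client('2', 'sahara', 'secret', 'demo',
                          'http://keystone:5000/v2.0')

group = nova.server_groups.create(name='cluster-42-aa',
                                  policies=['anti-affinity'])

for instance_name in ['master-001', 'worker-001', 'worker-002']:
    nova.servers.create(name=instance_name,
                        image='image-id',
                        flavor='flavor-id',
                        scheduler_hints={'group': group.id})
```
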
18:31:29 it doesn't break backward compat
18:31:36 and we have no docs about how a-a works
18:31:38 dmitryme, +
18:31:43 yeap
18:32:02 I agree with all thoughts, but the direct engine should be changed too IMO
18:32:09 but migration to server groups in the direct engine will break backward compatibility
18:32:22 alazarev: why?
18:32:42 dmitryme: all 'old' instances are not in the group
18:32:46 ++ for updating the direct engine
18:33:03 alazarev, but it won't break removal of old clusters
18:33:07 alazarev: aha, and you can't assign an existing instance to a group, right?
18:33:13 we'll add a note to the upgrade docs
18:33:37 I think it'll be ok
18:33:47 tmckay, elmiko, any thoughts?
18:33:49 dmitryme: Sahara needs knowledge about the 'old' way to do that
18:34:07 alazarev: I see
18:34:11 SergeyLukjanov: removal is Ok, it will break scaling only
18:34:27 i think it's best not to break backward compat, if possible
18:34:46 alazarev, it could be easily checked by looking for server groups while scaling
18:35:19 if we are going to remove direct in L for example, I would vote for 'do not change direct, make the engines work differently'
18:35:42 is it possible to migrate "old"-style instances to server groups?
18:36:14 elmiko: technically - yes, practically - I don't know
18:36:26 I'm not too familiar with how server groups work, but based on reading, I think I agree with dmitryme -- no broken promises.
18:37:17 alazarev, good point -- if the engines work differently, that seems okay to me. Especially if direct is on its way out.
18:37:39 tmckay: the only difference in behavior is that instances from different anti-affinity groups will be on different hosts now (or errored if there are not enough hosts)
18:38:10 tmckay, alazarev, I'd like to avoid the situation where users use a-a and it magically works differently
18:38:21 so, IMO it's better to upgrade both engines
18:39:10 SergeyLukjanov: we don't have an engine switch policy, all clusters need to be recreated during a switch
18:39:38 SergeyLukjanov: so, very low probability that anyone cares
18:39:40 alazarev, I'm talking about two installations
18:41:19 I thought a little bit more, I think it is possible to write 'upgrade' code that will add existing VMs to the group
18:41:42 in this case we can keep backward compatibility
18:42:05 do we have plans to test backward compatibility?
18:42:11 +1 to the 'upgrade' method
18:42:59 ok, will try to do it the 'upgrade' way
18:43:00 +1 for upgrading while scaling for example
18:43:06 #topic Open discussion
18:43:15 alazarev: but the old algorithm can assign two VMs to a single node, while they need to be added to a group by the new algorithm
18:43:15 SergeyLukjanov: that makes a lot of sense
18:43:39 dmitryme: this is a nova problem, not Sahara's :)
18:43:58 alazarev: I am just not sure it will work
18:44:05 I have an idea of adding versions to the engines to make validation aware of new/old clusters in the case of the heat engine
18:44:24 I'll propose the lp issue with a description and implementation early next week
18:44:25 +1 to versioned engines
18:44:32 SergeyLukjanov: +2 on versions
18:44:37 +1 for versions
18:45:19 i have a couple of questions about when to produce errors for the swift/auth stuff
18:45:21 #agreed Add Sahara version as cluster property
18:45:31 #agreed upgrade both direct and heat engines to use server groups (try to upgrade cluster while scaling, add notes to upgrade doc)
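
(A hypothetical illustration of the "Add Sahara version as cluster property" agreement above: record which engine and version created a cluster and check it before scaling. None of these names exist in Sahara; they are made up for illustration.)

```python
# Hypothetical sketch only: persist the engine name/version on the cluster at
# creation time and consult it before scaling, so old and new anti-affinity
# behaviour are never mixed silently.
ENGINE_NAME = 'heat'
ENGINE_VERSION = 2  # bumped when behaviour changes, e.g. server-group based a-a


def tag_new_cluster(cluster_extra):
    """Store provisioning info when the cluster is created."""
    cluster_extra['engine'] = ENGINE_NAME
    cluster_extra['engine_version'] = ENGINE_VERSION


def validate_scaling(cluster_extra):
    """Refuse (or adapt) scaling of clusters built by another engine/version."""
    if cluster_extra.get('engine') != ENGINE_NAME:
        raise RuntimeError("cluster was created by the '%s' engine, scaling "
                           "with '%s' is not supported"
                           % (cluster_extra.get('engine'), ENGINE_NAME))
    if cluster_extra.get('engine_version', 1) < ENGINE_VERSION:
        # Older cluster: its instances are not in a server group yet, so this
        # is where the "upgrade while scaling" step discussed above would run.
        pass
```
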
18:45:56 folks, quick opinions on https://review.openstack.org/#/c/115113/. In short, there is no way really to hide configs/args in Oozie (including swift stuff, old or new). So, there are 2 choices: a secgroup for public networks, and iptables on the Oozie server host for finer control on public/private networks. I think documentation, leaving the solution up to admins, is the way to go. I had proposed optional iptables rules application
18:46:15 So, doc only, or iptables option, rules set by Sahara with ssh?
18:46:32 i like doc + iptables option
18:46:33 maybe we could add the provisioning engine and stuff like that too? e.g. to disable scaling with another engine
18:47:16 I think the iptables option is not hard
18:47:20 tmckay: I think users must do it manually via security groups, the ability to specify a secgroup is already merged
18:47:39 I like alazarev
18:47:46 tmckay: +1 on documenting that
18:47:51 I like alazarev's point
18:47:59 dmitryme: (hug)
18:48:07 alazarev: :-)
18:48:34 alazarev, I was talking to elmiko yesterday, with lots of users in the same tenant, anyone who can launch an instance can get on the private network and get to Oozie, so you may want iptables. But, anyone in the same tenant can get to swift, too.
18:48:35 lol
18:48:50 If Oozie would just screen jobs by user, it would be fine. But it doesn't.
18:49:33 tmckay: users in the same tenant are visible to each other by definition, I don't see a problem here
18:50:08 tmckay: if they need privacy - they can use separate tenants
18:50:15 okay, fine with me.
18:50:32 tmckay: I agree with alazarev, a user from the same tenant can simply snapshot/terminate instances of another user
18:50:45 documentation on secgroups, with a note recommending leaving out the oozie port
18:51:11 * tmckay still thinks Oozie security could be better
18:51:29 tmckay: I would say not only oozie, hadoop has plenty of sensitive stuff
18:51:37 even with users in the same tenant, you wouldn't want them to get access to each other's creds.
18:51:39 agreed
18:51:54 elmiko, apparently, we don't care
18:52:02 same tenant is one big happy family
18:52:25 you'd better trust all the people in your tenant
18:52:27 Alice and Bob are working on a project in tenant_A, Bob also has access to tenant_B, if Alice gets Bob's creds, she can now access tenant_B
18:52:28 :)
18:52:37 'happy' is questionable… but one family for sure :)
18:52:41 elmiko: very good point
18:52:47 elmiko, doh
18:53:29 i just like having a binary option for the admin to lock down the Oozie port with iptables. i think it's a simple option.
18:53:41 will this be solved with trusts? we don't store passwords anymore
18:53:55 elmiko, more complicated, because Alice can also read hadoop logs
18:54:09 alazarev: it will
18:54:18 elmiko, I wonder if the job configs are in the hadoop logs, too
18:54:32 in which case locking down the oozie server doesn't matter
18:54:33 elmiko: what's the problem in this case?
18:54:40 tmckay: i think only what we pass through workflow.xml
18:54:58 alazarev: if the user is only using the trust methodology, no problem.
18:55:21 what should we do in a situation where sahara is configured to use a proxy, but the user still inputs creds for a data source?
18:55:39 drop the creds, use them, send a warning, ?
18:55:57 drop them, I think
18:56:03 elmiko: I would say "throw an exception", to be explicit
18:56:20 dmitryme: reject the form, essentially
18:56:21 I mean, return an error, in terms of the API
18:56:28 elmiko: yep
18:56:58 I just hate implicit things
18:57:00 ok, so if sahara is configured for proxy domains, then it will be a non-issue as no user creds will ever be used.
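
(A minimal sketch of the behaviour just agreed on — return an API error instead of implicitly dropping or using credentials when proxy users are enabled. The option name and exception type are assumptions, not Sahara's actual validation code.)

```python
# Illustrative only: if proxy users/domains are enabled, explicitly reject
# user-supplied Swift credentials on data sources ("explicit > implicit").
from oslo.config import cfg  # the oslo namespace package, as used in Juno

CONF = cfg.CONF
CONF.register_opts([cfg.BoolOpt('use_domain_for_proxy_users', default=False)])


def check_data_source_credentials(data):
    creds = data.get('credentials', {})
    if CONF.use_domain_for_proxy_users and (creds.get('user') or
                                            creds.get('password')):
        # Surface this as a validation error at the API layer rather than
        # dropping or using the credentials implicitly.
        raise ValueError("Swift credentials must not be provided when proxy "
                         "users are enabled; access is delegated via a trust")
```
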
18:57:09 elmiko, back up. is there any way to guard against the Bob and Alice scenario, if workflows can be seen?
18:57:16 dmitryme: agreed. explicit > implicit
18:57:44 'cause Alice can get a trust in another tenant, right?
18:57:52 tmckay: it's still a problem if anyone can access a user's creds through the workflow
18:58:12 tmckay: oh that, yea the trust is between Alice and the proxy user
18:58:17 do we need creds in the workflow?
18:58:27 alazarev: unfortunately yes
18:58:53 well, we need to get them to hadoop somehow. if we take oozie out of the mix, then we don't need the workflow.
18:58:56 as i understand it
18:59:01 elmiko, so you would still recommend an iptables option (or at least a doc note)? Part of it for me hinges on what is visible in hadoop logs
18:59:05 elmiko: for swift? until swift is patched?
18:59:11 it is my understanding that these will be the creds of the proxy user, which will have access to the given tenant only by means of the trust
18:59:21 dmitryme: yes
18:59:48 alazarev: currently, we pass the user's creds through the workflow. when the swift/auth patch goes through, we will only be passing creds for the proxy user
19:00:13 elmiko, okay, so you can't cross tenants in that case. The worst you can do is get auth in the current tenant, which you already have, right?
19:00:17 so, if somebody from the same tenant steals them, he will not be able to access any other tenant Bob has access to
19:00:27 tmckay: yea
19:00:33 dmitryme: yea
19:00:48 alright, so public oozie access is still the major issue
19:00:49 the trust is scoped only to the user and project that the swift container is in
19:01:22 if I may say one thing before closing...
19:01:30 in the future we might be able to use something like Barbican and asymmetric keys from the instances to completely remove creds from the process
19:01:45 tosky, sure
19:02:07 elmiko: yeap, somewhere in the M cycle :)
19:02:11 alazarev: LOL
19:02:12 one of our interns has been working on horizon selenium tests (starting from https://blueprints.launchpad.net/horizon/+spec/selenium-integration-testing) with the goal of horizon/sahara tests
19:02:23 do you mind wrapping up, thanks =P we have our poppy meeting starting now....
19:02:36 oops
19:02:37 move to the other channel
19:02:39 tosky: let's go to #sahara
19:02:40 amitgandhinz: sorry
19:02:41 #endmeeting