Added support for accessing secured HDFS #265
Conversation
Also changed the way tasks run so that tasks always run as the user who submitted them. This replaces the old approach of using an environment variable, SPARK_USER, to specify the user, which is far less flexible. It also eases security management, since users no longer need to open access to HDFS files under their home directories to the user who starts the Spark cluster. Signed-off-by: Yinan Li <[email protected]>
This PR replaces https://github.com/apache/incubator-spark/pull/467.
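For context, the standard Hadoop mechanism for running work on behalf of a specific user is UserGroupInformation.doAs. The sketch below only illustrates that general pattern; the runAsUser helper and the way the user name is obtained are assumptions for illustration, not code taken from this patch.

import java.security.PrivilegedExceptionAction
import org.apache.hadoop.security.UserGroupInformation

// Hypothetical helper: run a block of work so that, from HDFS's point of view,
// it executes as the submitting user rather than as the user who started the
// Spark cluster daemons.
def runAsUser[T](user: String)(body: => T): T = {
  val ugi = UserGroupInformation.createRemoteUser(user)
  ugi.doAs(new PrivilegedExceptionAction[T] {
    override def run(): T = body // HDFS calls inside here see `user` as the caller
  })
}

// Example usage (the user name would come from the submitting context, not from
// the SPARK_USER environment variable this patch removes):
// runAsUser("alice") { /* read or write alice's HDFS files here */ }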
Merged build triggered. Build is starting -or- tests failed to complete.
Merged build started. Build is starting -or- tests failed to complete.
Merged build finished. Build is starting -or- tests failed to complete.
Build is starting -or- tests failed to complete.
This is failing because of a style error:
Signed-off-by: Yinan Li <[email protected]>
Merged build triggered. Build is starting -or- tests failed to complete.
Merged build started. Build is starting -or- tests failed to complete.
Merged build finished. All automated tests passed.
All automated tests passed.
 * @return Type of Hadoop security authentication
 */
private def getAuthenticationType: String = {
  sparkConf.get("spark.hadoop.security.authentication")
Should this not have a default value?
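For illustration, one way to address this review comment would be to fall back to Hadoop's non-Kerberos mode when the property is unset. The default value shown ("simple") is an assumption for the sketch, not something taken from this patch.

/**
 * @return Type of Hadoop security authentication
 */
private def getAuthenticationType: String = {
  // Assumed fix: return "simple" (Hadoop's default, non-Kerberos mode)
  // when the property has not been set explicitly.
  sparkConf.get("spark.hadoop.security.authentication", "simple")
}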
Hi, whatever happened to this PR? I am interested in reading data from secure HDFS into Spark running on Mesos...
I want to know why this pull request has not been merged. Does it go against the roadmap of Spark?
(I imagine part of the reason is that it doesn't merge cleanly into master, and the tests failed.)
@dkanoafry With this patch, the main issue I see is that it distributes the delegation tokens insecurely (through sc.addFile)... so anyone could just read the tokens over the network and mimic the user who is running the Spark job. In fact, we start an HTTP file server, so you wouldn't even need to observe the traffic; you could just make a request against it. I'm guessing this is fine for the company submitting the patch, but it's too weak a security model IMO to merge upstream. Since we've more recently added support for securing the HTTP file server through a shared secret, I think this might be okay to pull in now. @tgravescs would you mind taking a quick look? I think the idea here is that in standalone mode a user would just log in with a keytab and send delegation tokens to the executors, with the main goal being to provide access to a secured HDFS deployment. Is there a way now for them to set a shared secret to authenticate this HTTP request? (I think it's fine to assume that they just set something in a conf file on all of the worker nodes, i.e. we don't need to disseminate that secret.)
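For reference, a minimal sketch of the shared-secret setup the comment above refers to, using Spark's spark.authenticate and spark.authenticate.secret properties; the secret value and how it is distributed to worker nodes are assumptions, not guidance from this thread.

import org.apache.spark.SparkConf

// Sketch: enable Spark's shared-secret authentication so internal endpoints
// such as the HTTP file server reject callers that do not know the secret.
// The same secret would be placed in a conf file on every worker node.
val conf = new SparkConf()
  .set("spark.authenticate", "true")
  .set("spark.authenticate.secret", "change-me") // pre-shared, distributed out of band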
@pwendell @tgravescs I have made some improvements and created a new PR based on the newest master; you can work from it: PR #2320. I am using this patch now (with spark-1.0.2) and I really hope it can be merged into master so it can help others and I don't need to maintain the code. Thanks.
Hey @huozhanfeng - from what I can tell, your PR also has the same security issue I was mentioning above. I think it's worth seeing whether the
I commented on the other PR.
Test FAILed.
I think we should close this issue for now, since there's another more-recent PR to add the same feature. |