Added support for accessing secured HDFS #265
Conversation
Also changed the way tasks run so that tasks always run as the user who submitted them. This replaces the old approach of using an environment variable, SPARK_USER, to specify the user, which is far less flexible. It also eases security management, since users no longer need to open access to HDFS files under their home directories to the user who starts the Spark cluster. Signed-off-by: Yinan Li <[email protected]>
This PR replaces https://github.com/apache/incubator-spark/pull/467.
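For context, the standard Hadoop mechanism for running work on behalf of a specific user is UserGroupInformation.doAs. The sketch below only illustrates that general pattern; the runAsUser helper and the way the user name is obtained are assumptions for illustration, not code taken from this patch.

import java.security.PrivilegedExceptionAction
import org.apache.hadoop.security.UserGroupInformation

// Hypothetical helper: run a block of work so that, from HDFS's point of view,
// it executes as the submitting user rather than as the user who started the
// Spark cluster daemons.
def runAsUser[T](user: String)(body: => T): T = {
  val ugi = UserGroupInformation.createRemoteUser(user)
  ugi.doAs(new PrivilegedExceptionAction[T] {
    override def run(): T = body // HDFS calls inside here see `user` as the caller
  })
}

// Example usage (the user name would come from the submitting context, not from
// the SPARK_USER environment variable this patch removes):
// runAsUser("alice") { /* read or write alice's HDFS files here */ }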
Merged build triggered. Build is starting -or- tests failed to complete.
Merged build started. Build is starting -or- tests failed to complete.
Merged build finished. Build is starting -or- tests failed to complete.
Build is starting -or- tests failed to complete.
This is failing because of a style error:
Signed-off-by: Yinan Li <[email protected]>
Merged build triggered. Build is starting -or- tests failed to complete.
Merged build started. Build is starting -or- tests failed to complete.
Merged build finished. All automated tests passed.
All automated tests passed.
 * @return Type of Hadoop security authentication
 */
private def getAuthenticationType: String = {
  sparkConf.get("spark.hadoop.security.authentication")
Should this not have a default value?
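For illustration, one way to address this review comment would be to fall back to Hadoop's non-Kerberos mode when the property is unset. The default value shown ("simple") is an assumption for the sketch, not something taken from this patch.

/**
 * @return Type of Hadoop security authentication
 */
private def getAuthenticationType: String = {
  // Assumed fix: return "simple" (Hadoop's default, non-Kerberos mode)
  // when the property has not been set explicitly.
  sparkConf.get("spark.hadoop.security.authentication", "simple")
}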
Hi, whatever happened to this PR? I am interested in reading data from secure HDFS into Spark running on Mesos...
I want to know why this pull request has not been merged. Does it go against the roadmap of Spark?
(I imagine part of the reason is that it doesn't merge cleanly into master, and the tests failed.)
@dkanoafry With this patch, the main issue I see is that it distributes the delegation tokens insecurely (through sc.addFile)... so anyone could just read the tokens over the network and mimic the user who is running the Spark job. In fact, we start an HTTP file server, so you wouldn't even need to observe the traffic; you could just make a request against it. I'm guessing this is fine for the company submitting the patch, but it's too weak a security model IMO to merge upstream. Since we've more recently added support for securing the HTTP file server through a shared secret, I think this might be okay to pull in now. @tgravescs would you mind taking a quick look? I think the idea here is that in standalone mode a user would just log in with a keytab and send delegation tokens to the executors, with the main goal being to provide access to a secured HDFS deployment. Is there a way now for them to set a shared secret to authenticate this HTTP request? (I think it's fine to assume that they just set something in a conf file on all of the worker nodes, i.e. we don't need to disseminate that secret.)
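For reference, a minimal sketch of the shared-secret setup the comment above refers to, using Spark's spark.authenticate and spark.authenticate.secret properties; the secret value and how it is distributed to worker nodes are assumptions, not guidance from this thread.

import org.apache.spark.SparkConf

// Sketch: enable Spark's shared-secret authentication so internal endpoints
// such as the HTTP file server reject callers that do not know the secret.
// The same secret would be placed in a conf file on every worker node.
val conf = new SparkConf()
  .set("spark.authenticate", "true")
  .set("spark.authenticate.secret", "change-me") // pre-shared, distributed out of band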
@pwendell @tgravescs I have made some improvements and created a new PR based on the newest master; you can work from it: PR #2320. I am using this patch now (with spark-1.0.2) and I really hope it can be merged into master so it can help others and I don't need to maintain the code. Thanks.
Hey @huozhanfeng - from what I can tell, your PR also has the same security issue I was mentioning above. I think it's worth seeing whether the
I commented on the other PR.
Test FAILed.
I think we should close this issue for now, since there's another more-recent PR to add the same feature. |