[SPARK-20060][Deploy][Kerberos]Support Standalone visiting secured HDFS #17387

Closed
wants to merge 11 commits into from

Conversation

yaooqinn
Member

@yaooqinn yaooqinn commented Mar 22, 2017

What changes were proposed in this pull request?

Brief design

Introductions

The basic problem for Standalone mode when accessing Kerberos-secured HDFS or other kerberized services is how to gather the delegation tokens on the driver side and deliver them to the executor side.
When we run Spark on YARN, we set the tokens in the container launch context so that they are delivered automatically. The long-running case, where tokens eventually expire, was fixed in SPARK-14743 by writing the tokens to HDFS, updating the credential file, and renewing the tokens periodically.
When running Spark on Standalone, there is currently no implementation like YARN's to obtain and deliver those tokens.

Implementations

First, we simply move the implementation of SPARK-14743, which is currently YARN-only, to the core module. We use it to gather the credentials we need, and also to update and renew them through credential files on HDFS.
Second, credential files on secured HDFS are not reachable by executors before they have the tokens. So we add a sequence configuration, spark.security.credential.entities. The driver fills it with token.encodeToUrlString() for each token before launching the executors. The executors read it as a sequence of strings while fetching the driver-side Spark properties, then decode the strings back into tokens. Before setting up the CoarseGrainedExecutorBackend, we add the credentials to the current executor-side UGI.
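
The hand-off described above can be sketched as a round trip through a single configuration value. This is only an illustration in Python, not the patch's code: `Token`, `publish_tokens`, and `recover_tokens` are hypothetical stand-ins, and the length-prefixed Base64 encoding merely imitates the role that Hadoop's `Token.encodeToUrlString()` and `Token.decodeFromUrlString()` play in the real implementation. Only the key name `spark.security.credential.entities` comes from the description.

```python
import base64
import struct
from dataclasses import dataclass
from typing import Dict, List

# Key name taken from the PR description; everything else here is hypothetical.
CONF_KEY = "spark.security.credential.entities"


@dataclass(frozen=True)
class Token:
    """Hypothetical stand-in for a Hadoop delegation token."""
    kind: str
    service: str
    identifier: bytes
    password: bytes

    def encode_to_url_string(self) -> str:
        # Length-prefix each field, then URL-safe Base64 the result so the
        # string contains no commas and can live inside a list-valued property.
        fields = [self.kind.encode(), self.service.encode(),
                  self.identifier, self.password]
        blob = b"".join(struct.pack(">I", len(f)) + f for f in fields)
        return base64.urlsafe_b64encode(blob).decode("ascii")

    @staticmethod
    def decode_from_url_string(text: str) -> "Token":
        # Reverse of encode_to_url_string: Base64-decode, then split the
        # blob back into its four length-prefixed fields.
        blob = base64.urlsafe_b64decode(text.encode("ascii"))
        fields, pos = [], 0
        while pos < len(blob):
            (size,) = struct.unpack_from(">I", blob, pos)
            pos += 4
            fields.append(blob[pos:pos + size])
            pos += size
        kind, service, identifier, password = fields
        return Token(kind.decode(), service.decode(), identifier, password)


def publish_tokens(conf: Dict[str, str], tokens: List[Token]) -> None:
    """Driver side: store the encoded tokens before executors are launched."""
    conf[CONF_KEY] = ",".join(t.encode_to_url_string() for t in tokens)


def recover_tokens(conf: Dict[str, str]) -> List[Token]:
    """Executor side: decode the tokens read from the driver's properties."""
    raw = conf.get(CONF_KEY, "")
    return [Token.decode_from_url_string(s) for s in raw.split(",") if s]
```

In the patch itself, the decoded tokens would then presumably be attached to the executor's UGI (Hadoop offers `UserGroupInformation.getCurrentUser().addCredentials(...)` for this) before `CoarseGrainedExecutorBackend` starts.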

How was this patch tested?

unit tests
manual tests on a kerberized metastore service and HDFS

…ng to metastore, while using --proxy-user in a kerberized hdfs
@AmplabJenkins

Can one of the admins verify this patch?

@yaooqinn
Member Author

@jerryshao This may also fix SPARK-19995 and SPARK-19997 on YARN for apps that use SparkSQLCLIDriver.main as their entry point. Please take a look.
@vanzin @tgravescs @mridulm @dongjoon-hyun could you help review this? Thanks.

@tgravescs
Contributor

I didn't think Spark officially supported Kerberos in standalone mode. I'm pretty sure it doesn't work at all, even if kinit'd, due to a change that went into runAsSparkUser a long time back, unless that changed fairly recently. This was brought up before as well, but I would have to dig up the details as to why it didn't go in. I think it was because if we are going to add support, we should do it fully and document support for it and such.

I'm definitely fine with adding it, this would just be a feature vs a bug.

@tgravescs
Contributor

Here is a jira from a long time back: https://issues.apache.org/jira/browse/SPARK-2541

@jerryshao
Contributor

Does Kerberos authentication really work in non-YARN cluster mode? AFAIK there isn't any code that ships delegation tokens to executors outside of YARN mode.

@yaooqinn
Member Author

Yes, this seems to fix only local mode; in standalone mode the HDFS tokens are still lost. @tgravescs @jerryshao

@yaooqinn yaooqinn changed the title from "[SPARK-20060][Deploy][Kerberos][Spark Shell] Obtain credentials for proxy user before talking to hive metastore" to "WIP[SPARK-20060][Deploy][Kerberos][Spark Shell] Obtain credentials for proxy user before talking to hive metastore" Mar 23, 2017
@yaooqinn yaooqinn changed the title from "WIP[SPARK-20060][Deploy][Kerberos][Spark Shell] Obtain credentials for proxy user before talking to hive metastore" to "WIP[SPARK-20060][Deploy][Kerberos]Support Standalone visiting secured HDFS" Mar 25, 2017
@yaooqinn
Member Author

cc @tgravescs, I tested this against a secured HDFS in standalone mode and it works fine. This PR also has a lot of YARN-specific security arguments that need renaming.

cc @jerryshao, please take a look.

@yaooqinn yaooqinn changed the title from "WIP[SPARK-20060][Deploy][Kerberos]Support Standalone visiting secured HDFS" to "[SPARK-20060][Deploy][Kerberos]Support Standalone visiting secured HDFS" Mar 27, 2017
@jerryshao
Contributor

@yaooqinn Normally such a big behavior change requires a design doc and thorough discussion. It is not a good idea to push a bunch of code silently without any discussion.

Besides, we're working on refactoring the current credential mechanism to support pushing tokens from an external service and to get rid of keytab distribution; please see SPARK-19143. I think that after that, supporting kerberized services for standalone mode and others will be much easier and more straightforward.

@tgravescs
Contributor

Yeah, if you plan on adding support for secure HDFS access in standalone mode, it needs a feature JIRA, should probably go through an SPIP, and we need to make sure everything truly works and is documented. I remember someone in the past tried to just revert the JIRA that broke it, but others wanted to make sure it was fully supported.

I also think there is a jira for adding kerberos support to the mesos side.

So I think we should start with more design in the jira and SPIP to make sure people agree with this.

@mgummelt
Contributor

@yaooqinn I'm trying to add Kerberos support for Mesos, and creating HadoopRDDs fails for me because YARN isn't configured: https://issues.apache.org/jira/browse/SPARK-20328

Did you run into this?

@jiangxb1987
Contributor

@yaooqinn Shall we listen to @tgravescs and start with more design docs and SPIP?

@jerryshao
Contributor

@yaooqinn I think the patch here is quite old and cannot be merged anymore. Can you please close it?

If you still want to address this issue, can you please create a new PR? Thanks!

@yaooqinn yaooqinn closed this Sep 13, 2017
6 participants