[SPARK-20060][Deploy][Kerberos]Support Standalone visiting secured HDFS #17387
Conversation
…ng to metastore, while using --proxy-user in a kerberized hdfs
Can one of the admins verify this patch?
@jerryshao This may also fix issue SPARK-19995 and SPARK-19997 on yarn with those apps which take
I didn't think Spark officially supported Kerberos in standalone mode. I'm pretty sure it doesn't work at all, even if kinit'd, due to a change that went in a long time back in runAsSparkUser, unless that changed fairly recently. This was brought up before as well, but I'd have to dig up the details as to why it didn't go in. I think it was because if we are going to add support, we should do it fully and document it as supported. I'm definitely fine with adding it; this would just be a feature vs a bug.
Here is a jira from a long time back: https://issues.apache.org/jira/browse/SPARK-2541
Does kerberos authentication really work in non-yarn cluster mode? AFAIK I don't see any code which will ship delegation tokens to executors other than yarn.
Yes, this seems to fix only local mode; in standalone mode there are still problems with HDFS tokens getting lost. @tgravescs @jerryshao
cc @tgravescs, tested with secured HDFS in standalone mode and it works fine. This PR also has a lot of YARN-specific security arguments to rename. cc @jerryshao, please take a look.
@yaooqinn normally such a big behavior change requires a design doc and thorough discussion. It is not a good idea to push a bunch of code silently without any discussion. Besides, we're working on refactoring the current credential mechanism to support pushing tokens from an external service and getting rid of keytab distribution, please see SPARK-19143. I think after that, supporting kerberized services for standalone mode and others will be much easier and more straightforward.
Yeah, if you plan on adding support for secure HDFS access in standalone mode, it needs a feature jira, should probably go through an SPIP, and we need to make sure everything truly works and is documented. I remember someone in the past tried to just revert the jira that broke it, but others wanted to make sure it was fully supported. I also think there is a jira for adding Kerberos support on the Mesos side. So I think we should start with more design in the jira and an SPIP to make sure people agree with this.
@yaooqinn I'm trying to add Kerberos support for Mesos. Did you run into this?
@yaooqinn Shall we listen to @tgravescs and start with more design docs and SPIP? |
@yaooqinn I think the patch here is quite old and cannot be merged anymore; can you please close it? If you still want to address this issue, please create a new PR, thanks!
What changes were proposed in this pull request?
Brief design
Introduction
The basic issue for standalone mode in accessing Kerberos-secured HDFS, or other kerberized services, is how to gather the delegation tokens on the driver side and deliver them to the executor side.
When we run Spark on YARN, we set the tokens in the container launch context so they are delivered automatically; as for the long-running-job issue caused by token expiration, SPARK-14743 fixed it by writing the tokens to HDFS and repeatedly updating the credential file as the tokens are renewed.
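The SPARK-14743 mechanism referenced above amounts to periodically rewriting a versioned credentials file that executors read to pick up the newest tokens. A minimal Python sketch of that write/pick-latest pattern (a local temp directory stands in for HDFS, and the file naming here is illustrative, not Spark's actual scheme):

```python
import json
import tempfile
from pathlib import Path

def write_credentials(cred_dir: Path, version: int, tokens: dict) -> Path:
    """Write a new, versioned credentials file, as a renewer thread would
    after obtaining fresh delegation tokens."""
    path = cred_dir / f"credentials-{version}"
    path.write_text(json.dumps(tokens))
    return path

def read_latest_credentials(cred_dir: Path) -> dict:
    """Pick the highest-versioned credentials file, as an executor would
    when refreshing its tokens."""
    latest = max(cred_dir.glob("credentials-*"),
                 key=lambda p: int(p.name.rsplit("-", 1)[1]))
    return json.loads(latest.read_text())

cred_dir = Path(tempfile.mkdtemp())
write_credentials(cred_dir, 1, {"hdfs": "token-v1"})
write_credentials(cred_dir, 2, {"hdfs": "token-v2"})  # a renewal happened
print(read_latest_credentials(cred_dir)["hdfs"])  # -> token-v2
```

The real mechanism also deletes stale files and schedules the next renewal well before token expiry; this sketch only shows the versioned-file handoff.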
When running Spark on Standalone, we currently have no implementation like YARN's to obtain and deliver those tokens.
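At its core, the missing piece is a serialize/ship/deserialize round trip for delegation tokens. Hadoop tokens serialize to URL-safe strings via `Token.encodeToUrlString()`; the sketch below mimics that round trip in Python, with plain base64url standing in for Hadoop's wire format (the function names mirror Hadoop's API, everything else is illustrative):

```python
import base64

# Stand-ins for Hadoop's Token.encodeToUrlString() / decodeFromUrlString().
# Plain base64url over raw bytes keeps the sketch self-contained.
def encode_to_url_string(token_bytes: bytes) -> str:
    return base64.urlsafe_b64encode(token_bytes).decode("ascii")

def decode_from_url_string(s: str) -> bytes:
    return base64.urlsafe_b64decode(s.encode("ascii"))

# Driver side: serialize each token into a string sequence that can ride
# along with the Spark properties sent to executors.
tokens = [b"hdfs-delegation-token", b"hive-delegation-token"]
credential_entities = [encode_to_url_string(t) for t in tokens]

# Executor side: decode the strings back into tokens; the real code would
# then add them to the current UserGroupInformation.
recovered = [decode_from_url_string(s) for s in credential_entities]
print(recovered == tokens)  # -> True
```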
Implementation
First, we simply move the implementation of SPARK-14743, which was YARN-only, to the core module. We use it to gather the credentials we need, and also to update and renew the credential files on HDFS.
Second, credential files on secured HDFS are not reachable by executors before they obtain the tokens. So we add a sequence configuration,

spark.security.credential.entities

which the driver uses to put each token.encodeToUrlString() before launching the executors, and which the executors use to fetch the credentials as a string sequence while fetching the driver-side Spark properties, decoding them back into tokens. Before setting up the CoarseGrainedExecutorBackend, we set the credentials on the current executor-side UGI.

How was this patch tested?
unit tests
manual tests on kerberized metastore service and hdfs
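For illustration only: under this design the driver-to-executor handoff boils down to a Spark property shaped roughly like the following, where each value is a token's encodeToUrlString() output (the placeholder values here are made up):

```
spark.security.credential.entities=<encoded-hdfs-token>,<encoded-hive-token>
```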