
Reload credentials file config when app starts with checkpoint file i… #17782

Closed
wants to merge 1 commit

Conversation

jiasheng55

What changes were proposed in this pull request?

Currently the credentials file configuration is recovered from the checkpoint file when a Spark Streaming application is restarted, which can lead to unwanted behavior, for example:

  1. Submit a Spark Streaming application using a keytab file, with checkpointing enabled, in yarn-cluster mode:

spark-submit --master yarn-cluster --principal xxxx --keytab xxx ...

  2. Stop the Spark Streaming application.
  3. Resubmit the application after a period of time (e.g. one day).
  4. The credentials file configuration is recovered from the checkpoint file, so the value of "spark.yarn.credentials.file" points to the old staging directory (e.g. hdfs://xxxx/.sparkStaging/application_xxxx/credentials-xxxx, where application_xxxx is the application id of the previous application that was stopped).
  5. When an executor launches, ExecutorDelegationTokenUpdater immediately updates credentials from the credentials file. Since that file was generated a day (or more) earlier, its tokens have already expired, so after a period of time the executor keeps failing.

Some useful logs are shown below:

2017-04-27,15:08:08,098 INFO org.apache.spark.executor.CoarseGrainedExecutorBackend: Will periodically update credentials from: hdfs://xxxx/application_xxxx/credentials-xxxx
2017-04-27,15:08:12,519 INFO org.apache.spark.deploy.yarn.ExecutorDelegationTokenUpdater: Reading new delegation tokens from hdfs://xxxx/application_1xxxx/credentials-xxxx-xx
2017-04-27,15:08:12,661 INFO org.apache.spark.deploy.yarn.ExecutorDelegationTokenUpdater: Tokens updated from credentials file.
...
2017-04-27,15:08:48,156 WARN org.apache.hadoop.ipc.Client: Exception encountered while connecting to the server : org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (HDFS_DELEGATION_TOKEN token xxxx for xx) can't be found in cache
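As a rough illustration of the idea behind this change (not the actual Spark code), recovery from a checkpoint could drop submission-specific properties such as spark.yarn.credentials.file and take them from the freshly submitted application instead, so they never point at the old staging directory. The RELOAD_KEYS list and the restore helper below are assumptions made for this sketch:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class CheckpointConfReload {

    // Properties assumed unsafe to restore from a checkpoint
    // (illustrative list, not necessarily Spark's actual one).
    static final Set<String> RELOAD_KEYS = Set.of(
        "spark.yarn.credentials.file",
        "spark.yarn.principal",
        "spark.yarn.keytab");

    static Map<String, String> restore(Map<String, String> checkpointed,
                                       Map<String, String> current) {
        Map<String, String> conf = new HashMap<>(checkpointed);
        for (String key : RELOAD_KEYS) {
            conf.remove(key);                      // drop the stale checkpointed value
            if (current.containsKey(key)) {
                conf.put(key, current.get(key));   // take the new submission's value
            }
        }
        return conf;
    }

    public static void main(String[] args) {
        Map<String, String> old = Map.of(
            "spark.app.name", "demo",
            "spark.yarn.credentials.file",
            "hdfs://nn/.sparkStaging/application_old/credentials-1");
        Map<String, String> cur = Map.of(
            "spark.yarn.credentials.file",
            "hdfs://nn/.sparkStaging/application_new/credentials-2");

        // The restored config keeps checkpointed values except the reload keys,
        // which now come from the re-submitted application.
        System.out.println(restore(old, cur).get("spark.yarn.credentials.file"));
        // prints hdfs://nn/.sparkStaging/application_new/credentials-2
    }
}
```

With this approach, ExecutorDelegationTokenUpdater would read the credentials file written by the new submission rather than the expired one from the previous run.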

How was this patch tested?

Manual tests.

@AmplabJenkins

Can one of the admins verify this patch?

@jerryshao
Contributor

@Victor-Wong are you going to submit a PR for branch-1.6? Does this issue exist in the master branch? Also, would you please elaborate more on this issue? Thanks.

@jiasheng55
Author

@jerryshao
Sorry for the late reply. This issue does exist in the master branch, as I described in https://issues.apache.org/jira/browse/SPARK-19688.
What should I do to elaborate more about this issue? Should I submit a new PR for master branch?
Thanks.

@srowen
Member

srowen commented May 9, 2017

@Victor-Wong we wouldn't merge a change just to 1.6, if something else fixed it in master. You could investigate back-porting whatever fixed it into 1.6, though I doubt there will be more 1.6 releases. This should be closed in any event.

@jiasheng55
Author

@srowen
Thanks for replying. I tested with the master branch and it turned out the issue still exists.
I created a new PR against the master branch, #17937.
Thanks!

4 participants