Configure backoff and retry options for credentials provider authentication refresh #59
@olileach Thanks for reporting the problem. I will look into the error handling in that code path. In the meantime, is there any chance of getting debug logs of such a failure, say in a test environment? It would be very helpful to find out the exact problem the
@sayantacC - Thanks for such a quick response on this. We will report back and update the open support case with the relevant info.
It looks like the new branch released to solve this issue has fixed our problem: we now see a retry after a failed auth, so the job continues instead of failing with the no default credentials error. Could this branch be merged to main? I'd then be happy to close this issue. Thanks.
I have merged the branch to main as 1.1.3.
Version 1.1.3 has been released. It should show up in the Maven repos in a day or two.
We are running a large number of processes on EMR: 10 YARN jobs run on a single EC2 instance, and each YARN job spawns 8 processes using Java Futures. Our EMR cluster has several EC2 instances; some of them authenticate without problems and some do not. After the EMR jobs have been running for a few hours, we see intermittent authentication failures when the aws-msk-iam-auth library tries to refresh the IAM token so that it can continue processing messages from MSK. Here's the error message we receive:
The credentials provider should be using the instance profile attached to the EC2 instance. If you follow the errors above, you can see that the lookup walks the default credentials provider chain (https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/credentials.html) but doesn't find a credentials provider and then fails. The key point is that the issue is intermittent: most of the time, authentication works. However, when no default credentials provider is found, the YARN job fails and the EMR job fails with it.
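The failure mode described above can be modelled roughly as follows. This is a simplified, hypothetical sketch, not the AWS SDK's own chain implementation: `ChainSketch`, `Credentials` and `resolve` are invented names, and each provider is just a `Supplier` that returns null or throws when it has nothing to offer. It illustrates why one transient failure in the last link (for example, the instance profile lookup against the metadata service) surfaces as a failure of the whole chain.

```java
import java.util.List;
import java.util.function.Supplier;

// Hypothetical, simplified model of a credentials provider chain.
// ChainSketch and Credentials are illustrative names, not AWS SDK types.
final class ChainSketch {
    private ChainSketch() {}

    // Minimal stand-in for an access key / secret key pair.
    static final class Credentials {
        final String accessKeyId;
        final String secretKey;

        Credentials(String accessKeyId, String secretKey) {
            this.accessKeyId = accessKeyId;
            this.secretKey = secretKey;
        }
    }

    // Asks each provider in order (e.g. environment variables, profile file,
    // and finally the EC2 instance profile). A provider that returns null or
    // throws is skipped; the first non-null result wins.
    static Credentials resolve(List<Supplier<Credentials>> providers) {
        for (Supplier<Credentials> provider : providers) {
            try {
                Credentials credentials = provider.get();
                if (credentials != null) {
                    return credentials;
                }
            } catch (RuntimeException e) {
                // Treat any failure as "this provider has nothing" and move on.
            }
        }
        // Every provider came up empty, so the whole lookup fails, even if
        // the last failure was only a transient metadata-service error.
        throw new IllegalStateException(
                "Unable to load credentials from any provider in the chain");
    }
}
```

In this model there is no second chance: if the final provider throws once, the caller sees the chain-level error immediately.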
I can see where the token refresh callback is:
aws-msk-iam-auth/src/main/java/software/amazon/msk/auth/iam/IAMClientCallbackHandler.java (line 94 at commit af25047)
It would be great to have a config option that lets us configure backoff and retries for refreshing the IAM credentials, to handle potential throttling when the instance metadata service is queried under particularly high load.
Similar to the backoff for connections to MSK, we would like options to configure the backoff in ms (say 1000 or 2000) and the number of retry attempts.
For example, if the option is specified, sleep 1 or 2 seconds (or whatever the configuration provides) and retry 3 times?
Thanks in advance
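The proposed behaviour could look something like the sketch below: wrap the credentials fetch in a loop that waits a fixed backoff between attempts. `RetryingCredentials`, `getWithRetry`, `retries` and `backoffMs` are hypothetical names chosen for illustration, not options that aws-msk-iam-auth exposes, and the fetch is modelled as a plain `Supplier` rather than a real AWS credentials provider.

```java
import java.util.function.Supplier;

// Hypothetical helper: RetryingCredentials, getWithRetry, retries and
// backoffMs are illustrative names, not aws-msk-iam-auth configuration.
final class RetryingCredentials {
    private RetryingCredentials() {}

    // Calls fetch once, then retries up to `retries` more times, sleeping a
    // fixed backoffMs between attempts. Rethrows the last failure if every
    // attempt fails.
    static <T> T getWithRetry(Supplier<T> fetch, int retries, long backoffMs) {
        if (retries < 0) {
            throw new IllegalArgumentException("retries must be >= 0");
        }
        RuntimeException last = null;
        for (int attempt = 0; attempt <= retries; attempt++) {
            try {
                return fetch.get();
            } catch (RuntimeException e) {
                last = e;
                if (attempt < retries) {
                    sleep(backoffMs);
                }
            }
        }
        throw last;
    }

    // Fixed-delay backoff; restores the interrupt flag if interrupted.
    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted during backoff", ie);
        }
    }
}
```

With the values suggested in this issue, a call would be `getWithRetry(fetch, 3, 1000)`: up to four attempts (the first try plus three retries), one second apart. A fixed delay keeps the sketch simple; exponential backoff with jitter is a common alternative when many processes on one host hit the metadata service at the same time.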