Exception while evaluating challenge [Caused by javax.security.auth.callback.UnsupportedCallbackException: Unsupported callback type:software.amazon.msk.auth.iam.internals.AWSCredentialsCallback] #36
Comments
Hello @lauroawsps , just to confirm that we (Kinesis Data Analytics) also ran into this issue and are currently working towards a fix. I will keep you posted. Sorry for the inconvenience. |
Hello @lauroawsps, we have now addressed this issue for KDA Studio. We have added support for the non-relocated Kafka connector and MSK IAM as service-managed dependencies. This means you will no longer need to rebuild and provide the connector. This change is not yet available on the console; if you would like to test it out now, you can update your application using the CLI. Here is an example request. Please note that you will need to provide the entire list of custom artifacts here; any additional JARs in S3 that you want to include should be added to the list (reference):
|
Brilliant @dannycranmer! Thank you very much! |
Closing since original issue seems to have been resolved. |
@dannycranmer does the fix also apply to KDA applications, and not only KDA Studio? We are encountering the same error in one of our KDA applications that uses Flink to consume from MSK with IAM access:
Curiously, this happens in just one of our accounts, which has the same MSK + IAM + KDA setup as the other accounts. |
@irisgve in our testing this issue did not impact KDA Flink applications. The issue surfaces once a second job is created; KDA Flink applications only ever create one job. Unless there is an edge case we did not consider? Can you please provide more details? Does the job consume records initially? Does the job failover/restart before hitting this issue? |
Yeah, the job consumes records initially. The error as we saw actually causes the job to failover/restart so we had to fork the Flink Kafka source to retry when hitting MSK IAM errors because it just kills the job when encountered. |
@dannycranmer I am seeing this error too in KDA using a Flink application (not Studio). The issue for us is that this causes KDA to restart, which increases the source latency. |
We are working on a fix. In the meantime, if you want to move forward, you can patch the library yourself. This issue occurs when the
This ensures that the most recent @sayantacC would you mind re-opening this ticket? |
@dannycranmer Is there any way for me to catch this error from my application and retry instead of patching the lib? I am having a hard time validating this, since our servers with IAM auth enabled are in higher environments that have a long release process. I am wondering if something like this would work:
But since this class is actually invoked in the callback, I doubt this will catch the error when it is thrown. |
@dannycranmer Reopened as requested. |
@vmohanan1 you could try re-creating the provider under this failure mode; example below. Disclaimer: I have not tested this. One concern is that if
Note: ExceptionUtils is from Apache Commons. |
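The snippets from this exchange did not survive in the thread, so here is a minimal sketch of the general idea discussed above: detect an `UnsupportedCallbackException` anywhere in the cause chain and retry after a recovery step. It is untested against Kafka or MSK. `CallbackRetry`, `causedByUnsupportedCallback`, `callWithRetry`, and the `recover` hook are hypothetical names chosen for illustration, and the cause-chain walk is a plain-JDK stand-in for what `ExceptionUtils` would do. As noted above, the exception may be raised on a client thread rather than in application code, in which case a wrapper like this would never see it.

```java
import java.util.function.Supplier;
import javax.security.auth.callback.UnsupportedCallbackException;

/** Hypothetical helper, not part of aws-msk-iam-auth or Kafka. */
final class CallbackRetry {

    private CallbackRetry() { }

    /** Returns true if any throwable in the cause chain is an UnsupportedCallbackException. */
    static boolean causedByUnsupportedCallback(Throwable error) {
        int depth = 0;
        // Bound the walk so a cyclic cause chain cannot loop forever.
        for (Throwable t = error; t != null && depth < 50; t = t.getCause(), depth++) {
            if (t instanceof UnsupportedCallbackException) {
                return true;
            }
        }
        return false;
    }

    /**
     * Runs the action and retries only when the failure was caused by an
     * UnsupportedCallbackException. The recover hook runs before each retry
     * (assumption: this is where a client or provider would be re-created).
     */
    static <T> T callWithRetry(Supplier<T> action, Runnable recover, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        for (int attempt = 1; ; attempt++) {
            try {
                return action.get();
            } catch (RuntimeException e) {
                // Give up on unrelated errors or once the attempts are used up.
                if (attempt >= maxAttempts || !causedByUnsupportedCallback(e)) {
                    throw e;
                }
                recover.run();
            }
        }
    }
}
```

A caller would wrap the failing operation in `callWithRetry` and pass whatever re-creation step applies as `recover`. Unrelated exceptions are rethrown immediately, so the job still fails fast on other errors.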
thanks I will try this. I am currently on version |
@dannycranmer Just noticed that |
Apologies, I cannot provide an ETA at this point. Some supplementary information: the block diagram below illustrates the problem. We have confirmed the issue also exists for other Kafka plugins, for example Scram. However, when using Scram the majority of callbacks are usable because they are also loaded from the parent classloader (since the package prefix is
If you try to define any extensions via |
When you say Job A and Job B, you mean individual parallel units in a single job, correct? |
No, I am referring to two parallel Flink jobs here. However, the same problem applies to a single job stop/start (failover). |
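To illustrate the classloader behaviour described above, here is a small self-contained sketch. It shows that one class definition loaded by two separate classloaders produces two distinct runtime types, so an instance created under one loader is rejected by a type check under the other. Everything in it is hypothetical: `DemoCallback` stands in for a callback type such as `AWSCredentialsCallback`, and `ChildFirstLoader` is a simplified child-first loader, not Flink's actual user-code classloader. It assumes the compiled class files are readable as classpath resources.

```java
import java.io.IOException;
import java.io.InputStream;

final class ClassLoaderIdentityDemo {

    /** Hypothetical stand-in for a callback type such as AWSCredentialsCallback. */
    public static class DemoCallback { }

    /** Defines one named class itself and delegates everything else to the parent. */
    static final class ChildFirstLoader extends ClassLoader {
        private final String isolatedName;

        ChildFirstLoader(ClassLoader parent, String isolatedName) {
            super(parent);
            this.isolatedName = isolatedName;
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            if (!name.equals(isolatedName)) {
                return super.loadClass(name, resolve);
            }
            synchronized (getClassLoadingLock(name)) {
                Class<?> loaded = findLoadedClass(name);
                if (loaded == null) {
                    byte[] bytes = readClassBytes(name);
                    loaded = defineClass(name, bytes, 0, bytes.length);
                }
                if (resolve) {
                    resolveClass(loaded);
                }
                return loaded;
            }
        }

        // Reads the compiled bytes through the parent so the same definition is reused.
        private byte[] readClassBytes(String name) throws ClassNotFoundException {
            String resource = name.replace('.', '/') + ".class";
            try (InputStream in = getParent().getResourceAsStream(resource)) {
                if (in == null) {
                    throw new ClassNotFoundException(name);
                }
                return in.readAllBytes();
            } catch (IOException e) {
                throw new ClassNotFoundException(name, e);
            }
        }
    }

    /** Loads DemoCallback through a fresh loader, as a separate job would (assumption). */
    static Class<?> loadIsolatedCallbackClass() {
        String name = DemoCallback.class.getName();
        ClassLoader parent = ClassLoaderIdentityDemo.class.getClassLoader();
        try {
            return new ChildFirstLoader(parent, name).loadClass(name);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    static Object newInstance(Class<?> type) {
        try {
            return type.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Class<?> jobA = loadIsolatedCallbackClass();
        Class<?> jobB = loadIsolatedCallbackClass();
        Object callbackFromJobA = newInstance(jobA);

        System.out.println("same class name:        " + jobA.getName().equals(jobB.getName()));
        System.out.println("same Class object:      " + (jobA == jobB));
        System.out.println("job B accepts A's type: " + jobB.isInstance(callbackFromJobA));
    }
}
```

The type check in the last line mirrors what a callback handler does when it tests the type of an incoming callback. If the handler and the callback were loaded by different classloaders, the check fails even though the class names match.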
@dannycranmer we have forked the library and applied the patch that uses
The classloader references are shown in the stack trace. Do you have any insight into what's happening and whether this can be manually patched? |
@dannycranmer I'm running into the same issue. I have a Flink job with a single task running in Kubernetes: 2 jobmanagers (for HA) and 3 task managers. If I kill the leader jobmanager and let it restart, I start to see the UnsupportedCallbackException. I forked the library and wasn't able to get this suggestion working
but I tweaked it slightly and this seems to fix the problem for me
|
I have the same issue with Flink (running on EKS) with a service role and MSK IAM.
and a similar error
Versions |
@james4388 Did you try the workaround that is mentioned here? Of course, that applies only if you have control over the Flink cluster for replacing the libs in the location. Just for clarity, I am sharing the list of JARs that I added to the lib location (the following image). |
…der based on ClassLoader
For the record, I got the same issue when trying to produce a record to MSK with GlueSchemaRegistryKafkaSerializer in Scala:
Converting the exact same code to Java makes it work; I don't know why. |
Fixed in release 1.1.5. |
Hello,
When using the aws-msk-iam-auth library to authenticate Zeppelin and Flink processes against an MSK cluster with IAM authentication enabled, the following error is thrown when trying to run more than one task on the same TaskManager:
Library version 1.1.0 has been imported from an S3 bucket into the Studio notebook and added as a dependency. The following configuration is being used:
As the default artifact included in the notebook (flink-sql-connector-kafka_2.12) shades the Kafka client dependencies, the following Gradle build was used to also shade the dependencies within aws-msk-iam-auth:
The setup works and authenticates fine with MSK when a single task is executed by the TaskManager. However, as mentioned before, when trying to run more than one task in the same TaskManager, the following exception is thrown:
Could you please help with the issue?
Steps to reproduce: