Support IAM Roles for Service Accounts #947

Amazon recently released support for assigning IAM roles to service accounts (https://aws.amazon.com/blogs/opensource/introducing-fine-grained-iam-roles-service-accounts/).
In order to take advantage of this, the AWS SDK needs to be upgraded (the Go SDK needs to be at least version 1.23.13), and we need a way to indicate the IAM role to be used, so it can be added as an annotation to the service account(s) used by the operator.
Assigning IAM roles through this method is easier and more secure than keeping long-lived IAM users around, which have fixed credentials and [ideally] need them rotated periodically.
This seems reasonable. Sounds like we just need to upgrade the dependency, then. I can take a look at this next week.
I started taking a look at this myself and ended up down the rabbit hole. Upgrading the AWS SDK reference for operator-metering itself was straightforward, and I started things up using the custom Docker image I built with that. It wasn't clear from the documentation exactly which pods needed access to AWS resources, but the hive-server seemed to be one of them, as its logs reported access denied when trying to access the S3 bucket. Looking at the code for Hive, it appears to leverage Hadoop's s3a plugin to connect to S3, and doesn't itself actually have the AWS SDK as a dependency. So I looked at the source of s3a, and it appears to just grab the latest AWS SDK when it is built with the rest of Hadoop, which is great. Unfortunately, it seems like operator-metering uses a really ancient fork of Hive (https://github.com/operator-framework/hive), 3804 commits behind mainline, and its Hadoop dependency is years old; by extension, so is its AWS SDK. Does that all sound about right? I'm really eager to try this project, but because our corporate security policy doesn't allow IAM users (only assumed IAM roles and instance profiles), I'm not able to.
So there are a few things that will talk to AWS: reporting-operator, Presto, and Hive (both metastore and server). Hive has multiple versions; version 3 hasn't been "stable" that long, and most distributions of Hive are based on Hive 2.3.x, which is still maintained. We're not that far behind on that either: we're at 2.3.3, the latest is 2.3.6, and we follow the releases when there are fixes critical to our project. Presto, until recently, hasn't really worked with Hive 3.x at all, so we've been stuck on Hive 2.3.x.
S3A comes from the core hadoop-common libraries, which come from the Hadoop installation, which isn't part of Hive. Similarly, most are using Hadoop 2.x, while we're actually on Hadoop 3.1.1: https://github.com/operator-framework/hadoop.
I would expect all of these libraries to behave similarly with respect to the AWS credentials chain, trying to use whichever credentials are available. My only thought is that we may be setting the AWS_* environment variables unconditionally, perhaps to empty values, which might be an issue if that's the case. In our earlier days, quite a few people had success with the kube2iam project, so I know these things work to some degree, but the service account stuff is new, so we may need to upgrade some more things.
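A quick way to check whether those variables are being injected empty is to inspect a running pod's environment (plain kubectl; the pod and namespace names below are placeholders):

```sh
# List whatever AWS_* variables are actually set inside the pod.
kubectl -n metering exec hive-server-0 -- env | grep '^AWS_'
```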
For what it's worth, the version of the AWS Java SDK is set here: https://github.com/operator-framework/hadoop/blob/master/hadoop-project/pom.xml#L137
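If that line is the usual Hadoop version property, the bump would look something like this (the property name is an assumption based on Hadoop's mainline pom; 1.11.623 is the minimum cited below):

```xml
<!-- hadoop-project/pom.xml: pin the AWS SDK to a release that ships
     WebIdentityTokenCredentialsProvider (1.11.623 or newer) -->
<aws-java-sdk.version>1.11.623</aws-java-sdk.version>
```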
Thanks for clarifying which services need to be targeted. For the Java v1 SDK, 1.11.623 is the minimum version that supports the new authentication mechanism. I started by making a simple Docker image like this:
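A minimal sketch of that kind of image, assuming the stock hive image and the Maven Central bundle jar (the base image tag and paths are illustrative, not the exact Dockerfile):

```dockerfile
# Sketch: swap the bundled AWS SDK jar in the hive image for a newer release.
# The base image tag and jar paths are assumptions, not the original snippet.
FROM quay.io/coreos/hive:latest
RUN rm -f /opt/hive/lib/aws-java-sdk-bundle-*.jar && \
    curl -fsSL -o /opt/hive/lib/aws-java-sdk-bundle-1.11.623.jar \
      https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk-bundle/1.11.623/aws-java-sdk-bundle-1.11.623.jar
```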
Hoping I could get lucky and there wouldn't be any breaking changes. It didn't actually explode, but S3 is still unable to authenticate. I am currently working on rebuilding the hive image with debug logging turned on for s3a (it didn't look like there was any easier way to do that, but please let me know if there is). It's interesting that you say there has been success with kube2iam: we do actually use that, though we are trying to retire it in favor of the new service account mechanism. In order to get the project to deploy at all, I had to remove
The files in
Ok, I've put the new jar there (as well as in /opt/hive/lib just to be sure, since the symlink gets broken when I remove the file, I guess) and confirmed that it's on the classpath (according to the log that's printed at the start). I also confirmed that there are no empty or otherwise conflicting env vars that might throw off the AWS SDK. Unfortunately, it still isn't authenticating, and I can't think of a way to figure out which creds (if any) it is trying to use without the app itself logging that. I just got the hive image rebuilt with debug logging (if I did it right); will see how that goes.
Edit: Poorly. My image seems to be defective; it doesn't manage to start the hive server at all, it just logs
I'll dig into that tomorrow.
You might be able to just set the logging you need in the log4j configuration files as well.
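For s3a that would be something along these lines (standard log4j 1.x logger properties; where the log4j file lives depends on the image):

```
# Debug logging for the s3a connector and the AWS SDK underneath it
log4j.logger.org.apache.hadoop.fs.s3a=DEBUG
log4j.logger.com.amazonaws=DEBUG
```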
I figured out what's going on here. First, I rebuilt the ansible-operator image using a modified hive-configmap.yaml that enabled the debug logging for s3a. This confirmed that s3a was using only the BasicAWSCredentialsProvider, EnvironmentVariableCredentialsProvider, and InstanceProfileCredentialsProvider, not the new WebIdentityTokenCredentialsProvider. Then I edited _hadoop_config_helpers.tpl to force the use of the new provider by overriding the s3a credentials provider setting,
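The standard s3a knob for that is the fs.s3a.aws.credentials.provider property; presumably the .tpl change renders something like this (the exact templating isn't shown, so treat the value below as an assumption):

```xml
<property>
  <name>fs.s3a.aws.credentials.provider</name>
  <value>com.amazonaws.auth.WebIdentityTokenCredentialsProvider</value>
</property>
```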
which then caused it to throw a stack trace: s3a instantiates provider classes reflectively and requires either a suitable public constructor or a getInstance factory method. And sure enough, WebIdentityTokenCredentialsProvider offers neither of those things. Looking at how it's used in the AWS SDK (https://github.com/aws/aws-sdk-java/blob/53683b836dec55226f45060dd5af3f00259650c3/aws-java-sdk-core/src/main/java/com/amazonaws/auth/DefaultAWSCredentialsProviderChain.java), you can see that it's constructed through its static create() factory rather than a public constructor. So I guess that puts me back to editing the Hadoop fork to use the new provider from the new SDK.
Looking at it, the default provider chain should make it so that the web identity provider gets picked up automatically. So if you leave the s3a credentials provider setting unset, it ought to fall back to it. Either way, WebIdentityTokenCredentialsProvider should probably also have a default constructor so you can specify it directly.
DefaultAWSCredentialsProviderChain does indeed fall back to WebIdentityTokenCredentialsProvider, but Hadoop/s3a doesn't use DefaultAWSCredentialsProviderChain. It uses its own (https://github.com/operator-framework/hadoop/blob/master/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/AWSCredentialProviderList.java) and populates it here: https://github.com/operator-framework/hadoop/blob/master/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AUtils.java#L558 I'm working on rebuilding with the new provider added to that list.
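A small standalone illustration of the mismatch (assumes aws-java-sdk-core 1.11.623+ on the classpath; the reflective lookup below is a simplified stand-in for what S3AUtils does, not the actual Hadoop code):

```java
import java.lang.reflect.Constructor;

import com.amazonaws.auth.AWSCredentialsProvider;
import com.amazonaws.auth.WebIdentityTokenCredentialsProvider;

public class ProviderConstructorCheck {
    public static void main(String[] args) {
        // The SDK only exposes a static factory for this provider...
        AWSCredentialsProvider viaFactory = WebIdentityTokenCredentialsProvider.create();
        System.out.println("factory works: " + viaFactory.getClass().getName());

        // ...but s3a loads the classes named in fs.s3a.aws.credentials.provider
        // reflectively, so a provider with no public constructor can't be used there.
        try {
            Constructor<?> ctor = WebIdentityTokenCredentialsProvider.class.getConstructor();
            System.out.println("no-arg constructor found: " + ctor);
        } catch (NoSuchMethodException e) {
            System.out.println("no public no-arg constructor: " + e);
        }
    }
}
```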
Ah, I see. I can look at making a PR to add a default constructor to this type in the aws-java-sdk, since it seems reasonably simple.
I've opened aws/aws-sdk-java#2121 with this. If we updated the AWS SDK to use these changes, I think that would let you specify the provider directly.
Cool, yeah, that ought to work. Though since we would still need to rebuild Hadoop to pick up the new SDK, it probably makes sense to put it in the default AWSCredentialProviderList anyway (ideally that would happen in the upstream Hadoop repo and be backported, of course).
You shouldn't need to rebuild Hadoop; you should just be able to update the AWS SDK jar.
Right, patching over the Docker image like I've been doing.
I managed to get everything working! Here's what I did:
With all that in place, I was able to generate the sample namespace-cpu-request report from the documentation. Thanks for your help with this; I'm looking forward to "proper" support for the new creds so I don't have to run with custom Docker images, but this is good enough for now.
We already support annotations on every component that I'm aware of.
We can add this as an option you can set in the MeteringConfig, probably. This was in Presto 316, and we'll be updating to a newer Presto pretty soon; we're on 311 now.
I can take a look at getting this done too. Overall seems reasonable. I think once the aws-java-sdk change is merged we'll need to bump the version in our pom.xml files of each to get the latest, but I could be wrong, as I'm not sure if they've just got constraints or if they're pinned.
Could you show me an example of how to add an annotation at "deployment" time? The only avenue for customization I saw was metering-custom.yaml, and it wasn't obvious to me how I would put an annotation in there that would get applied to a ServiceAccount. I've had to do one more thing for the reporting operator, which I didn't realize until I tried to use the AWS billing support: I needed to add
Hmm, we don't yet support annotations on serviceAccounts, just the pods mostly, but we could add that. What do you need it for, and what would be an example annotation?
The service account is bound to the IAM role via an annotation, and it is from there that the token gets projected into the pod. The full technical details are in the link I included in the first post of this issue, but in short: when a pod is created, it gets intercepted by a mutating webhook (https://github.com/aws/amazon-eks-pod-identity-webhook) which inspects the pod's service account for the role annotation and, if present, injects the projected token volume and the corresponding AWS_* environment variables into the pod. Here's an example annotation applied to a ServiceAccount:
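The annotation key below is the documented one for the EKS pod identity webhook; the service account name, namespace, account ID, and role name are placeholders:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: reporting-operator        # placeholder name
  namespace: metering             # placeholder namespace
  annotations:
    # Documented key for IAM roles for service accounts; the ARN is a placeholder.
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/metering-s3
```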