IAM AssumedRole doesn't work in ECS #243

Closed
bploetz opened this issue Dec 23, 2019 · 10 comments

Comments

@bploetz

bploetz commented Dec 23, 2019

Hi. I can't tell if this is an issue with cloudwatch_exporter, or the underlying AWS Java SDK.

I'm trying to run the prom/cloudwatch-exporter:cloudwatch_exporter-0.7.0 Docker container on AWS ECS. The ECS Service is using an IAM Role Policy which includes the permissions:

cloudwatch:ListMetrics
cloudwatch:GetMetricStatistics
tag:GetResources
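
For reference, the three permissions above could be written as a single policy statement. This is only a minimal sketch that builds such a document in Python; the `"Resource": "*"` value and the name `exporter_policy` are assumptions for illustration and are not taken from the policy actually attached to the ECS Service.

```python
import json

# Actions listed in the report; nothing else is assumed to be required.
EXPORTER_ACTIONS = [
    "cloudwatch:ListMetrics",
    "cloudwatch:GetMetricStatistics",
    "tag:GetResources",
]


def exporter_policy(actions=EXPORTER_ACTIONS):
    """Return an IAM policy document (as a dict) allowing the given actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": list(actions),
                # Assumption: the policy is not scoped to specific resources.
                "Resource": "*",
            }
        ],
    }


if __name__ == "__main__":
    # Print the document so it can be compared with the attached policy.
    print(json.dumps(exporter_policy(), indent=2))
```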

The following error occurs when starting the exporter:

[ec2-user@ip-X-X-X-X ~]$ sudo docker logs -f 5dbc385dade9
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by com.amazonaws.util.XpathUtils (file:/cloudwatch_exporter.jar) to method com.sun.org.apache.xpath.internal.XPathContext.getDTMManager()
WARNING: Please consider reporting this to the maintainers of com.amazonaws.util.XpathUtils
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
Dec 23, 2019 4:26:41 PM io.prometheus.cloudwatch.CloudWatchCollector collect
WARNING: CloudWatch scrape failed
com.amazonaws.services.cloudwatch.model.AmazonCloudWatchException: User: arn:aws:sts::xxxxxxxxxxxx:assumed-role/xxxxx-xxxxx-xxxxxx-xxxxxxx-xxxxx/xxxxxxxxxxxxxxx is not authorized to perform: cloudwatch:ListMetrics (Service: AmazonCloudWatch; Status Code: 403; Error Code: AccessDenied; Request ID: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1712)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1367)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1113)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:770)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:744)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:726)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:686)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:668)
	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:532)
	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:512)
	at com.amazonaws.services.cloudwatch.AmazonCloudWatchClient.doInvoke(AmazonCloudWatchClient.java:2030)
	at com.amazonaws.services.cloudwatch.AmazonCloudWatchClient.invoke(AmazonCloudWatchClient.java:1997)
	at com.amazonaws.services.cloudwatch.AmazonCloudWatchClient.invoke(AmazonCloudWatchClient.java:1986)
	at com.amazonaws.services.cloudwatch.AmazonCloudWatchClient.executeListMetrics(AmazonCloudWatchClient.java:1323)
	at com.amazonaws.services.cloudwatch.AmazonCloudWatchClient.listMetrics(AmazonCloudWatchClient.java:1295)
	at io.prometheus.cloudwatch.CloudWatchCollector.listDimensions(CloudWatchCollector.java:392)
	at io.prometheus.cloudwatch.CloudWatchCollector.getDimensions(CloudWatchCollector.java:346)
	at io.prometheus.cloudwatch.CloudWatchCollector.scrape(CloudWatchCollector.java:558)
	at io.prometheus.cloudwatch.CloudWatchCollector.collect(CloudWatchCollector.java:671)
	at io.prometheus.client.CollectorRegistry.collectorNames(CollectorRegistry.java:100)
	at io.prometheus.client.CollectorRegistry.register(CollectorRegistry.java:50)
	at io.prometheus.client.Collector.register(Collector.java:139)
	at io.prometheus.client.Collector.register(Collector.java:132)
	at io.prometheus.cloudwatch.WebServer.main(WebServer.java:27)

2019-12-23 16:26:41.727:INFO::main: Logging initialized @3281ms to org.eclipse.jetty.util.log.StdErrLog
2019-12-23 16:26:41.845:INFO:oejs.Server:main: jetty-9.4.z-SNAPSHOT; built: 2019-02-15T16:53:49.381Z; git: eb70b240169fcf1abbd86af36482d1c49826fa0b; jvm 11.0.5+10
2019-12-23 16:26:41.926:INFO:oejsh.ContextHandler:main: Started o.e.j.s.ServletContextHandler@6a9d5dff{/,null,AVAILABLE}
2019-12-23 16:26:41.963:INFO:oejs.AbstractConnector:main: Started ServerConnector@79813403{HTTP/1.1,[http/1.1]}{0.0.0.0:9106}
2019-12-23 16:26:41.964:INFO:oejs.Server:main: Started @3519ms

It seems as if the assumed IAM Role attached to the container is not honored correctly.
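
To see which role the failing call was actually made under, the role name can be pulled out of the error text and compared with the ECS task role and the EC2 instance role. A rough sketch, assuming the message keeps the shape shown in the log above; `parse_access_denied` and the sample message (account ID, role and session names) are hypothetical:

```python
import re

# Matches the identity, the denied action and the (optional) target resource
# in an AccessDenied message raised for an assumed-role session.
DENIED_RE = re.compile(
    r"User: (?P<arn>arn:aws:sts::(?P<account>[^:]+):assumed-role/"
    r"(?P<role>[^/\s]+)/(?P<session>\S+))"
    r" is not authorized to perform: (?P<action>[\w:*]+)"
    r"(?: on resource: (?P<resource>\S+))?"
)


def parse_access_denied(message):
    """Return a dict describing who was denied what, or None if no match."""
    match = DENIED_RE.search(message)
    return match.groupdict() if match else None


if __name__ == "__main__":
    # Hypothetical message in the same shape as the one in the log.
    sample = (
        "User: arn:aws:sts::123456789012:assumed-role/my-role/my-session "
        "is not authorized to perform: cloudwatch:ListMetrics "
        "(Service: AmazonCloudWatch; Status Code: 403)"
    )
    print(parse_access_denied(sample))
```

If the extracted role is the instance role rather than the task role, the credentials were probably not picked up from the container at all.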

I then tried adding the same IAM Role policy permissions noted above to the underlying EC2 host's IAM Role, and ran the exporter directly on the EC2 instance. This starts cleanly.

[ec2-user@ip-x-x-x-x ~]$ java -jar cloudwatch_exporter.jar 9106 /etc/config/cloudwatch_exporter/config.yml 
2019-12-23 16:28:49.379:INFO::main: Logging initialized @2385ms to org.eclipse.jetty.util.log.StdErrLog
2019-12-23 16:28:49.503:INFO:oejs.Server:main: jetty-9.4.z-SNAPSHOT; built: 2019-02-15T16:53:49.381Z; git: eb70b240169fcf1abbd86af36482d1c49826fa0b; jvm 1.8.0_222-b10
2019-12-23 16:28:49.565:INFO:oejsh.ContextHandler:main: Started o.e.j.s.ServletContextHandler@54eb2b70{/,null,AVAILABLE}
2019-12-23 16:28:49.581:INFO:oejs.AbstractConnector:main: Started Serv

Is the issue regarding assumed roles in an ECS container an issue with prom/cloudwatch-exporter, or the underlying Java AWS SDK? Is anyone successfully running the prom/cloudwatch-exporter Docker container in ECS, and perhaps I've just got something misconfigured?

Thanks in advance.

@brian-brazil
Contributor

We don't do anything fancy, we only use the basic SDK.

@bploetz
Author

bploetz commented Dec 23, 2019

I reported this in the aws-sdk-java issues. See the workaround I noted there if anyone else runs into this.

aws/aws-sdk-java#2186

@mozai

mozai commented Jan 21, 2020

I ran into this too, deploying into a kubernetes pod, using docker image prom/cloudwatch-exporter:cloudwatch_exporter-0.6.0

role kops-node has a policy including { "Effect": "Allow", "Action": [ "sts:AssumeRole" ], "Resource": [ "arn:aws:iam::123456789012:role/*" ] } . Every EC2 instance used as nodes by the kubernetes cluster has this role assigned to it.

role prom-exporter has a policy including { "Effect": "Allow", "Action": [ "cloudwatch:Describe*", "cloudwatch:Get*", "cloudwatch:List*" ], "Resource": "*" }

role prom-exporter has a trust relationship that includes { "Sid": "", "Effect": "Allow", "Principal": { "AWS": "arn:aws:iam::123456789012:role/kops-node" }, "Action": "sts:AssumeRole" }

I can run aws cloudwatch list-metrics (*) successfully with no credentials in the same pod, so I know the pod has the correct access by role. cloudwatch_exporter, however, throws this error:

com.amazonaws.services.securitytoken.model.AWSSecurityTokenServiceException: User: arn:aws:sts::123456789012:assumed-role/prom-exporter/12345678-prom-exporter is not authorized to perform: sts:AssumeRole on resource: arn:aws:iam::123456789012:role/prom-exporter (Service: AWSSecurityTokenService; Status Code: 403; Error Code: AccessDenied; Request ID: some-uuid)

(*: I admit the aws command uses amazon's python libraries, not amazon's Java SDK, so it's not a true apples-to-apples comparison, but I hope it's enough to prove the problem isn't external to the docker container nor EC2 instance.)

@max-rocket-internet

I have the same problem on EKS and version 0.8.0 of this exporter.

The env vars on the pod are correct:

AWS_ROLE_ARN: arn:aws:iam::66666666666:role/xxxxxxxxxxx
AWS_WEB_IDENTITY_TOKEN_FILE:  /var/run/secrets/eks.amazonaws.com/serviceaccount/token

But the exporter still tries to assume the role from the instance instead.

I will chase it up with the SDK.
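
A quick way to double-check what the container environment offers is to look at the variables that hint at each credential source. The sketch below is not the SDK's provider chain and makes no AWS calls; it only inspects names. `AWS_ROLE_ARN` and `AWS_WEB_IDENTITY_TOKEN_FILE` come from the pod above, while `AWS_CONTAINER_CREDENTIALS_RELATIVE_URI` (set for ECS task roles) and the static key variables are additions not mentioned in this thread. The labels and the name `credential_hints` are hypothetical.

```python
import os


def credential_hints(env=None):
    """List credential sources suggested by environment variables.

    This only inspects variable names. It does not contact AWS and does not
    reproduce the ordering used by any SDK's default provider chain.
    """
    env = os.environ if env is None else env
    hints = []
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        hints.append("static-keys")
    if env.get("AWS_ROLE_ARN") and env.get("AWS_WEB_IDENTITY_TOKEN_FILE"):
        hints.append("web-identity")
    if env.get("AWS_CONTAINER_CREDENTIALS_RELATIVE_URI"):
        hints.append("ecs-task-role")
    if not hints:
        # Nothing in the environment: the instance profile is the likely fallback.
        hints.append("instance-profile-or-other")
    return hints


if __name__ == "__main__":
    print(credential_hints())
```

If this reports `web-identity` but the exporter still acts as the instance role, the variables are present and the problem lies elsewhere, for example in the SDK version or in access to the token file.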

@monsterxx03

monsterxx03 commented Apr 15, 2020

Are you running as nobody? If so, cloudwatch-exporter has no access to the token file.

The official chart runs as nobody by default: https://github.com/helm/charts/blob/master/stable/prometheus-cloudwatch-exporter/values.yaml#L200

securityContext:
    runAsUser: 65534  # run as nobody user instead of root

Just remove it. The process then runs as root, and it works.
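
To confirm that this is the cause, the readability of the token file can be checked from inside the pod as the same user the exporter runs as. A small sketch, assuming Python is available in a debug container sharing the pod's volumes; `token_file_status` is a hypothetical helper and the default path is the one quoted earlier in this thread:

```python
import os


def token_file_status(path):
    """Describe whether the current process user can read the token file."""
    if not os.path.exists(path):
        return "missing"
    if not os.access(path, os.R_OK):
        return "unreadable"
    return "readable"


if __name__ == "__main__":
    default = "/var/run/secrets/eks.amazonaws.com/serviceaccount/token"
    path = os.environ.get("AWS_WEB_IDENTITY_TOKEN_FILE", default)
    # Report the path, the result and the uid the check ran as.
    print(path, token_file_status(path), "uid", os.getuid())
```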

@max-rocket-internet

max-rocket-internet commented Apr 15, 2020

Good find @monsterxx03

Here's the Kubernetes issue for this problem: kubernetes/kubernetes#82573

And here's a workaround: kubernetes-sigs/external-dns#1185

Add this for the k8s pod/deployment:

      securityContext:
        fsGroup: 65534

I'll test it and make a PR for the cloudwatch exporter chart

EDIT: No PR required, just set this in k8s chart values:

securityContext:
  runAsUser: 65534
  fsGroup: 65534

I don't know how that works for ECS though.
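
One way to picture why fsGroup matters: a non-root process can only read the projected token if the file's group matches one of its groups and the group-read bit is set. The sketch below applies the usual owner/group/other read check to two illustrative cases. The modes and IDs in the example (0o600 and root-owned without fsGroup, 0o640 with group 65534 once fsGroup is set) are assumptions about how the volume is mounted, so compare them with `ls -ln` on the token path inside the pod. `can_read` is a hypothetical helper.

```python
import stat


def can_read(mode, owner_uid, group_gid, uid, gids):
    """Apply the classic owner/group/other read check for a non-root user."""
    if uid == owner_uid:
        return bool(mode & stat.S_IRUSR)
    if group_gid in gids:
        return bool(mode & stat.S_IRGRP)
    return bool(mode & stat.S_IROTH)


if __name__ == "__main__":
    nobody = 65534
    # Assumed layout without fsGroup: root-owned, owner-only permissions.
    print("without fsGroup:", can_read(0o600, 0, 0, nobody, {nobody}))
    # Assumed layout with fsGroup: 65534, group ownership and group-read bit.
    print("with fsGroup:", can_read(0o640, 0, nobody, nobody, {nobody}))
```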

@lachlancooper
Contributor

EDIT: No PR required, just set this in k8s chart values:

securityContext:
  runAsUser: 65534
  fsGroup: 65534

Have you managed to get IRSA working with this approach? I've set these values and am able to access the projected token file from inside the pod (as nobody) but the exporter (0.8.0) is still trying to assume the role from the instance.

See also #248 and #263 for SDK updates. Updating to 1.11.704 or higher should have been sufficient to address the provider priority issue.
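
When comparing the bundled SDK version against 1.11.704, the comparison should be numeric per component, since a plain string comparison would rank "1.11.99" above "1.11.704". A minimal sketch, assuming the version string has already been read from the build's dependency list; `parse_version` and `sdk_has_fix` are hypothetical names:

```python
def parse_version(text):
    """Turn a dotted version such as '1.11.704' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def sdk_has_fix(version, minimum="1.11.704"):
    """Return True if `version` is at least the minimum named in this thread."""
    return parse_version(version) >= parse_version(minimum)


if __name__ == "__main__":
    for candidate in ("1.11.99", "1.11.704", "1.12.1"):
        print(candidate, sdk_has_fix(candidate))
```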

@max-rocket-internet

@lachlancooper

Have you managed to get IRSA working with this approach?

Yes. It works for us in 7 different EKS clusters. But remember you need the correct annotation on the service account, the correct assume role policy on the role, the correct namespace, etc.
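
Those three pieces can be cross-checked mechanically. The sketch below takes a ServiceAccount manifest and a role trust policy as plain dicts and reports mismatches. It is deliberately simplified: it only understands exact `StringEquals` conditions on a key ending in `:sub`, so a policy that uses `StringLike` or has no subject condition will be reported as a mismatch even if it would work. `irsa_problems` and `expected_subject` are hypothetical helpers, and all ARNs and names in the usage are placeholders.

```python
ROLE_ANNOTATION = "eks.amazonaws.com/role-arn"


def expected_subject(namespace, service_account):
    """Subject claim carried by a Kubernetes service account token."""
    return f"system:serviceaccount:{namespace}:{service_account}"


def irsa_problems(service_account, trust_policy, role_arn):
    """Return a list of human-readable mismatches (empty if none found)."""
    problems = []
    meta = service_account.get("metadata", {})
    annotated = meta.get("annotations", {}).get(ROLE_ANNOTATION)
    if annotated != role_arn:
        problems.append(f"annotation {ROLE_ANNOTATION} is {annotated!r}")

    subject = expected_subject(meta.get("namespace", "default"), meta.get("name"))
    allowed = False
    for stmt in trust_policy.get("Statement", []):
        actions = stmt.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        if stmt.get("Effect") != "Allow":
            continue
        if "sts:AssumeRoleWithWebIdentity" not in actions:
            continue
        # Simplification: only exact StringEquals matches are considered.
        conditions = stmt.get("Condition", {}).get("StringEquals", {})
        for key, value in conditions.items():
            values = [value] if isinstance(value, str) else value
            if key.endswith(":sub") and subject in values:
                allowed = True
    if not allowed:
        problems.append(f"trust policy has no StringEquals match for {subject}")
    return problems


if __name__ == "__main__":
    # Placeholder inputs; substitute the real manifest and trust policy.
    role = "arn:aws:iam::111122223333:role/demo"
    sa = {
        "metadata": {
            "name": "exporter",
            "namespace": "monitoring",
            "annotations": {ROLE_ANNOTATION: role},
        }
    }
    policy = {
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        "oidc.example/id/ABC:sub": expected_subject("monitoring", "exporter")
                    }
                },
            }
        ]
    }
    print(irsa_problems(sa, policy, role))
```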

@lachlancooper
Contributor

Thanks! Upon further investigation I can see I was confused and the exporter (0.8.0) is actually working correctly with IRSA.

@brian-brazil
Contributor

Sounds like this was all fixed already then.
