Skip to content
This repository has been archived by the owner on Jul 26, 2022. It is now read-only.

Issue with SSM and OpenIDConnect provider on version 4.1.0+ #442

Closed
phajduk opened this issue Jul 17, 2020 · 15 comments · Fixed by #453
Closed

Issue with SSM and OpenIDConnect provider on version 4.1.0+ #442

phajduk opened this issue Jul 17, 2020 · 15 comments · Fixed by #453
Labels
aws bug Something isn't working

Comments

@phajduk
Copy link

phajduk commented Jul 17, 2020

Hey.

Below issue does not occur on version 4.0.0. This error occurs only on 4.1.0+

External secret definition:

apiVersion: kubernetes-client.io/v1
kind: ExternalSecret
metadata:
  namespace: <my-namespace>
  name: <my-key>
spec:
  backendType: systemManager
  roleArn: <ARN-of-the-role-I-use-to-query-SSM>
  data:
    - key: <my-SSM-path>
      name: <my-target-key>

Error in external secrets pod:

{"level":50,"time":1594992337968,"pid":17,"hostname":"external-secrets-release-kubernetes-external-secrets-f4c46hkk6p","message":"No OpenIDConnect provider found in your account for https://oidc.eks.us-ea │
│ st-1.amazonaws.com/id/<removed-GUID>","code":"InvalidIdentityToken","time":"2020-07-17T13:25:37.968Z","requestId":"<removed-GUID>","statusCode":400,"retryable":false, "retryDelay":6.0951075243333985,"stack":"InvalidIdentityToken: No OpenIDConnect provider found in your account for https://oidc.eks.us-east-1.amazonaws.com/id/<removed-GUID>\n    at Re │
│ quest.extractError (/app/node_modules/aws-sdk/lib/protocol/query.js:50:29)\n    at Request.callListeners (/app/node_modules/aws-sdk/lib/sequential_executor.js:106:20)\n    at Request.emit (/app/node_modul │
│ es/aws-sdk/lib/sequential_executor.js:78:10)\n    at Request.emit (/app/node_modules/aws-sdk/lib/request.js:683:14)\n    at Request.transition (/app/node_modules/aws-sdk/lib/request.js:22:10)\n    at Acce │
│ ptorStateMachine.runTo (/app/node_modules/aws-sdk/lib/state_machine.js:14:12)\n    at /app/node_modules/aws-sdk/lib/state_machine.js:26:10\n    at Request.<anonymous> (/app/node_modules/aws-sdk/lib/reques │
│ t.js:38:9)\n    at Request.<anonymous> (/app/node_modules/aws-sdk/lib/request.js:685:12)\n    at Request.callListeners (/app/node_modules/aws-sdk/lib/sequential_executor.js:116:18)","type":"Error","msg":" │
│ failure while polling the secret <my-secret>"}
@Flydiverny
Copy link
Member

Are you using IRSA/IAM roles for service account?
There was a fix that should cause those uses to assume role correctly. Most likely they shouldn't have worked before, at least not in the intended way.

Related changes:
#416
#417

@phajduk
Copy link
Author

phajduk commented Jul 17, 2020

@Flydiverny thanks for the fast response.

Yes, I am using IAM role.

Moreover my case is a bit more complex. IAM role assigned to the Service Account is used to assume role mentioned in the ExternalSecret. And it worked perfectly before 🤨

@Flydiverny Flydiverny added aws bug Something isn't working labels Jul 18, 2020
@Flydiverny
Copy link
Member

Hmm.. so I take it the old code worked as well (obviously).

If I understand correctly, before there was a need for the pod's iam role to have a trust relation ship with AssumeRole action to assume the secret specific role. Which is the behavior that I thought wasn't working at all before. Kind of like the example here https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts-technical-overview.html#cross-account-access

While now the trust relationship would need to be from OIDC with sts:AssumeRoleWithWebIdentity action.

There was some discussion in #254 for fargate support which seemed to have extra needs for the changes introduced.
But reading back and digging thru the main concern there seemed to be that fargate usage didn't load the service account role to begin with. At least the gist here https://gist.github.com/lukaszbudnik/f1f42bd5a57430e3c25034200ba44c2e doesn't try to assume a different role for the external secret.

@lukaszbudnik @dalgibbard any thoughts?

@dalgibbard
Copy link
Contributor

This was kinda why I originally put it behind a Fargate flag / envvar, in case this usage breaks other edge cases.
I'd suggest we should do that :)

@Flydiverny
Copy link
Member

Yeah! Have to say that now I'm confused why fargate needed this change to begin with!

Are you setting roleArn in external secrets to assume other roles @dalgibbard? Or is it just for the role used by the deployment?

@dalgibbard
Copy link
Contributor

Ermm I'd have to check the code to give a comment with all the right terminology 😅 but i created a service account with the role needed as per the helm chart, and then deployed external-secrets without it creating the service account etc.
Fargate only EKS is OIDC with WebIdentity; and that's about all I know right now :)

@phajduk
Copy link
Author

phajduk commented Jul 19, 2020

I allowed myself to dig a bit and it looks like this statement may be wrong (cc: @lukaszbudnik).

The problem is that for non-Fargate EKS, the env var AWS_WEB_IDENTITY_TOKEN_FILE is set as well for pod. It means, for the non-Fargate cluster we still call AWS.STS.assumeRoleWithWebIdentity instead of AWS.STS.assumeRole in the current implementation.

At this point, I'm not sure what's the best solution. I believe that trying to distinguish if we run Fargate vs EC2 worker node may be a bad idea (I'm even not sure if it's possible right now). I think (this)[https://aws.amazon.com/blogs/opensource/introducing-fine-grained-iam-roles-service-accounts/] wall of text may be useful.

@dalgibbard
Copy link
Contributor

Yeap that sounds right. This is why I'm not convinced that the current implementation (if Web Identity is present, use it always) is the right way to do it.
I don't have an EKS Fargate cluster to hand at the moment, but if we can identify it is Fargate, we can use that, or allow using WebIdentity using a flag...
Or it's also rumoured that the AWS SDK should actually work all this out automatically, but I don't know JS anywhere near well enough to even take a stab at that.

@Flydiverny
Copy link
Member

I'll see if I can get a test cluster up using fargate if I get some time on my hands.

@sdickhoven
Copy link

sdickhoven commented Jul 25, 2020

i'd like to chime in here as i'm struggling with the same thing (or at least something closely related).

i'm using external-secrets 4.2.0 on a cluster with irsa enabled (not fargate, but managed worker nodes).

i have a ServiceAccount annotation that lets external-secrets pods assume the role arn:aws:iam::111111111111:role/external-secrets:

    eks.amazonaws.com/role-arn: arn:aws:iam::111111111111:role/external-secrets

in my ExternalSecret manifest i tried to use:

  assumeRole: arn:aws:iam::222222222222:role/secret-reader

but doing that lead to:

{"level":50,"time":1595648005611,"pid":17,"hostname":"external-secrets-c79f97985-gvw9q","message":"Not authorized to perform sts:AssumeRoleWithWebIdentity","code":"AccessDenied","time":"2020-07-25T03:33:25.611Z","requestId":"5f8c8963-885c-4318-b6b1-556a3f9abd05","statusCode":403,"retryable":false,"retryDelay":65.78212229463618,"stack":"AccessDenied: Not authorized to perform sts:AssumeRoleWithWebIdentity\n    at Request.extractError (/app/node_modules/aws-sdk/lib/protocol/query.js:50:29)\n    at Request.callListeners (/app/node_modules/aws-sdk/lib/sequential_executor.js:106:20)\n    at Request.emit (/app/node_modules/aws-sdk/lib/sequential_executor.js:78:10)\n    at Request.emit (/app/node_modules/aws-sdk/lib/request.js:683:14)\n    at Request.transition (/app/node_modules/aws-sdk/lib/request.js:22:10)\n    at AcceptorStateMachine.runTo (/app/node_modules/aws-sdk/lib/state_machine.js:14:12)\n    at /app/node_modules/aws-sdk/lib/state_machine.js:26:10\n    at Request.<anonymous> (/app/node_modules/aws-sdk/lib/request.js:38:9)\n    at Request.<anonymous> (/app/node_modules/aws-sdk/lib/request.js:685:12)\n    at Request.callListeners (/app/node_modules/aws-sdk/lib/sequential_executor.js:116:18)","type":"Error","msg":"failure while polling the secret default/secretsmanager-example"}

i have all the iam wiring set up for irsa to work and to allow arn:aws:iam::111111111111:role/external-secrets to assume the role arn:aws:iam::222222222222:role/secret-reader (in both accounts - this is not an iam issue, i have this exact same setup working in the same cluster with grafana assuming roles in different accounts to access cloudwatch logs).

so external-secrets seems to try to assumeRoleWithWebIdentity the arn:aws:iam::222222222222:role/secret-reader role. that logic seems incorrect.

as i understand it, external-secrets should assumeRoleWithWebIdentity the irsa role for the deployment itself (arn:aws:iam::111111111111:role/external-secrets) and then use that to do a regular assumeRole to the arn:aws:iam::222222222222:role/secret-reader role.

fwiw, the irsa part totally works: as long as i don't add assumeRole to my ExternalSecret, everything works as expected. but it appears that things got muddled up between the base identity and the assumeRole part.

just for giggles i also tried adding the eks oidc provider to the target account and using an assume role policy that would allow a straight-up irsa-style assumeRoleWithWebIdentity into the target role.

but since external-secrets is using the web token that belongs to the base role (arn:aws:iam::111111111111:role/external-secrets) to assumeRoleWithWebIdentity a role in another account (arn:aws:iam::222222222222:role/secret-reader), that also doesn't work.

@Flydiverny
Copy link
Member

@sdickhoven Yepp! Sounds right :)

Work around is to use 4.0.0 for the time being.

@dalgibbard
Copy link
Contributor

dalgibbard commented Jul 25, 2020

Is it fair to suggest that we should push the WebIdentity option behind an environment variable / parameter?
AWS_WEBIDENTITY=true or something?

Any reasons against this?

@sdickhoven
Copy link

sdickhoven commented Jul 26, 2020

Work around is to use 4.0.0 for the time being.

...just to confirm: v4.0.0 fixes the problem and cross-account sts:AssumeRole in conjunction with irsa works as expected.

(no securityContext.fsGroup: 65534 required by the way)

@Flydiverny
Copy link
Member

@dalgibbard I think I'll need some more insight in your setup.

I've now done some trial runs using EKS Fargate.
First following the gist mentioned earlier ( https://gist.github.com/lukaszbudnik/f1f42bd5a57430e3c25034200ba44c2e )

That gives you a new eks cluster using fargate, as well as setting up a role for OIDC which can be used by your service account, in my case the role arn:aws:iam::111111111:role/eksctl-fargate-test-kes-iamserviceaccount-role.

Then I also tried in combination with setting roleArn on an external secret.

Creating a new IAM role arn:aws:iam::1111111111:role/assume-this-role with a trust relationship to the role assumed by the service account:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::111111111:role/eksctl-fargate-test-kes-iamserviceaccount-role"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

the secret that needs an alternative role:

apiVersion: 'kubernetes-client.io/v1'
kind: ExternalSecret
metadata:
  name: hello-service-with-role
spec:
  backendType: secretsManager
  roleArn: arn:aws:iam::1111111111:role/assume-this-role
  data:
    - key: hello-service/password
      name: password

This gives me the correct expected output:

[1595814400855] DEBUG : spinning up poller for default/hello-service-with-role
[1595814400857] INFO  : starting poller for default/hello-service-with-role
[1595814401023] DEBUG : next poll for default/hello-service-with-role in 0 ms
[1595814401023] INFO  : running poll on the secret default/hello-service-with-role
[1595814401193] INFO  : fetching secret property hello-service/password with role: arn:aws:iam::111111111:role/assume-this-role
[1595814402799] INFO  : upserting secret default/hello-service-with-role
[1595814402969] DEBUG : updating status for default/hello-service-with-role to: SUCCESS

#453 restores the old behavior

I've pushed a docker image for testing here flydiverny/kuberenetes-external-secrets:pr-453

And tested it with the helm chart values:

env:
  AWS_REGION: eu-west-1
  AWS_DEFAULT_REGION: eu-west-1
  DISABLE_POLLING: true
  LOG_LEVEL: debug

serviceAccount:
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::111111111:role/eksctl-fargate-test-kes-iamserviceaccount-role

image:
  repository: flydiverny/kuberenetes-external-secrets
  tag: pr-453

securityContext:
  fsGroup: 65534

@dalgibbard
Copy link
Contributor

That is curious tbh.

I've dropped EKS Fargate from our project due to lacking centralised logging options.
I probably can't share the terraform directly for compliance reasons, so at this point we can probably assume I've done something wrong, and revert the two changes?

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
aws bug Something isn't working
Projects
None yet
4 participants