# [bug] IRSA example login errors #1083
The example deployment outlined here:
https://github.com/terraform-aws-modules/terraform-aws-eks/tree/master/examples/irsa

I could not get this example to function in my testing, running into IAM permission issues. While the role and permissions appear to be created properly, after modifying and applying the helm chart the resulting autoscaler service gets stuck in a crash loop: it reports access denied, declaring it does not have the necessary permissions to perform the action, and fails immediately, citing a lack of sts:AssumeRoleWithWebIdentity permissions (I verified this is set in the trust relationship of the default "cluster-autoscaler" role).

The only variations I made: I updated the cluster to 1.18 and set the image version accordingly in the helm chart.

After struggling with this, I fell back on the AWS documentation and applied the permissions directly to the EC2 roles generated by the module, and the service started perfectly.

I feel like there is some step missing here that I'm unaware of, in order to use the endpoint?

My actual module call is below, tags removed:

---
@Justin-DynamicD which version of this module are you using? Are you using the following?

```hcl
module "iam_assumable_role_admin" {
  source  = "terraform-aws-modules/iam/aws//modules/iam-assumable-role-with-oidc"
  version = "~> v3.0"

  create_role                   = true
  role_name                     = "cluster-autoscaler"
  provider_url                  = replace(module.eks.cluster_oidc_issuer_url, "https://", "")
  role_policy_arns              = [aws_iam_policy.cluster_autoscaler.arn]
  oidc_fully_qualified_subjects = ["system:serviceaccount:${local.k8s_service_account_namespace}:${local.k8s_service_account_name}"]
}
```
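(For context, `aws_iam_policy.cluster_autoscaler` referenced above is defined by the IRSA example itself. A minimal sketch of such a policy, assuming the standard cluster-autoscaler permission set rather than the example's exact statement list, could look like:)

```hcl
# Sketch only: the actions the cluster-autoscaler typically needs.
# The authoritative version lives in the IRSA example.
resource "aws_iam_policy" "cluster_autoscaler" {
  name_prefix = "cluster-autoscaler"
  description = "EKS cluster-autoscaler policy"

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Action = [
          "autoscaling:DescribeAutoScalingGroups",
          "autoscaling:DescribeAutoScalingInstances",
          "autoscaling:DescribeLaunchConfigurations",
          "autoscaling:DescribeTags",
          "autoscaling:SetDesiredCapacity",
          "autoscaling:TerminateInstanceInAutoScalingGroup",
        ]
        Resource = "*"
      },
    ]
  })
}
```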
---
I am, but I'm using it as-is from the example page linked, meaning it's at 2.14.0. I'll retry off hours and let you know how it behaves, thanks for the tip!

Edit: I'm re-initing with the following module versions:

TF version is: 0.13.4

---
No dice, sadly. I did not completely destroy, just deleted the old deployment and then tried the helm chart again:
I'm at a loss. I guess I'll just fall back on applying the permissions to the worker role for now, as I honestly have no idea where to begin to troubleshoot this feature.
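(That fallback, attaching the policy directly to the worker role the EKS module creates, could look roughly like this; a sketch, with the `worker_iam_role_name` output name assumed from the EKS module of that era:)

```hcl
# Fallback sketch: grant the autoscaler permissions via the worker node role
# instead of IRSA. Works, but every pod on the node inherits them.
resource "aws_iam_role_policy_attachment" "workers_autoscaling" {
  role       = module.eks.worker_iam_role_name
  policy_arn = aws_iam_policy.cluster_autoscaler.arn
}
```

---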
My point here (#1083 (comment)) is to update the version constraint of the IAM role for OIDC module and run `terraform apply` again, to see if it creates your assumable role correctly.
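(In other words, something like this in the module block quoted earlier; a sketch:)

```hcl
module "iam_assumable_role_admin" {
  source  = "terraform-aws-modules/iam/aws//modules/iam-assumable-role-with-oidc"
  version = "~> 3.0" # bumped from 2.14.0

  # ...remaining arguments unchanged from the snippet above
}
```

---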
Sorry if I was not clear. Here's what I did to the environment:
My problem is I honestly don't understand how IRSA works at all, so I don't even know where to look to verify the deployment is trying to hit the right endpoint, using the right creds, etc. I was hoping this template would get me a fast running example so I could see a "functioning" cluster, but I just seem to have hit a brick wall.

---
I'm wondering if in the helm plan I need to make sure the k8s_service_account_name matches what's expected? Finally reading up on how this IRSA is supposed to work, and I think maybe the helm chart and the terraform plan are not making the same service account names.

EDIT: Update: OK if I run

---
It sounds like your helm chart doesn't create the correct annotation for your service account. Are you defining helm values like this: https://github.com/terraform-aws-modules/terraform-aws-eks/blob/master/examples/irsa/cluster-autoscaler-chart-values.yaml#L5-L7 ?

```yaml
rbac:
  create: true
  serviceAccount:
    annotations:
      eks.amazonaws.com/role-arn: "arn:aws:iam::<ACCOUNT ID>:role/cluster-autoscaler"
```

We updated the doc yesterday: #1063
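(Rather than hardcoding the account ID, the ARN can also be wired straight from Terraform; a sketch, assuming the `helm` provider and the `iam_assumable_role_admin` module from earlier, whose `this_iam_role_arn` output name is assumed from the v2/v3 module:)

```hcl
# Sketch: pass the IRSA role ARN into the chart values from Terraform,
# so the annotation cannot drift from the role that was actually created.
resource "helm_release" "cluster_autoscaler" {
  name       = "cluster-autoscaler"
  repository = "https://kubernetes.github.io/autoscaler" # the new chart repo
  chart      = "cluster-autoscaler"
  namespace  = "kube-system" # assumed; match local.k8s_service_account_namespace

  set {
    # dots inside the annotation key must be escaped for helm --set semantics
    name  = "rbac.serviceAccount.annotations.eks\\.amazonaws\\.com/role-arn"
    value = module.iam_assumable_role_admin.this_iam_role_arn
  }
}
```

---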
I'm experiencing this exact same issue. I've confirmed the annotation is applied to the service account as mentioned above, and that it points to the ARN of this role.

---
Well, I'm glad I'm not crazy / the only one not seeing this work? I did modify the helm chart yaml. As I am running EKS 1.18, I updated the file to the following:

---
Should this line be updated? Not certain why it's set as "cluster-autoscaler-aws-cluster-autoscaler" whereas the AWS role is simply "cluster-autoscaler". Again, I apologize if I'm blindly poking at the wrong things; I'm really just trying to look for gaps.

---
This comment tipped me off to the solution. It appears that the latest cluster-autoscaler helm chart creates a service account named "cluster-autoscaler-aws-cluster-autoscaler".

@Justin-DynamicD are you using the cluster-autoscaler chart from the new helm repo or the deprecated stable one?

Perhaps we should update the irsa example README to link to the new helm chart and update the expected service account name accordingly.
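(On the Terraform side, that means the subject baked into the role must use the name the chart actually generates; a sketch, with the namespace assumed:)

```hcl
# Sketch: these locals feed oidc_fully_qualified_subjects in the module call
# above. The newer chart derives its service account name from the release
# fullname, e.g. "cluster-autoscaler-aws-cluster-autoscaler".
locals {
  k8s_service_account_namespace = "kube-system" # assumed
  k8s_service_account_name      = "cluster-autoscaler-aws-cluster-autoscaler"
}
```

---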
Somewhat ironically, updating the module version constraint to ~> v3.0 removed the role. By using 2.14.0 instead I got this working with the solution in my previous comment.

---
Can confirm: v3 does remove the role as well:

I think you unlocked Pandora's box for me. Seems like I should have either used the deprecated autoscaler or dug into helm deeper.

---
Confirming the issue was that the newer autoscaler expects a different service account name than configured. In the end, I had to update the service account name to match the new name, and now things seem to be running. Thank you all for your help on this. (EDIT: also note the new autoscaler does not have any OS filters on the node selector, so if you are running a mixed-worker environment, you need to add a nodeSelector or similar; see the sketch below.)
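(Both fixes from this comment expressed as chart values from Terraform; a sketch, with the `rbac.serviceAccount.name` key assumed from the chart's values.yaml and `kubernetes.io/os` as the usual OS node label:)

```hcl
# Sketch: pin the service account name and restrict scheduling to Linux
# nodes; pass local.autoscaler_values to the helm_release values argument.
locals {
  autoscaler_values = yamlencode({
    rbac = {
      serviceAccount = {
        name = "cluster-autoscaler" # must match the IAM role's OIDC subject
      }
    }
    nodeSelector = {
      "kubernetes.io/os" = "linux"
    }
  })
}
```

---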
To avoid this, shouldn't we set a service account name for the cluster autoscaler? https://github.com/helm/charts/blob/master/stable/cluster-autoscaler/values.yaml#L121

---
Thanks for sharing this. I'll take a look as soon as I have a moment.

---
Yep, that works too. Although I think it actually makes more sense to update the example itself. I've opened PR #1090 with these changes as well as a few other fixes in the IRSA README.

---
Great, thanks for opening a PR for this. And yes, we should update the example to use the recommended helm chart.

That said, in a production environment I think it's better to be explicit about this kind of setting instead of trying to always create something that matches the chart default. Defaults change during upgrades, and most of the time you notice only when it's too late. This is just my opinion, but to me we should also explicitly set the autoscaler service account name so it always matches k8s_service_account_name.

What do you think?

---
Yeah, that's a good point. If this relationship had been more explicit to begin with, it might not have been such a mystery as to what was going wrong here. I've updated the PR to explicitly set the service account name.

---
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.