aws_eks_addon creation race condition #20404
Comments
@paulgear we see this issue too, but adding the manual dependency on the nodegroup resource doesn't work for us. Do you have any more insights?
@wcarlsen Maybe try a cluster readiness check like this? https://github.com/cmdlabs/cmd-tf-aws-eks/blob/master/cluster/auth-cm.tf#L22
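For readers who can't open the link, a cluster readiness check along these lines could look roughly like the sketch below. It is not a copy of the linked file: the variable `cluster_endpoint`, the resource name `wait_for_cluster`, and the retry count and interval are all assumptions for illustration, and it assumes `curl` is available where Terraform runs and that the API server's `/healthz` endpoint is reachable from there.

```hcl
# Hypothetical input: the EKS API server endpoint, e.g. taken from
# aws_eks_cluster.<name>.endpoint in the calling module.
variable "cluster_endpoint" {
  type = string
}

# Polls the API server health endpoint until it answers, or gives up
# after roughly five minutes (60 attempts, 5 seconds apart).
resource "null_resource" "wait_for_cluster" {
  # Re-run the check if the endpoint changes.
  triggers = {
    endpoint = var.cluster_endpoint
  }

  provisioner "local-exec" {
    interpreter = ["/bin/sh", "-c"]
    command     = <<-EOT
      for i in $(seq 1 60); do
        curl -k -s -f "$ENDPOINT/healthz" >/dev/null && exit 0
        sleep 5
      done
      echo "timed out waiting for cluster" >&2
      exit 1
    EOT

    environment = {
      ENDPOINT = var.cluster_endpoint
    }
  }
}
```

Resources that should wait for the cluster would then list `null_resource.wait_for_cluster` in their `depends_on`.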
Actually, it's quite possible to create an EKS cluster with addons but without any workers; at least the AWS Web Console does this when creating a cluster. Obviously, the coredns deployment will be in a degraded state until some worker nodes are available. I think there are two ways to handle this:
Thanks for the input @paulgear, but I still didn't manage to get it working. I also tried out @z0rc's suggestion with adding a dependency between node group workers and the coreDNS addon without any luck. I guess we will have to wait around for the latter to be fixed and do the good "double apply" trick.
I'm having the same issue since yesterday. I had a working run before the weekend.
We can see that the addons were created before the node group without any error. Since yesterday I also get the following:
If I make the addon definition depend on the node group, the creation goes fine, but I end up with some ENIs and SGs left over after the cluster deletion :(
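The dependency discussed in this thread can be sketched as follows. This is a minimal illustration, not anyone's actual configuration: the variables, the resource names `workers` and `coredns`, and the scaling numbers are assumptions.

```hcl
# Hypothetical inputs for the sketch.
variable "cluster_name" {
  type = string
}

variable "node_role_arn" {
  type = string
}

variable "subnet_ids" {
  type = list(string)
}

resource "aws_eks_node_group" "workers" {
  cluster_name    = var.cluster_name
  node_group_name = "workers"
  node_role_arn   = var.node_role_arn
  subnet_ids      = var.subnet_ids

  scaling_config {
    desired_size = 2
    max_size     = 3
    min_size     = 1
  }
}

resource "aws_eks_addon" "coredns" {
  cluster_name = var.cluster_name
  addon_name   = "coredns"

  # Create the add-on only after the node group exists, so coredns
  # has nodes to schedule on. On destroy, Terraform reverses this
  # order and removes the add-on before the node group.
  depends_on = [aws_eks_node_group.workers]
}
```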
I'm also having the same issue, but with a slightly different use case. Last week I was able to provision the addon and then patch the deployment to run on Fargate (https://docs.aws.amazon.com/eks/latest/userguide/fargate-getting-started.html#fargate-gs-coredns). Unfortunately, this is no longer possible due to the following error:
I've also been able to replicate this using older versions of the provider (such as v3.47.0). Edit: In this case the EKS cluster is Fargate-only, with no node groups.
We use un-managed node groups (aka. plain auto-scaling groups) controlled by one Terraform module, and manage EKS add-ons through another module. This workaround seems to do the trick:

In our main EKS cluster module:

    module "eks_addons" {
      source = "../../_sub/compute/eks-addons"
      depends_on = [
        module.eks_cluster,
        module.eks_nodegroup1_workers,
        module.eks_nodegroup2_workers
      ] # added explicit dependencies on node group modules, as a workaround to dfds/cloudplatform#380 and hashicorp/terraform-provider-aws#20404
      ...
    }

In our un-managed node group sub-module:

    resource "aws_autoscaling_group" "eks" {
      ...
      provisioner "local-exec" {
        command = "sleep 60" # added arbitrary delay to allow ASG to spin up instances, as a workaround to dfds/cloudplatform#380 and hashicorp/terraform-provider-aws#20404
      }
    }

See also dfds/infrastructure-modules#276.
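As a possible variation on the `sleep 60` provisioner, the same fixed delay could be expressed with the `time_sleep` resource from the `hashicorp/time` provider, which avoids shelling out. This is a swapped-in technique, not what the comment above uses. The resource name `wait_for_nodes` is an assumption, and the fragment is assumed to live in the same sub-module as `aws_autoscaling_group.eks`.

```hcl
terraform {
  required_providers {
    time = {
      source = "hashicorp/time"
    }
  }
}

# Waits 60 seconds after the ASG has been created. Because the
# add-ons module depends on the whole node group module, it also
# waits for this delay to finish.
resource "time_sleep" "wait_for_nodes" {
  depends_on      = [aws_autoscaling_group.eks]
  create_duration = "60s"
}
```

The delay is still arbitrary: like the original workaround, it only gives the instances time to start and does not verify that any node has joined the cluster.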
Workaround/fix for dfds/cloudplatform#380 and hashicorp/terraform-provider-aws#20404; re-enable QA destroy steps. Co-authored-by: abstrask, Rasmus Rask
This functionality has been released in v3.55.0 of the Terraform AWS Provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading. For further feature requests or bug reports with this functionality, please create a new GitHub issue following the template. Thank you!
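To pick up the release mentioned above, a configuration can require at least that provider version. A minimal sketch, where the lower bound comes from this thread and everything else is the standard `required_providers` layout:

```hcl
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = ">= 3.55.0"
    }
  }
}
```

After changing the constraint, `terraform init -upgrade` fetches a matching provider version.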
I'm going to lock this issue because it has been closed for 30 days. This helps our maintainers find and focus on the active issues.
Description
When created too soon after the EKS cluster (presumably before or during nodegroup creation), the aws_eks_addon resource doesn't always create correctly.
Community Note
Terraform CLI and Terraform AWS Provider Version
Affected Resource(s)
Terraform Configuration Files
Please include all Terraform configurations required to reproduce the bug. Bug reports without a functional reproduction may be closed without investigation.
Debug Output
This will be provided later if needed, once I've redacted it sufficiently.
Panic Output
n/a
Expected Behavior
Degraded seems to be a fairly normal state for initial creation of EKS add-ons when the cluster is fairly new. The provider should wait long enough for the add-on to transition from degraded to active.
Actual Behavior
Error when applying initial configuration:
A second apply works fine.
Steps to Reproduce
terraform apply
Important Factoids
References