Upgrade from version v13.2.1 to v15.1.0 fails with "NodeGroup already exists with name..." #1314

Closed
Ghazgkull opened this issue Apr 20, 2021 · 5 comments

Ghazgkull commented Apr 20, 2021

Description

I currently have a deployment using Terraform 0.14 and version 13.2.1 of this module. I am upgrading to Terraform 0.15, which requires me to upgrade to version 15.x of this module.

As per the release notes, I changed my config to adopt the breaking instance_type => instance_types change:

```hcl
# Before
instance_type = var.worker_instance_type

# After
instance_types = [var.worker_instance_type]
```
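For context, a minimal sketch of where the renamed attribute lives, assuming a managed node group defined through the module's node_groups map (the module source, map key, and surrounding settings here are illustrative, not my full config):

```hcl
module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "15.1.0"

  # ... cluster name, VPC, and subnet settings elided ...

  node_groups = {
    worker = {
      # v13.x took a single string:
      # instance_type = var.worker_instance_type

      # v15.x takes a list:
      instance_types = [var.worker_instance_type]
    }
  }
}
```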

Versions

  • Terraform:
    Terraform v0.15.0 on darwin_amd64
  • Provider(s):
    • registry.terraform.io/gavinbunney/kubectl v1.10.0
    • registry.terraform.io/hashicorp/aws v3.37.0
    • registry.terraform.io/hashicorp/external v2.1.0
    • registry.terraform.io/hashicorp/kubernetes v2.1.0
    • registry.terraform.io/hashicorp/local v2.1.0
    • registry.terraform.io/hashicorp/null v3.1.0
    • registry.terraform.io/hashicorp/random v3.1.0
    • registry.terraform.io/hashicorp/template v2.2.0
    • tf.platforms.nike.com/platforms/cerberus v0.8.1
  • Module:
    15.1.0

Reproduction

Steps to reproduce the behavior:

Upgrade the module from v13.2.1 to v15.1.0, apply the instance_type => instance_types change shown above, and run terraform apply. The apply fails with the following error:

```text
│ Error: error creating EKS Node Group ([redacted]:[redacted]-us-west-2-worker-node-group): ResourceInUseException: NodeGroup already exists with name [redacted]-us-west-2-worker-node-group and cluster name [redacted]
│ {
│   RespMetadata: {
│     StatusCode: 409,
│     RequestID: "42b7a027-830f-4bc9-8451-6a31b17b84dd"
│   },
│   ClusterName: "[redacted]",
│   Message_: "NodeGroup already exists with name [redacted]-us-west-2-worker-node-group and cluster name [redacted]",
│   NodegroupName: "[redacted]-us-west-2-worker-node-group"
│ }
│
│   on .terraform/modules/eks/modules/node_groups/node_groups.tf line 1, in resource "aws_eks_node_group" "workers":
│    1: resource "aws_eks_node_group" "workers" {
```

Code Snippet to Reproduce

See the instance_type => instance_types change above.

Expected behavior

terraform apply succeeds after the upgrade without attempting to recreate the existing managed node group.

Actual behavior

terraform apply fails with the ResourceInUseException shown above: the module attempts to create a node group that already exists.

kahootali (Contributor) commented

I'm facing the same issue updating from 13.2.1 to 15.x: the plan says it will recreate the node groups with different instance types. Is it possible to update without having to recreate the node groups?

Ghazgkull (Author) commented

Update on this: the "NodeGroup already exists" error only seems to happen if you also change the workers_group_defaults configuration from instance_type to instance_types. That setting was not updated to match the node group module change, so individual node groups are configured with instance_types (plural) while workers_group_defaults is still configured with instance_type (singular).
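In other words, the combination that v15.1.0 actually accepts looks roughly like this (a minimal sketch; the module source and variable names are stand-ins for my redacted config):

```hcl
module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "15.1.0"

  # Worker group defaults: still the singular key on v15.1.0
  workers_group_defaults = {
    instance_type = var.worker_instance_type
  }

  # Managed node groups: the plural key introduced by the v15 rename
  node_groups = {
    worker = {
      instance_types = [var.worker_instance_type]
    }
  }
}
```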

If I revert that change and deploy with workers_group_defaults using instance_type, the error goes away and the update applies cleanly. It still causes an effective total cluster outage, though, because the old node group is destroyed before workloads migrate to the new one. That's a mess, but it probably deserves a separate issue; this one should track the configuration syntax mismatch and the docs misleading folks into changing something that hasn't changed.

cbui commented May 5, 2021

v15.2.0 with #1138 should fix this.

barryib (Member) commented May 29, 2021

This has been fixed. Please upgrade your module. See the docs and changelog for more details.
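For reference, a minimal sketch of pinning the module to a release that includes the fix (v15.2.0 or later, per the comment above), assuming the registry source terraform-aws-modules/eks/aws:

```hcl
module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = ">= 15.2.0" # per the thread, the release containing the #1138 fix

  # ... existing cluster and node group configuration unchanged ...
}
```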

barryib closed this as completed May 29, 2021

github-actions bot commented

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions bot locked as resolved and limited conversation to collaborators Nov 21, 2022