AutoScalingGroup timeout: Status Reason: 'kubernetes.io/cluster/clustername' is not a valid tag key. #1785

Closed
brenwhyte opened this issue Jan 17, 2022 · 15 comments · Fixed by #1788

Comments

@brenwhyte

brenwhyte commented Jan 17, 2022

Description

While testing the Complete and the IRSA examples, I get an error saying that the Auto Scaling group can't create instances.

Versions

❯ terraform version
Terraform v1.1.3
on linux_amd64

  • provider registry.terraform.io/hashicorp/aws v3.72.0
  • provider registry.terraform.io/hashicorp/cloudinit v2.2.0
  • provider registry.terraform.io/hashicorp/helm v2.4.1
  • provider registry.terraform.io/hashicorp/null v3.1.0
  • provider registry.terraform.io/hashicorp/tls v3.1.0

Reproduction

Clone repo and test the examples.

Code Snippet to Reproduce

Expected behavior

Auto Scaling group creates instances

Actual behavior

Terminal Output Screenshot(s)

module.eks.module.self_managed_node_group["spot"].aws_autoscaling_group.this[0]: Still destroying... [id=spot-20220117114248850100000023, 9m50s elapsed]
module.eks.module.self_managed_node_group["refresh"].aws_autoscaling_group.this[0]: Still creating... [9m50s elapsed]
module.eks.module.self_managed_node_group["spot"].aws_autoscaling_group.this[0]: Still destroying... [id=spot-20220117114248850100000023, 10m0s elapsed]
module.eks.module.self_managed_node_group["refresh"].aws_autoscaling_group.this[0]: Still creating... [10m0s elapsed]
╷
│ Error: "refresh-20220117122547139700000001": Waiting up to 10m0s: Need at least 1 healthy instances in ASG, have 0. Most recent activity: {
│   ActivityId: "a8b5f9a0-2397-c9ec-a78e-25f731576957",
│   AutoScalingGroupARN: "arn:aws:autoscaling:eu-west-1:*snip*:autoScalingGroup:a6153771-becc-4c1c-89da-44d7e9044423:autoScalingGroupName/refresh-20220117122547139700000001",
│   AutoScalingGroupName: "refresh-20220117122547139700000001",
│   Cause: "At 2022-01-17T12:35:19Z an instance was started in response to a difference between desired and actual capacity, increasing the capacity from 0 to 1.",
│   Description: "Launching a new EC2 instance.  Status Reason: 'kubernetes.io/cluster/exirsatest' is not a valid tag key. Tag keys must match pattern ([0-9a-zA-Z\\\\-_+=,.@:]{1,255}), and must not be a reserved name ('.', '..', '_index'). Launching EC2 instance failed.",
│   Details: "{\"Subnet ID\":\"subnet-006ccbc3820532b53\",\"Availability Zone\":\"eu-west-1c\"}",
│   EndTime: 2022-01-17 12:35:20 +0000 UTC,
│   Progress: 100,
│   StartTime: 2022-01-17 12:35:20.69 +0000 UTC,
│   StatusCode: "Failed",
│   StatusMessage: "'kubernetes.io/cluster/exirsatest' is not a valid tag key. Tag keys must match pattern ([0-9a-zA-Z\\\\-_+=,.@:]{1,255}), and must not be a reserved name ('.', '..', '_index'). Launching EC2 instance failed."
│ }
│ 
│   with module.eks.module.self_managed_node_group["refresh"].aws_autoscaling_group.this[0],
│   on ../../modules/self-managed-node-group/main.tf line 260, in resource "aws_autoscaling_group" "this":
│  260: resource "aws_autoscaling_group" "this" {
│ 
╵

Additional context

You can see I tried renaming the folder to exirsatest and that didn't help.

Removing the keys below unblocked the ASG, but that is of course not a fix.

kubernetes.io/cluster/exirsatest
k8s.io/cluster/ex-irsatest

@brenwhyte changed the title from "AutoScalingGroup timeout: Status Reason: 'kubernetes.io/cluster/exirsatest' is not a valid tag key." to "AutoScalingGroup timeout: Status Reason: 'kubernetes.io/cluster/<clustername>' is not a valid tag key." on Jan 17, 2022
@brenwhyte changed the title from "AutoScalingGroup timeout: Status Reason: 'kubernetes.io/cluster/<clustername>' is not a valid tag key." to "AutoScalingGroup timeout: Status Reason: 'kubernetes.io/cluster/clustername' is not a valid tag key." on Jan 17, 2022
@daroga0002
Contributor

Please paste the module configuration you are using.

@brenwhyte
Author

❯ git clone https://github.com/terraform-aws-modules/terraform-aws-eks.git
Cloning into 'terraform-aws-eks'...
remote: Enumerating objects: 4176, done.
remote: Counting objects: 100% (843/843), done.
remote: Compressing objects: 100% (457/457), done.
remote: Total 4176 (delta 508), reused 620 (delta 381), pack-reused 3333
Receiving objects: 100% (4176/4176), 1.42 MiB | 535.00 KiB/s, done.
Resolving deltas: 100% (2716/2716), done.
❯ cd terraform-aws-eks/examples/irsa_autoscale_refresh
❯ terraform init
Initializing modules...
Downloading registry.terraform.io/terraform-aws-modules/iam/aws 4.9.0 for aws_node_termination_handler_role...
- aws_node_termination_handler_role in .terraform/modules/aws_node_termination_handler_role/modules/iam-assumable-role-with-oidc
Downloading registry.terraform.io/terraform-aws-modules/sqs/aws 3.2.1 for aws_node_termination_handler_sqs...
- aws_node_termination_handler_sqs in .terraform/modules/aws_node_termination_handler_sqs
- eks in ../..
- eks.eks_managed_node_group in ../../modules/eks-managed-node-group
- eks.eks_managed_node_group.user_data in ../../modules/_user_data
- eks.fargate_profile in ../../modules/fargate-profile
- eks.self_managed_node_group in ../../modules/self-managed-node-group
- eks.self_managed_node_group.user_data in ../../modules/_user_data
Downloading registry.terraform.io/terraform-aws-modules/iam/aws 4.9.0 for iam_assumable_role_cluster_autoscaler...
- iam_assumable_role_cluster_autoscaler in .terraform/modules/iam_assumable_role_cluster_autoscaler/modules/iam-assumable-role-with-oidc
Downloading registry.terraform.io/terraform-aws-modules/vpc/aws 3.11.3 for vpc...
- vpc in .terraform/modules/vpc

Initializing the backend...

Initializing provider plugins...
- Finding hashicorp/helm versions matching ">= 2.0.0"...
- Finding hashicorp/aws versions matching ">= 2.23.0, >= 3.63.0, >= 3.72.0"...
- Finding hashicorp/null versions matching ">= 3.0.0"...
- Finding hashicorp/tls versions matching ">= 2.2.0"...
- Finding hashicorp/cloudinit versions matching ">= 2.0.0"...
- Installing hashicorp/helm v2.4.1...
- Installed hashicorp/helm v2.4.1 (signed by HashiCorp)
- Installing hashicorp/aws v3.72.0...
- Installed hashicorp/aws v3.72.0 (signed by HashiCorp)
- Installing hashicorp/null v3.1.0...
- Installed hashicorp/null v3.1.0 (signed by HashiCorp)
- Installing hashicorp/tls v3.1.0...
- Installed hashicorp/tls v3.1.0 (signed by HashiCorp)
- Installing hashicorp/cloudinit v2.2.0...
- Installed hashicorp/cloudinit v2.2.0 (signed by HashiCorp)

Terraform has created a lock file .terraform.lock.hcl to record the provider
selections it made above. Include this file in your version control repository
so that Terraform can guarantee to make the same selections by default when
you run "terraform init" in the future.

Terraform has been successfully initialized!

You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.

If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.
❯ terraform apply --auto-approve

It should then time out and fail when creating the ASGs.

@brenwhyte
Author

All three Auto Scaling groups from the example have the issue:

image

@bryantbiggs
Member

The tag key looks fine per awsdocs/amazon-eks-user-guide#38 (comment), but the regex in the error message doesn't allow forward slashes.

@brenwhyte are you able to file a ticket with AWS to get their input?
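
The point about forward slashes can be checked directly against the pattern quoted in the error message. The following is a minimal sketch, assuming the pattern applies to the whole key and that the doubled backslashes in the log are just escaping for a literal hyphen; the function name `is_valid_tag_key` is made up for illustration and is not part of any AWS or Terraform API.

```python
import re

# Pattern copied from the error text, with the log's backslash escaping
# reduced to a single escaped hyphen (an assumption about the log format).
TAG_KEY_PATTERN = re.compile(r"[0-9a-zA-Z\-_+=,.@:]{1,255}")

# Reserved names listed in the same error message.
RESERVED_NAMES = {".", "..", "_index"}


def is_valid_tag_key(key: str) -> bool:
    """Return True if the whole key matches the pattern and is not reserved."""
    if key in RESERVED_NAMES:
        return False
    return TAG_KEY_PATTERN.fullmatch(key) is not None


if __name__ == "__main__":
    for key in (
        "kubernetes.io/cluster/exirsatest",
        "k8s.io/cluster-autoscaler/enabled",
        "Name",
    ):
        print(f"{key!r}: {is_valid_tag_key(key)}")
```

Under these assumptions, both keys containing `/` are rejected while `Name` is accepted, which matches the failure reported above.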

@brenwhyte
Author

I can, I'll get on that in 2 secs. I noticed that an earlier cluster has the same tags, so this did work previously.

Let's see what AWS Support says.

@doker78

doker78 commented Jan 17, 2022

I have the same errors with the 'self-managed-node-group' example:

│   AutoScalingGroupName: "worker-group-2-20220117194210164800000003",
│   Cause: "At 2022-01-17T19:49:11Z an instance was started in response to a difference between desired and actual capacity, increasing the capacity from 0 to 1.",
│   Description: "Launching a new EC2 instance. Status Reason: 'k8s.io/cluster/self-test-eks-lU3GjdcB' is not a valid tag key. **Tag keys must match pattern ([0-9a-zA-Z\\\\-_+=,.@:]{1,255}), and must not be a reserved name ('.', '..', '_index')**. Launching EC2 instance failed.",
│   Details: "{\"Subnet ID\":\"subnet-0f4c15ddb73b04b7a\",\"Availability Zone\":\"us-east-1a\"}",
│   EndTime: 2022-01-17 19:49:12 +0000 UTC,
│   Progress: 100,
│   StartTime: 2022-01-17 19:49:12.43 +0000 UTC,
│   StatusCode: "Failed",
│   StatusMessage: "'k8s.io/cluster/self-test-eks-lU3GjdcB' is not a valid tag key. Tag keys must match pattern ([0-9a-zA-Z\\\\-_+=,.@:]{1,255}), and must not be a reserved name ('.', '..', '_index'). Launching EC2 instance failed."
│ }
│ 
│   with module.eks.module.self_managed_node_group["1"].aws_autoscaling_group.this[0],
│   on .terraform/modules/eks/modules/self-managed-node-group/main.tf line 260, in resource "aws_autoscaling_group" "this":
│  260: resource "aws_autoscaling_group" "this" {

@kahirokunn
Contributor

EKS managed node groups can no longer be created either.
Is this likely the same problem?

Error: error creating EKS Node Group (<eks_cluster_name>:<eks_node_group_name>): InvalidRequestException: 'k8s.io/cluster-autoscaler/enabled' is not a valid tag key. Tag keys must match pattern ([0-9a-zA-Z\-_+=,.@:]{1,255}), and must not be a reserved name ('.', '..', '_index')
{
  RespMetadata: {
    StatusCode: 400,
    RequestID: ""
  },
  Message: "'k8s.io/cluster-autoscaler/enabled' is not a valid tag key. Tag keys must match pattern ([0-9a-zA-Z\\-_+=,.@:]{1,255}), and must not be a reserved name ('.', '..', '_index')"
}

@bryantbiggs
Member

This seems to be a change on the AWS side, since these examples were working without issue. Is anyone able to file a ticket with AWS Support to get their feedback?

@kahirokunn
Contributor

By the way, it does not occur in 18.1.0, only in 18.2.0.

@grzegorzlisowski

Yes, the problem seems to be related to instance_metadata_tags = "enabled". Once it is disabled, the problem disappears.

@bryantbiggs
Member

Hmm, interesting. Thanks for identifying that

@kauf0144

I'm seeing the same issue. Setting instance_metadata_tags = "disabled" seemed to correct it for me as well.

@antonbabenko
Member

This issue has been resolved in version 18.2.1 🎉

@bryantbiggs
Member

OK, we've changed the default behavior to disabled. It's disappointing that AWS has different tag requirements here, but we will leave it up to users to manage for now.
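
For anyone who wants to manage this themselves, a pre-flight check along the following lines could help decide whether a given tag set is compatible with instance_metadata_tags = "enabled". This is only a sketch: it assumes the pattern and reserved names quoted in the error messages in this thread are the complete rule, and `metadata_tags_setting` is a hypothetical helper, not something provided by the module or by AWS.

```python
import re

# Assumed rule, taken from the error messages quoted in this thread.
_KEY_PATTERN = re.compile(r"[0-9a-zA-Z\-_+=,.@:]{1,255}")
_RESERVED = {".", "..", "_index"}


def metadata_tags_setting(tags: dict) -> tuple:
    """Suggest a value for instance_metadata_tags for the given tag map.

    Returns a (setting, offending_keys) pair, where setting is "enabled"
    only if every key passes the assumed rule.
    """
    offending = sorted(
        key
        for key in tags
        if key in _RESERVED or _KEY_PATTERN.fullmatch(key) is None
    )
    setting = "disabled" if offending else "enabled"
    return setting, offending


if __name__ == "__main__":
    example_tags = {"Name": "worker", "k8s.io/cluster/ex-irsatest": "owned"}
    print(metadata_tags_setting(example_tags))
```

With the cluster ownership tags used by the examples, the sketch suggests "disabled", which is consistent with the workaround reported above.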

@github-actions

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 15, 2022