AutoScalingGroup timeout: Status Reason: 'kubernetes.io/cluster/clustername' is not a valid tag key. #1785

brenwhyte · 2022-01-17T12:48:32Z

Description

When testing the Complete and IRSA examples, I get an error saying the AutoScalingGroup can't create instances:

Versions

❯ terraform version
Terraform v1.1.3
on linux_amd64

provider registry.terraform.io/hashicorp/aws v3.72.0
provider registry.terraform.io/hashicorp/cloudinit v2.2.0
provider registry.terraform.io/hashicorp/helm v2.4.1
provider registry.terraform.io/hashicorp/null v3.1.0
provider registry.terraform.io/hashicorp/tls v3.1.0

Reproduction

Clone repo and test the examples.

Code Snippet to Reproduce

Expected behavior

Auto Scaling group creates instances

Actual behavior

Terminal Output Screenshot(s)

module.eks.module.self_managed_node_group["spot"].aws_autoscaling_group.this[0]: Still destroying... [id=spot-20220117114248850100000023, 9m50s elapsed]
module.eks.module.self_managed_node_group["refresh"].aws_autoscaling_group.this[0]: Still creating... [9m50s elapsed]
module.eks.module.self_managed_node_group["spot"].aws_autoscaling_group.this[0]: Still destroying... [id=spot-20220117114248850100000023, 10m0s elapsed]
module.eks.module.self_managed_node_group["refresh"].aws_autoscaling_group.this[0]: Still creating... [10m0s elapsed]
╷
│ Error: "refresh-20220117122547139700000001": Waiting up to 10m0s: Need at least 1 healthy instances in ASG, have 0. Most recent activity: {
│   ActivityId: "a8b5f9a0-2397-c9ec-a78e-25f731576957",
│   AutoScalingGroupARN: "arn:aws:autoscaling:eu-west-1:*snip*:autoScalingGroup:a6153771-becc-4c1c-89da-44d7e9044423:autoScalingGroupName/refresh-20220117122547139700000001",
│   AutoScalingGroupName: "refresh-20220117122547139700000001",
│   Cause: "At 2022-01-17T12:35:19Z an instance was started in response to a difference between desired and actual capacity, increasing the capacity from 0 to 1.",
│   Description: "Launching a new EC2 instance.  Status Reason: 'kubernetes.io/cluster/exirsatest' is not a valid tag key. Tag keys must match pattern ([0-9a-zA-Z\\\\-_+=,.@:]{1,255}), and must not be a reserved name ('.', '..', '_index'). Launching EC2 instance failed.",
│   Details: "{\"Subnet ID\":\"subnet-006ccbc3820532b53\",\"Availability Zone\":\"eu-west-1c\"}",
│   EndTime: 2022-01-17 12:35:20 +0000 UTC,
│   Progress: 100,
│   StartTime: 2022-01-17 12:35:20.69 +0000 UTC,
│   StatusCode: "Failed",
│   StatusMessage: "'kubernetes.io/cluster/exirsatest' is not a valid tag key. Tag keys must match pattern ([0-9a-zA-Z\\\\-_+=,.@:]{1,255}), and must not be a reserved name ('.', '..', '_index'). Launching EC2 instance failed."
│ }
│ 
│   with module.eks.module.self_managed_node_group["refresh"].aws_autoscaling_group.this[0],
│   on ../../modules/self-managed-node-group/main.tf line 260, in resource "aws_autoscaling_group" "this":
│  260: resource "aws_autoscaling_group" "this" {
│ 
╵

Additional context

You can see I tried renaming the folder to exirsatest and that didn't help.

Removing the tag keys below unblocked the ASG, but that is of course not a fix.

kubernetes.io/cluster/exirsatest
k8s.io/cluster/ex-irsatest


daroga0002 · 2022-01-17T12:58:44Z

Please paste the module configuration you are using.

brenwhyte · 2022-01-17T13:11:26Z

❯ git clone https://github.com/terraform-aws-modules/terraform-aws-eks.git
Cloning into 'terraform-aws-eks'...
remote: Enumerating objects: 4176, done.
remote: Counting objects: 100% (843/843), done.
remote: Compressing objects: 100% (457/457), done.
remote: Total 4176 (delta 508), reused 620 (delta 381), pack-reused 3333
Receiving objects: 100% (4176/4176), 1.42 MiB | 535.00 KiB/s, done.
Resolving deltas: 100% (2716/2716), done.
❯ cd terraform-aws-eks/examples/irsa_autoscale_refresh
❯ terraform init
Initializing modules...
Downloading registry.terraform.io/terraform-aws-modules/iam/aws 4.9.0 for aws_node_termination_handler_role...
- aws_node_termination_handler_role in .terraform/modules/aws_node_termination_handler_role/modules/iam-assumable-role-with-oidc
Downloading registry.terraform.io/terraform-aws-modules/sqs/aws 3.2.1 for aws_node_termination_handler_sqs...
- aws_node_termination_handler_sqs in .terraform/modules/aws_node_termination_handler_sqs
- eks in ../..
- eks.eks_managed_node_group in ../../modules/eks-managed-node-group
- eks.eks_managed_node_group.user_data in ../../modules/_user_data
- eks.fargate_profile in ../../modules/fargate-profile
- eks.self_managed_node_group in ../../modules/self-managed-node-group
- eks.self_managed_node_group.user_data in ../../modules/_user_data
Downloading registry.terraform.io/terraform-aws-modules/iam/aws 4.9.0 for iam_assumable_role_cluster_autoscaler...
- iam_assumable_role_cluster_autoscaler in .terraform/modules/iam_assumable_role_cluster_autoscaler/modules/iam-assumable-role-with-oidc
Downloading registry.terraform.io/terraform-aws-modules/vpc/aws 3.11.3 for vpc...
- vpc in .terraform/modules/vpc

Initializing the backend...

Initializing provider plugins...
- Finding hashicorp/helm versions matching ">= 2.0.0"...
- Finding hashicorp/aws versions matching ">= 2.23.0, >= 3.63.0, >= 3.72.0"...
- Finding hashicorp/null versions matching ">= 3.0.0"...
- Finding hashicorp/tls versions matching ">= 2.2.0"...
- Finding hashicorp/cloudinit versions matching ">= 2.0.0"...
- Installing hashicorp/helm v2.4.1...
- Installed hashicorp/helm v2.4.1 (signed by HashiCorp)
- Installing hashicorp/aws v3.72.0...
- Installed hashicorp/aws v3.72.0 (signed by HashiCorp)
- Installing hashicorp/null v3.1.0...
- Installed hashicorp/null v3.1.0 (signed by HashiCorp)
- Installing hashicorp/tls v3.1.0...
- Installed hashicorp/tls v3.1.0 (signed by HashiCorp)
- Installing hashicorp/cloudinit v2.2.0...
- Installed hashicorp/cloudinit v2.2.0 (signed by HashiCorp)

Terraform has created a lock file .terraform.lock.hcl to record the provider
selections it made above. Include this file in your version control repository
so that Terraform can guarantee to make the same selections by default when
you run "terraform init" in the future.

Terraform has been successfully initialized!

You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.

If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.
❯ terraform apply --auto-approve

and it should time out/fail when creating the ASGs

brenwhyte · 2022-01-17T13:27:33Z

All three Auto Scaling groups from the example have the issue.

bryantbiggs · 2022-01-17T15:31:27Z

The tag key looks fine per awsdocs/amazon-eks-user-guide#38, but the regex in the error message doesn't allow forward slashes.

@brenwhyte are you able to file a ticket with AWS to get their input?
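The point about forward slashes can be checked locally. The following is a minimal sketch, assuming the pattern quoted in the error message (with the log escaping removed) is the exact rule being applied; the helper names `is_valid_metadata_tag_key` and `offending_chars` are hypothetical and exist only for this illustration.

```python
import re

# Pattern transcribed from the error message with the log escaping removed.
# Assumption: this is the exact rule applied to the tag keys in this issue.
TAG_KEY_PATTERN = re.compile(r"[0-9a-zA-Z\-_+=,.@:]{1,255}")

# Reserved names listed in the same error message.
RESERVED_NAMES = {".", "..", "_index"}


def is_valid_metadata_tag_key(key):
    """Return True if the key satisfies the quoted pattern (hypothetical helper)."""
    return key not in RESERVED_NAMES and TAG_KEY_PATTERN.fullmatch(key) is not None


def offending_chars(key):
    """Return the distinct characters outside the allowed set, in order of appearance."""
    seen = []
    for ch in key:
        if TAG_KEY_PATTERN.fullmatch(ch) is None and ch not in seen:
            seen.append(ch)
    return seen


if __name__ == "__main__":
    # Tag keys taken from the error messages in this thread, plus a plain key.
    for key in (
        "kubernetes.io/cluster/exirsatest",
        "k8s.io/cluster-autoscaler/enabled",
        "Name",
    ):
        print(key, is_valid_metadata_tag_key(key), offending_chars(key))
```

Under that assumption, the only character the reported keys use outside the allowed set is the forward slash, which is consistent with the error text.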

brenwhyte · 2022-01-17T15:39:07Z

I can, I'll get on that in 2 secs. I noticed on an earlier cluster that tags are the same, so this did work previously.

Let's see what AWS Support says.

doker78 · 2022-01-17T20:17:50Z

I have the same errors with the 'self-managed-node-group' example:

│   AutoScalingGroupName: "worker-group-2-20220117194210164800000003",
│   Cause: "At 2022-01-17T19:49:11Z an instance was started in response to a difference between desired and actual capacity, increasing the capacity from 0 to 1.",
│   Description: "Launching a new EC2 instance. Status Reason: 'k8s.io/cluster/self-test-eks-lU3GjdcB' is not a valid tag key. **Tag keys must match pattern ([0-9a-zA-Z\\\\-_+=,.@:]{1,255}), and must not be a reserved name ('.', '..', '_index')**. Launching EC2 instance failed.",
│   Details: "{\"Subnet ID\":\"subnet-0f4c15ddb73b04b7a\",\"Availability Zone\":\"us-east-1a\"}",
│   EndTime: 2022-01-17 19:49:12 +0000 UTC,
│   Progress: 100,
│   StartTime: 2022-01-17 19:49:12.43 +0000 UTC,
│   StatusCode: "Failed",
│   StatusMessage: "'k8s.io/cluster/self-test-eks-lU3GjdcB' is not a valid tag key. Tag keys must match pattern ([0-9a-zA-Z\\\\-_+=,.@:]{1,255}), and must not be a reserved name ('.', '..', '_index'). Launching EC2 instance failed."
│ }
│ 
│   with module.eks.module.self_managed_node_group["1"].aws_autoscaling_group.this[0],
│   on .terraform/modules/eks/modules/self-managed-node-group/main.tf line 260, in resource "aws_autoscaling_group" "this":
│  260: resource "aws_autoscaling_group" "this" {

kahirokunn · 2022-01-18T00:55:10Z

EKS managed node groups can no longer be created either.
Could this be the same problem?

Error: error creating EKS Node Group (<eks_cluster_name>:<eks_node_group_name>): InvalidRequestException: 'k8s.io/cluster-autoscaler/enabled' is not a valid tag key. Tag keys must match pattern ([0-9a-zA-Z\-+=,.@:]{1,255}), and must not be a reserved name ('.', '..', 'index') { RespMetadata: { StatusCode: 400, RequestID: "" }, Message: "'k8s.io/cluster-autoscaler/enabled' is not a valid tag key. Tag keys must match pattern ([0-9a-zA-Z\\-+=,.@:]{1,255}), and must not be a reserved name ('.', '..', '_index')" }

bryantbiggs · 2022-01-18T01:10:32Z

this seems to be a change on the AWS side since these examples were working without issue. is anyone able to file a ticket with AWS support to get their feedback?

kahirokunn · 2022-01-18T01:14:08Z

By the way, it does not occur in 18.1.0, but in 18.2.0.

grzegorzlisowski · 2022-01-18T01:26:06Z

Yes, the problem seems to be related to instance_metadata_tags = "enabled". Once it is disabled, the problem disappears.

bryantbiggs · 2022-01-18T01:39:21Z

Hmm, interesting. Thanks for identifying that

kauf0144 · 2022-01-18T02:23:52Z

I'm seeing the same issue. Setting instance_metadata_tags = "disabled" seemed to correct the issue for me as well.

antonbabenko · 2022-01-18T14:03:57Z

This issue has been resolved in version 18.2.1 🎉

bryantbiggs · 2022-01-18T14:06:04Z

ok, we've changed the default behavior to disabled - disappointing that AWS has different tag requirements but we will leave it up to users to manage for now

github-actions · 2022-11-15T02:28:12Z

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

brenwhyte changed the title ~~AutoScalingGroup timeout: Status Reason: 'kubernetes.io/cluster/exirsatest' is not a valid tag key.~~ AutoScalingGroup timeout: Status Reason: 'kubernetes.io/cluster/<clustername>' is not a valid tag key. Jan 17, 2022

brenwhyte changed the title ~~AutoScalingGroup timeout: Status Reason: 'kubernetes.io/cluster/<clustername>' is not a valid tag key.~~ AutoScalingGroup timeout: Status Reason: 'kubernetes.io/cluster/clustername' is not a valid tag key. Jan 17, 2022

bryantbiggs added bug labels Jan 17, 2022

bryantbiggs removed bug labels Jan 17, 2022

bryantbiggs added the bug label Jan 18, 2022

bryantbiggs mentioned this issue Jan 18, 2022

fix: Change instance_metadata_tags to default to null/disabled due to tag key pattern conflict #1788

Merged

1 task

antonbabenko closed this as completed in #1788 Jan 18, 2022

spring1843 mentioned this issue Mar 17, 2022

Invalid tag key 'karpenter.sh/provisioner-name' aws/karpenter-provider-aws#1527

Closed

github-actions bot locked as resolved and limited conversation to collaborators Nov 15, 2022
