Cluster Autoscaler does not interpret labels specified with `k8s.io/cluster-autoscaler/node-template/label/*` tags on an AWS ASG unless those tags are set to propagate to the instances #4490

adamnovak · 2021-11-30T19:14:12Z

Which component are you using?:

cluster-autoscaler

What version of the component are you using?:

Component version: v1.17.3

What k8s version are you using (kubectl version)?:

kubectl version Output

$ kubectl version
...
Server Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.0", GitCommit:"e19964183377d0ec2052d1f1fa930c4d7575bd50", GitTreeState:"clean", BuildDate:"2020-08-26T14:23:04Z", GoVersion:"go1.15", Compiler:"gc", Platform:"linux/amd64"}

What environment is this in?:

We're deploying nodes on Amazon AWS with Autoscaling Groups, using the cluster autoscaler's ability to automatically pick up ASGs tagged with certain tags.

What did you expect to happen?:

I expected the cluster autoscaler to read the tags of the ASG to determine what labels the nodes that the ASG produces will have, when scaling from 0. I don't expect the value of the "Tag new instances" toggle on the tag to matter here.

In particular, I expect that if I tag an ASG with k8s.io/cluster-autoscaler/node-template/label/eks.amazonaws.com/capacityType with value SPOT, and don't set the tag to propagate to instances, then the cluster autoscaler will make a hypothetical node that will match a nodeSelector of eks.amazonaws.com/capacityType: SPOT.

(Note that I'm not using EKS here, just the label values they define, since Kubernetes itself has no standard for labeling or tainting preemptible nodes.)
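
The expectation above can be sketched in a few lines of Python. This is only an illustration of the mapping the reporter expected, not the autoscaler's actual code; the `AsgTag` shape and the `template_labels` helper are hypothetical names introduced here.

```python
from typing import Dict, List, TypedDict

# Tag-key prefix quoted in the issue title.
LABEL_PREFIX = "k8s.io/cluster-autoscaler/node-template/label/"


class AsgTag(TypedDict):
    """Simplified stand-in for one ASG tag (hypothetical shape)."""
    key: str
    value: str
    propagate_at_launch: bool


def template_labels(tags: List[AsgTag]) -> Dict[str, str]:
    """Collect node labels from node-template label tags.

    The propagate flag is deliberately ignored, which is the behaviour
    the reporter expected when scaling from 0.
    """
    labels: Dict[str, str] = {}
    for tag in tags:
        if tag["key"].startswith(LABEL_PREFIX):
            labels[tag["key"][len(LABEL_PREFIX):]] = tag["value"]
    return labels


tags: List[AsgTag] = [
    {
        "key": LABEL_PREFIX + "eks.amazonaws.com/capacityType",
        "value": "SPOT",
        "propagate_at_launch": False,  # "Tag new instances" unchecked
    },
    {"key": "Name", "value": "unrelated", "propagate_at_launch": True},
]
print(template_labels(tags))
```

Under this model the hypothetical node carries `eks.amazonaws.com/capacityType: SPOT` regardless of the toggle.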

What happened instead?:

When the labeling tag was set to not propagate to nodes, I got log messages like:

I1130 18:29:45.793842       1 pod_schedulable.go:165] Pod adamnovak-spot-pi-qsd9c can't be scheduled on cg-kubernetes-r5ad.8xlarge-spot, predicate failed: GeneralPredicates predicate mismatch, reason: node(s) didn't match node selector,

When I changed the tag to propagate to new instances, then I got a different error (because I'd misspelled my ephemeral storage limit tag):

When I fixed that tag, then the autoscaler started provisioning my node.

How to reproduce it (as minimally and precisely as possible):

1. Set up a working autoscaling group with the cluster autoscaler.
2. Scale it down to 0.
3. Give it a tag that starts with k8s.io/cluster-autoscaler/node-template/label/, and specifies a unique label, but set it not to propagate to nodes. (Also, optionally configure the node to really have that label when it comes up.)
4. Launch a pod that has a nodeSelector to match that label.
5. Note that the autoscaler won't try to scale up the ASG, because it thinks the node won't have the label that the tag says it will have.
6. Check the tag new instances checkbox on the tag on the ASG.
7. Wait for the autoscaler to reread the ASGs.
8. Observe the autoscaler trying to scale up the ASG to run the pod.
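
The steps above can be mimicked with a toy model. It is a sketch of the reported symptom only, under the assumption that labels from non-propagating tags are dropped from the template node; the `only_propagated` parameter, the helper names, and the `example.com/pool` label are all hypothetical and do not correspond to any autoscaler flag or function.

```python
LABEL_PREFIX = "k8s.io/cluster-autoscaler/node-template/label/"


def template_labels(tags, only_propagated):
    """Build template-node labels from (key, value, propagate) tuples.

    only_propagated=True models the reported behaviour (hypothetical switch);
    only_propagated=False models what the reporter expected.
    """
    return {
        key[len(LABEL_PREFIX):]: value
        for key, value, propagate in tags
        if key.startswith(LABEL_PREFIX) and (propagate or not only_propagated)
    }


def selector_matches(node_selector, node_labels):
    """A nodeSelector matches when every key/value pair is present on the node."""
    return all(node_labels.get(k) == v for k, v in node_selector.items())


# Step 3: a label tag that is NOT set to propagate to instances.
asg_tags = [(LABEL_PREFIX + "example.com/pool", "spot", False)]
# Step 4: a pod whose nodeSelector asks for that label.
selector = {"example.com/pool": "spot"}

reported = selector_matches(selector, template_labels(asg_tags, only_propagated=True))
expected = selector_matches(selector, template_labels(asg_tags, only_propagated=False))
print(reported, expected)  # prints: False True
```

Flipping the third tuple element to `True` (step 6) makes both calls return `True`, which matches the scale-up seen in step 8.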

Anything else we need to know?:

I suspect taints and other stuff inferred from tags also work this way.
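
For reference, a minimal sketch of how a taint tag could be read, assuming the `k8s.io/cluster-autoscaler/node-template/taint/<key>` tag with a `value:Effect` value as described in the autoscaler's AWS documentation. The `template_taints` helper and the sample tag are hypothetical; whether the propagate flag affects taints the same way as labels was not verified in this issue.

```python
TAINT_PREFIX = "k8s.io/cluster-autoscaler/node-template/taint/"


def template_taints(tags):
    """Parse taints from a {tag key: tag value} mapping.

    Each taint tag value is assumed to look like "<value>:<Effect>".
    Values without a colon are skipped as malformed.
    """
    taints = []
    for key, value in tags.items():
        if not key.startswith(TAINT_PREFIX):
            continue
        taint_value, sep, effect = value.rpartition(":")
        if not sep:
            continue  # no "<value>:<Effect>" separator found
        taints.append(
            {"key": key[len(TAINT_PREFIX):], "value": taint_value, "effect": effect}
        )
    return taints


sample = {
    TAINT_PREFIX + "dedicated": "batch:NoSchedule",  # illustrative tag
    "Name": "my-asg",
}
print(template_taints(sample))
```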

This might be a possible cause of people reporting they are affected by #4010 and #3802, even though the screenshots I've seen there indicate that the tag new instances flags are set by the main reporters.

acalm · 2022-02-18T13:44:33Z

I think this was "kind of" documented in an example before; it was later replaced with some recommendations and moved to the FAQ. Not sure if that means that the behavior was fixed between those commits or if it's just unfortunate that the example got replaced/moved.

k8s-triage-robot · 2022-05-22T08:22:43Z

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue or PR as fresh with /remove-lifecycle stale
Mark this issue or PR as rotten with /lifecycle rotten
Close this issue or PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot · 2022-06-21T08:24:35Z

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue or PR as fresh with /remove-lifecycle rotten
Close this issue or PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot · 2022-07-21T08:53:04Z

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Reopen this issue or PR with /reopen
Mark this issue or PR as fresh with /remove-lifecycle rotten
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

k8s-ci-robot · 2022-07-21T08:53:21Z

@k8s-triage-robot: Closing this issue.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

EHJ-52n · 2022-08-23T07:14:24Z

/remove-lifecycle rotten

k8s-ci-robot · 2022-08-23T07:15:19Z

@EHJ-52n: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

EHJ-52n · 2022-08-23T07:15:59Z

@adamnovak Is this issue solved for you?

adamnovak · 2022-08-23T14:43:27Z

I've been employing the workaround of always setting the tags to propagate, and I'm not likely to find time to try and reproduce this again on our live system any time soon.

As for documenting that setting the tags to propagate is necessary, it looks like @acalm found where in the docs that would belong, so I think you could look there in the current mainline to see if it has been documented yet.
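
The workaround of always setting the tags to propagate can be checked with a small audit. This is a sketch that assumes the tags are available as dicts shaped like AWS Auto Scaling tag descriptions (`Key`, `Value`, `PropagateAtLaunch`); the sample data is hand-written, and `non_propagating_template_tags` is a hypothetical helper, not part of any AWS or Kubernetes tool.

```python
NODE_TEMPLATE_PREFIX = "k8s.io/cluster-autoscaler/node-template/"


def non_propagating_template_tags(tags):
    """Return the keys of node-template tags that are not set to propagate."""
    return sorted(
        tag["Key"]
        for tag in tags
        if tag["Key"].startswith(NODE_TEMPLATE_PREFIX)
        and not tag.get("PropagateAtLaunch", False)
    )


# Hand-written example data, not fetched from AWS.
asg_tags = [
    {
        "Key": NODE_TEMPLATE_PREFIX + "label/eks.amazonaws.com/capacityType",
        "Value": "SPOT",
        "PropagateAtLaunch": False,
    },
    {
        "Key": NODE_TEMPLATE_PREFIX + "resources/ephemeral-storage",
        "Value": "100Gi",
        "PropagateAtLaunch": True,
    },
    {"Key": "Name", "Value": "my-asg", "PropagateAtLaunch": False},
]

for key in non_propagating_template_tags(asg_tags):
    print("not propagating:", key)
```

An empty result means every node-template tag follows the workaround.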

adamnovak added the kind/bug Categorizes issue or PR as related to a bug. label Nov 30, 2021

jbartosik added the area/cluster-autoscaler label Dec 2, 2021

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 22, 2022

k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jun 21, 2022

k8s-ci-robot closed this as completed Jul 21, 2022

k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Aug 23, 2022
