
Cluster Autoscaler does not start new nodes when Taints and NodeSelector are used in EKS #3802

Closed
dschunack opened this issue Jan 11, 2021 · 37 comments
Labels
area/cluster-autoscaler kind/bug Categorizes issue or PR as related to a bug. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@dschunack
Contributor

dschunack commented Jan 11, 2021

Hi,

We use EKS with Kubernetes 1.18 and the Cluster Autoscaler. Since Kubernetes 1.17 the "beta.kubernetes.io/instance-type" label is deprecated, so we use the new "node.kubernetes.io/instance-type" label as nodeSelector instead. This works for autoscaling groups without taints. For autoscaling groups with taints, the new "node.kubernetes.io/instance-type" selector does not work and the Cluster Autoscaler doesn't start new nodes. If we switch back to the old, deprecated "beta.kubernetes.io/instance-type" selector, the Cluster Autoscaler starts a new node. We see this behavior on all of our EKS clusters.

Events output for both test pods, one with beta.kubernetes.io and one with node.kubernetes.io as nodeSelector.
The pod with the node.kubernetes.io selector was started first.

% kubectl get pods
NAME                READY   STATUS    RESTARTS   AGE
test-4xlarge-beta   0/1     Pending   0          41s
test-4xlarge-node   0/1     Pending   0          72s

% kubectl describe pod test-4xlarge-node
Name:         test-4xlarge-node
Namespace:    default
Priority:     0
Node:         <none>
Labels:       <none>
Annotations:  kubernetes.io/psp: eks.privileged
Status:       Pending
IP:           
IPs:          <none>
Containers:
  test-4xlarge-node:
Conditions:
  Type           Status
  PodScheduled   False 
Volumes:
  default-token-lzknk:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-lzknk
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  node.kubernetes.io/instance-type=c5a.4xlarge
Tolerations:     disk=true:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason             Age               From                Message
  ----     ------             ----              ----                -------
  Normal   NotTriggerScaleUp  88s               cluster-autoscaler  pod didn't trigger scale-up (it wouldn't fit if a new node is added): 12 node(s) didn't match node selector
  Warning  FailedScheduling   9s (x8 over 92s)  default-scheduler   0/35 nodes are available: 3 node(s) were unschedulable, 32 node(s) didn't match node selector.


% kubectl describe pod test-4xlarge-beta
Name:         test-4xlarge-beta
Namespace:    default
Priority:     0
Node:         <none>
Labels:       <none>
Annotations:  kubernetes.io/psp: eks.privileged
Status:       Pending
Containers:
  test-4xlarge-beta:     
Conditions:
  Type           Status
  PodScheduled   False 
Volumes:
  default-token-lzknk:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-lzknk
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  beta.kubernetes.io/instance-type=c5a.4xlarge
Tolerations:     disk=true:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age               From                Message
  ----     ------            ----              ----                -------
  Normal   TriggeredScaleUp  47s               cluster-autoscaler  pod triggered scale-up: [{eks-agileci-cattle-disk-asg20201117110440315400000002 0->1 (max: 100)}]
  Warning  FailedScheduling  7s (x5 over 51s)  default-scheduler   0/35 nodes are available: 3 node(s) were unschedulable, 32 node(s) didn't match node selector.

Which component are you using?: cluster-autoscaler
What version of the component are you using?: cluster-autoscaler release v1.18.3
What k8s version are you using (kubectl version)?: 1.18.9

kubectl version
Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.4", GitCommit:"d360454c9bcd1634cf4cc52d1867af5491dc9c5f", GitTreeState:"clean", BuildDate:"2020-11-12T01:09:16Z", GoVersion:"go1.15.4", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"18+", GitVersion:"v1.18.9-eks-d1db3c", GitCommit:"d1db3c46e55f95d6a7d3e5578689371318f95ff9", GitTreeState:"clean", BuildDate:"2020-10-20T22:18:07Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"linux/amd64"}

What did you expect to happen?: Cluster-Autoscaler starts new nodes.
What happened instead?: Cluster-Autoscaler doesn't start new nodes. See the following error.

Events:
  Type     Reason             Age               From                Message
  ----     ------             ----              ----                -------
  Normal   NotTriggerScaleUp  88s               cluster-autoscaler  pod didn't trigger scale-up (it wouldn't fit if a new node is added): 12 node(s) didn't match node selector
  Warning  FailedScheduling   9s (x8 over 92s)  default-scheduler   0/35 nodes are available: 3 node(s) were unschedulable, 32 node(s) didn't match node selector.

How to reproduce it (as minimally and precisely as possible):

We use the following pod manifests to test the cluster-autoscaler.

Is Working:

apiVersion: v1
kind: Pod
metadata:
  name: test-4xlarge-beta
spec:
  restartPolicy: OnFailure
  containers:
  - name: test-4xlarge-beta
    image: radial/busyboxplus
    args:
    - "sh"
  tolerations:
  - key: "disk"
    operator: "Equal"
    value: "true"
    effect: "NoSchedule"
  nodeSelector:
    beta.kubernetes.io/instance-type: c5a.4xlarge

Is not Working:

apiVersion: v1
kind: Pod
metadata:
  name: test-4xlarge-node
spec:
  restartPolicy: OnFailure
  containers:
  - name: test-4xlarge-node
    image: radial/busyboxplus
    args:
    - "sh"
  tolerations:
  - key: "disk"
    operator: "Equal"
    value: "true"
    effect: "NoSchedule"
  nodeSelector:
    node.kubernetes.io/instance-type: c5a.4xlarge

Taints and tags are configured on the ASG and also in the kubelet configuration.
See the screenshot (Xnip2021-01-11_16-50-51) showing the ASG taint and tag configuration.

@dschunack dschunack added the kind/bug Categorizes issue or PR as related to a bug. label Jan 11, 2021
@umialpha
Contributor

Hi, could you provide the labels on your nodes? I suspect the label on your nodes may only be "beta.kubernetes.io/instance-type".

@dschunack
Contributor Author

dschunack commented Jan 13, 2021

Since 1.17 both labels are present on the nodes (beta and node label).

kubectl get nodes --show-labels -l node.kubernetes.io/instance-type=c5a.4xlarge 
NAME                                             STATUS   ROLES    AGE    VERSION              LABELS
ip-10-194-24-148.eu-central-1.compute.internal   Ready    <none>   3h1m   v1.18.9-eks-d1db3c   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=c5a.4xlarge,beta.kubernetes.io/os=linux,cpu=true,failure-domain.beta.kubernetes.io/region=eu-central-1,failure-domain.beta.kubernetes.io/zone=eu-central-1a,kubernetes.io/arch=amd64,kubernetes.io/hostname=ip-10-194-24-148.u0.ww.conti.de,kubernetes.io/os=linux,node.kubernetes.io/instance-type=c5a.4xlarge,topology.kubernetes.io/region=eu-central-1,topology.kubernetes.io/zone=eu-central-1a

@umialpha
Contributor

umialpha commented Jan 13, 2021

Thanks for the feedback. TBH I am not familiar with AWS. Could you please check whether the tag "node.kubernetes.io/instance-type" is set in your ASG tags? From my understanding, when you scale from 0, the node template used for prediction copies its labels from the ASG tags.
BTW, from the logs you provided, it seems that the nodegroup is scaled from 0, right? If you scale from 1 (or more), it may work correctly.
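
For reference, the AWS cloud provider can learn the labels of a scale-from-zero node template from ASG tags with the k8s.io/cluster-autoscaler/node-template/label/ prefix (documented in the cluster-autoscaler AWS README). A minimal illustration, with the value taken from this issue's instance type:

// Illustrative only: with a tag like this on the ASG, the autoscaler's
// simulated template node carries the stable instance-type label even
// while the group has 0 nodes, so a matching nodeSelector can trigger
// a scale-up.
var exampleASGTags = map[string]string{
    "k8s.io/cluster-autoscaler/node-template/label/node.kubernetes.io/instance-type": "c5a.4xlarge",
}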

@dschunack
Contributor Author

dschunack commented Jan 13, 2021

Scaling with node.kubernetes.io/instance-type works without taints, including a scale-up from 0.
But if you add a taint on the ASG, the cluster autoscaler doesn't scale up and reports an error.
If we switch back to the old beta.kubernetes.io/instance-type label it works, including a scale-up from 0.
I don't think it's a problem with the tags on the ASG. We don't set any tag like "beta.kubernetes.io" or "node.kubernetes.io"; this is not needed.

@dschunack
Contributor Author

Any news?

@pre

pre commented Jan 28, 2021

I'm having the same issue: cluster-autoscaler fails to start a new node when a pod requests an instance type which is not yet online.

For example, when the cluster does not have a large instance type such as c5.24xlarge, cluster-autoscaler fails to start a new node for a pod launched with the node selector node.kubernetes.io/instance-type: c5.24xlarge, even though we have this exact instance type defined in the managed node group's available instance types.

The cluster-autoscaler logs don't contain anything meaningful; the pod has:

pod didn't trigger scale-up: 2 node(s) didn't match Pod's node affinity

@dschunack
Contributor Author

dschunack commented Feb 12, 2021

Hi,

We tested it again today and it looks like the autoscaler is not working correctly with "node.kubernetes.io/instance-type".
Sometimes it works, sometimes it doesn't.

Today we started a pod with the nginx image to test the autoscaling. Only the nodeSelector is different:
"beta.kubernetes.io" works and "node.kubernetes.io" does not.
The pod with nodeSelector "node.kubernetes.io" was started first.

Autoscaler version: 1.18.4

Doesn't Work:

kubectl describe pod nginx-reg                                      
Name:         nginx-reg
Namespace:    default
Priority:     0
Node:         <none>
Labels:       env=test
Annotations:  kubernetes.io/psp: eks.privileged
Status:       Pending
IP:           
IPs:          <none>
Containers:
  nginx:
    Image:        nginx
    Port:         <none>
    Host Port:    <none>
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-8bf7f (ro)
Conditions:
  Type           Status
  PodScheduled   False 
Volumes:
  default-token-8bf7f:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-8bf7f
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  node.kubernetes.io/instance-type=m5a.xlarge
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason             Age                  From                Message
  ----     ------             ----                 ----                -------
  Normal   NotTriggerScaleUp  4m39s                cluster-autoscaler  pod didn't trigger scale-up (it wouldn't fit if a new node is added): 12 node(s) didn't match node selector
  Warning  FailedScheduling   13s (x7 over 4m43s)  default-scheduler   0/42 nodes are available: 34 node(s) were unschedulable, 8 node(s) didn't match node selector.

Works:

kubectl describe pod nginx-reg2
Name:         nginx-reg2
Namespace:    default
Priority:     0
Node:         <none>
Labels:       env=test
Annotations:  kubernetes.io/psp: eks.privileged
Status:       Pending
IP:           
IPs:          <none>
Containers:
  nginx:
    Image:        nginx
    Port:         <none>
    Host Port:    <none>
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-8bf7f (ro)
Conditions:
  Type           Status
  PodScheduled   False 
Volumes:
  default-token-8bf7f:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-8bf7f
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  beta.kubernetes.io/instance-type=m5a.xlarge
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age                From                Message
  ----     ------            ----               ----                -------
  Warning  FailedScheduling  23s (x2 over 23s)  default-scheduler   0/42 nodes are available: 34 node(s) were unschedulable, 8 node(s) didn't match node selector.
  Normal   TriggeredScaleUp  12s                cluster-autoscaler  pod triggered scale-up: [{agileci-prod-pet-system-asg20210212094053622400000002 0->1 (max: 50)}]

@umialpha
Contributor

umialpha commented Mar 17, 2021

Hi, I think I found the root cause. When scaling from 0, the AWS cloud provider generates the node info from a template (not a real node). When generating it, it forgets to add "node.kubernetes.io/instance-type" to the labels. Check the code here: aws_manager.go
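
To make that concrete, here is a simplified, self-contained Go sketch of the kind of label map built for such a template node. The names are illustrative, not the actual aws_manager.go functions; the bug described above is equivalent to the stable key never being set:

package main

import "fmt"

// Illustrative label keys; upstream uses the corresponding constants from
// k8s.io/api/core/v1. The stable key is the one reported missing here.
const (
    labelInstanceTypeBeta   = "beta.kubernetes.io/instance-type"
    labelInstanceTypeStable = "node.kubernetes.io/instance-type"
)

// buildTemplateLabels is a hypothetical stand-in for the label map built
// for a template node when an ASG is scaled from 0. If only the beta key
// is set, a pod selecting on the stable key can never match the simulated
// node, so no scale-up is triggered.
func buildTemplateLabels(instanceType, arch, os string) map[string]string {
    return map[string]string{
        "kubernetes.io/arch":    arch,
        "kubernetes.io/os":      os,
        labelInstanceTypeBeta:   instanceType,
        labelInstanceTypeStable: instanceType, // effectively the missing assignment
    }
}

func main() {
    labels := buildTemplateLabels("c5a.4xlarge", "amd64", "linux")
    // A nodeSelector is an exact-match lookup against these labels.
    fmt.Println(labels[labelInstanceTypeStable] == "c5a.4xlarge")
}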

@dschunack
Contributor Author

dschunack commented Mar 17, 2021

Hi, yes, we have the same feeling that the autoscaler forgets the "node.kubernetes.io" labels, but not immediately.
For some minutes after the shutdown of the last node in the ASG it also works with "node.kubernetes.io", but not after some hours. A fix would maybe also solve the following issue: Scale up windows

@dschunack
Contributor Author

dschunack commented Mar 17, 2021

Hi again,

found some new code that adds ARM64 support via the stable API (#3848), and similar changes for Azure.

I think some of the stable labels are missing in the AWS manager:

LabelArchStable
LabelOSStable

Hope it's possible to add all stable APIs/labels soon.

@dschunack
Contributor Author

Hi,

Some PRs were added to integrate the stable API, which is very nice, thanks.
Is it possible to also add LabelArchStable and LabelOSStable for the aws_manager?
These are currently missing in the aws_manager.

@alexmnyc

I had an issue with zero-instance ASGs and a nodeSelector not targeting the correct node labels (#4010), also on EKS.

@lsowen

lsowen commented Apr 29, 2021

Hi, yes we have the same feeling that the autoscaler forget the "node.kubernetes.io" labels, but not immediately.
Some minutes after the shutdown of the last node in the ASG it's also working with "node.kubernetes.io" but not after some hours. A fix will maybe solve also the following issue: Scale up windows

I'm seeing something similar, but I'm not using any node.kubernetes.io labels. When cluster-autoscaler (v1.20.0) is first launched, it successfully scales up from zero when needed by creating template-node-for-... template nodes. For a while it works without issue, scaling up and down (even to and from 0). However, within 24 hours it stops being able to find a match for any ASG which has been scaled down to zero. I no longer see log entries for template-node-for-..., so I suspect the "actual definitions" of the ASG expire from a cache and the logic for using the template node definition does not kick back in. After this occurs, I start to see log messages like:

Pod <POD_NAME> can't be scheduled on <ASG_NAME>, predicate checking error: node(s) didn't match Pod's node affinity; predicateName=NodeAffinity; reasons: node(s) didn't match Pod's node affinity; debugInfo=

This is, however, the ASG which should scale up. Restarting the cluster-autoscaler "resolves" the issue (but is not a real solution, as it requires restarting the autoscaler every day at random times).

@lsowen

lsowen commented May 23, 2021

I have continued to experience this issue, and have tracked down part of the cause.

In the loop where it is checking the nodeGroups, it looks for a cached definition in the nodeInfoCache:

if nodeInfoCache != nil {
    if nodeInfo, found := nodeInfoCache[id]; found {
        if nodeInfoCopy, err := deepCopyNodeInfo(nodeInfo); err == nil {
            result[id] = nodeInfoCopy
            continue
        }
    }
}

For the groups which do have issues, the results are being returned from that cache, and nodeInfoCopy.node.ObjectMeta.Labels is missing the expected labels, so the node templates do not pass the required NodeAffinity.Filter() (https://github.com/kubernetes/kubernetes/blob/d8f9e4587ac1265efd723bce74ae6a39576f2d58/pkg/scheduler/framework/plugins/nodeaffinity/node_affinity.go#L115)

Labels from a "correct" group (which does autoscale up from 0):

                Labels: map[string]string [                                                                                                                    
                        "kubernetes.io/os": "linux",                                                                                                           
                        "kops.k8s.io/instancegroup": "workers-devstage-large-spot",                
                        "spotinstance": "yes",                                                                                                                 
                        "kubernetes.io/arch": "amd64",                         
                        "workergroup": "devstage", 
                        "topology.kubernetes.io/zone": "us-east-1a", 
                        "node-role.kubernetes.io/spot-worker": "true", 
                        "kubernetes.io/hostname": "template-node-for-workers-devstage-large-spot.cluster-01....+22 more", 
                        "node.kubernetes.io/instance-type": "r5.24xlarge", 
                        "beta.kubernetes.io/os": "linux", 
                        "beta.kubernetes.io/arch": "amd64", 
                        "nodetype": "worker", 
                        "failure-domain.beta.kubernetes.io/region": "us-east-1", 
                        "topology.kubernetes.io/region": "us-east-1", 
                        "beta.kubernetes.io/instance-type": "r5.24xlarge", 
                        "node-role.kubernetes.io/node": "", 
                        "failure-domain.beta.kubernetes.io/zone": "us-east-1a",  
                        "kubernetes.io/role": "node", 
                        "workersize": "large", 
                ],  

Labels from an "incorrect" group (which does not autoscale up from 0 since it is missing the workersize and workergroup labels we use in our pod nodeSelector):

                Labels: map[string]string [                                                                                                                                                                                                                                                                                   
                        "topology.kubernetes.io/region": "us-east-1",                                                                                                                                                                                                                                                         
                        "node.kubernetes.io/instance-type": "c5.12xlarge",                                                                                                                                                                                                                                                    
                        "topology.kubernetes.io/zone": "us-east-1a",                                                                                                                                                                                                                                                          
                        "beta.kubernetes.io/instance-type": "c5.12xlarge",                                                                                                                                                                                                                                                    
                        "kubernetes.io/os": "linux",                                                                                                                                                                                                                                                                          
                        "beta.kubernetes.io/arch": "amd64",                                                                                                                                                                                                                                                                   
                        "beta.kubernetes.io/os": "linux",                                                                                                                                                                                                                                                                     
                        "kubernetes.io/arch": "amd64",                                                                                                                                                                                                                                                                        
                        "failure-domain.beta.kubernetes.io/region": "us-east-1",                                                                                                                                                                                                                                              
                        "kubernetes.io/hostname": "template-node-for-workers-dev-normal-spot.cluster-01.-2...+18 more",                                                                                                                                                                                              
                        "failure-domain.beta.kubernetes.io/zone": "us-east-1a",                                                                                                                                                                                                                                               
                ],  

My guess is that the node is still "booting" when the info is cached, so not all labels have been added to the data that is permanently cached. Possibly IsNodeReadyAndSchedulable is triggering too early?

for _, node := range nodes {
    // Broken nodes might have some stuff missing. Skipping.
    if !kube_util.IsNodeReadyAndSchedulable(node) {
        continue
    }
    added, id, typedErr := processNode(node)
    if typedErr != nil {
        return map[string]*schedulerframework.NodeInfo{}, typedErr
    }
    if added && nodeInfoCache != nil {
        if nodeInfoCopy, err := deepCopyNodeInfo(result[id]); err == nil {
            nodeInfoCache[id] = nodeInfoCopy
        }
    }
}

Restarting the cluster-autoscaler pod allows it to refresh all data from AWS, at which point the correct node groups are scaled up for the existing pending pods. Then, at some point in the next 24 or so hours, one or more groups stop scaling properly (which of our 10 or so groups starts failing appears to be random).

@lsowen

lsowen commented Jun 2, 2021

I think I have confirmed that my hypothesis in #3802 (comment) is correct.

I've deployed a patched version with a workaround (not a fix), which has prevented the issue from re-occurring.

https://github.com/kubernetes/autoscaler/compare/cluster-autoscaler-1.21.0...lsowen:autoscaler-failure-workaround?expand=1

Basically, wait 5 minutes after the node is "ready" before caching the info about the node, which includes the labels. This prevents instance groups from being cached with missing labels.
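
To illustrate the idea behind the workaround, here is a rough Go sketch (not the actual patch): a node is only considered cacheable once its Ready condition has been true for a while. readyGracePeriod is an assumed parameter, not an existing cluster-autoscaler flag.

import (
    "time"

    apiv1 "k8s.io/api/core/v1"
)

// readyLongEnough reports whether the node's Ready condition has been true
// for at least readyGracePeriod, so that labels applied shortly after
// registration are present before the node's info is cached.
// readyGracePeriod is an assumed knob; no such flag exists upstream.
func readyLongEnough(node *apiv1.Node, readyGracePeriod time.Duration, now time.Time) bool {
    for _, cond := range node.Status.Conditions {
        if cond.Type == apiv1.NodeReady && cond.Status == apiv1.ConditionTrue {
            return now.Sub(cond.LastTransitionTime.Time) >= readyGracePeriod
        }
    }
    return false
}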

As for a fix, I'm not sure the best way. A few options:

  1. A configurable "timeout" similar to my workaround, to delay caching
  2. A different way of determining node readiness than the current IsNodeReadyAndSchedulable(node *apiv1.Node) bool implementation (though I'm not sure what that would be)
  3. A change to kubelet to not set the NodeReady condition until after all node labels are registered.

Option 3 seems the most robust, but is definitely the most complicated. I don't even know where to begin. It might also be the root of my issue, because older versions of kubernetes (and thus older kubelet) didn't seem to trigger this issue.

@dany74q

dany74q commented Aug 11, 2021

I've been experiencing similar symptoms to what's described here.

@lsowen - I think the race is a tad more specific. From what I see, at least, it seems like
the cache is populated (and existing entries overridden) from the k8s API server on every autoscaling attempt;

I believe the flow is the following:

  • We take all relevant nodes from the k8s api server - this uses a ListWatcher behind the scenes, which watches for Node changes from the k8s api server, and also resyncs the entire node list every hour;
    as long as the watch operation does not consistently fail, I believe one gets a relatively up-to-date view of the nodes in the cluster on each invocation.

  • With the k8s-supplied nodes at hand, we cache the node info of the first seen node for each cloud-provider node group on each iteration; different invocations might cache info from different nodes within the group, depending on the lister result;
    this means that if you have several nodes within your group, but one of them is off-sync with its labels - it might corrupt the autoscaler view of the entire group.

  • After caching all node infos, we iterate on all node groups from the cloud provider - and then we use the previously populated cached view if such exists; I'd guess this is due to the autoscaler preferring the use of the real-world view of your nodes vs the template generated from the cloud provider, as they may be off-sync.

If the above is correct, then what I believe needs to happen to trigger such a race condition is that the labels are off-sync at the very last time the autoscaler sees a node from the k8s API server and caches its info - only then is the state corrupted for all of the following runs.

If we're operating under the premise that all of your node group's nodes eventually carry all required labels, which are added at runtime - then, as long as there are live nodes, the autoscaler state should be eventually consistent and it should work in one of the next cycles (b/c it does override the cache entries on each cycle);

When it could indeed break, I believe, is when the group scales from 1->0 and the soon-to-be-terminated node has a partial label list - potentially because it's removing labels before termination, or because it's terminating before it's fully provisioned;
In that case, we would cache this partial view one last time before we no longer have any nodes for that node group in the cluster - and we then continue to use this corrupted view endlessly, as the cache entries aren't expiring.
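
To make the "cache entries aren't expiring" point concrete, here is a minimal sketch of what a time-bounded entry could look like (purely illustrative; the actual nodeInfoCache has no such expiry, and entries are only ever overwritten while a live node for the group is still visible):

import (
    "time"

    schedulerframework "k8s.io/kubernetes/pkg/scheduler/framework"
)

// cachedNodeInfo pairs a cached template with the time it was stored.
// This type does not exist upstream; it only illustrates how a view
// cached from a half-provisioned node could age out instead of being
// reused forever once the group has scaled to 0.
type cachedNodeInfo struct {
    info     *schedulerframework.NodeInfo
    cachedAt time.Time
}

func (c cachedNodeInfo) expired(ttl time.Duration, now time.Time) bool {
    return now.Sub(c.cachedAt) > ttl
}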

Would you agree @lsowen ?

@lsowen

lsowen commented Aug 11, 2021

@dany74q I agree that the issue arises when a node group is scaled down to 0 and cannot scale back up, caused by a corruption in the cache of labels that autoscaler is using.

However, at least in my case, the cache that autoscaler holds is populated by the first node in the group as it boots up, not as it is terminating. The issue is that not all labels are applied on the node before it is marked as "ready". If I apply a delay so that autoscaler doesn't see the newly booted node for a bit (in my case I arbitrarily used a 5 minute delay), then the issue goes away. I was having the issue multiple times a day, but with my (badly) patched version, I have not seen the issue once in over 2 months.

patched version: https://github.com/kubernetes/autoscaler/compare/cluster-autoscaler-1.21.0...lsowen:autoscaler-failure-workaround?expand=1

@dany74q

dany74q commented Aug 11, 2021

@lsowen - Thanks! I've seen the patch - the thing I don't fully understand about it, though, is why the continuous overriding of the cache entries does not resolve this on its own after a period of time, if the problematic cache entry is indeed that initial one?

GetNodeInfosForGroups is called on every scale attempt, and from the code it looks like the cache is always overridden with the latest k8s-supplied node object:
https://github.com/lsowen/autoscaler/blob/5f5e0be76c99504cd20b7019c7e3694cfc5ec79d/cluster-autoscaler/core/utils/utils.go#L96-L100

What I would have expected in your case, then, is that once the node had stabilized with all the correct labels,
the cache entry would eventually have been overridden -
and as long as it wasn't cached on its way down with a partial label list, a well-formed node should have been returned from the cache, and not the first partial view.

Do you see a flow in the code in which that first invalid entry would be cached and newer entries would never override it (assuming the node is still up in the next autoscale cycle)?

@lsowen

lsowen commented Aug 11, 2021

@dany74q I believe it is because added is only true once, when the node is initially not found in the cache: https://github.com/lsowen/autoscaler/blob/5f5e0be76c99504cd20b7019c7e3694cfc5ec79d/cluster-autoscaler/core/utils/utils.go#L64-L75

@dany74q

dany74q commented Aug 11, 2021

@lsowen - I thought that might be the case, but the cache is not probed at that point at all; the result there is purely local to the function and is recalculated on every call. Correct me if I'm wrong, of course.

Thanks !

@thpang

thpang commented Oct 27, 2021

Any activity on this one? It's been open for a while and is an issue for folks who apply labels/taints to their node pools. Hoping to see some movement soon.

@draeath

draeath commented Nov 4, 2021

I don't apply taints or labels to my nodegroups and have run into this behavior with kubernetes 1.21 (via AWS EKS) and autoscaler 9.9.2 (which I believe is the right version for 1.21? this is itself still screwy, see #4054). I had to switch from kubernetes.io/arch to beta.kubernetes.io/arch (and likewise for /os) for it to scale up from 0 nodes.

I'm not sure if that's a separate issue given I am not applying any taints or labels. If it isn't a separate problem, it suggests this still is broken.

@thpang

thpang commented Nov 15, 2021

Has anyone been able to determine the root cause or a fix for this issue? We are currently having an issue where a customer using EKS does not see their nodes register correctly once they are scaled up from 0 (zero). Again, taints and labels are used.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 13, 2022
@draeath

draeath commented Feb 13, 2022

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 13, 2022
@smrutiranjantripathy

Has anyone been able to determine the root cause or a fix for this issue? We are currently having an issue where a customer using EKS does not see their nodes register correctly once they are scaled up from 0 (zero). Again, taints and labels are used.

We were able to get around this issue by using ASG tags for the labels, as described here.

@olahouze

Has anyone been able to determine the root cause or a fix for this issue? We are currently having an issue where a customer using EKS does not see their nodes register correctly once they are scaled up from 0 (zero). Again, taints and labels are used.

We were able to get around this issue by using ASG tags for the labels, as described here.

I'm not sure I understand all of the workarounds.

I have a pod with a node selector like this:

nodeSelectorTerms:
  - matchExpressions:
      - key: eks.amazonaws.com/nodegroup
        operator: In
        values:
          - nodegroup-name

So do I need to add one of the following tags to my AWS Auto Scaling group:

"k8s.io/cluster-autoscaler/node-template/label/eks.amazonaws.com/nodegroup" = nodegroup-name
or
"k8s.io/cluster-autoscaler/node-template/label/nodegroup" = nodegroup-name

Best regards

@dev-rowbot

@olahouze - to get this working I needed to add this tag to my AWS Auto Scaling Group:

k8s.io/cluster-autoscaler/node-template/label/nodegroup-type: stateless

Make sure that Tag new instances is ticked as well.

I then set the pod affinity to

affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
            - key: eks.amazonaws.com/capacityType
              operator: In
              values:
                - SPOT
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: nodegroup-type
              operator: In
              values:
                - stateless

The autoscaler picked up the change on the next cycle and scaled up the ASG from 0.

Hope this helps

@olahouze

Hello

Thank you for the answer

With this information, it forces me:

  • To modify all my Helm charts / pod definitions to use a nodeAffinity on the label nodegroup=nodegroup-name (and not eks.amazonaws.com/nodegroup=nodegroup-name)
  • To manually add the label nodegroup=nodegroup-name to my Auto Scaling groups and my instances

The advantage of using eks.amazonaws.com/nodegroup in the nodeAffinity is that AWS adds this label automatically...

Has anyone else already successfully tested using "k8s.io/cluster-autoscaler/node-template/label/eks.amazonaws.com/nodegroup" = nodegroup-name on the Auto Scaling group?

Sincerely

@dev-rowbot

@olahouze - I agree with your thinking; I was also going to update all my Helm charts. One point that I missed is that I also have a label in my eksctl nodegroup that matches the tag I just added. I suspect that the cluster autoscaler needs the tag and the scheduler needs the label:

  - name: ng-2-stateless-spot-1a
    spot: true
    tags:
      k8s.io/cluster-autoscaler/node-template/label/nodegroup-type: stateless
    labels:
      nodegroup-type: stateless
      instance-type: spot

There is also an advanced eksctl cluster example here which uses the cluster-autoscaler tags on nodegroups.

@pkit

pkit commented Jul 14, 2022

I'm not sure why it's never mentioned here but the whole thing seems to be fixed by #5002
And yes, it was not fixed before that.

@scravy

scravy commented Aug 21, 2022

@pkit How does #5002 fix anything? I am running a cluster autoscaler with auto config enabled and I am experiencing exactly the issues described here in this ticket. #5002 does not give a workaround which fixes this, does not patch this, does not reference a pull request... ?

@dschunack
Contributor Author

Hi,

we fixed our issue as follows, and the cluster autoscaler is now able to start new instances based on node selectors.
In our use case we use self-managed ASGs instead of managed node groups; that gives us more flexibility to manage our nodes.

We set the following tags on the ASGs, in this case including a taint.

Tag | Value | Tag new instances
k8s.io/cluster-autoscaler/enabled | true | Yes
k8s.io/cluster-autoscaler/node-template/label/kubernetes.io/arch | amd64 | Yes
k8s.io/cluster-autoscaler/node-template/label/kubernetes.io/os | linux | Yes
k8s.io/cluster-autoscaler/node-template/label/node.kubernetes.io/instance-type | g4dn.2xlarge | Yes
k8s.io/cluster-autoscaler/node-template/label/node.kubernetes.io/lifecycle | on-demand | Yes
k8s.io/cluster-autoscaler/node-template/label/topology.ebs.csi.aws.com/zone | eu-central-1b | Yes
k8s.io/cluster-autoscaler/node-template/label/topology.kubernetes.io/zone | eu-central-1b | Yes
k8s.io/cluster-autoscaler/node-template/taint/gpu | true:NoSchedule | Yes
kubernetes.io/cluster/eks-XXXXXXXXXXXXXX | owned | Yes

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 24, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 19, 2023
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

@k8s-ci-robot k8s-ci-robot closed this as not planned (Won't fix, can't repro, duplicate, stale) on Feb 18, 2023
@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
