Cluster Autoscaler does not start new nodes when Taints and NodeSelector are used in EKS #3802
Comments
Hi, could you provide the labels on your node? I thought the label on your node may be "beta.kubernetes.io/instance-type".
Since 1.17 both labels are present on the nodes (beta and node label).
Thanks for the feedback; TBH I am not familiar with AWS. Could you please check whether the tag "node.kubernetes.io/instance-type" is set in your ASG tags? From my perspective, if you scale from 0, the node template used for prediction is created from the ASG, and the node labels are copied from the ASG tags.
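For context, the AWS cloud provider reads ASG tags with the k8s.io/cluster-autoscaler/node-template/label/ prefix when it builds the scale-from-0 node template, so a tag along these lines on the ASG should surface the stable instance-type label on the simulated node (m5.xlarge is only a placeholder value, not something from this thread):

k8s.io/cluster-autoscaler/node-template/label/node.kubernetes.io/instance-type: m5.xlarge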
Scaling with node.kubernetes.io/instance-type works without taints, including scale-up from 0.
Any news?
I'm having the same issue: cluster-autoscaler fails to start a new node when requesting an instance type which is not yet online, for example when the cluster does not have a large instance type. The cluster-autoscaler logs don't contain anything meaningful; the pod has
Hi, we tested it again today and it looks like the autoscaler is not working correctly with "node.kubernetes.io/instance-type". Today we started a POD with the nginx image to test the autoscaling. Only the NodeSelector is different. Autoscaler version: 1.18.4. Doesn't Work:
Works:
Hi, I think I found the root cause. When scaling from 0, the AWS cloud provider generates the node info from a template (not a real node). When generating it, it forgets to add "node.kubernetes.io/instance-type" to the labels. Check the code here: aws_manager.go
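To make the described gap concrete, here is a minimal sketch in Go (an illustration only, not the actual aws_manager.go code; the function name and values are hypothetical) of how a scale-from-0 template might assemble its generic labels, and where the stable instance-type label has to be added alongside the deprecated beta one:

package main

import "fmt"

// buildTemplateLabels is a simplified stand-in for the label map the AWS cloud
// provider builds when it simulates a node for a scale-from-0 prediction.
// The label keys are real well-known Kubernetes labels; the function itself is hypothetical.
func buildTemplateLabels(instanceType, region, zone, hostname string) map[string]string {
	labels := map[string]string{
		"beta.kubernetes.io/instance-type":         instanceType, // deprecated label older templates carried
		"failure-domain.beta.kubernetes.io/region": region,
		"failure-domain.beta.kubernetes.io/zone":   zone,
		"kubernetes.io/hostname":                   hostname,
	}
	// Stable label that real nodes carry since ~1.17. If the simulated node
	// lacks it, a pod whose nodeSelector uses the stable key never fits the
	// template, so the ASG is not scaled up from 0.
	labels["node.kubernetes.io/instance-type"] = instanceType
	return labels
}

func main() {
	for k, v := range buildTemplateLabels("m5.xlarge", "eu-central-1", "eu-central-1a", "template-node") {
		fmt.Printf("%s=%s\n", k, v)
	}
}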
Hi, yes, we have the same feeling that the autoscaler forgets the "node.kubernetes.io" labels, but not immediately.
Hi, some PRs were added to integrate the stable API, which is very nice, thanks.
I had an issue with zero-instance ASGs and a nodeSelector not targeting the correct node labels (#4010), also on EKS.
I'm seeing something similar, but I'm not using any
Though this is the ASG which should scale up. Restarting the cluster-autoscaler "resolves" the issue (but is not a real solution, as this requires restarting the autoscaler every day at random times).
I have continued to experience this issue, and have tracked down part of it. In the loop where it checks the node groups, it looks for a cached definition in autoscaler/cluster-autoscaler/core/utils/utils.go, lines 103 to 110 in 79a43df.
For the groups which do have issues, the results are being returned from that cache. Labels from a "correct" group (which does autoscale up from 0):
Labels from an "incorrect" group (which does not autoscale up from 0 since it is missing the
My guess is that the node is still "booting" when the info is cached, so not all labels have been added to the data which is permanently cached. Possibly in autoscaler/cluster-autoscaler/core/utils/utils.go, lines 80 to 94 in 79a43df.
Restarting the cluster-autoscaler pod allows it to refresh all data from AWS, at which point the correct node groups are scaled up for the existing pending pods. Then, at some point in the next 24 or so hours, one or more groups will stop scaling properly (which of our 10 or so groups start failing seems to be random).
I think I have confirmed that my hypothesis in #3802 (comment) is correct. I've deployed a patched version with a workaround (not a fix), which has prevented the issue from re-occurring. Basically, it waits 5 minutes after a node is "ready" before caching the info about the node, which includes the labels. This prevents instance groups from being cached with missing labels. As for a fix, I'm not sure of the best way. A few options:
Option 3 seems the most robust, but is definitely the most complicated; I don't even know where to begin. It might also be the root of my issue, because older versions of kubernetes (and thus older kubelet) didn't seem to trigger this issue.
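As an illustration of the workaround described above (a sketch only, not the actual cluster-autoscaler code or the linked patch; readyLongEnough and the 5-minute threshold are assumptions mirroring the comment), the idea is to refuse to use a node as a template source until its Ready condition has been true for a while, so late-arriving labels are not frozen into the cache:

package main

import (
	"fmt"
	"time"

	apiv1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// readyLongEnough reports whether the node's Ready condition has been True for
// at least minReadyAge. Skipping younger nodes when caching a node group's
// template avoids snapshotting a node before the kubelet / cloud controller
// has finished attaching labels.
func readyLongEnough(node *apiv1.Node, now time.Time, minReadyAge time.Duration) bool {
	for _, cond := range node.Status.Conditions {
		if cond.Type == apiv1.NodeReady && cond.Status == apiv1.ConditionTrue {
			return now.Sub(cond.LastTransitionTime.Time) >= minReadyAge
		}
	}
	return false
}

func main() {
	node := &apiv1.Node{
		Status: apiv1.NodeStatus{
			Conditions: []apiv1.NodeCondition{{
				Type:               apiv1.NodeReady,
				Status:             apiv1.ConditionTrue,
				LastTransitionTime: metav1.NewTime(time.Now().Add(-2 * time.Minute)),
			}},
		},
	}
	// Ready for only 2 minutes, so it would not yet be cached as the template.
	fmt.Println(readyLongEnough(node, time.Now(), 5*time.Minute)) // prints "false"
}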
I've been experiencing similar symptoms to what's described here. @lsowen - I think the race is a tad bit more specific - from what I see at least, it seems like
I believe the flow is the following:
If the above is correct, then what I believe needs to happen in order to trigger such a race condition is that the labels were out of sync the last time the autoscaler saw a node of the group via the k8s API server and cached its info; only then would the state be corrupted for the entirety of the next runs. If we're operating under the premise that all of your node group's nodes eventually do carry all required labels, which are added at runtime, then, as long as there are live nodes, the autoscaler state should be eventually consistent and it should work in one of the next cycles (because it does override the cache entries on each cycle). When it could indeed break, I believe, is when the group scales from 1->0 and the soon-to-be-terminated node has a partial label list - potentially because it's removing labels before termination, or because it's terminating before it's fully provisioned. Would you agree @lsowen ?
@dany74q I agree that the issue arises when a node group is scaled down to 0 and cannot scale back up, caused by a corruption in the cache of labels that the autoscaler is using. However, at least in my case, the cache that the autoscaler holds is populated by the first node in the group as it boots up, not as it is terminating. The issue is that not all labels are applied to the node before it is marked as "ready". If I apply a delay so that the autoscaler doesn't see the newly booted node for a bit (in my case I arbitrarily used a 5-minute delay), then the issue goes away. I was having the issue multiple times a day, but with my (badly) patched version, I have not seen the issue once in over 2 months. Patched version: https://github.com/kubernetes/autoscaler/compare/cluster-autoscaler-1.21.0...lsowen:autoscaler-failure-workaround?expand=1
@lsowen - Thanks ! I've seen the patch - the thing I don't fully understand about it though, is why the continuous overriding of the cache entries does not resolve this on its own after a period of time, if indeed the problematic cache entry is that initial one ?
What I would've expected in your case, then, is that once the Node had stabilized with all correct labels,
Do you see a flow in the code in which that first invalid entry would've been cached - and newer entries never overriding it (in case it's still up in the next autoscale cycle)?
@dany74q I believe it is because |
@lsowen - I thought that might be the case, but the cache is not probed at that point at all, the
Thanks !
Any activity on this one? Been open for a while and is an issue for folks who apply labels/taints to their node pools. Hoping to see some movement soon.
I don't apply taints or labels to my nodegroups and have run into this behavior with kubernetes 1.21 (via AWS EKS) and autoscaler 9.9.2 (which I believe is the right version for 1.21? this is itself still screwy, see #4054). I had to switch from
I'm not sure if that's a separate issue given I am not applying any taints or labels. If it isn't a separate problem, it suggests this is still broken.
Has anyone been able to determine the root cause or a fix for this issue? We are currently having an issue where a customer using EKS does not see their nodes register correctly once they are scaled up from 0 (zero). Again, taints and labels are used.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale
/remove-lifecycle stale
We are able to get around this issue by using label tags as described here.
I'm not sure I understand all of the workarounds. I have a POD with a node selector like this:

nodeSelectorTerms:
  - matchExpressions:
      - key: eks.amazonaws.com/nodegroup
        operator: In
        values:
          - nodegroup-name

So I need to add the following tag to my AWS Auto Scaling group: "k8s.io/cluster-autoscaler/node-template/label/eks.amazonaws.com/nodegroup" = nodegroup-name
Best regards
@olahouze - to get this working I needed to add this tag to my AWS Autoscaling Group:

k8s.io/cluster-autoscaler/node-template/label/nodegroup-type: stateless

Make sure that
I then set the pod affinity to:

affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
            - key: eks.amazonaws.com/capacityType
              operator: In
              values:
                - SPOT
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: nodegroup-type
              operator: In
              values:
                - stateless

The autoscaler picked up the change on the next cycle and scaled up the ASG from 0. Hope this helps.
Hello, thank you for the answer. With this information it forces me
The advantage of using eks.amazonaws.com/nodegroup in nodeAffinity is that AWS adds this label on its own, automatically... Have other people already successfully tested using "k8s.io/cluster-autoscaler/node-template/label/eks.amazonaws.com/nodegroup" = nodegroup-name on the Auto Scaling group? Sincerely
@olahouze - I agree with your thinking, I was also going to update all my helm charts. One point that I missed is that I also have a label in my eksctl nodegroup that matches the tag I just added. I suspect that cluster autoscaler will need the

- name: ng-2-stateless-spot-1a
  spot: true
  tags:
    k8s.io/cluster-autoscaler/node-template/label/nodegroup-type: stateless
  labels:
    nodegroup-type: stateless
    instance-type: spot

There is also an advanced eksctl cluster example here which uses the
I'm not sure why it's never mentioned here, but the whole thing seems to be fixed by #5002.
Hi, we fixed our issues as follows, and the cluster autoscaler is now able to start new instances based on node selectors. We set the following tags on the ASGs, in this case with a taint.
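For reference, the ASG tag convention documented by the AWS cloud provider covers both labels and taints via the node-template prefix; the keys and values below are placeholders, not necessarily the exact tags used above:

k8s.io/cluster-autoscaler/node-template/label/nodegroup-type: stateless
k8s.io/cluster-autoscaler/node-template/taint/dedicated: true:NoSchedule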
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned". In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Hi,
We use EKS with kubernetes 1.18 and the Cluster Autoscaler. Since kubernetes 1.17 the "beta.kubernetes.io/instance-type" label is deprecated, so we use the new "node.kubernetes.io/instance-type" label as NodeSelector instead. This works for autoscaling groups without taints. For the autoscaling groups with taints, the new "node.kubernetes.io/instance-type" selector does not work and the cluster autoscaler doesn't start new nodes. If we switch back to the old, deprecated "beta.kubernetes.io/instance-type" selector, the cluster autoscaler starts a new node. We see this behavior on all of our EKS clusters.
Events output for both Test PODs with beta and node.kubernetes.io as NodeSelector.
POD with node.kubernetes.io selector was started first.
Which component are you using?: cluster-autoscaler
What version of the component are you using?: cluster-autoscaler release v1.18.3
What k8s version are you using (kubectl version)?: 1.18.9
What did you expect to happen?: Cluster-Autoscaler starts new nodes
What happened instead?: Cluster-Autoscaler doesn't start new nodes. See the following error.
How to reproduce it (as minimally and precisely as possible):
We use the following POD template to test the cluster-autoscaler.
Is Working:
Is not Working:
Taints and tags are configured on the ASG and also in the kubelet configuration.
See Screenshot