Cluster Autoscaler doesn't scale ASG from 0 unless manually scaled first (and afterward works as expected). #5006
Comments
@ZTGallagher - have a look at my comment on #4998 - you are probably experiencing the same issue.
We also got these errors when the size of the node group was initially 0.
We're using eks_managed_node_groups and we (finally) got it working by setting the scaling options like this
The important bit is that the initial desired_size is 1 instead of 0. That lets the side effect occur that makes the node group eligible for the autoscaler. It's more or less like manually setting the desired size to 1 after creation. Not the greatest workaround.
We're also having the exact same issue.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale
/lifecycle rotten
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned". In response to this:
If we have scaled up the ASG at least once while the Cluster Autoscaler is installed, it can from then on scale the deployment to and from 0.
But if the ASG has never been scaled up, it won't work.
Here are the IAM role permissions set for the Cluster Autoscaler.
Tags on the ASG.
The tags match the label and GPU resource the deployment looks for.
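For anyone comparing their own setup, here is a minimal sketch of how the expected scale-from-zero tags could be derived and checked. It assumes the `k8s.io/cluster-autoscaler/node-template/...` tag prefixes described in the Cluster Autoscaler AWS provider README. The helper names and the `workload: gpu` / `nvidia.com/gpu` values are hypothetical, not the ones from this cluster.

```python
# Tag prefixes assumed from the Cluster Autoscaler AWS provider README
# (node templates used when an ASG has no running nodes to inspect).
LABEL_PREFIX = "k8s.io/cluster-autoscaler/node-template/label/"
RESOURCE_PREFIX = "k8s.io/cluster-autoscaler/node-template/resources/"


def node_template_tags(labels, resources):
    """Build the ASG tags that advertise node labels and resources for an empty group."""
    tags = {}
    for key, value in labels.items():
        tags[LABEL_PREFIX + key] = value
    for name, quantity in resources.items():
        # ASG tag values are strings, so quantities are stringified.
        tags[RESOURCE_PREFIX + name] = str(quantity)
    return tags


def missing_tags(expected, actual):
    """Return the expected tags that are absent from, or differ on, the ASG."""
    return {k: v for k, v in expected.items() if actual.get(k) != v}


# Hypothetical label and GPU count, for illustration only.
expected = node_template_tags(
    labels={"workload": "gpu"},
    resources={"nvidia.com/gpu": 1},
)
```

Passing the ASG's actual tags as a plain dict to `missing_tags(expected, actual)` should return an empty dict when everything lines up. A non-empty result would point at a tag mismatch rather than the caching behaviour described here.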
I don't know how Cluster Autoscaler caches which ASG applies to which labels. But it works after at least one manual scale-up. That's not really a viable solution though. The whole deployment would be essentially hands-off without this issue.