
AWS AZ Rebalancing Unexpectedly Terminates Nodes #1744

Closed
cablespaghetti opened this issue Mar 1, 2019 · 5 comments
Labels: area/cluster-autoscaler, area/provider/aws, kind/documentation

Comments

@cablespaghetti
Contributor

Not strictly speaking an autoscaler issue, but I will raise a PR to at least mention this in the docs if nothing else.

We couldn't work out why some of our nodes were being terminated unexpectedly, without being drained first. Our apps need a clean SIGTERM so they can shut down gracefully, or things get a bit wobbly.

I've now traced it to a feature of AWS Auto Scaling Groups which is designed to keep an equal number of instances in each AZ. It automatically spins up a new instance in the AZ with fewer instances and terminates an instance in the AZ with too many, which is of course not very helpful in our use case. https://docs.aws.amazon.com/autoscaling/ec2/userguide/auto-scaling-benefits.html#arch-AutoScalingMultiAZ

Here are the docs on how to disable, or "suspend", this feature for your Auto Scaling Group, which I imagine anyone using this autoscaler on AWS will probably want to do: https://docs.aws.amazon.com/autoscaling/ec2/userguide/as-suspend-resume-processes.html

@bskiba
Member

bskiba commented Mar 4, 2019

Hi there. Is this connected with the open source Cluster Autoscaler in any way or is it only a problem with AWS Autoscaling Groups?

@aleksandra-malinowska aleksandra-malinowska added area/provider/aws Issues or PRs related to aws provider sig/aws labels Mar 4, 2019
@cablespaghetti
Contributor Author

cablespaghetti commented Mar 4, 2019

In my eyes it should be in the documentation for this project, so the information is easily discoverable. I spent a while trying to work out why the K8s autoscaler was behaving oddly and not draining my nodes, when it was actually a side effect of the cluster autoscaler making my cluster "unbalanced" as it scaled down, and AWS trying to rectify that.

@bskiba
Member

bskiba commented Mar 4, 2019

Makes sense. Looks like this could go here as an AWS specific gotcha: https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/aws/README.md#common-notes-and-gotchas
Would you be interested in contributing?

@bskiba bskiba added area/cluster-autoscaler kind/documentation Categorizes issue or PR as related to documentation. labels Mar 4, 2019
@cablespaghetti
Contributor Author

cablespaghetti commented Mar 4, 2019 via email

@mtsr

mtsr commented Mar 11, 2019

Has anyone tried whether ASG notifications work normally for rebalancing? It's relatively easy to set up an SNS notification and a Lambda to trigger draining for ECS, and I imagine the same should be doable for K8s.

See this blog post: https://aws.amazon.com/de/blogs/compute/how-to-automate-container-instance-draining-in-amazon-ecs/
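As a rough sketch of the Lambda side of that idea: the handler below parses an Auto Scaling termination notification delivered via SNS and extracts the instance ID. The field names (`Event`, `EC2InstanceId`) follow the EC2 Auto Scaling notification format; the drain step itself is a placeholder, and in practice you would likely pair this with a termination lifecycle hook so the instance waits in `Terminating:Wait` while the node drains.

```python
# Sketch of a Lambda handler reacting to ASG termination notifications via
# SNS. The drain step is a placeholder; a real implementation would map the
# instance ID to a Kubernetes node and cordon/evict its pods before the
# instance goes away (ideally gated by a termination lifecycle hook).
import json


def handler(event, context=None):
    """Return the instance IDs from any termination events in the SNS batch."""
    terminated = []
    for record in event["Records"]:
        message = json.loads(record["Sns"]["Message"])
        if message.get("Event") == "autoscaling:EC2_INSTANCE_TERMINATE":
            instance_id = message["EC2InstanceId"]
            # Placeholder: look up the node name for instance_id and drain it.
            terminated.append(instance_id)
    return terminated
```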

yaroslava-serdiuk pushed a commit to yaroslava-serdiuk/autoscaler that referenced this issue Feb 22, 2024
4 participants