AWS AZ Rebalancing Unexpectedly Terminates Nodes #1744
Hi there. Is this connected with the open source Cluster Autoscaler in any way, or is it only a problem with AWS Auto Scaling Groups?
In my eyes it should be in the documentation for this project, so the information is easily discoverable. I spent a while trying to work out why the K8s autoscaler was behaving oddly and not draining my nodes, when it was actually a side effect of the Cluster Autoscaler making my cluster "unbalanced" as it scaled down, and AWS trying to rectify that.
Makes sense. Looks like this could go here as an AWS-specific gotcha: https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/aws/README.md#common-notes-and-gotchas
Would you be interested in contributing?
I agree. I'll make a PR shortly.
Has anyone tried whether ASG Notifications work normally for rebalancing? It's relatively easy to set up an SNS notification and a Lambda to trigger draining for ECS, and I imagine the same should be doable for K8s. See this blog post: https://aws.amazon.com/de/blogs/compute/how-to-automate-container-instance-draining-in-amazon-ecs/
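For reference, the approach in that blog post is built on ASG lifecycle hooks rather than plain notifications. A minimal sketch of registering such a hook with boto3; the group name, SNS topic ARN, and IAM role ARN are hypothetical placeholders, and the drain Lambda subscribed to the topic is left out:

```python
import boto3

autoscaling = boto3.client("autoscaling")

# Ask the ASG to pause before terminating an instance and publish a
# lifecycle notification, giving a drain Lambda time to cordon and
# drain the node first. All names and ARNs below are placeholders.
autoscaling.put_lifecycle_hook(
    LifecycleHookName="drain-before-terminate",
    AutoScalingGroupName="my-k8s-node-group",
    LifecycleTransition="autoscaling:EC2_INSTANCE_TERMINATING",
    NotificationTargetARN="arn:aws:sns:eu-west-1:111122223333:node-drain",
    RoleARN="arn:aws:iam::111122223333:role/asg-lifecycle-notify",
    HeartbeatTimeout=300,       # seconds the instance waits in Terminating:Wait
    DefaultResult="CONTINUE",   # proceed with termination if the hook times out
)
```

The Lambda would then drain the node and call `complete_lifecycle_action` to let the termination proceed. Whether AZ rebalancing actually fires this hook would need verifying.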
Not strictly speaking an autoscaler issue, but I will raise a PR to at least mention this in the docs if nothing else.
We couldn't work out why some of our nodes were being terminated without warning and without being drained first. Our apps need a clean SIGTERM and a chance to shut down gracefully, or things get a bit wobbly.
I've now traced it to a feature of AWS Auto Scaling Groups which is designed to keep the number of instances balanced across Availability Zones. It will automatically spin up a new instance in the AZ with fewer instances and terminate an instance in the AZ with too many, which is of course not very helpful in our use case. https://docs.aws.amazon.com/autoscaling/ec2/userguide/auto-scaling-benefits.html#arch-AutoScalingMultiAZ
Here are the docs on how to disable or "suspend" this feature for your Auto Scaling group, which I imagine anyone using this autoscaler on AWS will probably want to do: https://docs.aws.amazon.com/autoscaling/ec2/userguide/as-suspend-resume-processes.html
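To make the suspension concrete, a minimal sketch using boto3; the group name is a hypothetical placeholder, while `AZRebalance` is the process name from the AWS docs linked above:

```python
import boto3

autoscaling = boto3.client("autoscaling")

# Suspend only the AZRebalance process so the ASG stops terminating
# instances to rebalance across AZs; scaling itself keeps working.
autoscaling.suspend_processes(
    AutoScalingGroupName="my-k8s-node-group",  # hypothetical placeholder
    ScalingProcesses=["AZRebalance"],
)
```

Suspending just this one process (rather than all of them) leaves the Cluster Autoscaler's normal scale-up/scale-down behaviour untouched.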