Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve AWS "gotchas" list and discoverability of cloud-provider READMEs. #1746

Merged
merged 5 commits into from
Mar 5, 2019
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Prev Previous commit
Next Next commit
Update multi-az gotcha to represent recommended practice
Signed-off-by: Sam Weston <weston.sam@gmail.com>
cablespaghetti committed Mar 5, 2019
commit f61c8d0ba43a0241ac57dd8046e1661293979248
5 changes: 1 addition & 4 deletions cluster-autoscaler/cloudprovider/aws/README.md
Original file line number Diff line number Diff line change
@@ -143,10 +143,7 @@ If you'd like to scale node groups from 0, an `autoscaling:DescribeLaunchConfigu

## Common Notes and Gotchas:
- The `/etc/ssl/certs/ca-certificates.crt` should exist by default on your ec2 instance. If you use Amazon Linux 2 (EKS worker node AMI by default), use `/etc/kubernetes/pki/ca.crt` instead for the volume hostPath of your cluster-autoscaler manifest.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: "instead of the volume hostPath of your cluster-autoscaler manifest"

- Cluster autoscaler is not zone aware (for now), so if you wish to span multiple availability zones in your autoscaling groups beware that cluster autoscaler will not evenly distribute them.
* This is not a problem for all use cases, as AWS will try to evenly distribute the nodes in your ASG on scale up, but you should ["Suspend" the "AZRebalance" process](https://docs.aws.amazon.com/autoscaling/ec2/userguide/as-suspend-resume-processes.html) if you want to avoid AWS unexpectedly terminating your nodes to try and keep your ASG balanced.
* More information on how this works under the hood in the cluster autoscaler is [available here](https://github.com/kubernetes/contrib/pull/1552#discussion_r75532949).
* You may also want to look at having multiple autoscaling groups (one for each availability zone) and using [--balance-similar-node-groups](../../FAQ.md#im-running-cluster-with-nodes-in-multiple-zones-for-ha-purposes-is-that-supported-by-cluster-autoscaler).
- Cluster autoscaler does not support Auto Scaling Groups which span multiple Availability Zones; instead you should use an Auto Scaling Group for each Availability Zone and enable the [--balance-similar-node-groups](../../FAQ.md#im-running-cluster-with-nodes-in-multiple-zones-for-ha-purposes-is-that-supported-by-cluster-autoscaler) feature. If you do use a single Auto Scaling Group that spans multiple Availability Zones you will find that AWS unexpectedly terminates nodes without them being drained because of the [rebalancing feature](https://docs.aws.amazon.com/autoscaling/ec2/userguide/auto-scaling-benefits.html#arch-AutoScalingMultiAZ).
- By default, cluster autoscaler will not terminate nodes running pods in the kube-system namespace. You can override this default behaviour by passing in the `--skip-nodes-with-system-pods=false` flag.
- By default, cluster autoscaler will wait 10 minutes between scale down operations, you can adjust this using the `--scale-down-delay-after-add`, `--scale-down-delay-after-delete`, and `--scale-down-delay-after-failure` flag. E.g. `--scale-down-delay-after-add=5m` to decrease the scale down delay to 5 minutes after a node has been added.
- If you're running multiple ASGs, the `--expander` flag supports three options: `random`, `most-pods` and `least-waste`. `random` will expand a random ASG on scale up. `most-pods` will scale up the ASG that will schedule the most amount of pods. `least-waste` will expand the ASG that will waste the least amount of CPU/MEM resources. In the event of a tie, cluster autoscaler will fall back to `random`.