
Commit

Fix memory limit
The Cluster Autoscaler has a known bug (kubernetes/autoscaler#3506): the container consumes more memory than its configured limit.

This fix prevents OOMKill errors in the cluster-autoscaler container.
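
One way to confirm the problem on a running cluster is to check the last termination reason of the cluster-autoscaler pods. A minimal sketch, assuming the pods carry the `app=cluster-autoscaler` label from the standard deployment manifest (adjust the selector if yours differs):

```
# Prints "OOMKilled" if the container was previously killed for exceeding its
# memory limit; empty output means it has not been terminated so far.
# Assumes the pods are labeled app=cluster-autoscaler.
kubectl -n kube-system get pods -l app=cluster-autoscaler \
  -o jsonpath='{.items[*].status.containerStatuses[*].lastState.terminated.reason}'
```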
georgio-sd authored Jul 23, 2021
1 parent 6998257 commit eb87379
Showing 1 changed file with 9 additions and 2 deletions: doc_source/cluster-autoscaler.md
@@ -152,7 +152,14 @@ Complete the following steps to deploy the Cluster Autoscaler\. We recommend tha
-n kube-system \
-p '{"spec":{"template":{"metadata":{"annotations":{"cluster-autoscaler.kubernetes.io/safe-to-evict": "false"}}}}}'
```

1. Patch the deployment to increase the container memory limit and avoid `OOMKill` errors in the future\.

```
kubectl patch deployment cluster-autoscaler \
-n kube-system \
-p '{"spec":{"template":{"spec":{"containers":[{"name":"cluster-autoscaler","resources":{"limits":{"memory":"1000Mi"}}}]}}}}'
```
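
To confirm that the patch took effect, you can read the limit back from the deployment\. This is a quick check that assumes the `cluster-autoscaler` container is the first container in the pod template, as it is in the standard manifest\.

```
# Prints the configured memory limit; expect 1000Mi after the patch above.
# Assumes cluster-autoscaler is the first container in the pod template.
kubectl -n kube-system get deployment cluster-autoscaler \
  -o jsonpath='{.spec.template.spec.containers[0].resources.limits.memory}'
```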

1. Edit the Cluster Autoscaler deployment with the following command\.

```
@@ -387,4 +394,4 @@ There are other benefits to overprovisioning\. Without overprovisioning, pods in
It's important to choose an appropriate amount of overprovisioned capacity\. One way to choose an appropriate amount is to divide the time it takes to provision a new node by the average interval between your scaleups\. For example, if, on average, you require a new node every 30 seconds and Amazon EC2 takes 30 seconds to provision a new node, a single node of overprovisioning ensures that there’s always an extra node available\. Doing this can reduce scheduling latency by 30 seconds at the cost of a single additional Amazon EC2 instance\. To make better zonal scheduling decisions, you can also overprovision the number of nodes to be the same as the number of Availability Zones in your Amazon EC2 Auto Scaling group\. Doing this ensures that the scheduler can select the best zone for incoming pods\.

**Prevent scale down eviction**
Some workloads are expensive to evict\. Big data analysis, machine learning tasks, and test runners can take a long time to complete and must be restarted if they're interrupted\. The Cluster Autoscaler can scale down any node whose utilization falls below the `scale-down-utilization-threshold`, which interrupts any remaining pods on the node\. You can prevent this from happening by making sure that pods that are expensive to evict carry an annotation that the Cluster Autoscaler recognizes\. To do this, give those pods the `cluster-autoscaler.kubernetes.io/safe-to-evict=false` annotation\.
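
For an existing pod that's expensive to evict, you can set the annotation directly with `kubectl annotate`\. The following is a sketch with a placeholder pod name (`my-batch-job`); for new workloads, add the annotation to the pod template in your manifest instead\.

```
# Placeholder pod name; replace my-batch-job with the pod you want to protect.
# The annotation tells the Cluster Autoscaler not to evict this pod, which
# prevents its node from being scaled down while the pod is running.
kubectl annotate pod my-batch-job \
  cluster-autoscaler.kubernetes.io/safe-to-evict=false
```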
