
Commit

Fix memory limit
The Cluster Autoscaler has a known bug (kubernetes/autoscaler#3506): the container consumes more memory than its configured limit.

This fix prevents OOMKill errors in the cluster-autoscaler container.
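
One way to confirm the problem on a running cluster is to check the last termination reason of the cluster-autoscaler pods. A minimal sketch, assuming the pods carry the `app=cluster-autoscaler` label from the standard deployment manifest (adjust the selector if yours differs):

```
# Prints "OOMKilled" if the container was previously killed for exceeding its
# memory limit; empty output means it has not been terminated so far.
# Assumes the pods are labeled app=cluster-autoscaler.
kubectl -n kube-system get pods -l app=cluster-autoscaler \
  -o jsonpath='{.items[*].status.containerStatuses[*].lastState.terminated.reason}'
```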
georgio-sd authored Jul 23, 2021
1 parent 6998257 commit eb87379
Showing 1 changed file with 9 additions and 2 deletions: doc_source/cluster-autoscaler.md
@@ -152,7 +152,14 @@ Complete the following steps to deploy the Cluster Autoscaler\. We recommend tha
-n kube-system \
-p '{"spec":{"template":{"metadata":{"annotations":{"cluster-autoscaler.kubernetes.io/safe-to-evict": "false"}}}}}'
```

1. Patch the deployment to increase the container memory limit and avoid `OOMKill` errors in the future\.

```
kubectl patch deployment cluster-autoscaler \
-n kube-system \
-p '{"spec":{"template":{"spec":{"containers":[{"name":"cluster-autoscaler","resources":{"limits":{"memory":"1000Mi"}}}]}}}}'
```
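
To confirm that the patch took effect, you can read the limit back from the deployment\. This is a quick check that assumes the `cluster-autoscaler` container is the first container in the pod template, as it is in the standard manifest\.

```
# Prints the configured memory limit; expect 1000Mi after the patch above.
# Assumes cluster-autoscaler is the first container in the pod template.
kubectl -n kube-system get deployment cluster-autoscaler \
  -o jsonpath='{.spec.template.spec.containers[0].resources.limits.memory}'
```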

1. Edit the Cluster Autoscaler deployment with the following command\.

```
@@ -387,4 +394,4 @@ There are other benefits to overprovisioning\. Without overprovisioning, pods in
It's important to choose an appropriate amount of overprovisioned capacity\. One way to choose an appropriate amount is to divide the time it takes to provision a new node by the average interval between your scaleups\. For example, if, on average, you require a new node every 30 seconds and Amazon EC2 takes 30 seconds to provision a new node, a single node of overprovisioning ensures that there’s always an extra node available\. Doing this can reduce scheduling latency by 30 seconds at the cost of a single additional Amazon EC2 instance\. To make better zonal scheduling decisions, you can also overprovision the number of nodes to be the same as the number of Availability Zones in your Amazon EC2 Auto Scaling group\. Doing this ensures that the scheduler can select the best zone for incoming pods\.

**Prevent scale down eviction**
Some workloads are expensive to evict\. Big data analysis, machine learning tasks, and test runners can take a long time to complete and must be restarted if they're interrupted\. The Cluster Autoscaler can scale down any node whose utilization falls below the `scale-down-utilization-threshold`, which interrupts any remaining pods on the node\. You can prevent this from happening by making sure that pods that are expensive to evict carry an annotation that the Cluster Autoscaler recognizes\. To do this, give those pods the `cluster-autoscaler.kubernetes.io/safe-to-evict=false` annotation\.
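
For an existing pod that's expensive to evict, you can set the annotation directly with `kubectl annotate`\. The following is a sketch with a placeholder pod name (`my-batch-job`); for new workloads, add the annotation to the pod template in your manifest instead\.

```
# Placeholder pod name; replace my-batch-job with the pod you want to protect.
# The annotation tells the Cluster Autoscaler not to evict this pod, which
# prevents its node from being scaled down while the pod is running.
kubectl annotate pod my-batch-job \
  cluster-autoscaler.kubernetes.io/safe-to-evict=false
```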
