From f1b25ffe921fca107e0f44e632091db372f5ceeb Mon Sep 17 00:00:00 2001
From: Yaroslava Serdiuk
Date: Thu, 1 Aug 2024 12:10:58 +0000
Subject: [PATCH] Add documentation regarding best-effort-atomic-scale-up ProvReq class

---
 cluster-autoscaler/FAQ.md | 31 ++++++++++++++++++++++++-------
 1 file changed, 24 insertions(+), 7 deletions(-)

diff --git a/cluster-autoscaler/FAQ.md b/cluster-autoscaler/FAQ.md
index 52d7c75f3d5c..d6c9c95a78b9 100644
--- a/cluster-autoscaler/FAQ.md
+++ b/cluster-autoscaler/FAQ.md
@@ -275,6 +275,7 @@ __However, if the substantial number of nodes are tainted with `startup taints`
 might stop working as it might assume the cluster is broken and should not be scaled (creating new nodes doesn't help as they don't become ready).__
 
 Startup taints are defined as:
+
 * all taints with the prefix `startup-taint.cluster-autoscaler.kubernetes.io/`,
 * all taints defined using `--startup-taint` flag.
 
@@ -287,6 +288,7 @@ Cluster Autoscaler internally treats nodes tainted with `status taints` as ready
 This means that even though the node is ready, no pods should run there as long as the node is tainted and if necessary a scale-up should occur.
 
 Status taints are defined as:
+
 * all taints with the prefix `status-taint.cluster-autoscaler.kubernetes.io/`,
 * all taints defined using `--status-taint` flag.
 
@@ -295,6 +297,7 @@ Status taints are defined as:
 Ignore taints are now deprecated and treated as startup taints.
 
 Ignore taints are defined as:
+
 * all taints with the prefix `ignore-taint.cluster-autoscaler.kubernetes.io/`,
 * all taints defined using `--ignore-taint` flag.
 
@@ -553,6 +556,7 @@ This annotation has no effect on pods that are not a part of any DaemonSet.
 
 Kubernetes scheduler will fail to schedule a Pod to a Node if the Node's max volume count is exceeded.
 In such case to enable Cluster Autoscaler to scale up in a Kubernetes cluster with [CSI migration](https://github.com/kubernetes/enhancements/blob/master/keps/sig-storage/625-csi-migration/README.md) enabled, the appropriate CSI related feature gates have to be specified for the Cluster Autoscaler (if the corresponding feature gates are not enabled by default). For example:
+
 ```
 --feature-gates=CSIMigration=true,CSIMigration{Provdider}=true,InTreePlugin{Provider}Unregister=true
 ```
@@ -566,27 +570,41 @@ Provisioning Request (abbr. ProvReq) is a new namespaced Custom Resource that ai
 
 #### Enabling ProvisioningRequest Support
 
-1. **Cluster Autoscaler Version**: Ensure you are using Cluster Autoscaler version 1.30.1 or later.
+1. __Cluster Autoscaler Version__: Ensure you are using Cluster Autoscaler version 1.30.1 or later.
 
-2. **Feature Flag**: Enable ProvisioningRequest support by setting the following flag in your Cluster Autoscaler configuration:
+2. __Feature Flag__: Enable ProvisioningRequest support by setting the following flag in your Cluster Autoscaler configuration:
    --enable-provisioning-reques=true.
 
 #### Supported ProvisioningClasses
 
 Currently, ClusterAutoscaler supports following ProvisioningClasses:
 
-* `check-capacity.autoscaling.x-k8s.io`. 
+* `check-capacity.autoscaling.x-k8s.io`.
 When using this class, Cluster Autoscaler performs following actions:
 
-  * **Capacity Check**: Determines if sufficient capacity exists in the cluster to fulfill the ProvisioningRequest.
+  * __Capacity Check__: Determines if sufficient capacity exists in the cluster to fulfill the ProvisioningRequest.
 
-  * **Reservation from other ProvReqs** (if capacity is available): Reserves this capacity for the ProvisioningRequest for 10 minutes, preventing other ProvReqs from using it.
+  * __Reservation from other ProvReqs__ (if capacity is available): Reserves this capacity for the ProvisioningRequest for 10 minutes, preventing other ProvReqs from using it.
 
-  * **Condition Updates**:
+  * __Condition Updates__:
     Adds a Accepted=True condition when ProvReq is accepted by ClusterAutoscaler and ClusterAutoscaler will check capacity for this ProvReq.
     Adds a Provisioned=True condition to the ProvReq if capacity is available.
     Adds a BookingExpired=True condition when the 10-minute reservation period expires.
 
+* `best-effort-atomic-scale-up.autoscaling.x-k8s.io` (supported in Cluster Autoscaler version 1.30.2 or later).
+When using this class, Cluster Autoscaler performs the following actions:
+
+  * __Capacity Check__: Checks which pods could be scheduled on existing capacity.
+
+  * __ScaleUp Request__: Evaluates whether scaling up a node group could fulfill the remaining requirements of the ProvisioningRequest.
+
+  * __Reservation from other ProvReqs (if the scale-up request succeeded)__: Reserves this capacity for the ProvisioningRequest for 10 minutes, preventing other ProvReqs from using it.
+
+  * __Condition Updates__:
+    Adds an Accepted=True condition when the ProvReq is accepted by Cluster Autoscaler.
+    Adds a Provisioned=True condition to the ProvReq if the node group scale-up request is successful.
+    Adds a BookingExpired=True condition when the 10-minute reservation period expires.
+
 ****************
 
 # Internals
@@ -1058,7 +1076,6 @@ $ kubectl describe pods memory-reservation-73rl0 --namespace e2e-tests-autoscali
 Name: memory-reservation-73rl0
 ...
 
-
 Events:
   FirstSeen LastSeen Count From SubObjectPath Type Reason Message
   --------- -------- ----- ---- ------------- -------- ------ -------
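For reviewers, here is a minimal sketch of a ProvisioningRequest that would exercise the newly documented class. It assumes the `autoscaling.x-k8s.io/v1beta1` ProvisioningRequest API served by Cluster Autoscaler 1.30.x, and that ProvisioningRequest support is enabled as the FAQ describes. The object names (`provreq-template`, `provreq`), the container image, the resource requests, and the pod count are hypothetical placeholders. The ProvReq references an ordinary `PodTemplate` and asks for capacity for 10 such pods. With `best-effort-atomic-scale-up.autoscaling.x-k8s.io`, Cluster Autoscaler would first check which of those pods fit on existing capacity and then evaluate a node group scale-up for the rest, as the patch text explains.

```yaml
# Hypothetical PodTemplate describing the pods the ProvReq asks capacity for.
apiVersion: v1
kind: PodTemplate
metadata:
  name: provreq-template          # hypothetical name
  namespace: default
template:
  spec:
    containers:
    - name: pause
      image: registry.k8s.io/pause:3.9   # placeholder workload
      resources:
        requests:
          cpu: "500m"
          memory: 512Mi
---
# Assumes the v1beta1 ProvisioningRequest API used by Cluster Autoscaler 1.30.x.
apiVersion: autoscaling.x-k8s.io/v1beta1
kind: ProvisioningRequest
metadata:
  name: provreq                   # hypothetical name
  namespace: default
spec:
  provisioningClassName: best-effort-atomic-scale-up.autoscaling.x-k8s.io
  podSets:
  - count: 10                     # hypothetical number of pods
    podTemplateRef:
      name: provreq-template      # must match the PodTemplate above
```

After applying this with `kubectl apply -f`, the Accepted, Provisioned, and BookingExpired conditions described in the patch should appear under the object's `status.conditions`. You can inspect them with `kubectl get provisioningrequest provreq -o yaml`.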