Add documentation regarding best-effort-atomic-scale-up ProvReq class
yaroslava-serdiuk committed Aug 1, 2024
1 parent 616930b commit f1b25ff
Showing 1 changed file with 24 additions and 7 deletions.
31 changes: 24 additions & 7 deletions cluster-autoscaler/FAQ.md
@@ -275,6 +275,7 @@ __However, if the substantial number of nodes are tainted with `startup taints`
might stop working as it might assume the cluster is broken and should not be scaled (creating new nodes doesn't help as they don't become ready).__
Startup taints are defined as:
* all taints with the prefix `startup-taint.cluster-autoscaler.kubernetes.io/`,
* all taints defined using `--startup-taint` flag.
@@ -287,6 +288,7 @@ Cluster Autoscaler internally treats nodes tainted with `status taints` as ready
This means that even though the node is ready, no pods should run there as long as the node is tainted and if necessary a scale-up should occur.
Status taints are defined as:
* all taints with the prefix `status-taint.cluster-autoscaler.kubernetes.io/`,
* all taints defined using `--status-taint` flag.
@@ -295,6 +297,7 @@ Status taints are defined as:
Ignore taints are now deprecated and treated as startup taints.
Ignore taints are defined as:
* all taints with the prefix `ignore-taint.cluster-autoscaler.kubernetes.io/`,
* all taints defined using `--ignore-taint` flag.
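
The three taint definitions above can be summarized in a small sketch. This is an illustration only, not Cluster Autoscaler code: `classify_taint` and its `*_flags` parameters are hypothetical names standing in for the values passed via `--startup-taint`, `--status-taint` and `--ignore-taint`, and the order in which overlapping definitions are resolved is an assumption.

```python
# Prefixes quoted in the FAQ text above.
STARTUP_PREFIX = "startup-taint.cluster-autoscaler.kubernetes.io/"
STATUS_PREFIX = "status-taint.cluster-autoscaler.kubernetes.io/"
IGNORE_PREFIX = "ignore-taint.cluster-autoscaler.kubernetes.io/"


def classify_taint(key, startup_flags=(), status_flags=(), ignore_flags=()):
    """Return "startup", "status", or None for a taint key.

    Hypothetical helper: the *_flags tuples model taint keys supplied
    through the corresponding command-line flags. Deprecated ignore
    taints are reported as "startup", following the FAQ statement that
    they are treated as startup taints.
    """
    if key.startswith(STARTUP_PREFIX) or key in startup_flags:
        return "startup"
    if key.startswith(IGNORE_PREFIX) or key in ignore_flags:
        return "startup"
    # Assumption: status taints are checked last when definitions overlap.
    if key.startswith(STATUS_PREFIX) or key in status_flags:
        return "status"
    return None
```

For example, `classify_taint("ignore-taint.cluster-autoscaler.kubernetes.io/warmup")` returns `"startup"`, while a key that matches no prefix and no flag returns `None`.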
@@ -553,6 +556,7 @@ This annotation has no effect on pods that are not a part of any DaemonSet.
The Kubernetes scheduler will fail to schedule a Pod to a Node if the Node's max volume count is exceeded. In such a case, to enable Cluster Autoscaler to scale up in a Kubernetes cluster with [CSI migration](https://github.com/kubernetes/enhancements/blob/master/keps/sig-storage/625-csi-migration/README.md) enabled, the appropriate CSI-related feature gates have to be specified for the Cluster Autoscaler (if the corresponding feature gates are not enabled by default).
For example:
```
--feature-gates=CSIMigration=true,CSIMigration{Provider}=true,InTreePlugin{Provider}Unregister=true
```
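
As a rough illustration of how the `{Provider}` placeholder in the template above is filled in, the sketch below builds the flag value for a given provider name. The helper name `csi_migration_feature_gates` is hypothetical, and whether a particular gate exists (or is still needed) depends on the Kubernetes version in use.

```python
def csi_migration_feature_gates(provider):
    """Build the --feature-gates value from the FAQ template.

    `provider` replaces the {Provider} placeholder. This only formats
    the string; it does not check that the resulting gate names are
    recognized by any particular Kubernetes release.
    """
    gates = [
        "CSIMigration=true",
        f"CSIMigration{provider}=true",
        f"InTreePlugin{provider}Unregister=true",
    ]
    return "--feature-gates=" + ",".join(gates)
```

With `provider="AWS"` (used here purely as an example value), the helper returns `--feature-gates=CSIMigration=true,CSIMigrationAWS=true,InTreePluginAWSUnregister=true`.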
@@ -566,27 +570,41 @@ Provisioning Request (abbr. ProvReq) is a new namespaced Custom Resource that ai

#### Enabling ProvisioningRequest Support

1. **Cluster Autoscaler Version**: Ensure you are using Cluster Autoscaler version 1.30.1 or later.
1. __Cluster Autoscaler Version__: Ensure you are using Cluster Autoscaler version 1.30.1 or later.

2. **Feature Flag**: Enable ProvisioningRequest support by setting the following flag in your Cluster Autoscaler configuration:
2. __Feature Flag__: Enable ProvisioningRequest support by setting the following flag in your Cluster Autoscaler configuration:
`--enable-provisioning-requests=true`.
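
The two prerequisites above can be expressed as a small check. This is a sketch under stated assumptions, not part of Cluster Autoscaler: `parse_version` and `provreq_enabled` are hypothetical helpers, versions are assumed to be plain `major.minor.patch` strings, and a bare boolean flag without `=true` is assumed to count as enabled.

```python
import re

FLAG = "--enable-provisioning-requests"
MIN_VERSION = (1, 30, 1)  # minimum version stated in the list above


def parse_version(text):
    """Parse "1.30.1" or "v1.30.1" into a tuple of ints."""
    match = re.fullmatch(r"v?(\d+)\.(\d+)\.(\d+)", text.strip())
    if match is None:
        raise ValueError(f"unsupported version string: {text!r}")
    return tuple(int(part) for part in match.groups())


def provreq_enabled(version, args):
    """Return True if both prerequisites from the list above hold."""
    if parse_version(version) < MIN_VERSION:
        return False
    # Assumption: a bare flag is equivalent to "=true".
    return any(arg in (FLAG, FLAG + "=true") for arg in args)
```

Comparing tuples of integers rather than raw strings avoids ordering mistakes such as treating `1.30.10` as older than `1.30.2`.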

#### Supported ProvisioningClasses

Currently, Cluster Autoscaler supports the following ProvisioningClasses:

* `check-capacity.autoscaling.x-k8s.io`.
When using this class, Cluster Autoscaler performs the following actions:

* **Capacity Check**: Determines if sufficient capacity exists in the cluster to fulfill the ProvisioningRequest.
* __Capacity Check__: Determines if sufficient capacity exists in the cluster to fulfill the ProvisioningRequest.

* **Reservation from other ProvReqs** (if capacity is available): Reserves this capacity for the ProvisioningRequest for 10 minutes, preventing other ProvReqs from using it.
* __Reservation from other ProvReqs__ (if capacity is available): Reserves this capacity for the ProvisioningRequest for 10 minutes, preventing other ProvReqs from using it.

* **Condition Updates**:
* __Condition Updates__:
Adds an Accepted=True condition when the ProvReq is accepted by Cluster Autoscaler, which will then check capacity for this ProvReq.
Adds a Provisioned=True condition to the ProvReq if capacity is available.
Adds a BookingExpired=True condition when the 10-minute reservation period expires.

* `best-effort-atomic-scale-up.autoscaling.x-k8s.io` (supported in Cluster Autoscaler version 1.30.2 and later).
When using this class, Cluster Autoscaler performs the following actions:

* __Capacity Check__: Checks which pods could be scheduled on existing capacity.

* __ScaleUp Request__: Evaluates whether scaling up a node group could fulfill the remaining requirements of the ProvisioningRequest.

* __Reservation from other ProvReqs (if the scale-up request succeeded)__: Reserves this capacity for the ProvisioningRequest for 10 minutes, preventing other ProvReqs from using it.

* __Condition Updates__:
Adds an Accepted=True condition when the ProvReq is accepted by Cluster Autoscaler.
Adds a Provisioned=True condition to the ProvReq if the node group scale-up request is successful.
Adds a BookingExpired=True condition when the 10-minute reservation period expires.
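
The behavior of the two classes described above can be sketched as a toy model. Everything here is an assumption made for illustration: `ProvReq` is a simplified stand-in rather than the real API type, capacity is reduced to a count of free pod slots, `can_scale_up` is a hypothetical callback representing the node group scale-up request, and treating a best-effort request whose pods all fit on existing capacity as provisioned is a guess that the text above does not confirm.

```python
from dataclasses import dataclass, field
from typing import Optional

CHECK_CAPACITY = "check-capacity.autoscaling.x-k8s.io"
BEST_EFFORT_ATOMIC = "best-effort-atomic-scale-up.autoscaling.x-k8s.io"
BOOKING_SECONDS = 10 * 60  # the 10-minute reservation described above


@dataclass
class ProvReq:
    """Toy stand-in for a ProvisioningRequest; not the real API type."""
    class_name: str
    pods: int
    conditions: dict = field(default_factory=dict)
    booked_at: Optional[float] = None


def process(req, free_slots, can_scale_up, now):
    """Apply the per-class steps and record the resulting conditions."""
    if req.class_name not in (CHECK_CAPACITY, BEST_EFFORT_ATOMIC):
        raise ValueError(f"unsupported class: {req.class_name}")
    req.conditions["Accepted"] = True
    if req.class_name == CHECK_CAPACITY:
        # Capacity check only: no scale-up is attempted.
        ok = free_slots >= req.pods
    else:
        # Pods that do not fit on existing capacity need a scale-up.
        remaining = max(0, req.pods - free_slots)
        # Assumption: nothing remaining counts as success.
        ok = remaining == 0 or can_scale_up(remaining)
    if ok:
        req.conditions["Provisioned"] = True
        req.booked_at = now  # start of the reservation window
    return ok


def expire_booking(req, now):
    """Mark the reservation as expired once the window has elapsed."""
    if req.booked_at is not None and now - req.booked_at >= BOOKING_SECONDS:
        req.conditions["BookingExpired"] = True
```

In this model a `check-capacity` request for 8 pods with 5 free slots is accepted but not provisioned, while the same request under `best-effort-atomic-scale-up` asks `can_scale_up` for the 3 remaining pods.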

****************

# Internals
@@ -1058,7 +1076,6 @@ $ kubectl describe pods memory-reservation-73rl0 --namespace e2e-tests-autoscali
Name: memory-reservation-73rl0

...

Events:
FirstSeen LastSeen Count From SubObjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------