Set DefaultRequeuingBackoffBaseSeconds to 60s. (kubernetes-sigs#2251)
mbobrovskyi authored and kannon92 committed Nov 19, 2024
1 parent 67371ed commit c204d22
Showing 8 changed files with 17 additions and 17 deletions.
4 changes: 2 additions & 2 deletions apis/config/v1beta1/configuration_types.go
@@ -247,7 +247,7 @@ type RequeuingStrategy struct {
// - "Rand" represents the random jitter.
// During this time, the workload is taken as an inadmissible and
// other workloads will have a chance to be admitted.
// By default, the consecutive requeue delays are around: (10s, 20s, 40s, ...).
// By default, the consecutive requeue delays are around: (60s, 120s, 240s, ...).
//
// Defaults to null.
// +optional
@@ -256,7 +256,7 @@ type RequeuingStrategy struct {
// BackoffBaseSeconds defines the base for the exponential backoff for
// re-queuing an evicted workload.
//
// Defaults to 10.
// Defaults to 60.
// +optional
BackoffBaseSeconds *int32 `json:"backoffBaseSeconds,omitempty"`
}
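
Because `BackoffBaseSeconds` is an optional pointer field, an unset value has to be resolved to the default somewhere. The Go sketch below illustrates that pattern only; Kueue's actual defaulting code is not part of this diff, and the names `requeuingStrategy` and `effectiveBackoffBaseSeconds` are hypothetical stand-ins.

```go
package main

import "fmt"

// defaultRequeuingBackoffBaseSeconds mirrors the value that this commit
// assigns to DefaultRequeuingBackoffBaseSeconds.
const defaultRequeuingBackoffBaseSeconds int32 = 60

// requeuingStrategy is a trimmed-down, hypothetical stand-in for the
// RequeuingStrategy type shown in the diff above.
type requeuingStrategy struct {
	BackoffBaseSeconds *int32
}

// effectiveBackoffBaseSeconds is a hypothetical helper (not a Kueue API).
// It returns the configured base, or the default when the strategy or the
// field is unset.
func effectiveBackoffBaseSeconds(rs *requeuingStrategy) int32 {
	if rs == nil || rs.BackoffBaseSeconds == nil {
		return defaultRequeuingBackoffBaseSeconds
	}
	return *rs.BackoffBaseSeconds
}

func main() {
	// Field left unset: the default applies.
	fmt.Println(effectiveBackoffBaseSeconds(&requeuingStrategy{})) // 60

	// Field set explicitly: the configured value wins.
	custom := int32(10)
	fmt.Println(effectiveBackoffBaseSeconds(&requeuingStrategy{BackoffBaseSeconds: &custom})) // 10
}
```

Under this reading, a configuration that sets `backoffBaseSeconds: 10` explicitly would keep the earlier delay cadence after the default changes.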
2 changes: 1 addition & 1 deletion apis/config/v1beta1/defaults.go
@@ -47,7 +47,7 @@ const (
DefaultMultiKueueGCInterval = time.Minute
DefaultMultiKueueOrigin = "multikueue"
DefaultMultiKueueWorkerLostTimeout = 15 * time.Minute
DefaultRequeuingBackoffBaseSeconds = 10
DefaultRequeuingBackoffBaseSeconds = 60
)

func getOperatorNamespace() string {
2 changes: 1 addition & 1 deletion charts/kueue/values.yaml
@@ -85,7 +85,7 @@ managerConfig:
# requeuingStrategy:
# timestamp: Eviction
# backoffLimitCount: null # null indicates infinite requeuing
# backoffBaseSeconds: 10
# backoffBaseSeconds: 60
#manageJobsWithoutQueueName: true
#internalCertManagement:
# enable: false
2 changes: 1 addition & 1 deletion config/components/manager/controller_manager_config.yaml
@@ -29,7 +29,7 @@ clientConnection:
# requeuingStrategy:
# timestamp: Eviction
# backoffLimitCount: null # null indicates infinite requeuing
# backoffBaseSeconds: 10
# backoffBaseSeconds: 60
#manageJobsWithoutQueueName: true
#internalCertManagement:
# enable: false
12 changes: 6 additions & 6 deletions keps/1282-pods-ready-requeue-strategy/README.md
@@ -143,12 +143,12 @@ type RequeuingStrategy struct {
// Once the number is reached, the workload is deactivated (`.spec.activate`=`false`).
// When it is null, the workloads will repeatedly and endless re-queueing.
//
// Every backoff duration is about "10s*2^(n-1)+Rand" where:
// Every backoff duration is about "60s*2^(n-1)+Rand" where:
// - "n" represents the "workloadStatus.requeueState.count",
// - "Rand" represents the random jitter.
// During this time, the workload is taken as an inadmissible and
// other workloads will have a chance to be admitted.
// By default, the consecutive requeue delays are around: (10s, 20s, 40s, ...).
// By default, the consecutive requeue delays are around: (60s, 120s, 240s, ...).
//
// Defaults to null.
// +optional
@@ -157,7 +157,7 @@ type RequeuingStrategy struct {
// BackoffBaseSeconds defines the base for the exponential backoff for
// re-queuing an evicted workload.
//
// Defaults to 10.
// Defaults to 60.
// +optional
BackoffBaseSeconds *int32 `json:"backoffBaseSeconds,omitempty"`
}
@@ -230,16 +230,16 @@ Duration this time, other workloads will have a chance to be admitted.

The queueManager calculates an exponential backoff duration by [the Step function](https://pkg.go.dev/k8s.io/apimachinery/pkg/util/[email protected]#Backoff.Step)
according to the $b*2^{(n-1)}+Rand$ where:
- $b$ represents the base delay, configured by `baseDelaySeconds`
- $b$ represents the base delay, configured by `backoffBaseSeconds`
- $n$ represents the `workloadStatus.requeueState.count`,
- $Rand$ represents the random jitter.

It will spend awaiting to be requeued after eviction:
$$\sum_{k=1}^{n}(b*2^{(k-1)} + Rand)$$

Assuming `backoffLimitCount` equals 10, and `baseDelaySeconds` equals 10 (default) the workload is requeued 10 times
Assuming `backoffLimitCount` equals 10, and `backoffBaseSeconds` equals 60 (default) the workload is requeued 10 times
after failing to have all pods ready, then the total time awaiting for requeue
will take (neglecting the jitter): `10s+20s+40s +...+7680s=2h 8min`.
will take (neglecting the jitter): `60s+120s+240s +...+30720s=8h 32min`.
Also, considering `.waitForPodsReady.timeout=300s` (default),
the workload will spend `50min` total waiting for pods ready.
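
As a worked check of the arithmetic above, the following Go sketch evaluates $b*2^{(k-1)}$ for $k = 1..10$ with $b = 60$, neglecting the jitter and assuming no upper cap on a single delay. The helpers `requeueDelays` and `totalDelay` are hypothetical and not part of Kueue. Note that 30720s (8h 32min) is the tenth delay on its own, while the ten delays add up to 61380s (17h 3min), so the `8h 32min` figure appears to match the final delay rather than the sum.

```go
package main

import (
	"fmt"
	"time"
)

// requeueDelays is a hypothetical helper (not a Kueue API). It returns the
// first n backoff delays b*2^(k-1) for k = 1..n, neglecting the random
// jitter and assuming no upper cap on a single delay.
func requeueDelays(baseSeconds int32, n int) []time.Duration {
	delays := make([]time.Duration, 0, n)
	d := time.Duration(baseSeconds) * time.Second
	for k := 1; k <= n; k++ {
		delays = append(delays, d)
		d *= 2 // exponential backoff with factor 2
	}
	return delays
}

// totalDelay sums the individual delays.
func totalDelay(delays []time.Duration) time.Duration {
	var total time.Duration
	for _, d := range delays {
		total += d
	}
	return total
}

func main() {
	delays := requeueDelays(60, 10)
	fmt.Println("last delay:", delays[len(delays)-1]) // 8h32m0s
	fmt.Println("total wait:", totalDelay(delays))    // 17h3m0s
}
```

With the previous base of 10 the same calculation gives a last delay of 5120s and a total of 10230s, which makes the sixfold increase in waiting time easy to compare.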

2 changes: 1 addition & 1 deletion pkg/controller/core/workload_controller.go
@@ -455,7 +455,7 @@ func (r *WorkloadReconciler) triggerDeactivationOrBackoffRequeue(ctx context.Con
"Deactivated Workload %q by reached re-queue backoffLimitCount", klog.KObj(wl))
return true, nil
}
// Every backoff duration is about "10s*2^(n-1)+Rand" where:
// Every backoff duration is about "60s*2^(n-1)+Rand" where:
// - "n" represents the "requeuingCount",
// - "Rand" represents the random jitter.
// During this time, the workload is taken as an inadmissible and other
4 changes: 2 additions & 2 deletions site/content/en/docs/reference/kueue-config.v1beta1.md
@@ -719,7 +719,7 @@ When it is null, the workloads will repeatedly and endless re-queueing.</p>
<li>&quot;Rand&quot; represents the random jitter.
During this time, the workload is taken as an inadmissible and
other workloads will have a chance to be admitted.
By default, the consecutive requeue delays are around: (10s, 20s, 40s, ...).</li>
By default, the consecutive requeue delays are around: (60s, 120s, 240s, ...).</li>
</ul>
<p>Defaults to null.</p>
</td>
@@ -730,7 +730,7 @@ By default, the consecutive requeue delays are around: (10s, 20s, 40s, ...).</li
<td>
<p>BackoffBaseSeconds defines the base for the exponential backoff for
re-queuing an evicted workload.</p>
<p>Defaults to 10.</p>
<p>Defaults to 60.</p>
</td>
</tr>
</tbody>
@@ -47,7 +47,7 @@ fields:
requeuingStrategy:
timestamp: Eviction | Creation
backoffLimitCount: 5
backoffBaseSeconds: 10
backoffBaseSeconds: 60
```
{{% alert title="Note" color="primary" %}}
@@ -99,8 +99,8 @@ _The `backoffBaseSeconds` is available in Kueue v0.7.0 and later_
{{% /alert %}}
The time to re-queue a workload after each consecutive timeout is increased
exponentially, with the exponent of 2. The first delay is determined by the
`backoffBaseSeconds` parameter (defaulting to 10). So, after the consecutive timeouts
the evicted workload is re-queued after approximately `10, 20, 40, ...` seconds.
`backoffBaseSeconds` parameter (defaulting to 60). So, after the consecutive timeouts
the evicted workload is re-queued after approximately `60, 120, 240, ...` seconds.

## Example
