Set DefaultRequeuingBackoffBaseSeconds to 60s. #2251

Merged
4 changes: 2 additions & 2 deletions apis/config/v1beta1/configuration_types.go
Original file line number Diff line number Diff line change
@@ -247,7 +247,7 @@ type RequeuingStrategy struct {
// - "Rand" represents the random jitter.
// During this time, the workload is taken as an inadmissible and
// other workloads will have a chance to be admitted.
// By default, the consecutive requeue delays are around: (10s, 20s, 40s, ...).
// By default, the consecutive requeue delays are around: (60s, 120s, 240s, ...).
//
// Defaults to null.
// +optional
@@ -256,7 +256,7 @@ type RequeuingStrategy struct {
// BackoffBaseSeconds defines the base for the exponential backoff for
// re-queuing an evicted workload.
//
// Defaults to 10.
// Defaults to 60.
// +optional
BackoffBaseSeconds *int32 `json:"backoffBaseSeconds,omitempty"`
}
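
A minimal sketch of how an optional pointer field such as `BackoffBaseSeconds` might be resolved against the new default when it is unset. The trimmed struct and the helper name `effectiveBackoffBase` are hypothetical and are not taken from Kueue's defaulting code; only the field name, its type, and the default of 60 come from this change.

```go
package main

import "fmt"

// DefaultRequeuingBackoffBaseSeconds mirrors the constant value set in this PR.
const DefaultRequeuingBackoffBaseSeconds = 60

// RequeuingStrategy is a trimmed, illustrative copy of the API type;
// the real type has more fields.
type RequeuingStrategy struct {
	BackoffBaseSeconds *int32 `json:"backoffBaseSeconds,omitempty"`
}

// effectiveBackoffBase is a hypothetical helper (not Kueue's API). It returns
// the configured base, or the default when the strategy or the field is nil.
func effectiveBackoffBase(rs *RequeuingStrategy) int32 {
	if rs == nil || rs.BackoffBaseSeconds == nil {
		return DefaultRequeuingBackoffBaseSeconds
	}
	return *rs.BackoffBaseSeconds
}

func main() {
	custom := int32(10)
	// Unset field: falls back to the default.
	fmt.Println(effectiveBackoffBase(&RequeuingStrategy{}))
	// Explicit value: keeps the previous 10-second base.
	fmt.Println(effectiveBackoffBase(&RequeuingStrategy{BackoffBaseSeconds: &custom}))
}
```

Under these assumptions, a cluster that relied on the old behavior would need to set `backoffBaseSeconds: 10` explicitly.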
2 changes: 1 addition & 1 deletion apis/config/v1beta1/defaults.go
@@ -47,7 +47,7 @@ const (
DefaultMultiKueueGCInterval = time.Minute
DefaultMultiKueueOrigin = "multikueue"
DefaultMultiKueueWorkerLostTimeout = 15 * time.Minute
DefaultRequeuingBackoffBaseSeconds = 10
DefaultRequeuingBackoffBaseSeconds = 60
)

func getOperatorNamespace() string {
2 changes: 1 addition & 1 deletion charts/kueue/values.yaml
@@ -85,7 +85,7 @@ managerConfig:
# requeuingStrategy:
# timestamp: Eviction
# backoffLimitCount: null # null indicates infinite requeuing
# backoffBaseSeconds: 10
# backoffBaseSeconds: 60
#manageJobsWithoutQueueName: true
#internalCertManagement:
# enable: false
2 changes: 1 addition & 1 deletion config/components/manager/controller_manager_config.yaml
@@ -29,7 +29,7 @@ clientConnection:
# requeuingStrategy:
# timestamp: Eviction
# backoffLimitCount: null # null indicates infinite requeuing
# backoffBaseSeconds: 10
# backoffBaseSeconds: 60
#manageJobsWithoutQueueName: true
#internalCertManagement:
# enable: false
12 changes: 6 additions & 6 deletions keps/1282-pods-ready-requeue-strategy/README.md
@@ -143,12 +143,12 @@ type RequeuingStrategy struct {
// Once the number is reached, the workload is deactivated (`.spec.activate`=`false`).
// When it is null, the workloads will repeatedly and endless re-queueing.
//
// Every backoff duration is about "10s*2^(n-1)+Rand" where:
// Every backoff duration is about "60s*2^(n-1)+Rand" where:
// - "n" represents the "workloadStatus.requeueState.count",
// - "Rand" represents the random jitter.
// During this time, the workload is taken as an inadmissible and
// other workloads will have a chance to be admitted.
// By default, the consecutive requeue delays are around: (10s, 20s, 40s, ...).
// By default, the consecutive requeue delays are around: (60s, 120s, 240s, ...).
//
// Defaults to null.
// +optional
@@ -157,7 +157,7 @@ type RequeuingStrategy struct {
// BackoffBaseSeconds defines the base for the exponential backoff for
// re-queuing an evicted workload.
//
// Defaults to 10.
// Defaults to 60.
// +optional
BackoffBaseSeconds *int32 `json:"backoffBaseSeconds,omitempty"`
}
@@ -230,16 +230,16 @@ Duration this time, other workloads will have a chance to be admitted.

The queueManager calculates an exponential backoff duration by [the Step function](https://pkg.go.dev/k8s.io/apimachinery/pkg/util/wait#Backoff.Step)
according to the $b*2^{(n-1)}+Rand$ where:
- $b$ represents the base delay, configured by `baseDelaySeconds`
- $b$ represents the base delay, configured by `backoffBaseSeconds`
- $n$ represents the `workloadStatus.requeueState.count`,
- $Rand$ represents the random jitter.

It will spend awaiting to be requeued after eviction:
$$\sum_{k=1}^{n}(b*2^{(k-1)} + Rand)$$

Assuming `backoffLimitCount` equals 10, and `baseDelaySeconds` equals 10 (default) the workload is requeued 10 times
Assuming `backoffLimitCount` equals 10, and `backoffBaseSeconds` equals 60 (default) the workload is requeued 10 times
after failing to have all pods ready, then the total time awaiting for requeue
will take (neglecting the jitter): `10s+20s+40s +...+7680s=2h 8min`.
will take (neglecting the jitter): `60s+120s+240s +...+30720s=17h 3min`.
Also, considering `.waitForPodsReady.timeout=300s` (default),
the workload will spend `50min` total waiting for pods ready.
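
As a rough check of the arithmetic above, the sketch below sums the delays $b*2^{(k-1)}$ for $k=1..n$ while neglecting the jitter. The function name `totalRequeueWait` is hypothetical and not part of Kueue. For $b=60$ and $n=10$ the sum is $60*(2^{10}-1)=61380$ seconds, about 17h 3min; the final delay alone is 30720s (8h 32min).

```go
package main

import (
	"fmt"
	"time"
)

// totalRequeueWait is a hypothetical helper that adds up base*2^(k-1)
// for k = 1..n. Jitter is ignored, so the real total is slightly larger.
func totalRequeueWait(baseSeconds int32, n int) time.Duration {
	var total time.Duration
	delay := time.Duration(baseSeconds) * time.Second
	for k := 1; k <= n; k++ {
		total += delay
		delay *= 2 // each consecutive delay doubles
	}
	return total
}

func main() {
	// Values from the text: backoffBaseSeconds=60, backoffLimitCount=10.
	total := totalRequeueWait(60, 10)
	fmt.Println(total) // 17h3m0s

	// Time spent waiting for pods ready: 10 attempts with the 300s timeout.
	podsReady := 10 * 300 * time.Second
	fmt.Println(podsReady) // 50m0s
}
```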

2 changes: 1 addition & 1 deletion pkg/controller/core/workload_controller.go
@@ -455,7 +455,7 @@ func (r *WorkloadReconciler) triggerDeactivationOrBackoffRequeue(ctx context.Con
"Deactivated Workload %q by reached re-queue backoffLimitCount", klog.KObj(wl))
return true, nil
}
// Every backoff duration is about "10s*2^(n-1)+Rand" where:
// Every backoff duration is about "60s*2^(n-1)+Rand" where:
// - "n" represents the "requeuingCount",
// - "Rand" represents the random jitter.
// During this time, the workload is taken as an inadmissible and other
4 changes: 2 additions & 2 deletions site/content/en/docs/reference/kueue-config.v1beta1.md
@@ -719,7 +719,7 @@ When it is null, the workloads will repeatedly and endless re-queueing.</p>
<li>&quot;Rand&quot; represents the random jitter.
During this time, the workload is taken as an inadmissible and
other workloads will have a chance to be admitted.
By default, the consecutive requeue delays are around: (10s, 20s, 40s, ...).</li>
By default, the consecutive requeue delays are around: (60s, 120s, 240s, ...).</li>
</ul>
<p>Defaults to null.</p>
</td>
@@ -730,7 +730,7 @@ By default, the consecutive requeue delays are around: (10s, 20s, 40s, ...).</li
<td>
<p>BackoffBaseSeconds defines the base for the exponential backoff for
re-queuing an evicted workload.</p>
<p>Defaults to 10.</p>
<p>Defaults to 60.</p>
</td>
</tr>
</tbody>
Original file line number Diff line number Diff line change
@@ -47,7 +47,7 @@ fields:
requeuingStrategy:
timestamp: Eviction | Creation
backoffLimitCount: 5
backoffBaseSeconds: 10
backoffBaseSeconds: 60
```

{{% alert title="Note" color="primary" %}}
@@ -99,8 +99,8 @@ _The `backoffBaseSeconds` is available in Kueue v0.7.0 and later_
{{% /alert %}}
The time to re-queue a workload after each consecutive timeout is increased
exponentially, with the exponent of 2. The first delay is determined by the
`backoffBaseSeconds` parameter (defaulting to 10). So, after the consecutive timeouts
the evicted workload is re-queued after approximately `10, 20, 40, ...` seconds.
`backoffBaseSeconds` parameter (defaulting to 60). So, after the consecutive timeouts
the evicted workload is re-queued after approximately `60, 120, 240, ...` seconds.
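
The doubling described above can be sketched as follows. This is an illustration only: the function name `requeueDelay` is hypothetical, the random jitter is left out, and it is not how Kueue itself computes the delay.

```go
package main

import (
	"fmt"
	"time"
)

// requeueDelay is a hypothetical helper returning base*2^(count-1) seconds,
// where count is the number of consecutive timeouts (1 for the first one).
// Jitter is omitted, and very large counts would overflow time.Duration.
func requeueDelay(baseSeconds int32, count int) time.Duration {
	if count < 1 {
		return 0
	}
	factor := time.Duration(1) << (count - 1)
	return time.Duration(baseSeconds) * time.Second * factor
}

func main() {
	// With the default backoffBaseSeconds of 60, print the first delays in seconds.
	for count := 1; count <= 3; count++ {
		fmt.Println(int(requeueDelay(60, count).Seconds()))
	}
}
```

With `backoffBaseSeconds: 10` the same function yields the previous sequence of 10, 20, and 40 seconds.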

## Example
