Make the defaults for PodsReadyTimeout backoff more practical #2025

mimowo · 2024-04-19T15:52:50Z

What type of PR is this?

/kind bug
/kind documentation

What this PR does / why we need it:

Which issue(s) this PR fixes:

Part of #2009

Special notes for your reviewer:

WIP because still testing, and I need to update the estimations from KEP and API comments.
Early feedback is welcome.

Does this PR introduce a user-facing change?

Make the defaults for the PodsReadyTimeout backoff more practical. With the original values,
the first few requeues appeared immediate to users (delays below 10s, which is negligible
compared to the time spent waiting for PodsReady).

The default values in the formula that determines the exponential backoff change as follows:
- base: `1s -> 10s`
- exponent: `1.41284738 -> 2`

So the consecutive requeue delays for a workload are now: 10s, 20s, 40s, ...

netlify · 2024-04-19T15:53:06Z

✅ Deploy Preview for kubernetes-sigs-kueue canceled.

| Name | Link |
|---|---|
| 🔨 Latest commit | `3aa079e` |
| 🔍 Latest deploy log | https://app.netlify.com/sites/kubernetes-sigs-kueue/deploys/66267a1aab2a4f00085df88e |

mimowo · 2024-04-22T10:24:07Z

/assign @tenzen-y

mimowo · 2024-04-22T10:34:05Z

/cc @alculquicondor

tenzen-y

Thank you for creating this PR!
Basically, lgtm.

Could you update the API documentation (site)? Also, I'm not sure why our CI didn't detect the outdated API documentation...

tenzen-y · 2024-04-22T13:59:33Z

keps/1282-pods-ready-requeue-strategy/README.md

+For comparison, considering `.waitForPodsReady.timeout=300s` (default),
+the workload will spend `50min` total waiting for pods ready.


Could you mention backoffLimitCount the same as before? The backoffLimitCount is never defaulted to any value, so the requeueing would otherwise occur forever.

Done, but I set it to 10 now. PTAL.
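
As a rough check of the `50min` figure in the quoted KEP text, here is one way to arrive at it. The assumption (an interpretation of this thread, not something stated in the diff) is that the figure counts ten PodsReady waits, matching the limit of 10 mentioned above, each lasting the default `timeout` of `300s`:

$$
10 \times 300\,\mathrm{s} = 3000\,\mathrm{s} = 50\,\mathrm{min}
$$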

I meant the backoffLimitCount in the Config API. In this PR, we introduced a fixed value for the Duration in the backoff calculation, right?

`// When it is null, the workloads will repeatedly and endless re-queueing.` Isn't that enough?

> I meant the backoffLimitCount in the Config API. In this PR, we introduced a fixed value for the Duration in the backoff calculation, right?

I discussed this with @mimowo offline. It seems we had misunderstood each other, but we are now in sync.

> `// When it is null, the workloads will repeatedly and endless re-queueing.` Isn't that enough?

I'm OK with removing this sentence, since this example seems to have lost its importance.

alculquicondor · 2024-04-22T15:11:35Z

pkg/controller/core/core.go

+
+type ControllerOption func(*ControllerOptions)
+
+func WithControllerRequeuingBaseDelaySeconds(value int32) ControllerOption {


This seems unnecessary for the cherry-pick

I use it to pass a different base in prod and in integration tests. With 10s in integration tests they fail, because the Timeout is only 5s. I considered the following approaches:

1. Expose the configuration via the API (deferred to a follow-up, since this shouldn't be cherry-picked)
2. Bump the timeout for the subset of integration tests for PodsReady; this would work, but seems wasteful
3. Expose configuration that lets me pass different values to SetupControllers (chosen)

Let me know if there is another approach.

oh ok, let's keep this approach.
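
The chosen approach follows the functional options pattern. Below is a minimal sketch of how such an option could be applied. Only `ControllerOption`, `ControllerOptions` and `WithControllerRequeuingBaseDelaySeconds` appear in the diff; the struct field, the `resolveOptions` helper and the default of 10 are assumptions made for illustration, not the actual Kueue implementation.

```go
package main

import (
	"fmt"
	"time"
)

// ControllerOptions holds setup-time tunables.
// The field name is hypothetical; the diff only shows the type name.
type ControllerOptions struct {
	requeuingBaseDelaySeconds int32
}

// ControllerOption mutates ControllerOptions (signature as shown in the diff).
type ControllerOption func(*ControllerOptions)

// WithControllerRequeuingBaseDelaySeconds overrides the base delay.
// The body is an assumed implementation; the diff shows only the signature.
func WithControllerRequeuingBaseDelaySeconds(value int32) ControllerOption {
	return func(o *ControllerOptions) {
		o.requeuingBaseDelaySeconds = value
	}
}

// resolveOptions is a hypothetical helper: it starts from an assumed
// production default of 10 seconds and applies the options in order.
func resolveOptions(opts ...ControllerOption) ControllerOptions {
	options := ControllerOptions{requeuingBaseDelaySeconds: 10}
	for _, opt := range opts {
		opt(&options)
	}
	return options
}

func main() {
	prod := resolveOptions()
	test := resolveOptions(WithControllerRequeuingBaseDelaySeconds(1))
	fmt.Println(time.Duration(prod.requeuingBaseDelaySeconds) * time.Second) // 10s
	fmt.Println(time.Duration(test.requeuingBaseDelaySeconds) * time.Second) // 1s
}
```

The appeal of this design for a cherry-pick is that production callers need no changes, while integration tests can pass a base delay shorter than their 5s timeout without touching the public API.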

alculquicondor

/lgtm
/approve

k8s-ci-robot · 2024-04-22T16:47:01Z

LGTM label has been added.

Git tree hash: 2fa23b305d3603f6365f0b44615630e3d0a1362b

k8s-ci-robot · 2024-04-22T16:47:01Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alculquicondor, mimowo

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [alculquicondor]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

mimowo · 2024-04-22T16:47:11Z

> Could you update the API documentation (site)? Also, I'm not sure why our CI didn't detect the outdated API documentation...

Thanks for spotting the issue, I opened: #2032

tenzen-y · 2024-04-22T16:55:31Z

/lgtm

/cherry-pick release-0.6

k8s-infra-cherrypick-robot · 2024-04-22T16:56:09Z

@tenzen-y: #2025 failed to apply on top of branch "release-0.6":

Applying: Make the defaults for PodsReady backoff more practical
Using index info to reconstruct a base tree...
M	apis/config/v1beta1/configuration_types.go
M	pkg/controller/core/workload_controller.go
M	pkg/controller/core/workload_controller_test.go
Falling back to patching base and 3-way merge...
Auto-merging pkg/controller/core/workload_controller_test.go
CONFLICT (content): Merge conflict in pkg/controller/core/workload_controller_test.go
Auto-merging pkg/controller/core/workload_controller.go
Auto-merging apis/config/v1beta1/configuration_types.go
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0001 Make the defaults for PodsReady backoff more practical
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

In response to this:

/lgtm

/cherry-pick release-0.6

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

tenzen-y · 2024-04-22T16:57:04Z

@mimowo Could you submit a cherry-pick PR?

alculquicondor · 2024-05-08T17:48:35Z

/remove-kind documentation

k8s-ci-robot requested review from denkensk and kerthcet April 19, 2024 15:52

k8s-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Apr 19, 2024

mimowo force-pushed the pods-ready-timeout-defaults branch from d17b5de to 9eac5c4 Compare April 22, 2024 09:25

k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Apr 22, 2024

mimowo force-pushed the pods-ready-timeout-defaults branch from 9eac5c4 to f31ab70 Compare April 22, 2024 10:05

k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Apr 22, 2024

mimowo changed the title ~~WIP: Make the defaults for PodsReadyTimeout backoff more practical~~ Make the defaults for PodsReadyTimeout backoff more practical Apr 22, 2024

k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 22, 2024

k8s-ci-robot assigned tenzen-y Apr 22, 2024

k8s-ci-robot requested a review from alculquicondor April 22, 2024 10:34

tenzen-y reviewed Apr 22, 2024

Make the defaults for PodsReady backoff more practical

3aa079e

mimowo force-pushed the pods-ready-timeout-defaults branch from 83d64ed to 3aa079e Compare April 22, 2024 14:54

alculquicondor reviewed Apr 22, 2024

mimowo mentioned this pull request Apr 22, 2024

Changes to API comments aren't regenerated by generate-apiref #2032

Closed

alculquicondor reviewed Apr 22, 2024

k8s-ci-robot assigned alculquicondor Apr 22, 2024

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 22, 2024

k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 22, 2024

k8s-ci-robot merged commit 63f46ac into kubernetes-sigs:main Apr 22, 2024
15 checks passed

k8s-ci-robot added this to the v0.7 milestone Apr 22, 2024

mimowo mentioned this pull request Apr 22, 2024

Automated cherry pick of #2025: Make the defaults for PodsReady backoff more practical #2033

Merged

alculquicondor pushed a commit to alculquicondor/kueue that referenced this pull request Apr 22, 2024

Make the defaults for PodsReady backoff more practical (kubernetes-sigs#2025)

c9a29e9

Change-Id: Icf5937311c40f2a28050d35e1fc3189a855c9aa4

k8s-ci-robot pushed a commit that referenced this pull request Apr 22, 2024

Automated cherry pick of #2025: Make the defaults for PodsReady backoff more practical (#2033)

f174c21

* Make the defaults for PodsReady backoff more practical
* Fix API reference for PodsReady config

tenzen-y mentioned this pull request Apr 24, 2024

[WaitForPodsReady] Make requeue base delay configurable #2040

Merged

k8s-ci-robot removed the kind/documentation Categorizes issue or PR as related to documentation. label May 8, 2024

mimowo deleted the pods-ready-timeout-defaults branch May 29, 2024 14:53

kannon92 pushed a commit to openshift-kannon92/kubernetes-sigs-kueue that referenced this pull request Nov 19, 2024

Make the defaults for PodsReady backoff more practical (kubernetes-sigs#2025)

555250b
