
scheduler: impose a backoff penalty on gated Pods #126029

Merged
merged 1 commit into from
Aug 28, 2024

Conversation

sanposhiho
Member

What type of PR is this?

/kind feature

What this PR does / why we need it:

To start with the concept of backoff in the scheduler: backoff time is a penalty imposed on Pods that consumed a scheduling cycle but failed to get scheduled and came back to the queue.

Currently, however, all gated Pods are always regarded as not backing off.
That is only correct for a vanilla scheduler, where every Pod gated by SchedulingGates has not been through any scheduling cycle yet and therefore is certainly not backing off.
But a custom PreEnqueue plugin might gate Pods after they have been through some scheduling cycles; that means:

  1. Pods have been through some scheduling cycles.
  2. They get gated by a custom PreEnqueue plugin.
  3. They get un-gated for some reason.
  4. 💥 Whoa! They're moved to activeQ without a backoff penalty.

Gated or not, a Pod is supposed to pay the penalty if it wasted scheduling cycles before.
That is the law in the scheduler: an obligation every Pod must meet before retrying a schedule.

This PR changes isPodBackingoff() to no longer skip gated Pods, so such Pods cannot exploit the loophole, ignore the law, and escape the penalty.

Which issue(s) this PR fixes:

Fixes #125538

Special notes for your reviewer:

Does this PR introduce a user-facing change?

The scheduler retries gated Pods more appropriately, giving them a backoff penalty too.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. kind/feature Categorizes issue or PR as related to a new feature. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jul 11, 2024
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the needs-priority Indicates a PR lacks a `priority/foo` label and requires one. label Jul 11, 2024
@sanposhiho sanposhiho marked this pull request as ready for review July 11, 2024 11:51
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 11, 2024
@k8s-ci-robot k8s-ci-robot requested review from damemi and denkensk July 11, 2024 11:52
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sanposhiho

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 11, 2024
@k8s-ci-robot k8s-ci-robot requested a review from kerthcet July 11, 2024 11:52
@k8s-ci-robot k8s-ci-robot added sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jul 11, 2024
@sanposhiho
Member Author

/hold

to go thru an approver.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 11, 2024
@sanposhiho
Member Author

/cc @alculquicondor

@sanposhiho sanposhiho marked this pull request as draft July 11, 2024 12:09
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 11, 2024
@sanposhiho
Member Author

I found out I have to fix some tests. Just converted to WIP for now.

@sanposhiho sanposhiho force-pushed the backoff-preenqueue branch from e4454a6 to 9da01d0 Compare July 12, 2024 03:11
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Jul 12, 2024
@sanposhiho sanposhiho marked this pull request as ready for review July 12, 2024 03:12
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: 9f6ecea95badeb34063e338f2136bb6a905b4cd0

@sanposhiho
Member Author

/assign @alculquicondor
for approval

@sanposhiho
Member Author

Looks like Aldo is on vacation.
/assign @kerthcet

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 14, 2024
@k8s-ci-robot k8s-ci-robot removed lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Aug 14, 2024
@sanposhiho
Member Author

/cc @macsko @alculquicondor

Just fixed the conflict.

@macsko
Member

macsko commented Aug 14, 2024

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 14, 2024
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: 6aa2aea7fee9c6b4a12019a9b4be80d5b8432268

pkg/scheduler/internal/queue/scheduling_queue_test.go (Outdated)
@@ -3089,6 +3106,7 @@ scheduler_plugin_execution_duration_seconds_count{extension_point="PreEnqueue",p
for _, test := range tests {
t.Run(test.name, func(t *testing.T) {
resetMetrics()
resetPodInfos()
Member

what is increasing attempts in this case?

Could we recreate podInfos for every case, instead?

Member Author

Some of test.operations (addPodUnschedulablePods) do.

Member Author

Could we recreate podInfos for every case, instead?

Given that all test.operands refer to pInfos / pInfosWithDelay like the following, that'd require a big change in the test implementation, which I want to avoid (at least in this PR):

			operands: [][]*framework.QueuedPodInfo{
				pInfos[:30], // Every test case refers to the same pInfos.
				pInfos[30:],
			},
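To make the trade-off discussed above concrete, here is a minimal Go sketch (an assumed shape, not the real test fixture) of why a shared, package-level fixture needs a reset helper between table-driven cases: one case's mutation of Attempts would otherwise leak into the next case.

```go
package main

import "fmt"

// QueuedPodInfo is a minimal stand-in for framework.QueuedPodInfo.
type QueuedPodInfo struct{ Attempts int }

// resetPodInfos mirrors the role of the helper added in the diff above:
// rather than rebuilding the shared fixtures for every case (a larger
// refactor), it zeroes the state earlier cases may have mutated.
func resetPodInfos(pInfos []*QueuedPodInfo) {
	for _, p := range pInfos {
		p.Attempts = 0
	}
}

func main() {
	pInfos := []*QueuedPodInfo{{}, {}}

	// A previous test case's operation (e.g. re-adding an unschedulable
	// Pod) bumped the shared Pod's attempt counter.
	pInfos[0].Attempts = 3

	resetPodInfos(pInfos) // restore the fixture before the next case
	fmt.Println(pInfos[0].Attempts, pInfos[1].Attempts)
}
```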

@@ -1461,6 +1458,12 @@ func (p *PriorityQueue) getBackoffTime(podInfo *framework.QueuedPodInfo) time.Ti
// calculateBackoffDuration is a helper function for calculating the backoffDuration
// based on the number of attempts the pod has made.
func (p *PriorityQueue) calculateBackoffDuration(podInfo *framework.QueuedPodInfo) time.Duration {
if podInfo.Attempts == 0 {
Member

what's the name of the test for this?

Member Author

The test case "QueueHintFunction is called when Pod is gated by a plugin other than SchedulingGate" in TestPriorityQueue_MoveAllToActiveOrBackoffQueueWithQueueingHint ensures that a Pod with zero attempts doesn't get a backoff.

Member

what if the hints are disabled?

Member Author

The test isn't related to the feature gate.
It's just that when the feature gate is disabled we don't accept the queueing hint from the plugin, but still use the default queueing hint.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 20, 2024
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 27, 2024
@sanposhiho
Member Author

@alculquicondor Updated based on your point.

@sanposhiho
Member Author

/retest

@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 27, 2024
Member

@alculquicondor alculquicondor left a comment

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 28, 2024
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: 5189157ac5ac5fbfac5a470f15e8e97efd6353a2

@alculquicondor
Member

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 28, 2024
@k8s-triage-robot

The Kubernetes project has merge-blocking tests that are currently too flaky to consistently pass.

This bot retests PRs for certain kubernetes repos according to the following rules:

  • The PR does not have any do-not-merge/* labels
  • The PR does not have the needs-ok-to-test label
  • The PR is mergeable (does not have a needs-rebase label)
  • The PR is approved (has cncf-cla: yes, lgtm, approved labels)
  • The PR is failing tests required for merge

You can:

/retest

@k8s-ci-robot k8s-ci-robot merged commit 59051eb into kubernetes:master Aug 28, 2024
14 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.32 milestone Aug 28, 2024
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Pods gated by custom PreEnqueue plugins don't go through backoffQ even in case they ought to
6 participants