-
Notifications
You must be signed in to change notification settings - Fork 40k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
scheduler: impose a backoff penalty on gated Pods #126029
Conversation
Skipping CI for Draft Pull Request. |
This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the The Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: sanposhiho The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/hold to go thru an approver. |
/cc @alculquicondor |
I found out I have to fix some tests. Just converted to WIP for now. |
e4454a6
to
9da01d0
Compare
LGTM label has been added. Git tree hash: 9f6ecea95badeb34063e338f2136bb6a905b4cd0
|
/assign @alculquicondor |
Looks like Aldo is on vacation. |
9da01d0
to
2f69be5
Compare
Just fixed the conflict. |
/lgtm |
LGTM label has been added. Git tree hash: 6aa2aea7fee9c6b4a12019a9b4be80d5b8432268
|
@@ -3089,6 +3106,7 @@ scheduler_plugin_execution_duration_seconds_count{extension_point="PreEnqueue",p | |||
for _, test := range tests { | |||
t.Run(test.name, func(t *testing.T) { | |||
resetMetrics() | |||
resetPodInfos() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is increasing attempts in this case?
Could we recreate podInfos for every case, instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some of test.operations
(addPodUnschedulablePods
) do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we recreate podInfos for every case, instead?
Given all test.operands
referpInfos
/ pInfosWithDelay
like the following, that'd require a big change in the test implementation, which I want to avoid (at least, in this PR)
operands: [][]*framework.QueuedPodInfo{
pInfos[:30], // Evern test case refers to the same pInfos.
pInfos[30:],
},
@@ -1461,6 +1458,12 @@ func (p *PriorityQueue) getBackoffTime(podInfo *framework.QueuedPodInfo) time.Ti | |||
// calculateBackoffDuration is a helper function for calculating the backoffDuration | |||
// based on the number of attempts the pod has made. | |||
func (p *PriorityQueue) calculateBackoffDuration(podInfo *framework.QueuedPodInfo) time.Duration { | |||
if podInfo.Attempts == 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's the name of the test for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
QueueHintFunction is called when Pod is gated by a plugin other than SchedulingGate
test case in TestPriorityQueue_MoveAllToActiveOrBackoffQueueWithQueueingHint
ensures that the Pod with zero attempt doesn't get backoff.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what if the hints are disabled?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test isn't related to the feature gate.
It's just that when the feature gate is disabled we don't accept the queueing hint from the plugin, but still use the default queueing hint.
2f69be5
to
b5a1569
Compare
@alculquicondor Updated based on your point. |
/retest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
LGTM label has been added. Git tree hash: 5189157ac5ac5fbfac5a470f15e8e97efd6353a2
|
/hold cancel |
The Kubernetes project has merge-blocking tests that are currently too flaky to consistently pass. This bot retests PRs for certain kubernetes repos according to the following rules:
You can:
/retest |
What type of PR is this?
/kind feature
What this PR does / why we need it:
Starting the story from the concept of backoff in the scheduler; a backoff time in the scheduler is a penalty that we impose on Pods when they consume a scheduling cycle, but they didn't get scheduled and came back to the queue.
But, currently all gated Pods are always regarded as not backing off.
That is only correct for a vanilla scheduler because all Pods gated by SchedulingGates haven't experienced any scheduling and thus are not backing off for sure.
A custom PreEnqueue plugin might gate Pods after they experience some scheduling cycles; those mean -
Regardless of whether a Pod is gated or not, they are supposed to get a penalty if they wasted some scheduling cycles before.
It's the law in the scheduler; it's their obligation that they must meet before retrying a schedule again.
This PR changes
isPodBackingoff()
not to skip gated Pods so that we can prevent such Pods from exploiting loopholes, ignoring the law, and escaping a penalty.Which issue(s) this PR fixes:
Fixes #125538
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: