-
Notifications
You must be signed in to change notification settings - Fork 262
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix preemption blocked by earlier pending Workload #1866
Fix preemption blocked by earlier pending Workload #1866
Conversation
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: alculquicondor The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/assign @yaroslava-serdiuk |
✅ Deploy Preview for kubernetes-sigs-kueue canceled.
|
257c688
to
b0a1338
Compare
@@ -241,6 +245,9 @@ func (s *Scheduler) schedule(ctx context.Context) { | |||
e.inadmissibleMsg += fmt.Sprintf(". Pending the preemption of %d workload(s)", preempted) | |||
e.requeueReason = queue.RequeueReasonPendingPreemption | |||
} | |||
if cq.Cohort != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if the cohort is nil? I mean the CQ is standalone. The workload can still preempt another workload in this CQ, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We only try to admit one workload per CQ in a cycle (because we just take the heads). So we don't need to prevent any other admissions in this case.
@@ -262,6 +269,9 @@ func (s *Scheduler) schedule(ctx context.Context) { | |||
if err := s.admit(ctx, e, cq.AdmissionChecks); err != nil { | |||
e.inadmissibleMsg = fmt.Sprintf("Failed to admit workload: %v", err) | |||
} | |||
if cq.Cohort != nil { | |||
cycleCohortsSkipPreemption.Insert(cq.Cohort.Name) |
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because if there is a workload admitted during the cycle, then the preemption calculations are no longer valid.
pkg/scheduler/scheduler_test.go
Outdated
@@ -716,6 +717,61 @@ func TestSchedule(t *testing.T) { | |||
"eng-alpha/borrower": *utiltesting.MakeAdmission("eng-alpha").Assignment(corev1.ResourceCPU, "on-demand", "60").Obj(), | |||
}, | |||
}, | |||
"multiple CQs need preemption": { | |||
focus: true, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a leftover, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Deleted.
Request(corev1.ResourceCPU, "1"). | ||
Obj() | ||
gomega.Expect(k8sClient.Create(ctx, preemptorBetaWl)).To(gomega.Succeed()) | ||
util.ExpectWorkloadsToBePreempted(ctx, k8sClient, useAllAlphaWl) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
useAllAlphaWl has higher priority than preemptorBetaWl, why it's preempted?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because it's going over the nominal quota.
util.ExpectWorkloadsToBePreempted(ctx, k8sClient, useAllAlphaWl) | ||
util.FinishEvictionForWorkloads(ctx, k8sClient, useAllAlphaWl) | ||
util.ExpectWorkloadsToBeAdmitted(ctx, k8sClient, preemptorBetaWl) | ||
//util.ExpectPendingWorkloadsMetric(alphaCQ, 2, 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
leftover
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, fixed
035a31b
to
2a13185
Compare
/lgtm |
LGTM label has been added. Git tree hash: 8ee36029553220709ce07c2c91e45dbc2f428d9d
|
Change-Id: I69584dd95c57539e163067a4fb93cdb32fc57461
2a13185
to
c2eeb3c
Compare
/cherry-pick release-0.6 |
@alculquicondor: once the present PR merges, I will cherry-pick it on top of release-0.6 in a new PR and assign it to you. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
@alculquicondor: new pull request created: #1868 In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
What type of PR is this?
/kind bug
What this PR does / why we need it:
The previous check didn't allow any preemptions if the workload has any common resources with pending workloads that were sorted first. In other words, a preemption only was permitted for the first workload in the list for each cohort.
By definition, preempting workloads don't fit in the cohort. Here I propose that, by cohort, we just allow one preemption per cycle if there wasn't already an admission in the same cohort. This could be enhanced with a more comprehensive approach.
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
I propose a follow up: #1867
However, the fix in this PR is a better candidate for backport.
Does this PR introduce a user-facing change?