Fix preemption blocked by earlier pending Workload #1866

alculquicondor · 2024-03-18T19:22:07Z

What type of PR is this?

/kind bug

What this PR does / why we need it:

The previous check didn't allow any preemption if the workload had any resources in common with pending workloads that were sorted earlier. In other words, preemption was permitted only for the first workload in the list for each cohort.

By definition, preempting workloads don't fit in the cohort. Here I propose that, per cohort, we allow just one preemption per cycle, and only if no workload was already admitted in the same cohort during that cycle. This could be enhanced with a more comprehensive approach.
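The per-cohort rule described above can be sketched as follows. This is a minimal illustration, not the scheduler's actual code: `entry`, `mode`, and `decideCycle` are hypothetical names, the set of cohorts is modelled with a plain map, and the quota accounting that the real scheduler performs for workloads that fit is left out.

```go
package main

import "fmt"

// mode is a simplified stand-in for the result of flavor assignment.
type mode int

const (
	fit     mode = iota // the workload fits without preemption
	preempt             // the workload only fits after preempting others
)

// entry is a hypothetical, reduced view of one ClusterQueue head.
type entry struct {
	name   string
	cohort string // empty means the ClusterQueue is standalone
	mode   mode
}

// decideCycle walks the heads in sorted order and returns the decision
// taken for each of them within a single scheduling cycle.
func decideCycle(entries []entry) map[string]string {
	// Cohorts in which a preemption or an admission already happened in
	// this cycle. Further preemptions there are skipped until the next cycle.
	skipPreemption := map[string]bool{}
	decisions := make(map[string]string, len(entries))
	for _, e := range entries {
		inCohort := e.cohort != ""
		switch e.mode {
		case preempt:
			if inCohort && skipPreemption[e.cohort] {
				// Earlier activity in the cohort invalidated the
				// preemption calculations for this workload.
				decisions[e.name] = "skipped"
				continue
			}
			decisions[e.name] = "preempting"
		case fit:
			decisions[e.name] = "admitted"
		}
		if inCohort {
			skipPreemption[e.cohort] = true
		}
	}
	return decisions
}

func main() {
	decisions := decideCycle([]entry{
		{name: "alpha-head", cohort: "eng", mode: preempt},
		{name: "beta-head", cohort: "eng", mode: preempt},
		{name: "standalone-head", cohort: "", mode: preempt},
	})
	fmt.Println(decisions)
}
```

In this sketch a standalone ClusterQueue is never added to the set, so its head can always attempt preemption.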

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

I propose a follow-up: #1867

However, the fix in this PR is a better candidate for backport.

Does this PR introduce a user-facing change?

Fix preemption to reclaim quota that is blocked by an earlier pending Workload from another ClusterQueue in the same cohort.

k8s-ci-robot · 2024-03-18T19:22:13Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alculquicondor

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~pkg/OWNERS~~ [alculquicondor]
~~test/OWNERS~~ [alculquicondor]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

alculquicondor · 2024-03-18T19:22:18Z

/assign @yaroslava-serdiuk

netlify · 2024-03-18T19:22:25Z

✅ Deploy Preview for kubernetes-sigs-kueue canceled.

| Name | Link |
|---|---|
| 🔨 Latest commit | `c2eeb3c` |
| 🔍 Latest deploy log | https://app.netlify.com/sites/kubernetes-sigs-kueue/deploys/65f9c0a61402c200089ceebe |

yaroslava-serdiuk · 2024-03-19T12:56:26Z

pkg/scheduler/scheduler.go

@@ -241,6 +245,9 @@ func (s *Scheduler) schedule(ctx context.Context) {
 					e.inadmissibleMsg += fmt.Sprintf(". Pending the preemption of %d workload(s)", preempted)
 					e.requeueReason = queue.RequeueReasonPendingPreemption
 				}
+				if cq.Cohort != nil {


What if the cohort is nil, that is, the CQ is standalone? The workload can still preempt another workload in this CQ, right?

We only try to admit one workload per CQ in a cycle (because we just take the heads). So we don't need to prevent any other admissions in this case.

alculquicondor · 2024-03-19T14:49:34Z

pkg/scheduler/scheduler.go

@@ -262,6 +269,9 @@ func (s *Scheduler) schedule(ctx context.Context) {
 		if err := s.admit(ctx, e, cq.AdmissionChecks); err != nil {
 			e.inadmissibleMsg = fmt.Sprintf("Failed to admit workload: %v", err)
 		}
+		if cq.Cohort != nil {
+			cycleCohortsSkipPreemption.Insert(cq.Cohort.Name)


Because if there is a workload admitted during the cycle, then the preemption calculations are no longer valid.

yaroslava-serdiuk · 2024-03-19T12:58:32Z

pkg/scheduler/scheduler_test.go

@@ -716,6 +717,61 @@ func TestSchedule(t *testing.T) {
 				"eng-alpha/borrower": *utiltesting.MakeAdmission("eng-alpha").Assignment(corev1.ResourceCPU, "on-demand", "60").Obj(),
 			},
 		},
+		"multiple CQs need preemption": {
+			focus: true,


This is a leftover, right?

Yes. Deleted.

yaroslava-serdiuk · 2024-03-19T15:50:45Z

test/integration/scheduler/preemption_test.go

+				Request(corev1.ResourceCPU, "1").
+				Obj()
+			gomega.Expect(k8sClient.Create(ctx, preemptorBetaWl)).To(gomega.Succeed())
+			util.ExpectWorkloadsToBePreempted(ctx, k8sClient, useAllAlphaWl)


useAllAlphaWl has higher priority than preemptorBetaWl, so why is it preempted?

Because it's going over the nominal quota.
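A small sketch of why priority does not protect a borrower. It assumes the ClusterQueues in the test allow reclaiming nominal quota from any borrowing workload in the cohort (in Kueue this corresponds to a `reclaimWithinCohort: Any` preemption setting; the test's actual configuration is not shown in this thread). The types, the `canPreempt` helper, and all numbers are hypothetical.

```go
package main

import "fmt"

// queueUsage is a hypothetical summary of one ClusterQueue's CPU accounting.
type queueUsage struct {
	nominal int // nominal CPU quota of the ClusterQueue
	used    int // CPU currently used by its admitted workloads
}

// workload is a reduced, illustrative view of a workload.
type workload struct {
	name     string
	queue    string
	priority int
}

// canPreempt reports whether preemptor may evict target under the assumed
// policy: within the same ClusterQueue only lower-priority workloads are
// candidates; across ClusterQueues of a cohort, workloads of a queue that is
// borrowing (used > nominal) are candidates regardless of their priority.
func canPreempt(preemptor, target workload, usage map[string]queueUsage) bool {
	if preemptor.queue == target.queue {
		return target.priority < preemptor.priority
	}
	u := usage[target.queue]
	return u.used > u.nominal
}

func main() {
	usage := map[string]queueUsage{
		"alpha": {nominal: 2, used: 4}, // alpha is borrowing from the cohort
		"beta":  {nominal: 2, used: 0},
	}
	useAllAlpha := workload{name: "use-all", queue: "alpha", priority: 100}
	preemptorBeta := workload{name: "preemptor", queue: "beta", priority: 0}
	fmt.Println(canPreempt(preemptorBeta, useAllAlpha, usage))
}
```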

yaroslava-serdiuk · 2024-03-19T15:51:02Z

test/integration/scheduler/preemption_test.go

+			util.ExpectWorkloadsToBePreempted(ctx, k8sClient, useAllAlphaWl)
+			util.FinishEvictionForWorkloads(ctx, k8sClient, useAllAlphaWl)
+			util.ExpectWorkloadsToBeAdmitted(ctx, k8sClient, preemptorBetaWl)
+			//util.ExpectPendingWorkloadsMetric(alphaCQ, 2, 0)


Oops, fixed

yaroslava-serdiuk · 2024-03-19T16:40:11Z

/lgtm

k8s-ci-robot · 2024-03-19T16:40:16Z

LGTM label has been added.

Git tree hash: 8ee36029553220709ce07c2c91e45dbc2f428d9d

Change-Id: I69584dd95c57539e163067a4fb93cdb32fc57461

alculquicondor · 2024-03-19T16:54:05Z

/cherry-pick release-0.6

k8s-infra-cherrypick-robot · 2024-03-19T16:54:08Z

@alculquicondor: once the present PR merges, I will cherry-pick it on top of release-0.6 in a new PR and assign it to you.

In response to this:

/cherry-pick release-0.6

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-infra-cherrypick-robot · 2024-03-19T17:03:03Z

@alculquicondor: new pull request created: #1868



k8s-ci-robot added the release-note and kind/bug labels Mar 18, 2024

k8s-ci-robot requested review from denkensk and trasc March 18, 2024 19:22

k8s-ci-robot added the cncf-cla: yes and approved labels Mar 18, 2024

k8s-ci-robot assigned yaroslava-serdiuk Mar 18, 2024

k8s-ci-robot added the size/L label Mar 18, 2024

alculquicondor mentioned this pull request Mar 18, 2024

Rework implementation for multiple admissions per cycle #1867

Closed

alculquicondor force-pushed the blocked-preemption branch 3 times, most recently from 257c688 to b0a1338 March 18, 2024 20:21

yaroslava-serdiuk reviewed Mar 19, 2024


alculquicondor force-pushed the blocked-preemption branch from 035a31b to 2a13185 March 19, 2024 16:32

k8s-ci-robot added the lgtm label Mar 19, 2024

Fix preemption blocked by earlier pending Workload

c2eeb3c

Change-Id: I69584dd95c57539e163067a4fb93cdb32fc57461

alculquicondor force-pushed the blocked-preemption branch from 2a13185 to c2eeb3c March 19, 2024 16:43

k8s-ci-robot merged commit ba46285 into kubernetes-sigs:main Mar 19, 2024
14 checks passed

k8s-ci-robot added this to the v0.7 milestone Mar 19, 2024

k8s-infra-cherrypick-robot mentioned this pull request Mar 19, 2024

[release-0.6] Fix preemption blocked by earlier pending Workload #1868

Merged

vsoch pushed a commit to researchapps/kueue that referenced this pull request Apr 18, 2024

Fix preemption blocked by earlier pending Workload (kubernetes-sigs#1866)

dfddc64

Change-Id: I69584dd95c57539e163067a4fb93cdb32fc57461

kannon92 pushed a commit to openshift-kannon92/kubernetes-sigs-kueue that referenced this pull request Nov 19, 2024

Fix preemption blocked by earlier pending Workload (kubernetes-sigs#1866)

ab8796a

Change-Id: I69584dd95c57539e163067a4fb93cdb32fc57461
