Fix preemption algorithm to reduce the number of preemptions #1979

Merged
k8s-ci-robot merged 6 commits into kubernetes-sigs:main from the reduce-preemptions branch on Apr 15, 2024

Conversation

Contributor

@mimowo mimowo commented Apr 12, 2024

What type of PR is this?

/kind bug

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes #1974

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Fix preemption algorithm to reduce the number of preemptions within a ClusterQueue when reclamation is not possible.
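
To illustrate the release note above, here is a toy model, not Kueue's implementation. It assumes the extra preemptions come from requiring the incoming workload to fit under the nominal quota even when the ClusterQueue is already above it, whereas allowing it to fit under nominal quota plus a borrowing limit needs fewer evictions. The Workload type, the minimalTargets helper, and all numbers are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// Workload is a hypothetical, simplified admitted workload (not Kueue's type).
type Workload struct {
	Name     string
	Priority int
	Usage    int // units of one resource in one flavor
}

// minimalTargets picks candidates, lowest priority first, until the request
// fits under limit, then restores any pick that is not actually needed.
// It returns nil when the request fits without preemption or when even
// preempting every candidate is not enough.
func minimalTargets(request, used, limit int, candidates []Workload) []Workload {
	sorted := append([]Workload(nil), candidates...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].Priority < sorted[j].Priority
	})

	var picked []Workload
	for _, c := range sorted {
		if used+request <= limit {
			break
		}
		picked = append(picked, c)
		used -= c.Usage
	}
	if used+request > limit {
		return nil // cannot fit even after preempting every candidate
	}

	var targets []Workload
	for i := len(picked) - 1; i >= 0; i-- {
		if used+picked[i].Usage+request <= limit {
			used += picked[i].Usage // fits without this one: restore it
		} else {
			targets = append(targets, picked[i])
		}
	}
	return targets
}

func main() {
	admitted := []Workload{{"a", 1, 2}, {"b", 1, 2}, {"c", 1, 2}}
	const nominal, borrowingLimit, used, request = 4, 2, 6, 2

	strict := minimalTargets(request, used, nominal, admitted)
	relaxed := minimalTargets(request, used, nominal+borrowingLimit, admitted)
	fmt.Println(len(strict), len(relaxed)) // prints "2 1"
}
```

In this model the queue already uses 6 units against a nominal quota of 4. Fitting a 2-unit request under the nominal quota evicts two workloads, while fitting it under the assumed borrowing limit evicts one.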

@k8s-ci-robot added the following labels on Apr 12, 2024: release-note (denotes a PR that will be considered when it comes time to generate release notes), do-not-merge/work-in-progress (indicates that a PR should not merge because it is a work in progress), kind/bug (categorizes issue or PR as related to a bug), cncf-cla: yes (indicates the PR's author has signed the CNCF CLA)
@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 12, 2024
@mimowo
Contributor Author

mimowo commented Apr 12, 2024

/cc @alculquicondor

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Apr 12, 2024

netlify bot commented Apr 12, 2024

Deploy Preview for kubernetes-sigs-kueue canceled.

🔨 Latest commit: 7ca5892
🔍 Latest deploy log: https://app.netlify.com/sites/kubernetes-sigs-kueue/deploys/661d3a429236bb0008c1e581

@mimowo changed the title from "[WIP] Fix preemption algorithm to reduce the number of preemptions" to "Fix preemption algorithm to reduce the number of preemptions" on Apr 12, 2024
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 12, 2024
pkg/scheduler/preemption/preemption.go (review thread resolved)
pkg/scheduler/scheduler_test.go (outdated review thread, resolved)
pkg/scheduler/scheduler_test.go (review thread resolved)
if isQueueExhaustedForAllRequestedFlavors(&wl, assignment, cq) {
	return minimalPreemptions(&wl, assignment, snapshot, resPerFlv, sameQueueCandidates, true, nil)
}

targets := minimalPreemptions(&wl, assignment, snapshot, resPerFlv, candidates, false, nil)
if len(targets) == 0 {
Contributor

@alculquicondor alculquicondor Apr 12, 2024

Are we still reaching this branch with any of the test cases? In particular, for reclaimWithinCohort: Any.

Contributor

I'm tempted to remove it.

Contributor Author

@mimowo mimowo Apr 15, 2024

Yes, a unit test in preemption_test.go fails without this branch: "preempting locally and borrowing other resources in cohort, without cohort candidates". It uses reclaimWithinCohort: LowerPriority, but I checked that it also fails if we change reclaimWithinCohort to Any.

I would suggest keeping it (at least in this PR, and probably its removal would require a feature gate).

However, your comment prompted me to think of a cleaner code structure, so that the call return minimalPreemptions(wlReq, cq, assignment, snapshot, resPerFlv, sameQueueCandidates, true, nil) appears only once in the code. PTAL.
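
One possible shape for such a structure is sketched below. This is an illustration under assumptions, not the code that was merged: the Targets type, the selectTargets helper, and the two closures are hypothetical stand-ins for the minimalPreemptions calls over all candidates and over sameQueueCandidates.

```go
package main

import "fmt"

// Targets is a hypothetical stand-in for the list of workloads to preempt.
type Targets []string

// selectTargets runs the cohort-wide search only when the queue is not
// exhausted. If that search is skipped or finds nothing, control falls
// through to a single same-queue call site.
func selectTargets(queueExhausted bool, cohortWide, sameQueue func() Targets) Targets {
	if !queueExhausted {
		if targets := cohortWide(); len(targets) > 0 {
			return targets
		}
	}
	return sameQueue() // the only place the same-queue search is invoked
}

func main() {
	cohortWide := func() Targets { return nil }
	sameQueue := func() Targets { return Targets{"low-priority-wl"} }

	// Exhausted queue: the cohort-wide search is skipped entirely.
	fmt.Println(selectTargets(true, cohortWide, sameQueue))
	// Non-exhausted queue with no cohort-wide targets: same fallback path.
	fmt.Println(selectTargets(false, cohortWide, sameQueue))
}
```

Inverting the condition turns the early return into a fall-through, so both the exhausted-queue case and the empty-result case share one fallback statement.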

@mimowo mimowo force-pushed the reduce-preemptions branch 2 times, most recently from f543566 to 1d42262 Compare April 15, 2024 07:39
@k8s-ci-robot added the size/XL label (denotes a PR that changes 500-999 lines, ignoring generated files) and removed the size/L label (denotes a PR that changes 100-499 lines, ignoring generated files) on Apr 15, 2024
pkg/scheduler/preemption/preemption_test.go (outdated review thread, resolved)
pkg/scheduler/preemption/preemption_test.go (outdated review thread, resolved)
pkg/scheduler/preemption/preemption_test.go (outdated review thread, resolved)
pkg/scheduler/scheduler_test.go (review thread resolved)
@k8s-ci-robot k8s-ci-robot removed the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Apr 15, 2024
@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Apr 15, 2024
Contributor

@alculquicondor alculquicondor left a comment

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 15, 2024
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: 3056eeffb692abb2029f27bed48374df8e635da4

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alculquicondor, mimowo

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@alculquicondor
Contributor

/release-note-edit

Fix preemption algorithm to reduce the number of preemptions within a ClusterQueue when reclamation is not possible.

@k8s-ci-robot
Contributor

@alculquicondor: /release-note-edit must be used with a release note block.

In response to this:

/release-note-edit

Fix preemption algorithm to reduce the number of preemptions within a ClusterQueue when reclamation is not possible.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@alculquicondor
Contributor

/release-note-edit

Fix preemption algorithm to reduce the number of preemptions within a ClusterQueue when reclamation is not possible.

@alculquicondor
Contributor

/cherry-pick release-0.6

@k8s-infra-cherrypick-robot
Contributor

@alculquicondor: once the present PR merges, I will cherry-pick it on top of release-0.6 in a new PR and assign it to you.

In response to this:

/cherry-pick release-0.6


@k8s-ci-robot k8s-ci-robot merged commit 0625969 into kubernetes-sigs:main Apr 15, 2024
14 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v0.7 milestone Apr 15, 2024
@k8s-infra-cherrypick-robot
Contributor

@alculquicondor: #1979 failed to apply on top of branch "release-0.6":

Applying: Fix preemption algorithm to reduce the number of preemptions
Using index info to reconstruct a base tree...
M	pkg/scheduler/preemption/preemption.go
M	pkg/scheduler/scheduler_test.go
Falling back to patching base and 3-way merge...
Auto-merging pkg/scheduler/scheduler_test.go
CONFLICT (content): Merge conflict in pkg/scheduler/scheduler_test.go
Auto-merging pkg/scheduler/preemption/preemption.go
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0001 Fix preemption algorithm to reduce the number of preemptions
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

In response to this:

/cherry-pick release-0.6


vsoch pushed a commit to researchapps/kueue that referenced this pull request Apr 18, 2024
…tes-sigs#1979)

* Fix preemption algorithm to reduce the number of preemptions

* review

* Update pkg/scheduler/preemption/preemption_test.go

Co-authored-by: Aldo Culquicondor <[email protected]>

* Update pkg/scheduler/preemption/preemption_test.go

Co-authored-by: Aldo Culquicondor <[email protected]>

* Update pkg/scheduler/preemption/preemption_test.go

Co-authored-by: Aldo Culquicondor <[email protected]>

* remarks2

---------

Co-authored-by: Aldo Culquicondor <[email protected]>
@mimowo mimowo deleted the reduce-preemptions branch May 29, 2024 14:53
kannon92 pushed a commit to openshift-kannon92/kubernetes-sigs-kueue that referenced this pull request Nov 19, 2024
…tes-sigs#1979)

* Fix preemption algorithm to reduce the number of preemptions

* review

* Update pkg/scheduler/preemption/preemption_test.go

Co-authored-by: Aldo Culquicondor <[email protected]>

* Update pkg/scheduler/preemption/preemption_test.go

Co-authored-by: Aldo Culquicondor <[email protected]>

* Update pkg/scheduler/preemption/preemption_test.go

Co-authored-by: Aldo Culquicondor <[email protected]>

* remarks2

---------

Co-authored-by: Aldo Culquicondor <[email protected]>
Labels
approved (indicates a PR has been approved by an approver from all required OWNERS files)
cncf-cla: yes (indicates the PR's author has signed the CNCF CLA)
kind/bug (categorizes issue or PR as related to a bug)
lgtm ("Looks good to me", indicates that a PR is ready to be merged)
release-note (denotes a PR that will be considered when it comes time to generate release notes)
size/L (denotes a PR that changes 100-499 lines, ignoring generated files)
Development

Successfully merging this pull request may close these issues.

Too many preemptions within ClusterQueue when already above nominal quota
4 participants