Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Distinguish between Preemption due to reclamation and fair sharing #2411

Merged
merged 6 commits into from
Jul 10, 2024

Conversation

vladikkuzn
Copy link
Contributor

@vladikkuzn vladikkuzn commented Jun 14, 2024

What type of PR is this?

/kind feature

What this PR does / why we need it:

Introduces an additional reason in the Preemption condition to indicate that the preemption was due to fair sharing and not regular reclamation.

Which issue(s) this PR fixes:

Fixes #2404

Special notes for your reviewer:

Does this PR introduce a user-facing change?

More granular Preemption condition reasons: InClusterQueue, InCohortReclamation, InCohortFairSharing, InCohortReclaimWhileBorrowing

@k8s-ci-robot
Copy link
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/feature Categorizes issue or PR as related to a new feature. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Jun 14, 2024
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jun 14, 2024
@vladikkuzn
Copy link
Contributor Author

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jun 14, 2024
Copy link

netlify bot commented Jun 14, 2024

Deploy Preview for kubernetes-sigs-kueue canceled.

Name Link
🔨 Latest commit f62b1bb
🔍 Latest deploy log https://app.netlify.com/sites/kubernetes-sigs-kueue/deploys/668e934828cd2f0008a238a5

@vladikkuzn
Copy link
Contributor Author

/assign @vladikkuzn @trasc

@vladikkuzn vladikkuzn force-pushed the detailed-preemption-condition branch from 45eee46 to abb03fe Compare June 17, 2024 00:10
@vladikkuzn vladikkuzn force-pushed the detailed-preemption-condition branch from abb03fe to 1223f92 Compare June 17, 2024 09:38
@vladikkuzn vladikkuzn force-pushed the detailed-preemption-condition branch from 1223f92 to 5e7d6b6 Compare June 19, 2024 12:52
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Jun 19, 2024
@vladikkuzn
Copy link
Contributor Author

/cc @alculquicondor

pkg/scheduler/preemption/preemption.go Outdated Show resolved Hide resolved
pkg/scheduler/preemption/preemption.go Outdated Show resolved Hide resolved
@alculquicondor
Copy link
Contributor

/assign @gabesaba

* Rewrite to Targets
* Cover CohortFairSharing
@vladikkuzn vladikkuzn force-pushed the detailed-preemption-condition branch from 3d8d403 to 2f68cca Compare July 9, 2024 08:15
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 9, 2024
@vladikkuzn vladikkuzn force-pushed the detailed-preemption-condition branch from 2f68cca to d15223d Compare July 9, 2024 08:18
* Dedicated GetWorkloadReferences function
* Extract humanReadablePreemptionReasons
* !strategy && belowThreshold -> InCohortReclaimWhileBorrowingReason
* ExpectPreemptedCondition test util
Copy link
Contributor

@alculquicondor alculquicondor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve
/hold for nits

I'll leave LGTM to @trasc

pkg/scheduler/preemption/preemption.go Outdated Show resolved Hide resolved
pkg/scheduler/preemption/preemption.go Outdated Show resolved Hide resolved
@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 9, 2024
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alculquicondor, vladikkuzn

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 9, 2024
* Rename WL -> WorkloadInfo
* Move getWorkloadReferences to logging
@vladikkuzn vladikkuzn force-pushed the detailed-preemption-condition branch from 5977741 to 7327a1e Compare July 10, 2024 11:00
@vladikkuzn
Copy link
Contributor Author

/retest

Copy link
Contributor

@trasc trasc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only one NIT, otherwise LGTM

pkg/scheduler/logging.go Outdated Show resolved Hide resolved
@trasc
Copy link
Contributor

trasc commented Jul 10, 2024

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 10, 2024
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: 7b96a661745f64e63ec8b9cab6360cf63ea769ae

@vladikkuzn
Copy link
Contributor Author

/retest

@trasc
Copy link
Contributor

trasc commented Jul 10, 2024

/unhold

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 10, 2024
@k8s-ci-robot k8s-ci-robot merged commit 5bca510 into kubernetes-sigs:main Jul 10, 2024
16 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v0.8 milestone Jul 10, 2024
log.V(3).Info("Preempted", "targetWorkload", klog.KObj(target.Obj), "reason", reason, "message", message)
p.recorder.Eventf(target.Obj, corev1.EventTypeNormal, "Preempted", message)
metrics.ReportEvictedWorkloads(target.ClusterQueue, kueue.WorkloadEvictedByPreemption)
log.V(3).Info("Preempted", "targetWorkload", klog.KObj(target.WorkloadInfo.Obj), "reason", target.Reason, "message", message)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also please add targetClusterQueue?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

@vladikkuzn vladikkuzn Jul 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added in #2538
WDYT should I also add preemptor's cluster queue?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The preemptor's information is already part of the log (it comes from the scheduler.go file, which adds the fields to the contextual log).

@vladikkuzn vladikkuzn deleted the detailed-preemption-condition branch July 11, 2024 09:20
// the function.
allowBorrowing = false
} else {
reason = InCohortReclaimWhileBorrowingReason
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a bug here.

If allowBorrowingBelowPriority is nil, then the reason should stay as InCohortReclamationReason.

Copy link
Contributor

@alculquicondor alculquicondor Jul 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it should be more like

if allowBorrowingBelowPriority != nil {
  if priority.Priority(candWl.Obj) >= *allowBorrowingBelowPriority {
    allowBorrowing = false
  } else {
    reason = InCohortReclaimWhileBorrowingReason
  }
}

@alculquicondor
Copy link
Contributor

It looks like we forgot to change the release notes to match the review changes.

/release-note-edit

More granular Preemption condition reasons: InClusterQueue, InCohortReclamation, InCohortFairSharing, InCohortReclaimWhileBorrowing

kannon92 pushed a commit to openshift-kannon92/kubernetes-sigs-kueue that referenced this pull request Nov 19, 2024
…ubernetes-sigs#2411)

* Distinguish between Preemption due to reclamation and fair sharing

* Distinguish between Preemption due to reclamation and fair sharing

* extract getPreemptionReason
* borrowing in favour of reclamation

* Distinguish between Preemption due to reclamation and fair sharing

* Rewrite to Targets
* Cover CohortFairSharing

* Distinguish between Preemption due to reclamation and fair sharing

* Dedicated GetWorkloadReferences function
* Extract humanReadablePreemptionReasons
* !strategy && belowThreshold -> InCohortReclaimWhileBorrowingReason
* ExpectPreemptedCondition test util

* Distinguish between Preemption due to reclamation and fair sharing

* Rename WL -> WorkloadInfo
* Move getWorkloadReferences to logging

* Distinguish between Preemption due to reclamation and fair sharing

* Use slice in getWorkloadReferences
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Distinguish between Preemption due to reclamation and fair sharing
6 participants