Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not consider priorities when sorting workloads from different ClusterQueues #1283

Closed
3 tasks done
Tracked by #1269
alculquicondor opened this issue Oct 27, 2023 · 9 comments · Fixed by #1597
Closed
3 tasks done
Tracked by #1269
Assignees
Labels
kind/feature Categorizes issue or PR as related to a new feature.

Comments

@alculquicondor
Copy link
Contributor

What would you like to be added:

Use priorities to sort workloads within a ClusterQueue, but ignore the priorities when sorting the heads of multiple cluster queues within a cohort.

The design needs to avoid the race condition presented here #1024

Why is this needed:

In organizations where teams do not know each other, they might be incentivized to use higher priorities to always be ahead of the rest. Ignoring priorities across ClusterQueues would remove this incentive, while allowing users to use priorities within their ClusterQueue.

Maybe this calls for a setting in a new Cohort object.

Completion requirements:

This enhancement requires the following artifacts:

  • Design doc
  • API change
  • Docs update

The artifacts should be linked in subsequent comments.

@alculquicondor alculquicondor added the kind/feature Categorizes issue or PR as related to a new feature. label Oct 27, 2023
@yaroslava-serdiuk
Copy link
Contributor

/assign

@yaroslava-serdiuk
Copy link
Contributor

Having that StrictFIFO is actually a priority queueing, I couldn't reproduce the bug that was described in #1024. Maybe there was no bug but actually not desired behavior when low priority workload was admitted and it was preempted just in the next scheduling cycle and we wanted to avoid such scenario and schedule higher priority workload in the first place?

Anyway, my understanding is that we want to have the priority sorting in the scheduling loop as optional (enabled by default), so feature gate could be a solution here.

@alculquicondor
Copy link
Contributor Author

alculquicondor commented Nov 30, 2023

IIRC, the case was like this:
image

The important bit here is that the job X can fit by borrowing. But borrowing workloads are sorted last

// 1. Request under nominal quota.
aBorrows := a.assignment.Borrows()
bBorrows := b.assignment.Borrows()
if aBorrows != bBorrows {
return !aBorrows
}

But you are right, priorities shouldn't really matter, just the timing.

However, I think this was solved by #1039. Before that, the only option was to give jobs in Team-A-Standard a higher priority.

If we can prove that a higher priority is no longer necessary (through an integration test that does the above), then we can proceed with optionally disabling priority checks in the cohort.

@alculquicondor
Copy link
Contributor Author

alculquicondor commented Nov 30, 2023

The example above has some problems
but we can implement this much simpler case

image

workload #4 shouldn't be blocked.

@yaroslava-serdiuk
Copy link
Contributor

yaroslava-serdiuk commented Dec 5, 2023

I've added a test cases that you described #1283 (comment) in #1399 and it turns out that the pending workload is still blocking borrowing. Note, that priorities doesn't matter here, I putted the highest priority for borrowing workload. The reason is that the pending workload has "Preempt" FlavorAssignmentMode and has no borrowing, so in sorting it goes before any workload that requires borrowing and since it considered as Preempt, scheduler accounts for its resources in the cohort.

It's definitely a bug, however it's not related to disabling priority feature. So I've added a feature gate to a separate PR #1406 and will look further how to fix this bug.

@alculquicondor
Copy link
Contributor Author

alculquicondor commented Dec 5, 2023

For completeness when investigating a solution: in the following case, workload #3 should be blocked, because #2 (of size 2) is waiting to use its nominal quota.

image

See this comment:

// Preempt means that there is not enough unused min quota in the ClusterQueue
// or cohort. Preempting other workloads in the CluserQueue or cohort, or
// waiting for them to finish might make it possible to assign this flavor.
Preempt

I think a potential solution could be to change this logic

sum := cycleCohortsUsage.totalUsageForCommonFlavorResources(cq.Cohort.Name, e.assignment.Usage)
to only "use" (or block) up to the nominal quota.

@yaroslava-serdiuk
Copy link
Contributor

Probably adding minimum from workload resources and clusterQueue nominal resources that are left here will help

@tenzen-y
Copy link
Member

@yaroslava-serdiuk @alculquicondor I think this is almost done.

I guess that leaving task is adding docs for featureGate, PrioritySortingWithinCohort?
https://github.com/kubernetes-sigs/kueue/blob/main/site/content/en/docs/installation/_index.md#change-the-feature-gates-configuration

@alculquicondor
Copy link
Contributor Author

yes, we should document this feature.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/feature Categorizes issue or PR as related to a new feature.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants