resources are not scheduled fairly among competing queues of job requests #939

garlick · 2022-05-16T22:16:28Z

Problem: when multiple queues are defined, fluxion schedules all jobs in the first queue before looking at the next queue. The queues are thus implicitly prioritized according to the order defined, with no available mechanisms to prevent starvation or ensure progress for all queues.

So if for example we define a debug queue and a batch queue, all jobs from the debug queue would be assigned resources before jobs in the batch queue. As long as there is a steady supply of jobs in the debug queue, nothing in the batch queue would run.
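The behavior described above can be illustrated with a small model. This is only a sketch, not fluxion's actual code: the `Queue` struct, the `schedule_in_order` function, and the idea of a job being just a node count are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <vector>

// Hypothetical model: each queue holds pending jobs, each needing some nodes.
struct Queue {
    std::string name;
    std::deque<int> pending;  // node count requested by each pending job
};

// One scheduling pass that drains queues strictly in definition order.
// Returns the number of jobs started per queue, in the same order as `queues`.
std::vector<int> schedule_in_order(std::vector<Queue>& queues, int free_nodes) {
    assert(free_nodes >= 0);
    std::vector<int> started(queues.size(), 0);
    for (std::size_t i = 0; i < queues.size(); ++i) {
        Queue& q = queues[i];
        // Keep starting jobs from this queue while the head job still fits.
        while (!q.pending.empty() && q.pending.front() <= free_nodes) {
            free_nodes -= q.pending.front();
            q.pending.pop_front();
            ++started[i];
        }
    }
    return started;
}
```

With a debug queue listed first that holds four 1-node jobs, a batch queue holding one 2-node job, and 4 free nodes, the debug queue takes every node and the batch job stays pending. If the debug queue is refilled before each pass, the batch job never starts.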

To meet short term goals outlined in flux-framework/flux-core#4306, we will need to employ resource constraints to ensure that queues are not in competition for the same resources.

Longer term, it would be neat to be able to fulfill, say, batch and debug queues from the same resource set without needing to designate certain nodes in a pool.

See also discussion in flux-framework/rfc#332

dongahn · 2022-05-16T23:47:54Z

Thanks @garlick. This captures our coffee discussion very well, and thank you for opening the issue.

For the case where we have multiple queues with overlapping resources, we will need more sophisticated cross-queue priority/fairness schemes.

- It could be as simple as a fixed queue priority scheme: https://www.geeksforgeeks.org/multilevel-queue-mlq-cpu-scheduling/ (although it won't work for complex cases)
- Max-min fairness, or dominant resource fairness, which is a generalization of max-min fairness
- Other Flux-specific custom schemes
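For reference, max-min fairness over a single resource type can be sketched as follows. This is a generic textbook formulation, not anything taken from flux-sched; the function name `max_min_fair` and the use of a per-queue scalar demand are assumptions. Queues are visited from smallest to largest demand, and each receives either its full demand or an equal share of what remains, whichever is smaller.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Max-min fair split of `capacity` among queues with the given demands.
// Returns one allocation per queue, in the same order as `demand`.
std::vector<double> max_min_fair(const std::vector<double>& demand, double capacity) {
    assert(capacity >= 0.0);
    const std::size_t n = demand.size();
    std::vector<double> alloc(n, 0.0);

    // Indices of queues sorted by increasing demand.
    std::vector<std::size_t> order(n);
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::sort(order.begin(), order.end(),
              [&](std::size_t a, std::size_t b) { return demand[a] < demand[b]; });

    double remaining = capacity;
    for (std::size_t k = 0; k < n; ++k) {
        // Equal share of what is left among the queues not yet served.
        const double share = remaining / static_cast<double>(n - k);
        const std::size_t i = order[k];
        alloc[i] = std::min(demand[i], share);
        remaining -= alloc[i];
    }
    return alloc;
}
```

For demands of 10, 2, and 8 nodes with 12 nodes available, the small queue gets its full 2 nodes and the other two split the remaining 10 evenly. Dominant resource fairness extends the same idea to several resource types and is not shown here.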

Since our software architecture iterates through queues in a central place, it should be quite possible to extend the scheme to incorporate a different cross-queue iteration order or other schemes (e.g., C++ std containers are very flexible for making this kind of change: https://github.com/flux-framework/flux-sched/blob/master/qmanager/modules/qmanager_callbacks.cpp#L393).

Also, each queue can easily have its own per-queue controls (e.g., queue-depth), which are also extensible; these can serve as building blocks for devising Flux-specific cross-queue fairness for overlapping resources.

As we discussed, it would be best to capture "minimum viable in a near future based on the use cases" vs. future desires.

dongahn · 2022-05-16T23:51:47Z

> Also, each queue can easily have its own per-queue controls (e.g., queue-depth), which are also extensible; these can serve as building blocks for devising Flux-specific cross-queue fairness for overlapping resources.

Maybe we could still do this through some per-queue limit as well: schedule debug unless the resources its jobs use would go above 10% of the total resources, etc.
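A per-queue limit of this kind could be checked with something like the sketch below. The `QueueLimit` struct and `within_limit` function are hypothetical and do not correspond to any existing qmanager option; the cap is stored as an integer percentage so the comparison avoids floating-point rounding.

```cpp
#include <cassert>

// Hypothetical per-queue cap: the queue may hold at most `max_percent`
// percent of all nodes at any one time.
struct QueueLimit {
    int max_percent;   // e.g. 10 for "at most 10% of resources"
    int nodes_in_use;  // nodes currently held by this queue's running jobs
};

// True if starting a job that needs `job_nodes` keeps the queue within its cap.
bool within_limit(const QueueLimit& q, int job_nodes, int total_nodes) {
    assert(total_nodes > 0 && job_nodes >= 0);
    // Compare in integer arithmetic: used/total <= max_percent/100.
    return (q.nodes_in_use + job_nodes) * 100 <= q.max_percent * total_nodes;
}
```

On a 100-node system with a 10% cap and 8 nodes already in use by debug jobs, a 2-node debug job is allowed and a 3-node debug job is held back, leaving the rest of the machine for other queues.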

dongahn · 2022-05-17T14:52:54Z

Using this ticket to collect more thoughts per flux-framework/rfc#332 (comment).

A few things on which one could make some incremental progress while observing the effects:

- Assign a priority to each queue so that a queue with a higher priority is scheduled first in each schedule cycle (just a matter of adding a custom comparator to std::map)
- Apply a resource limit to each queue (can be looked at as we are coming up with a new configuration consistent
- Define relative shares among queues and use that to provide max-min fairness among queues
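The custom-comparator idea for queue priority could look roughly like this. It is a sketch under assumptions: `QueueKey`, `HigherPriorityFirst`, `QueueMap`, and `iteration_order` are made-up names, and the actual key and value types of the map in qmanager are not reproduced here.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical map key: a queue's configured priority plus its name.
struct QueueKey {
    int priority;
    std::string name;
};

// Orders keys so that higher-priority queues are iterated first;
// ties are broken by name to keep the ordering strict and deterministic.
struct HigherPriorityFirst {
    bool operator()(const QueueKey& a, const QueueKey& b) const {
        if (a.priority != b.priority)
            return a.priority > b.priority;
        return a.name < b.name;
    }
};

// The mapped value stands in for per-queue state (here: pending job sizes).
using QueueMap = std::map<QueueKey, std::vector<int>, HigherPriorityFirst>;

// Names of the queues in the order a scheduling loop over the map visits them.
std::vector<std::string> iteration_order(const QueueMap& queues) {
    std::vector<std::string> names;
    for (const auto& kv : queues)
        names.push_back(kv.first.name);
    return names;
}
```

Because std::map keeps its entries sorted by the comparator, a plain range-for over the map visits queues in priority order without any change to the loop itself. A fixed priority alone still lets a busy high-priority queue starve the others, which is why it would need to be combined with limits or shares.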

garlick mentioned this issue May 17, 2022

rfc33: add queues RFC flux-framework/rfc#332

Merged

garlick mentioned this issue Sep 20, 2022

ingest: set configured queue constraints flux-framework/flux-core#4587

Merged

jameshcorbett self-assigned this Oct 5, 2022

garlick mentioned this issue Jan 23, 2023

don't allow overlapping queues to be simultaneously started flux-framework/flux-core#4884

Open
