resources are not scheduled fairly among competing queues of job requests #939
Comments
Thanks @garlick. This captures our coffee discussion very well, and thank you for opening the issue to record it. For the case where we have multiple queues with overlapping resources, we will need more sophisticated cross-queue priority/fairness schemes.
At least, because our software architecture iterates through queues in a central place, it should be very possible to extend the scheme to incorporate a different cross-queue iteration order or other schemes (e.g., C++ std containers are very flexible for making this kind of change: https://github.com/flux-framework/flux-sched/blob/master/qmanager/modules/qmanager_callbacks.cpp#L393). Also, each queue can easily have per-queue controls (e.g., queue-depth), which are also extensible; these can serve as a basis for devising Flux-specific cross-queue fairness for overlapping resources. As we discussed, it would be best to capture "minimum viable in the near future based on the use cases" vs. future desires.
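As one illustration of a different cross-queue iteration order, here is a minimal Python sketch of round-robin iteration: each pass over the queues starts at most one job per queue, so no queue is drained completely before the next one is considered. This is a toy model, not qmanager code; the function name `schedule_round_robin`, the `(job, nodes)` tuples, and the single `free_nodes` counter are all assumptions made for illustration.

```python
from collections import deque


def schedule_round_robin(queues, free_nodes):
    """Start at most one job per queue per pass until no queue head fits.

    queues: dict mapping queue name -> deque of (job_name, node_count),
            iterated in definition order (hypothetical data model).
    free_nodes: number of currently idle nodes.
    Returns the list of (queue_name, job_name) started, in start order.
    """
    started = []
    progress = True
    while progress:
        progress = False
        for name, jobs in queues.items():
            # Only the head of each queue is considered (no backfill).
            if jobs and jobs[0][1] <= free_nodes:
                job, nodes = jobs.popleft()
                free_nodes -= nodes
                started.append((name, job))
                progress = True
    return started


queues = {
    "debug": deque([("d0", 1), ("d1", 1), ("d2", 1), ("d3", 1)]),
    "batch": deque([("b0", 1), ("b1", 1)]),
}
order = schedule_round_robin(queues, free_nodes=4)
# order alternates between the two queues: d0, b0, d1, b1
```

With four free nodes, both queues get two jobs started, whereas strict in-order iteration would hand all four nodes to debug.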
Maybe we could still do this through some per-queue limit as well: schedule debug unless the resources its jobs use go above 10% of resources, etc.
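A rough Python sketch of such a per-queue limit follows. It assumes the cap is expressed as a percentage of total nodes and that a job is simply held when starting it would push the queue above the cap; the names `fits_under_cap` and `admit_jobs` are hypothetical and do not correspond to any existing Flux option.

```python
def fits_under_cap(used_nodes, job_nodes, total_nodes, cap_percent):
    """True if starting the job keeps the queue at or below its cap.

    Integer arithmetic avoids floating-point rounding at the boundary.
    """
    return (used_nodes + job_nodes) * 100 <= cap_percent * total_nodes


def admit_jobs(pending, total_nodes, cap_percent, used_nodes=0):
    """Walk pending (job_name, node_count) pairs in order.

    Jobs that fit under the cap are started; the rest are held.
    Returns (started, held) lists of job names.
    """
    started, held = [], []
    for job, nodes in pending:
        if fits_under_cap(used_nodes, nodes, total_nodes, cap_percent):
            used_nodes += nodes
            started.append(job)
        else:
            held.append(job)
    return started, held


# Debug queue capped at 10% of a 100-node system.
started, held = admit_jobs(
    [("d0", 4), ("d1", 4), ("d2", 4), ("d3", 2)],
    total_nodes=100,
    cap_percent=10,
)
# started == ["d0", "d1", "d3"], held == ["d2"]
```

Note that in this sketch a smaller job (d3) can start ahead of a held larger one (d2); whether that is desirable is a policy choice the comment above does not settle.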
Using this ticket to collect more thoughts per flux-framework/rfc#332 (comment). Two things on which one could possibly make some incremental progress while observing the effects:
Problem: when multiple queues are defined, fluxion schedules all jobs in the first queue before looking at the next queue. The queues are thus implicitly prioritized according to the order defined, with no available mechanisms to prevent starvation or ensure progress for all queues.
So if for example we define a debug queue and a batch queue, all jobs from the debug queue would be assigned resources before jobs in the batch queue. As long as there is a steady supply of jobs in the debug queue, nothing in the batch queue would run.
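The starvation scenario can be reproduced with a small toy simulation in Python. It is only a sketch under simplifying assumptions that are not taken from fluxion: every job needs one node and runs for exactly one scheduling cycle, the debug queue receives enough new jobs each cycle to fill the machine, and the scheduler drains queues strictly in definition order. The names `schedule_in_queue_order` and `simulate` are hypothetical.

```python
from collections import deque


def schedule_in_queue_order(queues, free_nodes):
    """Drain each queue in definition order before looking at the next."""
    started = []
    for name, jobs in queues.items():
        while jobs and jobs[0][1] <= free_nodes:
            job, nodes = jobs.popleft()
            free_nodes -= nodes
            started.append((name, job))
    return started


def simulate(cycles, total_nodes=4):
    """Count how many jobs from each queue start over the given cycles."""
    queues = {
        "debug": deque(),
        "batch": deque([("b0", 1), ("b1", 1)]),
    }
    ran = {"debug": 0, "batch": 0}
    for cycle in range(cycles):
        # Steady supply: enough debug jobs arrive to fill every node.
        for i in range(total_nodes):
            queues["debug"].append((f"d{cycle}-{i}", 1))
        # Assumption: all jobs from the previous cycle have finished,
        # so the whole machine is free again.
        for qname, _job in schedule_in_queue_order(queues, total_nodes):
            ran[qname] += 1
    return ran


result = simulate(10)
# result == {"debug": 40, "batch": 0}: the batch jobs never start
```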
To meet short term goals outlined in flux-framework/flux-core#4306, we will need to employ resource constraints to ensure that queues are not in competition for the same resources.
Longer term, it would be neat to be able to fulfill, say, batch and debug queues from the same resource set without needing to designate certain nodes in a pool.
See also discussion in flux-framework/rfc#332