support for job queue information in submission and query tools #4302

grondo · 2022-04-25T21:33:41Z

This was from a comment in #4143

Ok, it does appear that sched.queue is part of the RFC 27 Alloc annotate definition. Therefore, in flux jobs or job-list we can add an option to specifically filter jobs by this annotation key.

However, any requested queue is currently encoded in jobspec under the opaque, scheduler-specific attributes.system.scheduler. key, and therefore in order to request a specific queue, users will have to do something like

$ flux mini batch --setattr=system.scheduler.queue=NAME ...

which seems user unfriendly.

Additionally, all jobs enqueued which have not yet had an alloc request sent to the scheduler will, by definition, not have any scheduler annotations, and thus the queue will be unknown, and flux jobs will not be able to filter by queue for these jobs.
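To make the gap concrete, here is a minimal sketch in plain Python. The job records are made-up dictionaries, not the actual job-list output format; only the `sched.queue` annotation key comes from RFC 27, and the `filter_by_annotated_queue` helper is hypothetical.

```python
# Hypothetical job records; the layout is an assumption for illustration only.
jobs = [
    {"id": 1, "state": "RUN", "annotations": {"sched": {"queue": "batch"}}},
    {"id": 2, "state": "SCHED", "annotations": {"sched": {"queue": "debug"}}},
    # No alloc request has been sent yet, so the scheduler has not
    # annotated this job and its queue cannot be determined this way.
    {"id": 3, "state": "DEPEND", "annotations": {}},
]


def filter_by_annotated_queue(jobs, queue):
    """Return the jobs whose sched.queue annotation equals `queue`."""
    return [
        job
        for job in jobs
        if job.get("annotations", {}).get("sched", {}).get("queue") == queue
    ]
```

Job 3 matches no queue filter at all, whatever queue it actually requested, which is the problem described above.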

Because of these two issues, perhaps we should elevate the requested queue to a defined RFC 14 and/or RFC 25 jobspec property, so that our core utilities can be made aware of the requested queue. Schedulers which do not support a queue (like sched-simple) would reject the job during feasibility checks if it was submitted with a queue name other than a defined default (which I propose be called "default"). Utilities that process jobspec, such as job-list and flux-jobs could then directly support the queue name in a straightforward manner, and the queue would be available before annotations are made to the job.
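As a rough sketch of what this could look like, assuming jobspec is handled as a plain dictionary: the `attributes.system.queue` key below is a hypothetical placement for the elevated property (no RFC defines it at this point), and both function names are invented for illustration.

```python
DEFAULT_QUEUE = "default"  # the default queue name proposed in this issue


def requested_queue(jobspec):
    """Return the queue named in a jobspec dict, or the default queue.

    Assumes a hypothetical first-class attributes.system.queue key, and
    falls back to the scheduler-specific attributes.system.scheduler.queue.
    """
    system = jobspec.get("attributes", {}).get("system", {})
    queue = system.get("queue")
    if queue is None:
        queue = system.get("scheduler", {}).get("queue")
    return queue or DEFAULT_QUEUE


def feasibility_check_without_queues(jobspec):
    """Mimic a scheduler with no queue support: reject non-default queues."""
    queue = requested_queue(jobspec)
    if queue != DEFAULT_QUEUE:
        raise ValueError(f"queue '{queue}' is not supported by this scheduler")
```

Because the lookup only needs the jobspec, a utility could report the requested queue for a job before any scheduler annotation exists.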

Opening this issue to track the need for flux-core support for configuring queues, submitting jobs to them, and querying them.

grondo · 2022-04-26T14:45:42Z

@garlick also made the point that supported queue names should probably be configured in the main flux-core config TOML.

This would not only allow validation of queue name on job submission, but also flux-core could support a common set of queue parameters, especially queue limits, which currently I believe are supported only via the flux-accounting multi-factor priority plugin. Limits like maximum number of nodes and duration could be directly supported in the TOML configuration, leaving the responsibility of the priority plugin to calculate priority. Of course, the mf_priority plugin could still reject jobs based on its internal configuration (e.g. for limits set via user/bank combination).
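A minimal sketch of such a generic check follows. The `QUEUES` dictionary stands in for an already-parsed TOML table; the table name, the `max_nodes` and `max_duration` keys, and the `check_queue_limits` function are all assumptions for illustration, not an existing flux-core schema or interface.

```python
# Hypothetical parsed form of per-queue TOML tables such as:
#   [queues.debug]
#   max_nodes = 4
#   max_duration = 3600.0
# Table and key names are assumptions, not a defined flux-core schema.
QUEUES = {
    "default": {},
    "debug": {"max_nodes": 4, "max_duration": 3600.0},
}


def check_queue_limits(queue, nnodes, duration, queues=QUEUES):
    """Return a list of limit violations; an empty list means accepted."""
    if queue not in queues:
        return [f"unknown queue '{queue}'"]
    limits = queues[queue]
    errors = []
    max_nodes = limits.get("max_nodes")
    if max_nodes is not None and nnodes > max_nodes:
        errors.append(f"nnodes {nnodes} exceeds max_nodes {max_nodes}")
    max_duration = limits.get("max_duration")
    if max_duration is not None and duration > max_duration:
        errors.append(f"duration {duration} exceeds max_duration {max_duration}")
    return errors
```

A check like this needs only the queue name and the job's resource request, so it could run at submission time, while limits that depend on the user/bank combination would stay with the priority plugin.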

Scheduler-specific configuration of queues could occur within the scheduler TOML config namespace, but we might want to encourage the scheduler to also read the main queue TOML config, and verify that the queue names it has configured match what is in the "main" configuration, as well as any applicable generic configuration values. (I'm sure this will be discussed in the upcoming RFC)
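The name verification could be as simple as a set comparison. This is only a sketch: `compare_queue_names` is a hypothetical helper, and how each side obtains its list of queue names is left open.

```python
def compare_queue_names(main_queues, sched_queues):
    """Compare scheduler queue names against the main configuration.

    Returns (missing, extra): names in the main config that the scheduler
    lacks, and names the scheduler defines that the main config does not.
    """
    main, sched = set(main_queues), set(sched_queues)
    missing = sorted(main - sched)
    extra = sorted(sched - main)
    return missing, extra
```

A scheduler could treat a non-empty result as a configuration error at load time, or merely log a warning.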

cmoussa1 · 2022-04-26T16:01:14Z

Thanks for starting this discussion @grondo!

> Limits like maximum number of nodes and duration could be directly supported in the TOML configuration, leaving the responsibility of the priority plugin to calculate priority.

I like this idea. I'll go ahead and add that the limits to be enforced on a per-queue basis that @ryanday36 specified were minimum nodes per-job, maximum nodes per-job, and max time per-job. These are not currently enforced in the multi-factor priority plugin, but they are stored there in an internal map.

std::map<std::string, struct queue_info> queues;

Please correct me if I am wrong, but I think this would also enable flux-accounting to simply read and store the queue configuration information from the TOML configuration file without requiring an admin or scheduler operator to have to also define these queues in the flux-accounting DB directly (i.e., using commands like add-queue and edit-queue). Perhaps flux-accounting could be responsible for periodically fetching information from this TOML file and making sure its database is up to date with the latest information.

grondo · 2022-04-26T16:11:41Z

> Please correct me if I am wrong, but I think this would also enable flux-accounting to simply read and store the queue configuration information from the TOML configuration file without requiring an admin or scheduler operator to have to also define these queues in the flux-accounting DB directly

Yes, that is the idea. I think we could also move the enforcement of these limits out of flux-accounting and into a plugin provided by flux-core. More complex limits, e.g. that require knowledge of the user/bank combo of a job, could still be enforced by flux-accounting when necessary.

garlick · 2022-07-27T16:51:49Z

Closing this issue which predates the design work in RFC 33 and subsequent implementation. We can open separate bugs for remaining stuff.

grondo mentioned this issue May 2, 2022

job manager needs new interface for job "limits" #4309

Open

ryanday36 added this to TOSS4 system instance tracking Jun 3, 2022

garlick closed this as completed Jul 27, 2022

garlick moved this to Done in TOSS4 system instance tracking Jul 27, 2022
