support for job queue information in submission and query tools #4302

Closed
grondo opened this issue Apr 25, 2022 · 4 comments

@grondo
Contributor

grondo commented Apr 25, 2022

This was from a comment in #4143

Ok, it does appear that sched.queue is part of the RFC 27 Alloc annotate definition. Therefore, in flux jobs or job-list we can add an option to specifically filter jobs by this annotation key.

However, any requested queue is currently encoded in jobspec under the opaque, scheduler-specific attributes.system.scheduler. key, so to request a specific queue, users have to do something like

$ flux mini batch --setattr=system.scheduler.queue=NAME ...
which seems user-unfriendly.

Additionally, enqueued jobs for which no alloc request has yet been sent to the scheduler will, by definition, have no scheduler annotations. Their queue is therefore unknown, and flux jobs will not be able to filter these jobs by queue.
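The second problem can be illustrated with a small C++ sketch. The JobRecord type and the two filter functions are hypothetical simplifications, not job-list's actual data structures or API. A filter keyed on the sched.queue annotation misses jobs that have not yet been sent to the scheduler, while a filter keyed on a queue recorded in jobspec does not.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified job record for illustration only.
struct JobRecord {
    int id;
    // Queue from the scheduler's sched.queue annotation; present only after
    // an alloc request has been sent to the scheduler.
    std::optional<std::string> annotated_queue;
    // Queue requested in jobspec; known from the moment of submission.
    std::optional<std::string> jobspec_queue;
};

// Filter on the annotation: pending jobs without an alloc request are
// silently dropped because their queue is unknown.
std::vector<int> filter_by_annotation(const std::vector<JobRecord>& jobs,
                                      const std::string& queue) {
    std::vector<int> ids;
    for (const auto& job : jobs)
        if (job.annotated_queue && *job.annotated_queue == queue)
            ids.push_back(job.id);
    return ids;
}

// Filter on the jobspec-requested queue: works whether or not the
// scheduler has annotated the job yet.
std::vector<int> filter_by_jobspec(const std::vector<JobRecord>& jobs,
                                   const std::string& queue) {
    std::vector<int> ids;
    for (const auto& job : jobs)
        if (job.jobspec_queue && *job.jobspec_queue == queue)
            ids.push_back(job.id);
    return ids;
}
```

With one annotated job and one pending, unannotated job that both request queue "batch", the annotation-based filter returns only the first job, while the jobspec-based filter returns both.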

Because of these two issues, perhaps we should elevate the requested queue to a defined RFC 14 and/or RFC 25 jobspec property, so that our core utilities can be made aware of the requested queue. Schedulers that do not support queues (like sched-simple) would reject the job during feasibility checks if it was submitted with a queue name other than a defined default (which I propose be called "default"). Utilities that process jobspec, such as job-list and flux-jobs, could then support the queue name directly, and the queue would be available before any annotations are made to the job.
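A minimal sketch of the proposed feasibility rule, assuming the requested queue has already been extracted from jobspec as an optional string. The function queue_feasible is hypothetical and not part of sched-simple; it only shows that a queue-unaware scheduler would accept jobs that request no queue or the proposed "default" queue, and reject everything else.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Name proposed above for the queue every scheduler must accept.
const std::string kDefaultQueue = "default";

// Hypothetical feasibility check for a scheduler without queue support:
// feasible only if no queue was requested or the default queue was requested.
bool queue_feasible(const std::optional<std::string>& requested_queue) {
    return !requested_queue || *requested_queue == kDefaultQueue;
}
```

Treating an absent queue the same as "default" means existing jobspecs that never mention a queue keep working unchanged.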

Opening this issue to track the need for flux-core support for configuring queues, submitting jobs to queues, and querying queues.

@grondo
Contributor Author

grondo commented Apr 26, 2022

@garlick also made the point that supported queue names should probably be defined in the main flux-core TOML config.

This would not only allow validation of the queue name on job submission, but would also let flux-core support a common set of queue parameters, especially queue limits, which I believe are currently supported only via the flux-accounting multi-factor priority plugin. Limits like maximum number of nodes and duration could be directly supported in the TOML configuration, leaving the responsibility of the priority plugin to calculate priority. Of course, the mf_priority plugin could still reject jobs based on its internal configuration (e.g. for limits set via user/bank combination).

Scheduler-specific configuration of queues could occur within the scheduler TOML config namespace, but we might want to encourage the scheduler to also read the main queue TOML config and verify that the queue names it has configured match those in the "main" configuration, along with any applicable generic configuration values. (I'm sure this will be discussed in the upcoming RFC.)

@cmoussa1
Member

Thanks for starting this discussion @grondo!

Limits like maximum number of nodes and duration could be directly supported in the TOML configuration, leaving the responsibility of the priority plugin to calculate priority.

I like this idea. I'll add that the per-queue limits @ryanday36 specified were minimum nodes per job, maximum nodes per job, and maximum time per job. These are not currently enforced in the multi-factor priority plugin, but they are stored there in an internal map:

std::map<std::string, struct queue_info> queues;
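To illustrate how per-queue limits might be enforced from such a map (whether in flux-accounting or in a flux-core plugin, as discussed in this thread), here is a hedged C++ sketch. Only the map type comes from the comment; the fields of queue_info, the LimitCheck enum, and check_queue_limits are assumptions for illustration, and durations are assumed to be in seconds.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical layout; the real struct's fields may differ.
struct queue_info {
    int min_nodes_per_job;
    int max_nodes_per_job;
    long max_time_per_job;  // seconds (assumed unit)
};

enum class LimitCheck { Ok, UnknownQueue, TooFewNodes, TooManyNodes, TooLong };

// Check a job's requested node count and duration against the limits
// configured for its requested queue.
LimitCheck check_queue_limits(const std::map<std::string, queue_info>& queues,
                              const std::string& queue,
                              int nnodes, long duration) {
    auto it = queues.find(queue);
    if (it == queues.end())
        return LimitCheck::UnknownQueue;
    const queue_info& q = it->second;
    if (nnodes < q.min_nodes_per_job)
        return LimitCheck::TooFewNodes;
    if (nnodes > q.max_nodes_per_job)
        return LimitCheck::TooManyNodes;
    if (duration > q.max_time_per_job)
        return LimitCheck::TooLong;
    return LimitCheck::Ok;
}
```

For example, with a queue "batch" limited to 1 to 64 nodes and one hour, a 128-node job is rejected with TooManyNodes. Returning a distinct reason for each failed limit lets the caller give the user a specific error message.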

Please correct me if I am wrong, but I think this would also enable flux-accounting to simply read and store the queue configuration information from the TOML configuration file without requiring an admin or scheduler operator to have to also define these queues in the flux-accounting DB directly (i.e., using commands like add-queue, edit-queue). Perhaps flux-accounting could be responsible for periodically fetching information from this TOML file and making sure its database is up to date with the latest information.

@grondo
Contributor Author

grondo commented Apr 26, 2022

Please correct me if I am wrong, but I think this would also enable flux-accounting to simply read and store the queue configuration information from the TOML configuration file without requiring an admin or scheduler operator to have to also define these queues in the flux-accounting DB directly

Yes, that is the idea. I think we could also move the enforcement of these limits out of flux-accounting and into a plugin provided by flux-core. More complex limits, e.g. that require knowledge of the user/bank combo of a job, could still be enforced by flux-accounting when necessary.

@garlick
Member

garlick commented Jul 27, 2022

Closing this issue, which predates the design work in RFC 33 and the subsequent implementation. We can open separate issues for anything that remains.
