reject jobs submitted to a named queue when none are configured #4627
Codecov Report
@@            Coverage Diff             @@
##           master    #4627      +/-   ##
==========================================
- Coverage   83.37%   83.34%   -0.03%
==========================================
  Files         411      412       +1
  Lines       68776    69014     +238
==========================================
+ Hits        57342    57523     +181
- Misses      11434    11491      +57
Once flux-framework/flux-accounting#278 is merged, I think CI will be happy, so dropping the WIP and this is ready for a review. Edit: I was sort of thinking the new
Also: these commands emit non-error output to stderr. I could address that here if people feel that would be appropriate.
I added in the |
This LGTM. The only thing I struggled with was the choice to print that queues are disabled in flux-uptime only when all queues are disabled. However, the rationale is clear in the commit messages, and it makes sense to me.
Since this is still waiting on the flux accounting PR getting merged, one last ditch change I think might be a good idea is to require a
instead of
and inadvertently disable all queues with the reason "debug This is the reason".
Another idea would be instead of |
The plugin currently rejects jobs which submit a queue that the plugin does not know about. The changes proposed in flux-framework/flux-core#4627 enable the job-manager to be more aware of queues and instill queue validation. As a result of this, the plugin should relax some of its current enforcements on queue validation. Remove the job rejection on a queue that flux-accounting does not know about. Instead, add a default factor of 0 to the job through a preprocessor directive called UNKNOWN_QUEUE (changed from NO_SUCH_QUEUE). If an association submits a job to a queue that 1) flux-accounting knows about and 2) the user is not allowed to submit jobs to, the job can still be rejected. Remove the requirement for a default queue for jobs to be submitted to; instead, if no queue is specified, add a default factor of 0 to the job through a preprocessor directive called NO_QUEUE_SPECIFIED (changed from NO_DEFAULT_QUEUE). Add a comment to the queue_info struct to explain that the min_nodes_per_job, max_nodes_per_job, and max_time_per_job fields are not currently enforced in the plugin.
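For anyone following the cross-reference, the relaxed plugin behavior described in this commit message can be sketched roughly as below. This is an illustrative Python model only, not the plugin's C++ code: the function name, the `QueueNotPermitted` exception, and the dictionary layout are hypothetical, and the two constants are assumed to be 0 because the message describes them as "a default factor of 0".

```python
from typing import Dict, Optional, Set

# Assumed values: the commit message says both act as a default factor of 0.
UNKNOWN_QUEUE = 0
NO_QUEUE_SPECIFIED = 0


class QueueNotPermitted(Exception):
    """Hypothetical error: queue is known, but the association may not use it."""


def queue_factor(known_queues: Dict[str, int],
                 permitted: Set[str],
                 requested: Optional[str]) -> int:
    """Return the queue factor for a job, or raise if the job should be rejected.

    known_queues maps queue names known to flux-accounting to a priority factor.
    permitted is the set of queues the submitting association may use.
    """
    if requested is None:
        # No queue specified: no longer a rejection, just a factor of 0.
        return NO_QUEUE_SPECIFIED
    if requested not in known_queues:
        # Queue unknown to flux-accounting: validation is left to the
        # job manager, so the plugin only applies a factor of 0.
        return UNKNOWN_QUEUE
    if requested not in permitted:
        # Known queue, but this association is not allowed to use it.
        raise QueueNotPermitted(requested)
    return known_queues[requested]
```

The only rejection left in this sketch is the "known but not permitted" case, which matches the two numbered conditions in the message.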
The plugin currently rejects jobs which submit a queue that the plugin does not know about. The changes proposed in flux-framework/flux-core#4627 enable the job-manager to be more aware of queues and instill queue validation. As a result of this, the plugin should relax some of its current enforcements on queue validation. Change the preprocessor directive INVALID_QUEUE to UNKNOWN_QUEUE, an integer to act as a default factor of 0 when a queue is specified that flux-accounting does not know about. Change the preprocessor directive NO_DEFAULT_QUEUE to UNSPECIFIED_QUEUE, an integer to act as a default factor of 0 when no queue is specified when an association submits a job.
Problem: t2240-queue-cmd.t checks for very specific error messages in some tests, which makes the test brittle when messages change. Relax the tests so that verbatim output is not required, just the substantive portion of the message.
Problem: some tests submit jobs with queues that are not present in the TOML configuration, but when the job manager becomes queue-aware, this will no longer be possible. Add [queues] configuration to tests where appropriate.
Problem: several futures are created but not destroyed. Destroy futures.
Problem: several flux-queue subcommands accept -q,--queue=NAME, but others accept -q,--quiet, which could become confusing. Drop -q as a short option for quiet. The short option is not used anywhere.
Problem: flux-queue sends non-error output to stderr. Stop using log_msg(), which uses stderr, to print non-error output. This means "flux-queue: " is dropped from those lines of output. Update tests.
Problem: there is no way to list queues or enable/disable job submission on a queue basis. Add a new class to the job manager which provides new interfaces for enabling, disabling, listing, and querying status of queues. Note that these queues are not containers for jobs. Jobs are still enqueued in one "alloc queue" even when there are multiple named queues configured. The purpose of this class (at this time) is to capture the queue configuration and the administrative status to be used by the job submit logic. A future commit will wire this into the job submit logic.
Problem: jobs are accepted when submitted with a named queue when queues are not configured. Call queue_submit_check() on each submitted job. The job is rejected if a requested queue is not configured, or if the queue is disabled. As part of the integration, the job-manager.submit-admin RPC is moved temporarily to queue.c. It will be removed once tools have been updated to use the new RPCs. Fixes flux-framework#4440
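As a rough illustration of the two rejection conditions in this commit (requested queue not configured, or queue disabled), here is a small Python model. It is a sketch under assumptions, not the C implementation of queue_submit_check(): the `Queue` dataclass, the error strings, and the use of a `None` key for the anonymous queue are all hypothetical. How a job with no queue is treated when named queues are configured is not stated in the commit message; this sketch simply treats it as "not configured".

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Queue:
    """Hypothetical record of a queue's administrative status."""
    enabled: bool = True
    disable_reason: Optional[str] = None


def queue_submit_check(queues: Dict[Optional[str], Queue],
                       requested: Optional[str]) -> Optional[str]:
    """Return an error message if the job should be rejected, else None.

    queues holds either named queues, or a single entry keyed by None
    that stands for the anonymous queue when no queues are configured.
    """
    queue = queues.get(requested)
    if queue is None:
        # Covers a named queue requested when none (or a different set)
        # are configured.
        return f"queue {requested!r} is not configured"
    if not queue.enabled:
        return (f"job submission to queue {requested!r} is disabled: "
                f"{queue.disable_reason}")
    return None
```

For example, with only the anonymous queue present (`{None: Queue()}`), a job that names any queue is rejected, which is the case described in the Problem statement.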
Problem: flux-uptime uses the deprecated job-manager.submit-admin RPC. Use the job-manager.queue-status RPC instead. Since the purpose of this section of code is to add a warning to the output when the queue is disabled, and not to provide comprehensive queue status, the warning is now added only when all queues are disabled.
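The "only when all queues are disabled" rule is small enough to state as a one-line predicate. The Python below is a sketch of that rule only; the function name and the name-to-flag mapping are hypothetical and do not reflect the actual payload of the job-manager.queue-status RPC.

```python
from typing import Dict, Optional


def should_warn_submit_disabled(enabled_by_queue: Dict[Optional[str], bool]) -> bool:
    """True only when there is at least one queue and every queue is disabled."""
    return bool(enabled_by_queue) and not any(enabled_by_queue.values())
```

With this rule, an instance with one enabled and one disabled queue shows no warning, which is the trade-off discussed in the review.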
Problem: there is no way to list available queues, nor enable/disable individual queues. Add a -q,--queue option to the enable, disable, and status subcommands. If the queue option is unspecified and multiple named queues are configured, all queues are targeted (--all confirmation required for enable, disable). This allows the commands to function similarly to the way they did before when there is only the anonymous queue, which is still the common case. Since flux queue status now lists the enable/disable status of all queues when none are specified, this is now a way for a user to find out what queues are available on the system. Update rc1 to use "flux queue disable --all" when putting the instance in safe mode. Fixes flux-framework#4620
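The targeting rules in this commit (explicit queue, all queues with --all confirmation, or the anonymous queue) can be modeled as below. This is a hypothetical Python sketch of the decision logic, not the flux-queue source: `select_targets`, its parameters, and the error text are invented for illustration, and `None` stands for the anonymous queue.

```python
from typing import List, Optional


def select_targets(configured: List[str],
                   queue: Optional[str] = None,
                   all_flag: bool = False,
                   require_all: bool = True) -> List[Optional[str]]:
    """Pick the queues a subcommand acts on.

    configured:  named queues in the configuration (empty if none).
    queue:       value of -q,--queue=NAME, if given.
    all_flag:    whether -a,--all was given.
    require_all: True for enable/disable, False for status.
    """
    if queue is not None:
        if queue not in configured:
            raise ValueError(f"unknown queue: {queue}")
        return [queue]
    if configured:
        # No queue named, but named queues exist: act on all of them,
        # with explicit confirmation for enable/disable.
        if require_all and not all_flag:
            raise ValueError("named queues are configured: "
                             "use --queue=NAME or --all")
        return list(configured)
    # Only the anonymous queue exists: behaves as before.
    return [None]
```

Passing `require_all=False` models status, which lists every queue without confirmation.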
Problem: the job-manager.submit-admin RPC no longer has any users. Drop deprecated RPC.
Problem: flux-uptime(1) now reports "queue disabled" only if all queues are disabled, but this is not documented. Change the notice description to include this fact.
Problem: flux-queue(1) does not document the command's limited support for multiple named queues. Describe the possibility of multiple queues. Document the -q,--queue=NAME option for enable, disable, and status. Reorder subcommands to group the multiqueue-enabled ones together. Document -a,--all.
Problem: the addition of named queue support to flux-queue has no test coverage. Add some new tests.
OK, just pushed changes to add a |
Looks good to me (and already approved!)
Thanks! Setting MWP.
The plugin currently rejects jobs which submit a queue that the plugin does not know about. The changes proposed in flux-framework/flux-core#4627 enable the job-manager to be more aware of queues and instill queue validation. As a result of this, the plugin should relax some of its current enforcements on queue validation. Change the preprocessor directive INVALID_QUEUE to UNKNOWN_QUEUE, an integer to act as a default factor of 0 when a queue is specified that flux-accounting does not know about. Change the preprocessor directive NO_DEFAULT_QUEUE to UNSPECIFIED_QUEUE, an integer to act as a default factor of 0 when no queue is specified when an association submits a job.
The plugin currently rejects jobs which submit a queue that the plugin does not know about. The changes proposed in flux-framework/flux-core#4627 enable the job-manager to be more aware of queues and instill queue validation. As a result of this, the plugin should relax some of its current enforcements on queue validation. Change the preprocessor directive NO_SUCH_QUEUE to UNKNOWN_QUEUE, an integer to act as a default factor of 0 when a queue is specified that flux-accounting does not know about. Change the preprocessor directive NO_DEFAULT_QUEUE to NO_QUEUE_SPECIFIED, an integer to act as a default factor of 0 when no queue is specified when an association submits a job.
The plugin currently rejects jobs which submit a queue that the plugin does not know about. The changes proposed in flux-framework/flux-core#4627 enable the job-manager to be more aware of queues and instill queue validation. As a result of this, the plugin should relax some of its current enforcements on queue validation. Change the preprocessor directive NO_SUCH_QUEUE to UNKNOWN_QUEUE, an integer to act as a default factor of 0 when a queue is specified that flux-accounting does not know about. Change the preprocessor directive NO_DEFAULT_QUEUE to NO_QUEUE_SPECIFIED, an integer to act as a default factor of 0 when no queue is specified when an association submits a job. Add comments to each preprocessor directive to explain more clearly what each directive represents and is used for.
Problem: a job that is submitted to a named queue is accepted by a flux instance that does not configure queues (such as a batch job).
This PR makes the job manager aware of the queue configuration, and adds a check at job submit time to ensure any queue specified in the jobspec matches a configured queue, e.g.
It also adds support to flux queue for enabling and disabling queues individually, and for listing the status of all queues:
Posting this as a WIP pending some test development.