Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

reject jobs submitted to a named queue when none are configured #4627

Merged
merged 13 commits into from
Oct 3, 2022

Conversation

garlick
Copy link
Member

@garlick garlick commented Sep 29, 2022

Problem: a job that is submitted to a named queue is accepted by a flux instance that does not configure queues (such as a batch job).

This PR makes the job manager aware of the queue configuration, and adds a check at job submit time to ensure any queue specified in the jobspec matches a configured queue, e.g.

$ flux start
ƒ(s=1,d=0) $ flux mini run -q batch hostname
flux-mini: ERROR: [Errno 22] 'batch' is not a valid queue

It also adds support to flux queue for enabling and disabling queues individually, and for listing the status of all queues:

$ flux queue status
flux-queue: batch: Job submission is enabled
flux-queue: debug: Job submission is enabled
flux-queue: Scheduling is enabled
$ sudo flux queue disable -q batch no batch for you
$ flux queue status
flux-queue: batch: Job submission is disabled: no batch for you
flux-queue: debug: Job submission is enabled
flux-queue: Scheduling is enabled
$ flux mini run -q debug hostname
picl6
$ flux mini run -q batch hostname
flux-mini: ERROR: [Errno 22] job submission to batch is disabled: no batch for you

Posting this as a WIP pending some test development

@codecov
Copy link

codecov bot commented Sep 30, 2022

Codecov Report

Merging #4627 (cfe5f8f) into master (99ecf1d) will decrease coverage by 0.02%.
The diff coverage is 73.61%.

❗ Current head cfe5f8f differs from pull request most recent head 5079805. Consider uploading reports for the commit 5079805 to get more accurate results

@@            Coverage Diff             @@
##           master    #4627      +/-   ##
==========================================
- Coverage   83.37%   83.34%   -0.03%     
==========================================
  Files         411      412       +1     
  Lines       68776    69014     +238     
==========================================
+ Hits        57342    57523     +181     
- Misses      11434    11491      +57     
Impacted Files Coverage Δ
src/modules/job-manager/job-manager.c 61.81% <50.00%> (-0.45%) ⬇️
src/modules/job-manager/queue.c 69.43% <69.43%> (ø)
src/cmd/builtin/uptime.c 88.43% <78.57%> (-1.57%) ⬇️
src/cmd/flux-queue.c 91.18% <84.74%> (-1.71%) ⬇️
src/modules/job-manager/submit.c 81.37% <100.00%> (-1.44%) ⬇️
src/common/libsubprocess/server.c 60.54% <0.00%> (-0.55%) ⬇️
src/shell/output.c 76.54% <0.00%> (-0.16%) ⬇️
src/broker/overlay.c 86.24% <0.00%> (+0.21%) ⬆️
src/common/libterminus/terminus.c 86.06% <0.00%> (+0.24%) ⬆️
src/modules/cron/cron.c 82.92% <0.00%> (+0.44%) ⬆️
... and 4 more

@garlick garlick changed the title WIP: reject jobs submitted to a named queue when none are configured reject jobs submitted to a named queue when none are configured Oct 1, 2022
@garlick
Copy link
Member Author

garlick commented Oct 1, 2022

Once flux-framework/flux-accounting#278 is merged, I think CI will be happy, so dropping the WIP and this is ready for a review.

Edit: I was sort of thinking the new flux queue status output could satisfy #4620 (no way to list queues) although I did not list that in any of the commits and wasn't sure if this was going to be quite enough. A separate subcommand could be added if needed. Thoughts?

$ flux queue status
flux-queue: batch: Job submission is enabled
flux-queue: debug: Job submission is enabled
flux-queue: Scheduling is enabled

Also: these command emit non error output to stderr. I could address that here if people feel that would be appropriate.

@garlick
Copy link
Member Author

garlick commented Oct 2, 2022

I added in the flux-queue fix for non-error output on stderr, refactored the commits somewhat, and added issue references.

Copy link
Contributor

@grondo grondo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM. The only thing I struggled with was the choice to print that queues are disabled in flux-uptime only when all queues are disabled. However, the rationale is clear in the commit messages, and it makes sense to me.

@garlick
Copy link
Member Author

garlick commented Oct 3, 2022

Since this is still waiting on the flux accounting PR getting merged, one last ditch change I think might be a good idea is to require a --force option on flux queue enable|disable when multiple queues are defined but the --queue option is not specified. In my ad hoc testing I've noticed it's somewhat easy to do something like

$ sudo flux queue disable debug This is the reason

instead of

$ sudo flux queue disable --queue debug This is the reason

and inadvertently disable all queues with the reason "debug This is the reason".

@grondo
Copy link
Contributor

grondo commented Oct 3, 2022

Another idea would be instead of --force to have an --all option that is required to select all queues as if they were given on the command line (flux queue disable -q debug,batch this is the reason), then similarly flux queue disable without specifying any queues only works when multiple queues are not defined. (--all would just be a little more intuitive IMO, but either way works for me)

cmoussa1 added a commit to cmoussa1/flux-accounting that referenced this pull request Oct 3, 2022
The plugin currently rejects jobs which submit a queue that the plugin does
not know about. The changes proposed in flux-framework/flux-core#4627
enable the job-manager to be more aware of queues and enstill queue validation.
As a result of this, the plugin should relax some of its current enforcements
on queue validation.

Remove the job rejection on a queue that flux-accounting does not know about.
Instead, add a default factor of 0 to the job through a preprocessor directive
called UNKNOWN_QUEUE (changed from NO_SUCH_QUEUE). If an association submits a
job to a queue that 1) flux-accounting knows about and 2) the user is not
allowed to submit jobs to, the job can still be rejected.

Remove the requirement for a default queue for jobs to be submitted to; instead,
if no queue is specified, add a default factor of 0 to the job through a
preprocessor directive called NO_QUEUE_SPECIFIED (changed from
NO_DEFAULT_QUEUE).

Add a comment to the queue_info struct to explain that the min_nodes_per_job,
max_nodes_per_job, and max_time_per_job fields are not currently enforced in
the plugin.
cmoussa1 added a commit to cmoussa1/flux-accounting that referenced this pull request Oct 3, 2022
The plugin currently rejects jobs which submit a queue that the plugin does
not know about. The changes proposed in flux-framework/flux-core#4627
enable the job-manager to be more aware of queues and enstill queue validation.
As a result of this, the plugin should relax some of its current enforcements
on queue validation.

Change the preprocessor directive INVALID_QUEUE to UNKNOWN_QUEUE, an integer
to act as a default factor of 0 when a queue is specified that flux-accounting
does not know about.

Change the preprocessor directive NO_DEFAULT_QUEUE to UNSPECIFIED_QUEUE, an
integer to act as a default factor of 0 when no queue is specified when an
association submits a job.
garlick added 11 commits October 3, 2022 10:51
Problem: t2240-queue-cmd.t checks for very specific error messages
in some tests, which makes the test brittle when messages change.

Relax the tests so that verbatim output is not required, just the
substantive portion of the message.
Problem: some tests submit jobs with queues that are not present
in the TOML configuration, but when the job manager becomes queue-aware,
this will no longer be possible.

Add [queues] configuration to tests where appropriate.
Problem: several futures are created but not destroyed.

Destroy futures.
Problem: several flux-queue subcommands accept -q,--queue=NAME,
but others accept -q,--quiet, which could be come confusing.

Drop -q as a short option for quiet.  The short option is not
used anywhere.
Problem: flux-queue sends non-error output to stderr.

Stop using log_msg(), which uses stderr, to print non-error output.
This means "flux-queue: " is dropped from those lines of output.

Update tests.
Problem: there is no way to list queues or enable/disable job
submission on a queue basis.

Add a new class to the job manager which provides new interfaces
for enabling, disabling, listing, and querying status of queues.

Note that these queues are not containers for jobs.  Jobs are still
enqueued in one "alloc queue" even when there are multiple named
queues configured.  The purpose of this class (at this time) is to
capture the queue configuration and the administrative status to be
be utilized by the job submit logic.

A future commit will wire this into the job submit logic.
Problem: jobs are accepted when submitted with a named queue when
queues are not configured.

Call queue_submit_check() on each submitted job.  The job is rejected
if a requested queue is not configured, or if the queue is disabled.

As part of the integration, the job-manager.submit-admin RPC is moved
temporarily to queue.c.  It will be removed once tools have been updated
to use the new RPCs.

Fixes flux-framework#4440
Problem: flux-uptime uses the deprecated job-manager.submit-admin RPC.

Use the job-manager.queue-status RPC instead.

Since the purpose of this section of code is to add a warning to the
output when the queue is disabled, and not to provide comprehensive queue
status, the warning is now added only when all queues are disabled.
Problem: there is no way to list available queues, nor enable/disable
individual queues.

Add a -q,--queue option to the enable, disable, and status subcommands.
If the queue option is unspecified and multiple named queues are configured,
all queues are targetted (--all confirmation required for enable, disable).
This allows the commands to function similar to the way they did before when
there is only the anonymous queue, which is still the common case.

Since flux queue status now lists the enable/disable status of all queues
when none are specified, this is now a way for a user to find out what
queues are available on the system.

Update rc1 to use "flux queue disable --all" when putting the instance
in safe mode.

Fixes flux-framework#4620
Problem: the job-manager.submit-admin RPC no longer has any users.

Drop deprecated RPC.
Problem: flux-uptime(1) now reports "queue disabled" only
if all queues are disabled, but this is not documented.

Change the notice description to include this fact.
Problem: flux-queue(1) does not document the command's limited
support for multiple named queues.

Describe the possibility of multiple queues.
Document the -q,--queue=NAME option for enable, disable, and status.
Reorder subcommands to group the multiqueue-enabled ones together.
Document -a,--all.
Problem: the addition of named queue support to flux-queue
has no test coverage.

Add some new tests.
@garlick
Copy link
Member Author

garlick commented Oct 3, 2022

OK, just pushed changes to add a --all option (good idea @grondo!)

@grondo
Copy link
Contributor

grondo commented Oct 3, 2022

Looks good to me (and already approved!)

@garlick
Copy link
Member Author

garlick commented Oct 3, 2022

Thanks! Setting MWP.

@mergify mergify bot merged commit 50b3ba4 into flux-framework:master Oct 3, 2022
cmoussa1 added a commit to cmoussa1/flux-accounting that referenced this pull request Oct 3, 2022
The plugin currently rejects jobs which submit a queue that the plugin does
not know about. The changes proposed in flux-framework/flux-core#4627
enable the job-manager to be more aware of queues and instill queue validation.
As a result of this, the plugin should relax some of its current enforcements
on queue validation.

Change the preprocessor directive INVALID_QUEUE to UNKNOWN_QUEUE, an integer
to act as a default factor of 0 when a queue is specified that flux-accounting
does not know about.

Change the preprocessor directive NO_DEFAULT_QUEUE to UNSPECIFIED_QUEUE, an
integer to act as a default factor of 0 when no queue is specified when an
association submits a job.
@garlick garlick deleted the issue#4440 branch October 3, 2022 20:23
cmoussa1 added a commit to cmoussa1/flux-accounting that referenced this pull request Oct 3, 2022
The plugin currently rejects jobs which submit a queue that the plugin does
not know about. The changes proposed in flux-framework/flux-core#4627
enable the job-manager to be more aware of queues and instill queue validation.
As a result of this, the plugin should relax some of its current enforcements
on queue validation.

Change the preprocessor directive NO_SUCH_QUEUE to UNKNOWN_QUEUE, an integer
to act as a default factor of 0 when a queue is specified that flux-accounting
does not know about.

Change the preprocessor directive NO_DEFAULT_QUEUE to NO_QUEUE_SPECIFIED, an
integer to act as a default factor of 0 when no queue is specified when an
association submits a job.
cmoussa1 added a commit to cmoussa1/flux-accounting that referenced this pull request Oct 3, 2022
The plugin currently rejects jobs which submit a queue that the plugin does
not know about. The changes proposed in flux-framework/flux-core#4627
enable the job-manager to be more aware of queues and instill queue validation.
As a result of this, the plugin should relax some of its current enforcements
on queue validation.

Change the preprocessor directive NO_SUCH_QUEUE to UNKNOWN_QUEUE, an integer
to act as a default factor of 0 when a queue is specified that flux-accounting
does not know about.

Change the preprocessor directive NO_DEFAULT_QUEUE to NO_QUEUE_SPECIFIED, an
integer to act as a default factor of 0 when no queue is specified when an
association submits a job.

Add comments to each preprocessor directive to explain more clearly what each
directive represents and is used for.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants