ingest: set configured queue constraints #4587

Merged
merged 6 commits into from
Sep 20, 2022

Conversation

Member

@garlick garlick commented Sep 19, 2022

This adds an ingest "frobnicator" plugin (thanks @grondo) that works with the RFC 33 queues.NAME.requires spec proposed in flux-framework/rfc#342. The broker policy/queue config validator is also modified to allow queues.NAME.requires.

This was sufficient to get my test system queues working again with this config:

[ingest.validator]
plugins = ["feasibility", "jobspec" ]

[ingest.frobnicator]
plugins = [ "defaults", "constraints" ]

[[resource.config]]
hosts = "picl[0-7]"
cores = "0-3"

[[resource.config]]
hosts = "picl[1-5]"
properties = ["batch"]

[[resource.config]]
hosts = "picl[6-7]"
properties = ["debug"]

[queues.debug]
requires = [ "debug" ]

[queues.batch]
requires = [ "batch" ]

[policy.jobspec.defaults.system]
queue = "debug"
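To make the effect of this config concrete, here is a minimal Python sketch of what a constraints-style frobnicator step might do: look up the job's queue and copy that queue's requires list into the jobspec as a property constraint. The function name apply_queue_constraints, the plain-dict jobspec and config, and the choice to leave an existing constraint untouched are all assumptions made for illustration; this is not the plugin code added by this PR.

```python
import copy


def apply_queue_constraints(jobspec, config):
    """Return a copy of jobspec with the queue's required properties applied.

    Hypothetical helper: jobspec and config are plain dicts, not Flux objects.
    """
    result = copy.deepcopy(jobspec)
    system = result.setdefault("attributes", {}).setdefault("system", {})
    queue = system.get("queue")
    if queue is None:
        return result  # no queue requested, nothing to do
    requires = config.get("queues", {}).get(queue, {}).get("requires")
    if not requires:
        return result  # queue is unknown here or has no 'requires' key
    # Assumption: a constraint already set on the job is left untouched.
    system.setdefault("constraints", {"properties": list(requires)})
    return result


config = {"queues": {"debug": {"requires": ["debug"]},
                     "batch": {"requires": ["batch"]}}}
job = {"attributes": {"system": {"queue": "batch"}}}
print(apply_queue_constraints(job, config)["attributes"]["system"])
```

Returning a modified copy rather than editing the jobspec in place keeps the sketch easy to test; a real plugin may well mutate the jobspec it is handed.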

Marked as WIP pending:

  • more testing
  • add flux-config-queues(5)

Contributor

@grondo grondo left a comment

LGTM so far. I had just one thought while doing a first pass (sorry the comment was meant to be an individual comment, not a review, but I had already clicked "Start a Review")

}
if (policy) {
char key[1024];
snprintf (key, sizeof (key), "queues.%s.policy", name);
if (validate_policy_json (policy, key, error) < 0)
return -1;
}
if (requires) {

Contributor

@grondo grondo Sep 19, 2022

One thing we might want to do here is validate that each configured property is valid according to RFC 20, i.e. does not contain whitespace or any characters from the set ! & ' " ^ ` | ( ).

Contributor

Ah, I was wrong! Actually, ^ should be allowed, since that is currently our shorthand for "not"; e.g. a sysadmin should be able to configure a queues.NAME.requires of [ "^batch" ], which means that queue NAME excludes all resources with the batch property.

I'm sorry for messing that one up.
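The rule discussed in this thread can be sketched in a few lines of Python: reject whitespace and the characters RFC 20 forbids, but keep ^ legal so that entries such as "^batch" pass. The names FORBIDDEN, is_valid_requires_entry and invalid_requires_entries are hypothetical, and rejecting empty strings is an extra assumption; the actual check in this PR lives in the broker's C config validator.

```python
# Characters named in the review comment as forbidden by RFC 20,
# minus '^', which must stay legal as the "not" shorthand.
FORBIDDEN = set("!&'\"`|()")


def is_valid_requires_entry(prop):
    """Hypothetical check for one entry of queues.NAME.requires."""
    # Assumption: non-strings and empty strings are rejected outright.
    if not isinstance(prop, str) or not prop:
        return False
    return not any(ch.isspace() or ch in FORBIDDEN for ch in prop)


def invalid_requires_entries(requires):
    """Return the entries of a requires list that fail the check."""
    if not isinstance(requires, list):
        raise TypeError("requires must be a list of strings")
    return [p for p in requires if not is_valid_requires_entry(p)]


print(invalid_requires_entries(["batch", "^debug", "bad prop", "x(y)"]))
```

Raising on a non-list input mirrors the later decision in this PR to drop support for comma-separated strings.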

Member Author

garlick commented Sep 19, 2022

Force pushed with a new flux-config-queues(5) man page and changes to validate property names, per @grondo's comments.

Contributor

@grondo grondo left a comment

LGTM. Unfortunately, I gave bad advice in my previous comment (the ^ character, at least, should be allowed in requires; sorry about that), and I had one trivial suggestion for the flux-config-queues(5) man page.

Comment on lines 9 to 12
The ``queues`` table configures job queues, as described in RFC 33.
Normally, Flux has a single anonymous queue, but when queues are configured,
all queues are named, and a job submitted without a queue name is rejected,
unless a default queue is configured.

Contributor

Very small suggestion, this might read better if the last comma is removed:

Normally, Flux has a single anonymous queue, but when queues are configured,
all queues are named, and a job submitted without a queue name is rejected
unless a default queue is configured.

Member Author

garlick commented Sep 20, 2022

NP! I should have caught it - I did skim RFC 20 after your comment but missed it too.

Any thoughts for further testing of the constraints plugin before we merge?

Contributor

grondo commented Sep 20, 2022

Any thoughts for further testing of the constraints plugin before we merge?

Yeah, here are some ideas:

  • ensure the constraints plugin can be specified even if no queues are configured
  • ensure the constraints plugin can be specified when the queue does not have a requires key

Also, I think the constraints plugin I gave you allows requires to be specified as a comma-separated string rather than a list. That code could be removed.

Edit: Also, if you drive the constraints plugin from flux job-frobnicator directly, you could test that some of the invalid configuration is handled correctly there as well (e.g. requires not a list, invalid queue specified, ...).

Member Author

garlick commented Sep 20, 2022

Also, if you drive the constraints plugin from flux job-frobnicator directly, you could test that some of the invalid configuration is handled correctly there as well (e.g. requires not a list, invalid queue specified, ...).

Since the config comes from the broker, and the broker now checks the config, this is not really possible.

Just squashed the following changes

  • fix up a missed check in the broker queues validation
  • add some more tests based on your suggestions
  • tweak man page opening paragraph per suggestion
  • allow ^ in property names
  • eliminate support for splitting comma-separated properties in constraints plugin
  • rebase on current master

Member Author

garlick commented Sep 20, 2022

oops I need to try that again - I accidentally squashed the wrong pair of commits

@garlick garlick changed the title from "WIP: ingest: set configured queue constraints" to "ingest: set configured queue constraints" Sep 20, 2022

Contributor

grondo commented Sep 20, 2022

Since the config comes from the broker, and the broker now checks the config, this is not really possible.

Ah, yes, then I suppose the code to check validity of the config in the plugin is actually dead code and can be removed?

codecov bot commented Sep 20, 2022

Codecov Report

Merging #4587 (ad7525b) into master (5714d9d) will increase coverage by 0.02%.
The diff coverage is 94.64%.

❗ Current head ad7525b differs from pull request most recent head 02d6ab8. Consider uploading reports for the commit 02d6ab8 to get more accurate results

@@            Coverage Diff             @@
##           master    #4587      +/-   ##
==========================================
+ Coverage   83.35%   83.37%   +0.02%     
==========================================
  Files         409      410       +1     
  Lines       68548    68599      +51     
==========================================
+ Hits        57135    57196      +61     
+ Misses      11413    11403      -10     
Impacted Files Coverage Δ
...python/flux/job/frobnicator/plugins/constraints.py 91.89% <91.89%> (ø)
src/broker/brokercfg.c 87.81% <100.00%> (+1.02%) ⬆️
...gs/python/flux/job/frobnicator/plugins/defaults.py 88.88% <0.00%> (-1.86%) ⬇️
src/modules/cron/cron.c 82.47% <0.00%> (-0.45%) ⬇️
src/common/libterminus/terminus.c 85.82% <0.00%> (-0.25%) ⬇️
src/cmd/flux-job.c 87.53% <0.00%> (+0.12%) ⬆️
src/broker/overlay.c 86.46% <0.00%> (+0.21%) ⬆️
src/common/libflux/handle.c 83.87% <0.00%> (+0.38%) ⬆️
src/common/libsubprocess/local.c 84.39% <0.00%> (+0.48%) ⬆️
src/common/libsdprocess/sdprocess.c 69.76% <0.00%> (+0.62%) ⬆️
... and 2 more

Contributor

@grondo grondo left a comment

LGTM!

garlick and others added 6 commits September 20, 2022 12:57
Problem: broker policy/queues config validator does not allow
'requires' queues subtable, but this is defined in RFC 33.

Allow the following, per RFC 33:
  queues.NAME.requires = [ string array ].

Add coverage to t2241-policy-config.t.

Problem: a malformed queues config such as queues=42 is
allowed by the broker policy/queues config validator.

Check that queues is a table.
Add test to t2241-policy-config.t.

Problem: configuration can specify a queue-specific default
queue, or an unknown default queue.

Validate the [policy] default queue.
Forbid [queues] from specifying a queue-specific default queue.
Update t2241-policy-config.t.
Drop config tests from t2112-job-ingest-frobnicator.t.
Update the now invalid example in flux-config-policy(5).

drop frobnicator test on missing queue config

Problem: RFC 33 specifies that a queue may define resource
constraints to identify partitioned resources, but that is not
currently supported by the frobnicator.

Add `constraints` plugin.

Problem: frobnicator constraints plugin has no test coverage.

Add tests to t2112-job-ingest-frobnicator.t to cover the plugin.

Problem: there is no manual page for the [queues] config.

Add man page.

Member Author

garlick commented Sep 20, 2022

Ah, yes, then I suppose the code to check validity of the config in the plugin is actually dead code and can be removed?

Well, that prompted me to look through what was being checked and discover a couple of cases that were missed by the broker check (default queue not in [queues], queue policy contains default queue) and cover those. A few tests in t2112-job-ingest-frobnicator.t were replaced with tests in t2241-policy-config.t (which checks the broker config check).

I also discovered that the man page example didn't have the default queue in [queues] and fixed that (doh!)

However, the checks in the frobnicator plugins are pretty minimal and do serve as documentation of expectations, if nothing else (like an assertion). Maybe OK to leave them in?

CAVEATS
=======

Queue resources should not overlap.

Contributor

Oh, I was going to ask about this caveat. Why can't queue resources overlap? Is this a Fluxion constraint?

Member Author

Fluxion allows it, but some issues were identified in flux-framework/flux-sched#939.

Some use cases might work with Fluxion as is, like opening up a stopped queue to select users before a DAT, then when the DAT comes around, stop/drain the normal queues and start the DAT queue. That particular one would require the job manager to support multiple queues though.

Maybe there are other cases, but I thought it best to not advertise it at this point since the near term goal is partitioned resources.
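Since the near-term goal is partitioned resources, a sysadmin might want a quick way to check whether a proposed config keeps queues disjoint. The Python sketch below is one hedged way to do that. It assumes hosts are given as already-expanded lists rather than hostlist strings such as "picl[0-7]", that all entries in requires must be satisfied together, and that a queue with no requires key matches every host. The helper names are hypothetical and nothing like this is part of the PR.

```python
from itertools import combinations


def host_properties(resources):
    """Merge the properties listed for each host across all resource entries."""
    props = {}
    for entry in resources:
        for host in entry["hosts"]:
            props.setdefault(host, set()).update(entry.get("properties", []))
    return props


def queue_hosts(props_by_host, requires):
    """Hosts that satisfy every entry in requires ('^name' means 'lacks name')."""
    def satisfied(req, props):
        return req[1:] not in props if req.startswith("^") else req in props

    return {host for host, props in props_by_host.items()
            if all(satisfied(req, props) for req in requires)}


def overlapping_queues(resources, queues):
    """Map each overlapping pair of queue names to the hosts they share."""
    props_by_host = host_properties(resources)
    hosts = {name: queue_hosts(props_by_host, q.get("requires", []))
             for name, q in queues.items()}
    return {(a, b): hosts[a] & hosts[b]
            for a, b in combinations(sorted(hosts), 2)
            if hosts[a] & hosts[b]}


# Hosts are pre-expanded lists here; real configs use hostlist strings.
resources = [
    {"hosts": [f"picl{i}" for i in range(8)]},
    {"hosts": [f"picl{i}" for i in range(1, 6)], "properties": ["batch"]},
    {"hosts": ["picl6", "picl7"], "properties": ["debug"]},
]
queues = {"debug": {"requires": ["debug"]}, "batch": {"requires": ["batch"]}}
print(overlapping_queues(resources, queues))
```

With the test-system config from the top of this PR, the debug and batch queues select disjoint hosts, so the result is empty; adding a queue without requires would report an overlap with both.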

Contributor

Ah, thanks, I had forgotten about that. I do know we were talking about allowing a queue with access to all resources (and we do have a pall partition on many clusters), and this caveat prevents that option. I agree we can ignore it for now as long as the concept of non-overlapping resource sets isn't baked into the design.

Member Author

garlick commented Sep 20, 2022

Setting MWP, thanks!

@mergify mergify bot merged commit dd58143 into flux-framework:master Sep 20, 2022
@garlick garlick deleted the queue_constraint branch September 20, 2022 21:56