Enforce size constraints #202

Merged: 7 commits, Aug 27, 2024

jameshcorbett (Member):

Problem: as rabbits enter more general use, there needs to be a way
to enforce crude limits on the size of file systems that users can
request.

Add command-line arguments to coral2_dws for accepting size limits
and logic to the directivebreakdown module for enforcing the limits.

The limits are enforced by raising an exception, with an explanatory message, on the job while it is held in the DEPEND state.
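
A minimal sketch of how size limits could be accepted on the command line and checked against a job's storage requests. The flag names (--max-xfs, --max-gfs2, --max-lustre), the binary-suffix size syntax, and the helpers parse_size and check_limits are hypothetical; they are not taken from coral2_dws or the directivebreakdown module. The sketch also assumes each request has already been reduced to a (file system type, capacity in bytes) pair.

```python
"""Hypothetical sketch of per-file-system size limits.

Flag names and helpers are illustrative assumptions, not coral2_dws's API.
"""
import argparse
import re

# Assumed size syntax: an integer with an optional binary suffix.
_UNITS = {"": 1, "KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40, "PiB": 2**50}


def parse_size(text):
    """Convert a string such as '10GiB' into a number of bytes."""
    match = re.fullmatch(r"\s*(\d+)\s*([KMGTP]iB)?\s*", text)
    if match is None:
        raise ValueError(f"invalid size: {text!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit or ""]


def build_parser():
    """Build a parser with one optional limit per file system type."""
    parser = argparse.ArgumentParser(prog="coral2_dws-sketch")
    for fs_type in ("xfs", "gfs2", "lustre"):
        parser.add_argument(
            f"--max-{fs_type}",
            type=parse_size,  # argparse reports a ValueError as a usage error
            default=None,  # None means no limit for this type
            help=f"largest {fs_type} file system a job may request (e.g. 1TiB)",
        )
    return parser


def check_limits(requests, limits):
    """Return an error message if any request exceeds its limit, else None.

    requests: list of (fs_type, capacity_in_bytes) pairs for one job
    limits:   dict mapping fs_type to a byte limit (None means unlimited)
    """
    for fs_type, capacity in requests:
        limit = limits.get(fs_type)
        if limit is not None and capacity > limit:
            return (
                f"{fs_type} file system requested {capacity} bytes, "
                f"exceeding the limit of {limit} bytes"
            )
    return None
```

For example, after parsing --max-xfs 1TiB, a job asking for 2TiB of xfs gets back a message (the text that would be attached to the job exception), while a 512GiB xfs request returns None. Returning a message rather than raising keeps the check usable by whatever code decides how to report the failure.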

Problem: the test suite creates an nnfstorageprofile and then deletes
it, but if a test fails, the profile is not deleted, and subsequent
runs fail because the profile is not supposed to exist when the
tests start.

Make sure to delete the profile before starting the tests.

Problem: the coral2_dws script sends a resource-update RPC with a
non-null 'errmsg' field if a job requests storage that is beyond the
size constraints for file systems specified on the coral2_dws
command line. However, the dws-jobtap plugin does not act on those
error messages.

Add logic to the jobtap plugin to raise an exception if the errmsg
field is not null.
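
The errmsg contract described above can be illustrated with a small sketch: the sender always includes an errmsg key, using JSON null when the request is within limits, and the receiver treats only a non-null value as an error. The id and resources fields and both helper functions are assumptions made for illustration, not the actual RPC schema.

```python
import json
from typing import Optional


def make_resource_update(jobid: int, resources: list, errmsg: Optional[str] = None) -> dict:
    """Build a resource-update payload (hypothetical field layout).

    errmsg stays None when the job's storage request is acceptable and
    carries a message when a size limit is exceeded.
    """
    return {"id": jobid, "resources": resources, "errmsg": errmsg}


def should_raise(payload: dict) -> bool:
    """Receiver-side test: only a non-null errmsg triggers an exception."""
    return payload.get("errmsg") is not None


# None is serialized as JSON null, so "no error" is stated explicitly.
ok = make_resource_update(42, [])
bad = make_resource_update(43, [], "xfs request exceeds the configured limit")
print(json.dumps(ok))  # prints {"id": 42, "resources": [], "errmsg": null}
print(should_raise(json.loads(json.dumps(bad))))  # prints True
```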

Problem: the sham coral2_dws script used by the test suite does not
send errmsg fields with its resource-update RPCs.

Add the errmsg field.

Problem: the sham coral2_dws script in the test directory
is poorly formatted.

Format it with black.

cmoussa1 (Member) left a comment:

LGTM! One tiny nit about a couple of the sharness tests.

-N1 -n1 hostname) &&
flux job wait-event -vt 10 -m description=${CREATE_DEP_NAME} \
${jobid} dependency-add &&
flux job wait-event -vt 20 ${jobid} exception &&

cmoussa1 (Member):

Super minor, and I'll leave this up to you, but it might be nice to check that the exception raised on these failed job submissions matches what you expect, i.e. grep for the exception message.

jameshcorbett (Member, author):

Good call, done and force-pushed. Will set MWP.

Problem: there are no tests to ensure the enforcement of storage
constraints works.

Add tests.

Problem: attempting to remove a dependency after raising an
exception on the job fails, and is unnecessary.

Return after raising the exception.
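
The control flow after this fix can be sketched in Python, although the real dws-jobtap plugin is a C jobtap plugin. Job, CREATE_DEP, the "dws" exception type, and handle_resource_update are hypothetical stand-ins. The only point illustrated is the early return once the exception has been raised, so the dependency is never removed for a failed job.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical name for the dependency that holds the job in DEPEND state.
CREATE_DEP = "create"


@dataclass
class Job:
    """Hypothetical stand-in for a job held in the DEPEND state."""

    jobid: int
    dependencies: set = field(default_factory=lambda: {CREATE_DEP})
    exceptions: list = field(default_factory=list)
    resources: Optional[list] = None


def handle_resource_update(job: Job, payload: dict) -> None:
    """Apply a resource-update payload to a job (illustrative only)."""
    errmsg = payload.get("errmsg")
    if errmsg is not None:
        # Record a job exception carrying the message from coral2_dws
        # (the "dws" type is an assumption).
        job.exceptions.append(("dws", errmsg))
        # Return right away: after the exception is raised, removing the
        # dependency would fail and is not needed.
        return
    # No error: record the resources and release the job from DEPEND.
    job.resources = payload.get("resources")
    job.dependencies.discard(CREATE_DEP)
```

A job whose payload carries an errmsg keeps its dependency and gains one exception; a job with a null errmsg has its resources recorded and its dependency released.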

jameshcorbett force-pushed the enforce-size-constraints branch from a70fd3b to d11e871 on August 27, 2024 16:34.
mergify bot merged commit 6e101f4 into flux-framework:master on Aug 27, 2024; 8 checks passed.
jameshcorbett deleted the enforce-size-constraints branch on August 27, 2024 16:37.