t: add storage-constrained tests for tiered storage systems #673
Codecov Report
@@ Coverage Diff @@
## master #673 +/- ##
==========================================
- Coverage 75.22% 75.21% -0.02%
==========================================
Files 78 78
Lines 8300 8300
==========================================
- Hits 6244 6243 -1
- Misses 2056 2057 +1
Continue to review full report at Codecov.
I'm not sure why codecov is failing since this PR only adds tests without adding code, but besides that, I think this PR is ready for review.
PS - this is a relatively low priority PR, and I know we are up against a deadline. I'm more than happy to have this PR delayed until after the 0.9.0 tag if others agree that other outstanding PRs are more important.
@SteVwonder: I will review this right after lunch.
Great coverage increase! Here is the first batch (only on the node-local coverage). I didn't want to possibly lose this during my lunch. I had lots of IT issues -- I wonder if this is a good time to replace my laptop... Will circle back to the rest of the multi-tiered storage levels.
t/data/resource/jobspecs/mt-storage/L0/storage-constrained.yaml
@SteVwonder: This PR looks good generally. I just spotted a few minor issues but I will approve the PR. Once you address those minor issues, feel free to add the MWP label. Thanks!
t/data/resource/commands/mt-storage/L1/cmds-ssd-constrained-alloc-orelse.in
t/data/resource/jobspecs/mt-storage/L1/storage-constrained.yaml
t/data/resource/commands/mt-storage/L1/cmds-ssd-constrained-same-rack-alloc-orelse.in
t/data/resource/jobspecs/mt-storage/L1/storage-constrained-same-rack.yaml
t/data/resource/commands/mt-storage/L3/storage-constrained-global.in
t/data/resource/commands/mt-storage/L3/storage-constrained-same-rack.in
I can review this PR in the next three hours.
It seems this estimate was premature. I needed to address changes with my PR and haven't gotten around to reviewing this. I'll get to it ASAP!
force-pushed from 31be293 to 9c64eaa
@dongahn: Thanks for the feedback! I just rebased and force-pushed with the changes. I forgot and autosquashed before pushing, so the fixup commits were automatically rolled in. Sorry about that. I can probably undo the autosquash with reflog if you want.
Not necessary! I will take a brief look at it once again and put the MWP label if everything looks okay.
BTW, how do you do this? :-) This LGTM. MWP.
@milroy: Ah... I forgot you were also reviewing this PR. Let me know if you have any comments that you want to see resolved before the MWP.
When you commit a fixup or squash commit, use the
LGTM! I only found a couple of typos in a commit message.
t/data/resource/commands/mt-storage/L1/cmds-ssd-constrained-same-rack-alloc-orelse.in
force-pushed from 9c64eaa to 8114dca
@SteVwonder: sounds great to me. Great addition to our multi-tiered storage support!
.out is the suffix of the files contained in this data directory
Problem: previous multi-tier storage tests allocated compute and storage in the exact same ratio as found in the cluster. It is conceivable that these tests would pass under a scheduler that only respected compute constraints while ignoring IO constraints.
Solution: add tests with jobspecs that request 2x storage of the existing tests to exhaust the storage resources before the compute ones.
Note: the node-local storage case is known in the tests as L0.
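To see why doubling the storage request matters, here is a minimal sketch that counts how many identical jobs fit on a cluster, with and without honoring the storage constraint. The cluster size (16 nodes, 1600 GB), the per-job requests, and the names `Request` and `jobs_that_fit` are all hypothetical illustrations, not values or code from the flux-sched test suite.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    """Hypothetical per-job request: compute nodes plus storage in GB."""
    nodes: int
    storage_gb: int


def jobs_that_fit(total_nodes: int, total_storage_gb: int,
                  req: Request, respect_storage: bool = True) -> int:
    """Count identical jobs that fit; respect_storage=False mimics a
    scheduler that only checks compute and ignores IO constraints."""
    by_compute = total_nodes // req.nodes
    if not respect_storage:
        return by_compute
    by_storage = total_storage_gb // req.storage_gb
    return min(by_compute, by_storage)
```

With a request in the same ratio as the hypothetical cluster (2 nodes, 200 GB), both schedulers admit 8 jobs, so the test cannot tell them apart. Doubling the storage request (2 nodes, 400 GB) lets the correct scheduler admit only 4 jobs while the compute-only one still admits 8, which is the kind of divergence the storage-constrained tests are meant to expose.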
Add storage-constrained tests -- tests with jobspecs that request 2x storage of the existing tests to exhaust the storage resources before the compute ones -- for the rack-local storage case, known in the tests as L1.
These tests explicitly request that the rack-local SSDs and the compute nodes allocated to the job share the same rack. Previous tests allowed the SSDs and nodes to be on different racks.
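The difference between the earlier rack-local tests and the same-rack tests can be sketched as a check over an allocation. This assumes, purely for illustration, that an allocation is a list of `(resource_type, rack_id)` pairs and that "share the same rack" means the set of racks supplying nodes equals the set supplying SSDs; the function names are hypothetical and do not reflect how the resource module actually represents allocations.

```python
from collections import defaultdict


def racks_by_type(allocation):
    """Group rack IDs by resource type, e.g. {"node": {...}, "ssd": {...}}."""
    racks = defaultdict(set)
    for resource_type, rack_id in allocation:
        racks[resource_type].add(rack_id)
    return racks


def nodes_and_ssds_share_rack(allocation):
    """Hypothetical same-rack check: nodes must be present, and the racks
    supplying SSDs must be exactly the racks supplying compute nodes."""
    racks = racks_by_type(allocation)
    node_racks, ssd_racks = racks["node"], racks["ssd"]
    return bool(node_racks) and node_racks == ssd_racks
```

An allocation such as `[("node", "rack0"), ("ssd", "rack1")]` would have satisfied the earlier rack-local tests but is rejected by this same-rack check.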
Add storage-constrained tests -- tests with jobspecs that request 2x storage of the existing tests to exhaust the storage resources before the compute ones -- for the global storage case, known in the tests as L2.
Add storage-constrained tests -- tests with jobspecs that request 2x storage of the existing tests to exhaust the storage resources before the compute ones -- for the hybrid storage case, known in the tests as L3.
Problem: when running `make distcheck`, the creation of a tarball fails due to filenames that are too long. This is because autotools defaults to the oldest, most compatible tar format (v7), which limits filename length to 99 characters.
Solution: switch to a newer tar format called `ustar` (which is honestly still quite old: it was defined in the 1988 POSIX standard) that supports filenames up to 256 characters in length. If this becomes a problem in the future, we can switch to `pax`, which has no limits. Additionally, set the `filename-length-max` option so that the entire build fails rather than the tar silently failing to be created.
Further Reading:
- https://www.gnu.org/software/tar/manual/html_section/tar_67.html
- https://www.gnu.org/software/automake/manual/html_node/List-of-Automake-options.html
- https://noiselabs.io/tar-file-name-is-too-long-max-99/
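The 99- and 256-character figures come from the tar header layouts. As a rough sketch: a v7 header stores the path in a single 100-byte, NUL-terminated field (99 usable characters), while a ustar header adds a 155-byte prefix field, so a long path fits if it can be split at some `/` into a prefix of at most 155 bytes and a name of at most 100 bytes (155 + 1 + 100 = 256). The helper names below are hypothetical; this only checks the length rules and is not how automake or tar implement them.

```python
V7_NAME_MAX = 99        # v7: 100-byte name field, including the terminating NUL
USTAR_NAME_MAX = 100    # ustar: name field may use all 100 bytes
USTAR_PREFIX_MAX = 155  # ustar: separate prefix field for leading directories


def fits_v7(path: str) -> bool:
    """True if the path fits in a v7 tar header."""
    return len(path.encode()) <= V7_NAME_MAX


def fits_ustar(path: str) -> bool:
    """True if the path fits in a ustar header, splitting at a '/' if needed."""
    data = path.encode()
    if len(data) <= USTAR_NAME_MAX:
        return True
    # Try each '/' as the split point: prefix before it, name after it.
    for i, byte in enumerate(data):
        if byte == ord("/"):
            prefix, name = data[:i], data[i + 1:]
            if len(prefix) <= USTAR_PREFIX_MAX and 0 < len(name) <= USTAR_NAME_MAX:
                return True
    return False
```

For example, a 111-character path made of a 50-character directory and a 60-character file name is rejected by `fits_v7` but accepted by `fits_ustar`, while a single 120-character component with no `/` fits neither.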
force-pushed from 8114dca to fd805d0
Problem: previous multi-tier storage tests allocated compute and storage
in the exact same ratio as found in the cluster. It is conceivable that
these tests would pass under a scheduler that only respected compute
constraints while ignoring IO constraints.
Solution: add tests with jobspecs that request 2x storage of the
existing tests to exhaust the storage resources before the compute ones.
Closes #623