Handle ENOSPC more cleanly #3513
Conversation
Looks generally good. However, there are some small items which you might want to address.
```python
controller = cm.archive_root / "ABC"
controller.mkdir()
(controller / source_tarball.name).write_text("Send in the clones")

# Attempting to create a dataset from the md5 file should result in
# a duplicate dataset error
with pytest.raises(DuplicateTarball) as exc:
    cm.create(source_tarball)
assert exc.value.tarball == Dataset.stem(source_tarball)
```
The coverage report indicates that we do not test the case of a pre-existing MD5 file (without a pre-existing tarball).
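Something along these lines might cover it, reusing the fixtures from the snippet above. This is a sketch only: the `.md5` filename convention is my assumption, as is the premise that a pre-existing MD5 file is reported the same way as a pre-existing tarball.

```python
# Pre-create only the MD5 companion file, not the tarball itself,
# so the duplicate check on the MD5 path is the one that fires.
controller = cm.archive_root / "ABC"
controller.mkdir()
(controller / f"{source_tarball.name}.md5").write_text("faux md5")

with pytest.raises(DuplicateTarball) as exc:
    cm.create(source_tarball)
assert exc.value.tarball == Dataset.stem(source_tarball)
```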
Not a new case, nor is the condition very realistic. I had kinda hoped to get this done today and I'm trying to finish something else.
All true, but, realistic or not, we have code that checks for it, and, therefore, it arguably should be exercised by the tests. Whether it should be addressed in this PR or not is a totally separate question. (I'm just reporting what I found...and I did approve the PR....)
Looks great.
I just have the one question lingering from our previous exchange.
OK, I approve...unless there's another change coming. 😉
PBENCH-1239

In Pbench ops review, after staging the latest `main` and with the intent of testing the new `Tarball.extract` on a large dataset, we pushed a >7GB `uperf` tarball. This failed with an internal error, leaving a partial tarball copy in the `ARCHIVE` controller directory, revealing several related problems:

1. The cache manager was using `shutil.copy2`, which copies the tarball from the staging area into the archive tree. Because `nginx` also caches the entire data stream, this *triples* the storage required to upload a tarball.
2. On copy failure, the cache manager did not delete the partial file.
3. While the initial data stream save code handled `ENOSPC` specially, after mapping trouble in Werkzeug it was reported as a "server internal error", which is not ideal.
4. The MD5 file write was not similarly protected; while this is a small file and `ENOSPC` is unlikely, we should be prepared to handle it gracefully.

This PR changes the cache manager to use `shutil.move` (which was the original intent) to avoid a third copy of the tarball. On failure, we unlink the partial file. Both the initial tarball write and the MD5 write now handle `ENOSPC` and return HTTP status 413 (request entity too large), which is not a perfect mapping but is a standard error code that Werkzeug can handle.
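For reviewers following along, here is a minimal sketch of the error-handling shape described above, assuming Python's `shutil` and Werkzeug's standard `RequestEntityTooLarge` (413) exception. The function name and signature are illustrative, not the actual Pbench cache manager API:

```python
import errno
import shutil
from pathlib import Path

from werkzeug.exceptions import RequestEntityTooLarge


def archive_tarball(staged: Path, archive_dir: Path) -> Path:
    """Move (not copy) a staged tarball into the archive tree.

    Illustrative only: the real cache manager API differs, but the
    shape is the same -- move instead of copy, unlink any partial
    file on failure, and map ENOSPC to HTTP 413.
    """
    destination = archive_dir / staged.name
    try:
        # shutil.move avoids keeping a third copy of the data stream
        # (nginx cache + staging area + archive tree).
        shutil.move(str(staged), str(destination))
    except OSError as exc:
        # Don't leave a partial tarball in the controller directory.
        destination.unlink(missing_ok=True)
        if exc.errno == errno.ENOSPC:
            # "Out of space" becomes 413: not a perfect mapping, but
            # a standard status that Werkzeug propagates cleanly.
            raise RequestEntityTooLarge(
                f"Out of space moving {staged.name} to archive"
            ) from exc
        raise
    return destination
```

The same try/unlink/413 pattern would apply to the (much smaller) MD5 companion file write, per item 4 above.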
Nothing worth blocking the merge over, but I found some new items for you to consider.
I am so sorry, Dave, but I think I found a bug in one of the test assertions. 😞
Nice improvements!