Store MD5 in a separate metadata field and use it when ETag is not an MD5 hash #924
Merged
lbolla reviewed Mar 2, 2021
andrewshadura force-pushed the s3-md5-etag branch from 6d003a3 to 42f1a78 (March 2, 2021 11:58)
Can you also rebase against the latest master, please? The tests should pass now.
Codecov Report
@@            Coverage Diff             @@
##           master     #924      +/-   ##
==========================================
- Coverage   61.41%   61.28%   -0.13%
==========================================
  Files          54       54
  Lines        5911     5928      +17
==========================================
+ Hits         3630     3633       +3
- Misses       1757     1770      +13
- Partials      524      525       +1
The S3 backend relies on the ETag that S3 returns being equal to the MD5 of the object, but that is not necessarily true. For that reason, we also store the object's MD5 in a separate metadata field to make sure it isn't lost.

From https://docs.aws.amazon.com/AmazonS3/latest/API/RESTCommonResponseHeaders.html:

> The entity tag is a hash of the object. The ETag reflects changes only
> to the contents of an object, not its metadata. The ETag may or may not
> be an MD5 digest of the object data. Whether or not it depends on how
> the object was created and how it is encrypted as described below:
>
> Objects created by the PUT Object, POST Object, or Copy operation,
> or through the AWS Management Console, and are encrypted by SSE-S3 or
> plaintext, have ETags that are an MD5 digest of their object data.
>
> Objects created by the PUT Object, POST Object, or Copy operation,
> or through the AWS Management Console, and are encrypted by SSE-C or
> SSE-KMS, have ETags that are not an MD5 digest of their object data.
>
> If an object is created by either the Multipart Upload or Part Copy
> operation, the ETag is not an MD5 digest, regardless of the method
> of encryption.

Signed-off-by: Andrej Shadura <[email protected]>
The S3 backend relies on the ETag that S3 returns being equal to the MD5 of the object, but that is not necessarily true. When the returned value clearly doesn't look like a valid MD5 hash (its length isn't exactly 32 characters), attempt to retrieve the MD5 hash that may be stored in the metadata.

We cannot always do this, since user-defined metadata isn't returned by the ListObjects call, so verifying it for every object is expensive: it requires one HEAD request per object.

This commit fixes aptly-dev#923.

Signed-off-by: Andrej Shadura <[email protected]>
andrewshadura force-pushed the s3-md5-etag branch from 42f1a78 to 599e67a (March 2, 2021 12:13)
lbolla approved these changes Mar 2, 2021
Description of the Change
The S3 backend relies on the ETag that S3 returns being equal to the MD5 of the object, but that is not necessarily true. For that reason, we also store the object's MD5 in a separate metadata field to make sure it isn't lost. When the returned value clearly doesn't look like a valid MD5 hash (its length isn't exactly 32 characters), we attempt to retrieve the MD5 hash that may be stored in the metadata.
We cannot always do this, since user-defined metadata isn't returned by the ListObjects call, so verifying it for every object is expensive: it requires one HEAD request per object.
This pull request fixes #923
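The approach described above can be sketched roughly as follows. This is a minimal illustration and not the code from this pull request: the `Md5` metadata key, the `metadataFetcher` type, and the `uploadMetadata` and `resolveMD5` helpers are all hypothetical names, and the fetcher stands in for whatever HEAD request the backend would issue. Falling back to the raw ETag when no MD5 is stored in the metadata is also an assumption.

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"strings"
)

// md5MetadataKey is a hypothetical name for the user-defined metadata field.
const md5MetadataKey = "Md5"

// metadataFetcher is a hypothetical stand-in for a HEAD request that returns
// the user-defined metadata of a single object.
type metadataFetcher func(key string) (map[string]string, error)

// uploadMetadata builds the metadata to attach on upload, so that the MD5
// survives even when S3 returns an ETag that is not an MD5 digest.
func uploadMetadata(data []byte) map[string]string {
	sum := md5.Sum(data)
	return map[string]string{md5MetadataKey: hex.EncodeToString(sum[:])}
}

// looksLikeMD5 applies the heuristic from the description: a hex-encoded MD5
// digest is exactly 32 characters long.
func looksLikeMD5(etag string) bool {
	return len(etag) == 32
}

// resolveMD5 returns the ETag when it looks like an MD5 digest. Otherwise it
// spends one metadata lookup on the object and returns the stored MD5.
func resolveMD5(key, etag string, fetch metadataFetcher) (string, error) {
	// S3 returns ETags wrapped in double quotes.
	etag = strings.Trim(etag, `"`)
	if looksLikeMD5(etag) {
		return etag, nil
	}
	meta, err := fetch(key)
	if err != nil {
		return "", err
	}
	if stored, ok := meta[md5MetadataKey]; ok && stored != "" {
		return stored, nil
	}
	// Assumption: with no stored MD5, fall back to the raw ETag.
	return etag, nil
}
```

With this shape, the per-object lookup happens only for objects whose ETag fails the length check (for example a multipart ETag with a `-N` suffix), which keeps listing cheap in the common case.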
Requirements
All new code should be covered with tests, documentation should be updated. CI should pass.
Checklist
AUTHORS