Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding MD5 from UploadFileOptions #20356

Closed
wants to merge 2 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions sdk/storage/azblob/CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,7 @@
### Breaking Changes

### Bugs Fixed
* Added TransactionalContentMD5 to UploadOptions and CommitBlockListOptions. Fixes [20092](https://github.com/Azure/azure-sdk-for-go/issues/20092).

### Other Changes

Expand Down
28 changes: 15 additions & 13 deletions sdk/storage/azblob/blockblob/models.go
Original file line number Diff line number Diff line change
Expand Up @@ -228,24 +228,26 @@ func (o *uploadFromReaderOptions) getStageBlockOptions() *StageBlockOptions {

func (o *uploadFromReaderOptions) getUploadBlockBlobOptions() *UploadOptions {
return &UploadOptions{
Tags: o.Tags,
Metadata: o.Metadata,
Tier: o.AccessTier,
HTTPHeaders: o.HTTPHeaders,
AccessConditions: o.AccessConditions,
CPKInfo: o.CPKInfo,
CPKScopeInfo: o.CPKScopeInfo,
Tags: o.Tags,
Metadata: o.Metadata,
Tier: o.AccessTier,
TransactionalContentMD5: o.TransactionalContentMD5,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are two features, one is to just set the MD5 sum in the blob and other one is transactional MD5. For transactional options crc64 is also supported. Transactional fields are only for validation and will not set any properties in the blob. Just double check that we are good with all these combinations as I do not see transactional CRC here, just had doubts on we are still missing some things.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to be careful about the way we chose to expose transactional MD-5 and CRC-64 to the customer. We will be implement end-to-end content validation in the Go SDK this fall, after the FE has implement trailing CRC-64 header.

This work will look something like this - Azure/azure-sdk-for-net#29778

We do not need to calculate CRC-64 or MD5 within the Go SDK at this time, but we need to design our public interfaces in a way that will support this option in the future.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is how UploadOptions looks in .NET - https://github.com/Azure/azure-sdk-for-net/blob/main/sdk/storage/Azure.Storage.Blobs/src/Models/BlobUploadOptions.cs

It has UploadTransferValidationOptions as a property - https://github.com/Azure/azure-sdk-for-net/blob/main/sdk/storage/Azure.Storage.Common/src/UploadTransferValidationOptions.cs

I recommend we do something similar in Go to future proof out public interfaces.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with Sean. In .NET we had to retire our byte[] contentHash arguments and replace them with a whole configuration setup. It would be best to avoid writing "instant legacy" APIs and this probably merits further offline discussion of how to balance immediate needs with medium-term plans for transactional content integrity.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In .NET, if the customer populates TransferValidationOptions.PrecalculatedChecksum and TransferValidationOptions.ChecksumAlgorithm, the SDK uses their pre-calculated checksum. If the customer only populates TransferValidationOptions.ChecksumAlgorithm, the SDK calculates the checksum for the user.

We don't want to implement the SDK calculating the checksum for the user until ~6 months from now when the service supports trailing checksum footers, but we need to design our public interface to be compatible with this functionality.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jaschrep-msft, what should we do in in the case where the customer provides a precalculated checksum for a multi-part upload, before we implement calculating checksums in the SDK?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should fail fast. If a customer provides a transactional checksum that the SDK cannot make use of, we shouldn't pretend an integrity check has occurred when it actually hasn't.

In .NET, before we had SDK calculation on caller behalf, we didn't accept the parameter on any API that had the possibility of splitting an upload, as we didn't want customers to write code that suddenly broke once they passed a file size threshold they were not aware of. Customers had to use direct to REST apis if they wanted this feature.

HTTPHeaders: o.HTTPHeaders,
AccessConditions: o.AccessConditions,
CPKInfo: o.CPKInfo,
CPKScopeInfo: o.CPKScopeInfo,
}
}

func (o *uploadFromReaderOptions) getCommitBlockListOptions() *CommitBlockListOptions {
return &CommitBlockListOptions{
Tags: o.Tags,
Metadata: o.Metadata,
Tier: o.AccessTier,
HTTPHeaders: o.HTTPHeaders,
CPKInfo: o.CPKInfo,
CPKScopeInfo: o.CPKScopeInfo,
Tags: o.Tags,
Metadata: o.Metadata,
Tier: o.AccessTier,
TransactionalContentMD5: o.TransactionalContentMD5,
HTTPHeaders: o.HTTPHeaders,
CPKInfo: o.CPKInfo,
CPKScopeInfo: o.CPKScopeInfo,
}
}

Expand Down
74 changes: 74 additions & 0 deletions sdk/storage/azblob/client_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,8 @@ package azblob_test

import (
"context"
"crypto/md5"
"encoding/hex"
"errors"
"fmt"
"io"
Expand Down Expand Up @@ -335,6 +337,78 @@ func (s *AZBlobUnrecordedTestsSuite) TestUploadAndDownloadFileNonZeroOffsetAndCo
performUploadAndDownloadFileTest(s.T(), _require, testName, fileSize, blockSize, concurrency, downloadOffset, downloadCount)
}

func (s *AZBlobUnrecordedTestsSuite) TestMD5SmallFileUpload() {
_require := require.New(s.T())
fileSize := 1 * 1024 * 1024 // 1 MB file

md5sum, resp, err := performUploadWithMD5Check(s.T(), _require, s.T().Name(), fileSize)

_require.NoError(err)

md5sumStr := hex.EncodeToString(md5sum)
respMd5 := hex.EncodeToString(resp.ContentMD5)
_require.Equal(md5sumStr, respMd5)

}

func (s *AZBlobUnrecordedTestsSuite) TestMD5LargeFileUpload() {
_require := require.New(s.T())
fileSize := 257 * 1024 * 1024 // 257 MB file, 256 MB is the max for calculate md5 and service md5 won't match

md5sum, resp, err := performUploadWithMD5Check(s.T(), _require, s.T().Name(), fileSize)

_require.ErrorContains(err, "400 The MD5 value specified in the request did not match with the MD5 value calculated by the server.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we getting this error for large file uploads?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For larger files, the service won't be able to calculate the MD5 and this error is expected as the user provided MD5 and the response MD5 won't match. We don't get this error as of now because the user provided MD5 doesn't get passed at all.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a transactional md5 will always be calculated by the service. If the service is failing the transactional md5 check it's because the check is actually failing. I imagine this is because you've configured the transactional md5 on put blocklist. A transactional md5 for that API is literally checking the content integrity of the blocklist itself (it's one of the only places .NET SDK currently doesn't calculate one).

Transactional md5 validates individual requests and nothing more, meaning it would need to validate individual blocks. If one large md5 is being passed in for a partitioned upload, the client should fail fast since the checksum cannot be split.


md5sumStr := hex.EncodeToString(md5sum)
respMd5 := hex.EncodeToString(resp.ContentMD5)
_require.NotEqual(md5sumStr, respMd5)

}

func performUploadWithMD5Check(t *testing.T, _require *require.Assertions, testName string, fileSize int) ([]byte, blockblob.UploadFileResponse, error) {
// Set up file to upload
fileName := "BigFile.bin"
_ = generateFile(fileName, fileSize)

// Open the file to upload
file, err := os.Open(fileName)
_require.NoError(err)
defer func(file *os.File) {
_ = file.Close()
}(file)
defer func(name string) {
_ = os.Remove(name)
}(fileName)

client, err := testcommon.GetClient(t, testcommon.TestAccountDefault, nil)
_require.NoError(err)

containerName := testcommon.GenerateContainerName(testName)
_, err = client.CreateContainer(context.Background(), containerName, nil)
_require.NoError(err)
defer func() {
_, err := client.DeleteContainer(context.Background(), containerName, nil)
_require.NoError(err)
}()

// calculate md5 sum
h := md5.New()
if _, err := io.Copy(h, file); err != nil {
_require.NoError(err)
}
md5sum := h.Sum(nil)
// Set up test blob
blobName := testcommon.GenerateBlobName(testName)

// Upload the file to a block blob
resp, err := client.UploadFile(context.Background(), containerName, blobName, file,
&blockblob.UploadFileOptions{
TransactionalContentMD5: md5sum,
})

return md5sum, resp, err
}

func performUploadAndDownloadBufferTest(t *testing.T, _require *require.Assertions, testName string, blobSize, blockSize, concurrency, downloadOffset, downloadCount int) {
// Set up buffer to upload
_, bytesToUpload := testcommon.GenerateData(blobSize)
Expand Down