Destination data has significantly larger billable size than the source data [for sparse page blobs] #194

Open
hpaul-osi opened this issue Nov 11, 2019 · 3 comments
@hpaul-osi

Which service(blob, file) does this issue concern?

Blob (page blobs, specifically)

Which version of the SDK was used?

1.1.0

On which platform were you using? (.Net Framework version or .Net Core version, and OS version)

.NET Core

How can the problem be reproduced? It'd be better if the code caused the problem can be shared.

Choose a source container with sparsely populated page blobs of a fixed size, for example 4 MB, and perform a service-side copy with CopyAsync:

await TransferManager.CopyAsync(blob, destBlob, CopyMethod.ServiceSideSyncCopy, copyOptions, copyContext, cancellationTokenSource.Token);

What problem was encountered?

Similar to AzCopy issue 391: after copying page blobs, the billable size of the destination can be orders of magnitude larger than that of the source.

Have you found a mitigation/solution?

Not yet. The AzCopy team resolved the issue in 10.3.0, and before that it could be worked around by specifying a block size for the transfers, but I have not found a way to set the block size explicitly with TransferManager.
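
To illustrate why a smaller block size could matter here, the sketch below models one plausible mechanism: a copy that works in fixed-size chunks and writes every chunk overlapping populated data in full, zeros included. This is an assumption for illustration only, not a description of how DMLib or AzCopy actually copy. `SparseCopyEstimate` and both of its methods are hypothetical helpers and not part of any SDK, and page ranges are modeled as inclusive (Start, End) byte offsets.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration; not part of DMLib or the Azure Storage SDK.
public static class SparseCopyEstimate
{
    // Bytes actually populated in the source: sum of the inclusive [Start, End] ranges.
    public static long PopulatedBytes(IEnumerable<(long Start, long End)> ranges) =>
        ranges.Sum(r => r.End - r.Start + 1);

    // Bytes written if the copy proceeds in fixed-size chunks and writes, in full,
    // every chunk that overlaps at least one populated range (assumed behavior).
    public static long BytesWrittenByChunkedCopy(
        IEnumerable<(long Start, long End)> ranges, long blobLength, long chunkSize)
    {
        var touchedChunks = new HashSet<long>();
        foreach (var (start, end) in ranges)
        {
            for (long chunk = start / chunkSize; chunk <= end / chunkSize; chunk++)
            {
                touchedChunks.Add(chunk);
            }
        }

        // The last chunk may be shorter than chunkSize if it reaches the end of the blob.
        return touchedChunks.Sum(chunk => Math.Min(chunkSize, blobLength - chunk * chunkSize));
    }
}
```

For a 4 MiB blob (4,194,304 bytes) with two populated 512-byte pages at offsets 0 and 2,097,152, `PopulatedBytes` gives 1,024. Under the assumed model, a 4 MiB chunk size writes all 4,194,304 bytes, while a 512-byte chunk size writes only 1,024.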

@EmmaZhu
Member

EmmaZhu commented Nov 13, 2019

@hpaul-osi

Thanks for bringing this to us.

I've put this in our backlog.

DMLib has three copy methods:
service side sync copying: leverages the REST API PutRangeFromURL.
service side async copying: sends a request to start the copy on the Azure Storage server side.
sync copying: downloads the blob content to memory and then uploads it to the destination blob.

Would it work for your scenario if we supported a small block size only in service side sync copying?

Thanks
Emma

@hpaul-osi
Author

@EmmaZhu, I see the methods you described in the CopyMethod enum. In our scenario the copy must be service side, but we can use either ServiceSideAsyncCopy or ServiceSideSyncCopy. We'll try ServiceSideAsyncCopy to see whether the problem is still present.

@hpaul-osi
Author

hpaul-osi commented Nov 15, 2019

To follow up: with ServiceSideAsyncCopy, the source and destination billable sizes of the blobs are almost identical, so that may be viable for our use case. The readme recommends ServiceSideSyncCopy as the option with the best performance, so we are going to investigate the performance impact of switching.
