Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Storage] Progress callback called incorrectly from BlockBlobClient.uploadFile #4719

Closed
2 tasks done
mjrousos opened this issue Aug 8, 2019 · 16 comments
Closed
2 tasks done
Assignees
Labels
Azure.Core Client This issue points to a problem in the data-plane of the library.

Comments

@mjrousos
Copy link
Member

mjrousos commented Aug 8, 2019

  • Package Name: @azure/storage-blob
  • Package Version: 12.0.0-preview.2
  • Operating system: Windows
  • nodejs
    • version: 12.8.0
  • typescript
    • version: 3.5.3

Describe the bug
If a progress callback is provided in UploadStreamToBlockBlobOptions when calling BlockBlobClient.uploadFile, the progress callback is quickly called for all the chunks of the file (before they have actually uploaded). The upload then continues (without progress) until it's done.

This is different from the .NET SDK which calls the progress callback only when bytes are actually uploaded, as expected.

This gif shows progress callbacks called while uploading a ~10MB file using the .NET SDK. Notice that the callbacks happen over the course of the upload, as expected:
.NET Core blob upload

Compare that to this gif showing the same file being uploaded using the JS SDK. Notice that the progress callback is called a lot all at the beginning and then the file uploads (without any progress notifications):
JS blob upload

To Reproduce

Upload a large file using this pattern:

let blobClient = containerClient.getBlobClient("testfile.txt");
let blockBlobClient = blobClient.getBlockBlobClient();

// Upload a 10MB file
await blockBlobClient.uploadFile("testfile.txt", {
    // This should be called over the course of the upload but, instead,
    // it is called lots of times up-front and not as the file uploads.
    progress: (p) => console.log(`Uploaded ${p.loadedBytes} bytes`)
});

For comparison, here is a .NET Core C# repro that works as expected:

var blobClient = containerClient.GetBlockBlobClient("testfile.txt");
using (var fileStream = new FileStream(@"testfile.txt", FileMode.Open))
{
    await blobClient.UploadAsync(fileStream, progressHandler: new MyProgressHandler());
}

MyProgressHandler is defined as:

internal class MyProgressHandler : IProgress<StorageProgress>
{
    public void Report(StorageProgress value)
    {
        Console.WriteLine($"Uploaded {value.BytesTransferred} bytes");
    }
}

Expected behavior
I would expect the progress callback to only be called when bytes are actually uploaded to Azure over the course of the upload operation.

@mjrousos
Copy link
Member Author

mjrousos commented Aug 8, 2019

Digging into this more, it seems to be specific to BlockBlobClient.uploadFile because the following repro using BlockBlockClient.uploadStream works as expected:

let fileStream = fs.createReadStream("testfile.txt");
await blockBlobClient.uploadStream(fileStream, 64000, 2, {
//await blockBlobClient.uploadFile("testfile.txt", {
    progress: (p) => console.log(`Uploaded ${p.loadedBytes} bytes`)
});

This sort of makes sense since uploadFile only uploads a single 'chunk' for files <256MB. What's strange, though, is that the progress callback is called many times all at once. If only a single stageBlock call is made, I would expect the progress callback to only be called once and not until after stageBlock has completed.

@mjrousos mjrousos changed the title [Storage] Progress callback called incorrectly on blob upload [Storage] Progress callback called incorrectly from BlockBlobClient.uploadFile Aug 8, 2019
@loarabia loarabia added Client This issue points to a problem in the data-plane of the library. Storage Storage Service (Queues, Blobs, Files) labels Aug 8, 2019
@triage-new-issues triage-new-issues bot removed the triage label Aug 8, 2019
@XiaoningLiu
Copy link
Member

@mjrousos You can customize options.maxSingleShotSize of uploadFile . Be default it's 256MB. You can give it a smaller value like 4MB, then it will upload by 4MB block per request, and update progress only when each 4MB finished uploading.

@XiaoningLiu XiaoningLiu self-assigned this Sep 2, 2019
@mjrousos
Copy link
Member Author

mjrousos commented Sep 3, 2019

Even with the default settings, though, the callback is called many times before any data has actually uploaded, which is misleading. With a block size of 256MB I would expect that the progress callback shouldn't be called until 256MB (or the entire file if it's smaller than that) has uploaded.

@XiaoningLiu
Copy link
Member

When uploading happens block by block, then progress only updates per block uploads succesfully.

When uploading happens in a single PUT Blob request (for size less than maxSingleShotSize), progress will update when reading the stream.

@mjrousos
Copy link
Member Author

mjrousos commented Sep 4, 2019

I see what you mean about this being a single-block PUT, but it still seems like reporting progress while reading the stream (instead of after the block has been uploaded) is confusing. It seems like the progress callback is used differently in this case than in other cases (like uploadStream) which isn't a good experience.

@XiaoningLiu
Copy link
Member

The progress update callback is scheduled in underline @azure/core-http implemention. @HarshaNalluru do you know who can help improve this?

@HarshaNalluru
Copy link
Member

TransferProgressEvent is from the @azure/core-http which is fired in response to upload or download progress.
@jeremymeng @daviwil, Any thoughts?

@jeremymeng
Copy link
Member

We just pass the parameter of type TransferProgressEvent which defines loadedBytes. I don't see we are using any upload/download progress events from WebResource. The difference is likely between due to the implementation of BufferScheduler and Batch?

@XiaoningLiu
Copy link
Member

@jeremymeng Progress update for single request is scheduled in core-http. Batch & BufferScheduler updates progress request by request.

@jeremymeng
Copy link
Member

@XiaoningLiu Got it. So the issue only happens when uploading files with size < 256 MB. I see the option being passed in upload() now.

@jeremymeng
Copy link
Member

It's likely that we reported befoer the data is actually pushed. https://github.com/Azure/azure-sdk-for-js/blob/feature/storage/sdk/core/core-http/lib/fetchHttpClient.ts#L91

@XiaoningLiu
Copy link
Member

It's likely that we reported befoer the data is actually pushed. https://github.com/Azure/azure-sdk-for-js/blob/feature/storage/sdk/core/core-http/lib/fetchHttpClient.ts#L91

Yes.

@jeremymeng jeremymeng added Azure.Core and removed Storage Storage Service (Queues, Blobs, Files) labels Sep 20, 2019
jeremymeng added a commit to jeremymeng/azure-sdk-for-js that referenced this issue Sep 23, 2019
We were reporting the progress before data is pushed. This leads to customer
reported issue like

Azure#4719

This change introduce a `ReportStream` that provides the correct `_transform`
behavior.
jeremymeng added a commit to jeremymeng/azure-sdk-for-js that referenced this issue Sep 23, 2019
We were reporting the progress before data is pushed. This leads to customer
reported issue like

Azure#4719

This change introduce a `ReportStream` that provides the correct `_transform`
behavior.
jeremymeng added a commit to jeremymeng/azure-sdk-for-js that referenced this issue Sep 30, 2019
We were reporting the progress before data is pushed. This leads to customer
reported issue like

Azure#4719

where the uploaded progress events are triggered before data chunks are
completely uploaded.

This change introduces a `ReportStream` that provides the correct `_transform`
behavior.
jeremymeng added a commit to jeremymeng/azure-sdk-for-js that referenced this issue Sep 30, 2019
We were reporting the progress before data is pushed. This leads to customer
reported issue like

Azure#4719

where the uploaded progress events are triggered before data chunks are
completely uploaded.

This change introduces a `ReportStream` that provides the correct `_transform`
behavior.
jeremymeng added a commit that referenced this issue Sep 30, 2019
We were reporting the progress before data is pushed. This leads to customer
reported issue like

#4719

where the uploaded progress events are triggered before data chunks are
completely uploaded.

This change introduces a `ReportStream` that provides the correct `_transform`
behavior.
@jeremymeng
Copy link
Member

A fix is made in core-http. We need to upgrade to depend on the next core-http preview version.

@jeremymeng
Copy link
Member

core-http preview.4 merged into feature/storage branch we updated to the new version

@faraazhabeeb123
Copy link

Is this also an issue for uploadData method? I have been noticing similar problem. Progress returns a 100% whereas the file is not uploaded to blob yet.

@jeremymeng
Copy link
Member

@faraazhabeeb123 do you have a simple project that demonstrate the problem? That would greatly help us to determine whether it's a new issue or not.

@github-actions github-actions bot locked and limited conversation to collaborators Apr 12, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Azure.Core Client This issue points to a problem in the data-plane of the library.
Projects
None yet
Development

No branches or pull requests

6 participants