Buffered upload no longer requires length in sync client #22218
Conversation
(int) Math.min(Integer.MAX_VALUE, validatedParallelTransferOptions.getBlockSizeLong()), false)
    : options.getDataFlux();
int chunkSize = (int) Math.min(Integer.MAX_VALUE, validatedParallelTransferOptions.getBlockSizeLong());
data = FluxUtil.toFluxByteBuffer(options.getDataStream(), chunkSize);
How much data does this buffer at a time? Just chunkSize?
Each ByteBuffer will be chunkSize. It already was; I just pulled it into a variable.
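For illustration, a generate-style converter behaves roughly like the sketch below; this is not azure-core's actual implementation, and the class and method names are made up. Each emitted ByteBuffer holds at most chunkSize bytes.

```java
import reactor.core.publisher.Flux;

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;

public final class ChunkingSketch {
    // Rough sketch of the behavior under discussion (not azure-core's real code):
    // each emitted ByteBuffer contains at most chunkSize bytes read from the stream.
    static Flux<ByteBuffer> toChunkedFlux(InputStream stream, int chunkSize) {
        return Flux.generate(sink -> {
            try {
                byte[] chunk = new byte[chunkSize];
                int read = stream.read(chunk);
                if (read == -1) {
                    sink.complete();
                } else {
                    sink.next(ByteBuffer.wrap(chunk, 0, read));
                }
            } catch (IOException e) {
                sink.error(new UncheckedIOException(e));
            }
        });
    }

    public static void main(String[] args) {
        // 10 bytes with chunkSize 4 -> buffers of 4, 4, and 2 bytes.
        toChunkedFlux(new ByteArrayInputStream(new byte[10]), 4)
            .subscribe(buf -> System.out.println(buf.remaining()));
    }
}
```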
/azp run java - storage - ci
Azure Pipelines successfully started running 1 pipeline(s).
/azp run java - storage - ci
Azure Pipelines successfully started running 1 pipeline(s).
/check-enforcer override
overrode check enforcer to deal with pipeline bugs. was green before they were introduced. discussed offline with team.
// We can only buffer up to max int due to restrictions in ByteBuffer.
(int) Math.min(Integer.MAX_VALUE, parallelTransferOptions.getBlockSizeLong()), false)
    : options.getDataFlux();
int chunkSize = (int) Math.min(Integer.MAX_VALUE, parallelTransferOptions.getBlockSizeLong());
can you add a note here that this is fine since buffered upload does not require a replayable flux?
It's not on this exact line, but there is a note in the method at the point where the decision is made that this path is buffered.
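For reference, the kind of note being requested might read roughly like this (placement and wording are only a suggestion, reusing the variables from the diff above):

```java
// This is the buffered upload path: every emitted ByteBuffer is copied into the
// upload's own staging buffers, so the Flux produced here is consumed exactly once
// and does not need to be replayable.
int chunkSize = (int) Math.min(Integer.MAX_VALUE, parallelTransferOptions.getBlockSizeLong());
Flux<ByteBuffer> data = FluxUtil.toFluxByteBuffer(options.getDataStream(), chunkSize);
```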
Flux<ByteBuffer> data = options.getDataFlux() == null ? Utility.convertStreamToByteBuffer(
    options.getDataStream(), options.getLength(),
Flux<ByteBuffer> data = options.getDataFlux();
// no specified length: use azure.core's converter
If we have a markable stream, is the SDK smart enough to detect it and not buffer at all?
Unfortunately not. Also, though, that is only applicable when maxConcurrency = 1, so we aren't missing out on the biggest optimization.
Does this PR somewhat impair the ability to introduce such an optimization? We have customers asking about memory overhead; we should offer a way to do an unbuffered upload of a markable stream.
I don't think this PR introduces any new hurdles for such a feature that didn't already exist. I did the work for that optimization in .NET and the integration aspect was fairly simple.
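As a rough illustration of the optimization being discussed (not something this PR implements), a client could check InputStream.markSupported() and rewind the stream on retry instead of buffering it. Everything below, including the putBlob helper, is hypothetical:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch, not SDK code: prefer replaying a markable stream over buffering it.
public final class MarkableUploadSketch {

    // Stand-in for a single-shot service call; a real client would issue a Put Blob here.
    static void putBlob(InputStream body, long length) throws IOException {
        byte[] chunk = new byte[8 * 1024];
        while (body.read(chunk) != -1) {
            // transfer bytes to the wire
        }
    }

    static void uploadWithReplay(InputStream dataStream, long length) throws IOException {
        if (dataStream.markSupported()) {
            // Remember the start of the body so a retry can rewind instead of re-buffering.
            dataStream.mark((int) Math.min(Integer.MAX_VALUE, length));
            try {
                putBlob(dataStream, length);
            } catch (IOException transientFailure) {
                dataStream.reset();
                putBlob(dataStream, length);
            }
        } else {
            // A real client would fall back to the buffered path here; this sketch just uploads.
            putBlob(dataStream, length);
        }
    }

    public static void main(String[] args) throws IOException {
        uploadWithReplay(new ByteArrayInputStream(new byte[1024]), 1024);
    }
}
```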
closing and opening in a new PR to get a new pipeline. this one seems permanently ruined by the bug the other day.
Apparently we're stuck with this pipeline
This pull request is protected by Check Enforcer.

What is Check Enforcer? Check Enforcer helps ensure all pull requests are covered by at least one check-run (typically an Azure Pipeline). When all check-runs associated with this pull request pass, then Check Enforcer itself will pass.

Why am I getting this message? You are getting this message because Check Enforcer did not detect any check-runs being associated with this pull request within five minutes. This may indicate that your pull request is not covered by any pipelines, and so Check Enforcer is correctly blocking the pull request from being merged.

What should I do now? If the check-enforcer check-run is not passing and all other check-runs associated with this PR are passing (excluding license-cla), then you could try telling Check Enforcer to evaluate your pull request again by adding a comment to this pull request.

What if I am onboarding a new service? Often, new services do not have validation pipelines associated with them; to bootstrap pipelines for a new service, you can issue a bootstrap command as a pull request comment.
/azp run java - storage - ci
Azure Pipelines successfully started running 1 pipeline(s).
// We can only buffer up to max int due to restrictions in ByteBuffer.
(int) Math.min(Integer.MAX_VALUE, parallelTransferOptions.getBlockSizeLong()), false)
    : options.getDataFlux();
int chunkSize = (int) Math.min(Integer.MAX_VALUE, parallelTransferOptions.getBlockSizeLong());
This chunkSize isn't related to block size, right? It's just for the purpose of the stream->flux conversion. I wonder if we should be using block size here.
A few things to consider (and please correct me if this is stupid, but I have a feeling this might be the reason we've seen higher-than-expected memory demand in perf tests):
- if this results in double buffering, i.e. the stream->flux conversion needs memory and then buffered upload needs it as well, then in the worst case the "uploader" might be busy uploading the last chunk (the list of ByteBuffers emitted from the converter) while the converter is trying to supply the next block - both will hold memory. I believe Reactor will make sure at least one next element is being prepared while the most recent block is being uploaded. @gapra-msft do you think this makes sense?
- allocating smaller chunks to feed the Flux might be a bit friendlier to the allocator and GC. It will have some overhead, but on the other hand it's easier to fit multiple smaller arrays into the heap than to find room for an Integer.MAX_VALUE-sized array; the latter might trigger GC/heap-defragmentation efforts. We shouldn't go crazy small here, but I think Integer.MAX_VALUE is too big. A lot of built-in components (like buffered streams and HTTP stacks) default to an 8 KB buffer, so that's the lower boundary; I'm not sure how high we should allow it to be - maybe 64 KB? Or maybe we need another knob (see the sketch after this list).
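To make the trade-off concrete, here is a minimal sketch assuming azure-core's FluxUtil.toFluxByteBuffer(InputStream, int) overload from the diff; the block size and the 8 KB figure are illustrative values, not recommendations:

```java
import com.azure.core.util.FluxUtil;
import reactor.core.publisher.Flux;

import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.ByteBuffer;

public final class ChunkSizeSketch {
    public static void main(String[] args) {
        InputStream dataStream = new ByteArrayInputStream(new byte[1024 * 1024]);
        long blockSize = 8L * 1024 * 1024; // stands in for parallelTransferOptions.getBlockSizeLong()

        // Approach in the PR: each emitted ByteBuffer can be as large as the block size
        // (capped at Integer.MAX_VALUE), so one emission may need one very large array.
        Flux<ByteBuffer> coarse = FluxUtil.toFluxByteBuffer(dataStream,
            (int) Math.min(Integer.MAX_VALUE, blockSize));

        // Alternative raised in review: emit small, allocator-friendly chunks (8 KB here,
        // purely as an assumed value) and let the buffered-upload stage assemble blocks.
        Flux<ByteBuffer> fine = FluxUtil.toFluxByteBuffer(
            new ByteArrayInputStream(new byte[1024 * 1024]), 8 * 1024);

        // 1 MiB source: one coarse buffer vs. 128 fine-grained buffers.
        System.out.println(coarse.count().block() + " vs " + fine.count().block());
    }
}
```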
Can you file an issue for this and assign it to me, or does it really need to be addressed now?
Yes, this can be converted to an issue and addressed outside of this PR. Would you mind doing this?
public BlobParallelUploadOptions(InputStream dataStream, long length) {
    StorageImplUtils.assertNotNull("dataStream", dataStream);
    StorageImplUtils.assertInBounds("length", length, 0, Long.MAX_VALUE);
    StorageImplUtils.assertInBounds("length", length, -1, Long.MAX_VALUE);
Should we add an overload instead of using a magic number, and use a nullable Long to represent this? Well, I just noticed the overload below. In that case I'd keep the original validation condition here and point users towards the other ctor, or maybe create an (InputStream, Long) ctor and move this logic there.
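A minimal sketch of the constructor arrangement being suggested; the class name, field names, and validation message are placeholders, not the real BlobParallelUploadOptions:

```java
import java.io.InputStream;
import java.util.Objects;

// Hypothetical illustration of the reviewer's suggestion, not SDK code.
public class UploadOptionsSketch {
    private final InputStream dataStream;
    private final Long length; // null means "length not provided"

    // Preferred ctor: no length needed now that the sync client can buffer.
    public UploadOptionsSketch(InputStream dataStream) {
        this(dataStream, (Long) null);
    }

    // Deprecated ctor keeps the original strict validation and forwards to the
    // package-private (InputStream, Long) ctor that holds the shared logic.
    @Deprecated
    public UploadOptionsSketch(InputStream dataStream, long length) {
        this(dataStream, checkLength(length));
    }

    UploadOptionsSketch(InputStream dataStream, Long length) {
        this.dataStream = Objects.requireNonNull(dataStream, "dataStream");
        this.length = length;
    }

    private static Long checkLength(long length) {
        if (length < 0) {
            throw new IllegalArgumentException("length must be between 0 and Long.MAX_VALUE");
        }
        return length;
    }
}
```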
Is the deprecation not enough to point users to the other ctor? I don't like the idea of introducing another ctor with a Long in it, because the whole point of this change is that it's meaningless to provide one. There's nothing extra bought from providing a length; we're just preserving old behavior in case someone relied on us catching a mismatched length or something like that. A Long implies it may still be of use or provide some optimization when it really doesn't.
Then we should either have a private ctor or not call one from the other. We should also deprecate getLength and replace it with something like Long getSize()?
I liked size in the context of expanding the ParallelTransferOptions back then because we could also rename the setter. But here we're not adding a new setter to complete the connection. Perhaps Long getOptionalLength()?
Either way is fine; we'll know at API review.
 */
@Deprecated
public long getLength() {
    return length;
Is it worth saving the customer from an NPE here?
Or if length is null should we throw a meaningful error?
I think this is fine; it won't throw if they haven't switched to the new code path.
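For context, the NPE in question would come from auto-unboxing a null Long in the deprecated getter. A self-contained sketch of the concern (the class, field, and getOptionalLength names are assumptions, not the SDK's):

```java
// Hypothetical sketch, not SDK code: `length` is a Long that is null when the new
// length-less ctor is used, so unboxing it in the deprecated getter throws.
public final class LengthGetterSketch {
    private final Long length;

    public LengthGetterSketch(Long length) {
        this.length = length;
    }

    @Deprecated
    public long getLength() {
        return length; // auto-unboxing: throws NullPointerException when length == null
    }

    // Null-safe alternative floated in review (the name is only a suggestion).
    public Long getOptionalLength() {
        return length;
    }

    public static void main(String[] args) {
        System.out.println(new LengthGetterSketch(42L).getLength()); // fine: prints 42
        try {
            new LengthGetterSketch(null).getLength();
        } catch (NullPointerException expected) {
            System.out.println("NPE from unboxing a null length");
        }
    }
}
```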
lgtm. Left one new comment about getLength()
Previously, Storage's InputStream -> Flux conversion method required a length so it could pregenerate a flux of set size to chunk the stream into. Since then, azure.core has introduced a converter that uses Flux.generate() to create the flux lazily as needed. They serve slightly different cases:
Since any Storage transfer method that avoids buffering in favor of seeking requires a supplied length for the REST call anyway, this PR shifts to using the azure.core method where appropriate, rather than modifying one of those converters to serve both purposes.
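As a quick sketch of the length-free conversion this PR switches to, using azure-core's FluxUtil.toFluxByteBuffer (the chunk size here is an arbitrary example value, not what the SDK picks):

```java
import com.azure.core.util.FluxUtil;
import reactor.core.publisher.Flux;

import java.io.ByteArrayInputStream;
import java.nio.ByteBuffer;

public final class GenerateConverterSketch {
    public static void main(String[] args) {
        // No length is supplied; the converter keeps emitting chunkSize-sized ByteBuffers
        // until the stream reports end-of-stream.
        int chunkSize = 4 * 1024 * 1024; // assumed value for illustration
        Flux<ByteBuffer> data = FluxUtil.toFluxByteBuffer(
            new ByteArrayInputStream(new byte[10 * 1024 * 1024]), chunkSize);

        // 10 MiB split into 4 MiB chunks -> 3 ByteBuffers.
        System.out.println(data.count().block());
    }
}
```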