Single request uploads of large "readable" data stream are slow (capped at ~8Mbps) #11044

Closed
kasobol-msft opened this issue Apr 24, 2020 · 7 comments · Fixed by #14442

Comments

@kasobol-msft
Contributor

Describe the bug
When trying to push a large amount of data (4000MB in my case) as a "readable" stream (e.g. BytesIO or a file reader, anything implementing read()), the upload speed caps at around 8Mbps.
(For context, I'm working on 4000MB block upload support for the Azure Storage SDK.)
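
For illustration, a minimal sketch of the kind of call that hits the cap (assuming azure-storage-blob v12; the connection string, container, and blob names are placeholders):

```python
import io
import os

from azure.storage.blob import BlobClient

# Placeholder connection details; substitute your own account/container/blob.
blob = BlobClient.from_connection_string(
    os.environ["AZURE_STORAGE_CONNECTION_STRING"], "mycontainer", "myblob"
)

# Any object implementing read() takes the slow path; a bytes payload does not.
size = 100 * 1024 * 1024
data = io.BytesIO(os.urandom(size))

# Single-request upload of a "readable" stream; this is where throughput caps out.
blob.stage_block(block_id="block-0", data=data, length=size)
```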

To Reproduce
Execute test_put_block_stream_large with LARGE_BLOCK_SIZE bumped to some large value (e.g. the upcoming 4000MB limit, or the currently supported 100MB threshold).

OR

Use the scenario from my fork as a reference.

Expected behavior
Upload speed of "readable" data is not capped by http.client and can leverage the full network bandwidth available.

Possible solution
https://bugs.python.org/msg305571 suggests a handy workaround that could perhaps become part of the pipeline. So far I haven't seen any way to inject a different blocksize into http.client.
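
For reference, http.client.HTTPConnection and HTTPSConnection accept a blocksize keyword argument since Python 3.7 (default 8192), which is what the workaround from that bug report boils down to. A hedged sketch of direct usage follows (account, container, block id, and SAS token are placeholders); the open question is how to plumb this value through requests/azure-core:

```python
import http.client
import io
import os

size = 100 * 1024 * 1024
payload = io.BytesIO(os.urandom(size))

# blocksize controls the chunk size used to read() a file-like body (Python 3.7+).
conn = http.client.HTTPSConnection(
    "myaccount.blob.core.windows.net", blocksize=8 * 1024 * 1024
)
conn.request(
    "PUT",
    "/mycontainer/myblob?comp=block&blockid=<block-id>&<sas-token>",
    body=payload,
    headers={"Content-Length": str(size)},
)
print(conn.getresponse().status)
```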

Screenshots

Original test:

I was uploading 4000MB of data in a single request, without any modifications, using a "readable" stream. That took over 1 hour!

It turns out http.client uses an 8192-byte buffer when a readable stream is passed:

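For context, when the request body exposes read(), http.client pumps it to the socket in blocksize-sized chunks, and HTTPConnection.blocksize defaults to 8192. A small runnable sketch that mimics that loop (send_readable is an illustrative helper, not the real API) makes the effect visible:

```python
import io

BLOCKSIZE = 8192  # default value of http.client.HTTPConnection.blocksize


def send_readable(sock_sendall, body, blocksize=BLOCKSIZE):
    """Mimic how http.client sends a file-like body: read() it in
    blocksize chunks and push each chunk to the socket."""
    while True:
        chunk = body.read(blocksize)
        if not chunk:
            break
        sock_sendall(chunk)


# Count how many socket writes a 10MB body produces with the default chunk size.
writes = []
send_readable(lambda chunk: writes.append(len(chunk)), io.BytesIO(b"\0" * 10_000_000))
print(f"{len(writes)} writes of up to {max(writes)} bytes")  # 1221 writes of up to 8192 bytes
```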

Then I started to play with blocksize, editing http.client's source and bumping the value.

After bumping it to 8192*1024, the upload speed was more than 2x faster:

And after bumping it to 10*8192*1024 I managed to upload that payload in about four and a half minutes.

Additional context
This is going to impact future users of "large block"/"large blob" (new 4000MB limit for a single block / 200TB limit for a single blob). Users of that feature will most likely work with streams, either uploading data from the network or data produced on the fly by computations, so it's important to address this deficiency.

@ghost ghost added the needs-triage Workflow: This is a new issue that needs to be triaged to the appropriate team. label Apr 24, 2020
@kaerm kaerm added Azure.Core bug This issue requires a change to an existing behavior in the product in order to be resolved. labels Apr 28, 2020
@ghost ghost removed the needs-triage Workflow: This is a new issue that needs to be triaged to the appropriate team. label Apr 28, 2020
@kaerm kaerm added the Client This issue points to a problem in the data-plane of the library. label Apr 28, 2020
@kaerm kaerm added this to the Backlog milestone Apr 28, 2020
@kaerm
Contributor

kaerm commented Apr 28, 2020

Thanks @kasobol-msft for the detailed description. @chlowell, can you take a look at this?

@chlowell
Member

Sure. It looks like an issue users will encounter through the storage libraries that may require a change in azure-core. @lmazuel, @rakshith91, @xiangyan99, your thoughts?

@mikeharder mikeharder self-assigned this Apr 30, 2020
@mikeharder
Member

My first reaction is this is something we should fix ASAP. I will gather more data and investigate possible fixes.

@mikeharder mikeharder added the EngSys This issue is impacting the engineering system. label Apr 30, 2020
@mikeharder mikeharder removed this from the Backlog milestone Jul 21, 2020
@mikeharder
Member

mikeharder commented Sep 10, 2020

I spent some time investigating this issue, and while I haven't fully isolated the root cause(s), I have several conclusions:

  1. It seems to only repro on Windows client machines, not Linux. I suspect this is due to differences in the OS networking stack. But any changes should be perf tested on both Windows and Linux.
  2. The regression is much larger when network throughput is lower and/or latency is higher. Specifically, it may not repro when using a storage account in the same region, but it should repro when using a storage account far away (e.g. client in West US, storage in Australia).
  3. The biggest regression appears to be in the azure-core Pipeline layer. In my tests, the regression doesn't repro when using raw http.client or the requests package, but it does repro when using the azure-core Pipeline or the Storage SDK. But @kpajdzik has seen some regressions at the http.client layer from his personal machine.
  4. "Pipeline,array" is much faster than "Pipeline,stream". However, if "Pipeline,array" is executed once, then "Pipeline,stream" gets much faster for all future requests. This was very unexpected and may provide clues to the root cause.

Repro Steps

  1. Create a storage account in a region far away from your client.
  2. Create a blob container in the storage account, and generate a SAS token.
  3. Construct the URL for testing upload perf. It should look like https://<account>.blob.core.windows.net/<container>/<blob>?<sas-token>.
  4. Clone the repro app at https://github.com/mikeharder/python-storage-perf
  5. pip install -r requirements.txt
  6. Select the scenarios you want to run by commenting/uncommenting in the app source
  7. python app.py <url> <size> (10MB is a good starting size; a rough sketch of what this measures is shown below)
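
For reference, the "[requests, stream]" and "[requests, array]" measurements below boil down to something like the following (a rough sketch, not the actual repro app; the x-ms-blob-type header assumes an upload via Put Blob):

```python
import io
import os
import sys
import time

import requests

url, size = sys.argv[1], int(sys.argv[2])  # e.g. python app.py <url> 10000000
payload = os.urandom(size)
headers = {"x-ms-blob-type": "BlockBlob"}

for name, body in (("stream", io.BytesIO(payload)), ("array", payload)):
    start = time.time()
    response = requests.put(url, data=body, headers=headers)
    elapsed = time.time() - start
    mbps = size * 8 / elapsed / 1_000_000
    print(f"[requests, {name}] Put {size:,} bytes in {elapsed:.2f} seconds "
          f"({mbps:.2f} Mbps), Response={response.status_code}")
```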

Results

Client: Azure VM, DS3_v2, West US 2, Windows Server 2019
Storage Account: Australia East, Premium BlockBlobStorage

[http.client, stream] Put 10,000,000 bytes in 0.39 seconds (195.21 Mbps), Response=201
[http.client, array] Put 10,000,000 bytes in 0.37 seconds (207.69 Mbps), Response=201

[requests, stream] Put 10,000,000 bytes in 0.39 seconds (195.05 Mbps), Response=201
[requests, array] Put 10,000,000 bytes in 0.37 seconds (208.01 Mbps), Response=201

[Pipeline, stream] Put 10,000,000 bytes in 12.97 seconds (5.88 Mbps), Response=201

[stage_block, stream] Put 10,000,000 bytes in 13.68 seconds (5.58 Mbps)

This shows that http.client and requests have no perf issues when uploading a stream, but the Azure SDK does have a perf issue at the Pipeline layer.

[Pipeline, stream] Put 5,000,000 bytes in 7.17 seconds (5.32 Mbps), Response=201
[Pipeline, stream] Put 5,000,000 bytes in 6.20 seconds (6.15 Mbps), Response=201
[Pipeline, stream] Put 5,000,000 bytes in 6.20 seconds (6.15 Mbps), Response=201

[Pipeline, array] Put 5,000,000 bytes in 0.65 seconds (58.37 Mbps), Response=201
[Pipeline, array] Put 5,000,000 bytes in 0.24 seconds (160.23 Mbps), Response=201
[Pipeline, array] Put 5,000,000 bytes in 0.23 seconds (163.72 Mbps), Response=201

[Pipeline, stream] Put 5,000,000 bytes in 0.23 seconds (166.16 Mbps), Response=201
[Pipeline, stream] Put 5,000,000 bytes in 0.22 seconds (173.42 Mbps), Response=201
[Pipeline, stream] Put 5,000,000 bytes in 0.22 seconds (173.34 Mbps), Response=201

[Pipeline, array] Put 5,000,000 bytes in 0.24 seconds (158.44 Mbps), Response=201
[Pipeline, array] Put 5,000,000 bytes in 0.23 seconds (165.01 Mbps), Response=201
[Pipeline, array] Put 5,000,000 bytes in 0.24 seconds (162.09 Mbps), Response=201

This shows that "Pipeline,stream" is much slower than "Pipeline,array". However, once "Pipeline,array" has been executed, both have the same perf.

@mikeharder mikeharder assigned xiangyan99 and unassigned mikeharder Sep 14, 2020
@mikeharder mikeharder removed the EngSys This issue is impacting the engineering system. label Sep 14, 2020
@mikeharder
Member

A similar issue was reported against curl on Windows that might be related:

https://curl.haxx.se/mail/lib-2018-07/0080.html

@mikeharder
Member

This should be fixed by #14442:

| azure-core | Pipeline, stream (Mbps) | stage_block, stream (Mbps) | Pipeline, array (Mbps) | stage_block, array (Mbps) |
| --- | --- | --- | --- | --- |
| 1.8.2 | 5.93 | 5.88 | 287 | 302 |
| #14442 | 271 | 302 | 270 | 284 |

@kasobol-msft: Would you like to verify as well?

@kasobol-msft
Contributor Author

@mikeharder works like a charm.

@github-actions github-actions bot locked and limited conversation to collaborators Apr 12, 2023