Does az storage blob download take advantage of sparse files? It should. #5872

Closed
marvinthepa opened this issue Mar 21, 2018 · 10 comments

@marvinthepa

Either I cannot figure out how to do it, or az storage blob download does not take advantage of sparse files.
Downloading a 30 GB VM OS disk snapshot that contains only about 1.7 GB of data takes more than 60 minutes, while AzCopy downloads the same file in 7 minutes.

The only related reference I could find in the documentation is --max-connections:

    --max-connections             :  ...
                                    ... 
                                    This may also be useful if many blobs are expected to be empty
                                    as an extra request is required for empty blobs if
                                    max_connections is greater than 1.  Default: 2.

However, setting --max-connections to 1 makes no noticeable difference compared to the default of 2.

Am I doing something wrong?

Environment summary

I tried this in two different environments. macOS:

$ echo $SHELL
zsh
$ brew install azure-cli
...
$ az --version
azure-cli (2.0.28)

...

Python location '/usr/local/opt/python/bin/python3.6'
Extensions directory '/Users/d064064/.azure/cliextensions'

Python (Darwin) 3.6.4 (default, Mar  1 2018, 18:36:50)
[GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.39.2)]

Also using the Docker image (on Linux):

docker run --rm -it microsoft/azure-cli:2.0.28 bash
@tjprescott added the Storage az storage and question labels Mar 21, 2018
@williexu
Contributor

@marvinthepa Thanks for pointing this out; we are aware of the current limitation.

The CLI uses the Storage SDK: https://github.com/Azure/azure-storage-python
This feature should be implemented at the SDK level so that Python developers, as well as the CLI, can make use of it.
@seguler

@seguler

seguler commented Mar 21, 2018

I believe it does. @zezha-msft to confirm

@zezha-msft

@seguler We have the sparse file optimization for upload, but not for download. For download, we currently treat all blob types the same and simply download everything. Perhaps we can add this item to our backlog.
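To make the upload-side optimization concrete, here is a minimal sketch, assuming the goal is simply to find which fixed-size chunks of a local disk image contain any non-zero bytes, so that only those ranges need to be sent (page blobs are addressed in 512-byte pages, and unwritten pages read back as zeros). The function name `nonzero_ranges` and the 4 MiB default chunk size are hypothetical choices for illustration, not the Storage SDK's actual implementation.

```python
import io
from typing import BinaryIO, List, Tuple

PAGE_SIZE = 512  # page blobs are addressed in 512-byte pages


def nonzero_ranges(stream: BinaryIO,
                   chunk_size: int = 4 * 1024 * 1024) -> List[Tuple[int, int]]:
    """Return (start, end_exclusive) byte ranges of chunks holding non-zero data.

    Hypothetical helper: adjacent non-empty chunks are merged so that fewer
    upload requests would be needed.
    """
    if chunk_size % PAGE_SIZE:
        raise ValueError("chunk_size must be a multiple of 512")
    ranges: List[Tuple[int, int]] = []
    offset = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        end = offset + len(chunk)
        # An all-zero chunk can be skipped: unwritten pages already read as zeros.
        if chunk.count(0) != len(chunk):
            if ranges and ranges[-1][1] == offset:
                ranges[-1] = (ranges[-1][0], end)  # extend the previous range
            else:
                ranges.append((offset, end))
        offset = end
    return ranges


# Usage: 1 KiB of zeros, 512 bytes of data, then 2 KiB of zeros.
sample = bytes(1024) + b"\x01" * 512 + bytes(2048)
print(nonzero_ranges(io.BytesIO(sample), chunk_size=512))  # [(1024, 1536)]
```

Only one of the seven 512-byte chunks would be uploaded here; the same ratio is what makes a mostly-empty 30 GB disk cheap to transfer when holes are skipped.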

@williexu added the Service Attention label Apr 11, 2018
@zezha-msft

@williexu I have added this item in our backlog.

@tjprescott removed the question label Apr 11, 2018
@mayurid added Storage and removed Storage az storage labels Jun 15, 2018
@iyerusad

iyerusad commented Jul 27, 2018

+1 for sparse-enabled downloads in the az CLI. The only packaged alternative appears to be AzCopy, which is not portable to Linux/Mac agents. I want to use this in VSTS.

Edit: AzCopy is available on Linux; it was in Azure Automation that it wasn't readily usable.

@zezha-msft

This issue will be solved in the new AzCopy V10. The related issue is here.

@iyerusad

iyerusad commented Jun 5, 2019

@zezha-msft Please note that AzCopy is NOT available in Azure Automation: even if I manually downloaded AzCopy within Azure Automation, Azure Automation didn't (and still doesn't) allow running arbitrary binaries.

It would be preferable for az storage blob download to handle sparse files efficiently, as AzCopy does.
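A download-side counterpart, sketched under stated assumptions: the populated byte ranges of the page blob are already known (the Get Page Ranges REST operation reports them), and a hypothetical `fetch_range(start, length)` callback stands in for a ranged download request. The local file is extended to its full size without writing data, and only the populated ranges are written, so the gaps remain holes on file systems that support sparse files (for example ext4); on NTFS the file would additionally have to be marked sparse. The name `write_sparse` is also hypothetical, not part of the CLI or SDK.

```python
import os
import tempfile
from typing import Callable, Iterable, Tuple


def write_sparse(path: str, total_size: int,
                 ranges: Iterable[Tuple[int, int]],
                 fetch_range: Callable[[int, int], bytes]) -> None:
    """Write only the populated (start, end_exclusive) ranges of a blob to `path`."""
    with open(path, "wb") as f:
        # Extend the file to its full length without writing any data; the
        # unwritten regions read back as zeros and, on file systems with
        # sparse-file support, occupy no disk blocks.
        f.truncate(total_size)
        for start, end in ranges:
            f.seek(start)
            # One ranged request per populated range instead of the whole blob.
            f.write(fetch_range(start, end - start))


# Usage with an in-memory stand-in for the remote blob (hypothetical data).
remote = bytes(1024) + b"\x01" * 512 + bytes(2048)


def fetch_from_memory(start: int, length: int) -> bytes:
    return remote[start:start + length]


out = os.path.join(tempfile.mkdtemp(), "disk.vhd")
write_sparse(out, len(remote), [(1024, 1536)], fetch_from_memory)
with open(out, "rb") as f:
    restored = f.read()
print(restored == remote)  # True
```

The trade-off is one request per populated range rather than one long stream of the whole blob; for a disk that is mostly empty, as in the 1.7 GB of data in a 30 GB snapshot reported here, that skips most of the transfer.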

@zezha-msft

Hi @iyerusad, thanks for the clarification! I see that it's still necessary to provide this functionality in the Python SDK.

@mozehgir added Storage az storage and removed Storage labels Jul 26, 2019
@zezha-msft

I've logged this item to be included in the next generation of the Storage SDK: https://github.com/Azure/azure-sdk-for-python/tree/master/sdk/storage/azure-storage-blob.

It will be part of the GA criteria.

@jeremybusk

jeremybusk commented Dec 2, 2022

Example of piping to gzip (normal piping works now): after granting disk access, remove "-f myfile.vhd" if it is present, so that the command looks something like this:

az storage blob download --blob-url "mysasvhdfile-URI" | gzip > myhd.vhd.gz
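Piping through gzip helps with local storage because the unallocated parts of a disk image are long runs of zero bytes, which compress extremely well; the full blob is still transferred over the network, since the download does not skip empty ranges. A small illustration using only Python's standard gzip module, with a made-up 8 MiB buffer that is about 6% data (roughly the 1.7 GB / 30 GB ratio from the original report); random bytes stand in for real, less compressible content.

```python
import gzip
import os

# Hypothetical 8 MiB "disk image": ~6% random (incompressible) data, rest zeros.
size = 8 * 1024 * 1024
data_len = size * 6 // 100
image = os.urandom(data_len) + bytes(size - data_len)

compressed = gzip.compress(image)
# The zero-filled tail shrinks to a few KiB; the output size is dominated by real data.
print(f"raw: {len(image)} bytes, gzipped: {len(compressed)} bytes")
```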
