Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Strange results for download_blob with offset and length #7061

Closed
bryevdv opened this issue Sep 4, 2019 · 4 comments
Closed

Strange results for download_blob with offset and length #7061

bryevdv opened this issue Sep 4, 2019 · 4 comments
Assignees
Labels
Client This issue points to a problem in the data-plane of the library. Storage Storage Service (Queues, Blobs, Files)

Comments

@bryevdv
Copy link
Contributor

bryevdv commented Sep 4, 2019

Seeing the following results with azure.storage.blob version 12.0.0b2

In [4]: url = "https://bryan.blob.core.windows.net/julia/foo.py" + AZ_SAS_TOKEN

In [5]: bc = BlobClient(url)

In [6]: bc.download_blob().content_as_bytes()
Out[6]: b'class Foo:\n    pass\n\n\nclass Bar:\n    pass\n\n\nclass Baz(object):\n    pass\n'

In [7]: bc.download_blob(offset=0, length=5).content_as_bytes()
Out[7]: b'class '

In [8]: bc.download_blob(offset=5, length=5).content_as_bytes()
Out[8]: b' '

Noting the following unexpected behavior:

  • The first case with offset=0, length=5 returns b'class ' which has length 6
  • The second case with offset=5, length=5 only returns one byte even though there are more than 5 bytes left in the file at that offset

Also there is a typo in the ValueError raised when a length is given without an offset:

raise ValueError("Offset value must not be None is length is set.")

I think that is meant to be "IF length is set" (or perhaps "when length is set")

cc @annatisch @rakshith91 @xiafu-msft @zezha-msft

@bryevdv bryevdv added the Storage Storage Service (Queues, Blobs, Files) label Sep 4, 2019
@bryevdv
Copy link
Contributor Author

bryevdv commented Sep 4, 2019

One more data point. I decided to try reading past the single byte that was returned in the second case, and in that case get an exception:

In [10]: bc.download_blob(offset=15, length=5).content_as_bytes()
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-10-67c0e96369ee> in <module>
----> 1 bc.download_blob(offset=15, length=5).content_as_bytes()

~/anaconda3/envs/preview/lib/python3.7/site-packages/azure/core/tracing/decorator.py in wrapper_use_tracer(*args, **kwargs)
     70             common.set_span_contexts(orig_wrapped_span, span_instance=original_span_instance)
     71         else:
---> 72             ans = func(*args, **kwargs) # type: ignore
     73         return ans
     74

~/anaconda3/envs/preview/lib/python3.7/site-packages/azure/storage/blob/blob_client.py in download_blob(self, offset, length, validate_content, **kwargs)
    620             'container': self.container_name
    621         }
--> 622         return StorageStreamDownloader(extra_properties=extra_properties, **options)
    623
    624     def _delete_blob_options(self, delete_snapshots=False, **kwargs):

~/anaconda3/envs/preview/lib/python3.7/site-packages/azure/storage/blob/_shared/downloads.py in __init__(self, service, config, offset, length, validate_content, encryption_options, extra_properties, **kwargs)
    259         self.download_size = None
    260         self.file_size = None
--> 261         self.response = self._initial_request()
    262         self.properties = self.response.properties
    263

~/anaconda3/envs/preview/lib/python3.7/site-packages/azure/storage/blob/_shared/downloads.py in _initial_request(self)
    345             if self.length is not None:
    346                 # Use the length unless it is over the end of the file
--> 347                 self.download_size = min(self.file_size, self.length - self.offset + 1)
    348             elif self.offset is not None:
    349                 self.download_size = self.file_size - self.offset

TypeError: '<' not supported between instances of 'int' and 'NoneType'

For reference the file is 72 bytes:

In [11]: len(bc.download_blob().content_as_bytes())
Out[11]: 72

@zezha-msft zezha-msft added the customer-reported Issues that are reported by GitHub users external to the Azure organization. label Sep 5, 2019
@xiafu-msft
Copy link
Contributor

xiafu-msft commented Sep 10, 2019

Hi @bryevdv
Thanks for reaching out! We have fixed the typo in the ValueError message, you will see the correct error message in the next release(should be today)

For the weird behavior of download_blob, that is because the name of parameter length is not accurate.. the parameter name should be something like range_end instead of length.
Let's say if you want to download 5 bytes from a blob, you should put download_blob(offset=6, length=6+5-1).... I know it is super confusing, so sorry about the inconvenience, we will rename the parameter or make it function per its name in the next preview.

Also for stage_block_from_url API, the length also means range_end (in case you also use that)

Sorry about the inconvenience again.

@kurtzeborn kurtzeborn added Client This issue points to a problem in the data-plane of the library. and removed customer-reported Issues that are reported by GitHub users external to the Azure organization. labels Sep 13, 2019
@bryevdv
Copy link
Contributor Author

bryevdv commented Sep 26, 2019

@xiafu-msft I might suggest keeping the public parameter as length, and instead internally computing the range end from it. From a user perspective I expect it is more common and convenient to want to say "download 500 bytes starting at byte 500" than it is to say "download bytes between 500 and 999"

FYI in python it is consistenty the case that ranges are half open, inclusive of the start, but exclusive of the end. If you do keep an API that specifies the "end" instead of a length, it would definitely be advised to make the end behave the same way as ranges everywhere else in Python.

@rakshith91
Copy link
Contributor

fixed

@github-actions github-actions bot locked and limited conversation to collaborators Apr 12, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Client This issue points to a problem in the data-plane of the library. Storage Storage Service (Queues, Blobs, Files)
Projects
None yet
Development

No branches or pull requests

5 participants