MD5 validation broken? #34
Comments
Thanks for filing @danqing. I'll look into it. It might break in a way that we didn't expect (i.e. we made a programming error). /cc @mfschwartz |
I'll be using the |
OK I just tried to reproduce and could not. @danqing could you provide a gzip-ed file? Is this happening in a reproducible fashion for you? Do you have a stacktrace you could share? Here is what I used to try to reproduce:
|
Danny - I suspect this problem may happen when the content-encoding is set to gzip. You can see the content-encoding using: Then try to download it using the updated library. |
I confirmed that having content-encoding: gzip causes this failure. Is google-cloud-python uncompressing the object on the fly, or is it not setting 'accept-encoding': 'gzip, deflate'? |
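To illustrate why on-the-fly decompression would trip the check, here is a minimal sketch. It assumes the MD5 in X-Goog-Hash is the base64-encoded digest of the stored (gzip-compressed) bytes, while the client hashes the bytes it receives after decompression. The helper name md5_b64 is hypothetical and not part of google-resumable-media.

```python
import base64
import gzip
import hashlib


def md5_b64(data: bytes) -> str:
    """Return the base64-encoded MD5 digest of data (hypothetical helper)."""
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii")


# Bytes as they would be stored when the object has contentEncoding: gzip.
stored = gzip.compress(b"Stuff\n")
# Bytes the client ends up with if the transport decompresses on the fly.
received = gzip.decompress(stored)

stored_md5 = md5_b64(stored)      # assumed to be what the header describes
received_md5 = md5_b64(received)  # what a client hashing the body computes

# The payload is intact, yet the two digests differ.
print(received == b"Stuff\n", stored_md5 != received_md5)
```

If that assumption holds, the download is not corrupted; the two sides are simply hashing different byte streams.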
Confirmed here. When content_encoding is gzip on download |
@mfschwartz I have also reproduced (see below). I'd love to add a system test to this library ( UPDATE: I "figured" it out:
>>> import gzip
>>> from google.resumable_media.requests import MultipartUpload
>>>
>>> url_template = (
... u'https://www.googleapis.com/upload/storage/v1/b/{bucket}/o?'
... u'uploadType=multipart')
>>> upload_url = url_template.format(bucket=bucket)
>>>
>>> upload = MultipartUpload(upload_url)
>>> metadata = {
... u'name': blob_name,
... u'contentEncoding': u'gzip',
... }
>>> data = gzip.compress(b'Stuff\n')
>>> content_type = u'text/plain'
>>> response = upload.transmit(transport, data, metadata, content_type)
Here is what I did:
|
Hi Danny - I'm working on a fix now, as well as a system test. |
@mfschwartz Thanks. Relevant / related: https://github.com/GoogleCloudPlatform/google-cloud-python/pull/3380/files was just merged to Questions for you:
Somewhat of an answer to my question:
In [14]: response = download.consume(transport)
---------------------------------------------------------------------------
DataCorruption Traceback (most recent call last)
<ipython-input-14-74c38720271c> in <module>()
----> 1 response = download.consume(transport)
.../venv/lib/python3.6/site-packages/google/resumable_media/requests/download.py in consume(self, transport)
167
168 if self._stream is not None:
--> 169 self._write_to_stream(result)
170
171 return result
.../venv/lib/python3.6/site-packages/google/resumable_media/requests/download.py in _write_to_stream(self, response)
130 msg = _CHECKSUM_MISMATCH.format(
131 self.media_url, expected_md5_hash, actual_md5_hash)
--> 132 raise common.DataCorruption(response, msg)
133
134 def consume(self, transport):
DataCorruption: Checksum mismatch while downloading:
https://www.googleapis.com/download/storage/v1/b/${BUCKET}/o/stuff.txt?alt=media
The X-Goog-Hash header indicated an MD5 checksum of:
matKHCUjwpcS/fYLXwgV3Q==
but the actual MD5 checksum of the downloaded contents was:
Eypyl7eiV79Z1QkQqoXh4Q==
In [15]: stream.getvalue()
Out[15]: b'Stuff\n' |
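The comparison behind that error message can be sketched roughly as follows. This is not the library's actual code: parse_goog_hash and checksum_matches are hypothetical helpers, and the sketch assumes the header value is a comma-separated list of name=base64 pairs such as crc32c=...,md5=....

```python
import base64
import gzip
import hashlib


def parse_goog_hash(header_value: str) -> dict:
    """Split an X-Goog-Hash style value into {algorithm: base64 digest}."""
    hashes = {}
    for item in header_value.split(","):
        # Partition on the first "=" only: base64 padding also uses "=".
        name, sep, value = item.strip().partition("=")
        if sep:
            hashes[name] = value
    return hashes


def checksum_matches(header_value: str, body: bytes) -> bool:
    """Compare the header's MD5 (if any) with the MD5 of the given body."""
    expected = parse_goog_hash(header_value).get("md5")
    if expected is None:
        return True  # no MD5 in the header, so there is nothing to check
    actual = base64.b64encode(hashlib.md5(body).digest()).decode("ascii")
    return actual == expected


# Build a header that describes the compressed bytes (placeholder crc32c).
stored = gzip.compress(b"Stuff\n")
md5_value = base64.b64encode(hashlib.md5(stored).digest()).decode("ascii")
header = "crc32c=AAAAAA==,md5=" + md5_value

print(checksum_matches(header, stored))                   # compressed bytes
print(checksum_matches(header, gzip.decompress(stored)))  # decompressed bytes
```

Hashing the bytes as they arrive on the wire, before any decompression, would be one way to make the check agree with the header; whether the fix takes that route is not stated in this thread.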
|
Also relevant: https://stackoverflow.com/a/25811745/1068170. |
This isn't fixed for me with google-resumable-media 0.3.2 :( |
@zmunro Can you provide more information? E.g., is it broken for you just for gzipped content? |
I'm now getting this error today after months of no problems, on some gzipped files from GCS. Error seems to happen at random, i.e. can do the same thing and maybe 50% of the time it works, 50% of the time it crashes. |
I have a file that I uploaded and then downloaded with google-resumable-media under the hood. It seems that with 0.3 the X-Goog-Hash header reports a different MD5 than the computed one. The downloaded file, however, is intact. The only thing is my file is gzipped. Does gzip break the MD5 calculation?
Thanks!
Also to be closed: googleapis/google-cloud-python#4227