-
-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Decoding Exhausted Stream #535
Decoding Exhausted Stream #535
Conversation
Very interesting that pypy succeeds but cpython fails (as expected!). |
Neat. @shazow ready for merge if you deem it so. |
Side note: I am guessing pypy's implementation of |
👍 |
1 similar comment
👍 |
data = self._decoder.decompress(data) | ||
except (IOError, zlib.error) as e: | ||
raise DecodeError( | ||
"Received response with content-encoding: %s, but " | ||
"failed to decode it." % content_encoding, e) | ||
|
||
if flush_decoder and decode_content and self._decoder: | ||
if data and flush_decoder and decode_content and self._decoder: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before anyone merges, is this right? Maybe it needs to flush when data = b''
too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm honestly not sure what the answer is. @Lukasa ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When another data = b''
the following self._decoder.decompress(binary_type())
would fail - if there was a bool(self._decoder.closed)
or equivalent - I imagine that would be cleaner than bool(data)
. 😕
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I understand your concern here better now. Basically, "What if we have data that's been decoded that hasn't been flushed yet?"
I think that's a valid usecase and that your problems were mostly arising from the block above, right? If you remove this check, do your tests fail?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed - if we remove either of those checks some test cases fail. I can throw in some tracebacks of each if that would help.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I only care about removing this one though. Can you include the tracebacks for the tests that fail when you remove only this one (for the flush)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The three tests fail when removing the second data check:
======================================================================
ERROR: test_chunked_decoding_deflate (test.test_response.TestResponse)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/bryanhelmig/Code/urllib3/test/test_response.py", line 115, in test_chunked_decoding_deflate
self.assertEqual(r.read(), b'')
File "/Users/bryanhelmig/Code/urllib3/urllib3/response.py", line 227, in read
buf = self._decoder.decompress(binary_type())
File "/Users/bryanhelmig/Code/urllib3/urllib3/response.py", line 34, in decompress
return self.decompress(self._data)
File "/Users/bryanhelmig/Code/urllib3/urllib3/response.py", line 25, in decompress
return self._obj.decompress(data)
error: Error -3 while decompressing: invalid stored block lengths
======================================================================
ERROR: test_chunked_decoding_deflate2 (test.test_response.TestResponse)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/bryanhelmig/Code/urllib3/test/test_response.py", line 132, in test_chunked_decoding_deflate2
self.assertEqual(r.read(), b'')
File "/Users/bryanhelmig/Code/urllib3/urllib3/response.py", line 227, in read
buf = self._decoder.decompress(binary_type())
File "/Users/bryanhelmig/Code/urllib3/urllib3/response.py", line 25, in decompress
return self._obj.decompress(data)
error: Error -2 while decompressing: inconsistent stream state
======================================================================
ERROR: test_chunked_decoding_gzip (test.test_response.TestResponse)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/bryanhelmig/Code/urllib3/test/test_response.py", line 149, in test_chunked_decoding_gzip
self.assertEqual(r.read(), b'')
File "/Users/bryanhelmig/Code/urllib3/urllib3/response.py", line 227, in read
buf = self._decoder.decompress(binary_type())
error: Error -2 while decompressing: inconsistent stream state
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It makes sense, binary_type()
is an empty string by definition - something _decoder.decompress
can't handle (after exhaustion). I wonder if we should just move the check into self._obj.decompress(data)
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moving the check into self._obj.decompress
raises a whole slew of other errors. Need to really step through the code to resolve it I think - something screwy is afoot.
I hope no-one minds a friendly "bump" here. Happy to chase down any concerns. |
I don't mind @bryanhelmig =). I responded to your inline comment with a question of my own. I hope it helps. |
Looks like Python 3 failed some tests - I'll have to chase those down later after I get my Python 3 env in shape again. |
Big thanks to everyone for keeping this moving in my semi-absence. :) Will be back with viable internet in a few days. |
That's an interesting way of handling the problem. I wonder if those test failures are intermittent failures. |
@sigmavirus24 yeah, I'm thinking the errors are intermittent. Can we retest? WRT the PR - it's a bit different. It's easy to see why the error is raised at the lowest level: >>> import zlib
>>> compress = zlib.compressobj(6, zlib.DEFLATED, 16 + zlib.MAX_WBITS)
>>> data = compress.compress(b'foo')
>>> data += compress.flush()
>>> d = zlib.decompressobj(16 + zlib.MAX_WBITS)
>>> d.decompress(data)
'foo'
>>> d.decompress(b'') # run all you like...
''
>>> d.flush()
''
>>> d.decompress(b'') # raises after flush in python 2.7/3.2 - but succeeds in pypy (??)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
zlib.error: Error -2 while decompressing: inconsistent stream state It might be "more proper" to track the state of the decompressobj vs. ignoring empty data. |
Hmm yes, those intermittent failures are annoying. Looks good now. @bryanhelmig I like your solution, MERGING! Thank you. 🍒 |
❤️ @shazow :-) |
Changes 2.5.3 (2015-02-24) Bugfixes Revert changes to our vendored certificate bundle. For more context see (#2455, #2456, and http://bugs.python.org/issue23476) 2.5.2 (2015-02-23) Features and Improvements Add sha256 fingerprint support. (urllib3/urllib3#540) Improve the performance of headers. (urllib3/urllib3#544) Bugfixes Copy pip’s import machinery. When downstream redistributors remove requests.packages.urllib3 the import machinery will continue to let those same symbols work. Example usage in requests’ documentation and 3rd-party libraries relying on the vendored copies of urllib3 will work without having to fallback to the system urllib3. Attempt to quote parts of the URL on redirect if unquoting and then quoting fails. (#2356) Fix filename type check for multipart form-data uploads. (#2411) Properly handle the case where a server issuing digest authentication challenges provides both auth and auth-int qop-values. (#2408) Fix a socket leak. (urllib3/urllib3#549) Fix multiple Set-Cookie headers properly. (urllib3/urllib3#534) Disable the built-in hostname verification. (urllib3/urllib3#526) Fix the behaviour of decoding an exhausted stream. (urllib3/urllib3#535) Security Pulled in an updated cacert.pem. Drop RC4 from the default cipher list. (urllib3/urllib3#551)
test_api.py fails with connections reset in pipelined connections. URL: http://ia601201.us.archive.org/19/items/iacli-test-item/mfest.txt Sample TCP stream (continued with Connection: keep-alive) 02:47:25.759680 IP minst.43032 > ia601201.us.archive.org.http: Flags [P.], seq 2051:2272, ack 5802508, win 2576, options [nop,nop,TS val 1344857953 ecr 780847939], length 221 0x0000: 4500 0111 fab2 4000 4006 dc65 d5ef d969 E.....@[email protected] 0x0010: cff1 e383 a818 0050 2482 037a ebcb 5bbc .......P$..z..[. 0x0020: 8018 0a10 e309 0000 0101 080a 5028 e761 ............P(.a 0x0030: 2e8a cb43 4745 5420 2f31 392f 6974 656d ...CGET./19/item 0x0040: 732f 6961 636c 692d 7465 7374 2d69 7465 s/iacli-test-ite 0x0050: 6d2f 6d66 6573 742e 7478 7420 4854 5450 m/mfest.txt.HTTP 0x0060: 2f31 2e31 0d0a 486f 7374 3a20 6961 3630 /1.1..Host:.ia60 0x0070: 3132 3031 2e75 732e 6172 6368 6976 652e 1201.us.archive. 0x0080: 6f72 670d 0a43 6f6e 6e65 6374 696f 6e3a org..Connection: 0x0090: 206b 6565 702d 616c 6976 650d 0a41 6363 .keep-alive..Acc 0x00a0: 6570 742d 456e 636f 6469 6e67 3a20 677a ept-Encoding:.gz 0x00b0: 6970 2c20 6465 666c 6174 650d 0a41 6363 ip,.deflate..Acc 0x00c0: 6570 743a 202a 2f2a 0d0a 5573 6572 2d41 ept:.*/*..User-A 0x00d0: 6765 6e74 3a20 7079 7468 6f6e 2d72 6571 gent:.python-req 0x00e0: 7565 7374 732f 322e 342e 3320 4350 7974 uests/2.4.3.CPyt 0x00f0: 686f 6e2f 322e 372e 3820 4672 6565 4253 hon/2.7.8.FreeBS 0x0100: 442f 3130 2e31 2d53 5441 424c 450d 0a0d D/10.1-STABLE... 0x0110: 0a . 02:47:25.871239 IP ia601201.us.archive.org.http > minst.43032: Flags [F.], seq 5802508, ack 2051, win 302, options [nop,nop,TS val 780848061 ecr 1344857578], length 0 0x0000: 4500 0034 7f6d 4000 3206 6688 cff1 e383 [email protected]..... 0x0010: d5ef d969 0050 a818 ebcb 5bbc 2482 037a ...i.P....[.$..z 0x0020: 8011 012e ca77 0000 0101 080a 2e8a cbbd .....w.......... 0x0030: 5028 e5ea P(.. 02:47:25.871300 IP minst.43032 > ia601201.us.archive.org.http: Flags [.], ack 5802509, win 2576, options [nop,nop,TS val 1344858066 ecr 780848061], length 0 0x0000: 4500 0034 fab8 4000 4006 dd3c d5ef d969 E..4..@.@..<...i 0x0010: cff1 e383 a818 0050 2482 0457 ebcb 5bbd .......P$..W..[. 0x0020: 8010 0a10 bed0 0000 0101 080a 5028 e7d2 ............P(.. 0x0030: 2e8a cbbd .... 02:47:25.880633 IP minst.43032 > ia601201.us.archive.org.http: Flags [F.], seq 2272, ack 5802509, win 2576, options [nop,nop,TS val 1344858074 ecr 780848061], length 0 0x0000: 4500 0034 fab9 4000 4006 dd3b d5ef d969 E..4..@.@..;...i 0x0010: cff1 e383 a818 0050 2482 0457 ebcb 5bbd .......P$..W..[. 0x0020: 8011 0a10 bec7 0000 0101 080a 5028 e7da ............P(.. 0x0030: 2e8a cbbd .... 02:47:25.916096 IP ia601201.us.archive.org.http > minst.43032: Flags [R], seq 3955973052, win 0, length 0 0x0000: 4500 0028 28dd 4000 3206 bd24 cff1 e383 E..(([email protected]..$.... 0x0010: d5ef d969 0050 a818 ebcb 5bbc 0000 0000 ...i.P....[..... 0x0020: 5004 0000 5d21 0000 0000 0000 0000 P...]!........ 02:47:26.027561 IP ia601201.us.archive.org.http > minst.43032: Flags [R], seq 3955973053, win 0, length 0 0x0000: 4500 0028 28e4 4000 3206 bd1d cff1 e383 E..(([email protected]....... 0x0010: d5ef d969 0050 a818 ebcb 5bbd 0000 0000 ...i.P....[..... 0x0020: 5004 0000 5d20 0000 0000 0000 0000 P...]......... 02:47:26.036999 IP ia601201.us.archive.org.http > minst.43032: Flags [R], seq 3955973053, win 0, length 0 0x0000: 4500 0028 28e5 4000 3206 bd1c cff1 e383 E..(([email protected]....... 0x0010: d5ef d969 0050 a818 ebcb 5bbd 0000 0000 ...i.P....[..... 0x0020: 5004 0000 5d20 0000 0000 0000 0000 P...]......... This might be related to this urllib3 bug: urllib3/urllib3#535
Added a "reference" read() test, as well as a demonstration of how it differs from the streaming behavior for deflate or gzip. "reference" may be too strong of word in the context of urllib3, but seems to hold true for the standard library:
As of this PR opening - I expect 3 existing tests to fail and 1 new test to succeed. No solution yet!