Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Decoding Exhausted Stream #535

Merged
merged 3 commits into from
Jan 30, 2015

Conversation

bryanhelmig
Copy link
Contributor

Added a "reference" read() test, as well as a demonstration of how it differs from the streaming behavior for deflate or gzip. "reference" may be too strong of word in the context of urllib3, but seems to hold true for the standard library:

>>> from io import BytesIO
>>> f = BytesIO('foo')
>>> f.read(3)
'foo'
>>> f.read()
''
>>> f.read()
''

As of this PR opening - I expect 3 existing tests to fail and 1 new test to succeed. No solution yet!

@bryanhelmig
Copy link
Contributor Author

Very interesting that pypy succeeds but cpython fails (as expected!).

@bryanhelmig
Copy link
Contributor Author

Neat. @shazow ready for merge if you deem it so.

@bryanhelmig
Copy link
Contributor Author

Side note: I am guessing pypy's implementation of zlib.decompressobj().decompress() handles empty b'' without exception vs cpython's native. Interesting.

@sigmavirus24
Copy link
Contributor

👍

1 similar comment
@Lukasa
Copy link
Contributor

Lukasa commented Jan 17, 2015

👍

data = self._decoder.decompress(data)
except (IOError, zlib.error) as e:
raise DecodeError(
"Received response with content-encoding: %s, but "
"failed to decode it." % content_encoding, e)

if flush_decoder and decode_content and self._decoder:
if data and flush_decoder and decode_content and self._decoder:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before anyone merges, is this right? Maybe it needs to flush when data = b'' too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm honestly not sure what the answer is. @Lukasa ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When another data = b'' the following self._decoder.decompress(binary_type()) would fail - if there was a bool(self._decoder.closed) or equivalent - I imagine that would be cleaner than bool(data). 😕

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I understand your concern here better now. Basically, "What if we have data that's been decoded that hasn't been flushed yet?"

I think that's a valid usecase and that your problems were mostly arising from the block above, right? If you remove this check, do your tests fail?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed - if we remove either of those checks some test cases fail. I can throw in some tracebacks of each if that would help.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only care about removing this one though. Can you include the tracebacks for the tests that fail when you remove only this one (for the flush)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The three tests fail when removing the second data check:

======================================================================
ERROR: test_chunked_decoding_deflate (test.test_response.TestResponse)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/bryanhelmig/Code/urllib3/test/test_response.py", line 115, in test_chunked_decoding_deflate
    self.assertEqual(r.read(), b'')
  File "/Users/bryanhelmig/Code/urllib3/urllib3/response.py", line 227, in read
    buf = self._decoder.decompress(binary_type())
  File "/Users/bryanhelmig/Code/urllib3/urllib3/response.py", line 34, in decompress
    return self.decompress(self._data)
  File "/Users/bryanhelmig/Code/urllib3/urllib3/response.py", line 25, in decompress
    return self._obj.decompress(data)
error: Error -3 while decompressing: invalid stored block lengths

======================================================================
ERROR: test_chunked_decoding_deflate2 (test.test_response.TestResponse)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/bryanhelmig/Code/urllib3/test/test_response.py", line 132, in test_chunked_decoding_deflate2
    self.assertEqual(r.read(), b'')
  File "/Users/bryanhelmig/Code/urllib3/urllib3/response.py", line 227, in read
    buf = self._decoder.decompress(binary_type())
  File "/Users/bryanhelmig/Code/urllib3/urllib3/response.py", line 25, in decompress
    return self._obj.decompress(data)
error: Error -2 while decompressing: inconsistent stream state

======================================================================
ERROR: test_chunked_decoding_gzip (test.test_response.TestResponse)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/bryanhelmig/Code/urllib3/test/test_response.py", line 149, in test_chunked_decoding_gzip
    self.assertEqual(r.read(), b'')
  File "/Users/bryanhelmig/Code/urllib3/urllib3/response.py", line 227, in read
    buf = self._decoder.decompress(binary_type())
error: Error -2 while decompressing: inconsistent stream state

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes sense, binary_type() is an empty string by definition - something _decoder.decompress can't handle (after exhaustion). I wonder if we should just move the check into self._obj.decompress(data)...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moving the check into self._obj.decompress raises a whole slew of other errors. Need to really step through the code to resolve it I think - something screwy is afoot.

@bryanhelmig
Copy link
Contributor Author

I hope no-one minds a friendly "bump" here. Happy to chase down any concerns.

@sigmavirus24
Copy link
Contributor

I don't mind @bryanhelmig =). I responded to your inline comment with a question of my own. I hope it helps.

@bryanhelmig
Copy link
Contributor Author

Looks like Python 3 failed some tests - I'll have to chase those down later after I get my Python 3 env in shape again.

@shazow
Copy link
Member

shazow commented Jan 25, 2015

Big thanks to everyone for keeping this moving in my semi-absence. :) Will be back with viable internet in a few days.

@sigmavirus24
Copy link
Contributor

That's an interesting way of handling the problem. I wonder if those test failures are intermittent failures.

@bryanhelmig
Copy link
Contributor Author

@sigmavirus24 yeah, I'm thinking the errors are intermittent. Can we retest?

WRT the PR - it's a bit different. It's easy to see why the error is raised at the lowest level:

>>> import zlib
>>> compress = zlib.compressobj(6, zlib.DEFLATED, 16 + zlib.MAX_WBITS)
>>> data = compress.compress(b'foo')
>>> data += compress.flush()
>>> d = zlib.decompressobj(16 + zlib.MAX_WBITS)
>>> d.decompress(data)
'foo'
>>> d.decompress(b'') # run all you like...
''
>>> d.flush()
''
>>> d.decompress(b'') # raises after flush in python 2.7/3.2 - but succeeds in pypy (??)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
zlib.error: Error -2 while decompressing: inconsistent stream state

It might be "more proper" to track the state of the decompressobj vs. ignoring empty data.

@shazow
Copy link
Member

shazow commented Jan 30, 2015

Hmm yes, those intermittent failures are annoying. Looks good now.

@bryanhelmig I like your solution, MERGING! Thank you. 🍒

shazow added a commit that referenced this pull request Jan 30, 2015
@shazow shazow merged commit 0b6a8e4 into urllib3:master Jan 30, 2015
shazow added a commit that referenced this pull request Jan 30, 2015
@bryanhelmig
Copy link
Contributor Author

❤️ @shazow :-)

seocam pushed a commit to TracyWebTech/urllib3 that referenced this pull request Feb 5, 2015
jsonn pushed a commit to jsonn/pkgsrc that referenced this pull request Mar 10, 2015
Changes


2.5.3 (2015-02-24)

Bugfixes

    Revert changes to our vendored certificate bundle. For more context see (#2455, #2456, and http://bugs.python.org/issue23476)

2.5.2 (2015-02-23)

Features and Improvements

    Add sha256 fingerprint support. (urllib3/urllib3#540)
    Improve the performance of headers. (urllib3/urllib3#544)

Bugfixes

    Copy pip’s import machinery. When downstream redistributors remove requests.packages.urllib3 the import machinery will continue to let those same symbols work. Example usage in requests’ documentation and 3rd-party libraries relying on the vendored copies of urllib3 will work without having to fallback to the system urllib3.
    Attempt to quote parts of the URL on redirect if unquoting and then quoting fails. (#2356)
    Fix filename type check for multipart form-data uploads. (#2411)
    Properly handle the case where a server issuing digest authentication challenges provides both auth and auth-int qop-values. (#2408)
    Fix a socket leak. (urllib3/urllib3#549)
    Fix multiple Set-Cookie headers properly. (urllib3/urllib3#534)
    Disable the built-in hostname verification. (urllib3/urllib3#526)
    Fix the behaviour of decoding an exhausted stream. (urllib3/urllib3#535)

Security

    Pulled in an updated cacert.pem.
    Drop RC4 from the default cipher list. (urllib3/urllib3#551)
saper added a commit to saper/internetarchive that referenced this pull request Apr 10, 2015
test_api.py fails with connections reset
in pipelined connections.

URL: http://ia601201.us.archive.org/19/items/iacli-test-item/mfest.txt

Sample TCP stream (continued with Connection: keep-alive)

02:47:25.759680 IP minst.43032 > ia601201.us.archive.org.http: Flags [P.], seq 2051:2272, ack 5802508, win 2576, options [nop,nop,TS val 1344857953 ecr 780847939], length 221
	0x0000:  4500 0111 fab2 4000 4006 dc65 d5ef d969  E.....@[email protected]
	0x0010:  cff1 e383 a818 0050 2482 037a ebcb 5bbc  .......P$..z..[.
	0x0020:  8018 0a10 e309 0000 0101 080a 5028 e761  ............P(.a
	0x0030:  2e8a cb43 4745 5420 2f31 392f 6974 656d  ...CGET./19/item
	0x0040:  732f 6961 636c 692d 7465 7374 2d69 7465  s/iacli-test-ite
	0x0050:  6d2f 6d66 6573 742e 7478 7420 4854 5450  m/mfest.txt.HTTP
	0x0060:  2f31 2e31 0d0a 486f 7374 3a20 6961 3630  /1.1..Host:.ia60
	0x0070:  3132 3031 2e75 732e 6172 6368 6976 652e  1201.us.archive.
	0x0080:  6f72 670d 0a43 6f6e 6e65 6374 696f 6e3a  org..Connection:
	0x0090:  206b 6565 702d 616c 6976 650d 0a41 6363  .keep-alive..Acc
	0x00a0:  6570 742d 456e 636f 6469 6e67 3a20 677a  ept-Encoding:.gz
	0x00b0:  6970 2c20 6465 666c 6174 650d 0a41 6363  ip,.deflate..Acc
	0x00c0:  6570 743a 202a 2f2a 0d0a 5573 6572 2d41  ept:.*/*..User-A
	0x00d0:  6765 6e74 3a20 7079 7468 6f6e 2d72 6571  gent:.python-req
	0x00e0:  7565 7374 732f 322e 342e 3320 4350 7974  uests/2.4.3.CPyt
	0x00f0:  686f 6e2f 322e 372e 3820 4672 6565 4253  hon/2.7.8.FreeBS
	0x0100:  442f 3130 2e31 2d53 5441 424c 450d 0a0d  D/10.1-STABLE...
	0x0110:  0a                                       .
02:47:25.871239 IP ia601201.us.archive.org.http > minst.43032: Flags [F.], seq 5802508, ack 2051, win 302, options [nop,nop,TS val 780848061 ecr 1344857578], length 0
	0x0000:  4500 0034 7f6d 4000 3206 6688 cff1 e383  [email protected].....
	0x0010:  d5ef d969 0050 a818 ebcb 5bbc 2482 037a  ...i.P....[.$..z
	0x0020:  8011 012e ca77 0000 0101 080a 2e8a cbbd  .....w..........
	0x0030:  5028 e5ea                                P(..
02:47:25.871300 IP minst.43032 > ia601201.us.archive.org.http: Flags [.], ack 5802509, win 2576, options [nop,nop,TS val 1344858066 ecr 780848061], length 0
	0x0000:  4500 0034 fab8 4000 4006 dd3c d5ef d969  E..4..@.@..<...i
	0x0010:  cff1 e383 a818 0050 2482 0457 ebcb 5bbd  .......P$..W..[.
	0x0020:  8010 0a10 bed0 0000 0101 080a 5028 e7d2  ............P(..
	0x0030:  2e8a cbbd                                ....
02:47:25.880633 IP minst.43032 > ia601201.us.archive.org.http: Flags [F.], seq 2272, ack 5802509, win 2576, options [nop,nop,TS val 1344858074 ecr 780848061], length 0
	0x0000:  4500 0034 fab9 4000 4006 dd3b d5ef d969  E..4..@.@..;...i
	0x0010:  cff1 e383 a818 0050 2482 0457 ebcb 5bbd  .......P$..W..[.
	0x0020:  8011 0a10 bec7 0000 0101 080a 5028 e7da  ............P(..
	0x0030:  2e8a cbbd                                ....
02:47:25.916096 IP ia601201.us.archive.org.http > minst.43032: Flags [R], seq 3955973052, win 0, length 0
	0x0000:  4500 0028 28dd 4000 3206 bd24 cff1 e383  E..(([email protected]..$....
	0x0010:  d5ef d969 0050 a818 ebcb 5bbc 0000 0000  ...i.P....[.....
	0x0020:  5004 0000 5d21 0000 0000 0000 0000       P...]!........
02:47:26.027561 IP ia601201.us.archive.org.http > minst.43032: Flags [R], seq 3955973053, win 0, length 0
	0x0000:  4500 0028 28e4 4000 3206 bd1d cff1 e383  E..(([email protected].......
	0x0010:  d5ef d969 0050 a818 ebcb 5bbd 0000 0000  ...i.P....[.....
	0x0020:  5004 0000 5d20 0000 0000 0000 0000       P...].........
02:47:26.036999 IP ia601201.us.archive.org.http > minst.43032: Flags [R], seq 3955973053, win 0, length 0
	0x0000:  4500 0028 28e5 4000 3206 bd1c cff1 e383  E..(([email protected].......
	0x0010:  d5ef d969 0050 a818 ebcb 5bbd 0000 0000  ...i.P....[.....
	0x0020:  5004 0000 5d20 0000 0000 0000 0000       P...].........

This might be related to this urllib3 bug:

urllib3/urllib3#535
@bryanhelmig bryanhelmig deleted the reference-read-stream-deflate branch June 3, 2016 15:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants