Rebase #3745 and add some tests #3923

vbarbaresi · 2017-03-14T18:41:10Z

Two years after the original #2431, and after #3745:
I ran into the strange behavior of r.iter_lines(), found these PRs, and tried to complete the work.
Rebased once more and added more tests based on @ianepperson's breakdown: https://github.com/kennethreitz/requests/pull/2431#issuecomment-72333964

Lukasa

Cool, so this generally looks really good! I have one small note.

Lukasa · 2017-03-14T19:07:50Z

requests/models.py

+                                       decode_unicode=decode_unicode):
+            # Skip any null responses
+            if not chunk:
+                continue


Should this check move after the pending data logic?

You're right
Fixed and added tests for this behavior 👌

I don't know if it's licit to receive empty chunks in the middle of a stream, but I guess it's possible in theory. (and it would work)

Actually, I replied too fast: if we want to return the same result for
['line\r\n'] and ['line\r\n', ''] (which seems right),
we need to keep this check at the top.

Otherwise we consume pending and discard it.

I think it's safe to discard empty chunks early; pending will be carried over to the next non-empty chunk.

> I don't know if it's licit to receive empty chunks in the middle of a stream, but I guess it's possible in theory. (and it would work)

urllib3 won't give them to us, but people like putting things that aren't urllib3 underneath us, and they might. So we should tolerate it.

As to your note about wanting to return the same result, I think that makes sense. But we should add a comment to say that "pending is necessarily an incomplete chunk, and so if we don't have more data we don't want to bother trying to get it".
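
The ordering discussed above can be illustrated with a simplified sketch. This is not the code from the PR: `split_lines` is a hypothetical standalone helper that works on an iterable of byte chunks, and it deliberately leaves out the handling of a `\r\n` delimiter split across two chunks. It only shows that skipping empty chunks before touching `pending` carries a partial line over to the next non-empty chunk.

```python
def split_lines(chunks):
    """Yield lines from an iterable of byte chunks (hypothetical helper)."""
    pending = None
    for chunk in chunks:
        # Skip empty chunks first: `pending` is necessarily an incomplete
        # line, so without more data there is nothing useful to do with it.
        if not chunk:
            continue

        if pending is not None:
            chunk = pending + chunk

        lines = chunk.splitlines()

        # If the chunk does not end with a line terminator, the last
        # element is an incomplete line: hold it back until more data arrives.
        if lines and lines[-1] and lines[-1][-1] == chunk[-1]:
            pending = lines.pop()
        else:
            pending = None

        for line in lines:
            yield line

    # Whatever is left when the stream ends is the final line.
    if pending is not None:
        yield pending


without_empty = list(split_lines([b"line\r\n"]))
with_empty = list(split_lines([b"line\r\n", b""]))
split_mid_line = list(split_lines([b"li", b"", b"ne\nnext"]))
```

Under these assumptions, a trailing empty chunk does not change the result, and an empty chunk in the middle of a line does not break it.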

codecov-io · 2017-03-15T00:29:42Z

Codecov Report

Merging #3923 into proposed/3.0.0 will increase coverage by 0.12%.
The diff coverage is 100%.

@@                Coverage Diff                 @@
##           proposed/3.0.0    #3923      +/-   ##
==================================================
+ Coverage           89.32%   89.44%   +0.12%     
==================================================
  Files                  15       15              
  Lines                1929     1932       +3     
==================================================
+ Hits                 1723     1728       +5     
+ Misses                206      204       -2

Impacted Files	Coverage Δ
requests/models.py	`94.22% <100%> (+0.48%)`	✅

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 84dc6b6...d491e9f.

Write a list of different chunk splits and their expected results to test against, using ianepperson's breakdown as specification: https://github.com/kennethreitz/requests/pull/2431#issuecomment-72333964

check the case of an empty chunk somewhere in the stream

Lukasa

Ok, I think I'm happy with this. Thanks for pushing this over the line! I'll try to dive into what's going on with the tests.

Lukasa

Ok, so the tests are failing because of a bytes/string incompatibility. Want to take a look?

vbarbaresi · 2017-03-15T12:54:17Z

I'll take a look

vbarbaresi · 2017-03-15T16:26:46Z

I attempted a fix, though I'm not sure it's correct.
Maybe use self.encoding or self.apparent_encoding?

When decode_unicode is False we don't deal with the encoding, so I'm not sure it makes sense to assume utf-8 and encode only the last byte.

Lukasa · 2017-03-15T17:46:44Z

requests/models.py

+            if isinstance(chunk[-1], basestring):  # Decoded string (decode_unicode=True or Py2)
+                incomplete_line = lines[-1] and lines[-1].endswith(chunk[-1])
+            else:  # Bytestring (decode_unicode=False and Py3)
+                incomplete_line = lines[-1] and lines[-1].endswith(chr(chunk[-1]).encode('utf-8'))


So, to be clear, the error in this fix was assuming that we need to tolerate str here. The tests were in error, not the code: iter_content is required to always yield bytes. You'll find if you go through the tests and use bytes everywhere the problem goes away. 😁 So we don't need this.

I'm confused: I checked in normal usage (outside of the tests) and could reproduce the issue.
With Python 2 I get str, and with Python 3 I get bytes. If I use iter_lines(decode_unicode=True),
I get str (Py3) or unicode (Py2).

The previous code wasn't accepting bytes (it worked only with Python 2).

This failed on Python 3 before my last fix:

r = requests.get("https://github.com", stream=True)
list(r.iter_lines())

TypeError: endswith first arg must be bytes or a tuple of bytes, not int

That's why I don't think the tests were in error (a test that I didn't write, test_response_iter_lines, was failing).

@vbarbaresi The issue here seems to be stemming from 46e3fdd in #3745. The conditional on master, and prior to that commit, is done by comparing the last index of each string/bytestring. When retrieving a single index from a bytestring in Python 3 it returns an integer representation of the byte. This works in the case of lines[-1] and lines[-1][-1] == chunk[-1] but not lines[-1] and lines[-1].endswith(chunk[-1]).

I'd suggest we simply move back to using the original double indexing, rather than introducing more logic to handle the seemingly unneeded endswith.

Thanks, I missed that... much neater. It works fine with a simple indexing in all cases.
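
For readers following along, the difference between the two conditionals can be reproduced in isolation on Python 3. The values below are made up for illustration; only the built-in `bytes` and `str` behavior is being shown.

```python
line = b"partial"
chunk = b"some partial"

# On Python 3, indexing a bytes object yields an int, not a 1-byte bytes object.
last = chunk[-1]
last_is_int = isinstance(last, int)

# bytes.endswith() only accepts bytes (or a tuple of bytes), so passing
# the int raises TypeError.
error = None
try:
    line.endswith(last)
except TypeError as exc:
    error = exc

# Comparing the last element of each side works for both bytes (int == int)
# and str (1-character str == 1-character str).
same_bytes = line[-1] == chunk[-1]
same_text = "partial"[-1] == "some partial"[-1]
```

This is why the double-indexing form needs no type check: both sides of the comparison are always produced the same way.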

nateprewitt · 2017-03-15T20:37:54Z

tests/test_requests.py

@@ -1741,10 +1816,12 @@ def test_json_param_post_should_not_override_data_param(self, httpbin):
        prep = r.prepare()
        assert 'stuff=elixr' == prep.body

-    def test_response_iter_lines(self, httpbin):
+    @pytest.mark.parametrize('decode_unicode', (True, False))
+    def test_response_iter_lines(self, httpbin, decode_unicode):


Also, I'm not sure if you're going to back this commit out, but if not, decode_unicode doesn't seem to be used for anything in this test's current configuration.

Nice catch
I overwrote the commit, and fixed the unused decode_unicode parameter 👌

Also add a parametrize on decode_unicode for iter_lines() test to check with bytestrings and str content

Lukasa

Cool, we seem to have resolved everything outstanding. Nicely done!

ianepperson and others added 3 commits March 14, 2017 16:57

Test to show bug when delimiter is split between reads

02031e3

Fix bug when delimiter is split between responses

9174925

Review markups for @Lukasa

9881be2

Lukasa requested changes Mar 14, 2017

vbarbaresi force-pushed the 3.0.0-iter_lines branch 2 times, most recently from 399b9f1 to a7e7181 Compare March 15, 2017 00:29

vbarbaresi added 3 commits March 15, 2017 01:37

add some parametrized tests for iter_lines()

0380ac5

Write a list of different chunk splits and their expected results to test against, using ianepperson's breakdown as specification: https://github.com/kennethreitz/requests/pull/2431#issuecomment-72333964

add more tests for iter_lines()

5a8bc19

check the case of an empty chunk somewhere in the stream

remove useless brackets in iter_lines boolean condition

cc2ac23

vbarbaresi force-pushed the 3.0.0-iter_lines branch from a7e7181 to cc2ac23 Compare March 15, 2017 00:38

add explanatory comment about skipping null chunks in iter_lines

052595f

Lukasa approved these changes Mar 15, 2017

Lukasa requested changes Mar 15, 2017

Lukasa reviewed Mar 15, 2017

nateprewitt reviewed Mar 15, 2017

use [-1] instead of endswith() to work with bytes or string

d491e9f

Also add a parametrize on decode_unicode for iter_lines() test to check with bytestrings and str content

vbarbaresi force-pushed the 3.0.0-iter_lines branch from a2adf26 to d491e9f Compare March 15, 2017 21:30

Lukasa approved these changes Mar 16, 2017

Lukasa merged commit 73456b0 into psf:proposed/3.0.0 Mar 16, 2017

Lukasa mentioned this pull request Apr 21, 2017

iter_lines() sometimes yields broken results when chunk_size is small #3980

Closed

vbarbaresi deleted the 3.0.0-iter_lines branch May 31, 2017 14:52

github-actions bot locked as resolved and limited conversation to collaborators Sep 5, 2021
