Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rebase #3745 and add some tests #3923

Merged
merged 8 commits into from
Mar 16, 2017
Merged

Conversation

vbarbaresi
Copy link

2 years after the original #2431, and after #3745
I encountered the strange behavior of r.iter_lines(), found this and tried to complete it.
Rebased once more and added more tests using the breakdown of @ianepperson https://github.com/kennethreitz/requests/pull/2431#issuecomment-72333964

Copy link
Member

@Lukasa Lukasa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, so this generally looks really good! I have one small note.

decode_unicode=decode_unicode):
# Skip any null responses
if not chunk:
continue
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this check move after the pending data logic?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right
Fixed and added tests for this behavior 👌

I don't know if it's licit to receive empty chunks in the middle of a stream, but I guess it's possible in theory. (and it would work)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I replied too fast: if we want to return the same result on
['line\r\n'] and ['line\r\n', ''] (which seems right),
we need to keep this check on top

Otherwise we consume pending and discard it

I think it's safe to discard empty chunks early, pending will be carried to the next non-empty chunk

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if it's licit to receive empty chunks in the middle of a stream, but I guess it's possible in theory. (and it would work)

urllib3 won't give them to us, but people like putting things that aren't urllib3 underneath us, and they might. So we should tolerate it.

As to your note about wanting to return the same result, I think that makes sense. But we should add a comment to say that "pending is necessarily an incomplete chunk, and so if we don't have more data we don't want to bother trying to get it".

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done 👌

@vbarbaresi vbarbaresi force-pushed the 3.0.0-iter_lines branch 2 times, most recently from 399b9f1 to a7e7181 Compare March 15, 2017 00:29
@codecov-io
Copy link

codecov-io commented Mar 15, 2017

Codecov Report

Merging #3923 into proposed/3.0.0 will increase coverage by 0.12%.
The diff coverage is 100%.

@@                Coverage Diff                 @@
##           proposed/3.0.0    #3923      +/-   ##
==================================================
+ Coverage           89.32%   89.44%   +0.12%     
==================================================
  Files                  15       15              
  Lines                1929     1932       +3     
==================================================
+ Hits                 1723     1728       +5     
+ Misses                206      204       -2
Impacted Files Coverage Δ
requests/models.py 94.22% <100%> (+0.48%)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 84dc6b6...d491e9f. Read the comment docs.

Write a list of different chunk splits and their expected results
to test against, using ianepperson's breakdown as specification:
https://github.com/kennethreitz/requests/pull/2431#issuecomment-72333964
check the case of an empty chunk somewhere in the stream
Copy link
Member

@Lukasa Lukasa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I think I'm happy with this. Thanks for pushing this over the line! I'll try to dive into what's going on with the tests.

Copy link
Member

@Lukasa Lukasa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, so the tests are failing because of a bytes/string incompatibility. Want to take a look?

@vbarbaresi
Copy link
Author

I'll take a look

@vbarbaresi
Copy link
Author

I attempted a fix; though I'm not sure it's correct
Maybe use self.encoding or self.apparent_encoding?

When decode_unicode is False we don't deal with the encoding, so I'm not sure it makes sense to assume utf-8 and to encode only the last byte

if isinstance(chunk[-1], basestring): # Decoded string (decode_unicode=True or Py2)
incomplete_line = lines[-1] and lines[-1].endswith(chunk[-1])
else: # Bytestring (decode_unicode=False and Py3)
incomplete_line = lines[-1] and lines[-1].endswith(chr(chunk[-1]).encode('utf-8'))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, to be clear, the error in this fix was assuming that we need to tolerate str here. The tests were in error, not the code: iter_content is required to always yield bytes. You'll find if you go through the tests and use bytes everywhere the problem goes away. 😁 So we don't need this.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused: I checked in a normal usage (outside of the tests) and could reproduce the issue:
with Python2 I get str, and with Python3 I get bytes (and if I use iter_lines(decode_unicode=True)
I get str (Py3) or unicode (Py2)

The previous code wasn't accepting bytes (and was working only with Python 2)

This failed on Python3 before my last fix:

r = requests.get("https://github.com", stream=True)
list(r.iter_lines())

TypeError: endswith first arg must be bytes or a tuple of bytes, not int

That's why I don't think the tests were in error (a test that I didn't write was failing, test_response_iter_lines)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vbarbaresi The issue here seems to be stemming from 46e3fdd in #3745. The conditional on master, and prior to that commit, is done by comparing the last index of each string/bytestring. When retrieving a single index from a bytestring in Python 3 it returns an integer representation of the byte. This works in the case of lines[-1] and lines[-1][-1] == chunk[-1] but not lines[-1] and lines[-1].endswith(chunk[-1]).

I'd suggest we simply move back to using the original double indexing, rather than introducing more logic to handle the seemingly unneeded endswith.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I missed that... much neater. It works fine with a simple indexing in all cases.

@@ -1741,10 +1816,12 @@ def test_json_param_post_should_not_override_data_param(self, httpbin):
prep = r.prepare()
assert 'stuff=elixr' == prep.body

def test_response_iter_lines(self, httpbin):
@pytest.mark.parametrize('decode_unicode', (True, False))
def test_response_iter_lines(self, httpbin, decode_unicode):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I'm not sure if you're going to back this commit out, but if not, decode_unicode doesn't seem to be used for anything here in this tests current configuration.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch
I overwrote the commit, and fixed the unused decode_unicode parameter 👌

Also add a parametrize on decode_unicode for iter_lines() test
to check with bytestrings and str content
Copy link
Member

@Lukasa Lukasa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, we seem to have resolved everything outstanding. Nicely done!

@Lukasa Lukasa merged commit 73456b0 into psf:proposed/3.0.0 Mar 16, 2017
@vbarbaresi vbarbaresi deleted the 3.0.0-iter_lines branch May 31, 2017 14:52
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 5, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants