-
-
Notifications
You must be signed in to change notification settings - Fork 9.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rebase #3745 and add some tests #3923
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, so this generally looks really good! I have one small note.
requests/models.py
Outdated
decode_unicode=decode_unicode): | ||
# Skip any null responses | ||
if not chunk: | ||
continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this check move after the pending data logic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right
Fixed and added tests for this behavior 👌
I don't know if it's licit to receive empty chunks in the middle of a stream, but I guess it's possible in theory. (and it would work)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I replied too fast: if we want to return the same result on
['line\r\n']
and ['line\r\n', '']
(which seems right),
we need to keep this check on top
Otherwise we consume pending
and discard it
I think it's safe to discard empty chunks early, pending
will be carried to the next non-empty chunk
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if it's licit to receive empty chunks in the middle of a stream, but I guess it's possible in theory. (and it would work)
urllib3 won't give them to us, but people like putting things that aren't urllib3 underneath us, and they might. So we should tolerate it.
As to your note about wanting to return the same result, I think that makes sense. But we should add a comment to say that "pending is necessarily an incomplete chunk, and so if we don't have more data we don't want to bother trying to get it".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done 👌
399b9f1
to
a7e7181
Compare
Codecov Report
@@ Coverage Diff @@
## proposed/3.0.0 #3923 +/- ##
==================================================
+ Coverage 89.32% 89.44% +0.12%
==================================================
Files 15 15
Lines 1929 1932 +3
==================================================
+ Hits 1723 1728 +5
+ Misses 206 204 -2
Continue to review full report at Codecov.
|
Write a list of different chunk splits and their expected results to test against, using ianepperson's breakdown as specification: https://github.com/kennethreitz/requests/pull/2431#issuecomment-72333964
check the case of an empty chunk somewhere in the stream
a7e7181
to
cc2ac23
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I think I'm happy with this. Thanks for pushing this over the line! I'll try to dive into what's going on with the tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, so the tests are failing because of a bytes/string incompatibility. Want to take a look?
I'll take a look |
I attempted a fix; though I'm not sure it's correct When |
requests/models.py
Outdated
if isinstance(chunk[-1], basestring): # Decoded string (decode_unicode=True or Py2) | ||
incomplete_line = lines[-1] and lines[-1].endswith(chunk[-1]) | ||
else: # Bytestring (decode_unicode=False and Py3) | ||
incomplete_line = lines[-1] and lines[-1].endswith(chr(chunk[-1]).encode('utf-8')) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, to be clear, the error in this fix was assuming that we need to tolerate str
here. The tests were in error, not the code: iter_content
is required to always yield bytes
. You'll find if you go through the tests and use bytes
everywhere the problem goes away. 😁 So we don't need this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused: I checked in a normal usage (outside of the tests) and could reproduce the issue:
with Python2 I get str
, and with Python3 I get bytes
(and if I use iter_lines(decode_unicode=True)
I get str
(Py3) or unicode
(Py2)
The previous code wasn't accepting bytes (and was working only with Python 2)
This failed on Python3 before my last fix:
r = requests.get("https://github.com", stream=True)
list(r.iter_lines())
TypeError: endswith first arg must be bytes or a tuple of bytes, not int
That's why I don't think the tests were in error (a test that I didn't write was failing, test_response_iter_lines
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vbarbaresi The issue here seems to be stemming from 46e3fdd in #3745. The conditional on master, and prior to that commit, is done by comparing the last index of each string/bytestring. When retrieving a single index from a bytestring in Python 3 it returns an integer representation of the byte. This works in the case of lines[-1] and lines[-1][-1] == chunk[-1]
but not lines[-1] and lines[-1].endswith(chunk[-1])
.
I'd suggest we simply move back to using the original double indexing, rather than introducing more logic to handle the seemingly unneeded endswith
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I missed that... much neater. It works fine with a simple indexing in all cases.
@@ -1741,10 +1816,12 @@ def test_json_param_post_should_not_override_data_param(self, httpbin): | |||
prep = r.prepare() | |||
assert 'stuff=elixr' == prep.body | |||
|
|||
def test_response_iter_lines(self, httpbin): | |||
@pytest.mark.parametrize('decode_unicode', (True, False)) | |||
def test_response_iter_lines(self, httpbin, decode_unicode): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I'm not sure if you're going to back this commit out, but if not, decode_unicode
doesn't seem to be used for anything here in this tests current configuration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch
I overwrote the commit, and fixed the unused decode_unicode
parameter 👌
Also add a parametrize on decode_unicode for iter_lines() test to check with bytestrings and str content
a2adf26
to
d491e9f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, we seem to have resolved everything outstanding. Nicely done!
2 years after the original #2431, and after #3745
I encountered the strange behavior of
r.iter_lines()
, found this and tried to complete it.Rebased once more and added more tests using the breakdown of @ianepperson https://github.com/kennethreitz/requests/pull/2431#issuecomment-72333964