Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes problems with iter_lines when the delimiter is split #2431

Closed
wants to merge 2 commits into from

Conversation

ianepperson
Copy link

I ran into this problem today where my data would randomly have additional null strings from iter_lines. I added a test to show the issue, then fixed it.

The only weirdness might be when no data comes back, how should iter_lines behave? Should there be a single yield of a null string, or should it just return without yielding anything? This edit performs the latter.

@ianepperson
Copy link
Author

The specific issue happens when attempting to retrieve lines with a multi-byte delimiter. Very occasionally, the delimiter ends up split between data chunks. This results in the yielding of an additional null string. For reading an MJPEG data stream with iter_lines('\r\n') it was causing a garbled frame every 10 seconds or so.

@ianepperson
Copy link
Author

I see you've marked it with the Breaking API Change. I suppose that's correct, as it was previously inconsistent. If a null-string chunk were received then the port closed, iter_lines() would not yield anything and exit, but iter_lines(any_delimiter) would yield a null-string then exit. With this change they both consistently yield nothing and exit. (Which I think is the more historically expected behavior.)

@sigmavirus24 sigmavirus24 added this to the 3.0.0 milestone Jan 31, 2015
@sigmavirus24
Copy link
Contributor

Which I think is the more historically expected behavior.

Please define historically. In the context of the project, empty strings are historically the normal, intentionally or not.

For what it's worth, this can still be merged before 3.0 but I wanted to ensure no one merged this accidentally before we had a chance to discuss it.

@ianepperson
Copy link
Author

Please define historically.

Sure. Prior to the change that added the ability to define the delimiter, empty strings would not have been emitted as that only came about as a result of using split(delimiter=x) in place of splitlines() - they have different behavior. split will always provide at list of at least length 1, whereas splitlines will occasionally provide a zero length list. Because of this, when the delimiter is split across chunks, nulls are occasionally emitted. Also, when the socket (or iter_content) emits a zero length string and nothing more, the behavior is as follows:

Behavior prior to October 2014:

  • iter_lines() - returns without yielding anything.

After October 2014:

  • iter_lines() - returns without yielding anything.
  • iter_lines(delimiter) - yields a zero-length string, then returns.

Side-effect of proposed fix:

  • iter_lines() - returns without yielding anything.
  • iter_lines(delimiter) - returns without yielding anything.

All this is an aside from my proposed fix, which essentially yielding if and only if there's a full line of content. If I'm expecting a string that looks like this: "--boundary\r\nContent-Type: image/jpg\r\nContent-Length: 65535\r\n\r\n...\r\n" I'd expect iter_lines(delimiter='\r\n') to yield once per \r\n. I expect it to behave exactly as if I iterated over:

[
    '--boundary',
    'Content-Type: image/jpg',
    'Content-Length: 65535',
    '',
    '...',
]

But once in a while, the delimiter is split across chunks internally, and without any change to the source content, I occasionally get an extra '' at seemingly random intervals:

[
    '--boundary',
    'Content-Type: image/jpg',
    '',
    'Content-Length: 65535',
    '',
    '...',
]

The proposed change fixes the routine so that it behaves more consistently with null-string input and does not yield when there's not a line of data** in the input stream.

**Okay, there's one additional issue that's been in there since at least October 2014 that I didn't address: if using iter_lines() (without the delimiter option) AND the delimiters are \r\n AND the delimiter gets split across chunks, a spurious '' will be emitted. That is, usually the string 'a line\r\n' will end up as ['a line'], but once in a while it'll be ['a line', '']. This could be rectified by providing the actual expected delimiter - using iter_lines(delimiter=\r\n') instead.

@ianepperson
Copy link
Author

Here's a breakdown as I understand it.

Prior to October 2014:

Input Process Result
'' iter_lines() []
'line' iter_lines() ['line']
'line\n' iter_lines() ['line'] but occasionally ['line', '']
'line\r\n' iter_lines() ['line'] but occasionally ['line', ''] and very rarely ['line', '', '']

After October 2014:

Input Process Result
'' iter_lines() []
iter_lines(delimiter='\r\n') ['']
'line\n' iter_lines() ['line'] but occasionally ['line', '']
'iter_lines(delimiter='\r\n') ['line\n']
'line\r\n' iter_lines() ['line'] but occasionally ['line', ''] and very rarely ['line', '', '']
'iter_lines(delimiter='\r\n') ['line', ''] but occasionally ['line', '', '']

Proposed Change:

Input Process Result
'' iter_lines() []
iter_lines(delimiter='\r\n') []
'line\n' iter_lines() ['line'] but occasionally ['line', '']
'iter_lines(delimiter='\r\n') ['line\n']
'line\r\n' iter_lines() ['line'] but occasionally ['line', ''] and very rarely ['line', '', '']
'iter_lines(delimiter='\r\n') ['line', ''] always

Since 'iter_lines() is and always has been inconsistent, it's unreliable and introduces subtle bugs. iter_lines(delimiter=x), after October 2014 has the same problem. My proposed change makes iter_lines(delimiter=x) consistent, but leaves the inconsistency of the original version.

@kennethreitz
Copy link
Contributor

This PR hasn't seen any love in a long time. @ianepperson, would mind performing a rebase to make this once-again mergable?

@sigmavirus24 @Lukasa +1? -1? I want to get this merged or closed out.

@Lukasa
Copy link
Member

Lukasa commented Apr 6, 2016

I'm generally +1 on this, though it needs to be merged into 3.0.0 rather than the master branch.

@Lukasa
Copy link
Member

Lukasa commented Dec 2, 2016

Closing in favour of #3745.

@Lukasa Lukasa closed this Dec 2, 2016
vbarbaresi referenced this pull request in vbarbaresi/requests Mar 14, 2017
Write a list of different chunk splits and their expected results
to test against, using ianepperson's breakdown as specification:
https://github.com/kennethreitz/requests/pull/2431#issuecomment-72333964
vbarbaresi referenced this pull request in vbarbaresi/requests Mar 15, 2017
Write a list of different chunk splits and their expected results
to test against, using ianepperson's breakdown as specification:
https://github.com/kennethreitz/requests/pull/2431#issuecomment-72333964
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 8, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants