-
Notifications
You must be signed in to change notification settings - Fork 498
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixes bug in rfc2616 #3.6.1 implementation. #236
Fixes bug in rfc2616 #3.6.1 implementation. #236
Conversation
A chunk size (hexidecimal) must be followed by either an optional chunk extension (which begins with a semicolon) or \r\n. @link https://tools.ietf.org/html/rfc2616#section-3.6.1
I've updated the test to provide an example of a false-positive that were were previously missing. I've also added further test (data) for correct decoding of chunks with chunk extensions. |
Current coverage is 92.22% (diff: 100%)@@ master #236 diff @@
==========================================
Files 21 21
Lines 1762 1762
Methods 156 156
Messages 0 0
Branches 0 0
==========================================
Hits 1625 1625
Misses 137 137
Partials 0 0
|
@@ -39,7 +43,7 @@ public function testChunked($body, $expected){ | |||
*/ | |||
public function testNotActuallyChunked() { | |||
$transport = new MockTransport(); | |||
$transport->body = 'Hello! This is a non-chunked response!'; | |||
$transport->body = "Believe me\r\nthis looks chunked, but it isn't."; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be split into a separate test please? Ideally, it should look more like a chunked response (i.e. 2Anotchunked\r\n...
) so it's obvious what it's testing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would a data provider be suitable? The test testNotActuallyChunked()
hasn't really changed, just the example's we're feeding it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, works for me :)
Great catch :) One small thing to fix in the tests, but otherwise looks good. |
@@ -15,6 +15,10 @@ public static function chunkedProvider() { | |||
"02\r\nab\r\n04\r\nra\nc\r\n06\r\nadabra\r\n0c\r\n\nall we got\n", | |||
"abra\ncadabra\nall we got\n" | |||
), | |||
array( | |||
"02;foo=bar;hello=world\r\nab\r\n04;foo=baz\r\nra\nc\r\n06;justfoo\r\nadabra\r\n0c\r\n\nall we got\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is OWS (optional whitespace) allowed here too? Should add a test for that if so. Ditto if params can take quoted values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RFC2616 states:
chunk-ext-name = token
chunk-ext-val = token | quoted-string
Token cannot contain any white space, or any of the following: (}|<>@,;:\"/[]?={}
. Quoted string can contain white space.
What we have at the moment is still a relatively low-bar for what is interpreted of correctly encoded. I'm wary about making it too strict as a misunderstanding here, when it's too strict, is more likely to lead to issues in production.
Happy to add the above restrictions with some additional tests
…?=. Chunk extension values can, provided they are quoted. Ref #236
@@ -749,15 +749,17 @@ public static function parse_multiple(&$response, $request) { | |||
* @return string Decoded body | |||
*/ | |||
protected static function decode_chunked($data) { | |||
if (!preg_match('/^([0-9a-f]+)(?:;[^\r\n]*)*\r\n/i', trim($data))) { | |||
if (!preg_match('/^([0-9a-f]+)(?:;(?:[\w-]*)(?:=(?:(?:[\w-]*)*|"(?:[^\r\n])*"))?)*\r\n/i', trim($data))) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not convinced that [\w-]*
isn't too strict for a token. I would be tempted to err on the side of caution and stick with the original patch and have a slightly lower bar for what constitutes a valid encoding.
Turns out the tests failing here were just a transient issue on a HTTPS connection; restarted the failing build and everything passed nicely. Thanks! |
Google isn't lying and this change doesn't really fix the issue it just fixes a symptom. Depending on the underlying transport you may get the chunks already decoded which means chunked decoding can't be keyed directly off the header. By default You can see this with the |
@westi Yup, I figured this was the case, but the PR fixes it regardless. |
Except... if the data being transferred "looks" like chunked encoded data it will still mangle it... and we know we don't need to decode so running a bunch of Regex is wasteful |
I don't disagree, so happy to accept a PR :) |
A chunk size must be followed by either optional chunk extension(s) (which begin with a semicolon) or \r\n. The current implementation just allows anything from the chunk size to the \r\n.
This was causing a bug in my Events plugin: a Google iCal feed was sometimes mistaken for being chunked (Google claims it is in the header: it isn't).
The starting line is
which is mistakenly interpreted as a chunk header with size
BE
. If, it just so happens the rest of the feed is interpreted as chunked you end up with a mutilated iCal feed.Of course, Google shouldn't claim it is chunked, but the current implementation fails to detect that it is not a valid encoding.