Fixes bug in rfc2616 #3.6.1 implementation. #236

stephenharris · 2016-08-26T12:13:43Z

A chunk size must be followed by either optional chunk extension(s) (which begin with a semicolon) or \r\n. The current implementation just allows anything from the chunk size to the \r\n.

This was causing a bug in my Events plugin: a Google iCal feed was sometimes mistaken for being chunked (Google claims it is in the header: it isn't).

The starting line is

BEGIN:VCALENDAR

which is mistakenly interpreted as a chunk header with size BE. If the rest of the feed also happens to parse as chunked, you end up with a mutilated iCal feed.

Of course, Google shouldn't claim it is chunked, but the current implementation fails to detect that it is not a valid encoding.
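
To make the false positive concrete, here is a small sketch comparing a permissive header check with a stricter one. The permissive pattern is an assumption reconstructed from the description above ("allows anything from the chunk size to the \r\n"), not a quote of the library source. The strict pattern is the one on the removed line of the decode_chunked() diff further down, i.e. the first iteration of this patch. Variable names are illustrative.

```php
<?php
// Assumed pre-patch behaviour: anything may follow the hex size up to CRLF.
$permissive = '/^([0-9a-f]+)[^\r\n]*\r\n/i';
// Stricter check: after the hex size, only ";extension" groups or CRLF.
$strict = '/^([0-9a-f]+)(?:;[^\r\n]*)*\r\n/i';

// First lines of an iCal feed that is NOT chunked.
$ical = "BEGIN:VCALENDAR\r\nVERSION:2.0\r\n";

$permissive_hit = preg_match($permissive, $ical, $m);
// "BE" is read as a hexadecimal chunk size when the permissive check matches.
$claimed_size = $permissive_hit ? hexdec($m[1]) : null;
$strict_hit = preg_match($strict, $ical);

var_dump($permissive_hit, $claimed_size, $strict_hit);
```

Under these assumptions the permissive check accepts the line and reads BE as a chunk size, while the strict check rejects it because "G" is neither a semicolon nor the start of \r\n.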

A chunk size (hexadecimal) must be followed by either an optional chunk extension (which begins with a semicolon) or \r\n. @link https://tools.ietf.org/html/rfc2616#section-3.6.1

stephenharris · 2016-08-26T12:15:50Z

I've updated the test to provide an example of a false positive that we were previously missing.

I've also added further test data for correct decoding of chunks with chunk extensions.
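
For readers who want to see what decoding chunks with extensions involves, here is a rough sketch. It is not the library's decode_chunked(): the function name sketch_decode_chunked() is hypothetical, trailers are ignored, and it assumes every chunk's data is followed by \r\n. The encoded string is the extension test case added in this PR.

```php
<?php
/**
 * Illustrative decoder (hypothetical name), not the library's method.
 * Returns the input untouched if it does not parse as chunked data.
 */
function sketch_decode_chunked(string $data): string
{
    // Hex size, then zero or more ";extension" groups, then CRLF.
    $header = '/^([0-9a-f]+)(?:;[^\r\n]*)*\r\n/i';
    $decoded = '';
    $rest = $data;

    while ($rest !== '') {
        if (!preg_match($header, $rest, $m)) {
            return $data; // not chunked after all: leave the body alone
        }
        $size = (int) hexdec($m[1]);
        if ($size === 0) {
            break; // last-chunk marker; trailers are ignored in this sketch
        }
        $offset = strlen($m[0]);
        $decoded .= substr($rest, $offset, $size);
        // Skip the chunk data plus the CRLF that terminates it.
        $rest = (string) substr($rest, $offset + $size + 2);
    }
    return $decoded;
}

$encoded = "02;foo=bar;hello=world\r\nab\r\n04;foo=baz\r\nra\nc\r\n"
    . "06;justfoo\r\nadabra\r\n0c\r\n\nall we got\n";
$plain = sketch_decode_chunked($encoded);
$untouched = sketch_decode_chunked("BEGIN:VCALENDAR\r\nEND:VCALENDAR\r\n");
```

The extensions are matched by the header pattern and then discarded; only the chunk data is concatenated. A body that fails the header check at any point is returned as received.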

codecov-io · 2016-08-26T12:24:51Z

Current coverage is 92.22% (diff: 100%)

Merging #236 into master will not change coverage

@@             master       #236   diff @@
==========================================
  Files            21         21          
  Lines          1762       1762          
  Methods         156        156          
  Messages          0          0          
  Branches          0          0          
==========================================
  Hits           1625       1625          
  Misses          137        137          
  Partials          0          0

Powered by Codecov. Last update fb5b517...4041e0a

rmccue · 2016-08-30T03:24:55Z

tests/ChunkedEncoding.php

@@ -39,7 +43,7 @@ public function testChunked($body, $expected){
 	 */
 	public function testNotActuallyChunked() {
 		$transport = new MockTransport();
-		$transport->body = 'Hello! This is a non-chunked response!';
+		$transport->body = "Believe me\r\nthis looks chunked, but it isn't.";


Can this be split into a separate test please? Ideally, it should look more like a chunked response (i.e. 2Anotchunked\r\n...) so it's obvious what it's testing.

Would a data provider be suitable? The test testNotActuallyChunked() hasn't really changed, just the examples we're feeding it.

Good point, works for me :)

rmccue · 2016-08-30T03:26:06Z

Great catch :) One small thing to fix in the tests, but otherwise looks good.

rmccue · 2016-08-30T10:41:40Z

tests/ChunkedEncoding.php

@@ -15,6 +15,10 @@ public static function chunkedProvider() {
 				"02\r\nab\r\n04\r\nra\nc\r\n06\r\nadabra\r\n0c\r\n\nall we got\n",
 				"abra\ncadabra\nall we got\n"
 			),
+			array(
+				"02;foo=bar;hello=world\r\nab\r\n04;foo=baz\r\nra\nc\r\n06;justfoo\r\nadabra\r\n0c\r\n\nall we got\n",


Is OWS (optional whitespace) allowed here too? Should add a test for that if so. Ditto if params can take quoted values.

RFC2616 states:

chunk-ext-name = token
chunk-ext-val = token | quoted-string

A token cannot contain white space, control characters, or any of the separators ()<>@,;:\"/[]?={}. A quoted string can contain white space.

What we have at the moment is still a relatively low bar for what is interpreted as correctly encoded. I'm wary of making the check too strict, as a misunderstanding on my part is more likely to lead to issues in production if it is.

Happy to add the above restrictions with some additional tests.

stephenharris · 2016-08-30T15:54:43Z

library/Requests.php

@@ -749,15 +749,17 @@ public static function parse_multiple(&$response, $request) {
 	 * @return string Decoded body
 	 */
 	protected static function decode_chunked($data) {
-		if (!preg_match('/^([0-9a-f]+)(?:;[^\r\n]*)*\r\n/i', trim($data))) {
+		if (!preg_match('/^([0-9a-f]+)(?:;(?:[\w-]*)(?:=(?:(?:[\w-]*)*|"(?:[^\r\n])*"))?)*\r\n/i', trim($data))) {


I'm not convinced that [\w-]* isn't too strict for a token. I would be tempted to err on the side of caution and stick with the original patch and have a slightly lower bar for what constitutes a valid encoding.
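
To illustrate the concern, here is a sketch comparing a [\w-] based extension pattern with one built from the token character set in RFC 2616 section 2.2. Both patterns are simplified for readability (the nested quantifier from the diff is flattened and quoted values do not handle escaped quotes), and the token-based pattern is a hypothetical alternative, not something proposed in this PR. The sample extension names are made up.

```php
<?php
// Token characters per RFC 2616 section 2.2: any US-ASCII character except
// CTLs, space and the separators ( ) < > @ , ; : \ " / [ ] ? = { }.
$tchar = '[!#$%&\'*+.^_`|~0-9A-Za-z-]';

// Simplified form of the [\w-] based extension check from the diff above.
$word_ext = ';[\w-]*(?:=(?:[\w-]*|"[^\r\n]*"))?';
// Hypothetical alternative using the full token character set.
$token_ext = ';' . $tchar . '+(?:=(?:' . $tchar . '+|"[^"\r\n]*"))?';

$word_header  = '/^[0-9a-f]+(?:' . $word_ext . ')*\r\n/i';
$token_header = '/^[0-9a-f]+(?:' . $token_ext . ')*\r\n/i';

// Made-up chunk header lines, keyed by what they exercise.
$cases = array(
    'plain'       => "1a;foo=bar\r\n",
    'dotted name' => "1a;x.trace=on\r\n",
    'quoted'      => "1a;note=\"two words\"\r\n",
    'whitespace'  => "1a; foo = bar\r\n",
);

$results = array();
foreach ($cases as $label => $line) {
    // Each entry is array(word-based result, token-based result).
    $results[$label] = array(
        preg_match($word_header, $line),
        preg_match($token_header, $line),
    );
}
print_r($results);
```

In this sketch the dotted name is a valid token but is rejected by the [\w-] pattern, which is the kind of case the comment above worries about. Both patterns reject the whitespace-padded line; whether such whitespace should be accepted is the open question raised earlier in the review.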

rmccue · 2016-09-18T23:11:16Z

Turns out the tests failing here were just a transient issue on an HTTPS connection; restarted the failing build and everything passed nicely. Thanks!

westi · 2016-09-23T15:38:45Z

a Google iCal feed was sometimes mistaken for being chunked (Google claims it is in the header: it isn't).

Google isn't lying, and this change doesn't really fix the issue; it just fixes a symptom.

Depending on the underlying transport, you may get the chunks already decoded, which means chunked decoding can't be keyed directly off the header.

By default curl decodes chunks internally, so Requests shouldn't be trying to decode chunks again because it will potentially damage the transferred content.

You can see this with the curl cli by using the --raw option to disable the decoding when looking at the Google iCal feeds.
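
A rough sketch of the idea, assuming the transport can report whether it has already decoded the body. Everything here is hypothetical: sketch_maybe_decode(), the $transport_decoded flag and the toy single-chunk decoder are illustrative names and not part of Requests.

```php
<?php
/**
 * Hypothetical guard: only run chunk decoding when the transport hands back
 * the raw wire body. All names here are illustrative assumptions.
 */
function sketch_maybe_decode(array $headers, string $body, bool $transport_decoded, callable $decoder): string
{
    $encoding = isset($headers['transfer-encoding'])
        ? strtolower($headers['transfer-encoding'])
        : '';
    if ($encoding !== 'chunked' || $transport_decoded) {
        return $body; // nothing to undo, so never touch the payload
    }
    return $decoder($body);
}

// Toy single-chunk decoder standing in for a real one.
$decoder = function (string $raw): string {
    if (preg_match('/^([0-9a-f]+)\r\n/i', $raw, $m)) {
        return substr($raw, strlen($m[0]), (int) hexdec($m[1]));
    }
    return $raw;
};

$headers = array('transfer-encoding' => 'chunked');
// A payload that was already decoded but happens to look like a chunk.
$payload = "5\r\nhello world";

$kept    = sketch_maybe_decode($headers, $payload, true, $decoder);
$mangled = sketch_maybe_decode($headers, $payload, false, $decoder);
```

With the flag set, the payload comes back byte for byte. Without it, the header alone triggers decoding and the content is truncated to "hello", which is the kind of damage described above.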

rmccue · 2016-09-23T15:40:33Z

@westi Yup, I figured this was the case, but the PR fixes it regardless.

westi · 2016-09-23T15:41:57Z

Except... if the data being transferred "looks" like chunked encoded data it will still mangle it... and we know we don't need to decode so running a bunch of Regex is wasteful

rmccue · 2016-09-23T15:51:53Z

I don't disagree, so happy to accept a PR :)

Fixes bug in rfc2616 #3.6.1 implementation.

4041e0a

A chunk size (hexadecimal) must be followed by either an optional chunk extension (which begins with a semicolon) or \r\n. @link https://tools.ietf.org/html/rfc2616#section-3.6.1

rmccue reviewed Aug 30, 2016

rmccue added this to the 1.8 milestone Aug 30, 2016

rmccue added Component: Transports Component: Core and removed Component: Transports labels Aug 30, 2016

rmccue reviewed Aug 30, 2016

Chunk extension names cannot contain CTLs, spaces or (){}|<>@,;:\"/[]…

2f1b5dc

…?=. Chunk extension values can, provided they are quoted. Ref #236

stephenharris reviewed Aug 30, 2016

rmccue merged commit ea359d3 into WordPress:master Sep 18, 2016

rmccue modified the milestones: 1.7, 1.8 Sep 18, 2016
