Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent validation errors for URLs with leading or trailing spaces #4166

Merged
merged 9 commits into from
Jan 24, 2020
6 changes: 3 additions & 3 deletions includes/sanitizers/class-amp-tag-and-attribute-sanitizer.php
Original file line number Diff line number Diff line change
Expand Up @@ -1626,7 +1626,7 @@ private function check_attr_spec_rule_value_regex_casei( DOMElement $node, $attr
private function check_attr_spec_rule_valid_url( DOMElement $node, $attr_name, $attr_spec_rule ) {
if ( isset( $attr_spec_rule[ AMP_Rule_Spec::VALUE_URL ] ) && $node->hasAttribute( $attr_name ) ) {
foreach ( $this->extract_attribute_urls( $node->getAttributeNode( $attr_name ) ) as $url ) {
$url = urldecode( $url );
westonruter marked this conversation as resolved.
Show resolved Hide resolved
$url = urldecode( trim( $url ) );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that the AMP validator does this same trimming up front: https://github.com/ampproject/amphtml/blob/af1e3a550feeafd732226202b8d1f26dcefefa18/validator/engine/parse-url.js#L214-L215

However, it is also doing one more thing:

    // Browsers remove Tab/CR/LF from the entire URL, so we do too.
    unparsed = unparsed.replace(/[\t\r\n]/g, '');

I think that should be incorporated here as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This being the case, instead of calling trim() directly, let's create a new private method like normalize_url_from_attribute_value() which does both the trimming and the tab/newline removal and then use that method everywhere that you are currently using trim() instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, it'll probably be several hours before I could do that, if that's alright.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure thing

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

911602d adds a method to normalize the URLs, sorry for the delay.


// Check whether the URL is parsable.
$parts = wp_parse_url( $url );
Expand Down Expand Up @@ -1666,8 +1666,8 @@ private function check_attr_spec_rule_valid_url( DOMElement $node, $attr_name, $
* @return string|null Protocol without colon if matched. Otherwise null.
*/
private function parse_protocol( $url ) {
if ( preg_match( '#^[^/]+(?=:)#', $url, $matches ) ) {
return $matches[0];
if ( preg_match( '#^\s*([^/]+):#', $url, $matches ) ) {
return $matches[1];
}
return null;
}
Expand Down
149 changes: 142 additions & 7 deletions tests/php/test-amp-tag-and-attribute-sanitizer-private-methods.php
Original file line number Diff line number Diff line change
Expand Up @@ -1413,6 +1413,66 @@ public function get_check_attr_spec_rule_blacklisted_value_regex() {
];
}

/**
* Gets the test data for test_check_attr_spec_rule_valid_url().
*
* @return array The test data.
*/
public function get_check_attr_spec_rule_valid_url() {
return [
'no_attribute' => [
'<a></a>',
AMP_Rule_Spec::NOT_APPLICABLE,
],
'correct_url' => [
'<a baz="https://wp.org"></a>',
AMP_Rule_Spec::PASS,
],
'correct_url_leading_space' => [
'<a baz=" https://wp.org"></a>',
AMP_Rule_Spec::PASS,
],
'non_parseable_url' => [
'<a baz="//"></a>',
AMP_Rule_Spec::FAIL,
],
'wrong_protocol' => [
'<a baz="@:wp.org"></a>',
AMP_Rule_Spec::FAIL,
],
'wrong_host' => [
'<a baz="https://wp$camp.org"></a>',
AMP_Rule_Spec::FAIL,
],
];
}

/**
* Tests check_attr_spec_rule_valid_url.
*
* @dataProvider get_check_attr_spec_rule_valid_url
* @group allowed-tags-private-methods
* @covers AMP_Tag_And_Attribute_Sanitizer::check_attr_spec_rule_valid_url()
*
* @param array $source The HTML source to test.
* @param string $expected The expected return value.
* @throws ReflectionException If it's not possible to create a reflection to call the private method.
*/
public function test_check_attr_spec_rule_valid_url( $source, $expected ) {
$node_tag_name = 'a';
$dom = AMP_DOM_Utils::get_dom_from_content( $source );
$sanitizer = new AMP_Tag_And_Attribute_Sanitizer( $dom );
$node = $dom->getElementsByTagName( $node_tag_name )->item( 0 );
$attr_name = 'baz';
$attr_spec_rule = [ 'value_url' => [] ];

$this->assertEquals(
$expected,
$this->call_private_method( $sanitizer, 'check_attr_spec_rule_valid_url', [ $node, $attr_name, $attr_spec_rule ] ),
sprintf( 'using source: %s', $source )
);
}

/**
* @dataProvider get_check_attr_spec_rule_blacklisted_value_regex
* @group allowed-tags-private-methods
Expand All @@ -1429,7 +1489,7 @@ public function test_check_attr_spec_rule_blacklisted_value_regex( $data, $expec

public function get_check_attr_spec_rule_allowed_protocol() {
return [
'no_attributes' => [
'no_attributes' => [
[
'source' => '<div></div>',
'node_tag_name' => 'div',
Expand All @@ -1438,7 +1498,7 @@ public function get_check_attr_spec_rule_allowed_protocol() {
],
AMP_Rule_Spec::NOT_APPLICABLE,
],
'protocol_pass' => [
'protocol_pass' => [
[
'source' => '<div attribute1="http://example.com"></div>',
'node_tag_name' => 'div',
Expand All @@ -1454,7 +1514,23 @@ public function get_check_attr_spec_rule_allowed_protocol() {
],
AMP_Rule_Spec::PASS,
],
'protocol_multiple_pass' => [
'protocol_pass_leading_space' => [
[
'source' => '<div attribute1=" http://example.com"></div>',
'node_tag_name' => 'div',
'attr_name' => 'attribute1',
'attr_spec_rule' => [
'value_url' => [
'protocol' => [
'http',
'https',
],
],
],
],
AMP_Rule_Spec::PASS,
],
'protocol_multiple_pass' => [
[
'source' => '<div attribute1="http://example.com, https://domain.com"></div>',
'node_tag_name' => 'div',
Expand All @@ -1470,7 +1546,7 @@ public function get_check_attr_spec_rule_allowed_protocol() {
],
AMP_Rule_Spec::PASS,
],
'protocol_fail' => [
'protocol_fail' => [
[
'source' => '<div attribute1="data://example.com"></div>',
'node_tag_name' => 'div',
Expand All @@ -1486,7 +1562,7 @@ public function get_check_attr_spec_rule_allowed_protocol() {
],
AMP_Rule_Spec::FAIL,
],
'protocol_multiple_fail' => [
'protocol_multiple_fail' => [
[
'source' => '<img srcset="http://example.com, data://domain.com">',
'node_tag_name' => 'img',
Expand All @@ -1502,7 +1578,7 @@ public function get_check_attr_spec_rule_allowed_protocol() {
],
AMP_Rule_Spec::FAIL,
],
'protocol_alternative_pass' => [
'protocol_alternative_pass' => [
[
'source' => '<div attribute1_alternative1="http://example.com"></div>',
'node_tag_name' => 'div',
Expand All @@ -1521,7 +1597,7 @@ public function get_check_attr_spec_rule_allowed_protocol() {
],
AMP_Rule_Spec::PASS,
],
'protocol_alternative_fail' => [
'protocol_alternative_fail' => [
[
'source' => '<div attribute1_alternative1="data://example.com"></div>',
'node_tag_name' => 'div',
Expand All @@ -1543,6 +1619,65 @@ public function get_check_attr_spec_rule_allowed_protocol() {
];
}

/**
* Gets the test data for test_parse_protocol().
*
* @return array The test data.
*/
public function get_parse_protocol_data() {
return [
'empty_string' => [
'',
false,
],
'only_space' => [
' ',
false,
],
'traditional_https' => [
'https://example.com',
'https',
],
'leading_space' => [
' https://foo',
'https',
],
'trailing_space' => [
'https://foo.com ',
'https',
],
'no_colon' => [
'//image.png ',
false,
],
'two_colons' => [
'foo:baz://image.png ',
'foo:baz',
],
];
}

/**
* Tests parse_protocol.
*
* @dataProvider get_parse_protocol_data
* @group allowed-tags-private-methods
* @covers AMP_Tag_And_Attribute_Sanitizer::parse_protocol()
*
* @param array $url The URL to parse.
* @param string $expected The expected return value.
* @throws ReflectionException If it's not possible to create a reflection to call the private method.
*/
public function test_parse_protocol( $url, $expected ) {
$dom = AMP_DOM_Utils::get_dom_from_content( '' );
$sanitizer = new AMP_Tag_And_Attribute_Sanitizer( $dom );

$this->assertEquals(
$expected,
$this->call_private_method( $sanitizer, 'parse_protocol', [ $url ] )
);
}

/**
* @dataProvider get_check_attr_spec_rule_allowed_protocol
* @group allowed-tags-private-methods
Expand Down