KSES: Allow Custom Data Attributes having dashes in their dataset name. #6429

Open
wants to merge 15 commits into base: trunk
Changes from 6 commits

70 changes: 63 additions & 7 deletions src/wp-includes/kses.php
@@ -1256,24 +1256,32 @@ function wp_kses_attr_check( &$name, &$value, &$whole, $vless, $element, $allowe
 	$allowed_attr = $allowed_html[ $element_low ];

 	if ( ! isset( $allowed_attr[ $name_low ] ) || '' === $allowed_attr[ $name_low ] ) {
+		$dataset_name = wp_kses_transform_custom_data_attribute_name( $name );
+
+		// Reject custom data attributes that don't fit a basic form.
+		$is_allowable_custom_attribute = (
+			isset( $dataset_name ) &&
+			1 === preg_match( '/^[a-z0-9_-]+$/i', $dataset_name )
+		);
+
 		/*
-		 * Allow `data-*` attributes.
+		 * Allow Custom Data Attributes (`data-*`).
 		 *
 		 * When specifying `$allowed_html`, the attribute name should be set as
 		 * `data-*` (not to be mixed with the HTML 4.0 `data` attribute, see
 		 * https://www.w3.org/TR/html40/struct/objects.html#adef-data).
 		 *
-		 * Note: the attribute name should only contain `A-Za-z0-9_-` chars,
-		 * double hyphens `--` are not accepted by WordPress.
+		 * Custom data attributes appear on an HTML element in the `dataset`
+		 * property and are available from JavaScript with a transformed name.
+		 *
+		 * @see https://html.spec.whatwg.org/#custom-data-attribute
 		 */
-		if ( str_starts_with( $name_low, 'data-' ) && ! empty( $allowed_attr['data-*'] )
-			&& preg_match( '/^data(?:-[a-z0-9_]+)+$/', $name_low, $match )
-		) {
+		if ( $is_allowable_custom_attribute && ! empty( $allowed_attr['data-*'] ) ) {
 			/*
 			 * Add the whole attribute name to the allowed attributes and set any restrictions
 			 * for the `data-*` attribute values for the current element.
 			 */
-			$allowed_attr[ $match[0] ] = $allowed_attr['data-*'];
+			$allowed_attr[ $name_low ] = $allowed_attr['data-*'];
 		} else {
 			$name = '';
 			$value = '';
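
To make the behavioral change in this hunk concrete, here is a minimal sketch comparing the removed pattern with a simplified stand-in for the new check. The names `sketch_old_check()` and `sketch_new_check()` are hypothetical and do not exist in WordPress. The stand-in applies the allowlist to the text after `data-` instead of to the transformed `dataset` name; that is an assumption made for brevity, on the reasoning that the transformation only drops hyphens and uppercases letters, so it cannot add or remove a disallowed character.

```php
<?php
// Hypothetical helpers for illustration only; not part of WordPress or this PR.

// The pattern removed by this hunk: "data" followed by one or more "-word" groups.
function sketch_old_check( string $name_low ): bool {
	return 1 === preg_match( '/^data(?:-[a-z0-9_]+)+$/', $name_low );
}

// Simplified stand-in for the new check (assumption: allowlist applied to the raw suffix).
function sketch_new_check( string $name ): bool {
	return 1 === preg_match( '/^data-[a-zA-Z0-9_-]+$/', $name );
}

$names = array(
	'data-foo',            // old: true,  new: true
	'datainvalid',         // old: false, new: false
	'data--double-dash',   // old: false, new: true
	'data-trailing-dash-', // old: false, new: true
	'data-two-hyphens',    // old: true,  new: true
);

foreach ( $names as $name ) {
	printf(
		"%-22s old=%s new=%s\n",
		$name,
		var_export( sketch_old_check( $name ), true ),
		var_export( sketch_new_check( $name ), true )
	);
}
```

The names are the ones used in `test_wp_kses_attr_data_attribute_is_allowed()`; the two that change from rejected to accepted are the leading double dash and the trailing dash.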
@@ -1311,6 +1319,54 @@ function wp_kses_attr_check( &$name, &$value, &$whole, $vless, $element, $allowe
return true;
}

/**
* If an attribute name represents a custom data attribute, return the
Contributor

Adds a summary so the details are all included in the description section of the developer docs (see wp_kses() as an example).

Suggested change
- * If an attribute name represents a custom data attribute, return the
+ * Convert attribute to JavaScript `dataset` form if allowed.
+ *
+ * If an attribute name represents a custom data attribute, return the

Member Author

@peterwilsoncc are you recommending a different wording of the summary, or worried that the existing summary won't appear in the docs? The existing summary is up to two lines at the start of the docblock, so this will appear as-written in the patch.

For example, class_name_updates_to_attributes_updates() also has a two-line summary.

Contributor

I'm mainly suggesting different, shorter, phrasing.

I thought the summaries were intended to be a one-liner but just re-read the doc standards and it turns out I was mistaken.

I'll leave this up to you.

Member Author

Reworded in dc859f1. It may take some more rewording 🤷‍♂️

* transformed name as it would appear in JavaScript, else return null.
*
* This function can be used to determine if an attribute name represents
* a custom data attribute, and it can be used as well to return what the
* name of the attribute would be in an element's `dataset` property when
* accessed from JavaScript.
*
* Example:
*
* 'postId' === wp_kses_transform_custom_data_attribute_name( 'data-post-id' );
* null === wp_kses_transform_custom_data_attribute_name( 'post-id' );
*
* @since 6.6.0
*
* @see https://html.spec.whatwg.org/#concept-domstringmap-pairs
*
* @param string $raw_attribute_name Raw attribute name as found in the source HTML.
* @return string|null Transformed `dataset` name, if valid, else `null`.
*/
function wp_kses_transform_custom_data_attribute_name( $raw_attribute_name ) {
if ( 1 !== preg_match( '~^data-(?P<custom_name>[^=/> \t\f\r\n]+)$~', $raw_attribute_name, $matches ) ) {
return null;
}

$custom_name = $matches['custom_name'];

/*
* > For each name in list, for each U+002D HYPHEN-MINUS character (-)
* > in the name that is followed by an ASCII lower alpha, remove the
* > U+002D HYPHEN-MINUS character (-) and replace the character that
* > followed it by the same character converted to ASCII uppercase.
*
* @link https://html.spec.whatwg.org/#concept-domstringmap-pairs
*/
$custom_name = preg_replace_callback(
'/-[a-z]/',
static function ( $dash_matches ) {
// Transforms "-a" -> "A".
return strtoupper( $dash_matches[0][1] );
},
$custom_name
);

return $custom_name;
}

/**
* Builds an attribute list from string containing attributes.
*
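
The renaming rule quoted from the HTML spec in the new function can be tried outside WordPress with a short standalone sketch. `sketch_dataset_name()` is a hypothetical name and this is not the function added in this PR. It assumes the name is first lowercased (HTML parsers lowercase ASCII letters in attribute names), and it skips the character checks the PR performs when recognizing an attribute.

```php
<?php
// Hypothetical helper for illustration only; not the function added in this PR.
function sketch_dataset_name( string $attribute_name ): ?string {
	// Assumption: mimic an HTML parser by lowercasing ASCII letters first.
	$name = strtolower( $attribute_name );

	// Anything without the "data-" prefix is not a custom data attribute.
	if ( 0 !== strncmp( $name, 'data-', 5 ) ) {
		return null;
	}

	// Drop each "-" that precedes an ASCII lowercase letter and uppercase that letter.
	return preg_replace_callback(
		'/-([a-z])/',
		static function ( $m ) {
			return strtoupper( $m[1] );
		},
		substr( $name, 5 )
	);
}

var_dump( sketch_dataset_name( 'data-post-id' ) );          // string(6) "postId"
var_dump( sketch_dataset_name( 'data--before' ) );          // string(6) "Before"
var_dump( sketch_dataset_name( 'data-after-' ) );           // string(6) "after-"
var_dump( sketch_dataset_name( 'data-wp-bind--enabled' ) ); // string(14) "wpBind-Enabled"
var_dump( sketch_dataset_name( 'post-id' ) );               // NULL
```

A hyphen that is not followed by a lowercase letter is kept, which is why the trailing dash and the first of two consecutive dashes survive in the output.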
65 changes: 63 additions & 2 deletions tests/phpunit/tests/kses.php
@@ -1362,12 +1362,73 @@ public function data_safecss_filter_attr() {
* @ticket 33121
*/
public function test_wp_kses_attr_data_attribute_is_allowed() {
-$test = '<div data-foo="foo" data-bar="bar" datainvalid="gone" data--invalid="gone" data-also-invalid-="gone" data-two-hyphens="remains">Pens and pencils</div>';
-$expected = '<div data-foo="foo" data-bar="bar" data-two-hyphens="remains">Pens and pencils</div>';
+$test = '<div data-foo="foo" data-bar="bar" datainvalid="gone" data--double-dash="retained" data-trailing-dash-="allowable" data-two-hyphens="remains">Pens and pencils</div>';
+$expected = '<div data-foo="foo" data-bar="bar" data--double-dash="retained" data-trailing-dash-="allowable" data-two-hyphens="remains">Pens and pencils</div>';
Member

I'd like to see cases covering `data--leading`, `data-trailing-`, and `data-middle--double`. The `--` together in the middle is missing.

Contributor

With this change it would be good to convert this to use a data provider and add a bunch more valid and invalid use cases.

It probably should have been a data provider all along, but please do not look into the history of this feature to figure out who didn't do that in the first place. ;)

Member Author

Converted in 5294a86


$this->assertSame( $expected, wp_kses_post( $test ) );
}

/**
* Ensures proper recognition of a data attribute and how to transform its
* name into what JavaScript code would read from an element's `dataset`.
*
* @ticket 61052
*
* @dataProvider data_possible_custom_data_attributes_and_transformed_names
*
* @param string $attribute_name Raw attribute name.
* @param string|null $dataset_name_if_any Transformed attribute name, or `null`
* if not a custom data attribute.
*/
public function test_wp_kses_transform_custom_data_attribute_name_recognizes_data_attributes( $attribute_name, $dataset_name_if_any ) {
$transformed_name = wp_kses_transform_custom_data_attribute_name( $attribute_name );

if ( isset( $dataset_name_if_any ) ) {
$this->assertNotNull(
$transformed_name,
"Failed to recognize '{$attribute_name}' as a custom data attribute."
);

$this->assertSame(
$dataset_name_if_any,
$transformed_name,
'Improperly transformed custom data attribute name.'
);
} else {
$this->assertNull(
$transformed_name,
"Should not have identified '{$attribute_name}' as a custom data attribute."
);
}
}

/**
* Data provider.
*
* @return array[].
*/
public static function data_possible_custom_data_attributes_and_transformed_names() {
return array(
// Non-custom-data attributes.
'Normal attribute' => array( 'post-id', null ),
'Single word' => array( 'id', null ),

// Normative custom data attributes.
'Normal custom data attribute' => array( 'data-post-id', 'postId' ),
'Leading dash' => array( 'data--before', 'Before' ),
'Trailing dash' => array( 'data-after-', 'after-' ),
'Double-dashes' => array( 'data-wp-bind--enabled', 'wpBind-Enabled' ),
'Triple-dashes' => array( 'data---one---two---', '-One--Two---' ),

// Unexpected but recognized custom data attributes.
'Only comprising a prefix' => array( 'data-', '' ),
'With upper case ASCII' => array( 'data-Post-ID', 'postId' ),
'With Unicode whitespace' => array( "data-\u{2003}", "\u{2003}" ),
'With Emoji' => array( 'data-🐄-pasture', '🐄Pasture' ),
'Brackets and colon' => array( 'data-[wish:granted]', '[wish:granted]' ),
Contributor

WordPress can be stricter than the specs and browser implementations (especially the latter if they differ from the former); to an extent, that's the entire purpose of KSES.

I don't think accounting for special characters is overly wise as it's too easy to miss situations in which a contributor or author role can break out of the intended behaviour.

Member Author

thanks @peterwilsoncc! in the code for this PR there are two stages: the first is determining if there's a data attribute, and the second is locking down a smaller allowable subset. these tests, and this function, are there to ensure that we can all agree on what is a data attribute before we apply further constraints.

it doesn't have to be this way, of course, but I find that it's really easy to let things slide through when we conflate the two ideas. in this particular case it looks like everything else should be eliminated in wp_kses_attr_check() other than the specific data attributes, but I also had to double and triple check the logic to make sure we weren't letting through some data attributes only because they didn't fit the pattern we expected.

so this was done for the reason of making it less likely to let something slip through. it doesn't have to be done this way, but it was the purpose for the structuring I made.

Contributor

> so this was done for the reason of making it less likely to let something slip through. it doesn't have to be done this way, but it was the purpose for the structuring I made.

Thanks @dmsnell that makes a lot of sense. 👯

);
}

/**
* Ensure wildcard attributes block unprefixed wildcard uses.
*
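
The two stages discussed in the review thread on the data provider (first recognize a custom data attribute, then restrict to a smaller allowable subset) can be sketched as two separate functions. All names here are hypothetical and this is not WordPress code. Stage one reuses the recognition pattern from the diff; stage two applies the `[a-z0-9_-]` allowlist to the raw suffix, which is a simplification of the PR's check against the transformed `dataset` name.

```php
<?php
// Hypothetical helpers for illustration only; not part of WordPress or this PR.

// Stage 1: is this a custom data attribute at all? Returns the text after "data-", or null.
function sketch_custom_data_suffix( string $name ): ?string {
	if ( 1 !== preg_match( '~^data-([^=/> \t\f\r\n]+)$~', $name, $matches ) ) {
		return null;
	}
	return $matches[1];
}

// Stage 2: of the recognized attributes, keep only a conservative subset.
function sketch_classify( string $name ): string {
	$suffix = sketch_custom_data_suffix( $name );
	if ( null === $suffix ) {
		return 'not a data attribute';
	}
	if ( 1 !== preg_match( '/^[a-z0-9_-]+$/i', $suffix ) ) {
		return 'recognized but rejected';
	}
	return 'allowed';
}

foreach ( array( 'post-id', 'data-post-id', 'data--double-dash', 'data-[wish:granted]' ) as $name ) {
	printf( "%-20s %s\n", $name, sketch_classify( $name ) );
}
```

Keeping the stages apart means a name such as `data-[wish:granted]` is rejected by an explicit allowlist decision, not because it happened to fall outside a single combined pattern.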