Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for custom non-AMP scripts #6528

Merged
merged 29 commits into from
Aug 19, 2021
Merged

Conversation

westonruter
Copy link
Member

@westonruter westonruter commented Aug 11, 2021

See #6443.

This pull request adds an opt-in to the AMP_Script_Sanitizer to do the sanitization of scripts itself, rather than list limiting itself to unwrapping noscript elements. The new behavior is controlled by the sanitize_scripts sanitizer arg, which is false by default but which can be enabled as follows:

add_filter(
	'amp_content_sanitizers',
	function ( $sanitizers ) {
		$sanitizers[ AMP_Script_Sanitizer::class ]['sanitize_scripts'] = true;
		return $sanitizers;
	}
);

When enabled, three new validation errors can be emitted by the script sanitizer:

  • CUSTOM_INLINE_SCRIPT: A custom inline script is encountered.
  • CUSTOM_EXTERNAL_SCRIPT: A custom external script is encountered (not loaded from the AMP CDN).
  • CUSTOM_EVENT_HANDLER_ATTR: An event handler attribute is encountered (e.g. onclick).

When activating this Test Custom Scripts mini plugin and adding a [custom_scripts] shortcode to a post, the validation errors appear as follows:

image

With this test plugin, when the scripts are removed the resulting output is as follows:

image

Notice how “Loading...” never changes since the script was removed and that the noscript was unwrapped. If if the invalid markup for the validation errors is marked as kept however:

image

Then the result on the frontend is as follows:

image

The effect of keeping a custom script is as follows:

  1. The script is not removed (obviously), and the data-ampdevmode attribute is added to prevent removal by the AMP_Tag_And_Attribute_Sanitizer.
  2. The noscript unwrapping behavior is disabled since this could likely result in duplication of JS and non-JS functionality.
  3. CSS tree shaking is disabled, allowing the green background color to be applied with style rule having a selector mentioning a JS-added class name. (The CSS max byte count enforcement is likewise lifted, via Allow CSS max byte count enforcement in TransformedIdentifier transformer to be configured  amp-toolbox-php#319.)
  4. Native img elements are used instead of amp-img, since AMP validation is not going to happen anyway. See Experiment: Add native img opt-in to prevent conversion to amp-img/amp-anim #6518.
  5. Likewise, native POST forms are also allowed without being converted to use action-xhr, as there may very well be scripts that are adding custom form validation logic on the page. See Add opt-in to prevent POST forms from being converted to amp-form (with action-xhr) #6527.

Note: The AMP_Script_Sanitizer is moved to the beginning of the list of sanitizers because it's likely that keeping custom scripts will result in other behavior changes to other sanitizers beyond disabling tree-shaking in the style sanitizer and using native images. The ability of a sanitizer to change the behavior of subsequently-run sanitizers is enabled by the new AMP_Base_Sanitizer::update_args() method that allows the arguments to be updated.

Unwrapping noscript elements

As noted above, when custom scripts are kept the noscript elements do not get unwrapped to avoid duplicated functionality (JS and non-JS fallback) from appearing on the page.

There is also now a unwrap_noscripts sanitizer argument that allows unwrapping to be turned off entirely, although it is enabled by default.

Lastly, when an individual noscript element has a data-amp-no-unwrap attribute it will be selectively skipped from being unwrapped. Fixes #6030.

Checklist

  • My code is tested and passes existing tests.
  • My code follows the Engineering Guidelines (updates are often made to the guidelines, check it out periodically).

@westonruter westonruter added this to the v2.2 milestone Aug 11, 2021
@westonruter westonruter self-assigned this Aug 11, 2021
@westonruter westonruter force-pushed the add/custom-script-allowance branch from 021b654 to c5ec8b8 Compare August 17, 2021 00:17
@westonruter westonruter force-pushed the add/custom-script-allowance branch from 1210af1 to d8b4227 Compare August 17, 2021 05:54
@westonruter westonruter force-pushed the add/custom-script-allowance branch from d8b4227 to 1578aba Compare August 17, 2021 06:01
@westonruter westonruter requested a review from pierlon August 17, 2021 23:45
@westonruter westonruter marked this pull request as ready for review August 17, 2021 23:45
@github-actions
Copy link
Contributor

github-actions bot commented Aug 17, 2021

Plugin builds for f35c4e9 are ready 🛎️!

includes/amp-helper-functions.php Outdated Show resolved Hide resolved
@@ -1531,7 +1533,11 @@ public static function ensure_required_markup( Document $dom, $script_handles =
}

// When opting-in to POST forms, omit the amp-form component entirely since it blocks submission.
if ( amp_is_native_post_form_allowed() && $dom->xpath->query( '//form[ @action and @method and translate( @method, "POST", "post" ) = "post" ]' )->length > 0 ) {
if (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No more checking if amp_is_native_post_form_allowed()?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, because now that the AMP_Script_Sanitizer can modify the args of AMP_Form_Sanitizer to enable the native_post_forms_allowed arg, the return value of amp_is_native_post_form_allowed() may not be accurate at this point. It's return value is used as the initial value for native_post_forms_allowed, but since it can be changed by a sanitizer then it's only relevant for setting the initial value in amp_get_content_sanitizers().

In fact, there's not really a need for a global amp_is_native_post_form_allowed() function anymore.

The only reason why the function was here was to offer a slight performance improvement to skip doing the XPath query. But now we are checking to see if the amp-form extension was identified instead.

* @since 2.2
*/
protected function sanitize_script_elements() {
$scripts = $this->dom->xpath->query( '//script[ not( @type ) or not( contains( @type, "json" ) ) ]' );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason for whitelisting type=json scripts? What about scripts containing HTML templates, for example?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason for skipping JSON scripts is that they have no logic that can mutate the page, which is the main concern for script sanitization. If there is JSON on the page, it's not going to cause issues with tree shaking, for example. Nor will it negatively impact PX. So this is specifically sanitizing JS scripts (which the method used to be called, and it could get renamed back).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed method and arg in dd49403 and 3070b83.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it.

So this is specifically sanitizing JS scripts (which the method used to be called, and it could get renamed back).

According to MDN, if the type attribute has any other value than module or a permitted JavaScript MIME type, the script will be treated as a data block, so I'm wondering if skipping only JSON scripts would be a shortsighted approach.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh right. I forgot about examples like <script type="text/plain">

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 03fb8b7

Comment on lines +288 to +296
Extension::POSITION_OBSERVER === $element->tagName
&&
Attribute::ONCE === $event_handler_attribute->nodeName
)
||
(
Extension::FONT === $element->tagName
&&
substr( $event_handler_attribute->nodeName, 0, 3 ) === 'on-'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch!

includes/sanitizers/class-amp-script-sanitizer.php Outdated Show resolved Hide resolved
includes/sanitizers/class-amp-script-sanitizer.php Outdated Show resolved Hide resolved

// When there are kept custom scripts, skip tree shaking since it's likely JS will toggle classes that have
// associated style rules.
// @todo There should be an attribute on script tags that opt-in to keeping tree shaking and/or to indicate what class names need to be included.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it be better to keep tree shaking enabled by default, and then provide an opt-in attribute to disable tree shaking?

Copy link
Member Author

@westonruter westonruter Aug 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thinking here is this, consider this common script in themes:

<script>document.documentElement.className = document.documentElement.className.replace( 'no-js', 'js' );</script>

If there are CSS rules such as:

.js .mobile-nav > ul {
    display: none;
}
.no-js .mobile-nav > ul {
    display: block;
}

The result is that with tree-shaking, the second rule will get stripped out because it is not aware of the className change by the script. So this is why tree shaking needs to be disabled by default. Nevertheless, there could be a way to opt-in to keeping tree shaking by adding an attribute to the script to indicate which class names it would be mutating:

<script data-mutation-classlist="no-js js">
document.documentElement.className = document.documentElement.className.replace( 'no-js', 'js' );
</script>

If such a hypothetical attribute were present, then those class names could be added to the AMP_Style_Sanitizer::$used_class_names and the style rule would not get stripped out by the tree shaker.

So that's the thinking behind this comment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah OK that makes sense 👍.

Comment on lines +870 to +871
// Capture the selector conversion mappings from the other sanitizers.
foreach ( $this->sanitizers as $sanitizer ) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason for moving this foreach loop from init() to sanitize()?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, because it allows us to gather up the selector mappings after all of the previous sanitizers have run, as opposed to gathering them before the other sanitizers have run. This is important for example if the script sanitizer enables native_img_used for the AMP_Img_Sanitizer, because when enabled no longer does the style sanitizer need to remap img to amp-img/amp-anim.

Copy link
Contributor

@pierlon pierlon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@bartoszgadomski
Copy link
Contributor

QA passed

I tested this with Test Custom Scripts plugin that Weston created.

Screenshot 2021-12-13 at 09 12 29

Sandboxing level Markup status Screenshot
Disabled Removed Screenshot 2021-12-13 at 09 13 54
Disabled Kept Screenshot 2021-12-13 at 09 13 00
Enabled (3: strict) Removed Screenshot 2021-12-13 at 09 18 08
Enabled (3: strict, automatically changed to 1: loose) Kept Screenshot 2021-12-13 at 09 21 09
Enabled (2: moderate) Removed Screenshot 2021-12-13 at 10 09 04
Enabled (2: moderate, automatically changed to 1: loose) Kept Screenshot 2021-12-13 at 10 09 44
Enabled (1: loose, automatically changed to 2: moderate) Removed Screenshot 2021-12-13 at 10 12 31
Enabled (1: loose) Kept Screenshot 2021-12-13 at 10 12 45

Things to note:

  • image element in 2: moderate sandboxing level with invalid markup removed is displayed improperly:

Screenshot 2021-12-13 at 10 16 05

@westonruter westonruter added the Changelogged Whether the issue/PR has been added to release notes. label Dec 14, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bento Changelogged Whether the issue/PR has been added to release notes. Sandboxing Experiment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Provide mechanism to prevent noscript unwrapping
3 participants