Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Experimental: Add sandboxing levels #6546

Merged
merged 90 commits into from
Oct 13, 2021
Merged

Conversation

westonruter
Copy link
Member

@westonruter westonruter commented Aug 18, 2021

Summary

Fixes #6443.

This PR introduces a new Sandboxing Level section which is available in the Standard template mode drawer:

image

Given the example of Ninja Forms, which is a forms plugin that requires scripts to render a form, compare the following two sandboxing levels:

Strict Sandboxing (Valid AMP) and Moderate Sandboxing 👎 Non-AMP and AMP w/ Loose Sandboxing 👍
image image

Notice how where previously the form would have failed to render entirely on an validAMP page, whereas now with the Loose Sandboxing level the custom scripts required by Ninja Forms are kept by default, allowing the form to work normally:

ninja-forms-functional.mov

Loose Sandboxing

When there are custom scripts on the page in the loose sandboxing level:

  1. All CSS processing is disabled, leaving external stylesheet link elements, style elements, and style attributes as-is (including !important qualifiers). Since tree-shaking cannot be safely performed, it is likely preferable to not have a massive inline style of all concatenated CSS. CSS tree shaking is disabled and inline style attributes and !important qualifiers are not transformed.
  2. Native video is used (with dimensions added).
  3. Native img is used (with dimensions added).
  4. Native audio is used.
  5. Native iframe is used.
  6. Native POST forms are used rather than converting to use amp-form (including comments form).
  7. WordPress's built-in comment-reply script is used for comment threading instead of the amp-bind implementation.
  8. Bento versions are preferred for any any other AMP components which may have been added to the page.
  9. Custom scripts are retained and noscript elements are not unwrapped.
  10. Any otherwise invalid AMP markup is kept by default (and will be reported as validation errors in AMP Developer Tools).

As seen in the Ninja Forms example above, this allows non-AMP scripting to work on a page that still uses the AMP plugin's sanitizers and also goes through the AMP Optimizer.

👉 The fundamental goals of the loose sandboxing level is that it will allow us to:

  1. Default the template mode to Standard as opposed to Reader.
  2. Not break pages by default when they have invalid AMP markup on them.

Moderate Sandboxing

When a page doesn't have custom scripts on it (or rather the custom scripts have been marked as being PX-verified, see below), then a more restrictive set of sanitization constraints are applied on the page:

  1. CSS tree shaking remains enabled, but inline style attributes and !important qualifiers are not transformed.
  2. Element conversions are performed for video, audio, and iframe.
  3. Bento versions of components are preferred.
  4. Native POST forms are allowed.
  5. WordPress's built-in comment-reply script is used for comment threading instead of the amp-bind implementation (which is not feature-complete and the comment-reply script does not negatively impact PX).
  6. Custom scripts are removed and noscript elements are unwrapped.
  7. Any otherwise invalid AMP markup is removed by default (and will be reported as validation errors in AMP Developer Tools).

The intention here is that the PX characteristics for moderate sandboxing should be more or less the same as for strict sandboxing. The main difference in terms of PX is that the moderate sandboxing level is not eligible for AMP Cache. This level allows for external POST forms to not break on AMP pages, as is a big pain point currently on valid AMP pages (ampproject/amphtml#27638), so much so that we have a documentation page all about it. For example, the Revue Newsletter Signup block in Jetpack shows an error message when on a valid AMP page (strict sandboxing), whereas in moderate sandboxing the form submission opens a new window as expected:

Strict Sandboxing (Valid AMP) 👎 Moderate Sandboxing 👍
image image

Strict Sandboxing

The strict sandboxing level is the same as the current behavior today, where the AMP plugin removes any AMP-invalid markup by default to generate a valid AMP page.

Dynamic Sandboxing Levels

Even when a user selects the loose sandboxing level, if it turns out that on a given page there are no custom scripts and no POST forms except for the comments form, the page will automatically be upgraded to be valid AMP since there is nothing blocking it from doing so. In the same way, if on another page there are no custom scripts and yet there is an external POST form for newsletter signups, the page will be served at the moderate sandboxing level. In this way, even when the user has selected the loose sandboxing level, the majority of pages on a site could still be valid AMP. For example, all single blog posts could be valid AMP, while the shopping cart page may be at the loose level and the newsletter sign up page may be at the moderate level.

In this way, the chosen sandboxing level can be thought of as the default minimum sandboxing level of PX conformance. The actual sandboxing level applied is the effective level.

The dynamic levels work both ways as well. As noted above, If you are in the moderate sandboxing level, any invalid AMP markup is removed by default and validation errors will be raised. If you mark any of those validation errors as kept, then the page's level will go down from moderate to loose.

The effective level is currently displayed with the AMP admin bar menu item with the 1️⃣, 2️⃣, or 3️⃣ emoji.

New attributes to demarcate non-AMP markup

One thing that has become clear is that the data-ampdevmode attribute is highly overloaded. It can mean three things:

  1. The document is in dev mode (when on the html element).
  2. An element should be exempted from validation (when the attribute is present on an element).
  3. All attributes of an element should be exempted from validation (when one of them is the attribute).

This overloading causes some ambiguity for what is intended. For example, currently the form sanitizer skips processing a form that has the dev mode attribute. When the new script sanitizer runs first and an onsubmit attribute is kept, this currently results in data-ampdevmode attribute added. When the form sanitizer then runs later, it skips processing of the form altogether even though the only intention was to exempt the onsubmit attribute from validation. The result is some validation errors not being raised when they should be, and thus some unexpected behaviors occur.

The ambiguity of dev mode also extends to what the intention is for why a given bit of markup is being kept on the page. Is it being kept just for "dev mode" in that the user is a developer or admin user who is administering the page? Or is dev mode being used as a hack to prevent the AMP plugin from stripping out the markup from the page? On another level, is the invalid AMP markup being exempted with dev mode because AMP forbids it due to it negatively impacting the page, or is it being exempted because it actually doesn't hurt PX but it just isn't allowed in the AMP format (yet)?

For these reasons, this PR introduces 2 sets of attributes:

  1. Flagging elements to skip AMP validation (which may also negatively impact PX):
  • data-amp-unvalidated-tag: Prevents a tag as a whole from being removed.
  • data-amp-unvalidated-attrs: Prevents specific attributes from being removed. The value is a token list of attribute names.
  1. Flagging elements to skip AMP validation but which have been verified to be good for PX:
  • data-px-verified-tag: Prevents a tag as a whole from being removed.
  • data-px-verified-attrs: Prevents specific attributes from being removed. The value is a token list of attribute names.

The latter set of attributes are particularly useful on custom scripts which are interacting with the APIs for Bento components.

These attributes will suppress validation errors from being reported in the validation screens. And when invalid markup causing a validation error is marked as “kept”, either the data-amp-unvalidated-tag or data-amp-unvalidated-attrs attribute will be added during post-processing. For example, if there was an onload attribute on the body and you indicate you want the markup kept, the resulting page would contain:

<body data-amp-unvalidated-attrs="onload" onload="alert('Hi!!!')">

Sanitizer ordering

Prior to this PR, the only requirement was for the following sanitizers to run at the end:

  1. AMP_Layout_Sanitizer
  2. AMP_Style_Sanitizer
  3. AMP_Meta_Sanitizer
  4. AMP_Tag_And_Attribute_Sanitizer

When a user filtered amp_content_sanitizers to add additional sanitizers, they could appear in any order among the previous sanitizers. This could have resulted in problems, specifically for the AMP_Script_Sanitizer. In particular, sanitizers like AMP_Embed_Sanitizer and AMP_O2_Player_Sanitizer are removing script elements from the page (e.g. Twitter's widgets.js). If AMP_Script_Sanitizer were to run first, then it would find these custom scripts too early and could result in a page being downgraded to L1, even though the AMP_Embed_Sanitizer would be converting the tweets to amp-twitter already. Therefore, this PR ensures that all of the plugin's sanitizers run in the order they are defined prior to filtering with the exception of the the embed sanitizers (AMP_Embed_Sanitizer, AMP_O2_Player_Sanitizer, AMP_Playbuzz_Sanitizer) and any other user-supplied sanitizers which will get run before all the others. This will ensure consistency in the sanitized output of the page as well when plugins are filtering amp_content_sanitizers (either by appending or prepending sanitizers to the array).

The sanitizer ordering is particularly important now because when a sanitizer encounters markup that impacts the sandboxing level, it may need to change the arguments of sanitizers which haven't run yet.

The final sanitizers now are forced to run in the following order:

  1. AMP_Core_Theme_Sanitizer
  2. AMP_Script_Sanitizer
  3. AMP_Form_Sanitizer
  4. AMP_Comments_Sanitizer
  5. AMP_Srcset_Sanitizer
  6. AMP_Img_Sanitizer
  7. AMP_Video_Sanitizer
  8. AMP_Audio_Sanitizer
  9. AMP_Object_Sanitizer
  10. AMP_Iframe_Sanitizer
  11. AMP_Gallery_Block_Sanitizer
  12. AMP_Block_Sanitizer
  13. AMP_Accessibility_Sanitizer
  14. AMP_Layout_Sanitizer
  15. AMP_Style_Sanitizer
  16. AMP_Meta_Sanitizer
  17. AMP_Tag_And_Attribute_Sanitizer

Dynamic Bento Experiment Opt-in

Previously with #6353, when amp_bento_enabled was filtered on, the JS code to enable the bento AMP experiment was always added. This is no longer the case. The experiment script will only be added to the page if there's actually a Bento component on it. When a Bento version is available for a used component on the page, the Bento version will be used when either amp_bento_enabled is filtered to be true or the page is being served in the loose or moderate sandboxing levels.

Revamped Comments Handling

The comments sanitizer has been completely rewritten in how it implements amp-bind comment threading. The code was quite old and it turned out there was a lot that wasn't needed anymore. In particular, a lot of code seemed to be a workaround for the clear action not working as expected on forms, however now it is working as expect. More to the point of sandboxing levels, this PR discontinues splitting up the responsibilities for making comments AMP-compatible between the AMP_Theme_Support class (during template rendering) and the AMP_Comments_Sanitizer (during post-processing). Now everything is done during post-process in in the AMP_Comments_Sanitizer and this allows it to dynamically switch between WP-native comment-reply and the amp-bind implementation based on what sandboxing level the page is at (which can only be determined at post-processing time).

Moving the logi to the post-processing phase also allowed for a full fix of #2489, going beyond the short-term fix introduced in #4388.

Upon clicking a link to reply to a comment by “admin”:

Non-AMP w/ JS Non-AMP w/o JS AMP Before 👎 AMP After 👍
image image image image

Commits first added in sub-PR #6546.
Also fixes #6231.
Fixes #4624.

Testing

Since it is experimental, you have to opt-in via the following plugin code (or use example plugin):

add_filter( 'amp_experimental_sandboxing_enabled', '__return_true' );

Choose the loose sandboxing level. The chosen sandboxing level is indicated in the meta generator tag, followed by the effective sandboxing level for the given page (here chosen is 1 and effective is 3):

<meta name="generator" content="AMP Plugin v2.2.0-alpha; mode=standard; sandboxing-level=1:3">

I've put together three example posts that demonstrate the sandboxing levels. For each post, obtain the post content from the Gist linked to in the heading, and add a comment to each of the posts. When in the loose sandboxing level, each page should result in the aforementioned dynamic sandboxing levels where a page that has nothing blocking it from being valid AMP will automatically upgrade to strict even though the loose sandboxing level is selected.

Loose Moderate Strict
loose moderate strict

This PR adds a UI which exposes the functionality that has been added in:

Key Todos

  • Add PX-verified to amp custom style and style attributes that have importants.
    • If none were on the page, then still we can serve in Strict.
  • Mark native images, audio, forms, video as Px-verified.
  • Add allow_excessive_css arg the Style sanitizer which is automatically set when in lower sandboxing modes? We should not rely on standard mode as being the suppressor of excessive css errors.
  • Data-bento attribute to indicate scripts are specifically bento-friendly? (Yes, this is data-px-verified.)
  • Per-script flag to say tree shaking ok? (This is currently data-px-verified.)
  • Show effective sandbox level in admin bar and meta generator. We need a clear way of knowing which level we're in, in a way separate from the attributes? Actually, we can simply indicate Level 1 if there are any amp-unvalidated attributes. If not, then Level 2 if there are any px-verified attributes. Otherwise, it's Level 3 (even when dev mode)
    • Add a method to ValidationExemption which returns the sandbox level: 1, 2, 3.
    • Maybe these methods should rather be put in the Sandboxing class?
  • When in level 1, instead of inlining all styles without tree-shaking, opt to instead just leave the external stylesheets as-is.
  • Custom theme that isn't AMP-compatible should be checked.
  • Compare loose level and with the AMP plugin turned off. Check improvement.

Future Todos

  • Address feedback from @pierlon in Experimental: Add sandboxing levels #6546 (review).
  • Address findings in Experimental: Add sandboxing levels #6546 (comment).
  • Add validation error when a external post form present?
  • Raise validation error when native post form on page, keep by default in non-strict? Allow remove to upgrade to native? The key thing is that it would have to be done for ALL forms on the page, not just the one.
  • Should the sandboxing level be overridden when the author supplies the attributes up front?
  • If wondering why the level is chosen, you can add outlines around elements that have the attributes.
  • Should Developer Tools show the elements that have validation exemptions and who caused them?
  • If in Strict level, should the amp-unvalidated attributes be respected? Yes, because the user can supply them already.
  • Should validation exceptions only be honored if sandboxing experiment is enabled?
  • Instead of data-amp-unvalidated-* consider data-px-unverified-* or data-px-exempted-*.
  • Or… instead of adding attributes to elements to exempt from sanitization, what if we instead stored the spl_hash of the node and then used that in the lookup?
  • Attribute on script tags to indicate what selectors are tree shaking.
  • If custom scripts are allowed, should MediaElement.js be allowed as well? Or should we go ahead with our Bento versions? (Core blocks make use of native audio/video instead of ME.js)

Checklist

  • My code is tested and passes existing tests.
  • My code follows the Engineering Guidelines (updates are often made to the guidelines, check it out periodically).

@westonruter westonruter added this to the v2.2 milestone Aug 18, 2021
@westonruter westonruter marked this pull request as ready for review August 18, 2021 06:17
@westonruter
Copy link
Member Author

(Marking as ready for review to trigger builds.)

@westonruter westonruter changed the title Experimental: Add sandboxing levels [WIP] Experimental: Add sandboxing levels Aug 18, 2021
@codecov
Copy link

codecov bot commented Aug 18, 2021

Codecov Report

Merging #6546 (faaf295) into develop (68bccbf) will increase coverage by 0.50%.
The diff coverage is 96.80%.

Impacted file tree graph

@@              Coverage Diff              @@
##             develop    #6546      +/-   ##
=============================================
+ Coverage      76.63%   77.13%   +0.50%     
- Complexity      6318     6420     +102     
=============================================
  Files            248      250       +2     
  Lines          19812    20121     +309     
=============================================
+ Hits           15182    15521     +339     
+ Misses          4630     4600      -30     
Flag Coverage Δ
javascript 79.06% <ø> (ø)
php 77.05% <96.80%> (+0.53%) ⬆️
unit 77.05% <96.80%> (+0.53%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/AmpWpPlugin.php 100.00% <ø> (ø)
src/Sandboxing.php 78.66% <78.66%> (ø)
includes/class-amp-theme-support.php 86.65% <97.72%> (+0.31%) ⬆️
src/ValidationExemption.php 98.33% <98.33%> (ø)
includes/amp-helper-functions.php 84.53% <100.00%> (-0.11%) ⬇️
includes/sanitizers/class-amp-audio-sanitizer.php 94.38% <100.00%> (+0.33%) ⬆️
includes/sanitizers/class-amp-base-sanitizer.php 73.22% <100.00%> (+0.98%) ⬆️
...cludes/sanitizers/class-amp-comments-sanitizer.php 95.72% <100.00%> (+15.72%) ⬆️
includes/sanitizers/class-amp-form-sanitizer.php 99.27% <100.00%> (+0.08%) ⬆️
includes/sanitizers/class-amp-iframe-sanitizer.php 96.07% <100.00%> (+0.13%) ⬆️
... and 16 more

@github-actions
Copy link
Contributor

github-actions bot commented Aug 18, 2021

Plugin builds for faaf295 are ready 🛎️!

Base automatically changed from add/custom-script-allowance to develop August 19, 2021 01:10
@westonruter
Copy link
Member Author

Rebasing onto develop now that #6528 has been merged.

@westonruter westonruter changed the title [WIP] Experimental: Add sandboxing levels Experimental: Add sandboxing levels Sep 3, 2021
@westonruter westonruter requested a review from pierlon September 3, 2021 21:33
@westonruter westonruter marked this pull request as draft September 3, 2021 22:32
@westonruter
Copy link
Member Author

Finally the coverage is ✅

@westonruter
Copy link
Member Author

westonruter commented Oct 5, 2021

  • Custom theme that isn't AMP-compatible should be checked.

I checked Hestia which is not AMP-compatible, and as expected it resulted in sandboxing level 1 due to the custom scripts:

Frontend Validated URL
image image

Because custom scripts are included and CSS processing is disabled in Level 1, the mobile menu works as expected:

mobile-menu.mov

Observation: The comment-reply.js script should not actually be reported as an error because it is PX-verified.
2.

  • Compare loose level and with the AMP plugin turned off. Check improvement.
AMP Disabled AMP Enabled
21 requests 20 requests
image image

Observations:

  1. HTML minification is being performed, thanks to AMP Optimizer.
  2. The AMP runtime v0.js is still being added to the page even when there are zero AMP components present. This should be getting omitted.
  3. The amp-default.css stylesheet should be omitted on level 1.
  4. The AMP runtime CSS (ampshared.css) is being inlined when it should likely be omitted (along with the AMP boilerplate).
  5. Nevertheless, there are fewer requests in non-AMP because the twemoji.js, wp-emoji.js, and wp-embed.js scripts are being omitted.
  • The first two are OK to be omitted since we're doing wp_staticize_emoji() instead. This is a performance win.
  • The last is not correct to be excluded unconditionally. When in Level 1, we should remove this script automatically from the page when there are no WordPress embed iframes (or else, use amp-wordpress-embed).
  1. The comment-reply.js script is lacking the expected defer attribute.
  2. The preconnect and dns-prefetch links are being added for the Google Fonts stylesheet (for fonts.gstatic.com).
  3. We should omit the preconnect for s.w.org if there are no emoji on the page.

With WP Super Cache enabled, I ran Lighthouse 5 times with AMP disabled and AMP enabled, with the last row being the average:

AMP Disabled AMP Enabled
82 81
80 75
84 76
85 82
84 82
83 79

So at the moment having AMP active is a net negative for an AMP-incompatible theme. I believe it will become a positive when the above observations are addressed. But they should probably be tackled in a new PR since this one is getting too large to review.

@westonruter westonruter requested a review from pierlon October 5, 2021 19:09
Copy link
Contributor

@pierlon pierlon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few comments to address, but functionality wise this looks ready and can be merged ✅.

includes/sanitizers/class-amp-style-sanitizer.php Outdated Show resolved Hide resolved
includes/sanitizers/class-amp-style-sanitizer.php Outdated Show resolved Hide resolved
* @param int|null $effective_sandboxing_level Effective sandboxing level.
*/
public function finalize_document( Document $dom, $effective_sandboxing_level ) {
$actual_sandboxing_level = AMP_Options_Manager::get_option( self::OPTION_LEVEL );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe bail if not in Standard mode? Or rather amp_is_canonical() should be added as a condition to Sandboxing::is_needed().

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's a good call. I'll add that to the future todos.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in #6715.

*
* @var int[]
*/
const LEVELS = [ 1, 2, 3 ];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thoughts on each sandbox level being it's own constant?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Problem: that would eliminate the the relevance of the comment on this line:

if ( $sandboxing_level < 3 ) { // <3 === ❤️

Joking aside, that makes sense. I'll add it to the future todos.

westonruter and others added 2 commits October 13, 2021 12:03
Co-authored-by: Pierre Gordon <[email protected]>
…oxing-levels

* 'develop' of github.com:ampproject/amp-wp: (41 commits)
  Bump eslint-plugin-jsdoc from 36.1.0 to 36.1.1
  Bump lint-staged from 11.1.2 to 11.2.3
  Use assertContains for array
  Use DependencyInjectedTestCase
  Add test assertions
  Update comments
  Update unit test cases
  Remove unwanted style, Fix notice style in mobile
  Revert some changes and update unit test case
  Bump @babel/core from 7.15.5 to 7.15.8
  Fix grammar in comment
  Bump css-minimizer-webpack-plugin from 3.0.2 to 3.1.1
  Apply suggestions from code review
  Remove changes from .github
  Bump postcss from 8.3.8 to 8.3.9
  Remove template change notice from other pages
  Remove template change notice from other pages
  Add unit test case
  Add data removal notice to plugin description
  Update unit test case
  ...
@westonruter westonruter enabled auto-merge October 13, 2021 19:19
@westonruter
Copy link
Member Author

JS unit tests are failing with:

[2021-10-13T19:27:04.159Z] ['error'] Error POSTing to https://codecov.io: 404 {'detail': ErrorDetail(string='Unable to locate build via Github Actions API. Please upload with the Codecov repository upload token to resolve issue.', code='not_found')}
[2021-10-13T19:27:04.160Z] ['error'] There was an error running the uploader: Error uploading to https://codecov.io: Error: Not Found
Error: Codecov: Failed to properly upload: The process '/home/runner/work/_actions/codecov/codecov-action/v2.1.0/dist/codecov' failed with exit code 255

Is it down? They say all systems operational.

@pierlon
Copy link
Contributor

pierlon commented Oct 13, 2021

JS unit tests are failing with...

Seems to be an issue with the uploader Codecov uses, but they now pass after restarting the jobs for some reason.

@pierlon
Copy link
Contributor

pierlon commented Oct 13, 2021

Aaand I don't know why it's not auto-merging, maybe GitHub is busy...

@westonruter westonruter disabled auto-merge October 13, 2021 20:53
@westonruter
Copy link
Member Author

I think it's because one of the reviewers hasn't approved, but I'm merging 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bento Changelogged Whether the issue/PR has been added to release notes. Sandboxing Experiment
Projects
None yet
3 participants