Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch to a whitelist sanitizer instead of blacklist #271

Closed
bcampeau opened this issue Feb 26, 2016 · 4 comments
Closed

Switch to a whitelist sanitizer instead of blacklist #271

bcampeau opened this issue Feb 26, 2016 · 4 comments
Labels
Enhancement New feature or improvement of an existing one

Comments

@bcampeau
Copy link
Contributor

I've found a number of AMP validation errors that are occurring after reviewing Google Webmaster Tools. Many are because AMP incompatible attributes like target="new" on hrefs and size to style elements are not being stripped out. There are other examples of user-generated markup in the post body that are failing validation.

I'm wondering if creating an allowed_html array from the AMP validator spec and pushing the body through wp_kses might address it? Open to ideas on a cleaner way to proceed if this is being worked on.

I'll probably just address a few by hand that are creating the bulk of the issues for us and definitely #130.

Related to #130, #253, #237.

@mjangda
Copy link
Contributor

mjangda commented Feb 26, 2016

Yeah, it's a bit of a pain.

A whitelist would definitely be better than the hunt-and-peck approach we have now. We had a bunch of problems with kses the first time around, which is why we ditched it (although, there are a bunch of problems with DOMDocument too; just can't win 💩).

Here's the full AMP spec.

I think there's a few challenges though:

  • Likely a fair bit of work involved to build out the whiltelist approach with the full spec.
  • I worry about how much slower it will be.
  • Keeping up with spec changes could get annoying (presumably, it won't change that often).

But like I said, definitely would be better than what we have in terms of completeness and validity.

@mjangda mjangda added the Enhancement New feature or improvement of an existing one label Mar 1, 2016
@bcampeau
Copy link
Contributor Author

bcampeau commented Mar 3, 2016

I've gone down the rabbit hole of reviewing a large sample of about 15,000 indexing errors on a very large site. I may put in a few basic fixes in the theme to clean out some common issues. However, more than ever, I definitely think the full whitelist based on the spec is the only way we'll ever get close to zero errors.

@mjangda
Copy link
Contributor

mjangda commented Mar 3, 2016

Lullabot's amp-library has a PHP version of the spec which we could potentially work with: https://github.com/Lullabot/amp-library/blob/master/src/Spec/validator-generated.php

@mjangda mjangda changed the title AMP incompatible HTML Switch to a whitelist instead of blacklist Aug 18, 2016
@mjangda mjangda changed the title Switch to a whitelist instead of blacklist Switch to a whitelist sanitizer instead of blacklist Aug 18, 2016
@westonruter
Copy link
Member

Whitelist sanitizer was implemented as of #600, with additional remaining work in #689.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement New feature or improvement of an existing one
Projects
None yet
Development

No branches or pull requests

3 participants