Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add MinifyHtml transformer #588

Merged
merged 6 commits into from
Jan 28, 2020
Merged

Add MinifyHtml transformer #588

merged 6 commits into from
Jan 28, 2020

Conversation

sebastianbenz
Copy link
Collaborator

  • collapses whitespace (except in
    , <textarea>, <script>
  • trims whitespace (except in
    , <textarea>)
  • removes comments except if they match a given pattern
  • minifies JSON
  • minifies inline amp-script using terser

* collapses whitespace (except in <pre>, <textarea>, <script>
* trims whitespace (except in <pre>, <textarea>)
* removes comments except if they match a given pattern
* minifies JSON
* minifies inline amp-script using terser
Copy link
Member

@andreban andreban left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - This is great - I always thought we should try comparing DOM Trees in the tests, but I think I was just proven wrong by Minify :D

child.data = JSON.stringify(JSON.parse(child.data), null, '');
} catch (e) {
// invalid JSON
this.log.warn('Invalid JSON', child.data);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the JSON is invalid, does it still parse in the browser? Just wondering if this somehow fail more explicitly if that's the case (or maybe not in the scope of optimizer?)

* MinifyHtml - minifies files size by:
*
* - minifying inline JSON
* - minifying inline amp-script using https://www.npmjs.com/package/terser
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the <style amp-custom> tag already assumed to be minified?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This currently happens in the SeparateKeyframes as we're already parsing the CSS there.

* - minifying inline amp-script using https://www.npmjs.com/package/terser
* - collapsing whitespace outside of pre, script, style and area.
* - removing comments
*
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's a list of additional transformations you could consider:

  • removing quotes where they are not needed (lang="en_US" => lang=en_US)
  • removing empty attribute values (data-bool-flag="" => data-bool-flag)
  • removing unneeded attribute values (disabled="disabled" => disabled)
  • removing redundant attribute values where the values is the default (<input type=number step=1> => <input type=number>)
  • removing optional tags: https://html.spec.whatwg.org/multipage/syntax.html#optional-tags
  • decoding HTML entities into Unicode characters (&copy; => ©)

Probably not needed within an Amp context:

  • removing type="text/javascript" from <script> tags
  • removing type="text/css" from <style> tags

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestions! I've filed #594

Some things are already handled by the parser (e.g. removing empty attribute values)

}

canCollapseWhitespace(tagName) {
return !/^(?:script|style|pre|textarea)$/.test(tagName);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure how important performance is here, but as this runs over every loop, it might be worthwhile to test using direct string comparisons here instead of a regex.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just tested this, non-regex is 25% faster. Addressed in 62c343e

}

canTrimWhitespace(tagName) {
return !/^(?:pre|textarea)$/.test(tagName);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure how important performance is here, but as this runs over every loop, it might be worthwhile to test using direct string comparisons here instead of a regex.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed in 62c343e


minifyScriptNode(node, opts) {
const isJson = this.isJson(node);
const isAmpScript = this.isInlineAmpScript(node);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't one of these exclude the other? No need to process the second if the first is already true.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed in 62c343e

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants