-
Notifications
You must be signed in to change notification settings - Fork 9.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
new_audit: add charset declaration audit #10284
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nicely done. on the overall, the audit looks great. just a few nits and ideas.
const Audit = require('../audit.js'); | ||
const i18n = require('../../lib/i18n/i18n.js'); | ||
const MainResource = require('../../computed/main-resource.js'); | ||
const CONTENT_TYPE_HEADER = 'content-type'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yesterday we moved these down to above the class dfn. so i think you have a few more commits to push
@@ -0,0 +1,84 @@ | |||
/** | |||
* @license Copyright 2016 Google Inc. All Rights Reserved. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- its a lighthouse reviewer's favorite thing to call out. 🤕
/** Title of a Lighthouse audit that provides detail on if the charset is set properly for a page. This title is shown when the charset is defined correctly. */ | ||
title: 'Properly defines charset', | ||
/** Title of a Lighthouse audit that provides detail on if the charset is set properly for a page. This title is shown when the charset meta tag is missing or defined too late in the page. */ | ||
failureTitle: 'Charset element is missing or occurs too late on the page', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
failureTitle: 'Charset element is missing or occurs too late on the page', | |
failureTitle: 'Charset declaration is missing or occurs too late in the HTML', |
/** Title of a Lighthouse audit that provides detail on if the charset is set properly for a page. This title is shown when the charset meta tag is missing or defined too late in the page. */ | ||
failureTitle: 'Charset element is missing or occurs too late on the page', | ||
/** Description of a Lighthouse audit that tells the user why the charset needs to be defined early on. */ | ||
description: 'A character encoding declaration is required whether it is done explicitly ' + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
slight rewording. let's remove mention of the BOM here, as I dont think we actually want to recommend it. the linked resource takes care of mentioning it anyway.
A character encoding declaration is required. It can be done with a
<meta>
tag in the first 1024 bytes of the HTML or in theContent-Type
HTTP response header.
*/ | ||
static audit(artifacts, context) { | ||
const devtoolsLog = artifacts.devtoolsLogs[Audit.DEFAULT_PASS]; | ||
return MainResource.request({devtoolsLog, URL: artifacts.URL}, context) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these days we'd write this with async await instead. a little nicer since you can drop the indentation below. i'd recommend it
// Check the http header 'content-type' to see if charset is defined there | ||
if (mainResource.responseHeaders) { | ||
const contentTypeHeader = mainResource.responseHeaders | ||
.find(header => header.name.toLowerCase() === CONTENT_TYPE_HEADER); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nothing too fancy about this const so i'd inline it here.
|
||
describe('Charset defined audit', () => { | ||
it('succeeds when the page contains the charset meta tag', () => { | ||
const finalUrl = 'https://example.com/'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we try to strike a weird balance between DRY and WET in our tests.
in this case i think this test would benefit from a getArtifacts()
helper method that all of these cases can use. that way it'd be a much easier to see at at glance how each of the artifacts differ.
if you search for artifacts(
in the lh-core/test/
folder you'll find a few diff tests that use this sort of pattern.
const MainResource = require('../../computed/main-resource.js'); | ||
const CONTENT_TYPE_HEADER = 'content-type'; | ||
const CHARSET_META_REGEX = /<meta.*charset="?.{1,}"?.*>/gm; | ||
const CHARSET_HTTP_REGEX = /charset=.{1,}/gm; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i mentioned you could add these consts to the module.exports, and that way you can write some unit tests against them.
iirc, you wrote these regexs to handle some extra fancy cases that the current unit tests dont cover. like charset=
(empty string value). so i think it'd be worth having a unit test for each regex just to give it a handful of cases it should match and not match.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that way you can write some unit tests against them.
let me know if you have any questions on this. i'm thinking 1 test with like 10-ish assertions using various html variants.
regexes are hilarious so its good to test out all sorts of edge cases.
@@ -0,0 +1,185 @@ | |||
/** | |||
* @license Copyright 2016 Google Inc. All Rights Reserved. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* @license Copyright 2016 Google Inc. All Rights Reserved. | |
* @license Copyright 2020 Google Inc. All Rights Reserved. |
const MainResource = require('../../computed/main-resource.js'); | ||
const CONTENT_TYPE_HEADER = 'content-type'; | ||
const CHARSET_META_REGEX = /<meta.*charset="?.{1,}"?.*>/gm; | ||
const CHARSET_HTTP_REGEX = /charset=.{1,}/gm; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that way you can write some unit tests against them.
let me know if you have any questions on this. i'm thinking 1 test with like 10-ish assertions using various html variants.
regexes are hilarious so its good to test out all sorts of edge cases.
|
||
describe('Charset defined audit', () => { | ||
it('succeeds when the page contains the charset meta tag', () => { | ||
const htmlContent = '<meta charset="utf-8" />'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to make these htmlContent stubs a little more realistic i'd recommend a doctype and some content after the meta... like
<!doctype html><meta charset="utf-8" /><h1>hello
const htmlContent = '<meta charset="utf-8" />'; | ||
const artifacts = generateArtifacts(htmlContent); | ||
const context = {computedCache: new Map()}; | ||
return CharsetDefinedAudit.audit(artifacts, context).then(auditResult => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah we can also go async/await here in the tests, too..
async () => {
at the top and then down here something like
const auditResult = await CharsetDefinedAudit.audit(artifacts, context);
assert.equal(auditResult.score, 1);
static async audit(artifacts, context) { | ||
const devtoolsLog = artifacts.devtoolsLogs[Audit.DEFAULT_PASS]; | ||
const mainResource = await MainResource.request({devtoolsLog, URL: artifacts.URL}, context); | ||
let charsetIsSet = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit. typical convention is to start boolean vars with is
/ has
/ etc. so i'd just rename to isCharsetSet
or isCharsetDefined
it('succeeds when the page contains the charset meta tag', () => { | ||
const htmlContent = '<meta charset="utf-8" />'; | ||
const artifacts = generateArtifacts(htmlContent); | ||
const context = {computedCache: new Map()}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not a huge deal but you could also move this computedCache thing into generateArtifacts and then you'd have something like...
const {artifacts, context} = generateArtifacts(htmlContent);
.find(header => header.name.toLowerCase() === CONTENT_TYPE_HEADER); | ||
|
||
if (contentTypeHeader) { | ||
isCharsetSet = contentTypeHeader.value.match(CHARSET_HTTP_REGEX) !== null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
super js nit but i prefer regex.test(str)
vs str.match(regex)
when you just need the boolean result
assert.equal(HTTP_REGEX.test('text/html; charset= '), false); | ||
}); | ||
|
||
it('handles charset name validation correctly', () => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
supa hot 🔥
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
last remaining nits. lgtm!
* @fileoverview Audits a page to ensure charset it configured properly. | ||
* It must be defined within the first 1024 bytes of the HTML document, defined in the HTTP header, or the document source starts with a BOM. | ||
* | ||
* TODO: It doesn't yet validate the encoding is a valid IANA charset name. https://www.iana.org/assignments/character-sets/character-sets.xhtml |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it does now. at least for the html5 meta case. :)
you can drop this line but move the link down to L35
isCharsetSet = isCharsetSet || artifacts.MainDocumentContent.charCodeAt(0) === BOM_FIRSTCHAR; | ||
|
||
// Check if charset-ish meta tag is defined within the first 1024 characters(~1024 bytes) of the HTML document | ||
if (artifacts.MainDocumentContent.slice(0, 1024).match(CHARSET_HTML_REGEX) !== null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return (meta.charset && meta.charset.match(IANA_REGEX)) || | ||
(meta.httpEquiv === 'content-type' && | ||
meta.content && | ||
meta.content.match(CHARSET_HTTP_REGEX)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A Googler has manually verified that the CLAs look good. (Googler, please make sure the reason for overriding the CLA status is clearly documented in these comments.) ℹ️ Googlers: Go here for more info. |
We found a Contributor License Agreement for you (the sender of this pull request), but were unable to find agreements for all the commit author(s) or Co-authors. If you authored these, maybe you used a different email address in the git commits than was used to sign the CLA (login here to double check)? If these were authored by someone else, then they will need to sign a CLA as well, and confirm that they're okay with these being contributed to Google. ℹ️ Googlers: Go here for more info. |
b72030a
to
925c202
Compare
CLAs look good, thanks! ℹ️ Googlers: Go here for more info. |
🎉 🎉 🎉 |
Thank you! |
While it would be overkill to implement full-blown HTML/HTTP parsers, by making the regular expressions case-insensitive can can reduce the amount of false negatives for the charset audit. This patch also applies some drive-by nits/simplifications. Ref. GoogleChrome#10023, GoogleChrome#10284.
While it would be overkill to implement full-blown HTML/HTTP parsers, by simply making the regular expressions case-insensitive we can reduce the amount of false negatives for the charset audit. This patch also applies some drive-by nits/simplifications. Ref. GoogleChrome#10023, GoogleChrome#10284.
We want to add an audit that checks whether or not the character encoding for a page is properly declared. The audit will pass if any of the following are true:
Addresses #10023