Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

new_audit(hreflang): document has a valid hreflang code #3815

Merged
merged 7 commits into from
Nov 29, 2017
Merged
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions lighthouse-cli/test/fixtures/seo/seo-failure-cases.html
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,8 @@
<meta name="viewport" content="invalid-content=should_have_looked_it_up">
<!-- no <meta name="description" content=""> -->
<meta name="robots" content="nofollow, NOINDEX, all">
<link rel="alternate" hreflang="xx" href="https://xx.example.com" />
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you mind starting to mark these with PASS(audit-name): details if possible and succinct (or FAIL) like e.g. in

<!-- FAIL(optimized): image is not JPEG optimized -->
<!-- FAIL(webp): image is not WebP optimized -->
<!-- PASS(responsive): image is used at full size -->
<!-- FAIL(offscreen): image is offscreen -->
<img style="position: absolute; top: -10000px;" src="lighthouse-unoptimized.jpg">
when possible?

helpful for going back later and knowing what touches what test/audit

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea! Added two comments here and one in the seo-tester.html

<link rel="alternate" href="http://example.com/" hreflang=" x-default" />
</head>
<body>
<h1>SEO</h1>
Expand Down
5 changes: 5 additions & 0 deletions lighthouse-cli/test/fixtures/seo/seo-tester.html
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,11 @@
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0, minimum-scale=1.0">
<meta name="Description" content="The premiere destination for testing your SEO audit gathering">
<link rel="alternate" hreflang="es" href="https://lat.example.com" />
<link rel="alternate" Hreflang="en-PH" href="https://ph.example.com" />
<LINK REL="ALTERNATE" HREFLANG="ru-RU" HREF="https://ru.example.com" />
<LINK REL="alternate" HREFLANG="zh-Hans-TW" HREF="https://zh.example.com" />
<link rel="alternate" href="http://example.com/" hreflang="x-default" />
</head>
<body>
<h1>SEO</h1>
Expand Down
2 changes: 1 addition & 1 deletion lighthouse-cli/test/fixtures/static-server.js
Original file line number Diff line number Diff line change
Expand Up @@ -12,7 +12,7 @@ const path = require('path');
const fs = require('fs');
const parseQueryString = require('querystring').parse;
const parseURL = require('url').parse;
const HEADER_SAFELIST = new Set(['x-robots-tag']);
const HEADER_SAFELIST = new Set(['x-robots-tag', 'link']);

const lhRootDirPath = path.join(__dirname, '../../../');

Expand Down
15 changes: 13 additions & 2 deletions lighthouse-cli/test/smokehouse/seo/expectations.js
Original file line number Diff line number Diff line change
Expand Up @@ -31,11 +31,14 @@ module.exports = [
'is-crawlable': {
score: true,
},
'hreflang': {
score: true,
},
},
},
{
initialUrl: 'http://localhost:10200/seo/seo-failure-cases.html?status_code=403&extra_header=x-robots-tag:none',
url: 'http://localhost:10200/seo/seo-failure-cases.html?status_code=403&extra_header=x-robots-tag:none',
initialUrl: 'http://localhost:10200/seo/seo-failure-cases.html?status_code=403&extra_header=x-robots-tag:none&extra_header=link:' + encodeURI('<http://example.com>;rel="alternate";hreflang="xx"'),
url: 'http://localhost:10200/seo/seo-failure-cases.html?status_code=403&extra_header=x-robots-tag:none&extra_header=link:' + encodeURI('<http://example.com>;rel="alternate";hreflang="xx"'),
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is getting quite nasty. Maybe we should move headers to an external file (e.g. seo-failure-cases.html?headers=seo-failure-headers.json) or keep them here, but in a separate field (headers: ['Link: bla', 'x-robots-tag:none'])? In the second case we could pass them to the static-server e.g. in a header (which would be a very meta thing to do). @brendankenny WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

url is compared to the LHR's url property (to make sure redirects or whatever were captured correctly), so if the headers will be passed in as a query param to static server they'll have to be in the url expectations string too.

@paulirish just brought up that sending the headers as request headers won't really work since smokehouse invokes lighthouse with just a URL...there's no way to add request headers (until/if #2746 :)

This is a js file, so there's a few ways we could make this better:

  • save parts of the URL to variables at the top of the file so they are just concatted here
  • create some kind of headersParam([]) function so that this could be url: 'http://localhost:10200/seo/seo-failure-cases.html?' + headersParam(['Link: bla', 'x-robots-tag:none']) or whatever
  • use the URL constructor to create the string and rely on implicit toString or save href to a variable up top and invoke down here
  • some combination of the above

audits: {
'viewport': {
score: false,
Expand Down Expand Up @@ -72,6 +75,14 @@ module.exports = [
},
},
},
'hreflang': {
score: false,
details: {
items: {
length: 3,
},
},
},
},
},
];
97 changes: 97 additions & 0 deletions lighthouse-core/audits/seo/hreflang.js
Original file line number Diff line number Diff line change
@@ -0,0 +1,97 @@
/**
* @license Copyright 2017 Google Inc. All Rights Reserved.
* Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0
* Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
*/
'use strict';

const Audit = require('../audit');
const LinkHeader = require('http-link-header');
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new dependency - https://github.com/jhermsmeier/node-http-link-header. It's a simple (~300LOC), well tested, link header value parser with no dependencies.

const axeCore = require('axe-core');
Copy link
Collaborator Author

@kdzwinel kdzwinel Nov 13, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

turns out I can't import 'axe-core' as it wasn't imported anywhere else before and since it's huge, it blows our bundle budget 🤔

Since I'm using only a tiny part of 'axe-core' it seems like tree shaking should help here, but unfortunately we are not using import/export syntax ATM(which is required for tree shaking to work).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any chance we can just import the languages file?

const validLangs = require('axe-core/lib/commons/utils/valid-langs.js')

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

meta observation: there's only ~17k possible 3-character codes and ~8k of them are valid

how much value is the list really providing over a /^[a-z]{2,3}$/ regex? are mistakes likely to result in a different invalid 3-character code?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't realize require can import single files that are not in the root folder of the package, thanks! I had to use a slight hack (global variable) to get things working though.

45% of 3-character codes and 25% of 2-character codes are valid - I agree that it's a lot, but IMO it still leaves a huge margin for people to make "catchable" mistakes. We may also extend this audit with region and script codes validation in the future. @rviscomi WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a fan of regex pattern matching as a substitute for whitelist checking. Too many false positives.

Copy link
Member

@brendankenny brendankenny Nov 17, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One option is putting it in a different data structure when browserifying. A quick test of a pretty naive trie got it down to about 1/2 the size raw, 1/5 the size after gzip (19915 -> 3751 bytes). Being more clever (char codes, exclusions vs inclusions) can get the trie down to about 1/4 the size of the current file, but the gzip size doesn't really budge after that first drop.

Probably not worth pursuing at this point, but maybe worth leaving a note for when we go back looking for the low hanging fruit of shrinking bundle size.

const LINK_HEADER = 'link';
const NO_LANGUAGE = 'x-default';

/**
* @param {String} hreflang
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lowercase string/boolean for all of these (this will soon actually be checked! :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated! Can't wait for your PR to land 👌

* @returns {Boolean}
*/
function isValidHreflang(hreflang) {
if (hreflang.toLowerCase() === NO_LANGUAGE) {
return true;
}

// hreflang can consist of language-script-region, we are validating only language
const [lang] = hreflang.split('-');
return axeCore.utils.validLangs().includes(lang.toLowerCase());
}

/**
* @param {String} headerValue
* @returns {Boolean}
*/
function headerHasValidHreflangs(headerValue) {
const linkHeader = LinkHeader.parse(headerValue);

return linkHeader.get('rel', 'alternate')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

always returns an array? strange api 🤔

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Single Link header can have multiple rel=alternate links (e.g. <http://es.example.com/>; rel="alternate"; hreflang="es",<http://fr.example.com/>; rel="alternate"; Hreflang="fr-be"), so IMO it makes sense to always return an array.

.every(link => link.hreflang && isValidHreflang(link.hreflang));
}

class Hreflang extends Audit {
/**
* @return {!AuditMeta}
*/
static get meta() {
return {
name: 'hreflang',
description: 'Document has a valid `hreflang`',
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems to be a property of an alternate link rather than the entire document, or am I misunderstanding?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! @rviscomi WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's been a while since I wrote the text, but I think I was trying to strike a balance between the hreflang being used as both an HTTP header value and HTML attribute value. Other audits also use "document" as the owner of the thing being audited, for example:

  • Document has a <meta name="viewport"> tag with width or initial-scale.
  • Document uses legible font sizes.
  • Document has a <title> element.
  • Document has a meta description.
  • Document has a valid rel=canonical.
  • Document avoids plugins.

So I'm ok with the text as it is currently written but open to suggestions.

failureDescription: 'Document doesn\'t have a valid `hreflang`',
helpText: 'hreflang allows crawlers to discover alternate translations of the ' +
'page content. [Learn more]' +
'(https://support.google.com/webmasters/answer/189077).',
requiredArtifacts: ['Hreflang'],
};
}

/**
* @param {!Artifacts} artifacts
* @return {!AuditResult}
*/
static audit(artifacts) {
const devtoolsLogs = artifacts.devtoolsLogs[Audit.DEFAULT_PASS];

return artifacts.requestMainResource(devtoolsLogs)
.then(mainResource => {
/** @type {Array<{source: String|Object}>} */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Array<{source: {type: string, snippet: string}}>?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, although it's actually Array<{source: string|{type: string, snippet: string}}> because array item is an object for failing nodes, and a string for failing headers.

const invalidHreflangs = [];

if (artifacts.Hreflang) {
artifacts.Hreflang.forEach(({href, hreflang}) => {
if (!isValidHreflang(hreflang)) {
invalidHreflangs.push({
source: {
type: 'node',
snippet: `<link name="alternate" hreflang="${hreflang}" href="${href}" />`,
},
});
}
});
}

mainResource.responseHeaders
.filter(h => h.name.toLowerCase() === LINK_HEADER && !headerHasValidHreflangs(h.value))
.forEach(h => invalidHreflangs.push({source: `${h.name}: ${h.value}`}));

const headings = [
{key: 'source', itemType: 'code', text: 'Source'},
];
const details = Audit.makeTableDetails(headings, invalidHreflangs);

return {
rawValue: invalidHreflangs.length === 0,
details,
};
});
}
}

module.exports = Hreflang;
2 changes: 2 additions & 0 deletions lighthouse-core/config/default.js
Original file line number Diff line number Diff line change
Expand Up @@ -39,6 +39,7 @@ module.exports = {
'seo/meta-description',
'seo/crawlable-links',
'seo/meta-robots',
'seo/hreflang',
],
},
{
Expand Down Expand Up @@ -152,6 +153,7 @@ module.exports = {
'seo/http-status-code',
'seo/link-text',
'seo/is-crawlable',
'seo/hreflang',
],

groups: {
Expand Down
3 changes: 3 additions & 0 deletions lighthouse-core/config/seo.js
Original file line number Diff line number Diff line change
Expand Up @@ -13,13 +13,15 @@ module.exports = {
'seo/meta-description',
'seo/crawlable-links',
'seo/meta-robots',
'seo/hreflang',
],
}],
audits: [
'seo/meta-description',
'seo/http-status-code',
'seo/link-text',
'seo/is-crawlable',
'seo/hreflang',
],
groups: {
'seo-mobile': {
Expand Down Expand Up @@ -47,6 +49,7 @@ module.exports = {
{id: 'http-status-code', weight: 1, group: 'seo-crawl'},
{id: 'link-text', weight: 1, group: 'seo-content'},
{id: 'is-crawlable', weight: 1, group: 'seo-crawl'},
{id: 'hreflang', weight: 1, group: 'seo-content'},
],
},
},
Expand Down
33 changes: 33 additions & 0 deletions lighthouse-core/gather/gatherers/seo/hreflang.js
Original file line number Diff line number Diff line change
@@ -0,0 +1,33 @@
/**
* @license Copyright 2017 Google Inc. All Rights Reserved.
* Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0
* Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
*/
'use strict';

const Gatherer = require('../gatherer');

class Hreflang extends Gatherer {
/**
* @param {{driver: !Object}} options Run options
* @return {!Promise<?Array<string>>} Array with hreflang and href values of all link[rel=alternate] nodes found in HEAD, or null
*/
afterPass(options) {
const driver = options.driver;

return driver.querySelectorAll('head link[rel="alternate" i][hreflang]')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

querySelectorAll returns null instead of []!? we should fix that 😆

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad, it always returns an array. I've updated the code (and jsdoc) accordingly.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh ok, no worries!

.then(nodes => nodes &&
Promise.all(
nodes.map(node => Promise.all([node.getAttribute('href'), node.getAttribute('hreflang')]))
)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

driver.querySelector solution works well for simple getters but with driver.querySelectorAll and multiple getAttribute calls it's getting a bit less readable IMO. Maybe a simple injected script (driver.evaluateAsync) would be a better fit here? Something like:

const links = document.querySelectorAll('head link[rel="alternate" i][hreflang]');
return Array.from(links).map(({href, hreflang}) => ({href, hreflang}));

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have strong opinions here, so I'll defer to others

evaluate async seems fine to me, but this also doesn't really seem that bad either

).then(attributeValues => attributeValues &&
attributeValues.map(values => {
const [href, hreflang] = values;
return {href, hreflang};
})
);
}
}

module.exports = Hreflang;

164 changes: 164 additions & 0 deletions lighthouse-core/test/audits/seo/hreflang.js
Original file line number Diff line number Diff line change
@@ -0,0 +1,164 @@
/**
* @license Copyright 2017 Google Inc. All Rights Reserved.
* Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0
* Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
*/
'use strict';

const HreflangAudit = require('../../../audits/seo/hreflang.js');
const assert = require('assert');

/* eslint-env mocha */

describe('SEO: Document has valid hreflang code', () => {
it('fails when language code provided in hreflang via link element is invalid', () => {
const hreflangValues = [
'xx',
'XX-be',
'XX-be-Hans',
'',
' es',
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this invalid because of the whitespace?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, AFAIK it should be invalid in this case

];

const allRuns = hreflangValues.map(hreflangValue => {
const mainResource = {
responseHeaders: [],
};
const artifacts = {
devtoolsLogs: {[HreflangAudit.DEFAULT_PASS]: []},
requestMainResource: () => Promise.resolve(mainResource),
Hreflang: [{
hreflang: hreflangValue,
href: 'https://example.com',
}],
};

return HreflangAudit.audit(artifacts).then(auditResult => {
assert.equal(auditResult.rawValue, false);
assert.equal(auditResult.details.items.length, 1);
});
});

return Promise.all(allRuns);
});

it('succeeds when language code provided via link element is valid', () => {
const mainResource = {
responseHeaders: [],
};
const artifacts = {
devtoolsLogs: {[HreflangAudit.DEFAULT_PASS]: []},
requestMainResource: () => Promise.resolve(mainResource),
Hreflang: [
{hreflang: 'pl'},
{hreflang: 'nl-be'},
{hreflang: 'zh-Hans'},
{hreflang: 'x-default'},
{hreflang: 'FR-BE'},
],
};

return HreflangAudit.audit(artifacts).then(auditResult => {
assert.equal(auditResult.rawValue, true);
});
});

it('succeeds when there are no rel=alternate link elements nor headers', () => {
const mainResource = {
responseHeaders: [],
};
const artifacts = {
devtoolsLogs: {[HreflangAudit.DEFAULT_PASS]: []},
requestMainResource: () => Promise.resolve(mainResource),
Hreflang: null,
};

return HreflangAudit.audit(artifacts).then(auditResult => {
assert.equal(auditResult.rawValue, true);
});
});

it('fails when language code provided in hreflang via header is invalid', () => {
const linkHeaders = [
[
{name: 'Link', value: '<http://es.example.com/>; rel="alternate"; hreflang="xx"'},
],
[
{name: 'link', value: '<http://es.example.com/>; rel="alternate"; hreflang=""'},
],
[
{name: 'LINK', value: '<http://es.example.com/>; rel="alternate"'},
],
[
{name: 'Link', value: '<http://es.example.com/>; rel="alternate"; hreflang="es",<http://xx.example.com/>; rel="alternate"; Hreflang="xx"'},
],
[
{name: 'link', value: '<http://es.example.com/>; rel="alternate"; hreflang="es"'},
{name: 'Link', value: '<http://xx.example.com/>; rel="alternate"; hreflang="x"'},
],
];

const allRuns = linkHeaders.map(headers => {
const mainResource = {
responseHeaders: headers,
};
const artifacts = {
devtoolsLogs: {[HreflangAudit.DEFAULT_PASS]: []},
requestMainResource: () => Promise.resolve(mainResource),
Hreflang: null,
};

return HreflangAudit.audit(artifacts).then(auditResult => {
assert.equal(auditResult.rawValue, false);
assert.equal(auditResult.details.items.length, 1);
});
});

return Promise.all(allRuns);
});

it('succeeds when language codes provided via Link header are valid', () => {
const mainResource = {
responseHeaders: [
{name: 'link', value: ''},
{name: 'link', value: 'garbage'},
{name: 'link', value: '<http://es.example.com/>; rel="example"; hreflang="xx"'},
{name: 'link', value: '<http://es.example.com/>; rel="alternate"; hreflang="es"'},
{name: 'Link', value: '<http://fr.example.com/>; rel="alternate"; hreflang="fr-be"'},
{name: 'LINK', value: '<http://es.example.com/>; rel="alternate"; hreflang="es",<http://fr.example.com/>; rel="alternate"; Hreflang="fr-be"'},
],
};
const artifacts = {
devtoolsLogs: {[HreflangAudit.DEFAULT_PASS]: []},
requestMainResource: () => Promise.resolve(mainResource),
Hreflang: null,
};

return HreflangAudit.audit(artifacts).then(auditResult => {
assert.equal(auditResult.rawValue, true);
});
});

it('returns all failing items', () => {
const mainResource = {
responseHeaders: [
{name: 'link', value: '<http://xx1.example.com/>; rel="alternate"; hreflang="xx1"'},
{name: 'Link', value: '<http://xx2.example.com/>; rel="alternate"; hreflang="xx2"'},
],
};
const artifacts = {
devtoolsLogs: {[HreflangAudit.DEFAULT_PASS]: []},
requestMainResource: () => Promise.resolve(mainResource),
Hreflang: [{
hreflang: 'xx3',
}, {
hreflang: 'xx4',
}],
};

return HreflangAudit.audit(artifacts).then(auditResult => {
assert.equal(auditResult.rawValue, false);
assert.equal(auditResult.details.items.length, 4);
});
});
});
Loading