-
Notifications
You must be signed in to change notification settings - Fork 9.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
new_audit(anchor-text-audit): descriptive anchor text audit #3490
Conversation
url.origin == pageHost.origin && | ||
url.pathname == pageHost.pathname && | ||
url.search == pageHost.search | ||
) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we decided to skip all links that do not link to another page (e.g. '#')
url.protocol.toLowerCase() === 'javascript:' || | ||
(url.origin == pageHost.origin && | ||
url.pathname == pageHost.pathname && | ||
url.search == pageHost.search) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we are skipping links that point to the same website (e.g. '#' or 'javascript:void(0)')
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should have a helper method for the non-javascript case
lighthouse/lighthouse-core/lib/url-shim.js
Lines 116 to 129 in 05030ad
URL.equalWithExcludedFragments = function(url1, url2) { | |
[url1, url2] = [url1, url2].map(rewriteChromeInternalUrl); | |
try { | |
url1 = new URL(url1); | |
url1.hash = ''; | |
url2 = new URL(url2); | |
url2.hash = ''; | |
return url1.href === url2.href; | |
} catch (e) { | |
return false; | |
} | |
}; |
does that work for you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Include this as an inline comment in the code if it's purpose might be unclear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@patrickhulce that works perfectly!
@rviscomi with equalWithExcludedFragments
this will probably won't need more explanation. I'll skip the comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice! easy win here 🎉 🔍 💣
'start', | ||
'right here', | ||
'more', | ||
'learn more', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😱 our report would fail like crazy!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The help text for this audit has a "learn more" link 😆
(as bad of an example as it sets, in our defense this tool will not be crawled by search bots)
return false; | ||
} | ||
|
||
return BLACKLIST.includes(anchor.text.trim()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need to toLowerCase
this too? maybe add a test case to catch it too :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LEARN MORE
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Good point, fixed.
|
||
return BLACKLIST.includes(anchor.text.trim()); | ||
}) | ||
.map(anchor => ({ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does the anchor have extra data you're trying to trim out here? if not, let's just drop since the prop names are the same
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
based on the return value of the gather, looks like they only have these two props... so agree we can drop the .map().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct, these are the only two props. Good catch!
value: failingAnchors, | ||
}, | ||
details, | ||
displayValue: failingAnchors.length ? `${failingAnchors.length} anchors found` : undefined, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we usually add the qualifier here for display value, how about 2 non-descriptive anchors found
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, that's fine then 👍 ignore me :)
|
||
const Audit = require('../audit'); | ||
const URL = require('../../lib/url-shim'); | ||
const BLACKLIST = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure it makes a big difference with such a small list but how about a Set
since we just basically want .has
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point! We will be checking 100s of links and these will be mostly misses against the blocklist, so it turns out it can actually make a difference: https://github.com/GoogleChrome/lighthouse/pull/3490/files#r143292224
url.protocol.toLowerCase() === 'javascript:' || | ||
(url.origin == pageHost.origin && | ||
url.pathname == pageHost.pathname && | ||
url.search == pageHost.search) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Include this as an inline comment in the code if it's purpose might be unclear.
url.protocol.toLowerCase() === 'javascript:' || | ||
(url.origin == pageHost.origin && | ||
url.pathname == pageHost.pathname && | ||
url.search == pageHost.search) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to compare the search properties? #foo
and #bar
are the same web page all else being equal. Unless the page is using ajaxy #!
URLs :-\
return false; | ||
} | ||
|
||
return BLACKLIST.includes(anchor.text.trim()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Performance nit: maybe make BLACKLIST a Set and use Set.has()
? The blacklist may grow, and this is inside a loop with potentially many iterations, so it could be significant. Ok to leave it as is if you don't feel it'll be significant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'd leave it alone. we have bigger fish to fry. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought it's probably nothing but then I checked… and damn, Array.includes
is slow:
This is for 100 elements in array/set and 1000 checks. Bottom two results are probably most important in our case.
https://jsperf.com/set-array-access
value: failingAnchors, | ||
}, | ||
details, | ||
displayValue: failingAnchors.length ? `${failingAnchors.length} anchors found` : undefined, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the length is 1, we should say "1 anchor found".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed!
@@ -0,0 +1,82 @@ | |||
/** | |||
* @license Copyright 2016 Google Inc. All Rights Reserved. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2017 ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🔎 good catch!
|
||
const Audit = require('../audit'); | ||
const URL = require('../../lib/url-shim'); | ||
const BLACKLIST = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll be slowly doing a s/Whitelist/Safelist/g
and s/Blacklist/Blocklist/g
, but i'd love to get a headstart here. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CS lingo is the worst. Thanks for pointing that out.
return false; | ||
} | ||
|
||
return BLACKLIST.includes(anchor.text.trim()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LEARN MORE
return false; | ||
} | ||
|
||
return BLACKLIST.includes(anchor.text.trim()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'd leave it alone. we have bigger fish to fry. :)
|
||
return BLACKLIST.includes(anchor.text.trim()); | ||
}) | ||
.map(anchor => ({ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
based on the return value of the gather, looks like they only have these two props... so agree we can drop the .map().
afterPass(options) { | ||
const expression = `(function() { | ||
${DOMHelpers.getElementsInDocumentFnString}; // define function on page | ||
const selector = 'a[href]:not([rel~="nofollow"])'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would it be better to combine this with anchors-with-no-rel-noopener
and sort client-side for nofollow vs noopener into a single anchor gatherer? Would need to collect rel
so that audits can filter on it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could be all-anchors
or something (we could do it in a followup PR if you want to keep this one focused)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all-anchors
SGTM, but since this will require changing anchors-with-no-rel-noopener
I think we should do that in a followup PR as you suggested
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sg. filed as #3581
668f28c
to
b16c181
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall this looks great. A few wording changes..
}); | ||
|
||
const headings = [ | ||
{key: 'href', itemType: 'url', text: 'URL'}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
URL => Link destination
|
||
const headings = [ | ||
{key: 'href', itemType: 'url', text: 'URL'}, | ||
{key: 'text', itemType: 'text', text: 'Text'}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Text => Link Text
return { | ||
category: 'Content Best Practices', | ||
name: 'anchor-text', | ||
description: 'Anchors have descriptive text.', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Links have descriptive text
cc @rviscomi you okay with calling these "links" instead of 'anchors' ? IMO its more natural and i dont think the technical correctness is worth it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"links" or "<a>
elements" SGTM
@kdzwinel please also update the name
and failureDescription
fields for consistency
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changing name
kinda forced me to get rid of 'anchor' completely: b710137 . @paulirish PTAL
category: 'Content Best Practices', | ||
name: 'anchor-text', | ||
description: 'Anchors have descriptive text.', | ||
failureDescription: 'Anchors do not have descriptive text', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Anchors => Links
return elements | ||
.map(node => ({ | ||
href: node.href, | ||
text: node.innerText |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/innerText
/textContent
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
disagree. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
jk I'm wrong :)
return { | ||
rawValue: failingAnchors.length === 0, | ||
extendedInfo: { | ||
value: failingAnchors, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in theory this could be a fairly large set of links. Is there a benefit to having a copy in extendedInfo
as well as in details
? If not we should probably just keep details
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
extendedInfo
is a bit easier to work with in unit tests:
assert.equal(auditResult.extendedInfo.value.length, 1);
assert.equal(auditResult.extendedInfo.value.includes(invalidAnchor), true);
will have to became:
assert.equal(auditResult.details.items.length, 1);
assert.equal(auditResult.details.items[0][0].text, invalidAnchor.href);
assert.equal(auditResult.details.items[0][1].text, invalidAnchor.text);
But yeah, it doesn't justify keeping it around.
@@ -49,6 +52,10 @@ module.exports = [ | |||
score: false, | |||
displayValue: '403', | |||
}, | |||
'anchor-text': { | |||
score: false, | |||
displayValue: '3 anchors found', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe details: { items: {length: 3}}
as well to make sure they're in there? (you could also assert the actual items
contents if you thought that was worthwhile...the only cost is more verbose/harder to scan test expectations)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Checking length SGTM, but checking all contents gets very verbose (showing just one of three items):
details: {
items: {
length: 3,
0: {
0: {
text: 'https://example.com/',
},
1: {
text: 'click this',
}
},
},
},
I'd leave that to unit tests (in fact, since we are getting rid of extendedInfo
, I'll have to use details
in unit tests).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd leave that to unit tests
SGTM
.filter(anchor => { | ||
if ( | ||
anchor.href.toLowerCase().startsWith('javascript:') || | ||
URL.equalWithExcludedFragments(anchor.href, artifacts.URL.finalUrl) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are links within the page not counted?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I understand, these links have no SEO value and often are handled by JS anyway (e.g. <a href='#'>show more</a>
to expand some text inline).
const auditResult = AnchorTextAudit.audit(artifacts); | ||
assert.equal(auditResult.rawValue, true); | ||
}); | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a javascript:
unit test?
…t for javascript: links.
b16c181
to
db23054
Compare
@brendankenny all done, thank you! @rviscomi I'll need you to sign off on these description changes that Paul suggested. |
@rviscomi is calling this "Link text" rather than "Anchor text" ok with you? (we've already made the change) |
👍 Anchor -> Link SGTM, reviewed earlier |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
ⒸⓁⒾⒸⓀ ⒽⒺⓇⒺ💥 |
Closes #3179
Failing audit:
Successful audit: