Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

new_audit(anchor-text-audit): descriptive anchor text audit #3490

Merged
merged 7 commits into from
Oct 19, 2017

Conversation

kdzwinel
Copy link
Collaborator

@kdzwinel kdzwinel commented Oct 5, 2017

Closes #3179

Failing audit:
screen shot 2017-10-05 at 23 40 42

Successful audit:
screen shot 2017-10-05 at 23 50 39

@kdzwinel kdzwinel changed the title Seo anchor text new SEO audit: descriptive status text Oct 5, 2017
@kdzwinel kdzwinel changed the title new SEO audit: descriptive status text new SEO audit: descriptive anchor text Oct 5, 2017
url.origin == pageHost.origin &&
url.pathname == pageHost.pathname &&
url.search == pageHost.search
) {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we decided to skip all links that do not link to another page (e.g. '#')

url.protocol.toLowerCase() === 'javascript:' ||
(url.origin == pageHost.origin &&
url.pathname == pageHost.pathname &&
url.search == pageHost.search)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we are skipping links that point to the same website (e.g. '#' or 'javascript:void(0)')

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should have a helper method for the non-javascript case

URL.equalWithExcludedFragments = function(url1, url2) {
[url1, url2] = [url1, url2].map(rewriteChromeInternalUrl);
try {
url1 = new URL(url1);
url1.hash = '';
url2 = new URL(url2);
url2.hash = '';
return url1.href === url2.href;
} catch (e) {
return false;
}
};

does that work for you?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Include this as an inline comment in the code if it's purpose might be unclear.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@patrickhulce that works perfectly!
@rviscomi with equalWithExcludedFragments this will probably won't need more explanation. I'll skip the comment.

Copy link
Collaborator

@patrickhulce patrickhulce left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice! easy win here 🎉 🔍 💣

'start',
'right here',
'more',
'learn more',
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😱 our report would fail like crazy!

Copy link
Member

@rviscomi rviscomi Oct 8, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The help text for this audit has a "learn more" link 😆

(as bad of an example as it sets, in our defense this tool will not be crawled by search bots)

return false;
}

return BLACKLIST.includes(anchor.text.trim());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to toLowerCase this too? maybe add a test case to catch it too :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LEARN MORE

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Good point, fixed.


return BLACKLIST.includes(anchor.text.trim());
})
.map(anchor => ({
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does the anchor have extra data you're trying to trim out here? if not, let's just drop since the prop names are the same

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

based on the return value of the gather, looks like they only have these two props... so agree we can drop the .map().

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, these are the only two props. Good catch!

value: failingAnchors,
},
details,
displayValue: failingAnchors.length ? `${failingAnchors.length} anchors found` : undefined,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we usually add the qualifier here for display value, how about 2 non-descriptive anchors found?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's the copy that I got from here: Anchors do not have descriptive text: {#} anchors found. Not sure if Anchors do not have descriptive text: {#} non-descriptive anchors found sounds equally good. @rviscomi WDYT?

Copy link
Collaborator

@patrickhulce patrickhulce Oct 6, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, that's fine then 👍 ignore me :)


const Audit = require('../audit');
const URL = require('../../lib/url-shim');
const BLACKLIST = [
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure it makes a big difference with such a small list but how about a Set since we just basically want .has

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! We will be checking 100s of links and these will be mostly misses against the blocklist, so it turns out it can actually make a difference: https://github.com/GoogleChrome/lighthouse/pull/3490/files#r143292224

url.protocol.toLowerCase() === 'javascript:' ||
(url.origin == pageHost.origin &&
url.pathname == pageHost.pathname &&
url.search == pageHost.search)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Include this as an inline comment in the code if it's purpose might be unclear.

url.protocol.toLowerCase() === 'javascript:' ||
(url.origin == pageHost.origin &&
url.pathname == pageHost.pathname &&
url.search == pageHost.search)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to compare the search properties? #foo and #bar are the same web page all else being equal. Unless the page is using ajaxy #! URLs :-\

return false;
}

return BLACKLIST.includes(anchor.text.trim());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Performance nit: maybe make BLACKLIST a Set and use Set.has()? The blacklist may grow, and this is inside a loop with potentially many iterations, so it could be significant. Ok to leave it as is if you don't feel it'll be significant.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'd leave it alone. we have bigger fish to fry. :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought it's probably nothing but then I checked… and damn, Array.includes is slow:
screen shot 2017-10-06 at 22 52 18

This is for 100 elements in array/set and 1000 checks. Bottom two results are probably most important in our case.
https://jsperf.com/set-array-access

value: failingAnchors,
},
details,
displayValue: failingAnchors.length ? `${failingAnchors.length} anchors found` : undefined,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the length is 1, we should say "1 anchor found".

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed!

@@ -0,0 +1,82 @@
/**
* @license Copyright 2016 Google Inc. All Rights Reserved.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2017 ;)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔎 good catch!


const Audit = require('../audit');
const URL = require('../../lib/url-shim');
const BLACKLIST = [
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll be slowly doing a s/Whitelist/Safelist/g and s/Blacklist/Blocklist/g, but i'd love to get a headstart here. :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CS lingo is the worst. Thanks for pointing that out.

return false;
}

return BLACKLIST.includes(anchor.text.trim());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LEARN MORE

return false;
}

return BLACKLIST.includes(anchor.text.trim());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'd leave it alone. we have bigger fish to fry. :)


return BLACKLIST.includes(anchor.text.trim());
})
.map(anchor => ({
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

based on the return value of the gather, looks like they only have these two props... so agree we can drop the .map().

afterPass(options) {
const expression = `(function() {
${DOMHelpers.getElementsInDocumentFnString}; // define function on page
const selector = 'a[href]:not([rel~="nofollow"])';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it be better to combine this with anchors-with-no-rel-noopener and sort client-side for nofollow vs noopener into a single anchor gatherer? Would need to collect rel so that audits can filter on it

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could be all-anchors or something (we could do it in a followup PR if you want to keep this one focused)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all-anchors SGTM, but since this will require changing anchors-with-no-rel-noopener I think we should do that in a followup PR as you suggested

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sg. filed as #3581

@paulirish paulirish changed the title new SEO audit: descriptive anchor text new SEO audit: descriptive anchor text. Oct 11, 2017
@kdzwinel kdzwinel changed the title new SEO audit: descriptive anchor text. feature(anchor-text-audit): Adding descriptive anchor text audit Oct 11, 2017
@kdzwinel kdzwinel changed the title feature(anchor-text-audit): Adding descriptive anchor text audit Adding(anchor-text-audit): descriptive anchor text audit Oct 11, 2017
@kdzwinel kdzwinel changed the title Adding(anchor-text-audit): descriptive anchor text audit new-audit(anchor-text-audit): descriptive anchor text audit Oct 11, 2017
@paulirish paulirish changed the title new-audit(anchor-text-audit): descriptive anchor text audit new_audit(anchor-text-audit): descriptive anchor text audit Oct 11, 2017
Copy link
Member

@paulirish paulirish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this looks great. A few wording changes..

});

const headings = [
{key: 'href', itemType: 'url', text: 'URL'},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

URL => Link destination


const headings = [
{key: 'href', itemType: 'url', text: 'URL'},
{key: 'text', itemType: 'text', text: 'Text'},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Text => Link Text

return {
category: 'Content Best Practices',
name: 'anchor-text',
description: 'Anchors have descriptive text.',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Links have descriptive text

cc @rviscomi you okay with calling these "links" instead of 'anchors' ? IMO its more natural and i dont think the technical correctness is worth it.

Copy link
Member

@rviscomi rviscomi Oct 17, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"links" or "<a> elements" SGTM

@kdzwinel please also update the name and failureDescription fields for consistency

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing name kinda forced me to get rid of 'anchor' completely: b710137 . @paulirish PTAL

category: 'Content Best Practices',
name: 'anchor-text',
description: 'Anchors have descriptive text.',
failureDescription: 'Anchors do not have descriptive text',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anchors => Links

return elements
.map(node => ({
href: node.href,
text: node.innerText
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/innerText/textContent

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

disagree. :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

jk I'm wrong :)

return {
rawValue: failingAnchors.length === 0,
extendedInfo: {
value: failingAnchors,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in theory this could be a fairly large set of links. Is there a benefit to having a copy in extendedInfo as well as in details? If not we should probably just keep details

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extendedInfo is a bit easier to work with in unit tests:

    assert.equal(auditResult.extendedInfo.value.length, 1);
    assert.equal(auditResult.extendedInfo.value.includes(invalidAnchor), true);

will have to became:

    assert.equal(auditResult.details.items.length, 1);
    assert.equal(auditResult.details.items[0][0].text, invalidAnchor.href);
    assert.equal(auditResult.details.items[0][1].text, invalidAnchor.text);

But yeah, it doesn't justify keeping it around.

@@ -49,6 +52,10 @@ module.exports = [
score: false,
displayValue: '403',
},
'anchor-text': {
score: false,
displayValue: '3 anchors found',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe details: { items: {length: 3}} as well to make sure they're in there? (you could also assert the actual items contents if you thought that was worthwhile...the only cost is more verbose/harder to scan test expectations)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checking length SGTM, but checking all contents gets very verbose (showing just one of three items):

        details: {
          items: {
            length: 3,
            0: {
              0: {
                text: 'https://example.com/',
              },
              1: {
                text: 'click this',
              }
            },
          },
        },

I'd leave that to unit tests (in fact, since we are getting rid of extendedInfo, I'll have to use details in unit tests).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd leave that to unit tests

SGTM

.filter(anchor => {
if (
anchor.href.toLowerCase().startsWith('javascript:') ||
URL.equalWithExcludedFragments(anchor.href, artifacts.URL.finalUrl)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are links within the page not counted?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I understand, these links have no SEO value and often are handled by JS anyway (e.g. <a href='#'>show more</a> to expand some text inline).

const auditResult = AnchorTextAudit.audit(artifacts);
assert.equal(auditResult.rawValue, true);
});
});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a javascript: unit test?

@kdzwinel
Copy link
Collaborator Author

@brendankenny all done, thank you!

@rviscomi I'll need you to sign off on these description changes that Paul suggested.

@paulirish
Copy link
Member

@rviscomi is calling this "Link text" rather than "Anchor text" ok with you? (we've already made the change)

@rviscomi
Copy link
Member

👍 Anchor -> Link SGTM, reviewed earlier

Copy link
Member

@brendankenny brendankenny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@paulirish paulirish merged commit f911e84 into GoogleChrome:master Oct 19, 2017
@paulirish
Copy link
Member

ⒸⓁⒾⒸⓀ ⒽⒺⓇⒺ

:shipit: 💥

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants