Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Infrastructure: Fix failing link checker on GitHub README link #2931

Merged
merged 4 commits into from
Feb 26, 2024
Merged
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
48 changes: 45 additions & 3 deletions .link-checker.js
Original file line number Diff line number Diff line change
@@ -1,3 +1,24 @@
const HTMLParser = require('node-html-parser');

const getAttributeValue = (obj, attribute) => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a brief comment here on what this does for future context? Something like:

// Checks object for attribute and returns value. If not found on first pass, recursively checks nested objects and arrays of nested object(s) until attribute is found. If not found, returns undefined.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added

if (typeof obj !== 'object' || obj === null) return undefined;
if (obj.hasOwnProperty(attribute)) return obj[attribute];

if (Array.isArray(obj)) {
for (const element of obj) {
const attributeValue = getAttributeValue(element, attribute);
if (attributeValue !== undefined) return attributeValue;
}
} else {
for (const key in obj) {
const attributeValue = getAttributeValue(obj[key], attribute);
if (attributeValue !== undefined) return attributeValue;
}
}

return undefined;
};

module.exports = {
filesToIgnore: [
// For example:
@@ -18,13 +39,34 @@ module.exports = {
{
name: 'github',
pattern: /^https:\/\/github\.com\/.*/,
matchHash: (ids, hash) =>
ids.includes(hash) || ids.includes(`user-content-${hash}`),
matchHash: (ids, hash, { reactPartial }) => {
if (reactPartial) {
// This is where the react-partial keeps data about READMEs and other *.md files
const richText = getAttributeValue(reactPartial, 'richText');
if (richText !== undefined) {
const html = HTMLParser.parse(richText);
const githubIds = html
.querySelectorAll('[id]')
.map((idElement) => idElement.getAttribute('id'));
return githubIds.includes(`user-content-${hash}`);
}
}
return ids.includes(hash) || ids.includes(`user-content-${hash}`);
},
getPartial: (html) => {
return html
.querySelectorAll('react-partial')
.filter(
(partialElement) =>
partialElement.getAttribute('partial-name') === 'repos-overview' // This is the partial that handles the READMEs
)
.flatMap((element) => element.getElementsByTagName('script'))
.map((element) => JSON.parse(element.innerHTML))[0];
},
},
],
ignoreHashesOnExternalPagesMatchingRegex: [
// Some hash links are resolved with JS and are therefore difficult to check algorithmically
/^https:\/\/html\.spec\.whatwg\.org\/multipage\//,
'https://github.com/w3c/aria-practices#code-conformance', // TODO: Remove when #2907 is resolved
],
};
39 changes: 32 additions & 7 deletions scripts/link-checker.js
Original file line number Diff line number Diff line change
@@ -33,7 +33,24 @@ async function checkLinks() {
return getLineNumber;
};

const checkPathForHash = (hrefOrSrc, ids = [], hash) => {
const getHashCheckHandler = (hrefOrSrc) => {
return options.hashCheckHandlers.find(({ pattern }) =>
pattern.test(hrefOrSrc)
);
};

const getReactPartial = (hrefOrSrc, html) => {
const handler = getHashCheckHandler(hrefOrSrc);
if (handler) return handler.getPartial(html);
return undefined;
};

const checkPathForHash = (
hrefOrSrc,
ids = [],
hash,
{ reactPartial } = {}
) => {
// On some websites, the ids may not exactly match the hash included
// in the link.
// For e.g. GitHub will prepend client facing ids with their own
@@ -43,10 +60,8 @@ async function checkLinks() {
// as being 'user-content-foo-bar' for its own page processing purposes.
//
// See https://github.com/w3c/aria-practices/issues/2809
const handler = options.hashCheckHandlers.find(({ pattern }) =>
pattern.test(hrefOrSrc)
);
if (handler) return handler.matchHash(ids, hash);
const handler = getHashCheckHandler(hrefOrSrc);
if (handler) return handler.matchHash(ids, hash, { reactPartial });
else return ids.includes(hash);
};

@@ -149,7 +164,15 @@ async function checkLinks() {
.querySelectorAll('[id]')
.map((idElement) => idElement.getAttribute('id'));

return { ok: response.ok, status: response.status, ids };
// Handle GitHub README links.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is solely related to GitHub-related urls with hashes, I think checking hrefOrSrc to verify if the link follows those rules before running this would be okay.

// These links are stored within a react-partial element
const reactPartial = getReactPartial(hrefOrSrc, html);
return {
ok: response.ok,
status: response.status,
ids,
reactPartial,
};
} catch (error) {
return {
errorMessage:
@@ -305,7 +328,9 @@ async function checkLinks() {
if (
!isHashCheckingDisabled &&
hash &&
!checkPathForHash(hrefOrSrc, pageData.ids, hash)
!checkPathForHash(hrefOrSrc, pageData.ids, hash, {
reactPartial: pageData.reactPartial,
})
) {
consoleError(
`Found broken external link on ${htmlPath}:${lineNumber}:${columnNumber}, ` +

Unchanged files with check annotations Beta

If the scope of the header element were body, browsers would automatically treat the header as an ARIA banner,
so the header would not need the role="banner". -->
<div class="page">
<header role="banner">

Check warning on line 75 in content/patterns/treeview/examples/treeview-navigation.html

GitHub Actions / lint-html

The “banner” role is unnecessary for element “header”.
<div class="title" id="id_website_title">Mythical University</div>
<div class="tagline">Using a Tree widget pattern for navigation links</div>
</header>
<!-- NOTE: In a real website the following SECTION element should be MAIN element -->
<section class="main" aria-labelledby="id_website_title id_page_title">
<h1 class="page_title" id="id_page_title">Mythical University</h1>

Check warning on line 296 in content/patterns/treeview/examples/treeview-navigation.html

GitHub Actions / lint-html

Consider using the “h1” element as a top-level heading only (all “h1” elements are treated as top-level headings by many screen readers and other tools).
<div class="content">
<p></p>
</div>
<!-- NOTE: In a real website the footer element would be a top level element, i.e., it would have the body as its scope.
If the scope of the footer element were body, browsers would automatically treat the header as an ARIA contentinfo,
so the footer would not need the role="contentinfo". -->
<footer role="contentinfo">Mythical University footer information</footer>

Check warning on line 305 in content/patterns/treeview/examples/treeview-navigation.html

GitHub Actions / lint-html

The “contentinfo” role is unnecessary for element “footer”.
</div>
</div>
<div role="separator" id="ex_end_sep" aria-labelledby="ex_end_sep ex_label" aria-label="End of"></div>
<!-- NOTE: In an actual website, the header element would be a top level element, i.e., it would have the body as its scope.
If the scope of the header element were body, browsers would automatically treat the header as an ARIA banner,
so the header would not need the role="banner". -->
<header role="banner">

Check warning on line 69 in content/patterns/menubar/examples/menubar-navigation.html

GitHub Actions / lint-html

The “banner” role is unnecessary for element “header”.
<div class="title">Mythical University</div>
<div class="tagline">Using a Menubar for navigation links</div>
</header>
<!-- NOTE: The following SECTION element would be a MAIN element in an actual website -->
<div class="main">
<section aria-labelledby="id-page-title">
<h1 id="id-page-title" class="page_title">Mythical University</h1>

Check warning on line 217 in content/patterns/menubar/examples/menubar-navigation.html

GitHub Actions / lint-html

Consider using the “h1” element as a top-level heading only (all “h1” elements are treated as top-level headings by many screen readers and other tools).
<div class="content">
<p></p>
</div>
NOTE: In an actual website, the footer element would be a top level element, i.e., it would have the body as its scope.
If the scope of the footer element were body, browsers would automatically treat the footer as an ARIA contentinfo landmark, so the footer would not need the role="contentinfo".
-->
<footer role="contentinfo">Mythical University footer information</footer>

Check warning on line 227 in content/patterns/menubar/examples/menubar-navigation.html

GitHub Actions / lint-html

The “contentinfo” role is unnecessary for element “footer”.
</div>
</div>
<div role="separator" id="ex_end_sep" aria-labelledby="ex_end_sep ex_label" aria-label="End of"></div>