Infrastructure: Fix failing link checker on GitHub README link #2931

Merged
4 commits merged into main from fix-lint-checker on Feb 26, 2024

Conversation

@evmiguel (Contributor) commented Feb 8, 2024

This PR addresses #2907, where the link checker is failing due to a reference to a GitHub README link. The link checker fails because the page uses a react-partial element to wrap the README content.
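
Roughly, the approach is to parse the JSON payload GitHub embeds in that react-partial element and pull the README's ids out of it. A simplified sketch (not the exact code in this PR; the react-partial selector and the props → initialPayload → overview → richText path reflect GitHub's markup at the time of writing and may change):

const HTMLParser = require('node-html-parser');

// Simplified sketch: extract the ids rendered inside GitHub's README
// react-partial so the link checker can match hash fragments against them.
// pageHtml is the raw HTML text of the fetched GitHub page.
const getReadmeIds = (pageHtml) => {
  const root = HTMLParser.parse(pageHtml);
  const partial = root.querySelector('react-partial');
  if (!partial) return [];

  // GitHub embeds the server-rendered README as JSON in a <script> element.
  const script = partial.querySelector('script');
  if (!script) return [];
  const payload = JSON.parse(script.innerHTML);

  // Path observed at the time of this PR; it may shift if GitHub changes it.
  const richText =
    payload?.props?.initialPayload?.overview?.overviewFiles?.[0]?.richText;
  if (!richText) return [];

  return HTMLParser.parse(richText)
    .querySelectorAll('[id]')
    .map((el) => el.getAttribute('id'));
};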

Questions for reviewers:

  • This is a first-pass solution, and I'm looking for feedback on how to make it less brittle in case GitHub decides to change how it handles READMEs in the future. Thoughts?

WAI Preview Link (Last built on Thu, 22 Feb 2024 21:30:45 GMT).

@howard-e changed the title from "Fix failing link checker on GitHub README link" to "Infrastructure: Fix failing link checker on GitHub README link" Feb 12, 2024
@howard-e added the Infrastructure label (Related to maintaining task force and repo operations, processes, systems, documentation) Feb 12, 2024
@howard-e (Contributor) left a comment

@evmiguel thanks for doing the research to figure out what the issue was! I've also confirmed this solution works as expected.

I also really appreciate the consideration you've given to the brittleness of this approach.

My initial thought was: were any alternative approaches explored, so as to not risk GitHub pulling the rug out from under link-checker.js once more 🙂? If not, that doesn't have to be explored in this PR, but I was wondering whether there was anything else to consider beyond how the page ids are currently being extracted.

I've left some additional comments in the thread: design-related questions and thoughts on mitigating the brittleness.

.link-checker.js Outdated
@@ -18,13 +20,27 @@ module.exports = {
{
name: 'github',
pattern: /^https:\/\/github\.com\/.*/,
matchHash: (ids, hash) =>
ids.includes(hash) || ids.includes(`user-content-${hash}`),
matchHash: (ids, hash, ssr) => {

Suggested change
matchHash: (ids, hash, ssr) => {
matchHash: (ids, hash, { ssr }) => {

To me, passing in something like ssr here feels like an additional 'option' rather than just a 'parameter required for matching against known hashes'.

I'm wondering what you think of representing it that way (and updating other references in this PR where relevant). If future edge cases like this pop up that require new 'options', it may also make this easier to extend.
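
For example (purely illustrative; futureOption is a made-up name just to show how the object could grow):

// Current need from this PR:
matchHash(ids, hash, { ssr });

// A later matcher could add options without changing the signature
// (futureOption is hypothetical):
matchHash(ids, hash, { ssr, futureOption: true });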

@alflennik (Contributor) replied

I agree with Howard's point about adding an object here.

I would say, though, that I don't think this is properly called ssr. After all, if GitHub were doing SSR properly, we would already be dealing with correctly formatted HTML and none of these workarounds would be necessary.

As you can see in my other comment, I think we need to add another option to the configuration file to contain the complexity that GitHub is forcing us to handle. When you do that, maybe you can try out another name and/or way of passing data to this function that makes sense based on how you tackle it.

@@ -33,7 +33,7 @@ async function checkLinks() {
return getLineNumber;
};

const checkPathForHash = (hrefOrSrc, ids = [], hash) => {
const checkPathForHash = (hrefOrSrc, ids = [], hash, ssr) => {

Suggested change
const checkPathForHash = (hrefOrSrc, ids = [], hash, ssr) => {
const checkPathForHash = (hrefOrSrc, ids = [], hash, { ssr } = {}) => {

To align with my previous thought

@@ -142,7 +142,18 @@ async function checkLinks() {
.querySelectorAll('[id]')
.map((idElement) => idElement.getAttribute('id'));

return { ok: response.ok, status: response.status, ids };
// Handle GitHub README links.

Since this is solely related to GitHub URLs with hashes, I think it would be reasonable to check hrefOrSrc and verify the link matches those rules before running this.
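
Something along these lines, perhaps (a rough sketch; the regex is the same pattern the github matcher already uses):

// Rough sketch: only attempt the GitHub payload handling for github.com
// links that actually carry a hash fragment to resolve.
const isGitHubLink = /^https:\/\/github\.com\/.*/.test(hrefOrSrc);
if (isGitHubLink && hash) {
  // ... GitHub README payload handling goes here ...
}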

.link-checker.js Outdated
Comment on lines 26 to 29
const overviewFiles =
ssr['props']['initialPayload']['overview']['overviewFiles'];
for (let file of overviewFiles) {
if (file['richText']) {

You could recursively check the sub-objects and arrays found in ssr until you find the richText attribute. Doing this could lessen the brittleness by putting the main point of failure on richText existing, which is probably more manageable than breaking if any of the intermediate objects disappear or the path gets shifted around.

Purely for demonstration, I was thinking something along the lines of:

const getAttributeValue = (obj, attribute) => {
  if (typeof obj !== 'object' || obj === null) return undefined;
  if (obj.hasOwnProperty(attribute)) return obj[attribute];

  if (Array.isArray(obj)) {
    for (const element of obj) {
      const attributeValue = getAttributeValue(element, attribute);
      if (attributeValue !== undefined) return attributeValue;
    }
  } else {
    for (const key in obj) {
      const attributeValue = getAttributeValue(obj[key], attribute);
      if (attributeValue !== undefined) return attributeValue;
    }
  }

  return undefined;
}

// ...

const richText = getAttributeValue(ssr, 'richText');
if (richText !== undefined) {
  // ...

[My thoughts on its maintainability vs. performance are another discussion, but for now we only have that one offending link, which shouldn't cause much impact.]

@alflennik (Contributor) left a comment

I left a couple of comments, but my first bit of feedback is a biggie: I verified that this does indeed fix the issue with the page. Nice!

There is one link failing, though. If you merge in the latest changes from the main branch, there's another PR, merged yesterday, that fixes it.

)
.flatMap((element) => element.getElementsByTagName('script'))
.map((element) => JSON.parse(element.innerHTML))[0];
return { ok: response.ok, status: response.status, ids, ssr };

Would it be possible to add an option for this work to the .link-checker file? If we can avoid dumping this complexity in the main script, that would be a big win.
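
For illustration, the config entry could own the extraction itself, something along these lines (extractIds is a made-up option name, not necessarily what this PR ends up with; getAttributeValue is the recursive helper discussed above, and HTMLParser is already required at the top of the file):

// Illustrative sketch only: move the GitHub payload handling out of the main
// script and into the matcher's own config entry. `extractIds` is a
// hypothetical option name.
{
  name: 'github',
  pattern: /^https:\/\/github\.com\/.*/,
  // Given the parsed page root, return extra ids found in GitHub's embedded
  // README payload, so checkLinks() stays generic.
  extractIds: (root) => {
    const richText = root
      .querySelectorAll('script')
      .map((el) => {
        try {
          return getAttributeValue(JSON.parse(el.innerHTML), 'richText');
        } catch {
          return undefined;
        }
      })
      .find((value) => value !== undefined);
    if (!richText) return [];
    return HTMLParser.parse(richText)
      .querySelectorAll('[id]')
      .map((el) => el.getAttribute('id'));
  },
  matchHash: (ids, hash) =>
    ids.includes(hash) || ids.includes(`user-content-${hash}`),
},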

@evmiguel (Contributor, Author) commented

@howard-e @alflennik thank you for your feedback! I've made some changes based on it. Looking forward to your thoughts.

@css-meeting-bot (Member) commented

The ARIA Authoring Practices (APG) Task Force just discussed Link checker.

The full IRC log of that discussion:
<jugglinmike> Topic: Link checker
<jugglinmike> github: https://github.com//pull/2931
<jugglinmike> howard-e: The link checker fails to find a GitHub link with an ID matching the anchor
<jugglinmike> howard-e: It worked once. The reason it no longer works is that there is a server-rendered element now rendered on that page which is not easily identified using the link checker implementation
<jugglinmike> howard-e: The server-side-rendered element seems to be hiding the one we want
<jugglinmike> howard-e: I have some thoughts about the stability of this solution, but I haven't reviewed the latest changes (Erika has made more changes since I last shared feedback)
<jugglinmike> howard-e: I will be taking a look again this week
<jugglinmike> Matt_King: This is really about testing the fragment part of the link. We're not worried about the link checker failing because the URL of the page is wrong. It's just that if we specify a fragment and the fragment is invalid--that's a problem
<jugglinmike> Matt_King: We want to catch those kinds of errors when we can
<jugglinmike> Matt_King: Okay howard-e, you merge this when you think it's ready
<jugglinmike> howard-e: Will do
<jugglinmike> Zakim, end the meeting

@howard-e (Contributor) left a comment

Looks good to me! Thanks for addressing my feedback, and thanks for the research put into this work!

We'll want to pay close attention to the link-checker checks to see how this holds up against any future GitHub changes -- but having this a bit more structured is great and makes it easier to think about adding future support, not only for GitHub but also for other domains that change unexpectedly.

@@ -1,3 +1,24 @@
const HTMLParser = require('node-html-parser');

const getAttributeValue = (obj, attribute) => {

Could you add a brief comment here on what this does for future context? Something like:

// Checks object for attribute and returns value. If not found on first pass, recursively checks nested objects and arrays of nested object(s) until attribute is found. If not found, returns undefined.

@evmiguel (Contributor, Author) replied

Added

@alflennik (Contributor) left a comment

Thanks for addressing my feedback, I'm happy with how this turned out!

@howard-e merged commit 829499f into main Feb 26, 2024
7 checks passed
@howard-e deleted the fix-lint-checker branch February 26, 2024 19:03
Labels: Infrastructure (Related to maintaining task force and repo operations, processes, systems, documentation)
Projects: None yet
5 participants