Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CI: Add link checker to avoid 404s going live, merge CI docs into one to simplify #259

Closed
mrjones-plip opened this issue Jul 29, 2020 · 11 comments · Fixed by #266
Closed
Assignees
Labels
Type: Improvement Make something better

Comments

@mrjones-plip
Copy link
Contributor

mrjones-plip commented Jul 29, 2020

Add Link Checker

While working #239, some images were accidentally pushed live that were broken (404). A system that programmatically checked that all images, and possibly all links, do not 404 would have avoided this. Because we're using hugo's ref and relref, we shouldn't ever have broken links between pages. However, this checking does not account 404s for:

  • images via markdown: ![GitHub Logo](/images/logo.png)
  • hugo image processing: {{- $image := resources.Get “images/logo.jpg” -}}
  • internal links via markdown: [contribute](/contribute/docs)
  • external links via markdown [forum](https://forum.communityhealthtoolkit.org/)

We should deploy a solution that accounts for all of these. If hugo can be extended so it can check links in the dev environment, that'd be ideal. Otherwise if GH Actions can do this on commit in CI, that'd likely be the logical place. Beyond that, a GH Action that can run once a day to check the live site for 404's and alert/create an issue on failure could work as well.

Some notes about what's been discussed researched:

  • hugo's current figure shortcode ({{< figure src="image.png" >}}) doesn't break the build locally if you have a 404 image
  • putting a ref or relref as the src of an image doesn't work (only .md docs resolve)
  • we used linkchecker CLI app in the past to do this ad hoc.

Merge CI docs

Right now we have both ci.yml and build-any.branch.yml docs that both get run for every commit. We can merge these by using conditionals. We should so this so all the complexity of a link checker is just in one file, instead of duplicated in two.

@MaxDiz
Copy link
Contributor

MaxDiz commented Jul 30, 2020

adding to v3.11 to help streamline product documentation

@mrjones-plip
Copy link
Contributor Author

Doing a bit more research on this, it looks like muffet and htmltest are waaaay faster. Here's a full test of localhost (and a few other domains snuck in there like youtube) using muffet with a concurrency of 30 :

time muffet -c 30 -s -e ".*github\.com.*" -e ".*npmjs\.com.*" -e ".*google\.com.*" -e ".*docker\.com.*" http://localhost:1313
http://localhost:1313/apps/reference/forms/app/
        id #care-guides not found       http://localhost:1313/apps/reference/forms/contact/#care-guides
        id #icons not found     http://localhost:1313/apps/reference/forms/app/#icons
        ---------------[SNIP]-----------------
http://localhost:1313/core/overview/translations/
        id #/guide/05_using-translate-directive not found       http://angular-translate.github.io/docs/#/guide/05_using-translate-directive
muffet -c 30  -e ".*github\.com.*" -e ".*npmjs\.com.*" -e ".*google\.com.*"  
 4.02s user 0.66s system 39% cpu 11.866 total

Just over 10 seconds - w00t! A lot better than just under 3 minutes with linkchecker ;)

Researching some more, I found a project Hue that wrote about using Muffet to reduce 404s in their Hugo based project. They have another write up on how they use this same technique in Circle CI in GH Actions. From there we can see their .circleci/config.yml file and, in turn, the actual bash script they use to run muffet during CI.

I'm still totally open as to which we deploy, but it seems that linkchecker isn't the one we should go with either way.

@mrjones-plip
Copy link
Contributor Author

Me again! There's some really great discussion about hugo implementing link checking feature natively. It is, unfortunately, blocked by another issue which was initially slated for release in Jul of '19 but then kept on getting delayed for almost 20 milestones, and most recently got delayed two days ago to 0.75.

If any of us know Go and want to contribute, there's ample opportunity!

Also, it's noteworthy that hugo chose htmltest over muffet, but that was back in Feb of '19, so things might have changed.

@garethbowen
Copy link
Member

Great research @mrjones-plip !

It sounds like we shouldn't wait for a Hugo implementation, and that htmltest and muffet are roughly equivalent, so I suggest going with whichever is easier. If the Hue write up makes muffet easy, then go for that!

@mrjones-plip
Copy link
Contributor Author

currently blocked by #260 - we can't have a build with broken links if we're gonna push a link check live, now can we ;)

@mrjones-plip
Copy link
Contributor Author

Oh, fun times! @abbyad and @garethbowen - while doing some bulk clean up in #260, the first item to fix is on the markdown guide page it's supposedly missing the #tables named anchor as witness by this awesome curl call:

curl -v --silent https://www.markdownguide.org/extended-syntax 2>&1 | grep '"tables'
  # this space intentionally blank 

It'll come as no surprise that the site is using ye ol' JavaScript to inject this in the tables h2:

<h2 id="tables">

Aside from the fact I'm old and didn't know that both id="tables" and names="tables" work the same, this poses some problems for this ticket. If we want to validate what moffet calls ids, or what I'm calling named anchors, we need to have an easy way to exclude pages maybe? Or maybe instead we don't validate named anchors? The thought of rendering JavaScript in a headless browser just to check a link seems especilly silly, but this is 2020 and compute is cheap?!

The OCD nerd in me wants use to check everything so this docs ship sales true and pure, but maybe we need to be more lax on deck.

Thoughts? Without any feedback I'll likely implement something like a .muffet_ignore file.

@abbyad
Copy link
Contributor

abbyad commented Jul 31, 2020

we need to have an easy way to exclude pages

Is that to exclude checking links on the doc site page, or exclude specific target URLs?

@mrjones-plip
Copy link
Contributor Author

mrjones-plip commented Jul 31, 2020

I'm envisioning it'd be to exclude specific target URLs. So that might be look like this, in increasing granularity, applying site wide any where these links appear in our docs:

github.com     
subdomain.github.com/subdir 
https://github.com/medic/medic-os#user-content-service-management-scripts

this whole ID thing getting more and more complicated. While working some more on #260, I see that github has those super handy hover to show the ID on the h2-h5 links. So on this page you can click Service Management Scripts and get the id #service-management-scripts. All good, right? According to muffet, wrong! the actual id in the HTML is different:

<h2><a id="user-content-service-management-scripts" class="anchor" aria-hidden="true" href="#service-management-scripts"></a>Service Management Scripts</h2>

So GH does a JS slight of hand to get it to work with #service-management-scripts, but in my testing #user-content-service-management-scripts does indeed work and then validates in muffet. However, I think it's unreasonable to ask docs editors to use the inspector tool to derive the correct ids.

Edit - I found out you can preface any GH id with user-content- and it works. A lot better than inspecting the DOM, but still a bit kludgy.

What a rabbit hole - I thought this was gonna be a quick win ;)

@garethbowen
Copy link
Member

Feel free to solve the simplest problems first (MVP styles). We can gradually improve the link checking over time to add more value but don't let the complexity of checking anchors stop you from shipping checks on other types! ie: get the quick win now, and raise issues to get the slow wins later.

@mrjones-plip
Copy link
Contributor Author

Awesome feedback for a KISS approach @garethbowen - much thanks!

@mrjones-plip mrjones-plip changed the title Add link checker to avoid 404s going live CI: Add link checker to avoid 404s going live, merge CI docs into one to simplify Aug 4, 2020
mrjones-plip pushed a commit that referenced this issue Aug 4, 2020
@mrjones-plip mrjones-plip linked a pull request Aug 4, 2020 that will close this issue
mrjones-plip pushed a commit that referenced this issue Aug 4, 2020
…for master, add test alerts to my own private channel #259
mrjones-plip pushed a commit that referenced this issue Aug 4, 2020
mrjones-plip pushed a commit that referenced this issue Aug 6, 2020
mrjones-plip pushed a commit that referenced this issue Aug 6, 2020
mrjones-plip pushed a commit that referenced this issue Aug 6, 2020
mrjones-plip pushed a commit that referenced this issue Aug 6, 2020
mrjones-plip pushed a commit that referenced this issue Aug 7, 2020
mrjones-plip pushed a commit that referenced this issue Aug 7, 2020
mrjones-plip pushed a commit that referenced this issue Aug 7, 2020
mrjones-plip pushed a commit that referenced this issue Aug 7, 2020
@mrjones-plip
Copy link
Contributor Author

mrjones-plip commented Aug 7, 2020

w00t! code complete on this, just need to finalize the PR and consider go live, roll back and testing for QA/review. Very nicely in one yml file for CI instead of two and complete build for dev takes under 2 min, with link checker itself taking less than 45s. Further, we have a nice muffet bash script that devs/editors can run locally with out having to push live to see if the build passes.

Here's the most recent passing build from GH Actions:

Set up job                                               3s
Pull rtcamp/action-slack-notify:v2.0.                    4s
All Branches - Git checkout, includi[SNIP] Docsy theme   27s
All Branches - Install hugo extended v0.71.              1s
All Branches - Install Go v1.                            1s
All Branches - Install Muffet v1.3.3                     10s
All Branches - Serve Hugo Site                           10s
All Branches - Run Muffet link checker                   41s
Master Branch Only - Install Node 12.x                   0s
Master Branch Only - Install npm modules for hugo build  0s
Master Branch Only - Build for prod with minify          0s
Master Branch Only - Deploy to GH pages                  0s
Master Branch Only - Report errors to slack              0s
Post All Branches - Git checkout, in[SNIP] Docsy theme   1s

mrjones-plip pushed a commit that referenced this issue Aug 7, 2020
mrjones-plip added a commit that referenced this issue Aug 10, 2020
mrjones-plip added a commit that referenced this issue Aug 10, 2020
mrjones-plip added a commit that referenced this issue Aug 11, 2020
* Add muffet as link checker util for all branches  #259
* Merge two ci scripts into one  #259
* Update in readme.md with muffet info #259
* exclude  IntelliJ family of IDEs and their .ideo folder #259
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: Improvement Make something better
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants