Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Markdown link check runs on entire repo #332

Open
breedx-splk opened this issue Sep 19, 2023 · 8 comments · May be fixed by #1697
Open

Markdown link check runs on entire repo #332

breedx-splk opened this issue Sep 19, 2023 · 8 comments · May be fixed by #1697
Assignees
Labels
tooling Regarding build, workflows, build-tools, ...

Comments

@breedx-splk
Copy link
Contributor

When the required build checks for a PR run, one of these checks is a markdown link checker. This is useful to help ensure that we don't amass a bunch of broken links in the docs. However, the markdown link checker looks like it runs across the entire repo. This can result in PRs failing their checks even when the broken link exists in an unrelated pice of markdown. This can cause small PRs to appear broken even when they are not, and it can extend time-to-merge.

One example of this has been #215, where a link in the http docs caused the client session.id PR to fail verifications.

Suggestions:

  • Only run the markdown link check against markdown files that have changed in the PR
  • Instead of running on PRs, only run on a schedule across the whole repo and open an issue when the checker fails
@joaopgrassi joaopgrassi added the tooling Regarding build, workflows, build-tools, ... label Sep 20, 2023
@AlexanderWert
Copy link
Member

Only run the markdown link check against markdown files that have changed in the PR

What if a file A.md was moved to A-new.md in that PR, and another untouched markdown file B.md referenced the old file A.md but has not been updated in that PR? By running the check only on changed files would miss this broken link, right?

@trask
Copy link
Member

trask commented Sep 22, 2023

In the Java repos, we retry failures (file-by-file) a few times to help mitigate sporadic failures: https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/main/.github/scripts/markdown-link-check-with-retry.sh

@breedx-splk
Copy link
Contributor Author

Only run the markdown link check against markdown files that have changed in the PR

What if a file A.md was moved to A-new.md in that PR, and another untouched markdown file B.md referenced the old file A.md but has not been updated in that PR? By running the check only on changed files would miss this broken link, right?

Definitely not a use case I had considered (repo-internal breakage)....I've only ever seen it break on external outbound links 🤣 .

I have no idea how to do this, but if the tools somehow support internal/external distinction this could be useful. Always check/validate internal links but only external links in changed files. I'm sure this is an unreasonable request. 😁

@breedx-splk
Copy link
Contributor Author

Oh, I see that the github action version (not npm) has this check-modified-files-only option.

How about this compromise (and I'm happy to PR this if that helps):

  • PR checks are relaxed to only check the modified files
  • Entire repo is checked nightly and issue created on failure

Thoughts?

@joaopgrassi
Copy link
Member

I have no idea how to do this, but if the tools somehow support internal/external distinction this could be useful. Always check/validate internal links but only external links in changed files. I'm sure this is an unreasonable request.

Maybe a simple check to see if the link is a "path" or a "http link" is enough to distinguish? Run checks on "path" links for the entire repo, but only run "http checks" for the modified files?

@pyohannes
Copy link
Contributor

Maybe a simple check to see if the link is a "path" or a "http link" is enough to distinguish? Run checks on "path" links for the entire repo, but only run "http checks" for the modified files?

That would be easily possible by having two separate markdown-link-check configs, adding one that ignores every link starting with http[s]:. Then duplicating the check, and running the one with the old config only on files changed in the PR.

And this would be needed in addition:

  • Instead of running on PRs, only run on a schedule across the whole repo and open an issue when the checker fails

@codefromthecrypt
Copy link

we are currently getting failures like this which are distracting.

MLC has limited control over retry as their setting only work when the server returns things like 429. This means you have to do things like retry externally. We should only start hacking around things like this which are valid to the change IMHO.

TL;DR; I totally support constraining the MLC to files changed and/or separating out config in such a way that there is less tripping.

Run make markdown-link-check
  make markdown-link-check
  shell: /usr/bin/bash -e {0}
semantic-conventions@ /home/runner/work/semantic-conventions/semantic-conventions

  ERROR: 1 dead links found in ./docs/system/system-metrics.md !
└── [email protected].[2](https://github.com/open-telemetry/semantic-conventions/actions/runs/12250055826/job/34172374920?pr=1655#step:4:2)

  [✖] https://blogs.oracle.com/linux/post/understanding-linux-kernel-memory-statistics → Status: 0
make: *** [Makefile:72: markdown-link-check] Error 1

@trask
Copy link
Member

trask commented Dec 10, 2024

As part of #1009 I'm working on converting this repo from MLC to lychee: https://github.com/lycheeverse/lychee/pulls?q=is%3Apr+author%3Atrask

will be interesting to see if it helps with this issue or not

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
tooling Regarding build, workflows, build-tools, ...
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants