Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migrate to Lychee to check URLs #2734

Merged

Conversation

agriyakhetarpal
Copy link
Member

@agriyakhetarpal agriyakhetarpal commented Feb 26, 2023

Description

Implements a GitHub Actions workflow offered by Lychee (Rust-based URL checker) to potentially speed up URL checking

Type of change

Please add a line in the relevant section of CHANGELOG.md to document the change (include PR #) - note reverse order of PR #s. If necessary, also add to the list of breaking changes.

  • New feature (non-breaking change which adds functionality)
  • Optimization (back-end change that speeds up the code)
  • Bug fix (non-breaking change which fixes an issue)

Key checklist:

  • No style issues: $ pre-commit run (see CONTRIBUTING.md for how to set this up to run automatically when committing locally, in just two lines of code)
  • All tests pass: $ python run-tests.py --all
  • The documentation builds: $ python run-tests.py --doctest

You can run unit and doctests together at once, using $ python run-tests.py --quick.

Further checks:

  • Code is commented, particularly in hard-to-understand areas
  • Tests added that prove fix is effective or that feature works

@agriyakhetarpal
Copy link
Member Author

agriyakhetarpal commented Feb 26, 2023

TODO (updated):

  • Configure it to ignore specific files and folders
  • Explicitly mention file types as .rst / .md / .yml because it currently reads all files as plaintext
  • Set up a cache in accordance with cron (if this may be required?)
  • Set up lychee.toml file for config
  • Pin version for enhanced security (avoid tag spoofing)
  • Perhaps add a repo badge to README.md

@codecov
Copy link

codecov bot commented Feb 26, 2023

Codecov Report

Patch and project coverage have no change.

Comparison is base (9d5f5a1) 99.68% compared to head (fb82592) 99.68%.

Additional details and impacted files
@@           Coverage Diff            @@
##           develop    #2734   +/-   ##
========================================
  Coverage    99.68%   99.68%           
========================================
  Files          271      271           
  Lines        18991    18992    +1     
========================================
+ Hits         18932    18933    +1     
  Misses          59       59           
Impacted Files Coverage Δ
...amm/models/submodels/particle/fickian_diffusion.py 100.00% <0.00%> (ø)
...els/submodels/particle_mechanics/base_mechanics.py 100.00% <0.00%> (ø)

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@agriyakhetarpal
Copy link
Member Author

@tinosulzer @Saransh-cpp I managed to get it to work, but it is currently failing with too many requests after a point, for which an API token may be required (I have added the config in the workflow file with GITHUB_TOKEN: ${{secrets.GITHUB_TOKEN}} but I am not sure and cannot verify if it is using any; I do not have permissions to add secrets to the PyBaMM repository)

P.S. Please check the todo I added above as well

Copy link
Member

@Saransh-cpp Saransh-cpp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @agriyakhetarpal! Looks great (and fast 🚀) from a quick look.

  • I think the errors with the code 404 and ERR are genuine errors that can be fixed (not necessarily in this PR).
  • You might have to set up the lychee cache (mentioned by you above) to fix the errors with error code 429. (This is the only point that would have to be taken care of before merging this PR).
  • Regarding secrets.GITHUB_TOKEN, the secret is not added to the repository manually. Instead, the secret is generated by GitHub on every Action run. More on this here - https://docs.github.com/en/actions/security-guides/automatic-token-authentication
  • A repo badge for the URL checker might not be needed, but we can add that if others think that it is needed.

@agriyakhetarpal agriyakhetarpal marked this pull request as ready for review March 4, 2023 18:13
@agriyakhetarpal
Copy link
Member Author

Should be ready to merge now despite the fact that some links are failing. Some other files and folders may also be excluded as well

@valentinsulzer
Copy link
Member

Why are the links failing? They're all passing the old links checker

@agriyakhetarpal
Copy link
Member Author

Why are the links failing? They're all passing the old links checker

The links are valid but the response code is 429 so we're getting rate limited, which should hopefully be fixed by the cache and the token (the cache is not being generated for some reason). Investigating this right now

Copy link
Member

@Saransh-cpp Saransh-cpp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found some resources for the failing github links with 429 error code -

This is a GitHub issue, not a lychee issue (though they are trying to fix it from their end). The PR looks perfect but would require minor hit and try modifications to see what satisfies GitHub's requirements. (As one of the comments says, GitHub could be thinking that we are trying to DDoS their servers haha)

@agriyakhetarpal
Copy link
Member Author

I went through the links, I think it is also partly because Lychee needs the links to pass the checks before it caches them, which is why the GitHub links have not been cached yet – sort of a deadlock situation.

I doubt if changing the number of max retries can fix it (I have kept it to 5, in line with that of the old links checker) or if decreasing the number of concurrent requests would help (will cause a slowdown albeit still faster than the old one – though I will try this). I can accept 429 status codes with --accept 429 for now

The args were inside the `env` block, causing all arguments to be ignored and the default arguments to be used instead since the very beginning.
@mre
Copy link
Contributor

mre commented Mar 7, 2023

Hey y'all! Thanks for using lychee.

The pipeline isn't working because args is in the wrong position.
Here's a fix: agriyakhetarpal#1

@mre
Copy link
Contributor

mre commented Mar 7, 2023

Two more hints, just because I took a look at the remaining errors:

Copy link
Member

@Saransh-cpp Saransh-cpp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much for the help, @mre!

@agriyakhetarpal, this looks good to me now with 1 final change (suggested by @mre above).

The failing URLs (true positives) can be fixed in this PR or in a separate PR, your call. (It would be nice to fix them in this PR itself.)

.github/workflows/lychee_url_checker.yml Outdated Show resolved Hide resolved
@agriyakhetarpal
Copy link
Member Author

Happy to fix the links in this PR itself

@brosaplanella
Copy link
Member

Cool, let us know if there is any reference you can't trace back.

@agriyakhetarpal
Copy link
Member Author

agriyakhetarpal commented Mar 7, 2023

The links in the markdown files (here and in some other places) are failing since they are empty (Sphinx is still able to build the docs without them by referencing the links internally). Should I change them to include the readthedocs links for the stable release?

Edit: I fixed all the other errors, these are the only ones that remain

@valentinsulzer
Copy link
Member

It's nice to be able to reference links internally in the docs without having to give the full path. This also makes it more robust to url changes. Is there a way around this without having to give the full path?

@agriyakhetarpal
Copy link
Member Author

Though it does not sound desirable, maybe we can exclude the entirety of these folders or the files therein, in particular (docs/source/user_guide/ and in examples/notebooks/) for now?

@agriyakhetarpal
Copy link
Member Author

I think the best way is to explicitly (manually) exclude all the links to the files since there are only eight of them failing anyway. We can set up a .lycheeignore file and add them there, so that the workflow file remains relatively cleaner

@mre
Copy link
Contributor

mre commented Mar 7, 2023

Alternatively you could consider checking the links on the rendered HTML, which should contain the full link paths. I don't know how to build the HTML files for the project, so I could not test it myself.
If you have relative links (e.g. ../foo/bar.html), that you'd like to check as well, you can also set --base /path/to/root/folder so that all the links will be checked relative to this base path.

But yeah, excluding the links certainly is an option as well. 👍

@agriyakhetarpal
Copy link
Member Author

I doubt if the link checks could be performed in conjunction with building the docs using a composite action – if I am correct, readthedocs has its own workflow file that is separate from GitHub Actions. Maybe this can be integrated within tox? I'll exclude the links for the time being

@Saransh-cpp
Copy link
Member

Excluding them sounds good to me

@valentinsulzer
Copy link
Member

It's not ideal to have to exclude everything manually. Could we exclude the whole path file:///home/runner/work/PyBaMM/PyBaMM/docs/source/ or would that then miss some errors?

@agriyakhetarpal
Copy link
Member Author

Yes, the whole folder PyBaMM/docs/source/ can be excluded as well – though it would miss errors for external hyperlinks lest they are added to the API docs or the user guide in the future

@valentinsulzer
Copy link
Member

Ok, yeah we don't want to miss those, let's leave like this then. Ready to merge?

@agriyakhetarpal
Copy link
Member Author

Yes sure — that failing integration test probably was a one-off one, shouldn't come up on a re-run

@valentinsulzer valentinsulzer merged commit 3078360 into pybamm-team:develop Mar 8, 2023
@agriyakhetarpal agriyakhetarpal deleted the lychee-URL-checking branch March 9, 2023 00:00
@valentinsulzer valentinsulzer mentioned this pull request Mar 9, 2023
8 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants