Too many Network errors: Looking for suggestions #634

Closed
vipulgupta2048 opened this issue Jun 1, 2022 · 16 comments
Labels: question, waiting-for-feedback

Comments

@vipulgupta2048

vipulgupta2048 commented Jun 1, 2022

Hey folks, I have been tinkering with lychee lately, running it on static HTML to check 17k+ links. Of those, 400+ links result in a network error. I wanted to open an issue here to ask whether I might be doing something wrong, or whether something can be improved to mitigate these.

I have set verbose: true. I wish the generated report showed the actual HTTP error code.

@mre
Member

mre commented Jun 2, 2022

17k+ links are quite astonishing. 🤩

Checking the report, I can see a few things:

  • The biggest issue is that you're running into the GitHub rate limiter (429). A proper solution is to respect per-site rate-limiting headers and throttle requests accordingly. This is not in place yet (see "Look into tower as a replacement for deadpool + channels", #36). Even if it were, it would significantly slow down the run. I'm thinking you might want to accept 429 as a valid status code for now. You can do so in the config (accept = "429, 200"). You can also limit the number of concurrent requests with --max-concurrency, but I personally would not do that in your case.
  • Another source of errors is https://tinkerboarding.co.uk/wiki/index.php/Setup, which returns a 403. I run into a timeout when I open it in my browser. Looks like the domain is for sale now, so you probably want to remove these links.
  • http://www.banana-pi.org/m1p.html is indeed a 404.

There might be more issues, but these are the ones I can see by taking a quick look.
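
As an illustration of what "respecting per-site rate-limiting headers" could mean, here is a minimal Python sketch that turns an HTTP `Retry-After` header value into a wait time. It is not lychee's implementation (lychee is written in Rust and, per the comment above, did not have this in place at the time). The function name `retry_after_seconds` and the `default` and `cap` parameters are assumptions made for this example.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_after_seconds(
    header_value: Optional[str],
    now: Optional[datetime] = None,
    default: float = 1.0,  # hypothetical fallback when the header is absent or unparsable
    cap: float = 60.0,     # hypothetical upper bound on any single wait
) -> float:
    """Return how many seconds to wait before retrying a rate-limited request."""
    if header_value is None:
        return default
    value = header_value.strip()
    if value.isdigit():
        # Form 1: Retry-After given as a number of seconds, e.g. "30".
        delay = float(value)
    else:
        # Form 2: Retry-After given as an HTTP date.
        try:
            when = parsedate_to_datetime(value)
        except (TypeError, ValueError):
            return default
        if when.tzinfo is None:
            when = when.replace(tzinfo=timezone.utc)
        now = now or datetime.now(timezone.utc)
        delay = (when - now).total_seconds()
    # Never return a negative wait, and never wait longer than the cap.
    return max(0.0, min(delay, cap))
```

A checker could sleep for the returned number of seconds before retrying the same host. The cap keeps one slow host from stalling the whole run, which is the runtime cost mentioned above.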

You're right that we should add the status code to the Markdown output. This was an oversight on my end. I can add it but I'd be thankful for a PR which brushes up the Markdown output a little bit.

mre added the waiting-for-feedback and question labels Jun 4, 2022
@vipulgupta2048
Author

> 17k+ links are quite astonishing. 🤩

It's all the power of Rust and a great tool by you, @mre. I tweaked the CPU and concurrency count a lot to see how much GitHub would appreciate getting pushed. I can try a little harder to respect their rate limiting; I don't want to DDoS any site. I will make the change to accept 429; I think it's okay to do that.

Thanks for taking a look, appreciate it. I will see about that Markdown output. I will post a renewed report afterwards on what improved, so others can benefit.

@mre
Member

mre commented Aug 11, 2022

@vipulgupta2048 are you still planning to post the renewed report or can we close this issue? 😅

@vipulgupta2048
Author

Thanks for the bump @mre, here you go
Report: balena-io/docs#2364
Config: https://github.com/balena-io/docs/blob/master/lychee.toml

Following the suggestion to allow 429 has greatly decreased errors for us as you can see above. Hope this helps!

@mre
Member

mre commented Aug 14, 2022

ℹ️ If anyone runs into issues with rate limiting in the future, there is now a troubleshooting guide over at lychee.cli.rs/#/rate-limits.

@StevenMaude

StevenMaude commented Dec 7, 2022

> ℹ️ If anyone runs into issues with rate limiting in the future, there is now a troubleshooting guide over at lychee.cli.rs/#/rate-limits.

The rate limit troubleshooting guide moved to: https://lychee.cli.rs/#/troubleshooting/rate-limits

Edit: it moved again; see below.

@mre
Member

mre commented Dec 7, 2022

Thanks for mentioning the updated link! ⭐

@p2635

p2635 commented Feb 8, 2024

> ℹ️ If anyone runs into issues with rate limiting in the future, there is now a troubleshooting guide over at lychee.cli.rs/#/rate-limits.

> The rate limit troubleshooting guide moved to: https://lychee.cli.rs/#/troubleshooting/rate-limits

Ironically, these links do not take you to the right page. Here is the right link for anyone reading in future: https://lychee.cli.rs/troubleshooting/rate-limits/#_top (at the time of writing my comment).

@mre
Member

mre commented Feb 8, 2024

Haha, that is indeed ironic. We switched the docs backend lately and that changed the links. Thanks for the updated URL. 😆

@vipulgupta2048
Author

Hey @mre, I wanted to ping you before opening this issue again. I have set 429 as an accepted code in my config: https://github.com/balena-io/docs/blob/11b7527d663e34709a2a6a9725102ef76a5fe732/lychee.toml#L50

I have also followed the rate-limit troubleshooting guide mentioned above; the only thing I have not applied is retries 0. Yet I am still seeing about 200 links failing with a 429 error code: https://github.com/balena-io/docs/issues (the first 4 issues are reports)

What am I missing here? Maybe I need to update my lychee action to pick up a release that handles the accepted codes array correctly?

mre reopened this Apr 24, 2024
@mre
Member

mre commented Apr 24, 2024

Reopening as there still seem to be issues. Thanks for the heads up.

@vipulgupta2048
Author

It's nice to see I am not alone in this. Do you think setting retries 0 might help here? I have had one report where everything worked and it showed no false positives (I pinged you on that). I have been trying to replicate that ever since, and have also checked that my GITHUB_TOKEN is being used properly. To clarify, the issue is that 429 codes, even when added to the accepted list, are still showing up as errors.
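
To spell out the expectation in that last sentence, here is a small Python sketch of how an accept list is usually meant to work: a status on the list counts as success even if it would otherwise be an error. This is an assumed model for illustration only, not lychee's actual code, and `is_failure` is a hypothetical name.

```python
from typing import Iterable


def is_failure(status: int, accepted: Iterable[int] = ()) -> bool:
    """Return True if a response status should be reported as an error."""
    if status in set(accepted):
        # Explicitly accepted codes are never reported, even 4xx/5xx ones.
        return False
    # Otherwise treat anything outside 2xx as a failure (a simplification).
    return not 200 <= status < 300


print(is_failure(429))                       # True
print(is_failure(429, accepted=[200, 429]))  # False
print(is_failure(404, accepted=[200, 429]))  # True
```

Under this model, a report that still lists 429 responses as errors after 429 was accepted suggests the accept list is not being applied, rather than a problem with the links themselves.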

@mre
Member

mre commented Apr 26, 2024

Oh, I just looked at your pipeline and noticed that you're still running on 0.14.0.
There was a bug fix for the accept handling in #1344.
Here are the release notes: https://github.com/lycheeverse/lychee/releases/tag/v0.14.1

Can you update to the latest version?
(FYI, we released a new version of lychee-action yesterday, which is based on lychee 0.15.0.)

@vipulgupta2048
Author

Awesome find. I think I was looking in the wrong place (lychee-action) for the changelog of this feature. Let me update the action and report back on Monday on how the new report goes. Apologies for the noise.

@vipulgupta2048
Author

That did it: balena-io/docs#2968
Only valid errors now.

@mre
Member

mre commented May 6, 2024

Good times. Thanks for the feedback.
