Response caching? #163
Comments
Oh that's a great idea! We don't cache responses yet. Didn't even think about using a file for storing these responses yet, but I'm all for it.
You could store the expiry in seconds (probably just needs to be an arg, actually) in the file store along with a Unix timestamp, to know whether the file should be used for cache or discarded. I haven't written Rust in a while but

The file can be cached easily enough in CI. With GitHub Actions, for example, a prior step would be:

    - name: Cache Responses
      uses: actions/cache@v2
      with:
        path: <path to cache file from lychee>
        key: lychee.cache

The

EDIT:
I just realized you mentioned key expiry specifically as the concern! 😅 I had a more naive approach in mind: just caching the status of URLs to disk so it persists between CI runs, and invalidating the cache when That is the An expiry of

This caching behavior may cause some false positives, depending on activity: a link (internal or external) might actually have become unreachable without being caught. For a PR I think that is uncommon, though, and a CI event can do a full scan without cache, triggered by a related event such as a reviewer approving for merge.
I'd give it a shot, but I'm pretty swamped as-is right now 😞 Just wanted to detail it here as
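For anyone picking this up, the expiry check described above (a Unix timestamp stored next to each result, compared against a max age given in seconds) could look roughly like this. It is only a minimal sketch: `CacheEntry`, `is_fresh`, and `unix_now` are made-up names for illustration, not anything that exists in lychee.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// One cached check result (hypothetical type, not lychee's API).
#[derive(Debug, Clone, PartialEq)]
struct CacheEntry {
    /// HTTP status recorded when the URL was last checked.
    status: u16,
    /// Unix timestamp (seconds) of that check.
    checked_at: u64,
}

impl CacheEntry {
    /// An entry is fresh while its age is strictly below `max_age_secs`,
    /// so a `max_age_secs` of 0 effectively disables the cache.
    /// A timestamp in the future (clock skew) is treated as age 0.
    fn is_fresh(&self, now: u64, max_age_secs: u64) -> bool {
        now.saturating_sub(self.checked_at) < max_age_secs
    }
}

/// Current time as seconds since the Unix epoch (0 if the clock is before it).
fn unix_now() -> u64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_secs())
        .unwrap_or(0)
}
```

A caller would compare each entry against `unix_now()` and the max age passed as an argument, and re-check any URL whose entry is no longer fresh.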
No worries. Thanks for the detailed write-up; I'm sure it will help with implementing the functionality in the future. I'm all in favor of keeping this as simple as possible.
I don't even think we'd need sled for that. Perhaps just a serialized version of the
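To make the "plain file instead of sled" idea concrete, here is a rough sketch that writes a map of results to a text file and reads it back, dropping expired entries on load. Everything in it is an assumption made for illustration: the `Cache` alias, the `save_cache` and `load_cache` helpers, and the one-line-per-URL `timestamp status url` format are all hypothetical. A real implementation would more likely use a serialization crate than a hand-rolled format.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::Path;

/// URL -> (status code, Unix timestamp of the check). Hypothetical layout.
type Cache = HashMap<String, (u16, u64)>;

/// Write one `timestamp status url` line per entry.
fn save_cache(path: &Path, cache: &Cache) -> io::Result<()> {
    let mut out = String::new();
    for (url, (status, checked_at)) in cache {
        out.push_str(&format!("{checked_at} {status} {url}\n"));
    }
    fs::write(path, out)
}

/// Read the file back, skipping malformed lines and entries whose age has
/// reached `max_age_secs`. A missing file yields an empty cache, which is
/// what a first CI run (nothing restored yet) would see.
fn load_cache(path: &Path, now: u64, max_age_secs: u64) -> io::Result<Cache> {
    let text = match fs::read_to_string(path) {
        Ok(text) => text,
        Err(e) if e.kind() == io::ErrorKind::NotFound => return Ok(Cache::new()),
        Err(e) => return Err(e),
    };
    let mut cache = Cache::new();
    for line in text.lines() {
        // Split into at most three fields; the URL is whatever remains.
        let mut parts = line.splitn(3, ' ');
        let (Some(ts), Some(status), Some(url)) = (parts.next(), parts.next(), parts.next())
        else {
            continue;
        };
        let (Ok(ts), Ok(status)) = (ts.parse::<u64>(), status.parse::<u16>()) else {
            continue;
        };
        if now.saturating_sub(ts) < max_age_secs {
            cache.insert(url.to_string(), (status, ts));
        }
    }
    Ok(cache)
}
```

The resulting file is what a CI cache step would persist between runs.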
An impl of the things discussed plus some additional comments is here: #443.
I'll close this issue because the other one contains my TODO list and the PR code. Let's move the discussion over there as well. Thanks for the initial idea! 😊
I've seen another link checking tool mention response caching:
Which is also paired with a cache age option:
I don't see it documented here whether responses are cached implicitly by default. If not, is this something that could help with
I did see this discussion about an extractor pool with the benefit of de-duping URL checks. That's a similar benefit I guess, although if there were a related cache data file that could be persisted (with an expiry timestamp), one could persist this with CI cache tooling across runs.
That would be useful for PRs that trigger tests each time commits are pushed to update the PR, reducing concerns about rate limits being hit? (A URL that was reachable on one run would probably still be reachable on the next run within a given duration, at least for external URLs; internal ones, if local filesystem URIs were later supported, would be more likely to change.)
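To illustrate how de-duplication and a persisted cache would combine to cut down on requests, here is a small sketch. The `urls_to_check` function and the `(status, timestamp)` cache layout are hypothetical and not taken from lychee. Reusing only fresh 2xx results is one possible design choice, so that previously failing links always get re-checked.

```rust
use std::collections::{HashMap, HashSet};

/// Return the URLs that still need a network request: each URL once, in
/// first-seen order, skipping those with a fresh cached success.
/// `cache` maps URL -> (status, Unix timestamp of the check).
fn urls_to_check<'a>(
    urls: &[&'a str],
    cache: &HashMap<String, (u16, u64)>,
    now: u64,
    max_age_secs: u64,
) -> Vec<&'a str> {
    let mut seen = HashSet::new();
    urls.iter()
        .copied()
        // `insert` returns false for a URL we have already seen.
        .filter(|url| seen.insert(*url))
        .filter(|url| match cache.get(*url) {
            Some((status, checked_at)) => {
                let fresh = now.saturating_sub(*checked_at) < max_age_secs;
                let ok = (200..300).contains(status);
                // Only a fresh success lets us skip the request.
                !(fresh && ok)
            }
            None => true,
        })
        .collect()
}
```

On a PR where each push re-runs the check, most URLs would hit the fresh-success branch and never reach the network, which is where the rate limit relief would come from.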