Cache GitHub API information between Actions runs #276

Closed
holly-cummins opened this issue Aug 1, 2023 · 3 comments · Fixed by #306

Comments

@holly-cummins
Collaborator

Ideally, we would cache information between Actions runs, using both the Gatsby cache facility and the GitHub Actions cache facility.

We would want a short expiry on the cache, so that issue counts are current; or we would re-query issue counts, even if the cache is populated. That's the technique we now use for repositories with labels. We would also want to reflect image changes reasonably quickly.
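The second option (reuse cached data, but always re-query the issue count) could look roughly like the sketch below. The `RepoInfo` shape and the `fetchAll` / `fetchIssueCount` callbacks are hypothetical placeholders for illustration, not the project's actual code.

```typescript
// Hypothetical shape of the data kept per repository.
interface RepoInfo {
  avatarUrl: string; // changes rarely, so it is reasonable to reuse from cache
  openIssues: number; // changes often, so it is refreshed on every build
}

// `cached` is whatever the cache returned (undefined on a miss).
// `fetchAll` and `fetchIssueCount` stand in for real API calls.
async function resolveRepoInfo(
  cached: RepoInfo | undefined,
  fetchAll: () => Promise<RepoInfo>,
  fetchIssueCount: () => Promise<number>
): Promise<RepoInfo> {
  // Cache miss: one full query fills in everything.
  if (!cached) return fetchAll();
  // Cache hit: keep the stable fields, re-query only the volatile one.
  return { ...cached, openIssues: await fetchIssueCount() };
}
```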

@holly-cummins
Collaborator Author

We are currently seeing grey icons again because of secondary rate limits on the GitHub API, so this is clearly needed. We will want a couple of coordinated cache mechanisms: one at the Actions level (see commits on #300), and another using Gatsby or a Node library to cache results for individual repos, with eviction.

@holly-cummins
Collaborator Author

Gatsby offers a built-in cache, which seems like a very tidy solution, but there are a few reasons not to use it:

- There’s no time-to-live feature, so any TTL would need to be home-rolled. The GitHub data definitely needs to be refreshed occasionally.
- Persisting the Gatsby cache between builds would also drag in caches for a number of other things that we maybe don’t want to cache in a production build.

On the other hand, the available persistent cache libraries are rather unsatisfactory:

- They write to disk on every set, which will slow builds down and isn’t needed.
- They have low Snyk scores (partly because they’re so simple).

I’ve ended up going for a hybrid solution, where I use the GitHub caching mechanism to carry the Gatsby cache forward between builds, and write the API output to that cache; but I also include a layer of a normal in-memory cache to handle evicting stale results.
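A minimal sketch of that layering, under stated assumptions: `PersistentStore` is a hypothetical interface standing in for a store with promise-based `get`/`set` methods (the shape of Gatsby's build cache), and `ExpiringCache` is an illustrative name, not the class used in the fix. The in-memory map answers lookups and handles expiry; the persistent store only carries entries from one build to the next.

```typescript
// Assumed shape of the persistent layer: async get/set keyed by string.
interface PersistentStore {
  get(key: string): Promise<unknown>;
  set(key: string, value: unknown): Promise<unknown>;
}

interface Entry<T> {
  value: T;
  expiresAt: number; // epoch milliseconds
}

class ExpiringCache<T> {
  private memory = new Map<string, Entry<T>>();
  private store: PersistentStore;
  private ttlMs: number;
  private now: () => number;

  constructor(store: PersistentStore, ttlMs: number, now: () => number = Date.now) {
    this.store = store;
    this.ttlMs = ttlMs;
    this.now = now; // injectable clock, mainly for testing
  }

  async get(key: string): Promise<T | undefined> {
    let entry = this.memory.get(key);
    if (!entry) {
      // Cold in-memory layer: fall back to what a previous build persisted.
      entry = (await this.store.get(key)) as Entry<T> | undefined;
      if (entry) this.memory.set(key, entry);
    }
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      // Stale: evict, so the caller goes back to the API.
      this.memory.delete(key);
      return undefined;
    }
    return entry.value;
  }

  async set(key: string, value: T): Promise<void> {
    const entry: Entry<T> = { value, expiresAt: this.now() + this.ttlMs };
    this.memory.set(key, entry);
    await this.store.set(key, entry); // carried forward to the next build
  }
}
```

Storing the expiry timestamp alongside the value means a later build can still tell which entries are stale, even though the persistent layer itself has no TTL support.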

I realised that because we do the initial read in a big batch, if we have a fixed TTL we’ll end up with a single build needing to refresh the whole cache, which we don’t want. To avoid this I’ve put a jitter on the default time to live.
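One way to picture the jitter is the sketch below. The base TTL, the jitter width, and the name `jitteredTtl` are illustrative assumptions rather than the values used in the fix. Each entry gets its own randomised lifetime, so entries written in the same batch expire at different times instead of all at once.

```typescript
// Illustrative defaults, not the project's actual settings.
const BASE_TTL_MS = 24 * 60 * 60 * 1000; // one day
const JITTER_FRACTION = 0.5; // spread expiry over +/- 50% of the base

// Returns a TTL drawn uniformly from roughly
// [baseMs * (1 - fraction), baseMs * (1 + fraction)].
// `random` is injectable so the behaviour can be checked deterministically.
function jitteredTtl(
  baseMs: number = BASE_TTL_MS,
  fraction: number = JITTER_FRACTION,
  random: () => number = Math.random
): number {
  const offset = (random() * 2 - 1) * fraction * baseMs;
  return Math.round(baseMs + offset);
}

// Each cache entry would get its own lifetime when it is written.
const expiresAt = Date.now() + jitteredTtl();
```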

@holly-cummins
Collaborator Author

Hurray! The images are all back after the overnight build (which had a warm cache). The overall build took 6m, with only 45s spent in the actual Gatsby build phase.
