Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rework github action caching #4367

Merged
merged 3 commits into from
Jun 28, 2024

Conversation

graydon
Copy link
Contributor

@graydon graydon commented Jun 27, 2024

Quite a lot is going wrong with our current github actions artifact caching:

  • I'm not sure if it's been getting any cache hits on PRs for a long time. A PR hit would need a master-seeded cache and those are not being generated.
  • The "hour-number-qualified cache key" we were generating was/is both inaccurate enough to represent changes we want to invalidate the cache, and also redundant: github always falls back to recency-order when accessing prefix restore keys on a cache miss
  • Latobarita's auto branch was getting hits, and getting seeded, but we're not running latobarita anymore.
  • The PR branches were making their own cache entries so they'd get a hit against themselves if you did multiple pushes but only then. Otherwise they'd be filling up the cache set with cache entries that never get hits again.
  • The new merge queues are the same: never hit, fill up the cache set with stale entries.
  • We weren't caching any of the Rust stuff

So this PR overhauls this quite a bit:

  • We split cache probing, retrieving and saving
  • We key the cache on OS/toolchain/protocol as before (and use that prefix for restore-prefix fallback) but use the commit SHA as the exact-hit suffix
  • We also add a run on master on a cron schedule, every hour
  • We only save the cache at the end of a run on schedule, on master, so no pollution from other branches
  • All runs probe for a cache hit on the commit SHA, and do nothing if there is a hit: we already tested this SHA and saved its cache. So the hourly run should be very cheap/quick/quiet most of the time
  • Any cache miss falls back to restoring from the last cache written, which will have been written on master on schedule and be fairly similar to whatever the runner is doing (any PR has to be fairly close to it to work)
  • We cache the Rust stuff

Hopefully this will speed up everything. Suggestions on further improvements welcome.

@MonsieurNicolas
Copy link
Contributor

The reason for not using commit SHA and instead use some "stable for an hour" ID was that there is a global limit per repo that is fairly small (like 5GB), which when there is enough churn (if a few people push to their branches it may be enough to hit the limit) to purge keys that are actually useful (like the master cache in your case).
I think each cache is about 250MB per toolchain combination so 1 GB cache addition every time (gcc/clang*cur/next).

In PRs there should have been caching advantage but only when pushing new commits. Now that we also have a cache on master it should work better for sure.

@graydon graydon force-pushed the improve-github-action-caching branch from f4a6b9a to 406eb09 Compare June 27, 2024 23:37
@MonsieurNicolas MonsieurNicolas added this pull request to the merge queue Jun 28, 2024
Merged via the queue into stellar:master with commit 12f36e1 Jun 28, 2024
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants