
Cache images slightly differently #165

Merged (18 commits) on Apr 11, 2023

Conversation

MarcoPolo
Contributor

@MarcoPolo MarcoPolo commented Apr 9, 2023

closes #161

In an effort to speed things up, this changes the caching strategy from relying on Docker's somewhat opaque rules to simply caching the generated image. Here's how it works:

  1. We define the cache key to be the hash of the files in the implementation folder.
  2. We check S3 directly for a file named $IMAGE_NAME-$CACHE_KEY-$ARCH.tar.gz.
  3. Cache hit? We import that image, create the image.json, and call make -o image.json, which allows the implementation to do some extra work on top of the cached image (e.g. js-libp2p needs to build both the Node and browser images, so it can cache the base image and do quick work to build the final node and browser images).
  4. Cache miss? Build as normal.
  5. If the PUSH_CACHE env var is set, we push the built image to S3.
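The cache-key step above can be sketched as follows. This is an illustrative sketch, not the actual helpers/cache.ts implementation: the function names, the sorted directory walk, and the 16-character key truncation are all assumptions made for the example.

```typescript
import * as crypto from 'crypto'
import * as fs from 'fs'
import * as path from 'path'

// Sketch: derive the cache key from the contents of an implementation folder.
// Files are walked in sorted order so the key is stable across machines.
function cacheKey(implDir: string): string {
  const hash = crypto.createHash('sha256')
  const walk = (dir: string): string[] =>
    fs.readdirSync(dir, { withFileTypes: true })
      .sort((a, b) => a.name.localeCompare(b.name))
      .flatMap(e => e.isDirectory()
        ? walk(path.join(dir, e.name))
        : [path.join(dir, e.name)])
  for (const file of walk(implDir)) {
    // Hash both the relative path and the file contents, so renames
    // and edits each invalidate the cache.
    hash.update(path.relative(implDir, file))
    hash.update(fs.readFileSync(file))
  }
  return hash.digest('hex').slice(0, 16)
}

// The S3 object name follows $IMAGE_NAME-$CACHE_KEY-$ARCH.tar.gz
// from the steps above.
function cachedImageName(imageName: string, key: string, arch: string): string {
  return `${imageName}-${key}-${arch}.tar.gz`
}
```

Editing any file in the implementation folder changes the key, which is exactly the "more predictable caching" property described below: a cache miss always traces back to a changed file.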

Q: Why not a GH registry?
A: Then we'd have to deal with GC ourselves. This lets us rely on S3's object-lifecycle expiry instead.

Q: Why not an S3 registry?
A: That requires a bit more work to set up: you need a registry container running in the background. This is simpler, and it also supports public read-only access to the cache.

Benefits

  1. Faster CI runs for everyone. Less time is spent building, even when everything is cached (I noticed one js-libp2p PR took 8 minutes before starting the test).
  2. More predictable caching. The caching strategy is dead simple here, so it's easy to see why a cache miss happened.
  3. Anonymous S3 reads. Everyone can use the cache.
  4. Faster image downloads. If the final container image is small (it should be), this is significantly less data than pulling every single layer and rebuilding.
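Because reads are anonymous, a cache lookup is just an unauthenticated HTTPS HEAD request against the bucket. A sketch of what that lookup could look like; the bucket URL here is hypothetical, not the actual bucket used by this repo's CI:

```typescript
import * as https from 'https'

// Hypothetical bucket URL for illustration only.
const CACHE_BUCKET_URL = 'https://libp2p-test-plans-cache.example.s3.amazonaws.com'

// Build the object URL for $IMAGE_NAME-$CACHE_KEY-$ARCH.tar.gz.
function cachedImageUrl(imageName: string, cacheKey: string, arch: string): string {
  return `${CACHE_BUCKET_URL}/${imageName}-${cacheKey}-${arch}.tar.gz`
}

// A HEAD request answers "cache hit?" without downloading the tarball,
// and needs no AWS credentials thanks to public read access.
function cacheHit(url: string): Promise<boolean> {
  return new Promise((resolve, reject) => {
    const req = https.request(url, { method: 'HEAD' }, res => {
      res.resume() // drain so the socket is freed
      resolve(res.statusCode === 200)
    })
    req.on('error', reject)
    req.end()
  })
}
```

On a hit, the runner would `docker load` the downloaded tarball instead of building; on a miss it falls through to the normal build.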

Why didn't we do this from the start?

One step at a time. The build-layer caching strategy was convenient and correct. I was hoping it would perform better, but here we are.

Note that this solution is different from (and, imo, better than) each implementation managing its own images and pushing to a registry. I say better because only this repo needs to worry about pushing images and GC'ing them.

@MarcoPolo MarcoPolo marked this pull request as draft April 9, 2023 04:18
@thomaseizinger
Contributor

Let me know when this is ready for review, it looks exciting!

@MarcoPolo MarcoPolo marked this pull request as ready for review April 10, 2023 17:34
for (const implFamily of fs.readdirSync(path.join(multidimInteropDir, 'impl'))) {
  const ig = ignore()

  addGitignoreIfPresent(ig, path.join(multidimInteropDir, ".gitignore"))
Contributor
Would it make sense to extract path.join(multidimInteropDir, ".gitignore") and other constants as well?

Contributor Author

I think it's the same, but unless you feel strongly I'd rather just leave it.

multidim-interop/helpers/cache.ts (outdated review thread, resolved)
@thomaseizinger
Contributor

Nice work @MarcoPolo!

@MarcoPolo MarcoPolo merged commit 23fdcef into master Apr 11, 2023
MarcoPolo added a commit to mxinden/test-plans that referenced this pull request Apr 13, 2023