Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a layer of caching to omicron-package #2773

Closed
wants to merge 2 commits into from
Closed

Conversation

steveklabnik
Copy link
Contributor

This commit adds a very simple caching system to omicron-package. Upon running for the first time, omicron-package will write out a JSON file containing file paths and hashes of content. Upon subsequent builds, if the file exists with the same contents, it will skip the step of building the package.

On atrium, repeated builds take 3 seconds for me, compared to roughly ten seconds currently.

I am happy with any and all feedback here; I am trying to make basic progress rather than build a build system from scratch, ha! Also, I decided to go with a global here, because then I wouldn't need to pass configuration through a bunch of layers of functions. If you'd prefer a style where this is stored inside the configuration, or sitting on the stack of main, or something else, happy to make changes like that as well.

@steveklabnik steveklabnik requested a review from smklein April 5, 2023 21:40
This commit adds a very simple caching system to omicron-package. Upon
running for the first time, omicron-package will write out a JSON file
containing file paths and hashes of content. Upon subsequent builds, if
the file exists with the same contents, it will skip the step of
building the package.

On atrium, repeated builds take 3 seconds for me, compared to roughly
ten seconds currently.
package/src/bin/omicron-package.rs Outdated Show resolved Hide resolved
mem::drop(cache);

if hash.is_none()
|| !does_hash_match(&output_file, &hash.unwrap()).await
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I'm theoretically on-board with an approach that uses hashes this way to avoid rebuilding - but don't we need to hash the inputs instead of the outputs?

For example:

  • Suppose we build a Nexus package from some input files: (1) db.rs, (2) http.rs, and (3) sagas.rs
  • Building this package produces nexus.tar.gz, which is then hashed and stored in build-cache.json
  • If we rebuild without changing anything, the hash of nexus.tar.gz matches the hash stored in build-cache.json, which is great.
  • However, if we update http.rs to, say, add a new endpoint...
    • build-cache.json will still store the old hash
    • nexus.tar.gz still exists, built using the old files
    • The hash will still appear to match, right?

Doesn't this basically mean that all re-builds where inputs change would be "skipped" incorrectly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, hashing inputs is 10000% the Right Way to do this; but it requires a lot more work for various reasons, and not that I am opposed to doing said work, I was just tired of not shipping any wins and wanted to get something going. Incremental progress is better than no progress.

My hope was to get something that would re-build spuriously more often than it should in a world where the inputs are hashed, but still usefully cache some of the time. That said, I think you're 100% right that this doesn't actually do this. Sigh. Time to go back to the drawing board.

@smklein
Copy link
Collaborator

smklein commented Jan 31, 2024

I've gone ahead and implemented a still-hopefully-relatively-simple cache in oxidecomputer/omicron-package#60

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants