Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Package Caching #60

Merged
merged 12 commits into from
Feb 1, 2024
Merged

Implement Package Caching #60

merged 12 commits into from
Feb 1, 2024

Conversation

smklein
Copy link
Collaborator

@smklein smklein commented Jan 27, 2024

This PR is a bit of an overhaul of omicron-package, doing the following:

Overview

  • It splits out "collection of inputs" from "actually shoving inputs into the package". This separation is necessary to allow us to inspect inputs before deciding whether or not to rebuild packages.
  • It refactors different subcomponents of package building into new modules: archive.rs for building archives, digest.rs for collecting file digests, input.rs for collecting inputs.
  • It adds a "build timer" in timer.rs for ease-of-use, measuring build phases.
  • It plumbs a logger through the Progress trait so we can get better debug information without fighting the TUI-based interface. In Omicron, this will default to out/LOG.
  • It switches from using std::path::Path to camino.

So How Does Caching Work

  • We currently emit artifacts to an output directory -- by default, from Omicron, this is named out/.
  • In this PR, we add a subdirectory named out/manifest-cache, which stores a JSON file for each output we expect to build. This JSON file contains the set of all inputs used to construct the output, with their digests. Although this PR adds support for both SHA-2 and Blake-3, we use Blake-3 by default.
  • BEFORE we build a package, we look up the expected output in this cache. If all inputs are the same and the output exists, we use the output.
  • AFTER we build a package, we update the JSON manifest file.

TO-DO list before merging

  • Add configuration option to "skip caching altogether", to triple-check we aren't introducing any performance regressions
  • We could make the "cache miss" case faster by comparing file lengths before digests
  • Add more tests

Copy link

@steveklabnik steveklabnik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fantastic! Love the big picture here.

I have left a number of small comments but am marking this "approve" because they are all nits or possible style things to consider. The only one that I would maybe feel a smidge stronger about is using the standard library Once, if there isn't something I'm missing about that, but even that is not the end of the world, and it's possible I am missing something, as I know that the standard library version isn't exactly equivalent.

Cargo.toml Outdated Show resolved Hide resolved
src/archive.rs Show resolved Hide resolved
src/archive.rs Show resolved Hide resolved
src/archive.rs Outdated Show resolved Hide resolved
src/archive.rs Outdated Show resolved Hide resolved
// TODO: Consider using async compression, async tar.
// It's not the *worst* thing in the world for a packaging tool to block
// here, but it would help the other async threads remain responsive if
// we avoided blocking.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 full agreement that this isn't needed but could be cool (I know this comment was moved from old code, but imho it's still true and so worth remaking on)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totally! With the block_on calls this is slightly mitigated, but it still has a small cost.


// Writes a manifest file to a particular location.
async fn write_to(&self, path: &Utf8PathBuf) -> anyhow::Result<()> {
let Some(extension) = path.extension() else {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is my first time running into let else in the wild! Neat!

.map_err(|e| anyhow!(e))?;

// In the case that we cannot read the manifest, treat it as "missing".
// This will force a rebuild anyway.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like this strategy.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I had a preference for: "In case of error, treat it like a cache miss, and rebuild anyway", because I figured most users don't care why their cache is invalid, they just want a package that works.

src/package.rs Outdated Show resolved Hide resolved
@smklein
Copy link
Collaborator Author

smklein commented Feb 1, 2024

This is fantastic! Love the big picture here.

Thanks for the quick feedback, very appreciated! I think I've addressed everything

@steveklabnik steveklabnik merged commit b5a2cda into main Feb 1, 2024
3 checks passed
@steveklabnik steveklabnik deleted the cache2 branch February 1, 2024 17:32
smklein added a commit to oxidecomputer/omicron that referenced this pull request Feb 2, 2024
This PR pulls in the changes introduced in
oxidecomputer/omicron-package#60, which has been
published as a breaking version of omicron-package.

These changes include caching, a non-TUI logfile, and camino paths.

Caching can optionally be disabled with:

```bash
omicron-package package --disable-cache
```

Though the cache is now enabled by default
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants