Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bazel: Better remote caching #29367

Merged
merged 2 commits into from
Sep 5, 2024

Conversation

ParkMyCar
Copy link
Member

@ParkMyCar ParkMyCar commented Sep 4, 2024

This PR improves our remote caching by changing the way we use --workspace_status_command which enables us to fetch 100% of actions from the remote cache when applicable.

Previously I noticed CI jobs would always appear to rebuild the leafs of our repo (e.g. environmentd and clusterd), which unfortunately take the longest because of linking. This was because our workspace status command required generating some Rust code which was breaking our remote caching. Now with this change I've observed the remote caching working exactly as a local cache, e.g. a change in the sql crate only requires re-building environmentd whereas it used to also cause a recompile of clusterd and some other crates.

Edit: Unfortunately there is no free lunch and my testing was flawed. The original approach from this PR of tweaking the way we use workspace_status_command did not work, and we were actually just providing a constant value for the git hash instead of the updated value. After investigating further it turns out that Bazel treats local cache artifacts differently than remote cache artifacts when it comes to the workspace status and specificallly the volatile-status.txt file we rely on here, see bazelbuild/bazel#10075.

The only way to prevent build_info!() from causing unnecessary rebuilds and update the git hash for anything that does get rebuilt, is to side channel the necessary information like we're doing here. The approach is we write the current git hash to a known location, in this case /tmp/mz_git_hash.txt and then read it via Rust. bin/bazel has been updated to write this file, and I'll make the same change to mzbuild. When running with --config=release-stamp we fallback to the original behavior with workspace_status_command and formally "stamp" the builds which causes everything to get rebuilt.

This is definitely a hack but from what I can tell there is no other way to get the same behavior. An alternative is we always just return a constant git hash when not running with --config=release-stamped, not sure how important stamping development builds is.

Motivation

Progress towards https://github.com/MaterializeInc/database-issues/issues/7923

  • Improve remote caching speed

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.

@ParkMyCar ParkMyCar requested a review from def- September 4, 2024 14:05
@ParkMyCar ParkMyCar requested review from a team and benesch as code owners September 4, 2024 14:05
@ParkMyCar ParkMyCar requested a review from jkosh44 September 4, 2024 14:05
@benesch
Copy link
Member

benesch commented Sep 4, 2024

Last commit LGTM. Did you mean to stack this on top of the mzbuild integration?

@ParkMyCar ParkMyCar force-pushed the bazel/better-caching branch from 37ba114 to c9561c6 Compare September 4, 2024 15:01
@ParkMyCar
Copy link
Member Author

Last commit LGTM. Did you mean to stack this on top of the mzbuild integration?

Ahh shoot sorry, I was doing that for testing and forgot to rebase on main before pushing up, fixed!

@ParkMyCar ParkMyCar force-pushed the bazel/better-caching branch from c9561c6 to 0346a63 Compare September 5, 2024 16:16
* this PR side channels git info into the build to improve our remote caching
@ParkMyCar ParkMyCar force-pushed the bazel/better-caching branch from 0346a63 to 0cdade2 Compare September 5, 2024 16:27
@ParkMyCar ParkMyCar requested review from benesch and def- September 5, 2024 16:27
@@ -16,6 +16,10 @@

from materialize import MZ_ROOT, ui

# Path where we put the current revision of the repo that we can side channel
# into Bazel.
MZ_GIT_HASH_FILE = "/tmp/mz_git_hash.txt"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What makes it safe to always write to this file in /tmp? Does Bazel make /tmp hermetic? Not that this is a huge risk, but if you have two worktrees of Materialize and you're building simultaneously, it seems like they'll compete to write to this file?

Is there a way to write to a file in the current repo instead? That seems a bit less worrying to me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we use an env var instead? If the build is allowed to read random files maybe we can just read a special env var?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately it's not safe, two different Materialize worktrees building Bazel at the same time will totally compete with each other. The issue is Bazel doesn't expose the path to the current repo in a way that wouldn't invalidate the cache. Same with environment variables, anything we try passing into for the build process of a rust_binary or rust_library is tracked as well.

I'll ask around and see if I can find a better solution, but going to merge as-is for now!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm building in multiple repos at once all the time. I guess we also can't use something like the PID in the path?

@ParkMyCar ParkMyCar merged commit d5f7831 into MaterializeInc:main Sep 5, 2024
81 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Sep 5, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants