bazel: Better remote caching #29367

ParkMyCar · 2024-09-04T14:05:29Z

~~This PR improves our remote caching by changing the way we use --workspace_status_command which enables us to fetch 100% of actions from the remote cache when applicable.~~

Previously I noticed CI jobs would always appear to rebuild the leafs of our repo (e.g. environmentd and clusterd), which unfortunately take the longest because of linking. This was because our workspace status command required generating some Rust code which was breaking our remote caching. Now with this change I've observed the remote caching working exactly as a local cache, e.g. a change in the sql crate only requires re-building environmentd whereas it used to also cause a recompile of clusterd and some other crates.

Edit: Unfortunately there is no free lunch and my testing was flawed. The original approach from this PR of tweaking the way we use workspace_status_command did not work, and we were actually just providing a constant value for the git hash instead of the updated value. After investigating further it turns out that Bazel treats local cache artifacts differently than remote cache artifacts when it comes to the workspace status and specificallly the volatile-status.txt file we rely on here, see bazelbuild/bazel#10075.

The only way to prevent build_info!() from causing unnecessary rebuilds and update the git hash for anything that does get rebuilt, is to side channel the necessary information like we're doing here. The approach is we write the current git hash to a known location, in this case /tmp/mz_git_hash.txt and then read it via Rust. bin/bazel has been updated to write this file, and I'll make the same change to mzbuild. When running with --config=release-stamp we fallback to the original behavior with workspace_status_command and formally "stamp" the builds which causes everything to get rebuilt.

This is definitely a hack but from what I can tell there is no other way to get the same behavior. An alternative is we always just return a constant git hash when not running with --config=release-stamped, not sure how important stamping development builds is.

Motivation

Progress towards https://github.com/MaterializeInc/database-issues/issues/7923

Improve remote caching speed

Checklist

This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.

benesch · 2024-09-04T14:28:08Z

Last commit LGTM. Did you mean to stack this on top of the mzbuild integration?

ParkMyCar · 2024-09-04T15:01:30Z

Last commit LGTM. Did you mean to stack this on top of the mzbuild integration?

Ahh shoot sorry, I was doing that for testing and forgot to rebase on main before pushing up, fixed!

* this PR side channels git info into the build to improve our remote caching

benesch · 2024-09-05T17:28:07Z

misc/python/materialize/cli/bazel.py

@@ -16,6 +16,10 @@

 from materialize import MZ_ROOT, ui

+# Path where we put the current revision of the repo that we can side channel
+# into Bazel.
+MZ_GIT_HASH_FILE = "/tmp/mz_git_hash.txt"


What makes it safe to always write to this file in /tmp? Does Bazel make /tmp hermetic? Not that this is a huge risk, but if you have two worktrees of Materialize and you're building simultaneously, it seems like they'll compete to write to this file?

Is there a way to write to a file in the current repo instead? That seems a bit less worrying to me.

Could we use an env var instead? If the build is allowed to read random files maybe we can just read a special env var?

Unfortunately it's not safe, two different Materialize worktrees building Bazel at the same time will totally compete with each other. The issue is Bazel doesn't expose the path to the current repo in a way that wouldn't invalidate the cache. Same with environment variables, anything we try passing into for the build process of a rust_binary or rust_library is tracked as well.

I'll ask around and see if I can find a better solution, but going to merge as-is for now!

I'm building in multiple repos at once all the time. I guess we also can't use something like the PID in the path?

ParkMyCar requested a review from def- September 4, 2024 14:05

ParkMyCar requested review from a team and benesch as code owners September 4, 2024 14:05

ParkMyCar requested a review from jkosh44 September 4, 2024 14:05

ParkMyCar force-pushed the bazel/better-caching branch from 37ba114 to c9561c6 Compare September 4, 2024 15:01

antiguru approved these changes Sep 4, 2024

View reviewed changes

def- approved these changes Sep 4, 2024

View reviewed changes

benesch approved these changes Sep 4, 2024

View reviewed changes

ParkMyCar force-pushed the bazel/better-caching branch from c9561c6 to 0346a63 Compare September 5, 2024 16:16

start, improve remote caching

0cdade2

* this PR side channels git info into the build to improve our remote caching

ParkMyCar force-pushed the bazel/better-caching branch from 0346a63 to 0cdade2 Compare September 5, 2024 16:27

ParkMyCar requested review from benesch and def- September 5, 2024 16:27

fix clippy

965184f

def- approved these changes Sep 5, 2024

View reviewed changes

benesch reviewed Sep 5, 2024

View reviewed changes

ParkMyCar merged commit d5f7831 into MaterializeInc:main Sep 5, 2024
81 checks passed

github-actions bot locked and limited conversation to collaborators Sep 5, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

bazel: Better remote caching #29367

bazel: Better remote caching #29367

ParkMyCar commented Sep 4, 2024 •

edited

Loading

benesch commented Sep 4, 2024

ParkMyCar commented Sep 4, 2024

benesch Sep 5, 2024

petrosagg Sep 5, 2024

ParkMyCar Sep 5, 2024

def- Sep 6, 2024

bazel: Better remote caching #29367

bazel: Better remote caching #29367

Conversation

ParkMyCar commented Sep 4, 2024 • edited Loading

Motivation

Checklist

benesch commented Sep 4, 2024

ParkMyCar commented Sep 4, 2024

benesch Sep 5, 2024

Choose a reason for hiding this comment

petrosagg Sep 5, 2024

Choose a reason for hiding this comment

ParkMyCar Sep 5, 2024

Choose a reason for hiding this comment

def- Sep 6, 2024

Choose a reason for hiding this comment

ParkMyCar commented Sep 4, 2024 •

edited

Loading