Insufficient remote Merkle tree size causes slow builds #18686
Labels
P3
We're not considering working on this, but happy to review a PR. (No assignee)
team-Documentation
Documentation improvements that cannot be directly linked to other team labels
team-Remote-Exec
Issues and PRs for the Execution (Remote) team
type: documentation (cleanup)
Description of the bug:
After updating our version of Bazel, we saw a significant regression in some of our builds. A couple of actions towards the end of the build that have a lot of inputs took significantly longer. In the trace report, CPU usage is low while those actions execute, and the memory usage of the main Bazel process constantly goes up and down.
Using Git bisect, we found #18015 to be the change that led to the biggest regression.
We figured out that the builds which regressed are using --experimental_remote_merkle_tree_cache, and we could fix it by increasing --experimental_remote_merkle_tree_cache_size. With an insufficient size, Bazel keeps allocating and deallocating the Merkle trees. Presumably we saw a regression after that change because it keeps Merkle trees around for longer.
While we can work around the issue by increasing the size, a warning or error when this starts happening would be appreciated.
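For reference, the workaround described above can be sketched as a .bazelrc fragment. The two flags are the ones named in this report; the value 10000 is only an illustrative placeholder, not a recommendation, and a suitable size likely depends on how many distinct input trees the build has.

```
# Enable the Merkle tree cache for remote execution (flag from this report).
build --experimental_remote_merkle_tree_cache
# Raise the cache size so large actions stop evicting each other's trees.
# 10000 is an assumed example value; tune it for your project.
build --experimental_remote_merkle_tree_cache_size=10000
```

Comparing the trace profile before and after the change should show whether the low-CPU, oscillating-memory pattern on the large actions goes away.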
What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.
Haven't tried this myself, but I'm assuming this can be replicated on a decently sized Java project by enabling --experimental_remote_merkle_tree_cache and setting --experimental_remote_merkle_tree_cache_size to a low value.
Which operating system are you running Bazel on?
MacOS 13.4 & Ubuntu Focal Fossa
What is the output of bazel info release?
If bazel info release returns "development version" or "(@non-git)", tell us how you built Bazel.
We built on top of commit 286306e from the 6.x branch with some additional patches.
What's the output of git remote get-url origin; git rev-parse master; git rev-parse HEAD?
Is this a regression? If yes, please try to identify the Bazel commit where the bug was introduced.
1641fa8
Have you found anything relevant by searching the web?
No response
Any other information, logs, or outputs that you want to share?
No response