-
Notifications
You must be signed in to change notification settings - Fork 4.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Force synchronous upload and reuse of possibly modified spawn outputs #23382
Conversation
826f1c9
to
5bf3c1a
Compare
FYI @aranguyen: We don't need to modify anything about how paths are rewritten in Java actions to benefit from execution deduplication with this change. We should still look into getting rid of the in-place modification for direct classpath header compilation to avoid the synchronous fallback introduced by this change. |
5bf3c1a
to
f051c2f
Compare
@tjgq I have no idea how to test this short of running a large build in CI. I tested it manually on Bazel itself, with promising results: I no longer get any failures and build times improve noticeably. Representative build results for
I expect the sandboxing overhead to be lower on Linux, but even with the overhead, path mapping is substantially faster. |
src/main/java/com/google/devtools/build/lib/remote/RemoteSpawnCache.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great! I think we should be in good shape to flip --experimental_remote_cache_async
once it's in.
src/main/java/com/google/devtools/build/lib/actions/ActionExecutionMetadata.java
Outdated
Show resolved
Hide resolved
src/main/java/com/google/devtools/build/lib/remote/RemoteSpawnCache.java
Outdated
Show resolved
Hide resolved
src/main/java/com/google/devtools/build/lib/remote/RemoteSpawnCache.java
Outdated
Show resolved
Hide resolved
src/main/java/com/google/devtools/build/lib/remote/RemoteSpawnCache.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not going to insist too much because I know this might be a lot of effort, but: is there some way we can write a test to verify that a fast, async-unsafe spawn waits for a slow, concurrent one to reuse its outputs before proceeding?
f051c2f
to
2c82260
Compare
994ba7f
to
967b286
Compare
I added a Java test that relies on mocking to make The problem is that we can't just make the other spawn slow (it doesn't run anyway), we need to make its output reuse slow. A more realistic test would be possible if we could somehow delay file system operations. Is that something that the test filesystem supports (say, stall updates to |
It might be possible with a custom FileSystem implementation; see https://cs.opensource.google/bazel/bazel/+/master:src/test/java/com/google/devtools/build/lib/remote/ActionInputPrefetcherTestBase.java;l=134;drc=3fe227ff418ef81eb5f221625bfd90f52ac70150 for inspiration. |
Thanks for the pointer, that looks pretty interesting. Since the main code path here goes through I added an assert that should make it less likely that the test fails open. |
// Make it more likely to detect races by waiting for this thread to make as much progress as | ||
// possible before letting the second spawn continue. | ||
while (completeFirstSpawn.getState().compareTo(Thread.State.WAITING) < 0) { | ||
Thread.sleep(10); | ||
} | ||
assertThat(completeFirstSpawn.getState()).isEqualTo(Thread.State.WAITING); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we instead use a latch to explicitly synchronize with the call to uploadOutputs
(and avoid unnecessarily slowing down the test)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Note that with the latch we can no longer assert that the thread would naturally wait (in the call to awaitAllOutputReuse
), so I removed that assert.
src/test/java/com/google/devtools/build/lib/remote/RemoteSpawnCacheTest.java
Show resolved
Hide resolved
@bazel-io fork 7.4.0 |
When an action may modify a spawn's outputs after execution, the upload of outputs to the cache and reuse for deduplicated actions need to happen synchronously directly after spawn execution to avoid a race. This commit implements this for cache uploads by marking all actions with this property and simply disabling async upload for all spawns executed by such actions. For output reuse, all executions deduplicated against the first one register atomically upon deduplication and cause the cache upload to wait for all of them to complete reuse. Fixes bazelbuild#22501 Fixes bazelbuild#23288 Work towards bazelbuild#21578 Closes bazelbuild#23307 (no longer needed) Closes bazelbuild#23382. PiperOrigin-RevId: 668101364 Change-Id: Ice75dbe14a7dd46e02ecb096d2b2a30940216356
Cherry-picks the following changes to implement output reuse: * Deduplicate locally executed path mapped spawns (#22556) * Fix local execution deduplication to work with optional outputs (#23296) * Force synchronous upload and reuse of possibly modified spawn outputs (#23382) * Add support for in-memory outputs to output reuse (#23422) Fixes #23377 Fixes #23444 Fixes #23457
When an action may modify a spawn's outputs after execution, the upload of outputs to the cache and reuse for deduplicated actions need to happen synchronously directly after spawn execution to avoid a race.
This commit implements this for cache uploads by marking all actions with this property and simply disabling async upload for all spawns executed by such actions.
For output reuse, all executions deduplicated against the first one register atomically upon deduplication and cause the cache upload to wait for all of them to complete reuse.
Fixes #22501
Fixes #23288
Work towards #21578
Closes #23307 (no longer needed)