Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Skip path metadata lookup in bytestream uploader when artefact won't be uploaded #23374

Conversation

Silic0nS0ldier
Copy link
Contributor

@Silic0nS0ldier Silic0nS0ldier commented Aug 20, 2024

Fresh take on #23252 without the NPE.

This PR is an alternate implementation of #20575 that defers to LocalFileType where possible, adds a test case and fixes a latent NPE in src/main/java/com/google/devtools/build/lib/runtime/NamedArtifactGroup.java (path converter returns null for files lacking a digest, such as directories, symlinks and anything skipped under minimal mode).

This should improve performance. Benchmarking (in Bazel v6) has shown 54% of a Bazel profile for a fully cached build is spent in readPathMetadata. Gains likely won't be as significant as digest lookups are still required (required to produce bytestream URI).

Fixes #20576.

@Silic0nS0ldier Silic0nS0ldier marked this pull request as ready for review August 21, 2024 12:00
@Silic0nS0ldier Silic0nS0ldier requested a review from a team as a code owner August 21, 2024 12:00
@github-actions github-actions bot added team-Remote-Exec Issues and PRs for the Execution (Remote) team awaiting-review PR is awaiting review from an assigned reviewer labels Aug 21, 2024
Comment on lines +202 to +223
if (remoteBuildEventUploadMode == RemoteBuildEventUploadMode.MINIMAL) {
// Omit all unimportant paths
if (!(
// Logs (regex needed for test logs as they lack a dedicated `LocalFileType`)
TEST_LOG_PATTERN.matcher(path.getPathString()).matches()
|| file.type == LocalFileType.LOG
|| file.type == LocalFileType.PERFORMANCE_LOG
// std(out|err)
|| file.type == LocalFileType.STDOUT
|| file.type == LocalFileType.STDERR
)) {
return new PathMetadata(
path,
digestUtil.compute(path),
/* directory= */ false,
/* symlink= */ false,
/* remote= */ false,
/* omitted= */ true,
/* isBuildToolLog= */ false,
digestUtil.getDigestFunction());
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm having a hard time trying to understand what functionally changes in this PR. What do we save by exiting early here, considering that we're still calling digestUtil.compute?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find myself asking the same question. This went through a number of iterations, as I discovered more and more requirements that incrementally eliminated any kind of potential optimisation.

I'm going to close this. There is room for improvement in the bytestream uploader (I had a stab at some in #23252 but hit a NPE I could not track down, at face value the area seems like it would benefit from a refactor and perhaps alignment across full and minimal upload modes).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, I don't think the uploader should be digesting files at all; all output files have already been digested (and the digests cached in memory as a FileArtifactValue) during action execution, so we should figure out a way to plumb that data into the uploader.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some spawn intermediate outputs might not be stored, though.

@github-actions github-actions bot removed the awaiting-review PR is awaiting review from an assigned reviewer label Sep 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
team-Remote-Exec Issues and PRs for the Execution (Remote) team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

perf: bazel computes digests for too many files when remote_build_event_upload is set to minimal
3 participants