Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BLAKE3 digest function #18658

Closed

Conversation

tylerwilliams
Copy link
Contributor

@tylerwilliams tylerwilliams commented Jun 13, 2023

This PR adds the BLAKE3 digest function to Bazel. This new digest type is already supported in the remote API.

Rather than re-implementing the BLAKE3 algorithm in Java, JNI is used to call the C language implementation. The advantage of using the C implementation is that native instructions like AVX2/NEON can be used on x86_64/arm, which greatly speeds up hashing, especially for large files.

Performance depends on the build, but is generally similar to or faster than SHA256 (the default). Faster hashing is especially useful in situations where Bazel's digest is unavailable but results already exist on disk: this is common during CI runs, like on Github Actions.

Here's an example of slow iOS build that's improved with this change:

tylerw@blokus integration % hyperfine -w 1 --prepare 'blake3bazel --digest_function=SHA256 shutdown' 'blake3bazel --digest_function=SHA256 build //iOSApp' --prepare 'blake3bazel --digest_function=BLAKE3 shutdown' 'blake3bazel --digest_function=BLAKE3 build //iOSApp'
Benchmark 1: blake3bazel --digest_function=SHA256 build //iOSApp
  Time (mean ± σ):     31.935 s ±  0.339 s    [User: 0.018 s, System: 0.023 s]
  Range (min … max):   31.333 s … 32.480 s    10 runs

Benchmark 2: blake3bazel --digest_function=BLAKE3 build //iOSApp
  Time (mean ± σ):      9.284 s ±  0.434 s    [User: 0.019 s, System: 0.023 s]
  Range (min … max):    9.025 s … 10.484 s    10 runs

Summary
  'blake3bazel --digest_function=BLAKE3 build //iOSApp' ran
    3.44 ± 0.17 times faster than 'blake3bazel --digest_function=SHA256 build //iOSApp'
tylerw@blokus integration %

Before:
image

After:
image

Here's an example of a linux build on a modern machine that's similar / slightly faster:

tylerw@lunchbox:~/buildbuddy$ cd ../bazel
tylerw@lunchbox:~/bazel$ hyperfine -w 1 --prepare 'blake3bazel --digest_function=SHA256 shutdown' 'blake3bazel --digest_function=SHA256 build //src:bazel-bin --java_runtime_version=remotejdk_11' --prepare 'blake3bazel --digest_function=BLAKE3 shutdown' 'blake3bazel --digest_function=BLAKE3 build //src:bazel-bin --java_runtime_version=remotejdk_11'
Benchmark 1: blake3bazel --digest_function=SHA256 build //src:bazel-bin --java_runtime_version=remotejdk_11
  Time (mean ± σ):      3.015 s ±  0.044 s    [User: 0.004 s, System: 0.003 s]
  Range (min … max):    2.939 s …  3.078 s    10 runs

Benchmark 2: blake3bazel --digest_function=BLAKE3 build //src:bazel-bin --java_runtime_version=remotejdk_11
  Time (mean ± σ):      2.984 s ±  0.043 s    [User: 0.006 s, System: 0.001 s]
  Range (min … max):    2.935 s …  3.066 s    10 runs

Summary
  'blake3bazel --digest_function=BLAKE3 build //src:bazel-bin --java_runtime_version=remotejdk_11' ran
    1.01 ± 0.02 times faster than 'blake3bazel --digest_function=SHA256 build //src:bazel-bin --java_runtime_version=remotejdk_11'
tylerw@lunchbox:~/bazel$

The change is already somewhat large, so I wanted to start with the minimum useful bits here, but the performance can likely be improved further by parallelizing and or optimizing how data transits the JNI interface.

N.B. If you are applying this change as a patch to do your own benchmarking, make sure you build bazel with -c opt :)

@chiragramani
Copy link
Contributor

Thank you @tylerwilliams for putting this up! this will be a huge performance win in all our iOS builds as well!

@chiragramani
Copy link
Contributor

chiragramani commented Jun 13, 2023

The addresses #18614 really well. performance win is huge!

Repro in the above issue
On the latest run, the contribution of actuallyCompleteAction
Before: 14.8 seconds
Screenshot 2023-06-13 at 1 46 03 PM

After: 3s 177ms

Screenshot 2023-06-13 at 1 46 15 PM

On Uber's Rider app,
Linking stage actuallyCompleteAction contribution: 6s 676ms -> 1s 296ms
Bundling stage actuallyCompleteAction contribution: 7s 297ms -> 1s 866m
Screenshot 2023-06-13 at 2 23 47 PM

Screenshot 2023-06-13 at 2 24 19 PM

really happy to see the biggest bottlenecks being addressed by this PR! 🙌🏼

@coeuvre coeuvre self-requested a review June 14, 2023 08:33
@coeuvre
Copy link
Member

coeuvre commented Jun 14, 2023

Nice improvement! I am happy to review this.

To make it easier to review, test and import, I suggest we split this large PR into multiple small ones (and we have to if one PR contains both non-thirdparty and thirdparty changes):

  1. PR that includes update to REAPI.
  2. PR that import sBLAZE3 c implementation.
  3. PR that introduces BLAZE3 hash function to Bazel's vfs.
  4. and PR that integrate BLAZE3 with remote execution.

What do you think?

@tylerwilliams
Copy link
Contributor Author

Nice improvement! I am happy to review this.

To make it easier to review, test and import, I suggest we split this large PR into multiple small ones (and we have to if one PR contains both non-thirdparty and thirdparty changes):

  1. PR that includes update to REAPI.
  2. PR that import sBLAZE3 c implementation.
  3. PR that introduces BLAZE3 hash function to Bazel's vfs.
  4. and PR that integrate BLAZE3 with remote execution.

What do you think?

Thank you for offering to review this, that sounds great, and your proposed splits seem good too.

I sent the first two smaller CLs here:

  1. Add BLAKE3 digest function to remote_execution.proto #18681
  2. Add BLAKE3 source code to third_party #18682

copybara-service bot pushed a commit that referenced this pull request Jun 19, 2023
Add BLAKE3 source code to third_party

This PR adds the BLAKE3 C and asm sources to third_party, and includes a BUILD
file to build them.

This is a partial commit for #18658.

Closes #18682.

PiperOrigin-RevId: 541539341
Change-Id: I49b1edce20a7d0f986e29712e6050e4e0b9c1d44
copybara-service bot pushed a commit that referenced this pull request Jun 27, 2023
Support new-style digest functions.

This PR adds support for new-style digest functions to the remote execution
library code. The remote-apis spec says:

```
// * `digest_function` is a lowercase string form of a `DigestFunction.Value`
//   enum, indicating which digest function was used to compute `hash`. If the
//   digest function used is one of MD5, MURMUR3, SHA1, SHA256, SHA384, SHA512,
//   or VSO, this component MUST be omitted. In that case the server SHOULD
//   infer the digest function using the length of the `hash` and the digest
//   functions announced in the server's capabilities.
```

This is a partial commit for #18658.

Closes #18731.

PiperOrigin-RevId: 543691155
Change-Id: If8c386d923db1b24dff6054c8ab3f783409b7f13
copybara-service bot pushed a commit that referenced this pull request Jul 24, 2023
This PR adds the Blake3Hasher and Blake3HashFunction classes to vfs and
makes them available under the flag --digest_function=BLAKE3.

This is a partial commit for #18658.

Closes #18784.

PiperOrigin-RevId: 550525978
Change-Id: Iedc0886c51755585d56b4d8f47676d3be5bbedba
brentleyjones pushed a commit to brentleyjones/bazel that referenced this pull request Jul 25, 2023
Add BLAKE3 source code to third_party

This PR adds the BLAKE3 C and asm sources to third_party, and includes a BUILD
file to build them.

This is a partial commit for bazelbuild#18658.

Closes bazelbuild#18682.

PiperOrigin-RevId: 541539341
Change-Id: I49b1edce20a7d0f986e29712e6050e4e0b9c1d44
(cherry picked from commit a3a569e)
brentleyjones pushed a commit to brentleyjones/bazel that referenced this pull request Jul 25, 2023
This PR adds the Blake3Hasher and Blake3HashFunction classes to vfs and
makes them available under the flag --digest_function=BLAKE3.

This is a partial commit for bazelbuild#18658.

PiperOrigin-RevId: 550525978
Change-Id: Iedc0886c51755585d56b4d8f47676d3be5bbedba

(cherry picked from commit cc49d68)
@tylerwilliams
Copy link
Contributor Author

The last two CLs related to this work are:

I will close this out after they are in.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants