BLAKE3 digest function #18658
Thank you @tylerwilliams for putting this up! This will be a huge performance win in all our iOS builds as well!
This addresses #18614 really well. The performance win is huge! Repro in the above issue After: 3s 177ms On Uber's Rider app, really happy to see the biggest bottlenecks being addressed by this PR! 🙌🏼
Nice improvement! I am happy to review this. To make it easier to review, test and import, I suggest we split this large PR into multiple small ones (and we have to if one PR contains both non-thirdparty and thirdparty changes):
What do you think?
Thank you for offering to review this. That sounds great, and your proposed splits seem good too. I sent the first two smaller CLs here:
Support new-style digest functions.

This PR adds support for new-style digest functions to the remote execution library code. The remote-apis spec says:

```
// * `digest_function` is a lowercase string form of a `DigestFunction.Value`
//   enum, indicating which digest function was used to compute `hash`. If the
//   digest function used is one of MD5, MURMUR3, SHA1, SHA256, SHA384, SHA512,
//   or VSO, this component MUST be omitted. In that case the server SHOULD
//   infer the digest function using the length of the `hash` and the digest
//   functions announced in the server's capabilities.
```

This is a partial commit for #18658.

Closes #18731.

PiperOrigin-RevId: 543691155
Change-Id: If8c386d923db1b24dff6054c8ab3f783409b7f13
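The inference rule quoted from the spec can be sketched as follows. This is a minimal illustration, not Bazel's or the remote-apis library's actual code: the class `DigestFunctionInference` and its methods are hypothetical names, and the length table covers only the functions whose hex-encoded lengths are unambiguous here (MURMUR3 and VSO are left out of the table on purpose).

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** Hypothetical helper illustrating the spec comment; not part of Bazel. */
final class DigestFunctionInference {
  // Legacy functions that the spec says MUST be omitted from the resource name.
  private static final Set<String> LEGACY =
      Set.of("MD5", "MURMUR3", "SHA1", "SHA256", "SHA384", "SHA512", "VSO");

  // Hex-encoded hash lengths (two characters per digest byte).
  // Assumption: only a subset of the legacy functions is listed here.
  private static final Map<String, Integer> HEX_LENGTH =
      Map.of("MD5", 32, "SHA1", 40, "SHA256", 64, "SHA384", 96, "SHA512", 128);

  /** Returns true if the digest function component must be left out. */
  static boolean mustOmit(String digestFunction) {
    return LEGACY.contains(digestFunction);
  }

  /**
   * Infers the digest function from the hash length and the functions the
   * server announced. Returns empty if nothing matches or the match is ambiguous.
   */
  static Optional<String> infer(String hexHash, List<String> announced) {
    String match = null;
    for (String fn : announced) {
      Integer len = HEX_LENGTH.get(fn);
      if (len != null && len == hexHash.length()) {
        if (match != null) {
          return Optional.empty(); // two announced functions share this length
        }
        match = fn;
      }
    }
    return Optional.ofNullable(match);
  }
}
```

A new-style function such as BLAKE3 is not in the legacy set, so `mustOmit("BLAKE3")` is false and its lowercase name would appear explicitly instead of being inferred.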
Add BLAKE3 source code to third_party

This PR adds the BLAKE3 C and asm sources to third_party, and includes a BUILD file to build them.

This is a partial commit for bazelbuild#18658.

Closes bazelbuild#18682.

PiperOrigin-RevId: 541539341
Change-Id: I49b1edce20a7d0f986e29712e6050e4e0b9c1d44

(cherry picked from commit a3a569e)
This PR adds the Blake3Hasher and Blake3HashFunction classes to vfs and makes them available under the flag --digest_function=BLAKE3.

This is a partial commit for bazelbuild#18658.

PiperOrigin-RevId: 550525978
Change-Id: Iedc0886c51755585d56b4d8f47676d3be5bbedba

(cherry picked from commit cc49d68)
The last two CLs related to this work are:
I will close this out after they are in.
This PR adds the BLAKE3 digest function to Bazel. This new digest type is already supported in the remote API.
Rather than re-implementing the BLAKE3 algorithm in Java, JNI is used to call the C language implementation. The advantage of using the C implementation is that native instructions like AVX2/NEON can be used on x86_64/arm, which greatly speeds up hashing, especially for large files.
Performance depends on the build, but is generally similar to or faster than SHA256 (the default). Faster hashing is especially useful in situations where Bazel's digest is unavailable but results already exist on disk: this is common during CI runs, like on GitHub Actions.
Here's an example of a slow iOS build that's improved with this change:
Before:
After:
Here's an example of a Linux build on a modern machine that's similar / slightly faster:
The change is already somewhat large, so I wanted to start with the minimum useful bits here. Performance can likely be improved further by parallelizing and/or optimizing how data transits the JNI interface.
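One way to reduce the number of boundary crossings is to batch small writes into a larger buffer before handing them to the underlying hasher. The sketch below illustrates that idea only: `BatchingHasher` is a hypothetical class, and the JDK's SHA-256 `MessageDigest` stands in for the native BLAKE3 hasher, which is not available in the standard library. It is not how this PR implements hashing.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical sketch: batches small updates before calling the delegate. */
final class BatchingHasher {
  private final MessageDigest delegate; // stand-in for a JNI-backed hasher
  private final byte[] buffer;
  private int filled = 0;
  private int delegateCalls = 0;

  BatchingHasher(MessageDigest delegate, int bufferSize) {
    this.delegate = delegate;
    this.buffer = new byte[bufferSize];
  }

  /** Convenience factory so callers need not handle the checked exception. */
  static MessageDigest newSha256() {
    try {
      return MessageDigest.getInstance("SHA-256");
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException(e);
    }
  }

  /** Copies data into the buffer, flushing to the delegate whenever it fills. */
  void update(byte[] data, int off, int len) {
    while (len > 0) {
      int n = Math.min(len, buffer.length - filled);
      System.arraycopy(data, off, buffer, filled, n);
      filled += n;
      off += n;
      len -= n;
      if (filled == buffer.length) {
        flush();
      }
    }
  }

  /** Passes any buffered bytes to the delegate in a single call. */
  private void flush() {
    if (filled > 0) {
      delegate.update(buffer, 0, filled);
      delegateCalls++;
      filled = 0;
    }
  }

  /** Flushes remaining bytes and returns the final digest. */
  byte[] digest() {
    flush();
    return delegate.digest();
  }

  /** Number of calls made to the delegate's update method so far. */
  int delegateCalls() {
    return delegateCalls;
  }
}
```

With a 64 KiB buffer, a thousand 100-byte updates reach the delegate in two calls instead of a thousand, while producing the same digest as unbuffered hashing. Whether this helps in practice depends on the per-call overhead of the real native interface, which would need to be measured.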
N.B. If you are applying this change as a patch to do your own benchmarking, make sure you build Bazel with `-c opt` :)