Replace MD5 checksum on blob with a safe hasher #9
Comments
Add a simple benchmark for secure hash functions.
I am not a security expert, so I will simply assume that these hash functions provide a similar level of cryptographic strength. If that assumption holds, I would choose the fastest one. I wrote a simple benchmark program in Rust and ran it on hibari-brick-rs's main target platform: a modern x86_64 processor running 64-bit Linux. Blake2s turned out to be the fastest.
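For readers who want to reproduce this kind of measurement, here is a minimal sketch of a throughput harness. It is not the benchmark program used for this issue. The function names are hypothetical, and the hasher is a std-only stand-in (SipHash via `DefaultHasher`, which is not a cryptographic digest) so that the sketch runs without external crates. In a real run you would pass a closure that calls Blake2s, SHA-256, and so on.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;
use std::time::Instant;

/// Hashes `data` `iterations` times with `hash_fn` and returns the
/// observed throughput in MB/s (hypothetical helper, not from the project).
fn throughput_mb_per_sec<F>(data: &[u8], iterations: u32, mut hash_fn: F) -> f64
where
    F: FnMut(&[u8]) -> u64,
{
    let start = Instant::now();
    let mut sink = 0u64;
    for _ in 0..iterations {
        // Fold every result into `sink` so the hashing work stays observable.
        sink ^= hash_fn(data);
    }
    let secs = start.elapsed().as_secs_f64();
    std::hint::black_box(sink);

    let total_bytes = data.len() as f64 * f64::from(iterations);
    // Guard against a zero elapsed time on very short runs.
    total_bytes / (1024.0 * 1024.0) / secs.max(f64::MIN_POSITIVE)
}

/// Stand-in hasher: SipHash from std. NOT a secure digest; it only
/// marks the place where a real hash function would be called.
fn siphash_stand_in(data: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    hasher.write(data);
    hasher.finish()
}

fn main() {
    let data = vec![0xABu8; 8 * 1024]; // one 8 KB value
    let rate = throughput_mb_per_sec(&data, 10_000, siphash_stand_in);
    println!("stand-in hasher: {:.1} MB/s", rate);
}
```

Numbers from a loop like this are only comparable when every candidate is run on the same machine, with the same input size, in a release build.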
For SHA-256 implementation, I also tried its
The Blake2 algorithm is optimized for software implementation on modern processors. For example, the Rust implementation above uses SIMD instructions, which I think is why it was the fastest. In contrast, the SHA-3 (Keccak) algorithm is said to be well suited to hardware implementations such as ASICs and FPGAs.
It would be interesting to make the hash function pluggable in this project (hibari-brick-rs). For now, though, I will just pick Blake2s as the hash function for the storage checksum.
Crates Used
They are written by the same author.
Environment
Add MD5 to the simple secure hash benchmark.
I was curious how fast Blake2s is compared to MD5, so I added MD5 to the benchmark and got a competitive result. Note that the hand-written assembly implementation of MD5 runs ~5% faster than the Rust implementation of MD5.
MD5 Rust Implementation (md-5 crate)
MD5 Hand-written Assembly Implementation (md-5 crate with md-5-asm crate)
Add Blake2b and SHA-512 to the simple secure hash benchmark.
One more thing. In addition to Blake2s, I tried the Blake2b algorithm. Blake2b is optimized for 64-bit processors, while Blake2s is optimized for 32-bit processors. I tried two different Rust implementations, one from the blake2 crate and another from the blake2-rfc crate. The Blake2b specification supports output digest lengths from 1 byte to 64 bytes (512 bits). The implementation in the blake2 crate supports only the 64-byte digest; the one in the blake2-rfc crate supports other digest lengths as well. Here is the result.
Blake2b from both crates outperformed MD5. The digest length seems to have no impact on performance; both are 3.3 times faster than the assembly implementation of SHA-256.
Conclusion
We will use the Blake2b algorithm for the best performance on 64-bit processors, with a 32-byte (256-bit) digest for space efficiency.
- Replace MD5 with Blake2b, using a 32-byte (256-bit) digest output.
- To scale better on multi-core processors, move checksum calculation from the WAL thread to client threads.
- Update examples/simple to store larger values (8KB).
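The second item above can be sketched as follows: each client thread computes the checksum for its own value and sends the pair over a channel, so the single WAL thread only appends. This is an illustration under assumptions, not the actual hibari-brick-rs code. `WalEntry`, `stand_in_checksum`, and `run_clients` are hypothetical names, and the checksum is a std-only stand-in (SipHash) where the real code would use Blake2b with a 32-byte digest.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;
use std::sync::mpsc;
use std::thread;

/// A value plus the checksum its client already computed (hypothetical type).
struct WalEntry {
    value: Vec<u8>,
    checksum: u64,
}

/// Stand-in checksum (SipHash from std, NOT a secure digest).
fn stand_in_checksum(data: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    hasher.write(data);
    hasher.finish()
}

/// Runs `clients` client threads against one WAL thread and returns how
/// many appended entries still match their checksum.
fn run_clients(clients: usize, writes_per_client: usize) -> usize {
    let (tx, rx) = mpsc::channel::<WalEntry>();

    // WAL thread: appends entries in arrival order and does no hashing.
    let wal = thread::spawn(move || {
        let mut log: Vec<WalEntry> = Vec::new();
        for entry in rx {
            log.push(entry);
        }
        log
    });

    // Client threads: the CPU-heavy checksum runs here, in parallel.
    let handles: Vec<_> = (0..clients)
        .map(|id| {
            let tx = tx.clone();
            thread::spawn(move || {
                for n in 0..writes_per_client {
                    let value = vec![(id + n) as u8; 8 * 1024];
                    let checksum = stand_in_checksum(&value);
                    tx.send(WalEntry { value, checksum })
                        .expect("WAL thread should still be receiving");
                }
            })
        })
        .collect();

    // Drop the original sender so the channel closes once all clients finish.
    drop(tx);
    for handle in handles {
        handle.join().unwrap();
    }

    let log = wal.join().unwrap();
    log.iter()
        .filter(|entry| stand_in_checksum(&entry.value) == entry.checksum)
        .count()
}

fn main() {
    let verified = run_clients(4, 100);
    println!("WAL appended {} verified entries", verified);
}
```

With this split, hashing cost grows with the number of client threads rather than being serialized on the WAL thread.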
To verify data integrity in blobs (large binary values), we currently use an MD5 digest. The MD5 algorithm was chosen because:
However, MD5 is no longer recommended as a secure hash because hash collisions can now be created easily (example).
Replace MD5 with a safe hasher, such as SHA-2, SHA-3 or Blake2. This is not decided yet, but we may prefer a 256-bit digest over a 512-bit digest to save storage space (so SHA-256, SHA3-256, or Blake2s).
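Since the choice of algorithm is still open, one way to keep it swappable is to hide the digest behind a small trait. The sketch below is only an illustration: `BlobChecksum`, `StoredBlob`, `store`, and `verify` are hypothetical names, not existing hibari-brick-rs APIs, and `SipStandIn` is a std-only placeholder (SipHash, 8-byte output, not a secure hash) standing in for SHA-256, SHA3-256, or Blake2.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Hypothetical abstraction over the checksum algorithm used for blobs.
trait BlobChecksum {
    fn name(&self) -> &'static str;
    fn digest(&self, data: &[u8]) -> Vec<u8>;
}

/// Placeholder implementation using SipHash from std.
/// NOT a secure hash; a real implementation would wrap a crypto crate.
struct SipStandIn;

impl BlobChecksum for SipStandIn {
    fn name(&self) -> &'static str {
        "siphash-stand-in"
    }

    fn digest(&self, data: &[u8]) -> Vec<u8> {
        let mut hasher = DefaultHasher::new();
        hasher.write(data);
        hasher.finish().to_be_bytes().to_vec()
    }
}

/// A blob as it would sit in storage: the value plus its checksum.
struct StoredBlob {
    value: Vec<u8>,
    checksum: Vec<u8>,
}

/// Computes the checksum at write time.
fn store(hasher: &dyn BlobChecksum, value: Vec<u8>) -> StoredBlob {
    let checksum = hasher.digest(&value);
    StoredBlob { value, checksum }
}

/// Recomputes the checksum at read time and compares it with the stored one.
fn verify(hasher: &dyn BlobChecksum, blob: &StoredBlob) -> bool {
    hasher.digest(&blob.value) == blob.checksum
}

fn main() {
    let hasher = SipStandIn;
    let mut blob = store(&hasher, b"hello hibari".to_vec());
    println!("{}: intact = {}", hasher.name(), verify(&hasher, &blob));

    blob.value[0] ^= 0x01; // simulate a single flipped bit on disk
    println!("{}: intact = {}", hasher.name(), verify(&hasher, &blob));
}
```

Returning the digest as `Vec<u8>` lets 256-bit and 512-bit algorithms share one interface; a fixed-size array would be cheaper once the digest length is settled.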