aes-gcm: performance is worse than OpenSSL #243
Comments
Presently you need to enable RUSTFLAGS as described here for optimum performance: https://docs.rs/aes-gcm/0.8.0/aes_gcm/#performance-notes We are working on and have partially implemented autodetection support for these CPU features which will eliminate the need to manually configure RUSTFLAGS and will be available in the next release. |
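For readers unfamiliar with what "autodetection" of CPU features means here, the following is a minimal sketch using only the standard library's `is_x86_feature_detected!` macro. It is not the crate's actual implementation; the function name `detected_features` and the choice of features to probe are assumptions made for illustration.

```rust
/// Returns the names of AES-GCM-relevant CPU features found at runtime.
/// Hypothetical helper for illustration; not part of the aes-gcm crate.
#[allow(unused_mut)]
fn detected_features() -> Vec<&'static str> {
    let mut found = Vec::new();
    // The detection macro only exists on x86/x86_64 targets.
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if std::is_x86_feature_detected!("aes") {
            found.push("aes");
        }
        if std::is_x86_feature_detected!("pclmulqdq") {
            found.push("pclmulqdq");
        }
        if std::is_x86_feature_detected!("ssse3") {
            found.push("ssse3");
        }
        if std::is_x86_feature_detected!("sse4.1") {
            found.push("sse4.1");
        }
    }
    found
}

fn main() {
    // On non-x86 targets this prints an empty list.
    println!("detected: {:?}", detected_features());
}
```

With a runtime check like this, a library can pick a hardware-accelerated code path without the user setting RUSTFLAGS at build time.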
Well, it was built with RUSTFLAGS. Surprisingly the performance is approximately 50% in encryption and 30% in decryption compared to OpenSSL. |
I'm not sure that much of a difference deserves the qualifier "much". We've presently been working on features like CPU feature autodetection (which are important) and haven't heavily invested in micro-optimization. OpenSSL uses heavily optimized hand-written assembly implementations (in the case of AES-GCM, written by cryptography engineers at Intel), so reaching performance parity with those (especially in pure Rust) will be difficult. |
If anyone would like to work on improving AES-GCM performance, #74 might be a good start |
Also note: for optimum performance, pass This will significantly improve performance on Skylake, where LLVM will use the VPCLMULQDQ instruction for GHASH. |
In my experience |
I didn't see any statistically significant difference on iMac 2019, thanks anyway :)
|
Then maybe you can share your bench code. My bench code: LuoZijun/crypto-bench Bench Result: X86-64:
AArch64:
|
In #243 (comment), 16B data is used for AES-GCM tests. I bumped the data size to 8 KiB, updated all crates to the latest version, and reran some of the tests. On i5-7400 (avx2):
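To illustrate why message size matters in these comparisons, here is a rough sketch of a throughput measurement at 16 B versus 8 KiB. The function `stand_in_encrypt` is a hypothetical placeholder (a simple XOR loop), not a real cipher and not the benchmark code used in this thread; a real AEAD call would go in its place. With very small messages, per-call overhead tends to dominate the measurement.

```rust
use std::hint::black_box;
use std::time::Instant;

/// Hypothetical stand-in for an in-place encryption call (NOT a real cipher).
fn stand_in_encrypt(buf: &mut [u8]) {
    for (i, b) in buf.iter_mut().enumerate() {
        *b ^= i as u8;
    }
}

/// Measures bytes processed per second for messages of `msg_len` bytes.
fn bytes_per_sec(msg_len: usize, iters: u32) -> f64 {
    let mut buf = vec![0u8; msg_len];
    let start = Instant::now();
    for _ in 0..iters {
        // black_box discourages the optimizer from removing the work.
        stand_in_encrypt(black_box(&mut buf));
    }
    let secs = start.elapsed().as_secs_f64();
    (msg_len as f64 * iters as f64) / secs
}

fn main() {
    for &len in &[16usize, 8 * 1024] {
        let rate = bytes_per_sec(len, 100_000);
        println!("{:>5} B messages: {:.1} MB/s", len, rate / 1e6);
    }
}
```

For real measurements, a harness such as the benches already linked in this thread is a better choice than a hand-rolled timing loop.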
On Intel(R) Xeon(R) Platinum 8272CL (avx512 w/o vaes, vpclmulqdq):
|
@Schmid7k we already have https://github.com/RustCrypto/AEADs/tree/master/benches |
@Schmid7k Also, curiously enough, AES-GCM should improve significantly once RustCrypto/traits#965 lands. |
After RustCrypto/traits#965 lands I can try implementing #74 again. If the code optimizes correctly it should double the performance. Also now that inline ASM is stable, we can add an |
@Schmid7k |
IIUC It is also possible that for some reason This is why I generally prefer to not rely on |
hey Rustycrypto, I think OpenSSL Performance is an unfair comparison; as @tarcieri noted earlier in this thread OpenSSL has a dedicated person writing hand crafted assembly for different instruction sets. With Perl scripts to take away the pain of updating to CPU specific feature novelties, variations and new models. OpenSSL is now a fairly well funded project for FOSS standards. That person actually fixes more bugs in OpenSSL than he ever introduced as well. So is it a good idea to do the same with an I had more to say but GitHub swallowed my original comment draft so that's it for now. PS: I don't see OCB anywhere :P Happy hacking, |
Are there any plans on improving performance? It's not only slow when compared to |
I think we're bottlenecked on the trait design of Without that we can't take advantage of pipelining between AES-NI and (P)CLMUL(QDQ), which would give us an expected 2X speedup, as it were. I had an issue for that here, which we should probably reopen: See also: #74 As I mentioned before in this issue, we could also include inline ASM implementations for certain platforms, gated under an |
Another option would be to add architecture-specific low-level APIs to crates like If we can get things performing well that way, I think it could help inform the overall trait design for RustCrypto/traits#444. |
2024 here - I'm seeing 5x slower CTR performance than Golang's standard library CTR implementation. In both of my test codebases, I'm using AES-256-CTR with 128-bit big endian encoding splitting the inputs into 4KB chunks and encrypting each chunk. On a 100 MB file filled with random data, my Go program encrypts it all in 146.5205ms, while Rust takes 630.505875ms. This comparison was done dozens of times on an M1 Pro MacBook and Go consistently outperforms Rust. |
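As a quick sanity check on the figures quoted above, the timings can be converted to throughput. This small sketch assumes "100 MB" means 100 units of whatever megabyte definition both programs share; the helper name `throughput_mb_per_s` is made up for illustration. By this arithmetic the gap works out to a bit over 4x.

```rust
/// Throughput in MB/s for `megabytes` processed in `millis` milliseconds.
fn throughput_mb_per_s(megabytes: f64, millis: f64) -> f64 {
    megabytes / (millis / 1000.0)
}

fn main() {
    // Timings as reported in the comment above.
    let go = throughput_mb_per_s(100.0, 146.5205);
    let rust = throughput_mb_per_s(100.0, 630.505875);
    println!("Go:    {:.1} MB/s", go);
    println!("Rust:  {:.1} MB/s", rust);
    println!("ratio: {:.2}x", go / rust);
}
```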
@httpjamesm can you please provide more information including code examples as well as the target architecture? The It would also be helpful if you could reduce your test case to the AES block function in the case of CTR and see if you still experience the problem, as CTR itself is unlikely to add much overhead. |
Note that the Linux kernel just gained a hand-crafted assembly implementation of x86-64 AES-GCM that's far smaller than OpenSSL's, just 8 kB of machine code, with performance on par with the OpenSSL implementation. The assembly code is heavily commented as well. Mayhaps there is something to be learned from there: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b06affb1cb58 |
Note there was a PR to add VAES support to the |
@intgr |
My thinking was not to use the source as is, but it could inform some ideas on how to design a high-performance implementation in a reasonable amount of code. But FWIW it's also cross-licensed, from the commit message:
|
As my test via cargo bench shows, the aes-gcm-256's performance is much worse:
It was built with
export RUSTFLAGS="-Ctarget-cpu=sandybridge -Ctarget-feature=+aes,+sse2,+sse4.1,+ssse3"
as documented.
For OpenSSL:
Environment:
iMac (Retina 5K, 27-inch, 2019), 3.7 GHz 6-Core Intel Core i5