poly1305: modify s390x assembly to implement MAC interface
The vector (vx) implementation has been updated to read in the state
and update it, as opposed to being a single-shot function. This allows
the new MAC interface to be implemented.

For performance reasons s390x uses a larger buffer than the generic
implementation. There is a relatively high fixed cost to read the
state, calculate the key coefficients and serialize the state, so it
makes sense to buffer more blocks before calling it.

For now I've had to remove the faster VMSL implementation. It is too
complex for me to update in time for Go 1.15. At some point I'd like
to revisit it, but for now it looks like using the MAC interface is
more of a win than using VMSL.

The benchmarks show considerable improvements when using the MAC
interface. The Sum benchmarks show a slowdown due to a combination of
the removal of the VMSL implementation and the added overhead from
splitting the summation function into multiple parts.

poly1305:

name              old speed      new speed      delta
64                1.33GB/s ± 0%  0.80GB/s ± 1%   -39.51%  (p=0.000 n=16+20)
1K                4.04GB/s ± 0%  2.97GB/s ± 0%   -26.46%  (p=0.000 n=19+19)
2M                5.32GB/s ± 1%  3.63GB/s ± 0%   -31.76%  (p=0.000 n=20+19)
64Unaligned       1.33GB/s ± 0%  0.80GB/s ± 0%   -39.80%  (p=0.000 n=19+18)
1KUnaligned       4.09GB/s ± 1%  2.94GB/s ± 0%   -28.23%  (p=0.000 n=19+18)
2MUnaligned       5.33GB/s ± 1%  3.52GB/s ± 0%   -34.04%  (p=0.000 n=20+19)
Write64           1.03GB/s ± 1%  1.49GB/s ± 1%   +44.34%  (p=0.000 n=20+20)
Write1K           1.21GB/s ± 0%  3.24GB/s ± 0%  +169.02%  (p=0.000 n=20+17)
Write2M           1.24GB/s ± 1%  3.63GB/s ± 0%  +192.36%  (p=0.000 n=20+19)
Write64Unaligned  1.04GB/s ± 1%  1.50GB/s ± 0%   +44.16%  (p=0.000 n=19+14)
Write1KUnaligned  1.21GB/s ± 0%  3.20GB/s ± 0%  +164.55%  (p=0.000 n=20+16)
Write2MUnaligned  1.24GB/s ± 1%  3.51GB/s ± 0%  +183.96%  (p=0.000 n=20+19)

chacha20poly1305 (this vs. using generic MAC interface - post CL 206977):

name         old speed      new speed      delta
Open-64      147MB/s ± 2%   156MB/s ± 1%    +6.15%  (p=0.000 n=20+19)
Seal-64      151MB/s ± 0%   164MB/s ± 1%    +8.86%  (p=0.000 n=19+16)
Open-64-X    104MB/s ± 2%   111MB/s ± 1%    +6.24%  (p=0.000 n=20+20)
Seal-64-X    109MB/s ± 2%   111MB/s ± 1%    +2.11%  (p=0.000 n=20+19)
Open-1350    555MB/s ± 0%   751MB/s ± 1%   +35.19%  (p=0.000 n=20+20)
Seal-1350    557MB/s ± 0%   759MB/s ± 0%   +36.23%  (p=0.000 n=20+20)
Open-1350-X  517MB/s ± 1%   683MB/s ± 1%   +31.97%  (p=0.000 n=20+20)
Seal-1350-X  511MB/s ± 0%   683MB/s ± 0%   +33.77%  (p=0.000 n=18+19)
Open-8192    672MB/s ± 0%  1013MB/s ± 0%   +50.65%  (p=0.000 n=19+19)
Seal-8192    674MB/s ± 0%  1018MB/s ± 0%   +50.98%  (p=0.000 n=18+20)
Open-8192-X  663MB/s ± 0%   979MB/s ± 0%   +47.57%  (p=0.000 n=20+20)
Seal-8192-X  658MB/s ± 0%   985MB/s ± 0%   +49.62%  (p=0.000 n=18+20)

name         old allocs/op  new allocs/op  delta
Open-64      0.00           0.00           ~  (all equal)
Seal-64      0.00           0.00           ~  (all equal)
Open-64-X    0.00           0.00           ~  (all equal)
Seal-64-X    0.00           0.00           ~  (all equal)
Open-1350    0.00           0.00           ~  (all equal)
Seal-1350    0.00           0.00           ~  (all equal)
Open-1350-X  0.00           0.00           ~  (all equal)
Seal-1350-X  0.00           0.00           ~  (all equal)
Open-8192    0.00           0.00           ~  (all equal)
Seal-8192    0.00           0.00           ~  (all equal)
Open-8192-X  0.00           0.00           ~  (all equal)
Seal-8192-X  0.00           0.00           ~  (all equal)

chacha20poly1305 (this vs. using asm Sum interface - pre CL 206977):

name         old speed      new speed      delta
Open-64      144MB/s ± 0%   156MB/s ± 1%    +8.16%  (p=0.000 n=20+19)
Seal-64      150MB/s ± 0%   164MB/s ± 1%    +9.35%  (p=0.000 n=20+16)
Open-64-X    104MB/s ± 1%   111MB/s ± 1%    +6.15%  (p=0.000 n=19+20)
Seal-64-X    109MB/s ± 1%   111MB/s ± 1%    +1.43%  (p=0.000 n=19+19)
Open-1350    702MB/s ± 1%   751MB/s ± 1%    +6.98%  (p=0.000 n=20+20)
Seal-1350    715MB/s ± 0%   759MB/s ± 0%    +6.09%  (p=0.000 n=19+20)
Open-1350-X  642MB/s ± 0%   683MB/s ± 1%    +6.37%  (p=0.000 n=19+20)
Seal-1350-X  639MB/s ± 0%   683MB/s ± 0%    +6.98%  (p=0.000 n=20+19)
Open-8192    994MB/s ± 0%  1013MB/s ± 0%    +1.85%  (p=0.000 n=20+19)
Seal-8192    1.00GB/s ± 0%  1.02GB/s ± 0%   +1.90%  (p=0.000 n=20+20)
Open-8192-X  965MB/s ± 0%   979MB/s ± 0%    +1.43%  (p=0.000 n=19+20)
Seal-8192-X  962MB/s ± 0%   985MB/s ± 0%    +2.39%  (p=0.000 n=20+20)

name         old allocs/op  new allocs/op  delta
Open-64      1.00 ± 0%      0.00           -100.00%  (p=0.000 n=20+20)
Seal-64      1.00 ± 0%      0.00           -100.00%  (p=0.000 n=20+20)
Open-64-X    1.00 ± 0%      0.00           -100.00%  (p=0.000 n=20+20)
Seal-64-X    1.00 ± 0%      0.00           -100.00%  (p=0.000 n=20+20)
Open-1350    1.00 ± 0%      0.00           -100.00%  (p=0.000 n=20+20)
Seal-1350    1.00 ± 0%      0.00           -100.00%  (p=0.000 n=20+20)
Open-1350-X  1.00 ± 0%      0.00           -100.00%  (p=0.000 n=20+20)
Seal-1350-X  1.00 ± 0%      0.00           -100.00%  (p=0.000 n=20+20)
Open-8192    1.00 ± 0%      0.00           -100.00%  (p=0.000 n=20+20)
Seal-8192    1.00 ± 0%      0.00           -100.00%  (p=0.000 n=20+20)
Open-8192-X  1.00 ± 0%      0.00           -100.00%  (p=0.000 n=20+20)
Seal-8192-X  1.00 ± 0%      0.00           -100.00%  (p=0.000 n=20+20)

Updates golang/go#25219.

Change-Id: Ib491e3a47b6b3ec8bbbe1f41f7bf42ad82f5c249
Reviewed-on: https://go-review.googlesource.com/c/crypto/+/219057
Run-TryBot: Michael Munday <[email protected]>
TryBot-Result: Gobot Gobot <[email protected]>
Reviewed-by: Filippo Valsorda <[email protected]>