-
Notifications
You must be signed in to change notification settings - Fork 17.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
x/crypto/poly1305: implement a subset of the hash.Hash interface #25219
Comments
Poly1305 is a MAC and not Hash, so the naming would be misleading (although there is precedent in HMAC), it can't be reused with the same key, so Reset would be pointless, and can't be reused after calling Sum as it would be a key reuse. I don't think |
I completely agree with Filippo here. That's exactly the reason why I did not implement |
SGTM. I think I'd use the function name
Yeah, I'm hoping this and the |
Sure. But panic on anything after it is called once.
You have all my support to make that happen BTW. Reducing the need for and amount of assembly is an explicit goal/wishlist item of mine. Let me know if you need anything from my side in that sense. |
Change https://golang.org/cl/111335 mentions this issue: |
This CL adds the poly1305.MAC type which implements a subset of the hash.Hash interface. With MAC it is possible to compute an authentication tag of data without copying it into a single byte slice. This commit modifies the reference/generic and the AMD64 assembler but not the ARM/s390x implementation to support an io.Writer interface. Updates golang/go#25219 Change-Id: I7ee5a9eadd43387cf3cd887d734c625575eee47d Reviewed-on: https://go-review.googlesource.com/c/crypto/+/111335 Run-TryBot: Filippo Valsorda <[email protected]> TryBot-Result: Gobot Gobot <[email protected]> Reviewed-by: Filippo Valsorda <[email protected]>
This is still missing the arm and s390x implementations. |
Change https://golang.org/cl/219057 mentions this issue: |
The vector (vx) implementation has been updated to read in the state and update it - as opposed to being a single shot function. This has allowed the new MAC interface can be implemented. For performance reasons s390x uses a larger buffer than the generic implementation. There is a relatively high fixed cost to read the state, calculate the key coefficients and serialize the state, so it makes sense to buffer more blocks before calling it. For now I've had to remove the faster VMSL implementation. It is too complex for me to update in time for Go 1.15. At some point I'd like to revisit it but for now it looks like using the MAC interface is more of a win than using VMSL. The benchmarks show considerable improvements when using the MAC interface. The Sum benchmarks show slowdown due to a combination of the removal of the VMSL implementation and also the added overhead from splitting the summation function into multiple parts. poly1305: name old speed new speed delta 64 1.33GB/s ± 0% 0.80GB/s ± 1% -39.51% (p=0.000 n=16+20) 1K 4.04GB/s ± 0% 2.97GB/s ± 0% -26.46% (p=0.000 n=19+19) 2M 5.32GB/s ± 1% 3.63GB/s ± 0% -31.76% (p=0.000 n=20+19) 64Unaligned 1.33GB/s ± 0% 0.80GB/s ± 0% -39.80% (p=0.000 n=19+18) 1KUnaligned 4.09GB/s ± 1% 2.94GB/s ± 0% -28.23% (p=0.000 n=19+18) 2MUnaligned 5.33GB/s ± 1% 3.52GB/s ± 0% -34.04% (p=0.000 n=20+19) Write64 1.03GB/s ± 1% 1.49GB/s ± 1% +44.34% (p=0.000 n=20+20) Write1K 1.21GB/s ± 0% 3.24GB/s ± 0% +169.02% (p=0.000 n=20+17) Write2M 1.24GB/s ± 1% 3.63GB/s ± 0% +192.36% (p=0.000 n=20+19) Write64Unaligned 1.04GB/s ± 1% 1.50GB/s ± 0% +44.16% (p=0.000 n=19+14) Write1KUnaligned 1.21GB/s ± 0% 3.20GB/s ± 0% +164.55% (p=0.000 n=20+16) Write2MUnaligned 1.24GB/s ± 1% 3.51GB/s ± 0% +183.96% (p=0.000 n=20+19) chacha20poly1305 (this vs. using generic MAC interface - post CL 206977): name old speed new speed delta Open-64 147MB/s ± 2% 156MB/s ± 1% +6.15% (p=0.000 n=20+19) Seal-64 151MB/s ± 0% 164MB/s ± 1% +8.86% (p=0.000 n=19+16) Open-64-X 104MB/s ± 2% 111MB/s ± 1% +6.24% (p=0.000 n=20+20) Seal-64-X 109MB/s ± 2% 111MB/s ± 1% +2.11% (p=0.000 n=20+19) Open-1350 555MB/s ± 0% 751MB/s ± 1% +35.19% (p=0.000 n=20+20) Seal-1350 557MB/s ± 0% 759MB/s ± 0% +36.23% (p=0.000 n=20+20) Open-1350-X 517MB/s ± 1% 683MB/s ± 1% +31.97% (p=0.000 n=20+20) Seal-1350-X 511MB/s ± 0% 683MB/s ± 0% +33.77% (p=0.000 n=18+19) Open-8192 672MB/s ± 0% 1013MB/s ± 0% +50.65% (p=0.000 n=19+19) Seal-8192 674MB/s ± 0% 1018MB/s ± 0% +50.98% (p=0.000 n=18+20) Open-8192-X 663MB/s ± 0% 979MB/s ± 0% +47.57% (p=0.000 n=20+20) Seal-8192-X 658MB/s ± 0% 985MB/s ± 0% +49.62% (p=0.000 n=18+20) name old allocs/op new allocs/op delta Open-64 0.00 0.00 ~ (all equal) Seal-64 0.00 0.00 ~ (all equal) Open-64-X 0.00 0.00 ~ (all equal) Seal-64-X 0.00 0.00 ~ (all equal) Open-1350 0.00 0.00 ~ (all equal) Seal-1350 0.00 0.00 ~ (all equal) Open-1350-X 0.00 0.00 ~ (all equal) Seal-1350-X 0.00 0.00 ~ (all equal) Open-8192 0.00 0.00 ~ (all equal) Seal-8192 0.00 0.00 ~ (all equal) Open-8192-X 0.00 0.00 ~ (all equal) Seal-8192-X 0.00 0.00 ~ (all equal) chacha20poly1305 (this vs. using asm Sum interface - pre CL 206977): name old speed new speed delta Open-64 144MB/s ± 0% 156MB/s ± 1% +8.16% (p=0.000 n=20+19) Seal-64 150MB/s ± 0% 164MB/s ± 1% +9.35% (p=0.000 n=20+16) Open-64-X 104MB/s ± 1% 111MB/s ± 1% +6.15% (p=0.000 n=19+20) Seal-64-X 109MB/s ± 1% 111MB/s ± 1% +1.43% (p=0.000 n=19+19) Open-1350 702MB/s ± 1% 751MB/s ± 1% +6.98% (p=0.000 n=20+20) Seal-1350 715MB/s ± 0% 759MB/s ± 0% +6.09% (p=0.000 n=19+20) Open-1350-X 642MB/s ± 0% 683MB/s ± 1% +6.37% (p=0.000 n=19+20) Seal-1350-X 639MB/s ± 0% 683MB/s ± 0% +6.98% (p=0.000 n=20+19) Open-8192 994MB/s ± 0% 1013MB/s ± 0% +1.85% (p=0.000 n=20+19) Seal-8192 1.00GB/s ± 0% 1.02GB/s ± 0% +1.90% (p=0.000 n=20+20) Open-8192-X 965MB/s ± 0% 979MB/s ± 0% +1.43% (p=0.000 n=19+20) Seal-8192-X 962MB/s ± 0% 985MB/s ± 0% +2.39% (p=0.000 n=20+20) name old allocs/op new allocs/op delta Open-64 1.00 ± 0% 0.00 -100.00% (p=0.000 n=20+20) Seal-64 1.00 ± 0% 0.00 -100.00% (p=0.000 n=20+20) Open-64-X 1.00 ± 0% 0.00 -100.00% (p=0.000 n=20+20) Seal-64-X 1.00 ± 0% 0.00 -100.00% (p=0.000 n=20+20) Open-1350 1.00 ± 0% 0.00 -100.00% (p=0.000 n=20+20) Seal-1350 1.00 ± 0% 0.00 -100.00% (p=0.000 n=20+20) Open-1350-X 1.00 ± 0% 0.00 -100.00% (p=0.000 n=20+20) Seal-1350-X 1.00 ± 0% 0.00 -100.00% (p=0.000 n=20+20) Open-8192 1.00 ± 0% 0.00 -100.00% (p=0.000 n=20+20) Seal-8192 1.00 ± 0% 0.00 -100.00% (p=0.000 n=20+20) Open-8192-X 1.00 ± 0% 0.00 -100.00% (p=0.000 n=20+20) Seal-8192-X 1.00 ± 0% 0.00 -100.00% (p=0.000 n=20+20) Updates golang/go#25219. Change-Id: Ib491e3a47b6b3ec8bbbe1f41f7bf42ad82f5c249 Reviewed-on: https://go-review.googlesource.com/c/crypto/+/219057 Run-TryBot: Michael Munday <[email protected]> TryBot-Result: Gobot Gobot <[email protected]> Reviewed-by: Filippo Valsorda <[email protected]>
I think CL 219057 was the final part of this so closing the issue. |
Currently the interface to
poly1305
,func Sum(out *[TagSize]byte, msg []byte, key *[32]byte)
, forces the user to copy everything they want to sum into a single slice (msg
). It would be nice to have a way to append multiple slices and then generate the sum.The main motivation would be to avoid an allocation in the
x/crypto/chacha20poly1305
AEAD. We could avoid adding new API surface by implementing this in something likeinternal/poly1305
but I thought I'd create this proposal to see if we could just add it to thepoly1305
package. I'm imagining the API would be used something like (not tested):For reference this is the current code:
https://github.com/golang/crypto/blob/613d6eafa307c6881a737a3c35c0e312e8d3a8c5/chacha20poly1305/chacha20poly1305_generic.go#L31-L39
Also, @aead's
poly1305
library already implements [EDIT: a subset of]hash.Hash
if you want to take a look: https://github.com/aead/poly1305.The text was updated successfully, but these errors were encountered: