-
Notifications
You must be signed in to change notification settings - Fork 17.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
crypto/md5: optimize amd64 assembly #43690
base: master
Are you sure you want to change the base?
Conversation
* Use two ADDL instead of LEAL * Keep ones in R11 * Use XORL with lower latency instead of NOTL * Remove loads and load the correct value in the previous round ``` name old time/op new time/op delta Hash8Bytes-32 106ns ± 0% 103ns ± 0% -2.37% (p=0.000 n=10+10) Hash1K-32 1.33µs ± 0% 1.26µs ± 0% -4.78% (p=0.000 n=8+10) Hash8K-32 9.95µs ± 0% 9.46µs ± 0% -4.90% (p=0.000 n=10+10) Hash8BytesUnaligned-32 106ns ± 0% 103ns ± 0% -2.37% (p=0.000 n=10+10) Hash1KUnaligned-32 1.33µs ± 0% 1.26µs ± 0% -4.76% (p=0.000 n=10+9) Hash8KUnaligned-32 10.0µs ± 0% 9.5µs ± 0% -4.88% (p=0.000 n=10+10) name old speed new speed delta Hash8Bytes-32 75.8MB/s ± 0% 77.8MB/s ± 0% +2.70% (p=0.000 n=10+10) Hash1K-32 772MB/s ± 0% 810MB/s ± 0% +4.99% (p=0.000 n=9+10) Hash8K-32 823MB/s ± 0% 866MB/s ± 0% +5.15% (p=0.000 n=10+10) Hash8BytesUnaligned-32 75.8MB/s ± 0% 77.8MB/s ± 0% +2.64% (p=0.000 n=10+10) Hash1KUnaligned-32 771MB/s ± 0% 810MB/s ± 0% +4.96% (p=0.000 n=10+10) Hash8KUnaligned-32 823MB/s ± 0% 866MB/s ± 0% +5.13% (p=0.000 n=10+10) ``` Change-Id: I02b31229b857e9257dc9d36538883eb3af4ad993
This PR (HEAD: d8ec5eb) has been imported to Gerrit for code review. Please visit https://go-review.googlesource.com/c/go/+/283538 to see it. Tip: You can toggle comments from me using the |
Change-Id: I1a29b5bb761ca9351de482f5167c9ee1eb9a6be6
This PR (HEAD: 4ed3895) has been imported to Gerrit for code review. Please visit https://go-review.googlesource.com/c/go/+/283538 to see it. Tip: You can toggle comments from me using the |
Change-Id: I63749bfd31917c1924ee559caff987f48dd44374
4ed3895
to
ab492a9
Compare
This PR (HEAD: ab492a9) has been imported to Gerrit for code review. Please visit https://go-review.googlesource.com/c/go/+/283538 to see it. Tip: You can toggle comments from me using the |
Change-Id: Iad303a2123a16636297ae7f701d03916a92357a7
This PR (HEAD: ec8b15d) has been imported to Gerrit for code review. Please visit https://go-review.googlesource.com/c/go/+/283538 to see it. Tip: You can toggle comments from me using the |
* Use two ADDL instead of LEAL * Keep ones in R11 * Use XORL with lower latency instead of NOTL * Remove loads and load the correct value in the previous round * Reduce dependency chain in round 2. * Remove MOVL in round 3. name old time/op new time/op delta Hash8Bytes-32 104ns ± 0% 96ns ± 1% -7.83% (p=0.000 n=9+10) Hash64-32 169ns ± 0% 155ns ± 0% -7.97% (p=0.000 n=10+10) Hash128-32 244ns ± 0% 224ns ± 0% -8.16% (p=0.000 n=9+10) Hash256-32 396ns ± 0% 360ns ± 1% -9.01% (p=0.000 n=10+10) Hash512-32 700ns ± 1% 634ns ± 1% -9.43% (p=0.000 n=10+10) Hash1K-32 1.30µs ± 0% 1.18µs ± 1% -9.32% (p=0.000 n=9+10) Hash8K-32 9.77µs ± 0% 8.81µs ± 0% -9.78% (p=0.000 n=9+10) Hash1M-32 1.24ms ± 1% 1.12ms ± 1% -9.54% (p=0.000 n=10+10) Hash8M-32 10.0ms ± 1% 9.0ms ± 1% -10.04% (p=0.000 n=10+10) Hash8BytesUnaligned-32 104ns ± 0% 96ns ± 0% -7.50% (p=0.000 n=10+10) Hash1KUnaligned-32 1.32µs ± 1% 1.18µs ± 1% -10.42% (p=0.000 n=10+10) Hash8KUnaligned-32 9.80µs ± 0% 8.79µs ± 1% -10.29% (p=0.000 n=10+10) name old speed new speed delta Hash8Bytes-32 77.1MB/s ± 0% 83.6MB/s ± 1% +8.49% (p=0.000 n=9+10) Hash64-32 379MB/s ± 0% 412MB/s ± 0% +8.66% (p=0.000 n=10+10) Hash128-32 525MB/s ± 0% 572MB/s ± 0% +8.89% (p=0.000 n=9+10) Hash256-32 646MB/s ± 0% 710MB/s ± 1% +9.90% (p=0.000 n=10+10) Hash512-32 732MB/s ± 1% 808MB/s ± 1% +10.41% (p=0.000 n=10+10) Hash1K-32 786MB/s ± 0% 866MB/s ± 1% +10.30% (p=0.000 n=9+10) Hash8K-32 839MB/s ± 0% 930MB/s ± 0% +10.79% (p=0.000 n=10+10) Hash1M-32 849MB/s ± 1% 938MB/s ± 1% +10.54% (p=0.000 n=10+10) Hash8M-32 841MB/s ± 1% 935MB/s ± 1% +11.16% (p=0.000 n=10+10) Hash8BytesUnaligned-32 77.1MB/s ± 0% 83.4MB/s ± 0% +8.12% (p=0.000 n=10+10) Hash1KUnaligned-32 778MB/s ± 1% 869MB/s ± 1% +11.64% (p=0.000 n=10+10) Hash8KUnaligned-32 836MB/s ± 0% 932MB/s ± 1% +11.47% (p=0.000 n=10+10) Change-Id: I02b31229b857e9257dc9d36538883eb3af4ad993 This PR will be imported into Gerrit with the title and first comment (this text) used to generate the subject and body of the Gerrit change. Change-Id: I02b31229b857e9257dc9d36538883eb3af4ad993 GitHub-Last-Rev: ec8b15d GitHub-Pull-Request: #43690 Reviewed-on: https://go-review.googlesource.com/c/go/+/283538 Run-TryBot: Joel Sing <[email protected]> Reviewed-by: Matthew Dempsky <[email protected]> Reviewed-by: David Chase <[email protected]> TryBot-Result: Gopher Robot <[email protected]> Reviewed-by: Joel Sing <[email protected]>
* Use two ADDL instead of LEAL * Keep ones in R11 * Use XORL with lower latency instead of NOTL * Remove loads and load the correct value in the previous round * Reduce dependency chain in round 2. * Remove MOVL in round 3. name old time/op new time/op delta Hash8Bytes-32 104ns ± 0% 96ns ± 1% -7.83% (p=0.000 n=9+10) Hash64-32 169ns ± 0% 155ns ± 0% -7.97% (p=0.000 n=10+10) Hash128-32 244ns ± 0% 224ns ± 0% -8.16% (p=0.000 n=9+10) Hash256-32 396ns ± 0% 360ns ± 1% -9.01% (p=0.000 n=10+10) Hash512-32 700ns ± 1% 634ns ± 1% -9.43% (p=0.000 n=10+10) Hash1K-32 1.30µs ± 0% 1.18µs ± 1% -9.32% (p=0.000 n=9+10) Hash8K-32 9.77µs ± 0% 8.81µs ± 0% -9.78% (p=0.000 n=9+10) Hash1M-32 1.24ms ± 1% 1.12ms ± 1% -9.54% (p=0.000 n=10+10) Hash8M-32 10.0ms ± 1% 9.0ms ± 1% -10.04% (p=0.000 n=10+10) Hash8BytesUnaligned-32 104ns ± 0% 96ns ± 0% -7.50% (p=0.000 n=10+10) Hash1KUnaligned-32 1.32µs ± 1% 1.18µs ± 1% -10.42% (p=0.000 n=10+10) Hash8KUnaligned-32 9.80µs ± 0% 8.79µs ± 1% -10.29% (p=0.000 n=10+10) name old speed new speed delta Hash8Bytes-32 77.1MB/s ± 0% 83.6MB/s ± 1% +8.49% (p=0.000 n=9+10) Hash64-32 379MB/s ± 0% 412MB/s ± 0% +8.66% (p=0.000 n=10+10) Hash128-32 525MB/s ± 0% 572MB/s ± 0% +8.89% (p=0.000 n=9+10) Hash256-32 646MB/s ± 0% 710MB/s ± 1% +9.90% (p=0.000 n=10+10) Hash512-32 732MB/s ± 1% 808MB/s ± 1% +10.41% (p=0.000 n=10+10) Hash1K-32 786MB/s ± 0% 866MB/s ± 1% +10.30% (p=0.000 n=9+10) Hash8K-32 839MB/s ± 0% 930MB/s ± 0% +10.79% (p=0.000 n=10+10) Hash1M-32 849MB/s ± 1% 938MB/s ± 1% +10.54% (p=0.000 n=10+10) Hash8M-32 841MB/s ± 1% 935MB/s ± 1% +11.16% (p=0.000 n=10+10) Hash8BytesUnaligned-32 77.1MB/s ± 0% 83.4MB/s ± 0% +8.12% (p=0.000 n=10+10) Hash1KUnaligned-32 778MB/s ± 1% 869MB/s ± 1% +11.64% (p=0.000 n=10+10) Hash8KUnaligned-32 836MB/s ± 0% 932MB/s ± 1% +11.47% (p=0.000 n=10+10) Change-Id: I02b31229b857e9257dc9d36538883eb3af4ad993 This PR will be imported into Gerrit with the title and first comment (this text) used to generate the subject and body of the Gerrit change. Change-Id: I02b31229b857e9257dc9d36538883eb3af4ad993 GitHub-Last-Rev: ec8b15d789181d0dac57bf0ba5041ee7aeb305c9 GitHub-Pull-Request: golang/go#43690 Reviewed-on: https://go-review.googlesource.com/c/go/+/283538 Run-TryBot: Joel Sing <[email protected]> Reviewed-by: Matthew Dempsky <[email protected]> Reviewed-by: David Chase <[email protected]> TryBot-Result: Gopher Robot <[email protected]> Reviewed-by: Joel Sing <[email protected]>
name old time/op new time/op delta
Hash8Bytes-32 104ns ± 0% 96ns ± 1% -7.83% (p=0.000 n=9+10)
Hash64-32 169ns ± 0% 155ns ± 0% -7.97% (p=0.000 n=10+10)
Hash128-32 244ns ± 0% 224ns ± 0% -8.16% (p=0.000 n=9+10)
Hash256-32 396ns ± 0% 360ns ± 1% -9.01% (p=0.000 n=10+10)
Hash512-32 700ns ± 1% 634ns ± 1% -9.43% (p=0.000 n=10+10)
Hash1K-32 1.30µs ± 0% 1.18µs ± 1% -9.32% (p=0.000 n=9+10)
Hash8K-32 9.77µs ± 0% 8.81µs ± 0% -9.78% (p=0.000 n=9+10)
Hash1M-32 1.24ms ± 1% 1.12ms ± 1% -9.54% (p=0.000 n=10+10)
Hash8M-32 10.0ms ± 1% 9.0ms ± 1% -10.04% (p=0.000 n=10+10)
Hash8BytesUnaligned-32 104ns ± 0% 96ns ± 0% -7.50% (p=0.000 n=10+10)
Hash1KUnaligned-32 1.32µs ± 1% 1.18µs ± 1% -10.42% (p=0.000 n=10+10)
Hash8KUnaligned-32 9.80µs ± 0% 8.79µs ± 1% -10.29% (p=0.000 n=10+10)
name old speed new speed delta
Hash8Bytes-32 77.1MB/s ± 0% 83.6MB/s ± 1% +8.49% (p=0.000 n=9+10)
Hash64-32 379MB/s ± 0% 412MB/s ± 0% +8.66% (p=0.000 n=10+10)
Hash128-32 525MB/s ± 0% 572MB/s ± 0% +8.89% (p=0.000 n=9+10)
Hash256-32 646MB/s ± 0% 710MB/s ± 1% +9.90% (p=0.000 n=10+10)
Hash512-32 732MB/s ± 1% 808MB/s ± 1% +10.41% (p=0.000 n=10+10)
Hash1K-32 786MB/s ± 0% 866MB/s ± 1% +10.30% (p=0.000 n=9+10)
Hash8K-32 839MB/s ± 0% 930MB/s ± 0% +10.79% (p=0.000 n=10+10)
Hash1M-32 849MB/s ± 1% 938MB/s ± 1% +10.54% (p=0.000 n=10+10)
Hash8M-32 841MB/s ± 1% 935MB/s ± 1% +11.16% (p=0.000 n=10+10)
Hash8BytesUnaligned-32 77.1MB/s ± 0% 83.4MB/s ± 0% +8.12% (p=0.000 n=10+10)
Hash1KUnaligned-32 778MB/s ± 1% 869MB/s ± 1% +11.64% (p=0.000 n=10+10)
Hash8KUnaligned-32 836MB/s ± 0% 932MB/s ± 1% +11.47% (p=0.000 n=10+10)
Change-Id: I02b31229b857e9257dc9d36538883eb3af4ad993
This PR will be imported into Gerrit with the title and first
comment (this text) used to generate the subject and body of
the Gerrit change.