crypto/md5: optimize amd64 assembly #43690

Open. klauspost wants to merge 4 commits into base: master.

@klauspost klauspost commented Jan 14, 2021

  • Use two ADDL instead of LEAL
  • Keep ones in R11
  • Use XORL with lower latency instead of NOTL
  • Remove loads and load the correct value in the previous round
  • Reduce dependency chain in round 2.
  • Remove MOVL in round 3.
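
The list above names the instruction-level changes without showing why they preserve the result. The following Go sketch checks the underlying arithmetic identities only; it is not the assembly from this PR. The function names are hypothetical, the sample values are arbitrary, and reading "reduce dependency chain in round 2" as "add the two halves of G separately" is an assumption about the approach rather than something the description states.

```go
package main

import (
	"fmt"
	"math/bits"
)

// allOnes plays the role of the all-ones constant kept in R11.
const allOnes uint32 = 0xffffffff

// notViaXor computes the bitwise complement with XOR against all ones,
// which is what XORL with such a register does in place of NOTL.
func notViaXor(x uint32) uint32 { return x ^ allOnes }

// addThree mirrors replacing one three-input LEAL (a + x + k) with two
// separate additions; the wrapped 32-bit sum is identical.
func addThree(a, x, k uint32) uint32 {
	a += k
	a += x
	return a
}

// gReference is the round 2 function from RFC 1321:
// (b AND d) OR (c AND NOT d).
func gReference(b, c, d uint32) uint32 { return (b & d) | (c &^ d) }

// round2Reference is one round 2 step written directly from the definition.
func round2Reference(a, b, c, d, x, k uint32, s int) uint32 {
	return bits.RotateLeft32(a+gReference(b, c, d)+x+k, s) + b
}

// round2Split adds the two halves of G to the accumulator separately.
// The halves never share a set bit, so OR and ADD agree, and neither
// addition has to wait for the other half to be computed (assumed reading).
func round2Split(a, b, c, d, x, k uint32, s int) uint32 {
	a = addThree(a, x, k)
	a += b & d
	a += c & notViaXor(d)
	return bits.RotateLeft32(a, s) + b
}

func main() {
	// Arbitrary sample inputs, chosen only to exercise the identities.
	a, b, c, d := uint32(0x67452301), uint32(0xefcdab89), uint32(0x98badcfe), uint32(0x10325476)
	x, k := uint32(0x61626380), uint32(0xf61e2562)
	fmt.Println(round2Split(a, b, c, d, x, k, 5) == round2Reference(a, b, c, d, x, k, 5))
}
```

Whether each rewrite is actually faster depends on the microarchitecture; the benchmarks below are the evidence for that, not this sketch.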

name                    old time/op    new time/op    delta
Hash8Bytes-32              104ns ± 0%      96ns ± 1%   -7.83%   (p=0.000 n=9+10)
Hash64-32                  169ns ± 0%     155ns ± 0%   -7.97%  (p=0.000 n=10+10)
Hash128-32                 244ns ± 0%     224ns ± 0%   -8.16%   (p=0.000 n=9+10)
Hash256-32                 396ns ± 0%     360ns ± 1%   -9.01%  (p=0.000 n=10+10)
Hash512-32                 700ns ± 1%     634ns ± 1%   -9.43%  (p=0.000 n=10+10)
Hash1K-32                 1.30µs ± 0%    1.18µs ± 1%   -9.32%   (p=0.000 n=9+10)
Hash8K-32                 9.77µs ± 0%    8.81µs ± 0%   -9.78%   (p=0.000 n=9+10)
Hash1M-32                 1.24ms ± 1%    1.12ms ± 1%   -9.54%  (p=0.000 n=10+10)
Hash8M-32                 10.0ms ± 1%     9.0ms ± 1%  -10.04%  (p=0.000 n=10+10)
Hash8BytesUnaligned-32     104ns ± 0%      96ns ± 0%   -7.50%  (p=0.000 n=10+10)
Hash1KUnaligned-32        1.32µs ± 1%    1.18µs ± 1%  -10.42%  (p=0.000 n=10+10)
Hash8KUnaligned-32        9.80µs ± 0%    8.79µs ± 1%  -10.29%  (p=0.000 n=10+10)

name                    old speed      new speed      delta
Hash8Bytes-32           77.1MB/s ± 0%  83.6MB/s ± 1%   +8.49%   (p=0.000 n=9+10)
Hash64-32                379MB/s ± 0%   412MB/s ± 0%   +8.66%  (p=0.000 n=10+10)
Hash128-32               525MB/s ± 0%   572MB/s ± 0%   +8.89%   (p=0.000 n=9+10)
Hash256-32               646MB/s ± 0%   710MB/s ± 1%   +9.90%  (p=0.000 n=10+10)
Hash512-32               732MB/s ± 1%   808MB/s ± 1%  +10.41%  (p=0.000 n=10+10)
Hash1K-32                786MB/s ± 0%   866MB/s ± 1%  +10.30%   (p=0.000 n=9+10)
Hash8K-32                839MB/s ± 0%   930MB/s ± 0%  +10.79%  (p=0.000 n=10+10)
Hash1M-32                849MB/s ± 1%   938MB/s ± 1%  +10.54%  (p=0.000 n=10+10)
Hash8M-32                841MB/s ± 1%   935MB/s ± 1%  +11.16%  (p=0.000 n=10+10)
Hash8BytesUnaligned-32  77.1MB/s ± 0%  83.4MB/s ± 0%   +8.12%  (p=0.000 n=10+10)
Hash1KUnaligned-32       778MB/s ± 1%   869MB/s ± 1%  +11.64%  (p=0.000 n=10+10)
Hash8KUnaligned-32       836MB/s ± 0%   932MB/s ± 1%  +11.47%  (p=0.000 n=10+10)

Change-Id: I02b31229b857e9257dc9d36538883eb3af4ad993

This PR will be imported into Gerrit with the title and first
comment (this text) used to generate the subject and body of
the Gerrit change.

* Use two ADDL instead of LEAL
* Keep ones in R11
* Use XORL with lower latency instead of NOTL
* Remove loads and load the correct value in the previous round

```
name                    old time/op    new time/op    delta
Hash8Bytes-32              106ns ± 0%     103ns ± 0%  -2.37%  (p=0.000 n=10+10)
Hash1K-32                 1.33µs ± 0%    1.26µs ± 0%  -4.78%   (p=0.000 n=8+10)
Hash8K-32                 9.95µs ± 0%    9.46µs ± 0%  -4.90%  (p=0.000 n=10+10)
Hash8BytesUnaligned-32     106ns ± 0%     103ns ± 0%  -2.37%  (p=0.000 n=10+10)
Hash1KUnaligned-32        1.33µs ± 0%    1.26µs ± 0%  -4.76%   (p=0.000 n=10+9)
Hash8KUnaligned-32        10.0µs ± 0%     9.5µs ± 0%  -4.88%  (p=0.000 n=10+10)

name                    old speed      new speed      delta
Hash8Bytes-32           75.8MB/s ± 0%  77.8MB/s ± 0%  +2.70%  (p=0.000 n=10+10)
Hash1K-32                772MB/s ± 0%   810MB/s ± 0%  +4.99%   (p=0.000 n=9+10)
Hash8K-32                823MB/s ± 0%   866MB/s ± 0%  +5.15%  (p=0.000 n=10+10)
Hash8BytesUnaligned-32  75.8MB/s ± 0%  77.8MB/s ± 0%  +2.64%  (p=0.000 n=10+10)
Hash1KUnaligned-32       771MB/s ± 0%   810MB/s ± 0%  +4.96%  (p=0.000 n=10+10)
Hash8KUnaligned-32       823MB/s ± 0%   866MB/s ± 0%  +5.13%  (p=0.000 n=10+10)
```
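
The rows above have the shape of a comparison between two Go benchmark runs. As a rough sketch of how figures of this kind can be produced, the program below times `md5.Sum` over a few buffer sizes with the standard `testing` package. The helper names `digestHex` and `hashBench` are hypothetical, and this is not the benchmark code from `crypto/md5`.

```go
package main

import (
	"crypto/md5"
	"fmt"
	"testing"
)

// digestHex returns the MD5 digest of data as lowercase hex. Checking a
// known digest first guards against timing a broken implementation.
func digestHex(data []byte) string {
	return fmt.Sprintf("%x", md5.Sum(data))
}

// hashBench returns a benchmark that hashes one size-byte buffer per
// iteration. b.SetBytes is what makes the testing package report MB/s.
func hashBench(size int) func(b *testing.B) {
	return func(b *testing.B) {
		buf := make([]byte, size)
		b.SetBytes(int64(size))
		b.ResetTimer()
		for i := 0; i < b.N; i++ {
			md5.Sum(buf)
		}
	}
}

func main() {
	// Sanity check against the well-known digest of "abc".
	fmt.Println(digestHex([]byte("abc")) == "900150983cd24fb0d6963f7d28e17f72")

	// Sizes chosen to resemble the Hash8Bytes, Hash1K and Hash8K rows.
	for _, size := range []int{8, 1024, 8192} {
		fmt.Printf("Hash%d\t%s\n", size, testing.Benchmark(hashBench(size)))
	}
}
```

Absolute numbers will differ by machine; only the before/after ratio on the same hardware is comparable to the table.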

Change-Id: I02b31229b857e9257dc9d36538883eb3af4ad993
@google-cla google-cla bot added the cla: yes label Jan 14, 2021
@gopherbot

This PR (HEAD: d8ec5eb) has been imported to Gerrit for code review.

Please visit https://go-review.googlesource.com/c/go/+/283538 to see it.

Tip: You can toggle comments from me using the comments slash command (e.g. /comments off)
See the Wiki page for more info

Change-Id: I1a29b5bb761ca9351de482f5167c9ee1eb9a6be6
@gopherbot

This PR (HEAD: 4ed3895) has been imported to Gerrit for code review.

Please visit https://go-review.googlesource.com/c/go/+/283538 to see it.

Change-Id: I63749bfd31917c1924ee559caff987f48dd44374
@gopherbot

This PR (HEAD: ab492a9) has been imported to Gerrit for code review.

Please visit https://go-review.googlesource.com/c/go/+/283538 to see it.

Change-Id: Iad303a2123a16636297ae7f701d03916a92357a7
@gopherbot

This PR (HEAD: ec8b15d) has been imported to Gerrit for code review.

Please visit https://go-review.googlesource.com/c/go/+/283538 to see it.

gopherbot pushed a commit that referenced this pull request Aug 4, 2023
Change-Id: I02b31229b857e9257dc9d36538883eb3af4ad993
GitHub-Last-Rev: ec8b15d
GitHub-Pull-Request: #43690
Reviewed-on: https://go-review.googlesource.com/c/go/+/283538
Run-TryBot: Joel Sing <[email protected]>
Reviewed-by: Matthew Dempsky <[email protected]>
Reviewed-by: David Chase <[email protected]>
TryBot-Result: Gopher Robot <[email protected]>
Reviewed-by: Joel Sing <[email protected]>
DecFox pushed a commit to ooni/oocrypto that referenced this pull request Nov 24, 2024
Change-Id: I02b31229b857e9257dc9d36538883eb3af4ad993
GitHub-Last-Rev: ec8b15d789181d0dac57bf0ba5041ee7aeb305c9
GitHub-Pull-Request: golang/go#43690
Reviewed-on: https://go-review.googlesource.com/c/go/+/283538
Run-TryBot: Joel Sing <[email protected]>
Reviewed-by: Matthew Dempsky <[email protected]>
Reviewed-by: David Chase <[email protected]>
TryBot-Result: Gopher Robot <[email protected]>
Reviewed-by: Joel Sing <[email protected]>