-
Notifications
You must be signed in to change notification settings - Fork 125
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Assembler version of Sum64String #42
Conversation
41a2696
to
a16de4a
Compare
This makes Sum64String somewhere between a bit and a lot faster for short strings: name old time/op new time/op delta Hashes/xxhash,direct,bytes,n=5B-8 5.96ns ± 1% 5.92ns ± 1% -0.56% (p=0.000 n=17+18) Hashes/xxhash,direct,string,n=5B-8 8.71ns ± 1% 5.91ns ± 0% -32.15% (p=0.000 n=19+19) Hashes/xxhash,direct,bytes,n=100B-8 19.2ns ± 2% 18.9ns ± 1% -1.25% (p=0.000 n=18+18) Hashes/xxhash,direct,string,n=100B-8 20.3ns ± 1% 17.7ns ± 1% -12.74% (p=0.000 n=17+20) Hashes/xxhash,direct,bytes,n=4KB-8 278ns ± 1% 276ns ± 1% -0.54% (p=0.000 n=19+16) Hashes/xxhash,direct,string,n=4KB-8 280ns ± 2% 276ns ± 1% -1.35% (p=0.000 n=18+20) Hashes/xxhash,direct,bytes,n=10MB-8 767µs ± 2% 766µs ± 1% ~ (p=0.775 n=18+19) Hashes/xxhash,direct,string,n=10MB-8 773µs ± 1% 775µs ± 2% ~ (p=0.496 n=20+19) name old speed new speed delta Hashes/xxhash,direct,bytes,n=5B-8 839MB/s ± 1% 844MB/s ± 1% +0.55% (p=0.000 n=17+18) Hashes/xxhash,direct,string,n=5B-8 574MB/s ± 1% 846MB/s ± 0% +47.38% (p=0.000 n=19+19) Hashes/xxhash,direct,bytes,n=100B-8 5.21GB/s ± 2% 5.28GB/s ± 2% +1.21% (p=0.000 n=18+19) Hashes/xxhash,direct,string,n=100B-8 4.92GB/s ± 1% 5.64GB/s ± 2% +14.57% (p=0.000 n=17+20) Hashes/xxhash,direct,bytes,n=4KB-8 14.4GB/s ± 1% 14.5GB/s ± 1% +0.44% (p=0.000 n=19+17) Hashes/xxhash,direct,string,n=4KB-8 14.3GB/s ± 2% 14.5GB/s ± 0% +1.58% (p=0.000 n=18+17) Hashes/xxhash,direct,bytes,n=10MB-8 13.0GB/s ± 2% 13.1GB/s ± 1% ~ (p=0.775 n=18+19) Hashes/xxhash,direct,string,n=10MB-8 12.9GB/s ± 1% 12.9GB/s ± 2% ~ (p=0.496 n=20+19)
a16de4a
to
518aee4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @greatroar, thanks for sending this.
A better approach, if possible, would be to convince Go to inline the function. I looked into it a bit and discovered two issues:
- The benchmark uses an indirect call which isn't inlined today (more context at Remove duplicated function definitions #22 (comment)). I added new benchmarks which use direct calls in e0ea1e3.
- The inliner weights the unsafe conversion much higher than I would expect. I'll file a compiler bug.
I found a workaround for (2) and sent #50. The advantages of that approach are:
- It can be inlined in non-amd64 environments (or with
-tags purego
). - Go (even unsafe Go) is easier to read, write, and debug than assembly.
@@ -7,5 +7,6 @@ go: | |||
- master | |||
env: | |||
- TAGS="" | |||
- TAGS="-tags appengine" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we really care about special appengine support these days. I filed #46.
This moves the forwarding of Sum64String to Sum64 into assembler. The Go version cannot be inlined by Go 1.15.3:
The assembler version makes Sum64String somewhere between a bit and a lot faster for short strings:
The implementation is a bit messy because now there need to be two
_unsafe.go
files. I didn't see an easy way of avoiding that.