math/big: mulAddVWW too slow #22643
Comments
See also https://go-review.googlesource.com/c/go/+/76270 (for arm64), where it is reported: "The lack of proper addMulVVW implementation for arm64 hurts RSA. This is an optimized implementation, it improves RSA2048 performance"
Duplicate of #9245.
I'm not seeing the performance difference you're seeing. With your two example programs:
That's only a few percent slower.
To add another data point, on
So within ~8% of
Results for arm64, specifically on a HiSilicon Hi1616 (Packet "Type 2A2")
Again, the Go version is within a few % of the C results.
That's really strange, I triple-checked but still getting the same 1.8x difference.
And:
Btw., I've got Go from the downloads (go1.9.2.darwin-amd64.pkg)
Can you profile both instances (Go and C++) and see where the time goes? We can then compare your profile to profiles on our machines.
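For the Go side, one way to capture such a profile is the standard runtime/pprof package. The sketch below is only an illustration: the workload, the operand size, and the helper name profileMulByWord are assumptions, not the program from this issue.

```go
package main

import (
	"bytes"
	"fmt"
	"math/big"
	"runtime/pprof"
)

// profileMulByWord (hypothetical helper) runs a stand-in workload under the
// CPU profiler and returns the raw profile bytes. The workload repeatedly
// multiplies a large big.Int by a single-word constant, which is the kind of
// operation this issue is about.
func profileMulByWord(iters int) ([]byte, error) {
	var buf bytes.Buffer
	if err := pprof.StartCPUProfile(&buf); err != nil {
		return nil, err
	}

	x := new(big.Int).Lsh(big.NewInt(1), 1<<16) // assumed operand size: 2^65536
	w := big.NewInt(3)                          // fits in one machine word
	z := new(big.Int)
	for i := 0; i < iters; i++ {
		z.Mul(x, w)
	}

	// StopCPUProfile returns only after the profile has been fully written.
	pprof.StopCPUProfile()
	return buf.Bytes(), nil
}

func main() {
	prof, err := profileMulByWord(100000)
	if err != nil {
		panic(err)
	}
	fmt.Printf("collected %d bytes of CPU profile\n", len(prof))
}
```

Writing the returned bytes to a file lets you open them with go tool pprof and see which functions dominate.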
I used
And
@TuomLarsen I need to see a profile of the C+gmp program as well. Without it, it is hard to tell whether Go is slow on your system or gmp is fast. Using your profiled program, my machine spends 97+% of its time in mulAddVWW, whereas your code only spends 83% there. I'm not sure if that has anything to do with the issue at hand, but it seems weird.
From Instruments:
Thanks for that data. Unfortunately, no smoking gun. For the record, here is the inner loop in GMP:
compared with the inner loop in math/big:
They look remarkably similar. The GMP library goes through some hoops to do
@randall77 And thanks for looking into it! I'm certainly no expert, but if that U5 is indeed the bottleneck, maybe GMP is trying to avoid a data dependency? Btw., should this issue be re-opened, as it seems it is not about the general improvement tracked by #9245?
Re-opened as this is a more special case of #9245.
Change https://golang.org/cl/77371 mentions this issue:
@TuomLarsen: could you try patching in CL 77371 and see if that improves performance on your machine?
@randall77 Unfortunately no, it seems it does not improve it. What I did was I created a stub library with both new and old
Then I'm out of ideas. Without a copy of your machine, it's hard to debug.
What version of Go are you using?
go1.9.2 darwin/amd64
Does this issue reproduce with the latest release?
yes
What operating system and processor architecture are you using?
darwin/amd64
What did you do?
The following Go program multiplies a big.Int by a one-Word constant:
What did you expect to see?
The following C program uses GMP 6.1.2:
and terminates in around 1s.
What did you see instead?
The Go program completes on my machine in 1.87s, i.e. almost twice as slow.
mulAddVWW is a building block for multiplication (by one word or more), so better performance would benefit many applications.
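For readers unfamiliar with the routine, what mulAddVWW computes can be sketched in portable Go: z = x*y + r over a little-endian vector of words, returning the final carry. This is a reference sketch under assumptions, not the math/big code. The name mulAddVWWRef is made up, uint64 stands in for big.Word, and math/bits.Mul64 and Add64 need Go 1.12 or later (newer than the go1.9.2 used here). The real amd64 routine is hand-written assembly.

```go
package main

import (
	"fmt"
	"math/bits"
)

// mulAddVWWRef (hypothetical name) stores the low words of x*y + r into z and
// returns the carry out of the most significant word. x and z hold the least
// significant word first, and len(z) must be at least len(x).
func mulAddVWWRef(z, x []uint64, y, r uint64) (c uint64) {
	c = r
	for i := range x {
		// 64x64 -> 128-bit product of the current word and the multiplier.
		hi, lo := bits.Mul64(x[i], y)
		// Add the incoming carry to the low half.
		lo, carry := bits.Add64(lo, c, 0)
		z[i] = lo
		// hi + carry cannot overflow, because x[i]*y + c < 2^128.
		c = hi + carry
	}
	return c
}

func main() {
	x := []uint64{^uint64(0), 1} // the value 2^65 - 1
	z := make([]uint64, len(x))
	c := mulAddVWWRef(z, x, 2, 1) // computes (2^65 - 1)*2 + 1
	fmt.Printf("z=%#x carry=%d\n", z, c)
}
```

Each iteration depends on the carry from the previous one, which is the data dependency discussed in the comments above.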