math/big: better multiply primitives #9245

griesemer · 2014-12-10T19:36:48Z

Suggestions from Torbjörn Granlund (personal e-mail):

"
The multiply primitives, in particular addMulVVW surely deserves more
attention:

Offset the pointers so that you can index with a counter register
which goes from -n to 0, saving the CMPQ.

Unroll. You can save most of the ADCQ $0, R that way. Basically,
do one run with just MULQ where you sum the old highpart (DX) with
the new lowpart (AX). You will need some MOVQ to move DX
out-of-the-way too. Then do a new run over these sums where you
bring in the memory addend. This should double the speed on some
newer CPUs.

A good addMulVVW is probably really the first thing to write in
assembly; addition and subtraction is much less important, usually.
"

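To make the suggested restructuring concrete, here is a minimal Go sketch of the two-run idea from the e-mail above. It is an illustration only, not the math/big implementation: the name `addMulTwoPass`, the fixed 64-bit word size, and the scratch slice `sums` are assumptions (in real assembly the sums would stay in registers inside an unrolled block). The first run does only multiplies and chains each old high part into the next low part; the second run brings in the memory addend. The pointer-offset trick (counting from -n to 0) is specific to assembly and is not shown.

```go
package main

import (
	"fmt"
	"math/bits"
)

// addMulTwoPass computes z += x*y and returns the final carry word.
// Hypothetical name; assumes 64-bit words and len(x) == len(z).
func addMulTwoPass(z, x []uint64, y uint64) uint64 {
	n := len(z)
	sums := make([]uint64, n) // scratch for run 1 (registers in real assembly)

	// Run 1: multiplies only. Word i of x*y is the low half of x[i]*y
	// plus the high half of x[i-1]*y, with one carry chain between words.
	var prevHi, c uint64
	for i := 0; i < n; i++ {
		hi, lo := bits.Mul64(x[i], y)
		sums[i], c = bits.Add64(lo, prevHi, c)
		prevHi = hi
	}
	top := prevHi + c // cannot overflow: x*y fits in n+1 words

	// Run 2: bring in the memory addend z with a second carry chain.
	c = 0
	for i := 0; i < n; i++ {
		z[i], c = bits.Add64(z[i], sums[i], c)
	}
	return top + c // cannot overflow: z + x*y fits in n+1 words
}

func main() {
	z := []uint64{5, 7}
	x := []uint64{3, 2}
	carry := addMulTwoPass(z, x, 4)
	fmt.Println(z, carry) // [17 15] 0
}
```

Whether this shape is faster depends on the CPU and on how the loop is unrolled and scheduled; the sketch only shows that the two runs compute the same result as a fused loop.
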
vielmetti · 2017-11-09T16:19:03Z

See also https://go-review.googlesource.com/c/go/+/76270 (for arm64) where it is reported:

"
The lack of proper addMulVVW implementation for arm64 hurts RSA
performance

This is an optimized implementation, it improves RSA2048 performance
by 10X to 15X on ARMv8 based server processors.
"

odeke-em · 2018-03-05T08:59:21Z

/cc @ericlagergren

griesemer · 2018-05-24T03:03:38Z

Pushing to next release. There are some discussions about other math/bits primitive operations; maybe we can write some of this code in Go rather than assembly at some point.
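
As a rough illustration of what "in Go rather than assembly" could look like, the sketch below writes the multiply-accumulate loop with `bits.Mul64` and `bits.Add64` from math/bits. The function name `addMulVVWGo` and the fixed 64-bit word type are assumptions made for this example; it is not the code used by math/big.

```go
package main

import (
	"fmt"
	"math/bits"
)

// addMulVVWGo computes z += x*y word by word and returns the carry out.
// Hypothetical pure-Go sketch; assumes 64-bit words and len(x) == len(z).
func addMulVVWGo(z, x []uint64, y uint64) (c uint64) {
	for i := range z {
		hi, lo := bits.Mul64(x[i], y) // 128-bit product x[i]*y
		lo, cc := bits.Add64(lo, z[i], 0)
		hi += cc // safe: x[i]*y + z[i] + c always fits in 128 bits
		lo, cc = bits.Add64(lo, c, 0)
		z[i] = lo
		c = hi + cc
	}
	return c
}

func main() {
	z := []uint64{5, 7}
	x := []uint64{3, 2}
	fmt.Println(addMulVVWGo(z, x, 4), z) // 0 [17 15]
}
```

How close such a loop gets to hand-written assembly depends on how well the compiler turns the math/bits calls into multiply and add-with-carry instructions on a given architecture.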

andig · 2021-10-13T15:37:35Z

Has this potentially been solved by https://go-review.googlesource.com/c/go/+/74851/ mentioned in #20058 (comment)?

Sorry if OT, I was researching around Go performance topics and stumbled here.

griesemer · 2021-10-13T16:04:44Z

I believe this was for ARMv8; there's more to do here. The Go team is pre-occupied with generics for 1.18, so this is unlikely to happen for 1.18 unless somebody else wants to step in, preferably with experience in performance-critical arithmetic routines.

andig · 2021-10-13T16:23:50Z

"I believe this was for ARMv8; there's more to do here"

The CL I mentioned was submitted after this issue was raised, targets amd64, and has been merged ;)

griesemer · 2021-10-13T16:26:50Z

Indeed, I misread, my apologies. So what's left to do is porting this to other architectures?

andig · 2021-10-14T13:20:23Z

I read the issue as being about x86. AFAICT, addMulVVW itself already has special cases for amd64 and arm64.

griesemer self-assigned this Dec 10, 2014

ianlancetaylor added repo-main labels Dec 10, 2014

rsc added this to the Unplanned milestone Apr 10, 2015

rsc removed release-none labels Apr 10, 2015

griesemer added the Performance label Oct 26, 2015

griesemer modified the milestones: Go1.9Maybe, Unplanned Feb 25, 2017

griesemer modified the milestones: Go1.10, Go1.9Maybe May 9, 2017

griesemer modified the milestones: Go1.10, Go1.11 Nov 3, 2017

griesemer mentioned this issue Nov 9, 2017

math/big: mulAddVWW too slow #22643 (Open)

bradfitz added the NeedsFix label (the path to resolution is known, but the work has not been done) Nov 9, 2017

griesemer modified the milestones: Go1.11, Go1.12 May 24, 2018

griesemer modified the milestones: Go1.12, Unplanned Sep 17, 2018

aeneasr mentioned this issue Feb 19, 2020

High Latency with id_token mutator and RS256 keys ory/oathkeeper#364 (Closed)
