Add BPJS8 EC multiplication #55

brandonblack · 2022-03-16T20:21:14Z

Like my io-speedups PR, I am absolutely not attached to this code. My exploration of potential performance improvements was triggered by my work on MuSig2* signing, which currently gets ~140 ops/sec on my machine vs. 280 for schnorr, and 2000 for ECDSA when using noble-secp256k1. The result, so far, has been utter failure to improve my target metric. However, if this implementation is swapped in for Point.multiply when _WINDOW_SIZE is not set, ECDH is > 80% faster (453 vs 247 ops/sec on my machine).

Given the results below, an argument could also be made for dropping multiplyUnsafe in favor of this algorithm, since the slowdown is small, and it removes a potential footgun. I did some investigation of potential ways to close the gap to multiplyUnsafe and was able to bring it down to ~5% by replacing mod with % when the value is known to be positive, but the same optimization could be used to speedup multiplyUnsafe.

This is an algorithm for EC multiplication that emulates the Montgomery
Ladder double-and-add, but in a constant time way. An early version of
this algorithm was published in 2017, and the version implemented here
was published in 2020. The result is constant time multiply that is 85%
faster than wNAF, <10% slower than endomorphic Montgommery Ladder and
~20% faster than w/o endomorphism.

multiply (BPSJ8) x 433 ops/sec @ 2ms/op
multiply (precomputed) x 2,997 ops/sec @ 333μs/op
multiply (not precomputed) x 232 ops/sec @ 4ms/op
multiplyUnsafe x 465 ops/sec @ 2ms/op
multiplyUnsafe (no endomorphism) x 356 ops/sec @ 2ms/op

This is an algorithm for EC multiplication that emulates the Montgomery Ladder double-and-add, but in a constant time way. An early version of this algorithm was published in 2017, and the version implemented here was published in 2020. The result is constant time multiply that is 85% faster than wNAF, ~10% slower than endomorphic Montgommery Ladder and ~20% faster than w/o endomorphism. multiply (BPSJ8) x 433 ops/sec @ 2ms/op multiply (precomputed) x 2,997 ops/sec @ 333μs/op multiply (not precomputed) x 232 ops/sec @ 4ms/op multiplyUnsafe x 465 ops/sec @ 2ms/op multiplyUnsafe (no endomorphism) x 356 ops/sec @ 2ms/op

paulmillr · 2022-03-16T21:42:50Z

wow. no shit. this is great!

The result is constant time multiply that is 85% faster than wNAF

So, we can replace wnaf precomputes with precomputes for this thing?

brandonblack · 2022-03-16T21:44:01Z

Sorry if I oversold, it's still much slower than precomputed wNAF. About the same speed as the current multiplyUnsafe.

paulmillr · 2022-03-16T21:44:29Z

But it's safe, right? Did you propose this algo to libsecp256k1?

brandonblack · 2022-03-16T21:49:59Z

Yep, safe*.

In my implementation, the Half operation is either add-and-shift or shift, depending on the oddness of an intermediate value. There may be a more constant time way to implement half in JS, but I couldn't find one.
The sequence of operations is data independent, but the locations of memory accesses are data dependent, which may be a concern for some people.
Like the current wNAF, and other similar implementations the time does vary based on the total number of bits that must be considered (ie. a 255-bit value is faster to compute than a 256-bit value). Most implementations of ECC that I've seen do not mitigate this, but it's something to be aware of. Well that was wrong.

I have not proposed it to libsecp256k1. TBH I haven't read their entire ecmult implementation, but I know it's already fast.

paulmillr · 2022-03-16T21:55:26Z

the time does vary based on the total number of bits that must be considered

Currently it takes ~same time to calculate private key for 2-bit value vs 255-bit value. So, we work around this rn. Could you test this with such values?

brandonblack · 2022-03-16T21:57:13Z

Currently it takes ~same time to calculate private key for 2-bit value vs 255-bit value. So, we work around this rn. Could you test this with such values?

Ah, my reading of the code made me think that the 2-bit value would be much faster. Can you help me understand what makes it the same?

brandonblack · 2022-03-16T22:06:07Z

--- a/index.ts
+++ b/index.ts
@@ -392,7 +392,8 @@ export class BPSJ8 {
     if (scalar < _1n || scalar >= CURVE.P - _1n) throw new Error("Expecting scalar between 2 and P - 1");
 
     const scalarBits = genBits(numTo32b(scalar));
-    while (!scalarBits.next().value);
+    this.setup();
+    while (!scalarBits.next().value) this.ladd(0);
 
     this.setup();
     for (const ki of scalarBits) this.ladd(ki);

results in

getPublicKey 1 bit x 529 ops/sec @ 1ms/op
getPublicKey 256 bit x 530 ops/sec @ 1ms/op

But I'm not confident on the cryptographic validity of it.

paulmillr · 2022-03-16T22:10:39Z

for wnaf. We go through multiplication windows (=17 for default W=8, handling 16-bit values at once):

noble-secp256k1/index.ts

Line 281 in 97aa518

for (let window = 0; window < windows; window++) {

If the bits are not zero, we add it to result point

noble-secp256k1/index.ts

Line 304 in 97aa518

let cached = precomputes[offset + Math.abs(wbits) - 1];

If the bits are zero, we add randomness to a fake point

noble-secp256k1/index.ts

Line 297 in 97aa518

// Add random point inside current window to f.

noble-secp256k1/index.ts

Line 300 in 97aa518

let pr = precomputes[offset];

Basically there are two result points, one of them always does addition.

brandonblack · 2022-03-16T22:11:22Z

Gotcha, thank you!

paulmillr · 2022-03-17T19:41:32Z

If you replace Point#multiply with return new BPSJ8(this).multiply(BigInt(scalar));, the tests will fail. It will try to do inversion from zero. Seems like the algorithm is incorrect.

brandonblack · 2022-03-17T19:46:42Z

Ah, I did miss that -- the paper defines 1/0 as 0, and I didn't verify noble-secp256k1's implementation matched behavior.

Edit to add: Still some test failures though, which is surprising only because I tested many thousands of random points * tweaks against the previous algorithm.

paulmillr · 2022-03-17T20:55:45Z

Where did you find the papers? Also, at which page does it define 1/0 as 0, and how is this possible?

brandonblack · 2022-03-17T21:00:38Z

https://eprint.iacr.org/2017/669.pdf (2017), Page 13
Oddly, that definition is not mentioned in the 2020 paper:
https://www.aimsciences.org/article/exportPdf?id=5c293be6-723e-4b97-ae1d-ff359e261cdb

paulmillr · 2022-03-17T21:02:26Z

Where did you find the 2017 paper? I can't find any references to it.

brandonblack · 2022-03-17T21:04:07Z

DuckDuckGo's search results for some search phrase about fast constant time ec multiplication :)

paulmillr · 2022-03-17T21:14:55Z

I've created invert0 function that calls returns 0 for n=0, or falls back to invert(). With that method, new scalar mult does not throw errors, but the result is still incorrect.

brandonblack · 2022-03-17T21:54:43Z

Yeah, not yet certain whether my implementation has a bug or the paper, especially since whatever the case is, it isn't often hit. Possibly something specific that comes up when it's used in calculating wNAF precomputed points?

brandonblack · 2022-03-17T23:22:50Z

I pushed a commit with the script I've been mucking about with for verifying the function. Uses the base point, as well as random points, and checks that the result of the multiplication .equals the output of the existing wNAF algorithm.

Edit to add: It's kinda a pain to switch multiply between BPSJ8 and wNAF to facilitate the check vs. test runs :sigh:

brandonblack · 2022-03-17T23:46:42Z

The paper authors do comment that values for the scalar of 0,1, P, and P-1 do not work with this algorithm, so that explains at least some of the failures. Not sure why the 2*2 case is failing.

brandonblack · 2022-03-18T13:50:06Z

Tests pass with this small modification to make 1/-1 behave as expected.

Edit to add: Even without redefining invert, so I think that was a red herring that was eliminated from the latest version of the paper.

paulmillr · 2023-03-17T01:45:13Z

Thanks Brandon, moving towards v2.0 now

Add check script

c125b33

Handle 1/-1 properly

cdaa2db

paulmillr mentioned this pull request Apr 4, 2022

Investigate using new algorithm for scalar multiplication bitcoin-core/secp256k1#1099

Open

mratsim mentioned this pull request Aug 6, 2022

Scalar-mul: Constant-Time double-and-add mratsim/constantine#198

Open

paulmillr closed this Mar 17, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add BPJS8 EC multiplication #55

Add BPJS8 EC multiplication #55

brandonblack commented Mar 16, 2022 •

edited by paulmillr

Loading

paulmillr commented Mar 16, 2022

brandonblack commented Mar 16, 2022

paulmillr commented Mar 16, 2022

brandonblack commented Mar 16, 2022 •

edited

Loading

paulmillr commented Mar 16, 2022

brandonblack commented Mar 16, 2022

brandonblack commented Mar 16, 2022

paulmillr commented Mar 16, 2022

brandonblack commented Mar 16, 2022

paulmillr commented Mar 17, 2022

brandonblack commented Mar 17, 2022 •

edited

Loading

paulmillr commented Mar 17, 2022

brandonblack commented Mar 17, 2022

paulmillr commented Mar 17, 2022

brandonblack commented Mar 17, 2022 •

edited

Loading

paulmillr commented Mar 17, 2022

brandonblack commented Mar 17, 2022

brandonblack commented Mar 17, 2022 •

edited

Loading

brandonblack commented Mar 17, 2022

brandonblack commented Mar 18, 2022 •

edited

Loading

paulmillr commented Mar 17, 2023

Add BPJS8 EC multiplication #55

Add BPJS8 EC multiplication #55

Conversation

brandonblack commented Mar 16, 2022 • edited by paulmillr Loading

paulmillr commented Mar 16, 2022

brandonblack commented Mar 16, 2022

paulmillr commented Mar 16, 2022

brandonblack commented Mar 16, 2022 • edited Loading

paulmillr commented Mar 16, 2022

brandonblack commented Mar 16, 2022

brandonblack commented Mar 16, 2022

paulmillr commented Mar 16, 2022

brandonblack commented Mar 16, 2022

paulmillr commented Mar 17, 2022

brandonblack commented Mar 17, 2022 • edited Loading

paulmillr commented Mar 17, 2022

brandonblack commented Mar 17, 2022

paulmillr commented Mar 17, 2022

brandonblack commented Mar 17, 2022 • edited Loading

paulmillr commented Mar 17, 2022

brandonblack commented Mar 17, 2022

brandonblack commented Mar 17, 2022 • edited Loading

brandonblack commented Mar 17, 2022

brandonblack commented Mar 18, 2022 • edited Loading

paulmillr commented Mar 17, 2023

brandonblack commented Mar 16, 2022 •

edited by paulmillr

Loading

brandonblack commented Mar 16, 2022 •

edited

Loading

brandonblack commented Mar 17, 2022 •

edited

Loading

brandonblack commented Mar 17, 2022 •

edited

Loading

brandonblack commented Mar 17, 2022 •

edited

Loading

brandonblack commented Mar 18, 2022 •

edited

Loading