Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add BPJS8 EC multiplication #55

Closed
wants to merge 3 commits into from

Conversation

brandonblack
Copy link
Contributor

@brandonblack brandonblack commented Mar 16, 2022

Like my io-speedups PR, I am absolutely not attached to this code. My exploration of potential performance improvements was triggered by my work on MuSig2* signing, which currently gets ~140 ops/sec on my machine vs. 280 for schnorr, and 2000 for ECDSA when using noble-secp256k1. The result, so far, has been utter failure to improve my target metric. However, if this implementation is swapped in for Point.multiply when _WINDOW_SIZE is not set, ECDH is > 80% faster (453 vs 247 ops/sec on my machine).

Given the results below, an argument could also be made for dropping multiplyUnsafe in favor of this algorithm, since the slowdown is small, and it removes a potential footgun. I did some investigation of potential ways to close the gap to multiplyUnsafe and was able to bring it down to ~5% by replacing mod with % when the value is known to be positive, but the same optimization could be used to speedup multiplyUnsafe.


This is an algorithm for EC multiplication that emulates the Montgomery
Ladder double-and-add, but in a constant time way. An early version of
this algorithm was published in 2017, and the version implemented here
was published in 2020. The result is constant time multiply that is 85%
faster than wNAF, <10% slower than endomorphic Montgommery Ladder and
~20% faster than w/o endomorphism.

multiply (BPSJ8) x 433 ops/sec @ 2ms/op
multiply (precomputed) x 2,997 ops/sec @ 333μs/op
multiply (not precomputed) x 232 ops/sec @ 4ms/op
multiplyUnsafe x 465 ops/sec @ 2ms/op
multiplyUnsafe (no endomorphism) x 356 ops/sec @ 2ms/op

This is an algorithm for EC multiplication that emulates the Montgomery
Ladder double-and-add, but in a constant time way. An early version of
this algorithm was published in 2017, and the version implemented here
was published in 2020. The result is constant time multiply that is 85%
faster than wNAF, ~10% slower than endomorphic Montgommery Ladder and
~20% faster than w/o endomorphism.

multiply (BPSJ8) x 433 ops/sec @ 2ms/op
multiply (precomputed) x 2,997 ops/sec @ 333μs/op
multiply (not precomputed) x 232 ops/sec @ 4ms/op
multiplyUnsafe x 465 ops/sec @ 2ms/op
multiplyUnsafe (no endomorphism) x 356 ops/sec @ 2ms/op
@paulmillr
Copy link
Owner

wow. no shit. this is great!

The result is constant time multiply that is 85% faster than wNAF

So, we can replace wnaf precomputes with precomputes for this thing?

@brandonblack
Copy link
Contributor Author

Sorry if I oversold, it's still much slower than precomputed wNAF. About the same speed as the current multiplyUnsafe.

@paulmillr
Copy link
Owner

But it's safe, right? Did you propose this algo to libsecp256k1?

@brandonblack
Copy link
Contributor Author

brandonblack commented Mar 16, 2022

Yep, safe*.

  • In my implementation, the Half operation is either add-and-shift or shift, depending on the oddness of an intermediate value. There may be a more constant time way to implement half in JS, but I couldn't find one.
  • The sequence of operations is data independent, but the locations of memory accesses are data dependent, which may be a concern for some people.
  • Like the current wNAF, and other similar implementations the time does vary based on the total number of bits that must be considered (ie. a 255-bit value is faster to compute than a 256-bit value). Most implementations of ECC that I've seen do not mitigate this, but it's something to be aware of. Well that was wrong.

I have not proposed it to libsecp256k1. TBH I haven't read their entire ecmult implementation, but I know it's already fast.

@paulmillr
Copy link
Owner

the time does vary based on the total number of bits that must be considered

Currently it takes ~same time to calculate private key for 2-bit value vs 255-bit value. So, we work around this rn. Could you test this with such values?

@brandonblack
Copy link
Contributor Author

Currently it takes ~same time to calculate private key for 2-bit value vs 255-bit value. So, we work around this rn. Could you test this with such values?

Ah, my reading of the code made me think that the 2-bit value would be much faster. Can you help me understand what makes it the same?

@brandonblack
Copy link
Contributor Author

--- a/index.ts
+++ b/index.ts
@@ -392,7 +392,8 @@ export class BPSJ8 {
     if (scalar < _1n || scalar >= CURVE.P - _1n) throw new Error("Expecting scalar between 2 and P - 1");
 
     const scalarBits = genBits(numTo32b(scalar));
-    while (!scalarBits.next().value);
+    this.setup();
+    while (!scalarBits.next().value) this.ladd(0);
 
     this.setup();
     for (const ki of scalarBits) this.ladd(ki);

results in

getPublicKey 1 bit x 529 ops/sec @ 1ms/op
getPublicKey 256 bit x 530 ops/sec @ 1ms/op

But I'm not confident on the cryptographic validity of it.

@paulmillr
Copy link
Owner

for wnaf. We go through multiplication windows (=17 for default W=8, handling 16-bit values at once):

for (let window = 0; window < windows; window++) {

If the bits are not zero, we add it to result point

let cached = precomputes[offset + Math.abs(wbits) - 1];

If the bits are zero, we add randomness to a fake point

// Add random point inside current window to f.

let pr = precomputes[offset];

Basically there are two result points, one of them always does addition.

@brandonblack
Copy link
Contributor Author

Gotcha, thank you!

@paulmillr
Copy link
Owner

If you replace Point#multiply with return new BPSJ8(this).multiply(BigInt(scalar));, the tests will fail. It will try to do inversion from zero. Seems like the algorithm is incorrect.

@brandonblack
Copy link
Contributor Author

brandonblack commented Mar 17, 2022

Ah, I did miss that -- the paper defines 1/0 as 0, and I didn't verify noble-secp256k1's implementation matched behavior.

Edit to add: Still some test failures though, which is surprising only because I tested many thousands of random points * tweaks against the previous algorithm.

@paulmillr
Copy link
Owner

Where did you find the papers? Also, at which page does it define 1/0 as 0, and how is this possible?

@brandonblack
Copy link
Contributor Author

https://eprint.iacr.org/2017/669.pdf (2017), Page 13
Oddly, that definition is not mentioned in the 2020 paper:
https://www.aimsciences.org/article/exportPdf?id=5c293be6-723e-4b97-ae1d-ff359e261cdb

@paulmillr
Copy link
Owner

Where did you find the 2017 paper? I can't find any references to it.

@brandonblack
Copy link
Contributor Author

brandonblack commented Mar 17, 2022

DuckDuckGo's search results for some search phrase about fast constant time ec multiplication :)

@paulmillr
Copy link
Owner

I've created invert0 function that calls returns 0 for n=0, or falls back to invert(). With that method, new scalar mult does not throw errors, but the result is still incorrect.

@brandonblack
Copy link
Contributor Author

Yeah, not yet certain whether my implementation has a bug or the paper, especially since whatever the case is, it isn't often hit. Possibly something specific that comes up when it's used in calculating wNAF precomputed points?

@brandonblack
Copy link
Contributor Author

brandonblack commented Mar 17, 2022

I pushed a commit with the script I've been mucking about with for verifying the function. Uses the base point, as well as random points, and checks that the result of the multiplication .equals the output of the existing wNAF algorithm.

Edit to add: It's kinda a pain to switch multiply between BPSJ8 and wNAF to facilitate the check vs. test runs :sigh:

@brandonblack
Copy link
Contributor Author

The paper authors do comment that values for the scalar of 0,1, P, and P-1 do not work with this algorithm, so that explains at least some of the failures. Not sure why the 2*2 case is failing.

@brandonblack
Copy link
Contributor Author

brandonblack commented Mar 18, 2022

Tests pass with this small modification to make 1/-1 behave as expected.

Edit to add: Even without redefining invert, so I think that was a red herring that was eliminated from the latest version of the paper.

@paulmillr
Copy link
Owner

Thanks Brandon, moving towards v2.0 now

@paulmillr paulmillr closed this Mar 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants