Add BPJS8 EC multiplication #55
This is an algorithm for EC multiplication that emulates the Montgomery Ladder double-and-add, but in a constant time way. An early version of this algorithm was published in 2017, and the version implemented here was published in 2020. The result is a constant-time multiply that is 85% faster than wNAF without precomputes, ~10% slower than endomorphic Montgomery Ladder and ~20% faster than w/o endomorphism.
multiply (BPSJ8) x 433 ops/sec @ 2ms/op
multiply (precomputed) x 2,997 ops/sec @ 333μs/op
multiply (not precomputed) x 232 ops/sec @ 4ms/op
multiplyUnsafe x 465 ops/sec @ 2ms/op
multiplyUnsafe (no endomorphism) x 356 ops/sec @ 2ms/op
wow. no shit. this is great!
So, we can replace wNAF precomputes with precomputes for this thing?
Sorry if I oversold, it's still much slower than precomputed wNAF. About the same speed as the current
But it's safe, right? Did you propose this algo to libsecp256k1?
Yep, safe*.
I have not proposed it to libsecp256k1. TBH I haven't read their entire ecmult implementation, but I know it's already fast.
Currently it takes ~same time to calculate private key for 2-bit value vs 255-bit value. So, we work around this rn. Could you test this with such values?
Ah, my reading of the code made me think that the 2-bit value would be much faster. Can you help me understand what makes it the same?
--- a/index.ts
+++ b/index.ts
@@ -392,7 +392,8 @@ export class BPSJ8 {
if (scalar < _1n || scalar >= CURVE.P - _1n) throw new Error("Expecting scalar between 2 and P - 1");
const scalarBits = genBits(numTo32b(scalar));
- while (!scalarBits.next().value);
+ this.setup();
+ while (!scalarBits.next().value) this.ladd(0);
this.setup();
for (const ki of scalarBits) this.ladd(ki);
results in
But I'm not confident on the cryptographic validity of it.
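The worry in this exchange is that a ladder which skips leading zero bits finishes sooner for small scalars. A minimal sketch of the general remedy, looping over a fixed bit width so every scalar costs the same number of steps, follows. It is not the BPSJ8 ladder from this PR: integers modulo a small `N` stand in for curve points, and `N`, `BITS`, `add`, `dbl`, and `ladder` are all hypothetical names chosen for illustration.

```typescript
// Toy additive group: integers modulo N stand in for curve points.
// N, BITS and every function name here are hypothetical.
const N = 1009;
const BITS = 16; // fixed scalar width; a real curve would use 256

const add = (a: number, b: number): number => (a + b) % N;
const dbl = (a: number): number => (2 * a) % N;

// Montgomery ladder over a fixed number of bits. Returns k*p in the toy
// group together with the number of ladder steps performed.
function ladder(k: number, p: number): { result: number; steps: number } {
  let r0 = 0; // identity element
  let r1 = p % N; // invariant: r1 - r0 = p (mod N)
  let steps = 0;
  for (let i = BITS - 1; i >= 0; i--) {
    const bit = (k >> i) & 1;
    // Every step does exactly one add and one double, whatever the bit is,
    // including the leading zero bits of a short scalar.
    // (A hardened implementation would also avoid this data-dependent branch.)
    if (bit === 0) {
      r1 = add(r0, r1);
      r0 = dbl(r0);
    } else {
      r0 = add(r0, r1);
      r1 = dbl(r1);
    }
    steps++;
  }
  return { result: r0, steps };
}
```

In this sketch a 2-bit scalar and a 16-bit scalar both take `BITS` steps. Only the step count is equalized here, so it says nothing about real-world timing of bigint arithmetic.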
For wNAF: we go through multiplication windows (line 281 in 97aa518).
If the bits are not zero, we add the window's point to the result point (line 304 in 97aa518).
If the bits are zero, we add randomness to a fake point (lines 297 and 300 in 97aa518).
Basically there are two result points, and in every window one of them gets an addition.
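The two-accumulator idea described in this comment can be sketched on a toy group where integers modulo a small `N` stand in for curve points. This is a simplified unsigned fixed-window multiply, not real wNAF (no signed digits, and `digit * base` stands in for a precomputed-table lookup), and it adds a fixed value to the fake accumulator where the comment mentions randomness. `N`, `W`, `WINDOWS`, and `windowedMultiply` are hypothetical names.

```typescript
// Toy additive group: integers modulo N stand in for curve points.
const N = 1009;
const W = 4; // window size in bits
const WINDOWS = 4; // enough windows for 16-bit scalars

function windowedMultiply(
  k: number,
  p: number
): { result: number; fake: number; adds: number } {
  let result = 0; // real accumulator
  let fake = 0; // dummy accumulator, discarded by callers
  let adds = 0;
  let base = p % N; // 2^(W*w) * p for the current window w
  for (let w = 0; w < WINDOWS; w++) {
    const digit = (k >> (w * W)) & ((1 << W) - 1);
    if (digit !== 0) {
      // Non-zero window: add the window's multiple to the real result.
      result = (result + digit * base) % N;
    } else {
      // Zero window: do an addition anyway, but on the fake accumulator.
      fake = (fake + base) % N;
    }
    adds++; // exactly one addition per window on either path
    base = (base * (1 << W)) % N;
  }
  return { result, fake, adds };
}
```

The point of the pattern is that `adds` equals `WINDOWS` for every scalar, so the number of group additions does not reveal which windows were zero. The branch itself is still data-dependent in this sketch.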
Gotcha, thank you!
If you replace
Ah, I did miss that -- the paper defines 1/0 as 0, and I didn't verify
Edit to add: Still some test failures though, which is surprising only because I tested many thousands of random points * tweaks against the previous algorithm.
Where did you find the papers? Also, on which page does it define 1/0 as 0, and how is this possible?
https://eprint.iacr.org/2017/669.pdf (2017), Page 13
Where did you find the 2017 paper? I can't find any references to it.
DuckDuckGo's search results for
I've created
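On how 1/0 can be 0: one common way the convention shows up in practice is inversion by Fermat's little theorem, x^(p-2) mod p, which maps 0 to 0 instead of failing. The sketch below illustrates that on a toy prime field. It is an assumption about why the convention is convenient, not a description of the paper's construction, and `P`, `powMod`, and `invert` are hypothetical names.

```typescript
// Toy prime field; a real curve field would need bigint arithmetic.
const P = 101;

// Square-and-multiply modular exponentiation.
function powMod(base: number, exp: number, m: number): number {
  let result = 1;
  let b = base % m;
  let e = exp;
  while (e > 0) {
    if (e & 1) result = (result * b) % m;
    b = (b * b) % m;
    e >>= 1;
  }
  return result;
}

// Fermat inversion: x^(P-2) mod P. For x = 0 every product is 0,
// so the function returns 0 rather than throwing.
const invert = (x: number): number => powMod(x, P - 2, P);
```

With this definition `invert(3) * 3` reduces to 1 modulo `P`, while `invert(0)` is 0. An inversion routine based on the extended Euclidean algorithm would instead have to special-case or reject 0.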
Yeah, not yet certain whether my implementation has a bug or the paper does, especially since whatever the case is, it isn't often hit. Possibly something specific that comes up when it's used in calculating wNAF precomputed points?
I pushed a commit with the script I've been mucking about with for verifying the function. Uses the base point, as well as random points, and checks that the result of the multiplication
Edit to add: It's kinda a pain to switch multiply between BPSJ8 and wNAF to facilitate the check vs. test runs :sigh:
The paper authors do comment that scalar values of 0, 1, P, and P-1 do not work with this algorithm, so that explains at least some of the failures. Not sure why the
Tests pass with this small modification to make 1/-1 behave as expected.
Edit to add: Even without redefining
Thanks Brandon, moving towards v2.0 now |
Like my io-speedups PR, I am absolutely not attached to this code. My exploration of potential performance improvements was triggered by my work on MuSig2* signing, which currently gets ~140 ops/sec on my machine vs. 280 for schnorr, and 2000 for ECDSA when using noble-secp256k1. The result, so far, has been utter failure to improve my target metric. However, if this implementation is swapped in for Point.multiply when _WINDOW_SIZE is not set, ECDH is > 80% faster (453 vs 247 ops/sec on my machine).
Given the results below, an argument could also be made for dropping multiplyUnsafe in favor of this algorithm, since the slowdown is small, and it removes a potential footgun. I did some investigation of potential ways to close the gap to multiplyUnsafe and was able to bring it down to ~5% by replacing mod with % when the value is known to be positive, but the same optimization could be used to speed up multiplyUnsafe.
This is an algorithm for EC multiplication that emulates the Montgomery
Ladder double-and-add, but in a constant time way. An early version of
this algorithm was published in 2017, and the version implemented here
was published in 2020. The result is a constant-time multiply that is 85%
faster than wNAF, <10% slower than endomorphic Montgomery Ladder and
~20% faster than w/o endomorphism.
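The `mod` versus `%` optimization mentioned in the description rests on how JavaScript's remainder operator treats negative operands. A minimal sketch, assuming a `mod` helper that always returns a non-negative result (the helper here is hypothetical and uses `number`; the library's own `mod` works on bigint and may be written differently):

```typescript
// Hypothetical helper: always returns a value in [0, m) for m > 0.
const mod = (a: number, m: number): number => {
  const r = a % m; // the sign of r follows the sign of a
  return r >= 0 ? r : r + m;
};

// For a negative dividend the two disagree, so mod is required.
const rawNegative = -7 % 5; // -2
const fixedNegative = mod(-7, 5); // 3

// For a non-negative dividend they agree, so plain % is enough
// and the extra sign check can be skipped.
const rawPositive = 7 % 5; // 2
const fixedPositive = mod(7, 5); // 2
```

The saving is one comparison and a possible addition per reduction, which only matters in hot loops such as the ladder step, and only where the operand is provably non-negative.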