Pippenger multiscalar multiplication algorithm #249

Merged (11 commits) on Jun 4, 2019

oleganza
Contributor

@oleganza oleganza commented May 21, 2019

The currently supported Straus algorithm stops improving as the input size grows: it saturates at ≈4x improvement over naïve multiplication. Pippenger’s algorithm takes advantage of a very large number of points (>190) by avoiding premultiplication of points and instead placing the points in buckets indexed by the multiplication factor. The buckets are then added up in an order that applies their multipliers automatically. The process is repeated for each "digit" (digits are 6 to 8 bits wide, depending on the number of input points). As a result, the cost of multiplication grows slower than linearly with respect to the input size. For 1024 points Pippenger is >40% faster than Straus.
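
The bucket idea can be illustrated with a minimal sketch. It is not the code from this patch: it swaps curve points for `u64` values under wrapping addition (a stand-in group), uses unsigned `w`-bit digits rather than the signed digits the patch uses, and the names `naive_msm` and `pippenger_toy` are hypothetical.

```rust
/// Reference result: sum of scalar * point, computed one product at a time.
fn naive_msm(scalars: &[u64], points: &[u64]) -> u64 {
    scalars
        .iter()
        .zip(points)
        .fold(0u64, |acc, (&s, &p)| acc.wrapping_add(s.wrapping_mul(p)))
}

/// Toy bucket method: the "points" are u64 values under wrapping addition.
/// Only additions are applied to points; no point is ever multiplied.
fn pippenger_toy(scalars: &[u64], points: &[u64], w: u32) -> u64 {
    assert_eq!(scalars.len(), points.len());
    assert!((1..=16).contains(&w), "digit width must be between 1 and 16 bits");

    let digits_count = (64 + w - 1) / w; // number of w-bit digits in a u64
    let mask = (1u64 << w) - 1;
    let mut total = 0u64;

    // Process digits from most significant to least significant.
    for digit_index in (0..digits_count).rev() {
        // Shift the accumulated result up by one digit: w doublings.
        for _ in 0..w {
            total = total.wrapping_add(total);
        }

        // buckets[k - 1] collects every point whose current digit equals k.
        let mut buckets = vec![0u64; (1usize << w) - 1];
        for (&s, &p) in scalars.iter().zip(points) {
            let digit = ((s >> (digit_index * w)) & mask) as usize;
            if digit != 0 {
                buckets[digit - 1] = buckets[digit - 1].wrapping_add(p);
            }
        }

        // Walk the buckets from the highest index down, keeping a running sum.
        // Bucket k ends up counted k times in column_sum.
        let mut running = 0u64;
        let mut column_sum = 0u64;
        for &bucket in buckets.iter().rev() {
            running = running.wrapping_add(bucket);
            column_sum = column_sum.wrapping_add(running);
        }
        total = total.wrapping_add(column_sum);
    }
    total
}
```

For any equal-length inputs, `pippenger_toy(&scalars, &points, w)` should agree with `naive_msm(&scalars, &points)`. The running-sum loop is the step that applies the multipliers without any multiplication.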

To make the gains more relatable:

  1. Batch verification of fewer than 100 Schnorr signatures is unchanged: Straus is optimal.
  2. A single 64-bit rangeproof check in Bulletproofs is unchanged: Straus is optimal.
  3. A 2x64-bit aggregated rangeproof check is 10% faster. Example: a typical 2-output confidential transaction.
  4. A check of ten 64-bit aggregated rangeproofs is ≈1.8-2x faster. Example: a 10-output confidential transaction.

This patch adds:

  1. An implementation of VartimeMultiscalarMul using the Pippenger algorithm.
  2. A dynamic switch from Straus to Pippenger (at 190 points), based on the input's size_hint().
  3. A new to_pippenger_radix internal API to convert scalars into digits in radix 64, 128, or 256.
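
The dynamic switch in item 2 can be sketched roughly as follows. This is an illustration only: the enum, the constant name, and the function `choose_algorithm` are hypothetical, and using the lower bound of `size_hint()` is an assumption about how the threshold is applied. Only the threshold of 190 points comes from the description above.

```rust
#[derive(Debug, PartialEq)]
enum Algorithm {
    Straus,
    Pippenger,
}

/// Threshold taken from the PR description; the constant name is made up.
const PIPPENGER_THRESHOLD: usize = 190;

/// Picks an algorithm from the iterator's size hint without consuming it.
/// Assumption: only the lower bound of the hint is consulted.
fn choose_algorithm<I: Iterator>(items: &I) -> Algorithm {
    let (lower_bound, _upper_bound) = items.size_hint();
    if lower_bound < PIPPENGER_THRESHOLD {
        Algorithm::Straus
    } else {
        Algorithm::Pippenger
    }
}
```

One consequence of relying on a lower bound: an iterator that cannot report its length up front (for example, one produced by `filter`) reports a lower bound of 0, so under this sketch it would take the Straus path however many items it eventually yields.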

This addresses issue #130 and replaces previous PR #129.

@oleganza
Contributor Author

@hdevalence could you please take a look at this line? https://github.com/dalek-cryptography/curve25519-dalek/pull/249/files#diff-40c9945ba2b4ac799c45803abe9da3bdR1031

If I remove it, the pseudo-random tests still pass, even for |G|-1 scalars. Maybe it's specific to the digit widths 6/7/8?

@oleganza oleganza marked this pull request as ready for review May 21, 2019 20:48
@oleganza
Contributor Author

Benchmarks:

     Running target/release/deps/dalek_benchmarks-3b30c98c1f589a9e
Variable-time variable-base multiscalar multiplication/1                                                                            
                        time:   [24.649 us 24.668 us 24.680 us]
                        change: [+0.0456% +0.1652% +0.2902%] (p = 0.02 < 0.05)
                        Change within noise threshold.
Variable-time variable-base multiscalar multiplication/2                                                                            
                        time:   [30.244 us 30.269 us 30.295 us]
                        change: [-0.3981% -0.2577% -0.1267%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Variable-time variable-base multiscalar multiplication/4                                                                            
                        time:   [41.449 us 41.476 us 41.499 us]
                        change: [-0.1541% -0.0554% +0.0254%] (p = 0.28 > 0.05)
                        No change in performance detected.
Found 3 outliers among 15 measurements (20.00%)
  2 (13.33%) high mild
  1 (6.67%) high severe
Variable-time variable-base multiscalar multiplication/8                                                                           
                        time:   [63.819 us 63.825 us 63.834 us]
                        change: [-2.4276% -2.3787% -2.3089%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 15 measurements (13.33%)
  1 (6.67%) high mild
  1 (6.67%) high severe
Variable-time variable-base multiscalar multiplication/16                                                                           
                        time:   [109.17 us 109.22 us 109.25 us]
                        change: [-0.3171% -0.2525% -0.1786%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 1 outliers among 15 measurements (6.67%)
  1 (6.67%) high mild
Variable-time variable-base multiscalar multiplication/32                                                                           
                        time:   [199.55 us 199.59 us 199.66 us]
                        change: [-0.1247% -0.0640% +0.0057%] (p = 0.08 > 0.05)
                        No change in performance detected.
Variable-time variable-base multiscalar multiplication/64                                                                           
                        time:   [379.57 us 379.63 us 379.69 us]
                        change: [-0.0002% +0.0351% +0.0680%] (p = 0.06 > 0.05)
                        No change in performance detected.
Variable-time variable-base multiscalar multiplication/128                                                                            
                        time:   [756.51 us 756.71 us 756.96 us]
                        change: [+0.1368% +0.1867% +0.2326%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 1 outliers among 15 measurements (6.67%)
  1 (6.67%) high mild
Variable-time variable-base multiscalar multiplication/256                                                                            
                        time:   [1.3250 ms 1.3254 ms 1.3258 ms]
                        change: [-10.356% -10.273% -10.204%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 15 measurements (6.67%)
  1 (6.67%) high mild
Variable-time variable-base multiscalar multiplication/384                                                                            
                        time:   [1.8276 ms 1.8282 ms 1.8288 ms]
                        change: [-19.070% -18.972% -18.904%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 15 measurements (6.67%)
  1 (6.67%) high mild
Variable-time variable-base multiscalar multiplication/512                                                                            
                        time:   [2.3366 ms 2.3376 ms 2.3386 ms]
                        change: [-22.551% -22.459% -22.389%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 15 measurements (6.67%)
  1 (6.67%) low mild
Variable-time variable-base multiscalar multiplication/768                                                                            
                        time:   [3.2301 ms 3.2314 ms 3.2328 ms]
                        change: [-38.392% -37.982% -37.748%] (p = 0.00 < 0.05)
                        Performance has improved.
Variable-time variable-base multiscalar multiplication/1024                                                                            
                        time:   [4.0123 ms 4.0135 ms 4.0154 ms]
                        change: [-42.556% -42.153% -41.933%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 15 measurements (6.67%)

@hdevalence
Contributor

Looks great! I'll check about the radix question.

src/scalar.rs Outdated
/// $$
/// with \\(-2\^w/2 \leq a_i < 2\^w/2\\) for \\(0 \leq i < (n-1)\\) and \\(-2\^w/2 \leq a_{n-1} \leq 2\^w/2\\).
///
pub(crate) fn to_pippenger_radix(&self, w: usize) -> ([i8; 43], usize) {
Contributor

Suggested change
- pub(crate) fn to_pippenger_radix(&self, w: usize) -> ([i8; 43], usize) {
+ pub(crate) fn to_radix_2w(&self, w: usize) -> ([i8; 43], usize) {

would match the naming scheme for to_radix_16 better?

Contributor Author

2w also reads as 2*w. I intentionally placed "pippenger" there to draw attention to the fact that it is highly specialized and does not accept an arbitrary w as input. The moment someone needs a slightly more general-purpose implementation, they should create a new method and not use this one.

How about to_radix_64_128_256?

Contributor Author

@oleganza oleganza May 24, 2019

I tried a few variants, and to_radix_2w is the cleanest. It's private API anyway.

/// For large `n`, dominant factor is (n*256/w) additions.
/// However, if `w` is too big and `n` is not too big, then `(2^w/2)*A` could dominate.
/// Therefore, the optimal choice of `w` grows slowly as `n` grows.
///
Contributor

This documentation should maybe link to section 4 of https://eprint.iacr.org/2012/549.pdf which I believe is the original source of this special case of Pippenger's technique (which is very general).
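
The trade-off in the doc comment above can be made concrete with a rough cost model. Everything here is an assumption for illustration: 256-bit scalars, one addition per point per digit to fill the buckets, 2^w/2 buckets per digit, and two additions per bucket to sum them up. The function names are hypothetical, and the model ignores doublings and the different costs of different addition formulas.

```rust
/// Rough addition count for one multiscalar multiplication of `n` points
/// with `w`-bit digits, under the assumptions stated in the lead-in.
fn estimated_additions(n: u64, w: u32) -> u64 {
    let digits = (256 + w as u64 - 1) / w as u64; // ceil(256 / w)
    let bucket_fill = n; // one addition per point per digit
    let bucket_sum = 1u64 << w; // 2^w/2 buckets, two additions each
    digits * (bucket_fill + bucket_sum)
}

/// Digit width (1..=15 bits) that minimizes the estimate for `n` points.
fn best_width(n: u64) -> u32 {
    (1..=15u32)
        .min_by_key(|&w| estimated_additions(n, w))
        .unwrap()
}
```

Under this model the minimum falls at `w = 6` for 190 points and at `w = 8` for 1024 points, and it moves up slowly as `n` grows. That this lands in the 6 to 8 bit range from the description is a property of the rough model, not a statement about how the patch actually selects `w`.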

@hdevalence
Contributor

@oleganza what's the issue with scalar.rs:1031 exactly? Since the lookup tables can handle the closed interval [-2^r/2, +2^r/2], we can be sure that the carry bit is not dropped by adding it at the end. This will be correct regardless of whether the carry is always 0. Is it just that you want to prove that the carry can or can't be 1?
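
The role of the final carry can be seen in a small sketch of a signed radix-2^w decomposition. This is not the crate's `Scalar` code: it works on a plain `u64`, stores digits as `i64`, and both function names are hypothetical. It only shows that adding the leftover carry to the last digit preserves the value.

```rust
/// Splits `x` into signed digits in radix 2^w, least significant first.
fn to_signed_radix_2w(x: u64, w: u32) -> Vec<i64> {
    assert!((2..=8).contains(&w), "this sketch supports 2 <= w <= 8");
    let radix = 1u64 << w;
    let half = radix / 2;
    let digits_count = (64 + w - 1) / w;

    let mut digits = Vec::with_capacity(digits_count as usize);
    let mut rest = x;
    let mut carry = 0u64;
    for _ in 0..digits_count {
        // Unsigned digit plus the carry from the previous position: 0..=2^w.
        let coef = (rest & (radix - 1)) + carry;
        rest >>= w;
        // Recenter into [-2^w/2, 2^w/2): values >= 2^w/2 pass a carry upward.
        carry = (coef + half) >> w;
        digits.push(coef as i64 - (carry << w) as i64);
    }
    // Fold the leftover carry into the last digit so it is never dropped.
    // If the carry is 0, this changes nothing.
    let last = digits.len() - 1;
    digits[last] += (carry << w) as i64;
    digits
}

/// Rebuilds the integer from its signed digits (Horner evaluation).
fn from_signed_radix_2w(digits: &[i64], w: u32) -> i128 {
    digits
        .iter()
        .rev()
        .fold(0i128, |acc, &d| acc * (1i128 << w) + d as i128)
}
```

In this toy setting the carry can be 1: for `u64::MAX` with `w = 8` the digits come out as `[-1, 0, 0, 0, 0, 0, 0, 256]`, and the value survives only because the carry is folded into the last digit. That last digit is outside the closed interval mentioned above because the input fills the whole top digit; the sketch makes no claim about whether this can happen for reduced scalars.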

@hdevalence
Contributor

This branch looks good and I'd like to merge it and ship it as 1.2!

@oleganza
Contributor Author

oleganza commented Jun 4, 2019

Is it just that you want to prove that the carry can or can't be 1?

I think it's that, yeah

@oleganza
Contributor Author

oleganza commented Jun 4, 2019

I'm fine with merging

pinkforest pushed a commit to pinkforest/curve25519-dalek that referenced this pull request Jun 27, 2023