Pippenger multiscalar multiplication algorithm #249

oleganza · 2019-05-21T19:40:42Z

The currently supported Straus algorithm does not improve as the input size grows: it saturates at a ≈4x improvement over naïve multiplication. Pippenger's algorithm takes advantage of a very large number of points (>190) by avoiding premultiplication of points and instead placing points in buckets indexed by the multiplication factor. The buckets are then added up in such a way that their multipliers are applied automatically. The process is repeated for each "digit" of the scalars (digits are 6 to 8 bits wide, depending on the number of input points). As a result, the cost of multiplication grows slower than linearly with the input size. For 1024 points Pippenger is >40% faster than Straus.
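To illustrate the bucket idea described above, here is a minimal sketch, not the code from this patch. It assumes plain `i128` integers under addition as a stand-in for curve points, `u64` scalars instead of 256-bit ones, and unsigned digits; the name `bucket_msm` is hypothetical.

```rust
/// Toy multiscalar multiplication using the bucket method.
/// Assumption: "points" are i128 integers under addition, standing in for
/// curve points; scalars are u64 and are split into unsigned w-bit digits.
fn bucket_msm(scalars: &[u64], points: &[i128], w: u32) -> i128 {
    assert_eq!(scalars.len(), points.len());
    assert!((1..=16).contains(&w));
    let digits_count = (64 + w - 1) / w; // number of w-bit digits in a u64
    let mask = (1u64 << w) - 1;
    let mut total: i128 = 0;

    // Process digits from the most significant to the least significant.
    for d in (0..digits_count).rev() {
        // Shift the accumulated result by one digit: w doublings.
        for _ in 0..w {
            total += total;
        }

        // buckets[j - 1] collects every point whose current digit equals j.
        let mut buckets = vec![0i128; (1usize << w) - 1];
        for (s, p) in scalars.iter().zip(points) {
            let digit = ((*s >> (d * w)) & mask) as usize;
            if digit != 0 {
                buckets[digit - 1] += *p;
            }
        }

        // Running-sum trick: walking from the highest bucket down, bucket j
        // ends up being counted j times, so no explicit multiplication by j
        // is needed.
        let mut running = 0i128;
        let mut column = 0i128;
        for b in buckets.iter().rev() {
            running += *b;
            column += running;
        }
        total += column;
    }
    total
}
```

For example, `bucket_msm(&[3, 200], &[5, -7], 6)` computes `3*5 + 200*(-7)` using only additions and doublings of the "points".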

To make the gains more relatable:

Batch verification of fewer than 100 Schnorr signatures is unchanged: Straus is optimal.
A single 64-bit rangeproof check in Bulletproofs is unchanged: Straus is optimal.
A 2x64-bit aggregated rangeproof check is 10% faster. Example: a typical 2-output confidential transaction.
A check of ten aggregated 64-bit rangeproofs is ≈1.8-2x faster. Example: a 10-output confidential transaction.

This patch adds:

An implementation of VartimeMultiscalarMul using the Pippenger algorithm.
A dynamic switch from Straus to Pippenger (at 190 points), based on the input's size_hint().
A new internal API, to_pippenger_radix, that converts scalars into digits in radix 64, 128, or 256.
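The conversion into radix 64, 128, or 256 digits can be sketched as follows. This is an illustration under stated assumptions rather than the patch's implementation: it takes a `u64` instead of a 256-bit scalar, returns a `Vec<i8>` instead of a fixed-size array, and appends the final carry as an extra digit so that it cannot be dropped. The names `to_signed_radix_2w` and `from_signed_radix_2w` are hypothetical.

```rust
/// Split `x` into signed radix-2^w digits, least significant first.
/// Every digit except the last lies in [-2^w/2, 2^w/2); the last entry is
/// the final carry (0 or 1). Assumption: this sketch works on u64 input.
fn to_signed_radix_2w(x: u64, w: u32) -> Vec<i8> {
    assert!((6..=8).contains(&w), "sketch only covers w = 6, 7, 8");
    let radix: i64 = 1 << w;
    let mask = (1u64 << w) - 1;
    let digits_count = (64 + w - 1) / w;
    let mut digits = Vec::with_capacity(digits_count as usize + 1);
    let mut carry: i64 = 0;
    for i in 0..digits_count {
        // Unsigned w-bit window plus the carry from the previous digit.
        let coef = ((x >> (i * w)) & mask) as i64 + carry; // in [0, 2^w]
        // Carry is 1 exactly when coef >= 2^w/2; the digit is then negative.
        carry = (coef + radix / 2) >> w;
        digits.push((coef - (carry << w)) as i8);
    }
    // Keep the final carry as its own digit instead of discarding it.
    digits.push(carry as i8);
    digits
}

/// Recombine the digits; used here only to check the conversion.
fn from_signed_radix_2w(digits: &[i8], w: u32) -> i128 {
    digits
        .iter()
        .rev()
        .fold(0i128, |acc, &d| acc * (1i128 << w) + i128::from(d))
}
```

Signed digits halve the number of buckets needed per digit, because a point with a negative digit can be negated and placed in the bucket for the digit's absolute value.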

This addresses issue #130 and replaces previous PR #129.

oleganza · 2019-05-21T19:43:02Z

@hdevalence could you please take a look at this line? https://github.com/dalek-cryptography/curve25519-dalek/pull/249/files#diff-40c9945ba2b4ac799c45803abe9da3bdR1031

If I remove it, the pseudo-random tests still pass, even for |G|-1 scalars. Maybe it's specific to the radixes 6/7/8?

oleganza · 2019-05-21T21:15:06Z

Benchmarks:

     Running target/release/deps/dalek_benchmarks-3b30c98c1f589a9e
Variable-time variable-base multiscalar multiplication/1                                                                            
                        time:   [24.649 us 24.668 us 24.680 us]
                        change: [+0.0456% +0.1652% +0.2902%] (p = 0.02 < 0.05)
                        Change within noise threshold.
Variable-time variable-base multiscalar multiplication/2                                                                            
                        time:   [30.244 us 30.269 us 30.295 us]
                        change: [-0.3981% -0.2577% -0.1267%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Variable-time variable-base multiscalar multiplication/4                                                                            
                        time:   [41.449 us 41.476 us 41.499 us]
                        change: [-0.1541% -0.0554% +0.0254%] (p = 0.28 > 0.05)
                        No change in performance detected.
Found 3 outliers among 15 measurements (20.00%)
  2 (13.33%) high mild
  1 (6.67%) high severe
Variable-time variable-base multiscalar multiplication/8                                                                           
                        time:   [63.819 us 63.825 us 63.834 us]
                        change: [-2.4276% -2.3787% -2.3089%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 15 measurements (13.33%)
  1 (6.67%) high mild
  1 (6.67%) high severe
Variable-time variable-base multiscalar multiplication/16                                                                           
                        time:   [109.17 us 109.22 us 109.25 us]
                        change: [-0.3171% -0.2525% -0.1786%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 1 outliers among 15 measurements (6.67%)
  1 (6.67%) high mild
Variable-time variable-base multiscalar multiplication/32                                                                           
                        time:   [199.55 us 199.59 us 199.66 us]
                        change: [-0.1247% -0.0640% +0.0057%] (p = 0.08 > 0.05)
                        No change in performance detected.
Variable-time variable-base multiscalar multiplication/64                                                                           
                        time:   [379.57 us 379.63 us 379.69 us]
                        change: [-0.0002% +0.0351% +0.0680%] (p = 0.06 > 0.05)
                        No change in performance detected.
Variable-time variable-base multiscalar multiplication/128                                                                            
                        time:   [756.51 us 756.71 us 756.96 us]
                        change: [+0.1368% +0.1867% +0.2326%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 1 outliers among 15 measurements (6.67%)
  1 (6.67%) high mild
Variable-time variable-base multiscalar multiplication/256                                                                            
                        time:   [1.3250 ms 1.3254 ms 1.3258 ms]
                        change: [-10.356% -10.273% -10.204%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 15 measurements (6.67%)
  1 (6.67%) high mild
Variable-time variable-base multiscalar multiplication/384                                                                            
                        time:   [1.8276 ms 1.8282 ms 1.8288 ms]
                        change: [-19.070% -18.972% -18.904%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 15 measurements (6.67%)
  1 (6.67%) high mild
Variable-time variable-base multiscalar multiplication/512                                                                            
                        time:   [2.3366 ms 2.3376 ms 2.3386 ms]
                        change: [-22.551% -22.459% -22.389%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 15 measurements (6.67%)
  1 (6.67%) low mild
Variable-time variable-base multiscalar multiplication/768                                                                            
                        time:   [3.2301 ms 3.2314 ms 3.2328 ms]
                        change: [-38.392% -37.982% -37.748%] (p = 0.00 < 0.05)
                        Performance has improved.
Variable-time variable-base multiscalar multiplication/1024                                                                            
                        time:   [4.0123 ms 4.0135 ms 4.0154 ms]
                        change: [-42.556% -42.153% -41.933%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 15 measurements (6.67%)
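To make the sub-linear growth visible in these numbers, the median times can be divided by the number of points. The sketch below only restates the medians from the report above, converted to microseconds; the names `MEDIANS_US` and `per_point_us` are hypothetical.

```rust
/// (number of points, median time in microseconds), copied from the
/// benchmark report above (millisecond values converted to microseconds).
const MEDIANS_US: [(u32, f64); 13] = [
    (1, 24.668),
    (2, 30.269),
    (4, 41.476),
    (8, 63.825),
    (16, 109.22),
    (32, 199.59),
    (64, 379.63),
    (128, 756.71),
    (256, 1325.4),
    (384, 1828.2),
    (512, 2337.6),
    (768, 3231.4),
    (1024, 4013.5),
];

/// Median time per point in microseconds, or None if `n` was not measured.
fn per_point_us(n: u32) -> Option<f64> {
    MEDIANS_US
        .iter()
        .find(|&&(size, _)| size == n)
        .map(|&(size, t)| t / f64::from(size))
}
```

The per-point cost is roughly 5.9 us at both 64 and 128 points, where Straus is still in use, and drops to roughly 3.9 us at 1024 points.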

hdevalence · 2019-05-22T18:42:37Z

Looks great! I'll check about the radix question.

hdevalence · 2019-05-22T20:20:51Z

src/scalar.rs

+    /// $$
+    /// with \\(-2\^w/2 \leq a_i < 2\^w/2\\) for \\(0 \leq i < (n-1)\\) and \\(-2\^w/2 \leq a_{n-1} \leq 2\^w/2\\).
+    ///
+    pub(crate) fn to_pippenger_radix(&self, w: usize) -> ([i8; 43], usize) {


Suggested change

-    pub(crate) fn to_pippenger_radix(&self, w: usize) -> ([i8; 43], usize) {
+    pub(crate) fn to_radix_2w(&self, w: usize) -> ([i8; 43], usize) {

would match the naming scheme for to_radix_16 better?

2w also reads as 2*w. I intentionally placed "pippenger" there to bring attention to it being highly specialized and not accepting arbitrary w as an input. Any moment someone needs a slightly more general-purpose implementation, they should create a new method and not use this one.

How about to_radix_64_128_256?

I tried a few variants, and to_radix_2w is the cleanest. It's private API anyway.

hdevalence · 2019-05-24T22:48:02Z

src/backend/serial/scalar_mul/pippenger.rs

+/// For large `n`, dominant factor is (n*256/w) additions.
+/// However, if `w` is too big and `n` is not too big, then `(2^w/2)*A` could dominate.
+/// Therefore, the optimal choice of `w` grows slowly as `n` grows.
+///


This documentation should maybe link to section 4 of https://eprint.iacr.org/2012/549.pdf which I believe is the original source of this special case of Pippenger's technique (which is very general).
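The trade-off in the quoted doc comment can be made concrete with a rough cost model. The formula below is an assumption for illustration only: per digit it counts `n` additions to fill the buckets plus about two additions for each of the `2^w/2` buckets. It is not the rule this patch uses to choose `w`, and `estimated_additions` and `best_window` are hypothetical names.

```rust
/// Rough number of point additions for n points and window width w.
/// Assumed model: ceil(256 / w) digits, each costing n bucket insertions
/// plus about 2 additions per bucket for 2^w/2 buckets.
fn estimated_additions(n: u64, w: u32) -> u64 {
    let digits = (256 + u64::from(w) - 1) / u64::from(w);
    let buckets = 1u64 << (w - 1);
    digits * (n + 2 * buckets)
}

/// Window width among 6, 7, 8 that minimizes the assumed cost model.
fn best_window(n: u64) -> u32 {
    (6..=8)
        .min_by_key(|&w| estimated_additions(n, w))
        .unwrap()
}
```

Under this model the preferred width moves from 6 to 7 to 8 as `n` grows, which matches the qualitative claim that the optimal `w` grows slowly with `n`. The exact crossover points depend on the model and on real hardware costs.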

hdevalence · 2019-06-04T20:40:44Z

@oleganza what's the issue with scalar.rs:1031 exactly? Since the lookup tables can handle the closed interval [-2^r/2, +2^r/2], adding the carry bit at the end ensures that it is not dropped. This is correct whether or not the carry is always 0. Is it just that you want to prove that the carry can or can't be 1?

hdevalence · 2019-06-04T20:42:34Z

This branch looks good and I'd like to merge it and ship it as 1.2!

oleganza · 2019-06-04T20:46:19Z

Is it just that you want to prove that the carry can or can't be 1?

i think it's that, yeah

oleganza · 2019-06-04T20:46:30Z

i'm fine with merging

new pippenger radix 6/7/8 implementation

b52c205

oleganza mentioned this pull request May 21, 2019

Pippenger multiscalar multiplication algorithm for very large inputs #129

Closed

2 tasks

cgs

42648aa

oleganza force-pushed the oleg/pippenger2 branch from 991c959 to 42648aa Compare May 21, 2019 20:33

fix type conversions

33b41ac

oleganza marked this pull request as ready for review May 21, 2019 20:48

oops - forgot to switch on pippenger

df745e9

oleganza mentioned this pull request May 22, 2019

Efficient multiscalar multiplication for large inputs #130

Closed

5 tasks

oleganza and others added 2 commits May 22, 2019 11:14

avoid unnecessary allocation

7fba2a1

rustfmt and copyright fixes

dfcac0d

use one buffer instead of two

ca2926a

hdevalence reviewed May 22, 2019

osuketh mentioned this pull request May 23, 2019

Optimize multiexp LayerXcom/zero-chain#106

Open

cleaner name per Henry’s suggestion

9836d66

hdevalence reviewed May 24, 2019

oleganza and others added 2 commits May 24, 2019 18:00

Update src/backend/serial/scalar_mul/pippenger.rs

eb82a9d

Co-Authored-By: Henry de Valence

Replace std::iter with core::iter

5921d6d

Add reference to 2012/549

19dcd62

hdevalence merged commit c084def into dalek-cryptography:develop Jun 4, 2019

burdges mentioned this pull request Oct 1, 2019

Replace ring with ed25519-dalek in primitives paritytech/substrate#2415

Merged

burdges mentioned this pull request Feb 28, 2020

Batch signature verification paritytech/substrate#5023

Merged

burdges mentioned this pull request Apr 29, 2020

Use background task to batch-verify sr25510 signatures with optimum size paritytech/substrate#5832

Merged

pinkforest pushed a commit to pinkforest/curve25519-dalek that referenced this pull request Jun 27, 2023

Impld Clone for SigningKey (dalek-cryptography#249)

616d55c
