Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

relative performance of Mongomery and Edwards scalarmul #588

Closed
oconnor663 opened this issue Oct 27, 2023 · 7 comments
Closed

relative performance of Mongomery and Edwards scalarmul #588

oconnor663 opened this issue Oct 27, 2023 · 7 comments

Comments

@oconnor663
Copy link

Apologies for filing an issue that's more of a question :) My very basic understanding of the performance relationship between Curve25519 and Edwards25519 is that the former is faster when you're scalarmult'ing an arbitrary point, like you do in the second half of Diffie-Hellman. But when I benchmark scalarmult in this library (i5-1145G7 Linux laptop), it looks like the Edwards curve is faster:

use criterion::{criterion_group, criterion_main, Criterion};

pub fn criterion_benchmark(c: &mut Criterion) {
    c.bench_function("curve25519_dalek Montgomery scalarmul", |b| {
        let point = curve25519_dalek::montgomery::MontgomeryPoint::mul_base_clamped(rand::random());
        b.iter(|| point.mul_clamped(rand::random()))
    });

    c.bench_function("curve25519_dalek Edwards scalarmul", |b| {
        let point = curve25519_dalek::edwards::EdwardsPoint::mul_base_clamped(rand::random());
        b.iter(|| point.mul_clamped(rand::random()))
    });
}

criterion_group!(benches, criterion_benchmark);
criterion_main!(benches);
$ cargo bench
   Compiling scratch v0.1.0 (/tmp/tmp.x9omrxNKmT/scratch)
    Finished bench [optimized] target(s) in 1.17s
     Running unittests src/main.rs (target/release/deps/scratch-67fe8039c398f6ac)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running benches/bench.rs (target/release/deps/bench-3a6cb2fec14f5ab6)
curve25519_dalek Montgomery scalarmul
                        time:   [41.320 µs 41.428 µs 41.561 µs]
Found 13 outliers among 100 measurements (13.00%)
  6 (6.00%) high mild
  7 (7.00%) high severe

curve25519_dalek Edwards scalarmul
                        time:   [23.729 µs 23.757 µs 23.789 µs]
Found 15 outliers among 100 measurements (15.00%)
  6 (6.00%) high mild
  9 (9.00%) high severe

There's a big chance that my assumption going in was wrong, or that I'm not benchmarking what I think I'm benchmarking. Am I right that this is surprising? Does this suggest that doing Diffie-Hellman on Edwards25519 directly (not converting to the Mongtomery form first) would actually be faster than regular X25519?

@tarcieri
Copy link
Contributor

curve25519-dalek implements fixed-based Montgomery multiplication in terms of Edwards multiplication:

https://github.com/dalek-cryptography/curve25519-dalek/blob/8e0cef5/curve25519-dalek/src/montgomery.rs#L123

Notably this allows it to leverage the precomputed basepoint tables for the Edwards form, so we don't need to maintain two different copies of the basepoint tables for the two different curve forms.

You're free to use either form for ECDH, though for interop purposes you may find it easier to use X25519.

@oconnor663
Copy link
Author

Fascinating, thanks for the lightning fast reply.

@oconnor663
Copy link
Author

oconnor663 commented Oct 27, 2023

Actually, I'm not sure we're talking about the same thing. What I'm trying to benchmark above is arbitrary/variable base scalarmult, via mul_clamped, which I think bottoms out at mul_bits_be (which by complete coincidence was introduced in the commit you linked to?). The calls to mul_base_clamped aren't part of the benchmarking loop. Am I right to think that that does not go through the Edwards form?

@tarcieri
Copy link
Contributor

tarcieri commented Oct 27, 2023

Aah yes, sorry I misread your benchmark. mul_base_clamped goes through mul_base where indeed mul_clamped goes through mul_bits_be. So fixed-base multiplication converts to Edwards form, whereas variable-base multiplication uses the Montgomery ladder, in case that's unclear.

@oconnor663
Copy link
Author

Yes that's what it looks like to me, which brings me back to the performance question: Isn't the Mongtomery ladder supposed to be faster than scalar multiplication on the Edwards curve? Is it weird that Edwards seems faster in this benchmark?

@tarcieri
Copy link
Contributor

Edwards arithmetic in curve25519-dalek has multiple backends including highly optimized architecture-specific SIMD backends including ones for AVX2 and AVX-512, whereas the Montgomery backend is relatively simplistic.

@oconnor663
Copy link
Author

oconnor663 commented Oct 27, 2023

Got it, that makes sense. And yeah, I do notice a decent speedup (~17µs instead of ~23) when I benchmark nightly and pick up the AVX-512 optimizations. Thanks again!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants