
Poor Scalar performance #507

Closed
kayabaNerve opened this issue Feb 13, 2023 · 11 comments

@kayabaNerve (Contributor) commented Feb 13, 2023

dalek Scalars are about 10x slower than k256 Scalars, for both addition and multiplication, according to some recent benchmarking I performed. While that isn't a fair comparison, one being Curve25519 and one being secp256k1, it is a striking enough difference to suggest something is going on in dalek's impl. The following block of code was pointed out to me as the likely culprit:

```rust
// The UnpackedScalar::add function produces reduced outputs
// if the inputs are reduced. However, these inputs may not
// be reduced -- they might come from Scalar::from_bits. So
// after computing the sum, we explicitly reduce it mod l
// before repacking.
let sum = UnpackedScalar::add(&self.unpack(), &_rhs.unpack());
let sum_R = UnpackedScalar::mul_internal(&sum, &constants::R);
let sum_mod_l = UnpackedScalar::montgomery_reduce(&sum_R);
sum_mod_l.pack()
```
Please note I have not flamegraphed this to confirm.

Would it be possible to expose an addition that does not hedge against the potential use of an API that exists for x25519, greatly accelerating Ed25519 operations? Potentially a distinct ReducedScalar type entirely?
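To illustrate the cost being discussed: when inputs may be unreduced, every addition has to go through a full reduction pass, whereas inputs known to be below the modulus only need one conditional subtraction. This is a toy sketch with an arbitrary small modulus, not dalek's actual code or modulus:

```rust
// Hypothetical small modulus standing in for ell; not the real group order.
const L: u64 = 0xfff1;

// Full reduction: correct even for unreduced inputs, but pays for a
// general reduction (modelled here by `%`) on every call.
fn add_then_fully_reduce(a: u64, b: u64) -> u64 {
    (a + b) % L
}

// Cheap path: valid only when a < L and b < L, so a + b < 2L and a
// single conditional subtraction suffices.
fn add_reduced(a: u64, b: u64) -> u64 {
    debug_assert!(a < L && b < L);
    let s = a + b;
    if s >= L { s - L } else { s }
}

fn main() {
    // For reduced inputs the two paths agree.
    assert_eq!(add_then_fully_reduce(0xfff0, 0x0002), 1);
    assert_eq!(add_reduced(0xfff0, 0x0002), 1);
    // An unreduced input (>= L) is only handled correctly by the full path.
    assert_eq!(add_then_fully_reduce(3 * L + 1, 2), 3);
    println!("ok");
}
```

A `ReducedScalar` type would encode the `a < L` precondition in the type system, letting the cheap path be used unconditionally.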

@tarcieri (Contributor) commented Feb 17, 2023

I'm curious if there are actually good use cases for Scalar::from_bits to be public, or if they're covered by Scalar::from_bits_clamped.

Also curious what overhead using bytes as the internal "packed" representation of a scalar might be adding (if any) or if it impacts LLVM inlining (k256 uses an array of limbs, similar to UnpackedScalar).
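For readers unfamiliar with the packed/unpacked distinction: a byte-packed representation has to be converted to limbs before arithmetic and converted back afterwards, work a limb-based type avoids. This sketch uses a simple 4x64-bit layout for illustration; it is not dalek's actual `UnpackedScalar` layout (which uses wider-than-needed limbs for carry headroom):

```rust
// Convert a little-endian 32-byte "packed" scalar into four u64 limbs.
fn unpack(bytes: &[u8; 32]) -> [u64; 4] {
    let mut limbs = [0u64; 4];
    for i in 0..4 {
        limbs[i] = u64::from_le_bytes(bytes[i * 8..(i + 1) * 8].try_into().unwrap());
    }
    limbs
}

// Convert limbs back into the packed byte representation.
fn pack(limbs: &[u64; 4]) -> [u8; 32] {
    let mut bytes = [0u8; 32];
    for i in 0..4 {
        bytes[i * 8..(i + 1) * 8].copy_from_slice(&limbs[i].to_le_bytes());
    }
    bytes
}

fn main() {
    let mut b = [0u8; 32];
    b[0] = 7;
    b[8] = 1;
    // Round trip: unpack then pack recovers the original bytes.
    assert_eq!(unpack(&b), [7, 1, 0, 0]);
    assert_eq!(pack(&unpack(&b)), b);
    println!("ok");
}
```

If every `Add`/`Mul` pays this unpack/pack round trip, that overhead (and any inlining it defeats) could account for part of the gap versus a type that stores limbs directly.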

@kayabaNerve (Author)

Beyond x25519, which I can't comment on, I think the remaining concern may be how the Group API requires the modulus as a Scalar (not just via char_le_bits), which requires unreduced Scalars as a possibility.

@tarcieri (Contributor) commented Feb 18, 2023

> I think the remaining concern may be how the Group API requires the modulus as a Scalar

It doesn't though? For Scalar it'd be the Field and PrimeField traits, and of those there's only PrimeField::MODULUS, which is a freeform string.

> not just via char_le_bits

(note: there's that on the PrimeFieldBits trait as well)

But beyond that there's nothing that requires an unreduced Scalar in the API, in either ff or group

@kayabaNerve (Author)

... you are right. I've even implemented 0.13 into my own libs. I have no idea why I thought it did that. Sorry.

@kayabaNerve (Author)

BASEPOINT_ORDER also currently exists as a way to access an unreduced Scalar.

@tarcieri (Contributor)

Indeed. Wonder if that could be made private at least.

@rozbb (Contributor) commented Mar 9, 2023

Clamping makes the given scalar a multiple of 8, so we'd definitely still need to expose from_bits for ff. But maybe from_bits should do a reduction? I think it used to, but it doesn't anymore. @tarcieri do you recall?
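For context, this is the X25519-style clamping being referred to (per RFC 7748): clear the low 3 bits, clear the top bit, set bit 254. Clearing the low 3 bits is what forces the result to be a multiple of 8, which is why `from_bits_clamped` cannot substitute for a plain `from_bits` when implementing ff's traits. A minimal standalone sketch:

```rust
// RFC 7748 clamping of a little-endian 32-byte scalar.
fn clamp(mut bytes: [u8; 32]) -> [u8; 32] {
    bytes[0] &= 0b1111_1000;  // clear low 3 bits -> multiple of 8
    bytes[31] &= 0b0111_1111; // clear the top bit
    bytes[31] |= 0b0100_0000; // set bit 254
    bytes
}

fn main() {
    let clamped = clamp([0xff; 32]);
    assert_eq!(clamped[0] % 8, 0);        // low byte is a multiple of 8
    assert_eq!(clamped[31] & 0x80, 0);    // high bit clear
    assert_eq!(clamped[31] & 0x40, 0x40); // bit 254 set
    println!("ok");
}
```

An ff `PrimeField` implementation needs to represent arbitrary field elements, including ones that are not multiples of 8, so clamping on input is not an option there.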

@kayabaNerve (Author) commented Mar 10, 2023

Considering the order of the points actually generatable from G is 2^252 + ..., and the Scalar type is documented as "All arithmetic on Scalars is done modulo \( \ell \)", I'd call for Scalars to be made canonical (i.e. reduced mod l) on entry, unless there's breaking behavior caused by x25519.

If x25519 does have breaking behavior, I'd note that anything expecting those bits not to be cleared (or reduced out) could be shimmed by a function which clears the bits, performs the scalar multiplication, and then, for the bits that were set, adds the relevant point from https://docs.rs/curve25519-dalek/latest/curve25519_dalek/constants/constant.EIGHT_TORSION.html. Accordingly, while an API-breaking change, it'd still be possible to provide a safe, always-mod-l Scalar API that still acknowledges/supports all use cases.
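The arithmetic behind that shim can be seen on plain integers: any scalar s splits as s = (s & !7) + (s & 7), so a multiplication by s can be done as a multiplication by the cleared part plus a small correction term (which, in the real proposal, would be supplied by the 8-torsion points rather than by integer addition). A toy sketch of the decomposition only:

```rust
fn main() {
    let s: u64 = 0b1010_1101;
    let cleared = s & !7; // low 3 bits cleared, as clamping would produce
    let low = s & 7;      // the bits the torsion fix-up would account for
    // The two parts recombine to the original scalar.
    assert_eq!(cleared + low, s);
    assert_eq!(low, 0b101);
    println!("ok");
}
```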

@tarcieri (Contributor)

Was this addressed by #519?

@kayabaNerve (Author)

I'd assume so. @rozbb noted the performance benefits of their PR.

@rozbb (Contributor) commented May 21, 2023

Ok I'll close for now. @kayabaNerve if you end up running some benchmarks against other scalar impls and find that ours is noticeably worse, feel free to re-open. Thanks!

@rozbb closed this as completed May 21, 2023