
Poor Scalar performance #507

Closed
kayabaNerve opened this issue Feb 13, 2023 · 11 comments

@kayabaNerve (Contributor) commented Feb 13, 2023

dalek Scalars are about 10x slower than k256 Scalars, for both addition and multiplication, according to some recent benchmarking I performed. While that isn't a fair comparison, one being Curve25519 and one being secp256k1, it is a striking enough difference to suggest something is going on in dalek's impl. The following block of code was pointed out to me as the likely culprit:

```rust
// The UnpackedScalar::add function produces reduced outputs
// if the inputs are reduced. However, these inputs may not
// be reduced -- they might come from Scalar::from_bits. So
// after computing the sum, we explicitly reduce it mod l
// before repacking.
let sum = UnpackedScalar::add(&self.unpack(), &_rhs.unpack());
let sum_R = UnpackedScalar::mul_internal(&sum, &constants::R);
let sum_mod_l = UnpackedScalar::montgomery_reduce(&sum_R);
sum_mod_l.pack()
```
Please note I have not flamegraphed this to confirm.

Would it be possible to expose an addition that does not hedge against the potential use of an API that exists for x25519, greatly accelerating Ed25519 operations? Potentially a distinct ReducedScalar type entirely?
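To illustrate the cost being discussed: when inputs may be unreduced, every addition has to go through a full reduction pass, whereas inputs known to be below the modulus only need one conditional subtraction. This is a toy sketch with an arbitrary small modulus, not dalek's actual code or modulus:

```rust
// Hypothetical small modulus standing in for ell; not the real group order.
const L: u64 = 0xfff1;

// Full reduction: correct even for unreduced inputs, but pays for a
// general reduction (modelled here by `%`) on every call.
fn add_then_fully_reduce(a: u64, b: u64) -> u64 {
    (a + b) % L
}

// Cheap path: valid only when a < L and b < L, so a + b < 2L and a
// single conditional subtraction suffices.
fn add_reduced(a: u64, b: u64) -> u64 {
    debug_assert!(a < L && b < L);
    let s = a + b;
    if s >= L { s - L } else { s }
}

fn main() {
    // For reduced inputs the two paths agree.
    assert_eq!(add_then_fully_reduce(0xfff0, 0x0002), 1);
    assert_eq!(add_reduced(0xfff0, 0x0002), 1);
    // An unreduced input (>= L) is only handled correctly by the full path.
    assert_eq!(add_then_fully_reduce(3 * L + 1, 2), 3);
    println!("ok");
}
```

A `ReducedScalar` type would encode the `a < L` precondition in the type system, letting the cheap path be used unconditionally.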

@tarcieri (Contributor) commented Feb 17, 2023

I'm curious if there are actually good use cases for Scalar::from_bits to be public, or if they're covered by Scalar::from_bits_clamped.

Also curious what overhead using bytes as the internal "packed" representation of a scalar might be adding (if any) or if it impacts LLVM inlining (k256 uses an array of limbs, similar to UnpackedScalar).
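For readers unfamiliar with the packed/unpacked distinction: a byte-packed representation has to be converted to limbs before arithmetic and converted back afterwards, work a limb-based type avoids. This sketch uses a simple 4x64-bit layout for illustration; it is not dalek's actual `UnpackedScalar` layout (which uses wider-than-needed limbs for carry headroom):

```rust
// Convert a little-endian 32-byte "packed" scalar into four u64 limbs.
fn unpack(bytes: &[u8; 32]) -> [u64; 4] {
    let mut limbs = [0u64; 4];
    for i in 0..4 {
        limbs[i] = u64::from_le_bytes(bytes[i * 8..(i + 1) * 8].try_into().unwrap());
    }
    limbs
}

// Convert limbs back into the packed byte representation.
fn pack(limbs: &[u64; 4]) -> [u8; 32] {
    let mut bytes = [0u8; 32];
    for i in 0..4 {
        bytes[i * 8..(i + 1) * 8].copy_from_slice(&limbs[i].to_le_bytes());
    }
    bytes
}

fn main() {
    let mut b = [0u8; 32];
    b[0] = 7;
    b[8] = 1;
    // Round trip: unpack then pack recovers the original bytes.
    assert_eq!(unpack(&b), [7, 1, 0, 0]);
    assert_eq!(pack(&unpack(&b)), b);
    println!("ok");
}
```

If every `Add`/`Mul` pays this unpack/pack round trip, that overhead (and any inlining it defeats) could account for part of the gap versus a type that stores limbs directly.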

@kayabaNerve (Author)

Beyond x25519, which I can't comment on, I think the remaining concern may be how the Group API requires the modulus as a Scalar (not just via char_le_bits), which requires unreduced Scalars as a possibility.

@tarcieri (Contributor) commented Feb 18, 2023

> I think the remaining concern may be how the Group API requires the modulus as a Scalar

It doesn't though? For Scalar it'd be the Field and PrimeField traits, and of those there's only PrimeField::MODULUS, which is a freeform string.

> not just via char_le_bits

(note: there's that on the PrimeFieldBits trait as well)

But beyond that there's nothing that requires an unreduced Scalar in the API, in either ff or group

@kayabaNerve (Author)

... you are right. I've even implemented 0.13 into my own libs. I have no idea why I thought it did that. Sorry.

@kayabaNerve (Author)

BASEPOINT_ORDER also currently exists as a way to access an unreduced Scalar.

@tarcieri (Contributor)

Indeed. Wonder if that could be made private at least.

@rozbb (Contributor) commented Mar 9, 2023

Clamping makes the given scalar a multiple of 8, so we'd definitely still need to expose from_bits for ff. But maybe from_bits should do a reduction? I think it used to, but it doesn't anymore. @tarcieri do you recall?
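For context, this is the X25519-style clamping being referred to (per RFC 7748): clear the low 3 bits, clear the top bit, set bit 254. Clearing the low 3 bits is what forces the result to be a multiple of 8, which is why `from_bits_clamped` cannot substitute for a plain `from_bits` when implementing ff's traits. A minimal standalone sketch:

```rust
// RFC 7748 clamping of a little-endian 32-byte scalar.
fn clamp(mut bytes: [u8; 32]) -> [u8; 32] {
    bytes[0] &= 0b1111_1000;  // clear low 3 bits -> multiple of 8
    bytes[31] &= 0b0111_1111; // clear the top bit
    bytes[31] |= 0b0100_0000; // set bit 254
    bytes
}

fn main() {
    let clamped = clamp([0xff; 32]);
    assert_eq!(clamped[0] % 8, 0);        // low byte is a multiple of 8
    assert_eq!(clamped[31] & 0x80, 0);    // high bit clear
    assert_eq!(clamped[31] & 0x40, 0x40); // bit 254 set
    println!("ok");
}
```

An ff `PrimeField` implementation needs to represent arbitrary field elements, including ones that are not multiples of 8, so clamping on input is not an option there.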

@kayabaNerve (Author) commented Mar 10, 2023

Considering the order of the points actually generatable from G is 2^252 + ..., and the Scalar type is documented as "All arithmetic on Scalars is done modulo \( \ell \)", I'd call for Scalars to be made canonical (i.e. reduced mod l) on entry, unless there's breaking behavior caused by x25519.

If x25519 does have breaking behavior, I'd note that anything expecting those bits not to be cleared (or reduced out) could be shimmed by a function which clears the bits, performs the scalar multiplication, and then, for the bits that were set, adds the relevant point from https://docs.rs/curve25519-dalek/latest/curve25519_dalek/constants/constant.EIGHT_TORSION.html. Accordingly, while an API-breaking change, it'd still be possible to provide a safe, always-mod-l Scalar API that still acknowledges/supports all use cases.
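The arithmetic behind that shim can be seen on plain integers: any scalar s splits as s = (s & !7) + (s & 7), so a multiplication by s can be done as a multiplication by the cleared part plus a small correction term (which, in the real proposal, would be supplied by the 8-torsion points rather than by integer addition). A toy sketch of the decomposition only:

```rust
fn main() {
    let s: u64 = 0b1010_1101;
    let cleared = s & !7; // low 3 bits cleared, as clamping would produce
    let low = s & 7;      // the bits the torsion fix-up would account for
    // The two parts recombine to the original scalar.
    assert_eq!(cleared + low, s);
    assert_eq!(low, 0b101);
    println!("ok");
}
```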

@tarcieri (Contributor)

Was this addressed by #519?

@kayabaNerve (Author)

I'd assume so. @rozbb noted the performance benefits of their PR.

@rozbb (Contributor) commented May 21, 2023

Ok I'll close for now. @kayabaNerve if you end up running some benchmarks against other scalar impls and find that ours is noticeably worse, feel free to re-open. Thanks!

@rozbb closed this as completed May 21, 2023