Reduce overhead from computing modular reduction parameters #4588

Open: randombit wants to merge 3 commits into master from jack/faster-redc-setup

randombit
Owner

This was inspired by an OpenSSL bug: apparently there is (or was) something wrong with the PKCS8 decoders in OpenSSL 3, and parsing private keys took a very long time. I wrote a test to check how Botan behaves. Happily the PKCS8 decoding seems to be fine, but setting up the Montgomery arithmetic took a very long time. Specifically, computing the Montgomery params requires setting up a Barrett reduction, which in turn requires a constant time division. That division consumed well over 90% of the total runtime when parsing a 4096 bit RSA key in a loop.

The first commit optimizes the initial Barrett setup by adding a specialized implementation for 2^k / m. I had hoped to find a specific algorithm for this (it seems like something should be possible here) but Google was no help. Even so, the specialized version is a bit faster because we know only 1 bit of the dividend is set and we can assume k is public.
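To illustrate why a power-of-two dividend helps, here is a minimal sketch on 64-bit words rather than big integers. The name `pow2_div` and the word-sized types are assumptions made for illustration; this is not the code from this PR. Since only bit k of the dividend is set, the shift-subtract loop never needs to read dividend bits, and since k is public the loop count can depend on it.

```cpp
#include <cassert>
#include <cstdint>

// Toy sketch (hypothetical helper, not Botan's implementation):
// computes floor(2^k / m) by binary long division.
// The dividend 2^k is never materialized: after its single set bit
// has been brought down, every remaining dividend bit is zero.
inline uint64_t pow2_div(unsigned k, uint64_t m) {
   // m must be nonzero; m <= 2^63 keeps (r << 1) from overflowing
   assert(m != 0 && m <= (uint64_t(1) << 63) && k <= 63);

   uint64_t r = 1;  // remainder after bringing down the one set bit
   uint64_t q = 0;

   // First quotient bit (only ever set when m == 1)
   uint64_t ge = uint64_t(0) - static_cast<uint64_t>(r >= m);
   r -= m & ge;
   q = ge & 1;

   // The loop count depends only on k, which is assumed public.
   for(unsigned i = 0; i != k; ++i) {
      r <<= 1;  // next dividend bit is always 0
      // All-ones mask if r >= m. A real constant time implementation
      // would build this mask without a data-dependent comparison.
      ge = uint64_t(0) - static_cast<uint64_t>(r >= m);
      r -= m & ge;
      q = (q << 1) | (ge & 1);
   }
   return q;
}
```

For example, `pow2_div(4, 3)` walks through 16 / 3 and yields 5. The same structure carries over to multi-word integers, where skipping the per-bit reads of the dividend is where the saving comes from.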

The second commit follows up by distinguishing between public and secret moduli for Barrett, much as #4569 did for modular inverses. In the public modulus case we can use the much faster variable time division. It also puts some effort towards sharing Barrett constants once we've computed them, e.g. by making them available from DL_Group and caching them in the RSA key data.
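The public/secret split can be sketched with a toy 32-bit Barrett reducer. Everything here (`ToyBarrett`, `for_public_modulus`, `for_secret_modulus`) is a hypothetical name chosen for illustration and is not Botan's API. Both constructors compute the same constant, mu = floor(2^32 / m); they differ only in whether the division is allowed to leak timing information about m.

```cpp
#include <cassert>
#include <cstdint>

// Toy sketch (hypothetical type, not Botan's Modular_Reducer).
struct ToyBarrett {
   uint32_t m;   // modulus, nonzero
   uint64_t mu;  // floor(2^32 / m)

   // Public modulus: native division is fine, timing may depend on m.
   static ToyBarrett for_public_modulus(uint32_t m) {
      assert(m != 0);
      return ToyBarrett{m, (uint64_t(1) << 32) / m};
   }

   // Secret modulus: fixed 33 iteration shift-subtract loop using masks.
   // The comparison is a stand-in for a true constant time mask.
   static ToyBarrett for_secret_modulus(uint32_t m) {
      assert(m != 0);
      uint64_t r = 0;
      uint64_t q = 0;
      for(int i = 0; i <= 32; ++i) {
         // Only the top bit of the dividend 2^32 is set
         r = (r << 1) | static_cast<uint64_t>(i == 0);
         const uint64_t ge = uint64_t(0) - static_cast<uint64_t>(r >= m);
         r -= m & ge;
         q = (q << 1) | (ge & 1);
      }
      return ToyBarrett{m, q};
   }

   // Barrett reduction of x < 2^32: the quotient estimate is off by
   // at most one, so a single masked subtraction finishes the job.
   uint32_t reduce(uint32_t x) const {
      const uint64_t q = (static_cast<uint64_t>(x) * mu) >> 32;
      uint64_t r = x - q * m;
      const uint64_t ge = uint64_t(0) - static_cast<uint64_t>(r >= m);
      r -= m & ge;
      return static_cast<uint32_t>(r);
   }
};
```

Once built, a value like this can be stored alongside the key or group parameters and handed to whatever needs it, so the division is paid for once rather than on every use.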

Overall this reduces the cost of repeatedly parsing an RSA 4096 bit private key to about 1/4 of the current cost on master.

@randombit randombit requested a review from reneme January 23, 2025 23:02
@randombit randombit added this to the Botan 3.7.0 milestone Jan 23, 2025
@randombit randombit force-pushed the jack/faster-redc-setup branch from aa41ebf to d6c0271 Compare January 23, 2025 23:04
@coveralls

coveralls commented Jan 23, 2025

Coverage Status

coverage: 91.266% (+0.01%) from 91.256%
when pulling 3c17c73 on jack/faster-redc-setup
into 9beda9f on master.

@randombit randombit force-pushed the jack/faster-redc-setup branch from 2c1308f to dca4db4 Compare January 24, 2025 12:38
We can do somewhat better knowing the input is a power of 2.
We can use a much faster variable time division if the modulus is
public already, which is the common case.

This also (mostly) eliminates the situation where Modular_Reducer can
be uninitialized via passing zero to the constructor; this is a
holdover from before std::optional was available.

Also make some changes so Barrett computations can be shared over
time, for example by exposing them from DL_Group and accepting them
as an argument to Blinder.
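The std::optional point in the commit message can be sketched as follows. `ToyReducer` and `ToyKeyData` are hypothetical names, and throwing on a zero modulus is an assumption made for this example rather than a statement about what Botan does. The idea is that the reducer itself is always valid, and "not computed yet" is expressed by the optional rather than by a zero modulus.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <stdexcept>

// Toy sketch (hypothetical types, not Botan's classes).
class ToyReducer {
   public:
      // A zero modulus is rejected instead of meaning "uninitialized".
      explicit ToyReducer(uint32_t m) : m_mod(m) {
         if(m == 0) {
            throw std::invalid_argument("modulus must be nonzero");
         }
      }

      uint32_t reduce(uint64_t x) const { return static_cast<uint32_t>(x % m_mod); }

   private:
      uint32_t m_mod;
};

// The "not yet set up" state lives in the optional, and the reducer
// is built once and then reused by every later caller.
struct ToyKeyData {
   uint32_t p;
   std::optional<ToyReducer> mod_p;  // empty until first use

   const ToyReducer& reducer() {
      if(!mod_p.has_value()) {
         mod_p.emplace(p);
      }
      return *mod_p;
   }
};
```

This needs C++17 for std::optional.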
@randombit randombit force-pushed the jack/faster-redc-setup branch from 0a195c8 to 3c17c73 Compare January 26, 2025 16:23
2 participants