Reduce overhead from computing modular reduction parameters #4588
This was inspired by an OpenSSL bug: apparently there is (or was) something wrong with the PKCS8 decoders in OpenSSL 3, and parsing private keys took a very long time. I wrote a test to check how Botan behaved. Happily, the PKCS8 decoding itself seems to be fine, but setting up the Montgomery arithmetic took a very long time. Specifically, computing the Montgomery params requires setting up a Barrett reduction, which required a constant time division. When parsing a 4096 bit RSA key in a loop, the constant time division consumed well over 90% of the total runtime.
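For context on why Montgomery setup involves a reduction at all, here is a minimal single-word sketch of the usual Montgomery parameters (`R mod m`, `R^2 mod m` and `-m^{-1} mod 2^w`), assuming `R = 2^32` and an odd modulus below `2^32`. It is an illustration only: `MontgomeryParams` and `montgomery_setup` are hypothetical names, not Botan's API. The relevant part is the two `%` operations, which reduce values wider than the modulus. With multi-precision moduli those are the steps that need a division or a Barrett reducer, and the Barrett setup is what the description above identifies as the bottleneck.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical single-word illustration only; not Botan's API.
// R = 2^32 and m is an odd modulus with 1 < m < 2^32, so the product of two
// residues fits in a uint64_t.
struct MontgomeryParams {
    uint32_t m;       // odd modulus
    uint64_t r1;      // R mod m (the Montgomery form of 1)
    uint64_t r2;      // R^2 mod m (used to convert values into Montgomery form)
    uint32_t m_dash;  // -m^{-1} mod 2^32 (used by Montgomery reduction)
};

MontgomeryParams montgomery_setup(uint32_t m) {
    assert(m > 1 && m % 2 == 1);

    // For odd m, m * m == 1 (mod 8), so inv = m is correct to 3 bits. Each
    // Newton step inv *= 2 - m * inv doubles the number of correct low bits
    // (3, 6, 12, 24, 48 >= 32), so four steps give m^{-1} mod 2^32.
    uint32_t inv = m;
    for (int i = 0; i < 4; ++i) {
        inv *= static_cast<uint32_t>(2u - m * inv);
    }
    const uint32_t m_dash = static_cast<uint32_t>(0u - inv);

    // Reductions of values wider than m. The hardware '%' stands in for them
    // here; with multi-precision moduli this is where a division or a
    // precomputed reducer (such as Barrett) is needed.
    const uint64_t r1 = (uint64_t(1) << 32) % m;
    const uint64_t r2 = (r1 * r1) % m;

    return MontgomeryParams{m, r1, r2, m_dash};
}
```

For example, `montgomery_setup(97)` gives `r1 = 35` and `r2 = 61`, since 2^32 ≡ 35 and 2^64 ≡ 61 (mod 97).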
The first commit optimizes the initial Barrett setup by adding a specialized implementation for computing `2^k / m`. I had hoped to find a specific algorithm for this (it seems like something should be possible here), but Google is useless. Still, it is a bit faster, because we know only 1 bit of `2^k` is set and we can assume `k` is public.

The second commit follows up by distinguishing between public and secret moduli for Barrett, much as #4569 did for modular inverses. In the public modulus case we can use the much faster variable time division. This also puts some effort towards sharing Barrett constants once we've computed them, e.g. by making them available from `DL_Group` and caching them in the RSA key data.

Overall, this reduces the cost of repeatedly parsing an RSA 4096 bit private key to about 1/4 of the current cost on master.
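To make the two changes concrete, here is a minimal single-word sketch, assuming a modulus below `2^31` so every intermediate value fits in `uint64_t`. It is not Botan's code: `BarrettParams`, `pow2_div_ct`, `barrett_setup`, `barrett_reduce` and the `modulus_is_public` flag are hypothetical names. The setup computes the Barrett constant `mu = floor(2^(2n) / m)` either with ordinary division (public modulus) or with a long division specialized for a dividend that has a single set bit (secret modulus), whose loop count depends only on the public `k` and which uses masks rather than branching on `m`. Whether the compiled code is actually constant time is not verified here. The reduction follows the textbook Barrett algorithm.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical single-word illustration only; not Botan's API.
// Restriction: 2 <= m < 2^31, so every intermediate value fits in uint64_t.
struct BarrettParams {
    uint64_t m;   // modulus
    unsigned n;   // bit length of m
    uint64_t mu;  // Barrett constant floor(2^(2n) / m)
};

// floor(2^k / m) by binary long division, specialized for a dividend with a
// single set bit: only the first step brings down a 1, later steps bring down
// 0s. The loop count depends only on k (assumed public), and the conditional
// subtraction uses masks instead of branching on m.
// Requires 1 <= m < 2^63 and k <= 63.
uint64_t pow2_div_ct(unsigned k, uint64_t m) {
    assert(m >= 1 && m < (uint64_t(1) << 63) && k <= 63);
    uint64_t q = 0;
    uint64_t r = 0;
    for (unsigned i = 0; i <= k; ++i) {
        r = (r << 1) | (i == 0 ? 1u : 0u);  // bring down the next dividend bit
        const uint64_t diff = r - m;        // wraps around if r < m
        const uint64_t borrow = diff >> 63; // 1 iff r < m (r < 2m, m < 2^63)
        const uint64_t mask = borrow - 1;   // all ones iff r >= m
        r = (diff & mask) | (r & ~mask);    // r -= m only when r >= m
        q = (q << 1) | (borrow ^ 1);        // append the next quotient bit
    }
    return q;
}

// Number of significant bits in x (the modulus size is treated as public).
unsigned bit_length(uint64_t x) {
    unsigned n = 0;
    while (x != 0) {
        ++n;
        x >>= 1;
    }
    return n;
}

BarrettParams barrett_setup(uint64_t m, bool modulus_is_public) {
    assert(m >= 2 && m < (uint64_t(1) << 31));
    const unsigned n = bit_length(m);
    const unsigned k = 2 * n;  // k <= 62
    // Public modulus: ordinary variable-time division is acceptable.
    // Secret modulus: use the specialized mask-based division instead.
    const uint64_t mu = modulus_is_public ? (uint64_t(1) << k) / m
                                          : pow2_div_ct(k, m);
    return BarrettParams{m, n, mu};
}

// Textbook Barrett reduction of x < 2^(2n): q3 underestimates floor(x / m)
// by at most 2, so at most two final subtractions are needed.
uint64_t barrett_reduce(const BarrettParams& p, uint64_t x) {
    assert(x < (uint64_t(1) << (2 * p.n)));
    const uint64_t q1 = x >> (p.n - 1);
    const uint64_t q3 = (q1 * p.mu) >> (p.n + 1);  // q1 * mu < 2^64 here
    uint64_t r = x - q3 * p.m;
    while (r >= p.m) {  // not constant time; acceptable for a sketch
        r -= p.m;
    }
    return r;
}
```

For example, with `m = 97` both setup paths give `mu = 168`, and `barrett_reduce` maps `1000` to `30`. Since `BarrettParams` is plain data, it can be computed once and stored next to the modulus and reused, which is the kind of sharing the second commit aims at for `DL_Group` and RSA keys.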