-
Notifications
You must be signed in to change notification settings - Fork 2.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Constant-time enhancements for bignum.c and ecdsa.c #3309
fix: Constant-time enhancements for bignum.c and ecdsa.c #3309
Conversation
Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. Signed-off-by: Scott Arciszewski <[email protected]>
Hi @scottarc, and thanks for your contribution! I understand you've already been in touch with Janos, so you're already aware that his contribution is very welcome. I should be able to start reviewing your PR next week, in the meantime here are some preliminary remarks:
|
include/mbedtls/bignum.h
Outdated
#elif defined(__GNUC__) && ( \ | ||
defined(__amd64__) || defined(__x86_64__) || \ | ||
defined(__ppc64__) || defined(__powerpc64__) || \ | ||
defined(__ia64__) || defined(__alpha__) || \ | ||
( defined(__sparc__) && defined(__arch64__) ) || \ | ||
defined(__s390x__) || defined(__mips64) || \ | ||
defined(__aarch64__) ) | ||
#if !defined(MBEDTLS_HAVE_INT64) | ||
#if !defined(MBEDTLS_HAVE_INT64) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is indentation change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My IDE did that. I'll correct it on my next commit.
Signed-off-by: Scott Arciszewski <[email protected]>
Signed-off-by: Scott Arciszewski <[email protected]>
2739d52
to
506c8dc
Compare
Signed-off-by: Scott Arciszewski <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did a first pass, focusing on the general design rather than on the details for now. I think the most important points are:
- Could we get away with not adding new public functions to the bignum API? The answer to this question probably depends on some benchmarking data.
- While the code in this PR is a significant improvement over the existing, it still leaves
mbedtls_mpi_inv_mod()
quite far from being constant-time. I'm all for incremental improvements, we don't need to fix everything at once, I just want to make sure we agree on where we stand.
* | ||
* \return Index of the highest nonzero limb in X. | ||
*/ | ||
int mbedtls_mpi_first_nonzero(size_t *out, const mbedtls_mpi *X); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This probably doesn't need to be a public function. Currently it doesn't seem to be used outside bignum.c
, and the fact that its description involves limbs is an indication that it's rather concerned with internal implementation details.
(In the past we have often exposed too much in the public API of each module, and we're trying to get better at it.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On another note, the function description doesn't state if it's constant-time.
* | ||
* \return 1 if all limbs in X == 0, 0 otherwise. | ||
*/ | ||
int mbedtls_mpi_is_zero( mbedtls_mpi *X ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note on naming: it would probably be good to use the _ct
suffix consistently for constant-time functions. That way we can get used to seeing it and notice when we're using a function that doesn't have it :)
(In an ideal world it should be the other way round: all functions would be constant-time except perhaps a few exceptions that would have a scary suffix such as _leaky
. But we can't do that right now.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Second note: is that not a bit redundant with mbedtls_mpi_cmp_int_ct()
? (Or with mbedtls_mpi_cmp_int()
if we decide to make it constant-time?)
* \return \c -1 if \p X is lesser than \p Y. | ||
* \return \c 0 if \p X is equal to \p Y. | ||
*/ | ||
int mbedtls_mpi_cmp_mpi_ct( const mbedtls_mpi *X, const mbedtls_mpi *Y ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way you're adding this function is very reasonable and is what we usually do when we want to extend an existing function. However, in this particular instance I don't think being leaky was ever part of the contract of the original function, so I think we can consider just changing the original function to the new constant-time implementation.
The obvious concern is performance, but unless the difference is demonstrably large enough, security and code size usually trump performance. (Code size matters especially for LTS branches, where people want to be able to pick up updates and still fit in the flash of already deployed devices.)
Could you try replacing the implementation of the existing function with your new CT implementation, running programs/test/benchmark
and checking if it makes a measurable difference in performance?
* \return \c -1 if \p X is lesser than \p z. | ||
* \return \c 0 if \p X is equal to \p z. | ||
*/ | ||
int mbedtls_mpi_cmp_int_ct( const mbedtls_mpi *X, mbedtls_mpi_sint z ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as above.
int mbedtls_mpi_first_nonzero(size_t *out, const mbedtls_mpi *X) | ||
{ | ||
int ret = MBEDTLS_ERR_ERROR_CORRUPTION_DETECTED; | ||
MPI_VALIDATE_RET( X->n < 0x80000000 ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: I think we already have checks that are stronger than that, related to MBEDTLS_MPI_MAX_LIMBS
, so this is probably redundant. (If we don't have it already, we could have a static check that MBEDTLS_MPI_MAX_LIMBS < 1 << 31
.)
mbedtls_mpi TA, TB; | ||
MPI_VALIDATE_RET( X != NULL ); | ||
MPI_VALIDATE_RET( A != NULL ); | ||
MPI_VALIDATE_RET( B != NULL ); | ||
|
||
/* Ensure A->n and B->n are smaller than 2^31. */ | ||
MPI_VALIDATE_RET( A->n < 0x80000000 ); | ||
MPI_VALIDATE_RET( B->n < 0x80000000 ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As above, I like the intention, but I think that's probably redundant with the MBEDTLS_MPI_MAX_LIMBS
limit.
break; | ||
|
||
MBEDTLS_MPI_CHK( mbedtls_mpi_first_nonzero(&i, A) ); | ||
MBEDTLS_MPI_CHK( mbedtls_mpi_first_nonzero(&j, B) ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: I think it's clear that this is unfortunately not enough to make this function constant-time, considering we're running loops j
and i
times just below, so I just want to clarify what the intention is here: is it just for readability or is there another reason I'm missing?
To be clear: I'm not objecting to this change, and I think improving readability is good. I just want to make sure I understand the intent.
@@ -239,7 +239,7 @@ static int derive_mpi( const mbedtls_ecp_group *grp, mbedtls_mpi *x, | |||
MBEDTLS_MPI_CHK( mbedtls_mpi_shift_r( x, use_size * 8 - grp->nbits ) ); | |||
|
|||
/* While at it, reduce modulo N */ | |||
if( mbedtls_mpi_cmp_mpi( x, &grp->N ) >= 0 ) | |||
if( mbedtls_mpi_cmp_mpi_ct( x, &grp->N ) >= 0 ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While this is a small improvement over the existing code, it leaves a larger leak open: the if branch itself. I think it would be better to always compute the result of the subtraction and then do a safe conditional swap (as above).
|
||
while( mbedtls_mpi_cmp_int( &V1, 0 ) < 0 ) | ||
while( mbedtls_mpi_cmp_int_ct( &V1, 0 ) < 0 ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: this is still a loop whose number of iterations depends on secret data.
MBEDTLS_MPI_CHK( mbedtls_mpi_add_mpi( &V1, &V1, N ) ); | ||
|
||
MBEDTLS_MPI_CHK( mbedtls_mpi_add_mpi( &V1, &V1, N ) ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think I understand why this line was added.
Hi @scottarc - I haven't heard of you since my first review, I'm not sure if you were waiting on us, were simply busy with other things, or perhaps my feedback was not really giving clear directions for how to improve this PR, so I'm going to try and suggest ways to move forward if you're still willing and able to work on this PR. I think it's important to first think about the general strategy and set a clear goal for this PR. Clearly the focus is on
I think that would be the best way to make a significant improvement to our GCD and modular inversion functions, and hopefully close this avenue of attack once and for all. However this is also a pretty significant amount of work, and might be more that what you were prepared to contribute.
Again, an incremental approach should be taken here: first writing and unit-testing the constant-time replacement function for Also, (2) (just fixing the weakest points) could be an intermediate step towards (1) (a truly constant-time extended GCD alg), as the constant-time By the way, please note that we already have a constant-time "lest than" function: Please let us know if you're still interested in working on this and what you think about my suggestions above. |
Hello again, sorry for the delay. As you suspected, I have been busy with other things since I first opened this pull request. However, I should have some available bandwidth soon.
I'm happy to contribute incremental fixes to alleviate the risk of side-channels in MbedTLS, but I must admit I'm personally not an excellent C programmer, so my involvement might create a very noisy feedback loop (especially in terms of CI/CD notifications). So long as that's not a problem, then I'll get to work as soon as possible. :) |
If you haven't already done so, I recommend reading Jean-Philippe Aumasson's guidelines. Specifically, here, the first few guidelines are about constant-time code. Writing constant-time code in C is quite a different skill from “normal” C programming. The core code that needs to be constant-time rarely involves the more advanced features of C, but conversely it requires careful use of some features that normal C programmers take for granted, and a habit of languages with stronger type systems can actually help (a secret integer and a public integer are the same data type, but they aren't the same semantic type and shouldn't be mixed). Mbed TLS already has a few building blocks for constant-time code in |
Incidentally, I'm very familiar with JP's guidelines. A lot of the design decisions we need to consider aren't captured by them, but they're a great starting point. (Constant-time inequality operators are one example of a blind spot. Constant-time memory resizing is another.) |
Ah, great!
Indeed, some compilers generate a branch for
I'm not sure what you mean here. Would the sizes be secret or not? If the only secret is the content of the memory, wouldn't a naive algorithm work on “sane” platform? Or is this about platforms that do deduplication (in which case, is constant-time anything even possible)? |
Yes, I agree the there tend to be gaps between various references what we're trying to do here. Generally speaking, I'm not aware of a good general reference for constant-time implementation of multi-precision arithmetic. Some aspects of it are treated in the academic literature (for example, see the references above for constant-time extended GCD algorithms), but those tend to focus on rather high-level descriptions of the algorithms. At the other end of the spectrum, Aumasson's guidelines include solutions for rather low-level problems such as comparing 32-bit values. In the middle, we have classical references on multi-precision arithmetic (such as the Handbook of Applied Cryptography) which completely ignore timing issues (because they were written before those issues became as relevant as they are today). We need to cross those partial references in order to build implementations of constant-time multi-precision arithmetic operations. |
When doing ephemeral-static Diffie-Hellman, the number of leading zero bytes is leaked from timing information, which can then be used to setup a lattice attack against the target server's static DH key. Removing the timing leak on the number of leading zeroes requires a constant-time left trim function, which would typically be implemented as a loop that iterates over every {limb, byte} and counts the number of significant {limbs, bytes}, then passes to the resize function. (Whether or not you need the resize function to be constant-time depends on whether memory access patterns are in your threat model. My intuition here is: For embedded devices, it's probably safe to exclude it; for general-purpose computers, it's probably not safe to exclude it. You still want the counter loop to be constant-time.)
Agreed. Aside from Aumasson's resource and random blog posts on the topic (e.g.), this is a significant blind spot in the cryptographic literature that needs to be filled. |
Regarding resizing: ah, ok, I see what you mean. The size of a buffer can definitely leak through memory access patterns on platforms with a cache, so you need to instrument the code that processes that byte string to work in a buffer that's large enough to accommodate the maximum size. There's an example of that in the new TLS CBC MAC verification code (Lucky 13 countermeasure) in 2.24: Thanks for mentioning Soatok's guide, I'd never seen it and it's pretty good for what it covers. |
This PR has been inactive for a while, and the path forward is not clear from previous discussion (ranges from incremental improvements like doing constant-time comparisons in well-chosen critical places to complete rewrite of GCD/invmod based on a different algorithm). So, we're closing this PR, and if you're still interested in contributing constant-time improvements, please get in touch on the mailing list to discuss the general strategy first. Thanks again for your contribution! |
Description
This change ensures many bignum operations critical to asymmetric cryptography (especially RSA and ECDSA signing, which relies on
mbedtls_mpi_inv_mod()
) are constant-time for a given fixed-size input.Status
READY
Requires Backporting
Yes.
This should be backported to any branches still in support. I can follow-up this PR with a backport pull request. At minimum, the 2.16 branch should receive a backport.
Migrations
YES. Additional functions are added to bignum.h, but no backwards incompatible changes were introduced.
Additional comments
N/A
Todos
Steps to test or reproduce
There isn't really a reliable and trivial way to reproduce and measure a failing case (variable-time behavior).