Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Constant-time enhancements for bignum.c and ecdsa.c #3309

Closed

Conversation

scottarc
Copy link

@scottarc scottarc commented May 6, 2020

Description

This change ensures many bignum operations critical to asymmetric cryptography (especially RSA and ECDSA signing, which relies on mbedtls_mpi_inv_mod()) are constant-time for a given fixed-size input.

Status

READY

Requires Backporting

Yes.

This should be backported to any branches still in support. I can follow-up this PR with a backport pull request. At minimum, the 2.16 branch should receive a backport.

Migrations

YES. Additional functions are added to bignum.h, but no backwards incompatible changes were introduced.

Additional comments

N/A

Todos

  • Tests
  • Documentation
  • Changelog updated
  • Backported

Steps to test or reproduce

  • Ensure unit tests pass.

There isn't really a reliable and trivial way to reproduce and measure a failing case (variable-time behavior).

Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.

Signed-off-by: Scott Arciszewski <[email protected]>
@mpg mpg self-requested a review May 7, 2020 10:21
@mpg mpg added bug component-crypto Crypto primitives and low-level interfaces needs-backports Backports are missing or are pending review and approval. needs: changelog needs-ci Needs to pass CI tests needs-review Every commit must be reviewed by at least two team members, labels May 7, 2020
@mpg
Copy link
Contributor

mpg commented May 7, 2020

Hi @scottarc, and thanks for your contribution! I understand you've already been in touch with Janos, so you're already aware that his contribution is very welcome.

I should be able to start reviewing your PR next week, in the meantime here are some preliminary remarks:

  1. There are issues found by the CI. Currently only the results of the Travis CI are public, so once you've fixed all issues found by the Travis CI, if there are still other issues found by Jenkins, please ping me and I'll copy-paste the relevant bits here.

  2. This change is significant enough to deserve an entry in our ChangeLog file; please add one by creating a new file in ChangeLog.d (see the README file in that directory).

  3. The new public functions in bignum.h should have unit tests (rather than be implicitly tested through the functions that use them). See https://github.com/ARMmbed/mbedtls/blob/development/docs/architecture/testing/test-framework.md#unit-tests - and feel very free to ask questions about our unit testing framework if needed, we're aware it's not (yet) as well documented as it should be. (Edited to clarify: as you wrote, testing for constant-time behaviour is not easy, so here I'm just talking about good old testing the outputs of the functions with various inputs.)

  4. Thanks you very much for offering to work on backports. Considering this is relevant for security, we're likely to want to backport for both 2.16 and 2.7. We usually avoid extending the API in LTS branches, but this case may be a justified exception. I suggest we discuss this only after advancing this PR a bit more.

#elif defined(__GNUC__) && ( \
defined(__amd64__) || defined(__x86_64__) || \
defined(__ppc64__) || defined(__powerpc64__) || \
defined(__ia64__) || defined(__alpha__) || \
( defined(__sparc__) && defined(__arch64__) ) || \
defined(__s390x__) || defined(__mips64) || \
defined(__aarch64__) )
#if !defined(MBEDTLS_HAVE_INT64)
#if !defined(MBEDTLS_HAVE_INT64)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is indentation change?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My IDE did that. I'll correct it on my next commit.

library/bignum.c Outdated Show resolved Hide resolved
@scottarc scottarc force-pushed the scottarc-development-fixes branch from 2739d52 to 506c8dc Compare May 7, 2020 19:19
Copy link
Contributor

@mpg mpg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a first pass, focusing on the general design rather than on the details for now. I think the most important points are:

  1. Could we get away with not adding new public functions to the bignum API? The answer to this question probably depends on some benchmarking data.
  2. While the code in this PR is a significant improvement over the existing, it still leaves mbedtls_mpi_inv_mod() quite far from being constant-time. I'm all for incremental improvements, we don't need to fix everything at once, I just want to make sure we agree on where we stand.

*
* \return Index of the highest nonzero limb in X.
*/
int mbedtls_mpi_first_nonzero(size_t *out, const mbedtls_mpi *X);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This probably doesn't need to be a public function. Currently it doesn't seem to be used outside bignum.c, and the fact that its description involves limbs is an indication that it's rather concerned with internal implementation details.

(In the past we have often exposed too much in the public API of each module, and we're trying to get better at it.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On another note, the function description doesn't state if it's constant-time.

*
* \return 1 if all limbs in X == 0, 0 otherwise.
*/
int mbedtls_mpi_is_zero( mbedtls_mpi *X );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note on naming: it would probably be good to use the _ct suffix consistently for constant-time functions. That way we can get used to seeing it and notice when we're using a function that doesn't have it :)

(In an ideal world it should be the other way round: all functions would be constant-time except perhaps a few exceptions that would have a scary suffix such as _leaky. But we can't do that right now.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Second note: is that not a bit redundant with mbedtls_mpi_cmp_int_ct()? (Or with mbedtls_mpi_cmp_int() if we decide to make it constant-time?)

* \return \c -1 if \p X is lesser than \p Y.
* \return \c 0 if \p X is equal to \p Y.
*/
int mbedtls_mpi_cmp_mpi_ct( const mbedtls_mpi *X, const mbedtls_mpi *Y );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way you're adding this function is very reasonable and is what we usually do when we want to extend an existing function. However, in this particular instance I don't think being leaky was ever part of the contract of the original function, so I think we can consider just changing the original function to the new constant-time implementation.

The obvious concern is performance, but unless the difference is demonstrably large enough, security and code size usually trump performance. (Code size matters especially for LTS branches, where people want to be able to pick up updates and still fit in the flash of already deployed devices.)

Could you try replacing the implementation of the existing function with your new CT implementation, running programs/test/benchmark and checking if it makes a measurable difference in performance?

* \return \c -1 if \p X is lesser than \p z.
* \return \c 0 if \p X is equal to \p z.
*/
int mbedtls_mpi_cmp_int_ct( const mbedtls_mpi *X, mbedtls_mpi_sint z );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as above.

int mbedtls_mpi_first_nonzero(size_t *out, const mbedtls_mpi *X)
{
int ret = MBEDTLS_ERR_ERROR_CORRUPTION_DETECTED;
MPI_VALIDATE_RET( X->n < 0x80000000 );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: I think we already have checks that are stronger than that, related to MBEDTLS_MPI_MAX_LIMBS, so this is probably redundant. (If we don't have it already, we could have a static check that MBEDTLS_MPI_MAX_LIMBS < 1 << 31.)

mbedtls_mpi TA, TB;
MPI_VALIDATE_RET( X != NULL );
MPI_VALIDATE_RET( A != NULL );
MPI_VALIDATE_RET( B != NULL );

/* Ensure A->n and B->n are smaller than 2^31. */
MPI_VALIDATE_RET( A->n < 0x80000000 );
MPI_VALIDATE_RET( B->n < 0x80000000 );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above, I like the intention, but I think that's probably redundant with the MBEDTLS_MPI_MAX_LIMBS limit.

break;

MBEDTLS_MPI_CHK( mbedtls_mpi_first_nonzero(&i, A) );
MBEDTLS_MPI_CHK( mbedtls_mpi_first_nonzero(&j, B) );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: I think it's clear that this is unfortunately not enough to make this function constant-time, considering we're running loops j and i times just below, so I just want to clarify what the intention is here: is it just for readability or is there another reason I'm missing?

To be clear: I'm not objecting to this change, and I think improving readability is good. I just want to make sure I understand the intent.

@@ -239,7 +239,7 @@ static int derive_mpi( const mbedtls_ecp_group *grp, mbedtls_mpi *x,
MBEDTLS_MPI_CHK( mbedtls_mpi_shift_r( x, use_size * 8 - grp->nbits ) );

/* While at it, reduce modulo N */
if( mbedtls_mpi_cmp_mpi( x, &grp->N ) >= 0 )
if( mbedtls_mpi_cmp_mpi_ct( x, &grp->N ) >= 0 )
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While this is a small improvement over the existing code, it leaves a larger leak open: the if branch itself. I think it would be better to always compute the result of the subtraction and then do a safe conditional swap (as above).


while( mbedtls_mpi_cmp_int( &V1, 0 ) < 0 )
while( mbedtls_mpi_cmp_int_ct( &V1, 0 ) < 0 )
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: this is still a loop whose number of iterations depends on secret data.

MBEDTLS_MPI_CHK( mbedtls_mpi_add_mpi( &V1, &V1, N ) );

MBEDTLS_MPI_CHK( mbedtls_mpi_add_mpi( &V1, &V1, N ) );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I understand why this line was added.

@gilles-peskine-arm gilles-peskine-arm added needs-work and removed needs-review Every commit must be reviewed by at least two team members, labels May 25, 2020
@mpg
Copy link
Contributor

mpg commented Sep 25, 2020

Hi @scottarc - I haven't heard of you since my first review, I'm not sure if you were waiting on us, were simply busy with other things, or perhaps my feedback was not really giving clear directions for how to improve this PR, so I'm going to try and suggest ways to move forward if you're still willing and able to work on this PR.

I think it's important to first think about the general strategy and set a clear goal for this PR. Clearly the focus is on mbedtls_mpi_mod_inv(), which is good as it's indeed an important (again involved in a recent security advisory). I'm afraid we can't make this function truly constant-time without changing its structure substantially, so I think there are two ways to go:

  1. Use a constant-time extended GCD algorithm (which also gives the modular inverse), such as this one - or see Plan to address RSA attacks of eprint.iacr.org/2020/055? ARMmbed/mbed-crypto#347 where another contributor, @jack-fortanix, gave pointers to other references. For this, you'd probably need to write a few helper functions for constant-time comparisons etc. and I'd strongly suggest taking an incremental approach by first adding and unit-testing those helpers before using them to build the overall extended GCD function.

I think that would be the best way to make a significant improvement to our GCD and modular inversion functions, and hopefully close this avenue of attack once and for all. However this is also a pretty significant amount of work, and might be more that what you were prepared to contribute.

  1. Alternatively, one could start with identifying the weakest points of the existing modular inversion and GCD functions, that is, the ones exploited by https://eprint.iacr.org/2020/055.pdf and start with fixing just those:
    • removing the call to mbedtls_mpi_gcd() in mbedtls_mpi_inv_mod() - this just requires ensuring that the function will return an error later if the operands aren't co-prime - and adding a test case for that if we don't have one already
    • using constant-time variant of mbedtls_mpi_cmp_mpi() and mbedtls_mpi_shift_r() as those are the two sources of timing leaks identified in the paper (p. 205)

Again, an incremental approach should be taken here: first writing and unit-testing the constant-time replacement function for cmp or shift_r and unit-testing it, then using it where appropriate.

Also, (2) (just fixing the weakest points) could be an intermediate step towards (1) (a truly constant-time extended GCD alg), as the constant-time cmp and shift_r functions are likely to be useful there too.

By the way, please note that we already have a constant-time "lest than" function: mbedtls_mpi_lt_mpi_ct() which might be useful, and also functions for conditional assignment and conditional swap.

Please let us know if you're still interested in working on this and what you think about my suggestions above.

@scottarc
Copy link
Author

Hi @scottarc - I haven't heard of you since my first review, I'm not sure if you were waiting on us, were simply busy with other things, or perhaps my feedback was not really giving clear directions for how to improve this PR, so I'm going to try and suggest ways to move forward if you're still willing and able to work on this PR.

Hello again, sorry for the delay. As you suspected, I have been busy with other things since I first opened this pull request. However, I should have some available bandwidth soon.

I think it's important to first think about the general strategy and set a clear goal for this PR. Clearly the focus is on mbedtls_mpi_mod_inv(), which is good as it's indeed an important (again involved in a recent security advisory). I'm afraid we can't make this function truly constant-time without changing its structure substantially, so I think there are two ways to go:

1. Use a constant-time extended GCD algorithm (which also gives the modular inverse), such as [this one](https://eprint.iacr.org/2020/972) - or see [ARMmbed/mbed-crypto#347](https://github.com/ARMmbed/mbed-crypto/issues/347) where another contributor, @jack-fortanix, gave pointers to other references. For this, you'd probably need to write a few helper functions for constant-time comparisons etc. and I'd strongly suggest taking an incremental approach by first adding and unit-testing those helpers before using them to build the overall extended GCD function.

I think that would be the best way to make a significant improvement to our GCD and modular inversion functions, and hopefully close this avenue of attack once and for all. However this is also a pretty significant amount of work, and might be more that what you were prepared to contribute.

1. Alternatively, one could start with identifying the weakest points of the existing modular inversion and GCD functions, that is, the ones exploited by https://eprint.iacr.org/2020/055.pdf and start with fixing just those:
   
   * removing the call to `mbedtls_mpi_gcd()` in `mbedtls_mpi_inv_mod()` - this just requires ensuring that the function will return an error later if the operands aren't co-prime - and adding a test case for that if we don't have one already
   * using constant-time variant of `mbedtls_mpi_cmp_mpi()` and `mbedtls_mpi_shift_r()` as those are the two sources of timing leaks identified in the paper (p. 205)

Again, an incremental approach should be taken here: first writing and unit-testing the constant-time replacement function for cmp or shift_r and unit-testing it, then using it where appropriate.

Also, (2) (just fixing the weakest points) could be an intermediate step towards (1) (a truly constant-time extended GCD alg), as the constant-time cmp and shift_r functions are likely to be useful there too.

By the way, please note that we already have a constant-time "lest than" function: mbedtls_mpi_lt_mpi_ct() which might be useful, and also functions for conditional assignment and conditional swap.

Please let us know if you're still interested in working on this and what you think about my suggestions above.

I'm happy to contribute incremental fixes to alleviate the risk of side-channels in MbedTLS, but I must admit I'm personally not an excellent C programmer, so my involvement might create a very noisy feedback loop (especially in terms of CI/CD notifications).

So long as that's not a problem, then I'll get to work as soon as possible. :)

@gilles-peskine-arm
Copy link
Contributor

If you haven't already done so, I recommend reading Jean-Philippe Aumasson's guidelines. Specifically, here, the first few guidelines are about constant-time code.

Writing constant-time code in C is quite a different skill from “normal” C programming. The core code that needs to be constant-time rarely involves the more advanced features of C, but conversely it requires careful use of some features that normal C programmers take for granted, and a habit of languages with stronger type systems can actually help (a secret integer and a public integer are the same data type, but they aren't the same semantic type and shouldn't be mixed).

Mbed TLS already has a few building blocks for constant-time code in rsa.c and ssl_msg.c. To avoid code duplication, we'd like to move and merge those functions into a common module. We don't have a schedule for this yet.

@scottarc
Copy link
Author

Incidentally, I'm very familiar with JP's guidelines. A lot of the design decisions we need to consider aren't captured by them, but they're a great starting point. (Constant-time inequality operators are one example of a blind spot. Constant-time memory resizing is another.)

@gilles-peskine-arm
Copy link
Contributor

Ah, great!

Constant-time inequality operators are one example of a blind spot.

Indeed, some compilers generate a branch for b = (x < y) on some architectures. Fortunately, on architectures that have a branch predictor, compilers tend to avoid branches because they're bad for performance, and on architectures that have no branch predictor, secret-dependent branches are harder to exploit.

Constant-time memory resizing is another.

I'm not sure what you mean here. Would the sizes be secret or not? If the only secret is the content of the memory, wouldn't a naive algorithm work on “sane” platform? Or is this about platforms that do deduplication (in which case, is constant-time anything even possible)?

@mpg
Copy link
Contributor

mpg commented Sep 29, 2020

Yes, I agree the there tend to be gaps between various references what we're trying to do here. Generally speaking, I'm not aware of a good general reference for constant-time implementation of multi-precision arithmetic. Some aspects of it are treated in the academic literature (for example, see the references above for constant-time extended GCD algorithms), but those tend to focus on rather high-level descriptions of the algorithms. At the other end of the spectrum, Aumasson's guidelines include solutions for rather low-level problems such as comparing 32-bit values. In the middle, we have classical references on multi-precision arithmetic (such as the Handbook of Applied Cryptography) which completely ignore timing issues (because they were written before those issues became as relevant as they are today). We need to cross those partial references in order to build implementations of constant-time multi-precision arithmetic operations.

@scottarc
Copy link
Author

I'm not sure what you mean here. Would the sizes be secret or not? If the only secret is the content of the memory, wouldn't a naive algorithm work on “sane” platform? Or is this about platforms that do deduplication (in which case, is constant-time anything even possible)?

https://raccoon-attack.com/

When doing ephemeral-static Diffie-Hellman, the number of leading zero bytes is leaked from timing information, which can then be used to setup a lattice attack against the target server's static DH key.

Removing the timing leak on the number of leading zeroes requires a constant-time left trim function, which would typically be implemented as a loop that iterates over every {limb, byte} and counts the number of significant {limbs, bytes}, then passes to the resize function.

(Whether or not you need the resize function to be constant-time depends on whether memory access patterns are in your threat model. My intuition here is: For embedded devices, it's probably safe to exclude it; for general-purpose computers, it's probably not safe to exclude it. You still want the counter loop to be constant-time.)

Generally speaking, I'm not aware of a good general reference for constant-time implementation of multi-precision arithmetic.

Agreed. Aside from Aumasson's resource and random blog posts on the topic (e.g.), this is a significant blind spot in the cryptographic literature that needs to be filled.

@gilles-peskine-arm
Copy link
Contributor

Regarding resizing: ah, ok, I see what you mean. The size of a buffer can definitely leak through memory access patterns on platforms with a cache, so you need to instrument the code that processes that byte string to work in a buffer that's large enough to accommodate the maximum size. There's an example of that in the new TLS CBC MAC verification code (Lucky 13 countermeasure) in 2.24: mbedtls_ssl_cf_hmac works on data_len_secret bytes in a buffer of size max_data_len, and leaks max_data_len but not data_len_secret.

Thanks for mentioning Soatok's guide, I'd never seen it and it's pretty good for what it covers.

@daverodgman daverodgman added the needs-reviewer This PR needs someone to pick it up for review label Oct 2, 2020
@gilles-peskine-arm gilles-peskine-arm removed the needs-ci Needs to pass CI tests label Oct 6, 2020
@mpg mpg self-requested a review April 7, 2021 09:33
@mpg mpg added the historical-reviewing Currently reviewing (for legacy PR/issues) label Mar 17, 2022
@mpg
Copy link
Contributor

mpg commented Mar 18, 2022

This PR has been inactive for a while, and the path forward is not clear from previous discussion (ranges from incremental improvements like doing constant-time comparisons in well-chosen critical places to complete rewrite of GCD/invmod based on a different algorithm). So, we're closing this PR, and if you're still interested in contributing constant-time improvements, please get in touch on the mailing list to discuss the general strategy first. Thanks again for your contribution!

@mpg mpg closed this Mar 18, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug component-crypto Crypto primitives and low-level interfaces historical-reviewing Currently reviewing (for legacy PR/issues) needs-backports Backports are missing or are pending review and approval. needs-reviewer This PR needs someone to pick it up for review needs-work
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants