Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add crypto.timingSafeEqual() #8040

Closed

Conversation

not-an-aardvark
Copy link
Contributor

@not-an-aardvark not-an-aardvark commented Aug 9, 2016

Checklist
  • make -j4 test (UNIX), or vcbuild test nosign (Windows) passes
  • tests and/or benchmarks are included
  • documentation is changed or added
  • commit message follows commit guidelines
Affected core subsystem(s)

crypto

Description of change

Relates to #3073. This is an implementation of a timing-safe key comparison function, implemented on the C++ side as suggested in #3073 (comment) and #3073 (comment).

@nodejs-github-bot nodejs-github-bot added crypto Issues and PRs related to the crypto subsystem. c++ Issues and PRs that require attention from people who are familiar with C++. labels Aug 9, 2016
@addaleax
Copy link
Member

addaleax commented Aug 9, 2016

For the record, this still has the issue that the binary distinction between a successful and an unsuccessful comparison is leaked as timing information, and the choices seem to be either saying that that doesn’t matter or that the extra memcmp() is basically superfluous (which I tend to agree with).

@not-an-aardvark
Copy link
Contributor Author

not-an-aardvark commented Aug 9, 2016

I added that check because it was in #3073, but I can remove it if there's consensus to do so. From reading the discussion in #3073 I couldn't tell whether an agreement had been reached on that issue.

@addaleax
Copy link
Member

addaleax commented Aug 9, 2016

I don’t think agreement has been reached there, either.

@addaleax
Copy link
Member

addaleax commented Aug 9, 2016

/cc @nodejs/crypto

@@ -488,22 +488,21 @@ class Hmac : public BaseObject {

static void Initialize(Environment* env, v8::Local<v8::Object> target);

protected:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about this. Not only would it create an inconsistency with the other crypto C++ APIs (since we'd now be exposing the 3 below methods to everyone), but I think doing so could even constitute a semver-major change?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This just seems to turn a couple of protected properties into public ones, which would be a semver-minor change (if this is a public API). But this header is one of those with a NODE_WANT_INTERNALS, so I don’t think we even consider this public.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed; this PR no longer modifies these properties.

@jasnell
Copy link
Member

jasnell commented Aug 10, 2016

It would definitely be good to have this in but we do need to get a better sense of just how much of a timing issue exists with the second memcmp().


Returns true if `a` is equal to `b`, without leaking timing information that would allow an attacker to guess one of the values. This is suitable for comparing HMAC digests or secret values like authentication cookies or [capability urls](https://www.w3.org/TR/capability-urls/).

A TypeError will be thrown if either `a` or `b` is not a Buffer instance, or if `a` and `b` have different lengths.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Long lines, please wrap at 80 columns.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@bnoordhuis
Copy link
Member

the binary distinction between a successful and an unsuccessful comparison is leaked as timing information

Agreed. As is, I don't think this pull request is acceptable.

@not-an-aardvark not-an-aardvark force-pushed the timing-safe-equal branch 2 times, most recently from 432750c to d4f6385 Compare August 10, 2016 13:33
@not-an-aardvark
Copy link
Contributor Author

From what I can tell, here are the main issues that have discussed regarding the second memcmp():

  • If we include the second memcmp():
    • An attacker will be able to distinguish between a successful and an unsuccessful comparison
    • We will become vulnerable to a timing attack if the compiler decides to reorder the two memcmp() functions (this risk might be reduced now that it is implemented in C++)
  • If we exclude the second memcmp():
    • The timingSafeEqual() function will return a false positive in the event of a random sha256 collision (seems unlikely enough that we shouldn't worry about it)
    • If sha256 is broken in the future, and the system's RNG is also compromised, then an attacker will be able to perform a timing attack. (I don't think we need to be concerned about this too much, because if the system's RNG is compromised, then the system will already be vulnerable through other means.)

Based on this, I think we should remove the second memcmp(); it seems to add a timing risk without providing any practical benefit.

@jasnell
Copy link
Member

jasnell commented Aug 10, 2016

Dropping the second memcmp() is a reasonable approach. If even the remote possibility of a collision is unacceptable, however, then another possible approach that would allow us to double check would be to optionally calculate a second digest for each input using a randomly selected key. If the second set of digests is always calculated and always compared, then we avoid leaking the timing data... essentially (in pseudo-code)

key1 = random_key()
key2 = random_key()
h1_input1 = hmac(input1, key1)
h1_input2 = hmac(input2, key1)
h2_input1 = hmac(input1, key2)
h2_input2 = hmac(input2, key2)
r1 = h1_input1 == h1_input2
r2 = h2_input1 == h2_input2
return r1 & r2   // logical and

Again, it's not foolproof because there's likely some timing information that can be extracted from the hmac generation but it avoids the issue above with the second memcpy() (I think ;-) ..)

@not-an-aardvark
Copy link
Contributor Author

not-an-aardvark commented Aug 10, 2016

If even the remote possibility of a collision is unacceptable, however, then another possible approach ... would be to optionally calculate a second digest for each input using a randomly selected key.

If we care about the distinction between a 1/2256 chance of failure and a 1/2512 chance of failure, it might be better to use sha512 rather than using sha256 twice. That said, I think a 1/2256 error rate is low enough that we shouldn't be too concerned about it.

@not-an-aardvark
Copy link
Contributor Author

@addaleax:

the binary distinction between a successful and an unsuccessful comparison is leaked as timing information

Wouldn't this be an issue even without the second memcmp()? The first memcmp() (comparing the HMAC digests) will take longer if the digests match, and will exit early if the digests are different. Due to the double HMAC strategy, this early-exit won't leak any information about the contents of the original buffers themselves, but it will still leak information about the binary distinction between a successful and an unsuccessful comparison.

@jasnell
Copy link
Member

jasnell commented Aug 10, 2016

Yes, I believe that is correct. I do believe memcmp() will exit on the first non-match. Perhaps using memcmp() here is the wrong approach, then? Once the hash is generated, we do a simple iteration that does not exit early?

@not-an-aardvark
Copy link
Contributor Author

not-an-aardvark commented Aug 10, 2016

It seems like that would have the same issue (unpredictable compiler optimization) as simply comparing the input buffers directly using iteration that doesn't exit early.

@jasnell
Copy link
Member

jasnell commented Aug 10, 2016

yep... we'll have to explore the options there more. The only thing that's certain is that memcmp() is not the right choice ;-)

@bnoordhuis
Copy link
Member

Perhaps using memcmp() here is the wrong approach, then?

Good point, that should really be CRYPTO_memcmp(). A sufficiently smart whole program optimizer could still thwart it but it's better than plain memcmp().

@jasnell
Copy link
Member

jasnell commented Aug 10, 2016

Last I checked (and it's been a while) but CRYPTO_memcmp() support was rather sparse (update: as in not extensively used). Has that improved?

(mainly concerned here about whether there is enough usage analysis to support the claim that it's fundamentally less of an issue that memcmp)

@not-an-aardvark
Copy link
Contributor Author

not-an-aardvark commented Aug 10, 2016

Assuming there are no usage/support issues, CRYPTO_memcmp() seems like it would resolve the problem. (At that point we could probably just get rid of the double HMAC entirely, and compare the buffers themselves with CRYPTO_memcmp().)

Otherwise, we could probably just keep the double HMAC solution and note in the docs that it will leak the success/failure status of the comparison. In the majority of use-cases, the success/failure status is observable to an attacker anyway (e.g. a request with a signed token will fail if the HMAC signature is invalid).

@not-an-aardvark
Copy link
Contributor Author

@Trott I think for the most part, the flakiness that we saw wasn't from statistical noise; it was from external factors (such as gc pauses) that caused some benchmarks to take longer than they otherwise would have. The 1/1000 chance of failure noted in the test file only accounts for statistical noise.

The question is whether these external factors are a timing problem as far as security is concerned. I don't think they are a problem, because a gc pause is no more likely to affect the case where two buffers are equal than it is to affect the case where two buffers are different.

@Trott
Copy link
Member

Trott commented Aug 20, 2016

@not-an-aardvark and anyone else: Any opinion on whether these ideas are great, terrible, or just kinda meh?

  • Extract this one test to its own file and mark that test file as flaky instead of this one. Then at least the rest of the tests in this file will send up a red flag if they starts failing.
  • Move the flaky part out of sequential and into pummel where we can have it run for a much longer period of time over more data or more iterations or whatever. Upside is the test might be more reliable. Downside is it won't be run regularly in CI. People will have to know to run it there from time to time. (It's possible that running the pummel tests is part of the release process, but I don't believe so.)

@not-an-aardvark
Copy link
Contributor Author

Extract this one test to its own file and mark that test as flaky. Then at least the rest of the tests in the file will send up a red flag if it starts failing.

Seems like a good idea to me.

Move the flaky part out of sequential and into pummel where we can have it run for a much longer period of time over more data or more iterations or whatever. Upside is the test might be more reliable. Downside is it won't be run regularly in CI. People will have to know to run it there from time to time. (It's possible that running the pummel tests is part of the release process, but I don't believe so.)

I have no strong opinions either way. Of course, the ideal solution would be to investigate the flakiness and get rid of it (perhaps by doing manual gc in the test). I'd be willing to help with this, but it might be difficult for me to do so since I can't seem to reproduce the issue locally.

@not-an-aardvark
Copy link
Contributor Author

not-an-aardvark commented Aug 21, 2016

By the way, why is the test for crypto.timingSafeEqual still causing failures on other builds? Based on this, I had thought marking it as flaky would prevent it from doing that.

@Trott
Copy link
Member

Trott commented Aug 21, 2016

@not-an-aardvark Apparently, there's a bug in the flaky status stuff. nodejs/build#467

@Trott
Copy link
Member

Trott commented Aug 21, 2016

PR to split test: #8202

@Trott
Copy link
Member

Trott commented Aug 21, 2016

Does it make sense to skip the memory-intensive stats-based currently-flaky part of the test on machines that are memory constrained? We currently skip a number of tests on devices that have 1Gb or less of RAM. If it makes sense that the test would fail under those circumstances, we can certainly add it.

@jasnell
Copy link
Member

jasnell commented Aug 21, 2016

We likely should consider a separate bucket for inherently flaky tests that are evaluated more along the lines that @bnoordhuis described. That is, monitoring the rate of flakiness over time. I don't think skipping this test entirely on more constrained devices is necessarily a good idea but running less frequently and not as part of the on demand CI we use for PR review would be just fine.

On Aug 20, 2016, at 8:33 PM, Rich Trott [email protected] wrote:

Does it make sense to skip the memory-intensive stats-based currently-flaky part of the test on machines that are memory constrained? We currently skip a number of tests on devices that have 1Gb or less of RAM. If it makes sense that the test would fail under those circumstances, we can certainly add it.


You are receiving this because you modified the open/close state.
Reply to this email directly, view it on GitHub, or mute the thread.

@not-an-aardvark
Copy link
Contributor Author

not-an-aardvark commented Aug 21, 2016

I don't think this test should be using a lot of memory (it creates two arrays of 10000 integers as it gathers benchmarks, but not much aside from that).

That said, at least part of the problem should be fixable. While the test will always fail 1/1000 test runs due to statistical noise, failures like this one have very high t-values, so they are clearly not only due to statistical noise. Instead, they're probably due to some external factor such as gc pauses. In other words, while this test is inherently flaky, it doesn't have to be as flaky as it is now.

Would it be worth creating a separate PR to see if this is fixed by using manual garbage collection with the --expose-gc flag?

@Trott
Copy link
Member

Trott commented Aug 21, 2016

EDIT: Never mind, I'm probably wrong about much of the below.

@jasnell While this test is inherently flaky, that's not what I'm talking about.

This test is failing much more frequently than it should. And it seems to be failing because of GC. And GC is going to happen a lot more on memory-constrained machines (I imagine). So this test may never be reliable on those machines. (Or maybe it can and we just have to find the workaround.) I'm suggesting that if memory constraints are likely to make this test fail (say) 50% of the time, we should just skip it like we do the stringbytes tests that fail about that much on those machines.

We have a small sample size, but it sure seems like this test is failing about 50% of the time on the Pi 1 Raspbian Wheezy.

@Trott
Copy link
Member

Trott commented Aug 21, 2016

Would it be worth creating a separate PR to see if this is fixed by using manual garbage collection with the --expose-gc flag?

Yes, definitely.

@Trott
Copy link
Member

Trott commented Aug 21, 2016

@jasnell (And never mind. I'm wrong about much of what I wrote above. Ugh. Sorry for my confusion.)

@Trott
Copy link
Member

Trott commented Aug 21, 2016

@not-an-aardvark PTAL at #8203 to confirm that is more or less what you had in mind.

jasnell added a commit to jasnell/node that referenced this pull request Aug 22, 2016
This reverts commit 0fc5e0d.

Additional testing indicates that there may still be timing issues
with this implementation. Revert in order to give more time for
testing before this goes out into a release...

Refs: nodejs#8040
Refs: nodejs#8203
@jasnell
Copy link
Member

jasnell commented Aug 22, 2016

See: #8225

jasnell added a commit that referenced this pull request Aug 23, 2016
This reverts commit 0fc5e0d.

Additional testing indicates that there may still be timing issues
with this implementation. Revert in order to give more time for
testing before this goes out into a release...

Refs: #8040
Refs: #8203
PR-URL: #8225
Reviewed-By: Rich Trott <[email protected]>
Reviewed-By: Ben Noordhuis <[email protected]>
@Fishrock123 Fishrock123 mentioned this pull request Sep 6, 2016
@Fishrock123
Copy link
Contributor

#8428 (comment)

@ChALkeR
Copy link
Member

ChALkeR commented Sep 12, 2016

Fixes: #3043

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c++ Issues and PRs that require attention from people who are familiar with C++. crypto Issues and PRs related to the crypto subsystem. semver-minor PRs that contain new features and should be released in the next minor version.
Projects
None yet
Development

Successfully merging this pull request may close these issues.