Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add check for broken Ryzen RDRAND #67

Merged
merged 6 commits into from
Nov 1, 2019

Conversation

SalusaSecondus
Copy link
Contributor

@SalusaSecondus SalusaSecondus commented Oct 30, 2019

It is important that the dieharder acceptance test passes before this PR is merged.

Description of changes:
Some AMD Ryzen CPUs get into a bad state where the output of RDRAND becomes "stuck" on a value of all 1s (e.g., every bit in the output is a one). Unfortunately, it doesn't correctly set the carry flag to indicate an RNG failure.

This change detects this case and handles it as a standard RNG error. It does this by watching for the suspicious value of 0xffffffffffffffff and if it is seen, checks to see if the RNG is stuck on that value. We take this strategy rather than failing upon simply seeing a single instance of 0xffffffffffffffff for two reasons:

  1. We don't like the odds of a 1/2^64 false positive. It's low, but this library is used in masssively large scale systems.
  2. We don't like the (negligible) bias introduced by rejecting any output with the value 0xffffffffffffffff. This avoids all biased output.

We also check for the suspicious value of 0 using the same technique. While we believe all modern systems to correctly set the carry flag on failure, some old/non-standard systems used a 0 to indicate failure.

For more context please see this Ars Technical article and this systemd patch.

To support some libraries which do not define UINT64_MAX, I also added a definition for this. As evidence that it is being correctly defined, here is the disassembled version of rng_rdseed which is correctly checking against the suspicious value of 64 bits of 1s.

00000000000419d0 <_ZN28AmazonCorrettoCryptoProvider10rng_rdrandEPm>:
   419d0:       31 d2                   xor    %edx,%edx
   419d2:       89 d1                   mov    %edx,%ecx
   419d4:       89 d0                   mov    %edx,%eax
   419d6:       48 0f c7 f1             rdrand %rcx
   419da:       0f 92 c0                setb   %al
   419dd:       84 c0                   test   %al,%al
   419df:       41 89 c0                mov    %eax,%r8d
   419e2:       48 89 ce                mov    %rcx,%rsi
   419e5:       48 89 0f                mov    %rcx,(%rdi)
   419e8:       74 0a                   je     419f4 <_ZN28AmazonCorrettoCryptoProvider10rng_rdrandEPm+0x24>
   419ea:       48 8d 49 ff             lea    -0x1(%rcx),%rcx
   419ee:       48 83 f9 fd             cmp    $0xfffffffffffffffd,%rcx
   419f2:       77 04                   ja     419f8 <_ZN28AmazonCorrettoCryptoProvider10rng_rdrandEPm+0x28>
   419f4:       44 89 c0                mov    %r8d,%eax
   419f7:       c3                      retq
   419f8:       89 d1                   mov    %edx,%ecx
   419fa:       89 d0                   mov    %edx,%eax
   419fc:       48 0f c7 f1             rdrand %rcx
   41a00:       0f 92 c0                setb   %al
   41a03:       48 39 ce                cmp    %rcx,%rsi
   41a06:       41 89 c0                mov    %eax,%r8d
   41a09:       75 e9                   jne    419f4 <_ZN28AmazonCorrettoCryptoProvider10rng_rdrandEPm+0x24>
   41a0b:       48 c7 07 00 00 00 00    movq   $0x0,(%rdi)
   41a12:       45 31 c0                xor    %r8d,%r8d
   41a15:       eb dd                   jmp    419f4 <_ZN28AmazonCorrettoCryptoProvider10rng_rdrandEPm+0x24>
   41a17:       66 0f 1f 84 00 00 00    nopw   0x0(%rax,%rax,1)
   41a1e:       00 00

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Some AMD Ryzen CPUs get into a bad state where
the output of RDRAND becomes "stuck" on a value
of all 1s (e.g., every bit in the output is a one).
Unfortunately, it doesn't correctly set the carry flag
to indicate an RNG failure.

This change detects this case and handles it as a
standard RNG error.
csrc/rdrand.cpp Outdated Show resolved Hide resolved
csrc/rdrand.cpp Outdated Show resolved Hide resolved
csrc/rdrand.cpp Outdated Show resolved Hide resolved
csrc/rdrand.cpp Outdated Show resolved Hide resolved
This moves where we check for RDRAND being stuck to a better
location and also adds tests for this logic. These tests only
run with the 'coverage' target. I have validated that the
tests work by backing out the fixes and manually verifying
the failure.
csrc/rdrand.cpp Outdated Show resolved Hide resolved
csrc/rdrand.cpp Outdated Show resolved Hide resolved
@SalusaSecondus
Copy link
Contributor Author

As discussed with @bdonlan, I've simplified the logic to just discard the suspicious values of 0 and UINT64_MAX.

I've also restarted the dieharder run. The specific run looked to fail with a false-positive (as happens in all entropy tests). The failure is pasted below so we can see if it recurs.

        diehard_sums|   0|       100|     100|0.00072921|   WEAK    
        diehard_sums|   0|       100|     200|0.00288711|   WEAK    
        diehard_sums|   0|       100|     300|0.00005085|   WEAK    
        diehard_sums|   0|       100|     400|0.00005684|   WEAK    
        diehard_sums|   0|       100|     500|0.00005652|   WEAK    
        diehard_sums|   0|       100|     600|0.00005056|   WEAK    
        diehard_sums|   0|       100|     700|0.00000350|   WEAK    
        diehard_sums|   0|       100|     800|0.00000494|   WEAK    
        diehard_sums|   0|       100|     900|0.00000016|  FAILED   

@SalusaSecondus SalusaSecondus merged commit 348f01c into corretto:develop Nov 1, 2019
@SalusaSecondus SalusaSecondus deleted the amd-rdrand branch November 1, 2019 19:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants