Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

crypto/rand: crash process on error reading randomness #66821

Closed
FiloSottile opened this issue Apr 14, 2024 · 34 comments
Closed

crypto/rand: crash process on error reading randomness #66821

FiloSottile opened this issue Apr 14, 2024 · 34 comments
Labels
Proposal Proposal-Accepted Proposal-Crypto Proposal related to crypto packages or other security issues
Milestone

Comments

@FiloSottile
Copy link
Contributor

FiloSottile commented Apr 14, 2024

On almost all our platforms, we now have crypto/rand backends that ~never fail.

  • On Linux, we primarily use the getrandom(2) system call, which never fails.

    • It may block if the pool is not initialized yet at early boot, and may be interrupted by a signal handler if requesting more than 256 bytes, but neither of those surface as errors to the application.
    • getrandom() was first available in Linux 3.17, released in October 2014. Debian oldstable is on Linux 5.10.
    • getrandom() can be blocked with seccomp. That's a bad (and weird) idea, and the default Docker profile doesn't do that. In that case we fall back to opening /dev/urandom, which might fail if the file is not available or file descriptors run out.
  • On macOS and iOS we use arc4random() since https://go.dev/cl/569655. From the man page:

    These functions are always successful, and no return value is reserved to indicate an error.

  • On Windows we use the ProcessPrng function. From the docs:

    Always returns TRUE.

  • The BSDs use similar syscalls with similar properties (whether getrandom or getentropy) although we should switch the ones we can to arc4random.

  • On js/wasm we use getRandomValues which doesn't have documented failure modes.

  • On WASIP1 there's random_get which regrettably has an error return value, making it the one platform (ignoring misconfigured Linux) where there might be errors getting platform random bytes. Since WASI rests on an underlying platform, and every underlying platform has failure-less CSPRNGs, it's hard to imagine why random_get should actually return an error.

I'm proposing we make crypto/rand throw (irrecoverably crash the program) if an error occurs, and document that the error return values of crypto/rand.Read and crypto/rand.Reader.Read are always nil.

This will free applications from having to do error handling for a condition that essentially can't happen, and that if it did happen is essentially not possible to handle securely by the application.

This will also allow introducing new APIs like a hypothetical String(charset string) string (not part of this proposal) without an error return, making them more usable and appealing.

Based on a suggestion by @rsc.

/cc @golang/security @golang/proposal-review

@FiloSottile FiloSottile added the Proposal-Crypto Proposal related to crypto packages or other security issues label Apr 14, 2024
@gopherbot gopherbot added this to the Proposal milestone Apr 14, 2024
@daira
Copy link

daira commented Apr 14, 2024

Do you mean crypto/rand should panic on errors? I'm not very familiar with Go but I didn't think it used the terminology of "throwing" errors. I agree, on general language-independent robustness principles, that it should panic.

@FiloSottile
Copy link
Contributor Author

Panics are recoverable, throw is an internal name for fatal errors. Think of it as a call to exit(1).

If we made crypto/rand panic that risks encouraging applications to wrap the calls in defer/recover "for robustness", when really we think it's so unlikely and so unrecoverable that applications shouldn't try.

go/src/runtime/panic.go

Lines 1008 to 1022 in 519f6a0

// throw triggers a fatal error that dumps a stack trace and exits.
//
// throw should be used for runtime-internal fatal errors where Go itself,
// rather than user code, may be at fault for the failure.
//
//go:nosplit
func throw(s string) {
// Everything throw does should be recursively nosplit so it
// can be called even when it's unsafe to grow the stack.
systemstack(func() {
print("fatal error: ", s, "\n")
})
fatalthrow(throwTypeRuntime)
}

@bradfitz
Copy link
Contributor

I think all our code already panics on crypto/rand errors (via wrapppers that don't return errors) so SGTM 😀

@ydnar
Copy link

ydnar commented Apr 14, 2024

WASI 0.2 random interface thankfully does not return an error: https://github.com/WebAssembly/wasi-random/blob/main/wit/random.wit

@icholy
Copy link

icholy commented Apr 15, 2024

Sounds like we need a rand v3

@Jorropo
Copy link
Member

Jorropo commented Apr 15, 2024

Sounds like we need a rand v3

This is the best argument for always using crand and mrand aliases.
||crypto/rand != math/rand 😉||

@Jorropo
Copy link
Member

Jorropo commented Apr 16, 2024

  • On Linux, we primarily use the getrandom(2) system call, which never fails.
    • It may block if the pool is not initialized yet at early boot, and may be interrupted by a signal handler if requesting more than 256 bytes, but neither of those surface as errors to the application.
    • getrandom() was first available in Linux 3.17, released in October 2014. Debian oldstable is on Linux 5.10.
    • getrandom() can be blocked with seccomp. That's a bad (and weird) idea, and [the default Docker profile doesn't do that]

Should we keep the /dev/urandom fallback for 3.17+ ?
I would rather be forced to tweak my seccomp config than having to debug rare flaky throws because I incorrectly configured some sandboxing options.
We can't remove /dev/urandom completely on linux without raising the 2.6.32 baseline.

@FiloSottile
Copy link
Contributor Author

Should we keep the /dev/urandom fallback for 3.17+ ?

This is tempting, but I think making decisions based on kernel version is opening a can of worms. I think even the urandom fallback is reasonably reliable: the file is opened only once, so either crypto/rand never works in a given process or it always works (although it might flake across process executions, if you run out of fds before the first Read call, or if the file is removed).

@Jorropo
Copy link
Member

Jorropo commented Apr 16, 2024

Ah I thought it opened a new file each time.

Then can we use import time side effects to solve this (open the file in init) ?
It's very unlikely you are running out of fds before main even started running.

I get the std tries to not do that, but the overwhelming majority of cases init will start running, try getrandom, succeed and do nothing that seems fine to me. (there also is a clear path to solving this if anyone finds it to be an issue, fixing their seccomp config)

@FiloSottile
Copy link
Contributor Author

I was thinking about that but I have no intuition as to whether the cost of calling getrandom (to check if it's available) on init() for every Linux program is acceptable.

@randall77
Copy link
Contributor

If crypto/rand is imported (even indirectly), it should be fine to make a single system call during init. I'm kinda surprised it doesn't do so already.
The runtime reads from /dev/urandom on every startup.

@mateusz834
Copy link
Member

mateusz834 commented Apr 16, 2024

@randall77

The runtime reads from /dev/urandom on every startup.

Does it? On linux when there is random in auxv, then it does not even open it.

go/src/runtime/rand.go

Lines 44 to 58 in f17b28d

if startupRand != nil {
for i, c := range startupRand {
seed[i%len(seed)] ^= c
}
clear(startupRand)
startupRand = nil
} else {
if readRandom(seed[:]) != len(seed) {
// readRandom should never fail, but if it does we'd rather
// not make Go binaries completely unusable, so make up
// some random data based on the current time.
readRandomFailed = true
readTimeRandom(seed[:])
}
}

@randall77
Copy link
Contributor

@mateusz834 True, on linux if we get auxv randomness we don't read /dev/urandom.
We read it unconditionally on lots of OSes, like darwin and the BSDs.

@rsc rsc changed the title proposal: crypto/rand: throw on errors proposal: crypto/rand: crash process on errors Apr 24, 2024
@rsc rsc changed the title proposal: crypto/rand: crash process on errors proposal: crypto/rand: crash process on error reading randomness Apr 24, 2024
@rsc
Copy link
Contributor

rsc commented Apr 24, 2024

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@rsc
Copy link
Contributor

rsc commented May 8, 2024

Have all remaining concerns about this proposal been addressed?

The proposal is to document that crypto/rand.Read and crypto/rand.Reader.Read always return the full amount requested and never return errors. If the underlying OS returns an error, the Go process will runtime.throw, meaning the process crashes with no chance to recover. But no underlying OS’s actually return errors from random reads anymore.

This lets callers simplify and delete their dead error handling paths.

@mateusz834
Copy link
Member

mateusz834 commented May 9, 2024

FYI, the Reader is a var and people might change it to a custom Reader implementation. I wonder whether that might be an issue for us here?

The proposal is to document that crypto/rand.Read and crypto/rand.Reader.Read always return the full amount requested and never return errors.

@rsc I think that we cannot document rand.Read as such.

I think that, we can only document the default rand.Reader this way.

@mateusz834
Copy link
Member

mateusz834 commented May 11, 2024

Also it seems like plan9 always opens /dev/urandom, so i guess it might not always exist. Maybe we can replace it with some kind of syscall?

func (r *reader) Read(b []byte) (n int, err error) {
r.seeded.Do(func() {
t := time.AfterFunc(time.Minute, func() {
println("crypto/rand: blocked for 60 seconds waiting to read random data from the kernel")
})
defer t.Stop()
entropy, err := os.Open(randomDevice)
if err != nil {
r.seedErr = err
return
}
defer entropy.Close()
_, r.seedErr = io.ReadFull(entropy, r.key[:])
})

CC @0intro

@rsc rsc changed the title proposal: crypto/rand: crash process on error reading randomness crypto/rand: crash process on error reading randomness Jun 5, 2024
@rsc rsc modified the milestones: Proposal, Backlog Jun 5, 2024
@gopherbot
Copy link
Contributor

Change https://go.dev/cl/602497 mentions this issue: crypto/rand: crash program if Read would return an error

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/602496 mentions this issue: crypto/rand: improve getrandom batching and retry logic

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/602495 mentions this issue: crypto/rand: remove /dev/urandom fallback and simplify package structure

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/608175 mentions this issue: crypto/rand: reintroduce urandom fallback for legacy Linux kernels

@FiloSottile
Copy link
Contributor Author

A follow-up question from the CL: should we have a randcrash=0 GODEBUG to turn the throw off? It makes it a little less obvious/universal that it's ok to ignore the error from Read, but that depends on the toolchain version anyway.

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/608435 mentions this issue: crypto/rand: add randcrash=0 GODEBUG

gopherbot pushed a commit that referenced this issue Oct 7, 2024
The fallback was reachable on

    - Linux, where starting in Go 1.24 we require a kernel with
      getrandom(2), see #67001.

    - FreeBSD, which added getrandom(2) in FreeBSD 12.0, which we
      require since Go 1.19.

    - OpenBSD, which added getentropy(2) in OpenBSD 5.6, and we only
      support the latest version.

    - DragonFly BSD, which has getrandom(2) and where we support only
      the latest version.

    - NetBSD, where we switched to kern.arandom in CL 511036, available
      since NetBSD 4.0.

    - illumos, which has getrandom(2). (Supported versions unclear.)

    - Solaris, which had getrandom(2) at least since Oracle
      Solaris 11.4.

    - AIX, which... ugh, fine, but that code is now in rand_aix.go.

At the end of the day the platform-specific code is just a global
func(b []byte) error, so simplified the package around that assumption.

This also includes the following change, which used to be a separate CL.

    crypto/rand: improve getrandom batching and retry logic

    The previous logic assumed getrandom never returned short, and then
    applied stricter-than-necessary batch size limits, presumably to
    avoid short returns.

    This was still not sufficient because above 256 bytes getrandom(2)
    can be interrupted by a signal and return short *or* it can simply
    return EINTR if the pool is not initialized (regardless of buffer
    size).

    https://man.archlinux.org/man/getrandom.2#Interruption_by_a_signal_handler

    Whether this ever failed in practice is unknown: it would have been
    masked by the /dev/urandom fallback before.

    Instead, we apply buffer size limits only where necessary (really,
    only Solaris in practice and FreeBSD in theory) and then handle
    gracefully short returns and EINTR.

    Change-Id: I8677b457aab68a8fb6137a3b43538efc62eb7c93

It turns out that we now know that large getrandom calls *did* fail in
practice, falling back on /dev/urandom, because when we removed the
fallback TestBidiStreamReverseProxy with its 4KiB read started failing.

https://cr-buildbucket.appspot.com/build/8740779846954406033

For #66821

Change-Id: Iaca62997604f326501a51401cdc2659c2790ff22
Reviewed-on: https://go-review.googlesource.com/c/go/+/602495
Reviewed-by: Roland Shoemaker <[email protected]>
Reviewed-by: David Chase <[email protected]>
Reviewed-by: Daniel McCarney <[email protected]>
LUCI-TryBot-Result: Go LUCI <[email protected]>
gopherbot pushed a commit that referenced this issue Oct 7, 2024
Reintroduce the urandom fallback, but this time with a robust set of
tests all pointing guns at each other, including a seccomp'd respawn
simulating the lack of getrandom, to make sure the fallback both works
and is never hit unexpectedly.

Unlike the Go 1.23 fallback, the new one only triggers on ENOSYS (which
is cached by unix.GetRandom) and doesn't handle the EAGAIN errors we
never got an explanation for.

We still crash the program from Read if we have to go to /dev/urandom
and we fail to open it.

For #67001
Updates #66821

Tested on legacy SlowBots (without plan9 and illumos, which don't work):
TRY=aix-ppc64,dragonfly-amd64,freebsd-amd64,freebsd-386,netbsd-amd64

Cq-Include-Trybots: luci.golang.try:gotip-darwin-amd64_14,gotip-solaris-amd64,gotip-js-wasm,gotip-wasip1-wasm_wasmtime,gotip-wasip1-wasm_wazero,gotip-windows-amd64,gotip-windows-386,gotip-linux-386,gotip-linux-amd64-longtest-race,gotip-linux-amd64-boringcrypto
Change-Id: Idecc96a18cd6363087f5b2a4671c6fd1c41a3b0e
Reviewed-on: https://go-review.googlesource.com/c/go/+/608175
Reviewed-by: Daniel McCarney <[email protected]>
Reviewed-by: Cherry Mui <[email protected]>
Reviewed-by: Roland Shoemaker <[email protected]>
LUCI-TryBot-Result: Go LUCI <[email protected]>
gopherbot pushed a commit that referenced this issue Oct 7, 2024
For #66821

Change-Id: I525c308d6d6243a2bc805e819dcf40b67e52ade5
Reviewed-on: https://go-review.googlesource.com/c/go/+/608435
LUCI-TryBot-Result: Go LUCI <[email protected]>
Reviewed-by: Daniel McCarney <[email protected]>
Reviewed-by: Michael Knyszek <[email protected]>
Reviewed-by: Roland Shoemaker <[email protected]>
@dmitshur dmitshur modified the milestones: Backlog, Go1.24 Oct 7, 2024
@gopherbot
Copy link
Contributor

Change https://go.dev/cl/621979 mentions this issue: crypto/internal/mlkem768: remove crypto/rand.Read error checking

@FiloSottile
Copy link
Contributor Author

We realized a GODEBUG is actually a security risk here: most programs will start to ignore errors from Read because they can't happen (which is the intended behavior), but then if a program is run with GODEBUG=randcrash=0 it will use a partial buffer in case an error occurs, which may be catastrophic. (Errors should be impossible, but if they are then the GODEBUG is useless anyway.) Mailed https://go.dev/cl/622115 to revert it.

Note that the proposal was accepted without the GODEBUG, which was only added later.

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/622115 mentions this issue: Revert "crypto/rand: add randcrash=0 GODEBUG"

gopherbot pushed a commit that referenced this issue Oct 28, 2024
A GODEBUG is actually a security risk here: most programs will start to
ignore errors from Read because they can't happen (which is the intended
behavior), but then if a program is run with GODEBUG=randcrash=0 it will
use a partial buffer in case an error occurs, which may be catastrophic.

Note that the proposal was accepted without the GODEBUG, which was only
added later.

This (partially) reverts CL 608435. I kept the tests.

Updates #66821

Change-Id: I3fd20f9cae0d34115133fe935f0cfc7a741a2662
Reviewed-on: https://go-review.googlesource.com/c/go/+/622115
LUCI-TryBot-Result: Go LUCI <[email protected]>
Reviewed-by: Carlos Amedee <[email protected]>
Reviewed-by: Roland Shoemaker <[email protected]>
Auto-Submit: Filippo Valsorda <[email protected]>
Reviewed-by: Daniel McCarney <[email protected]>
gopherbot pushed a commit that referenced this issue Nov 19, 2024
After #66821 crypto/rand.Read can't return an error.

Change-Id: I185063a25ef70986448f2a300e5578de17f6e61e
Reviewed-on: https://go-review.googlesource.com/c/go/+/621979
Auto-Submit: Filippo Valsorda <[email protected]>
LUCI-TryBot-Result: Go LUCI <[email protected]>
Reviewed-by: Michael Knyszek <[email protected]>
Reviewed-by: Russ Cox <[email protected]>
Reviewed-by: Daniel McCarney <[email protected]>
@dmitshur
Copy link
Contributor

@FiloSottile Does this change need to be mentioned in Go 1.24 release notes? Thanks.

@ianlancetaylor
Copy link
Member

I think this does need to be in the release notes. Sent https://go.dev/cl/632036. Thanks.

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/632036 mentions this issue: doc/next: document that crypto/rand.Read never fails

gopherbot pushed a commit that referenced this issue Nov 27, 2024
For #66821

Change-Id: Id9b640a57b9d4d1f9114769f607480b14961e7b3
Reviewed-on: https://go-review.googlesource.com/c/go/+/632036
Reviewed-by: Russ Cox <[email protected]>
Reviewed-by: Dmitri Shuralyov <[email protected]>
Auto-Submit: Ian Lance Taylor <[email protected]>
LUCI-TryBot-Result: Go LUCI <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Proposal Proposal-Accepted Proposal-Crypto Proposal related to crypto packages or other security issues
Projects
Status: Accepted
Development

No branches or pull requests