Incremental compilation relies on hashes for soundness #129016
Labels: A-incr-comp (Area: Incremental compilation), C-bug (Category: This is a bug), T-compiler (Relevant to the compiler team, which will review and decide on the PR/issue), T-lang (Relevant to the language team, which will review and decide on the PR/issue)
Incremental compilation relies on hashes for soundness -- specifically, on SipHash-1-3 with an all-zero key, which is not a cryptographic hash function. Collisions in that hash can lead to UB and other ill effects. Is that a problem? Should we do things to mitigate the risk of that? Or do we tell people they are expected to use non-incremental builds if they actually want to rely on the soundness of the result? Or is this not a bug at all and the current hashing scheme is "good enough"? After all, as @the8472 argues, if the programmer doesn't actively try to exploit the hard-coded all-zero key they are quite unlikely to hit a collision by pure chance. (That should follow from the fact that SipHash is a PRF, but I am no cryptographer.)
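To make the failure mode concrete, here is a minimal sketch of how a fingerprint collision turns into a wrong result. It is not rustc's actual query system: `weak_fingerprint`, `compute`, and `Cache` are hypothetical names, and the 8-bit byte-sum hash is deliberately terrible so that a collision is trivial to produce. The point is only that a cache keyed by a hash will silently reuse a stale result whenever two different inputs hash to the same value.

```rust
use std::collections::HashMap;

/// Deliberately weak 8-bit "fingerprint" (hypothetical, for illustration only):
/// the wrapping sum of the input bytes, so any two anagrams collide.
fn weak_fingerprint(source: &str) -> u8 {
    source.bytes().fold(0u8, |acc, b| acc.wrapping_add(b))
}

/// Toy stand-in for an expensive compiler query.
fn compute(source: &str) -> String {
    source.to_uppercase()
}

/// Toy incremental cache: results are looked up by fingerprint alone,
/// so the cache cannot tell colliding inputs apart.
#[derive(Default)]
struct Cache {
    results: HashMap<u8, String>,
}

impl Cache {
    fn query(&mut self, source: &str) -> String {
        let fp = weak_fingerprint(source);
        // If the fingerprint is already present, the old result is reused
        // without checking that the input is actually the same.
        self.results
            .entry(fp)
            .or_insert_with(|| compute(source))
            .clone()
    }
}

fn main() {
    let mut cache = Cache::default();
    let first = cache.query("ab");
    // "ba" has the same byte sum as "ab", so this returns the stale result.
    let second = cache.query("ba");
    println!("query(\"ab\") = {first}");
    println!("query(\"ba\") = {second} (correct answer: {})", compute("ba"));
}
```

With a 128-bit fingerprint an accidental collision like this is astronomically unlikely; the question in this issue is what happens when someone searches for one on purpose, which the fixed key makes possible in principle.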
Fixing this by using a cryptographic hash is likely to be very bad for performance. However, @michaelwoerister mentions that there could be cheaper mitigation techniques:
If we use a truly random key for this, we could rely on SipHash being a cryptographic PRF -- without knowing the key, it's supposed to be very hard to find collisions. However, that promise is typically made for SipHash-2-4, not the weakened variant SipHash-1-3 that rustc uses. There seems to be evidence that even the weaker function is "good", but I'll leave it to cryptographic experts to evaluate the evidence here. @briansmith would be good to get your take on this.
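The difference between a fixed key and a random key can be sketched with the standard library's hashers. This is only an illustration, not rustc's hashing code: `fixed_key_hash` and `keyed_hash` are hypothetical helpers, and the algorithm behind `DefaultHasher` and `RandomState` is explicitly unspecified by std (my understanding is that the current implementation is SipHash-1-3, but that is an implementation detail). The sketch also assumes that an incremental cache using a random key would have to store that key alongside the cached data, so that later sessions compute the same fingerprints.

```rust
use std::collections::hash_map::{DefaultHasher, RandomState};
use std::hash::{BuildHasher, BuildHasherDefault};

/// Hash with a hasher built from fixed, publicly known state.
/// Anyone can compute these values offline and search for collisions.
fn fixed_key_hash(data: &str) -> u64 {
    BuildHasherDefault::<DefaultHasher>::default().hash_one(data)
}

/// Hash with a per-session random key (hypothetical helper).
/// Without knowing `session_key`, the output cannot be predicted.
fn keyed_hash(session_key: &RandomState, data: &str) -> u64 {
    session_key.hash_one(data)
}

fn main() {
    let input = "fn main() {}";

    // Fixed key: the same value every time, on every machine running this std.
    println!("fixed key:  {:016x}", fixed_key_hash(input));
    println!("fixed key:  {:016x}", fixed_key_hash(input));

    // Random key: stable within one key, different across keys.
    let key_a = RandomState::new();
    let key_b = RandomState::new();
    println!("key A:      {:016x}", keyed_hash(&key_a, input));
    println!("key A:      {:016x}", keyed_hash(&key_a, input));
    println!("key B:      {:016x}", keyed_hash(&key_b, input));
}
```

Whether the PRF guarantee that makes this useful actually holds for the reduced-round SipHash-1-3 is exactly the open question above.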
Nominating for t-lang discussion to see what their stance is on the soundness requirements for incremental compilation, and whether relying on a non-collision-resistant hash function for soundness is "good enough" -- or whether this should be considered an implementation decision, to be made by t-compiler. (Previously, t-lang ruled that TypeId should use a "full (non-truncated) cryptographic hash", but the tradeoffs are quite different here so it's not at all clear that the same decision would apply to incremental hashes.)
EDIT: I have moved the nomination to #129030 as that broader question directly affects this more specific question.