Incremental compilation relies on hashes for soundness #129016

RalfJung · 2024-08-12T17:29:06Z

Incremental compilation relies on hashes for soundness -- specifically, on SipHash-1-3 with an all-zero key, which is not a cryptographic hash function. Collisions in that hash can lead to UB and other ill effects. Is that a problem? Should we do things to mitigate the risk of that? Or do we tell people they are expected to use non-incremental builds if they actually want to rely on the soundness of the result? Or is this not a bug at all and the current hashing scheme is "good enough"? After all, as @the8472 argues, if the programmer doesn't actively try to exploit the hard-coded all-zero key they are quite unlikely to hit a collision by pure chance. (That should follow from the fact that SipHash is a PRF, but I am no cryptographer.)

Fixing this by using a cryptographic hash is likely to be very bad for performance. However, @michaelwoerister mentions that there could be cheaper mitigation techniques:

Unless I'm overlooking something all incr. comp. specific hashes only need to be stable between successive incremental builds. So we can easily harden our use of SipHash there by generating random keys and then caching these in the incr. comp. cache.

If we use a truly random key for this, we could rely on SipHash being a cryptographic PRF -- without knowing the key, it's supposed to be very hard to find collisions. However, that promise is typically made for SipHash-2-4, not the weakened variant SipHash-1-3 that rustc uses. There seems to be evidence that even the weaker function is "good", but I'll leave it to cryptographic experts to evaluate the evidence here. @briansmith would be good to get your take on this.

Nominating for t-lang discussion to see what their stance is on the soundness requirements for incremental compilation, and whether relying on a non-collision-resistant hash function for soundness is "good enough" -- or whether this should be considered an implementation decision, to be made by t-compiler. (Previously, t-lang ruled that TypeId should use a "full (non-truncated) cryptographic hash", but the tradeoffs are quite different here so it's not at all clear that the same decision would apply to incremental hashes.)

EDIT: I have moved the nomination to #129030 as that broader question directly affects this more specific question.

michaelwoerister · 2024-08-12T18:03:54Z

Some more, possibly interesting, points:

We never claimed that incr. comp. hashes are resistant against a deliberate attack. The hashes are meant to be fingerprints with the property of virtual uniqueness, i.e. the chance of a collision is supposed to be so small to be negligible in practice.
As far as I understand it, using a cryptographic hash function would only really improve the situation if we also widened hashes to at least 256 bits (we are using 128 bits at the moment).
There are hash functions that are designed specifically for the fingerprinting case, like XXH3. These make no claims about collision resistance.
The UMASH hash function might be interesting for this use case. It is not a cryptographic hash function but it provides a collision probability bound, which is the thing we are most interested in.

rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Aug 12, 2024

RalfJung added the I-lang-nominated Nominated for discussion during a lang team meeting. label Aug 12, 2024

RalfJung mentioned this issue Aug 12, 2024

Collisions in type_id #10389

Closed

jieyouxu added T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. A-incr-comp Area: Incremental compilation C-bug Category: This is a bug. labels Aug 12, 2024

RalfJung removed the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Aug 12, 2024

RalfJung mentioned this issue Aug 12, 2024

type_id is not sufficiently collision-resistant #129014

Open

jieyouxu added the T-lang Relevant to the language team, which will review and decide on the PR/issue. label Aug 12, 2024

the8472 mentioned this issue Aug 12, 2024

Language vs. implementation threat models and implications for TypeId collision resistance #129030

Open

RalfJung removed the I-lang-nominated Nominated for discussion during a lang team meeting. label Aug 14, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Incremental compilation relies on hashes for soundness #129016

Incremental compilation relies on hashes for soundness #129016

RalfJung commented Aug 12, 2024 •

edited

Loading

michaelwoerister commented Aug 12, 2024

Incremental compilation relies on hashes for soundness #129016

Incremental compilation relies on hashes for soundness #129016

Comments

RalfJung commented Aug 12, 2024 • edited Loading

michaelwoerister commented Aug 12, 2024

RalfJung commented Aug 12, 2024 •

edited

Loading