Widen TypeId from 64 bits to 128. #75923
Conversation
(rust_highfive has picked a reviewer for you, use r? to override)
I don't think I'm the right person to review this. Randomly assigning someone from the compiler team: r? @estebank. From the libs perspective: I think we do not promise anything about the type (we even explicitly state that it's "opaque"), so this change seems fine.
Probably should've picked a reviewer myself. r? @oli-obk cc @nikomatsakis. Also, I want a crater run, but the queue seems pretty bad right now and I wouldn't want to add onto it (especially if this might not be able to be check-only). @bors try @rust-timer queue
Awaiting bors try build completion
⌛ Trying commit 3990fad with merge 7d696513621aa8d27282101be594aedb03593636...
☀️ Try build successful - checks-actions, checks-azure
Queued 7d696513621aa8d27282101be594aedb03593636 with parent 41aaa90, future comparison URL.
Finished benchmarking try commit (7d696513621aa8d27282101be594aedb03593636): comparison URL. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying […]. Importantly, though, if the results of this run are non-neutral, do not roll this PR up: it will mask other regressions or improvements in the rollup. @bors rollup=never
diff lgtm, I think we should just merge it this early in the cycle and see what happens. Feel free to […]
I agree that this is purely T-compiler, and I don't think it needs an FCP; it should be an entirely private implementation detail. Waiting for a non-check crater run to come back also seems hard, and we're at the very start of a cycle right now, so I think we should just land it and keep an eye out for regressions.

I was going to just r=oli, but it looks like this breaks one of the perf benchmarks (in turn breaking another one, which depends on the webrender crate as well). I think we should fix that before landing this. @eddyb, would you be up for submitting a PR to rustc-perf which changes the `transmute` to a `transmute_copy` with `u128` or similar? I've not investigated at all why this is being done either.
See, this is why I wanted a crater run: I've either seen, or had strong suspicions of, code like that being in the wild. It doesn't really matter that it's internal; it's always been […].

Maybe it's fine to do a check-only crater, but […].

Note that there is an even greater danger: people being able to assume the value of […].

This could also be observed at compile time, if someone were to try and transmute a […] (we could maybe turn a […]).
Ouch.
Well, this increases the size, so previous […].
@RalfJung Right, that part was more of a reply to @Mark-Simulacrum's suggestion for fixing […]. If we change it to […]. So I guess there's no easy way to detect non-[…].
For webrender, they could do something like:

```rust
use std::any::TypeId;
use std::mem::{size_of, transmute_copy};

// `id: TypeId` is assumed to be in scope; reads the raw bits at either width.
match size_of::<TypeId>() {
    8 => unsafe { transmute_copy::<TypeId, u64>(&id) as u128 },
    16 => unsafe { transmute_copy::<TypeId, u128>(&id) },
    _ => panic!("unexpected TypeId size"),
}
```
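(For context, the breakage under discussion is compile-time: `transmute` requires its source and destination types to have identical sizes, so code like the following, a hedged illustration rather than the benchmark's actual code, stops building once `TypeId` grows past 64 bits.)

```rust
use std::any::TypeId;

// Compiles only while size_of::<TypeId>() == 8; with a 128-bit TypeId this
// fails with E0512 ("cannot transmute between types of different sizes").
fn as_u64(id: TypeId) -> u64 {
    unsafe { std::mem::transmute::<TypeId, u64>(id) }
}
```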
@eddyb I'm having a hard time coming up with a mechanism for linkers to detect a collision. We would invariably need to emit symbols where the name of the symbol is the hash value, but then in a cross-crate scenario you would never be able to know whether any sibling crate has already emitted such a symbol itself, causing a false positive. It would be somewhat more plausible in a situation where the intermediate crates put the information about generated […].

Finally, anything linker-based with […].

So the odds of making the collision a guaranteed compile-time/link-time failure are against us. We can maybe do something to catch some cases, but definitely not all of them. IMHO we should both use a better-quality hasher and increase the bit width here.

(Silly ideas start here.) Even better if we can figure out how to make weirdly sized […], like e.g. […].
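(A sketch of the shape the symbol-per-hash idea could take; the attribute usage is real Rust, but the symbol name and scheme are made up for illustration.)

```rust
// Hypothetical: each crate exports one marker symbol per TypeId hash it
// instantiates, so two *distinct* types with the same hash collide at
// link time as a duplicate-symbol error.
#[export_name = "__rust_typeid_0123456789abcdef"]
static TYPEID_MARKER: u8 = 0;

// The false positive described above: a second crate instantiating the
// *same* type also emits "__rust_typeid_0123456789abcdef", producing the
// same duplicate-symbol error without any real collision.
```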
I was imagining using the […]. Worst case, we could use C++ global constructors support to register all the generated […].
Coming back to my suggestion which @eddyb mentioned in #75923 (comment), one possibility is that […]. This would break compatibility with the current situation, as was noted in that earlier comment, in a few ways: […]
If the conclusion is that we need to break some of the properties which code is currently relying on (such as the size of […]), then […]. (It could also be impossible for reasons I'm not seeing off the top of my head, but y'know.)
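(Judging from the string-comparison discussion further down, the suggestion involves comparing full type descriptors when hashes collide. A minimal sketch under that assumption, with invented names; this is not the actual proposal:)

```rust
// Hypothetical TypeId: hash as the fast path, a static mangled-name string
// as a collision-proof tiebreaker.
#[derive(Clone, Copy, Debug, Eq)]
struct WideTypeId {
    hash: u64,
    name: &'static str,
}

impl PartialEq for WideTypeId {
    fn eq(&self, other: &Self) -> bool {
        // A hash mismatch proves inequality; equal hashes fall back to a
        // slower content comparison instead of trusting the hash alone.
        self.hash == other.hash && self.name == other.name
    }
}
```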
I don't think global constructors are supported on every platform, and they could have a lot of overhead, especially because […].
One thing about @mystor's suggestion is that IMO we should undo #72488 (including on beta, making this a ticking clock!) until we have an implementation of it, to avoid anyone relying on the compile-time value being less than fully opaque, if we ever want to do it.
My idea was a […].

We could even precompute this hash table for a Rust binary (and maybe pick a per-binary bucket count, to make it flatter/more like a PHF), so only […].

At least we could start by (statically) tracking what […].
Going to nominate for the compiler team to talk about @eddyb's concern wrt typeid stabilization; it might be a T-lang question too. It would perhaps be good to file a dedicated issue for that concern, though; I am not sure I follow it completely myself. It seems like const-time TypeId is the "easy case", as we can definitely check for hash collisions in that context, potentially with an entirely different scheme than the one used at runtime.
Closing this as inactive after a discussion with the author.
Late comment, but […]
This is a ludicrous argument. Cryptographic hash function collisions coming up in routine use with a handful (a few thousand) of inputs would be a stop-the-world, patch-everything, incinerate-unpatchable-hardware event. One of the defining features of cryptographic hashes is collision resistance. And no, structured but distinct inputs will not do better than random inputs in this area, because if structured inputs did better, then attacks would be using them!

So, assuming the hash function is secure (upholding its contract) and that your inputs are distinct, it would take more than the age of the universe, a significant fraction of all matter in the universe used as storage, and enormous amounts of energy, even given optimal computers operating at the temperature of the microwave background, to find one collision. If a rustc invocation happens to do that on the side, generating so many type IDs while also doing other work, then I'll say you either have a serious problem, or you're Clippy and perhaps really do need to fix Rust to take over the remainder of the universe. And in that case you simply switch to a 512-bit hash function.
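(To put numbers on the "age of the universe" claim: the standard birthday bound, supplied here for context rather than quoted from the comment, says that hashing n distinct inputs uniformly into k bits collides with probability roughly:)

```latex
\[
  p_{\text{collision}} \;\approx\; \frac{n^{2}}{2^{k+1}}
  \qquad\text{e.g. } k = 256,\; n = 2^{64}:\quad
  p \approx \frac{2^{128}}{2^{257}} = 2^{-129}
\]
```

So a k-bit hash offers roughly 2^(k/2) evaluations of generic collision resistance, which is also where the "64 bits" figure for md5 (k = 128) in the reply below comes from.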
@the8472 For what it's worth, very similar arguments were brought up a lot more on #95845; at some point in that discussion I believe even @RalfJung started agreeing, and in the end #95845 was abandoned because of consensus along the very same lines (at least to the extent that I understand them). My main regret nowadays is that we didn't design the Rust (v0) mangling 2-3 years earlier than we did, because maybe if we had, it might've been easier to transition […].
(btw calling the second mangling scheme Rust uses "v0" was IMO not a good choice -- I always think v0 is the legacy one and v1 the more structured one we are transitioning to)
Well, if you apply the same argument to md5 you get "similar" numbers, and we know collisions there because of cryptographic weaknesses. So just looking at the number of bits is a rather naive way of analyzing this. IOW, "assuming the hash function is secure" is doing a lot of work here.

We know the hash function is not secure, in the sense that we know that there are collisions (by the pigeonhole principle). There isn't even an agreed-upon standard for what makes a deterministic hash function secure. So this is all not quite as unambiguous as you make it sound.

But, yeah, I would still be impressed if Rust were able to find a collision in sha256. Other assumptions the language makes about how hardware and operating systems work are IMO much bigger risks to the validity of Rust's claims than those hash collisions.
Given non-malicious inputs, md5 would in fact still do the job, albeit at only 64 bits' worth of collision resistance, which would actually get you much smaller numbers: more supercomputer-scale than universe-scale.
A hash function is considered secure when you can't do better than brute-forcing it.
If you mean that most widely used cryptographic hash functions aren't provably secure because the provably-secure ones are way too slow, then that is correct. Which is why we have to replace them every few decades, when someone does find a better-than-brute-force attack. But that only means we may have to replace one 256-bit hash algorithm with another one at some distant point in the future.

If the concern is that someone might rely on the specific bits of the hash (which is a completely different argument), then we can add the rustc version to the hash inputs, which is somewhat similar to randomizing hashmaps: it not only helps against HashDoS but also against people relying on iteration order.
We already hash the rustc version. The stable crate id, which is included in type id hashes, is a hash of, among other things, the rustc version.
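(A minimal sketch of the "salt the hash with the compiler version" idea; the function and names are illustrative, not how rustc actually derives this through the stable crate id.)

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Mixing a per-release salt into the input makes the concrete hash bits
// change across compiler versions, so nothing can soundly rely on them.
fn salted_type_hash(rustc_version: &str, mangled_type_name: &str) -> u64 {
    let mut h = DefaultHasher::new();
    rustc_version.hash(&mut h); // e.g. "rustc 1.46.0" (illustrative)
    mangled_type_name.hash(&mut h);
    h.finish()
}
```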
Then I don't see a need to do string comparisons. Against accidental collisions, a 128-bit hash would do in all but extreme scenarios. Against malicious collisions, a 256-bit hash plus occasional algorithm changes when new attacks are discovered should do. And if we're ever concerned about an attacker with access to quantum computers, then we might have to bump it to 384 bits.
That's an intuition, not a definition.
No, what I mean is that there isn't even a definition to prove them secure against. (Which also means that provably-secure ones cannot exist.) There are security definitions for keyed hash functions, but AFAIK there is nothing for unkeyed deterministic hash functions.

For encryption, we have well-established game-based security definitions, encoding the inability of an attacker to learn even a single bit of information about the clear text. For signatures, we have something similar. But for hash functions, no such definition exists: the property we want is "has no collisions that can be found in polynomial time", but that definition is meaningless. There exist two inputs to sha256 that produce the same output, and so there exists a constant-time attacker that just produces those two inputs. We can't actually compute such an attacker, but to my knowledge nobody has figured out a way to incorporate that into a security definition.
I agree. I am just applying some factual corrections to your statements, but I agree with your conclusions.
Would it be possible to do something like finding n unique collisions, each in polynomial time, where n tends towards infinity? Or finding a collision between hash(a || b) and hash(a || c) in polynomial time for any given a? In both cases I think a constant-time algorithm would require being able to derive one collision from another collision in constant time, which, if that were the case, would almost definitely make the hash algorithm be considered not cryptographically secure.
I wouldn't know that any of them is an accepted notion of security. The one with the […]
(Entirely the wrong venue for this, but...) I'm not 100% happy with it, but IIRC choosing […].

But it's "Rust mangling v0", an officially specified format, whereas the other thing is "whatever […]". Rust didn't have a mangling scheme, one was RFC'd in 2019, and […]. The fact that we're aware of "what […].

Speaking of long-term support... you don't need any dedicated tools to get something (just demangle it as a C++ symbol); in fact, outside of the […].

(Oh, and in the C port of the demangler, the legacy support uses […].)
Use 128 bits for TypeId hash. Preliminary/draft impl of rust-lang/compiler-team#608. Prior art (probably incomplete list): rust-lang#75923, rust-lang#95845.
Doubling the bit-width of the type hash inside `TypeId` would serve two purposes:

- reduce the likelihood of accidental hash collisions
- break code `transmute`-ing `TypeId`s to `u64` (although not if they read an `u64` from a `&TypeId` in a less checked way)

Maybe we shouldn't use SipHash for `TypeId`, as unlike incremental dep-graph hashes, we don't need computing `TypeId`s to be incredibly fast, but also we can change that independently of the bit-width we keep in `TypeId`.
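(For concreteness, the change amounts to widening `TypeId`'s private field in `core::any`; a sketch, assuming the field name `t` as in libcore of that era, not the PR's literal diff:)

```rust
// core::any, sketched:
pub struct TypeId {
    t: u128, // previously: t: u64
}
```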