-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC: Never allow reads from uninitialized memory in safe Rust #837
Conversation
will serve to set an explicit policy that: | ||
|
||
**Uninitialized memory can ever be exposed in safe Rust, even when it | ||
would not lead to undefined behavior**. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume that you mean "can never be exposed". 😀
# Summary | ||
|
||
Set an explicit policy that uninitialized memory can ever be exposed | ||
in safe Rust, even when it would not lead to undefined behavior. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed the one below first, but this also should say "can never".
@quantheory Shortest "Detailed Design" ever and I still managed to insert crucial typos! |
memory and typesafe, but they carry security risks. | ||
|
||
In particular, it may be possible to exploit a bug in safe Rust code | ||
that leads that code to reveal the contents of memory. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor nit: I would say "causes" rather than "leads".
I'm pretty good at making typos myself! I do want to take a crack at copyediting all the RFCS one of these days... Regarding the proposal itself, 👍. I think that in a lot of the cases where it's "obvious" that initialization is unnecessary, a good dead code elimination pass has some chance of optimizing out the extra cost. Anyway, the cost is likely to be acceptable given that you're incurring some cost by allocating the buffer in the first place and/or doing whatever operation fills it with the data you actually wanted. |
So, what would this RFC mean in practical terms? Sounds like |
Yes, and (b) in particular means that we would have to zero out memory or something similar in the implementation of |
👍 |
Given that there are really not too many valid uses for reading uninitialized memory even in unsafe code, this seems like a welcome change for consistency. |
I'm personally not quite in favor of this proposal. I think that it's a fine line in how we leverage Taken to an extreme, I would find it quite unfortunate if the "I/O rule of thumb" was to use Put another way, the motivation here seems to solely be "uninitialized memory is a security risk," but I think there is a much broader issues about the role of |
I think that there's a problem of semantics here. Most discussions of "memory safety" that I can recall consider reads of uninitialized data to be unsafe by definition, because of issues like:
The usual safety guarantees in Rust protect you from problem 1. They don't protect you from problem 2; an uninitialized If you define memory safety as meaning only that you avoid undefined behavior in the C/C++ sense, then you can get away with some reads of uninitialized data in safe code. But it's tricky business, and a lot of people would still not consider that to be consistent with what they expect "memory safe" code to allow. |
Note that the motivation for this RFC is clear in that this is not a question of memory safety. The only cases being considered are uninitialized arrays of scalars which cannot result in memory safety violations if read. In this sense points 1/2 are somewhat moot as the "objects" are scalars which basically can't be corrupt. Points 3/4/5 do not actually affect memory safety but play into unsafety when related to security. |
Absolutely. Which is exactly why I felt this topic merited an RFC, since this is a gray area for which there's not complete agreement, and we need to make a clear policy decision.
That does sound bad! But I'll note that if this policy turns out to be too expensive in practice -- that is, if people end up reaching for
I understand where you're coming from here, but as a rule, I generally prefer to avoid arguments from extremes/slippery slope arguments. We as a community get to decide exactly what safety covers, and thanks to the RFC process we can ensure that changes to this policy are deliberate and relatively clear. I feel like this is a way of framing the issue in much more extreme terms than the RFC is actually talking about. Yes, there are many other security issues, but this RFC is about a specific policy decision with concrete, quantifiable tradeoffs. In the case of IO -- which seems to be the main culprit at the moment -- we can and should measure the performance impact here. But as we're seeing with hashing, it's possible to provide some localized mitigation against security/DoS risks based on a specific cost/benefit analysis. If the local downsides are not too great, why not take steps where we can to be even more ambitious with the problems that Rust helps you catch? I know that in general we've been moving away from opinionation (green threading, high-level IO abstractions, etc), but we shouldn't go into full retreat! Especially in cases like this, and hashing, where there are clear and relatively simple ways to opt out of the guarantee for performance-critical scenarios, without paying any extra performance costs. |
That is what I was getting at. If I wasn't clear, I think that we are in perfect agreement about all of the concrete technical details and what the RFC is intended to do. (Though I'd add a nitpick that "scalar" is not entirely the right word; references/pointers, and in some languages strings are considered scalars, after all.)
I think we've miscommunicated. My point is that this is true for some definitions of "memory safety", and trivially false for others, and there is (to my knowledge) no authoritative source for what the "true" definition is. A lot of people would consider a program to be memory-unsafe any time it has a read of uninitialized data which causes non-deterministic behavior, even if undefined behavior is not possible. (See for instance this blog post bemoaning the fact that the definition of memory safety really is not universal and black-and-white. Some of its sources consider reads of uninitialized to be memory errors, while others don't, though unfortunately I think some of the academic stuff is paywalled. Note also that many developer groups de facto consider anything flagged by valgrind's Memcheck or similar tools to be memory-unsafe, except for memory leaks and edge cases not covered by the tool. Typically this includes use of uninitialized data.) |
I don't think this RFC is about the definition of "memory safety". I think it's about what developers expect from a "safe" language. This is a fuzzy notion, driven by humans rather than computer science, but it's a vital one. In C nobody would bat an eye at this design. But Rust does the safe thing 99.8% of the time. |
If that is the case then you might as well not zero to begin with. People who want it zeroed for security reasons cannot use it because it might change at any time unless it is documented. |
I wasn't talking about leaving it undocumented; the RFC is (I think) quite clearly setting an explicit policy. I'm just saying that, at some later point, it may be possible to re-evaluate and change this policy -- but we would presumably only do so if there was a strong consensus that the security benefits were minimal and performance drawbacks onerous. Regardless, I don't think this particular point has much bearing on the debate; it was responding to a hypothetical doomsday scenario. The question is, what is the best policy decision we can make given what we know today? |
If read_to_end is documented to zero then this cannot be changed post 1.0. |
I see what you're getting at, but I don't think we should document it as literally zeroing. Rather, this RFC would establish a global policy about uninitialized memory and safe code. That is, no code would be allowed to rely (in the spec sense) on the values being literally zeros. |
@alexcrichton BTW, I seem to remember you taking some measurements about the perf hit for IO. Any chance you could dig those up? |
@aturon: The same applies if it is documented as overwriting the allocated memory before it is passed to |
(I think the RFC could probably clarify its use of terminology to disambiguate.) |
I don't believe there is a need for such a distinction. |
It doesn't make sense for undefined behaviour in Rust to be directly tied to the back-end the main implementation happens to be using now. |
That may be so but the current set of undefined behaviors seems to be derived from things that cause UB in LLVM or are expected to cause UB in LLVM once better optimizations become available. Is there any justification at all for making this UB except to stop people from doing it? |
I retract my vote in favor of this RFC if we're actually treating reads of uninitialized memory as UB, not just "don't do this in safe code." There are legitimate reasons to do this in unsafe code. |
@huonw : reading from undef or uninitialized memory is already undefined behavior. This rule goes further than limiting reading from undef: it eliminates our ability to pass undef outside of unsafe even if that undef is never read. |
META: I'm going to need to step away from this debate for the next couple of days to help with the alpha2 push. I do think that several good questions/concerns have been raised and that the text needs to be more clear. |
@pythonesque I tend to think that calling such reads undefined behavior may be too strong, but it's worth pointing out that some such cases are undefined behavior in C (and I have no doubt that LLVM will exploit that). See e.g. https://www.securecoding.cert.org/confluence/display/seccode/EXP33-C.+Do+not+read+uninitialized+memory. In any case, I'm curious to know what legit reason you have in mind for reading uninitialized memory? I think that at minimum it makes sense to say that the standard library will not expose uninitialized memory to its clients without an unsafe keyword being required somewhere (and moreover that all libraries are encouraged to follow a similar rule). The exact rules that should apply to third party code -- and the minimum rules required for unsafe code to be considered stable and well-defined -- seem like a somewhat separate (but entangled) topic. At least that's my current feeling. |
If we consider accepting this RFC, we need a mechanism to maintain our ability to create efficient apis in a manner transparent to the user of the API. For the specific case of Read and read_to_end, we could probably add a MaybeUndef type that wraps the Frankly, we might need a more general way to indicate that arguments are intended to be used as |
@nikomatsakis the problem is that this RFC isn't restricting "C/LLVM undef behavior", it's making it so that bugs in other code don't trigger "C/LLVM undef behavior". The RFC goes much further than forbidding the reading of uninitialized memory (which is already forbidden/undefined). |
Ah, I just remembered something we used to do in a past life that was a (perhaps?) legitimate use of uninitialized memory. The idea was you wanted to construct a set of integers of fixed-size domain but not pay Because you do no want to pay O(n) cost, you can't even take the time to zero out any memory. The way we did this was to have two arrays, both of the same length, neither initialized, and a counter with the number of entries. One array contains, for each possible value, the length of the set at the time it was added (its "index"). And the other contains, for each possible index, the value that was added at that time. This way you can "check-and-double-check" to find out if a value was indeed added without every initializing the full arrays. This does require reading uninitialized memory, though, because you must read from the struct IntSet {
len: usize,
capacity: usize,
values: Box<[usize]>,
indices: Box<[usize]>,
}
fn insert(&mut self, value: usize) {
assert!(value < self.capacity);
if self.contains(value) { return; }
let index = self.len;
self.len += 1;
self.indices[value] = index;
self.values[index] = value;
}
fn contains(&self, value: usize) -> bool {
let index = self.indices[value];
if index >= self.len { return false; }
self.values[index] == value
} Now, for all I know this violates some C rules. I haven't bothered to check of course. |
@nikomatsakis Yes, that is the example I was thinking of. |
Sorry I have been a little slow to respond to this thread everyone!
It looks like a TCP stream is slowed down by about 25%. Here I'm just writing tons of data on one thread to another over a TCP socket and printing out the amount of data received each second. The numbers at the bottom show the measurements for me at least. I wanted to get at least one "real-ish world" example (hence the TCP stream). For a more "raw benchmark" I wrote some benchmarks for just reading some repeated data with zeroing both before and afterwards. It looks like the data is roughly the same in that the performance hit is about 20%.
I think this is a good point, and I definitely agree. I think I just wanted to make sure that we were aware of the situation it may put us in. When it comes down to measurements, though, 20% isn't all that much...
Ah yes good point, I think I see where you're coming from now! |
For a systems language where the standard (IO?) library is focused on "zero-cost abstractions" I rather disagree... In addition, I think @mahkoh's point about "relying on" this behavior is subtler than the point that was addressed in responses. Specifically, this proposal would incur costs without the benefits for security, aside from avoiding the true nasal demons of deeply undefined behavior on reading uninitialized memory.
Basically, I see this as well-intentioned but misguided - in the pursuit of stronger security guarantees in specification, it'll result in weaker security guarantees in practice. I'd suggest that straight |
I don't think there is anything subtle about it. It has all been spelled out in this issue but people keep ignoring or evading it.
People using unsafe will still pay this penalty. I've already said this above: There is no way to use the |
@mahkoh, I'm not saying the point is particularly subtle - just that it's subtler than the one that the answers addressed. And yes, one of the places where it'd incentivize adding unsafe APIs is in the Read trait itself. |
This seems quite close to the RAII property.
👍 This RFC should open the door for more general memory sanitization. cc rust-lang/rust#17046 |
This formulation is weak anyway as @aturon has later explained that it will be treated as UB. The real formulation should be
|
Just realized another funny thing: Since unsafe blocks end at safe function calls but not at unsafe function calls, if this proposal is accepted, you can never change unsafe functions to safe because some users might pass uninitialized memory to them. If you change them to safe, the behavior becomes undefined. |
Thanks everyone who has participated in this discussion so far. I'm reasonably convinced that the wording of the original RFC is problematic, and would like to take a step back and have a broader discussion about these issues as a community -- and hopefully hash out a policy that we can all agree on. To that end, I'm closing this RFC in favor of a new internals thread. I hope to hear from strong proponents on both sides. |
I have a few points to make First, I am somewhat on the fence about this RFC, but leaning towards a 👍 To be explicit, I think that I am OK with:
Maintaining this invariant is possible in However, this RFC doesn't actually address rust-lang/rust#20314. The original issue says that So.... If I misunderstood the original issue, please forgive me, but this request seems very unreasonable to me. Virtually every single type in the rust ecosystem can expose uninitialized memory to the user given a bug in the implementation. And yes, this means that bugs in the implementation could be exploited by an attacker to access secrets in memory like passwords etc... but this is true for all code everywhere. So, to recap, this RFC does not require any zeroing out of memory in EDIT: I did in fact misunderstand the original issue. I have posted an update in the forum thread. |
The The flaw in this design is obvious if you try to generalize Reader and Writer to general types instead of just I think this special casing is actually incorrect, and just like with the placement new issues, we actually need more advanced language features to properly deal with this, particularly some form of An alternative fix/workaround for this specific case is for read to take a |
@reem: Well, a "safe" option with a more forgiving contract might be to pass in an In order to be efficient, the callee trusts that there's at least some capacity reserved, and the caller trusts that But if the contract is violated, the cost is "Oh no, we had an annoyingly short outbuffer" or "Oh no, we allocated" rather than "Oh no, we read uninitialized memory." Of course, that doesn't permit windowing a larger buffer for multiple reads like Heck, if someone makes a |
@eternaleye yes, type wrapping (in some yet to determned way) is probably a sane solution. That said: I don't think we need to mix Vec in here, as They key problem with the type wrapping is that there doesn't seem to be a good way to say for the caller that "if x, then the data I passed in will certainly now be initialized", and have the type system check it. |
Set an explicit policy that uninitialized memory can never be exposed
in safe Rust, even when it would not lead to undefined behavior.
See rust-lang/rust#20314.
Rendered