-
Notifications
You must be signed in to change notification settings - Fork 12.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prove the Rust type system sound #9883
Comments
Nominating. |
To clarify: a proof of some subset of rust |
1.0, high |
Nominating for removal. I expect that we won't be willing to actually block release on this. |
P-high, taking off 1.0 milestone. |
I'm interested to look at any recent work on this issue. The #rust IRC channel topic refers to a Rust 1.0 resources spreadsheet that includes this issue. Is that spreadsheet no longer a good reference for 1.0 status? Meanwhile, the spreadsheet has Eric Reed assigned. Eric, a pointer to your work would be appreciated. |
Triage: @aturon is working with some external academics on some amount of formalization. |
In particular, we're working with Derek Dreyer's group, which includes @RalfJung. They've made an incredible amount of progress already! |
@RalfJung I wonder if you can put some effort into publicly tracking your progress? Either on this issue or a new one? |
I suppose I need something more like a blog for this ;-) The trouble is, as I already mentioned last time you brought this up - I don't think I have the time to write a full tutorial on all the formal methods I am using. Which means that either I can't say all that much, or what I say will be pretty much incomprehensible. |
"pretty much incomprehensible" is fine. I'd be very happy if you'd incrementally commit your work to someplace world-readable, even if you made zero effort to explain it to a wider audience. |
@RalfJung A blog would of course be great, but a reasonable starting point would just be a page -- or even a tracking issue -- that spells out at a very high level what you're trying to accomplish, mentions the aspects of Rust that you're modeling, and tracks how far you've gotten in the various steps. |
So, I wrote a blog post on my Rust project right here: https://www.ralfj.de/blog/2015/10/12/formalizing-rust.html I will probably not have the time to do real-time progress reports in such a detailed fashion. There might be more blog posts on some parts of the formalization, or maybe not - we will see. Current statusThe current status is that I have a formal definition of a language and type-system that have a "rust-y" flavor: There's borrowing and lifetimes, and everything allowed in safe Rust should be also allowed in that type system. The language "lambda-Rust" is a lambda calculus equipped with a heap, but no stack - still, MIR programs can be translated to lambda-Rust fairly directly (if you put the right glasses on, and like doing a CPS transform ;-). Right now, I am fixing some places of that type system, to allow a few more programs to be type-checked. I also started working on the semantic model. At this point, I have a first version of formal properties expressing what I want the borrowing system to be able to do. I know there are some issues with these properties which I'll try to fix later this week. I have some ideas for how to actually define the system such that it has all these desired properties, but I am still looking for some more common patterns here that I can abstract away. This will end up being a whole lot of bookkeeping, and I'd like to somehow avoid getting drowned in all the bureaucracy... |
@RalfJung: Are you using a proof assistant to manage your model? I'm also interested in verifying Rust's type system. |
I plan to formalize the model, and the soundness of the syntactic type system, in Coq. These proofs often become so complicated that I don't really trust my own, hand-written proofs... |
@RalfJung: Great! A Coq model of Rust's semantics is like a dream come true. I'm using Coq at the moment to verify language semantics too, and should probably devote all my time to that, but I'll be interested to watch your progress and help if I can :) |
I've been working on bringing up an Isabelle/HOL model of Rust as well, and recently discovered Beluga which may or may not be easier to use. |
(Though I'm not trying to capture as many things as @RalfJung is!) |
@cmr So you are talking about a formal model of the language (or some subset of it) and the type system, but without the semantic part of what I described above? As I mentioned there, currently I am not trying to model exactly what the Rust compiler does. But at some point, I would like to connect this to the MIR and what rustc actually does. Something like: We can translate every MIR program to lambda-Rust, such that well-typed MIR programs become well-typed lambda-Rust programs. Connecting this with the theorems I hope to show about lambda-Rust would end up proving memory safety not just for a "rust-y" language, and for the concepts of ownership and borrowing, but for rustc itself! (Well, assuming the operational semantics of LLVM match with the one of lambda-Rust, which they most certainly don't. But whatever, it'd be a step.) To do this, I will need a formal version of MIR and borrowck (and hopefully dropck ;-) ). |
@RalfJung What I planned on doing is defining a formal semantics for MIR (or a lower-level IR) to do analysis and verification of programs. I planned on assuming some soundness properties of the type system (so that, eg, I don't have to worry about data races). Sounds like we'll have some interesting things to talk about Wednesday :) |
Excellent! Roughly, what does your model look like? I have a dream of getting a CompCert for Rust out of this someday :). |
As I tried to make clear when I slightly edited my blog post a few hours after its publication, what I am looking for here is, roughly, the essence of borrowing. This means that many of the details that are crucial to an actual compiler of an actual language, are only of superficial interest to me. Of course, as part of my formal model, I need to give some kind of formal operational semantics to some kind of language - but I am fairly free here to make concessions. So, the language I have could only be a rough starting spot for certifying the existing Rust compiler. That doesn't mean I am not interested in closing this gap, I very much am. Some of it is probably "just" a lot of work, but some of it will need some clever ideas. In particular, we'd have to figure out what the LLVM So, coming back to @Ericson2314's question - my model of the memory is inspired by CompCert, but much simpler. I assume all primitive types to have the same size, the size of a single memory location - this way, I don't have to get into the mess of alignment and overlapping reads/writes. My model of the language itself is a lambda-calculus with primitive operations to work on the heap. There is no stack. Status updateI promised to give short status updates here, and then promptly forgot about this oops. So, here comes the first one. I did quite some cleanup of the stuff I wrote up previously, trying to get rid of all redundancy and generally make things nicer and less ambiguous. I still use excessive amounts of syntactic sugar on all levels, but I think at least now I clearly define what I do ;-) . Speaking of excess, I also make excessive use of both the Latin and the Greek alphabet as source for variable names, and I'm running out of letters. Seems like I will have to figure out how to put Russian characters into LaTeX, or use multi-letter variable names 😱 I made the following type-check in my type system: struct Two<'a> {
f: &'a i32,
g: &'a i32,
}
fn lazy_lft() {
let (mut t, f, g) : (Two, i32, i32);
f = 42;
t = Two { f: &f, g: &f };
*t.f; // The lifetime definitely is already active here
g = 23; // And g is definitely not yet borrowed.
t.g = &g; // But now I can borrow g at the *old* lifetime
} The comments explain what is special about this example: When we borrow For now, I still make one restriction here: When the lifetime starts (in the case of this example, that's where the variables are declared), every variable that will eventually be borrowed at that lifetime has to already exist, and be named. That is, if What else? I made sure with my advisor that it is okay for me to take the "easy" rather than the "general" solution, if that gets the job done :) . And just today, I extended my language to be able to prove data-race freedom for non-atomic locations, in a way that can deal with locations that are subject to both atomic and non-atomic accesses. This is a long and complicated story, but the short version is that I think we came up with a new and equivalent way to define data-race-freedom, such that this result should just fall out of the proofs I'd have done anyway. Yay :D (Seeing how long this status-update got, I'm thinking maybe I should put them into my blog as well. I don't know what makes more sense, really.) |
(I'm sorry to chime in with this slightly off-topic comment) @RalfJung as I happened to write my master's in Russian 😄 I have something that might be useful for you: https://github.com/mkpankov/adaptor/blob/master/doc/latex/DiplomaProject.tex AFAIR, the important part there is
Of course you'll still have to fight font encoding problems but I guess it's a start. Examples of usage of this can be found in other files in the same directory. |
😆 I wasn't entirely serious, but should I become really desperate, I may get back to this. Thanks :D |
Long time no status update... I'm not good at doing this frequently, sorry ;-) I finally started doing serious work on the semantic side of types - that is, figuring out which Iris assertions Rust types correspond to. It turned out that accounting for borrowed pointers ("borrows", as I call them) is way harder than I expected. The operations Rust lets you perform on borrows may seem harmless, but they are actually pretty much ruling out any simplification I could come up with. The worst part is stuff like fn deref_borrowed_box<T>(b: &mut Box<T>) -> &mut T { *b }
fn deref_borrowed_borrow<T, 'a, 'b>(b: &'a mut &'b mut T) -> &'a mut T { *b } So, after a few weeks of my idea of types becoming more and more complicated, I had to step back and figure out what the heck was going on. Turns out advisors are advisors for good reasons ;-) . Instead of directly trying to come up with formal definitions of owned and borrowed pointers (we're only talking about mutable borrows here, for now), I tried to somehow formalize my intuition for why all of this seems to be so obviously sound. In the end, I now have a set of proof rules that formalize the idea of "borrowing some permission for some (life)time", completely unrelated to pointers. I proved (yay, first results :D ) that from these proof rules, I can derive all that is going on with owned and mutable pointers. Since most of the complication is in the basic idea of borrowing, that was not extremely hard, but it gives me some confidence that my proof rules actually are useful. Next step: Prove correctness of these proof rules! That will be hard, really hard - probably the most complicated protocol I have ever seen, and higher-order in too many crazy ways. This is stressing corners of our logic that I didn't even know existed, which means I have to be extra careful not to accidentally skip a proof step. I definitely will want a machine to check this proof for me, eventually. That's what I get for doing this generally :-/ . ("Protocol" is what we call a formal description of the interaction of everybody involved in borrowing permissions for some lifetime, using permissions that have been borrowed for an active lifetime, and getting back permissions out of a dead lifetime.) I also realized that shared borrows are really subtle. Even with that lifetime logic done, I'll have some "fun" defining what a shared borrow is, in general - and then making instances of that for owned and mutable borrows. This is hard because we can do stuff like this: fn deref_borrowed_box<T>(b: &Box<T>) -> &T { *b }
fn deref_borrowed_borrow<T, 'a, 'b>(b: &'a &'b mut T) -> &'a T { *b } I will also have to make sure that my general definition of what constitutes a shared borrow, is wide enough to cover cases like I'm also currently involved in two discussions that relate to my formal work: #30424 and https://internals.rust-lang.org/t/comparing-dangling-pointers/3019. And finally, I found a way to encode in my definitions that data races yield undefined behavior. This means that whatever I do will prove not only absence of memory errors, but also absence of data races :) . The proofs won't get any more complicated, since data race freedom is a fairly natural consequence of the way Rust is set up - but the trick is to set up everything such that my theorems actually show that this is the case. This involves (a) detecting that a data race is happning, so that we can assign undefined behavior, and then (b) proving that there is no UB. Here, (a) was the interesting bit. (The first of the two discussions I mentioned in the previous paragraph is also about detecting UB, namely the UB arising from mutating through a shared borrow, or having aliasing on mutable borrows. This is related to the |
I'd very much like to look at the coq sources of your work as it progresses, @RalfJung . Is it on the web? If not, would you stick it on the web somewhere, please? If that's not your style... oh well. I've got plenty of other shiny toys to play with :) |
There's no Coq formalization of my Rust work yet, it's all on paper only so far. We have to get our Iris formalization straight before I can build something large like the Rust work on top of it. That's a separate project, which I'm actually concurrently working on. That work is public, at https://gitlab.mpi-sws.org/FP/iris-coq/. (The "V2.0" branch is the one which is supposed to be scalable enough to eventually be the foundation for larger projects, like the Rust formalization.) |
Ok. Thanks for clearing that up for me. |
I wrote another blog post about unsafe Rust: https://www.ralfj.de/blog/2016/01/09/the-scope-of-unsafe.html. |
I believe the problem @RalfJung reported could be solved by tagging variables themselves as unsafe. Only I don't know enough about Rust to say how unsafe variable tagging would be implemented (traits?), but it seems like it would be backwards compatible. Any existing code would still compile and run, but it might have invariants that are invalidated by safe code (as is currently the case). |
There's an old idea of unsafe fields that would come up for datatypes. It can be emulated with a Unsafe wrapper that has unsafe methods to get references to its interior, or to read/write. It's not widely used, or used at all afaik, since the gains aren't that great. People writing unsafe code generally know the issues, and that privacy is the #1 way the abstraction boundary is maintained On January 11, 2016 5:41:31 PM EST, Ben Trask [email protected] wrote:
Sent from my Android device with K-9 Mail. Please excuse my brevity. |
Thanks for the info. |
The subject of unsafe fields and @cmr's "privacy is the #1 way the abstraction boundary is maintained" got me thinking. It's pretty well known right now that privacy is necessary to encapsulate invariants which unsafe code assumes for its correctness, but with unsafe fields I think it might not be necessary. Assuming, for example, that we were willing to publicize pub struct Vec<T> {
pub unsafe data: *mut T,
pub unsafe cap: usize,
pub unsafe len: usize,
} Now we could theoretically remove functions like Current safety-at-module-boundary
|
Outside module | Inside module | |
---|---|---|
Safe code | may not cause Vec code to go wrong |
may cause Vec code to go wrong |
Unsafe code | may cause Vec code to go wrong |
may cause Vec code to go wrong |
Vec
using unsafe fields
Outside module | Inside module | |
---|---|---|
Safe code | may not cause Vec code to go wrong |
may not cause Vec code to go wrong |
Unsafe code | may cause Vec code to go wrong |
may cause Vec code to go wrong |
Note that we actually get this win even if Vec
's fields are private unsafe.
As @cmr mentioned on IRC,
the real trick is determining which fields must be unsafe
I think the rule would be:
- Any field mentioned by any invariant assumed by any unsafe code is an unsafe field.
- Otherwise, it's a safe field.
In other words, if a field can be set to any value by safe code and no unsafe code will go wrong as a result, it's a safe field.
If what I'm saying holds together, it would mean segfaults and the like would always be caused by code in unsafe blocks or fns — we would just have one new way for unsafe code to be at-fault: "it made an assumption about a safe field".
This possibly doesn't improve the language enough to be a worthwhile addition, but I thought it was an interesting observation. On the other hand, there may be holes in this I didn't consider, so please point them out if you see them.
tl;dr Add unsafe fields and an "unsafe code may not rely on safe fields to have any particular value" rule and you no longer need to rely on privacy to safely encapsulate.
P.S. This is a summary of an IRC discussion.
I also just yesterday evening suggested something similar: https://internals.rust-lang.org/t/pre-rfc-unsafe-types/3073. However, you mentioned "unsafe variables" above, and I think that's an important observation. Not just fields can have invariants that others rely on; the same holds for local variables. However, I am not sure if it is worth adding such an annotation, considering that local variables are, well, local, so it's not too hard to oversee the code that could modify them and violate their invariants. The story is different for private fields with their module-wide access. Btw, there's a discussion about unsafe fields here rust-lang/rfcs#381, so it can be kept out of this thread. Unsafe fields are not about soundness proofs anyway ;-) |
Is changing the value of a variable or field the only way that safe code can indirectly break |
Yeah, I think @glaebhoerl's right. I can't really think of a reasonable way to deal with that problem. It's tempting to claim that if safe code breaks an invariant which some unsafe code assumed, then the unsafe code is at fault for making the assumption, but that doesn't really work. As a user of On another note, does it make any sense to have a way to say in code that |
You are absolutely right. If unsafe code calls other code, it may depend on that other code's functional correctness, not just on its memory safety. This is not restricted to the case where the depended-on code is safe. For example, even if I don't think it is feasible to make sure that the "code at fault" is guaranteed to be in an (Btw, all this is still entirely off-topic here in this issue ;-) |
This will be an open issue forever. Closing. |
In case you are interested in the current status of my work on this, I recently gave a talk about this at the Paris Rust Meetup. See https://www.ralfj.de/blog/2017/01/20/paris-rust-meetup.html for more info and a recording. |
@RalfJung presented his paper "RustBelt: Securing the Foundations of the Rust Programming Language" at POPL2018 on Friday January 12, 2018, and the video is available on YouTube. The talk makes reference to Derek Dreyer's keynote at the same conference. |
We want a reasonable assurance that Rust does what it says.
The text was updated successfully, but these errors were encountered: