-
Notifications
You must be signed in to change notification settings - Fork 15
Asserting-Conflicting-Access Model #26
Comments
I noted some concerns and a question in my recent blog post. Here is the relevant section, excerpted and lightly edited. The question comes at the end. Lifetimes are significant to the model, but you can't trust the compiler's inference in unsafe code. There, we have to assume that the unsafe code author is free to pick any valid lifetime, so long as it would still type check (not "borrow check" -- i.e., it only has to ensure that no data outlives its owning scope). Nonetheless, I have concerns about this formulation because it seems to assume that the logic for unsafe code can be expressed in terms of Rust's lifetimes -- but as I wrote above Rust's lifetimes are really a conservative approximation. As we improve our type system, they can change and become more precise -- and users might have in mind more precise and flow-dependent lifetimes still. In particular, it seems like the "ACA" would disallow my fn alloc_free2() {
unsafe {
let p: *mut i32 = allocate_an_integer();
let q: &i32 = &*p;
let r = *q; // (1)
if condition1() {
free(p); // (2)
}
if condition2() {
use(r); // (3)
if condition3() {
use_again(*q); // (4)
}
}
}
} Intuitively, the problem is that the lifetime of In terms of their model, the conflicting access would be (2) and the asserting access would be (1). But I might be misunderstanding how this whole thing works. |
If we do "lifetimes as sets of paths", I'd hope users could manually write out in all the endpoints. And if users manually specify a lifetime in unsafe code and mark it safe, then we could use it for optimization! |
@ubsan is worried that the model does not interact well with LLVM, as the model permits this kind of code: fn example(foo_addr: *mut usize) -> usize {
let mut data = 0;
if pointers_equal(&mut data as *mut _, foo_addr) {
unsafe { *foo_addr = 42; }
}
data
} He is convinced we should track derived information through everything and not have broadcasts, including both unsafe pointers and integers (am I right?). However, unless we have an exotic way of defining derivation, even that is incompatible with LLVM's instcombine in cases like this: pub fn example(foo_addr: usize) -> usize {
let mut data = 0;
if pointers_equal(&mut data as *mut _, foo_addr) {
data = 1;
unsafe {
*((foo_addr + (0*(&mut data as *const _ as usize))) as *mut usize) = 42;
}
}
data
} |
@arielb1 this is an interesting example: fn example(foo_addr: *mut usize) -> usize {
let mut data = 0;
if pointers_equal(&mut data as *mut _, foo_addr) {
unsafe { *foo_addr = 42; }
}
data
} Do we have an issue for it? Probably not. I've often talked informally about how, in the Tootsie Pop Model, you can treat your machine (within unsafe code) as "a simple turing machine". This kind of code is precisely what I was thinking of ... |
I think " BTW, I would like to mention that this model is intended for the language spec. It is emphatically not supposed to be closed under any class of endooptimizations - for that we should have a weaker MIR memory model and the even-weaker LLVM memory model. Also, there is no ordering requirement between the asserting and conflicting access - only that both of them must fulfill the conditions at their respective time of access Example 1
Sure, and that location might be chosen non-deterministically at runtime, although it must be inside the same function. Example 2unsafe {
let mut a: &mut i32 = i32;
let p: *mut i32 = a;
*a += 1; // asserting access
{
let mut b: &mut i32 = &mut *a;
*b += 1;
// Can this serve as an asserting access? I can't
// tell if accesses through a raw pointer count
// as being accessed through a "borrow chain" from `a`.
// I think the answer is no.
*p += 1;
*b += 1;
}
}
This does not matter because |
So I think that one bit of confusion for me might have been what is meant by "lifetime". My assumption was that lifetime here was referring to the lifetimes that the Rust compiler computes. But I realize that perhaps lifetime was meant to be defined more in terms of the dynamic trace -- in other words, the span between the creation of a reference and one of its uses (if we ignore other threads for a moment and so forth). This would explain for example why @ubsan mentioned (on IRC) that the
Here, can you say more about what you mean when you say "broadcasts"? I think it is something like this: fn main() {
let a: &mut i32 = &mut 3;
let b: *mut i32 = a;
let c: &mut i32 = a;
let d: *mut i32 = c;
// can I use `b` here?
} I think that in this case, the question is whether having coerced |
fn main() {
let a: &mut i32 = &mut 3;
let b: *mut i32 = a;
let c: &mut i32 = a;
let d: *mut i32 = c;
// can I use `b` here?
} Broadcasts, as in the This model is very conservative, and has the advantage of not including any "counterfactual derived magic", but by itself is insufficient to allow LLVM's standard optimizations on |
Thanks for the clarification!
Are you referring to something specific, or is this a new term that you coined?
Can you elaborate here too? Do we have a nice issue summarizing those interactions? I'd love to get all the facts in your head documented. =) |
|
intptrcast is "A Formal C Memory Model Supporting Integer-Pointer Casts" available at http://sf.snu.ac.kr/intptrcast/ (whatistherightwaytociteit?). There is probably some other less-formaly place describing that, but I could not be bothered to look.
This function is well-typed in the ACA model (in particular when fn example(data: &mut u32, extra: *mut u32) -> u32 {
unsafe { if (data as *mut u32) == extra { *extra = 0; } }
*data
} |
Cool. I hadn't read that paper yet. I just did so and took the opportunity to write up a summary in #30 (tagged as K-Related-Work). |
This is my best effort to describe the model put forward by @arielb1 and @ubsan. Naturally they should correct any misconceptions that I have. We should edit this description to be accurate. Also, we need a catchy name. =) I'm calling it the Asserting-Conflicting-Access (ACA) model based on its core rule.
The Aliasing Rule
The heart of the model is "the aliasing rule":
If a reference
r
(the head reference) with lifetime'r
is created, then afterwards, before'r
ends but at no particular orderl
is accessed through a borrow-chain fromr
- this is the asserting access.Then an aliasing access has occured.
Aliasing accesses are UB, unless there exists a reference
s
tol
with lifetime's
, such thats
was created by a borrow-chain fromr
s
has sufficient permissions for the conflicting access.'s
is alive at the point of access.s
is potentially alive. This means either:s
is an immutable references
is the reference used for the conflicting accesss
was converted to a raw pointer before the conflicting accessIn that case,
s
is called the guaranteeing reference.Note the use of lifetimes in the model. From conversation with @arielb1, I gather that the intution is that, in a safe function, the lifetimes inferred by the compiler are to be used. But in unsafe code, the compiler must assume that the user could have chosen whatever (legal) lifetimes they would like. These lifetimes would still be subject to the basic constraints of the type system.
Facets
The model is access-based and defined in terms of undefined behavior. That is, the mere possession of a reference or overlapping pointer does not create undefined behavior. What creates undefined behavior is the conflicting use thereof. Moreover, undefined behavior does not necessarily define "blame" -- that is, the model can tell you what trace is illegal, but it can't always tell you if a function will lead to undefined behavior in isolation.
Intuition
I think the best intuition is to think of having permission to access memory at a particular point. Permissions flow from safe references. Raw pointers created from a reference get the permission of the reference they came from. When we exit the lifetime of a safe reference
s
that was borrowed from borrowed fromr
, the permissions fors
return tor
.Example 1
Shows a mutable reference
a
that is aliased by a raw pointerp
.a
is then reborrowed tob
. Usingb
is ok and it's ok to usep
after we're done usingb
.Example 2
Like example 1, except that we intermingle uses of
b
andp
. I think this is expected to be illegal, but I can't figure out how it plays out in terms of the formal language (@arielb1 or @ubsan, can you interpret this for me?).The reason I expect this to be illegal is that it could correspond to something like:
being invoked like:
The text was updated successfully, but these errors were encountered: