Skip to content
This repository has been archived by the owner on Apr 5, 2024. It is now read-only.

Passing safe references to empty enums #2

Closed
nikomatsakis opened this issue Aug 16, 2016 · 24 comments
Closed

Passing safe references to empty enums #2

nikomatsakis opened this issue Aug 16, 2016 · 24 comments

Comments

@nikomatsakis
Copy link
Contributor

On IRC, @mystor asked me whether it would be illegal to pass around &SafeType or &mut SafeType given this definition:

enum Impossible {}

#[repr(C)]
pub struct SafeType {
    _prohibit_constructor: Impossible
}

In particular, is it ok to have an empty enum "by-value" in a struct in this fashion?

@Amanieu
Copy link
Member

Amanieu commented Aug 16, 2016

I think it should definitely be possible to include such an enum in a struct, especially considering the fact that Impossible may come from a generic parameter. This of course means that in practice there is no way to create an instance of SafeType.

&SafeType and &mut SafeType are therefore roughly equivalent to &Impossible and &mut Impossible. I'm still not sure whether we should allow these and only trigger UB when they are dereferenced (or lvalue to rvalue conversion), or whether a reference alone should be UB.

@eternaleye
Copy link

eternaleye commented Aug 16, 2016

Personally, I'd argue that we should follow Weeping Angels rules here :P

The image of an empty type is an empty type

In particular, a reference's lifetime contract is that it can only exist for a lifetime strictly included in the referent's lifetime. As one cannot construct an instance of an empty type, no value of such a type can have a non-empty lifetime. Constructing a reference with an invalid lifetime is (I think?) UB. QED.

I personally feel that generic parameters are a red herring here: In the absence of unsafe {}, one can only obtain a reference from either an existing value, or an existing reference. As a result, all possible ways to obtain such a reference without unsafe {} are dead code, in the generic case.

Creating such a reference with unsafe {}, meanwhile, is violently error-prone:

  1. If the user tries to unsafely allocate memory, transmute it, and return a reference, this is already UB.
  2. Point (1) is also UB for non-empty types, unless filled with some value (likely via ptr::write()), and as one cannot obtain such a value for the empty type, any code that is safe against (1) for non-empty types is dead code for empty types.

Other options include creating a reference from a pointer with invalid pointee (UB), or transmuting a reference of another type (which I think is, or at least should be, UB).

What I think should be trivially valid are raw pointers to empty-typed values (which can be used to refer to opaque FFI types), which can be meaningfully newtyped (for external use) or retyped and then reference-ized/dereferenced (for internal use).

@arielb1
Copy link
Contributor

arielb1 commented Aug 17, 2016

This is equivalent creating references to invalid enum variants, e.g.

enum Foo {
    X, Y
}

fn main() {
    let data: *const u32 = &42;
    let illegal = unsafe { &*(data as *const Foo) };
}

I am still not sure whether to make this UB. dereferencing such a pointer is obviously UB, through.

@nikomatsakis
Copy link
Contributor Author

@eternaleye

If the user tries to unsafely allocate memory, transmute it, and return a reference, this is already UB.

Can you say more about why you feel this should be UB? I am not persuaded. =) I agree that dereferencing of trying to load from such a pointer (at least without transmuting it back...) seems problematic, but I'm not sure that creating a pointer is bad.

That said, if the original intention of this pattern is to make something that can't be created, it seems like privacy could do that more easily and somewhat less problematically. =)

@mystor
Copy link

mystor commented Aug 18, 2016

@nikomatsakis

As the person who asked the question, I can clarify the original goal of this question.

I want to have an object Foo which has a representation in C++, which is opaque to the Rust code. Consumers of my library would be defining extern "C" methods which would accept and return pointers to this type. The in-rust version of this type was to have a series of methods and traits implemented on it, which would invoke the C++ FFI among other things. I wanted this object to feel as natural to use as other rust types, which meant that in safe code, it could be passed around as &Foo or &mut Foo, where Foo is the type.

The type enum Foo {} is, of course, unsafe to use in this context. By exposing it directly to safe code, the safe code can invoke undefined behavior by attempting to match against it. In addition, any non-zero-sized-type could not be used for Foo, as mem::swap()-ing any number of bytes between two &mut Foo objects would cause the object to be in an invalid state (the object is not safe to move, and has a variable size).

I originally tried to implement this by creating a wrapper type around a private member:

#[repr(C)]
pub struct Foo {
    _prevent_construction: (),
}

However, when consumers would put this type in extern "C" functions like so:

extern "C" {
    fn pass_a_foo(foo: *const Foo);
}

the improper_ctypes warning would fire. I needed a zero sized type which would not fire that warning when passed behind a raw pointer. The only type which I know of which acts this way is the empty enum. I was hoping that I could use:

enum Impossible {}
#[repr(C)]
pub struct Foo {
    _prevent_construction: Impossible,
}

and simply make sure that within the module where this type was defined (which is littered with unsafe code anyways), references to Foo were never dereferenced. That way I both have a zero-sized type (which means that the user hopefully can't break things with a mem::swap), and the nice ergonomics of not having a warning whenever the type is passed behind a raw pointer in an extern "C" block.

@arielb1
Copy link
Contributor

arielb1 commented Aug 18, 2016

That sounds like an issue with the improper_ctypes lint.

@mystor
Copy link

mystor commented Aug 18, 2016

I agree that this would be best accomplished by changing the improper_ctypes lint, I was just trying to explain the motivation.

I still am curious as to whether or not it's safe, whether or not it is the best solution to the problem at hand.

@eternaleye
Copy link

eternaleye commented Aug 18, 2016

@nikomatsakis: I was referring to transmuting the value, not the pointer to the value, and so it runs afoul of "actually constructing an instance of an empty type". Converting the pointer and then making a reference from that runs afoul of the lifetime issue at the top of my post.

@mystor: Out of curiousity, why use *{const,mut} Foo rather than defining struct Foo { cpp_type: *{const,mut} Impossible } and then using plain Foo? By hiding the raw pointer behind a privacy boundary, that should resolve your concerns, no?

EDIT: Ah, nevermind, misunderstood @mystor's goals.

@mystor
Copy link

mystor commented Aug 25, 2016

Is it possible to invoke Undefined Behavior in this program without modifying the module foo, and while only using safe code? (I think that this is the root of the question which I'm asking here) https://is.gd/P7dQBf

@Amanieu
Copy link
Member

Amanieu commented Aug 25, 2016

Yes, it allows you to trigger UB from safe code, but this is fine since you need unsafe code to create such a reference in the first place. It is your responsibility to ensure that you do not leak this reference to safe code outside your module.

With that said, I think simply manipulating the reference without dereferencing it (inside your unsafe module) should not be UB.

@mystor
Copy link

mystor commented Aug 25, 2016

But can you trigger Undefined Behavior from the outside module? The outside module cannot see the empty enum due to privacy, and thus cannot match on it. I'm handing the reference to safe code, but the safe code can't do anything with the reference, so I think it might be okay.

@Amanieu
Copy link
Member

Amanieu commented Aug 25, 2016

Hmm I think that would allow UB since you can do let x = *foo::get_bar();. IMO creating an instance of an empty enum type should be instant UB. This matches the semantics of the ! type: as soon as such a value is returned from a function, the code effectively becomes "unreachable".

@mystor
Copy link

mystor commented Aug 25, 2016

I don't think that's possible. Bar is not Copy or Clone, so it can't be copied out.

@eternaleye
Copy link

eternaleye commented Aug 25, 2016

What that says to me (modulo @nikomatsakis' point about Tootsie-Pop models) is that type violations probably should not be permitted to exit the unsafe {} block - that is, at least "passing an invalid reference across an unsafe {} boundary to safe code is UB", and possibly the stronger "creating an invalid reference is UB"

Anything else pushes us quite far towards Tootsie-Pop models, which have very nontrivial effects on the compiler's ability to optimize safe code ("the unchecked get use case").

@glaebhoerl
Copy link

glaebhoerl commented Sep 3, 2016

I agree with @eternaleye's original comment. References (&T, &mut T) can only be created by referring to a live value of type T. If you have a live value of type !, it is UB by definition. Therefore creating an &! or &mut ! must also be UB. Put another way, if a live value of type &T or &mut T exists, then a live value of type T exists which it refers to, and a live value of type ! is UB, so a live value of type &! or &mut ! is also UB. I don't see why it has to be any more subtle or complicated than that.

(Requiring (or hypothesizing) a value of such a type, that is fn foo(x: &mut !) is not UB, it is just dead code, just like in the fn bar(x: !) case.)

@strega-nil
Copy link

strega-nil commented Sep 8, 2016

Is this UB? (if so, we might want to fix it...)

https://doc.rust-lang.org/src/core/up/src/libcore/fmt/mod.rs.html#185

@mystor
Copy link

mystor commented Sep 8, 2016

According to many of the arguments here, the answer is yes. It is currently not optimized as UB afaik, so it should be OK for now.

@strega-nil
Copy link

@mystor I would agree. Just pointing out that it's used in real code.

@mystor
Copy link

mystor commented Sep 8, 2016

That use seems like another use case for my opaque struct idea (https://internals.rust-lang.org/t/pre-rfc-opaque-structs/4005). It could also just use raw pointers and a PhantomData however.

@arielb1
Copy link
Contributor

arielb1 commented Oct 5, 2016

cc rust-lang/rust#36976 - the plain old "reference to undef" case is also interesting.

@glaebhoerl
Copy link

glaebhoerl commented Oct 8, 2016

Revisiting my comment from earlier....

References (&T, &mut T) can only be created by referring to a live value of type T. If you have a live value of type !, it is UB by definition. Therefore creating an &! or &mut ! must also be UB. Put another way, if a live value of type &T or &mut T exists, then a live value of type T exists which it refers to, and a live value of type ! is UB, so a live value of type &! or &mut ! is also UB. I don't see why it has to be any more subtle or complicated than that.

I'm struck anew by what treacherous ground this question of "should it be legal, or should it be undefined behavior" really is. When other people express the sentiment that some particular unsafe hack is "obviously wrong" and shouldn't be allowed, I'm usually quick to observe that preventing people from doing it is not a realistic possibility -- that the only decision we have the power to make is whether to introduce potential security vulnerabilities into peoples' code when (not if) they do it. In other words, that in these questions we should be attuned to practical consequences at least as much as theoretical elegance. And then here I am, expressing the sentiment that using unsafe code to get a reference to an uninhabited type is obviously wrong and shouldn't be allowed.

For what it's worth, I think my feelings on the subject attach more strongly to uninhabited types than to reference types. A live value of an uninhabited type is the very definition of undefined behavior (like GCC's __builtin_unreachable), what sound type systems everywhere exist to preclude, and the clarity of this really should not be muddied. If this isn't undefined behavior, then nothing is.

On the other hand, if we say that in the presence of unsafe code, reference types should not always be taken at their word, a principle that just happens to apply uniformly to references to uninhabited types as well as to every other type, that seems like a separate matter. In a tootsie pop model, it seems appropriate that in "safe code", &! and &mut ! really would be tantamount to !, as per the logical interpretation of reference types, while in "unsafe code", they would perhaps not be taken so seriously.

@RalfJung
Copy link
Member

Merging rust-lang/rust#36449 has kind of answered the question raised here, didn't it? Everything is_uninhabited considers uninhabited certainly cannot be safely passed around.

@arielb1
Copy link
Contributor

arielb1 commented Dec 1, 2016

@RalfJung

No. We don't use is_uninhabited for anything except match exhaustiveness checking, and even that is a hack.

@RalfJung
Copy link
Member

This is now basically being discussed at rust-lang/unsafe-code-guidelines#77: does a reference always have to have a pointee that is valid for its type, or does it "just" have to be aligned and dereferencable?

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

8 participants