[RFC 2195] Document new type representations #246

Closed
wants to merge 3 commits into from
Changes from 2 commits
164 changes: 142 additions & 22 deletions src/type-layout.md
@@ -149,7 +149,8 @@ layout such as reinterpreting values as a different type.
Because of this dual purpose, it is possible to create types that are not useful
for interfacing with the C programming language.

This representation can be applied to structs, unions, and enums.
This representation can be applied to structs, unions, and enums. The exception
is [zero-variant enumerations] for which the `C` representation is an error.
Contributor

You seem to use "enums" in subsequent sections, but "enumerations" here? (I see this reference name is pre-existing but metadata shouldn't affect the actual text)


#### \#[repr(C)] Structs

@@ -222,48 +223,165 @@ assert_eq!(std::mem::size_of::<SizeRoundedUp>(), 8); // Size of 6 from b,
assert_eq!(std::mem::align_of::<SizeRoundedUp>(), 4); // From a
```

#### \#[repr(C)] Enums
#### \#[repr(C)] Field-less Enums

For [C-like enumerations], the `C` representation has the size and alignment of
For [field-less enums], the `C` representation has the size and alignment of
the default `enum` size and alignment for the target platform's C ABI.

> Note: The enum representation in C is implementation defined, so this is
> really a "best guess". In particular, this may be incorrect when the C code
> of interest is compiled with certain flags.

> Warning: There are crucial differences between an `enum` in the C language and
> Rust's C-like enumerations with this representation. An `enum` in C is
> Rust's field-less enumerations with this representation. An `enum` in C is
> mostly a `typedef` plus some named constants; in other words, an object of an
> `enum` type can hold any integer value. For example, this is often used for
> bitflags in `C`. In contrast, Rust’s C-like enumerations can only legally hold
> the discrimnant values, everything else is undefined behaviour. Therefore,
> using a C-like enumeration in FFI to model a C `enum` is often wrong.
> bitflags in `C`. In contrast, Rust’s field-less enums can only legally hold
> the discriminant values; everything else is [undefined behavior]. Therefore,
> using a field-less enum in FFI to model a C `enum` is often wrong.

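A minimal sketch of what the warning means for FFI, assuming a hypothetical C declaration `enum color { RED, GREEN, BLUE }`. The raw value is taken as a `c_int` and validated before it becomes a Rust enum, so an out-of-range value from C never turns into an invalid discriminant. `Color` and its variants are illustrative names and are not part of this PR.

```rust
use std::convert::TryFrom;
use std::os::raw::c_int;

// Hypothetical Rust mirror of a C `enum color { RED, GREEN, BLUE }`.
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Color {
    Red = 0,
    Green = 1,
    Blue = 2,
}

impl TryFrom<c_int> for Color {
    // The rejected raw value is handed back to the caller.
    type Error = c_int;

    // Validate the integer instead of transmuting it into `Color`.
    fn try_from(raw: c_int) -> Result<Self, Self::Error> {
        match raw {
            0 => Ok(Color::Red),
            1 => Ok(Color::Green),
            2 => Ok(Color::Blue),
            other => Err(other),
        }
    }
}

fn main() {
    assert_eq!(Color::try_from(1), Ok(Color::Green));
    // 7 is a value a C caller could pass, but it is not a variant of `Color`.
    assert_eq!(Color::try_from(7), Err(7));
}
```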
It is an error for [zero-variant enumerations] to have the `C` representation.
#### \#[repr(C)] Enums With Fields

For all other enumerations, the layout is unspecified.
For enums with fields, the `C` representation is a struct with representation
`C` of two fields where the first field is a field-less enum with the `C`
representation that has one variant for each variant in the enum with fields
and the second field a union with the `C` representation that's fields consist
of structs with the `C` representation corresponding to each variant in the
Contributor

This is a very long sentence that's hard to parse. I'd recommend splitting the description of the fields into separate sentences, and perhaps rather than writing "with the C representation" inline in each of them, just write at the end something like "The structs and unions declared here all have the C representation themselves."

Contributor Author (@Havvy, Feb 19, 2018)

Yeah, I wasn't happy with it either. I had a personal breakthrough in how I can describe it, so now it's longer and with paragraph breaks. Saying they all have C representation on its own line is also a nice win. 👍 I'm leaving for a while, and I need to do the same to the primitive repr, so don't look for the change yet.

enum. Each struct consists of the fields from the corresponding variant in the
order defined in the enum with fields.

Likewise, combining the `C` representation with a primitive representation, the
layout is unspecified.
Because unions with non-copy fields aren't allowed, this representation can only
be used if every field is also [`Copy`].

### Primitive representations
```rust
// This Enum has the same layout as
#[repr(C)]
enum MyEnum {
    A(u32),
    B(f32, u64),
    C { x: u32, y: u8 },
    D,
}

// this struct.
#[repr(C)]
struct MyEnumRepr {
    tag: MyEnumTag,
    payload: MyEnumPayload,
}

#[repr(C)]
enum MyEnumTag { A, B, C, D }

#[repr(C)]
union MyEnumPayload {
    A: u32,
    B: MyEnumPayloadB,
    C: MyEnumPayloadC,
    D: (),
}

#[repr(C)]
#[derive(Clone, Copy)]
struct MyEnumPayloadB(f32, u64);

#[repr(C)]
#[derive(Clone, Copy)]
struct MyEnumPayloadC { x: u32, y: u8 }
```
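One practical consequence of this layout, sketched under the assumption that it holds as described above: because the tag is the first field of the equivalent struct, it can be read through a pointer cast. The helper `tag_of` is a hypothetical name introduced only for this illustration.

```rust
#[repr(C)]
#[allow(dead_code)]
enum MyEnum {
    A(u32),
    B(f32, u64),
    C { x: u32, y: u8 },
    D,
}

// Field-less mirror of `MyEnum`: same variants, same order.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
#[allow(dead_code)]
enum MyEnumTag { A, B, C, D }

// Hypothetical helper: reads only the tag of a `MyEnum`.
fn tag_of(value: &MyEnum) -> MyEnumTag {
    // SAFETY: relies on the layout described above, where a `#[repr(C)]` enum
    // with fields begins with a `#[repr(C)]` field-less tag at offset 0.
    unsafe { *(value as *const MyEnum as *const MyEnumTag) }
}

fn main() {
    assert_eq!(tag_of(&MyEnum::B(1.0, 2)), MyEnumTag::B);
    assert_eq!(tag_of(&MyEnum::D), MyEnumTag::D);
}
```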

<span id="c-primitive-representation">Combining the `C` representation and a
primitive representation is only defined for enums with fields. The primitive
representation modifies the `C` representation by changing the representation of
the tag, e.g. `MyEnumTag` in the previous example, to have the representation of
the chosen primitive representation. So, if you chose the `u8` representation,
then the tag would have a size and alignment of 1 byte. </span>

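As a rough check of the paragraph above, the sketch below gives the enum `#[repr(C, u8)]` and the hand-written tag `#[repr(u8)]`, then compares sizes and alignments rather than hard-coding platform-specific numbers. The lowercase union field names are a choice made for this illustration only.

```rust
use std::mem::{align_of, size_of};

#[repr(C, u8)]
#[allow(dead_code)]
enum MyEnum {
    A(u32),
    B(f32, u64),
    C { x: u32, y: u8 },
    D,
}

// The tag now uses the chosen primitive representation.
#[repr(u8)]
#[allow(dead_code)]
enum MyEnumTag { A, B, C, D }

#[repr(C)]
#[derive(Clone, Copy)]
#[allow(dead_code)]
struct MyEnumPayloadB(f32, u64);

#[repr(C)]
#[derive(Clone, Copy)]
#[allow(dead_code)]
struct MyEnumPayloadC { x: u32, y: u8 }

#[repr(C)]
#[allow(dead_code)]
union MyEnumPayload {
    a: u32,
    b: MyEnumPayloadB,
    c: MyEnumPayloadC,
    d: (),
}

#[repr(C)]
#[allow(dead_code)]
struct MyEnumRepr {
    tag: MyEnumTag,
    payload: MyEnumPayload,
}

fn main() {
    // The tag has the size and alignment of `u8`.
    assert_eq!(size_of::<MyEnumTag>(), 1);
    assert_eq!(align_of::<MyEnumTag>(), 1);
    // The enum and its hand-written equivalent agree in size and alignment.
    assert_eq!(size_of::<MyEnum>(), size_of::<MyEnumRepr>());
    assert_eq!(align_of::<MyEnum>(), align_of::<MyEnumRepr>());
}
```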
> Note: This representation was designed for primarily interfacing with C code
> that already exists matching a common way Rust's enums are implemented in
> C. If you have control over both the Rust and C code, such as using C as FFI
> glue between Rust and some third language, then you should use a
> [primitive representation](#primitive-representation-of-enums-with-fields)
> instead.
Contributor

First sentence is a bit weird. Suggested rewrite:

Note: This representation is primarily intended for Rust code that wants to interoperate with the idioms of preexisting C(++) codebases.

(not sure if the reference uses my "C(++)" shorthand yet)


### Primitive Representations

The *primitive representations* are the representations with the same names as
the primitive integer types. That is: `u8`, `u16`, `u32`, `u64`, `usize`, `i8`,
`i16`, `i32`, `i64`, and `isize`.

Primitive representations can only be applied to enumerations.
Primitive representations can only be applied to enumerations, and have
different behavior whether the enum has fields or no fields. It is an error
Contributor

Can/should this "whether" be an "if"? (I genuinely don't know.)

Contributor Author

From a logical standpoint, no. Could be "depending on".

for [zero-variant enumerations] to have a primitive representation.

Combining two primitive representations together is unspecified.

For [C-like enumerations], they set the size and alignment to be the same as the
primitive type of the same name. For example, a C-like enumeration with a `u8`
representation can only have discriminants between 0 and 255 inclusive.
Combining the `C` representation and a primitive representation is described
[above](#c-primitive-representation).

It is an error for [zero-variant enumerations] to have a primitive
representation.
#### Primitive Fepresentation of Field-less Enums
Contributor

typo: Fepresentation


For all other enumerations, the layout is unspecified.
For [field-less enums], they set the size and alignment to be the same as
the primitive type of the same name. For example, a field-less enum with
a `u8` representation can only have discriminants between 0 and 255 inclusive.
Contributor

I believe it also gives the type the same ABI as a primitive int (e.g. it would be passed in a register instead of on the stack on some x86 ABIs)

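A small sketch of the field-less case, using a hypothetical `Level` enum: with `#[repr(u8)]` the type has the size and alignment of `u8`, and its discriminants have to fit in the range 0 to 255.

```rust
use std::mem::{align_of, size_of};

// Hypothetical field-less enum with explicit discriminants.
#[repr(u8)]
#[derive(Clone, Copy, Debug, PartialEq)]
#[allow(dead_code)]
enum Level {
    Low = 0,
    Mid = 128,
    High = 255,
}

fn main() {
    assert_eq!(size_of::<Level>(), size_of::<u8>());
    assert_eq!(align_of::<Level>(), align_of::<u8>());
    // Casting yields the discriminant as the primitive type.
    assert_eq!(Level::High as u8, 255);
    // A discriminant of 256 would not fit in `u8` and is rejected at compile time.
}
```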

Likewise, combining two primitive representations together is unspecified.
#### Primitive Representation of Enums With Fields

For enums with fields, the enum will have the same type layout as a union with
the `C` representation whose fields consist of structs with the `C`
representation corresponding to each variant in the enum. The first field in
each struct is the same field-less enum with the same primitive representation
that is the enum with all fields in its variants removed, and the rest of the
fields consist of the fields of the corresponding variant in the order defined
in the original enumeration.
Contributor

This sentence is not as bad, but it might be possible to clean it up somewhat as well.

Contributor

This one is definitely better than the repr(C) version, but I think we can also improve it. I suggest a similar rewrite (with another too-long pedantic point I just want to write down somewhere):

The representation of a repr(int) enum is a repr(C) union of repr(C) structs for each variant with a field. The first field of each struct in the union is a repr(int) version of the enum with all fields removed ("the tag") and the remaining fields are the fields of that variant.

Note: this representation is unchanged if the tag is given its own member in the union, should that make manipulation more clear for you (although in C++, to follow The Exact Word Of The Standard the tag member should be wrapped in a struct).


Because unions with non-copy fields aren't allowed, this representation can only
be used if every field is also [`Copy`].
Contributor

"can only be used" -> "can only be expressed in Rust", maybe? (C(++) can use it fine, and you could do some really hacky crap to use it in Rust too)


> Note: This is commonly different than what is done in C and C++. Projects in
> those languages often use a tuple of `(enum, payload)`. For making your enum
> represented like that, use the `C` representation.
Contributor

Perhaps link up to C representation here?

Contributor

I'm not sure this is necessary. Or at least I would eliminate the reference to "what's commonly done in C(++)" which is like, never a true statement.


```rust
// This custom enum
Contributor

Similar notes on this example as the previous (although you used my preferred naming scheme on this one..?)

Contributor Author

The last commit was a WIP where I was transitioning from this style (that you like better) to the style I used in #[repr(C)] (that you like less).

#[repr(u8)]
enum MyEnum {
    A(u32),
    B(f32, u64),
    C { x: u32, y: u8 },
    D,
}

// has the same type layout as this union
#[repr(C)]
#[derive(Clone, Copy)]
Contributor

I'd omit the derive here; it doesn't tell us anything about the layout and isn't always correct.

union MyEnumRepr {
    A: MyEnumVariantA,
    B: MyEnumVariantB,
    C: MyEnumVariantC,
    D: MyEnumVariantD,
}

#[repr(u8)]
#[derive(Clone, Copy)]
enum MyEnumDiscriminant { A, B, C, D }

#[repr(C)]
#[derive(Clone, Copy)]
struct MyEnumVariantA(MyEnumDiscriminant, u32);

#[repr(C)]
#[derive(Clone, Copy)]
struct MyEnumVariantB(MyEnumDiscriminant, f32, u64);

#[repr(C)]
#[derive(Clone, Copy)]
struct MyEnumVariantC { tag: MyEnumDiscriminant, x: u32, y: u8 }

#[repr(C)]
#[derive(Clone, Copy)]
struct MyEnumVariantD(MyEnumDiscriminant);
```
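Since every struct in the union starts with the tag, the tag of a `#[repr(u8)]` enum can be read as a plain `u8` at offset 0. The sketch below assumes the layout described above; the `tag` method is a hypothetical helper and not something the PR defines.

```rust
#[repr(u8)]
#[allow(dead_code)]
enum MyEnum {
    A(u32),
    B(f32, u64),
    C { x: u32, y: u8 },
    D,
}

impl MyEnum {
    // Hypothetical helper: returns the numeric tag of the active variant.
    fn tag(&self) -> u8 {
        // SAFETY: relies on the layout described above, where every variant
        // struct of a `#[repr(u8)]` enum starts with the `u8` tag.
        unsafe { *(self as *const Self as *const u8) }
    }
}

fn main() {
    // Variants without explicit discriminants are numbered from 0 in order.
    assert_eq!(MyEnum::A(7).tag(), 0);
    assert_eq!(MyEnum::C { x: 1, y: 2 }.tag(), 2);
    assert_eq!(MyEnum::D.tag(), 3);
}
```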

### The `align` Representation

@@ -288,7 +406,7 @@ padding bytes and forcing the alignment of the type to `1`.
The `align` and `packed` representations cannot be applied on the same type and
a `packed` type cannot transitively contain another `align`ed type.

> Warning: Dereferencing an unaligned pointer is [undefined behaviour] and it is
> Warning: Dereferencing an unaligned pointer is [undefined behavior] and it is
> possible to [safely create unaligned pointers to `packed` fields][27060].
> Like all ways to create undefined behavior in safe Rust, this is a bug.

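A hedged sketch of one way to avoid the unaligned dereference the warning describes, assuming a toolchain that provides `std::ptr::addr_of!`. The field is reached through a raw pointer and read with `read_unaligned`, so no `&u32` to a misaligned address is ever formed. `Packed` and `read_b` are illustrative names.

```rust
use std::mem::{align_of, size_of};
use std::ptr;

#[repr(C, packed)]
struct Packed {
    a: u8,
    b: u32,
}

// Hypothetical helper: reads `b` without creating a reference to it.
fn read_b(p: &Packed) -> u32 {
    // `addr_of!` produces a raw pointer directly from the place expression.
    let field: *const u32 = ptr::addr_of!(p.b);
    // SAFETY: `field` points into a live `Packed`, and `read_unaligned`
    // places no alignment requirement on the pointer.
    unsafe { field.read_unaligned() }
}

fn main() {
    let p = Packed { a: 1, b: 0xDEAD_BEEF };
    // `packed` removes the padding and forces the alignment to 1.
    assert_eq!(size_of::<Packed>(), 5);
    assert_eq!(align_of::<Packed>(), 1);
    assert_eq!(read_b(&p), 0xDEAD_BEEF);
    // Copying a field out by value is also fine.
    let a = p.a;
    assert_eq!(a, 1);
}
```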
@@ -298,7 +416,9 @@ a `packed` type cannot transitively contain another `align`ed type.
[`size_of`]: ../std/mem/fn.size_of.html
[`Sized`]: ../std/marker/trait.Sized.html
[dynamically sized types]: dynamically-sized-types.html
[C-like enumerations]: items/enumerations.html#custom-discriminant-values-for-field-less-enumerations
[field-less enums]: items/enumerations.html#custom-discriminant-values-for-field-less-enumerations
[zero-variant enumerations]: items/enumerations.html#zero-variant-enums
[undefined behavior]: behavior-considered-undefined.html
[27060]: https://github.com/rust-lang/rust/issues/27060
[primitive representation]: #primitive-representations
[`Copy`]: special-types-and-traits.html#copy