-
Notifications
You must be signed in to change notification settings - Fork 492
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RFC 2195] Document new type representations #246
Conversation
As this is Gankro's RFC, r? @gankro |
src/type-layout.md
Outdated
|
||
> Note: This is commonly different than what is done in C and C++. Projects in | ||
> those languages often use a tuple of `(enum, payload)`. For making your enum | ||
> represented like that, see [the tagged union representation] below. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tagged union representation is a broken link? Looks like you missed it.
src/type-layout.md
Outdated
It is an error for [zero-variant enumerations] to have the `C` representation. | ||
For enums with fields, the `C` representation has the same representation as | ||
it would with the [primitive representation] with the field-less enum in its | ||
description having the `C` representation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't read correctly to me, I think. The primitve representation reference is circular with the section below, and there's no explicit mention in the text here of the (tag, value) pair representation.
src/type-layout.md
Outdated
`C` of two fields where the first field is a field-less enum with the `C` | ||
representation that has one variant for each variant in the enum with fields | ||
and the second field a union with the `C` representation that's fields consist | ||
of structs with the `C` representation corresponding to each variant in the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a very long sentence that's hard to parse. I'd recommend splitting the description of the fields into separate sentences, and perhaps rather than writing "with the C
representation" inline in each of them, just write at the end something like "The structs and unions declared here all have the C
representation themselves."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I wasn't happy with it either. I had a personal breakthrough in how I can describe it, so now it's longer and with paragraph breaks. Saying they all have C
representation on its own line is also a nice win. 👍 I'm leaving for awhile, and I need to do the same to the primitive repr, so don't look for the change yet.
src/type-layout.md
Outdated
|
||
It is an error for [zero-variant enumerations] to have a primitive | ||
representation. | ||
#### Primitive Fepresentation of Field-less Enums |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: Fepresentation
the same field-less enum with the same primitive representation that is | ||
the enum with all fields in its variants removed and the rest of the fields | ||
consisting of the fields of the corresponding variant in the order defined in | ||
original enumeration. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sentence is not as bad, but it might be possible to clean it up somewhat as well.
src/type-layout.md
Outdated
|
||
> Note: This is commonly different than what is done in C and C++. Projects in | ||
> those languages often use a tuple of `(enum, payload)`. For making your enum | ||
> represented like that, use the `C` representation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps link up to C representation here?
src/type-layout.md
Outdated
consisting of the fields of the corresponding variant in the order defined in | ||
original enumeration. | ||
|
||
Because unions with non-copy fields aren't allowed, this representation can only |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't appear to be the case: https://play.rust-lang.org/?gist=6d0c155ebdffe17c6a3cd9683318209b&version=nightly
|
||
// has the same type layout as this union | ||
#[repr(C)] | ||
#[derive(Clone, Copy)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd omit the derive
here; it doesn't tell us anything about the layout and isn't always correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good start, heavy edits needed (go to bed before 1AM next time 😉)
Some issues may be philosophical ones with existing patterns in this document.
@@ -149,7 +149,8 @@ layout such as reinterpreting values as a different type. | |||
Because of this dual purpose, it is possible to create types that are not useful | |||
for interfacing with the C programming language. | |||
|
|||
This representation can be applied to structs, unions, and enums. | |||
This representation can be applied to structs, unions, and enums. The exception | |||
is [zero-variant enumerations] for which the `C` representation is an error. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You seem to use "enums" in subsequent sections, but "enumerations" here? (I see this reference name is pre-existing but metadata shouldn't affect the actual text)
|
||
For all other enumerations, the layout is unspecified. | ||
For enums with fields, the `C` representation is defined to be the same as the | ||
follow types. These types don't actually exist, so the names are only here to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand what "follow types" means.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
following*
The enums with fields with the `C` representation, the represented enum, has | ||
the same representation of a a struct two fields, the tagged union. The first | ||
field of the tagged union is a field-less enum, the discriminant enum. The | ||
second field of the tagged union is a union, the fields union. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This paragraph is really hard to understand, even as someone who knows exactly what it's describing. I think we need to establish some more standard terms/short-hands for talking about type layouts. For example, "type with the C
representation" could just be "repr(C)
type".
It's also important here that it is not just the layout as the term is defined by this document, but also the ABI that matches. This is important for e.g. passing this type by-value in a C FFI function. I think we should either define layout to include ABI, or create a term for "layout and ABI". Here I'm going to use "representation" to refer to layout+ABI, and also use it to refer the "desugared" type to provide a potential rewrite (with an incredibly long pedantic aside):
The representation of a
repr(C)
enum with fields is arepr(C)
struct with two fields:
- a
repr(C)
version of the enum with all fields removed ("the tag")- a
repr(C)
union ofrepr(C)
structs for the fields of each variant that had them ("the payload")(Note: due to the representation of
repr(C)
structs and unions, if a variant has a single field there is no difference between putting that field directly in the union or wrapping it in a struct; any system which wishes to manipulate such an enum's representation may therefore use whichever form is more convient/consistent for them)
struct FieldsC { x: u32, y: u8 } | ||
|
||
#[repr(C)] | ||
struct FieldsD; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: you need to derive(Copy, Clone)
on all the field structs for this example to compile on today's Rust because union's can't contain non-copyable types, although it doesn't affect the layout/repr so it might be reasonable to omit it in this document? (do we need to mark this as ignore
if we do that?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also note: FieldsD could be omitted, and it must be in C(++) headers, so I am slightly inclined to simply omit it. Worth discussing this footgun? Or is that more of a nomicon thing.
```rust | ||
// This Enum has the same layout as | ||
#[repr(C)] | ||
enum RepresentedEnum { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this example would benefit from some tweaked type names to better emphasize the connection
enum MyEnum
struct MyEnumRepr
enum MyEnumDiscriminant (would honestly prefer tag if you'll allow it, but I understand discriminant might be more consistent)
union MyEnumFields
struct MyEnumAFields (maybe remove MyEnum from this one?)
|
||
For [field-less enums], they set the size and alignment to be the same as | ||
the primitive type of the same name. For example, a field-less enum with | ||
a `u8` representation can only have discriminants between 0 and 255 inclusive. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe it also gives the type the same ABI as a primitive int (e.g. it would be passed in a register instead of on the stack on some x86 ABIs)
|
||
For all other enumerations, the layout is unspecified. | ||
Because unions with non-copy fields aren't allowed, this representation can only | ||
be used if every field is also [`Copy`]. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"can only be used" -> "can only be expressed in Rust", maybe? (C(++) can use it fine, and you could do some really hacky crap to use it in Rust too)
the same field-less enum with the same primitive representation that is | ||
the enum with all fields in its variants removed and the rest of the fields | ||
consisting of the fields of the corresponding variant in the order defined in | ||
original enumeration. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one is definitely better than the repr(C)
version, but I think we can also improve it. I suggest a similar rewrite (with another too long pedantic point I just want to write down somewhere:
The representation of a
repr(int)
enum is arepr(C)
union ofrepr(C)
structs for each variant with a field. The first field of each struct in the union is arepr(int)
version of the enum with all fields removed ("the tag") and the remaining fields are the fields of that variant.Note: this representation is unchanged if the tag is given its own member in the union, should that make manipulation more clear for you (although in C++, to follow The Exact Word Of The Standard the tag member should be wrapped in a struct).
Likewise, combining two primitive representations together is unspecified. | ||
> Note: This is commonly different than what is done in C and C++. Projects in | ||
> those languages often use a tuple of `(enum, payload)`. For making your enum | ||
> represented like that, use the `C` representation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure this is necessary. Or at least I would eliminate the reference to "what's commonly done in C(++)" which is like, never a true statement.
> represented like that, use the `C` representation. | ||
|
||
```rust | ||
// This custom enum |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar notes on this example as the previous (although you used my preferred naming scheme on this one..?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The last commit was a WIP where I was transitioning from this style (that you like better) to the style I used in #[repr(C)]
(that you like less).
@Havvy what's the status of this? |
The somewhat conflicting feedback scared me on this, and I still need to actually look at the feedback... |
Ping @Havvy, did you get a chance to actually look at the feedback? |
I don't know if and when I'd get back to this, and it needs a lot of work. So I'm going to just close this. |
This PR does two things:
Incidentally, fixes #244