Unnamed fields of struct and union type #2102
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,350 @@ | ||
- Feature Name: unnamed_fields | ||
- Start Date: 2017-08-05 | ||
- RFC PR: | ||
- Rust Issue: | ||
|
||
# Summary | ||
[summary]: #summary | ||
|
||
Allow unnamed fields of `union` and `struct` type, contained within structs and | ||
unions, respectively; the fields they contain appear directly within the | ||
containing structure, with the use of `union` and `struct` determining which | ||
fields have non-overlapping storage (making them usable at the same time). | ||
This allows grouping and laying out fields in arbitrary ways, to match C data | ||
structures used in FFI. The C11 standard allows this, and C compilers have | ||
allowed it for decades as an extension. This proposal allows Rust to represent | ||
such types using the same names as the C structures, without interposing | ||
artificial field names that will confuse users of well-established interfaces | ||
from existing platforms. | ||
|
||
# Motivation | ||
[motivation]: #motivation | ||
|
||
Numerous C interfaces follow a common pattern, consisting of a `struct` | ||
containing discriminants and common fields, and an unnamed `union` of fields | ||
specific to certain values of the discriminants. To group together fields used | ||
together as part of the same variant, these interfaces also often use unnamed | ||
`struct` types. | ||
|
||
Thus, `struct` defines a set of fields that can appear at the same time, and | ||
`union` defines a set of mutually exclusive overlapping fields. | ||
|
||
This pattern appears throughout many C APIs. The Windows and POSIX APIs both | ||
use this pattern extensively. However, Rust currently can't represent this | ||
pattern in a straightforward way. While Rust supports structs and unions, every | ||
such struct and union must have a field name. When creating a binding to such | ||
an interface, whether manually or using a binding generator, the binding must | ||
invent an artificial field name that does not appear in the original interface. | ||
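As a rough illustration of that workaround, a binding in today's Rust might look like the sketch below. The `Event` type, its fields, the `key_code` helper, and the convention that `kind == 0` marks a key event are all hypothetical. The field name `payload` is the kind of invented name that has no counterpart in a C header declaring an anonymous union.

```rust
/// Hypothetical payload union; in the C header this union would be anonymous.
#[repr(C)]
#[derive(Clone, Copy)]
pub union EventPayload {
    pub key_code: u32,
    pub scale: f32,
}

/// Hypothetical C-style event: a discriminant plus variant-specific data.
#[repr(C)]
pub struct Event {
    pub kind: u32,
    /// Invented field name: C code would write `ev.key_code`,
    /// but Rust code has to write `ev.payload.key_code`.
    pub payload: EventPayload,
}

/// Reads the key code, assuming `kind == 0` marks a key event
/// (a convention made up for this sketch).
pub fn key_code(ev: &Event) -> Option<u32> {
    if ev.kind == 0 {
        // SAFETY: by this sketch's convention, `kind == 0` means
        // `key_code` is the active union field.
        Some(unsafe { ev.payload.key_code })
    } else {
        None
    }
}
```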
|
||
This RFC proposes a minimal mechanism to support such interfaces in Rust. | ||
|
||
# Guide-level explanation | ||
[guide-level-explanation]: #guide-level-explanation | ||
|
||
This explanation should appear after the definition of `union`, and after an | ||
explanation of the rationale for `union` versus `enum` in Rust. | ||
|
||
Please note that most Rust code will want to use an `enum` to define types that | ||
contain a discriminant and various disjoint fields. The unnamed field mechanism | ||
here exists primarily for compatibility with interfaces defined by non-Rust | ||
languages, such as C. Types declared with this mechanism require `unsafe` code | ||
to access. | ||
|
||
A `struct` defines a set of fields all available at the same time, with storage | ||
available for each. A `union` defines (in an unsafe, unchecked manner) a set of | ||
mutually exclusive fields, with overlapping storage. Some types and interfaces | ||
may require nesting such groupings. For instance, a `struct` may contain a set | ||
of common fields and a `union` of fields needed for different variations of the | ||
structure; conversely, a `union` may contain a `struct` grouping together fields | ||
needed simultaneously. | ||
|
||
Such groupings, however, do not always have associated types and names. A | ||
structure may contain groupings of fields where the fields have meaningful | ||
names, but the groupings of fields do not. In this case, the structure can | ||
contain *unnamed fields* of `struct` or `union` type, to group the fields | ||
together, and determine which fields overlap. | ||
|
||
As an example, when defining a `struct`, you may have a set of fields that will | ||
never be used at the same time, so you could overlap the storage of those | ||
fields. This pattern often occurs within C APIs, when defining an interface | ||
similar to a Rust `enum`. You could do so by declaring a separate `union` type | ||
and a field of that type. With the unnamed fields mechanism, you can also | ||
define an unnamed grouping of overlapping fields inline within the `struct`, | ||
using the `union` keyword: | ||
|
||
```rust | ||
struct S { | ||
    a: u32, | ||
    union { | ||
        b: u32, | ||
        c: f32, | ||
    }, | ||
    d: u64, | ||
} | ||
``` | ||
|
||
Given a struct `s` of this type, code can access `s.a`, `s.d`, and either `s.b` | ||
or `s.c`. Accesses to `a` and `d` can occur in safe code; accesses to `b` and | ||
`c` require unsafe code, and `b` and `c` overlap, requiring care to access only | ||
the field whose contents make sense at the time. As with any `union`, code | ||
cannot borrow `s.b` and `s.c` simultaneously. | ||
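For comparison, the alternative with a separately declared `union` type and a named field can be written in current Rust roughly as follows. The names `BOrC`, `bc`, and `reinterpret_b_as_c` are invented for this sketch; it only illustrates that `b` and `c` share storage while `a` and `d` remain ordinary fields.

```rust
#[repr(C)]
#[derive(Clone, Copy)]
union BOrC {
    b: u32,
    c: f32,
}

#[repr(C)]
struct S {
    a: u32,
    bc: BOrC, // invented name; under the proposal this field would stay unnamed
    d: u64,
}

/// Stores a value through `b` and reads the same storage back through `c`.
fn reinterpret_b_as_c(bits: u32) -> f32 {
    let s = S { a: 0, bc: BOrC { b: bits }, d: 0 };
    // Safe code: `a` and `d` do not overlap with anything.
    let _common = (s.a, s.d);
    // SAFETY: every 32-bit pattern is a valid `f32`, so reading `c`
    // after writing `b` is defined.
    unsafe { s.bc.c }
}
```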
|
||
Conversely, sometimes when defining a `union`, you may want to group multiple | ||
fields together and make them available simultaneously, with non-overlapping | ||
storage. You could do so by defining a separate `struct`, and placing an | ||
instance of that `struct` within the `union`. With the unnamed fields | ||
mechanism, you can also define an unnamed grouping of non-overlapping fields | ||
inline within the `union`, using the `struct` keyword: | ||
|
||
```rust | ||
union U { | ||
    a: u32, | ||
    struct { | ||
        b: u16, | ||
        c: f16, | ||
    }, | ||
    d: f32, | ||
} | ||
``` | ||
|
||
Given a union `u` of this type, code can access `u.a`, or `u.d`, or both `u.b` | ||
and `u.c`. Since all of these fields can potentially overlap with others, | ||
accesses to any of them require unsafe code; however, `b` and `c` do not | ||
overlap with each other. Code can borrow `u.b` and `u.c` simultaneously, but | ||
cannot borrow any other fields at the same time. | ||
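The borrowing rule described here matches what current Rust already permits for a named `struct` field inside a `union`. A minimal sketch, with the invented names `Pair`, `bc`, and `bump_both`, and with `u16` standing in for `f16` so that the example does not assume an `f16` type is available:

```rust
#[repr(C)]
#[derive(Clone, Copy)]
struct Pair {
    b: u16,
    c: u16, // stand-in for `f16` in this sketch
}

#[repr(C)]
union U {
    a: u32,
    bc: Pair, // invented name; the proposal would leave this grouping unnamed
    d: f32,
}

/// Mutably borrows `b` and `c` at the same time and updates both.
fn bump_both(u: &mut U) {
    // SAFETY: the caller is expected to ensure the `bc` grouping is the
    // one in use; any bit pattern is a valid `u16`.
    unsafe {
        let b = &mut u.bc.b;
        let c = &mut u.bc.c; // allowed: `b` and `c` do not overlap
        *b += 1;
        *c += 2;
    }
}
```

Borrowing `u.a` or `u.d` while either of these borrows is live would be rejected, since those fields overlap with the grouping.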
|
||
Unnamed fields can contain other unnamed fields. For example: | ||
|
||
```rust | ||
struct S { | ||
    a: u32, | ||
    union { | ||
        b: u32, | ||
        struct { | ||
            c: u16, | ||
            d: f16, | ||
        }, | ||
        e: f32, | ||
    }, | ||
    f: u64, | ||
} | ||
``` | ||
|
||
This structure contains six fields: `a`, `b`, `c`, `d`, `e`, and `f`. Safe code | ||
can access fields `a` and `f`, at any time, since those fields do not lie | ||
within a union and do not overlap with any other field. Unsafe code can access | ||
the remaining fields. This definition effectively acts as the overlap of the | ||
following three structures: | ||
|
||
```rust | ||
// variant 1 | ||
struct S { | ||
    a: u32, | ||
    b: u32, | ||
    f: u64, | ||
} | ||
|
||
// variant 2 | ||
struct S { | ||
    a: u32, | ||
    c: u16, | ||
    d: f16, | ||
    f: u64, | ||
} | ||
|
||
// variant 3 | ||
struct S { | ||
    a: u32, | ||
    e: f32, | ||
    f: u64, | ||
} | ||
``` | ||
|
||
## Instantiation | ||
|
||
Given the following declaration: | ||
|
||
```rust | ||
struct S { | ||
    a: u32, | ||
    union { | ||
        b: u32, | ||
        struct { | ||
            c: u16, | ||
            d: f16, | ||
        }, | ||
        e: f32, | ||
    }, | ||
    f: u64, | ||
} | ||
``` | ||
|
||
All of the following will instantiate a value of type `S`: | ||
|
||
- `S { a: 1, b: 2, f: 3 }` | ||
- `S { a: 1, c: 2, d: 3.0, f: 4 }` | ||
- `S { a: 1, e: 2.0, f: 3 }` | ||
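For contrast, the same three values written against named types in current Rust need one extra layer of initializer per grouping. This is only a sketch: the names `CD`, `Inner`, `cd`, `inner`, and `variants` are invented, and `u16` stands in for `f16`.

```rust
#[repr(C)]
#[derive(Clone, Copy)]
struct CD {
    c: u16,
    d: u16, // stand-in for `f16` in this sketch
}

#[repr(C)]
#[derive(Clone, Copy)]
union Inner {
    b: u32,
    cd: CD,
    e: f32,
}

#[repr(C)]
struct S {
    a: u32,
    inner: Inner,
    f: u64,
}

/// Builds one value per form listed above, using the named groupings.
fn variants() -> [S; 3] {
    [
        S { a: 1, inner: Inner { b: 2 }, f: 3 },
        S { a: 1, inner: Inner { cd: CD { c: 2, d: 3 } }, f: 4 },
        S { a: 1, inner: Inner { e: 2.0 }, f: 3 },
    ]
}
```

As with any union initializer, each `Inner { .. }` expression names exactly one field of the union.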
|
||
## Representation | ||
|
||
By default, Rust lays out structures using its native representation, | ||
`repr(Rust)`; that representation permits any layout that can store all the | ||
non-overlapping fields simultaneously, and makes no other guarantees about the | ||
storage of unnamed fields. | ||
|
||
When using this mechanism to define a C interface, remember to use the | ||
`repr(C)` attribute to match C's data structure layout. Any representation | ||
attribute applied to the top-level structure also applies to every unnamed | ||
field within that declaration. Such a structure defined with `repr(C)` will use | ||
a representation identical to the same structure with all unnamed fields | ||
transformed to equivalent named fields of a struct or union type with the same | ||
fields. | ||
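The named-field equivalent mentioned above can be inspected directly in current Rust. The sketch below uses invented names (`CD`, `Inner`, `inner`, `layout`), substitutes `u16` for `f16`, and assumes a toolchain where `std::mem::offset_of!` is available (Rust 1.77 or later). Note that each separately declared type needs its own `repr(C)`, whereas the proposal applies the top-level attribute to every unnamed field.

```rust
use std::mem::{offset_of, size_of};

#[repr(C)]
#[derive(Clone, Copy)]
struct CD {
    c: u16,
    d: u16, // stand-in for `f16` in this sketch
}

#[repr(C)]
union Inner {
    b: u32,
    cd: CD,
    e: f32,
}

#[repr(C)]
struct S {
    a: u32,
    inner: Inner,
    f: u64,
}

/// Returns the offsets of `a`, `inner`, and `f`, followed by the size of `S`.
fn layout() -> (usize, usize, usize, usize) {
    (
        offset_of!(S, a),
        offset_of!(S, inner),
        offset_of!(S, f),
        size_of::<S>(),
    )
}
```

On common targets this should report the union at offset 4 and `f` at offset 8, for a total size of 16 bytes; the exact numbers depend on the target's C ABI.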
|
||
## Derive | ||
|
||
A `struct` or `union` containing unnamed fields may derive `Copy`, `Clone`, or | ||
both, if all the fields it contains (including within unnamed fields) also | ||
implement `Copy`. | ||
|
||
A `struct` containing unnamed fields may derive `Clone` if every field | ||
contained directly in the `struct` implements `Clone`, and every field | ||
contained within an unnamed `union` (directly or indirectly) implements `Copy`. | ||
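These rules mirror how derives already behave for named unions. In the sketch below (all names hypothetical), the union derives both traits because its fields are `Copy`, and the containing struct can derive `Clone` even though its `String` field is not `Copy`:

```rust
#[derive(Clone, Copy)]
union Bits {
    int: u32,
    float: f32,
}

#[derive(Clone)]
struct Tagged {
    label: String, // `Clone` but not `Copy`
    bits: Bits,    // a union, so it must be `Copy` to be cloned
}

/// Clones a `Tagged` value; the union part is copied bitwise.
fn duplicate(t: &Tagged) -> Tagged {
    t.clone()
}
```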
|
||
## Ambiguous field names | ||
|
||
You cannot use this feature to define multiple fields with the same name. For | ||
instance, the following definition will produce an error: | ||
|
||
```rust | ||
struct S { | ||
    a: u32, | ||
    union { | ||
        a: u32, | ||
        b: f32, | ||
    }, | ||
} | ||
``` | ||
|
||
The error will identify the duplicate `a` fields as the sources of the error. | ||
|
||
# Reference-level explanation | ||
[reference-level-explanation]: #reference-level-explanation | ||
|
||
## Parsing | ||
|
||
Within a struct's fields, in place of a field name and type, allow | ||
`union { fields }`, where `fields` allows everything allowed within a `union` | ||
declaration. Conversely, within a union's fields, in place of a field name | ||
and type, allow `struct { fields }`, where `fields` allows everything allowed | ||
within a `struct` declaration. | ||
|
||
The C version of this feature permits an anonymous [...]

Do you have an example where a struct within a struct and the equivalent "flat" struct would have different padding/alignment?

@roblabla Sure, that's easy:

```rust
#[repr(C)]
struct Foo {
    a: u16,
    struct {
        b: u16,
        struct {
            c: u16,
        },
    },
    d: u16,
}
```

with an ABI where the minimum alignment for all structs is at least 32 bits: there will be 16 bits of padding before each of [...]

The fields are not inlined for the sake of layout, they are only "inlined" in terms of the syntax you use to refer to the fields. Because this RFC explicitly only works with [...]

Here's another example where nesting structs can produce a different layout:

```c
#include <stdint.h>
#include <stdio.h>

struct S1 {
    uint8_t a;
    uint8_t b;
    uint16_t c;
};

struct S2 {
    uint8_t a;
    struct {
        uint8_t b;
        uint16_t c;
    };
};

int main(void) {
    printf("%zu %zu\n", sizeof(struct S1), sizeof(struct S2));
}
```
|
||
Note that the keyword `struct` cannot appear as a field name, making it | ||
entirely unambiguous. The contextual keyword `union` could theoretically appear | ||
as a field name, but an open brace cannot appear immediately after a field | ||
name, allowing disambiguation via a single token of context (`union {`). | ||
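The ambiguity is real in the sense that `union` is already accepted as an ordinary field name in current Rust, as the following small sketch shows (the `SetOps` type and `total` function are invented for illustration):

```rust
struct SetOps {
    union: u32, // legal today: `union` is only a contextual keyword
    intersection: u32,
}

/// Uses the field named `union` like any other field.
fn total(ops: &SetOps) -> u32 {
    ops.union + ops.intersection
}
```

Here `union` is followed by `:`, whereas an unnamed union field would be introduced by `union {`, so one token of lookahead is enough to tell them apart.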
|
||
## Layout and Alignment | ||
|
||
The layout and alignment of a `struct` or `union` containing unnamed | ||
fields should look the same as if each unnamed field has a separately declared | ||
type and a named field of that type, rather than as if the fields appeared | ||
directly within the containing `struct` or `union`. In some cases, this may | ||
result in different alignment. | ||
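To see why the distinction matters, the sketch below compares a flat `repr(C)` struct with one whose trailing fields are grouped into a separately declared type, which is the model this section prescribes for unnamed fields. The names `Flat`, `Group`, `Nested`, and `sizes` are invented, and the struct-in-struct shape is used only to show the layout effect with named fields in current Rust.

```rust
use std::mem::size_of;

#[repr(C)]
struct Flat {
    a: u8,
    b: u8,
    c: u16,
}

#[repr(C)]
struct Group {
    b: u8,
    c: u16,
}

#[repr(C)]
struct Nested {
    a: u8,
    group: Group, // aligned as a unit, so padding is inserted after `a`
}

/// Returns the sizes of the flat and nested layouts, in that order.
fn sizes() -> (usize, usize) {
    (size_of::<Flat>(), size_of::<Nested>())
}
```

On targets where `u16` has 2-byte alignment, the flat struct occupies 4 bytes while the nested one occupies 6, because `Group` carries its own internal padding and alignment.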
|
||
Rather than going into this much detail, suggest saying that the layout and alignment are whatever the C ABI requires for this case (because of [...]
|
||
## Simultaneous Borrows | ||
|
||
An unnamed `struct` within a `union` should behave the same with respect to | ||
borrows as a named and typed `struct` within a `union`, allowing borrows of | ||
multiple fields from within the `struct`, while not permitting borrows of other | ||
fields in the `union`. | ||
|
||
## Visibility | ||
|
||
Each field within an unnamed `struct` or `union` may have an attached | ||
visibility (`pub` or `pub(crate)`). An unnamed field itself does not have its | ||
own visibility; all of its fields appear directly within the containing | ||
structure, and their own visibilities apply. | ||
|
||
## Documentation | ||
|
||
Public fields within an unnamed `struct` or `union` should appear in the | ||
rustdoc documentation of the outer structure, along with any doc comment or | ||
attribute attached to those fields. The rendering should include all unnamed | ||
fields that contain (at any level of nesting) a public field, and should | ||
include the `// some fields omitted` note within any `struct` or `union` that | ||
has non-public fields, including unnamed fields. | ||
|
||
Any unnamed field that contains only non-public fields should be omitted | ||
entirely, rather than included with its fields omitted. Omitting an unnamed | ||
field should trigger the `// some fields omitted` note. | ||
|
||
# Drawbacks | ||
[drawbacks]: #drawbacks | ||
|
||
This introduces additional complexity in structure definitions. Strictly | ||
speaking, C interfaces do not *require* this mechanism; any such interface | ||
*could* define named struct or union types, and define named fields of that | ||
type. This RFC provides a usability improvement for such interfaces. | ||
|
||
# Rationale and Alternatives | ||
[alternatives]: #alternatives | ||
|
||
Choosing not to implement this feature would force binding generators (and the | ||
authors of manual bindings) to invent new names for these groupings of fields. | ||
Users would need to look up the names for those groupings, and would not be | ||
able to rely on documentation for the underlying interface. Furthermore, | ||
binding generators would not have any basis on which to generate a meaningful | ||
name. | ||
|
||
Several alternative syntaxes could exist to designate the equivalent of | ||
`struct` and `union`. Such syntaxes would declare the same underlying types. | ||
However, inventing a novel syntax for this mechanism would make it less | ||
familiar both to Rust users accustomed to structs and unions as well as to C | ||
users accustomed to unnamed struct and union fields. | ||
|
||
We could introduce a mechanism to declare arbitrarily positioned fields, such | ||
as attributes declaring the offset of each field. The same mechanism was also | ||
proposed in response to the original union RFC. However, as in that case, using | ||
struct and union syntax has the advantage of allowing the compiler to implement | ||
the appropriate positioning and alignment of fields. | ||
|
||
In addition to introducing just this narrow mechanism for defining unnamed | ||
fields, we could introduce a fully general mechanism for anonymous `struct` and | ||
`union` types that can appear anywhere a type can appear, including in function | ||
arguments and return values, named structure fields, or local variables. Such | ||
an anonymous type mechanism would *not* replace the need for unnamed fields, | ||
however, and vice versa. Furthermore, anonymous types would interact | ||
extensively with far more aspects of Rust. Such a mechanism should appear in a | ||
subsequent RFC. | ||
|
||
This mechanism intentionally does not provide any means to reference an unnamed | ||
field as a whole, or its type. That intentional limitation avoids allowing such | ||
unnamed types to propagate. | ||
|
||
# Unresolved questions | ||
[unresolved]: #unresolved-questions | ||
|
||
This proposal does *not* support anonymous `struct` and `union` types that can | ||
appear anywhere a type can appear, such as in the type of an arbitrary named | ||
field or variable. Doing so would further simplify some C interfaces, as well | ||
as native Rust constructs. | ||
|
||
However, such a change would also cascade into numerous other changes, such as | ||
anonymous struct and union literals. Unlike this proposal, anonymous aggregate | ||
types for named fields have a reasonable alternative, namely creating and using | ||
separate types; binding generators could use that mechanism, and a macro could | ||
allow declaring those types inline next to the fields that use them. | ||
|
||
Furthermore, during the pre-RFC process, that portion of the proposal proved | ||
more controversial. And such a proposal would have a much more expansive impact | ||
on the language as a whole, by introducing a new construct that works anywhere | ||
a type can appear. Thus, this proposal provides the minimum change necessary to | ||
enable bindings to these types of C interfaces. | ||
|
||
This proposal only permits an unnamed `struct` to appear within a `union` and | ||
vice versa. An unnamed `union` within a `union` doesn't seem to have any useful | ||
value. An unnamed `struct` within a `struct` works in C11, and does affect | ||
alignment, but does not seem particularly useful without the ability to | ||
reference the unnamed field. Nonetheless, extending this feature to allow | ||
unnamed `struct` and `union` fields to appear within either a `struct` or | ||
`union` would not introduce much additional complexity. |
Because of the `siginfo_t` example, where different implementations of POSIX put different subsets of the standardized fields into an anonymous union, I recommend that this be changed: if a struct has an anonymous union inside, it should become unsafe to access any of its fields.