Unnamed fields of struct and union type #2102

Merged
merged 22 commits into from
Apr 9, 2018
Changes from 1 commit
Commits
3946e44
Add unnamed fields RFC
joshtriplett Aug 9, 2017
a29c4b7
Explicitly note that we cannot implement this as a macro
joshtriplett Aug 9, 2017
49466b1
Mention alias mechanism as an alternative
joshtriplett Aug 9, 2017
55ae14f
Clarify the explanation of `siginfo_t`
joshtriplett Aug 9, 2017
ca3ca0c
Mention other interface issues this does not attempt to address
joshtriplett Aug 17, 2017
592b8b9
Add an explanation of the mental model, with a diagram
joshtriplett Aug 17, 2017
275a2c2
Per discussion, limit to repr(C)
joshtriplett Aug 17, 2017
49db3d4
Document precedent for limiting to repr(C)
joshtriplett Aug 17, 2017
bd37bd2
Add alternative syntax for declaring unnamed fields with named types
joshtriplett Aug 17, 2017
bb350be
Allow struct-in-struct and union-in-union
joshtriplett Sep 7, 2017
bd98f9f
Don't enumerate (a subset of) possible visibilities for fields
joshtriplett Sep 8, 2017
3ce66a1
Update summary and parsing section for struct-in-struct and union-in-…
joshtriplett Sep 8, 2017
e164166
Change the syntax to use a field name of `_`
joshtriplett Sep 8, 2017
45350b4
Add a section on pattern matching
joshtriplett Sep 8, 2017
c7961d8
Explicitly state that the layout and alignment must match the C ABI
joshtriplett Sep 8, 2017
e504dd7
Allow unnamed fields with named types
joshtriplett Sep 8, 2017
917e073
Clarify union field borrowing
joshtriplett Sep 8, 2017
9e2f5f0
Explicitly add `repr(C)` in all examples
joshtriplett Sep 8, 2017
a826357
Only propagate repr(C); show how to write repr(packed) explicitly
joshtriplett Sep 8, 2017
2ceb8e7
Add a clarification for a corner case involving generics
joshtriplett Sep 18, 2017
adcdbf0
Explicitly prohibit type parameters as named types of unnamed fields
joshtriplett Jan 29, 2018
1242d1a
RFC 2102
Centril Apr 9, 2018
350 changes: 350 additions & 0 deletions text/0000-unnamed-fields.md
@@ -0,0 +1,350 @@
- Feature Name: unnamed_fields
- Start Date: 2017-08-05
- RFC PR:
- Rust Issue:

# Summary
[summary]: #summary

Allow unnamed fields of `union` and `struct` type, contained within structs and
unions, respectively; the fields they contain appear directly within the
containing structure, with the use of `union` and `struct` determining which
fields have non-overlapping storage (making them usable at the same time).
This allows grouping and laying out fields in arbitrary ways, to match C data
structures used in FFI. The C11 standard allows this, and C compilers have
allowed it for decades as an extension. This proposal allows Rust to represent
such types using the same names as the C structures, without interposing
artificial field names that will confuse users of well-established interfaces
from existing platforms.

# Motivation
[motivation]: #motivation

Numerous C interfaces follow a common pattern, consisting of a `struct`
containing discriminants and common fields, and an unnamed `union` of fields
specific to certain values of the discriminants. To group together fields used
together as part of the same variant, these interfaces also often use unnamed
`struct` types.

Thus, `struct` defines a set of fields that can appear at the same time, and
`union` defines a set of mutually exclusive overlapping fields.

This pattern appears throughout many C APIs. The Windows and POSIX APIs both
use this pattern extensively. However, Rust currently can't represent this
pattern in a straightforward way. While Rust supports structs and unions, every
field of struct or union type must have a name. When creating a binding to such
an interface, whether manually or using a binding generator, the binding must
invent an artificial field name that does not appear in the original interface.

This RFC proposes a minimal mechanism to support such interfaces in Rust.

# Guide-level explanation
[guide-level-explanation]: #guide-level-explanation

This explanation should appear after the definition of `union`, and after an
explanation of the rationale for `union` versus `enum` in Rust.

Please note that most Rust code will want to use an `enum` to define types that
contain a discriminant and various disjoint fields. The unnamed field mechanism
here exists primarily for compatibility with interfaces defined by non-Rust
languages, such as C. Types declared with this mechanism require `unsafe` code
to access.

A `struct` defines a set of fields all available at the same time, with storage
available for each. A `union` defines (in an unsafe, unchecked manner) a set of
mutually exclusive fields, with overlapping storage. Some types and interfaces
may require nesting such groupings. For instance, a `struct` may contain a set
of common fields and a `union` of fields needed for different variations of the
structure; conversely, a `union` may contain a `struct` grouping together fields
needed simultaneously.

Such groupings, however, do not always have associated types and names. A
structure may contain groupings of fields where the fields have meaningful
names, but the groupings of fields do not. In this case, the structure can
contain *unnamed fields* of `struct` or `union` type, to group the fields
together, and determine which fields overlap.

As an example, when defining a `struct`, you may have a set of fields that will
never be used at the same time, so you could overlap the storage of those
fields. This pattern often occurs within C APIs, when defining an interface
similar to a Rust `enum`. You could do so by declaring a separate `union` type
and a field of that type. With the unnamed fields mechanism, you can also
define an unnamed grouping of overlapping fields inline within the `struct`,
using the `union` keyword:

```rust
struct S {
    a: u32,
    union {
        b: u32,
        c: f32,
    },
    d: u64,
}
```

Given a struct `s` of this type, code can access `s.a`, `s.d`, and either `s.b`
or `s.c`. Accesses to `a` and `d` can occur in safe code; accesses to `b` and
`c` require unsafe code, and `b` and `c` overlap, requiring care to access only
the field whose contents make sense at the time. As with any `union`, code
cannot borrow `s.b` and `s.c` simultaneously.
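
For comparison, here is a minimal sketch of how the same storage has to be written in today's Rust, with a separately declared union. The type name `BOrC`, the field name `u`, and the helper `read_b` are invented for this illustration and do not appear in the RFC; they are the kind of artificial names the proposal aims to remove.

```rust
// Named-field equivalent of `S`. `BOrC`, `u`, and `read_b` are hypothetical
// names invented for this sketch.
#[repr(C)]
#[derive(Clone, Copy)]
union BOrC {
    b: u32,
    c: f32,
}

#[repr(C)]
struct S {
    a: u32,
    u: BOrC,
    d: u64,
}

/// Reads the `b` view of the overlapping storage.
fn read_b(s: &S) -> u32 {
    // SAFETY: `b` and `c` have the same size and every bit pattern is a
    // valid `u32`, so this read is sound whichever field was written last.
    unsafe { s.u.b }
}

fn main() {
    let s = S { a: 1, u: BOrC { c: 1.0 }, d: 3 };
    // `a` and `d` are ordinary fields: no `unsafe` needed.
    println!("a = {}, d = {}", s.a, s.d);
    // `b` and `c` share storage, so `b` exposes the bit pattern of `c`.
    println!("b = {:#x}", read_b(&s));
}
```

With unnamed fields, the accesses would read `s.b` and `s.c` rather than `s.u.b` and `s.u.c`, but the `unsafe` requirement for the overlapping fields would stay the same.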

Contributor

> Given a struct `s` of this type, code can access `s.a`, `s.d`, and either `s.b`
> or `s.c`. Accesses to `a` and `d` can occur in safe code; accesses to `b` and
> `c` require unsafe code

Because of the `siginfo_t` example, where different implementations of POSIX put different subsets of the standardized fields into an anonymous union, I recommend that this be changed: if a struct has an anonymous union inside, it should become unsafe to access any of its fields.

Conversely, sometimes when defining a `union`, you may want to group multiple
fields together and make them available simultaneously, with non-overlapping
storage. You could do so by defining a separate `struct`, and placing an
instance of that `struct` within the `union`. With the unnamed fields
mechanism, you can also define an unnamed grouping of non-overlapping fields
inline within the `union`, using the `struct` keyword:

```rust
union U {
    a: u32,
    struct {
        b: u16,
        c: f16,
    },
    d: f32,
}
```

Given a union `u` of this type, code can access `u.a`, or `u.d`, or both `u.b`
and `u.c`. Since all of these fields can potentially overlap with others,
accesses to any of them require unsafe code; however, `b` and `c` do not
overlap with each other. Code can borrow `u.b` and `u.c` simultaneously, but
cannot borrow any other fields at the same time.

Unnamed fields can contain other unnamed fields. For example:

```rust
struct S {
    a: u32,
    union {
        b: u32,
        struct {
            c: u16,
            d: f16,
        },
        e: f32,
    },
    f: u64,
}
```

This structure contains six fields: `a`, `b`, `c`, `d`, `e`, and `f`. Safe code
can access fields `a` and `f`, at any time, since those fields do not lie
within a union and do not overlap with any other field. Unsafe code can access
the remaining fields. This definition effectively acts as the overlap of the
following three structures:

```rust
// variant 1
struct S {
    a: u32,
    b: u32,
    f: u64,
}

// variant 2
struct S {
    a: u32,
    c: u16,
    d: f16,
    f: u64,
}

// variant 3
struct S {
    a: u32,
    e: f32,
    f: u64,
}
```

## Instantiation

Given the following declaration:

```rust
struct S {
    a: u32,
    union {
        b: u32,
        struct {
            c: u16,
            d: f16,
        },
        e: f32,
    },
    f: u64,
}
```

All of the following will instantiate a value of type `S`:

- `S { a: 1, b: 2, f: 3 }`
- `S { a: 1, c: 2, d: 3.0, f: 4 }`
- `S { a: 1, e: 2.0, f: 3 }`
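
As a point of reference, the sketch below writes the same three initializers against a named-field equivalent that compiles on stable Rust today. The type names `CD` and `Overlap` and the field names `u` and `cd` are hypothetical, and `d` is declared as `u16` rather than `f16` because `f16` is not available on stable Rust.

```rust
// Named-field equivalent of the nested `S`. `CD`, `Overlap`, `u`, and `cd`
// are hypothetical names; `d` uses `u16` as a stand-in for `f16`.
#[repr(C)]
#[derive(Clone, Copy)]
struct CD {
    c: u16,
    d: u16,
}

#[repr(C)]
#[derive(Clone, Copy)]
union Overlap {
    b: u32,
    cd: CD,
    e: f32,
}

#[repr(C)]
struct S {
    a: u32,
    u: Overlap,
    f: u64,
}

fn main() {
    // One initializer per variant; a union literal names exactly one field.
    let v1 = S { a: 1, u: Overlap { b: 2 }, f: 3 };
    let v2 = S { a: 1, u: Overlap { cd: CD { c: 2, d: 3 } }, f: 4 };
    let v3 = S { a: 1, u: Overlap { e: 2.0 }, f: 3 };

    // SAFETY: each read goes through the field that was just initialized.
    unsafe {
        println!("{} {} {} {}", v1.u.b, v2.u.cd.c, v2.u.cd.d, v3.u.e);
    }
    println!("{} {} {} {}", v1.a, v1.f, v2.f, v3.f);
}
```

The extra nesting in each literal (`u: Overlap { ... }`, `cd: CD { ... }`) is what the flat initializer form in this section would remove.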

## Representation

By default, Rust lays out structures using its native representation,
`repr(Rust)`; that representation permits any layout that can store all the
non-overlapping fields simultaneously, and makes no other guarantees about the
storage of unnamed fields.

When using this mechanism to define a C interface, remember to use the
`repr(C)` attribute to match C's data structure layout. Any representation
attribute applied to the top-level structure also applies to every unnamed
field within that declaration. Such a structure defined with `repr(C)` will use
a representation identical to the same structure with all unnamed fields
transformed to equivalent named fields of a struct or union type with the same
fields.

## Derive

A `struct` or `union` containing unnamed fields may derive `Copy`, `Clone`, or
both, if all the fields it contains (including within unnamed fields) also
implement `Copy`.

A `struct` containing unnamed fields may derive `Clone` if every field
contained directly in the `struct` implements `Clone`, and every field
contained within an unnamed `union` (directly or indirectly) implements `Copy`.
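
The second rule mirrors what stable Rust already does for a named `union` field, which the following sketch illustrates. `Payload` and `Tagged` are hypothetical names, and the assumption here is that an unnamed union would follow the same pattern as this named one.

```rust
// A union can derive `Clone` only together with `Copy`, which in turn
// requires every field to be `Copy`.
#[repr(C)]
#[derive(Clone, Copy)]
union Payload {
    int: u32,
    float: f32,
}

// The struct holds a `Clone`-but-not-`Copy` field next to the union, so it
// can derive `Clone` but not `Copy`.
#[derive(Clone)]
struct Tagged {
    label: String,
    payload: Payload,
}

fn main() {
    let original = Tagged {
        label: String::from("count"),
        payload: Payload { int: 7 },
    };
    let copy = original.clone();
    // SAFETY: `int` is the field that was initialized.
    let value = unsafe { copy.payload.int };
    println!("{} = {}", copy.label, value);

    let other = Payload { float: 1.5 };
    // SAFETY: `float` is the field that was initialized.
    println!("{}", unsafe { other.float });
}
```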

## Ambiguous field names

You cannot use this feature to define multiple fields with the same name. For
instance, the following definition will produce an error:

```rust
struct S {
    a: u32,
    union {
        a: u32,
        b: f32,
    },
}
```

The error will identify the duplicate `a` fields as the sources of the error.

# Reference-level explanation
[reference-level-explanation]: #reference-level-explanation

## Parsing

Within a struct's fields, in place of a field name and type, allow
`union { fields }`, where `fields` allows everything allowed within a `union`
declaration. Conversely, within a union's fields, in place of a field name
and type, allow `struct { fields }`, where `fields` allows everything allowed
within a `struct` declaration.

Contributor

The C version of this feature permits an anonymous struct inside a struct (which might itself be anonymous) and an anonymous union inside a union (ditto). Such nested uses are not necessarily equivalent to the manually-inlined version because of padding and alignment, so I recommend Rust do likewise.

Do you have an example where a struct within a struct and the equivalent "flat" struct would have different padding/alignment ?

Contributor

@roblabla Sure, that's easy:

#[repr(C)]
struct Foo {
    a: u16,
    struct {
        b: u16,
        struct {
            c: u16,
        },
    },
    d: u16,
}

with an ABI where the minimum alignment for all structs is at least 32 bits: there will be 16 bits of padding before each of b, c, and d, and 16 bits of padding after d, and depending on how clever the ABI is about eliminating redundant padding, there might be even more. (The size of a type must always be a multiple of its intrinsic alignment, because there is never padding between the elements of an array.)

Member

The fields are not inlined for the sake of layout, they are only "inlined" in terms of the syntax you use to refer to the fields. Because this RFC explicitly only works with #[repr(C)] it must necessarily match what C does, so there's no decisions to be made about which layout to use.

Member Author

Here's another example where nesting structs can produce a different layout:

#include <stdint.h>
#include <stdio.h>

struct S1 {
    uint8_t a;
    uint8_t b;
    uint16_t c;
};

struct S2 {
    uint8_t a;
    struct {
        uint8_t b;
        uint16_t c;
    };
};

int main(void) {
    printf("%zu %zu\n", sizeof(struct S1), sizeof(struct S2));
}
/tmp$ gcc test.c -o test && ./test
4 6

Note that the keyword `struct` cannot appear as a field name, making it
entirely unambiguous. The contextual keyword `union` could theoretically appear
as a field name, but an open brace cannot appear immediately after a field
name, allowing disambiguation via a single token of context (`union {`).

## Layout and Alignment

The layout and alignment of a `struct` or `union` containing unnamed
fields should look the same as if each unnamed field had a separately declared
type and a named field of that type, rather than as if the fields appeared
directly within the containing `struct` or `union`. In some cases, this may
result in different alignment.
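
The difference can be checked with named types on stable Rust. The sketch below ports the C comparison from the review thread above; `Flat`, `Inner`, `Nested`, and the field name `inner` are hypothetical names, and the sizes noted in the comments assume a target where `u16` has 2-byte alignment.

```rust
use std::mem::{align_of, size_of};

// Fields written directly in the outer struct.
#[repr(C)]
struct Flat {
    a: u8,
    b: u8,
    c: u16,
}

// The same fields, with `b` and `c` grouped into a separately declared type.
#[repr(C)]
struct Inner {
    b: u8,
    c: u16,
}

#[repr(C)]
struct Nested {
    a: u8,
    inner: Inner,
}

fn main() {
    // `Flat` packs `a` and `b` into adjacent bytes: 4 bytes in total.
    println!("Flat:   size {}, align {}", size_of::<Flat>(), align_of::<Flat>());
    // `Inner` is 2-byte aligned, so `Nested` needs a padding byte after `a`
    // and another inside `Inner` after `b`: 6 bytes in total.
    println!("Nested: size {}, align {}", size_of::<Nested>(), align_of::<Nested>());
}
```

Under the rule in this section, an unnamed `struct { b: u8, c: u16 }` field would be laid out like `Nested`, not like `Flat`.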

Contributor

Rather than going into this much detail, suggest saying that the layout and alignment are whatever the C ABI requires for this case (because of #[repr(C)]), which is probably the same as if the unnamed sub-structures were named.

## Simultaneous Borrows

An unnamed `struct` within a `union` should behave the same with respect to
borrows as a named and typed `struct` within a `union`, allowing borrows of
multiple fields from within the `struct`, while not permitting borrows of other
fields in the `union`.
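
The named-struct behavior referred to here can be sketched on stable Rust as follows. `Pair`, the field name `pair`, and `bump_both` are hypothetical names, and `c` is declared as `u16` because `f16` is not available on stable Rust.

```rust
// Named equivalent of a union containing a two-field struct.
#[repr(C)]
#[derive(Clone, Copy)]
struct Pair {
    b: u16,
    c: u16,
}

#[repr(C)]
union U {
    a: u32,
    pair: Pair,
    d: f32,
}

/// Mutates `b` and `c` through two mutable borrows that are live together.
fn bump_both(u: &mut U) {
    // SAFETY: every field of `U` covers all four bytes and `Pair` holds
    // plain integers, so any initialized `U` is a valid `Pair`.
    let pair = unsafe { &mut u.pair };
    // Distinct fields of the inner struct can be borrowed at the same time.
    let b = &mut pair.b;
    let c = &mut pair.c;
    *b = b.wrapping_add(1);
    *c = c.wrapping_add(2);
    // Borrowing `u.a` or `u.d` while `b` and `c` are still in use would be
    // rejected by the borrow checker, because they overlap `u.pair`.
}

fn main() {
    let mut u = U { pair: Pair { b: 10, c: 20 } };
    bump_both(&mut u);
    // SAFETY: `pair` is the field that was initialized and updated.
    let p = unsafe { u.pair };
    println!("b = {}, c = {}", p.b, p.c);

    let v = U { a: 0 };
    let w = U { d: 0.0 };
    // SAFETY: each read uses the field that was just initialized.
    println!("{} {}", unsafe { v.a }, unsafe { w.d });
}
```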

## Visibility

Each field within an unnamed `struct` or `union` may have an attached
visibility (`pub` or `pub(crate)`). An unnamed field itself does not have its
own visibility; all of its fields appear directly within the containing
structure, and their own visibilities apply.

## Documentation

Public fields within an unnamed `struct` or `union` should appear in the
rustdoc documentation of the outer structure, along with any doc comment or
attribute attached to those fields. The rendering should include all unnamed
fields that contain (at any level of nesting) a public field, and should
include the `// some fields omitted` note within any `struct` or `union` that
has non-public fields, including unnamed fields.

Any unnamed field that contains only non-public fields should be omitted
entirely, rather than included with its fields omitted. Omitting an unnamed
field should trigger the `// some fields omitted` note.

# Drawbacks
[drawbacks]: #drawbacks

This introduces additional complexity in structure definitions. Strictly
speaking, C interfaces do not *require* this mechanism; any such interface
*could* define named struct or union types, and define named fields of that
type. This RFC provides a usability improvement for such interfaces.

# Rationale and Alternatives
[alternatives]: #alternatives

Choosing not to implement this feature would force binding generators (and the
authors of manual bindings) to invent new names for these groupings of fields.
Users would need to look up the names for those groupings, and would not be
able to rely on documentation for the underlying interface. Furthermore,
binding generators would not have any basis on which to generate a meaningful
name.

Several alternative syntaxes could exist to designate the equivalent of
`struct` and `union`. Such syntaxes would declare the same underlying types.
However, inventing a novel syntax for this mechanism would make it less
familiar both to Rust users accustomed to structs and unions and to C
users accustomed to unnamed struct and union fields.

We could introduce a mechanism to declare arbitrarily positioned fields, such
as attributes declaring the offset of each field. The same mechanism was also
proposed in response to the original union RFC. However, as in that case, using
struct and union syntax has the advantage of allowing the compiler to implement
the appropriate positioning and alignment of fields.

In addition to introducing just this narrow mechanism for defining unnamed
fields, we could introduce a fully general mechanism for anonymous `struct` and
`union` types that can appear anywhere a type can appear, including in function
arguments and return values, named structure fields, or local variables. Such
an anonymous type mechanism would *not* replace the need for unnamed fields,
however, and vice versa. Furthermore, anonymous types would interact
extensively with far more aspects of Rust. Such a mechanism should appear in a
subsequent RFC.

This mechanism intentionally does not provide any means to reference an unnamed
field as a whole, or its type. That intentional limitation avoids allowing such
unnamed types to propagate.

# Unresolved questions
[unresolved]: #unresolved-questions

This proposal does *not* support anonymous `struct` and `union` types that can
appear anywhere a type can appear, such as in the type of an arbitrary named
field or variable. Doing so would further simplify some C interfaces, as well
as native Rust constructs.

However, such a change would also cascade into numerous other changes, such as
anonymous struct and union literals. Unlike this proposal, anonymous aggregate
types for named fields have a reasonable alternative, namely creating and using
separate types; binding generators could use that mechanism, and a macro could
allow declaring those types inline next to the fields that use them.

Furthermore, during the pre-RFC process, that portion of the proposal proved
more controversial. And such a proposal would have a much more expansive impact
on the language as a whole, by introducing a new construct that works anywhere
a type can appear. Thus, this proposal provides the minimum change necessary to
enable bindings to these types of C interfaces.

This proposal only permits an unnamed `struct` to appear within a `union` and
vice versa. An unnamed `union` within a `union` doesn't seem to have any useful
value. An unnamed `struct` within a `struct` works in C11, and does affect
alignment, but does not seem particularly useful without the ability to
reference the unnamed field. Nonetheless, extending this feature to allow
unnamed `struct` and `union` fields to appear within either a `struct` or
`union` would not introduce much additional complexity.