Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unnamed fields of struct and union type #2102

Merged
merged 22 commits into from
Apr 9, 2018

Conversation

joshtriplett
Copy link
Member

@joshtriplett joshtriplett commented Aug 9, 2017

Rendered
Tracking issue

Allow unnamed fields of union and struct type, contained within structs and unions, respectively; the fields they contain appear directly within the containing structure, with the use of union and struct determining which fields have non-overlapping storage (making them usable at the same time). This allows grouping and laying out fields in arbitrary ways, to match C data structures used in FFI. The C11 standard allows this, and C compilers have allowed it for decades as an extension. This proposal allows Rust to represent such types using the same names as the C structures, without interposing artificial field names that will confuse users of well-established interfaces from existing platforms.

This was originally proposed in a pre-RFC, but that proposal also included a more general mechanism for anonymous types for named fields; that more general mechanism proved controversial, and people rightfully observed that we should add such types to the type system in general, not just within structures. I would like to see a solution for that as well. But based on the discussion from that pre-RFC, in this RFC I've separated out and proposed a minimal change that allows us to represent this common FFI pattern in Rust.

For more information about unnamed fields in C, see the GCC documentation.

CC @retep998 (Windows APIs use this extensively), @zackw (siginfo_t and similar structures), @roblabla (Android APIs), @aturon

@joshtriplett joshtriplett added the T-lang Relevant to the language team, which will review and decide on the RFC. label Aug 9, 2017
@retep998
Copy link
Member

retep998 commented Aug 9, 2017

Possible alternative:

Use _ as the identifier of a field. While the type still has to be defined separately, the field will effectively not have a name. Under this alternative the following would be legal:

struct Foo { x: i32, y: i32 }
struct Bar { _: Foo, z: i32 }
fn main() {
    let bar = Bar { x: 1, y: 2, z: 3 };
    bar.x = bar.y * 2 + bar.z;
}

@joshtriplett
Copy link
Member Author

@retep998 There are two aspects to that, and I'd like to unpack them and discuss them separately: the use of _ as a field name, and the separate declaration of types.

First, the use of _ as a field name. This seems unusual to me, because you can currently legally use _ident as a field name, if you want to suppress warnings about unused fields. Using _ as the name is not currently legal, so this wouldn't break existing code, but attaching this very different semantic to it (compared to the difference between let _ = ... and let _x = ...) seems confusing to me.

Second, the separate declarations of types. Since the inner fields appear directly within the containing type, it seems to me an advantage to require declaring them inline within that containing type, so that you can easily find and reason about all of those fields in one place. Allowing a separate structure declaration, and then expanding the fields of that structure declaration into another structure, seems more likely to lead to confusion and action-at-a-distance. On top of that, it would require several additional semantic considerations. What happens if Foo is declared in a separate module? What if not all of its fields are public? And on top of all that, you'd have to invent a name for it, and the documentation would likely look less clear.

I can certainly document this as an alternative within the RFC. But could you explain whether this is an alternative you would prefer, and if so, why? Or are you primarily mentioning it for the sake of completeness, to make sure we've fully explored the design space?

@cramertj
Copy link
Member

cramertj commented Aug 9, 2017

RE _: Golang has anonymous struct fields, which seem very similar:

type Foo struct {
	x int
	y int
}

type Bar struct {
	Foo
	z int
}

Interestingly, Go does not consider it an error to have overlapping field names-- it just uses the top-most field with the given name.

@retep998
Copy link
Member

retep998 commented Aug 9, 2017

Well the big advantage that I see it having is the fact that it removes all questions about the behavior of types declared inline. Questions regarding visibility, attributes, rustdoc, trait implementations, and so on are rendered moot because the type is defined normally instead of inline.

Also, your RFC combines two concepts, anonymous fields and anonymous structs. It is entirely possible in C to have an anonymous inline struct with a named field, or to have a named inline struct with a named field, whereas this RFC only allows for anonymous inline structs with an anonymous field. My alternative touches on exclusively anonymous fields allowing it to be more flexible in the ways it can be used, and could be combined with a future proposal for anonymous structs/unions.

Furthermore, for someone actually using such a type, they would not be able to observe any difference between a struct declaring an anonymous field of an anonymous type and an anonymous field of a named type, unless they looked in rustdoc and saw that the field's type had a name. In either case they'd still be able to do foo.a instead of foo.x.a. Having to invent a name for the type of an anonymous field is an acceptable drawback to the simplicity and flexibility of my proposal in my opinion.

@RalfJung
Copy link
Member

RalfJung commented Aug 9, 2017

I am wondering if having to give a name to the field is really such a big problem that the entire language needs to get a new concept for the benefit of a really small fraction of the code written in Rust.

Not being able to see the structure of a type (as in, what overlaps and what does not) is a huge footgun; mostly the compiler should do enough tracking to prevent mistakes but I still wouldn't be surprised if, in unsafe code, people would end up writing to foo.b and then reading from foo.c. With the current unions and structs, at least all fields of one "level" always behave the same, making it much less likely to do mistakes like this.

Is there any situation besides C(++) FFI where using this new feature would be a good idea? I don't think so. This is a misfeature of C, in my opinion, and we should have really good reasons to copy that misfeature. Maybe FFI is a good enough reason, but even then I would prefer the RFC to explicitly say that we are not adding this because we think it is a good idea to have the feature on its own merrits.

Or maybe, actually, the feature has its own merrits, but then the RFC should state them.

@roblabla
Copy link

roblabla commented Aug 9, 2017

So my use-case is for Binder (the kerne API, header at https://github.com/torvalds/linux/blob/master/include/uapi/linux/android/binder.h). This includes both nested unnamed fields (which this proposal covers) and nested named fields.

An example of nested unnamed field :

struct somestruct {
        // A nested unnamed field
	union {
		binder_uintptr_t	pad_binder;
		__u32			fd;
	};
        // A nested named field
	union {
		__u32	handle;
		binder_uintptr_t ptr;
        } target;
};

Currently, when running bindgen, it will generate types like somestruct__bindgen_type_1 to hold the nested types. With this proposal, it could probably avoid generating those types (even in the nested named field case, with some prefix/suffix).

So big 👍 for me.

Regarding @retep998 's proposal, I don't think that's such a good idea as my main goal is actually to "hide" those (unnamed) structs from the documentation.

As for @RalfJung 's concern, I agree that this is a misfeature and probably shouldn't be used outside of providing FFI/mirroring an existing API. I guess we could make it a warning to use this feature without #[repr(C)] ?

@est31
Copy link
Member

est31 commented Aug 9, 2017

I agree with @RalfJung : this would be a bad feature for general Rust structs.

As FFI is one of Rust's goals though, some way of doing this for FFI is desirable. Therefore I agree with @roblabla and think we should restrict this feature to #[repr(C)] structs (with a compilation error if you use it on structs without that attribute, not a warning, nor an error by default lint), similarly to how variadics are restricted to extern "C" functions.

@arielb1
Copy link
Contributor

arielb1 commented Aug 10, 2017

@est31

There already exists "some way of doing this for FFI" - give each union field a name (if you're using bindgen, maybe have an annotation file to allow giving these fields a human-readable name) and use them as usual.

Variadics are basically glued-on to allow accessing the relevant C ABI.

@arielb1
Copy link
Contributor

arielb1 commented Aug 10, 2017

Rather than introducing unnamed fields, we could introduce a mechanism to define field aliases for a type, such that for struct S, s.b desugars to s.b_or_c.b. However, such a mechanism does not seem any simpler than unnamed fields, and would not align as well with the potential future introduction of full anonymous structure types.

That's basically inherent associated fields (#1546). We could have:

#[repr(C)]
pub union _Bindgen_somestruct_Field_0 {
    pad_binder: binder_uintptr_t,
    fd: u32
}

#[repr(C)]
pub union _Bindgen_somestruct_Field_target {
    handle: u32,
    ptr: binder_uintptr_t,
}

#[repr(C)]
pub struct somestruct {
    _Field_0: _Bindgen_somestruct_Field_0,
    target: _Bindgen_somestruct_Field_target
};

impl somestruct {
    unsafe pad_binder: self._Field_0.pad_binder;
    unsafe fd: self._Field_0.fd;
}

fn foo() {
    let somestruct = unsafe { somestruct {
        fd: 2,
        target
    }};
}

Anonymous types are just annoying because you can't put them in structs, can't impl them, etc.

@retep998
Copy link
Member

@roblabla It's a good thing we have #[doc(hidden)] so we can hide those types from being listed n rustdoc.

@joshtriplett
Copy link
Member Author

@retep998

Also, your RFC combines two concepts, anonymous fields and anonymous structs. It is entirely possible in C to have an anonymous inline struct with a named field, or to have a named inline struct with a named field, whereas this RFC only allows for anonymous inline structs with an anonymous field.

I've tried very intentionally to separate those two, focusing exclusively on one concept (unnamed fields), and not the other (anonymous types), by ensuring that the type and aggregate is never visible or accessible, and is used solely for structure layout and field naming. I intended that as the biggest reduction in scope from the pre-RFC to this RFC.

Standard C doesn't allow you to have a inline struct while naming the struct type; that's a Microsoft extension. See the mention of -fms-extensions in https://gcc.gnu.org/onlinedocs/gcc/Unnamed-Fields.html. You can have a named field with an anonymous type, and I do think we should support that at some point, but I'd like to handle that entirely separately if possible; that doesn't affect the ergonomics of all the other code using the resulting type, and we could theoretically support that with a macro.

I also feel like inlining a named structure's fields into another structure has more of an "action at a distance" feeling; you could even do that with a type in another module, which seems quite problematic. Inlining the declaration seems much clearer to me. That also produces clearer rustdoc results that reflect the resulting structure. We could teach rustdoc to inline the fields and their documentation, but that makes the documentation not match the code. I'd rather have the code match the underlying interface.

@joshtriplett
Copy link
Member Author

@RalfJung The primary value in this feature is for FFI, and I emphasized that in many places of the RFC. In particular:

Please note that most Rust code will want to use an enum to define types that contain a discriminant and various disjoint fields. The unnamed field mechanism here exist primarily for compatibility with interfaces defined by non-Rust languages, such as C. Types declared with this mechanism require unsafe code to access.
If it would help, I'd be happy to emphasize this even more emphatically.

@est31 Restricting this to repr(C) would still allow it to handle the FFI interfaces I intended it to handle. I do think there's value to providing more structure layout control for Rust data structures, but at the same time, I won't fight particularly hard for that unless there are people especially interested in using this for non-FFI. So, if there's a consensus that this would be more appealing if limited to repr(C), I'll make that change in the RFC.

@sfackler
Copy link
Member

I think we'd need to allow all non-Rust reprs rather than only for #[repr(C)] since you could have a packed C type, right?

@joshtriplett
Copy link
Member Author

@sfackler I think the intent was that you must have repr(C), and you might also have other reprs such as packed.

@joshtriplett
Copy link
Member Author

@arielb1

Anonymous types are just annoying because you can't put them in structs, can't impl them, etc.

To some extent, I intended that as a feature, for simplicity's sake. Don't think of this as an "anonymous type", think of this as a way of laying out the fields of a structure. I think this feature should be as simple as possible to handle the specific issue in question.

We may want to, later, have an "anonymous type mechanism". In the pre-RFC, that proved wildly controversial. I'd like to focus on the narrowest possible thing that solves the FFI issue here.

@joshtriplett
Copy link
Member Author

joshtriplett commented Aug 10, 2017

@RalfJung

I am wondering if having to give a name to the field is really such a big problem that the entire language needs to get a new concept for the benefit of a really small fraction of the code written in Rust.

One of Rust's major strengths is the exceptional support for interfacing to C; that's a critical part of the story for replacing C code with Rust code. This is a pattern that occurs pervasively in Windows APIs, POSIX APIs, Linux APIs, and many other C APIs. It's so common that C11 standardized it.

I also don't see this as a completely "new concept"; this is just a way of describing how the fields of a structure are laid out. The way C programmers think of this is that the uses of struct { ... } and union { ... } change the direction that fields are being laid out: if you think of memory addresses as going vertically, struct is vertical, and union is horizontal. So, there's a very clear mapping between this:

struct S {
    a: u32,
    union {
        b: u32,
        struct {
            c: u16,
            d: f16,
        },
        e: f32,
    },
    f: u64,
}

And this:

+-----------+ 0
|     a     |
+-----------+ 4
| b | c | e |
|   +---+   | 6
|   | d |   |
+-----------+ 8
|     f     |
+-----------+ 16

(In fact, I should probably put that diagram into the RFC.)

@est31
Copy link
Member

est31 commented Aug 10, 2017

@joshtriplett

Restricting this to repr(C) would still allow it to handle the FFI interfaces I intended it to handle.

That's why I was suggesting it!

I don't think this feature should be had outside of support for C.
Regarding repr(packed), is this feature actually compatible in combination with repr(packed)? I have thought repr(packed) is only useful for wrapper structs, aka ones with only one element.

@joshtriplett
Copy link
Member Author

@est31 No, repr(packed) makes sense for any kind of struct. You can use it to define a structure that looks like "byte, short, byte, packed into 4 bytes", which does occur in some real structures, as bad of an idea as it is. So, this does make sense for repr(C), or for repr(C,packed).

@zackw
Copy link
Contributor

zackw commented Aug 10, 2017

I don't have any immediate use for this feature other than FFI, but it is very much needed for FFI. Kernel-user interfaces on both Unix and Windows use the C equivalent extensively, which means something that provides this functionality is unavoidable. The absence of this feature is the reason I gave up on trying to get waitid support into nix, for instance. I tend to think that "you can't access these fields outside of an unsafe block" is enough of an obstacle to discourage people from using this feature unless they really need to.

I'm going to work through the process of mapping the OSX struct ip6_hdr (see netinet/ip6.h) (I happen to be typing this on a Mac, and it's relatively compact but also pretty durn complicated, and also it may actually be specified to be this way in an RFC) into the proposed Rust binding. Here's the C:

struct ip6_hdr {
        union {
                struct ip6_hdrctl {
                        u_int32_t ip6_un1_flow; /* 20 bits of flow-ID */
                        u_int16_t ip6_un1_plen; /* payload length */
                        u_int8_t  ip6_un1_nxt;  /* next header */
                        u_int8_t  ip6_un1_hlim; /* hop limit */
                } ip6_un1;
                u_int8_t ip6_un2_vfc;   /* 4 bits version, top 4 bits class */
        } ip6_ctlun;
        struct in6_addr ip6_src;        /* source address */
        struct in6_addr ip6_dst;        /* destination address */
} __attribute__((__packed__));

#define ip6_vfc         ip6_ctlun.ip6_un2_vfc
#define ip6_flow        ip6_ctlun.ip6_un1.ip6_un1_flow
#define ip6_plen        ip6_ctlun.ip6_un1.ip6_un1_plen
#define ip6_nxt         ip6_ctlun.ip6_un1.ip6_un1_nxt
#define ip6_hlim        ip6_ctlun.ip6_un1.ip6_un1_hlim
#define ip6_hops        ip6_ctlun.ip6_un1.ip6_un1_hlim

(Yes, this is what you usually see in a C system header when this feature is needed on the Rust side. Most of the interfaces in question, at least on Unix, were specified long before C11 added an anonymous-struct-field feature.)

(It is possible that you are supposed to be able to refer to the ip6_un1 and ip6_ctlun fields directly; they don't have implementation-namespace names. It is also possible that this header doesn't give a [bleep] about namespace cleanliness, which is sadly typical for BSD-lineage network headers. I'm going to assume the latter.)

#[repr(C,packed)]
struct ip6_hdr {
        union {
                struct {
                        ip6_flow : u32, // 20 bits of flow-ID
                        ip6_plen : u16, // payload length
                        ip6_nxt : u8,   // next header
                        ip6_hlim : u8,  // hop limit
                },
                ip6_vfc : u8,      // 4 bits version, top 4 bits class
        },
        ip6_src : in6_addr,        // source address
        ip6_dst : in6_addr,        // destination address
}

Nice and mechanical. I like it. 😉

This case answers the question "does this need to work with repr(packed)?" in the affirmative, incidentally.

@zackw
Copy link
Contributor

zackw commented Aug 10, 2017

Oh, also, this quick-and-dirty check...

$ grep -hEri '#define [a-z0-9_]+[         ]+[a-z_][a-z0-9_]*\.[a-z0-9_]+' /usr/include |
    awk '{print$2}' | cut -d. -f1 | sort -u | wc -l

... indicates that there are something like 120 different structures in the OSX system headers for which this feature would be useful. (Some of them may be kernel-internal.)

@zackw
Copy link
Contributor

zackw commented Aug 10, 2017

@joshtriplett Good diagram, do put it into the RFC. 😀

@zackw
Copy link
Contributor

zackw commented Aug 10, 2017

Hang on, does Rust have bitfields? Because that IPv6 header really ought to have been defined in C as

struct ip6_hdr {
    uint32_t ip6_ver    : 4;
    uint32_t ip6_tclass : 8;
    uint32_t ip6_flow   : 20;
    uint16_t ip6_plen;
    uint8_t  ip6_nxt;
    uint8_t  ip6_hlim;
    in6_addr ip6_src;
    in6_addr ip6_dst;
} __attribute__((__packed__));

and this isn't just academic: I am certain that there is C out there that expects to be able to write to both ip6_flow and ip6_vfc simultaneously as long as it keeps to the low 20 bits of ip6_flow. Which is sort of kosher in C(99+errata), but wouldn't be in Rust under this proposal.

I still want the feature proposed here, it's just I picked a bad example. 🤦‍♂️

@joshtriplett
Copy link
Member Author

@zackw Rust doesn't have bitfields yet; that's also being worked on.

Perhaps you could post, as an example, two different versions of siginfo_t instead? That'd be a compelling example.

@zackw
Copy link
Contributor

zackw commented Aug 10, 2017

@joshtriplett OSX siginfo_t is boring:

union sigval {
        /* Members as suggested by Annex C of POSIX 1003.1b. */
        int     sival_int;
        void    *sival_ptr;
};

typedef struct __siginfo {
        int     si_signo;               /* signal number */
        int     si_errno;               /* errno association */
        int     si_code;                /* signal code */
        pid_t   si_pid;                 /* sending process */
        uid_t   si_uid;                 /* sender's ruid */
        int     si_status;              /* exit value */
        void    *si_addr;               /* faulting instruction */
        union sigval si_value;          /* signal value */
        long    si_band;                /* band event for SIGPOLL */
        unsigned long   __pad[7];       /* Reserved for Future Use */
} siginfo_t;

The type union sigval is part of the standard-specified API - not an appropriate use for this feature. (Maybe appropriate for hand-smoothing-over in nix, but not for machine generation of bindings.) I'll see about getting Linux and Solaris editions later today or tomorrow morning.

@rfcbot rfcbot added the proposed-final-comment-period Currently awaiting signoff of all team members in order to enter the final comment period. label Feb 23, 2018
@clarfonthey
Copy link
Contributor

Unclear by the RFC, but is this allowed for enum variants? e.g. would:

pub enum Thing {
    Variant {
        a: u32,
        _: union {
            b: u32,
            c: f32,
        }
    }
}

be allowed?

I think it should, but the RFC doesn't clarify.

@retep998
Copy link
Member

@clarcharr I see no reason why that couldn't be allowed. However whether it should be allowed is another question, which can be resolved later. This feature is primarily useful for FFI, where Rust enums are very rarely used, so that specific case might not be worth supporting.

@clarfonthey
Copy link
Contributor

@retep998 Considering how #[repr(C)] and co. now work on enums with payloads, this might be helpful, although it probably depends.

@clarfonthey
Copy link
Contributor

Also consider the case of offloading variants' fields to another type: enum E { Variant { _: S } }.

That is very useful outside of FFI.

@joshtriplett
Copy link
Member Author

I'd like to avoid introducing them to enums right away, precisely because they're less likely to be useful for FFI, and there's no parallel to C there. I'm open to doing so, but my inclination would be to prioritize that lower than the initial minimal step.

@scottmcm scottmcm mentioned this pull request Mar 4, 2018
@rfcbot rfcbot added final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. and removed proposed-final-comment-period Currently awaiting signoff of all team members in order to enter the final comment period. labels Mar 30, 2018
@rfcbot
Copy link
Collaborator

rfcbot commented Mar 30, 2018

🔔 This is now entering its final comment period, as per the review above. 🔔

@bstrie
Copy link
Contributor

bstrie commented Apr 4, 2018

I like that this proposal is confined to repr(C), because while it's conceptually and syntactically pretty unfortunate it's also one of those eat-your-vegetables things that we need to serve FFI purposes. I do want to make sure that we don't see this as being a stepping stone to more general support elsewhere though (IOW that the comparison with variadic argument lists remains apt), because I'd really hate to deal with this sort of complication in Rust code in general.

@rfcbot
Copy link
Collaborator

rfcbot commented Apr 9, 2018

The final comment period is now complete.

@Centril Centril merged commit 7ebf9f6 into rust-lang:master Apr 9, 2018
@Centril
Copy link
Contributor

Centril commented Apr 9, 2018

Huzzah! 🎉 This RFC has been merged!

Tracking issue: rust-lang/rust#49804

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-data-types RFCs about data-types A-ffi FFI related proposals. A-syntax Syntax related proposals & ideas A-unions Proposals relating to union items final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. T-lang Relevant to the language team, which will review and decide on the RFC.
Projects
None yet
Development

Successfully merging this pull request may close these issues.