Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow arbitrary enums to have explicit discriminants #2363

Merged
merged 3 commits into from
May 5, 2019

Conversation

nox
Copy link
Contributor

@nox nox commented Mar 16, 2018

Summary

This RFC gives users a way to control the discriminants of variants of all enumerations, not just the ones that are shaped like C-like enums (i.e. where all the variants have no fields).

Thanks

Thanks to Mazdak Farrokhzad (@Centril) and Simon Sapin (@SimonSapin) for the reviews, and my local bakery for their delicious baguettes. 🥖🥖


Rendered
Tracking issue

@Centril Centril added the T-lang Relevant to the language team, which will review and decide on the RFC. label Mar 16, 2018
This introduces one more knob to the representation of enumerations.

# Rationale and alternatives
[alternatives]: #alternatives
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you have encountered the XY Problem. You are finding a solution to the problem that different enums don't give guarantees about their variant indices, when what you originally wanted was converting enums whose variants have fields into enums whose variants have no fields.

So here's my alternative:

Allow annotating enums with #[discriminant(OtherEnum)], where OtherEnum must have a variant for each variant in the annotated enum (with matching name). The annotated enum will then take all its discriminant values from OtherEnum.

as casts could then convert from annotated enums to their discriminant enum.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems more convoluted, IMO. Definitely harder to implement, at least.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, but it has a path forward for having this operation in safe code.

Alternatively a procedural macro could generate said conversions with this RFC. So the compiler magic might not be necessary.

Copy link
Contributor Author

@nox nox Mar 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be great and nice, if only real world allowed me to do that.

I simplified my use case for the sake of the RFC: PropertyDeclaration actually has more variants than LonghandId, thanks to custom CSS properties (and some other stuff that I don't need to mention here):

#[repr(u16)]
enum PropertyDeclaration {
    Color(Color),
    Height(Length),
    InlineSize(Length),
    TransformOrigin(TransformOrigin),
    Custom(CustomDeclaration),
}

struct CustomDeclaration {
    name: Name,
    value: Value,
}

pub enum PropertyDeclarationId<'a> {
    Longhand(LonghandId),
    Custom(&'a Name),
}

impl PropertyDeclaration {
    pub fn id(&self) -> PropertyDeclarationId {
        if let PropertyDeclaration::Custom(ref declaration) =  *self {
            return PropertyDeclarationId::Custom(&declaration.name);
        }
        let id = unsafe { *(self as *const _ as *const LonghandId) };
        PropertyDeclarationId::Longhand(id)
    }
}

In general, I think this alternative is too specific to the exact use case described in the RFC, and cannot fulfil more intricate ones like the actual stuff I require in Servo, or use cases I could imagine for this feature with FFI, @gankro may have an opinion on that regard here.

Edit: I'll edit this RFC to include ite nonetheless.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mention in the RFC that rust is generating a 4KB jump table for the required match expression - this seems excessive, even if the discriminants don't match exactly. How many variants does this enum have? Maybe part of the solution should be improving the code that rust generates in this case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Diggsey There are >320 variants, so far. I welcome any rustc improvement but they won't be able to remove all jump tables if the discriminants of the common variants to AnimationValue and PropertyDeclaration don't coincide.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Diggsey Also if you are curious, here is the PR where PropertyDeclaration::id was slimmed down from 4KB to mere 96 bytes, and there is the PR where I simplified AnimationValue::id.

@burdges
Copy link

burdges commented Mar 17, 2018

It'd be cool if this worked with ATCs to permit defining both discriminant and enum together:

trail Fields { type Id<T>; }
struct YesFields {}
impl Fields for YesFields { type It<T> = T; }
struct NoFields {}
impl Fields for NoFields { type It<T> = (); }

enum MyEnumInner<F: Fields> {
    Variant1(F::Id::<u64>) = MyDiscriminant::Variant1(()),
    Variant2(F::Id::<String>) = MyDiscriminant::Variant2(()),
    ...
}

pub type MyEnum = MyEnumInner<YesFields>;
pub type MyDiscriminant = MyEnumInner<NoFields>;

We've a circular definition above, but rustc could presumably break this cycle.

@nox
Copy link
Contributor Author

nox commented Mar 17, 2018

@burdges That seems unrelated to the RFC described in this pull request.

@mark-i-m
Copy link
Member

I can totally sympathize with this RFC. I have often wanted to go from an enum with fields to a value of another type.

My biggest two gripes against this RFC are

  1. The casting here is annoying: LonghandId::Color as u16,
  2. This really shouldn't require any unsafe code at all: unsafe { *(self as *const Self as *const LonghandId) }

In that light I propose the following variant/alternative:

// same as before
#[derive(Clone, Copy)]
#[repr(u16)]
enum LonghandId {
    Color,
    Height,
    InlineSize,
    TransformOrigin,
}

// explicitly mention that the given type is the descriminant (it can also be a struct!), 
// but the compiler needs to be able to prove at compile time that two values of the
// descriminant type are actually different. Note: different means different bit patterns, 
// not different in the `Eq` sense.
#[repr(LonghandId)]
enum AnimationValue {
    Color(Color) = LonghandId::Color, // no cast
    Height(Length) = LonghandId::Height, // also each descriminant assignment must be unique!
    InlineSize(Void) = LonghandId::InlineSize,
    TransformOrigin(TransformOrigin) = LonghandId::TransformOrigin,
}

impl AnimationValue {
    fn id(&self) -> LonghandId {
        (*self) as LonghandId // compiler can just return the descriminant! Completely safe!
    }
}

@burdges
Copy link

burdges commented Mar 20, 2018

We could support discriminants with their own fields too, making this these enums into extensions. I suppose #[extends(LonghandId)] would be an alternative to #[repr(LonghandId)], but certainly #[repr(..)] makes sense.

@nox
Copy link
Contributor Author

nox commented Mar 20, 2018

Please, I have already addressed that this doesn't fit my use case in #2363 (comment) and the RFC already includes such an encoding as an alternative, except with #[discriminant(Foo)] instead of
#[repr(Foo)].

Removing the need for unsafe code for the cast is a nice thought, but again it is unrelated to this RFC.

@mark-i-m
Copy link
Member

@nox Sorry if this is a stupid question, but I still don't understand how the two proposals are not functionally equivalent. It seems like everything you can do in one proposal, you can do in the other. What am I missing?

@RalfJung
Copy link
Member

RalfJung commented Mar 21, 2018

I assume this is restricted to enums with a non-Rust repr? Otherwise, doesn't the part that says "if no set explicitly, start with 0 and count up" clash with the niche-filling enum optimizations?

@SimonSapin
Copy link
Contributor

@RalfJung I had the same reaction. nox and eddyb explained that this RFC controls the discriminant of each variant, but this is a separate concept from the tag that may or may not be part of the memory representation. #[repr] forces a tag to be present, containing the discriminant.

But yes, since mem::Discriminant is opaque this new syntax is only useful when also used with #[repr].

@nox
Copy link
Contributor Author

nox commented Mar 21, 2018

@mark-i-m #2363 (comment) One of the enums with fields has more variants than LonghandId. Your (and @oli-obk's) alternative is a strict subset of what is allowed by this RFC.

@RalfJung
Copy link
Member

@SimonSapin I am not sure I understand, but if the feature is not useful for repr(Rust), maybe it'd be better to not allow it for now?

@nox
Copy link
Contributor Author

nox commented Mar 21, 2018

@RalfJung I listed it in the unresolved questions. I could imagine rustc managing to use niche-filling optimisations for 2 different variants with an inner enum, if it could see that they use disjoint sets of discriminants and that there is an easy way to compute the discriminant from the niche value. This is quite theoretical though so I don't mind mandating a #[repr].

@Evrey
Copy link

Evrey commented Mar 21, 2018

I realy like the Variant(...) = ID, syntax. It keeps enums compact and prevents error-prone double typing.

With #[discriminant(X)] you'd have a bunch of problems and questions:

  • Is X truely just a numeric C-style enum?
  • Is X an enum at all?
  • Does X have the same #[repr({integer})]?
  • Does X have the very same number of variants?
  • Do Xs variants have the exact same names?
  • Do Xs variants have the exact same order?
  • Aaand having an additional X more than doubles your lines of code by adding redundant bloat.

Also, #[discriminant = {expr}] doubles your lines of code, or triples with readability line breaks, while having a more "magic"-ish syntax than just using the well-known Variant = {expr},.


There are two problems left for my taste:

  1. How does one name those explicit discriminants?
  2. How does one get those discriminants?

Now, about the first point... I especially like the Variant(...) = {expr}, proposal, because it looks just like C-style enums with Variant = {expr},. We do not want to match on magic numbers, especially not for huge enums, which is why Servo has those MyEnum and MyEnumId redundancies. It is much more understandable to match on MyEnumId::X than writing 0x1023. But wait, MyEnumId::X is how you get the discriminant of a C-style enum! So, why not allow this?

#[repr(u8)]
enum X {
    A(u32) = 0,
    B(f32) = 1,
}

assert!(X::A(42).id() == X::A);
assert!(0_u8 == X::A);

This could, however, mean trouble if type A = X::A is a thing. (It isn't.) Whatever. This way you do not need a second Id enum at all, while also not having to learn new syntax.

So... how do we get that discriminant? Writing a pointer-converting boiler plate function called fn id() all the time is no fun, especially for new magic syntax that is supposed to reduce the amount of boiler plate code. We already already have core::mem::discriminant, but this one is of not much help here. You can only use this for equality checks, it takes tremendous 64 bits, always, no matter the #[repr({integer})], writing mem::discriminant(X::A) == mem::discriminant(x) is painfully verbose, and you cannot really get the integer-representation out of it for e.g. FFI or RPC.

I'm not really up to date on the #[repr({integer})]-on-enums thing, but what about making this attribute implementing a const fn id_{integer}(&self) ->{integer} on the enum? Or implement an EnumRepr<{integer}> trait on it or whatever. Okay, the name may be different. fn id() alone might clash too much with common function names, fn discriminant() is so verbose that you might as well just impl Into<{integer}> on core::mem::Discriminant.

An alternative would be to treat enums logically as a (Discriminant, UnionOfVariants), or C-style enums as just (Discriminant). Then, instead of having to decide on a magic function name or trait names, we write this: X::A(42).0 == X::A. This has quite some downsides, however, like what does X::A(42).1 yield, do we want that, is it safe, what about non-#[repr({integer})]-enums, especially for those with highly optimised layouts like Option<bool>, etc.


Edit: Just noticed this:

  • under the default representation, the specified discriminants are interpreted as isize;

Under the default representation, there might be heavy layout optimisations at work merging discriminants of nested enums into a completely different numeric set. In addition to that core::mem::Discriminant chose u64 in its opaque implementation. I.e., applying this syntax to default representation enums is either nonsensical or must block layout optimisations. The exact numeric discriminant type is debatable.

@nox
Copy link
Contributor Author

nox commented Mar 21, 2018

@Evrey I'll add a commit listing all your arguments against #[discriminant], thanks!


How does one name those explicit discriminants?

We can't interpret X::A as some sort of magical ID enum, because X::A is already a function of type u32 -> X in your sample. Note that in Servo we also store the LonghandId enum in various places, without any PropertyDeclaration or AnimationValue anywhere. Please also note that as I mentioned multiple times already, PropertyDeclaration has more variants than LonghandId, so we can't directly tie the two in any way. So even if there was some magical way to say Enum::Discriminant to refer to that enum's discriminant type, we would still have a LonghandId enum in Servo.

How does one get those discriminants?

I agree that a way to safely get the u8 discriminant of an #[repr(u8)] enum value would be useful, but that's orthogonal to that RFC: after all #[repr(u8)] and #[repr(C, u8)] on enums landed without such an API.

@mark-i-m
Copy link
Member

I see... but I still have the same two annoyances against the current form:

  • The casting here is annoying: LonghandId::Color as u16
  • This really shouldn't require any unsafe code at all: unsafe { *(self as *const Self as *const LonghandId) }

Especially the second one seems like an essential part of any reasonable solution IMHO...

@SimonSapin
Copy link
Contributor

@mark-i-m I think that these two points are valid independently of this RFC. (Re unsafe code, see RFC 2195 in particular.) Improvements there can be proposed separately without necessarily blocking this RFC.

@nox
Copy link
Contributor Author

nox commented Mar 22, 2018

The casting here is annoying: LonghandId::Color as u16

This really shouldn't require any unsafe code at all: unsafe { *(self as *const Self as *const LonghandId) }

Given what I said about PropertyDeclaration having more variants than LonghandId, the casting cannot be avoided in my particular use case. Do you see why?

@main--
Copy link

main-- commented Mar 22, 2018

This is unfortunately quite painful to use, given now all methods matching against AnimationValue need to have dummy arms for all of these variants:

Note that this is #1872. Feels like a quite elegant workaround to me - the unsafe cast asserts that the ! variants are dead and then you can just match without thinking about them.

@nox
Copy link
Contributor Author

nox commented Mar 22, 2018

@main-- I know about this RFC, but it has been postponed and I don't think I should have to rely on that instead of just not having ! variants.

@main--
Copy link

main-- commented Mar 22, 2018

@nox I mean, you're essentially trying to come up with a way to define Rust enums with gaps in their discriminator space - this RFC proposes to explicitly specify the remaining ones whereas I suggested to list the missing ones. It's fundamentally the same thing, depending on the usecase either one may be more convenient. Yet I favor the ! solution as it doesn't add new syntax and complexity to enums but instead simply relies on a fix to match behavior that should have happened a long time ago (in my opinion). It just feels like a much more generic approach to me that can help with this problem and several others too.

@nox
Copy link
Contributor Author

nox commented Mar 22, 2018

Having to specifically define variants that are uninhabited for no other reason than to make 2 discriminants in 2 different enums coincide is a damn workaround that I will not qualify as "fundamentally the same thing" as being able to omit said variants. Why would I pollute my mind and my type definition with variants that I won't use?

@main--
Copy link

main-- commented Mar 22, 2018

I guess if most of your variants are dead it makes considerably less sense than, say, one of thirty. The thing is - comparing a language feature tailored specifically to this problem to a general mechanism that the language already offers (minus a papercut) just isn't fair, hence my reluctance to call it a workaround.

My point is that the alternative is not simply omitting the dead variants. Instead, you have to manually specify discriminants. When I say it's the same thing what I mean is that we're talking about two different ways to specify enum discriminants (same outcome). Your proposal introduces a new language feature which is clearly an improvement over the (mostly) existing workaround, I merely doubt that it's enough of an improvement to justify the complexity.

@nox
Copy link
Contributor Author

nox commented Mar 22, 2018

This is a very small addition to the language, semantics-wise and implementation-wise. As far as I remember discussions with @eddyb, this will be easily implementable.

@SimonSapin
Copy link
Contributor

I don’t believe added complexity is significant here, the same syntax already exists on field-less variants.

@mark-i-m
Copy link
Member

I went back and re-read the RFC and I think my objections are more specific about the example. The actual proposal is just about allowing Variant(..) = Disc, which I think is quite reasonable. I don't think y it's the best solution to the give example use case, but I'm not opposed to the rfc on its own merit...

@SimonSapin
Copy link
Contributor

(Aside: there’s a bunch of things in Stylo that are objectionable ;) (I say this as the author of many of these things.) A lot of it can be improved but the devil is in the corner cases, and better solutions are often not as easy as they first seem.)

@clarfonthey
Copy link

I may be not reading correctly, but does this work for repr(Rust) enums too? And would:

enum A {
   B = 0,
}

have size zero or one? Also, would:

enum B {
    B = 1,
    C(!) = 256,
} 

have size 0 or 2?

(These are super esoteric examples but it seems like a valid thing to ask considering how imho these don't make sense, but would be accepted.)

@eddyb
Copy link
Member

eddyb commented Apr 12, 2019

@pnkfelix I'd say I prefer 0 => Bread(BreadKind), but I agree with your conclusion.

@joshtriplett
Copy link
Member

joshtriplett commented Apr 12, 2019 via email

@nox
Copy link
Contributor Author

nox commented Apr 12, 2019

I don't like this syntax at all.

It's completely at odds with C-like enums, which means that if you currently have a C-like enum and then add a field to one of its variants, you must rewrite the entire type definition.

It also puts more emphasis on the discriminant rather than the damn variant, even though the discriminants most probably aren't the most important part of the enum. It's certainly not the most important part of my use case.

That syntax also makes the example from the RFC horrible:

#[repr(u16)]
enum AnimationValue {
    LonghandId::Color as u16 => Color(Color),
    LonghandId::Height as u16 => Height(Length),
    LonghandId::TransformOrigin as u16 => TransformOrigin(TransformOrigin),
}

It may also be a problem to macro authors, who may want to be passed an enum definition and emit explicit discriminants for them, and now they must first check whether any of the variants have fields.

Edit: But I do like a lot that @pnkfelix named his example ParisianSandwichIngredient.

@joshtriplett
Copy link
Member

joshtriplett commented Apr 12, 2019 via email

@nox
Copy link
Contributor Author

nox commented Apr 12, 2019

I'd taken it as implicit that the same syntax would be permitted for C-like enums.

Oh I see. That part is fine then. I still don't think it makes for a readable type definition in presence of complicated discriminant expressions.

@joshtriplett
Copy link
Member

joshtriplett commented Apr 12, 2019 via email

@burdges
Copy link

burdges commented Apr 12, 2019

It's easier to make the other form work with { } although maybe this just adds confusion.

enum ParisianSandwichIngredient {
    Bread = 0  { bread_kind: BreadKind },
    Ham = 1 { ham_kind: HamKind },
    Butter(ButterKind) = 2,
}

@withoutboats
Copy link
Contributor

I'd prefer not to invent a new syntax for this feature and just keep with the obvious extension to what we have, even if its not perfect.

@eddyb
Copy link
Member

eddyb commented Apr 13, 2019

Yeah, I was only replying to @pnkfelix, but I'm not proposing/in favor of a syntax different than the one in this RFC.

@SimonSapin SimonSapin mentioned this pull request Apr 15, 2019
@rfcbot
Copy link
Collaborator

rfcbot commented Apr 24, 2019

🔔 This is now entering its final comment period, as per the review above. 🔔

@rfcbot rfcbot added final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. and removed proposed-final-comment-period Currently awaiting signoff of all team members in order to enter the final comment period. labels Apr 24, 2019
@rfcbot rfcbot added the finished-final-comment-period The final comment period is finished for this RFC. label May 4, 2019
@rfcbot
Copy link
Collaborator

rfcbot commented May 4, 2019

The final comment period, with a disposition to merge, as per the review above, is now complete.

As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.

The RFC will be merged soon.

@rfcbot rfcbot removed the final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. label May 4, 2019
@Centril Centril merged commit cf7581f into rust-lang:master May 5, 2019
@Centril
Copy link
Contributor

Centril commented May 5, 2019

Huzzah! This RFC has been merged!

Tracking issue: rust-lang/rust#60553

@nox nox deleted the arbitrary-enum-discriminant branch May 5, 2019 13:31
@ghost
Copy link

ghost commented May 10, 2019

Just a few thoughs. The RFC focuses to solve some issue with the Servo design. I like the basic idea, but not the part where discriminate is a different type. If they have enums for anim, color, etc features why arent they different enums? And have a separate enum that combines all of them. I think its a design flaw in servo that shall not be fixed by an RFC.
Once it is "fixed" there is no need to have a separate type for discriminant, and hence turning enum into id can be something built in.

@nox
Copy link
Contributor Author

nox commented May 10, 2019

And have a separate enum that combines all of them. I think its a design flaw in servo that shall not be fixed by an RFC.

It is not.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-discriminant Discriminant related proposals & ideas A-enum Enum related proposals & ideas A-repr #[repr(...)] related proposals & ideas A-syntax Syntax related proposals & ideas disposition-merge This RFC is in PFCP or FCP with a disposition to merge it. finished-final-comment-period The final comment period is finished for this RFC. T-lang Relevant to the language team, which will review and decide on the RFC.
Projects
None yet
Development

Successfully merging this pull request may close these issues.