Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement a 'discriminant_value' intrinsic #639

Merged
merged 3 commits into from
Apr 10, 2015

Conversation

Aatch
Copy link
Contributor

@Aatch Aatch commented Jan 21, 2015

Implement an intrinsic for extracting the discriminant from a value.


Rendered


# Alternatives

* More strongly specify the values returned. This would allow for a broader range of uses, but
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also return an opaque type that implements PartialOrd, Ord, PartialEq and Eq, but doesn't expose the raw discriminant value. This gives us flexibility with the implementation, and avoids some pitfalls with returning an explicit u64, for example, what happens if the enum is #[repr(i64)]?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the return value is unrelated to ordering, it shouldn't implement (Partial)Ord. But it should implement Hash.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is in fact related to ordering of variants of the enum: rust-lang/rust#15523.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean the RFC explicitly says "Does not allow for the value to be used as part of ordering.". Perhaps require impl<T: PartialOrd> PartialOrd for DiscriminantValue<T>. (That ordering is wrong if the enum uses a custom ordering though.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah. @Aatch why couldn't the discriminant be usable in part of an Ord impl?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the RFC is currently written, the only guarantee you have is that two values of the same type produce the same value, iff the variants are the same. While you could implement a ordering with this, I deliberately restricted such that the ordering would be effectively unspecified. The restriction could be lifted, but it would require a more rigorous specification of discriminant values.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we were to take the route of specifying the ordering respecting the ordering of definition or possibly the ordering of the values of C-like enums, could you elaborate on some of the cons of that strategy? e.g. do you know of some optimizations that may block us from in the future?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 impl<T: PartialOrd> PartialOrd for DiscriminantValue<T>. Added benefit of not being able to mix discriminant values of different types.


# Unresolved questions

* Should `#[derive]` use this intrinsic to improve derived implementations of traits? While
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems pretty crucial that this is at least used by deriving, and preferably exposed with a #[stable] wrapper function. If it's not used inside of the stdlib and not stable, it seems kind of useless.

@pyfisch
Copy link
Contributor

pyfisch commented Jan 25, 2015

I try to write a language tag library in Rust, that uses big (8000 variants) enums to store all languages.
https://github.com/pyfisch/rust-language-tags The problem is that enums do not provide efficient ordering and comparison. It would be great if this intrinsic could be used to provide ordering and the efficient ordering would be added to std. Currently it takes really long to compile.

@Ericson2314
Copy link
Contributor

I think adding this a good idea whether or not the compile gets smarter. That said, has anyone tried telling LLVM that other possible discriminant values are all unreachable?

@taralx
Copy link

taralx commented Jan 28, 2015

Can someone explain why an intrinsic is preferable to a compiler-generated trait implementation?

@pnkfelix
Copy link
Member

@taralx assuming you mean a compiler-generated match that has an arm for each discriminant value, the main objection there (that I am aware of) is a mix of run-time and/or compile-time efficiency: You're either going to have that method call compiled out-of-line (slower than an intrinsic), or you will inline the method call, which will then cause code size blow up (with all of its associated effects), or will cause compile-time to become even slower as the LLVM optimizer struggles to recover the implementation that we could have just directly emitted (namely: extracting the discriminant value that is already directly embedded in the enum value).

@taralx
Copy link

taralx commented Jan 28, 2015

No, not a generated match. I mean an actual trait:

#[lang="enumtrait"]
pub trait Enum {
  pub fn discriminant(&self) -> usize
}

This would allow type-level assertions of the form "T: Enum" as well.

@rkjnsn
Copy link
Contributor

rkjnsn commented Jan 28, 2015

@taralx I think using an opaque return type, e.g., pub fn discriminant(&self) -> DiscriminantValue<Self>, would make more sense here, as it would allow uses of the discriminant to be statically limited to comparing for equality with another discriminant of the same type (and possibly some form of ordering, if that turns out to be desirable).

I do think it would beneficial to have such a trait (or possibly a free function) that can potentially be stabilized and called from safe code, as determining whether two enums of the same type are also of the same variant doesn't seem like it should require unsafe. A safe trait or function could certainly be implemented in the library using the intrinsic outlined in this RFC, but I think the intrinsic should be more of an implementation detail at that point, rather than something that would be used directly by a library like postgres.

@kennytm
Copy link
Member

kennytm commented Jan 28, 2015

@taralx That trait method can be built on top of the intrinsic, just like how the TypeId struct is based on std::intrinsics::type_id.

@taralx
Copy link

taralx commented Jan 28, 2015

Would it help to start a separate RFC for it?

@Aatch
Copy link
Contributor Author

Aatch commented Jan 29, 2015

I'll probably update this RFC soon to address some of the points that have come up. One important question I'd like more feedback on is how well-specified the value returned from the intrinsic is. I went for the most conservative/least specified version, but I have no problem specifying it more if people think it is reasonable.

@sfackler
Copy link
Member

I think we can at least specify that the discriminant ordering can be relied on for use in Ord impls. If you're going to keep the return value as a u64, I'd like to make sure that we have a good way of handling #[repr(i64)] enums. Might be best to return an opaque type instead.

@Aatch
Copy link
Contributor Author

Aatch commented Jan 29, 2015

@sfackler I'm not sure how an opaque type would work though. How do you do anything with a type you know nothing about? It's not difficult to just cast from a u64 to an i64 if you need signed comparison.

This intrinsic is intended primarily for proper ADT enums, where you can't control the discriminant values anyway. C-Like enums can be casted directly to get their value.

@sfackler
Copy link
Member

@Aatch it would implement Ord and Eq, but that's it. By "opaque", I just mean "hides the exact representation of the discriminant".

@taralx
Copy link

taralx commented Jan 30, 2015

I was hoping to use this to implement EnumSet using BitvSet. Hence the trait suggestion.

With a trait you could make the type of the discriminant associated...

@pnkfelix
Copy link
Member

pnkfelix commented Feb 5, 2015

@Aatch are there other revisions you're planning on making in the near future? (I haven't read the discussion carefully yet.)

@aturon
Copy link
Member

aturon commented Mar 5, 2015

ping @pnkfelix

@pnkfelix
Copy link
Member

pnkfelix commented Mar 5, 2015

I have now read the discussion.

In principle I would prefer for the RFC to actually document some of the concrete trait-based alternatives that have been laid out over the course of the discussion (though I personally agree with @Aatch that such alternatives could be written atop the intrinisic as specified).

@pnkfelix
Copy link
Member

pnkfelix commented Mar 5, 2015

However, having said that, I think the RFC is actually perfectly fine as written, and if presented the choice between merging it as-is versus not merging it at all, I would choose the former.

So I think we should merge it. A motivated individual can always submit a PR documenting the alternatives, if that is of interest.

@aturon
Copy link
Member

aturon commented Mar 5, 2015

@pnkfelix

However, having said that, I think the RFC is actually perfectly fine as written, and if presented the choice between merging it as-is versus not merging it at all, I would choose the former.

So I think we should merge it. A motivated individual can always submit a PR documenting the alternatives, if that is of interest.

Thanks! Given how long it's taken to move on this RFC, I don't think we should ask the author to do further revisions of that kind before we merge.

@nikomatsakis
Copy link
Contributor

Sorry, I'm late to the party. I am basically in favor of this RFC, but I do want to note that the intrinsic as defined introduces some forwards-compatibility hazards. In particular, reordering variants can potentially break code even for enums that don't implement PartialOrd. I assume that guaranteeing relative ordering is needed precisely so that we can use when deriving PartialOrd -- are there other reasons?

That said, I feel good about this fn since it is going to be unsafe and in the documentation we can spell out valid and invalid use cases -- basically, using this on enums that you don't control is a Bad Idea and may introduce hazards if the maintainers of those enums make changes.

(On a related note, it is not clear to me whether one should consider PartialOrd to be preserved across minor version updates.)

Regarding staging, I think we should follow the policy we've been mostly doing for all RFCs. That is, land the new feature unstable, and make an express decision by core team to move it to stable. I of course agree with @sfackler that it'd be nice to make this available to deriving sooner rather than later though.

@brson
Copy link
Contributor

brson commented Mar 11, 2015

This seems like a useful intrinsic but I don't find the motivating example in the RFC to be very convincing - replacing the obvious safe code with an unsafe intrinsic. That is such a simple piece of code surely there must be a way to write it safely that is performant.

@sfackler
Copy link
Member

How is the intrinsic unsafe (other than the fact that all intrinsics are tagged as unsafe)? All it does is return a u64.

In any case, "surely there must be a better way" is not a very convincing counter-example, IMO. Comparing disciminants seems to me at least like the obvious way to make this kind of function performant.

@lilyball
Copy link
Contributor

I'm in favor of this RFC, and I'm in favor of updating #[deriving] to use it. Since #[derive] is compiler-generated, there's no worry about the intrinsic changing out from under it, and if we decide to expose a public safe wrapper then #[derive] can easily transition to that.

That is such a simple piece of code surely there must be a way to write it safely that is performant.

You mean like using an intrinsic? The intrinsic seems to actually be completely safe to call on all inputs, it just requires unsafe because all intrinsics do. It would be trivial to expose a safe wrapper that merely calls the intrinsic (but I don't think that's necessary to move forward with this RFC, or required for #[deriving] to use it).

The only other approach I can think of is teaching Rust to try to optimize match statements such that it can detect when a bunch of match arms are equivalent to simply comparing the discriminant, but that seems like a lot of work that's only necessary because we aren't allowing ourselves to look at the discriminant, and it's work that would negatively impact compilation time for probably no benefit at all on code outside of #[deriving].


As for u64 vs opaque type, this is an intrinsic we're talking about here. I think it's fine to let it return u64 (and in the case of #[repr(i64)] it would just be the u64 with the same bit pattern as the i64 value). If we expose a safe wrapper function, then it's worth considering the opaque type idea. But trying to have an unsafe intrinsic return a safe opaque type seems pointless.

@sfackler
Copy link
Member

If we have it return a u64 with the same bit-pattern as the i64 discriminant in the #[repr(i64)] case seems like it'll complicate things a lot. How would #[derive(PartialOrd)] work in that case? It would need to have a special case for i64 discriminants but I'm not sure how it would even know if it's in that case or not.

@sfackler
Copy link
Member

Another option would be to have the i64 case shift the discriminant up by 2^63 which would preserve ordering but not the raw value.

EDIT: same problem for all signed discriminant types, really.

@mahkoh
Copy link
Contributor

mahkoh commented Mar 18, 2015

Why not just use (sign: i8, abs: u64)?

@sfackler
Copy link
Member

Oh, yeah, that'd be a decent way to deal with it.

@mahkoh
Copy link
Contributor

mahkoh commented Mar 18, 2015

Not quite. If sign is negative, then abs has to be u64::MAX - abs to preserve the ordering.

@lilyball
Copy link
Contributor

Given that this RFC includes PartialOrd I think @sfackler is right, that using u64 is problematic. It's too bad we can't just use an LLVM i65.

In the meantime, how about (value: u64, sign: bool). For unsigned types the value is obviously the value (and the sign is false). For signed types, the value is the signed value with the sign bit set to 0 (and the sign is the sign bit). Since we use 2's-complement integers, this preserves the ordering.

@lilyball
Copy link
Contributor

On the "alternative ideas" front, if we're ok with having the compiler define traits for us, we could have the trait Enum idea, but give it an associated type Repr that is the numeric type of the discriminant. That way we'd have something like

#[lang="enumtrait"]
pub trait Enum {
    type Repr;
    fn discriminant(&self) -> Self::Repr;
}

This way we don't have to worry about signed vs unsigned, and code can compare discriminant values correctly.

We could also then consider doing things like making this unsafe instead and adding a second trait for C-like enums that declares the safe version:

#[lang="enumtrait"]
pub trait Enum {
    type Repr;
    unsafe fn unsafe_discriminant(&self) -> Self::Repr;
}

#[lang="clikeenumtrait"]
pub trait CLikeEnum: Enum {
    fn discriminant(&self) -> Self::Repr;
    fn from_discriminant(value: Self::Repr) -> Option<Self>;
}

That way #[derive] can use unsafe_discriminant() in order to efficiently implement comparison on enums like the example one that has a single variant with an associated item and a lot of variants without. And user code can use the methods of CLikeEnum to do conversions between enums and their repr.


All that said, I think having an intrinsic is a good first step. And if we're planning on writing a safe wrapper around it, then it doesn't really matter very much what the actual return type is.

@glaebhoerl
Copy link
Contributor

I'd like to have at least either the unsafe requirement or a trait bound requirement (e.g. @kballard's Enum). I don't like the potential this has for abstraction violations otherwise, when used on abstract types or (not explicitly Enum-bounded) type parameters. (I guess abstract types can't currently be enums, and I know any parametricity guarantees we might have had are already shot due to Any, but it still doesn't seem like a good idea to encourage this kind of thing.)

@nikomatsakis
Copy link
Contributor

@glaebhoerl a good point, for some reason it took me a bit to recognize what you were saying, but it seems obvious now. To spell it out:

fn foo<T>(x: &T, y: &T) { ... }

is able to do a kind of "partial equality" comparison on x and y in terms of determining if they are the same variant or not. Perhaps the Reflect bound would be suitable here?

@glaebhoerl
Copy link
Contributor

@nikomatsakis Yes, that might be a good fit. :)

@pnkfelix
Copy link
Member

pnkfelix commented Apr 2, 2015

@nikomatsakis at this point I would like to see us move forward on this in some fashion, if only via a feature-gated intrinsic, because that would be very helpful in resolving rust-lang/rust#15523

@pnkfelix
Copy link
Member

reassigning to @nikomatsakis so that he remembers to update its state (or merge it outright).

@nikomatsakis
Copy link
Contributor

This RFC is provisionally accepted. I've included some unresolved concerns at the end of the RFC. These can be worked out over the course of implementation. Thanks all.

@Centril Centril added A-discriminant Discriminant related proposals & ideas A-intrinsic Proposals relating to intrinsics. labels Nov 23, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-discriminant Discriminant related proposals & ideas A-intrinsic Proposals relating to intrinsics.
Projects
None yet
Development

Successfully merging this pull request may close these issues.