Anonymous sum types #8277

glaebhoerl · 2013-08-03T23:58:44Z

Rust has an anonymous form of product types (structs), namely tuples, but not sum types (enums). One reason is that it's not obvious what syntax they could use, especially their variants. The first variant of an anonymous sum type with three variants needs to be syntactically distinct not just from the second and third variant of the same type, but also from the first variant of all other anonymous sum types with different numbers of variants.

Here's an idea I think is decent:

A type would look like this: (~str|int|int). In other words, very similar to a tuple, but with pipes instead of commas (signifying or instead of and).

A value would have the same shape (also like tuples), with a value of appropriate type in one of the "slots" and nothing in the rest:

let foo: (~str|int|int) = (!|!|666);
match foo {
    (s|!|!) => println(fmt!("string in first: %?", s)),
    (!|n|!) => println(fmt!("int in second: %?", n)),
    (!|!|m) => println(fmt!("int in third: %?", m))
}

(Nothing is a bikeshed, other possible colors for it include whitespace, ., and -. _ means something is there we're just not giving it a name, so it's not suitable for "nothing is there". ! has nothing-connotations from the negation operator and the return type of functions that don't.)

I'm not sure whether this conflicts syntax-wise with closures and/or negation.

Another necessary condition for this should be demand for it. This ticket is to keep a record of the idea, in case someone else has demand but not syntax. (If the Bikesheds section of the wiki is a better place, I'm happy to move it.)

The text was updated successfully, but these errors were encountered:

graydon · 2013-08-04T00:39:16Z

I think we can accommodate these use cases reasonably easily with some more named generic sum types in the stdlib like Either. Maybe Any3, Any4, Any5, etc.

glaebhoerl · 2013-08-04T01:27:31Z

Or possibly there's just not much of a use case, Haskell seems to have gotten along with only Either. :)

bblum · 2013-08-05T18:08:46Z

This (either anonymous or named-AnyN) would be good for the select interface:

fn select2<A, AP: SelectPort<A>, B, BP: SelectPort<B>>(AP, BP) -> Either<(A, BP), (AP, B)>

Would enable a natural extension without nesting Either types.

chris-morgan · 2014-02-20T02:50:37Z

With the removal of conditions, error handling this has become a more urgent need.

For example, at https://github.com/chris-morgan/diffbot-rust-client/blob/364ca8d978f669a278ac2d38804c3926d33f123d/src/diffbot/lib.rs#L233..L259 (code is out of date—now with the removal of conditions IoError should be IoError(IoError)) I have to create a whole new enum for the error cases, allowing an API error, a JSON error or an I/O error. I would much prefer to simply be able to specify the return type in those methods as Result<T, ApiError | json::Error | IoError>.

We have if_ok!, but it only works if you keep the same type of E. So for such cases as these it's not able to be used.

I would like to be able to use ApiError | json::Error | IoError as a type. I picture matching working in a manner similar to this:

match err {
    ApiError(_) => unimplemented!(),
    json::Error(_) => unimplemented!(),
    IoError(_) => unimplemented!(),
}

But assignation does not require any form of wrapping, e.g.:

fn x() -> Result<(), ApiError | json::Error | IoError {
    write!(...)
}

Put another way, sum types can be introduced silently; A can be silently cast to A | B.

(I had a much longer writeup of the problems and solutions I saw, but lost it some time back. For now, I guess this will do.)

glaebhoerl · 2014-02-20T17:19:10Z

@chris-morgan: This proposal feels more like boost::variant or OCaml's polymorphic variants than Rust's existing enum types. In particular these don't seem to be proper sum types: how would you handle a type like int | int (or heck () | ())? It's not an accident that in my proposal in the OP, the semantics follow enums and the syntax is modelled after tuples.

(Which is not to say that something like boost::variant or OCaml's polymorphic variants couldn't have a place; right now I'm just pointing out the difference.)

I would much prefer to simply be able to specify the return type in those methods as Result<T, ApiError | json::Error | IoError>.

Only because of verbosity, or for other reasons too?

(FWIW, under my proposal I think you might write (T|ApiError|json::Error|IoError). Or I guess (T|(ApiError|json::Error|IoError)) and Result<T, (ApiError|json::Error|IoError)> are also valid options if you want to keep the error cases separate from the success one. I don't know if we would want to keep Result if we had anonymous sum types; we don't have a Pair type either, and use tuples instead.)

chris-morgan · 2014-02-22T00:17:38Z

The scheme that I have proposed does have certain issues which would need to be resolved; a naïve implementation would make some error handling more difficult as it arbitrary expansion of types. For example:

let mut a = 7i;
a = 5u;

At present this is a nice simple thing: type error, a is int, so you can't put a uint in it. My scheme could be used to make that legal: a's type would be int | uint. This is probably not a good idea.

Probably the best way around that is that the type inferrer be incapable of introducing sum types. Thus, the int | uint example above would fail to compile, as it does at present, only changing to compiling with int | uint with an explicit type specification, let mut a: int | uint = 7i. Functions have their return type specified, so there is no need for inference there, either.

fn x() -> int | uint {
    let out = 5i;
    // Do something
    out
}

In a case like that out would be inferred as int | uint, from the return type—a sum type is being inferred, but not introduced.

By the way, I also imagine sum types as automatically implementing all traits that all of their types implement—this would make a type like int | uint genuinely usable for some things. It would still be prone to not doing everything people wanted, though; for example, an automatic implementation of Add<T1 | T2, T1 | T2> for T1 | T2 would require both T1 and T2 to have implementations of Add<T1, [exactly one of T1 and T2 to avoid an implementation conflict]> and Add<T2, [T1 xor T2]>. So you won't get Add working for int | uint.

But you would be able to have simpler ones like Neg<T1 | T2> for T1 | T2, which would require either {impl Neg<T1> for T1 and impl Neg<T2> for <T2>} or {impl Neg<T2> for T1 and impl Neg<T1> for T2>}.

Unfortunately, all of these automatic trait implementation notions involve combinatorial explosion with generics.

mneumann · 2014-02-22T01:33:22Z

@chris-morgan: We'd also need special match syntax to test whether it's a int or an uint, something like Int(i), while enum types or structs can be pattern matched via their constructor.

chris-morgan · 2014-02-22T02:53:30Z

@mneumann my initial suggestion was using the type path, which would yield it int(i) and uint(u). I think that would work and I do not believe it would have any problems.

huonw · 2014-02-22T03:58:19Z

@chris-morgan you haven't addressed how to handle int | int. (I personally like some variation on tuples, like in the main text.)

It would be good to collect some research and any prior art about languages with anonymous sum types (preferably, statically typed languages).

chris-morgan · 2014-02-22T05:47:28Z

@huonw You're right, I forgot that. OK, here we go:

T | T should be illegal at the local level (there being no way of distinguishing between them in a match pattern), but the implementation must be able to model it, because of generics.

I will define sum types as commutative (viz. T1 | T2 and T2 | T1 are equivalent) as I believe this to be a sound way of forming things.

Given a generic T, we might have T | int which at the global level would be int | int, or with a few degrees of recursion, the finite type int | int | int | int | int | …. I do not believe that this is a problem. The internal implementation might have the anonymous variants sorted by type (using a stable sort), but I expect it would be more efficient not done that way (in the compiler; at runtime I would expect no difference). Either way, it is not difficult for the compiler to track the state, though it may be a little more complex than the types of the current type system (I'm not sure). At no level is any collision possible.

To clarify how generic matching would operate:

fn x<T>(t: T | int) {
    match t {
         T(t) => unimplemented!(),
         int(t) => unimplemented!(),
    }
}

x(42u);  // Fine. T = uint
x(42i);  // Not fine: T = <type error>
x::<bool>(42i);  // Fine. T = bool

Hmm… suddenly I started wondering about T | (). That could be used to replace Option<T>, though I wouldn't recommend it… but my suggestions thus far would indicate the pattern for matching the () would be ()(()) (presently illegal)! () could be special cased as (), but that's probably not a good plan. Also compare tuple types with greater arity, such as with (T1, T2) | (T3, T4); the matches there would, in this simple syntax, be (T1, T2)(t1, t2). Perhaps not ideal.

chris-morgan · 2014-02-22T10:16:27Z

I feel that the two proposals in mind have different use cases.

The first approach, (T1|T2), is providing anonymous enums. They provide the full functionality of enums (except for struct variants), but carry many of their pains. Matching, for example, does not include the types that are being matched. Nor does it appear to me that these can be conveniently composed (i.e. turning (T1|T2) into (T1|T2|T3)), which is, to my mind, just about the main reason we want anonymous sum types. In summary, I feel that that while they might ameliorate some pain, they still end up quite verbose (specifically I mean the creation of them, which means that things like try! would still not be usable) and do not elegantly solve the problem for the situations that I perceive the need of anonymous sum types in.

The second approach, my T1 | T2, permits what I will for no especial reason dub "type inflation": storing multiple types in one slot. These are similar to enums but do not provide the full power of the enum, as is epitomised in the int | int example, which is illegal. While they lack that power, I do not believe that that is a genuine problem. The cases where I perceive the need of anonymous sum types do not ever require this; I believe that an explicit enum is more appropriate for such cases. But T1 | T2 makes up for it by a considerable improvement in ergonomics; things like try! would be correct in it, for example, and the types are explicit in matching.

Well, I have demonstrated cases where my proposal would be useful in existing code. Can anyone come up with a case where (~str|int|int) (or any other such thing with T|T) would be genuinely useful and would not be better served by an enum? I really can't think of any, but I'm certainly open to being informed of cases.

glaebhoerl · 2014-02-22T12:53:01Z

@chris-morgan Indeed, the two are very different things with some superficial similarities, as I noted earlier. We should also clarify terminology. I don't think what you're describing can be accurately called "anonymous sum types", because they're not sum types. I'm not remotely an expert in this area, but it sounds much closer to something like union types (see one, two).

(It might be worth discussing the two in separate issues? I don't mind having them both here, but it might help to reduce confusion.)

W.r.t. anonymous sum types (my proposal), there's no strongly motivating use case that I know of, it's just niceties and ergonomic things like:

Symmetry with anonymous product types, i.e. tuples.
When things return one of many options but they don't have a distinguished meaning, it's nicer than the alternatives of nesting Either (which we've removed) or introducing Any2, Any3, Any4, etc. See @bblum's example of select earlier.
It means that any type you could write as a combination of structs and enums, you can also write anonymously, which is a neat feature when talking about things abstractly. E.g. you get to say things like (A, (B | C)) =~ ((A, B) | (A, C)), which is the distributive law.
It's sometimes convenient not to have to introduce a new enum type for perhaps just a single function.

Arguably, any time you might want to use an anonymous sum, it would improve readability to introduce an explicit enum with names for the type and variants instead. But, I suspect you could make the exact same argument about tuples and structs as well.

You haven't explicitly answered my question with regards to what the motivation for these union-ish types is. I get that you want to make error handling nicer, but could you precisely enumerate what bothers you about the current situation, what properties any solution should have (i.e. when could it be considered "solved"), and so forth? (For instance, is your problem primarily with syntactic verbosity or semantic expressiveness? If it's the former, introducing a potentially-far-reaching type system feature to deal with it might not be the most cost-effective solution.)

rust-highfive · 2014-09-24T04:59:40Z

This issue has been moved to the RFCs repo: rust-lang/rfcs#294

fix `needless_question_mark` not considering async fn closes rust-lang#8277 changelog: [`needless_question_mark`] Fix FN on async functions

glaebhoerl mentioned this issue Sep 16, 2013

Remove std::either #9157

Closed

chris-morgan mentioned this issue Mar 17, 2014

Syntax for slices rust-lang/rfcs#13

Closed

chris-morgan mentioned this issue May 23, 2014

RFC: Bounds on trait objects should be separated with + rust-lang/rfcs#87

Merged

chris-morgan mentioned this issue Aug 22, 2014

RFC: error conventions and syntactic support (including changes to macro syntax) rust-lang/rfcs#204

Closed

rust-highfive mentioned this issue Sep 24, 2014

Anonymous sum types rust-lang/rfcs#294

Open

rust-highfive closed this as completed Sep 24, 2014

huonw mentioned this issue Oct 16, 2014

RFC: Anonymous enum types called joins, as A | B rust-lang/rfcs#402

Closed

flip1995 pushed a commit to flip1995/rust that referenced this issue Jan 27, 2022

Auto merge of rust-lang#8311 - dswij:8277, r=llogiq

d976d8a

fix `needless_question_mark` not considering async fn closes rust-lang#8277 changelog: [`needless_question_mark`] Fix FN on async functions

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Anonymous sum types #8277

Anonymous sum types #8277

glaebhoerl commented Aug 3, 2013

graydon commented Aug 4, 2013

glaebhoerl commented Aug 4, 2013

bblum commented Aug 5, 2013

chris-morgan commented Feb 20, 2014

glaebhoerl commented Feb 20, 2014

chris-morgan commented Feb 22, 2014

mneumann commented Feb 22, 2014

chris-morgan commented Feb 22, 2014

huonw commented Feb 22, 2014

chris-morgan commented Feb 22, 2014

chris-morgan commented Feb 22, 2014

glaebhoerl commented Feb 22, 2014

rust-highfive commented Sep 24, 2014

Anonymous sum types #8277

Anonymous sum types #8277

Comments

glaebhoerl commented Aug 3, 2013

graydon commented Aug 4, 2013

glaebhoerl commented Aug 4, 2013

bblum commented Aug 5, 2013

chris-morgan commented Feb 20, 2014

glaebhoerl commented Feb 20, 2014

chris-morgan commented Feb 22, 2014

mneumann commented Feb 22, 2014

chris-morgan commented Feb 22, 2014

huonw commented Feb 22, 2014

chris-morgan commented Feb 22, 2014

chris-morgan commented Feb 22, 2014

glaebhoerl commented Feb 22, 2014

rust-highfive commented Sep 24, 2014