[Draft Issue] Intersection and union types #4628

YairHalberstadt · 2021-04-07T13:21:37Z

YairHalberstadt
Apr 7, 2021
Collaborator

This is a draft issue. Please discuss. Depending on feedback I may open as an actual issue soon.

Union and Intersection Types

Proposed
Prototype: Not Started
Implementation: Not Started
Specification: Not Started

Summary

Add union types (which express that an instance is an instance of at least one of a set of types) and intersection types (which express that an instance is an instance of all of a set of types).

See earlier discussion at #399

Motivation

Union Types

It's common to have to work with something which can be one of a number of different types.

Discriminated unions offer one solution to this, but they are somewhat heavy: you need to declare a new discriminated union type, give the cases names, and wrap/unwrap the instance in the discriminated union. Furthermore, the DU is a whole separate instance: there's no reference equality between the instance of the DU and the instance of the type it wraps.

Sometimes all of that is desirable, but often you want to work with such data in a more ad hoc fashion. For example, an IoC container might want to store a list containing both IDisposables and IAsyncDisposables without completely losing type safety:

List<IDisposable or IAsyncDisposable> _toDispose = new();

T Resolve<T>()
{
    // Create<T> stands in for however the container builds the instance.
    var t = Create<T>();
    if (t is (IDisposable or IAsyncDisposable) disposable)
        _toDispose.Add(disposable);
    return t;
}

It's also sometimes desirable to call a member on an instance which you know is one of a set of types which all have that member, but you don't know which. For example, this could simplify the current implementation of Enumerable.Count() to:

if (source is (ICollection<TSource> or ICollection) collection)
{
   return collection.Count;
}

Intersection Types

In other cases you want to work with data which implements 2 separate interfaces. Currently the only way to do that is to constantly cast between the two interfaces to call the separate methods on each interface, losing type safety in the process.

Note that this is already possible in C# using generic constraints: you can constrain a type parameter to multiple types, and the type parameter will act as if it's an intersection type. This is widely used. This proposal suggests allowing that in the general case, not just via generic constraints.
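The generic-constraint form mentioned above can be sketched in today's C#. The INamed and IResettable interfaces, the Counter class and the ResetAndName method are hypothetical names chosen for illustration; only the multi-type where constraint is existing language.

```csharp
using System;

// Hypothetical interfaces and class, used only for this illustration.
public interface INamed { string Name { get; } }
public interface IResettable { void Reset(); }

public sealed class Counter : INamed, IResettable
{
    public string Name => "counter";
    public int Value { get; private set; } = 5;
    public void Reset() => Value = 0;
}

public static class ConstraintDemo
{
    // T must implement both interfaces, so members of either are callable
    // without casts; T acts like the intersection "INamed and IResettable".
    public static string ResetAndName<T>(T item) where T : INamed, IResettable
    {
        item.Reset();
        return item.Name;
    }
}
```

Under the proposal, the same method could presumably take a plain parameter of type (INamed and IResettable) instead of introducing a type parameter.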

Detailed design

Syntax

I would suggest reusing the or and and keywords. This is because there's no semantic difference between a pattern match which checks if an instance is either an int, or a string, and one which checks if an instance is a union(int, string). Therefore we can safely reinterpret the existing pattern x is int or string as pattern matching on the union type int or string without changing the semantic meaning of existing patterns. I think the naming also feels intuitive. The same goes for and.

type
  : ...
  | union_type
  | intersection_type
  ;

union_type
  : type 'or' type ('or' type)*
  | '(' type 'or' type ('or' type)* ')'
  ;

intersection_type
  : type 'and' type ('and' type)*
  | '(' type 'and' type ('and' type)* ')'
  ;

In some cases parentheses will be required to disambiguate the type, e.g. a and b or c must be written as either ((a and b) or c) or (a and (b or c)).

When pattern matching, to declare a variable for an intersection/union pattern the pattern must be parenthesized, so this is legal: x is (int or string) intOrString, but this is not: x is int or string intOrString.

Semantics

Let us call the set of types that make up a union/intersection type the typeparts of that type.

SubTyping

For this purpose a type is both a subtype and a supertype of itself (i.e. we're referring to non strict sub/super types).

As always, there is an implicit reference conversion from a subtype of a union/intersection type to the union/intersection type, and from a union/intersection type to a supertype of the union/intersection type.

A non union/intersection type is a subtype of a union type if and only if it is a subtype of any of the typeparts.
A union type is a subtype of a non union/intersection type if and only if all typeparts are subtypes of the type.
A non union/intersection type is a subtype of an intersection type if and only if it is a subtype of all of the typeparts.
An intersection type is a subtype of a non union/intersection type if and only if any of the typeparts are subtypes of the type.
A union type is a subtype of another union type if and only if all typeparts of the first union are subtypes of any of the typeparts of the second union.
An intersection type is a subtype of another intersection type if and only if for all typeparts of the second intersection, any of the typeparts of the first intersection are subtypes of the typepart.
A union type is a subtype of an intersection type if and only if all typeparts of the union are subtypes of all of the typeparts of the intersection.
An intersection type is a subtype of a union type if and only if any of the typeparts of the intersection are subtypes of any of the typeparts of the union.
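As a rough sketch, the first few rules above can be modelled at runtime with reflection, representing a union or intersection as an array of its typeparts. The SubtypingSketch class and its method names are hypothetical, and Type.IsAssignableFrom is only an approximation of the language's subtype relation (for example, it treats a struct as assignable to the interfaces it implements).

```csharp
using System;
using System.Linq;

public static class SubtypingSketch
{
    // Non-strict subtype check between ordinary types: a type is a subtype of itself.
    static bool IsSubtype(Type sub, Type super) => super.IsAssignableFrom(sub);

    // T <: (A or B) iff T is a subtype of any typepart.
    public static bool TypeIsSubtypeOfUnion(Type t, Type[] union) =>
        union.Any(part => IsSubtype(t, part));

    // (A or B) <: T iff all typeparts are subtypes of T.
    public static bool UnionIsSubtypeOfType(Type[] union, Type t) =>
        union.All(part => IsSubtype(part, t));

    // T <: (A and B) iff T is a subtype of all typeparts.
    public static bool TypeIsSubtypeOfIntersection(Type t, Type[] intersection) =>
        intersection.All(part => IsSubtype(t, part));

    // (A and B) <: T iff any typepart is a subtype of T.
    public static bool IntersectionIsSubtypeOfType(Type[] intersection, Type t) =>
        intersection.Any(part => IsSubtype(part, t));

    // (A or B) <: (C or D) iff every typepart of the first is a subtype of
    // some typepart of the second.
    public static bool UnionIsSubtypeOfUnion(Type[] first, Type[] second) =>
        first.All(part => TypeIsSubtypeOfUnion(part, second));
}
```

Each check walks the typeparts once, which matches the O(n) cost the proposal mentions for downcasting.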

Downcasting / pattern matching

The ability to downcast/pattern match to and from a union/intersection type should fall out of existing language rules.

This includes use of casts, is and as.

The meaning should also be straightforward. Downcasting to a union/intersection type is an O(n) operation in the number of typeparts.

Switching

A switch expression which handles all typeparts of a union type is considered to be exhaustive and will not warn.

A case which matches on an intersection type, with no when clause, is considered to handle each of the typeparts of the intersection type.
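For contrast, here is a minimal sketch of the situation today, where the value has to be typed as object. SwitchDemo and Describe are hypothetical names. Because the compiler cannot know that only int and string are expected, it warns that the switch is not exhaustive unless a catch-all arm is present.

```csharp
using System;

public static class SwitchDemo
{
    // The parameter is object, so a discard arm is needed to make the
    // switch expression exhaustive.
    public static string Describe(object value) => value switch
    {
        int i => $"int {i}",
        string s => $"string {s}",
        _ => throw new ArgumentException("expected an int or a string", nameof(value)),
    };
}
```

With a parameter of type (int or string), the rule above would presumably let the discard arm be dropped, since both typeparts are handled.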

Members

A union type is considered to have any instance member which exists with the exact same signature on all typeparts of a union type.

For this purpose the order in which the typeparts of a union type are declared matters, as if a type is a subtype of multiple of these typeparts, we will call the implementation for the first typepart it is a subtype of.

Calling a member of a union type is an O(n) operation in the number of typeparts.

An intersection type is considered to have all instance members it would have if it inherited/implemented all typeparts. This is meant to be consistent with the rules for a type parameter with multiple constraints. Take a look there for the formal definition of this. In practice this means that if multiple typeparts declare the same member with the same signature, it may be illegal to call that method as it may be ambiguous.

Miscellaneous

Using a struct as a union typepart will involve boxing the struct.

Structs cannot be used as intersection typeparts.

At most one typepart of an intersection type can be a class. The others must all be interfaces.

It is an error to declare a union type where one typepart is a subtype of another typepart.

a and b is the same type as b and a, but a or b is not the same type as b or a (although there is an identity reference conversion from each one to the other).

Note that we are not changing the best common type specification to make the best common type be the union of the types. This is deliberate, as that would mean that if you make a mistake and have some of your expressions return the wrong type, you find out about it much later down the line or not at all. For example, if you did the following:

var x = isAsync ? GetAsync() : GetSync(); //forgot to await GetAsync
return x.SomeMethod();

If the best common type were the union, you would get the error message "x does not contain member SomeMethod" instead of "no best type found for Task and MyType".

CodeGen

Whilst it would be possible to try and implement this via type erasure, there are lots of odd corner cases which wouldn't work without full runtime support. For example, you wouldn't be able to assign IEnumerable<string or char[]> to IEnumerable<IEnumerable<char>> despite the covariance rules allowing this, as the runtime type would be IEnumerable<object>. I believe there are enough such gotchas that this simply wouldn't be worth it.
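The erasure problem can be observed with existing types. In this sketch (ErasureDemo and its method names are hypothetical), a List<object> plays the role of an erased IEnumerable<string or char[]>.

```csharp
using System.Collections.Generic;

public static class ErasureDemo
{
    // IEnumerable<T> is covariant and string implements IEnumerable<char>,
    // so a list of strings converts to IEnumerable<IEnumerable<char>>.
    public static bool StringsConvert() =>
        (object)new List<string> { "abc" } is IEnumerable<IEnumerable<char>>;

    // If "string or char[]" were erased to object, the runtime type would be
    // a sequence of object, and the same conversion fails.
    public static bool ErasedConverts() =>
        (object)new List<object> { "abc", new[] { 'd' } } is IEnumerable<IEnumerable<char>>;
}
```

Here StringsConvert() returns true while ErasedConverts() returns false, even though every element of the second list is itself an IEnumerable<char>.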

Instead we would need the runtime to provide native support for union and intersection types, using the same subtyping rules as mentioned above.

It would have to be decided whether the runtime would be responsible for simulating members on union/intersection types, or the compiler. My bias would be to have the runtime handle intersection type members, but leave it to the compiler for union types as different languages may want to use different rules. For example VB might consider two members to have the same signature if they differ only by casing, whilst C# wouldn't. The compiler would lower member calls on union types to a switch expression which matches on successive typeparts, and calls the member on the first matching typepart.

i.e.

if (source is (ICollection<TSource> or ICollection) collection)
{
   return collection.Count;
}

would be lowered to

if (source is (ICollection<TSource> or ICollection) collection)
{
   return collection switch {
      ICollection<TSource> temp1 => temp1.Count,
      ICollection temp2 => temp2.Count,
   };
}

As discussed above, this means the order in which a union type is declared matters. Note that if you want the opposite order, there is an implicit identity conversion from (a or b) to (b or a).

Since the compiler simulates these members they will not be accessible via reflection.

In cases where the exact same method would be called for all typeparts (i.e. where all typeparts are interfaces which inherit from the same base interface and you call a method defined by the common base interface, or similarly for classes and a virtual method defined by the common base class), the compiler will simply emit a virtual call to the method instead of a switch.
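For comparison, the first-match dispatch described in this section can be written by hand in today's C#. CountDemo.CountOf is a hypothetical helper, not the actual Enumerable.Count source; it only mirrors the ordered checks the lowering would perform.

```csharp
using System.Collections;
using System.Collections.Generic;

public static class CountDemo
{
    public static int CountOf<TSource>(IEnumerable<TSource> source)
    {
        // The first matching typepart wins, mirroring the ordered lowering.
        if (source is ICollection<TSource> generic) return generic.Count;
        if (source is ICollection nonGeneric) return nonGeneric.Count;

        // Neither typepart matched: fall back to enumerating the sequence.
        var count = 0;
        foreach (var item in source) count++;
        return count;
    }
}
```

A List<int> takes the first branch, while a Queue<int>, which implements the non-generic ICollection but not ICollection<T>, takes the second.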

Drawbacks

Significant added complexity to the type system.
Hidden performance cost as member calls to union types are O(n) in the number of typeparts.
Requires runtime support

Alternatives

Discriminated Unions can handle some of the use cases of union types, but in a much more heavyweight way.
There's not really much of an alternative to intersection types I can think of at the moment.

Unresolved questions

Should it be an error to declare a union type where one typepart is a subtype of another typepart?

One reason you might want to do this is to access implicitly implemented interface members on a type, whilst still being able to access all the new members it defines itself, e.g.

interface IA { int M1() => 42; }
class A : IA { public int M2() => 42; }

var a = (A and IA)new A();
Console.WriteLine(a.M1() + a.M2());

However that comes at the cost of not being able to access any explicitly implemented members (since they would be ambiguous), so the cases this is actually useful might be rare.

Should members of a union have to have an exactly matching signature on all typeparts?

If we have the following:

interface IA { string M(object a); }
interface IB { object M(string a); }

var aOrB = (IA or IB)null;
object y = aOrB.M("hello"); // should this be legal?

There exists a member M which is capable of handling a string and returning an object on both IA and IB, so perhaps this should be legal?

We could say that if overload resolution would select a single best member from each typepart, and there is a Best Common Type for the return types of each of those members, and the target type of any target typed parameter expressions match for each selected member, then we can access the member on the union.

This would make the feature significantly more powerful, but at the risk of making it extremely tricky for humans to work out what's happening here. We could make this slightly simpler by requiring all the return types to match, rather than using the best common type, and disallowing target typed parameter expressions.

Should List<a or b> have an implicit reference conversion to List<b or a>

Since List<T> is invariant, and a or b and b or a are different types, under current language rules it should be illegal to use a List<a or b> where a List<b or a> is expected, or vice versa. However, since they are mutual subtypes of each other, I don't believe this could ever actually cause any issues, and so should be allowed, and we should encourage the runtime to support it.

Design meetings

jnm2 · 2021-04-07T16:03:55Z

jnm2
Apr 7, 2021
Collaborator

The use of (A or B) and (A and B) for union and intersection types is quite attractive to me.

Would you be able to use (IA and IB) as the generic type argument to a type parameter constrained to IA, IB?

1 reply

YairHalberstadt Apr 7, 2021
Collaborator Author

I think that falls straight out of the subtyping rules. IA and IB is a subtype of both IA and IB so yes.

jnm2 · 2021-04-07T16:09:22Z

jnm2
Apr 7, 2021
Collaborator

Should an invoked member's return type be allowed to vary?

// aOrB is (A or B)
var id = aOrB.Id;
// id is (int or string)

class A
{
    public int Id { get; }
}

class B
{
    public string Id { get; }
}

8 replies

YairHalberstadt Apr 7, 2021
Collaborator Author

Oh, I see. No as there's no Best Common Type, and I think implicitly introducing union types leads to confusing errors in the wrong places.

jnm2 Apr 7, 2021
Collaborator

Hmm, okay. It couldn't be added on later if best common type is used at first, right?

YairHalberstadt Apr 7, 2021
Collaborator Author

I think it could, since it would only ever make an error a non-error.

jnm2 Apr 7, 2021
Collaborator

Wouldn't it mean that instead of BaseClass as the return type of the invocation, it would switch to (DerivedType1 or DerivedType2), which would change overload resolution and other things?

YairHalberstadt Apr 7, 2021
Collaborator Author

If it were considered (which I don't like), I would only want it as a fallback if there's no Best Common Type

bernd5 · 2021-04-07T19:22:53Z

bernd5
Apr 7, 2021

At first this proposal looks nice.
But as in your first example with IDisposable and IAsyncDisposable different interfaces often need different handling and/or have a different API surface.
This is true for other types as well. For example, a string could be seen as a list of chars. Therefore it would be nice to have a "union type" for IReadOnlyList<char> and string with members like an indexer and a Count property. But unfortunately Count is called Length in string. This means that even if this proposal were implemented, type-checking would still be required.
And we already have a perfect "union type" - object. The only thing we would gain is the possibility to express better what types we expect at certain cases (today we could use attributes and an analyzer for those checks).

But the problems mentioned in the motivation are really an issue for csharp.

In my eyes it would be much better if we could simplify the usage of adapter types which handle all the required logic.

Let's look for example at:

using System;
using System.Threading.Tasks;

IAsyncDisposable disp = new DisposableToAsyncDisposableAdapter(new Something());
await disp.DisposeAsync();

struct Something : IDisposable
{
    public void Dispose()
    {
        Console.WriteLine("Dispose");
    }
}

struct DisposableToAsyncDisposableAdapter : IAsyncDisposable
{
    private readonly IDisposable disp;

    public ValueTask DisposeAsync()
    {
        //what ever
        disp.Dispose();
        return ValueTask.CompletedTask;
    }

    public DisposableToAsyncDisposableAdapter(IDisposable disp)
    {
        this.disp = disp;
    }
}

It would be nice if we could omit the explicit usage of the adapter class - which could be solved by an "Interface to Interface" conversion like:

public static implicit operator IAsyncDisposable(IDisposable disp)
{
    return new DisposableToAsyncDisposableAdapter(disp);
}

Sadly, such a conversion function is not allowed. In my eyes the main reason to disallow such a conversion was the question "what should be done if multiple conversions/adapters are available?".

6 replies

YairHalberstadt Apr 7, 2021
Collaborator Author

I think it's super useful to indicate that a method can return 1 of 2 types, even if you have to handle them separately. Besides switch expression exhaustiveness checking can make sure you've handled all possible cases of a union type, which it can't do for object.

bernd5 Apr 7, 2021

Yes, only with additional meta-data (which could be encoded only in attributes, today).

Just now I read about another proposal which addresses more or less the same issue(s): #4629 (comment)

It sounds really good to me. Perhaps we could do it for return types, too.

YairHalberstadt Apr 7, 2021
Collaborator Author

Firstly this proposal specifically calls out that it needs runtime support.

Secondly, I don't see that that proposal helps in a case where you want to return e.g. an int or a string?

YairHalberstadt Apr 7, 2021
Collaborator Author

That proposal is pretty much entirely about performance

bernd5 Apr 7, 2021

Not only - it needs just runtime-support to "instantiate" the generic functions. If those functions would not be callable from other assemblies it could be done with lowering only.
What I meant is that this proposal could use the same mechanism and / or share the same syntax.

For example I can imagine a function like

void Foo(some (IList<string> or IReadOnlyList<string>) strings)
{

}

the meaning would be very similar to the following C++ template:

template<
    typename T,
    typename PureT = std::decay_t<T>,
    typename = std::enable_if_t<
        std::disjunction_v<
            std::is_same<PureT, IList<string>>,
            std::is_same<PureT, IReadOnlyList<string>>
        >
    >
>
void Foo(T&& strings)
{

}

Or with CSharp-like generics it would be something like:

void Foo<T>(T strings) where T : IList<string> or IReadOnlyList<string>
{
}

orthoxerox · 2021-04-08T06:57:35Z

orthoxerox
Apr 8, 2021

1. Is a type a subtype of itself? If not, your subtyping rules do not cover the cases A <: (A or B), A :> (A and B).
2. While A or B and B or A are distinct types, A and B and B and A should be considered the same type, as your proposal makes calling any member with an identical signature on A and B an error.
3. Your proposal makes calling any member of A and B with an identical signature on A and B an error, but in the case of A and IA this might be useful for calling implicitly implemented members of IA or its supertype.
4. Intersection types must probably be restricted to just one typepart being a non-interface type.

3 replies

YairHalberstadt Apr 8, 2021
Collaborator Author

Points 1, 2 and 4 are great and I've updated the proposal to reflect them.

TLDR:

1. Yes.
2. True. Also I've added a discussion about whether List<a or b> should be a subtype of List<b or a>.
4. True.

With regards to 3:
I'm not sure I understand. Either A provides an explicit implementation or it doesn't.

If it does, then this is truly ambiguous - do we want to call the explicit implementation, or the implicit one (note C# actually generates 2 methods for an explicit interface implementation - an ordinary method and an implicit implementation that forwards to it. There's no way from the outside to tell the ordinary method is actually an explicit interface implementation).

If it doesn't then calling the method will be legal, as there's only one type which declares that method.

orthoxerox Apr 8, 2021

interface IFoo
{
    void Foo();
    void Bar();
    void Baz();
}

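class Foo : IFoo
{
    // Implicit implementation of IFoo.Foo.
    public void Foo()
    {
        System.Console.WriteLine("Foo!");
    }

    // Explicit implementation of IFoo.Bar.
    void IFoo.Bar()
    {
        System.Console.WriteLine("Interface Bar!");
    }

    // Unrelated class member with the same name and signature.
    public void Bar()
    {
        System.Console.WriteLine("Class Bar!");
    }

    // Explicit implementation of IFoo.Baz.
    void IFoo.Baz()
    {
        System.Console.WriteLine("Interface Baz!");
    }
}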

(Foo and IFoo).Foo() //should work
(Foo and IFoo).Bar() //ambiguous call
(Foo and IFoo).Baz() //should work

interface IFoo
{
    void Foo();
}

interface IFooBar : IFoo
{
    void Bar();
}

interface IFooBaz : IFoo
{
    void Baz();
}

(IFooBar and IFooBaz).Foo() //should definitely work as long as there are no DIMs in the derived interfaces.

YairHalberstadt Apr 8, 2021
Collaborator Author

Great point. I've updated the proposal to make this the same as the current rules for a type parameter constrained to multiple types (namely act as if the type inherited from/implemented all types). That should handle that case.

gulshan · 2021-04-08T08:58:38Z

gulshan
Apr 8, 2021

The differences between this proposal and #399 (Discussion: Union and Intersection types) that I have noticed:

Syntax: A and B, A or B vs A+B, A|B.
Members: #399 proposed that members be determined from the actual type hierarchy. So an 'or type' member has to be an actual member of a common parent type of the type-parts, whether the parent is a class or an interface. There is no signature matching as in this proposal. Members are the same for 'and types' though.
As the relationship is determined by type hierarchy per #399, the 'and type' is actually virtually deriving from all of the type-parts. So only a single non-interface type can participate, whereas there can be multiple interfaces in an 'and type'.

0 replies

hez2010 · 2021-08-19T08:38:47Z

hez2010
Aug 19, 2021

This doesn't take generic constraints into account. I created another proposal for generic things in #5085.
Can #5085 be merged into this proposal?

1 reply

YairHalberstadt Aug 19, 2021
Collaborator Author

There are 3 proposals in that issue.

The first is unrelated to union/intersection types.
The second will naturally fall out of this proposal.
The third won't naturally fall out of this since at the moment generic constraints must be on the type parameters, not on constructs of the parameters. I would prefer not to put that in this issue, since it is not a necessary or directly related part of union/intersection types, so I would prefer if it was a separate issue, possibly building on the issue.
