Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multiple usages of the case discriminator for case objects #368

Open
Araq opened this issue Apr 15, 2021 · 20 comments
Open

Multiple usages of the case discriminator for case objects #368

Araq opened this issue Apr 15, 2021 · 20 comments

Comments

@Araq
Copy link
Member

Araq commented Apr 15, 2021

This RFC is a summary of the discussion from #19

The proposal that has won is to allow repeating the case discriminator field without the : type syntax. Like so:

type
  NodeKind = enum
    foo
    bar
    bif
    baz
    bad

  Node = object
    case kind: NodeKind
    of foo, bar, baz:
      children: seq[Node]
    of bad, bif:
      dad: Node

    case kind
    of foo:
       additionalFieldForFoo: int
    of bar:
      onlyValidForBar: float
    else: discard

    case kind
    of foo, bar:
      validForFooAndBar: string
   else: discard

It's by far the simplest and most elegant extension to the existing language. It's 100% backwards compatible since this code doesn't compile yet. The ABI is clear too, the re-used field is put only once into the object, at the same offset that it used to.

To do

Things left to specify: How the AST looks like for this construct, both before sem'checking and after sem'checking.

@bluenote10
Copy link

Apart from the fact I personally find this quite ugly and hard to read: I was hoping that Nim leaves the door open for having real algebraic data types eventually, like Haskell or Rust enums. For instance via Node = object {.algebraic.}, allowing a field x having different types in different branches, and field access fully type checked via type guards. I feel with this syntax choice instead of repeated fields this goes out the window.

@Araq
Copy link
Member Author

Araq commented Apr 15, 2021

I feel with this syntax choice instead of repeated fields this goes out the window.

Huh? Why? We can always require obj.field to be used inside a case guard, nothing changed. In fact, there is a warning for this via warning[ProveField]:on already.

@bluenote10
Copy link

bluenote10 commented Apr 15, 2021

Huh? Why? We can always require obj.field to be used inside a case guard, nothing changed.

On usage site yes, but the grouping based approach doesn't seem to extend nicely towards heterogeneously typed fields. The type declaration would have to support something like:

type
  Node = object
    case kind: NodeKind
    of foo:
      children: seq[Node]
    of bar:
      children: seq[SealedNode]

I.e., it would require extension towards the first proposal in #19 anyway. If that would be supported as well there would be two ways of writing the case where a field has the same type, either grouped or repeated, which seems weird.

In Rust I don't mind the repetition of fields in branches at all. Not only is a rare issue to have significant repetition, but the explicitness also helps to see all fields of a branch with a single glance, whereas the proposal requires actual reading to mentally assemble a foo vs a bar etc.

@Araq
Copy link
Member Author

Araq commented Apr 15, 2021

I think heterogeneously typed fields are so rare that it is not worth musing about. Just use different names for different things.

@bluenote10
Copy link

bluenote10 commented Apr 15, 2021

They are not so rare in my opinion. When implementing e.g. a wrapper that is supposed to be bounded in types it makes sense to call its main data field just value or data, without having to creatively come up with a different unique name like valueArrayOfString in every branch.

I also find the feature helpful when implementing file formats that involve heterogeneous fields by the specs. Comparing implementing file formats in both Nim and Rust, I enjoyed Rust's enums a lot, because I could use the official spec field names despite incompatible types:

pub enum FileChunk {
    FooChunk { id: u16, header: FooHeaderType },
    BarChunk { id: u64, header: BarHeaderType },
    RomVersion { major: u16, minor: u16 },
    InstrumentVersion { major: u32, minor: u32 },
    // ...
}

Just use different names for different things.

They aren't necessarily different things semantically. A wrapper holds a "value", an "id" is a id, and a version has a "major" and a "minor" field, no matter what exact type is used. The elegance of algebraic data types is that fields can express whatever they want, without having to worry about conflicts or ugly type suffixes like id16 and id64.


Another example: Instruction opcodes. Take this Rust implementation of GameBoy opcodes as an example. It is straightforward from looking at the code what the individual opcodes require. This kind of representation makes it even possible to guess what they do. With grouping by fields this would become an absolute mess in my opinion. Admittedly in this specific example it would still be acceptable, because there are many unary (and few binary) opcodes, and sharing a field name would be still be relatively straightforward for them. But imagine many opcodes requires like 5 operands or so. What a headache to search more meaningful field intersections / unions and picking distinct names...

@Araq
Copy link
Member Author

Araq commented Apr 16, 2021

because I could use the official spec field names despite incompatible types

But only if you're lucky and the spec uses names that are not alien to your programming environment. Usually it's better to stick with the programming language's conventions.

@Araq
Copy link
Member Author

Araq commented Apr 16, 2021

On top of that, usually a spec is concerned with packed data and programming languages have little support for variable sized packed memory structures (yes, this includes holy low level C) so you need a DSL or some custom code anyway.

@a-mr
Copy link

a-mr commented Apr 19, 2021

So it won't be possible to define a shared field with another subset of tags?

E.g. to add to object in the example above:

    case kind
    of foo, bad:
      commonForFooAndBad: int
    else: discard

?

@Araq
Copy link
Member Author

Araq commented Apr 19, 2021

@a-mr Sure it would be possible, that's the point of this RFC... I don't understand your question.

@a-mr
Copy link

a-mr commented Apr 19, 2021

@Araq , you wrote your example as if all shared fields are placed in first case and second case without : type contains only non-shared fields. Hence the confusion.

@Araq
Copy link
Member Author

Araq commented Apr 19, 2021

Thanks, I've updated the example to make it clearer, I hope.

@Araq
Copy link
Member Author

Araq commented Apr 19, 2021

@timotheecour why the downvote?

@timotheecour
Copy link
Member

timotheecour commented Apr 19, 2021

because I much prefer your original proposal in #19 (comment)

this new RFC prompted me to go ahead and implement your original proposal, I have a PR in the works for that

rationale:

  • simpler implementation in compiler, and probably also for macro writers consuming case statement object AST's
  • more intuitive (subjective) and more DRY (objective)
  • easier to control the field ordering layout compared to if it were done with this RFC
  • it opens possibility of also extend existing case statements in a similar way in future work:
case {.allowDups.} t.typ
of routineKinds: ... # includes skTemplate
of skTemplate: ... # extra code
...
else: discard

note

i don't understand this objection though #368 (comment) (heterogenous type fields); i don't see how this can work (FieldDefect is checked at RT, not CT). So shared fields must have same type.

@bluenote10
Copy link

i don't understand this objection though #368 (comment) (heterogenous type fields); i don't see how this can work (FieldDefect is checked at RT, not CT).

My hope simply was that one day the check would become an (opt-in) CT check to make case object fully type safe, instead of relying on RT checks. I.e.:

type
  WrapperKind = enum
    Str
    Num

  # Perhaps also {.typesafe.}
  Wrapper {.algebraic.} = object
    case kind: WrapperKind
    of Str:
      value: string
    of Num:
      value: int

The "algebraic" opts in to enforce all access to this case object behind type guards only. This enables the reuse of value with heterogeneous types in different branches. Omitting the {.algebraic.} would result in a compiler error like:

Field "value" has different types in different case branches. Only {.algebraic.} case objects can have heterogeneous types.

@a-mr
Copy link

a-mr commented Jun 25, 2021

Extension to the proposal: allow if (in addition to case) for object variants

Rationale:

  1. avoids else: discard statements
  2. more consistent: if we have case then why not have if? Also if branches will look the same for both usage and definition of the fields

For point 1, the example in the description would be idiomatically written so:

  Node = object
    case kind: NodeKind
    of foo, bar, baz:
      children: seq[Node]
    of bad, bif:
      dad: Node

    if kind == foo:
       additionalFieldForFoo: int
    
    if kind == bar:
      onlyValidForBar: float

    if kind in {foo, bar}:
      validForFooAndBar: string

For point 2, if we e.g. have boolean kind (see an example here) it could be written more naturally as:

  ItemFragment = object
    isRst: bool
    if isRst:
      rst: PRstNode
    else:
      str: string

Of course, only simplest syntactic forms of ifs would be recognized:

if field == ...:
if field in {...}:
if field:   # for boolean `field`

@theamarin
Copy link

The example given by @bluenote10 is exactly what I tried first when I was experimenting with Nim. I think this is the least astonishing way and quite intuitive.

So I strongly vote for his suggestion:

type
  Node = object
    case kind: NodeKind
    of foo:
      children: seq[Node]
    of bar:
      children: seq[SealedNode]

I would really appreciate to see this in Nim, the sooner the better! 😊

@yuanweixin
Copy link

In terms of priority, is there plan to include this in Nim 2 this year (2023)?

@Araq
Copy link
Member Author

Araq commented Jan 8, 2023

No, the focus shifted since then to a related, but yet-to-be-written RFC.

@khaledh
Copy link

khaledh commented Apr 23, 2023

@Araq What's the status on this? Any progress on the related RFC?

@Araq
Copy link
Member Author

Araq commented Apr 27, 2023

As I said, no and I'm not sure I like the idea anymore.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants