Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow usage of the same attribute name in different branches within variant types #19

Closed
cryzed opened this issue Dec 7, 2015 · 57 comments

Comments

@cryzed
Copy link

cryzed commented Dec 7, 2015

The following code snippet raises an error during compilation: Error: redefinition of 'someAttribute'.

  type
    MyKind = enum
      Kind1, Kind2, Kind3

    MyType = ref object
      case kind: MyKind
      of Kind1:
        someAttribute: string
      of Kind2:
        someAttribute: string
      of Kind3:
        someOtherAttribute: int

Since only a single branch, depending on kind, can be active at once, it would seem logical for this to work -- since no actual redefinition of someAttribute ever takes place. Currently I need to work around this by naming the fields for each branch differently and creating a custom umbrella accessor proc that has to check the object kind at runtime and return the correct field, which is associated with its own problems: nim-lang/Nim#3628.

@ghost
Copy link

ghost commented Dec 7, 2015

Keep in mind that discriminators can be grouped together, though sometimes you only want to share certain fields.

@refi64
Copy link

refi64 commented Dec 7, 2015

What about:

of Kind1, Kind2:
    someAttribute: string

@cryzed
Copy link
Author

cryzed commented Dec 7, 2015

And while we are at it, should it also be possible to have attributes of the same name but with a different type depending on the branch? This question was also raised in IRC and would improve this code for example, i.e. num and fnum.

@Araq
Copy link
Member

Araq commented Dec 8, 2015

And while we are at it, should it also be possible to have attributes of the same name but with a different type depending on the branch?

No.

This question was also raised in IRC and would improve this code for example, i.e. num and fnum.

No, it wouldn't.

@dom96
Copy link
Contributor

dom96 commented Dec 8, 2015

This question was also raised in IRC and would improve this code for example, i.e. num and fnum.

No, it wouldn't.

IRC discussion here (Starts at 12:38:06)

@Araq
Copy link
Member

Araq commented Jan 21, 2016

Conclusion:

Alternatively we can weaken the disjoint condition for 'case' in 'object'.
So that this is allowed:

case kind: NodeKind
of foo, bar, baz: children: seq[Node]
of foo: additonalFieldForFoo: int
of bar: onlyvalidForBar: float

then it's much more obvious that 'children' is actually a shared field

@vegansk
Copy link

vegansk commented Jan 21, 2016

👍 for this variant

@Varriount
Copy link

Wouldn't this cause divergence in the case statement syntax for type variants and conditions though? Why not just allow if statements then?

In addition, it still doesn't resolve the ambiguities created by an identifier being two types - What happens if type or sizeof are used on an instance's member?

@vegansk
Copy link

vegansk commented Jan 21, 2016

@Varriount, there is no divergence in Araq's variant. sizeof(x.children) and type(x.children) will always be the same

@Varriount
Copy link

@vegansk Ah, I see. However there is still the issue that we now have two different, subtle behaviors for case statements.

Enacting this change means that case blocks accept duplicate labels iff they are in a type definition, but not when they are in regular code. That's what I meant by divergent behavior.

This adds yet another piece of subtle complexity and inconsistency to Nim's syntax, which already has enough of those. I'd much rather allow duplIcate member names in each of block, since that seems to be what users try first.

@akbcode
Copy link

akbcode commented Mar 23, 2018

The limitation that two cases must either share all fields or none seems unreasonable.

@ghost
Copy link

ghost commented Mar 23, 2018

Allowing to do this would allow much nicer type definition and usage for types (in some cases)

@dom96
Copy link
Contributor

dom96 commented Mar 24, 2018

I'd much rather allow duplIcate member names in each of block, since that seems to be what users try first.

👍 to this

and 👎 to @Araq's conclusion

@cryzed
Copy link
Author

cryzed commented Mar 24, 2018

I thought @Araq's reply was completely unproductive in the first place. IIRC he seemed more pissed off because he didn't want to consider implementing it. But I'll let this rest, it's nice that this is finally getting attention.

@Araq
Copy link
Member

Araq commented Mar 24, 2018

Dunno why you think that. My proposal would actually work, the other proposals are just nuts. I don't care how many thumbs down I get for it, you cannot vote for nonsense.

@GULPF
Copy link
Member

GULPF commented Mar 24, 2018

@Araq What's the problem with the other proposal? If the fields with a shared name are required to also have the same type, then it accomplishes the same thing as your proposal, but with a more obvious syntax (at least to me).

@Bulat-Ziganshin
Copy link

Bulat-Ziganshin commented Jul 15, 2018

The proposal in the first post is how things are done in Haskell, and it frequently became useful:

Entry = ref object
      case kind
      of File:
        name: string
        modified: datetime
        size: int64
      of Dir:
        name: string
        modified: datetime
      of Device:
        name: string

@Araq
Copy link
Member

Araq commented Jul 17, 2018

Under my proposal this would become:

Entry = ref object
      case kind
      of File, Dir, Device: name: string
      of File, Dir:
        modified: datetime
      of File:
        size: int64

which is clearer. IMO anyway.

@ghost
Copy link

ghost commented Jul 17, 2018

IMO the original proposal is better and more easily readable than @Araq 's proposal as over there I can easily visually parse which fields a particular branch has by just looking at that branch, whereas in Araq's proposal I have to look at the complete object to know that.

@Araq
Copy link
Member

Araq commented Jul 17, 2018

Yeah and in my proposal you can easily see the "shared" aspect of the fields.

@dom96
Copy link
Contributor

dom96 commented Jul 17, 2018

Less code duplication wins in my book. So I have to vote for @Araq's proposal.

Especially since I assume implementing it is significantly easier.

@Bulat-Ziganshin
Copy link

Bulat-Ziganshin commented Jul 17, 2018

Pro @Araq proposal:

  • avoids redundancy
  • easier to implement

Pro classic syntax:

  • familiarity (it was used from Pascal to F#, while I don't know any language using Araq approach)
  • when I know that current tag is File, I can quickly find which fields are available. I hardly ever had opposite task - i.e. figure out in which modes field "date" is available
  • Currently there is no way to initialize object position-wise, i.e. File("io.c", 1, "1970-01-01"), rather than fieldname-wise, i.e. File(name:"io.c", size:1, date:"1970-01-01"). But if position-wise construction will be implemented, Araq approach will make it harder for human to find all fields for given tag and correctly use the constructor call.

So i feel that Araq proposal avoids redundancy (which is source of errors), but makes the type definition less readable. But all problems mentioned can be fixed with macroses. And while theoretically I can imagine various problems, practically it should be rarely an issue. The real definition in my archiver actually looks better with Araq style:

Entry = ref object
      case kind
      of File,Dir:
        name: string
        modified: datetime
        attr: uint32
        group: uint32
      of Dir:
        files: seq[Entry]
      of File:
        size: uint64
        crc32: uint32

So, my voice is to implement any proposal - this feature is important to have, while concrete syntax isn't of much issue. If users will be really unhappy with chosen syntax, alternative one may be added much later.

@andreaferretti
Copy link

I should mention that patty already supports a slightly more convenient syntax for defining variant types, and when this feature is available, it will be extended to support it, so it will translate

variant Entry:
  File(name: string, modified: datetime, size: int64)
  Dir(name: string, modified: datetime)
  Device(name: string)

into whatever syntax is chosen, for instance

type Entry = object
      case kind
      of File, Dir, Device: name: string
      of File, Dir:
        modified: datetime
      of File:
        size: int64

Whatever concrete syntax is chosen, one can always write a macro to improve on that, so I wouldn't worry too much about which syntax is best (both have advantages)

@Araq
Copy link
Member

Araq commented Jul 18, 2018

The problem with the non-Araq syntax is that it is totally misleading, in ML/Rust/etc the name field is always protected (can only be accessed by pattern matching as anything else would be unsafe) and not shared. You cannot access obj.name without a guard in ML while you can in Nim.

@Bulat-Ziganshin
Copy link

Bulat-Ziganshin commented Jul 18, 2018

First, both proposals are semantically equivalent and can be mechanically translated to each other (as @andreaferretti plans to implement in patty). You can put restrictions on top of it according to your taste.

I worked only with Haskell, where

data Entry = File {name::String, size:: Int, attr:: Int}
           | Dir {name::String, attr:: Int, files:: [Entry]}
  • defines constructor function File "io.c" 0 0x0 and constructor with record syntax File {name="io.c", size=0, attr=0x0} (ditto for Dir)
  • defines attribute access functions name :: Entry -> String, files :: Entry -> [Entry]... which fails when entry has improper tag
  • allows to update attributes with record syntax entry {files = []} which fails when entry has improper tag, too
  • allows pattern-matching with syntaxes similar to both constructors, i.e. File aname asize _ -> print (aname,asize) and File {name=aname , size=asize} -> print (aname,asize)
  • declaring the same fieldname with different types in different tags is prohibited

BTW, I just realized that people married to FP like me will anyway use patty and don't care about compiler syntax. And if classic syntax is really harder to deal, implementing @Araq syntax in compiler and classic syntax in library - will mean shifting one more complexity from compiler to library.

@jacereda
Copy link

jacereda commented Nov 22, 2018

How about this one?

type Entry = object
  kind: EntryKind

  case kind
  of File, Dir, Device:
    name: string

  case kind
  of File, Dir:
   modified: datetime

  case kind
  of File:
   size: int64

It's a bit cumbersome, but I suspect codegen could be easier that way. Things like the GC traversal will just generate as many switch statements as you have specified in your type.

@alehander92
Copy link

@Araq so is there an agreed design here? I remember you proposed using if instead of case for shared fields.
(I still think shared fields instead of kind=>fields is less useful & harder to codegen, but more important is to at least get one of those)

@PMunch
Copy link

PMunch commented Dec 11, 2018

I prefer Araq's design. The pros of deduplication (which help prevent errors), clarity of shared fields, and potentially easier to implement outweighs the cons of slightly unfamiliar syntax IMO. It's too bad that it differs from regular case syntax though, maybe case for code could support the same thing?

@alehander92
Copy link

@PMunch I don't think unfamilar syntax is the problem: one can use if a.kind in {E, F} and still make it: regular case(or this case) should not be changed to accomodate it IMO.

Deduplication might be nice, but on the other hand you'll easily forget to add all subbranches for the same value .

Otherwise, it's just a matter of how often does one need to

  • see shared fields
  • see fields for a given kind

I think the second usecase is way more useful and important: otherwise I need to always read the whole type to find out all possible fields for A.

However that's something that need measure

@narimiran narimiran transferred this issue from nim-lang/Nim Jan 2, 2019
@ghost
Copy link

ghost commented Jul 21, 2019

It does not work for my use case. I am parsing JSON-serialized data that looks like this:

[
                {
                        "type": "TypeA",
                        "address": 23
                        // has an integer as address
                },
                {
                        "type": "TypeB",
                        "address": "foo"
                        // has a string as address
                },
                {
                        "type": "TypeC",
                        "model": "bar"
                        // does not have an address attribute
                }
]

The only way to properly represent this in nim objects is by using the same attribute name in different variants.

To use your pattern the JSON data would have to look like this:

[
                {
                        "type": "TypeA",
                        "a": { "address": 23 }
                        // has an integer as address
                },
                {
                        "type": "TypeB",
                        "b": {"address": "foo" }
                        // has a string as address
                },
                {
                        "type": "TypeC",
                        "c": { "model": "bar" }
                        // does not have an address attribute
                }
]

@krux02
Copy link
Contributor

krux02 commented Jul 22, 2019

@petabyteboy That looks to me more like an issue for the json (de)serializer.

@awr1
Copy link

awr1 commented Jul 31, 2019

The problem I keep thinking in the back of my head is "how are the fields actually positioned in memory" for case sections with common fields?

Suppose we have this object:

type Object* = object
  case kind: Kind
  of File:
    modified: Date
    size: int64
    name: string
  of Directory:
    modified: Date
    name: string
  of Device:
    uuid: UUID

What is the underlying organization? Is it:

union {
  struct {
    Date modified;
    int64 size;
    string name;
  } u_File;
  struct {
    Date modified;
    string name;
  } u_Directory;
  struct {
    UUID uuid;
  } u_Device;
};

Where we "just don't care" about the underlying layout?

Or is it more like:

union {
  struct {
    Date modified;
    int64 size; // access prohibited whilst kind == Directory?
    string name;
  } u_File_Directory;
  struct {
    UUID uuid;
  } u_Device;
};

where we make an attempt to keep the location of mutual fields unified?

@akbcode
Copy link

akbcode commented Dec 2, 2019

As long as it gets in, I don't mind which proposal is accepted. I think having macros for creating ADTs and pattern matching in the standard library would be nice.

@Skaruts
Copy link

Skaruts commented Apr 11, 2020

I agree, having either one would be better than nothing.

Why not both, though?

Opinions are divided and I can see why: the initial proposal seems more readable for simpler objects, while Araq's seems more readable (and more handy) for complex ones. As been said, one is more reliable to implement than the other, but from the user standpoint this discussion makes it seem like both are equally valid. Imo, user comfort should be a core principle in the production of any tool (and that seems to be true in Nim, in many cases), so this should weigh in the decision.

So unless the hard option is actually unreasonably unreliable to implement, why not go for both?

(Since Araq's proposal is easier, it could be implemented first, and then the other one could be implemented in a "will be ready when it's ready" kind of way.)

@hiteshjasani
Copy link

hiteshjasani commented Apr 20, 2020

I would love to have this feature and I expect I'll use variant types more often.

Do either of the proposals make it easier to have generic code do json roundtrip conversions for variants? Currently marshalling is handled generically for all variants, but you have to write custom code to unmarshall a json variant type. See this gist. My bad, I tested in 1.0.6 but this works in 1.2.0+

Either proposal is fine with me. Sure there are readability considerations on both sides. But the bigger question to me is how much additional custom code will I have to write to do normal conversions with each new variant type I add?

It is better to have 100 functions operate on one data structure than 10 functions on 10 data structures. - Alan Perlis http://www.cs.yale.edu/homes/perlis-alan/quotes.html

@zedeus
Copy link

zedeus commented Apr 20, 2020

@hiteshjasani from Nim 1.2.0 your gist is wrong, it produces the expected result.

@ghost
Copy link

ghost commented May 24, 2020

Today I also hit the same issue as @petabyteboy when considering using object variants for TDLib JSON object deserialization - objects in TDLib scheme can share some fields under the same name and type, so that even Araq's proposal would've worked fine. Example of tdlib schema:

//@class AuthenticationCodeType @description Provides information about the method by which an authentication code is delivered to the user

//@description An authentication code is delivered via a private Telegram message, which can be viewed in another client @length Length of the code
authenticationCodeTypeTelegramMessage length:int32 = AuthenticationCodeType;

//@description An authentication code is delivered via an SMS message to the specified phone number @length Length of the code
authenticationCodeTypeSms length:int32 = AuthenticationCodeType;

//@description An authentication code is delivered via a phone call to the specified phone number @length Length of the code
authenticationCodeTypeCall length:int32 = AuthenticationCodeType;

//@description An authentication code is delivered by an immediately cancelled call to the specified phone number. The number from which the call was made is the code @pattern Pattern of the phone number from which the call will be made
authenticationCodeTypeFlashCall pattern:string = AuthenticationCodeType;


//@description Information about the authentication code that was sent @phone_number A phone number that is being authenticated @type Describes the way the code was sent to the user @next_type Describes the way the next code will be sent to the user; may be null @timeout Timeout before the code should be re-sent, in seconds
authenticationCodeInfo phone_number:string type:AuthenticationCodeType next_type:AuthenticationCodeType timeout:int32 = AuthenticationCodeInfo;

In JSON that looks something like that:

{
  "@type": "authenticationCodeInfo",
  "phone_number": "+123456789",
  "type": {
    "@type": "authenticationCodeTypeCall",
    "length": 15
   },
  "next_type": {
    "@type": "authenticationCodeTypeFlashCall",
    "pattern": "somestring"
   },
   "timeout": 15
}

I guess I'll have to generate a bit more sophisticated deserialization than just using json.to for now :)

@disruptek
Copy link

disruptek commented May 25, 2020

@Clyybber has a good proposal:

type
  NodeKind = enum
    foo
    bar
    bif
    baz
    bad

  Node = object
    case kind: NodeKind
    of foo, bar, baz:
      children: seq[Node]
    of bad, bif:
      dad: Node

    case kind
    of foo:
       additionalFieldForFoo: int
    of bar:
      onlyValidForBar: float
    else: discard

I like this because it's explicit, behaves as you might expect for a case
statement, and the semantics of when the fields appear are quite clear.

@Clyybber
Copy link

Ironically, I prefer Araq's proposal (#19 (comment)) over mine :D

@ghost
Copy link

ghost commented May 25, 2020

@disruptek seems like this is the same as in #19 (comment) :)

@tontinton
Copy link

Is this going to happen?

@Araq
Copy link
Member

Araq commented Oct 7, 2020

Yes. Eventually.

@disruptek
Copy link

I like the brevity of @Araq's version but c'mon, do we really want case syntax to vary here? I still don't think that's a smart design choice, especially when I consider meta-programming these constructs.

@inv2004
Copy link

inv2004 commented Jan 10, 2021

What do you think about @Araq's variant, which is unwrapped into @Bulat-Ziganshin variant, but it will raise an error if we there is the same field with different type?

I started to look at the RFC, and it is clear that @Araq's version is more type-safe, but during the implementation I found that it is necessary:

  1. to change case-cover-check behavior
  2. unwrap the a bit special case into separate variants to generate structs in the union

After it I decided that it is probably a bit more natural (and easier) to build structs like it is done already, but to raise an error if type is not the same. It is not necessary to create and explain the special case for the type kinds, at least that coverage check is a bit different. Also, it would help to have a bit more clear mapping from Nim to C code.

Entry = ref object
      case kind
      of File:
        name: string
        modified: datetime
        size: int64
      of Dir:
        name: string
        modified: time  // errormsg: "type for the same field cannot be redefined"
      of Device:
        name: string

Correct me if I am wrong - it is my first 60minutes in Nim's compiler code.

@Bulat-Ziganshin
Copy link

  1. statically-typed languages are necessarily shoiuld have all fields of the same name having same type - otherwise e.modified type will be unknown at compile-tyime and you suddenly get a dynamically-typed language
  2. Allow usage of the same attribute name in different branches within variant types #19 (comment) mentioned another issue with my (i.e. classic) proposal - it leaves memory structure unobvious. should we have fixed position for fields with the same name in different cases? or we should obey order of the original definition?
  3. I still insist on the pros/cons list I have given in Allow usage of the same attribute name in different branches within variant types #19 (comment)

@Bulat-Ziganshin
Copy link

Can we have both mine and Araq (#19 (comment)) proposals implemented? I bet his version would be useful for code generation and code clarity since it ensures that each field has its type written only once - it may be important for anonymous types. Also, can we allow both

  1. explicitly specify the discriminator type as in case kind of KindType
  2. make it anonymous type as in case kind whose values are automatically deduced from the case labels?

As Pascal and ML languages shown, the second version is enough in most cases.

And I just realized that we can't have guarantee fixed field position with any proposal - sometimes we just don't have layouts for all cases that's compatible for same-named fields.

@treeform
Copy link

I fully support this. For me this is the biggest stumbling block with case variant objects and the reason I don't really use them. If this was fixed much of my code could be converted to them.

@inv2004
Copy link

inv2004 commented Jan 10, 2021

@Bulat-Ziganshin

  1. It is exactly why the errormsg in my example
  2. I do not see generic solution for it:
type Object* = object
  case kind: Kind
  of File:
    modified: Date
    size: int64
    name: string
  of Directory:
    modified: Date
    name: string
    uuid: UUID
  of Device:
    uuid: UUID

The only solution I see is to have all available fields in the struct, and restrict access to separate fields, but I suppose it is not good idea and brakes the main idea of unions. It would help to save field during conversion from one kind to another, but it is prohibited.

I think that the anonymous types is much longer story that they would have the same type-safe problem @Araq mentioned in the RFC until Nim has access to the fields(or the anonymous-types) without patter-matching guards.

@Araq
Copy link
Member

Araq commented Jan 11, 2021

Ironically, I prefer Araq's proposal (#19 (comment)) over mine :D

I prefer @Clyybber 's proposal over mine.

@ErikSchierboom
Copy link

I agree with @petabyteboy:

I think the initial proposal is least confusing, because this is what I was intuitively trying to do when I encountered this issue.

So from the principle of least surprise, I would be in favor of the original proposal. That being said, I'd be fine with Araq's proposal, while I'm less enthusiastic about @Clyybber's proposal.

@Q-Master
Copy link

Same for me, I think that initial proposal is good and intuitive while needs more text to write, but @Araq's proposal is ok too. Multiple case statements as in @Clyybber's one are too much confusing and least readable.

@Bulat-Ziganshin
Copy link

Bulat-Ziganshin commented Mar 10, 2021

I just recalled that Pascal allows to specify common fields BEFORE the case part, which is enough for most use-cases. So our example can be written as:

Entry = ref object
    name: string
    modified: datetime
    attr: uint32
    group: uint32
    case kind
    of File:
        size: uint64
        crc32: uint32
    of Dir:
        files: seq[Entry]

Moreover, Pascal also allows to make nested variable part, thus specifying common part for several variants

Type  
  MyRec = Record  
          X : Longint;  
          Case byte of  
            2 : (Y : Longint;  
                 case byte of  
                 3 : (Z : Longint);  
                 );  
          end;

@planetis-m
Copy link

planetis-m commented Mar 12, 2021

The more I used object variants the more I prefer @krux02 proposal. Eventually it would be nice to have sum types in the language. Note, @mratsim has already made an excellent module that emulates classes/interfaces with this exact concept. Now the only thing missing, is to make a patty fork that works with these proposal.

@Araq
Copy link
Member

Araq commented Apr 15, 2021

#368 is the winning proposal.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests