Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deprecate == for refs and procs #224

Closed
Araq opened this issue May 15, 2020 · 20 comments
Closed

Deprecate == for refs and procs #224

Araq opened this issue May 15, 2020 · 20 comments

Comments

@Araq
Copy link
Member

Araq commented May 15, 2020

Equality is hard and sometimes there is not one kind of equality, but multiple. The Nim compiler is a good example of this, whenever you write a == b where a, b: PNode it's likely a bug. Instead some tree structure comparsion should be used.

Motivation

Problem: .error doesn't work that well

The default ref equality is error prone and even .error doesn't work all that well:

# module A

proc `==`(a, b: PNode): bool {.error: "use sameTrees or similar instead".}
# module B

echo (n == m) # problem: B didn't import A and so there is no error here

Problem: == for refs is inconsistent with hash

Because the standard library knows that equality is hard, no hash for ref T is provided in hashes.nim, that's good but it's inconsistent with ==.

Problem: Function merging is less effective

Generic instantiations can often lead to the same binary code (for example fac[int] and fac[BiggestInt] can be identical) but the optimizer cannot blindly merge the identical functions if the language spec requires that f != g for proc f; proc g. At the same time, hardly any valid usages for function equality are known; it suffices if isNil(f) works.

Proposal: Deprecate == for ref and procs

Instead of opt-out you have to opt-in in order to get reference equality for refs. There is a new operation, system.equalRefs for the pointer-like reference equality. Then it's particularly easy to
implement a custom == operator:

proc `==`(a, b: PNode): bool = equalRefs(a, b) 

Alternatives considered

Conceptually equality is tied to a type much like assignments and destructors. If we bind equality to the type internally, the .error solution starts to work too. But then the same applies to hash and $ and we need to support type bound operations properly. So it would add a new concept to Nim or at least expand an existing concept. Removing a very questionable default seems to be the better solution. (But I personally want type bound operations too...)

@RSDuck
Copy link

RSDuck commented May 15, 2020

I agree that there has to be done something about this.

Adding new operators for different kinds of equality is common practice in scripting languages (though mostly as a fixup for bad language design think of === 😛).

Maybe new operators for value comparison could be added? Another idea would be for example for hash tables, to keep the default behaviour (hash and compare the pointer not the value it stands for), but add some kind of wrapper type, like this:

type CompareDeref[T: ref] = object
  reference: T

converter getRef[T](refVal: CompareDeref[T]) = refVal.reference

proc `==`[T](a, b: CompareDeref[T]) =
  a[] == b[]
...

These are just some ideas I have on this topic. Though the utility of ref values as keys is debatable considering the dangers (making the datastructure invalid by changing the value externally...).

@Araq Araq changed the title Deprecate == for refs and closures Deprecate == for refs and procs May 16, 2020
@disruptek
Copy link

Two ref variables that point to the same region of memory are not merely equal, they are identical. Why can't we just have is?

@Araq
Copy link
Member Author

Araq commented May 17, 2020

is is already used for when x is SomeType but yes, we can overload is to make it work.

@RSDuck
Copy link

RSDuck commented May 17, 2020

I think that wouldn't be a very good idea, since it breaks consistency.

(in an alternative reality where programming languages were german, das gleiche und das selbe would solve this problem quite well)

@Skrylar
Copy link

Skrylar commented May 21, 2020

  • I don't really like it, since if i say two stack instances of an object are == then it means they (by default) have identical contents. If two refs point to the same object then the ref has identical contents (as in addresses.) It's probably not what was meant but it is consistent.

  • Maybe an === to represent bit-equality and == to mean semantic equality. Semantic equality on a ref would then just be the same as a[] == b[] since that is most likely what someone means anyway.

@planetis-m
Copy link

I usually overload == to this:

proc `==`(self, other:MyRefT): bool =
   if self.isNil: other.isNil
   elif other.isNil: false
   else: self[] == other[]

not everyone may want this, so deprecating is fine.

@timotheecour
Copy link
Member

timotheecour commented May 21, 2020

Problem: .error doesn't work that well

  • that has everything to do with lack of custom type bound operations, == for ref types is just a consequence of it; eg: you can make exact same argument for proc $*(info: TLineInfo): string {.error.} = discard + others

Type bound operations are already supported for =destroy and = so it wouldn't really be a new language feature, more of a generalization of an existing language feature. That's the right approach and we should do that instead (but definitely with an RFC before; eg there needs to be a way to optin/optout)

  • your proposal would imply that t == nil (or t != nil) doesn't work anymore; breaking lots of valid code

  • why allow == for ptr types but not ref types?

Semantic equality on a ref would then just be the same as a[] == b[] since that is most likely what someone means anyway.

that would be a large breaking change; == for ref types is (unless overridden) not the same as a[] == b[] and is also not what I mean when I write a == b.

they are identical. Why can't we just have is?
is is already used for when x is SomeType but yes, we can overload is to make it work.

is should keep its meaning. is is not a symmetrical relationship (eg int is SomeInt is true but SomeInt is int is false), whereas == is symmetrical, for all practical purposes.

Furthermore, that'd be dangerous because of converters, or concepts (which interchange type(x) and x) or a recent change which allows foo.x is bar to mean type(foo.x) is bar; even if it would cause no actual ambiguity (which remains to be proven), it would likely cause unintended ambiguities. To paraphrase your quote, "when all types are compatible, typechecking can't help you find bugs"

Problem: Function merging is less effective

that RFC is not needed for that; see instead #229 which instead deprecates implicit conversion procvar => pointer and ptr T => pointer, which as a side effect will disallow == for incompatible procvars or ptr T, but keep == for compatible procvars, ptr T, and ref T

hardly any valid usages for function equality are known

== nil may be more common for procs but I can't agree there are no valid usages, in particular for callback heavy code, debugging, intercepting etc; also cast[int](myproc) doesn't (yet) work in VM so can't be used as a workaround; all we need is #229

proposal

there are many differences with semantic equality, as shown below:

# in std/bitequals.nim and re-exported from std/bitops
when true:
  proc bitEqualsUntyped[A, B](a: A, b: B): bool =
    const na = A.sizeof
    const nb = B.sizeof
    return na == nb and equalMem(a.unsafeAddr, b.unsafeAddr, na)

  template bitEquals[T](a, b: T): bool = bitEqualsUntyped(a, b)
  
  doAssert 0.0 == -0.0
  doAssert not bitEquals(0.0, -0.0)
  doAssert bitEquals(-0.0, -0.0)
  doAssert bitEqualsUntyped(-0.0, cast[uint64](-0.0))

  var a = 0/0 # nan
  doAssert a != a
  doAssert bitEquals(a, a)

  block:
    var a: cstring = "abc"
    var b1 = "abc"
    var b = b1.cstring
    doAssert a == b
    doAssert not bitEquals(a, b)

(see also nim-lang/Nim#8436 for additional motivation + use cases for bitEquals, including for optimizing in several places)

@Araq
Copy link
Member Author

Araq commented May 21, 2020

your proposal would imply that t == nil (or t != nil) doesn't work anymore; breaking lots of valid code

Correct, t.isNil has to be used instead.

why allow == for ptr types but not ref types?

I consider ptr T low level but you can argue either way and for consistency I wouldn't mind to treat ptr the same as ref.

@RSDuck
Copy link

RSDuck commented May 21, 2020

there's another situation where the current == semantic is useful. Say you have some kind of collection and you want to find the entry of a ref.

@mratsim
Copy link
Collaborator

mratsim commented May 21, 2020

  • I don't really like it, since if i say two stack instances of an object are == then it means they (by default) have identical contents. If two refs point to the same object then the ref has identical contents (as in addresses.) It's probably not what was meant but it is consistent.

  • Maybe an === to represent bit-equality and == to mean semantic equality. Semantic equality on a ref would then just be the same as a[] == b[] since that is most likely what someone means anyway.

I am repeating @Skrylar but:

  • a value object is a value, == compares the value.
  • a ref object is just an address, == compares the address, and like every other languages, this means comparing the instance.

What is wrong with that?

I think the change is bad for 2 reasons:

  1. It breaks code
  2. When I opt for reference semantics, it's usually because I want reference semantics, i.e. no copy but sharing instances and comparing instances.

Furthermore a[] == b[] can be used for content comparison.

The issue is not that == is error prone but that reference semantics are tricky.

The solution is to improve knowledge, i.e. what is the difference between a value and a reference
not have ref object be half reference (on copy) and half value (on equality).

@Araq
Copy link
Member Author

Araq commented May 21, 2020

When I opt for reference semantics, it's usually because I want reference semantics, i.e. no copy but sharing instances and comparing instances.

There is little connection between "I want no copy" and "I want to compare identities and the machine address is fine as the identity". You can opt into comparisons but why should there be an error-prone default that is inconsistent with hash?

@mratsim
Copy link
Collaborator

mratsim commented May 21, 2020

Now you said the forbidden consistent word ;).

People can have their custom hash function for their types.

I don't know of any other language with types with reference semantics where ==:

  • is not defined
  • does not compare instances instead of content.

Also refs are often used for unique resources and you don't hash a socket or a database connection, you would need to write a custom hash anyway.

There are good reasons to depart from status quo in language design but for this specific case, consistency between hash and comparison doesn't seem enough for me.

If anything it's even worse than not being able to print $ref by default in the past as we had. Now to compare instances we would need to create a cast[pointer](a) = cast[pointer](b) even though we want to strongly discourage casts.

@Araq
Copy link
Member Author

Araq commented May 22, 2020

I don't know of any other language with types with reference semantics where ==:

Well that's why Nim has it but copying bad design is not good design.

Also refs are often used for unique resources and you don't hash a socket or a database connection, you would need to write a custom hash anyway.

And again, I see if socket == nil in ordinary code, but never if socket == otherSocket.

Now to compare instances we would need to create a castpointer = castpointer even though we want to strongly discourage casts.

No and how to do it is covered in my RFC.

@timotheecour
Copy link
Member

timotheecour commented May 22, 2020

I don't see the inconsistency with hash

  • by default, == uses address (and hashes.nim should be updated IMO to allo hash(ref|ptr) as shown below using address, which is consistent with how it already works for proc pointer)
  • custom code can customize == and hash for a given type (yes, this loops back to the need for type bound procs, to make code more robust to missing imports)

and btw that's what other languages do too (eg D, C++); custom == goes hand in hand with custom hash; all that's needed wrt to this point is clarify in docs that hash should be consistent with == so that if you overload == you should overload hash, unless you know what you're doing.

eg: https://dlang.org/spec/hash-map.html

If toHash must consistently be the same value when opEquals returns true. In other words, two objects that are considered equal should always have the same hash value. Otherwise, undefined behavior will result.

(ditto with java equasl vs hashCode, or C++)

import tables, hashes
proc hash(a: ref|ptr|pointer): Hash = cast[Hash](a)

type Foo = ref object
  id: int

type Bar = ref object
  id: int

proc `==`(a, b: Bar): bool = a.id == b.id
proc hash(a: Bar): Hash = a.id

block: # default case: identity = address
  var t: Table[Foo, int]
  let a = Foo(id: 3)
  let b = Foo(id: 3)
  t[a] = a.id
  doAssert a in t
  doAssert b notin t
block: # special case: identity = id
  var t: Table[Bar, int]
  let a = Bar(id: 3)
  let b = Bar(id: 3)
  t[a] = a.id
  doAssert a in t
  doAssert b in t

And again, I see if socket == nil in ordinary code, but never if socket == otherSocket.

i see node1 == node2 all the time with code involving graphs, loopy structures, compilers etc; types requiring == via some other mean (eg same .id field, like PIdent in compiler) are handled via custom procs + in future, type bound procs.

note

  • D also uses == based on address for classes, (vs content for struct); that's not something ppl complain about nor find confusing in D
  • ditto for java, to the extent it makes sense talking about addresses

summary

we'd just have:

  • === (aka bitEquals) for bit equality via memcmp (see above for typical cases where it differs from ==); no special case or overloading here.
    in particular, for ptr-like types (ref|ptr|pointer|cstring|proc of compatible types) it uses address comparison
  • == for semantic equality; it defaults to === comparison except for cstring where it uses strcmp; can be typically overloaded in which case overloading hash is recommended

@mratsim
Copy link
Collaborator

mratsim commented May 22, 2020

@timotheecour I agree with your points, though I don't see the need for ===, let's leave that as an open symbol for libraries. bitEquals can be implemented in bitops as a full name.

@Araq
Copy link
Member Author

Araq commented May 22, 2020

I consider this RFC as rejected and I'm waiting for more type-bound operators as the alternative solution.

@treeform
Copy link

I would not mind == doing a deep tree equality, same with the hashes module doing a deep tree hash. It turns language gotches into performance gotches. It pushes the complexity of writing code into complexities of profiling.

I live in the profiler anyways and would stop performance issues quickly, people that don't don't live in the profile don't care as much about performance. I think its a win-win for both groups.

@Araq
Copy link
Member Author

Araq commented Mar 23, 2023

ref object vs object is commonly used as a performance hack but these have very different semantics and ref uses address/pointer semantics and IMO should continue to do so. In an ideal world we would maybe have object {.asref.} instead to get the performance of ref object for passing it around while keeping the value smemantics, I don't know.

@al6x
Copy link

al6x commented Mar 23, 2023

Wouldn't the ideal case be when the compiler could detect that the copied value weren't changed so instead of copy it pass it as reference?

@Araq
Copy link
Member Author

Araq commented Mar 23, 2023

Yeah, yeah, yeah...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

9 participants