Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

should isequal(0.0, -0.0) == true? #18485

Closed
StefanKarpinski opened this issue Sep 13, 2016 · 41 comments
Closed

should isequal(0.0, -0.0) == true? #18485

StefanKarpinski opened this issue Sep 13, 2016 · 41 comments
Assignees
Labels
breaking This change will break code needs decision A decision on this change is needed
Milestone

Comments

@StefanKarpinski
Copy link
Member

Came up in #9381 but deserves its own issue: it's been proposed that positive and negative zero should hash and compare as equal, but that has an unfortunate interaction with sorting and stability.

@StefanKarpinski StefanKarpinski added needs decision A decision on this change is needed breaking This change will break code labels Sep 13, 2016
@StefanKarpinski StefanKarpinski added this to the 0.6.0 milestone Sep 13, 2016
@StefanKarpinski StefanKarpinski self-assigned this Sep 13, 2016
@colbec
Copy link

colbec commented Sep 15, 2016

Does it help or hinder to consider that in fact isequal might be two different ideas? Consider:

my_is_equivalent(2+2,4) is true (post evaluation)
my_is_the_same_as(2+2,4) is false (prior to evaluation)

It is perhaps unfortunate that multiplication propagates the negative for reals: 0.0 * -0.0 evaluates to -0.0; it is ok for integers.

With respect to 0.0 and -0.0 it is a similar problem to that of "is noon AM or PM?" In fact it is neither, however we often label noon pm because 11:59AM,12:00AM,12:01PM makes less sense than 12:00PM in that context.

@JeffreySarnoff
Copy link
Contributor

I like this question.

# v0.5

nan1 = Inf/Inf;  nan1 = -0.0*Inf;  pz = 0.0; nz= -0.0;

nan1==nan2, nan1===nan2, isequal(nan1,nan2)
# (false, true, true)
pz==nz, pz===nz, isequal(pz,nz)
# (true, false, false)

# for a.b =Nan, NaN and for a,b = 0.0, -0.0
# (==)(a,b) = !(===)(a,b)  so  a == b  iff a and b differ

with (0.0 == -0.0) === true,
only NaNs have the property that equalsness is given with nonidentity
and inequalsness is given with identity

That is reason-about-able for entities that are NotNumber (excluded from numerosity).
And well avoided for entities that of numerousness or of absenct numerousness.

I favor that Julia specify isequal( 0.0, -0.0 ) is true.

@eschnett
Copy link
Contributor

@JeffreySarnoff Do you then agree that sorting of numbers doesn't distinguish between positive and negative zero any more, and that hashing returns the same value, and that dictionaries can hold only one key for zero, etc.?

@JeffreySarnoff
Copy link
Contributor

@eschnett no.

branch-cut sanity carries utility and pervades reliability more than does zeroing the zeros

pz = 0.0; nz = -pz;

showcompact( (atan2(nz,nz), atan2(nz, pz), atan2(pz, pz), atan2(pz, nz)) )
# ( -3.14159,-0.0, 0.0, 3.14159 )

object_id( pz ) != object_id( nz ) # now, and my forward preference
with this, object_id dicts hold distinct keys for 0.0 and -0.0

@JeffreySarnoff
Copy link
Contributor

@eschnett One cannot sort handedness nor parity, one may group like handednesses and same parity.

This is not a question:

taitjai

Which color precedes?

@KristofferC
Copy link
Member

White.

@JeffreySarnoff
Copy link
Contributor

The lower White or the left White?

@KristofferC
Copy link
Member

That's a tricky one. Will have to come back to you on that.

@eschnett
Copy link
Contributor

@JeffreySarnoff Did you answer my question?

Let me rephrase things:

  • What should issorted([0.0, -0.0]) return?
  • Should hash(0.0) == hash(-0.0)?
  • Can Dict dictionaries hold different values for keys 0.0 and -0.0?

@JeffreySarnoff
Copy link
Contributor

@eschnett alright

Ought isequal( 0.0, -0.0 ) be true? Yes.

Must object_id( 0.0 ) differ from object_id( -0.0 )? Yes.

Must some Dict dictionaries hold different values for keys 0.0 and -0.0? Yes.
May other Dict dictionaries hold the same values for keys 0.0 and -0.0? Yes.
Should we allow Dict dicts to hold the same values for keys 0.0 and -0.0? Yes.

isequal(x,y) must imply that hash(x) == hash(y). (the docs)

Should hash(0.0) == hash(-0.0)? Yes.

@JeffBezanson
Copy link
Member

Must some Dict dictionaries hold different values for keys 0.0 and -0.0? Yes.

That's fine for ObjectIdDict, but otherwise contradicts isequal(0.0,-0.0) being true. Unless we add a fourth standard equality predicate.

@JeffreySarnoff
Copy link
Contributor

JeffreySarnoff commented Sep 17, 2016

@JeffBezanson you intuited my intent. As long as ObjectIdDict is available, the Must some Dict dictionaries statement is satisfied, and happiness suffuses logic.

@eschnett
Copy link
Contributor

Dict is a type in Julia; if you use Dict to create a dictionary, you will not get an ObjectIdDict. Thus my expression "Dict dictionary", as I did not want to discuss ObjectIdDict.

@TotalVerb
Copy link
Contributor

TotalVerb commented Sep 18, 2016

Here's a crazier but more general idea: might it be possible to rename the current Dict type

immutable HashDict{K,V,Hft,Eft} <: Associative{K,V}
    hash::Hft
    isequal::Eft
    ...  # as present
end

then define

typealias Dict{K,V} HashDict{K,V,typeof(hash),typeof(isequal)}

and thus allow users to supply their own hash and equality functions, kind of like how C++ does it?

@vtjnash
Copy link
Member

vtjnash commented Sep 18, 2016

Yes, it's possible and works like you surmised. Here's a complete demo implementation I had toyed with a bit a couple months ago: fce28b4

@StefanKarpinski
Copy link
Member Author

I don't really want to do that – I think it's a case of serious over-parameterization of the type. If you want to use custom hashing for your dict, you should probably call a transformation function on your keys before inserting them; it's much simpler and easier to reason about. It also doesn't address this problem since we still need to provide an isequal function that works well.

@TotalVerb
Copy link
Contributor

One parameter could be removed by keeping only the hash function, and computing the isequal function from it. This would also prevent using a hash function with the incorrect isequal companion function.

@StefanKarpinski
Copy link
Member Author

Ok, but this is totally tangential since however Dict works, we still need to design the way the isequal function works in the first place. Saying "you can use whatever equality predicate you want" isn't solving anything – it's just sweeping the problem under the proverbial rug.

@TotalVerb
Copy link
Contributor

@StefanKarpinski Point understood. The Dict issue is orthogonal to what the best behaviour for isequal should be. I lean towards making 0.0 and -0.0 equal and hash equal — this seems far less surprising mathematically. Does sorting really require isless and isequal to be consistent? I believe many other languages make do with only a single comparison predicate.

@TotalVerb
Copy link
Contributor

Also, if things as different as 0 and 0.0 are going to be considered equal, then 0 and -0.0 ought to be also. Especially since -0.0 is a better "exact zero" than 0.0 is in IEEE floating point arithmetic.

julia> isequal(0, 0.0)
true

julia> isequal(0, -0.0)
false

@JeffBezanson
Copy link
Member

Especially since -0.0 is a better "exact zero" than 0.0 is in IEEE floating point arithmetic

How so?

@TotalVerb
Copy link
Contributor

-0.0 is a better neutral element for summations. We have -0.0 + 0.0 = 0.0.

Perhaps it's an exaggeration to say that -0.0 is a better exact zero, since -1.0 + 1.0 = 0.0. But -0.0 is just as zero as 0.0 is, and shouldn't be !isequal to integer zero.

@oscardssmith
Copy link
Member

Consider that (-0.0)+(-0.0)-(-0.0)==0.0 To me this indicates that -0.0 really should equal 0.0, as otherwise addition that really should be commutative isn't. I would similarly suggest that hash(-0.0)==hash(0.0) and that dicts should treat them the same way.

@JeffBezanson
Copy link
Member

My current thinking is that we should not make this change. Here's a run-down:

  • 0.0 and -0.0 give different results for some well-defined, perfectly valid operations such as 1/x and many complex operations involving branch cuts. So they're meaningfully different.
  • If a certain application wants to equate 0.0 and -0.0 for use as dict keys, it's easy to use x==0 ? abs(x) : x as the key. But if isequal equates the values and you want to distinguish them, that's harder to get.
  • If we're going to keep both == and isequal, we might as well make them consistent with IEEE equals and totalOrder, respectively. Keeping isequal just for NaNs is not a huge win. If the plan involved getting rid of isequal completely, it might be worth a bit of ugliness.

@JeffBezanson
Copy link
Member

this indicates that -0.0 really should equal 0.0, as otherwise addition that really should be commutative isn't

That is arguably handled by ==.

@oscardssmith
Copy link
Member

My point is just that given how many seeming tautologies applied to -0.0 result in 0.0 could cause some really confusing results (although I guess that mainly is because using floats as dict keys is an absolutely horrible idea)

@TotalVerb
Copy link
Contributor

Nevertheless, it is still confusing that isequal(0, 0.0) but not isequal(0, -0.0). If equality must be transitive, then perhaps !isequal(0, 0.0) is the better option.

@JeffBezanson
Copy link
Member

it is still confusing that isequal(0, 0.0) but not isequal(0, -0.0)

Yes, I can kind of see that, but once you accept that isequal distinguishes negative zero it makes sense. 0 certainly isn't a "negative zero", but in fact could be with a signed-magnitude integer type, so this doesn't bother me so much.

@oscardssmith
Copy link
Member

but consider that isequal(-0, -0.0) is false.

@JeffBezanson
Copy link
Member

But -0 === 0; the expression used to compute a value is immaterial.

@JeffreySarnoff
Copy link
Contributor

I like this question.

  • 0.0 and -0.0 give different results for some well-defined, perfectly valid operations such as 1/x and many complex operations involving branch cuts. So they're meaningfully different.

Yes, and that is a [meaningful operational] difference without a [measure theoretic] distinction. That they are different hither and the same yon is meaningfuller.

   -  they differ as ObjectIDentifiable  symbols
    - additively, they are the same
    - multiplicatively, they differ
    - they are the same under linear ordering, 
           (without nonzero measure, parity  is as orientation -- intrinsically unorderable)

If a certain application wants to equate 0.0 and -0.0 for use as dict keys, it's easy to use x==0 ? abs(x) : x as the key. But if isequal equates the values and you want to distinguish them, that's harder to get.

Reading that standing on my head,: the branch cuts and the multiplictive resolution of odd signedness are handled for almost all applications that are not writing a floating point math library. The likelihood is members of the Julia community and a user of a Julia products for their own purposes consider them an incomplete sameness rather than as not completely different. Relying on the coherently consistent branch cuts that are availble with 0.0, -0.0 requires onlty that !(0.0 === -0.0), irrespective of the truth valueisequal( 0.0 , -0.0) yeilds. As one is of object non-identitty and iesequal(0.0, -0.) is is of the inststinguishability of lattice-free measures zero, they need [qua ought] not agree..

If we're going to keep both == and isequal, we might as well make them consistent with IEEE equals and totalOrder, respectively. ...

Doing so assigns truth to each of these expressions

  (0.0 == -0.0),  !( 0.0 === -0.0 ),  isequal(0.0, -0.0)

We are in agreement.

@StefanKarpinski
Copy link
Member Author

StefanKarpinski commented Sep 26, 2016

@TotalVerb: Does sorting really require isless and isequal to be consistent?

Yes, since otherwise if you change from a hash-based dictionary data structure to a tree-based one, you'll get different behavior, which is awful. Not having isless and isequal be consistent would be pretty bad and whether you have another name for it or not, it implicitly introduces yet another ordering.

I believe many other languages make do with only a single comparison predicate.

Lots of languages only have a few names for their orderings. But they usually have a lot of implicit orderings created by the rampant inconsistencies between the different orderings that they actually exhibit. There's a reason that equality is the first place to look for "wat" behavior in languages.

Given @JeffBezanson's arguments and the fact that no clever solutions to the sorting issues raised by isequal(-0.0, 0.0) have been forthcoming, I'm inclined to agree that we shouldn't do this.

@TotalVerb
Copy link
Contributor

That argument makes sense to me.

@JeffreySarnoff
Copy link
Contributor

JeffreySarnoff commented Sep 28, 2016

Actually, the sorting issue makes some sense to me as well.

import Base: unbox, slt_int, eq_float, isless, isequal


for (T,I) in [(:Float64, :Int64), 
              (:Float32, :Int32), 
              (:Float16, :Int16)]
    @eval begin
        isequal(a::$T, b::$T) = eq_float(unbox($T,a), unbox($T,b))
        function isless(a::$T, b::$T)
            ia = reinterpret($I, a)
            ib = reinterpret($I, b)
            ifelse( signbit(ia) & signbit(ib), 
                    slt_int(ib,ia), slt_int(ia,ib) )
        end
    end
end


floatvec = [-Inf,-100.0,-1.0,-0.5,-0.0,0.0,0.5,1.0,100.0,Inf];
floatvec == sort(floatvec)  # true
sort([NaN, Inf, -Inf])'     # [-Inf, Inf, NaN] (as now)

issorted([ -0.0, 0.0 ]), issorted([ 0.0, -0.0 ]) # (true, false)

-0.0 === 0.0, 0.0 === -0.0              # (false, false)

-0.0 == 0.0, 0.0 == -0.0                # (true, true)
-0.0 < 0.0,  0.0 < -0.0                 # (false, false)
-0.0 <= 0.0,  0.0 <= -0.0               # (true, true)

isequal(-0.0, 0.0), isequal(0.0, -0.0)  # (true, true)
isless(-0.0, 0.0), isless(0.0, -0.0)    # (true, false)

that's all -- thanks for the active discussion

@stevengj
Copy link
Member

stevengj commented Oct 1, 2016

Seems like this is resolved in favor of preserving the current behavior isequal(+0.0, -0.0) == false, in which case this issue can be closed.

@StefanKarpinski
Copy link
Member Author

This seems resolved, and (for once) conveniently in a way that requires no action.

@JeffreySarnoff
Copy link
Contributor

ok. for my own edification: what was the drawback to the implementation I
posted responding to consistent sorting of negatives before nonnegatives?

On Thu, Oct 27, 2016 at 1:28 PM, Stefan Karpinski [email protected]
wrote:

This seems resolved, and (for once) conveniently in a way that requires no
action.


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#18485 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/ABmqxtvyRIfXyXnkfqa_BQeWj5ljuvsQks5q4N9MgaJpZM4J8CCv
.

@StefanKarpinski
Copy link
Member Author

That's a redefinition of what sorting means, not an implementation of stable sorting.

@StefanKarpinski
Copy link
Member Author

@BioTurboNick
Copy link
Contributor

BioTurboNick commented Nov 6, 2018

For newcomers who may be curious about this issue and finding this discussion, would it be accurate to say that, after applying type promotion rules:

  • isequal(x, y) assesses bit equality
  • x == y assesses arithmetic equality

?

@StefanKarpinski
Copy link
Member Author

Roughly, except that isequal considers all NaNs to be equal even though they can have many different bit patterns.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking This change will break code needs decision A decision on this change is needed
Projects
None yet
Development

No branches or pull requests