Array hashing breaks equality on 0.7 #26034

chethega · 2018-02-13T17:01:50Z

# A type that compares and hashes like the integer 5 without being a Number.
struct totally_not_five end

Base.isequal(::totally_not_five, x) = isequal(5, x)
Base.isequal(x, ::totally_not_five) = isequal(5, x)
Base.isequal(::totally_not_five, ::totally_not_five) = true
Base.hash(::totally_not_five, h::UInt) = hash(5, h)

import Base: ==
==(::totally_not_five, x) = (5 == x)
==(x, ::totally_not_five) = (5 == x)
==(::totally_not_five, ::totally_not_five) = true

n5 = totally_not_five()

############
versioninfo()
#Julia Version 0.7.0-DEV.3961
#Commit 12964e839e* (2018-02-13 10:42 UTC)

5== n5 #true
isequal([4,n5,6], [4,5,6]) #true 
isequal(hash([4,n5,6]), hash([4,5,6])) # false
isequal(hash([n5,4,n5,6]), hash([n5,4,5,6])) #true 

############ 
versioninfo()
#Julia Version 0.6.2
#Commit d386e40 (2017-12-13 18:08 UTC)

5== n5 #true
isequal([4,n5,6], [4,5,6]) #true 
isequal(hash([4,n5,6]), hash([4,5,6])) # true
isequal(hash([n5,4,n5,6]), hash([n5,4,5,6])) #true

I fear that this is fundamental to the current approach for O(1) range hashing (but I would be happy to be corrected!).
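The failing line above can be read as a violation of the usual contract that `isequal(a, b)` implies `hash(a) == hash(b)`, which `Dict` and `Set` rely on. A minimal sketch of that check follows. `FiveLike` and `respects_contract` are hypothetical names used only for illustration, and the array result depends on the Julia version in use.

```julia
# Hypothetical stand-in for the type in the report (illustration only).
struct FiveLike end

Base.isequal(::FiveLike, x) = isequal(5, x)
Base.isequal(x, ::FiveLike) = isequal(5, x)
Base.isequal(::FiveLike, ::FiveLike) = true
Base.hash(::FiveLike, h::UInt) = hash(5, h)

# The contract: whenever two values are isequal, their hashes must agree.
respects_contract(a, b) = !isequal(a, b) || hash(a) == hash(b)

f = FiveLike()

# Scalar case: holds by construction of the hash method above.
scalar_ok = respects_contract(f, 5)

# Array case: this is the check that failed on the 0.7 development build.
array_ok = respects_contract(Any[4, f, 6], [4, 5, 6])

# Why it matters: a Dict finds a key by hash first, then by isequal.
lookup = get(Dict{Any,Int}(5 => 1), f, 0)
```

If `array_ok` is false, a `Dict` keyed by arrays could fail to find an entry that `isequal` says is present.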


mbauman · 2018-02-13T17:46:43Z

So you're defining a type that is equal to a number and hashes like a number... but it isn't itself a number and doesn't support subtraction? Yeah, I can see how that'd get you into trouble in many situations. I don't think this is really actionable… or that we'd ever guarantee such a type would work everywhere.

nalimilan · 2018-02-13T17:53:50Z

Indeed, since #16401 a type should not be equal to a number if it doesn't support -. Honestly, that's not the biggest difficulty with that approach. :-)

chethega · 2018-02-13T18:45:10Z

Oh, thanks. I apparently missed the warning that if there exists y isa Number such that isequal(x, y)===true then it is assumed that x isa Number; and, for all types <:Number, widen(x) + widen(y) must be well-defined mod isequal, i.e. isequal(x1,x2) and isequal(y1,y2) must imply isequal(widen(x1)+widen(y1), widen(x2)+widen(y2)) (resp. for subtraction).

That's the long-term plan for a simple rule, right? I mean addition/subtraction are intentionally not well-defined mod isequal, but the code assumes that it becomes well-defined post widening; and it would be cheap to restrict the special code to things that isa Number, in order to get rid of possible bugs like hash([1,2,[]])==MethodError.
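To illustrate the point about widening, here is a small sketch using only standard integer types. It shows two pairs of `isequal` inputs whose sums disagree in the narrow type but agree once widened. The variable names are chosen for this example only.

```julia
# Two pairs of values that are isequal across integer types.
x1, x2 = Int8(127), 127
y1, y2 = Int8(1), 1

inputs_equal = isequal(x1, x2) && isequal(y1, y2)

# Without widening, Int8 addition wraps around, so the sums differ.
narrow_sum = x1 + y1
narrow_agrees = isequal(narrow_sum, x2 + y2)

# After widening, each sum is computed in a larger integer type
# and the two results are isequal again.
wide_sum_a = widen(x1) + widen(y1)
wide_sum_b = widen(x2) + widen(y2)
wide_agrees = isequal(wide_sum_a, wide_sum_b)
```

Here `inputs_equal` and `wide_agrees` hold while `narrow_agrees` does not, which is the sense in which addition is only well-defined mod `isequal` after widening.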

nalimilan · 2018-02-13T19:21:43Z

It's not clear yet what approach will be retained. See #26022.


Goal: Hash approximately log(N) entries with a higher density of hashed elements weighted towards the end and special consideration for repeated values.

Colliding hashes will often subsequently be compared by equality -- and equality between arrays works elementwise forwards and is short-circuiting. This means that a collision between arrays that differ by elements at the beginning is cheaper than one where the difference is towards the end. Furthermore, blindly choosing log(N) entries from a sparse array will likely only choose the same element repeatedly (zero in this case).

To achieve this, we work backwards, starting by hashing the last element of the array. After hashing each element, we skip the next `fibskip` elements, where `fibskip` is pulled from the Fibonacci sequence -- Fibonacci was chosen as a simple ~O(log(N)) algorithm that ensures we don't hit a common divisor of a dimension and only end up hashing one slice of the array (as might happen with powers of two). Finally, we find the next distinct value from the one we just hashed.

Fixes #27865 and fixes #26011. Fixes #26034
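The traversal described above can be sketched as follows. This is a simplified illustration for vectors, not the implementation in Base: `fib_sample_indices` is a hypothetical name, it only records which indices would be visited rather than hashing them, and the exact skip arithmetic is an assumption.

```julia
# Simplified sketch: which indices a backwards Fibonacci-skip walk visits.
function fib_sample_indices(A::AbstractVector)
    visited = Int[]
    isempty(A) && return visited
    i = lastindex(A)                 # start with the last element
    fibskip, prevfibskip = 1, 1
    while true
        push!(visited, i)
        elt = A[i]
        # Step towards the front by the current Fibonacci number.
        i -= fibskip
        i < firstindex(A) && break
        fibskip, prevfibskip = fibskip + prevfibskip, fibskip
        # Move on to the nearest element at or before i that differs from
        # the one just visited, so runs of repeated values are not resampled.
        j = findprev(x -> !isequal(x, elt), A, i)
        j === nothing && break
        i = j
    end
    return visited
end

dense = fib_sample_indices(collect(1:20))

# A mostly-zero vector with a single nonzero entry.
sparse_like = zeros(Int, 20)
sparse_like[3] = 7
sparse = fib_sample_indices(sparse_like)
```

In the dense case the visited indices thin out towards the front. In the mostly-zero case the walk jumps straight from the last zero to the nonzero entry instead of sampling zeros repeatedly.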

JeffBezanson added the regression and arrays labels Feb 14, 2018

martinholters mentioned this issue Feb 16, 2018

RFC: Simpler array hashing #26022

Merged

nalimilan added the hashing label Feb 18, 2018

mbauman added a commit that referenced this issue Jul 26, 2018

Also add test for #26034

3d62fba

Fixes #26034

JeffBezanson closed this as completed in b0bf91e Aug 2, 2018
