Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ArrayHashingStyle trait for O(1) range hashing and use it only for Numbers #25822

Closed
wants to merge 1 commit into from

Conversation

nalimilan
Copy link
Member

O(1) range hashing requires widen to avoid overflows when hashing heterogeneous arrays. Since this requirement can be problematic, add an ArrayHashingStyle trait to opt-in for this behavior, and use it only for Numbers by default. This assumes that isequal(x::Number, y::T) will only return true when T<:Number, or at least that counter-examples are rare enough that it's OK to requires these types to define the trait: else, hash(::AbstractArray{T}) will be inconsistent with isequal across types.

Do not use O(1) hashing for Dates types, which fixes hashing heterogeneous arrays, and add corresponding tests.

See previous discussion. The new trait will probably not be used very often by external packages, but it sounds cleaner to define it than to special-case Number directly in the function. Again, it's annoying that this design is so complex, but at least this limits the burden of implementing widen to Number types, rather than requiring all types which support - to implement it.

…r Numbers

O(1) range hashing requires widen() to avoid overflows when hashing heterogeneous
arrays. Since this requirement can be problematic, add an ArrayHashingStyle trait
to opt-in for this behavior, and use it only for Numbers by default. This assumes
that isequal(x::Number, y::T) will only return true when T<:Number, or at least
that counter-examples are rare enough that it's OK to requires these types to define
the trait: else, hash(::AbstractArray{T}) will be inconsistent with isequal across types.

Do not use O(1) hashing for Dates types, which fixes hashing heterogeneous arrays,
and add corresponding tests.
@nalimilan nalimilan added the arrays [a, r, r, a, y, s] label Jan 30, 2018
@JeffBezanson
Copy link
Member

I don't think I agree that it's cleaner to define a trait equivalent to Number than to just use Number. Removing the applicable calls is nice but I'm hoping for more simplification here.

@nalimilan
Copy link
Member Author

I don't think I agree that it's cleaner to define a trait equivalent to Number than to just use Number. Removing the applicable calls is nice but I'm hoping for more simplification here.

If you prefer I can transform the trait into a purely internal function defined and used only in abstractarray.jl. The problem with hardcoding Number inside the function is that if a non-Number type wants to be isequal to numbers in some cases, its hashing method will be incorrect, so it would be useful to allow overriding the default behavior for these exceptional cases (even if we don't make it part of the public API).

mbauman added a commit that referenced this pull request Feb 13, 2018
This is a straw-man implementation of a simpler array hashing scheme.  It's very basic and has room for further optimizations, but it's intention is to provide a starting point as an alternative to #25822.  In short: This hashes the size of the array and then the first three distinct elements and their linear indices and then the last three distinct elements and their linear distance from the end of the array.

The exact scheme here is open to bikeshedding -- we could use more or less distinct elements.  I use "distinct elements" as a way of ensuring that all relatively empty sparse arrays don't hash similarly, and I hash elements at both the beginning and end of the array because they're the simplest to get to and final elements will be the "most expensive" to discover that they differ if we have to fall back to an equality check.
@nalimilan nalimilan closed this Sep 29, 2018
@nalimilan nalimilan deleted the nl/arrayhashingstyle branch September 29, 2018 14:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrays [a, r, r, a, y, s] hashing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants