Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add issetequal and make hash/== generic for AbstractSet #25368

Merged
merged 2 commits into from
Jan 9, 2018

Conversation

rfourquet
Copy link
Member

This is a fix for #25318. It implements a generic issetequal binary predicate which tests whether two collections have the same elements. Moreover, it defines ==(::AbstractSet, ::AbstractSet) and hash(::AbstractSet) generically so that two sets compare and hash equal. As noted in the issue, this makes hash significantly slower for BitSet (by about 6x, i.e. more or less as "slow" as for Set), but it's probably not a big deal.

In the current state, issetequal, , and are the "primary" functions, which are the ones to be specialized if needed, and ==, <= and < are aliases for those. I chose so as it makes the implementation simpler:

  1. otherwise, issetequal, , and have to be defined both for generic collections and for AbstractSet
  2. the implementation of a particular set type could have to specialize both ==(::MySet, ::AbstractSet) and issetequal(::MySet, itr) depending on whether the other argument is a set (and similarly for the other 2 functions). It seems simpler to only have to specialize one functions in all cases, (which can also share the same implementation whatever the second argument is, AbstractSet or not).

But it's possibly ugly for a set type to specialize issetequal instead of ==.

Note that the generic implementation of issetequal(a, b) currently requires length(a) and length(b) to exist, but this could be lifted in the future.

@rfourquet rfourquet added the collections Data structures holding multiple items, e.g. sets label Jan 3, 2018
@rfourquet
Copy link
Member Author

I will mark this for triage. Technically, only ==(::AbstractSet, ::AbstractSet) and hash(::AbstractSet) would need to get in 1.0. The addition of issetequal can be split off for a 1.x PR if requested, but is seems nice to have. The name issetequal may also be improved.

@rfourquet rfourquet added triage This should be discussed on a triage call breaking This change will break code needs news A NEWS entry is required for this change labels Jan 4, 2018
base/set.jl Outdated
⊇(l, r) = issubset(r, l)
⊉(l::Set, r::Set) = !⊇(l, r)
⊋(l::Set, r::Set) = <(r, l)
const issubset = ⊆
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For all other unicode aliases, the ASCII name is the core function name, so this should be reversed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK I understand, I just did it so that the printing of and gets more consistent.

true
```
"""
issetequal(l, r) = length(l) == length(r) && l ⊆ r
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be the implementation for ==(::AbstractSet, ::AbstractSet); then we don't need issetequal.

@JeffBezanson
Copy link
Member

👍 to the basic set equality and hashing change, and I would vote for the version without issetequal.

@JeffBezanson JeffBezanson added this to the 1.0 milestone Jan 4, 2018
@JeffBezanson JeffBezanson removed the triage This should be discussed on a triage call label Jan 4, 2018
@StefanKarpinski
Copy link
Member

💯% agree with Jeff – this would be good minus the issetequal export. You could even keep the function as an internal implementation detail, as long as we don't export it.

base/set.jl Outdated
# (if needed, only their synonyms issetequal, ⊊ and ⊆ must be specialized)
==(l::AbstractSet, r::AbstractSet) = issetequal(l, r)
<( l::AbstractSet, r::AbstractSet) = l ⊊ r
<=(l::AbstractSet, r::AbstractSet) = l ⊆ r
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the definition is correct. I think it's instead: for sets A and B, A < B if ∀ a . A, ∀ b . B, a < b

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm not mistaken, I changed nothing to the definition of <= for sets. Do you suggest we should open a separate issue to change the meaning of (or delete) these methods?

@rfourquet
Copy link
Member Author

The goal of issetequal was to complement issubset, and all other unicode set-related operators ( etc.), which are useful in a more general way than ==, < and <= when we don't work with sets. Previously, and others were defined only for sets, except was defined for any collection. It seems like a useful change to extend all those operators to any collection. The only missing operation is for "set equality" when we are not working with sets: there is no unicode symbol, hence the issetequal name. But this surely can be added later.

@JeffBezanson
Copy link
Member

OK, that makes sense, but the preferred function to implement should always be ==. I think set types should implement that, and we can have

issetequal(a::AbstractSet, b::AbstractSet) = a == b

(since hopefully we can assume == on sets implements set equality).

Do you have an actual use for issetequal? We try to avoid adding functions just for the heck of it.

@rfourquet
Copy link
Member Author

but the preferred function to implement should always be ==

I agree this seems better, even if there can be a small drawback in some cases (as mentioned in the OP).

since hopefully we can assume == on sets implements set equality

There will be a generic implementation, duplicated from issetequal (if we keep this function).

Do you have an actual use for issetequal? We try to avoid adding functions just for the heck of it.

I don't remember if I needed it, but I find it strange to have all these set-related functions available (like issubset etc. but also intersect etc.) except for maybe the most natural operation you could do we sets, i.e. set equality. But I'm not convinced by the name issetequal; we could as well not export it and see if a better name or unicode character comes for 1.x. But it also won't be a big deal to deprecate issetequal in 2.0 if a better alternative is found in 1.x. I will leave that to your appreciation.

I updated the PR according to your review and added NEWS.

@rfourquet rfourquet removed the needs news A NEWS entry is required for this change label Jan 7, 2018
@JeffBezanson
Copy link
Member

Thanks for making those changes; looks very good now. Might need to be rebased around the logging issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking This change will break code collections Data structures holding multiple items, e.g. sets
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants