Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add a function to test sets equality? #25318

Closed
rfourquet opened this issue Dec 29, 2017 · 11 comments
Closed

add a function to test sets equality? #25318

rfourquet opened this issue Dec 29, 2017 · 11 comments
Labels
collections Data structures holding multiple items, e.g. sets

Comments

@rfourquet
Copy link
Member

Currently we can do length(a) == length(b) && issubset(a, b) (which replicates ==(a::Set, b::Set)), but this is not ideal.

Should we add a issetequal function? Also, would we want to have sets containing the same elements be isequal ? If so, we could just widen the == signature for Set to AbstractSet, and similarly for hash (this would be a huge performance regression for hash(::BitSet), but this would not be slower than for Set, and speed is maybe not so important there).

And probably similarly for <, etc. Currently, there is some inconsistency: for example Set(1) ⊆ BitSet(1) yields true, but using is a method error.

Even then, we could still have an issetequal function for arrays and iterators.

I'm not sure how breaking a change this would be for 1.x. I can volunteer to make a PR in the next few days if needed.

@rfourquet rfourquet added the collections Data structures holding multiple items, e.g. sets label Dec 29, 2017
@andyferris
Copy link
Member

andyferris commented Jan 2, 2018

Yes, I've thought the same before. I remember trying to do things like detect whether two AbstractDicts contain the same keys (the keys may be in different iteration orders, may have been collected, etc) and there's no great answer...

@nalimilan
Copy link
Member

Why not use == for this? Is the only reason the fact that hash would have to be made consistent for all sets, which would be slow for BitSet? I think it matters more to use the natural operator == than being able to hash some kinds of sets efficiently.

We could also find a hashing algorithm which works for Set, but can be efficient for the special case of BitSet, just like we just did for arrays and ranges (#16401). This kind of optimization can always been done after 1.0 since it's not breaking as long as the behavior of == doesn't change.

@andyferris
Copy link
Member

I think there's more than just an optimization at play here. You might have vector = collect(keys(dict)) and want to do a set-like comparison with another set, vector, tuple, etc. Since == for indexables generally depends on matching key-value pairs, you may sometimes want a separate issetequal(a, b) = (a ⊆ b) && (b ⊆ a) type of operation (which might happen to behave the same as ==(::AbstractSet, ::AbstractSet) but certainly not ==(::AbstractVector, ::AbstractVector)).

@nalimilan
Copy link
Member

Yes, but that doesn't mean AbstractSet objects shouldn't be considered as equal by == when it makes sense. Looks like we need both.

@rfourquet
Copy link
Member Author

I agree we need both. It would not be obvious for me that we want Set([1]) == BitSet([1]), but this is the direction we take apparently, with the recent [1] == 1:1. issetequal could be 1.x, but ==(::AbstractSet, ::AbstractSet) would be better in 1.0.

@andyferris
Copy link
Member

That sounds like a good plan to me.

In terms of equality of Set and BitSet, I feel it is crucial to generic programming that an interface for AbstractSet is documented and followed. Set and BitSet should just be implementations of the interface with certain performance characteristics.

If this weren’t true, it would be hard to create functions that dispatch on AbstractSet and understand/predict how that generic code might behave. In which case, better not to have AbstractSet at all (just a bunch of unrelated set-ish implementations).

@StefanKarpinski
Copy link
Member

I think that we probably do want Set([1]) == BitSet([1]) since they are different representations of the same set, although the implementation is trickier than one might expect.

@rfourquet
Copy link
Member Author

although the implementation is trickier than one might expect

Do you mean the (efficient) implementation of the == method?

@rfourquet
Copy link
Member Author

rfourquet commented Jan 4, 2018

I realize also that AbstractDict also implements generic hash and == functions, which also confirms we should just go with this.
Also in the docstring of ==: "Should be implemented for all types with a notion of equality, based on the
abstract value that an instance represents" (emphasize mine)

@andyferris andyferris added the triage This should be discussed on a triage call label Jan 9, 2018
@andyferris
Copy link
Member

I suggest adding a generic (and ordered independent) ==(::AbstractSet, ::AbstractSet) to v1.0 so we don't need to make a breaking change to == after that.

@JeffBezanson
Copy link
Member

Fixed by #25368

@JeffBezanson JeffBezanson removed the triage This should be discussed on a triage call label Jan 10, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
collections Data structures holding multiple items, e.g. sets
Projects
None yet
Development

No branches or pull requests

5 participants