Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make sets indexable #24508

Closed
wants to merge 1 commit into from
Closed

Make sets indexable #24508

wants to merge 1 commit into from

Conversation

andyferris
Copy link
Member

@andyferris andyferris commented Nov 7, 2017

This PR makes all AbstractSets indexable. They are containers with equal values and keys, and they iterate values (this hasn't changed).

set[i] == i  # if i in set
key(set) == set

The goal here is to enable a certain semantic with keys and indexing common across arrays, associatives and sets. For example, with this, we can have keys(dict) be an AbstractSet. Like vectors which have keys(vec) = OneTo(length(vec)), in the future we can have the idempotent indexing of keys of dictionaries as we do for vectors (and offset vectors):

ks = keys(dict)
ks[i] == i
keys(ks) == ks

Part of #24019.

(I also updated similar to empty but this probably belongs in #24390)

base/set.jl Outdated
@@ -64,7 +78,7 @@ rehash!(s::Set) = (rehash!(s.dict); s)
start(s::Set) = start(s.dict)
done(s::Set, state) = done(s.dict, state)
# NOTE: manually optimized to take advantage of Dict representation
next(s::Set, i) = (s.dict.keys[i], skip_deleted(s.dict, i+1))
@propagate_inbounds next(s::Set, i) = (s.dict.keys[i], skip_deleted(s.dict, i+1))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove additional spaces.

base/set.jl Outdated
end
return x
end
eltype(set::AbstractSet) = eltype(typeof(set))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAICT that definition is already present in array.jl.

@@ -23,9 +39,7 @@ function Set(g::Generator)
return Set{T}(g)
end

eltype(::Type{Set{T}}) where {T} = T
similar(s::Set{T}) where {T} = Set{T}()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this related to this PR?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's now just a clean-up

@stevengj
Copy link
Member

stevengj commented Nov 7, 2017

I'm still confused by this. What kind of code would you write that uses set[i] == i?

@ararslan ararslan added the collections Data structures holding multiple items, e.g. sets label Nov 7, 2017
@andyferris
Copy link
Member Author

I'm still confused by this. What kind of code would you write that uses set[i] == i?

A very good question! :) Sorry, I didn't really explain this in the OP, but it has a lot more to do with indexing of Dict and Associative then it does indexing sets directly.

In #24019 (and other places) the idea of making the array and associative indexing interface more similar is discussed. This PR is a culmination of a few facts:

  • The keys of a dictionary may as well be an AbstractSet, since they are guaranteed to be unique and testing containment is generally fast (since dictionary lookup is generally fast).
  • In Julep: Generalize indexing with and by Associatives #24019 I suggest allowing getting multiple indices from dictionaries just like we do arrays. The idea would be out = a[inds] would results in a container out which (a) shares keys with a (keys(out) == keys(inds)) and (b) satisfies out[i] == a[inds[i]]. (Note, it's been pointed out we may need a different function/syntax such as a.[out] for dictionaries since there a single key can actually be a container, which makes an ambiguity. I've started to call this hypothetical function getindices in my head).
  • It seems to me that dict, dict[:] and dict[keys(dict)] should probably be basically the same container. This happens to work perfectly with the above whenever keys returns an indexable container k where k[i] == i (e.g. Base.OneTo has this property for the keys of a vector).

So this PR is a tiny piece of work (which I had enough time to complete) which facilitates some harmony between these three points. The keys of a vector is OneTo(n) (which just like Set in this PR) is probably not a container that is most useful because we like to write OneTo(n)[i] == i, but it is super convenient and natural that it is an AbstractArray.

To lay my plan for the requisite breaking changes for v1.0, I hope to follow up by making KeyIterator into KeySet <: AbstractSet, making a.[inds] parsable (not not lowerable) and, finally, make dictionaries iterate values instead of key-value pairs. I'm also considering making pairs return an indexable container, and maybe AbstractCartesianIndex{N} <: AbstractArray{..., N}, because these play super nicely with the rest (especially w.r.t. map). After all of that, the actual getindices work can begin in packages and potentially fall in place in a non-breaking way in v1.x.

@andyferris andyferris force-pushed the ajf/sets branch 2 times, most recently from 0da600d to ae27e99 Compare November 12, 2017 12:56
 * All `AbstractSet`s can be indexed. They are containers with equal
   values and keys, much like `Base.OneTo(n)`. This will enable a
   certain semantic with `keys` and indexing common across arrays,
   associatives and sets.
@andyferris andyferris changed the title WIP: Make sets indexable Make sets indexable Nov 12, 2017
@stevengj
Copy link
Member

I still don’t understand why your roadmap requires that sets be indexable. It seems sufficient that they are iterable.

I’m skeptical of dict[keys] because it seems fundamentally broken for Dict{Any}, but this is a separate issue.

@JeffBezanson
Copy link
Member

I, too, am not sure sets should be considered to have indices. If sets are index-less, they can more easily be used as informal relations (sets of pairs). Otherwise you'd get e.g.

s = Set([1=>2, 2=>3])
for p in pairs(s)
    # p is (1=>2) => (1=>2)
end

I'm not sure what if anything pairs(::Set) should do, but this doesn't seem helpful.

In the same vein, I'm realizing arrays have a semantic ambiguity similar to that of dictionaries. Although they have indices, they are often used as kind of "ordered multi-sets", where they're just for passing some ordered values around and the indices don't matter. This is evinced by set-like functions on vectors like setdiff and uses like Dict([a=>b, c=>d]) (we don't strictly need to be able to construct dicts from arrays of pairs, but it's kind of nice that it works). I suppose this is one difference between "arrays" and "lists" --- arrays have indices but lists don't.

To fix this, we might want to add a List type, or do something drastic like considering Vector{<:Pair} to be index-less.

@andyferris
Copy link
Member Author

The issues here clearly need more thought, and the grand indexing scheme mentioned in the OP also clearly won't make v1.0, so I'll close this PR. We'll see how this plays out when prototyping a getindices scheme.

@andyferris
Copy link
Member Author

OK, in the spirit of "show, don't tell" (I'll admit that this PR came across as a little abstract!), I created a package called Dictionaries.jl, which I recently released. Here you can see a system where the keys of a dictionary is a dictionary (just like the keys of an array is an array) - and it seems that this property really underlies a lot of how you can work with dictionaries and how you can make that performant.

@DilumAluthge DilumAluthge deleted the ajf/sets branch March 25, 2021 22:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
collections Data structures holding multiple items, e.g. sets
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants