Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make keys of a dictionary a set #24580

Merged
merged 1 commit into from
Dec 2, 2017
Merged

Make keys of a dictionary a set #24580

merged 1 commit into from
Dec 2, 2017

Conversation

andyferris
Copy link
Member

@andyferris andyferris commented Nov 12, 2017

The keys of an Associative are unique, and generally support fast in because dictionaries generally support fast lookup. They are ideal candidates for being an AbstractSet.

This PR renames KeyIterator to KeySet and makes it a subtype of AbstractSet. The code changes are pretty much mechanical.

This also plays into #24019 by building on #24508 (on which this PR is currently based). Briefly, in the future this may be useful for a multiple-index getindex (or related) method for Associative that behaves consistently with those for AbstractArray.

@kmsquire
Copy link
Member

I generally like the idea, but I’m wondering if you wouldn’t mind checking out the different kinds of dictionaries in DataStructures.jl, and see if this change makes sense for those as well?

base/set.jl Outdated

# AbstractSets are idempotent under indexing:
keys(set::AbstractSet) = set
@inline function getindex(set::AbstractSet, x)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this change in this PR? Can't we have a PR that just makes KeyIterator into KeySet without adding getindex?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, agreed, I decided to decouple this but haven't got to it just yet

base/set.jl Outdated
end
return x
end
eltype(set::Type{<:AbstractSet{T}}) where {T} = T
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@isdefined(T) ? T : Any

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh, how can T be undefined here?

@ararslan ararslan added the collections Data structures holding multiple items, e.g. sets label Nov 14, 2017
@andyferris andyferris force-pushed the ajf/keyset branch 4 times, most recently from 417c3e8 to 4471d08 Compare November 18, 2017 05:01
@andyferris
Copy link
Member Author

OK... I've fixed this up a bit, removed #24508, and added news.

@kmsquire there's nothing I could see as a problem. Slightly orthogonal issue - but I had assumed that both the AbstractSet and Associative interfaces assert that the container is iterable, which doesn't seem true of DefaultDict. (I.e. I am proposing that DefaultDict shouldn't inherit from Associative).

dict::T
end
KeySet(dict::Associative) = KeySet{keytype(dict), typeof(dict)}(dict)

struct ValueIterator{T<:Associative}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this to be changed to ValueSet at the same time?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unlike keys, values don’t have to be unique, so they aren’t really a set.

@kmsquire
Copy link
Member

I had assumed that both the AbstractSet and Associative interfaces assert that the container is iterable, which doesn't seem true of DefaultDict. (I.e. I am proposing that DefaultDict shouldn't inherit from Associative).

Hmmm... that DefaultDict isn't iterable is certainly an oversight (although one that doesn't seem to have been missed yet). That can be fixed. Not having DefaultDict inherit from Associative wouldn't be ideal--it should be a drop-in for Dict in most cases (including passing to functions which accept an Associative).

@kmsquire
Copy link
Member

... that DefaultDict isn't iterable is certainly an oversight ...

Well, start, next, and done should be implemented for DefaultDicts, so I'm not sure why it wouldn't be iterable. But I'll double check that code when I can.

@andyferris
Copy link
Member Author

@kmsquire I am predisposed to a very tight coupling between indexing in iteration in both AbstractArray and Associative. That is, there should be a one-to-one relationship between the elements coming out of iteration and the elements that can be indexed. For example, collect(dict) and collect(keys(dict)) should have the same length and keys(dict) should include all valid keys (those that don’t throw errors upon indexing, up to converts and so-on).

We don’t support infinite arrays (even sparse ones) that can’t be iterated to completion (eg I is not an AbstractArray), and I have been assuming all along that Associative should be the same.

I do think there is a place for infinite sets and indexable containers, and maybe the type tree needs tweaking here. For users I think it is important that the interface of abstract types is well-specified and IMO the most useful semantic is that arrays and associatives are extensions of the iteration API that let you get to the iterated elements quickly via indexing. I think this is important because generic code may use a mixture of for loops, map, and so-on and assume these approaches are the same. (If a type supported infinite collections like DefaultDict then maybe you’d want to use map but not for loops but a generic Associative user would not know that).

`keys(::Associative)` now returns an `AbstractSet`.
@kmsquire
Copy link
Member

kmsquire commented Nov 19, 2017

@andyferris, thanks, I think I see now what you were referring to when you said that it wasn't iterable. Are you assuming that essentially every possible key has a value in a DefaultDict?

That isn't how I was thinking of the DefaultDict type, and I don't think it's used that way in practice. The uses that I'm aware of simply assume that the default value is used to initialize a particular key when it's accessed, and removes the requirement of checking if the key exists and initializing it if it doesn't exist. Other than that, a DefaultDict is just a regular Dict.

This still doesn't quite fit your requirement that "keys(dict) should include all valid keys (those that don’t throw errors upon indexing...)," since, well, the point of a DefaultDict is to not have any key throw an error upon indexing. But the only "valid" keys, then, are the ones with values assigned to them.... I guess I'm missing the point of the requirement that the keys don't throw errors upon indexing. Can you explain (or point to somewhere with an explanation)?

@andyferris
Copy link
Member Author

Yes, I certainly see the practicality of DefaultDict - it would be an overhead to force the user to specify the other "valid" keys (upon construction, say). Sparse arrays get it easier because there's not much overhead in specifying the size of the array.

My point was more about the generic Associative interface. When I write a function like f(dict::Associative) I would tend to make certain assumptions about dict as a container - for example that there isn't much semantic difference between say writing a for loop to map elements or to perform a reduction, and using the map or reduce/mapreduce functions. My for loop mapping couldn't map the default value while map arguably should. And for reductions - would reduce even make sense for DefaultDict? Presumably as a container it is supposed to "contain" the default elements on the set of keys where it makes sense. (reduce could make perfect sense if the other valid keys were known and specified, just like for sparse arrays). My mental model of an Associative is that it works a bit like a 1D array but the keys are more flexible - but that's just my mental model so maybe others have a different opinion on the matter.

@andyferris
Copy link
Member Author

andyferris commented Nov 19, 2017

Wait - I think I misunderstood. So

the default value is used to initialize a particular key when it's accessed

means that getindex can create a new key-value pair and insert it into the dictionary. Which means it's a bit different to simply pretending that it existed beforehand, so reduce would make sense :) Still, it's a bit odd that getindex is not a pure function (I would have assumed that was a promise of the getindex(::Associative, key) interface but there you go).

@kmsquire
Copy link
Member

kmsquire commented Nov 19, 2017

Still, it's a bit odd that getindex is not a pure function (I would have assumed that was a promise of the getindex(::Associative, key) interface but there you go).

Yes, this is true--if there is such a promise, this Dict type violates it.

I appreciate the rigor that you're applying here, trying to unify Associative and Array concepts. It might be that some types in DataStructures.jl will have to change in response (e.g., Accumulators and PriorityQueues currently subclass Associative as well).

That said, I'm hoping we don't throw out the baby with the bath water. A DefaultDict is a convenient dictionary type, and I think it would be a shame to restrict the interface such that DefaultDict can no longer subclass Associative and isn't a (mostly) drop-in replacement for a Dict.

@mauro3
Copy link
Contributor

mauro3 commented Nov 19, 2017

What about OrderedDict? I'd expect keys(od) to return ordered keys. Could it return an ordered set?

@kmsquire
Copy link
Member

What about OrderedDict? I'd expect keys(od) to return ordered keys. Could it return an ordered set?

Presumably, the answer should be yes, and similarly, for a SortedDict, keys(sd) should return a SortedSet.

@andyferris
Copy link
Member Author

Yes, agreed that we shouldn’t get rid of useful things.

Ordered and sorted sets seem quite doable to me (I’d guess keys(ordereddict) already follows the correct order anyway? The change to KeySet might be purely mechanical and be taught to femtocleaner).

@andyferris andyferris added the triage This should be discussed on a triage call label Nov 22, 2017
@StefanKarpinski
Copy link
Member

StefanKarpinski commented Nov 30, 2017

Yes, let's go ahead with this. Any additional properties that special dict types have can be reflected in the types of their respective iterators, e.g. ordered/sorted. Need to rerun CI probably.

@StefanKarpinski StefanKarpinski removed the triage This should be discussed on a triage call label Nov 30, 2017
@StefanKarpinski StefanKarpinski added this to the 1.0 milestone Nov 30, 2017
@andyferris andyferris merged commit 3e68732 into master Dec 2, 2017
@KristofferC KristofferC deleted the ajf/keyset branch December 2, 2017 13:29
@andyferris
Copy link
Member Author

Funnily enough, without #24508 this PR has actually made some of the things I was trying to achieve with indexing of associatives worse.

E.g. one thing I am chasing is that map(f, keys(dict)) returns a new dictionary containing i => f(i) key-value pairs, where i in keys(dict). AbstractSet might be the wrong thing for this.

@stevengj
Copy link
Member

stevengj commented Dec 9, 2017

@andyferris, I'm confused, why should map(f, keys(dict)) return a dictionary? Returning an array (current behavior in 0.6) or a set (new behavior?) seems like the right thing to me.

@stevengj
Copy link
Member

stevengj commented Dec 9, 2017

Currently map(f, keys(dict)) is broken because there is no similar(::KeySet) method. Presumably this should return a Set?

@andyferris
Copy link
Member Author

why should map(f, keys(dict)) return a dictionary

I belatedly address this at #26359 (comment).

In the meantime, it might be necessary to back out of this change (this PR) unless we do something much more drastic.

@Liso77
Copy link

Liso77 commented Mar 9, 2018

Just FYI - python3 has keys as view: https://docs.python.org/3/library/stdtypes.html#dictionary-view-objects

@andyferris
Copy link
Member Author

Yes we definitely want a view - both the earlier KeyIterator and now KeySet are views of a Dict. The question is how the view behaves under operations - does it pretend to be a vector, set, dictionary, generic iterable, etc?

(I just realised that keys(::Vector)is not a view of a resizable vector! Maybe someone with the history can elaborate - is the fact that this is view here reflect the performance benefit rather than behaviour when mutating the container?)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
collections Data structures holding multiple items, e.g. sets
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants