Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP/RFC better map!(f, values(dict)) interface for AbstractDict #31362

Closed
wants to merge 2 commits into from
Closed

WIP/RFC better map!(f, values(dict)) interface for AbstractDict #31362

wants to merge 2 commits into from

Conversation

ndinsmore
Copy link
Contributor

@ndinsmore ndinsmore commented Mar 15, 2019

This is largely a continuation of #31223, but fleshes out a more generic interface for map!(v,values(AdbstractDict)). It aims to further cover some of the issues @chethega brought up in #31199. Right now it is focused on Dicts only but could be broadened to work with general Collections.

As a preface I think everything will want to be renamed because the current descriptive names are a bit verbose, so say the least.... This preface is needed... well you will understand why.

The interface works by creating a trait with the function DirectStyle(dict)) which needs to return one of the following:

  • NaiveDict() for the worst case naive implimentation

  • SlottedDictWhich is fo dicts that are structured like Dict with slots . This will cover most important implimentations.

  • PointerDict Which has an iterate like function that function which returns 0-Dim arrays to the value for modification.

Compared to the previous map!(f,values(dict::Dict)) the SlottedDict path is able to be generic with only a 10% loss in performance.
Right now the performance follows the trend
NaiveDict ---(is 3x slower than)--->PointerDict----(is 6x slower than)--> SlottedDict

With a little bit of tuning, I think that PointerDict has the potential to be as fast as SlottedDict I just haven't figure out why the optimizer isn't getting it there. Right now it is PointerDict is roughly as fast as the ValueIterator.

A real implimentation is included for Dict while WeakKeyDict has a example of PointerDict that would be changed before final release.

The definition of the iterface can be a little helpfully in understand but is very much a WIP:
(Please forgive all the Base. they are there to make hacking outside of base easier)

abstract type DirectIteratorStyle end
# This will be the fall back that uses the naive map above
struct NaiveDict <: DirectIteratorStyle end
"""
    SlottedDict()
This trait signifies that the AbstractDict is structured similar to `Dict`
This means that it should have a LinearIndexable array of slots and an array of value.
Then be able to provide a function to provide a function which indicated whether a slot is filled.
    Required Methods:  slot_access_functions(::Type{<:AbstractDict})
"""
struct SlottedDict <: DirectIteratorStyle end
# Dicts that return SlottedDicts must have the following methods
"""
    slot_access_functions(::Type{CustomDict})
This function should return a tuple of three function that will be used to access values
    (   isslotfilled(slots::AbstractArray, index)::Bool,
        get_slots_array(dict::CustomDict)::AbstractArray,
        get_value_array(dict::CustomDict)::AbstractArray)
Importantly these functions do not have to be named as such they just need to take the arguments as shown
It is important that the arrays returned are the same length
"""
function slot_access_functions(::Type{<:AbstractDict}) end

function _map!(::SlottedDict, f, iter::Base.ValueIterator{D}) where D<:AbstractDict
    (isslotfilled, get_slots, get_vals) = slot_access_functions(D)
    slots =  get_slots(iter.dict)
    vals = get_vals(iter.dict)
    # @inbounds is here so the it gets propigated to isslotfiled
    @inbounds for i = 1:length(slots)
        if isslotfilled(slots, i)
            vals[i] = f(vals[i])
        end
    end
    return iter
end
"""
    PointerDict()
This trait signifies that the AbstractDict has an iterate like function which on each
iteration provides an `AbstractArray{T,N}` which acts as a pointer to that given value

    Required Methods:   `iterate_value_pointer(d::Abstract{K,V})`
                        `iterate_value_pointer(d::Abstract{K,V}, state)`
        Both of which should act like iterate returning either:
                        `Nothing`  to signify the iteration is over
                        `(value_pointer::AbstractArray{V,0},state)`
"""
struct PointerDict <: DirectIteratorStyle end

"""
    `iterate_value_pointer(d::Abstract{K,V})`
    `iterate_value_pointer(d::Abstract{K,V}, state)`
    Both of which should act like iterate returning either:
                    `Nothing`  to signify the iteration is over
                    `(value_pointer::AbstractArray{V,0},state)`
Importantly these functions Do not have to be named as such they just need to take the arguments as shown
It is important that the arrays returned are the same length
"""

function _map!(::PointerDict, f, iter::Base.ValueIterator{D}) where D<:AbstractDict
    dict=iter.dict
    @inbounds next = iterate_value_pointer(dict)
    while next !== nothing
        (v, state) = next
        @inbounds v[] = f(v[])
        @inbounds next = iterate_value_pointer(dict, state)
    end
    return iter
end

@ndinsmore ndinsmore changed the title WIP/RFC better map value for abstract Dict WIP/RFC better map!(f, values(dict)) interface for AbstractDict Mar 15, 2019
@chethega
Copy link
Contributor

I don't think we really need a slot API; instead, we should figure out an API for tokens / semitokens.

There are many sane kinds of dictionaries that are not slotted, but have a sensible concept of tokens. Examples would be hashtables without slots (key/values are not overallocated; instead we have an overallocated Vector{Int32} pointing to keys/values/hashes), binary trees from DataStructures, external (ccall-wrapped) adaptive radix trees like ART or judy, or even databases. Tokens can represent indices (Base Dict), naked pointers (external library), or tree traversals (datastructures sorteddict).

@ndinsmore
Copy link
Contributor Author

ndinsmore commented Mar 16, 2019

The PointerDict() concept was meant to cover most of those. In fact, if the iterator was faster it could easily be used for the normal dict, I am still trying to figure out why it is 6x slower.

With PointerDict() long as they can have an iterator that returns a 0-dim array we should be able to write simple functions like map! and even have lowered that can do better in place modification. The 0-dim array is required to cover both mutable and immutable data types.

The easiest way today to create a 0-dim array for data that is already in an array, is view but there is overhead associated with that.
unsafe_wrap can work but creation is slow.

@ndinsmore ndinsmore closed this Mar 29, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants