Add TableTraits.jl integration #76

Merged
merged 2 commits into from
Aug 30, 2017
1 change: 1 addition & 0 deletions REQUIRE
@@ -2,3 +2,4 @@ julia 0.6
Compat 0.19
NamedTuples 2.1.0
PooledArrays
TableTraits 0.0.1
3 changes: 3 additions & 0 deletions src/IndexedTables.jl
@@ -435,4 +435,7 @@ include("join.jl")
# query and aggregate
include("query.jl")

# TableTraits.jl integration
include("tabletraits.jl")

end # module
72 changes: 72 additions & 0 deletions src/tabletraits.jl
@@ -0,0 +1,72 @@
using TableTraits

TableTraits.isiterable(x::IndexedTable) = true
TableTraits.isiterabletable(x::IndexedTable) = true

function TableTraits.getiterator{S<:IndexedTable}(source::S)
    return rows(source)
end

# Sink

@generated function _fillIndexedTable{idx_indices,data_indices}(iter,idx_storage,data_storage,::Type{idx_indices},::Type{data_indices})
    push_exprs = Expr(:block)
    for (i,idx) in enumerate(map(i->i.parameters[1],idx_indices.parameters))
        ex = :( push!(idx_storage.columns[$i], row[$idx]) )
        push!(push_exprs.args, ex)
    end

    for (i,idx) in enumerate(map(i->i.parameters[1],data_indices.parameters))
        ex = :( push!(data_storage.columns[$i], row[$idx]) )
        push!(push_exprs.args, ex)
    end

    quote
        for row in iter
            $push_exprs
        end
    end
end

function IndexedTable(x; idxcols::Union{Void,Vector{Symbol}}=nothing, datacols::Union{Void,Vector{Symbol}}=nothing)
Collaborator

Nice! This could also use an optimized method when x is Columns!

Contributor Author

Yes, you should just be able to add another method that handles that case, right? It would be good if the named arguments had the same semantics, of course.

I'm also not sure this is the right API; I was just loosely inspired by this.

Contributor Author

Oh, one more thing: we should also add a code path to this method that deals with an iterator whose element type is Pair{X,S}. If it is just any Pair, it would create an unnamed index column and an unnamed data column. If either X or S is a NamedTuple, it would create named columns for the index and data columns. At that point the following would automatically work:

@from i in source begin
    @select {i.a, i.b} => {i.c,i.d}
    @collect IndexedTable
end

Not in this PR, but could be added later.
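
The Pair handling proposed above could look roughly like the following. This is only a sketch under stated assumptions: `tocolumns` and `split_pairs` are hypothetical names that do not exist in IndexedTables.jl or TableTraits.jl, the columns are plain vectors rather than `Columns`, and the code is written for Julia 1.x with built-in NamedTuples, whereas this PR targets Julia 0.6 with NamedTuples.jl.

```julia
# Hypothetical helpers, not part of this PR. Written for Julia 1.x, where
# NamedTuple is built in (the PR itself targets Julia 0.6 and NamedTuples.jl).

# A NamedTuple element type yields one named column per field.
function tocolumns(::Type{T}, vals) where {T<:NamedTuple}
    names = fieldnames(T)
    return NamedTuple{names}(Tuple([v[n] for v in vals] for n in names))
end

# Any other element type yields a single unnamed column.
tocolumns(::Type, vals) = (collect(vals),)

# Split a vector of `index => data` pairs into index columns and data columns.
function split_pairs(rows::AbstractVector{Pair{K,V}}) where {K,V}
    idx  = tocolumns(K, [first(r) for r in rows])
    data = tocolumns(V, [last(r) for r in rows])
    return idx, data
end

# Named case: both sides of the pair are NamedTuples.
named = [(a=1, b=2) => (c=1.0, d="x"), (a=3, b=4) => (c=2.0, d="y")]
idx, data = split_pairs(named)

# Unnamed case: plain values on both sides.
plain = [1 => 2.0, 3 => 4.0]
pidx, pdata = split_pairs(plain)
```

Dispatching on the two type parameters of the Pair keeps the named and unnamed cases in separate small methods, which is one way to get the behavior described in the comment without branching inside the constructor.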

    if isiterabletable(x)
        iter = getiterator(x)

        source_colnames = TableTraits.column_names(iter)
        source_coltypes = TableTraits.column_types(iter)

        if idxcols==nothing && datacols==nothing
Collaborator

This case should probably result in a Columns(1:n) column as the index, mirroring the behavior of loadfiles in JuliaDB.

Collaborator

But then that's not what IndexedTable(xs::Vector...) does...! Maybe it should?

Collaborator

It seems a natural conversion of Columns to IndexedTable is to have a 1:n index: it's the same as a 1-d array of named tuples.

Contributor Author

Index columns need unique values, right? So one could also say that in this case it should create an IndexedTable without any index, so that this conversion works for any table data without having things like a unique requirement. But I'm not sure, I think the main thing is to be consistent across the different ways to create things both in IndexedTable and JuliaDB.
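
To make the trade-off in this thread concrete, here is a small illustration of the two defaults under discussion. It is a sketch only: it uses plain Julia 1.x NamedTuples rather than IndexedTables types, the sample rows are made up, and it assumes, as the comment above asks, that index keys are expected to be unique.

```julia
# Illustrative only, Julia 1.x with built-in NamedTuples; not code from this PR.
rows = [(a=1, b=1.0, c="A"), (a=1, b=1.0, c="B"), (a=2, b=2.0, c="C")]
colnames = collect(keys(first(rows)))          # [:a, :b, :c]

# Default in this PR: every column except the last becomes an index column.
idxcols, datacols = colnames[1:end-1], [colnames[end]]
column_keys = [Tuple(r[n] for n in idxcols) for r in rows]

# Alternative raised in review: a synthetic 1:n index, with all columns as data.
row_keys = collect(1:length(rows))

println(allunique(column_keys))
println(allunique(row_keys))
```

With these rows the keys taken from columns `a` and `b` repeat, so the first check prints `false`, while the 1:n index is unique by construction and the second prints `true`.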

            idxcols = source_colnames[1:end-1]
            datacols = [source_colnames[end]]
        elseif idxcols==nothing
            idxcols = setdiff(source_colnames,datacols)
        elseif datacols==nothing
            datacols = setdiff(source_colnames, idxcols)
        end

        if length(setdiff(idxcols, source_colnames))>0
            error("Unknown idxcol")
        end

        if length(setdiff(datacols, source_colnames))>0
            error("Unknown datacol")
        end

        idxcols_indices = [findfirst(source_colnames,i) for i in idxcols]
        datacols_indices = [findfirst(source_colnames,i) for i in datacols]

        idx_storage = Columns([Array{source_coltypes[i],1}(0) for i in idxcols_indices]..., names=[source_colnames[i] for i in idxcols_indices])
        data_storage = Columns([Array{source_coltypes[i],1}(0) for i in datacols_indices]..., names=[source_colnames[i] for i in datacols_indices])

        tuple_type_idx = eval(Expr(:curly, :Tuple, [Expr(:curly, :Val, i) for i in idxcols_indices]...))
        tuple_type_data = eval(Expr(:curly, :Tuple, [Expr(:curly, :Val, i) for i in datacols_indices]...))

        _fillIndexedTable(iter, idx_storage, data_storage, tuple_type_idx, tuple_type_data)

        return IndexedTable(idx_storage, data_storage)
    elseif idxcols==nothing && datacols==nothing
        return convert(IndexedTable, x)
    else
        throw(ArgumentError("x cannot be turned into an IndexedTable."))
    end
end
1 change: 1 addition & 0 deletions test/runtests.jl
@@ -1,3 +1,4 @@
include("test_core.jl")
include("test_query.jl")
include("test_utils.jl")
include("test_tabletraits.jl")
39 changes: 39 additions & 0 deletions test/test_tabletraits.jl
@@ -0,0 +1,39 @@
using IndexedTables
using TableTraits
using NamedTuples
using Base.Test

@testset "TableTraits" begin

source_it = IndexedTable(Columns(a=[1,2,3]), Columns(b=[1.,2.,3.], c=["A","B","C"]))

@test isiterable(source_it) == true

target_array = collect(getiterator(source_it))

@test length(target_array) == 3
@test target_array[1] == @NT(a=1, b=1., c="A")
@test target_array[2] == @NT(a=2, b=2., c="B")
@test target_array[3] == @NT(a=3, b=3., c="C")

source_array = [@NT(a=1,b=1.,c="A"), @NT(a=2,b=2.,c="B"), @NT(a=3,b=3.,c="C")]

it1 = IndexedTable(source_array)
@test length(it1) == 3
@test it1[1,1.].c == "A"
@test it1[2,2.].c == "B"
@test it1[3,3.].c == "C"

it2 = IndexedTable(source_array, idxcols=[:a])
@test length(it2) == 3
@test it2[1] == @NT(b=1., c="A")
@test it2[2] == @NT(b=2., c="B")
@test it2[3] == @NT(b=3., c="C")

it3 = IndexedTable(source_array, datacols=[:b, :c])
@test length(it3) == 3
@test it3[1] == @NT(b=1., c="A")
@test it3[2] == @NT(b=2., c="B")
@test it3[3] == @NT(b=3., c="C")

end