-
Notifications
You must be signed in to change notification settings - Fork 37
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add TableTraits.jl integration #76
Conversation
cd8f777
to
5ae9f8d
Compare
Codecov Report
@@ Coverage Diff @@
## master #76 +/- ##
=========================================
- Coverage 92.61% 92.5% -0.11%
=========================================
Files 6 7 +1
Lines 1042 1068 +26
=========================================
+ Hits 965 988 +23
- Misses 77 80 +3
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some minor comments. This is nice! 👍
src/tabletraits.jl
Outdated
end | ||
end | ||
|
||
function IndexedTable(x; idxcols::Union{Void,Vector{Symbol}}=nothing, datacols::Union{Void,Vector{Symbol}}=nothing) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! This could also use an optimized method when x
is Columns
!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you should just be able to add another method that handles that case, right? It would be good if the named arguments had the same semantics, of course.
I'm also not sure this is the right API, I just was loosely inspired by this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, one more thing: we should also add a code path to this method that deals with an iterator where the element type is Pair{X,S}
. If it is just any Pair
, it would create an unnamed index and data column. If either X
or X
are a NamedTuple
, it would create named columns for the index and data columns. At that point the following would automatically work:
@from i in source begin
@select {i.a, i.b} => {i.c,i.d}
@collect IndexedTable
end
Not in this PR, but could be added later.
src/tabletraits.jl
Outdated
source_colnames = TableTraits.column_names(iter) | ||
source_coltypes = TableTraits.column_types(iter) | ||
|
||
if idxcols==nothing && datacols==nothing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This case should probably result in a Columns(1:n)
column as the index, mirroring the behavior of loadfiles
in JuliaDB.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But then that's not what IndexedTable(xs::Vector...)
does...! Maybe it should?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems a natural conversion of Columns
to IndexedTable
is to have a 1:n
index: it's the same as a 1-d array of named tuples.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Index columns need unique values, right? So one could also say that in this case it should create an IndexedTable
without any index, so that this conversion works for any table data without having things like a unique requirement. But I'm not sure, I think the main thing is to be consistent across the different ways to create things both in IndexedTable
and JuliaDB
.
Cool, thanks David! Will there eventually be a way to get a length (or estimate) from the iterator, or better yet reuse whole columns in place? |
Yes, the whole design here just follows the iterator interface in base. So if the source returns So the issue here is just that this PR is not the most efficient implementation :) AND I just realized that this PR is a bit silly. https://github.com/davidanthoff/TableTraitsUtils.jl has a default implementation for the iterable tables trait that handles all these things properly, and there is actually no reason why I couldn't just use that implementation here as well. I just overlooked this, the code in this PR predates those efficient implementations and I forgot to update things. I'll update this PR tomorrow or so to use all of that. One thing that would help even more is JuliaLang/julia#22467, in that case even more Query.jl queries could use preallocated arrays of the right size when they get materialized. I had a very hackish version of something like that and it allowed me to get my
Not yet, but I'm mulling some designs for that. I think that can come later, though, I would add it as an optional, more performant way to the table traits interface. |
Alright, the new commit now uses the TableTraitsUtils.jl implementation to generate the vectors that hold the data, and that implementation uses an optimized codepath for sources that return |
LGTM. |
Cool. What is your plan for tagging a new version? I should try to release a version of IterableTables.jl (that no longer has the integration code in it) ideally more or less simultaneously so that folks don't get override warnings. |
The package is ready to be tagged. Just ping me on a metadata PR and I'll
make one against it.
|
No description provided.