-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARROW-1990: [JS] C++ Refactor, Add DataFrame #1482
Conversation
… patterns to fix ci
... and a bunch of performance tests for various scanning approaches
.. also found and resolved some minor bugs (get(idx) batch length check should be <=, various extern issues with UMD builds)
js/src/table.ts
Outdated
super({batches: [[values, counts]]}); | ||
} | ||
|
||
asJSON(): Object { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@TheNeuralBit what about renaming this to toJSON()
, so it also becomes the default serialization behavior if an instance is serialized via JSON.stringify()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah nice I didn't know about toJSON
, I'll make that change
…tor-merge_with-table-scan-perf
…-merge_with-table-scan-perf
f90300a
to
1910962
Compare
@trxcllnt I don't have anything else major I want to add to this. What do you think? |
@TheNeuralBit I would like to add more tests tonight to validate everything before it goes into master. We're hovering around 60% coverage with unit + integration tests, but not covering some of the cases like |
This is really cool. It's a big patch -- is there anything I can help review? |
@wesm it may be worthwhile discussing the JS refactor with respect to the C++ implementation -- I tried to implement the Array/ArrayData stuff as faithfully as I could, but may have misunderstood a few things. I'm in meetings today, but will be available tonight |
@trxcllnt I'll look at adding some more tests for table.ts, hopefully that'll bump up the coverage for the vectors a bit as well. |
fix closure es5/umd toString() iterator
@wesm I believe this is ready -- we would like to add more tests of the new capabilities, but I wouldn't block the PR over it. I can follow up with the tests/enhancements I'd like in another PR. |
Cool, this patch is so big I'm just going to merge it. Hopefully the JS patch size can decrease in future PRs to make it easier to review =) |
Thanks @trxcllnt and @TheNeuralBit for all your hard work on this! Would love to see a blog post or some other document to show off the new fancy stuff |
Would you mind updating all the JIRAs that are resolved from this? |
@wesm sure thing! |
Thanks @wesm! I'm travelling at the moment but I'm hoping to write up a blog post about the DataFrame ops when I get back next week. |
This PR moves the
Table
class out of the Vector hierarchy and adds optimized dataframe operations to it. Currently implements an optimizedscan()
method,filter(predicate)
,count()
, andcountBy(column_name)
(only works on dictionary-encoded columns).Some usage examples, based on the file generated by
js/test/data/tables/generate.py
:There are performance tests for the dataframe operations, to run them you must first generate the test data by running
npm run create:perfdata
.The PR also includes @trxcllnt's refactor of the JS implementation to make it more closely resemble the C++ implementation. This refactor resolves multiple JIRAs: ARROW-1903, ARROW-1898, ARROW-1502, ARROW-1952 (partially), and ARROW-1985