Add Rust's DataFusion (arrow) #107
Comments
Thanks for filing the request. |
Here are the latest benchmarks for GROUP BY. I think this is mature enough to consider adding here, but it doesn't support JOIN yet. Is that a prerequisite for getting it on this site? |
Definitely not a prerequisite. Looks competitive. Should one expect to see similar performance compared to other tools that use Arrow as a backend? Then we would be benchmarking Arrow via its Rust interface. That still makes sense; I'm just asking for a better understanding. |
That's a good question and I don't really have a good answer. The only
other Arrow-based query engine that I know about is Dremio (Java-based),
and it would be interesting to see benchmarks for that too. The Arrow project
is in the process of building a C++ query engine, but AFAIK that isn't ready
yet.
Good to hear that join support is optional. I expect DataFusion will
support joins eventually, but it's not the highest priority right now.
On Sun, Oct 20, 2019 at 12:45 PM Jan Gorecki wrote:
Definitely not a prerequisite. Looks competitive. Should one expect to see
similar performance compared to other tools that use Arrow as a backend?
Then we would be benchmarking Arrow via its Rust interface. That still makes
sense; I'm just asking for a better understanding.
|
@jangorecki FYI, DataFusion 3.0.0 (due to be released any day now) supports joins. |
@andygrove Thanks for the update. Note that another Rust-based solution, Polars, was merged recently. The process was very smooth because the author of Polars submitted the groupby and join benchmark scripts in a PR. This helped a lot. Writing those scripts properly is not an easy job because I need to figure out not only how to answer the questions, but how to answer them in the most performant way. |
@jangorecki - can I submit a pull request with the DataFusion script to help with the process? |
If it helps, we could even publish a dedicated Rust crate containing the DataFusion h2o benchmarks. |
It would be great to add DataFusion to the benchmark! |
DataFusion is an in-memory query engine that uses Apache Arrow as the memory model. It supports executing SQL queries against CSV and Parquet files as well as querying directly against in-memory data.
DataFusion supports projection, selection, and simple aggregate queries.
https://github.com/apache/arrow/tree/master/rust/datafusion