Is your feature request related to a problem or challenge?
DataFusion has a variety of benchmarks for query execution, that is, for how long it takes to run a query.
There is no equivalent benchmark suite for how long it takes to plan a query, an area of DataFusion that many people have said they would like to improve (see #5637 for various ideas).
Recently we have had some PRs, such as #7942 and #7870, that propose non-trivial planning changes and include micro benchmarks that show good promise. However, we don't have an agreed-upon way to measure the overall impact of such changes.
I suggest we also add some planning benchmarks. We could take, for example, TPC-H and TPC-DS (which we already have in the benchmarks / tests) and benchmark the time it takes to plan/optimize the queries rather than execute them.
I recommend picking one of these test suites (perhaps TPC-H or ClickBench), figuring out the pattern for a benchmark test, and then working on the others.
Describe the solution you'd like
As suggested by @Dandandan in #7942 (comment)
Specifically, I propose adding benchmarks (with documentation about why they are included) in
https://github.com/apache/arrow-datafusion/blob/03c2ef46f2d88fb015ee305ab67df6d930b780e2/datafusion/core/benches/sql_planner.rs
The code would basically do
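Roughly, the loop could be sketched as follows. This is a minimal, std-only sketch under stated assumptions, not the actual benchmark: `plan_query` is a hypothetical stand-in for the call that turns SQL into an optimized plan (in DataFusion that would be something along the lines of `SessionContext::sql`, stopping before execution), and `time_planning` and `PlanTiming` are names invented here for illustration.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Timing result for planning a single query (name invented for this sketch).
#[derive(Debug)]
struct PlanTiming {
    query: String,
    iterations: u32,
    total: Duration,
}

impl PlanTiming {
    /// Mean planning time per iteration.
    fn mean(&self) -> Duration {
        self.total / self.iterations
    }
}

/// Hypothetical stand-in for the planner. A real benchmark would call into
/// DataFusion here (parse, plan, optimize) and stop before execution.
fn plan_query(sql: &str) -> usize {
    // Placeholder "plan": the number of whitespace-separated tokens.
    sql.split_whitespace().count()
}

/// Plan every query `iterations` times and record the elapsed time.
/// Nothing is executed, so only planning cost is measured.
fn time_planning<F, P>(queries: &[&str], iterations: u32, mut plan: F) -> Vec<PlanTiming>
where
    F: FnMut(&str) -> P,
{
    assert!(iterations > 0, "need at least one iteration");
    queries
        .iter()
        .map(|&query| {
            let start = Instant::now();
            for _ in 0..iterations {
                // black_box keeps the compiler from optimizing the call away.
                black_box(plan(query));
            }
            PlanTiming {
                query: query.to_string(),
                iterations,
                total: start.elapsed(),
            }
        })
        .collect()
}

fn main() {
    // In the proposed benchmark these would be the TPC-H / TPC-DS query texts.
    let queries = ["SELECT 1", "SELECT a, b FROM t WHERE a > 10 ORDER BY b"];
    for timing in time_planning(&queries, 100, plan_query) {
        println!("{:?} per plan: {}", timing.mean(), timing.query);
    }
}
```

In the existing `sql_planner.rs` file the timing loop would presumably be driven by the benchmark harness already used there rather than by a hand-rolled `Instant` loop; the point of the sketch is only that each query is planned repeatedly and never run.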
Contents:
Describe alternatives you've considered
One alternative could be to update the dfbench tests so they can plan the queries without running them.
The dfbench code is here: https://github.com/apache/arrow-datafusion/blob/main/benchmarks/src/bin/dfbench.rs
Additional context
No response