Future of experimental optimizer datafusion-tokomak #440
Some good and bad news on this front. The Tokomak optimization pass, combined with a predicate pushdown pass and a pass that turns a filter over a cross join into a filter over an inner join, is able to handle TPCH Q19, with each iteration taking ~66ms on my laptop. Using the AstSize cost function, it transforms the expression into the form that @alamb mentioned in #217, which gives the optimized logical plan:
The bad news is that I had to use a dev branch of egg due to Send requirements, so merging the Tokomak optimizer may have to wait until the next version of egg is released. On the bright side, there is plenty of work that needs to be done before I feel the optimizer is ready anyway:
as in this case both the left and the right of the filter predicate must be columns for this to be a valid transform.
Could instead write something along the lines of:
I'm going to keep working on this in a separate repo for now because it's a long way from what I would consider complete.
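For readers who want a concrete picture of the filter-over-cross-join rewrite and the "both sides must be columns" condition described above, here is a minimal sketch. It is not the Tokomak code and does not use egg or DataFusion: the `Expr` and `Plan` types and the function `filter_cross_join_to_inner_join` are hypothetical, simplified stand-ins invented for illustration. A real pass would also need to check that the two columns come from opposite sides of the join, which this sketch skips.

```rust
// Hypothetical, simplified types for illustration only (not DataFusion's).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(i64),
    Eq(Box<Expr>, Box<Expr>),
}

#[derive(Debug, Clone, PartialEq)]
enum Plan {
    Scan(String),
    CrossJoin(Box<Plan>, Box<Plan>),
    InnerJoin {
        left: Box<Plan>,
        right: Box<Plan>,
        on: (String, String),
    },
    Filter {
        predicate: Expr,
        input: Box<Plan>,
    },
}

/// Rewrites `Filter(col = col, CrossJoin(l, r))` into `InnerJoin(l, r, on)`.
/// The rewrite only fires when BOTH sides of the equality are columns;
/// every other plan shape is returned unchanged.
/// Assumption: we do not verify which join input each column belongs to.
fn filter_cross_join_to_inner_join(plan: Plan) -> Plan {
    match plan {
        Plan::Filter {
            predicate: Expr::Eq(l, r),
            input,
        } => match (*l, *r, *input) {
            // Guard satisfied: column = column over a cross join.
            (Expr::Column(a), Expr::Column(b), Plan::CrossJoin(left, right)) => {
                Plan::InnerJoin {
                    left,
                    right,
                    on: (a, b),
                }
            }
            // Anything else (e.g. column = literal): rebuild the filter as it was.
            (l, r, input) => Plan::Filter {
                predicate: Expr::Eq(Box::new(l), Box::new(r)),
                input: Box::new(input),
            },
        },
        other => other,
    }
}
```

With these types, a filter such as `a.id = b.id` over a cross join of two scans becomes an inner join on `("a.id", "b.id")`, while `a.id = 1` over the same cross join is left alone because one side is a literal.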
Thank you for the update @pjmore -- sounds like some great progress
Thanks for this @pjmore. Great milestone to have the new query working. Also wondering if some other cross joins are removed too :).
Thanks @pjmore. The DSL looks pretty great; I was thinking about this while reviewing your initial PR the other day. Perhaps something you can upstream to egg as well? I also agree with @Dandandan that we can merge your code into the main repo early without having to wait for it to be feature complete. Can't wait for this cool feature to land in master :)
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
This issue is for discussing the future of datafusion-tokomak, an experimental optimizer using the egg library.
It currently optimizes `Expr`s and contains many optimizations not currently done in DataFusion. I envision it could be extended to support logical plans or physical plans too.
The optimizer using egg has the following nice properties, which are hard to achieve otherwise:
Some material about it is available here: https://egraphs-good.github.io/
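To make the idea of cost-driven expression rewriting a little more concrete, here is a rough sketch that uses only the standard library. It is not an e-graph and is not how egg or datafusion-tokomak work internally: it applies a few hand-written rules greedily in one bottom-up pass, whereas egg keeps all equivalent forms and extracts the cheapest one with a cost function such as `AstSize`. The `Pred` type, `ast_size`, and `simplify` are hypothetical names made up for this example.

```rust
// Hypothetical boolean expression type, for illustration only.
#[derive(Debug, Clone, PartialEq)]
enum Pred {
    Var(String),
    Const(bool),
    And(Box<Pred>, Box<Pred>),
    Or(Box<Pred>, Box<Pred>),
}

/// Cost of an expression = number of nodes in its tree
/// (the same idea as an AST-size cost function).
fn ast_size(e: &Pred) -> usize {
    match e {
        Pred::Var(_) | Pred::Const(_) => 1,
        Pred::And(l, r) | Pred::Or(l, r) => 1 + ast_size(l) + ast_size(r),
    }
}

/// One bottom-up pass over the tree applying three rewrites:
///   x AND true               => x
///   x OR false               => x
///   (a AND b) OR (a AND c)   => a AND (b OR c)
/// Each rule strictly reduces the node count when it fires.
fn simplify(e: Pred) -> Pred {
    match e {
        Pred::And(l, r) => match (simplify(*l), simplify(*r)) {
            (Pred::Const(true), x) | (x, Pred::Const(true)) => x,
            (l, r) => Pred::And(Box::new(l), Box::new(r)),
        },
        Pred::Or(l, r) => match (simplify(*l), simplify(*r)) {
            (Pred::Const(false), x) | (x, Pred::Const(false)) => x,
            // Factor out a conjunct shared by both branches.
            (Pred::And(a1, b), Pred::And(a2, c)) if a1 == a2 => {
                Pred::And(a1, Box::new(Pred::Or(b, c)))
            }
            (l, r) => Pred::Or(Box::new(l), Box::new(r)),
        },
        leaf => leaf,
    }
}
```

For example, `(a AND b) OR (a AND c)` has 7 nodes, and `simplify` returns `a AND (b OR c)`, which has 5. The greedy approach only finds this because the rule happens to match directly; the appeal of an e-graph is that it does not depend on rule ordering in this way.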
Describe the solution you'd like
Some options:
Integrate it into DataFusion, as an optional feature
Add to DataFusion as separate crate
Keep it in separate repo as is, do some releases to crates.io in sync with DataFusion releases
Add as experimental repo / branch under the Apache organization
Describe alternatives you've considered
n/a
Additional context
Add any other context or screenshots about the feature request here.