Implement General Purpose Constant Folding with the Expression Evaluator #1070
Labels: datafusion (Changes in the datafusion crate), enhancement (New feature or request), performance (Make DataFusion faster)
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
A classic query optimization is algebraic transformation, for example partially evaluating expressions once at plan time rather than repeatedly for every row at execution time.
For example, a predicate such as
where time < date_trunc('year', '2021-10-04T10:12:13Z')
can be rewritten to
where time < '2021-01-01T00:00:00Z'
which both saves many redundant evaluations of the date_trunc function and unlocks additional optimizations such as Parquet row group pruning and the use of constant comparison kernels.
DataFusion has a basic constant folding implementation here: https://github.com/apache/arrow-datafusion/blob/master/datafusion/src/optimizer/constant_folding.rs
However, as implemented, it has a few drawbacks:
now() expansion)
2a. As support for new expressions is added, it must also be added to constant_folding.rs.
2b. It risks producing different answers than if the expression had been evaluated at runtime.
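As a hypothetical illustration of drawback 2b, the sketch below models two independent implementations of integer addition: a runtime kernel and a separate hand-written folding rule. The names `runtime_add` and `fold_add`, and the overflow semantics chosen for each, are assumptions for illustration only and do not describe DataFusion's actual kernels. Because the rule duplicates the operator's logic, a subtle difference (here, wrapping versus NULL on overflow) makes the folded plan return a different answer than unfolded execution would.

```rust
/// Hypothetical runtime kernel for `a + b` on nullable 64-bit integers:
/// NULL in gives NULL out, and overflow also yields NULL.
fn runtime_add(a: Option<i64>, b: Option<i64>) -> Option<i64> {
    match (a, b) {
        (Some(x), Some(y)) => x.checked_add(y),
        _ => None,
    }
}

/// Hypothetical hand-written folding rule for `literal + literal` living in a
/// separate constant-folding module. It re-implements addition and silently
/// wraps on overflow, so it disagrees with `runtime_add`.
fn fold_add(a: i64, b: i64) -> i64 {
    a.wrapping_add(b)
}

fn main() {
    let (a, b) = (i64::MAX, 1);
    // Plan-time result: wraps around to i64::MIN.
    let folded = fold_add(a, b);
    // Runtime result for the same inputs: NULL (None).
    let evaluated = runtime_add(Some(a), Some(b));
    println!("folded at plan time: {folded}, evaluated at runtime: {evaluated:?}");
}
```

Reusing a single evaluator for both plan-time folding and runtime evaluation removes this class of mismatch by construction.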
Describe the solution you'd like
Reuse the existing expression evaluation framework (namely PhysicalExpr::evaluate and everything in physical_plan/expressions) to implement constant folding. This would be beneficial because:
The high-level idea would be to walk the Expr tree bottom up and, if a subtree contains only constants (and non-volatile functions, see #1069), create and run a PhysicalExpr to produce a single value, then replace the subtree with the corresponding constant.
Describe alternatives you've considered
I think it is possible to implement expression evaluation as a set of rewrite rules (as is partially done in #1066), but that still has the downside that the behavior can deviate from the actual expression evaluation in PhysicalExpr.
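To contrast the proposed evaluator-based approach with per-rule rewrites, here is a minimal, self-contained sketch of the bottom-up walk described under "Describe the solution you'd like". It does not use DataFusion's API: `Value`, `Volatility`, `Expr`, `evaluate`, `fold`, and the toy `abs` function are hypothetical stand-ins for `ScalarValue`, function volatility (#1069), the logical `Expr`, and `PhysicalExpr::evaluate`. The key property is that one `evaluate` function serves both runtime (with a row) and plan time (with no row), so a folded constant cannot disagree with the runtime result. A real implementation would likely also need a query-stable volatility class for functions such as now(), which may be folded once per query; this sketch only distinguishes immutable from volatile functions.

```rust
/// Toy scalar value (hypothetical stand-in for DataFusion's ScalarValue).
#[derive(Clone, Debug, PartialEq)]
enum Value {
    Int(i64),
    Bool(bool),
}

/// Hypothetical volatility marker: may a function be evaluated at plan time?
#[derive(Clone, Copy, Debug, PartialEq)]
enum Volatility {
    Immutable, // same inputs always give the same output, e.g. abs
    Volatile,  // may differ on every call, e.g. random
}

/// Toy logical expression tree (hypothetical stand-in for `Expr`).
#[derive(Clone, Debug, PartialEq)]
enum Expr {
    Column(String),
    Literal(Value),
    Add(Box<Expr>, Box<Expr>),
    Lt(Box<Expr>, Box<Expr>),
    Call { name: String, volatility: Volatility, args: Vec<Expr> },
}

/// The one evaluator, used at runtime (with a row) and at plan time (without).
/// Returns None for NULL, overflow, or unsupported inputs.
fn evaluate(expr: &Expr, row: &dyn Fn(&str) -> Option<Value>) -> Option<Value> {
    match expr {
        Expr::Column(name) => row(name.as_str()),
        Expr::Literal(v) => Some(v.clone()),
        Expr::Add(l, r) => match (evaluate(l, row)?, evaluate(r, row)?) {
            (Value::Int(a), Value::Int(b)) => a.checked_add(b).map(Value::Int),
            _ => None,
        },
        Expr::Lt(l, r) => match (evaluate(l, row)?, evaluate(r, row)?) {
            (Value::Int(a), Value::Int(b)) => Some(Value::Bool(a < b)),
            _ => None,
        },
        Expr::Call { name, args, .. } => {
            let vals: Option<Vec<Value>> = args.iter().map(|a| evaluate(a, row)).collect();
            match (name.as_str(), vals?.as_slice()) {
                ("abs", [Value::Int(x)]) => x.checked_abs().map(Value::Int),
                _ => None,
            }
        }
    }
}

/// At plan time there is no row, so every column lookup yields None.
fn no_row(_: &str) -> Option<Value> {
    None
}

fn is_lit(e: &Expr) -> bool {
    matches!(e, Expr::Literal(_))
}

/// Folds bottom up: children first, then this node if all of its children are
/// literals and it is neither a column nor a volatile call.
fn fold(expr: Expr) -> Expr {
    let folded = match expr {
        Expr::Add(l, r) => Expr::Add(Box::new(fold(*l)), Box::new(fold(*r))),
        Expr::Lt(l, r) => Expr::Lt(Box::new(fold(*l)), Box::new(fold(*r))),
        Expr::Call { name, volatility, args } => Expr::Call {
            name,
            volatility,
            args: args.into_iter().map(fold).collect(),
        },
        leaf => leaf, // Column and Literal have no children
    };
    let foldable = match &folded {
        Expr::Add(l, r) | Expr::Lt(l, r) => is_lit(l) && is_lit(r),
        Expr::Call { volatility, args, .. } => {
            *volatility == Volatility::Immutable && args.iter().all(is_lit)
        }
        _ => false, // columns cannot be folded; literals already are
    };
    if foldable {
        // If evaluation fails (e.g. overflow), keep the original subtree.
        if let Some(v) = evaluate(&folded, &no_row) {
            return Expr::Literal(v);
        }
    }
    folded
}

fn main() {
    // time < 2 + 3  folds to  time < 5
    let lit = |n| Box::new(Expr::Literal(Value::Int(n)));
    let expr = Expr::Lt(
        Box::new(Expr::Column("time".to_string())),
        Box::new(Expr::Add(lit(2), lit(3))),
    );
    println!("{:?}", fold(expr));
}
```

Because `fold` only delegates to `evaluate`, supporting a new function in `evaluate` automatically makes it foldable, whereas a rule-based folder would need a matching rewrite for every new expression (drawback 2a). Subtrees whose evaluation fails, such as an overflowing addition, are left in place so that runtime behavior is unchanged.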
Additional context