-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add FunctionRewrite
API, Move Array specific rewrites to datafusion_functions_array
#9583
Conversation
…sion-array-functions crate
/// For example, concatenating arrays `a || b` is represented as | ||
/// `Operator::ArrowAt`, but can be implemented by calling a function | ||
/// `array_concat` from the `functions-array` crate. | ||
pub trait FunctionRewrite { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the new trait used to rewrite Expr
s to function calls
schema: &DFSchema, | ||
_config: &ConfigOptions, | ||
) -> datafusion_common::Result<Transformed<Expr>> { | ||
let transformed = match expr { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The whole point of this PR is to move these rules out of datafusion/optimizer/src/analyzer/rewrite_expr.rs
in the optimizer crate and into the functions-array crate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After reading this PR, a question for me is in #9563, get_field
function also works for a single ScalarValue
of Struct
or Map
Should I put the rewrite logic in this file or create a new rewrite file under "datafusion/functions"? 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest we add another match in this same file
We could also add another FunctionRewriter but that would result in another (unecessary walk down the tree I think)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, make sense. I thought it's OK for function but may not suitable for semantic, but now adding in the same file works, we may separate another file if there're more function_rewrite.
&& is_func(&left, "make_array") | ||
&& is_func(&right, "make_array") => | ||
{ | ||
Transformed::yes(array_has_all(*left, *right)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also rewrote these rules to avoid clone
ing (as they all get an owned Expr
)
@@ -0,0 +1,123 @@ | |||
// Licensed to the Apache Software Foundation (ASF) under one |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Github renders this as a new file, but it is the old datafusion/optimizer/src/analyzer/rewrite_expr.rs with the array specific logic removed and instead invokes each FunctionRewrite
individually
pub mod rewrite_expr; | ||
pub mod subquery; | ||
pub mod type_coercion; | ||
use std::sync::Arc; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know why the imports were moved -- I think my editor did it
FunctionRewrite
API, port array specific rewrites to the datafu…FunctionRewrite
API, Move Array specific rewrites to datafusion_functions_array
) -> datafusion_common::Result<Transformed<Expr>> { | ||
let transformed = match expr { | ||
// array1 @> array2 -> array_has_all(array1, array2) | ||
Expr::BinaryExpr(BinaryExpr { left, op, right }) |
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
plan.with_new_exprs(new_expr, new_inputs) | ||
} | ||
} | ||
struct OperatorToFunctionRewriter<'a> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
references are added for performance?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I think so. In order to pass down the ConfigOptions without clone'ing the them specifically
I think in general we could improve planning efficiency in DataFusion by avoiding clone
but that is a project for another day
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks really nice!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the review @jayzhan211
I am going to merge this in to unblock more function migration work
schema: &DFSchema, | ||
_config: &ConfigOptions, | ||
) -> datafusion_common::Result<Transformed<Expr>> { | ||
let transformed = match expr { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest we add another match in this same file
We could also add another FunctionRewriter but that would result in another (unecessary walk down the tree I think)
plan.with_new_exprs(new_expr, new_inputs) | ||
} | ||
} | ||
struct OperatorToFunctionRewriter<'a> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I think so. In order to pass down the ConfigOptions without clone'ing the them specifically
I think in general we could improve planning efficiency in DataFusion by avoiding clone
but that is a project for another day
Which issue does this PR close?
Closes #9519
Closes #9557
This is an alternate to the design @jayzhan22 implemented in #9557
Rationale for this change
While reviewing XX I realized that the suggestion to use
AnalyzerRule
s in the original ticket was somewhat overkill. The core function needed is rewritingExpr
s (operators specifically) to Functions.Using
AnalyzerRule
requires an entire LogicalPlanWhat changes are included in this PR?
Formalizes the general pattern of rewriting certain expressions / operators to function calls, specifically:
FunctionRewrite
traitFunctionRewriter
s withFunctionRegistry
Are these changes tested?
Covered by existing tests
Are there any user-facing changes?
There is a new API that users can use to rewrite DataFusion
Expr
s to their own function calls.