-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add simplify
method to aggregate function
#10354
Conversation
7325587
to
0e50d42
Compare
the change is quite similar to #9906 there are quite a lot optional/unused parameters in the method call, but i personally find it easier to understand as it is equivalent to AggregateFunction signature. I wonder if |
42000ac
to
f96a6d8
Compare
jsut thinking aloud, maybe if we change: pub enum ExprSimplifyResult {
/// The function call was simplified to an entirely new Expr
Simplified(Expr),
/// the function call could not be simplified, and the arguments
/// are return unmodified.
Original(Vec<Expr>),
} to: pub enum ExprSimplifyResult<T> {
/// The function call was simplified to an entirely new Expr
Simplified(Expr),
/// the function call could not be simplified, and the arguments
/// are return unmodified.
Original(T),
} simplify method would change from: pub fn simplify(
&self,
args: Vec<Expr>,
distinct: &bool,
filter: &Option<Box<Expr>>,
order_by: &Option<Vec<Expr>>,
null_treatment: &Option<NullTreatment>,
info: &dyn SimplifyInfo,
) -> Result<ExprSimplifyResult> { to: pub fn simplify(
&self,
args: Vec<Expr>,
distinct: bool,
filter: Option<Box<Expr>>,
order_by: Option<Vec<Expr>>,
null_treatment: Option<NullTreatment>,
info: &dyn SimplifyInfo,
) -> Result<ExprSimplifyResult<(Vec<Expr>, bool, Option<Box<Expr>> ...)>> {
} so in both cases, when we create replacement function or we re-assemble original function we would not need to clone parameters. maybe hand full of changes, from Downside is that we would have to re-create expression if no simplification, plus change would be backward incompatible |
datafusion/expr/src/udaf.rs
Outdated
/// See [`AggregateUDFImpl::simplify`] for more details. | ||
pub fn simplify( | ||
&self, | ||
args: Vec<Expr>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should reuse args in AggregateFunction, maybe create a new struct with AggregateArgs
?
datafusion/datafusion/expr/src/expr.rs
Lines 568 to 582 in 8190cb9
/// Aggregate function | |
#[derive(Clone, PartialEq, Eq, Hash, Debug)] | |
pub struct AggregateFunction { | |
/// Name of the function | |
pub func_def: AggregateFunctionDefinition, | |
/// List of expressions to feed to the functions as arguments | |
pub args: Vec<Expr>, | |
/// Whether this is a DISTINCT aggregation or not | |
pub distinct: bool, | |
/// Optional filter | |
pub filter: Option<Box<Expr>>, | |
/// Optional ordering | |
pub order_by: Option<Vec<Expr>>, | |
pub null_treatment: Option<NullTreatment>, | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know, mixed feelings about it. IMHO I see no benefits, would not simplify anything and another structure would be introduced.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One benefit is that we could potentially add new fields without causing an API change 🤔
Another benefit could be that we could add a default method that recreated the expression. For example
impl AggegateArgs {
// recreate the original aggregate fucntions
fn into_expr(self) -> Expr { ... }
...
}
Another benefit is, as @jayzhan211 says, that it would be more consistent with the other APIs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we change signature of pub struct AggregateFunction
moving args
, distinct
, filter
... to AggegateArgs
or we keep AggegateArgs
as auxiliary structure?
I think it is a good idea. |
fn simplify( | ||
&self, | ||
args: Vec<Expr>, | ||
distinct: &bool, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we also pass others args with owned value for rewrite like args: Vec<Expr>
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, if we make ExprSimplifyResult
generic, as I pointed out in the comment. If we change it without change of ExprSimplifyResult
it would lead to cloning of arguments even if simplify does nothing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given the work we have been doing on #9637 let's not add an API that forces cloning
Another potential thing is to use Transformed
directlt
I think if we are going to change the signature, I think we should use Transformed directly (and it is already parameterized) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @milenkovicm and @jayzhan211 -- I agree this API is a little akward. It would be great to figure out how to avoid cloning when unecessary
datafusion/expr/src/udaf.rs
Outdated
/// See [`AggregateUDFImpl::simplify`] for more details. | ||
pub fn simplify( | ||
&self, | ||
args: Vec<Expr>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One benefit is that we could potentially add new fields without causing an API change 🤔
Another benefit could be that we could add a default method that recreated the expression. For example
impl AggegateArgs {
// recreate the original aggregate fucntions
fn into_expr(self) -> Expr { ... }
...
}
Another benefit is, as @jayzhan211 says, that it would be more consistent with the other APIs
@@ -1307,6 +1309,39 @@ impl<'a, S: SimplifyInfo> TreeNodeRewriter for Simplifier<'a, S> { | |||
ExprSimplifyResult::Simplified(expr) => Transformed::yes(expr), | |||
}, | |||
|
|||
Expr::AggregateFunction(AggregateFunction { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like in order to be able to make this API avoid copying, it would need to pass the distinct
, filter
, etc fields in and then get them back
Another thing we could do is to use Transformed
directly
What if we made something like
struct AggregateFunctionsArgs {
args: Vec<Expr>,
distinct: bool,
filter: Option<Expr>,
order_by: ...
}
And then the call to simplify to be
trait AggreagateUDFImpl {
...
fn simplify(
&self,
agg_args: AggregateFunctionsArgs,
) -> Result<Transformed<AggregateFunctionsArgs> {
Ok(Transformed::no(agg_args))
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with AggregateFunctionsArgs
, consistency part got me on board 😀
I'm not sure about Transformed
part, reason being is that ExprSimplifyResult
returns two types, Expr
and currently Vec<Expr>
in that case simplify
signature should be changed to:
trait AggreagateUDFImpl {
...
fn simplify(
&self,
agg_args: AggregateFunctionsArgs,
) -> Result<Transformed<Expr> {
let original : Expr = agg_args.into();
Ok(Transformed::no(original))
}
Thus simplify function should return 're-composed' original function or new function ...
Anyway, I'll have a look what can be done, we can discuss
fn simplify( | ||
&self, | ||
args: Vec<Expr>, | ||
distinct: &bool, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given the work we have been doing on #9637 let's not add an API that forces cloning
Another potential thing is to use Transformed
directlt
f96a6d8
to
62d381e
Compare
Another option could be to make a simplify return optional closure fn simplify(&self ) -> Option<Box<dyn Fn(AggregateFunction) -> datafusion_common::tree_node::Transformed<Expr>>> {
None
} in this case expr_simplifier part would become: Expr::AggregateFunction(AggregateFunction {func_def: AggregateFunctionDefinition::UDF(ref udaf), ..}) => {
match (udaf.simplify(), expr) {
(Some(simplify), Expr::AggregateFunction(a)) => simplify(a),
(_, e) => Transformed::no(e),
}
} this solution:
other issues:
|
Another option is to introduce Expr::AggregateFunction(AggregateFunction {
func_def: AggregateFunctionDefinition::UDF(udaf),
args: AggregateFunctionArgs,
}) if udaf.has_simplify() => {
Transformed::yes(udaf.simplify()?)
}
fn simplify() -> Result<Expr> The downside is that the user needs to both define |
I personally try to avoid situations like this when creating API, this should be documented that both functions have to be implemented plus it introduces more corner cases. Anyway, closure was a brain dump from yesterday, after a good sleep I don't think we need it, will try to elaborate in next comment |
Option 1 - Original expression as a parameter fn simplify(
&self,
expr: Expr,
) -> Result<datafusion_common::tree_node::Transformed<Expr>> {
// we know it'll always be AggregateFunction but we have to match on it
// which is not terribly hard but it makes api a bit odd
if let Expr::AggregateFunction(agg) = expr {
...
Ok(datafusion_common::tree_node::Transformed::yes(...))
} else {
// no re-creation of expression if not used
Ok(datafusion_common::tree_node::Transformed::no(expr))
}
} default implementation is just: fn simplify(
&self,
expr: Expr,
) -> Result<datafusion_common::tree_node::Transformed<Expr>> {
// no re-creation of expression if not used
Ok(datafusion_common::tree_node::Transformed::no(expr))
} Option 2 - AggregationFunction as a parameter fn simplify(
&self,
agg: AggregateFunction,
) -> Result<datafusion_common::tree_node::Transformed<Expr>> {
// we know its aggregate function, no need to do extra matching, but we need to re-create expression
// issue user can return `no` with new expression in it
Ok(datafusion_common::tree_node::Transformed::no(Expr::AggregateFunction(agg)))
} Option 3 - AggregationFunction as a parameter with parametrised
|
Given these 3, I prefer 2, because I think re-create expression is not a high cost that we should avoid. And we can do even further, If user does not implement simplify, it is equivalent to Transformed::no or ExprSimplifyResult::Original, so we can just have fn simplify(
&self,
agg: AggregateFunction,
) -> Result<Expr> {
Ok(Expr::AggregateFunction(agg))
} |
62d381e
to
cd68b7f
Compare
I apologise for making noise, but it looks like all of the non closure options have issue with borrowing:
It looks like |
d0e6842
to
de51434
Compare
@jayzhan211 and @alamb whenever you get chance please have a look, IMHO I find proposal with closure to tick all the boxes, but this one is definitely simpler. |
After playing around, I feel like maybe optional closure is our answer 🤔 fn simplify(
&self,
_info: &dyn SimplifyInfo,
) -> Option<Box<dyn Fn(AggregateArgs) -> Expr>> {} |
Damn! I should have created a branch with that code 😀, if we have consensus on that direction I'll redo it |
My final answer for today is // UDAF
fn simplify(
&self,
args: AggregateArgs,
_info: &dyn SimplifyInfo,
) -> Option<Box<dyn Fn(AggregateArgs) -> Result<Expr>>> { // UDF, not necessary but for consistency
fn simplify(
&self,
args: Vec<Expr>,
_info: &dyn SimplifyInfo,
) -> Option<Box<dyn Fn(Vec<Expr>) -> Result<Expr>>> { Go with optional closure! |
you mean // UDF
fn simplify(
&self,
) -> Option<Box<dyn Fn(AggregateFunction, &dyn SimplifyInfo) -> Result<Expr>>> { or // UDF
fn simplify(
&self,
info: &dyn SimplifyInfo
) -> Option<Box<dyn Fn(AggregateFunction) -> Result<Expr>>> { |
It should be this. // UDF
fn simplify(
&self,
) -> Option<Box<dyn Fn(AggregateFunction, &dyn SimplifyInfo) -> Result<Expr>>> { The reason for optional closure is that I assume we need to return |
We're on the same page @jayzhan211 |
If we don't need pub fn simplify(
&self,
args: AggregateArgs,
info: &dyn SimplifyInfo,
) -> Result<Transformed<AggregateArgs>> {} I think the next problem is whether we need |
I think we need Expr, this way we can replace function with something else |
I agree. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
You can define closure in |
47108f1
to
a678e6d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @milenkovicm and @jayzhan211 -- I think it looks good and the API is clean. Let's give it a try 🚀
Looks like there is a small conflict to resolve and then we'll be good to go
a678e6d
to
c40690c
Compare
simplify
method to aggregate function
Thanks again @jayzhan211 and @milenkovicm 🙏 |
* add simplify method for aggregate function * simplify returns closure
Which issue does this PR close?
Closes #9526.
Rationale for this change
What changes are included in this PR?
Are these changes tested?
yes, added new test and example
Are there any user-facing changes?
no, changes are backward compatible