Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feature: Add a WindowUDFImpl::simplify() API #9906

Merged
merged 3 commits into from
May 30, 2024
Merged

Conversation

guojidan
Copy link
Contributor

@guojidan guojidan commented Apr 2, 2024

Which issue does this PR close?

Closes #9527 .

Rationale for this change

allow user define simply rule

What changes are included in this PR?

add WindowUDFImpl::simplfy() API

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added logical-expr Logical plan and expressions optimizer Optimizer rules labels Apr 2, 2024
@guojidan
Copy link
Contributor Author

guojidan commented Apr 2, 2024

hi @jayzhan211 , maybe you can review this PR 😄

@jayzhan211
Copy link
Contributor

jayzhan211 commented Apr 2, 2024

@guojidan I think we need a simple example. Something like simplifying nth_value() to first_value() if n is 1

@guojidan
Copy link
Contributor Author

guojidan commented Apr 2, 2024

@guojidan I think we need a simple example

separate create a new UDWF example file?

@jayzhan211
Copy link
Contributor

jayzhan211 commented Apr 2, 2024

We have example of cast to i64 for scalar function in https://github.com/apache/arrow-datafusion/blob/cc1db8a2043c73bda7adec309b42c08d88defab8/datafusion/core/tests/user_defined/user_defined_scalar_functions.rs#L524

You need to write an example in datafusion/core/tests/user_defined/user_defined_window_functions.rs and shows that the simply() works and make sense

@alamb
Copy link
Contributor

alamb commented Apr 2, 2024

I agree with @jayzhan211 that an example would be great ❤️

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the contribution @guojidan

@@ -75,6 +78,30 @@ impl WindowUDFImpl for SmoothItUdf {
Ok(DataType::Float64)
}

/// rewrite
fn simplify(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be confusing as now this example UDWF will never call the actual implementation.

Perhaps we can make a second UDWF and use it as an example of wrapping an existing UDWF 🤔

fn simplify(
&self,
args: Vec<Expr>,
_partition_by: &[Expr],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah this is interesting. I was trying to figure out how we would update this signature to avoid a copy (for example, a rewrite would have to copy the order_by exprs 🤔

I can't think of anything at the moment

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it looks quite hard to come up with better backward compatible solution

Copy link
Contributor

@milenkovicm milenkovicm May 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be great if we would not need do to decompose and then compose function again, something like:

          Expr::AggregateFunction(AggregateFunction { 
              func_def:AggregateFunctionDefinition::UDF(ref udaf), 
              ref args, 
              ref distinct, 
              ref filter, 
              ref order_by, 
              ref null_treatment }) => {
              match udaf.simplify(args, distinct, &filter, &order_by, &null_treatment)? {
                  ExprSimplifyResult::Simplified(simplified) => Transformed::yes(simplified),
                  ExprSimplifyResult::Original(_) => Transformed::no(expr),
              }
          }

clone of args could be avoided if simplify returns empty vector ... but that breaks semantics of the api

@comphead comphead changed the title feature: Add a WindowUDFImpl::simplfy() API feature: Add a WindowUDFImpl::simplify() API Apr 2, 2024
args: Vec<Expr>,
_partition_by: &[Expr],
_order_by: &[Expr],
_window_frame: &WindowFrame,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thats probably better to use builder pattern, if the mandatory and optional input params are different

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite understand what to do. Can you further explain 😄

@alamb
Copy link
Contributor

alamb commented Apr 4, 2024

Marking as draft as I think this PR is no longer waiting on feedback. Please mark it as ready for review when it is ready for another look

@alamb alamb marked this pull request as draft April 4, 2024 18:50
@milenkovicm
Copy link
Contributor

I guess at some point we'd need to change UDF::simplify to align interfaces

Signed-off-by: guojidan <[email protected]>
Signed-off-by: guojidan <[email protected]>
@guojidan guojidan marked this pull request as ready for review May 19, 2024 04:48
@guojidan guojidan requested a review from alamb May 29, 2024 01:33

/// this function will simplify `SimplifySmoothItUdf` to `SmoothItUdf`.
fn simplify(&self) -> Option<WindowFunctionSimplification> {
// Ok(ExprSimplifyResult::Simplified(Expr::WindowFunction(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remind to cleanup this

Copy link
Contributor

@jayzhan211 jayzhan211 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @guojidan and @jayzhan211 for the review

I think there are some cleanups (liek https://github.com/apache/datafusion/pull/9906/files#r1618798950 from @jayzhan211 ) that we should do but we can do them as follow on PRs too.

Thank you

@alamb alamb merged commit 3d00760 into apache:main May 30, 2024
23 checks passed
@guojidan guojidan deleted the simplify branch May 30, 2024 00:22
findepi pushed a commit to findepi/datafusion that referenced this pull request Jul 16, 2024
* feature: Add a WindowUDFImpl::simplfy() API

Signed-off-by: guojidan <[email protected]>

* fix doc

Signed-off-by: guojidan <[email protected]>

* fix fmt

Signed-off-by: guojidan <[email protected]>

---------

Signed-off-by: guojidan <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
logical-expr Logical plan and expressions optimizer Optimizer rules
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add a WindowUDFImpl::simplfy() API
5 participants