Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add FunctionRewrite API, Move Array specific rewrites to datafusion_functions_array #9583

Merged
merged 2 commits into from
Mar 13, 2024

Conversation

alamb
Copy link
Contributor

@alamb alamb commented Mar 12, 2024

Which issue does this PR close?

Closes #9519
Closes #9557

This is an alternate to the design @jayzhan22 implemented in #9557

Rationale for this change

While reviewing XX I realized that the suggestion to use AnalyzerRules in the original ticket was somewhat overkill. The core function needed is rewriting Exprs (operators specifically) to Functions.

Using AnalyzerRule requires an entire LogicalPlan

What changes are included in this PR?

Formalizes the general pattern of rewriting certain expressions / operators to function calls, specifically:

  1. Add a new FunctionRewrite trait
  2. Add an API to register FunctionRewriters with FunctionRegistry
  3. Demonstrate the use of this API by porting the array specific rewrite as a FunctionRewriter

Are these changes tested?

Covered by existing tests

Are there any user-facing changes?

There is a new API that users can use to rewrite DataFusion Exprs to their own function calls.

@github-actions github-actions bot added logical-expr Logical plan and expressions optimizer Optimizer rules core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) labels Mar 12, 2024
/// For example, concatenating arrays `a || b` is represented as
/// `Operator::ArrowAt`, but can be implemented by calling a function
/// `array_concat` from the `functions-array` crate.
pub trait FunctionRewrite {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the new trait used to rewrite Exprs to function calls

schema: &DFSchema,
_config: &ConfigOptions,
) -> datafusion_common::Result<Transformed<Expr>> {
let transformed = match expr {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The whole point of this PR is to move these rules out of datafusion/optimizer/src/analyzer/rewrite_expr.rs in the optimizer crate and into the functions-array crate

Copy link
Contributor

@yyy1000 yyy1000 Mar 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After reading this PR, a question for me is in #9563, get_field function also works for a single ScalarValue of Struct or Map
Should I put the rewrite logic in this file or create a new rewrite file under "datafusion/functions"? 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest we add another match in this same file

We could also add another FunctionRewriter but that would result in another (unecessary walk down the tree I think)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, make sense. I thought it's OK for function but may not suitable for semantic, but now adding in the same file works, we may separate another file if there're more function_rewrite.

&& is_func(&left, "make_array")
&& is_func(&right, "make_array") =>
{
Transformed::yes(array_has_all(*left, *right))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also rewrote these rules to avoid cloneing (as they all get an owned Expr)

@@ -0,0 +1,123 @@
// Licensed to the Apache Software Foundation (ASF) under one
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Github renders this as a new file, but it is the old datafusion/optimizer/src/analyzer/rewrite_expr.rs with the array specific logic removed and instead invokes each FunctionRewrite individually

pub mod rewrite_expr;
pub mod subquery;
pub mod type_coercion;
use std::sync::Arc;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know why the imports were moved -- I think my editor did it

@alamb alamb changed the title Add FunctionRewrite API, port array specific rewrites to the datafu… Add FunctionRewrite API, Move Array specific rewrites to datafusion_functions_array Mar 12, 2024
) -> datafusion_common::Result<Transformed<Expr>> {
let transformed = match expr {
// array1 @> array2 -> array_has_all(array1, array2)
Expr::BinaryExpr(BinaryExpr { left, op, right })

This comment was marked as resolved.

plan.with_new_exprs(new_expr, new_inputs)
}
}
struct OperatorToFunctionRewriter<'a> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

references are added for performance?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I think so. In order to pass down the ConfigOptions without clone'ing the them specifically

I think in general we could improve planning efficiency in DataFusion by avoiding clone but that is a project for another day

Copy link
Contributor

@jayzhan211 jayzhan211 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks really nice!

Copy link
Contributor Author

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the review @jayzhan211

I am going to merge this in to unblock more function migration work

schema: &DFSchema,
_config: &ConfigOptions,
) -> datafusion_common::Result<Transformed<Expr>> {
let transformed = match expr {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest we add another match in this same file

We could also add another FunctionRewriter but that would result in another (unecessary walk down the tree I think)

plan.with_new_exprs(new_expr, new_inputs)
}
}
struct OperatorToFunctionRewriter<'a> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I think so. In order to pass down the ConfigOptions without clone'ing the them specifically

I think in general we could improve planning efficiency in DataFusion by avoiding clone but that is a project for another day

@alamb alamb merged commit a43938d into apache:main Mar 13, 2024
24 of 25 checks passed
@alamb alamb deleted the alamb/function_rewrite branch March 13, 2024 17:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate logical-expr Logical plan and expressions optimizer Optimizer rules sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Move datafusion_array_function specific rewrite rules like to datafusion_functions_array crate
3 participants