-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a ScalarUDFImpl::simplfy()
API, move SimplifyInfo
et al to datafusion_expr
#9304
Merged
Merged
Changes from all commits
Commits
Show all changes
30 commits
Select commit
Hold shift + click to select a range
b787dfb
first draft
jayzhan211 7259275
clippy
jayzhan211 4d98121
add comments
jayzhan211 bacc966
move to optimize rule
jayzhan211 3199bca
cleanup
jayzhan211 63648be
fix explain test
jayzhan211 83fc9d8
move to simplifier
jayzhan211 c73126a
pass with schema
jayzhan211 dd99362
fix explain
jayzhan211 0b66ed3
fix doc
jayzhan211 0798274
move to expr
jayzhan211 5fdf177
change simplify signature
jayzhan211 ab66a19
cleanup
jayzhan211 7c7b654
cleanup
jayzhan211 cbefb3c
fix doc
jayzhan211 7718251
fix doc
jayzhan211 5cdae92
Update datafusion/expr/src/udf.rs
alamb 4886ba5
Add backwards compatibile uses, inline FunctionSimplifier, rename to …
alamb ed6a04b
Remove DFSchema from SimplifyInfo
alamb f6848d8
Avoid requiring argument copies
alamb a8541ff
Merge remote-tracking branch 'apache/main' into simply-udf
alamb 18f8371
Improve docs
alamb bfb54a0
fix link
alamb 33aa7ff
fix doc test
alamb 24adcbf
Update datafusion/physical-expr/src/lib.rs
alamb 8cab80b
Merge remote-tracking branch 'apache/main' into simply-udf
alamb 550cbc4
Merge remote-tracking branch 'apache/main' into simply-udf
alamb fea82cb
Change example simplify to always simplify its argument
alamb 4e9eb70
Clarify comment
alamb fdec54c
Merge remote-tracking branch 'apache/main' into simply-udf
alamb File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
Oops, something went wrong.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -16,7 +16,9 @@ | |
// under the License. | ||
|
||
use arrow::compute::kernels::numeric::add; | ||
use arrow_array::{Array, ArrayRef, Float64Array, Int32Array, RecordBatch, UInt8Array}; | ||
use arrow_array::{ | ||
Array, ArrayRef, Float32Array, Float64Array, Int32Array, RecordBatch, UInt8Array, | ||
}; | ||
use arrow_schema::DataType::Float64; | ||
use arrow_schema::{DataType, Field, Schema}; | ||
use datafusion::prelude::*; | ||
|
@@ -26,10 +28,13 @@ use datafusion_common::{ | |
assert_batches_eq, assert_batches_sorted_eq, cast::as_int32_array, not_impl_err, | ||
plan_err, ExprSchema, Result, ScalarValue, | ||
}; | ||
use datafusion_expr::simplify::ExprSimplifyResult; | ||
use datafusion_expr::simplify::SimplifyInfo; | ||
use datafusion_expr::{ | ||
create_udaf, create_udf, Accumulator, ColumnarValue, ExprSchemable, | ||
LogicalPlanBuilder, ScalarUDF, ScalarUDFImpl, Signature, Volatility, | ||
}; | ||
|
||
use rand::{thread_rng, Rng}; | ||
use std::any::Any; | ||
use std::iter; | ||
|
@@ -514,6 +519,101 @@ async fn deregister_udf() -> Result<()> { | |
Ok(()) | ||
} | ||
|
||
#[derive(Debug)] | ||
struct CastToI64UDF { | ||
signature: Signature, | ||
} | ||
|
||
impl CastToI64UDF { | ||
fn new() -> Self { | ||
Self { | ||
signature: Signature::any(1, Volatility::Immutable), | ||
} | ||
} | ||
} | ||
|
||
impl ScalarUDFImpl for CastToI64UDF { | ||
fn as_any(&self) -> &dyn Any { | ||
self | ||
} | ||
fn name(&self) -> &str { | ||
"cast_to_i64" | ||
} | ||
fn signature(&self) -> &Signature { | ||
&self.signature | ||
} | ||
fn return_type(&self, _args: &[DataType]) -> Result<DataType> { | ||
Ok(DataType::Int64) | ||
} | ||
|
||
// Demonstrate simplifying a UDF | ||
fn simplify( | ||
&self, | ||
mut args: Vec<Expr>, | ||
info: &dyn SimplifyInfo, | ||
) -> Result<ExprSimplifyResult> { | ||
// DataFusion should have ensured the function is called with just a | ||
// single argument | ||
assert_eq!(args.len(), 1); | ||
let arg = args.pop().unwrap(); | ||
|
||
// Note that Expr::cast_to requires an ExprSchema but simplify gets a | ||
// SimplifyInfo so we have to replicate some of the casting logic here. | ||
|
||
let source_type = info.get_data_type(&arg)?; | ||
let new_expr = if source_type == DataType::Int64 { | ||
// the argument's data type is already the correct type | ||
arg | ||
} else { | ||
// need to use an actual cast to get the correct type | ||
Expr::Cast(datafusion_expr::Cast { | ||
expr: Box::new(arg), | ||
data_type: DataType::Int64, | ||
}) | ||
}; | ||
// return the newly written argument to DataFusion | ||
Ok(ExprSimplifyResult::Simplified(new_expr)) | ||
} | ||
|
||
fn invoke(&self, _args: &[ColumnarValue]) -> Result<ColumnarValue> { | ||
unimplemented!("Function should have been simplified prior to evaluation") | ||
} | ||
} | ||
|
||
#[tokio::test] | ||
async fn test_user_defined_functions_cast_to_i64() -> Result<()> { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thank you so much for this test / example -- it makes seeing how the API would work really clear. 👏 |
||
let ctx = SessionContext::new(); | ||
|
||
let schema = Arc::new(Schema::new(vec![Field::new("x", DataType::Float32, false)])); | ||
|
||
let batch = RecordBatch::try_new( | ||
schema, | ||
vec![Arc::new(Float32Array::from(vec![1.0, 2.0, 3.0]))], | ||
)?; | ||
|
||
ctx.register_batch("t", batch)?; | ||
|
||
let cast_to_i64_udf = ScalarUDF::from(CastToI64UDF::new()); | ||
ctx.register_udf(cast_to_i64_udf); | ||
|
||
let result = plan_and_collect(&ctx, "SELECT cast_to_i64(x) FROM t").await?; | ||
|
||
assert_batches_eq!( | ||
&[ | ||
"+------------------+", | ||
"| cast_to_i64(t.x) |", | ||
"+------------------+", | ||
"| 1 |", | ||
"| 2 |", | ||
"| 3 |", | ||
"+------------------+" | ||
], | ||
&result | ||
); | ||
|
||
Ok(()) | ||
} | ||
|
||
#[derive(Debug)] | ||
struct TakeUDF { | ||
signature: Signature, | ||
|
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I couldn't figure out how to do
cast_to
but I think this way is OK too.