-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make fields of ScalarUDF
, AggregateUDF
and WindowUDF
non pub
#8079
Conversation
|
||
/// Return the type of the function given its input types | ||
pub fn return_type(&self, args: &[DataType]) -> Result<DataType> { | ||
// Old API returns an Arc of the datatype for some reason |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another change I did was to move the handling of function pointers into the implementation of AggregateUDF
etc -- so rather than passing out parts of themselves (a function pointer) the rest of DataFusion now just calls the function to get the appropriate information
This is in preparation for potentially changing the implementation of *UDF
internally
WindowFunction::WindowUDF(fun) => { | ||
Ok((*(fun.return_type)(input_expr_types)?).clone()) | ||
} | ||
WindowFunction::AggregateUDF(fun) => fun.return_type(input_expr_types), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this now reads more nicely and idomatically
monotonicity: Option<FuncMonotonicity>, | ||
) -> Self { | ||
Self { | ||
fun, | ||
name: name.to_owned(), | ||
args, | ||
return_type: return_type.clone(), | ||
return_type, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no need to clone within the function -- I also changed this signature so the caller can pass in a DataType if they already have it without forcing a new clone
return_type: Arc::new(return_type), | ||
partition_evaluator_factory: Arc::new(make_partition_evaluator), | ||
}; | ||
let dummy_window_udf = WindowUDF::new( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a pretty good example of the changes that would be required from user code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you. This way we can keep the API stable during migration from BuiltinScalarFunction
-> ScalarUDF
When ScalarUDF
fields needs to be extended, it should look like:
let udf = ScalarUDF::new(
"foo",
signature,
return_type,
fun
).with_aliases(["fo", "foo_func"])
Yes, that is a great idea! I am actually hoping it looks more like let udf = ScalarUDF::new_from_trait(
MyStructThatImplementsATrait{}
)
.with_aliases(["fo", "foo_func"]) But we'll see ! |
I plan to merge this on 2023-11-14 unless anyone objects or would like additional time to review (after it has been open for a week) |
@andygrove or @Dandandan I wonder if you might have time to review (and hopefully approve) this PR? |
@@ -86,7 +86,7 @@ impl ExprSchemable for Expr { | |||
.iter() | |||
.map(|e| e.get_type(schema)) | |||
.collect::<Result<Vec<_>>>()?; | |||
Ok((fun.return_type)(&data_types)?.as_ref().clone()) | |||
Ok(fun.return_type(&data_types)?) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This piece of code looks really clean.
Thank you everyone |
Which issue does this PR close?
Part of #8045
Closes #8039
Rationale for this change
Rationale is that we can add new features to ScalarUDF to bring it up to parity with BuiltInScalarFunction (such as adding
montonicity
) in a backwards compatible way if the fields are not pub. If the fieldsIn order to maintain parity with the other user defined APIs (aggregates and window functions) and to prepare them for an eventual consolidation as well, apply the same to
AggregateUDF
andWindowUDF
see #8045 for more details
What changes are included in this PR?
Make all fields of ScalarUDF private and add accessors.
2. Change return_type to just call the function rather than passing back a function implementation
Are these changes tested?
yes, by existing tests
Are there any user-facing changes?
yes, users can no longer create
ScalarUDF
,AggregateUDF
orWindowUDF
with thestruct
syntax:So previously this was allowed:
Now, users must call
ScalarUDF::new
: