Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make fields of ScalarUDF , AggregateUDF and WindowUDF non pub #8079

Merged
merged 3 commits into from
Nov 15, 2023

Conversation

alamb
Copy link
Contributor

@alamb alamb commented Nov 7, 2023

Which issue does this PR close?

Part of #8045

Closes #8039

Rationale for this change

Rationale is that we can add new features to ScalarUDF to bring it up to parity with BuiltInScalarFunction (such as adding montonicity) in a backwards compatible way if the fields are not pub. If the fields

In order to maintain parity with the other user defined APIs (aggregates and window functions) and to prepare them for an eventual consolidation as well, apply the same to AggregateUDF and WindowUDF

see #8045 for more details

What changes are included in this PR?

Make all fields of ScalarUDF private and add accessors.
2. Change return_type to just call the function rather than passing back a function implementation

Are these changes tested?

yes, by existing tests

Are there any user-facing changes?

yes, users can no longer create ScalarUDF, AggregateUDF or WindowUDF with the struct syntax:

So previously this was allowed:

let udf = ScalarUDF {
  name: "foo".to_string()
  signature, 
  return_type,
  fun
}

Now, users must call ScalarUDF::new:

let udf = ScalarUDF::new(
  "foo",
  signature, 
  return_type,
  fun
);

@alamb alamb added the api change Changes the API exposed to users of the crate label Nov 7, 2023
@github-actions github-actions bot added logical-expr Logical plan and expressions physical-expr Physical Expressions optimizer Optimizer rules core Core DataFusion crate labels Nov 7, 2023

/// Return the type of the function given its input types
pub fn return_type(&self, args: &[DataType]) -> Result<DataType> {
// Old API returns an Arc of the datatype for some reason
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another change I did was to move the handling of function pointers into the implementation of AggregateUDF etc -- so rather than passing out parts of themselves (a function pointer) the rest of DataFusion now just calls the function to get the appropriate information

This is in preparation for potentially changing the implementation of *UDF internally

WindowFunction::WindowUDF(fun) => {
Ok((*(fun.return_type)(input_expr_types)?).clone())
}
WindowFunction::AggregateUDF(fun) => fun.return_type(input_expr_types),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this now reads more nicely and idomatically

monotonicity: Option<FuncMonotonicity>,
) -> Self {
Self {
fun,
name: name.to_owned(),
args,
return_type: return_type.clone(),
return_type,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no need to clone within the function -- I also changed this signature so the caller can pass in a DataType if they already have it without forcing a new clone

return_type: Arc::new(return_type),
partition_evaluator_factory: Arc::new(make_partition_evaluator),
};
let dummy_window_udf = WindowUDF::new(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a pretty good example of the changes that would be required from user code

@alamb alamb marked this pull request as ready for review November 7, 2023 16:28
Copy link
Contributor

@2010YOUY01 2010YOUY01 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. This way we can keep the API stable during migration from BuiltinScalarFunction -> ScalarUDF
When ScalarUDF fields needs to be extended, it should look like:

let udf = ScalarUDF::new(
  "foo",
  signature, 
  return_type,
  fun
).with_aliases(["fo", "foo_func"])

@alamb
Copy link
Contributor Author

alamb commented Nov 10, 2023

When ScalarUDF fields needs to be extended, it should look like:

Yes, that is a great idea!

I am actually hoping it looks more like

let udf = ScalarUDF::new_from_trait(
  MyStructThatImplementsATrait{}
)
  .with_aliases(["fo", "foo_func"])

But we'll see !

@alamb
Copy link
Contributor Author

alamb commented Nov 11, 2023

I plan to merge this on 2023-11-14 unless anyone objects or would like additional time to review (after it has been open for a week)

@alamb
Copy link
Contributor Author

alamb commented Nov 12, 2023

@andygrove or @Dandandan I wonder if you might have time to review (and hopefully approve) this PR?

@@ -86,7 +86,7 @@ impl ExprSchemable for Expr {
.iter()
.map(|e| e.get_type(schema))
.collect::<Result<Vec<_>>>()?;
Ok((fun.return_type)(&data_types)?.as_ref().clone())
Ok(fun.return_type(&data_types)?)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This piece of code looks really clean.

@tustvold tustvold merged commit 7f11125 into apache:main Nov 15, 2023
22 checks passed
@tustvold
Copy link
Contributor

tustvold commented Nov 15, 2023

Looks like this had merge conflicts, but there are also other merge conflicts in-flight... - #8186

Edit: All fixes bundled into #8187

This was referenced Nov 15, 2023
@alamb alamb deleted the alamb/close_udf_api branch November 15, 2023 12:47
@alamb
Copy link
Contributor Author

alamb commented Nov 15, 2023

Thank you everyone

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api change Changes the API exposed to users of the crate core Core DataFusion crate logical-expr Logical plan and expressions optimizer Optimizer rules physical-expr Physical Expressions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants