Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add field trait method to WindowUDFImpl, remove return_type/nullable #12374

Merged
merged 55 commits into from
Sep 21, 2024

Conversation

jcsherin
Copy link
Contributor

@jcsherin jcsherin commented Sep 7, 2024

Which issue does this PR close?

Closes #12373.

Rationale for this change

The result field from evaluating the user-defined window function is composed from the return_type and nullable trait methods in WindowUDFImpl.

This change explores folding both methods into a single trait method. The user-defined window functions have to implement only the field trait method which makes the intent more explicit.

The current implementation for a user-defined window function (without field trait method) looks like this:

impl WindowUDFImpl for RowNumber {
    fn return_type(&self, _arg_types: &[DataType]) -> Result<DataType> {
        Ok(DataType::UInt64)
    }

    fn nullable(&self) -> bool {
        true
    }
}

The implementation for a user-defined window function after this change:

impl WindowUDFImpl for RowNumber {
    fn field(&self, field_args: WindowUDFFieldArgs) -> Result<Field> {
        Ok(Field::new(
             field_args.name(), /* window function display name */
             DataType::UInt64,  /* result data type */
             false              /* row number is not nullable */
       ))
    }
}

What changes are included in this PR?

  1. Add field trait method:
fn field(&self, field_args: WindowUDFFieldArgs) -> Result<Field>
  1. Remove return_type trait method.
    /// What [`DataType`] will be returned by this function, given the types of
    /// the arguments
    fn return_type(&self, arg_types: &[DataType]) -> Result<DataType>;
  2. Remove nullable trait method which was added in Convert built-in row_number to user-defined window function #12030.
  3. Add WindowUDFFieldArgs:
/// Contains metadata necessary for defining the field which represents
/// the final result of evaluating a user-defined window function.
pub struct WindowUDFFieldArgs<'a> {
    /// The data types of input expressions to the user-defined window
    /// function.
    input_types: &'a [DataType],
    /// The display name of the user-defined window function.
    function_name: &'a str,
}

Are these changes tested?

Yes, against existing tests in CI.

Are there any user-facing changes?

Yes, this is a breaking change for user-defined window functions API.

@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Sep 17, 2024
@github-actions github-actions bot removed the sqllogictest SQL Logic Tests (.slt) label Sep 17, 2024
Comment on lines +458 to +472
impl Expr {
/// Common method for window functions that applies type coercion
/// to all arguments of the window function to check if it matches
/// its signature.
///
/// If successful, this method returns the data type and
/// nullability of the window function's result.
///
/// Otherwise, returns an error if there's a type mismatch between
/// the window function's signature and the provided arguments.
fn data_type_and_nullable_with_window_function(
&self,
schema: &dyn ExprSchema,
window_function: &WindowFunction,
) -> Result<(DataType, bool)> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extracted a common method to handle type coercion for all window function types (built-in, udaf and udwf) which is then reused by methods:,

  • data_type_and_nullable,
  • get_type and,
  • nullable

Comment on lines -162 to -148
/// Return the type of the function given its input types
///
/// See [`WindowUDFImpl::return_type`] for more details.
pub fn return_type(&self, args: &[DataType]) -> Result<DataType> {
self.inner.return_type(args)
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed return_type.

Comment on lines -181 to -165
/// Returns if column values are nullable for this window function.
/// Returns the field of the final result of evaluating this window function.
///
/// See [`WindowUDFImpl::nullable`] for more details.
pub fn nullable(&self) -> bool {
self.inner.nullable()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed nullable

)
)
})?;
let (_, function_name) = self.qualified_name();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use Expr::qualified_name which also handles:

  • Expr::Column and,
  • Expr::Alias

Comment on lines +719 to +721
WindowFunctionDefinition::WindowUDF(fun) => fun
.field(WindowUDFFieldArgs::new(input_expr_types, display_name))
.map(|field| field.data_type().clone()),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Return data type for udwf.

@jcsherin jcsherin marked this pull request as ready for review September 17, 2024 14:27
Copy link
Contributor

@jayzhan211 jayzhan211 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@jcsherin
Copy link
Contributor Author

Thanks @jayzhan211

Copy link
Contributor

@Michael-J-Ward Michael-J-Ward left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EDIT:

If this is "too late" for this kind of comment, please let me know and I'll delete. I hadn't seen the issue / PR work until today.


One performance consideration is that Field::new allocates a new string on each invocation.

Could that be why WindowUDF (and both ScalarUDF and AggregateUDF) preferred to keep the methods separate?

I dug into it because needing to add empty &str to all the tests in datafusion/expr/src/expr.rs made me think it's not the right abstraction.

Lastly, do we want to diverge Aggregate and Window UDFs?

I thought the recent trend had been to unify them, like in this PR #11550 by @timsaucer?

@jayzhan211
Copy link
Contributor

jayzhan211 commented Sep 20, 2024

One performance consideration is that Field::new allocates a new string on each invocation.

I don't think there is any issue given that we usually require the whole Field

Note that data_type_and_nullable is introduced because of performance, avoid to call the function twice to get data_type and nullalbe. I think it makes sense to have a single call to get the Field which includes data_type, nullable and name. If not, maybe it indicates we need to redesign the code to ensure it is called once.

Could that be why WindowUDF (and both ScalarUDF and AggregateUDF) preferred to keep the methods separate?

Maybe historically issue?

Lastly, do we want to diverge Aggregate and Window UDFs?

I don't think so, they should be separated, no any good reason to mix them

@jcsherin
Copy link
Contributor Author

If this is "too late" for this kind of comment, please let me know and I'll delete. I hadn't seen the issue / PR work until today.

@Michael-J-Ward Appreciate the extra set of eyes. Thank you, for reviewing the code 🙌

I dug into it because needing to add empty &str to all the tests in datafusion/expr/src/expr.rs made me think it's not the right abstraction.

impl WindowFunctionDefinition {
/// Returns the datatype of the window function
pub fn return_type(
&self,
input_expr_types: &[DataType],
_input_expr_nullable: &[bool],
display_name: &str,
) -> Result<DataType> {
match self {
WindowFunctionDefinition::BuiltInWindowFunction(fun) => {
fun.return_type(input_expr_types)
}
WindowFunctionDefinition::AggregateUDF(fun) => {
fun.return_type(input_expr_types)
}
WindowFunctionDefinition::WindowUDF(fun) => fun
.field(WindowUDFFieldArgs::new(input_expr_types, display_name))
.map(|field| field.data_type().clone()),
}
}

The WindowFunctionDefintion::return_type is used only for tests. And I believe these unit tests do not provide much value.

When row_number() was converted to WindowUDF the corresponding return type unit test for row_number() was removed. And I think this applies to the remaining builtin window methods which are pending conversion to WindowUDF.

@Michael-J-Ward @jayzhan211 Thank you.

@alamb alamb added the api change Changes the API exposed to users of the crate label Sep 20, 2024
@alamb alamb changed the title Add field trait method to WindowUDFImpl Add field trait method to WindowUDFImpl, remove return_type Sep 20, 2024
@alamb alamb changed the title Add field trait method to WindowUDFImpl, remove return_type Add field trait method to WindowUDFImpl, remove return_type/nullable Sep 20, 2024
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👨‍🍳 👌

Looks really nice to me -- thank you @jcsherin and @jayzhan211

I also have it on my list to file some more sub task tickets for #8709 to remove the rest of the built in WindowFunctions

@alamb alamb merged commit e1b992a into apache:main Sep 21, 2024
28 checks passed
@alamb
Copy link
Contributor

alamb commented Sep 21, 2024

Thanks again @jayzhan211 and @jcsherin

fn nullable(&self) -> bool {
true
}
/// The [`Field`] of the final result of evaluating this window function.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be useful to document here how the "name" for the returned field is supposed to be set :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. It's a great suggestion. I'll implement in a follow-up PR.

Thanks @Blizzara

@jcsherin jcsherin deleted the field-args-for-window-udf branch September 23, 2024 18:51
bgjackma pushed a commit to bgjackma/datafusion that referenced this pull request Sep 25, 2024
…llable` (apache#12374)

* Adds new library `functions-window-common`

* Adds `FieldArgs` struct for field of final result

* Adds `field` method to `WindowUDFImpl` trait

* Minor: fixes formatting

* Fixes: udwf doc test

* Fixes: implements missing trait items

* Updates `datafusion-cli` dependencies

* Fixes: formatting of `Cargo.toml` files

* Fixes: implementation of `field` in udwf example

* Pass `FieldArgs` argument to `field`

* Use `field` in place of `return_type` for udwf

* Update `field` in udwf implementations

* Fixes: implementation of `field` in udwf example

* Revert unrelated change

* Mark `return_type` for udwf as unreachable

* Delete code

* Uses schema name of udwf to construct `FieldArgs`

* Adds deprecated notice to `return_type` trait method

* Add doc comments to `field` trait method

* Reify `input_types` when creating the udwf window expression

* Rename name field to `schema_name` in `FieldArgs`

* Make `FieldArgs` opaque

* Minor refactor

* Removes `nullable` trait method from `WindowUDFImpl`

* Add doc comments

* Rename to `WindowUDFResultArgs`

* Minor: fixes formatting

* Copy edits for doc comments

* Renames field to `function_name`

* Rename struct to `WindowUDFFieldArgs`

* Add comments for unreachable code

* Copy edit for `WindowUDFImpl::field` trait method

* Renames module

* Fix warning: unused doc comment

* Minor: rename bindings

* Minor refactor

* Minor: copy edit

* Fixes: use `Expr::qualified_name` for window function name

* Fixes: apply previous fix to `Expr::nullable`

* Refactor: reuse type coercion for window functions

* Fixes: clippy errors

* Adds name parameter to `WindowFunctionDefinition::return_type`

* Removes `return_type` field from `SimpleWindowUDF`

* Add doc comment for helper method

* Rewrite doc comments

* Minor: remove empty comment

* Remove `WindowUDFImpl::return_type`

* Fixes doc test
@alamb
Copy link
Contributor

alamb commented Sep 27, 2024

I also have it on my list to file some more sub task tickets for #8709 to remove the rest of the built in WindowFunctions

I filed a few more tickets to hopefully get this process started.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api change Changes the API exposed to users of the crate core Core DataFusion crate logical-expr Logical plan and expressions optimizer Optimizer rules physical-expr Physical Expressions proto Related to proto crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add field trait method to WindowUDFImpl
5 participants