Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move coalesce to datafusion-functions and remove BuiltInScalarFunction #10098

Merged
merged 4 commits into from
Apr 22, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
10 changes: 0 additions & 10 deletions datafusion/core/src/datasource/listing/helpers.rs
Original file line number Diff line number Diff line change
Expand Up @@ -90,16 +90,6 @@ pub fn expr_applicable_for_cols(col_names: &[String], expr: &Expr) -> bool {

Expr::ScalarFunction(scalar_function) => {
match &scalar_function.func_def {
ScalarFunctionDefinition::BuiltIn(fun) => {
match fun.volatility() {
Volatility::Immutable => Ok(TreeNodeRecursion::Continue),
// TODO: Stable functions could be `applicable`, but that would require access to the context
Volatility::Stable | Volatility::Volatile => {
is_applicable = false;
Ok(TreeNodeRecursion::Stop)
}
}
}
ScalarFunctionDefinition::UDF(fun) => {
match fun.signature().volatility {
Volatility::Immutable => Ok(TreeNodeRecursion::Continue),
Expand Down
207 changes: 0 additions & 207 deletions datafusion/expr/src/built_in_function.rs

This file was deleted.

22 changes: 3 additions & 19 deletions datafusion/expr/src/expr.rs
Original file line number Diff line number Diff line change
Expand Up @@ -28,8 +28,8 @@ use crate::logical_plan::Subquery;
use crate::utils::expr_to_columns;
use crate::window_frame;
use crate::{
aggregate_function, built_in_function, built_in_window_function, udaf,
BuiltinScalarFunction, ExprSchemable, Operator, Signature,
aggregate_function, built_in_window_function, udaf, ExprSchemable, Operator,
Signature,
};

use arrow::datatypes::DataType;
Expand Down Expand Up @@ -362,10 +362,6 @@ impl Between {
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
/// Defines which implementation of a function for DataFusion to call.
pub enum ScalarFunctionDefinition {
/// Resolved to a `BuiltinScalarFunction`
/// There is plan to migrate `BuiltinScalarFunction` to UDF-based implementation (issue#8045)
/// This variant is planned to be removed in long term
BuiltIn(BuiltinScalarFunction),
/// Resolved to a user defined function
UDF(Arc<crate::ScalarUDF>),
/// A scalar function constructed with name. This variant can not be executed directly
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interestingly, I don't think Name is ever used anymore -- we could eventually simply remove ScalarFunctionDefinition to avoid another level of wrapping.

However, I think there has been enough API churn for a while. We can maybe clean that up as a follow on PR in a few months or somthing

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On argument against keeping it: if it's unused and you keep it around, someone will likely use it again and make the removal even harder.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tracked in #10175

Expand Down Expand Up @@ -393,7 +389,6 @@ impl ScalarFunctionDefinition {
/// Function's name for display
pub fn name(&self) -> &str {
match self {
ScalarFunctionDefinition::BuiltIn(fun) => fun.name(),
ScalarFunctionDefinition::UDF(udf) => udf.name(),
ScalarFunctionDefinition::Name(func_name) => func_name.as_ref(),
}
Expand All @@ -403,9 +398,6 @@ impl ScalarFunctionDefinition {
/// when evaluated multiple times with the same input.
pub fn is_volatile(&self) -> Result<bool> {
match self {
ScalarFunctionDefinition::BuiltIn(fun) => {
Ok(fun.volatility() == crate::Volatility::Volatile)
}
ScalarFunctionDefinition::UDF(udf) => {
Ok(udf.signature().volatility == crate::Volatility::Volatile)
}
Expand All @@ -419,14 +411,6 @@ impl ScalarFunctionDefinition {
}

impl ScalarFunction {
/// Create a new ScalarFunction expression
pub fn new(fun: built_in_function::BuiltinScalarFunction, args: Vec<Expr>) -> Self {
Self {
func_def: ScalarFunctionDefinition::BuiltIn(fun),
args,
}
}

/// Create a new ScalarFunction expression with a user-defined function (UDF)
pub fn new_udf(udf: Arc<crate::ScalarUDF>, args: Vec<Expr>) -> Self {
Self {
Expand Down Expand Up @@ -1282,7 +1266,7 @@ impl Expr {
pub fn short_circuits(&self) -> bool {
match self {
Expr::ScalarFunction(ScalarFunction { func_def, .. }) => {
matches!(func_def, ScalarFunctionDefinition::BuiltIn(fun) if *fun == BuiltinScalarFunction::Coalesce)
matches!(func_def, ScalarFunctionDefinition::UDF(fun) if fun.name().eq("coalesce"))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't find a good way to directly refer to the coalesce function here so went with checking by name. There is another by-name check in /physical-expr/src/scalar_function.rs for make_array so I believe this is acceptable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I"m thinking if its possible if user creates udf with coalesce name

Copy link
Contributor Author

@Omega359 Omega359 Apr 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about that and my thought was that if that was done it would replace the existing coalesce function and one would think it would (hopefully) have similar functionality. If there is another way to determine it here it should be a reasonable change. Just importing the crate though isn't possible as best as I can tell because that would cause a circular dependency.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding the new API is_short_circuits()(default to false, when need set to true) to ScalarUDF and ScalarUDFImpl might be a good way to do this because users may want to define their own short-circuit functions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is an option but I think if we go that route it should be it's own PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that @haohuaijin 's idea is a good one (and keeping with the spirit of "anything we can do with built in functions can be done with ScalarAPIs"). I filed #10162 to track

}
Expr::BinaryExpr(BinaryExpr { op, .. }) => {
matches!(op, Operator::And | Operator::Or)
Expand Down
25 changes: 4 additions & 21 deletions datafusion/expr/src/expr_fn.rs
Original file line number Diff line number Diff line change
Expand Up @@ -19,15 +19,15 @@

use crate::expr::{
AggregateFunction, BinaryExpr, Cast, Exists, GroupingSet, InList, InSubquery,
Placeholder, ScalarFunction, TryCast,
Placeholder, TryCast,
};
use crate::function::{
AccumulatorArgs, AccumulatorFactoryFunction, PartitionEvaluatorFactory,
};
use crate::{
aggregate_function, built_in_function, conditional_expressions::CaseBuilder,
logical_plan::Subquery, AggregateUDF, Expr, LogicalPlan, Operator,
ScalarFunctionImplementation, ScalarUDF, Signature, Volatility,
aggregate_function, conditional_expressions::CaseBuilder, logical_plan::Subquery,
AggregateUDF, Expr, LogicalPlan, Operator, ScalarFunctionImplementation, ScalarUDF,
Signature, Volatility,
};
use crate::{AggregateUDFImpl, ColumnarValue, ScalarUDFImpl, WindowUDF, WindowUDFImpl};
use arrow::datatypes::{DataType, Field};
Expand Down Expand Up @@ -478,23 +478,6 @@ pub fn is_not_unknown(expr: Expr) -> Expr {
Expr::IsNotUnknown(Box::new(expr))
}

macro_rules! nary_scalar_expr {
($ENUM:ident, $FUNC:ident, $DOC:expr) => {
#[doc = $DOC ]
pub fn $FUNC(args: Vec<Expr>) -> Expr {
Expr::ScalarFunction(ScalarFunction::new(
built_in_function::BuiltinScalarFunction::$ENUM,
args,
))
}
};
}

// generate methods for creating the supported unary/binary expressions

// math functions
nary_scalar_expr!(Coalesce, coalesce, "returns `coalesce(args...)`, which evaluates to the value of the first [Expr] which is not NULL");

/// Create a CASE WHEN statement with literal WHEN expressions for comparison to the base expression.
pub fn case(expr: Expr) -> CaseBuilder {
CaseBuilder::new(Some(Box::new(expr)), vec![], vec![], None)
Expand Down
17 changes: 0 additions & 17 deletions datafusion/expr/src/expr_schema.rs
Original file line number Diff line number Diff line change
Expand Up @@ -139,23 +139,6 @@ impl ExprSchemable for Expr {
.map(|e| e.get_type(schema))
.collect::<Result<Vec<_>>>()?;
match func_def {
ScalarFunctionDefinition::BuiltIn(fun) => {
// verify that function is invoked with correct number and type of arguments as defined in `TypeSignature`
data_types(&arg_data_types, &fun.signature()).map_err(|_| {
plan_datafusion_err!(
"{}",
utils::generate_signature_error_msg(
&format!("{fun}"),
fun.signature(),
&arg_data_types,
)
)
})?;

// perform additional function arguments validation (due to limited
// expressiveness of `TypeSignature`), then infer return type
fun.return_type(&arg_data_types)
}
ScalarFunctionDefinition::UDF(fun) => {
// verify that function is invoked with correct number and type of arguments as defined in `TypeSignature`
data_types(&arg_data_types, fun.signature()).map_err(|_| {
Expand Down
2 changes: 0 additions & 2 deletions datafusion/expr/src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -26,7 +26,6 @@
//! The [expr_fn] module contains functions for creating expressions.

mod accumulator;
mod built_in_function;
mod built_in_window_function;
mod columnar_value;
mod literal;
Expand Down Expand Up @@ -60,7 +59,6 @@ pub mod window_state;

pub use accumulator::Accumulator;
pub use aggregate_function::AggregateFunction;
pub use built_in_function::BuiltinScalarFunction;
pub use built_in_window_function::BuiltInWindowFunction;
pub use columnar_value::ColumnarValue;
pub use expr::{
Expand Down
3 changes: 0 additions & 3 deletions datafusion/expr/src/tree_node.rs
Original file line number Diff line number Diff line change
Expand Up @@ -283,9 +283,6 @@ impl TreeNode for Expr {
.update_data(|be| Expr::Sort(Sort::new(be, asc, nulls_first))),
Expr::ScalarFunction(ScalarFunction { func_def, args }) => {
transform_vec(args, &mut f)?.map_data(|new_args| match func_def {
ScalarFunctionDefinition::BuiltIn(fun) => {
Ok(Expr::ScalarFunction(ScalarFunction::new(fun, new_args)))
}
ScalarFunctionDefinition::UDF(fun) => {
Ok(Expr::ScalarFunction(ScalarFunction::new_udf(fun, new_args)))
}
Expand Down
Loading