Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add subtrait support for IS NULL and IS NOT NULL #8093

Merged
merged 2 commits into from
Nov 12, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
42 changes: 42 additions & 0 deletions datafusion/substrait/src/logical_plan/consumer.rs
Original file line number Diff line number Diff line change
Expand Up @@ -78,6 +78,10 @@ enum ScalarFunctionType {
Like,
/// [Expr::Like] Case insensitive operator counterpart of `Like`
ILike,
/// [Expr::IsNull]
IsNull,
/// [Expr::IsNotNull]
IsNotNull,
}

pub fn name_to_op(name: &str) -> Result<Operator> {
Expand Down Expand Up @@ -126,6 +130,8 @@ fn scalar_function_type_from_str(name: &str) -> Result<ScalarFunctionType> {
"not" => Ok(ScalarFunctionType::Not),
"like" => Ok(ScalarFunctionType::Like),
"ilike" => Ok(ScalarFunctionType::ILike),
"is_null" => Ok(ScalarFunctionType::IsNull),
"is_not_null" => Ok(ScalarFunctionType::IsNotNull),
others => not_impl_err!("Unsupported function name: {others:?}"),
}
}
Expand Down Expand Up @@ -880,6 +886,42 @@ pub async fn from_substrait_rex(
ScalarFunctionType::ILike => {
make_datafusion_like(true, f, input_schema, extensions).await
}
ScalarFunctionType::IsNull => {
let arg = f.arguments.first().ok_or_else(|| {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if it matters, but this code doesn't check for f.arguments.len() > 1 so I think it will silently ignore any arguments after the first.

The same comment applies to IsNotNull

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On review, this is the same pattern used elsewhere in this PR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for merging my PR! I think I could add checks for arg length here and also in other places where they are required in another PR

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @tgujar -- A follow on to make the argument checking handle too many arguments would be most appreciated. Thank you 🙏

DataFusionError::Substrait(
"expect one argument for `IS NULL` expr".to_string(),
)
})?;
match &arg.arg_type {
Some(ArgType::Value(e)) => {
let expr = from_substrait_rex(e, input_schema, extensions)
.await?
.as_ref()
.clone();
Ok(Arc::new(Expr::IsNull(Box::new(expr))))
}
_ => not_impl_err!("Invalid arguments for IS NULL expression"),
}
}
ScalarFunctionType::IsNotNull => {
let arg = f.arguments.first().ok_or_else(|| {
DataFusionError::Substrait(
"expect one argument for `IS NOT NULL` expr".to_string(),
)
})?;
match &arg.arg_type {
Some(ArgType::Value(e)) => {
let expr = from_substrait_rex(e, input_schema, extensions)
.await?
.as_ref()
.clone();
Ok(Arc::new(Expr::IsNotNull(Box::new(expr))))
}
_ => {
not_impl_err!("Invalid arguments for IS NOT NULL expression")
}
}
}
}
}
Some(RexType::Literal(lit)) => {
Expand Down
48 changes: 47 additions & 1 deletion datafusion/substrait/src/logical_plan/producer.rs
Original file line number Diff line number Diff line change
Expand Up @@ -1025,7 +1025,53 @@ pub fn to_substrait_rex(
col_ref_offset,
extension_info,
),
_ => not_impl_err!("Unsupported expression: {expr:?}"),
Expr::IsNull(arg) => {
let arguments: Vec<FunctionArgument> = vec![FunctionArgument {
arg_type: Some(ArgType::Value(to_substrait_rex(
arg,
schema,
col_ref_offset,
extension_info,
)?)),
}];

let function_name = "is_null".to_string();
let function_anchor = _register_function(function_name, extension_info);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very minor, but why not call this function_reference to match the field name used below?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While reviewing the code again, I found this simply follows the same pattern as the existing substrait code, so looks good to me

Copy link
Contributor

@alamb alamb Nov 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tgujar if you have time, it would also be awesome if you could make a PR that renames this variable (and other uses of _register_function to function_reference which I think would make the code cleaner

Ok(Expression {
rex_type: Some(RexType::ScalarFunction(ScalarFunction {
function_reference: function_anchor,
arguments,
output_type: None,
args: vec![],
options: vec![],
})),
})
}
Expr::IsNotNull(arg) => {
let arguments: Vec<FunctionArgument> = vec![FunctionArgument {
arg_type: Some(ArgType::Value(to_substrait_rex(
arg,
schema,
col_ref_offset,
extension_info,
)?)),
}];

let function_name = "is_not_null".to_string();
let function_anchor = _register_function(function_name, extension_info);
Ok(Expression {
rex_type: Some(RexType::ScalarFunction(ScalarFunction {
function_reference: function_anchor,
arguments,
output_type: None,
args: vec![],
options: vec![],
})),
})
}
_ => {
not_impl_err!("Unsupported expression: {expr:?}")
}
}
}

Expand Down
10 changes: 10 additions & 0 deletions datafusion/substrait/tests/cases/roundtrip_logical_plan.rs
Original file line number Diff line number Diff line change
Expand Up @@ -314,6 +314,16 @@ async fn simple_scalar_function_substr() -> Result<()> {
roundtrip("SELECT * FROM data WHERE a = SUBSTR('datafusion', 0, 3)").await
}

#[tokio::test]
async fn simple_scalar_function_is_null() -> Result<()> {
roundtrip("SELECT * FROM data WHERE a IS NULL").await
}

#[tokio::test]
async fn simple_scalar_function_is_not_null() -> Result<()> {
roundtrip("SELECT * FROM data WHERE a IS NOT NULL").await
}

#[tokio::test]
async fn case_without_base_expression() -> Result<()> {
roundtrip("SELECT (CASE WHEN a >= 0 THEN 'positive' ELSE 'negative' END) FROM data")
Expand Down