Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Numeric, String, Boolean comparisons with literal NULL #2481

Merged
merged 1 commit into from
May 11, 2022

Conversation

WinkerDu
Copy link
Contributor

@WinkerDu WinkerDu commented May 7, 2022

Which issue does this PR close?

Closes #1179 and #2482 .

Rationale for this change

This pr aims to solve Numeric, String, Boolean comparisons (Binary Expressions) with literal NULL
For now, DF would fail when running the following SQLs:

> select * from (VALUES (1, 'foo' ,2.3), (2, 'bar', 5.4)) as t;
+---------+---------+---------+
| column1 | column2 | column3 |
+---------+---------+---------+
| 1       | foo     | 2.3     |
| 2       | bar     | 5.4     |
+---------+---------+---------+
2 rows in set. Query took 0.004 seconds.
> select column1 < null from (VALUES (1, 'foo' ,2.3), (2, 'bar', 5.4)) as t;
Plan("'Int64 < Utf8' can't be evaluated because there isn't a common type to coerce the types to")
> select column2 < null from (VALUES (1, 'foo' ,2.3), (2, 'bar', 5.4)) as t;
ArrowError(ExternalError(Internal("compute_utf8_op_scalar for 'lt' failed to cast literal value NULL")))
> select column3 < null from (VALUES (1, 'foo' ,2.3), (2, 'bar', 5.4)) as t;
Plan("'Float64 < Utf8' can't be evaluated because there isn't a common type to coerce the types to")

What changes are included in this PR?

  • Introduces null_coercion to comparison_eq_coercion and comparison_order_coercion
  • Enhances binary expression macros to support NULL scalar value comparing with array, like binary_array_op_dyn_scalar

Are there any user-facing changes?

No.

@github-actions github-actions bot added the datafusion Changes in the datafusion crate label May 7, 2022
@WinkerDu
Copy link
Contributor Author

WinkerDu commented May 8, 2022

cc @alamb @andygrove @yjshen @xudong963
Please have a review, thank you!

stringify!($OP),
)))
// when the $RIGHT is a NULL, generate a NULL array of $OP_TYPE
Ok(Arc::new(new_null_array($OP_TYPE, $LEFT.len())))
Copy link
Member

@yjshen yjshen May 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. I think we should respect SortOptions here for ScalaValue.
  2. Question: how about the NULL != column case? Do we have an expression normalizer that asserts Nulls often coming as the right-hand operand for a binary comparison?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yjshen what do you mean SortOptions? I don't think these expressions are used directly in ORDER BY clauses -- they are used instead for expressions like a < b

Adding some more tests in sql/expr.rs for NULL < column and NULL > column sounds like a good idea.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After checking Postgres and MySQL, I found I'm wrong on the expected behavior while comparing columns with NULL using < > ..., they will return NULL in all cases. I was expecting to apply null first or null last while comparing column with NULL, so always true or always false at the first time.

Copy link
Contributor Author

@WinkerDu WinkerDu May 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @alamb @yjshen
I agree this macro is just used in expression evaluate, maybe we can ignore SortOptions in this pr.
I'll look into this case NULL != column carefully and add more cases in ut.
Besides, I am super busy these days, I'll do it ASAP 🤟

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will spend some time writing the NULL != column

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow on PR: #2510 turns out there was a bug!

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great to me -- thank you @WinkerDu

I think we should make sure @yjshen is satisfied with the answers about SortOptions and a few more tests would be good. But otherwise 👍

Very nice

stringify!($OP),
)))
// when the $RIGHT is a NULL, generate a NULL array of $OP_TYPE
Ok(Arc::new(new_null_array($OP_TYPE, $LEFT.len())))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yjshen what do you mean SortOptions? I don't think these expressions are used directly in ORDER BY clauses -- they are used instead for expressions like a < b

Adding some more tests in sql/expr.rs for NULL < column and NULL > column sounds like a good idea.

Copy link
Member

@yjshen yjshen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @WinkerDu. I think the current behavior that always yields null while comparing with null is consistent with other systems, so no more efforts are required on this.

It would be nice to guarantee expression normalization exists and always have scalar value as the right operand after normalization. I think this goes beyond the scope of this PR and we could do it as follow-ups.

Thanks for your perseverance and nice work!

@yjshen yjshen merged commit ac35f34 into apache:master May 11, 2022
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with @yjshen -- thank you @WinkerDu

stringify!($OP),
)))
// when the $RIGHT is a NULL, generate a NULL array of $OP_TYPE
Ok(Arc::new(new_null_array($OP_TYPE, $LEFT.len())))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will spend some time writing the NULL != column

ovr pushed a commit to cube-js/arrow-datafusion that referenced this pull request Aug 15, 2022
MazterQyou pushed a commit to cube-js/arrow-datafusion that referenced this pull request May 11, 2023
MazterQyou pushed a commit to cube-js/arrow-datafusion that referenced this pull request May 11, 2023
MazterQyou pushed a commit to cube-js/arrow-datafusion that referenced this pull request Jun 9, 2023
MazterQyou pushed a commit to cube-js/arrow-datafusion that referenced this pull request Jun 9, 2023
MazterQyou pushed a commit to cube-js/arrow-datafusion that referenced this pull request Jan 19, 2024
MazterQyou pushed a commit to cube-js/arrow-datafusion that referenced this pull request Feb 5, 2024
MazterQyou pushed a commit to cube-js/arrow-datafusion that referenced this pull request Feb 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
datafusion Changes in the datafusion crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Comparisons (BinaryExprs) to literal NULL don't work
3 participants