Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix incorrect ... LIKE '%' simplification with NULLs #13259

Conversation

findepi
Copy link
Member

@findepi findepi commented Nov 5, 2024

expr LIKE '%' was previously simplified to true, but the expression returns NULL when expr is null.
The conversion was conditional on !is_null(expr) which means "is not always true, i.e. is not a null literal".

This commit adds correct simplification logic. It additionally expands the rule coverage to include string view (Utf8View) and large string (LargeUtf8). This allows writing shared test cases even despite utf8_view LIKE '%' returning incorrect results at execution time (tracked by #12637). I.e. the simplification masks the bug for cases where pattern is statically known.

Which issue does this PR close?

Rationale for this change

fix correctness

What changes are included in this PR?

correctness fix for ... LIKE '%' simplification

Are these changes tested?

yes

Are there any user-facing changes?

bug fix

@findepi
Copy link
Member Author

findepi commented Nov 5, 2024

this will conflict with #13061 cc @adriangb
i asked this to be fixed in #13061 (comment) but this should have tests.
i realized fixing with tests is actually hard because of #12637, but it seems i mostly managed

@findepi findepi force-pushed the findepi/fix-incorrect-like-simplification-2c0bcb branch from 6c063d9 to eedd384 Compare November 5, 2024 11:12
{
Transformed::yes(lit(!negated))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was incorrect

`expr LIKE '%'` was previously simplified to `true`, but the expression
returns `NULL` when `expr` is null.
The conversion was conditional on `!is_null(expr)` which means "is not
always true, i.e. is not a null literal".

This commit adds correct simplification logic. It additionally expands
the rule coverage to include string view (Utf8View) and large string
(LargeUtf8). This allows writing shared test cases even despite
`utf8_view LIKE '%'` returning incorrect results at execution time
(tracked by apache#12637). I.e. the
simplification masks the bug for cases where pattern is statically
known.
@findepi findepi force-pushed the findepi/fix-incorrect-like-simplification-2c0bcb branch from eedd384 to 3085835 Compare November 5, 2024 11:33
Copy link
Contributor

@crepererum crepererum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you

Copy link
Contributor

@goldmedal goldmedal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @findepi. It looks good to me but some testing suggestions. 👍

datafusion/sqllogictest/test_files/string/string.slt Outdated Show resolved Hide resolved
@alamb alamb changed the title Fix incorrect ... LIKE '%' simplification Fix incorrect ... LIKE '%' simplification with NULLs Nov 5, 2024
@goldmedal
Copy link
Contributor

Thanks, @findepi and @crepererum for reviewing.
Because #13260 depends on this PR, I'll merge this PR and make the follow-up PR can keep going first. If anyone has more comments, let's move to #13260

@goldmedal goldmedal merged commit 6612d7c into apache:main Nov 6, 2024
25 checks passed
@findepi findepi deleted the findepi/fix-incorrect-like-simplification-2c0bcb branch November 6, 2024 07:30
findepi added a commit to sdf-labs/arrow-datafusion that referenced this pull request Nov 6, 2024
* Fix incorrect `... LIKE '%'` simplification

`expr LIKE '%'` was previously simplified to `true`, but the expression
returns `NULL` when `expr` is null.
The conversion was conditional on `!is_null(expr)` which means "is not
always true, i.e. is not a null literal".

This commit adds correct simplification logic. It additionally expands
the rule coverage to include string view (Utf8View) and large string
(LargeUtf8). This allows writing shared test cases even despite
`utf8_view LIKE '%'` returning incorrect results at execution time
(tracked by apache#12637). I.e. the
simplification masks the bug for cases where pattern is statically
known.

* fixup! Fix incorrect `... LIKE '%'` simplification

* fix tests (re review comments)
findepi added a commit to sdf-labs/arrow-datafusion that referenced this pull request Nov 6, 2024
…che#13259)

* Fix incorrect `... LIKE '%'` simplification

`expr LIKE '%'` was previously simplified to `true`, but the expression
returns `NULL` when `expr` is null.
The conversion was conditional on `!is_null(expr)` which means "is not
always true, i.e. is not a null literal".

This commit adds correct simplification logic. It additionally expands
the rule coverage to include string view (Utf8View) and large string
(LargeUtf8). This allows writing shared test cases even despite
`utf8_view LIKE '%'` returning incorrect results at execution time
(tracked by apache#12637). I.e. the
simplification masks the bug for cases where pattern is statically
known.

* fixup! Fix incorrect `... LIKE '%'` simplification

* fix tests (re review comments)
findepi added a commit to sdf-labs/arrow-datafusion that referenced this pull request Nov 6, 2024
…che#13259)

* Fix incorrect `... LIKE '%'` simplification

`expr LIKE '%'` was previously simplified to `true`, but the expression
returns `NULL` when `expr` is null.
The conversion was conditional on `!is_null(expr)` which means "is not
always true, i.e. is not a null literal".

This commit adds correct simplification logic. It additionally expands
the rule coverage to include string view (Utf8View) and large string
(LargeUtf8). This allows writing shared test cases even despite
`utf8_view LIKE '%'` returning incorrect results at execution time
(tracked by apache#12637). I.e. the
simplification masks the bug for cases where pattern is statically
known.

* fixup! Fix incorrect `... LIKE '%'` simplification

* fix tests (re review comments)
findepi added a commit to sdf-labs/arrow-datafusion that referenced this pull request Nov 6, 2024
…che#13259)

* Fix incorrect `... LIKE '%'` simplification

`expr LIKE '%'` was previously simplified to `true`, but the expression
returns `NULL` when `expr` is null.
The conversion was conditional on `!is_null(expr)` which means "is not
always true, i.e. is not a null literal".

This commit adds correct simplification logic. It additionally expands
the rule coverage to include string view (Utf8View) and large string
(LargeUtf8). This allows writing shared test cases even despite
`utf8_view LIKE '%'` returning incorrect results at execution time
(tracked by apache#12637). I.e. the
simplification masks the bug for cases where pattern is statically
known.

* fixup! Fix incorrect `... LIKE '%'` simplification

* fix tests (re review comments)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
optimizer Optimizer rules sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants