-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Expand LIKE simplification: cover NULL
pattern/expression and constant
#13260
base: main
Are you sure you want to change the base?
Expand LIKE simplification: cover NULL
pattern/expression and constant
#13260
Conversation
c6285bf
to
be26107
Compare
currently depends on #13259 |
seemed easy enough, done. |
88d6df1
to
6c57af7
Compare
I’m happy with my changes being included in this PR :) |
draft - to be rebased after #13259 lands still ready to review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just roughly review now. Overall looks to me. I will check the test cases tomorrow.
#13259 has been merged. 👍 |
- cover expression known not to be null - cover NULL pattern - cover repeated '%%' in pattern
e4eae46
to
520ad2b
Compare
@goldmedal rebased, thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @findepi and @goldmedal
This looks like a great change to me except for the handling of %%
which I am not sure about. Otherwise 👍
@@ -2987,41 +3051,41 @@ mod tests { | |||
}) | |||
} | |||
|
|||
fn like(expr: Expr, pattern: &str) -> Expr { | |||
fn like(expr: Expr, pattern: impl Into<Expr>) -> Expr { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we could use Expr::like https://docs.rs/datafusion/latest/datafusion/logical_expr/enum.Expr.html#method.like and similar here instead of these functions
This is likely left over from when the Expr API was less expressive
let expr = not_ilike(col("c1"), lit("%")); | ||
assert_eq!(simplify(expr), if_not_null(col("c1"), false)); | ||
|
||
// expr [NOT] [I]LIKE '%%' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand this simplification
I thought expr LIKE '%%'
means the same as expr = '%'
(match the actual literal character %
)
Specifically that %%
is how to escape the wildcard to not be a wildcard
This test looks like it has applied the rewrite needed for expr LIKE '%'
which matches any non null input
🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, since the code seems to handle more than two %
, could you add a test for patterns like
col LIKE '17%%' // 17%
col LIKE '%%five%% // %five%
col LIKE '%%%%five%%%% // %%five%%
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought
expr LIKE '%%'
means the same asexpr = '%'
(match the actual literal character%
)Specifically that
%%
is how to escape the wildcard to not be a wildcard
I think it depends on how we define it. I assume we prefer to follow the Postgres behavior. 🤔
In Postgres, %%
doesn't mean matching the actual literal '%'. The escape character is \
.
test=# select '1' like '%%';
?column?
----------
t
(1 row)
test=# select '%' like '%%';
?column?
----------
t
(1 row)
test=# select '%' like '\%';
?column?
----------
t
(1 row)
test=# select '1' like '\%';
?column?
----------
f
(1 row)
I think %%
is just a redundant wildcard. So, we can simplify it to be only one %
.
By the way, in DuckDB, there is a similar behavior for double '%%'.
D select '1' like '%%';
┌───────────────┐
│ ('1' ~~ '%%') │
│ boolean │
├───────────────┤
│ true │
└───────────────┘
D select '%' like '%%';
┌───────────────┐
│ ('%' ~~ '%%') │
│ boolean │
├───────────────┤
│ true │
└───────────────┘
However, the escape character should be set by ESCAPE
syntax
D select '%' like '\%';
┌───────────────┐
│ ('%' ~~ '\%') │
│ boolean │
├───────────────┤
│ false │
└───────────────┘
D select '%' like '\%' escape '\';
┌──────────────────────────────────┐
│ main.like_escape('%', '\%', '\') │
│ boolean │
├──────────────────────────────────┤
│ true │
└──────────────────────────────────┘
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was definitely confused -- thank you
}) | ||
}) | ||
} | ||
Some(pattern_str) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure about this rule (see questions below) -- maybe you could add some comments explanation on what it is supposed to do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some(pattern_str) | |
Some(pattern_str) | |
// Repeated occurrences of wildcard are redundant so remove them | |
// exp LIKE '%%' --> exp LIKE '%' |
@@ -396,8 +396,9 @@ EXPLAIN SELECT | |||
FROM test; | |||
---- | |||
logical_plan | |||
01)Projection: test.column1_utf8view LIKE Utf8View("foo") AS like, test.column1_utf8view ILIKE Utf8View("foo") AS ilike | |||
02)--TableScan: test projection=[column1_utf8view] | |||
01)Projection: __common_expr_1 AS like, __common_expr_1 AS ilike |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
Though I think this test's meaning is now changed it is supposed to be verifying cast exprs for like
Perhaps you can change the patterns to like '%foo%'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. I did some tests. I think they still use the native StringView implementation for pattern matching.
query TT
EXPLAIN SELECT
column1_utf8view like '%foo%' as "like",
column1_utf8view ilike '%foo%' as "ilike"
FROM test;
----
logical_plan
01)Projection: test.column1_utf8view LIKE Utf8View("%foo%") AS like, test.column1_utf8view ILIKE Utf8View("%foo%") AS ilike
02)--TableScan: test projection=[column1_utf8view]
NULL
pattern/expression and constant
let null = lit(ScalarValue::Utf8(None)); | ||
|
||
// expr [NOT] [I]LIKE NULL | ||
let expr = like(col("c1"), null.clone()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let expr = like(col("c1"), null.clone()); | |
let expr = col("c1").like(null.clone()); |
As @alamb's comment (#13260 (comment)), we can do some refactor like this.
@@ -396,8 +396,9 @@ EXPLAIN SELECT | |||
FROM test; | |||
---- | |||
logical_plan | |||
01)Projection: test.column1_utf8view LIKE Utf8View("foo") AS like, test.column1_utf8view ILIKE Utf8View("foo") AS ilike | |||
02)--TableScan: test projection=[column1_utf8view] | |||
01)Projection: __common_expr_1 AS like, __common_expr_1 AS ilike |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. I did some tests. I think they still use the native StringView implementation for pattern matching.
query TT
EXPLAIN SELECT
column1_utf8view like '%foo%' as "like",
column1_utf8view ilike '%foo%' as "ilike"
FROM test;
----
logical_plan
01)Projection: test.column1_utf8view LIKE Utf8View("%foo%") AS like, test.column1_utf8view ILIKE Utf8View("%foo%") AS ilike
02)--TableScan: test projection=[column1_utf8view]
since there are already two reviewers on it, I'll gonna skip this PR. However if you need my input, ping me. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @findepi and @goldmedal
I had some small testing suggestions but I think we can add that coverage as a follow on PR as well
EXPR LIKE 'constant'
toexpr = 'constant'
#13061 to resolve conflicts