SQL: Implement null handling for IN(v1, v2, ...)
#34750
Implemented null handling both for the value tested and for the values inside the list it is tested against. The null handling is implemented for local processors, painless scripts and Lucene Terms queries, making it available for `IN` expressions occurring in `SELECT`, `WHERE` and `HAVING` clauses. Closes: elastic#34582
Pinging @elastic/es-search-aggs
I think the `NULL` handling inside the values is incorrect, since it should signify a missing value. However, this can be a separate PR.
for (int i = 0; i < processsors.size() - 1; i++) {
    Boolean compResult = Comparisons.eq(leftValue, processsors.get(i).process(input));
    if (compResult == null) {
        result = null;
    }
    if (compResult != null && compResult) {
If the two checks are chained, `compResult` is already known to be non-null in the second branch, so there is no need to test it again: `if (compResult == null) {..} else if (compResult) {..}`
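For readers following the thread, here is a minimal, self-contained sketch of the three-valued `IN` evaluation being reviewed, using the chained `if / else if` form suggested above. The class `ThreeValuedIn` and its `eq` and `in` helpers are hypothetical stand-ins written for illustration; they are not the PR's `InProcessor` or `Comparisons` code.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration only; not the actual classes touched by the PR. */
final class ThreeValuedIn {

    /** SQL-style equality: the result is null (unknown) if either operand is null. */
    static Boolean eq(Object left, Object right) {
        if (left == null || right == null) {
            return null;
        }
        return left.equals(right);
    }

    /**
     * SQL-style IN: TRUE on the first definite match, NULL if nothing matched
     * but at least one comparison was unknown, FALSE otherwise.
     */
    static Boolean in(Object leftValue, List<?> values) {
        Boolean result = Boolean.FALSE;
        for (Object value : values) {
            Boolean compResult = eq(leftValue, value);
            if (compResult == null) {
                result = null;        // remember the unknown, keep looking for a match
            } else if (compResult) {
                return Boolean.TRUE;  // a definite match wins over any unknown
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(in("foo", Arrays.asList("bar", null, "foo"))); // true
        System.out.println(in("foo", Arrays.asList("bar", null)));        // null
        System.out.println(in("foo", Arrays.asList("bar", "lala")));      // false
        System.out.println(in(null, Arrays.asList("bar", "lala")));       // null
    }
}
```

Using `else if` means the second branch is only reached when `compResult` is non-null, which is the redundancy the comment points out.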
    return valuesOf(list, to, false);
}

public static <T> List<T> valuesOf(List<Expression> list, DataType to, boolean removeNulls) {
I'm not sure why nulls need to be removed: if an expression in the list is `NULL`, it should be returned accordingly.
Please check my comment here: #34750 (comment)
for (Expression rightValue : list) {
    Boolean compResult = Comparisons.eq(foldedLeftValue, rightValue.fold());
    if (compResult != null && compResult) {
    if (compResult == null) {
    // if (value().dataType() == DataType.NULL || rightValue.dataType() == DataType.NULL) {
These comments are confusing (not sure whether this is the final version or not).
Sorry about that, some mixup with git stash.
@@ -24,7 +24,7 @@
     public TermsQuery(Location location, String term, List<Expression> values) {
         super(location);
         this.term = term;
-        this.values = new LinkedHashSet<>(Foldables.valuesOf(values, values.get(0).dataType()));
+        this.values = new LinkedHashSet<>(Foldables.valuesOf(values, values.get(0).dataType(), true));
To avoid having the overloaded method, it might be more convenient to use `Collection.removeIf(p -> p == null)`, since this is something local to the `TermsQuery`. On the other hand, a null in the list might also imply that there should be a match when the value is missing (is null).
I did that only to avoid a second iteration over the list (to remove the nulls), but I can change it. What do you think? I guess it can make a difference only if the list is really long.
Yup. I wouldn't worry about it. I'm with you regarding avoiding the iteration, but again, the `Foldable` method looks obscure with the null exclusion.
In other words, it's not `Foldable` that should handle null; I'm fine with handling it inside `Terms`. I wonder, though, whether that has any implications for inline `IN`.
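A minimal sketch of the local alternative discussed in this thread: deduplicate the folded values with a `LinkedHashSet` and then drop nulls with `Collection.removeIf`, so no `removeNulls` overload is needed. The class and method names (`TermsValuesSketch`, `termsValues`) are hypothetical, and the input list stands in for already-folded expression values.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

/** Hypothetical illustration only; not the PR's TermsQuery. */
final class TermsValuesSketch {

    /** Keeps first-seen order, drops duplicates, then drops nulls in a second pass. */
    static Set<Object> termsValues(List<?> foldedValues) {
        Set<Object> values = new LinkedHashSet<>(foldedValues);
        values.removeIf(Objects::isNull); // nulls can never match a terms lookup
        return values;
    }

    public static void main(String[] args) {
        // Mirrors IN ('foo', null, 'lala', null, 'foo', concat('la', 'la')) after folding.
        List<Object> folded = Arrays.asList("foo", null, "lala", null, "foo", "lala");
        System.out.println(termsValues(folded)); // [foo, lala]
    }
}
```

The second pass runs over the already-deduplicated set, so its cost is bounded by the number of distinct values rather than the length of the original list.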
@@ -173,6 +173,19 @@ public void testTranslateInExpression_WhereClause() throws IOException {
         assertEquals("keyword:(bar foo lala)", tq.asBuilder().toQuery(createShardContext()).toString());
     }

+    public void testTranslateInExpression_WhereClauseAndNullHAndling() throws IOException {
+        LogicalPlan p = plan("SELECT * FROM test WHERE keyword IN ('foo', null, 'lala', null, 'foo', concat('la', 'la'))");
I think `null` means the value is missing. That is, the `IN` should become a `bool` query between `missing` and `terms`.
I tried to copy the behaviour of Postgres:
test=# select * from t1 where a in (null);
a
---
(0 rows)
test=# select * from t1 where a is null;
a
---
(4 rows)
`WHERE col IN (null)` is different from `WHERE col IS NULL`: the first one evaluates to `NULL`, which in turn becomes false for `WHERE` and `HAVING` clauses.
MySQL behaves the same way too.
Makes sense. `IN` means `=`, and that fails against null (one needs to use `IS (NOT) NULL`).
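The difference between the two Postgres queries can be sketched in a few lines: `a IN (null)` reduces to `a = null`, which is unknown for every row, and a `WHERE` clause keeps a row only when its predicate is definitely true. All names below (`WhereNullSketch`, `eq`, `keepsRow`) are hypothetical, and the sample column is made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration of SQL null semantics in a WHERE filter. */
final class WhereNullSketch {

    /** SQL-style equality: unknown (null) if either side is null. */
    static Boolean eq(Object left, Object right) {
        if (left == null || right == null) {
            return null;
        }
        return left.equals(right);
    }

    /** WHERE/HAVING keep a row only for a definite TRUE; NULL behaves like false. */
    static boolean keepsRow(Boolean predicateResult) {
        return Boolean.TRUE.equals(predicateResult);
    }

    /** Rows kept by: SELECT * FROM t1 WHERE a IN (null) */
    static long countWhereInNull(List<String> column) {
        return column.stream().filter(a -> keepsRow(eq(a, null))).count();
    }

    /** Rows kept by: SELECT * FROM t1 WHERE a IS NULL */
    static long countWhereIsNull(List<String> column) {
        return column.stream().filter(a -> a == null).count();
    }

    public static void main(String[] args) {
        // Assumed sample data: four null rows and one non-null row.
        List<String> column = Arrays.asList(null, null, null, null, "x");
        System.out.println(countWhereInNull(column)); // 0
        System.out.println(countWhereIsNull(column)); // 4
    }
}
```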
Looking good. Left just one comment.
@@ -229,6 +229,8 @@ public static DataType fromEsType(String esType) {
     public boolean isCompatibleWith(DataType other) {
         if (this == other) {
             return true;
+        } else if (this == NULL || other == NULL) {
+            return true;
         } else if (isString() && other.isString()) {
             return true;
         } else if (isNumeric() && other.isNumeric()) {
Wouldn't all these conditions that return `true` look better combined into a single `if ()`?
I did it only to be more readable; I can combine them into one `if`.
Maybe use one `if` with one condition evaluation per line: best of both worlds.
Sounds good to me
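The layout agreed on here could look roughly like the sketch below: a single `if` with one condition per line. `SketchType` is a hypothetical, cut-down enum, not the real `DataType`; only the branches visible in the diff are reproduced, and falling back to `false` for everything else is an assumption.

```java
/** Hypothetical, cut-down stand-in for DataType; constants chosen for illustration. */
enum SketchType {
    NULL, KEYWORD, TEXT, INTEGER, DOUBLE, BOOLEAN;

    boolean isString() {
        return this == KEYWORD || this == TEXT;
    }

    boolean isNumeric() {
        return this == INTEGER || this == DOUBLE;
    }

    /** One `if`, one condition per line, as suggested in the review. */
    boolean isCompatibleWith(SketchType other) {
        if (this == other
                || this == NULL || other == NULL
                || (isString() && other.isString())
                || (isNumeric() && other.isNumeric())) {
            return true;
        }
        // Assumption: any branches not shown in the diff are omitted here.
        return false;
    }

    public static void main(String[] args) {
        System.out.println(NULL.isCompatibleWith(INTEGER));    // true
        System.out.println(KEYWORD.isCompatibleWith(TEXT));    // true
        System.out.println(KEYWORD.isCompatibleWith(INTEGER)); // false
    }
}
```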
LGTM
LGTM
Implemented null handling both for the value tested and for the values inside the list it is tested against. The null handling is implemented for local processors, painless scripts and Lucene Terms queries, making it available for `IN` expressions occurring in `SELECT`, `WHERE` and `HAVING` clauses. Closes: #34582
Backported to |
Handle the case when `null` is the only value in the list so that it's translated to a `MatchNoDocsQuery`. Followup to: elastic#34750
Replace standard `||` and `==` painless operators with `or` and `eq` null safe alternatives from `InternalSqlScriptUtils`. Follow up to elastic#34750
Replace standard `||` and `==` painless operators with new `in` method introduced in `InternalSqlScriptUtils`. This allows the list of values to become a script variable which is replaced each time with the list of values provided by the user. Move `In` to the same package as `InPipe` & `InProcessor`. Follow up to #34750. Co-authored-by: Costin Leau