-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ESQL: Simpify IS NULL/IS NOT NULL evaluation #103099
Conversation
Hi @costin, I've created a changelog YAML for you. |
The IS NULL, IS NOT NULL predicates only care about the nullability not the value of an expression. Given the expression x + 1 / 2, the actual result does not matter only if it's null or not - that is, it only matters if x is null or not. So x + 1 / 2 IS NULL becomes x IS NULL - which can be opportunistically pushed down or evaluated. Fix elastic#103097
Preserve the original expression to cope with under/overflow or mv fields
Pinging @elastic/es-ql (Team:QL) |
Pinging @elastic/elasticsearch-esql (:Query Languages/ES|QL) |
Hi @costin, I've created a changelog YAML for you. |
This seems to work, though I can't comment on the code itself. If I run
|
@elasticsearchmachine update branch |
@elasticmachine merge upstream |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would have expected some optimizer tests in the ESQL project.
* Implementation-wise this rule goes bottom-up, keeping an alias up to date to the current plan | ||
* and then looks for replacing the target. | ||
*/ | ||
public static class InferIsNotNull extends Rule<LogicalPlan, LogicalPlan> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a rule that could be used in other QL projects... but is it really going to be used there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was the intent behind putting in there - there is nothing specific about it and both EQL and SQL can benefit from it since it's mv compatible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've tested one query where I am not using an explicit is not null
, but a variant that should be, in some scenarios, equivalent: where NOT(field1 IS NULL)
it is actually not treated the same. I don't know in which order the rules apply or if we should deal with not (isnull)
but I was a bit surprised by the behavior.
For ip1 is not null
the plan result from this rule is
Project[[ip1{f}#9, ip1.keyword{f}#10, ip2{f}#11, ip2.keyword{f}#12]]
\_Limit[500[INTEGER]]
\_Filter[ISNOTNULL(ip1{f}#9) AND ISNOTNULL(ip1{f}#9)]
\_EsRelation[test][ip1{f}#9, ip1.keyword{f}#10, ip2{f}#11, ip2.keyword..]
First of all, why \_Filter[ISNOTNULL(ip1{f}#9) AND ISNOTNULL(ip1{f}#9)]
(two identical conditions)?
But for not (ip1 is null)
the plan is:
Project[[ip1{f}#15, ip1.keyword{f}#16, ip2{f}#17, ip2.keyword{f}#18]]
\_Limit[500[INTEGER]]
\_Filter[NOT(ISNULL(ip1{f}#15))]
\_EsRelation[test][ip1{f}#15, ip1.keyword{f}#16, ip2{f}#17, ip2.keywor..]
not (is null)
seems to be not be equivalent to is not null
. I don't think this has to do with specific PR, likely other rules are not doing the right thing, or we don't do this replacement on purpose (though, I don't remember why in that case).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good in general, but before merging it's worth investigating the problem identified by Andrei (\_Filter[ISNOTNULL(ip1{f}#9) AND ISNOTNULL(ip1{f}#9)]
). The fact that NOT (IS NULL)
is not optimized can probably be addressed in a follow-up PR, as it's an optimization by itself
* This handles the case of fields nested inside functions or expressions in order to avoid: | ||
* - having to evaluate the whole expression | ||
* - not pushing down the filter due to expression evaluation | ||
* IS NULL cannot be simplified since it leads to a disjunction which cannot prevents the simplified IS NULL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* IS NULL cannot be simplified since it leads to a disjunction which cannot prevents the simplified IS NULL
should be
* IS NULL cannot be simplified since it leads to a disjunction which prevents the simplified IS NULL
@@ -229,7 +229,7 @@ protected final boolean enableWarningsCheck() { | |||
} | |||
|
|||
public boolean logResults() { | |||
return true; | |||
return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
|
||
@Override | ||
protected boolean skipExpression(Expression e) { | ||
return e instanceof Coalesce; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This scares me a bit TBH, especially if we decide to add user-defined functions in the future
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed - null-aware conditions are special and we don't have a marker for them as of now.
Nor for user functions which likely have their own super / marker class.
Separately I've raised - #103301 since I was expecting the duplicate to be removed by a separate rule in the chain. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please, add some tests specifically for ESQL in LocalLogicalPlanOptimizerTests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. LGTM
The IS NOT NULL predicate cares only about the nullability not
the precise value of an expression. Given the expression x + 1 / 2, the actual
result does not matter only if it's null or not which can occur if there's an
overflow/underflow, the value is multi value or if the value is null.
We cannot detect the overflow in advanced, we cannot yet detect if a field
is multi value, however we can detect if a field is not null.
In practice this means x + 1 / 2 IS NOT NULL can be prefixed with
x IS NOT NULL check:
x IS NOT NULL AND x + 1 / 2 IS NOT NULL
which can be pushed down before the expression gets evaluated.
Fix #103097