-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SQL: Fix SUM(all zeroes) to return 0 instead of NULL #65796
Conversation
Previously the SUM(all zeroes) was `NULL`, but after this change the SUM SQL function call is automatically upgraded into a `stats` aggregation instead of a `sum` aggregation. The `stats` aggregation only results in `NULL` if the there were no rows, no values to aggregate, which is the expected behaviour across different SQL implementations. This is a workaround for elastic#45251 .
Pinging @elastic/es-ql (Team:QL) |
#65792 is prerequisite of this PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some comments and questions. Also, I'd like to see a test being added to QueryTranslatorTests. Thanks.
x-pack/plugin/sql/qa/server/src/main/resources/agg-nulls-zeros.csv-spec
Outdated
Show resolved
Hide resolved
x-pack/plugin/sql/qa/server/src/main/resources/agg-nulls-zeros.csv-spec
Outdated
Show resolved
Hide resolved
|
||
aggregatingAllNullsWithCountStar | ||
schema::COUNT_AllNulls:l | ||
SELECT COUNT(*) as "COUNT_AllNulls" FROM logs WHERE bytes_out IS NULL; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already have a test that deals with this scenario: SELECT COUNT(*) count FROM test_emp WHERE first_name IS NULL
|
||
aggregatingAllNullsWithSum | ||
schema::SUM_AllNulls:i | ||
SELECT SUM(bytes_out) as "SUM_AllNulls" FROM logs WHERE bytes_out IS NULL; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would look in checking if adding or changing one of the entries in logs
to have bytes_in
as null
would have a big impact on the existing tests. If not, then I would make the change (either adding an entry or changing an existent one) and then a more complex query like SELECT bytes_in, SUM(bytes_in) as SUM_AllNulls, MIN(bytes_in), MAX(bytes_in), AVG(bytes_in) FROM logs WHERE bytes_in = 0 OR bytes_in IS NULL GROUP BY bytes_in
would be possible.
@@ -0,0 +1,73 @@ | |||
|
|||
aggregatingAllZerosWithFirst-Ignore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why -Ignore
. Also, why adding the test as sql-spec
if the test already exists in .csv-spec
?
x-pack/plugin/sql/src/test/java/org/elasticsearch/xpack/sql/optimizer/OptimizerTests.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/sql/src/test/java/org/elasticsearch/xpack/sql/optimizer/OptimizerTests.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/sql/src/test/java/org/elasticsearch/xpack/sql/optimizer/OptimizerTests.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/sql/src/test/java/org/elasticsearch/xpack/sql/optimizer/OptimizerTests.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/sql/src/test/java/org/elasticsearch/xpack/sql/optimizer/OptimizerTests.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left also a few comments, I agree with @astefan for having also a test in QueryTranslatorTests.
x-pack/plugin/sql/qa/server/src/main/resources/agg-nulls-zeros.csv-spec
Outdated
Show resolved
Hide resolved
x-pack/plugin/sql/src/test/java/org/elasticsearch/xpack/sql/optimizer/OptimizerTests.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fix looks good however the testing needs cleaning up.
x-pack/plugin/sql/src/main/java/org/elasticsearch/xpack/sql/optimizer/Optimizer.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/sql/src/main/java/org/elasticsearch/xpack/sql/optimizer/Optimizer.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/sql/qa/server/src/main/resources/setup_test_emp.sql
Outdated
Show resolved
Hide resolved
x-pack/plugin/sql/qa/server/src/main/resources/agg-nulls-zeros.csv-spec
Outdated
Show resolved
Hide resolved
Why? That fix focused on PIVOT, this is a SUM. |
PIVOT is a GROUP BY with aggregations underneath. Without the #65792 change I cannot promote the SUM aggregation to I have two (+ one) options:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Left some minor comments, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some comments but otherwise LGTM.
x-pack/plugin/sql/src/main/java/org/elasticsearch/xpack/sql/optimizer/Optimizer.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/sql/src/test/java/org/elasticsearch/xpack/sql/optimizer/OptimizerTests.java
Show resolved
Hide resolved
x-pack/plugin/sql/src/test/java/org/elasticsearch/xpack/sql/planner/QueryTranslatorTests.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
public LogicalPlan apply(LogicalPlan plan) { | ||
final Map<Expression, Stats> statsPerField = new LinkedHashMap<>(); | ||
|
||
plan.forEachExpressionsUp(e -> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not related to this PR, but I was wondering: most forEachExpressionsUp/Down
methods invocations do pattern matching as first thing. Wouldn't an alternative method similar to Node#forEachUp/Down
taking a type token make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue lays with collections. Expressions are not just used as individual nodes but also as properties. Take Project(List<NamedExpression> projections)
- this led to issues in not only filtering but in reconstructing said collections with the new expressions while preserving their types. See the comment in LogicalPlan.doTransformExpression
It would be nicer to do:
plan.forEachExpressionsUp(s -> , Sum.class)
instead of doing the instanceof
check however the issue right now is preserving the type information before and after transformation without causing a CCE.
That said, I plan to take another look at this to see whether it can be sorted out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Previously the SUM(all zeroes) was `NULL`, but after this change the SUM SQL function call is automatically upgraded into a `stats` aggregation instead of a `sum` aggregation. The `stats` aggregation only results in `NULL` if the there were no rows, no values (all nulls) to aggregate, which is the expected behaviour across different SQL implementations. This is a workaround for the issue elastic#45251 . Once the results of the `sum` aggregation can differentiate between `SUM(all nulls)` and `SUM(all zeroes`) the optimizer rule introduced in this commit needs to be removed. (cherry-picked from b74792a)
Previously the SUM(all zeroes) was `NULL`, but after this change the SUM SQL function call is automatically upgraded into a `stats` aggregation instead of a `sum` aggregation. The `stats` aggregation only results in `NULL` if the there were no rows, no values (all nulls) to aggregate, which is the expected behaviour across different SQL implementations. This is a workaround for the issue #45251 . Once the results of the `sum` aggregation can differentiate between `SUM(all nulls)` and `SUM(all zeroes`) the optimizer rule introduced in this commit needs to be removed. (cherry-picked from b74792a)
Previously the SUM(all zeroes) was
NULL
, but after this change the SUMSQL function call is automatically upgraded into a
stats
aggregationinstead of a
sum
aggregation. Thestats
aggregation only results inNULL
if the there were no rows, no values to aggregate, which is theexpected behaviour across different SQL implementations.
This is a workaround for #45251 .