SQL: Fix SUM(all zeroes) to return 0 instead of NULL #65796

palesz · 2020-12-03T00:40:36Z

Previously, SUM(all zeroes) returned NULL. With this change, the SUM
SQL function call is automatically upgraded to a stats aggregation
instead of a sum aggregation. The stats aggregation only results in
NULL if there were no rows, that is, no values to aggregate, which is
the expected behaviour across different SQL implementations.

This is a workaround for #45251.
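To illustrate why the upgrade helps, here is a minimal sketch of the distinction described above. It is not the Elasticsearch implementation: `SumNullSketch`, `StatsResult`, `stats`, `sqlSumFromSumAgg` and `sqlSumFromStats` are hypothetical names, and the "old" mapping of a bare sum of 0 to NULL is an assumption made only to reproduce the behaviour reported in the description.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model for illustration only; none of these names are Elasticsearch APIs.
final class SumNullSketch {

    /** Stand-in for a stats aggregation result: how many values were seen, and their sum. */
    static final class StatsResult {
        final long count;
        final double sum;

        StatsResult(long count, double sum) {
            this.count = count;
            this.sum = sum;
        }
    }

    /** Aggregates the non-null values; a null entry models a row with no value. */
    static StatsResult stats(List<Double> values) {
        long count = 0;
        double sum = 0.0;
        for (Double v : values) {
            if (v != null) {
                count++;
                sum += v;
            }
        }
        return new StatsResult(count, sum);
    }

    /** Assumed old mapping: a bare sum of 0 is reported as NULL, so zeroes and nulls collide. */
    static Double sqlSumFromSumAgg(double sum) {
        return sum == 0.0 ? null : Double.valueOf(sum);
    }

    /** Mapping after the change: NULL only when no values were aggregated at all. */
    static Double sqlSumFromStats(StatsResult s) {
        return s.count == 0 ? null : Double.valueOf(s.sum);
    }
}
```

With only a sum available, an input of `[0, 0]` and an input of `[null, null]` both produce 0 and cannot be told apart. With the count from the stats result, the first yields 0 and the second yields NULL.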


elasticmachine · 2020-12-03T00:40:38Z

Pinging @elastic/es-ql (Team:QL)

palesz · 2020-12-03T00:41:24Z

#65792 is prerequisite of this PR.

astefan

Left some comments and questions. Also, I'd like to see a test being added to QueryTranslatorTests. Thanks.

x-pack/plugin/sql/qa/server/src/main/resources/agg-nulls-zeros.csv-spec

astefan · 2020-12-03T06:36:08Z

x-pack/plugin/sql/qa/server/src/main/resources/agg-nulls-zeros.csv-spec

+
+aggregatingAllNullsWithCountStar
+schema::COUNT_AllNulls:l
+SELECT COUNT(*) as "COUNT_AllNulls" FROM logs WHERE bytes_out IS NULL;


We already have a test that deals with this scenario: SELECT COUNT(*) count FROM test_emp WHERE first_name IS NULL

astefan · 2020-12-03T07:03:56Z

x-pack/plugin/sql/qa/server/src/main/resources/agg-nulls-zeros.csv-spec

+
+aggregatingAllNullsWithSum
+schema::SUM_AllNulls:i
+SELECT SUM(bytes_out) as "SUM_AllNulls" FROM logs WHERE bytes_out IS NULL;


I would check whether adding or changing one of the entries in logs to have bytes_in as null would have a big impact on the existing tests. If not, I would make the change (either adding an entry or changing an existing one), and then a more complex query like `SELECT bytes_in, SUM(bytes_in) as SUM_AllNulls, MIN(bytes_in), MAX(bytes_in), AVG(bytes_in) FROM logs WHERE bytes_in = 0 OR bytes_in IS NULL GROUP BY bytes_in` would be possible.

astefan · 2020-12-03T07:06:00Z

x-pack/plugin/sql/qa/server/src/main/resources/agg-nulls-zeros.sql-spec

@@ -0,0 +1,73 @@
+
+aggregatingAllZerosWithFirst-Ignore


Why `-Ignore`? Also, why add the test as sql-spec if the test already exists in .csv-spec?

x-pack/plugin/sql/src/test/java/org/elasticsearch/xpack/sql/optimizer/OptimizerTests.java

matriv

I also left a few comments. I agree with @astefan about also having a test in QueryTranslatorTests.

x-pack/plugin/sql/qa/server/src/main/resources/agg-nulls-zeros.csv-spec

x-pack/plugin/sql/src/test/java/org/elasticsearch/xpack/sql/optimizer/OptimizerTests.java

costin

The fix looks good; however, the testing needs cleaning up.

x-pack/plugin/sql/src/main/java/org/elasticsearch/xpack/sql/optimizer/Optimizer.java

x-pack/plugin/sql/qa/server/src/main/resources/setup_test_emp.sql

x-pack/plugin/sql/qa/server/src/main/resources/agg-nulls-zeros.csv-spec

costin · 2020-12-03T17:15:06Z

> #65792 is prerequisite of this PR.

Why? That fix focused on PIVOT, this is a SUM.

palesz · 2020-12-07T22:14:39Z

> > #65792 is prerequisite of this PR.
>
> Why? That fix focused on PIVOT, this is a SUM.

PIVOT is a GROUP BY with aggregations underneath. Without the #65792 change, I cannot promote the SUM aggregation to stats inside PIVOT, and we would end up with SUM returning 0 in the GROUP BY but NULL inside the PIVOT.

I have two (+ one) options:

1. Do #65792 (SQL: Enable the InnerAggregates inside PIVOT) before the fix of SUM.
2. Don't do #65792 and only fix SUM inside the GROUP BY (double workaround).
3. Do not support SUM inside PIVOT (a breaking change that loses major PIVOT functionality).

astefan

LGTM. Left some minor comments, though.

x-pack/plugin/sql/qa/server/src/main/resources/agg.csv-spec

costin

Left some comments but otherwise LGTM.

x-pack/plugin/sql/qa/server/src/main/resources/logs.csv

x-pack/plugin/sql/src/main/java/org/elasticsearch/xpack/sql/optimizer/Optimizer.java

x-pack/plugin/sql/src/test/java/org/elasticsearch/xpack/sql/optimizer/OptimizerTests.java

x-pack/plugin/sql/src/test/java/org/elasticsearch/xpack/sql/planner/QueryTranslatorTests.java

astefan

LGTM

bpintea · 2020-12-09T08:36:59Z

x-pack/plugin/sql/src/main/java/org/elasticsearch/xpack/sql/optimizer/Optimizer.java

+        public LogicalPlan apply(LogicalPlan plan) {
+            final Map<Expression, Stats> statsPerField = new LinkedHashMap<>();
+
+            plan.forEachExpressionsUp(e -> {


Not related to this PR, but I was wondering: most forEachExpressionsUp/Down invocations do pattern matching as the first thing. Wouldn't an alternative method, similar to Node#forEachUp/Down taking a type token, make sense?

costin · Dec 9, 2020 (edited)

The issue lies with collections. Expressions are not just used as individual nodes but also as properties. Take Project(List<NamedExpression> projections): this led to issues not only in filtering but also in reconstructing said collections with the new expressions while preserving their types. See the comment in LogicalPlan.doTransformExpression.

It would be nicer to do
plan.forEachExpressionsUp(s -> , Sum.class) instead of the instanceof check; however, the issue right now is preserving the type information before and after transformation without causing a CCE (ClassCastException).

That said, I plan to take another look at this to see whether it can be sorted out.
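For readers unfamiliar with the pattern under discussion, the following is a minimal sketch of a type-token traversal on a toy expression tree. Everything here (`TypeTokenTraversalSketch`, `Expr`, `Sum`, `Field`, `summedFields`) is hypothetical and unrelated to the actual `Node` or `LogicalPlan` classes. It only covers read-only visiting, so it does not touch the harder problem raised above of rebuilding typed collections after a transformation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical toy tree; not the Elasticsearch Node/Expression hierarchy.
final class TypeTokenTraversalSketch {

    static class Expr {
        final String name;
        final List<Expr> children;

        Expr(String name, Expr... children) {
            this.name = name;
            this.children = Arrays.asList(children);
        }

        /** Visits children first, then this node (bottom-up). */
        void forEachUp(Consumer<? super Expr> action) {
            for (Expr child : children) {
                child.forEachUp(action);
            }
            action.accept(this);
        }

        /** Same traversal, but only nodes matching the type token reach the action. */
        <T extends Expr> void forEachUp(Consumer<? super T> action, Class<T> typeToken) {
            forEachUp(e -> {
                if (typeToken.isInstance(e)) {
                    action.accept(typeToken.cast(e));
                }
            });
        }
    }

    static final class Field extends Expr {
        Field(String name) {
            super(name);
        }
    }

    static final class Sum extends Expr {
        Sum(Expr field) {
            super("SUM", field);
        }
    }

    /** Collects the names of the fields wrapped by Sum nodes, with no instanceof at the call site. */
    static List<String> summedFields(Expr root) {
        List<String> names = new ArrayList<>();
        root.forEachUp(sum -> names.add(sum.children.get(0).name), Sum.class);
        return names;
    }
}
```

The filtering moves into the traversal helper, so callers receive an already-typed `Sum` instead of casting inside the lambda.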

costin

LGTM

Previously, SUM(all zeroes) returned `NULL`. With this change, the SUM SQL function call is automatically upgraded to a `stats` aggregation instead of a `sum` aggregation. The `stats` aggregation only results in `NULL` if there were no rows, that is, no values (all nulls) to aggregate, which is the expected behaviour across different SQL implementations. This is a workaround for the issue elastic#45251. Once the results of the `sum` aggregation can differentiate between `SUM(all nulls)` and `SUM(all zeroes)`, the optimizer rule introduced in this commit needs to be removed. (cherry-picked from b74792a)


Luegg · 2021-06-23T06:10:59Z

Closes #45251.

(see also #74396)

palesz added the >bug, :Analytics/SQL, v8.0.0, Team:QL and v7.11.0 labels Dec 3, 2020

palesz requested review from costin, astefan, bpintea and matriv December 3, 2020 00:40

astefan reviewed Dec 3, 2020

matriv reviewed Dec 3, 2020

costin requested changes Dec 3, 2020

Andras Palinkas added 2 commits December 7, 2020 17:03

PR suggestions

2381ca0

Merge remote-tracking branch 'origin/master' into fix/sum-null

25fd2f0

Andras Palinkas added 2 commits December 7, 2020 17:30

Minor test fixes

ebe06e8

Merge remote-tracking branch 'origin/master' into fix/sum-null

89987b1

palesz requested review from costin, matriv and astefan December 7, 2020 22:38

astefan approved these changes Dec 8, 2020

costin approved these changes Dec 8, 2020

Andras Palinkas added 2 commits December 8, 2020 11:17

PR suggestions

051d11c

Merge remote-tracking branch 'origin/master' into fix/sum-null

92c3482

palesz requested review from costin and astefan December 8, 2020 21:13

astefan approved these changes Dec 8, 2020

bpintea reviewed Dec 9, 2020

costin approved these changes Dec 9, 2020

palesz merged commit b74792a into elastic:master Dec 9, 2020

palesz added the backport pending label Dec 9, 2020

palesz deleted the fix/sum-null branch December 9, 2020 17:21

palesz removed the backport pending label Dec 9, 2020

This was referenced Jun 21, 2021

SQL: SUM of multiple 0 values returns NULL #45251

Closed

Reference relevant issue in ReplaceSumWithStats #74396

Merged

Luegg linked an issue Jun 23, 2021 that may be closed by this pull request

SQL: SUM of multiple 0 values returns NULL #45251

Closed

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021
