Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SQL: Enable the InnerAggregates inside PIVOT #65792

Merged
merged 6 commits into from
Dec 7, 2020

Conversation

palesz
Copy link
Contributor

@palesz palesz commented Dec 2, 2020

  • Remove the limitation of not being able use InnerAggregate inside
    PIVOT (aggregations using extended and matrix stats)
  • Integration tests that cover the extended stats and the matrix stats aggregations

* Remove the limitation of not being able use `InnerAggregate` inside
PIVOT (aggregations using extended and matrix stats)
@palesz palesz added >feature :Analytics/SQL SQL querying v8.0.0 Team:QL (Deprecated) Meta label for query languages team v7.11.0 labels Dec 2, 2020
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-ql (Team:QL)

@astefan
Copy link
Contributor

astefan commented Dec 3, 2020

Why were the aggregates not supported inside PIVOT and what changed that made them now useable there?

@palesz
Copy link
Contributor Author

palesz commented Dec 3, 2020

The limitation was introduced as part of the original PIVOT implementation: #46489 (comment) , d668751 .
The comment from @costin mentions that after the NamedExpression refactor (#49693), probably should not be issues with the InnerAggregates inside PIVOT.

Note: I will beef up the tests on this, and add extra, more complicated test-cases specifically mentioned in the original comment by @costin. For example: ROUND(SUM_OF_SQUARES(...)).

Copy link
Member

@costin costin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reviewers list is incomplete

@costin
Copy link
Member

costin commented Dec 3, 2020

Note: I will beef up the tests on this, and add extra, more complicated test-cases specifically mentioned in the original comment by @costin. For example: ROUND(SUM_OF_SQUARES(...)).

I'm in favor of removing code however when dealing with known issues, the tests are critical to assess that there are no regressions. In fact, for this type of PR they become the main part - without them this PR is incomplete.

@@ -201,6 +201,29 @@ null |10044 |Mingsen |F |1994-05-21
// end::sumWithoutSubquery
;

sumWithInnerAggregateSumOfSquares
schema::birth_date:ts|emp_no:i|first_name:s|gender:s|hire_date:ts|last_name:s|1:d|2:d
SELECT * FROM test_emp PIVOT (SUM_OF_SQUARES(salary) FOR languages IN (1, 2)) LIMIT 5;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to see a test with multiple arguments that get folded into compound aggs, such as sum / min / max.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean multiple aggregations inside the PIVOT? Currently there is a check that we can only have a single aggregation inside the PIVOT, so we can't have PIVOT(MIN(salary), MAX(salary) FOR ...).

This change only lifts the limitation on the type of that single aggregation within the PIVOT, so aggregations that are translated into an InnerAggregate will be allowed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any validation or check for this (now that PostOptimizerTest has been removed)? this goes beyond the scope of this ticket so please raise a ticket if this functionality is missing or needs to be further investigated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PostOptimizerTests checked that the type of the aggregation cannot be one that results in an InnerAggregate vs the number of the aggregations inside the PIVOT.

There is already a check in the LogicalPlanBuilder that only allows one aggregation inside the PIVOT:

throw new ParsingException(source(pivotClause.aggs), "PIVOT currently supports only one aggregation, found [{}]",

I cannot see it tested in unit tests, but it is there and works. Added: #65894

@palesz palesz requested review from matriv and bpintea December 3, 2020 20:05
Test that the `PIVOT` results in the same query as
the `GROUP BY`. This should hold across all the `AggregateFunction`s we
have.

client_port | 'OK' | 'Error'
---------------+------------------+---------------
null |2.002 |-0.0
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created #65863

Copy link
Member

@costin costin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left another round of comments.

String q = ee.toString().replaceAll("\\s+", "");
assertThat(q, containsString("\"query\":{\"terms\":{\"keyword\":[\"A\",\"B\"]"));
String a = ee.queryContainer().aggs().asAggBuilder().toString().replaceAll("\\s+", "");
assertThat(a, containsString("\"terms\":{\"field\":\"bool\""));
assertThat(a, containsString("\"terms\":{\"field\":\"keyword\""));
assertThat(a, containsString("{\"avg\":{\"field\":\"int\"}"));
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's better to get the list from the Function registry so that all existing aggs and future ones are picked up without having to manually update this test (since it is unlikely it will happen).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Yep, that sounds like a better solution (was thinking about possible going with reflection, but this is better).

@@ -201,6 +201,29 @@ null |10044 |Mingsen |F |1994-05-21
// end::sumWithoutSubquery
;

sumWithInnerAggregateSumOfSquares
schema::birth_date:ts|emp_no:i|first_name:s|gender:s|hire_date:ts|last_name:s|1:d|2:d
SELECT * FROM test_emp PIVOT (SUM_OF_SQUARES(salary) FOR languages IN (1, 2)) LIMIT 5;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any validation or check for this (now that PostOptimizerTest has been removed)? this goes beyond the scope of this ticket so please raise a ticket if this functionality is missing or needs to be further investigated.

@palesz palesz requested a review from costin December 4, 2020 18:32
Copy link
Member

@costin costin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

String q = ee.toString().replaceAll("\\s+", "");
assertThat(q, containsString("\"query\":{\"terms\":{\"keyword\":[\"A\",\"B\"]"));
String a = ee.queryContainer().aggs().asAggBuilder().toString().replaceAll("\\s+", "");
assertThat(a, containsString("\"terms\":{\"field\":\"bool\""));
assertThat(a, containsString("\"terms\":{\"field\":\"keyword\""));
assertThat(a, containsString("{\"avg\":{\"field\":\"int\"}"));
}

public void testPivotHasSameQueryAsGroupBy() {
final Map<String, String> aggFnsWithMultipleArguments = Map.of(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might want to use a different construct to ease backporting to 7.x due to the JDK difference

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about this. I would prefer using the new APIs on master and change during backporting to Java 8 compatible, so moving forward we use the newer APIs.

Copy link
Contributor

@bpintea bpintea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@palesz palesz merged commit 67704b0 into elastic:master Dec 7, 2020
Copy link
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is one issue that needs further analysis (added a comment for pivot.csv-spec). Still reviewing.

@@ -201,6 +201,51 @@ null |10044 |Mingsen |F |1994-05-21
// end::sumWithoutSubquery
;

sumWithInnerAggregateSumOfSquares
schema::birth_date:ts|emp_no:i|first_name:s|gender:s|hire_date:ts|last_name:s|1:d|2:d
SELECT * FROM test_emp PIVOT (SUM_OF_SQUARES(salary) FOR languages IN (1, 2)) LIMIT 5;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This query fails for me in a REST test (using a REST client): it's only returning 3 rows, whereas it should have returned 5. We need to look into why the test passes and/or why the REST call with the same query fails. @palesz please, open an issue for this investigation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The JDBC integration tests fetch all the rows and follow the cursor (hence 5 rows shows up there). The tests are valid and successful, no issue there. Created #65982 to investigate this unexpected behaviour related to PIVOT+LIMIT.

@astefan
Copy link
Contributor

astefan commented Dec 7, 2020

LGTM otherwise.

palesz pushed a commit to palesz/elasticsearch that referenced this pull request Dec 7, 2020
* Remove the limitation of not being able to use `InnerAggregate`
inside PIVOTs (aggregations using extended and matrix stats)
* The limitation was introduced as part of the original `PIVOT`
implementation in elastic#46489, but after elastic#49693 it could be lifted.
* Test that the `PIVOT` results in the same query as the
`GROUP BY`. This should hold across all the
`AggregateFunction`s we have.
(cherry-pick 67704b0)
@palesz palesz deleted the feature/pivot-inneraggregates branch December 7, 2020 22:34
palesz pushed a commit that referenced this pull request Dec 7, 2020
* Remove the limitation of not being able to use `InnerAggregate`
inside PIVOTs (aggregations using extended and matrix stats)
* The limitation was introduced as part of the original `PIVOT`
implementation in #46489, but after #49693 it could be lifted.
* Test that the `PIVOT` results in the same query as the
`GROUP BY`. This should hold across all the
`AggregateFunction`s we have.

(cherry-picked from  67704b0)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants