Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SQL: Fix groupings on empty results and HAVING on local relations #74809

Merged
merged 15 commits into from
Jul 30, 2021

Conversation

Luegg
Copy link
Contributor

@Luegg Luegg commented Jul 1, 2021

Resolves #74311

The goal of this PR is to make implicit and explicit groupings behave consistently on empty result sets no matter whether the query is run on ES or optimized to a local relation. A typical example is a query like SELECT COUNT(*) FROM test_emp WHERE languages = 2 vs SELECT COUNT(*) FROM test_emp WHERE 1 = 2 where the later used to produce an empty result set instead of the expected 1 row with a count of 0.

The PR also fixes other edge case around contradictions in HAVING clauses that used to produce a wrong result: SELECT COUNT(*) c, 'a' a HAVING COUNT(*) > 1000

Previously:

       c       |       a       
---------------+---------------
1              |a              

Now:

       c       |       a       
---------------+---------------

@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-ql (Team:QL)

@Luegg Luegg force-pushed the fix/aggregatesWoGroupByAndFoldableFilter branch 2 times, most recently from b86477c to f13ac46 Compare July 1, 2021 08:20
@Luegg Luegg removed request for costin, astefan, bpintea and matriv July 1, 2021 08:28
@Luegg Luegg marked this pull request as draft July 1, 2021 08:36
@Luegg Luegg force-pushed the fix/aggregatesWoGroupByAndFoldableFilter branch from f13ac46 to 8503aa3 Compare July 1, 2021 08:55
@Luegg Luegg marked this pull request as ready for review July 1, 2021 11:48
@Luegg Luegg requested review from costin, matriv, astefan and bpintea July 1, 2021 11:48
Copy link
Member

@costin costin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall. Since some rules were refactored, could you please annotate the PR with some comments on what moved where and why? Thanks!

);

Batch refs = new Batch("Replace References", Limiter.ONCE,
new ReplaceReferenceAttributeWithSource()
);

Batch pushDownFilters = new Batch("Push down filters",
// must happen in a batch before ReplaceAggregationsOnLocalRelations
new PushDownAndCombineFilters());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why move this rule into a separate batch instead of moving if before ReplaceAggregationsOnLocalRelationships? The advantage of rules is that two separated filters by themselves could be combined after applying other optimization rules.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I now managed to fix the orders of the rules such that PushDownAndCombineFilters can remain in the original batch.

My problem is that ReplaceAggregationsInLocalRelations, ConstantFolding and PruneFilters form a series of transformations that must be applied in this sequence before PushDownAndCombineFilters. If PushDownAndCombineFilters is applied in the middle of this sequence, it pushes down filters it shouldn't. E.g. for the query SELECT COUNT(*) HAVING COUNT(*) = 0:

Correct sequence:

Having[count(*) = 0]
\ Agg[count(*)]
 \ Local[single]

=Fold=>

Having[false]
\ Agg[1]
 \ Local[single]

=Prune=>

Local[empty]

Incorrect sequence:

Having[count(*) = 0]
\ Agg[count(*)]
 \ Local[single]

=Fold=>

Having[false]
\ Agg[1]
 \ Local[single]

=PushDownFilter=> // pushes down because Having does not contain an aggregation anymore

Agg[1]
 \ Having[false]
  \ Local[single]

=Prune=>

Agg[1]
\ Local[empty]

It's like ReplaceAggregationsInLocalRelations removes the information that we're dealing with an aggregation and the correct context is only restored with PruneFilter.


return plan.transformExpressionsDown(AggregateFunction.class, aggregateFunction -> {
if (aggregateFunction instanceof Count) {
return new Literal(Source.EMPTY, count, DataTypes.LONG);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy the aggregateFunction source and datatype instead of Source.EMPTY/Long - it's the same thing but conveys the intent better.

if (aggregateFunction instanceof Count) {
return new Literal(Source.EMPTY, count, DataTypes.LONG);
} else if (count == 0) {
return new Literal(Source.EMPTY, null, aggregateFunction.dataType());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likely this needs to be DataTypes.NULL as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using the NULL type failed in SQL spec because H2 also returned the original type. I'm using Literal.of now which also preserves the original data type.

// This optimization is only correct if `plan` has a LocalRelation descendant that is not filtered.
// Because PushDownAndCombineFilters might push down filters to become descendants of `plan`, this optimization
// must never run before PushDownAndCombineFilters.
LocalRelation relation = unfilteredLocalRelation(plan.child());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tight relationships between rules are typically an indicator a rule does too much. Here the count evaluation only applies in case there are no filters since they potentially affect the count.
Aren't filters folded for local relationships anyway? If that's the case this rule can only expect to be applied when the child is always a LocalRelation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

protected LogicalPlan rule(Aggregate a) {
boolean hasLocalRelation = a.anyMatch(LocalRelation.class::isInstance);
protected LogicalPlan rule(UnaryPlan plan) {
if (plan instanceof Aggregate || plan instanceof Filter) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about OrderBy?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 I forgot about order by. I think it will always be pruned in earlier optimizations for the relevant queries but will add it anyway.


if (isNonSkippedLocalRelation || onlyConstantAggregations) {
foldedValues = folded;
if(relation.executable() instanceof SingletonExecutable) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bad formatting.

foldedValues = folded;
if(relation.executable() instanceof SingletonExecutable) {
if(plan instanceof Project){
foldedValues = extractConstants(((Project) plan).projections());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extractConstants -> extractLiterals

}
}
return values;
}

}

static class SkipQueryForOnlyConstantAggregations extends OptimizerRule<Aggregate> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Constant -> Literal

Comment on lines 1263 to 1267
if(e instanceof Alias){
return foldable(((Alias) e).child());
} else {
return e.foldable();
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is familiar - I wonder if it's somewhere else (could be moved into Expressions).
Small nit, if you make it a one liner you can put it directly in the if statement.

Suggested change
if(e instanceof Alias){
return foldable(((Alias) e).child());
} else {
return e.foldable();
}
return (e instanceof Alias ? foldable((Alias) e).child()) : e.foldable());

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, there are a few places where we do the alias resolution manually but the methods that are called in the end are always a bit different. So if we would put it into Expression it would need to be fairly generic. Something like T onResolved(Function<Expression, T> fn) that could be used like e.onResolved(Expression::foldable).

Alternatively, it might be possible to resolve the aliases in fold and foldable.

@@ -195,8 +195,8 @@ public void testLocalExecWithCountAndWhereFalseFilter() {
PhysicalPlan p = plan("SELECT COUNT(10), COUNT(DISTINCT 20) WHERE 1 = 2" + randomOrderByAndLimit(2));
assertEquals(LocalExec.class, p.getClass());
LocalExec le = (LocalExec) p;
assertEquals(EmptyExecutable.class, le.executable().getClass());
EmptyExecutable ee = (EmptyExecutable) le.executable();
assertEquals(SingletonExecutable.class, le.executable().getClass());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did the empty executable was replaced by singleton? Was the initial test wrong?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, that's one of the cases where SQL engines return one row with an "empty" result. E.g. Postgres:

lukas=# SELECT COUNT(10), COUNT(DISTINCT 20) WHERE 1 = 2;
 count | count
-------+-------
     0 |     0
(1 row)

@Luegg
Copy link
Contributor Author

Luegg commented Jul 9, 2021

Thanks Costin, I've reverted everything and the PR is now ready for another round of reviews. All other comments from the first reviews are addressed and otherwise there were no changes.

Copy link
Contributor

@bpintea bpintea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

new SkipQueryIfFoldingProjection()
new SkipQueryForOnlyLiteralAggregations(),
new PushProjectionsIntoLocalRelation(),
// must run after `PushProjectionsIntoLocalRelation` because it removes the the distinction between implicit
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// must run after `PushProjectionsIntoLocalRelation` because it removes the the distinction between implicit
// must run after `PushProjectionsIntoLocalRelation` because it removes the distinction between implicit

if (isNonSkippedLocalRelation || onlyConstantAggregations) {
foldedValues = folded;
if (relation.executable() instanceof SingletonExecutable) {
if (plan instanceof Project) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personal pref: it'd be ever so slightly more legible if you swapped the branching to match the if (plan instanceof Aggregate) in the else if below.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to Bogdan's comment - having the same condition (check for Aggregate) makes it easier to 'parse' the two branches.

if (plan instanceof Aggregate || plan instanceof Filter || plan instanceof OrderBy) {
LocalRelation relation = unfilteredLocalRelation(plan.child());
if (relation != null) {
long count = (relation.executable() instanceof EmptyExecutable ? 0L : 1L);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: parentheses aren't necessary.

@joegallo joegallo added v7.13.5 and removed v7.13.4 labels Jul 19, 2021
Copy link
Member

@costin costin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - thanks for iterating on this!

@@ -177,8 +178,13 @@ public LogicalPlan optimize(LogicalPlan verified) {

Batch local = new Batch("Skip Elasticsearch",
new SkipQueryOnLimitZero(),
new SkipQueryIfFoldingProjection()
new SkipQueryForOnlyLiteralAggregations(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove Only - SkipQueryForLiteralAggregrations.

if (isNonSkippedLocalRelation || onlyConstantAggregations) {
foldedValues = folded;
if (relation.executable() instanceof SingletonExecutable) {
if (plan instanceof Project) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to Bogdan's comment - having the same condition (check for Aggregate) makes it easier to 'parse' the two branches.

return plan;
}

private boolean foldable(Expression e) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use the ternary operator : return e instanceof Alias ? foldable(((Alias) e).child()) : e.foldable()
Additionally since there's no double aliasing, after unwrapping an alias, return the foldability of the child:

if (e instanceof Alias) {
 e = ((Alias) e).child()
}
return e.foldable()

Copy link
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Luegg Luegg added the auto-backport Automatically create backport pull requests when merged label Jul 30, 2021
@Luegg Luegg merged commit 54bdc09 into elastic:master Jul 30, 2021
@Luegg Luegg deleted the fix/aggregatesWoGroupByAndFoldableFilter branch July 30, 2021 13:28
elasticsearchmachine pushed a commit to elasticsearchmachine/elasticsearch that referenced this pull request Jul 30, 2021
…astic#74809)

Resolves elastic#74311

The goal of this PR is to make implicit and explicit groupings behave consistently on empty result sets no matter whether the query is run on ES or optimized to a local relation. A typical example is a query like `SELECT COUNT(*) FROM test_emp WHERE languages = 2` vs `SELECT COUNT(*) FROM test_emp WHERE 1 = 2` where the later used to produce an empty result set instead of the expected 1 row with a count of 0.

The PR also fixes other edge case around contradictions in `HAVING` clauses that used to produce a wrong result: `SELECT COUNT(*) c, 'a' a HAVING COUNT(*) > 1000`
elasticsearchmachine pushed a commit to elasticsearchmachine/elasticsearch that referenced this pull request Jul 30, 2021
…astic#74809)

Resolves elastic#74311

The goal of this PR is to make implicit and explicit groupings behave consistently on empty result sets no matter whether the query is run on ES or optimized to a local relation. A typical example is a query like `SELECT COUNT(*) FROM test_emp WHERE languages = 2` vs `SELECT COUNT(*) FROM test_emp WHERE 1 = 2` where the later used to produce an empty result set instead of the expected 1 row with a count of 0.

The PR also fixes other edge case around contradictions in `HAVING` clauses that used to produce a wrong result: `SELECT COUNT(*) c, 'a' a HAVING COUNT(*) > 1000`
@elasticsearchmachine
Copy link
Collaborator

💚 Backport successful

Status Branch Result
7.14
7.x

elasticsearchmachine added a commit that referenced this pull request Jul 30, 2021
…4809) (#75889)

Resolves #74311

The goal of this PR is to make implicit and explicit groupings behave consistently on empty result sets no matter whether the query is run on ES or optimized to a local relation. A typical example is a query like `SELECT COUNT(*) FROM test_emp WHERE languages = 2` vs `SELECT COUNT(*) FROM test_emp WHERE 1 = 2` where the later used to produce an empty result set instead of the expected 1 row with a count of 0.

The PR also fixes other edge case around contradictions in `HAVING` clauses that used to produce a wrong result: `SELECT COUNT(*) c, 'a' a HAVING COUNT(*) > 1000`

Co-authored-by: Lukas Wegmann <[email protected]>
elasticsearchmachine added a commit that referenced this pull request Jul 30, 2021
…4809) (#75890)

Resolves #74311

The goal of this PR is to make implicit and explicit groupings behave consistently on empty result sets no matter whether the query is run on ES or optimized to a local relation. A typical example is a query like `SELECT COUNT(*) FROM test_emp WHERE languages = 2` vs `SELECT COUNT(*) FROM test_emp WHERE 1 = 2` where the later used to produce an empty result set instead of the expected 1 row with a count of 0.

The PR also fixes other edge case around contradictions in `HAVING` clauses that used to produce a wrong result: `SELECT COUNT(*) c, 'a' a HAVING COUNT(*) > 1000`

Co-authored-by: Lukas Wegmann <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/SQL SQL querying auto-backport Automatically create backport pull requests when merged >bug Team:QL (Deprecated) Meta label for query languages team v7.14.1 v7.15.0 v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SQL: Empty result set on aggregates with filter foldable to false
9 participants