-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SQL: Fix groupings on empty results and HAVING on local relations #74809
SQL: Fix groupings on empty results and HAVING on local relations #74809
Conversation
Pinging @elastic/es-ql (Team:QL) |
b86477c
to
f13ac46
Compare
f13ac46
to
8503aa3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good overall. Since some rules were refactored, could you please annotate the PR with some comments on what moved where and why? Thanks!
); | ||
|
||
Batch refs = new Batch("Replace References", Limiter.ONCE, | ||
new ReplaceReferenceAttributeWithSource() | ||
); | ||
|
||
Batch pushDownFilters = new Batch("Push down filters", | ||
// must happen in a batch before ReplaceAggregationsOnLocalRelations | ||
new PushDownAndCombineFilters()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why move this rule into a separate batch instead of moving if before ReplaceAggregationsOnLocalRelationships
? The advantage of rules is that two separated filters by themselves could be combined after applying other optimization rules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I now managed to fix the orders of the rules such that PushDownAndCombineFilters
can remain in the original batch.
My problem is that ReplaceAggregationsInLocalRelations
, ConstantFolding
and PruneFilters
form a series of transformations that must be applied in this sequence before PushDownAndCombineFilters
. If PushDownAndCombineFilters
is applied in the middle of this sequence, it pushes down filters it shouldn't. E.g. for the query SELECT COUNT(*) HAVING COUNT(*) = 0
:
Correct sequence:
Having[count(*) = 0]
\ Agg[count(*)]
\ Local[single]
=Fold=>
Having[false]
\ Agg[1]
\ Local[single]
=Prune=>
Local[empty]
Incorrect sequence:
Having[count(*) = 0]
\ Agg[count(*)]
\ Local[single]
=Fold=>
Having[false]
\ Agg[1]
\ Local[single]
=PushDownFilter=> // pushes down because Having does not contain an aggregation anymore
Agg[1]
\ Having[false]
\ Local[single]
=Prune=>
Agg[1]
\ Local[empty]
It's like ReplaceAggregationsInLocalRelations
removes the information that we're dealing with an aggregation and the correct context is only restored with PruneFilter
.
|
||
return plan.transformExpressionsDown(AggregateFunction.class, aggregateFunction -> { | ||
if (aggregateFunction instanceof Count) { | ||
return new Literal(Source.EMPTY, count, DataTypes.LONG); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copy the aggregateFunction source and datatype instead of Source.EMPTY/Long
- it's the same thing but conveys the intent better.
if (aggregateFunction instanceof Count) { | ||
return new Literal(Source.EMPTY, count, DataTypes.LONG); | ||
} else if (count == 0) { | ||
return new Literal(Source.EMPTY, null, aggregateFunction.dataType()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Likely this needs to be DataTypes.NULL
as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using the NULL
type failed in SQL spec because H2 also returned the original type. I'm using Literal.of
now which also preserves the original data type.
// This optimization is only correct if `plan` has a LocalRelation descendant that is not filtered. | ||
// Because PushDownAndCombineFilters might push down filters to become descendants of `plan`, this optimization | ||
// must never run before PushDownAndCombineFilters. | ||
LocalRelation relation = unfilteredLocalRelation(plan.child()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tight relationships between rules are typically an indicator a rule does too much. Here the count evaluation only applies in case there are no filters since they potentially affect the count.
Aren't filters folded for local relationships anyway? If that's the case this rule can only expect to be applied when the child is always a LocalRelation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See #74809 (comment)
protected LogicalPlan rule(Aggregate a) { | ||
boolean hasLocalRelation = a.anyMatch(LocalRelation.class::isInstance); | ||
protected LogicalPlan rule(UnaryPlan plan) { | ||
if (plan instanceof Aggregate || plan instanceof Filter) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about OrderBy
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 I forgot about order by
. I think it will always be pruned in earlier optimizations for the relevant queries but will add it anyway.
|
||
if (isNonSkippedLocalRelation || onlyConstantAggregations) { | ||
foldedValues = folded; | ||
if(relation.executable() instanceof SingletonExecutable) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bad formatting.
foldedValues = folded; | ||
if(relation.executable() instanceof SingletonExecutable) { | ||
if(plan instanceof Project){ | ||
foldedValues = extractConstants(((Project) plan).projections()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
extractConstants -> extractLiterals
} | ||
} | ||
return values; | ||
} | ||
|
||
} | ||
|
||
static class SkipQueryForOnlyConstantAggregations extends OptimizerRule<Aggregate> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Constant -> Literal
if(e instanceof Alias){ | ||
return foldable(((Alias) e).child()); | ||
} else { | ||
return e.foldable(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code is familiar - I wonder if it's somewhere else (could be moved into Expressions
).
Small nit, if you make it a one liner you can put it directly in the if statement.
if(e instanceof Alias){ | |
return foldable(((Alias) e).child()); | |
} else { | |
return e.foldable(); | |
} | |
return (e instanceof Alias ? foldable((Alias) e).child()) : e.foldable()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, there are a few places where we do the alias resolution manually but the methods that are called in the end are always a bit different. So if we would put it into Expression
it would need to be fairly generic. Something like T onResolved(Function<Expression, T> fn)
that could be used like e.onResolved(Expression::foldable)
.
Alternatively, it might be possible to resolve the aliases in fold
and foldable
.
@@ -195,8 +195,8 @@ public void testLocalExecWithCountAndWhereFalseFilter() { | |||
PhysicalPlan p = plan("SELECT COUNT(10), COUNT(DISTINCT 20) WHERE 1 = 2" + randomOrderByAndLimit(2)); | |||
assertEquals(LocalExec.class, p.getClass()); | |||
LocalExec le = (LocalExec) p; | |||
assertEquals(EmptyExecutable.class, le.executable().getClass()); | |||
EmptyExecutable ee = (EmptyExecutable) le.executable(); | |||
assertEquals(SingletonExecutable.class, le.executable().getClass()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did the empty executable was replaced by singleton? Was the initial test wrong?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, that's one of the cases where SQL engines return one row with an "empty" result. E.g. Postgres:
lukas=# SELECT COUNT(10), COUNT(DISTINCT 20) WHERE 1 = 2;
count | count
-------+-------
0 | 0
(1 row)
Thanks Costin, I've reverted everything and the PR is now ready for another round of reviews. All other comments from the first reviews are addressed and otherwise there were no changes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
new SkipQueryIfFoldingProjection() | ||
new SkipQueryForOnlyLiteralAggregations(), | ||
new PushProjectionsIntoLocalRelation(), | ||
// must run after `PushProjectionsIntoLocalRelation` because it removes the the distinction between implicit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// must run after `PushProjectionsIntoLocalRelation` because it removes the the distinction between implicit | |
// must run after `PushProjectionsIntoLocalRelation` because it removes the distinction between implicit |
if (isNonSkippedLocalRelation || onlyConstantAggregations) { | ||
foldedValues = folded; | ||
if (relation.executable() instanceof SingletonExecutable) { | ||
if (plan instanceof Project) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personal pref: it'd be ever so slightly more legible if you swapped the branching to match the if (plan instanceof Aggregate)
in the else if
below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to Bogdan's comment - having the same condition (check for Aggregate) makes it easier to 'parse' the two branches.
if (plan instanceof Aggregate || plan instanceof Filter || plan instanceof OrderBy) { | ||
LocalRelation relation = unfilteredLocalRelation(plan.child()); | ||
if (relation != null) { | ||
long count = (relation.executable() instanceof EmptyExecutable ? 0L : 1L); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: parentheses aren't necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM - thanks for iterating on this!
@@ -177,8 +178,13 @@ public LogicalPlan optimize(LogicalPlan verified) { | |||
|
|||
Batch local = new Batch("Skip Elasticsearch", | |||
new SkipQueryOnLimitZero(), | |||
new SkipQueryIfFoldingProjection() | |||
new SkipQueryForOnlyLiteralAggregations(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove Only
- SkipQueryForLiteralAggregrations
.
if (isNonSkippedLocalRelation || onlyConstantAggregations) { | ||
foldedValues = folded; | ||
if (relation.executable() instanceof SingletonExecutable) { | ||
if (plan instanceof Project) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to Bogdan's comment - having the same condition (check for Aggregate) makes it easier to 'parse' the two branches.
return plan; | ||
} | ||
|
||
private boolean foldable(Expression e) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use the ternary operator : return e instanceof Alias ? foldable(((Alias) e).child()) : e.foldable()
Additionally since there's no double aliasing, after unwrapping an alias, return the foldability of the child:
if (e instanceof Alias) {
e = ((Alias) e).child()
}
return e.foldable()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
…astic#74809) Resolves elastic#74311 The goal of this PR is to make implicit and explicit groupings behave consistently on empty result sets no matter whether the query is run on ES or optimized to a local relation. A typical example is a query like `SELECT COUNT(*) FROM test_emp WHERE languages = 2` vs `SELECT COUNT(*) FROM test_emp WHERE 1 = 2` where the later used to produce an empty result set instead of the expected 1 row with a count of 0. The PR also fixes other edge case around contradictions in `HAVING` clauses that used to produce a wrong result: `SELECT COUNT(*) c, 'a' a HAVING COUNT(*) > 1000`
…astic#74809) Resolves elastic#74311 The goal of this PR is to make implicit and explicit groupings behave consistently on empty result sets no matter whether the query is run on ES or optimized to a local relation. A typical example is a query like `SELECT COUNT(*) FROM test_emp WHERE languages = 2` vs `SELECT COUNT(*) FROM test_emp WHERE 1 = 2` where the later used to produce an empty result set instead of the expected 1 row with a count of 0. The PR also fixes other edge case around contradictions in `HAVING` clauses that used to produce a wrong result: `SELECT COUNT(*) c, 'a' a HAVING COUNT(*) > 1000`
…4809) (#75889) Resolves #74311 The goal of this PR is to make implicit and explicit groupings behave consistently on empty result sets no matter whether the query is run on ES or optimized to a local relation. A typical example is a query like `SELECT COUNT(*) FROM test_emp WHERE languages = 2` vs `SELECT COUNT(*) FROM test_emp WHERE 1 = 2` where the later used to produce an empty result set instead of the expected 1 row with a count of 0. The PR also fixes other edge case around contradictions in `HAVING` clauses that used to produce a wrong result: `SELECT COUNT(*) c, 'a' a HAVING COUNT(*) > 1000` Co-authored-by: Lukas Wegmann <[email protected]>
…4809) (#75890) Resolves #74311 The goal of this PR is to make implicit and explicit groupings behave consistently on empty result sets no matter whether the query is run on ES or optimized to a local relation. A typical example is a query like `SELECT COUNT(*) FROM test_emp WHERE languages = 2` vs `SELECT COUNT(*) FROM test_emp WHERE 1 = 2` where the later used to produce an empty result set instead of the expected 1 row with a count of 0. The PR also fixes other edge case around contradictions in `HAVING` clauses that used to produce a wrong result: `SELECT COUNT(*) c, 'a' a HAVING COUNT(*) > 1000` Co-authored-by: Lukas Wegmann <[email protected]>
Resolves #74311
The goal of this PR is to make implicit and explicit groupings behave consistently on empty result sets no matter whether the query is run on ES or optimized to a local relation. A typical example is a query like
SELECT COUNT(*) FROM test_emp WHERE languages = 2
vsSELECT COUNT(*) FROM test_emp WHERE 1 = 2
where the later used to produce an empty result set instead of the expected 1 row with a count of 0.The PR also fixes other edge case around contradictions in
HAVING
clauses that used to produce a wrong result:SELECT COUNT(*) c, 'a' a HAVING COUNT(*) > 1000
Previously:
Now: