-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SQL: Disallow queries with inner LIMIT that cannot be executed in ES #75960
Conversation
Pinging @elastic/es-ql (Team:QL) |
if (unary.child() instanceof Limit) { | ||
Limit limit = (Limit) unary.child(); | ||
|
||
return limit.replaceChildrenSameSize(Arrays.asList( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Collections#singletonList()
might work too.
selectOrderByLimitSameOrderBy | ||
SELECT * FROM (SELECT * FROM (SELECT * FROM test_emp LIMIT 10) ORDER BY emp_no) ORDER BY emp_no LIMIT 5; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wondering if this (type of) sequence of sub-selects couldn't be flattened safely. The outer ones don't change the ordering and out of multiple limitations, the smallest value could be determined (there's already code doing this comparison, IIRC?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this particular query cannot be flattened as well because it also first limits and then sorts.
But I think you're right that there is a special case where the outer ORDER BY is redundant and could be eliminated as in SELECT * FROM (SELECT * FROM test_emp ORDER BY emp_no LIMIT 10) ORDER BY emp_no
. This query should be equivalent to SELECT * FROM (SELECT * FROM test_emp ORDER BY emp_no LIMIT 10)
which can safely be flattened.
I think the same principle applies for redundant filters as well: SELECT * FROM (SELECT * FROM test_emp WHERE emp_no > 10 LIMIT 10) WHERE emp_no > 10
. And maybe also GROUP BY
s.
I have the feeling this would best be implemented as an additional optimization though. A rule that eliminates redundant WHERE and ORDER BY.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just noticed that in this particular case, the query could even be flattened because the order is not specified in the inner query. This would be a somewhat surprising behavior but probably aligned with SQL's "unordered bag of tuples" semantics.
return; | ||
} | ||
if (p instanceof OrderExec) { | ||
if (hasLimit.get() && orderBy.get() != null && ((OrderExec) p).order().equals(orderBy.get()) == false) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Related to the removed query comment: can't we keep this "pass-through" check? I.e. chaining multiple equal orderings if they aren't combined with other clauses?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some comments on improving the complexity of the validation rule and the top parser.
@@ -119,6 +124,21 @@ public LogicalPlan visitQueryNoWith(QueryNoWithContext ctx) { | |||
return plan; | |||
} | |||
|
|||
private LogicalPlan pullUpTopClauseFromQuerySpecification(LogicalPlan plan) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer a more focused approach in handling top. TOP
is handled through visitQuerySpecification
while ORDER BY
is handled inside queryNoWith
hence why one can check if the LIMIT
exists in case of an OrderBy
and do the switch then - replace the child of the Limit
with the new OrderBy
which points to the Limit
child.
|
||
for (LimitExec limit : limits) { | ||
failures.add( | ||
fail(limit, "LIMIT or TOP cannot be used in a subquery if outer query contains GROUP BY, WHERE, ORDER BY or PIVOT") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please order the upper cased keywords alphabetically (for consistency).
Set<LimitExec> limits = new TreeSet<>(Comparator.comparing(l -> l.source().text())); | ||
|
||
plan.forEachDown(p -> { | ||
if (p instanceof OrderExec || p instanceof FilterExec || p instanceof PivotExec || p instanceof AggregateExec) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to do double iteration - the main forEach already traverses the tree so for each matching node doing another p.forEachDown is unnecessary.
Further more I don't think it's useful to the user to see all the instances where the same type of error occurs since a query with ORDER BY and a WHERE clause will report the error twice (once per entity).
I favor the initial approach since it's O(n) - traverse the tree once. (this could be optimized by ending the traversal on the first error however it's a rare enough case that we can ignore it).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 for the single iteration.
Also reporting only one error shouldn't make too much of a difference but I was more thinking about the cases where there is more than one offending Limit
as in SELECT * FROM (SELECT * FROM (SELECT * FROM test_emp LIMIT 10) LIMIT 20) WHERE languages > 5
. Since this seems to be a rather far-fetched case, the simpler one-error implementation should be fine.
2c263c4
to
264e042
Compare
@elasticmachine run elasticsearch-ci/bwc |
1 similar comment
@elasticmachine run elasticsearch-ci/bwc |
@costin and @bpintea I've addressed your comments and had a look at what Bogdan mentioned about queries with redundant sort orders ( The last commit is an attempt to allow these queries and also to address #76013 . Basically, it introduces a |
@@ -165,6 +167,9 @@ public LogicalPlan optimize(LogicalPlan verified) { | |||
new PushDownAndCombineFilters() | |||
); | |||
|
|||
Batch pushDownOrderBy = new Batch("Push Down Order By", | |||
new PushDownAndCombineOrderBy()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs to run isolated from PushDownAndCombineFilters
in order to avoid interfering push downs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the side effect? Keeping this rule in the main loop has the benefit of picking up plans where some orderby might be removed that would otherwise prevent the pushdown.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the optimizer would never converge to a stable plan in some cases as OrderBy
and Filter
would alternately be pushed down.
@@ -701,6 +701,10 @@ protected PhysicalPlan rule(OrderExec plan) { | |||
EsQueryExec exec = (EsQueryExec) plan.child(); | |||
QueryContainer qContainer = exec.queryContainer(); | |||
|
|||
if (qContainer.sort().isEmpty() == false) { | |||
throw new SqlIllegalArgumentException("QueryContainer already defines sort order"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should be defensive here because the naive approach to just prepend additional sort orders leads to #76013 .
Alternatively, FoldOrderBy
(and Verifier
) could take over the responsibility to merge OrderBy
s from PushDownAndCombineOrderBy
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively, FoldOrderBy (and Verifier) could take over the responsibility to merge OrderBys from PushDownAndCombineOrderBy.
I like this better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The last commit is an attempt to allow these queries and also to address #76013 .
Please address that through a different PR (and +1 at enhancing FoldOrderBy instead of a separate rule) to keep things clean.
Otherwise LGTM.
@@ -165,6 +167,9 @@ public LogicalPlan optimize(LogicalPlan verified) { | |||
new PushDownAndCombineFilters() | |||
); | |||
|
|||
Batch pushDownOrderBy = new Batch("Push Down Order By", | |||
new PushDownAndCombineOrderBy()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the side effect? Keeping this rule in the main loop has the benefit of picking up plans where some orderby might be removed that would otherwise prevent the pushdown.
UnaryPlan child = (UnaryPlan) plan.child(); | ||
Holder<Boolean> hasAggregate = new Holder<>(false); | ||
if (child instanceof Filter) { | ||
child.forEachExpressionDown(e -> hasAggregate.set(hasAggregate.get() || Functions.isAggregate(e))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code is compact but it also triggers the set/get for each expression.
if (Function.isAggregate(e)) {
hasAggregate.set(true);
}
Is more verbose but sane.
} | ||
|
||
if (hasAggregate.get() == false) { | ||
return child.replaceChildrenSameSize( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could take advantage of the replaceChild
method.
return plan; | ||
} | ||
|
||
private List<Order> combineOrders(List<Order> parentOrders, List<Order> childOrders) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not clear to me how this method combines orders?
Source source = source(ctx.ORDER(), endContext); | ||
List<Order> order = visitList(ctx.orderBy(), Order.class); | ||
|
||
if (plan instanceof Limit) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
@@ -701,6 +701,10 @@ protected PhysicalPlan rule(OrderExec plan) { | |||
EsQueryExec exec = (EsQueryExec) plan.child(); | |||
QueryContainer qContainer = exec.queryContainer(); | |||
|
|||
if (qContainer.sort().isEmpty() == false) { | |||
throw new SqlIllegalArgumentException("QueryContainer already defines sort order"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively, FoldOrderBy (and Verifier) could take over the responsibility to merge OrderBys from PushDownAndCombineOrderBy.
I like this better.
I've reverted the last commit and will try to address #76013 in a separate PR. This means that we will have a temporary regression as will no longer support queries with redundant sorts (e.g. |
…lastic#75960) * Disallow queries with inner limits that cannot be executed correctly * address review comments * introduce PushDownAndCombineOrderBy optimization * Revert "introduce PushDownAndCombineOrderBy optimization" 8d8d992 * use replaceChild
…lastic#75960) * Disallow queries with inner limits that cannot be executed correctly * address review comments * introduce PushDownAndCombineOrderBy optimization * Revert "introduce PushDownAndCombineOrderBy optimization" 8d8d992 * use replaceChild
…75960) (#76222) * Disallow queries with inner limits that cannot be executed correctly * address review comments * introduce PushDownAndCombineOrderBy optimization * Revert "introduce PushDownAndCombineOrderBy optimization" 8d8d992 * use replaceChild Co-authored-by: Lukas Wegmann <[email protected]>
…75960) (#76223) * Disallow queries with inner limits that cannot be executed correctly * address review comments * introduce PushDownAndCombineOrderBy optimization * Revert "introduce PushDownAndCombineOrderBy optimization" 8d8d992 * use replaceChild Co-authored-by: Lukas Wegmann <[email protected]>
Resolves #75830
This PR only disallows subqueries with
LIMIT
if they cannot be translated into an semantically equivalent ES query. Thus, queries likeSELECT languages FROM (SELECT * FROM test_emp LIMIT 10)
that can be flattened correctly are still valid. Compared to disallowing all subqueries withLIMIT
, this will only break queries that might return wrong results.Additionally, this change also uncovered an issue with
TOP
. BecauseTOP
is part of thequerySpecification
syntax, theLogicalPlanBuilder
creates it as a child ofOrderBy
nodes. Thus, the querySELECT TOP 10 * FROM test_emp ORDER BY salary
used to produce the logical planthe semnatically correct plan would be
This also had the effect that some queries with both
TOP
andLIMIT
were not rejected by the parser.