SQL: Disallow queries with inner LIMIT that cannot be executed in ES #75960

Merged
merged 10 commits into from
Aug 9, 2021
@@ -153,14 +153,10 @@ selectOrderByOrderByOrderByLimit
SELECT * FROM (SELECT * FROM (SELECT * FROM test_emp ORDER BY emp_no DESC) ORDER BY emp_no ASC) ORDER BY emp_no DESC LIMIT 5;
selectOrderByOrderByOrderByLimitLimit
SELECT * FROM (SELECT * FROM (SELECT * FROM (SELECT * FROM test_emp ORDER BY emp_no DESC) ORDER BY emp_no ASC) ORDER BY emp_no DESC LIMIT 12) LIMIT 6;
- selectOrderByLimitSameOrderBy
- SELECT * FROM (SELECT * FROM (SELECT * FROM test_emp LIMIT 10) ORDER BY emp_no) ORDER BY emp_no LIMIT 5;
Comment on lines -156 to -157
Contributor

Wondering if this (type of) sequence of sub-selects couldn't be flattened safely. The outer ones don't change the ordering and out of multiple limitations, the smallest value could be determined (there's already code doing this comparison, IIRC?).
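The "keep only the smallest limit" idea can be sketched as a bottom-up rewrite. This is a minimal illustration, not the Elasticsearch QL optimizer: `Relation`, `OrderBy` and `Limit` are hypothetical record types, and the rule only merges a LIMIT that sits directly on another LIMIT. A LIMIT separated from another LIMIT by an ORDER BY is deliberately left alone.

```java
import java.util.List;

// Hypothetical, simplified logical-plan nodes for illustration only;
// these are not the Elasticsearch QL plan classes.
class CombineLimits {
    interface Plan {}
    record Relation(String name) implements Plan {}
    record OrderBy(List<String> keys, Plan child) implements Plan {}
    record Limit(int n, Plan child) implements Plan {}

    // Rewrites the plan bottom-up. LIMIT a directly on top of LIMIT b keeps
    // the first min(a, b) rows of the same input, so the pair collapses into one node.
    static Plan combine(Plan plan) {
        if (plan instanceof Limit outer) {
            Plan child = combine(outer.child());
            if (child instanceof Limit inner) {
                return new Limit(Math.min(outer.n(), inner.n()), inner.child());
            }
            return new Limit(outer.n(), child);
        }
        if (plan instanceof OrderBy order) {
            // An ORDER BY between two limits blocks the merge: it is rebuilt as-is.
            return new OrderBy(order.keys(), combine(order.child()));
        }
        return plan;
    }
}
```

Ignoring projections, `selectOrderByOrderByOrderByLimitLimit` has the shape LIMIT 6 over LIMIT 12 over an ORDER BY chain, which this rule would reduce to LIMIT 6 over the same chain.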

Contributor Author

I think this particular query cannot be flattened either, because it also limits first and then sorts.
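A quick way to see why limit-then-sort is not the same as sort-then-limit, using plain Java streams as a stand-in for the query engine. This is only a sketch: `rows` plays the part of whatever order the inner subquery happens to return its rows in.

```java
import java.util.List;
import java.util.stream.Collectors;

class LimitOrderDemo {
    // Inner LIMIT, outer ORDER BY:
    //   SELECT * FROM (SELECT * FROM t LIMIT n) ORDER BY x
    // keeps the first n rows in arrival order, then sorts only those.
    static List<Integer> limitThenSort(List<Integer> rows, int n) {
        return rows.stream().limit(n).sorted().collect(Collectors.toList());
    }

    // ORDER BY and LIMIT at the same level:
    //   SELECT * FROM t ORDER BY x LIMIT n
    // sorts all rows first, then keeps the n smallest.
    static List<Integer> sortThenLimit(List<Integer> rows, int n) {
        return rows.stream().sorted().limit(n).collect(Collectors.toList());
    }
}
```

With rows `[5, 1, 4, 2, 3]` and `n = 2`, `limitThenSort` returns `[1, 5]` while `sortThenLimit` returns `[1, 2]`, so moving the outer ORDER BY below the inner LIMIT would change the result.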

But I think you're right that there is a special case where the outer ORDER BY is redundant and could be eliminated, as in SELECT * FROM (SELECT * FROM test_emp ORDER BY emp_no LIMIT 10) ORDER BY emp_no. This query should be equivalent to SELECT * FROM (SELECT * FROM test_emp ORDER BY emp_no LIMIT 10), which can safely be flattened.

I think the same principle applies to redundant filters: SELECT * FROM (SELECT * FROM test_emp WHERE emp_no > 10 LIMIT 10) WHERE emp_no > 10. And maybe also to GROUP BYs.

I have the feeling this would best be implemented as an additional optimization, though: a rule that eliminates redundant WHERE and ORDER BY clauses.
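A minimal sketch of such a rule, assuming hypothetical plan records (not the Elasticsearch QL classes) and, for simplicity, comparing sort keys and filter conditions by plain equality. It drops an outer ORDER BY or WHERE that repeats the one directly below an inner LIMIT, leaving only the inner query.

```java
import java.util.List;

// Hypothetical plan nodes for illustration; not the Elasticsearch QL classes.
// Conditions are plain strings here, a simplification of real expressions.
class EliminateRedundant {
    interface Plan {}
    record Relation(String name) implements Plan {}
    record Filter(String condition, Plan child) implements Plan {}
    record OrderBy(List<String> keys, Plan child) implements Plan {}
    record Limit(int n, Plan child) implements Plan {}

    // Bottom-up rewrite that removes an outer ORDER BY / WHERE when the
    // subquery below it is a LIMIT over exactly the same ORDER BY / WHERE.
    static Plan optimize(Plan plan) {
        if (plan instanceof OrderBy order) {
            Plan child = optimize(order.child());
            // Rows coming out of "ORDER BY k LIMIT n" are already sorted by k.
            if (child instanceof Limit limit
                    && limit.child() instanceof OrderBy inner
                    && inner.keys().equals(order.keys())) {
                return child;
            }
            return new OrderBy(order.keys(), child);
        }
        if (plan instanceof Filter filter) {
            Plan child = optimize(filter.child());
            // Every row coming out of "WHERE c LIMIT n" already satisfies c.
            if (child instanceof Limit limit
                    && limit.child() instanceof Filter inner
                    && inner.condition().equals(filter.condition())) {
                return child;
            }
            return new Filter(filter.condition(), child);
        }
        if (plan instanceof Limit limit) {
            return new Limit(limit.n(), optimize(limit.child()));
        }
        return plan;
    }
}
```

An outer ORDER BY on different keys (say `salary` over an inner `ORDER BY emp_no LIMIT 10`) is kept, since it genuinely reorders the limited rows.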

Contributor Author

I just noticed that in this particular case, the query could even be flattened because the order is not specified in the inner query. This would be a somewhat surprising behavior but probably aligned with SQL's "unordered bag of tuples" semantics.

selectGroupByOrderByLimit
SELECT * FROM (SELECT max(salary) AS max, languages FROM test_emp GROUP BY languages) ORDER BY max DESC LIMIT 3;
selectGroupByOrderByLimitNulls
SELECT * FROM (SELECT max(salary) AS max, languages FROM test_emp GROUP BY languages) ORDER BY max DESC NULLS FIRST LIMIT 3;
- selectGroupByLimitOrderByLimit
- SELECT * FROM (SELECT max(salary) AS max, languages FROM test_emp WHERE languages IS NOT NULL GROUP BY languages LIMIT 5) ORDER BY max DESC LIMIT 3;
selectGroupByOrderByOrderByLimit
SELECT * FROM (SELECT max(salary) AS max, languages FROM test_emp GROUP BY languages ORDER BY max ASC) ORDER BY max DESC NULLS FIRST LIMIT 4;
selectGroupByOrderByOrderByLimitNulls
@@ -100,7 +100,16 @@ public LogicalPlan visitQueryNoWith(QueryNoWithContext ctx) {
if (ctx.orderBy().isEmpty() == false) {
List<OrderByContext> orders = ctx.orderBy();
OrderByContext endContext = orders.get(orders.size() - 1);
- plan = new OrderBy(source(ctx.ORDER(), endContext), plan, visitList(ctx.orderBy(), Order.class));
+ Source source = source(ctx.ORDER(), endContext);
+ List<Order> order = visitList(ctx.orderBy(), Order.class);
+
+ if (plan instanceof Limit) {
Member

👍

+ // Limit from TOP clauses must be the parent of the OrderBy clause
+ Limit limit = (Limit) plan;
+ plan = limit.replaceChild(new OrderBy(source, limit.child(), order));
+ } else {
+ plan = new OrderBy(source, plan, order);
+ }
}

LimitClauseContext limitClause = ctx.limitClause();
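The effect of this parser change can be sketched with hypothetical plan records (again not the real `Limit`/`OrderBy` classes): if the plan built so far is a LIMIT produced by TOP, the ORDER BY is slotted in underneath it rather than stacked on top, so `SELECT TOP 5 * FROM test ORDER BY int` sorts before it truncates.

```java
import java.util.List;

// Hypothetical plan nodes for illustration; not the Elasticsearch QL classes.
class TopOrderByPlacement {
    interface Plan {}
    record Relation(String name) implements Plan {}
    record OrderBy(List<String> keys, Plan child) implements Plan {}
    record Limit(int n, Plan child) implements Plan {}

    // Attaches ORDER BY to an already-built plan. A LIMIT coming from TOP
    // must stay the parent, so the sort is inserted beneath it.
    static Plan withOrderBy(Plan plan, List<String> keys) {
        if (plan instanceof Limit limit) {
            return new Limit(limit.n(), new OrderBy(keys, limit.child()));
        }
        return new OrderBy(keys, plan);
    }
}
```

Without the special case, TOP plus ORDER BY would produce ORDER BY over LIMIT, which is exactly the inner-LIMIT shape the new verifier check rejects.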
@@ -7,11 +7,14 @@
package org.elasticsearch.xpack.sql.planner;

import org.elasticsearch.xpack.ql.common.Failure;
- import org.elasticsearch.xpack.ql.expression.Order;
import org.elasticsearch.xpack.ql.util.Holder;
+ import org.elasticsearch.xpack.sql.plan.physical.AggregateExec;
+ import org.elasticsearch.xpack.sql.plan.physical.FilterExec;
import org.elasticsearch.xpack.sql.plan.physical.LimitExec;
import org.elasticsearch.xpack.sql.plan.physical.OrderExec;
import org.elasticsearch.xpack.sql.plan.physical.PhysicalPlan;
+ import org.elasticsearch.xpack.sql.plan.physical.PivotExec;
+ import org.elasticsearch.xpack.sql.plan.physical.UnaryExec;
import org.elasticsearch.xpack.sql.plan.physical.Unexecutable;
import org.elasticsearch.xpack.sql.plan.physical.UnplannedExec;

@@ -59,20 +62,23 @@ static List<Failure> verifyExecutingPlan(PhysicalPlan plan) {
}

private static void checkForNonCollapsableSubselects(PhysicalPlan plan, List<Failure> failures) {
- Holder<Boolean> hasLimit = new Holder<>(Boolean.FALSE);
- Holder<List<Order>> orderBy = new Holder<>();
+ Holder<LimitExec> limit = new Holder<>();
+ Holder<UnaryExec> limitedExec = new Holder<>();
+
plan.forEachUp(p -> {
- if (hasLimit.get() == false && p instanceof LimitExec) {
- hasLimit.set(Boolean.TRUE);
- return;
- }
- if (p instanceof OrderExec) {
- if (hasLimit.get() && orderBy.get() != null && ((OrderExec) p).order().equals(orderBy.get()) == false) {
Contributor

bpintea, Aug 3, 2021

Related to the comment on the removed query: can't we keep this "pass-through" check, i.e. allow chaining multiple equal orderings as long as they aren't combined with other clauses?

- failures.add(fail(p, "Cannot use ORDER BY on top of a subquery with ORDER BY and LIMIT"));
- } else {
- orderBy.set(((OrderExec) p).order());
+ if (limit.get() == null && p instanceof LimitExec) {
+ limit.set((LimitExec) p);
+ } else if (limit.get() != null && limitedExec.get() == null) {
+ if (p instanceof OrderExec || p instanceof FilterExec || p instanceof PivotExec || p instanceof AggregateExec) {
+ limitedExec.set((UnaryExec) p);
}
}
});
+
+ if (limitedExec.get() != null) {
+ failures.add(
+ fail(limit.get(), "LIMIT or TOP cannot be used in a subquery if outer query contains GROUP BY, ORDER BY, PIVOT or WHERE")
+ );
+ }
}
}
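The new check can be modelled on a chain of unary nodes. In this sketch `Kind` and `Node` are hypothetical stand-ins for physical plan nodes such as `LimitExec` and `OrderExec`, and only unary plans are modelled: walk the chain bottom-up (as `forEachUp` does), remember the innermost LIMIT, and fail as soon as an ORDER BY, WHERE, PIVOT or GROUP BY node shows up above it.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Simplified model of the verifier check; Kind and Node are hypothetical
// stand-ins for physical plan nodes such as LimitExec and OrderExec.
class InnerLimitCheck {
    enum Kind { RELATION, PROJECT, LIMIT, ORDER_BY, FILTER, PIVOT, AGGREGATE }

    record Node(Kind kind, Node child) {}

    // Outer operators that, per the error message, may not sit above a subquery LIMIT.
    private static final Set<Kind> REJECTED_ABOVE_LIMIT =
        EnumSet.of(Kind.ORDER_BY, Kind.FILTER, Kind.PIVOT, Kind.AGGREGATE);

    static final String MESSAGE =
        "LIMIT or TOP cannot be used in a subquery if outer query contains GROUP BY, ORDER BY, PIVOT or WHERE";

    // Leaf first, root last: the visiting order of a bottom-up walk on a unary chain.
    static List<Node> bottomUp(Node root) {
        List<Node> nodes = new ArrayList<>();
        for (Node n = root; n != null; n = n.child()) {
            nodes.add(0, n);
        }
        return nodes;
    }

    static Optional<String> check(Node root) {
        Node limit = null;
        for (Node p : bottomUp(root)) {
            if (limit == null && p.kind() == Kind.LIMIT) {
                limit = p; // innermost LIMIT (or TOP)
            } else if (limit != null && REJECTED_ABOVE_LIMIT.contains(p.kind())) {
                return Optional.of(MESSAGE);
            }
        }
        return Optional.empty();
    }
}
```

For `SELECT * FROM (SELECT * FROM test LIMIT 10) WHERE int = 1` the chain is FILTER over LIMIT and the check fails; for `SELECT * FROM test ORDER BY int LIMIT 5` the LIMIT is the root and the check passes, which is why the parser keeps a TOP-generated LIMIT above its ORDER BY.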
@@ -142,9 +142,14 @@ public void testTop() {
public void testUseBothTopAndLimitInvalid() {
ParsingException e = expectThrows(ParsingException.class, () -> parseStatement("SELECT TOP 10 * FROM test LIMIT 20"));
assertEquals("line 1:28: TOP and LIMIT are not allowed in the same query - use one or the other", e.getMessage());

e = expectThrows(ParsingException.class,
() -> parseStatement("SELECT TOP 30 a, count(*) cnt FROM test WHERE b = 20 GROUP BY a HAVING cnt > 10 LIMIT 40"));
assertEquals("line 1:82: TOP and LIMIT are not allowed in the same query - use one or the other", e.getMessage());

e = expectThrows(ParsingException.class,
() -> parseStatement("SELECT TOP 30 * FROM test ORDER BY a LIMIT 40"));
assertEquals("line 1:39: TOP and LIMIT are not allowed in the same query - use one or the other", e.getMessage());
}

public void testsSelectNonReservedKeywords() {
@@ -14,6 +14,8 @@
import org.elasticsearch.xpack.sql.analysis.analyzer.Verifier;
import org.elasticsearch.xpack.sql.expression.function.SqlFunctionRegistry;
import org.elasticsearch.xpack.sql.parser.SqlParser;
+ import org.elasticsearch.xpack.sql.plan.physical.EsQueryExec;
+ import org.elasticsearch.xpack.sql.plan.physical.PhysicalPlan;
import org.elasticsearch.xpack.sql.stats.Metrics;

import static org.elasticsearch.xpack.sql.SqlTestUtils.TEST_CFG;
@@ -33,10 +35,14 @@ public class VerifierTests extends ESTestCase {
);
private final Planner planner = new Planner();

+ private PhysicalPlan verify(String sql) {
+ return planner.plan(analyzer.analyze(parser.createStatement(sql), true), true);
+ }
+
private String error(String sql) {
PlanningException e = expectThrows(
PlanningException.class,
- () -> planner.plan(analyzer.analyze(parser.createStatement(sql), true), true)
+ () -> verify(sql)
);
String message = e.getMessage();
assertTrue(message.startsWith("Found "));
@@ -45,21 +51,28 @@ private String error(String sql) {
return message.substring(index + pattern.length());
}

+ private String innerLimitMsg(int line, int column) {
+ return line
+ + ":"
+ + column
+ + ": LIMIT or TOP cannot be used in a subquery if outer query contains GROUP BY, ORDER BY, PIVOT or WHERE";
+ }
+
public void testSubselectWithOrderByOnTopOfOrderByAndLimit() {
assertEquals(
- "1:60: Cannot use ORDER BY on top of a subquery with ORDER BY and LIMIT",
+ innerLimitMsg(1, 50),
error("SELECT * FROM (SELECT * FROM test ORDER BY 1 ASC LIMIT 10) ORDER BY 2")
);
assertEquals(
- "1:72: Cannot use ORDER BY on top of a subquery with ORDER BY and LIMIT",
+ innerLimitMsg(1, 50),
error("SELECT * FROM (SELECT * FROM (SELECT * FROM test LIMIT 10) ORDER BY 1) ORDER BY 2")
);
assertEquals(
- "1:75: Cannot use ORDER BY on top of a subquery with ORDER BY and LIMIT",
+ innerLimitMsg(1, 66),
error("SELECT * FROM (SELECT * FROM (SELECT * FROM test ORDER BY 1 ASC) LIMIT 5) ORDER BY 1 DESC")
);
assertEquals(
- "1:152: Cannot use ORDER BY on top of a subquery with ORDER BY and LIMIT",
+ innerLimitMsg(1, 142),
error("SELECT * FROM (" +
"SELECT * FROM (" +
"SELECT * FROM (" +
@@ -68,17 +81,21 @@ public void testSubselectWithOrderByOnTopOfOrderByAndLimit() {
"ORDER BY int DESC NULLS LAST LIMIT 12) " +
"ORDER BY int DESC NULLS FIRST")
);
+ assertEquals(
+ innerLimitMsg(1, 50),
+ error("SELECT * FROM (SELECT * FROM (SELECT * FROM test LIMIT 10) ORDER BY 1 LIMIT 20) ORDER BY 2")
+ );
}

public void testSubselectWithOrderByOnTopOfGroupByOrderByAndLimit() {
assertEquals(
- "1:96: Cannot use ORDER BY on top of a subquery with ORDER BY and LIMIT",
+ innerLimitMsg(1, 86),
error(
"SELECT * FROM (SELECT max(int) AS max, bool FROM test GROUP BY bool ORDER BY max ASC LIMIT 10) ORDER BY max DESC"
)
);
assertEquals(
- "1:112: Cannot use ORDER BY on top of a subquery with ORDER BY and LIMIT",
+ innerLimitMsg(1, 102),
error(
"SELECT * FROM ("
+ "SELECT * FROM ("
@@ -88,7 +105,7 @@ public void testSubselectWithOrderByOnTopOfGroupByOrderByAndLimit() {
)
);
assertEquals(
- "1:186: Cannot use ORDER BY on top of a subquery with ORDER BY and LIMIT",
+ innerLimitMsg(1, 176),
error("SELECT * FROM (" +
"SELECT * FROM (" +
"SELECT * FROM (" +
@@ -98,4 +115,41 @@ public void testSubselectWithOrderByOnTopOfGroupByOrderByAndLimit() {
"ORDER BY max DESC NULLS FIRST")
);
}
+
+ public void testInnerLimitWithWhere() {
+ assertEquals(innerLimitMsg(1, 35),
+ error("SELECT * FROM (SELECT * FROM test LIMIT 10) WHERE int = 1"));
+ assertEquals(innerLimitMsg(1, 50),
+ error("SELECT * FROM (SELECT * FROM (SELECT * FROM test LIMIT 10)) WHERE int = 1"));
+ assertEquals(innerLimitMsg(1, 51),
+ error("SELECT * FROM (SELECT * FROM (SELECT * FROM test) LIMIT 10) WHERE int = 1"));
+ }
+
+ public void testInnerLimitWithGroupBy() {
+ assertEquals(innerLimitMsg(1, 37),
+ error("SELECT int FROM (SELECT * FROM test LIMIT 10) GROUP BY int"));
+ assertEquals(innerLimitMsg(1, 52),
+ error("SELECT int FROM (SELECT * FROM (SELECT * FROM test LIMIT 10)) GROUP BY int"));
+ assertEquals(innerLimitMsg(1, 53),
+ error("SELECT int FROM (SELECT * FROM (SELECT * FROM test) LIMIT 10) GROUP BY int"));
+ }
+
+ public void testInnerLimitWithPivot() {
+ assertEquals(innerLimitMsg(1, 52),
+ error("SELECT * FROM (SELECT int, bool, keyword FROM test LIMIT 10) PIVOT (AVG(int) FOR bool IN (true, false))"));
+ }
+
+ public void testTopWithOrderBySucceeds() {
+ PhysicalPlan plan = verify("SELECT TOP 5 * FROM test ORDER BY int");
+ assertEquals(EsQueryExec.class, plan.getClass());
+ }
+
+ public void testInnerTop() {
+ assertEquals(innerLimitMsg(1, 23),
+ error("SELECT * FROM (SELECT TOP 10 * FROM test) WHERE int = 1"));
+ assertEquals(innerLimitMsg(1, 23),
+ error("SELECT * FROM (SELECT TOP 10 * FROM test) ORDER BY int"));
+ assertEquals(innerLimitMsg(1, 23),
+ error("SELECT * FROM (SELECT TOP 10 int, bool, keyword FROM test) PIVOT (AVG(int) FOR bool IN (true, false))"));
+ }
}
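In the single-line test queries checked here, the column passed to `innerLimitMsg` appears to be the 1-based position of the first LIMIT or TOP keyword, which is consistent with the verifier reporting the failure against the innermost `LimitExec`. The hypothetical helper below (not part of the test class) makes that arithmetic explicit; it is a naive substring search, so it would misfire on identifiers or literals that contain the keyword.

```java
// Hypothetical helper for checking expected error columns; not part of VerifierTests.
class ErrorColumn {
    // 1-based column of the first occurrence of `keyword` in a single-line query,
    // or -1 if it does not occur. Ignores word boundaries, quoting and comments.
    static int column(String sql, String keyword) {
        int index = sql.indexOf(keyword);
        return index < 0 ? -1 : index + 1;
    }
}
```

For example, in `SELECT * FROM (SELECT * FROM test LIMIT 10) WHERE int = 1` the inner LIMIT starts at column 35, matching `innerLimitMsg(1, 35)`.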