-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
opt: hoist uncorrelated scalar subquery with equality #83392
Comments
However, in your |
Spell it out to me more why we can decorrelate the |
Does it have to do with the need to assert that the subquery does not return more than 1 row? |
I was wrong about this.
As long as we use a |
Today, we only perform decorrelation for correlated subqueries. Each subquery example here is not correlated, and so we execute it using a subquery rather than a left join. If we decide to change our execution strategy here, we'll need to be careful that we don't regress performance, as there may be cases where the subquery plan is better than the left join plan. @mgartner, isn't this a dup of #51820 and friends? A bit of history: I deliberately didn't use left join for uncorrelated subqueries b/c I was afraid of regressing perf from the heuristic planner. I didn't have the same concern about correlated subqueries, b/c the heuristic planner didn't support those, so there was no possible backwards-compat perf problem. |
But we do decorrelate the subquery into a join in the |
I believe it's because While you know that the two cases are the same in this case b/c |
I think this is where the FWIW, I think it's pretty important to optimize this form of query. I think we got to it from a TPCC query, but now I'm not sure. In any case, I've definitely seen people write this query. |
The optimizer would have no trouble turning
Notice that the |
I agree with @andy-kimball. I think the best solution would be #51820, rather than trying to determine when it is optimal to transform an uncorrelated subquery into a join. Although, lazy evaluation of subqueries #20298 is also a desired featured, and it's hard to imagine that playing nicely with any solution for #51820. Regardless, I think we can close this in favor of #51820. |
I've thought about this a bit more, and I'm now fairly convinced that hoisting uncorrelated subqueries should mostly lead to query plans that are no slower than equivalent plans with unhoisted, uncorrelated subqueries. Hoisting a subquery transforms it into an expression with an apply-join. The code comment @andy-kimball quoted above is pointing out that subquery execution is faster than apply-join execution because the results of the subquery are cached and can be scanned multiple times. So, hoisting an uncorrelated subquery would only lead to a worse plan if the resulting apply-join executed the RHS more than once or the apply-join was not reduced to a regular join. Apply-joins resulting from hoisting uncorrelated subqueries are usually uncorrelated as well, because the source subquery was. If the apply-join is uncorrelated, it is trivially reduced to a regular join. This is the most common outcome of hoisting an uncorrelated subquery, and the performance would be improved because the subquery has been removed and no apply-join remains, likely enabling further optimizations (like using a lookup join or merge join with the The
Here's an example showing the inner-apply-join that remains after
This is a rare case, though, and I think it's unlikely to regress performance significantly. The resulting apply-join should execute the RHS only once, because the cardinality of the LHS is 1: the LHS produces exactly one row from a single-row values expression left-cross-joined with the subquery input, which produces 0 or 1 rows. Since the RHS of the apply-join is executed only once, the runtime should be similar to executing it as a subquery - there is no benefit of a cache from a single invocation. However, plans with subqueries can be distributed, while plans with apply-joins cannot. In summary, if we hoist uncorrelated subqueries, I think we'd get better query plans in the majority of cases. In the rare case that To be safe, we can add a session setting that customers can use to revert to the previous behavior. If we think the |
While working on #98375, I did notice that for |
Hey @mgartner, do we have a timeline for a fix?/release I notice that these two issues have already been opened for some time. |
I'm going to try to enabling hoisting of a subset of these types subqueries, behind a session setting so we can backport it. to 22.2 and 23.1. |
Subqueries that are in equality expressions with a variable are now hoisted. When these expressions exist in a filter, hoisting the subquery can allow the main query to plan a lookup join, rather than an inefficient full-table scan. For example, consider the table and query: CREATE TABLE t ( a INT, INDEX (a) ); SELECT * FROM t WHERE a = (SELECT max(a) FROM t); Prior to this commit, the query plan for this query required a full table scan: select ├── columns: a:1 ├── scan t@t_a_idx │ ├── columns: a:1 │ └── constraint: /1/2: (/NULL - ] └── filters └── eq ├── a:1 └── subquery └── scalar-group-by ├── columns: max:9 ├── scan t@t_a_idx,rev │ ├── columns: a:5 │ ├── constraint: /5/6: (/NULL - ] │ └── limit: 1(rev) └── aggregations └── const-agg [as=max:9, outer=(5)] └── a:5 By hoisting the subquery, the full table scan is replaced with a lookup join: project ├── columns: a:1 └── inner-join (lookup t@t_a_idx) ├── columns: a:1 max:9 ├── key columns: [9] = [1] ├── scalar-group-by │ ├── columns: max:9 │ ├── scan t@t_a_idx,rev │ │ ├── columns: a:5 │ │ ├── constraint: /5/6: (/NULL - ] │ │ └── limit: 1(rev) │ └── aggregations │ └── const-agg [as=max:9, outer=(5)] │ └── a:5 └── filters (true) This hoisting is enabled by default, but can be disabled by setting the `optimizer_hoist_uncorrelated_equality_subqueries` session setting to `false`. Fixes cockroachdb#83392 Informs cockroachdb#51820 Informs cockroachdb#93829 Informs cockroachdb#100855 Release note (performance improvement): Queries that have subqueries in equality expressions are now more efficiently planned by the optimizer.
100881: opt: hoist uncorrelated equality subqueries r=mgartner a=mgartner #### opt: hoist uncorrelated equality subqueries Subqueries that are in equality expressions with a variable are now hoisted. When these expressions exist in a filter, hoisting the subquery can allow the main query to plan a lookup join, rather than an inefficient full-table scan. For example, consider the table and query: CREATE TABLE t ( a INT, INDEX (a) ); SELECT * FROM t WHERE a = (SELECT max(a) FROM t); Prior to this commit, the query plan for this query required a full table scan: select ├── columns: a:1 ├── scan t@t_a_idx │ ├── columns: a:1 │ └── constraint: /1/2: (/NULL - ] └── filters └── eq ├── a:1 └── subquery └── scalar-group-by ├── columns: max:9 ├── scan t@t_a_idx,rev │ ├── columns: a:5 │ ├── constraint: /5/6: (/NULL - ] │ └── limit: 1(rev) └── aggregations └── const-agg [as=max:9, outer=(5)] └── a:5 By hoisting the subquery, the full table scan is replaced with a lookup join: project ├── columns: a:1 └── inner-join (lookup t@t_a_idx) ├── columns: a:1 max:9 ├── key columns: [9] = [1] ├── scalar-group-by │ ├── columns: max:9 │ ├── scan t@t_a_idx,rev │ │ ├── columns: a:5 │ │ ├── constraint: /5/6: (/NULL - ] │ │ └── limit: 1(rev) │ └── aggregations │ └── const-agg [as=max:9, outer=(5)] │ └── a:5 └── filters (true) This hoisting is enabled by default, but can be disabled by setting the `optimizer_hoist_uncorrelated_equality_subqueries` session setting to `false`. Fixes #83392 Informs #51820 Informs #93829 Informs #100855 Release note (performance improvement): Queries that have subqueries in equality expressions are now more efficiently planned by the optimizer. Co-authored-by: Marcus Gartner <[email protected]>
Subqueries that are in equality expressions with a variable are now hoisted. When these expressions exist in a filter, hoisting the subquery can allow the main query to plan a lookup join, rather than an inefficient full-table scan. For example, consider the table and query: CREATE TABLE t ( a INT, INDEX (a) ); SELECT * FROM t WHERE a = (SELECT max(a) FROM t); Prior to this commit, the query plan for this query required a full table scan: select ├── columns: a:1 ├── scan t@t_a_idx │ ├── columns: a:1 │ └── constraint: /1/2: (/NULL - ] └── filters └── eq ├── a:1 └── subquery └── scalar-group-by ├── columns: max:9 ├── scan t@t_a_idx,rev │ ├── columns: a:5 │ ├── constraint: /5/6: (/NULL - ] │ └── limit: 1(rev) └── aggregations └── const-agg [as=max:9, outer=(5)] └── a:5 By hoisting the subquery, the full table scan is replaced with a lookup join: project ├── columns: a:1 └── inner-join (lookup t@t_a_idx) ├── columns: a:1 max:9 ├── key columns: [9] = [1] ├── scalar-group-by │ ├── columns: max:9 │ ├── scan t@t_a_idx,rev │ │ ├── columns: a:5 │ │ ├── constraint: /5/6: (/NULL - ] │ │ └── limit: 1(rev) │ └── aggregations │ └── const-agg [as=max:9, outer=(5)] │ └── a:5 └── filters (true) This hoisting is enabled by default, but can be disabled by setting the `optimizer_hoist_uncorrelated_equality_subqueries` session setting to `false`. Fixes #83392 Informs #51820 Informs #93829 Informs #100855 Release note (performance improvement): Queries that have subqueries in equality expressions are now more efficiently planned by the optimizer.
Subqueries that are in equality expressions with a variable are now hoisted. When these expressions exist in a filter, hoisting the subquery can allow the main query to plan a lookup join, rather than an inefficient full-table scan. For example, consider the table and query: CREATE TABLE t ( a INT, INDEX (a) ); SELECT * FROM t WHERE a = (SELECT max(a) FROM t); Prior to this commit, the query plan for this query required a full table scan: select ├── columns: a:1 ├── scan t@t_a_idx │ ├── columns: a:1 │ └── constraint: /1/2: (/NULL - ] └── filters └── eq ├── a:1 └── subquery └── scalar-group-by ├── columns: max:9 ├── scan t@t_a_idx,rev │ ├── columns: a:5 │ ├── constraint: /5/6: (/NULL - ] │ └── limit: 1(rev) └── aggregations └── const-agg [as=max:9, outer=(5)] └── a:5 By hoisting the subquery, the full table scan is replaced with a lookup join: project ├── columns: a:1 └── inner-join (lookup t@t_a_idx) ├── columns: a:1 max:9 ├── key columns: [9] = [1] ├── scalar-group-by │ ├── columns: max:9 │ ├── scan t@t_a_idx,rev │ │ ├── columns: a:5 │ │ ├── constraint: /5/6: (/NULL - ] │ │ └── limit: 1(rev) │ └── aggregations │ └── const-agg [as=max:9, outer=(5)] │ └── a:5 └── filters (true) This hoisting is enabled by default, but can be disabled by setting the `optimizer_hoist_uncorrelated_equality_subqueries` session setting to `false`. Fixes cockroachdb#83392 Informs cockroachdb#51820 Informs cockroachdb#93829 Informs cockroachdb#100855 Release note (performance improvement): Queries that have subqueries in equality expressions are now more efficiently planned by the optimizer.
Subqueries that are in equality expressions with a variable are now hoisted. When these expressions exist in a filter, hoisting the subquery can allow the main query to plan a lookup join, rather than an inefficient full-table scan. For example, consider the table and query: CREATE TABLE t ( a INT, INDEX (a) ); SELECT * FROM t WHERE a = (SELECT max(a) FROM t); Prior to this commit, the query plan for this query required a full table scan: select ├── columns: a:1 ├── scan t@t_a_idx │ ├── columns: a:1 │ └── constraint: /1/2: (/NULL - ] └── filters └── eq ├── a:1 └── subquery └── scalar-group-by ├── columns: max:9 ├── scan t@t_a_idx,rev │ ├── columns: a:5 │ ├── constraint: /5/6: (/NULL - ] │ └── limit: 1(rev) └── aggregations └── const-agg [as=max:9, outer=(5)] └── a:5 By hoisting the subquery, the full table scan is replaced with a lookup join: project ├── columns: a:1 └── inner-join (lookup t@t_a_idx) ├── columns: a:1 max:9 ├── key columns: [9] = [1] ├── scalar-group-by │ ├── columns: max:9 │ ├── scan t@t_a_idx,rev │ │ ├── columns: a:5 │ │ ├── constraint: /5/6: (/NULL - ] │ │ └── limit: 1(rev) │ └── aggregations │ └── const-agg [as=max:9, outer=(5)] │ └── a:5 └── filters (true) This hoisting is disabled by default, but can be enabled by setting the `optimizer_hoist_uncorrelated_equality_subqueries` session setting to `true`. Fixes cockroachdb#83392 Informs cockroachdb#51820 Informs cockroachdb#93829 Informs cockroachdb#100855 Release note (performance improvement): Queries that have subqueries in equality expressions are now more efficiently planned by the optimizer when `optimizer_hoist_uncorrelated_equality_subqueries` is set to `true`.
Subqueries that are in equality expressions with a variable are now hoisted. When these expressions exist in a filter, hoisting the subquery can allow the main query to plan a lookup join, rather than an inefficient full-table scan. For example, consider the table and query: CREATE TABLE t ( a INT, INDEX (a) ); SELECT * FROM t WHERE a = (SELECT max(a) FROM t); Prior to this commit, the query plan for this query required a full table scan: select ├── columns: a:1 ├── scan t@t_a_idx │ ├── columns: a:1 │ └── constraint: /1/2: (/NULL - ] └── filters └── eq ├── a:1 └── subquery └── scalar-group-by ├── columns: max:9 ├── scan t@t_a_idx,rev │ ├── columns: a:5 │ ├── constraint: /5/6: (/NULL - ] │ └── limit: 1(rev) └── aggregations └── const-agg [as=max:9, outer=(5)] └── a:5 By hoisting the subquery, the full table scan is replaced with a lookup join: project ├── columns: a:1 └── inner-join (lookup t@t_a_idx) ├── columns: a:1 max:9 ├── key columns: [9] = [1] ├── scalar-group-by │ ├── columns: max:9 │ ├── scan t@t_a_idx,rev │ │ ├── columns: a:5 │ │ ├── constraint: /5/6: (/NULL - ] │ │ └── limit: 1(rev) │ └── aggregations │ └── const-agg [as=max:9, outer=(5)] │ └── a:5 └── filters (true) This hoisting is disabled by default, but can be enabled by setting the `optimizer_hoist_uncorrelated_equality_subqueries` session setting to `true`. Fixes cockroachdb#83392 Informs cockroachdb#51820 Informs cockroachdb#93829 Informs cockroachdb#100855 Release note (performance improvement): Queries that have subqueries in equality expressions are now more efficiently planned by the optimizer when `optimizer_hoist_uncorrelated_equality_subqueries` is set to `true`.
Subqueries that are in equality expressions with a variable are now hoisted. When these expressions exist in a filter, hoisting the subquery can allow the main query to plan a lookup join, rather than an inefficient full-table scan. For example, consider the table and query: CREATE TABLE t ( a INT, INDEX (a) ); SELECT * FROM t WHERE a = (SELECT max(a) FROM t); Prior to this commit, the query plan for this query required a full table scan: select ├── columns: a:1 ├── scan t@t_a_idx │ ├── columns: a:1 │ └── constraint: /1/2: (/NULL - ] └── filters └── eq ├── a:1 └── subquery └── scalar-group-by ├── columns: max:9 ├── scan t@t_a_idx,rev │ ├── columns: a:5 │ ├── constraint: /5/6: (/NULL - ] │ └── limit: 1(rev) └── aggregations └── const-agg [as=max:9, outer=(5)] └── a:5 By hoisting the subquery, the full table scan is replaced with a lookup join: project ├── columns: a:1 └── inner-join (lookup t@t_a_idx) ├── columns: a:1 max:9 ├── key columns: [9] = [1] ├── scalar-group-by │ ├── columns: max:9 │ ├── scan t@t_a_idx,rev │ │ ├── columns: a:5 │ │ ├── constraint: /5/6: (/NULL - ] │ │ └── limit: 1(rev) │ └── aggregations │ └── const-agg [as=max:9, outer=(5)] │ └── a:5 └── filters (true) This hoisting is disabled by default, but can be enabled by setting the `optimizer_hoist_uncorrelated_equality_subqueries` session setting to `true`. Fixes #83392 Informs #51820 Informs #93829 Informs #100855 Release note (performance improvement): Queries that have subqueries in equality expressions are now more efficiently planned by the optimizer when `optimizer_hoist_uncorrelated_equality_subqueries` is set to `true`.
Subqueries that are in equality expressions with a variable are now hoisted. When these expressions exist in a filter, hoisting the subquery can allow the main query to plan a lookup join, rather than an inefficient full-table scan. For example, consider the table and query: CREATE TABLE t ( a INT, INDEX (a) ); SELECT * FROM t WHERE a = (SELECT max(a) FROM t); Prior to this commit, the query plan for this query required a full table scan: select ├── columns: a:1 ├── scan t@t_a_idx │ ├── columns: a:1 │ └── constraint: /1/2: (/NULL - ] └── filters └── eq ├── a:1 └── subquery └── scalar-group-by ├── columns: max:9 ├── scan t@t_a_idx,rev │ ├── columns: a:5 │ ├── constraint: /5/6: (/NULL - ] │ └── limit: 1(rev) └── aggregations └── const-agg [as=max:9, outer=(5)] └── a:5 By hoisting the subquery, the full table scan is replaced with a lookup join: project ├── columns: a:1 └── inner-join (lookup t@t_a_idx) ├── columns: a:1 max:9 ├── key columns: [9] = [1] ├── scalar-group-by │ ├── columns: max:9 │ ├── scan t@t_a_idx,rev │ │ ├── columns: a:5 │ │ ├── constraint: /5/6: (/NULL - ] │ │ └── limit: 1(rev) │ └── aggregations │ └── const-agg [as=max:9, outer=(5)] │ └── a:5 └── filters (true) This hoisting is disabled by default, but can be enabled by setting the `optimizer_hoist_uncorrelated_equality_subqueries` session setting to `true`. Fixes cockroachdb#83392 Informs cockroachdb#51820 Informs cockroachdb#93829 Informs cockroachdb#100855 Release note (performance improvement): Queries that have subqueries in equality expressions are now more efficiently planned by the optimizer when `optimizer_hoist_uncorrelated_equality_subqueries` is set to `true`.
Subqueries that are in equality expressions with a variable are now hoisted. When these expressions exist in a filter, hoisting the subquery can allow the main query to plan a lookup join, rather than an inefficient full-table scan. For example, consider the table and query: CREATE TABLE t ( a INT, INDEX (a) ); SELECT * FROM t WHERE a = (SELECT max(a) FROM t); Prior to this commit, the query plan for this query required a full table scan: select ├── columns: a:1 ├── scan t@t_a_idx │ ├── columns: a:1 │ └── constraint: /1/2: (/NULL - ] └── filters └── eq ├── a:1 └── subquery └── scalar-group-by ├── columns: max:9 ├── scan t@t_a_idx,rev │ ├── columns: a:5 │ ├── constraint: /5/6: (/NULL - ] │ └── limit: 1(rev) └── aggregations └── const-agg [as=max:9, outer=(5)] └── a:5 By hoisting the subquery, the full table scan is replaced with a lookup join: project ├── columns: a:1 └── inner-join (lookup t@t_a_idx) ├── columns: a:1 max:9 ├── key columns: [9] = [1] ├── scalar-group-by │ ├── columns: max:9 │ ├── scan t@t_a_idx,rev │ │ ├── columns: a:5 │ │ ├── constraint: /5/6: (/NULL - ] │ │ └── limit: 1(rev) │ └── aggregations │ └── const-agg [as=max:9, outer=(5)] │ └── a:5 └── filters (true) This hoisting is disabled by default, but can be enabled by setting the `optimizer_hoist_uncorrelated_equality_subqueries` session setting to `true`. Fixes cockroachdb#83392 Informs cockroachdb#51820 Informs cockroachdb#93829 Informs cockroachdb#100855 Release note (performance improvement): Queries that have subqueries in equality expressions are now more efficiently planned by the optimizer when `optimizer_hoist_uncorrelated_equality_subqueries` is set to `true`.
The following query plan shows a simple point lookup:
When a subquery is used in an IN predicate, we can decorrelate the subquery and turn the predicate into a join. Arguably the join is redundant and we could eliminate it in this case because the tables on both sides are the same, but this is good enough.
However, when a subquery is used in an equality predicate, we can't decorrelate the subquery. Instead, we evaluate it, then perform a full table scan, then filter the result of that scan using the result of the subquery.
This is unfortunate and may look artificial, but while flying over the rocky mountains, @andreimatei and I found this by looking at real queries, like the following:
Jira issue: CRDB-17044
The text was updated successfully, but these errors were encountered: