-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sql: investigate using apply joins to execute subqueries #93829
Comments
This is a really clever idea. This would address #51820. Though, if we hoist uncorrelated subqueries, as described in #83392 (comment), we'll get better query plans in many cases without needing this. For example: CREATE TABLE a (a INT PRIMARY KEY)
CREATE TABLE b (b INT PRIMARY KEY)
SELECT * FROM a WHERE a = (SELECT b FROM b LIMIT 1);
Currently this requires a full-table scan:
With hoisting uncorrelated subqueries:
And with recent improvements to lookup joins in v22.2, we see better query plans for queries like: SELECT * FROM a WHERE a > (SELECT b FROM b LIMIT 1) ORDER BY a LIMIT 1; Currently this requires a full-table scan:
With hoisting uncorrelated subqueries:
There are some cases where we can't hoist uncorrelated subqueries (e.g., subqueries in a branch of a |
Cool! So what exactly is the difference in your approach? Is it that the subquery actually becomes the right side of the apply join rather than the left (and since it's actually uncorrelated, it can be converted to regular join)? I wonder if it's worth using an exploration rule to try both approaches? I could imagine some cases where we'd really benefit from being able to reoptimize the main query. But you're also probably right that recent improvements to lookup join have made this situation much better, and maybe there aren't so many cases like that now. |
Exactly!
Ya it might be good to try both. I think either on it's own would be an improvement because we don't do any sort of optimization at all. Here's a case that shows the query plan resulting from hoisting two uncorrelated subqueries. Using the approach described above (moving the subqueries to the LHS of apply-joins and the main query to the RHS) would probably lead to a better query plan with a constrained scan bounded by the min and max
|
Nice! Sounds like it's worth investigating, anyway. |
Subqueries that are in equality expressions with a variable are now hoisted. When these expressions exist in a filter, hoisting the subquery can allow the main query to plan a lookup join, rather than an inefficient full-table scan. For example, consider the table and query: CREATE TABLE t ( a INT, INDEX (a) ); SELECT * FROM t WHERE a = (SELECT max(a) FROM t); Prior to this commit, the query plan for this query required a full table scan: select ├── columns: a:1 ├── scan t@t_a_idx │ ├── columns: a:1 │ └── constraint: /1/2: (/NULL - ] └── filters └── eq ├── a:1 └── subquery └── scalar-group-by ├── columns: max:9 ├── scan t@t_a_idx,rev │ ├── columns: a:5 │ ├── constraint: /5/6: (/NULL - ] │ └── limit: 1(rev) └── aggregations └── const-agg [as=max:9, outer=(5)] └── a:5 By hoisting the subquery, the full table scan is replaced with a lookup join: project ├── columns: a:1 └── inner-join (lookup t@t_a_idx) ├── columns: a:1 max:9 ├── key columns: [9] = [1] ├── scalar-group-by │ ├── columns: max:9 │ ├── scan t@t_a_idx,rev │ │ ├── columns: a:5 │ │ ├── constraint: /5/6: (/NULL - ] │ │ └── limit: 1(rev) │ └── aggregations │ └── const-agg [as=max:9, outer=(5)] │ └── a:5 └── filters (true) This hoisting is enabled by default, but can be disabled by setting the `optimizer_hoist_uncorrelated_equality_subqueries` session setting to `false`. Fixes cockroachdb#83392 Informs cockroachdb#51820 Informs cockroachdb#93829 Informs cockroachdb#100855 Release note (performance improvement): Queries that have subqueries in equality expressions are now more efficiently planned by the optimizer.
Subqueries that are in equality expressions with a variable are now hoisted. When these expressions exist in a filter, hoisting the subquery can allow the main query to plan a lookup join, rather than an inefficient full-table scan. For example, consider the table and query: CREATE TABLE t ( a INT, INDEX (a) ); SELECT * FROM t WHERE a = (SELECT max(a) FROM t); Prior to this commit, the query plan for this query required a full table scan: select ├── columns: a:1 ├── scan t@t_a_idx │ ├── columns: a:1 │ └── constraint: /1/2: (/NULL - ] └── filters └── eq ├── a:1 └── subquery └── scalar-group-by ├── columns: max:9 ├── scan t@t_a_idx,rev │ ├── columns: a:5 │ ├── constraint: /5/6: (/NULL - ] │ └── limit: 1(rev) └── aggregations └── const-agg [as=max:9, outer=(5)] └── a:5 By hoisting the subquery, the full table scan is replaced with a lookup join: project ├── columns: a:1 └── inner-join (lookup t@t_a_idx) ├── columns: a:1 max:9 ├── key columns: [9] = [1] ├── scalar-group-by │ ├── columns: max:9 │ ├── scan t@t_a_idx,rev │ │ ├── columns: a:5 │ │ ├── constraint: /5/6: (/NULL - ] │ │ └── limit: 1(rev) │ └── aggregations │ └── const-agg [as=max:9, outer=(5)] │ └── a:5 └── filters (true) This hoisting is enabled by default, but can be disabled by setting the `optimizer_hoist_uncorrelated_equality_subqueries` session setting to `false`. Fixes cockroachdb#83392 Informs cockroachdb#51820 Informs cockroachdb#93829 Informs cockroachdb#100855 Release note (performance improvement): Queries that have subqueries in equality expressions are now more efficiently planned by the optimizer.
100881: opt: hoist uncorrelated equality subqueries r=mgartner a=mgartner #### opt: hoist uncorrelated equality subqueries Subqueries that are in equality expressions with a variable are now hoisted. When these expressions exist in a filter, hoisting the subquery can allow the main query to plan a lookup join, rather than an inefficient full-table scan. For example, consider the table and query: CREATE TABLE t ( a INT, INDEX (a) ); SELECT * FROM t WHERE a = (SELECT max(a) FROM t); Prior to this commit, the query plan for this query required a full table scan: select ├── columns: a:1 ├── scan t@t_a_idx │ ├── columns: a:1 │ └── constraint: /1/2: (/NULL - ] └── filters └── eq ├── a:1 └── subquery └── scalar-group-by ├── columns: max:9 ├── scan t@t_a_idx,rev │ ├── columns: a:5 │ ├── constraint: /5/6: (/NULL - ] │ └── limit: 1(rev) └── aggregations └── const-agg [as=max:9, outer=(5)] └── a:5 By hoisting the subquery, the full table scan is replaced with a lookup join: project ├── columns: a:1 └── inner-join (lookup t@t_a_idx) ├── columns: a:1 max:9 ├── key columns: [9] = [1] ├── scalar-group-by │ ├── columns: max:9 │ ├── scan t@t_a_idx,rev │ │ ├── columns: a:5 │ │ ├── constraint: /5/6: (/NULL - ] │ │ └── limit: 1(rev) │ └── aggregations │ └── const-agg [as=max:9, outer=(5)] │ └── a:5 └── filters (true) This hoisting is enabled by default, but can be disabled by setting the `optimizer_hoist_uncorrelated_equality_subqueries` session setting to `false`. Fixes #83392 Informs #51820 Informs #93829 Informs #100855 Release note (performance improvement): Queries that have subqueries in equality expressions are now more efficiently planned by the optimizer. Co-authored-by: Marcus Gartner <[email protected]>
Subqueries that are in equality expressions with a variable are now hoisted. When these expressions exist in a filter, hoisting the subquery can allow the main query to plan a lookup join, rather than an inefficient full-table scan. For example, consider the table and query: CREATE TABLE t ( a INT, INDEX (a) ); SELECT * FROM t WHERE a = (SELECT max(a) FROM t); Prior to this commit, the query plan for this query required a full table scan: select ├── columns: a:1 ├── scan t@t_a_idx │ ├── columns: a:1 │ └── constraint: /1/2: (/NULL - ] └── filters └── eq ├── a:1 └── subquery └── scalar-group-by ├── columns: max:9 ├── scan t@t_a_idx,rev │ ├── columns: a:5 │ ├── constraint: /5/6: (/NULL - ] │ └── limit: 1(rev) └── aggregations └── const-agg [as=max:9, outer=(5)] └── a:5 By hoisting the subquery, the full table scan is replaced with a lookup join: project ├── columns: a:1 └── inner-join (lookup t@t_a_idx) ├── columns: a:1 max:9 ├── key columns: [9] = [1] ├── scalar-group-by │ ├── columns: max:9 │ ├── scan t@t_a_idx,rev │ │ ├── columns: a:5 │ │ ├── constraint: /5/6: (/NULL - ] │ │ └── limit: 1(rev) │ └── aggregations │ └── const-agg [as=max:9, outer=(5)] │ └── a:5 └── filters (true) This hoisting is enabled by default, but can be disabled by setting the `optimizer_hoist_uncorrelated_equality_subqueries` session setting to `false`. Fixes #83392 Informs #51820 Informs #93829 Informs #100855 Release note (performance improvement): Queries that have subqueries in equality expressions are now more efficiently planned by the optimizer.
Subqueries that are in equality expressions with a variable are now hoisted. When these expressions exist in a filter, hoisting the subquery can allow the main query to plan a lookup join, rather than an inefficient full-table scan. For example, consider the table and query: CREATE TABLE t ( a INT, INDEX (a) ); SELECT * FROM t WHERE a = (SELECT max(a) FROM t); Prior to this commit, the query plan for this query required a full table scan: select ├── columns: a:1 ├── scan t@t_a_idx │ ├── columns: a:1 │ └── constraint: /1/2: (/NULL - ] └── filters └── eq ├── a:1 └── subquery └── scalar-group-by ├── columns: max:9 ├── scan t@t_a_idx,rev │ ├── columns: a:5 │ ├── constraint: /5/6: (/NULL - ] │ └── limit: 1(rev) └── aggregations └── const-agg [as=max:9, outer=(5)] └── a:5 By hoisting the subquery, the full table scan is replaced with a lookup join: project ├── columns: a:1 └── inner-join (lookup t@t_a_idx) ├── columns: a:1 max:9 ├── key columns: [9] = [1] ├── scalar-group-by │ ├── columns: max:9 │ ├── scan t@t_a_idx,rev │ │ ├── columns: a:5 │ │ ├── constraint: /5/6: (/NULL - ] │ │ └── limit: 1(rev) │ └── aggregations │ └── const-agg [as=max:9, outer=(5)] │ └── a:5 └── filters (true) This hoisting is enabled by default, but can be disabled by setting the `optimizer_hoist_uncorrelated_equality_subqueries` session setting to `false`. Fixes cockroachdb#83392 Informs cockroachdb#51820 Informs cockroachdb#93829 Informs cockroachdb#100855 Release note (performance improvement): Queries that have subqueries in equality expressions are now more efficiently planned by the optimizer.
Subqueries that are in equality expressions with a variable are now hoisted. When these expressions exist in a filter, hoisting the subquery can allow the main query to plan a lookup join, rather than an inefficient full-table scan. For example, consider the table and query: CREATE TABLE t ( a INT, INDEX (a) ); SELECT * FROM t WHERE a = (SELECT max(a) FROM t); Prior to this commit, the query plan for this query required a full table scan: select ├── columns: a:1 ├── scan t@t_a_idx │ ├── columns: a:1 │ └── constraint: /1/2: (/NULL - ] └── filters └── eq ├── a:1 └── subquery └── scalar-group-by ├── columns: max:9 ├── scan t@t_a_idx,rev │ ├── columns: a:5 │ ├── constraint: /5/6: (/NULL - ] │ └── limit: 1(rev) └── aggregations └── const-agg [as=max:9, outer=(5)] └── a:5 By hoisting the subquery, the full table scan is replaced with a lookup join: project ├── columns: a:1 └── inner-join (lookup t@t_a_idx) ├── columns: a:1 max:9 ├── key columns: [9] = [1] ├── scalar-group-by │ ├── columns: max:9 │ ├── scan t@t_a_idx,rev │ │ ├── columns: a:5 │ │ ├── constraint: /5/6: (/NULL - ] │ │ └── limit: 1(rev) │ └── aggregations │ └── const-agg [as=max:9, outer=(5)] │ └── a:5 └── filters (true) This hoisting is disabled by default, but can be enabled by setting the `optimizer_hoist_uncorrelated_equality_subqueries` session setting to `true`. Fixes cockroachdb#83392 Informs cockroachdb#51820 Informs cockroachdb#93829 Informs cockroachdb#100855 Release note (performance improvement): Queries that have subqueries in equality expressions are now more efficiently planned by the optimizer when `optimizer_hoist_uncorrelated_equality_subqueries` is set to `true`.
Subqueries that are in equality expressions with a variable are now hoisted. When these expressions exist in a filter, hoisting the subquery can allow the main query to plan a lookup join, rather than an inefficient full-table scan. For example, consider the table and query: CREATE TABLE t ( a INT, INDEX (a) ); SELECT * FROM t WHERE a = (SELECT max(a) FROM t); Prior to this commit, the query plan for this query required a full table scan: select ├── columns: a:1 ├── scan t@t_a_idx │ ├── columns: a:1 │ └── constraint: /1/2: (/NULL - ] └── filters └── eq ├── a:1 └── subquery └── scalar-group-by ├── columns: max:9 ├── scan t@t_a_idx,rev │ ├── columns: a:5 │ ├── constraint: /5/6: (/NULL - ] │ └── limit: 1(rev) └── aggregations └── const-agg [as=max:9, outer=(5)] └── a:5 By hoisting the subquery, the full table scan is replaced with a lookup join: project ├── columns: a:1 └── inner-join (lookup t@t_a_idx) ├── columns: a:1 max:9 ├── key columns: [9] = [1] ├── scalar-group-by │ ├── columns: max:9 │ ├── scan t@t_a_idx,rev │ │ ├── columns: a:5 │ │ ├── constraint: /5/6: (/NULL - ] │ │ └── limit: 1(rev) │ └── aggregations │ └── const-agg [as=max:9, outer=(5)] │ └── a:5 └── filters (true) This hoisting is disabled by default, but can be enabled by setting the `optimizer_hoist_uncorrelated_equality_subqueries` session setting to `true`. Fixes cockroachdb#83392 Informs cockroachdb#51820 Informs cockroachdb#93829 Informs cockroachdb#100855 Release note (performance improvement): Queries that have subqueries in equality expressions are now more efficiently planned by the optimizer when `optimizer_hoist_uncorrelated_equality_subqueries` is set to `true`.
Subqueries that are in equality expressions with a variable are now hoisted. When these expressions exist in a filter, hoisting the subquery can allow the main query to plan a lookup join, rather than an inefficient full-table scan. For example, consider the table and query: CREATE TABLE t ( a INT, INDEX (a) ); SELECT * FROM t WHERE a = (SELECT max(a) FROM t); Prior to this commit, the query plan for this query required a full table scan: select ├── columns: a:1 ├── scan t@t_a_idx │ ├── columns: a:1 │ └── constraint: /1/2: (/NULL - ] └── filters └── eq ├── a:1 └── subquery └── scalar-group-by ├── columns: max:9 ├── scan t@t_a_idx,rev │ ├── columns: a:5 │ ├── constraint: /5/6: (/NULL - ] │ └── limit: 1(rev) └── aggregations └── const-agg [as=max:9, outer=(5)] └── a:5 By hoisting the subquery, the full table scan is replaced with a lookup join: project ├── columns: a:1 └── inner-join (lookup t@t_a_idx) ├── columns: a:1 max:9 ├── key columns: [9] = [1] ├── scalar-group-by │ ├── columns: max:9 │ ├── scan t@t_a_idx,rev │ │ ├── columns: a:5 │ │ ├── constraint: /5/6: (/NULL - ] │ │ └── limit: 1(rev) │ └── aggregations │ └── const-agg [as=max:9, outer=(5)] │ └── a:5 └── filters (true) This hoisting is disabled by default, but can be enabled by setting the `optimizer_hoist_uncorrelated_equality_subqueries` session setting to `true`. Fixes #83392 Informs #51820 Informs #93829 Informs #100855 Release note (performance improvement): Queries that have subqueries in equality expressions are now more efficiently planned by the optimizer when `optimizer_hoist_uncorrelated_equality_subqueries` is set to `true`.
Subqueries that are in equality expressions with a variable are now hoisted. When these expressions exist in a filter, hoisting the subquery can allow the main query to plan a lookup join, rather than an inefficient full-table scan. For example, consider the table and query: CREATE TABLE t ( a INT, INDEX (a) ); SELECT * FROM t WHERE a = (SELECT max(a) FROM t); Prior to this commit, the query plan for this query required a full table scan: select ├── columns: a:1 ├── scan t@t_a_idx │ ├── columns: a:1 │ └── constraint: /1/2: (/NULL - ] └── filters └── eq ├── a:1 └── subquery └── scalar-group-by ├── columns: max:9 ├── scan t@t_a_idx,rev │ ├── columns: a:5 │ ├── constraint: /5/6: (/NULL - ] │ └── limit: 1(rev) └── aggregations └── const-agg [as=max:9, outer=(5)] └── a:5 By hoisting the subquery, the full table scan is replaced with a lookup join: project ├── columns: a:1 └── inner-join (lookup t@t_a_idx) ├── columns: a:1 max:9 ├── key columns: [9] = [1] ├── scalar-group-by │ ├── columns: max:9 │ ├── scan t@t_a_idx,rev │ │ ├── columns: a:5 │ │ ├── constraint: /5/6: (/NULL - ] │ │ └── limit: 1(rev) │ └── aggregations │ └── const-agg [as=max:9, outer=(5)] │ └── a:5 └── filters (true) This hoisting is disabled by default, but can be enabled by setting the `optimizer_hoist_uncorrelated_equality_subqueries` session setting to `true`. Fixes cockroachdb#83392 Informs cockroachdb#51820 Informs cockroachdb#93829 Informs cockroachdb#100855 Release note (performance improvement): Queries that have subqueries in equality expressions are now more efficiently planned by the optimizer when `optimizer_hoist_uncorrelated_equality_subqueries` is set to `true`.
Subqueries that are in equality expressions with a variable are now hoisted. When these expressions exist in a filter, hoisting the subquery can allow the main query to plan a lookup join, rather than an inefficient full-table scan. For example, consider the table and query: CREATE TABLE t ( a INT, INDEX (a) ); SELECT * FROM t WHERE a = (SELECT max(a) FROM t); Prior to this commit, the query plan for this query required a full table scan: select ├── columns: a:1 ├── scan t@t_a_idx │ ├── columns: a:1 │ └── constraint: /1/2: (/NULL - ] └── filters └── eq ├── a:1 └── subquery └── scalar-group-by ├── columns: max:9 ├── scan t@t_a_idx,rev │ ├── columns: a:5 │ ├── constraint: /5/6: (/NULL - ] │ └── limit: 1(rev) └── aggregations └── const-agg [as=max:9, outer=(5)] └── a:5 By hoisting the subquery, the full table scan is replaced with a lookup join: project ├── columns: a:1 └── inner-join (lookup t@t_a_idx) ├── columns: a:1 max:9 ├── key columns: [9] = [1] ├── scalar-group-by │ ├── columns: max:9 │ ├── scan t@t_a_idx,rev │ │ ├── columns: a:5 │ │ ├── constraint: /5/6: (/NULL - ] │ │ └── limit: 1(rev) │ └── aggregations │ └── const-agg [as=max:9, outer=(5)] │ └── a:5 └── filters (true) This hoisting is disabled by default, but can be enabled by setting the `optimizer_hoist_uncorrelated_equality_subqueries` session setting to `true`. Fixes cockroachdb#83392 Informs cockroachdb#51820 Informs cockroachdb#93829 Informs cockroachdb#100855 Release note (performance improvement): Queries that have subqueries in equality expressions are now more efficiently planned by the optimizer when `optimizer_hoist_uncorrelated_equality_subqueries` is set to `true`.
Today, uncorrelated subqueries are not optimized the same way as correlated subqueries, which are hoisted into apply joins. As a result, the final optimizer plan typically includes subqueries as separate plans that must be executed and buffered to be used by the main query plan. A problem with this approach is that the main query is never re-optimized with the output of the subquery, and therefore the plan may be suboptimal. Apply joins, however, do cause the right side of the join to be re-optimized before it is executed. If we treat a subquery as the left side of an apply join, this will allow us to fully optimize the main query before execution.
A potential problem with this approach is that today apply joins are quite inefficient. Additionally, today plans containing apply joins cannot be distributed. We will need to allow the plan to be distributed in order to avoid regressing performance of queries with non-correlated subqueries.
Jira issue: CRDB-22543
The text was updated successfully, but these errors were encountered: