Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: investigate using apply joins to execute subqueries #93829

Open
rytaft opened this issue Dec 16, 2022 · 4 comments
Open

sql: investigate using apply joins to execute subqueries #93829

rytaft opened this issue Dec 16, 2022 · 4 comments
Labels
C-investigation Further steps needed to qualify. C-label will change. T-sql-queries SQL Queries Team

Comments

@rytaft
Copy link
Collaborator

rytaft commented Dec 16, 2022

Today, uncorrelated subqueries are not optimized the same way as correlated subqueries, which are hoisted into apply joins. As a result, the final optimizer plan typically includes subqueries as separate plans that must be executed and buffered to be used by the main query plan. A problem with this approach is that the main query is never re-optimized with the output of the subquery, and therefore the plan may be suboptimal. Apply joins, however, do cause the right side of the join to be re-optimized before it is executed. If we treat a subquery as the left side of an apply join, this will allow us to fully optimize the main query before execution.

A potential problem with this approach is that today apply joins are quite inefficient. Additionally, today plans containing apply joins cannot be distributed. We will need to allow the plan to be distributed in order to avoid regressing performance of queries with non-correlated subqueries.

Jira issue: CRDB-22543

@rytaft rytaft added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Dec 16, 2022
@blathers-crl blathers-crl bot added the T-sql-queries SQL Queries Team label Dec 16, 2022
@rytaft rytaft added C-investigation Further steps needed to qualify. C-label will change. and removed C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) labels Dec 16, 2022
@mgartner
Copy link
Collaborator

mgartner commented Jan 6, 2023

This is a really clever idea. This would address #51820.

Though, if we hoist uncorrelated subqueries, as described in #83392 (comment), we'll get better query plans in many cases without needing this. For example:

CREATE TABLE a (a INT PRIMARY KEY)
CREATE TABLE b (b INT PRIMARY KEY)

SELECT * FROM a WHERE a = (SELECT b FROM b LIMIT 1);

Currently this requires a full-table scan:

select
 ├── columns: a:1!null
 ├── key: (1)
 ├── scan a
 │    ├── columns: a:1!null
 │    └── key: (1)
 └── filters
      └── eq [outer=(1), subquery, constraints=(/1: (/NULL - ])]
           ├── a:1
           └── subquery
                └── scan b
                     ├── columns: b:4!null
                     ├── limit: 1
                     ├── key: ()
                     └── fd: ()-->(4)

With hoisting uncorrelated subqueries:

project
 ├── columns: a:1!null
 ├── cardinality: [0 - 1]
 ├── key: ()
 ├── fd: ()-->(1)
 └── inner-join (lookup a)
      ├── columns: a:1!null b:4!null
      ├── key columns: [4] = [1]
      ├── lookup columns are key
      ├── cardinality: [0 - 1]
      ├── key: ()
      ├── fd: ()-->(1,4), (4)==(1), (1)==(4)
      ├── scan b
      │    ├── columns: b:4!null
      │    ├── limit: 1
      │    ├── key: ()
      │    └── fd: ()-->(4)
      └── filters (true)

And with recent improvements to lookup joins in v22.2, we see better query plans for queries like:

SELECT * FROM a WHERE a > (SELECT b FROM b LIMIT 1) ORDER BY a LIMIT 1;

Currently this requires a full-table scan:

limit
 ├── columns: a:1!null
 ├── internal-ordering: +1
 ├── cardinality: [0 - 1]
 ├── key: ()
 ├── fd: ()-->(1)
 ├── select
 │    ├── columns: a:1!null
 │    ├── key: (1)
 │    ├── ordering: +1
 │    ├── limit hint: 1.00
 │    ├── scan a
 │    │    ├── columns: a:1!null
 │    │    ├── key: (1)
 │    │    ├── ordering: +1
 │    │    └── limit hint: 3.00
 │    └── filters
 │         └── gt [outer=(1), subquery, constraints=(/1: (/NULL - ])]
 │              ├── a:1
 │              └── subquery
 │                   └── scan b
 │                        ├── columns: b:4!null
 │                        ├── limit: 1
 │                        ├── key: ()
 │                        └── fd: ()-->(4)
 └── 1

With hoisting uncorrelated subqueries:

project
 ├── columns: a:1!null
 ├── cardinality: [0 - 1]
 ├── key: ()
 ├── fd: ()-->(1)
 └── limit
      ├── columns: a:1!null b:4!null
      ├── internal-ordering: +1 opt(4)
      ├── cardinality: [0 - 1]
      ├── key: ()
      ├── fd: ()-->(1,4)
      ├── inner-join (lookup a)
      │    ├── columns: a:1!null b:4!null
      │    ├── lookup expression
      │    │    └── filters
      │    │         └── a:1 > b:4 [outer=(1,4), constraints=(/1: (/NULL - ]; /4: (/NULL - ])]
      │    ├── key: (1)
      │    ├── fd: ()-->(4)
      │    ├── ordering: +1 opt(4) [actual: +1]
      │    ├── limit hint: 1.00
      │    ├── scan b
      │    │    ├── columns: b:4!null
      │    │    ├── limit: 1
      │    │    ├── key: ()
      │    │    ├── fd: ()-->(4)
      │    │    └── limit hint: 1.00
      │    └── filters (true)
      └── 1

There are some cases where we can't hoist uncorrelated subqueries (e.g., subqueries in a branch of a CASE expression that have non-leakproof expressions), but those seem like rare edge cases.

@rytaft
Copy link
Collaborator Author

rytaft commented Jan 6, 2023

Cool! So what exactly is the difference in your approach? Is it that the subquery actually becomes the right side of the apply join rather than the left (and since it's actually uncorrelated, it can be converted to regular join)?

I wonder if it's worth using an exploration rule to try both approaches? I could imagine some cases where we'd really benefit from being able to reoptimize the main query. But you're also probably right that recent improvements to lookup join have made this situation much better, and maybe there aren't so many cases like that now.

@mgartner
Copy link
Collaborator

So what exactly is the difference in your approach? Is it that the subquery actually becomes the right side of the apply join rather than the left (and since it's actually uncorrelated, it can be converted to regular join)?

Exactly!

I wonder if it's worth using an exploration rule to try both approaches? I could imagine some cases where we'd really benefit from being able to reoptimize the main query. But you're also probably right that recent improvements to lookup join have made this situation much better, and maybe there aren't so many cases like that now.

Ya it might be good to try both. I think either on it's own would be an improvement because we don't do any sort of optimization at all.

Here's a case that shows the query plan resulting from hoisting two uncorrelated subqueries. Using the approach described above (moving the subqueries to the LHS of apply-joins and the main query to the RHS) would probably lead to a better query plan with a constrained scan bounded by the min and max b rather than the range lookup join with the addition filter.

EXPLAIN (OPT, VERBOSE)
SELECT a FROM a WHERE a > (SELECT min(b) FROM b) AND a < (SELECT max(b) FROM b)
----
project
 ├── columns: a:1
 ├── stats: [rows=111.1111]
 ├── cost: 722.222944
 ├── key: (1)
 ├── distribution: test
 ├── prune: (1)
 └── inner-join (cross)
      ├── columns: a:1 min:7 max:11
      ├── multiplicity: left-rows(zero-or-one), right-rows(zero-or-more)
      ├── stats: [rows=111.1111, distinct(1)=111.111, null(1)=0, distinct(11)=1, null(11)=0]
      ├── cost: 721.091833
      ├── key: (1)
      ├── fd: ()-->(7,11)
      ├── distribution: test
      ├── inner-join (lookup a)
      │    ├── columns: a:1 min:7
      │    ├── lookup expression
      │    │    └── filters
      │    │         └── a:1 > min:7 [outer=(1,7), constraints=(/1: (/NULL - ]; /7: (/NULL - ])]
      │    ├── stats: [rows=333.3333, distinct(1)=333.333, null(1)=0, distinct(7)=1, null(7)=0]
      │    ├── cost: 708.454333
      │    ├── key: (1)
      │    ├── fd: ()-->(7)
      │    ├── distribution: test
      │    ├── scalar-group-by
      │    │    ├── columns: min:7
      │    │    ├── cardinality: [1 - 1]
      │    │    ├── stats: [rows=1, distinct(7)=1, null(7)=0]
      │    │    ├── cost: 5.09
      │    │    ├── key: ()
      │    │    ├── fd: ()-->(7)
      │    │    ├── distribution: test
      │    │    ├── prune: (7)
      │    │    ├── scan b
      │    │    │    ├── columns: b:4
      │    │    │    ├── limit: 1
      │    │    │    ├── stats: [rows=1]
      │    │    │    ├── cost: 5.05
      │    │    │    ├── key: ()
      │    │    │    ├── fd: ()-->(4)
      │    │    │    └── distribution: test
      │    │    └── aggregations
      │    │         └── const-agg [as=min:7, outer=(4)]
      │    │              └── b:4
      │    └── filters (true)
      ├── scalar-group-by
      │    ├── columns: max:11
      │    ├── cardinality: [1 - 1]
      │    ├── stats: [rows=1, distinct(11)=1, null(11)=0]
      │    ├── cost: 5.09
      │    ├── key: ()
      │    ├── fd: ()-->(11)
      │    ├── distribution: test
      │    ├── prune: (11)
      │    ├── scan b,rev
      │    │    ├── columns: b:8
      │    │    ├── limit: 1(rev)
      │    │    ├── stats: [rows=1]
      │    │    ├── cost: 5.05
      │    │    ├── key: ()
      │    │    ├── fd: ()-->(8)
      │    │    └── distribution: test
      │    └── aggregations
      │         └── const-agg [as=max:11, outer=(8)]
      │              └── b:8
      └── filters
           └── a:1 < max:11 [outer=(1,11), constraints=(/1: (/NULL - ]; /11: (/NULL - ])]

@rytaft
Copy link
Collaborator Author

rytaft commented Jan 11, 2023

Nice! Sounds like it's worth investigating, anyway.

mgartner added a commit to mgartner/cockroach that referenced this issue Apr 12, 2023
Subqueries that are in equality expressions with a variable are now
hoisted. When these expressions exist in a filter, hoisting the subquery
can allow the main query to plan a lookup join, rather than an
inefficient full-table scan.

For example, consider the table and query:

    CREATE TABLE t (
      a INT,
      INDEX (a)
    );

    SELECT * FROM t WHERE a = (SELECT max(a) FROM t);

Prior to this commit, the query plan for this query required a full
table scan:

    select
     ├── columns: a:1
     ├── scan t@t_a_idx
     │    ├── columns: a:1
     │    └── constraint: /1/2: (/NULL - ]
     └── filters
          └── eq
               ├── a:1
               └── subquery
                    └── scalar-group-by
                         ├── columns: max:9
                         ├── scan t@t_a_idx,rev
                         │    ├── columns: a:5
                         │    ├── constraint: /5/6: (/NULL - ]
                         │    └── limit: 1(rev)
                         └── aggregations
                              └── const-agg [as=max:9, outer=(5)]
                                   └── a:5

By hoisting the subquery, the full table scan is replaced with a lookup
join:

    project
     ├── columns: a:1
     └── inner-join (lookup t@t_a_idx)
          ├── columns: a:1 max:9
          ├── key columns: [9] = [1]
          ├── scalar-group-by
          │    ├── columns: max:9
          │    ├── scan t@t_a_idx,rev
          │    │    ├── columns: a:5
          │    │    ├── constraint: /5/6: (/NULL - ]
          │    │    └── limit: 1(rev)
          │    └── aggregations
          │         └── const-agg [as=max:9, outer=(5)]
          │              └── a:5
          └── filters (true)

This hoisting is enabled by default, but can be disabled by setting the
`optimizer_hoist_uncorrelated_equality_subqueries` session setting to
`false`.

Fixes cockroachdb#83392
Informs cockroachdb#51820
Informs cockroachdb#93829
Informs cockroachdb#100855

Release note (performance improvement): Queries that have subqueries in
equality expressions are now more efficiently planned by the optimizer.
mgartner added a commit to mgartner/cockroach that referenced this issue Apr 17, 2023
Subqueries that are in equality expressions with a variable are now
hoisted. When these expressions exist in a filter, hoisting the subquery
can allow the main query to plan a lookup join, rather than an
inefficient full-table scan.

For example, consider the table and query:

    CREATE TABLE t (
      a INT,
      INDEX (a)
    );

    SELECT * FROM t WHERE a = (SELECT max(a) FROM t);

Prior to this commit, the query plan for this query required a full
table scan:

    select
     ├── columns: a:1
     ├── scan t@t_a_idx
     │    ├── columns: a:1
     │    └── constraint: /1/2: (/NULL - ]
     └── filters
          └── eq
               ├── a:1
               └── subquery
                    └── scalar-group-by
                         ├── columns: max:9
                         ├── scan t@t_a_idx,rev
                         │    ├── columns: a:5
                         │    ├── constraint: /5/6: (/NULL - ]
                         │    └── limit: 1(rev)
                         └── aggregations
                              └── const-agg [as=max:9, outer=(5)]
                                   └── a:5

By hoisting the subquery, the full table scan is replaced with a lookup
join:

    project
     ├── columns: a:1
     └── inner-join (lookup t@t_a_idx)
          ├── columns: a:1 max:9
          ├── key columns: [9] = [1]
          ├── scalar-group-by
          │    ├── columns: max:9
          │    ├── scan t@t_a_idx,rev
          │    │    ├── columns: a:5
          │    │    ├── constraint: /5/6: (/NULL - ]
          │    │    └── limit: 1(rev)
          │    └── aggregations
          │         └── const-agg [as=max:9, outer=(5)]
          │              └── a:5
          └── filters (true)

This hoisting is enabled by default, but can be disabled by setting the
`optimizer_hoist_uncorrelated_equality_subqueries` session setting to
`false`.

Fixes cockroachdb#83392
Informs cockroachdb#51820
Informs cockroachdb#93829
Informs cockroachdb#100855

Release note (performance improvement): Queries that have subqueries in
equality expressions are now more efficiently planned by the optimizer.
craig bot pushed a commit that referenced this issue Apr 17, 2023
100881: opt: hoist uncorrelated equality subqueries r=mgartner a=mgartner

#### opt: hoist uncorrelated equality subqueries

Subqueries that are in equality expressions with a variable are now
hoisted. When these expressions exist in a filter, hoisting the subquery
can allow the main query to plan a lookup join, rather than an
inefficient full-table scan.

For example, consider the table and query:

    CREATE TABLE t (
      a INT,
      INDEX (a)
    );

    SELECT * FROM t WHERE a = (SELECT max(a) FROM t);

Prior to this commit, the query plan for this query required a full
table scan:

    select
     ├── columns: a:1
     ├── scan t@t_a_idx
     │    ├── columns: a:1
     │    └── constraint: /1/2: (/NULL - ]
     └── filters
          └── eq
               ├── a:1
               └── subquery
                    └── scalar-group-by
                         ├── columns: max:9
                         ├── scan t@t_a_idx,rev
                         │    ├── columns: a:5
                         │    ├── constraint: /5/6: (/NULL - ]
                         │    └── limit: 1(rev)
                         └── aggregations
                              └── const-agg [as=max:9, outer=(5)]
                                   └── a:5

By hoisting the subquery, the full table scan is replaced with a lookup
join:

    project
     ├── columns: a:1
     └── inner-join (lookup t@t_a_idx)
          ├── columns: a:1 max:9
          ├── key columns: [9] = [1]
          ├── scalar-group-by
          │    ├── columns: max:9
          │    ├── scan t@t_a_idx,rev
          │    │    ├── columns: a:5
          │    │    ├── constraint: /5/6: (/NULL - ]
          │    │    └── limit: 1(rev)
          │    └── aggregations
          │         └── const-agg [as=max:9, outer=(5)]
          │              └── a:5
          └── filters (true)

This hoisting is enabled by default, but can be disabled by setting the
`optimizer_hoist_uncorrelated_equality_subqueries` session setting to
`false`.

Fixes #83392
Informs #51820
Informs #93829
Informs #100855

Release note (performance improvement): Queries that have subqueries in
equality expressions are now more efficiently planned by the optimizer.


Co-authored-by: Marcus Gartner <[email protected]>
blathers-crl bot pushed a commit that referenced this issue Apr 17, 2023
Subqueries that are in equality expressions with a variable are now
hoisted. When these expressions exist in a filter, hoisting the subquery
can allow the main query to plan a lookup join, rather than an
inefficient full-table scan.

For example, consider the table and query:

    CREATE TABLE t (
      a INT,
      INDEX (a)
    );

    SELECT * FROM t WHERE a = (SELECT max(a) FROM t);

Prior to this commit, the query plan for this query required a full
table scan:

    select
     ├── columns: a:1
     ├── scan t@t_a_idx
     │    ├── columns: a:1
     │    └── constraint: /1/2: (/NULL - ]
     └── filters
          └── eq
               ├── a:1
               └── subquery
                    └── scalar-group-by
                         ├── columns: max:9
                         ├── scan t@t_a_idx,rev
                         │    ├── columns: a:5
                         │    ├── constraint: /5/6: (/NULL - ]
                         │    └── limit: 1(rev)
                         └── aggregations
                              └── const-agg [as=max:9, outer=(5)]
                                   └── a:5

By hoisting the subquery, the full table scan is replaced with a lookup
join:

    project
     ├── columns: a:1
     └── inner-join (lookup t@t_a_idx)
          ├── columns: a:1 max:9
          ├── key columns: [9] = [1]
          ├── scalar-group-by
          │    ├── columns: max:9
          │    ├── scan t@t_a_idx,rev
          │    │    ├── columns: a:5
          │    │    ├── constraint: /5/6: (/NULL - ]
          │    │    └── limit: 1(rev)
          │    └── aggregations
          │         └── const-agg [as=max:9, outer=(5)]
          │              └── a:5
          └── filters (true)

This hoisting is enabled by default, but can be disabled by setting the
`optimizer_hoist_uncorrelated_equality_subqueries` session setting to
`false`.

Fixes #83392
Informs #51820
Informs #93829
Informs #100855

Release note (performance improvement): Queries that have subqueries in
equality expressions are now more efficiently planned by the optimizer.
mgartner added a commit to mgartner/cockroach that referenced this issue Apr 18, 2023
Subqueries that are in equality expressions with a variable are now
hoisted. When these expressions exist in a filter, hoisting the subquery
can allow the main query to plan a lookup join, rather than an
inefficient full-table scan.

For example, consider the table and query:

    CREATE TABLE t (
      a INT,
      INDEX (a)
    );

    SELECT * FROM t WHERE a = (SELECT max(a) FROM t);

Prior to this commit, the query plan for this query required a full
table scan:

    select
     ├── columns: a:1
     ├── scan t@t_a_idx
     │    ├── columns: a:1
     │    └── constraint: /1/2: (/NULL - ]
     └── filters
          └── eq
               ├── a:1
               └── subquery
                    └── scalar-group-by
                         ├── columns: max:9
                         ├── scan t@t_a_idx,rev
                         │    ├── columns: a:5
                         │    ├── constraint: /5/6: (/NULL - ]
                         │    └── limit: 1(rev)
                         └── aggregations
                              └── const-agg [as=max:9, outer=(5)]
                                   └── a:5

By hoisting the subquery, the full table scan is replaced with a lookup
join:

    project
     ├── columns: a:1
     └── inner-join (lookup t@t_a_idx)
          ├── columns: a:1 max:9
          ├── key columns: [9] = [1]
          ├── scalar-group-by
          │    ├── columns: max:9
          │    ├── scan t@t_a_idx,rev
          │    │    ├── columns: a:5
          │    │    ├── constraint: /5/6: (/NULL - ]
          │    │    └── limit: 1(rev)
          │    └── aggregations
          │         └── const-agg [as=max:9, outer=(5)]
          │              └── a:5
          └── filters (true)

This hoisting is enabled by default, but can be disabled by setting the
`optimizer_hoist_uncorrelated_equality_subqueries` session setting to
`false`.

Fixes cockroachdb#83392
Informs cockroachdb#51820
Informs cockroachdb#93829
Informs cockroachdb#100855

Release note (performance improvement): Queries that have subqueries in
equality expressions are now more efficiently planned by the optimizer.
mgartner added a commit to mgartner/cockroach that referenced this issue Apr 18, 2023
Subqueries that are in equality expressions with a variable are now
hoisted. When these expressions exist in a filter, hoisting the subquery
can allow the main query to plan a lookup join, rather than an
inefficient full-table scan.

For example, consider the table and query:

    CREATE TABLE t (
      a INT,
      INDEX (a)
    );

    SELECT * FROM t WHERE a = (SELECT max(a) FROM t);

Prior to this commit, the query plan for this query required a full
table scan:

    select
     ├── columns: a:1
     ├── scan t@t_a_idx
     │    ├── columns: a:1
     │    └── constraint: /1/2: (/NULL - ]
     └── filters
          └── eq
               ├── a:1
               └── subquery
                    └── scalar-group-by
                         ├── columns: max:9
                         ├── scan t@t_a_idx,rev
                         │    ├── columns: a:5
                         │    ├── constraint: /5/6: (/NULL - ]
                         │    └── limit: 1(rev)
                         └── aggregations
                              └── const-agg [as=max:9, outer=(5)]
                                   └── a:5

By hoisting the subquery, the full table scan is replaced with a lookup
join:

    project
     ├── columns: a:1
     └── inner-join (lookup t@t_a_idx)
          ├── columns: a:1 max:9
          ├── key columns: [9] = [1]
          ├── scalar-group-by
          │    ├── columns: max:9
          │    ├── scan t@t_a_idx,rev
          │    │    ├── columns: a:5
          │    │    ├── constraint: /5/6: (/NULL - ]
          │    │    └── limit: 1(rev)
          │    └── aggregations
          │         └── const-agg [as=max:9, outer=(5)]
          │              └── a:5
          └── filters (true)

This hoisting is disabled by default, but can be enabled by setting the
`optimizer_hoist_uncorrelated_equality_subqueries` session setting to
`true`.

Fixes cockroachdb#83392
Informs cockroachdb#51820
Informs cockroachdb#93829
Informs cockroachdb#100855

Release note (performance improvement): Queries that have subqueries in
equality expressions are now more efficiently planned by the optimizer when
`optimizer_hoist_uncorrelated_equality_subqueries` is set to `true`.
mgartner added a commit to mgartner/cockroach that referenced this issue Apr 18, 2023
Subqueries that are in equality expressions with a variable are now
hoisted. When these expressions exist in a filter, hoisting the subquery
can allow the main query to plan a lookup join, rather than an
inefficient full-table scan.

For example, consider the table and query:

    CREATE TABLE t (
      a INT,
      INDEX (a)
    );

    SELECT * FROM t WHERE a = (SELECT max(a) FROM t);

Prior to this commit, the query plan for this query required a full
table scan:

    select
     ├── columns: a:1
     ├── scan t@t_a_idx
     │    ├── columns: a:1
     │    └── constraint: /1/2: (/NULL - ]
     └── filters
          └── eq
               ├── a:1
               └── subquery
                    └── scalar-group-by
                         ├── columns: max:9
                         ├── scan t@t_a_idx,rev
                         │    ├── columns: a:5
                         │    ├── constraint: /5/6: (/NULL - ]
                         │    └── limit: 1(rev)
                         └── aggregations
                              └── const-agg [as=max:9, outer=(5)]
                                   └── a:5

By hoisting the subquery, the full table scan is replaced with a lookup
join:

    project
     ├── columns: a:1
     └── inner-join (lookup t@t_a_idx)
          ├── columns: a:1 max:9
          ├── key columns: [9] = [1]
          ├── scalar-group-by
          │    ├── columns: max:9
          │    ├── scan t@t_a_idx,rev
          │    │    ├── columns: a:5
          │    │    ├── constraint: /5/6: (/NULL - ]
          │    │    └── limit: 1(rev)
          │    └── aggregations
          │         └── const-agg [as=max:9, outer=(5)]
          │              └── a:5
          └── filters (true)

This hoisting is disabled by default, but can be enabled by setting the
`optimizer_hoist_uncorrelated_equality_subqueries` session setting to
`true`.

Fixes cockroachdb#83392
Informs cockroachdb#51820
Informs cockroachdb#93829
Informs cockroachdb#100855

Release note (performance improvement): Queries that have subqueries in
equality expressions are now more efficiently planned by the optimizer when
`optimizer_hoist_uncorrelated_equality_subqueries` is set to `true`.
mgartner added a commit that referenced this issue Apr 18, 2023
Subqueries that are in equality expressions with a variable are now
hoisted. When these expressions exist in a filter, hoisting the subquery
can allow the main query to plan a lookup join, rather than an
inefficient full-table scan.

For example, consider the table and query:

    CREATE TABLE t (
      a INT,
      INDEX (a)
    );

    SELECT * FROM t WHERE a = (SELECT max(a) FROM t);

Prior to this commit, the query plan for this query required a full
table scan:

    select
     ├── columns: a:1
     ├── scan t@t_a_idx
     │    ├── columns: a:1
     │    └── constraint: /1/2: (/NULL - ]
     └── filters
          └── eq
               ├── a:1
               └── subquery
                    └── scalar-group-by
                         ├── columns: max:9
                         ├── scan t@t_a_idx,rev
                         │    ├── columns: a:5
                         │    ├── constraint: /5/6: (/NULL - ]
                         │    └── limit: 1(rev)
                         └── aggregations
                              └── const-agg [as=max:9, outer=(5)]
                                   └── a:5

By hoisting the subquery, the full table scan is replaced with a lookup
join:

    project
     ├── columns: a:1
     └── inner-join (lookup t@t_a_idx)
          ├── columns: a:1 max:9
          ├── key columns: [9] = [1]
          ├── scalar-group-by
          │    ├── columns: max:9
          │    ├── scan t@t_a_idx,rev
          │    │    ├── columns: a:5
          │    │    ├── constraint: /5/6: (/NULL - ]
          │    │    └── limit: 1(rev)
          │    └── aggregations
          │         └── const-agg [as=max:9, outer=(5)]
          │              └── a:5
          └── filters (true)

This hoisting is disabled by default, but can be enabled by setting the
`optimizer_hoist_uncorrelated_equality_subqueries` session setting to
`true`.

Fixes #83392
Informs #51820
Informs #93829
Informs #100855

Release note (performance improvement): Queries that have subqueries in
equality expressions are now more efficiently planned by the optimizer when
`optimizer_hoist_uncorrelated_equality_subqueries` is set to `true`.
mgartner added a commit to mgartner/cockroach that referenced this issue Apr 20, 2023
Subqueries that are in equality expressions with a variable are now
hoisted. When these expressions exist in a filter, hoisting the subquery
can allow the main query to plan a lookup join, rather than an
inefficient full-table scan.

For example, consider the table and query:

    CREATE TABLE t (
      a INT,
      INDEX (a)
    );

    SELECT * FROM t WHERE a = (SELECT max(a) FROM t);

Prior to this commit, the query plan for this query required a full
table scan:

    select
     ├── columns: a:1
     ├── scan t@t_a_idx
     │    ├── columns: a:1
     │    └── constraint: /1/2: (/NULL - ]
     └── filters
          └── eq
               ├── a:1
               └── subquery
                    └── scalar-group-by
                         ├── columns: max:9
                         ├── scan t@t_a_idx,rev
                         │    ├── columns: a:5
                         │    ├── constraint: /5/6: (/NULL - ]
                         │    └── limit: 1(rev)
                         └── aggregations
                              └── const-agg [as=max:9, outer=(5)]
                                   └── a:5

By hoisting the subquery, the full table scan is replaced with a lookup
join:

    project
     ├── columns: a:1
     └── inner-join (lookup t@t_a_idx)
          ├── columns: a:1 max:9
          ├── key columns: [9] = [1]
          ├── scalar-group-by
          │    ├── columns: max:9
          │    ├── scan t@t_a_idx,rev
          │    │    ├── columns: a:5
          │    │    ├── constraint: /5/6: (/NULL - ]
          │    │    └── limit: 1(rev)
          │    └── aggregations
          │         └── const-agg [as=max:9, outer=(5)]
          │              └── a:5
          └── filters (true)

This hoisting is disabled by default, but can be enabled by setting the
`optimizer_hoist_uncorrelated_equality_subqueries` session setting to
`true`.

Fixes cockroachdb#83392
Informs cockroachdb#51820
Informs cockroachdb#93829
Informs cockroachdb#100855

Release note (performance improvement): Queries that have subqueries in
equality expressions are now more efficiently planned by the optimizer when
`optimizer_hoist_uncorrelated_equality_subqueries` is set to `true`.
mgartner added a commit to mgartner/cockroach that referenced this issue Apr 21, 2023
Subqueries that are in equality expressions with a variable are now
hoisted. When these expressions exist in a filter, hoisting the subquery
can allow the main query to plan a lookup join, rather than an
inefficient full-table scan.

For example, consider the table and query:

    CREATE TABLE t (
      a INT,
      INDEX (a)
    );

    SELECT * FROM t WHERE a = (SELECT max(a) FROM t);

Prior to this commit, the query plan for this query required a full
table scan:

    select
     ├── columns: a:1
     ├── scan t@t_a_idx
     │    ├── columns: a:1
     │    └── constraint: /1/2: (/NULL - ]
     └── filters
          └── eq
               ├── a:1
               └── subquery
                    └── scalar-group-by
                         ├── columns: max:9
                         ├── scan t@t_a_idx,rev
                         │    ├── columns: a:5
                         │    ├── constraint: /5/6: (/NULL - ]
                         │    └── limit: 1(rev)
                         └── aggregations
                              └── const-agg [as=max:9, outer=(5)]
                                   └── a:5

By hoisting the subquery, the full table scan is replaced with a lookup
join:

    project
     ├── columns: a:1
     └── inner-join (lookup t@t_a_idx)
          ├── columns: a:1 max:9
          ├── key columns: [9] = [1]
          ├── scalar-group-by
          │    ├── columns: max:9
          │    ├── scan t@t_a_idx,rev
          │    │    ├── columns: a:5
          │    │    ├── constraint: /5/6: (/NULL - ]
          │    │    └── limit: 1(rev)
          │    └── aggregations
          │         └── const-agg [as=max:9, outer=(5)]
          │              └── a:5
          └── filters (true)

This hoisting is disabled by default, but can be enabled by setting the
`optimizer_hoist_uncorrelated_equality_subqueries` session setting to
`true`.

Fixes cockroachdb#83392
Informs cockroachdb#51820
Informs cockroachdb#93829
Informs cockroachdb#100855

Release note (performance improvement): Queries that have subqueries in
equality expressions are now more efficiently planned by the optimizer when
`optimizer_hoist_uncorrelated_equality_subqueries` is set to `true`.
@mgartner mgartner moved this to New Backlog in SQL Queries Jul 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-investigation Further steps needed to qualify. C-label will change. T-sql-queries SQL Queries Team
Projects
Status: Backlog
Development

No branches or pull requests

2 participants