Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

opt: hoist uncorrelated scalar subquery with equality #83392

Closed
nvanbenschoten opened this issue Jun 27, 2022 · 14 comments · Fixed by #100881
Closed

opt: hoist uncorrelated scalar subquery with equality #83392

nvanbenschoten opened this issue Jun 27, 2022 · 14 comments · Fixed by #100881
Labels
A-sql-optimizer SQL logical planning and optimizations. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-sql-queries SQL Queries Team

Comments

@nvanbenschoten
Copy link
Member

nvanbenschoten commented Jun 27, 2022

The following query plan shows a simple point lookup:

root@127.0.0.1:26257/movr> explain select * from promo_codes where code = 'test';

  • scan
    estimated row count: 1 (<0.01% of the table; stats collected 23 minutes ago)
    table: promo_codes@promo_codes_pkey
    spans: [/'test' - /'test']

When a subquery is used in an IN predicate, we can decorrelate the subquery and turn the predicate into a join. Arguably the join is redundant and we could eliminate it in this case because the tables on both sides are the same, but this is good enough.

root@127.0.0.1:26257/movr> explain select * from promo_codes where code IN (select code from promo_codes where code = 'test');

  • cross join (semi)
  │ estimated row count: 1
  │
  ├── • scan
  │     estimated row count: 1 (<0.01% of the table; stats collected 23 minutes ago)
  │     table: promo_codes@promo_codes_pkey
  │     spans: [/'test' - /'test']
  │
  └── • scan
        estimated row count: 1 (<0.01% of the table; stats collected 23 minutes ago)
        table: promo_codes@promo_codes_pkey
        spans: [/'test' - /'test']

However, when a subquery is used in an equality predicate, we can't decorrelate the subquery. Instead, we evaluate it, then perform a full table scan, then filter the result of that scan using the result of the subquery.

root@127.0.0.1:26257/movr> explain select * from promo_codes where code = (select code from promo_codes where code = 'test');

  • root
  │
  ├── • filter
  │   │ estimated row count: 333,664
  │   │ filter: code = @S1
  │   │
  │   └── • scan
  │         estimated row count: 1,000,991 (100% of the table; stats collected 26 minutes ago)
  │         table: promo_codes@promo_codes_pkey
  │         spans: FULL SCAN
  │
  └── • subquery
      │ id: @S1
      │ original sql: (SELECT code FROM promo_codes WHERE code = 'test')
      │ exec mode: one row
      │
      └── • scan
            estimated row count: 1 (<0.01% of the table; stats collected 26 minutes ago)
            table: promo_codes@promo_codes_pkey
            spans: [/'test' - /'test']

This is unfortunate and may look artificial, but while flying over the rocky mountains, @andreimatei and I found this by looking at real queries, like the following:

root@127.0.0.1:26257/movr> explain delete from promo_codes where code = (select code from promo_codes limit 1);

  • root
  │
  ├── • delete
  │   │ from: promo_codes
  │   │ auto commit
  │   │
  │   └── • filter
  │       │ estimated row count: 333,664
  │       │ filter: code = @S1
  │       │
  │       └── • scan
  │             estimated row count: 1,000,991 (100% of the table; stats collected 35 minutes ago)
  │             table: promo_codes@promo_codes_pkey
  │             spans: FULL SCAN
  │
  └── • subquery
      │ id: @S1
      │ original sql: (SELECT code FROM promo_codes LIMIT 1)
      │ exec mode: one row
      │
      └── • scan
            estimated row count: 1 (<0.01% of the table; stats collected 35 minutes ago)
            table: promo_codes@promo_codes_pkey
            spans: LIMITED SCAN
            limit: 1

Jira issue: CRDB-17044

@nvanbenschoten nvanbenschoten added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-sql-optimizer SQL logical planning and optimizations. T-sql-queries SQL Queries Team labels Jun 27, 2022
@mgartner
Copy link
Collaborator

mgartner commented Jun 28, 2022

With your = example, the subquery is a scalar subquery. I don't think there's a way to decorrelate it like we do with IN (<subquery>). It would certainly be nice to replan the main query based on the values fetched from the subquery.

However, in your DELETE example, it sounds like the best solution is for us to implement DELETE FROM ... USING so you don't have to use a subquery at all. #40963

@andreimatei
Copy link
Contributor

Spell it out to me more why we can decorrelate the IN but not the =. Whatever reason there is, is it a good one?

@nvanbenschoten
Copy link
Member Author

Does it have to do with the need to assert that the subquery does not return more than 1 row?

@mgartner
Copy link
Collaborator

Spell it out to me more why we can decorrelate the IN but not the =. Whatever reason there is, is it a good one?

I was wrong about this. = is the same as IN (...) with a single item, so we should be able to decorrelate it.

Does it have to do with the need to assert that the subquery does not return more than 1 row?

As long as we use a Max1RowExpr on the right side of the semi-join, decorrelation should be valid.

@mgartner mgartner changed the title sql: subquery not decorrelated when used in equality predicate, resulting in full table scan opt: decorrelate scalar subquery with equality Jun 28, 2022
@andy-kimball
Copy link
Contributor

Today, we only perform decorrelation for correlated subqueries. Each subquery example here is not correlated, and so we execute it using a subquery rather than a left join. If we decide to change our execution strategy here, we'll need to be careful that we don't regress performance, as there may be cases where the subquery plan is better than the left join plan. @mgartner, isn't this a dup of #51820 and friends?

A bit of history: I deliberately didn't use left join for uncorrelated subqueries b/c I was afraid of regressing perf from the heuristic planner. I didn't have the same concern about correlated subqueries, b/c the heuristic planner didn't support those, so there was no possible backwards-compat perf problem.

@nvanbenschoten
Copy link
Member Author

Today, we only perform decorrelation for correlated subqueries. Each subquery example here is not correlated, and so we execute it using a subquery rather than a left join.

But we do decorrelate the subquery into a join in the code IN (select ...) case. How does that differ from the code = (select ...) case?

@andy-kimball
Copy link
Contributor

I believe it's because IN is a "multi-row subquery" construct whereas = is a "single-row subquery" construct. The IN case can return multiple rows, and so we always map it to the Any operator (as are NOT IN, SOME, ANY, and ALL operators), which is always turned into a left join. By contrast, the = case will error if it returns more than one row, and so we treat it differently in the optimizer.

While you know that the two cases are the same in this case b/c code is a primary key, the optbuilder, which does the mapping from IN and = does not have that information, and so must choose how to build the operators based solely on syntax.

@andreimatei
Copy link
Contributor

By contrast, the = case will error if it returns more than one row, and so we treat it differently in the optimizer.

I think this is where the Max1RowExpr operator that @mgartner mentioned would come in, right?

FWIW, I think it's pretty important to optimize this form of query. I think we got to it from a TPCC query, but now I'm not sure. In any case, I've definitely seen people write this query.

@andy-kimball
Copy link
Contributor

andy-kimball commented Jul 6, 2022

The optimizer would have no trouble turning = into a left-join query - that's not the problem. The main thing we'd want to verify is that changing the behavior here to address this important case doesn't regress other important cases. If we convince ourselves of that, "fixing" this is near trivial. Below is the relevant code from decorrelate.go. Notice the comment, which means we have to be really careful how we proceed:

func (c *CustomFuncs) deriveHasHoistableSubquery(scalar opt.ScalarExpr) bool {
	switch t := scalar.(type) {
	case *memo.SubqueryExpr:
		return !t.Input.Relational().OuterCols.Empty()

	case *memo.ExistsExpr:
		return !t.Input.Relational().OuterCols.Empty()

	case *memo.ArrayFlattenExpr:
		return !t.Input.Relational().OuterCols.Empty()

	case *memo.AnyExpr:
		// Don't hoist Any when only its Scalar operand is correlated, because it
		// executes much slower. It's better to cache the results of the constant
		// subquery in this case. Note that if an Any is at the top-level of a
		// WHERE clause, it will be transformed to an Exists operator, so this case
		// only occurs when the Any is nested, in a projection, etc.
		return !t.Input.Relational().OuterCols.Empty()
	}

Notice that the *memo.SubqueryExpr case is checking for correlation (!t.Input.Relational().OuterCols.Empty()). That's basically saying, "only turn this subquery into a left-join if it's correlated" (meaning that it refers to a column that is not defined within itself - an "outer" column).

@mgartner
Copy link
Collaborator

I agree with @andy-kimball. I think the best solution would be #51820, rather than trying to determine when it is optimal to transform an uncorrelated subquery into a join. Although, lazy evaluation of subqueries #20298 is also a desired featured, and it's hard to imagine that playing nicely with any solution for #51820. Regardless, I think we can close this in favor of #51820.

@mgartner
Copy link
Collaborator

mgartner commented Jan 6, 2023

I've thought about this a bit more, and I'm now fairly convinced that hoisting uncorrelated subqueries should mostly lead to query plans that are no slower than equivalent plans with unhoisted, uncorrelated subqueries.

Hoisting a subquery transforms it into an expression with an apply-join. The code comment @andy-kimball quoted above is pointing out that subquery execution is faster than apply-join execution because the results of the subquery are cached and can be scanned multiple times. So, hoisting an uncorrelated subquery would only lead to a worse plan if the resulting apply-join executed the RHS more than once or the apply-join was not reduced to a regular join.

Apply-joins resulting from hoisting uncorrelated subqueries are usually uncorrelated as well, because the source subquery was. If the apply-join is uncorrelated, it is trivially reduced to a regular join. This is the most common outcome of hoisting an uncorrelated subquery, and the performance would be improved because the subquery has been removed and no apply-join remains, likely enabling further optimizations (like using a lookup join or merge join with the select * from promo_codes where code = (select code from promo_codes where code = 'test') above, assuming an index on promo_coes.code exists).

The HoistValuesSubquery rule, however, would create correlated apply-joins when operating on an uncorrelated subquery. The rule creates a values expression on the RHS of an inner-apply-join that references a column in the LHS:

join := c.f.ConstructInnerJoinApply(hoister.input(), values, memo.TrueFilter, memo.EmptyJoinPrivate)

Here's an example showing the inner-apply-join that remains after HoistValuesSubquery fires:

norm
SELECT v FROM (VALUES (1), ((SELECT c FROM c WHERE c=2))) f(v)
----
project
 ├── columns: v:5
 ├── cardinality: [2 - 2]
 └── inner-join-apply
      ├── columns: c:1 column1:5
      ├── cardinality: [2 - 2]
      ├── fd: ()-->(1)
      ├── left-join (cross)
      │    ├── columns: c:1
      │    ├── cardinality: [1 - 1]
      │    ├── multiplicity: left-rows(exactly-one), right-rows(exactly-one)
      │    ├── key: ()
      │    ├── fd: ()-->(1)
      │    ├── values
      │    │    ├── cardinality: [1 - 1]
      │    │    ├── key: ()
      │    │    └── ()
      │    ├── max1-row
      │    │    ├── columns: c:1!null
      │    │    ├── error: "more than one row returned by a subquery used as an expression"
      │    │    ├── cardinality: [0 - 1]
      │    │    ├── key: ()
      │    │    ├── fd: ()-->(1)
      │    │    └── select
      │    │         ├── columns: c:1!null
      │    │         ├── fd: ()-->(1)
      │    │         ├── scan c
      │    │         │    └── columns: c:1
      │    │         └── filters
      │    │              └── c:1 = 2 [outer=(1), constraints=(/1: [/2 - /2]; tight), fd=()-->(1)]
      │    └── filters (true)
      ├── values
      │    ├── columns: column1:5
      │    ├── outer: (1)
      │    ├── cardinality: [2 - 2]
      │    ├── (1,)
      │    └── (c:1,)
      └── filters (true)

This is a rare case, though, and I think it's unlikely to regress performance significantly. The resulting apply-join should execute the RHS only once, because the cardinality of the LHS is 1: the LHS produces exactly one row from a single-row values expression left-cross-joined with the subquery input, which produces 0 or 1 rows. Since the RHS of the apply-join is executed only once, the runtime should be similar to executing it as a subquery - there is no benefit of a cache from a single invocation. However, plans with subqueries can be distributed, while plans with apply-joins cannot.

In summary, if we hoist uncorrelated subqueries, I think we'd get better query plans in the majority of cases. In the rare case that HoistValuesSubquery triggers and creates an apply-join, we'd get a logical query plan with the same complexity as before, but we'd lose the ability to physically distribute the execution of the query.

To be safe, we can add a session setting that customers can use to revert to the previous behavior. If we think the HoistValuesSubquery case is more common than I'm assuming, then we can explore ways to disable that rule for uncorrelated subqueries.

@mgartner mgartner changed the title opt: decorrelate scalar subquery with equality opt: hoist uncorrelated scalar subquery with equality Jan 19, 2023
@mgartner
Copy link
Collaborator

mgartner commented Apr 3, 2023

While working on #98375, I did notice that for ANY subqueries, hoisting can lead to a significantly less efficient query plan.

@smcvey
Copy link
Contributor

smcvey commented Apr 4, 2023

Hey @mgartner, do we have a timeline for a fix?/release I notice that these two issues have already been opened for some time.

@mgartner
Copy link
Collaborator

mgartner commented Apr 4, 2023

I'm going to try to enabling hoisting of a subset of these types subqueries, behind a session setting so we can backport it. to 22.2 and 23.1.

mgartner added a commit to mgartner/cockroach that referenced this issue Apr 12, 2023
Subqueries that are in equality expressions with a variable are now
hoisted. When these expressions exist in a filter, hoisting the subquery
can allow the main query to plan a lookup join, rather than an
inefficient full-table scan.

For example, consider the table and query:

    CREATE TABLE t (
      a INT,
      INDEX (a)
    );

    SELECT * FROM t WHERE a = (SELECT max(a) FROM t);

Prior to this commit, the query plan for this query required a full
table scan:

    select
     ├── columns: a:1
     ├── scan t@t_a_idx
     │    ├── columns: a:1
     │    └── constraint: /1/2: (/NULL - ]
     └── filters
          └── eq
               ├── a:1
               └── subquery
                    └── scalar-group-by
                         ├── columns: max:9
                         ├── scan t@t_a_idx,rev
                         │    ├── columns: a:5
                         │    ├── constraint: /5/6: (/NULL - ]
                         │    └── limit: 1(rev)
                         └── aggregations
                              └── const-agg [as=max:9, outer=(5)]
                                   └── a:5

By hoisting the subquery, the full table scan is replaced with a lookup
join:

    project
     ├── columns: a:1
     └── inner-join (lookup t@t_a_idx)
          ├── columns: a:1 max:9
          ├── key columns: [9] = [1]
          ├── scalar-group-by
          │    ├── columns: max:9
          │    ├── scan t@t_a_idx,rev
          │    │    ├── columns: a:5
          │    │    ├── constraint: /5/6: (/NULL - ]
          │    │    └── limit: 1(rev)
          │    └── aggregations
          │         └── const-agg [as=max:9, outer=(5)]
          │              └── a:5
          └── filters (true)

This hoisting is enabled by default, but can be disabled by setting the
`optimizer_hoist_uncorrelated_equality_subqueries` session setting to
`false`.

Fixes cockroachdb#83392
Informs cockroachdb#51820
Informs cockroachdb#93829
Informs cockroachdb#100855

Release note (performance improvement): Queries that have subqueries in
equality expressions are now more efficiently planned by the optimizer.
craig bot pushed a commit that referenced this issue Apr 17, 2023
100881: opt: hoist uncorrelated equality subqueries r=mgartner a=mgartner

#### opt: hoist uncorrelated equality subqueries

Subqueries that are in equality expressions with a variable are now
hoisted. When these expressions exist in a filter, hoisting the subquery
can allow the main query to plan a lookup join, rather than an
inefficient full-table scan.

For example, consider the table and query:

    CREATE TABLE t (
      a INT,
      INDEX (a)
    );

    SELECT * FROM t WHERE a = (SELECT max(a) FROM t);

Prior to this commit, the query plan for this query required a full
table scan:

    select
     ├── columns: a:1
     ├── scan t@t_a_idx
     │    ├── columns: a:1
     │    └── constraint: /1/2: (/NULL - ]
     └── filters
          └── eq
               ├── a:1
               └── subquery
                    └── scalar-group-by
                         ├── columns: max:9
                         ├── scan t@t_a_idx,rev
                         │    ├── columns: a:5
                         │    ├── constraint: /5/6: (/NULL - ]
                         │    └── limit: 1(rev)
                         └── aggregations
                              └── const-agg [as=max:9, outer=(5)]
                                   └── a:5

By hoisting the subquery, the full table scan is replaced with a lookup
join:

    project
     ├── columns: a:1
     └── inner-join (lookup t@t_a_idx)
          ├── columns: a:1 max:9
          ├── key columns: [9] = [1]
          ├── scalar-group-by
          │    ├── columns: max:9
          │    ├── scan t@t_a_idx,rev
          │    │    ├── columns: a:5
          │    │    ├── constraint: /5/6: (/NULL - ]
          │    │    └── limit: 1(rev)
          │    └── aggregations
          │         └── const-agg [as=max:9, outer=(5)]
          │              └── a:5
          └── filters (true)

This hoisting is enabled by default, but can be disabled by setting the
`optimizer_hoist_uncorrelated_equality_subqueries` session setting to
`false`.

Fixes #83392
Informs #51820
Informs #93829
Informs #100855

Release note (performance improvement): Queries that have subqueries in
equality expressions are now more efficiently planned by the optimizer.


Co-authored-by: Marcus Gartner <[email protected]>
@craig craig bot closed this as completed in 41fc214 Apr 17, 2023
blathers-crl bot pushed a commit that referenced this issue Apr 17, 2023
Subqueries that are in equality expressions with a variable are now
hoisted. When these expressions exist in a filter, hoisting the subquery
can allow the main query to plan a lookup join, rather than an
inefficient full-table scan.

For example, consider the table and query:

    CREATE TABLE t (
      a INT,
      INDEX (a)
    );

    SELECT * FROM t WHERE a = (SELECT max(a) FROM t);

Prior to this commit, the query plan for this query required a full
table scan:

    select
     ├── columns: a:1
     ├── scan t@t_a_idx
     │    ├── columns: a:1
     │    └── constraint: /1/2: (/NULL - ]
     └── filters
          └── eq
               ├── a:1
               └── subquery
                    └── scalar-group-by
                         ├── columns: max:9
                         ├── scan t@t_a_idx,rev
                         │    ├── columns: a:5
                         │    ├── constraint: /5/6: (/NULL - ]
                         │    └── limit: 1(rev)
                         └── aggregations
                              └── const-agg [as=max:9, outer=(5)]
                                   └── a:5

By hoisting the subquery, the full table scan is replaced with a lookup
join:

    project
     ├── columns: a:1
     └── inner-join (lookup t@t_a_idx)
          ├── columns: a:1 max:9
          ├── key columns: [9] = [1]
          ├── scalar-group-by
          │    ├── columns: max:9
          │    ├── scan t@t_a_idx,rev
          │    │    ├── columns: a:5
          │    │    ├── constraint: /5/6: (/NULL - ]
          │    │    └── limit: 1(rev)
          │    └── aggregations
          │         └── const-agg [as=max:9, outer=(5)]
          │              └── a:5
          └── filters (true)

This hoisting is enabled by default, but can be disabled by setting the
`optimizer_hoist_uncorrelated_equality_subqueries` session setting to
`false`.

Fixes #83392
Informs #51820
Informs #93829
Informs #100855

Release note (performance improvement): Queries that have subqueries in
equality expressions are now more efficiently planned by the optimizer.
mgartner added a commit to mgartner/cockroach that referenced this issue Apr 18, 2023
Subqueries that are in equality expressions with a variable are now
hoisted. When these expressions exist in a filter, hoisting the subquery
can allow the main query to plan a lookup join, rather than an
inefficient full-table scan.

For example, consider the table and query:

    CREATE TABLE t (
      a INT,
      INDEX (a)
    );

    SELECT * FROM t WHERE a = (SELECT max(a) FROM t);

Prior to this commit, the query plan for this query required a full
table scan:

    select
     ├── columns: a:1
     ├── scan t@t_a_idx
     │    ├── columns: a:1
     │    └── constraint: /1/2: (/NULL - ]
     └── filters
          └── eq
               ├── a:1
               └── subquery
                    └── scalar-group-by
                         ├── columns: max:9
                         ├── scan t@t_a_idx,rev
                         │    ├── columns: a:5
                         │    ├── constraint: /5/6: (/NULL - ]
                         │    └── limit: 1(rev)
                         └── aggregations
                              └── const-agg [as=max:9, outer=(5)]
                                   └── a:5

By hoisting the subquery, the full table scan is replaced with a lookup
join:

    project
     ├── columns: a:1
     └── inner-join (lookup t@t_a_idx)
          ├── columns: a:1 max:9
          ├── key columns: [9] = [1]
          ├── scalar-group-by
          │    ├── columns: max:9
          │    ├── scan t@t_a_idx,rev
          │    │    ├── columns: a:5
          │    │    ├── constraint: /5/6: (/NULL - ]
          │    │    └── limit: 1(rev)
          │    └── aggregations
          │         └── const-agg [as=max:9, outer=(5)]
          │              └── a:5
          └── filters (true)

This hoisting is enabled by default, but can be disabled by setting the
`optimizer_hoist_uncorrelated_equality_subqueries` session setting to
`false`.

Fixes cockroachdb#83392
Informs cockroachdb#51820
Informs cockroachdb#93829
Informs cockroachdb#100855

Release note (performance improvement): Queries that have subqueries in
equality expressions are now more efficiently planned by the optimizer.
mgartner added a commit to mgartner/cockroach that referenced this issue Apr 18, 2023
Subqueries that are in equality expressions with a variable are now
hoisted. When these expressions exist in a filter, hoisting the subquery
can allow the main query to plan a lookup join, rather than an
inefficient full-table scan.

For example, consider the table and query:

    CREATE TABLE t (
      a INT,
      INDEX (a)
    );

    SELECT * FROM t WHERE a = (SELECT max(a) FROM t);

Prior to this commit, the query plan for this query required a full
table scan:

    select
     ├── columns: a:1
     ├── scan t@t_a_idx
     │    ├── columns: a:1
     │    └── constraint: /1/2: (/NULL - ]
     └── filters
          └── eq
               ├── a:1
               └── subquery
                    └── scalar-group-by
                         ├── columns: max:9
                         ├── scan t@t_a_idx,rev
                         │    ├── columns: a:5
                         │    ├── constraint: /5/6: (/NULL - ]
                         │    └── limit: 1(rev)
                         └── aggregations
                              └── const-agg [as=max:9, outer=(5)]
                                   └── a:5

By hoisting the subquery, the full table scan is replaced with a lookup
join:

    project
     ├── columns: a:1
     └── inner-join (lookup t@t_a_idx)
          ├── columns: a:1 max:9
          ├── key columns: [9] = [1]
          ├── scalar-group-by
          │    ├── columns: max:9
          │    ├── scan t@t_a_idx,rev
          │    │    ├── columns: a:5
          │    │    ├── constraint: /5/6: (/NULL - ]
          │    │    └── limit: 1(rev)
          │    └── aggregations
          │         └── const-agg [as=max:9, outer=(5)]
          │              └── a:5
          └── filters (true)

This hoisting is disabled by default, but can be enabled by setting the
`optimizer_hoist_uncorrelated_equality_subqueries` session setting to
`true`.

Fixes cockroachdb#83392
Informs cockroachdb#51820
Informs cockroachdb#93829
Informs cockroachdb#100855

Release note (performance improvement): Queries that have subqueries in
equality expressions are now more efficiently planned by the optimizer when
`optimizer_hoist_uncorrelated_equality_subqueries` is set to `true`.
mgartner added a commit to mgartner/cockroach that referenced this issue Apr 18, 2023
Subqueries that are in equality expressions with a variable are now
hoisted. When these expressions exist in a filter, hoisting the subquery
can allow the main query to plan a lookup join, rather than an
inefficient full-table scan.

For example, consider the table and query:

    CREATE TABLE t (
      a INT,
      INDEX (a)
    );

    SELECT * FROM t WHERE a = (SELECT max(a) FROM t);

Prior to this commit, the query plan for this query required a full
table scan:

    select
     ├── columns: a:1
     ├── scan t@t_a_idx
     │    ├── columns: a:1
     │    └── constraint: /1/2: (/NULL - ]
     └── filters
          └── eq
               ├── a:1
               └── subquery
                    └── scalar-group-by
                         ├── columns: max:9
                         ├── scan t@t_a_idx,rev
                         │    ├── columns: a:5
                         │    ├── constraint: /5/6: (/NULL - ]
                         │    └── limit: 1(rev)
                         └── aggregations
                              └── const-agg [as=max:9, outer=(5)]
                                   └── a:5

By hoisting the subquery, the full table scan is replaced with a lookup
join:

    project
     ├── columns: a:1
     └── inner-join (lookup t@t_a_idx)
          ├── columns: a:1 max:9
          ├── key columns: [9] = [1]
          ├── scalar-group-by
          │    ├── columns: max:9
          │    ├── scan t@t_a_idx,rev
          │    │    ├── columns: a:5
          │    │    ├── constraint: /5/6: (/NULL - ]
          │    │    └── limit: 1(rev)
          │    └── aggregations
          │         └── const-agg [as=max:9, outer=(5)]
          │              └── a:5
          └── filters (true)

This hoisting is disabled by default, but can be enabled by setting the
`optimizer_hoist_uncorrelated_equality_subqueries` session setting to
`true`.

Fixes cockroachdb#83392
Informs cockroachdb#51820
Informs cockroachdb#93829
Informs cockroachdb#100855

Release note (performance improvement): Queries that have subqueries in
equality expressions are now more efficiently planned by the optimizer when
`optimizer_hoist_uncorrelated_equality_subqueries` is set to `true`.
mgartner added a commit that referenced this issue Apr 18, 2023
Subqueries that are in equality expressions with a variable are now
hoisted. When these expressions exist in a filter, hoisting the subquery
can allow the main query to plan a lookup join, rather than an
inefficient full-table scan.

For example, consider the table and query:

    CREATE TABLE t (
      a INT,
      INDEX (a)
    );

    SELECT * FROM t WHERE a = (SELECT max(a) FROM t);

Prior to this commit, the query plan for this query required a full
table scan:

    select
     ├── columns: a:1
     ├── scan t@t_a_idx
     │    ├── columns: a:1
     │    └── constraint: /1/2: (/NULL - ]
     └── filters
          └── eq
               ├── a:1
               └── subquery
                    └── scalar-group-by
                         ├── columns: max:9
                         ├── scan t@t_a_idx,rev
                         │    ├── columns: a:5
                         │    ├── constraint: /5/6: (/NULL - ]
                         │    └── limit: 1(rev)
                         └── aggregations
                              └── const-agg [as=max:9, outer=(5)]
                                   └── a:5

By hoisting the subquery, the full table scan is replaced with a lookup
join:

    project
     ├── columns: a:1
     └── inner-join (lookup t@t_a_idx)
          ├── columns: a:1 max:9
          ├── key columns: [9] = [1]
          ├── scalar-group-by
          │    ├── columns: max:9
          │    ├── scan t@t_a_idx,rev
          │    │    ├── columns: a:5
          │    │    ├── constraint: /5/6: (/NULL - ]
          │    │    └── limit: 1(rev)
          │    └── aggregations
          │         └── const-agg [as=max:9, outer=(5)]
          │              └── a:5
          └── filters (true)

This hoisting is disabled by default, but can be enabled by setting the
`optimizer_hoist_uncorrelated_equality_subqueries` session setting to
`true`.

Fixes #83392
Informs #51820
Informs #93829
Informs #100855

Release note (performance improvement): Queries that have subqueries in
equality expressions are now more efficiently planned by the optimizer when
`optimizer_hoist_uncorrelated_equality_subqueries` is set to `true`.
mgartner added a commit to mgartner/cockroach that referenced this issue Apr 20, 2023
Subqueries that are in equality expressions with a variable are now
hoisted. When these expressions exist in a filter, hoisting the subquery
can allow the main query to plan a lookup join, rather than an
inefficient full-table scan.

For example, consider the table and query:

    CREATE TABLE t (
      a INT,
      INDEX (a)
    );

    SELECT * FROM t WHERE a = (SELECT max(a) FROM t);

Prior to this commit, the query plan for this query required a full
table scan:

    select
     ├── columns: a:1
     ├── scan t@t_a_idx
     │    ├── columns: a:1
     │    └── constraint: /1/2: (/NULL - ]
     └── filters
          └── eq
               ├── a:1
               └── subquery
                    └── scalar-group-by
                         ├── columns: max:9
                         ├── scan t@t_a_idx,rev
                         │    ├── columns: a:5
                         │    ├── constraint: /5/6: (/NULL - ]
                         │    └── limit: 1(rev)
                         └── aggregations
                              └── const-agg [as=max:9, outer=(5)]
                                   └── a:5

By hoisting the subquery, the full table scan is replaced with a lookup
join:

    project
     ├── columns: a:1
     └── inner-join (lookup t@t_a_idx)
          ├── columns: a:1 max:9
          ├── key columns: [9] = [1]
          ├── scalar-group-by
          │    ├── columns: max:9
          │    ├── scan t@t_a_idx,rev
          │    │    ├── columns: a:5
          │    │    ├── constraint: /5/6: (/NULL - ]
          │    │    └── limit: 1(rev)
          │    └── aggregations
          │         └── const-agg [as=max:9, outer=(5)]
          │              └── a:5
          └── filters (true)

This hoisting is disabled by default, but can be enabled by setting the
`optimizer_hoist_uncorrelated_equality_subqueries` session setting to
`true`.

Fixes cockroachdb#83392
Informs cockroachdb#51820
Informs cockroachdb#93829
Informs cockroachdb#100855

Release note (performance improvement): Queries that have subqueries in
equality expressions are now more efficiently planned by the optimizer when
`optimizer_hoist_uncorrelated_equality_subqueries` is set to `true`.
mgartner added a commit to mgartner/cockroach that referenced this issue Apr 21, 2023
Subqueries that are in equality expressions with a variable are now
hoisted. When these expressions exist in a filter, hoisting the subquery
can allow the main query to plan a lookup join, rather than an
inefficient full-table scan.

For example, consider the table and query:

    CREATE TABLE t (
      a INT,
      INDEX (a)
    );

    SELECT * FROM t WHERE a = (SELECT max(a) FROM t);

Prior to this commit, the query plan for this query required a full
table scan:

    select
     ├── columns: a:1
     ├── scan t@t_a_idx
     │    ├── columns: a:1
     │    └── constraint: /1/2: (/NULL - ]
     └── filters
          └── eq
               ├── a:1
               └── subquery
                    └── scalar-group-by
                         ├── columns: max:9
                         ├── scan t@t_a_idx,rev
                         │    ├── columns: a:5
                         │    ├── constraint: /5/6: (/NULL - ]
                         │    └── limit: 1(rev)
                         └── aggregations
                              └── const-agg [as=max:9, outer=(5)]
                                   └── a:5

By hoisting the subquery, the full table scan is replaced with a lookup
join:

    project
     ├── columns: a:1
     └── inner-join (lookup t@t_a_idx)
          ├── columns: a:1 max:9
          ├── key columns: [9] = [1]
          ├── scalar-group-by
          │    ├── columns: max:9
          │    ├── scan t@t_a_idx,rev
          │    │    ├── columns: a:5
          │    │    ├── constraint: /5/6: (/NULL - ]
          │    │    └── limit: 1(rev)
          │    └── aggregations
          │         └── const-agg [as=max:9, outer=(5)]
          │              └── a:5
          └── filters (true)

This hoisting is disabled by default, but can be enabled by setting the
`optimizer_hoist_uncorrelated_equality_subqueries` session setting to
`true`.

Fixes cockroachdb#83392
Informs cockroachdb#51820
Informs cockroachdb#93829
Informs cockroachdb#100855

Release note (performance improvement): Queries that have subqueries in
equality expressions are now more efficiently planned by the optimizer when
`optimizer_hoist_uncorrelated_equality_subqueries` is set to `true`.
@mgartner mgartner moved this to Done in SQL Queries Jul 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-sql-optimizer SQL logical planning and optimizations. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-sql-queries SQL Queries Team
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

5 participants