Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

opt: split disjunction in join conditions in more cases #97695

Closed
rytaft opened this issue Feb 27, 2023 · 0 comments · Fixed by #97696
Closed

opt: split disjunction in join conditions in more cases #97695

rytaft opened this issue Feb 27, 2023 · 0 comments · Fixed by #97696
Assignees
Labels
C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-sql-queries SQL Queries Team

Comments

@rytaft
Copy link
Collaborator

rytaft commented Feb 27, 2023

Is your feature request related to a problem? Please describe.
When a join condition includes a disjunction (e.g. a OR b), in some cases we can remove the disjunction by splitting the join into a UNION of joins to create a more efficient plan. However, we currently only perform this transformation if one side of the OR predicate contains an equijoin predicate (e.g., t1.col1 = t2.col1). There are other cases where we could improve the plan by splitting the disjunction, but we don't currently do so. For example, consider the following:

CREATE TABLE a (
  a1 INT PRIMARY KEY,
  a2 INT NOT NULL,
  a3 INT NOT NULL,
  a4 INT NULL,
  INDEX (a3) STORING (a2, a4),
  INDEX (a4) STORING (a2, a3)
);

CREATE TABLE b (
  b1 INT PRIMARY KEY,
  b2 INT NULL,
  b3 INT NULL,
  b4 INT NOT NULL,
  INDEX (b2) STORING (b3),
  INDEX (b3) STORING (b2)
);

EXPLAIN SELECT a.*
FROM
  a INNER JOIN b ON b.b3 = a.a2
WHERE
  a.a4 = 30
  OR (a.a3 = 10 AND b.b2 = 20);
                            info
------------------------------------------------------------
  distribution: local
  vectorized: true

  • hash join
  │ equality: (a2) = (b3)
  │ pred: (a4 = 30) OR ((a3 = 10) AND (b2 = 20))
  │
  ├── • scan
  │     missing stats
  │     table: a@a_pkey
  │     spans: FULL SCAN
  │
  └── • scan
        missing stats
        table: b@b_b2_idx
        spans: FULL SCAN

Note that because we do not split the disjunction, we cannot push any predicate below the join, and we perform full table scans of both a and b.

Describe the solution you'd like
We should be able to split the disjunction in this case to allow us to push predicates down and perform constrained scans of a and b. For example, we should be able to produce the following plan for the above query:

EXPLAIN SELECT a.*
FROM
  a INNER JOIN b ON b.b3 = a.a2
WHERE
  a.a4 = 30
  OR (a.a3 = 10 AND b.b2 = 20);
                info
------------------------------------
  distribution: local
  vectorized: true

  • distinct
  │ distinct on: a1, b1
  │
  └── • union all
      │
      ├── • lookup join
      │   │ table: b@b_b3_idx
      │   │ equality: (a2) = (b3)
      │   │
      │   └── • scan
      │         missing stats
      │         table: a@a_a4_idx
      │         spans: [/30 - /30]
      │
      └── • hash join
          │ equality: (a2) = (b3)
          │
          ├── • scan
          │     missing stats
          │     table: a@a_a3_idx
          │     spans: [/10 - /10]
          │
          └── • scan
                missing stats
                table: b@b_b2_idx
                spans: [/20 - /20]

Jira issue: CRDB-24828

@rytaft rytaft added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Feb 27, 2023
@blathers-crl blathers-crl bot added the T-sql-queries SQL Queries Team label Feb 27, 2023
@rytaft rytaft self-assigned this Feb 27, 2023
rytaft added a commit to rytaft/cockroach that referenced this issue Feb 27, 2023
Prior to this commit, when a join condition included a disjunction
(e.g. a OR b), in some cases we could remove the disjunction by splitting
the join into a UNION of joins to create a more efficient plan. However,
we were only performing this transformation if at least one side of the OR
predicate contained an equijoin predicate (e.g., t1.col1 = t2.col1). There
were other cases where we could have improved the plan by splitting the
disjunction, but we did not do so.

This commit improves our ability to optimize joins with disjunctions in
the join condition when there is the possibility to push one or both sides
of the disjunction below the join. This commit adds logic to detect these
cases and splits the disjunction to make predicate push-down possible.

Fixes cockroachdb#97695

Release note (performance improvement): the optimizer now creates a
better query plan in some cases where an inner, semi, or anti join
contains a join predicate with a disjuction (OR condition). In cases where
one or both sides of the OR condition contains a conjunction with at least
one conjunct that references a single table, the optimizer now splits the
disjunction so that the conjunct referencing a single table can be pushed
below the join.
rytaft added a commit to rytaft/cockroach that referenced this issue Feb 28, 2023
Prior to this commit, when a join condition included a disjunction
(e.g. a OR b), in some cases we could remove the disjunction by splitting
the join into a UNION of joins to create a more efficient plan. However,
we were only performing this transformation if at least one side of the OR
predicate contained an equijoin predicate (e.g., t1.col1 = t2.col1). There
were other cases where we could have improved the plan by splitting the
disjunction, but we did not do so.

This commit improves our ability to optimize joins with disjunctions in
the join condition when there is the possibility to push one or both sides
of the disjunction below the join. This commit removes the requirement that
the disjunction contains an equijoin predicate, and instead splits the
disjunction in all cases where it is possible to do so, thus enabling
more optimization opportunities.

Fixes cockroachdb#97695

Release note (performance improvement): if the session setting
optimizer_use_improved_split_disjunction_for_joins is true, the optimizer
now creates a better query plan in some cases where an inner, semi, or
anti join contains a join predicate with a disjuction (OR condition).
craig bot pushed a commit that referenced this issue Feb 28, 2023
97696: opt: split disjunction in join conditions in more cases r=rytaft a=rytaft

**sql: add session setting `optimizer_use_improved_split_disjunction_for_joins`**

This commit adds a new session setting,
`optimizer_use_improved_split_disjunction_for_joins`, which will be used in
the next commit.

Release note (sql change): added a new session setting,
`optimizer_use_improved_split_disjunction_for_joins`, which enables the
optimizer to split disjunctions (`OR` expressions) in more cases in join
conditions by building a `UNION` of two join expressions. If this setting
is true, all disjunctions in inner, semi, and anti joins will be split.
If false, only disjunctions potentially containing an equijoin condition
will be split.

**opt: split disjunction in join conditions in more cases**

Prior to this commit, when a join condition included a disjunction
(e.g. `a OR b`), in some cases we could remove the disjunction by splitting
the join into a `UNION` of joins to create a more efficient plan. However,
we were only performing this transformation if at least one side of the `OR`
predicate contained an equijoin predicate (e.g., `t1.col1 = t2.col1`). There
were other cases where we could have improved the plan by splitting the
disjunction, but we did not do so.

This commit improves our ability to optimize joins with disjunctions in
the join condition when there is the possibility to push one or both sides
of the disjunction below the join. This commit removes the requirement that
the disjunction contains an equijoin predicate, and instead splits the
disjunction in all cases where it is possible to do so, thus enabling
more optimization opportunities.

Fixes #97695

Release note (performance improvement): if the session setting
`optimizer_use_improved_split_disjunction_for_joins` is true, the optimizer
now creates a better query plan in some cases where an inner, semi, or
anti join contains a join predicate with a disjuction (`OR` condition).


Co-authored-by: Rebecca Taft <[email protected]>
@craig craig bot closed this as completed in 554b1dd Feb 28, 2023
rytaft added a commit that referenced this issue Mar 1, 2023
Prior to this commit, when a join condition included a disjunction
(e.g. a OR b), in some cases we could remove the disjunction by splitting
the join into a UNION of joins to create a more efficient plan. However,
we were only performing this transformation if at least one side of the OR
predicate contained an equijoin predicate (e.g., t1.col1 = t2.col1). There
were other cases where we could have improved the plan by splitting the
disjunction, but we did not do so.

This commit improves our ability to optimize joins with disjunctions in
the join condition when there is the possibility to push one or both sides
of the disjunction below the join. This commit removes the requirement that
the disjunction contains an equijoin predicate, and instead splits the
disjunction in all cases where it is possible to do so, thus enabling
more optimization opportunities.

Fixes #97695

Release note (performance improvement): if the session setting
optimizer_use_improved_split_disjunction_for_joins is true, the optimizer
now creates a better query plan in some cases where an inner, semi, or
anti join contains a join predicate with a disjuction (OR condition).
@mgartner mgartner moved this to Done in SQL Queries Jul 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-sql-queries SQL Queries Team
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

1 participant