Skip to content

Commit

Permalink
opt: split disjunction in join conditions in more cases
Browse files Browse the repository at this point in the history
Prior to this commit, when a join condition included a disjunction
(e.g. a OR b), in some cases we could remove the disjunction by splitting
the join into a UNION of joins to create a more efficient plan. However,
we were only performing this transformation if at least one side of the OR
predicate contained an equijoin predicate (e.g., t1.col1 = t2.col1). There
were other cases where we could have improved the plan by splitting the
disjunction, but we did not do so.

This commit improves our ability to optimize joins with disjunctions in
the join condition when there is the possibility to push one or both sides
of the disjunction below the join. This commit removes the requirement that
the disjunction contains an equijoin predicate, and instead splits the
disjunction in all cases where it is possible to do so, thus enabling
more optimization opportunities.

Fixes #97695

Release note (performance improvement): if the session setting
optimizer_use_improved_split_disjunction_for_joins is true, the optimizer
now creates a better query plan in some cases where an inner, semi, or
anti join contains a join predicate with a disjuction (OR condition).
  • Loading branch information
rytaft committed Feb 28, 2023
1 parent 861b0ee commit 12cdf64
Show file tree
Hide file tree
Showing 9 changed files with 1,836 additions and 629 deletions.
67 changes: 25 additions & 42 deletions pkg/sql/opt/exec/execbuilder/testdata/tpch_vec
Original file line number Diff line number Diff line change
Expand Up @@ -20925,48 +20925,31 @@ EXPLAIN (VEC) SELECT sum(l_extendedprice* (1 - l_discount)) AS revenue FROM line
└ *colexec.orderedAggregator
└ *colexecproj.projMultFloat64Float64Op
└ *colexecprojconst.projMinusFloat64ConstFloat64Op
└ *colexec.caseOp
├ *colexec.bufferOp
│ └ *colexecjoin.hashJoiner
│ ├ *colexecsel.selEQBytesBytesConstOp
│ │ └ *colexec.selectInOpBytes
│ │ └ *colfetcher.ColBatchScan
│ └ *colexecsel.selGEInt64Int64ConstOp
│ └ *colfetcher.ColBatchScan
├ *colexecbase.constBoolOp
│ └ *colexec.orProjOp
│ ├ *colexec.bufferOp
│ ├ *colexec.andProjOp
│ │ ├ *colexec.andProjOp
│ │ │ ├ *colexec.andProjOp
│ │ │ │ ├ *colexec.andProjOp
│ │ │ │ │ ├ *colexecprojconst.projEQBytesBytesConstOp
│ │ │ │ │ └ *colexec.projectInOpBytes
│ │ │ │ └ *colexecprojconst.projGEFloat64Float64ConstOp
│ │ │ └ *colexecprojconst.projLEFloat64Float64ConstOp
│ │ └ *colexecprojconst.projLEInt64Int64ConstOp
│ └ *colexec.andProjOp
│ ├ *colexec.andProjOp
│ │ ├ *colexec.andProjOp
│ │ │ ├ *colexec.andProjOp
│ │ │ │ ├ *colexecprojconst.projEQBytesBytesConstOp
│ │ │ │ └ *colexec.projectInOpBytes
│ │ │ └ *colexecprojconst.projGEFloat64Float64ConstOp
│ │ └ *colexecprojconst.projLEFloat64Float64ConstOp
│ └ *colexecprojconst.projLEInt64Int64ConstOp
├ *colexecbase.constBoolOp
│ └ *colexec.andProjOp
│ ├ *colexec.bufferOp
│ ├ *colexec.andProjOp
│ │ ├ *colexec.andProjOp
│ │ │ ├ *colexec.andProjOp
│ │ │ │ ├ *colexecprojconst.projEQBytesBytesConstOp
│ │ │ │ └ *colexec.projectInOpBytes
│ │ │ └ *colexecprojconst.projGEFloat64Float64ConstOp
│ │ └ *colexecprojconst.projLEFloat64Float64ConstOp
│ └ *colexecprojconst.projLEInt64Int64ConstOp
└ *colexecbase.constBoolOp
└ *colexec.bufferOp
└ *colexec.UnorderedDistinct
└ *colexec.SerialUnorderedSynchronizer
├ *rowexec.joinReader
│ └ *rowexec.joinReader
│ └ *colexec.selectInOpBytes
│ └ *colexecsel.selEQBytesBytesConstOp
│ └ *colexecsel.selLEInt64Int64ConstOp
│ └ *colexecsel.selGEInt64Int64ConstOp
│ └ *colfetcher.ColBatchScan
└ *colexec.UnorderedDistinct
└ *colexec.SerialUnorderedSynchronizer
├ *rowexec.joinReader
│ └ *rowexec.joinReader
│ └ *colexec.selectInOpBytes
│ └ *colexecsel.selEQBytesBytesConstOp
│ └ *colexecsel.selLEInt64Int64ConstOp
│ └ *colexecsel.selGEInt64Int64ConstOp
│ └ *colfetcher.ColBatchScan
└ *rowexec.joinReader
└ *rowexec.joinReader
└ *colexec.selectInOpBytes
└ *colexecsel.selEQBytesBytesConstOp
└ *colexecsel.selLEInt64Int64ConstOp
└ *colexecsel.selGEInt64Int64ConstOp
└ *colfetcher.ColBatchScan

# Query 20
query T
Expand Down
94 changes: 83 additions & 11 deletions pkg/sql/opt/memo/testdata/stats/join
Original file line number Diff line number Diff line change
Expand Up @@ -1648,20 +1648,92 @@ ALTER TABLE uv INJECT STATISTICS '[
opt
SELECT * FROM xysd, uv WHERE (s = 'foo' AND u = 3 AND v = 4) OR (s = 'bar' AND u = 5 AND v = 6)
----
inner-join (cross)
project
├── columns: x:1(int!null) y:2(int) s:3(string!null) d:4(decimal!null) u:7(int!null) v:8(int!null)
├── stats: [rows=59573.61, distinct(3)=2, null(3)=0, distinct(7)=2, null(7)=0, distinct(8)=2, null(8)=0, distinct(7,8)=2.18365, null(7,8)=0]
├── fd: (1)-->(2-4), (3,4)-->(1,2)
├── scan uv
│ ├── columns: u:7(int) v:8(int!null)
│ └── stats: [rows=10000, distinct(7)=500, null(7)=0, distinct(8)=100, null(8)=0, distinct(7,8)=550, null(7,8)=0]
├── scan xysd
│ ├── columns: x:1(int!null) y:2(int) s:3(string) d:4(decimal!null)
│ ├── stats: [rows=5000, distinct(1)=5000, null(1)=0, distinct(3)=10, null(3)=0, distinct(4)=500, null(4)=0]
│ ├── key: (1)
│ └── fd: (1)-->(2-4), (3,4)~~>(1,2)
└── filters
└── (((s:3 = 'foo') AND (u:7 = 3)) AND (v:8 = 4)) OR (((s:3 = 'bar') AND (u:7 = 5)) AND (v:8 = 6)) [type=bool, outer=(3,7,8), constraints=(/3: [/'bar' - /'bar'] [/'foo' - /'foo']; /7: [/3 - /3] [/5 - /5]; /8: [/4 - /4] [/6 - /6])]
└── distinct-on
├── columns: x:1(int!null) y:2(int) s:3(string!null) d:4(decimal!null) u:7(int!null) v:8(int!null) rowid:9(int!null)
├── grouping columns: x:1(int!null) rowid:9(int!null)
├── stats: [rows=16383.64, distinct(1,9)=16383.6, null(1,9)=0]
├── key: (1,9)
├── fd: (1,9)-->(2-4,7,8)
├── union-all
│ ├── columns: x:1(int!null) y:2(int) s:3(string!null) d:4(decimal!null) u:7(int!null) v:8(int!null) rowid:9(int!null)
│ ├── left columns: x:12(int) y:13(int) s:14(string) d:15(decimal) u:18(int) v:19(int) rowid:20(int)
│ ├── right columns: x:23(int) y:24(int) s:25(string) d:26(decimal) u:29(int) v:30(int) rowid:31(int)
│ ├── stats: [rows=16383.64, distinct(1,9)=16383.6, null(1,9)=0]
│ ├── inner-join (cross)
│ │ ├── columns: x:12(int!null) y:13(int) s:14(string!null) d:15(decimal!null) u:18(int!null) v:19(int!null) rowid:20(int!null)
│ │ ├── stats: [rows=8191.818, distinct(12,20)=8191.82, null(12,20)=0]
│ │ ├── key: (12,20)
│ │ ├── fd: ()-->(14,18,19), (12)-->(13,15), (15)-->(12,13)
│ │ ├── index-join xysd
│ │ │ ├── columns: x:12(int!null) y:13(int) s:14(string!null) d:15(decimal!null)
│ │ │ ├── stats: [rows=500, distinct(12)=500, null(12)=0, distinct(14)=1, null(14)=0]
│ │ │ ├── key: (12)
│ │ │ ├── fd: ()-->(14), (12)-->(13,15), (15)-->(12,13)
│ │ │ └── scan xysd@xysd_s_d_key
│ │ │ ├── columns: x:12(int!null) s:14(string!null) d:15(decimal!null)
│ │ │ ├── constraint: /-14/15: [/'foo' - /'foo']
│ │ │ ├── stats: [rows=500, distinct(14)=1, null(14)=0]
│ │ │ ├── key: (12)
│ │ │ └── fd: ()-->(14), (12)-->(15), (15)-->(12)
│ │ ├── select
│ │ │ ├── columns: u:18(int!null) v:19(int!null) rowid:20(int!null)
│ │ │ ├── stats: [rows=16.38364, distinct(18)=1, null(18)=0, distinct(19)=1, null(19)=0, distinct(20)=16.3836, null(20)=0, distinct(18,19)=1, null(18,19)=0]
│ │ │ ├── key: (20)
│ │ │ ├── fd: ()-->(18,19)
│ │ │ ├── scan uv
│ │ │ │ ├── columns: u:18(int) v:19(int!null) rowid:20(int!null)
│ │ │ │ ├── stats: [rows=10000, distinct(18)=500, null(18)=0, distinct(19)=100, null(19)=0, distinct(20)=10000, null(20)=0, distinct(18,19)=550, null(18,19)=0]
│ │ │ │ ├── key: (20)
│ │ │ │ └── fd: (20)-->(18,19)
│ │ │ └── filters
│ │ │ ├── u:18 = 3 [type=bool, outer=(18), constraints=(/18: [/3 - /3]; tight), fd=()-->(18)]
│ │ │ └── v:19 = 4 [type=bool, outer=(19), constraints=(/19: [/4 - /4]; tight), fd=()-->(19)]
│ │ └── filters (true)
│ └── inner-join (cross)
│ ├── columns: x:23(int!null) y:24(int) s:25(string!null) d:26(decimal!null) u:29(int!null) v:30(int!null) rowid:31(int!null)
│ ├── stats: [rows=8191.818, distinct(23,31)=8191.82, null(23,31)=0]
│ ├── key: (23,31)
│ ├── fd: ()-->(25,29,30), (23)-->(24,26), (26)-->(23,24)
│ ├── index-join xysd
│ │ ├── columns: x:23(int!null) y:24(int) s:25(string!null) d:26(decimal!null)
│ │ ├── stats: [rows=500, distinct(23)=500, null(23)=0, distinct(25)=1, null(25)=0]
│ │ ├── key: (23)
│ │ ├── fd: ()-->(25), (23)-->(24,26), (26)-->(23,24)
│ │ └── scan xysd@xysd_s_d_key
│ │ ├── columns: x:23(int!null) s:25(string!null) d:26(decimal!null)
│ │ ├── constraint: /-25/26: [/'bar' - /'bar']
│ │ ├── stats: [rows=500, distinct(25)=1, null(25)=0]
│ │ ├── key: (23)
│ │ └── fd: ()-->(25), (23)-->(26), (26)-->(23)
│ ├── select
│ │ ├── columns: u:29(int!null) v:30(int!null) rowid:31(int!null)
│ │ ├── stats: [rows=16.38364, distinct(29)=1, null(29)=0, distinct(30)=1, null(30)=0, distinct(31)=16.3836, null(31)=0, distinct(29,30)=1, null(29,30)=0]
│ │ ├── key: (31)
│ │ ├── fd: ()-->(29,30)
│ │ ├── scan uv
│ │ │ ├── columns: u:29(int) v:30(int!null) rowid:31(int!null)
│ │ │ ├── stats: [rows=10000, distinct(29)=500, null(29)=0, distinct(30)=100, null(30)=0, distinct(31)=10000, null(31)=0, distinct(29,30)=550, null(29,30)=0]
│ │ │ ├── key: (31)
│ │ │ └── fd: (31)-->(29,30)
│ │ └── filters
│ │ ├── u:29 = 5 [type=bool, outer=(29), constraints=(/29: [/5 - /5]; tight), fd=()-->(29)]
│ │ └── v:30 = 6 [type=bool, outer=(30), constraints=(/30: [/6 - /6]; tight), fd=()-->(30)]
│ └── filters (true)
└── aggregations
├── const-agg [as=y:2, type=int, outer=(2)]
│ └── y:2 [type=int]
├── const-agg [as=s:3, type=string, outer=(3)]
│ └── s:3 [type=string]
├── const-agg [as=d:4, type=decimal, outer=(4)]
│ └── d:4 [type=decimal]
├── const-agg [as=u:7, type=int, outer=(7)]
│ └── u:7 [type=int]
└── const-agg [as=v:8, type=int, outer=(8)]
└── v:8 [type=int]

# Test selectivity of ORed join predicates
# Estimate of # rows should be low, and nowhere near the no-stats
Expand Down
Loading

0 comments on commit 12cdf64

Please sign in to comment.