opt: split disjunction in join conditions in more cases
Prior to this commit, when a join condition included a disjunction
(e.g. a OR b), in some cases we could remove the disjunction by splitting
the join into a UNION of joins to create a more efficient plan. However,
we were only performing this transformation if at least one side of the OR
predicate contained an equijoin predicate (e.g., t1.col1 = t2.col1). There
were other cases where we could have improved the plan by splitting the
disjunction, but we did not do so.
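The UNION rewrite is sound because a pair of rows satisfies `a OR b` exactly when it satisfies at least one disjunct. Running one join per disjunct and concatenating the results therefore returns every qualifying pair, and the only extra work is emitting pairs that satisfy both disjuncts once. The new plans in the test output below do this with a `distinct-on` grouped by the key columns of both inputs (`x` and `rowid`). The following is a minimal Python sketch of that equivalence over hypothetical in-memory rows. `LEFT`, `RIGHT`, `p1`, `p2`, `join_with_or` and `join_split` are illustrative names, not CockroachDB code.

```python
from itertools import product

# Hypothetical tables: x is the left key, rowid is the right key.
LEFT = [{"x": 1, "s": "foo"}, {"x": 2, "s": "bar"}, {"x": 3, "s": "baz"}]
RIGHT = [{"rowid": 10, "u": 3}, {"rowid": 11, "u": 5}, {"rowid": 12, "u": 3}]


def p1(lrow, rrow):
    # First disjunct: a conjunction of single-table predicates.
    return lrow["s"] == "foo" and rrow["u"] == 3


def p2(lrow, rrow):
    # Second disjunct; it deliberately overlaps with p1.
    return lrow["x"] <= 2 and rrow["u"] == 3


def join_with_or(left, right, a, b):
    """Cross join filtered by (a OR b), returning (x, rowid) key pairs."""
    return [
        (lrow["x"], rrow["rowid"])
        for lrow, rrow in product(left, right)
        if a(lrow, rrow) or b(lrow, rrow)
    ]


def join_split(left, right, a, b):
    """One join per disjunct, UNION ALL, then dedupe on both inputs' keys."""
    union_all = [
        (lrow["x"], rrow["rowid"])
        for pred in (a, b)
        for lrow, rrow in product(left, right)
        if pred(lrow, rrow)
    ]
    seen, deduped = set(), []
    for key in union_all:  # keep the first occurrence of each key pair
        if key not in seen:
            seen.add(key)
            deduped.append(key)
    return union_all, deduped


union_all, deduped = join_split(LEFT, RIGHT, p1, p2)
print(len(union_all), sorted(deduped) == sorted(join_with_or(LEFT, RIGHT, p1, p2)))
# prints: 6 True
```

Deduplicating on the key pair rather than on whole rows matters: two distinct input rows with identical non-key values must both survive, which is consistent with the plan grouping on `x` and `rowid` and carrying the remaining columns through `const-agg`.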

This commit improves our ability to optimize joins with disjunctions in
the join condition when part of one or both sides of the disjunction can
be pushed below the join. It adds logic to detect these cases and splits
the disjunction so that predicate push-down becomes possible.
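The detection step can be pictured as classifying each conjunct of each disjunct by the tables it references. The Python below is a simplified, hypothetical model of that check, not the optimizer's actual rule: `Conjunct`, `conj`, `old_rule`, `new_rule` and `pushdown_plan` are illustrative names, and the real transformation may impose further conditions. It uses the predicate from the `xysd`/`uv` stats test in this commit, where no disjunct contains an equijoin but every conjunct references a single table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Conjunct:
    """Hypothetical summary of one conjunct: its SQL text, the tables it
    references, and whether it equates columns from two different tables."""
    text: str
    tables: frozenset
    is_equijoin: bool = False


def conj(text, *tables, equijoin=False):
    return Conjunct(text, frozenset(tables), equijoin)


def old_rule(disjuncts):
    """Pre-commit condition: split only if some disjunct has an equijoin."""
    return any(c.is_equijoin for d in disjuncts for c in d)


def new_rule(disjuncts):
    """Also split if some disjunct has a conjunct referencing exactly one
    table, since that conjunct can then be pushed below the join."""
    return old_rule(disjuncts) or any(len(c.tables) == 1 for d in disjuncts for c in d)


def pushdown_plan(disjunct, left, right):
    """After splitting, route each conjunct of one disjunct to the input it
    references; anything else stays as a join filter."""
    plan = {"left": [], "right": [], "join": []}
    for c in disjunct:
        if c.tables == {left}:
            plan["left"].append(c.text)
        elif c.tables == {right}:
            plan["right"].append(c.text)
        else:
            plan["join"].append(c.text)
    return plan


# (s = 'foo' AND u = 3 AND v = 4) OR (s = 'bar' AND u = 5 AND v = 6)
query = [
    [conj("s = 'foo'", "xysd"), conj("u = 3", "uv"), conj("v = 4", "uv")],
    [conj("s = 'bar'", "xysd"), conj("u = 5", "uv"), conj("v = 6", "uv")],
]

print(old_rule(query), new_rule(query))  # False True
print(pushdown_plan(query[0], "xysd", "uv"))
# {'left': ["s = 'foo'"], 'right': ['u = 3', 'v = 4'], 'join': []}
```

For the first disjunct, this routing corresponds to what the new plan in the diff shows: `s = 'foo'` becomes a constrained scan of `xysd@xysd_s_d_key`, `u = 3` and `v = 4` become a `select` over `uv`, and the cross join is left with `filters (true)`.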

Fixes #97695

Release note (performance improvement): the optimizer now creates a
better query plan in some cases where an inner, semi, or anti join
contains a join predicate with a disjunction (OR condition). In cases where
one or both sides of the OR condition contain a conjunction with at least
one conjunct that references a single table, the optimizer now splits the
disjunction so that the conjunct referencing a single table can be pushed
below the join.
rytaft committed Feb 27, 2023
1 parent ba85378 commit f0c4165
Showing 9 changed files with 1,771 additions and 608 deletions.
67 changes: 25 additions & 42 deletions pkg/sql/opt/exec/execbuilder/testdata/tpch_vec
@@ -20925,48 +20925,31 @@ EXPLAIN (VEC) SELECT sum(l_extendedprice* (1 - l_discount)) AS revenue FROM line
 └ *colexec.orderedAggregator
 └ *colexecproj.projMultFloat64Float64Op
 └ *colexecprojconst.projMinusFloat64ConstFloat64Op
-└ *colexec.caseOp
-├ *colexec.bufferOp
-│ └ *colexecjoin.hashJoiner
-│ ├ *colexecsel.selEQBytesBytesConstOp
-│ │ └ *colexec.selectInOpBytes
-│ │ └ *colfetcher.ColBatchScan
-│ └ *colexecsel.selGEInt64Int64ConstOp
-│ └ *colfetcher.ColBatchScan
-├ *colexecbase.constBoolOp
-│ └ *colexec.orProjOp
-│ ├ *colexec.bufferOp
-│ ├ *colexec.andProjOp
-│ │ ├ *colexec.andProjOp
-│ │ │ ├ *colexec.andProjOp
-│ │ │ │ ├ *colexec.andProjOp
-│ │ │ │ │ ├ *colexecprojconst.projEQBytesBytesConstOp
-│ │ │ │ │ └ *colexec.projectInOpBytes
-│ │ │ │ └ *colexecprojconst.projGEFloat64Float64ConstOp
-│ │ │ └ *colexecprojconst.projLEFloat64Float64ConstOp
-│ │ └ *colexecprojconst.projLEInt64Int64ConstOp
-│ └ *colexec.andProjOp
-│ ├ *colexec.andProjOp
-│ │ ├ *colexec.andProjOp
-│ │ │ ├ *colexec.andProjOp
-│ │ │ │ ├ *colexecprojconst.projEQBytesBytesConstOp
-│ │ │ │ └ *colexec.projectInOpBytes
-│ │ │ └ *colexecprojconst.projGEFloat64Float64ConstOp
-│ │ └ *colexecprojconst.projLEFloat64Float64ConstOp
-│ └ *colexecprojconst.projLEInt64Int64ConstOp
-├ *colexecbase.constBoolOp
-│ └ *colexec.andProjOp
-│ ├ *colexec.bufferOp
-│ ├ *colexec.andProjOp
-│ │ ├ *colexec.andProjOp
-│ │ │ ├ *colexec.andProjOp
-│ │ │ │ ├ *colexecprojconst.projEQBytesBytesConstOp
-│ │ │ │ └ *colexec.projectInOpBytes
-│ │ │ └ *colexecprojconst.projGEFloat64Float64ConstOp
-│ │ └ *colexecprojconst.projLEFloat64Float64ConstOp
-│ └ *colexecprojconst.projLEInt64Int64ConstOp
-└ *colexecbase.constBoolOp
-└ *colexec.bufferOp
+└ *colexec.UnorderedDistinct
+└ *colexec.SerialUnorderedSynchronizer
+├ *rowexec.joinReader
+│ └ *rowexec.joinReader
+│ └ *colexec.selectInOpBytes
+│ └ *colexecsel.selEQBytesBytesConstOp
+│ └ *colexecsel.selLEInt64Int64ConstOp
+│ └ *colexecsel.selGEInt64Int64ConstOp
+│ └ *colfetcher.ColBatchScan
+└ *colexec.UnorderedDistinct
+└ *colexec.SerialUnorderedSynchronizer
+├ *rowexec.joinReader
+│ └ *rowexec.joinReader
+│ └ *colexec.selectInOpBytes
+│ └ *colexecsel.selEQBytesBytesConstOp
+│ └ *colexecsel.selLEInt64Int64ConstOp
+│ └ *colexecsel.selGEInt64Int64ConstOp
+│ └ *colfetcher.ColBatchScan
+└ *rowexec.joinReader
+└ *rowexec.joinReader
+└ *colexec.selectInOpBytes
+└ *colexecsel.selEQBytesBytesConstOp
+└ *colexecsel.selLEInt64Int64ConstOp
+└ *colexecsel.selGEInt64Int64ConstOp
+└ *colfetcher.ColBatchScan
 
 # Query 20
 query T
94 changes: 83 additions & 11 deletions pkg/sql/opt/memo/testdata/stats/join
@@ -1648,20 +1648,92 @@ ALTER TABLE uv INJECT STATISTICS '[
 opt
 SELECT * FROM xysd, uv WHERE (s = 'foo' AND u = 3 AND v = 4) OR (s = 'bar' AND u = 5 AND v = 6)
 ----
-inner-join (cross)
+project
 ├── columns: x:1(int!null) y:2(int) s:3(string!null) d:4(decimal!null) u:7(int!null) v:8(int!null)
 ├── stats: [rows=59573.61, distinct(3)=2, null(3)=0, distinct(7)=2, null(7)=0, distinct(8)=2, null(8)=0, distinct(7,8)=2.18365, null(7,8)=0]
 ├── fd: (1)-->(2-4), (3,4)-->(1,2)
-├── scan uv
-│ ├── columns: u:7(int) v:8(int!null)
-│ └── stats: [rows=10000, distinct(7)=500, null(7)=0, distinct(8)=100, null(8)=0, distinct(7,8)=550, null(7,8)=0]
-├── scan xysd
-│ ├── columns: x:1(int!null) y:2(int) s:3(string) d:4(decimal!null)
-│ ├── stats: [rows=5000, distinct(1)=5000, null(1)=0, distinct(3)=10, null(3)=0, distinct(4)=500, null(4)=0]
-│ ├── key: (1)
-│ └── fd: (1)-->(2-4), (3,4)~~>(1,2)
-└── filters
-└── (((s:3 = 'foo') AND (u:7 = 3)) AND (v:8 = 4)) OR (((s:3 = 'bar') AND (u:7 = 5)) AND (v:8 = 6)) [type=bool, outer=(3,7,8), constraints=(/3: [/'bar' - /'bar'] [/'foo' - /'foo']; /7: [/3 - /3] [/5 - /5]; /8: [/4 - /4] [/6 - /6])]
+└── distinct-on
+├── columns: x:1(int!null) y:2(int) s:3(string!null) d:4(decimal!null) u:7(int!null) v:8(int!null) rowid:9(int!null)
+├── grouping columns: x:1(int!null) rowid:9(int!null)
+├── stats: [rows=16383.64, distinct(1,9)=16383.6, null(1,9)=0]
+├── key: (1,9)
+├── fd: (1,9)-->(2-4,7,8)
+├── union-all
+│ ├── columns: x:1(int!null) y:2(int) s:3(string!null) d:4(decimal!null) u:7(int!null) v:8(int!null) rowid:9(int!null)
+│ ├── left columns: x:12(int) y:13(int) s:14(string) d:15(decimal) u:18(int) v:19(int) rowid:20(int)
+│ ├── right columns: x:23(int) y:24(int) s:25(string) d:26(decimal) u:29(int) v:30(int) rowid:31(int)
+│ ├── stats: [rows=16383.64, distinct(1,9)=16383.6, null(1,9)=0]
+│ ├── inner-join (cross)
+│ │ ├── columns: x:12(int!null) y:13(int) s:14(string!null) d:15(decimal!null) u:18(int!null) v:19(int!null) rowid:20(int!null)
+│ │ ├── stats: [rows=8191.818, distinct(12,20)=8191.82, null(12,20)=0]
+│ │ ├── key: (12,20)
+│ │ ├── fd: ()-->(14,18,19), (12)-->(13,15), (15)-->(12,13)
+│ │ ├── index-join xysd
+│ │ │ ├── columns: x:12(int!null) y:13(int) s:14(string!null) d:15(decimal!null)
+│ │ │ ├── stats: [rows=500, distinct(12)=500, null(12)=0, distinct(14)=1, null(14)=0]
+│ │ │ ├── key: (12)
+│ │ │ ├── fd: ()-->(14), (12)-->(13,15), (15)-->(12,13)
+│ │ │ └── scan xysd@xysd_s_d_key
+│ │ │ ├── columns: x:12(int!null) s:14(string!null) d:15(decimal!null)
+│ │ │ ├── constraint: /-14/15: [/'foo' - /'foo']
+│ │ │ ├── stats: [rows=500, distinct(14)=1, null(14)=0]
+│ │ │ ├── key: (12)
+│ │ │ └── fd: ()-->(14), (12)-->(15), (15)-->(12)
+│ │ ├── select
+│ │ │ ├── columns: u:18(int!null) v:19(int!null) rowid:20(int!null)
+│ │ │ ├── stats: [rows=16.38364, distinct(18)=1, null(18)=0, distinct(19)=1, null(19)=0, distinct(20)=16.3836, null(20)=0, distinct(18,19)=1, null(18,19)=0]
+│ │ │ ├── key: (20)
+│ │ │ ├── fd: ()-->(18,19)
+│ │ │ ├── scan uv
+│ │ │ │ ├── columns: u:18(int) v:19(int!null) rowid:20(int!null)
+│ │ │ │ ├── stats: [rows=10000, distinct(18)=500, null(18)=0, distinct(19)=100, null(19)=0, distinct(20)=10000, null(20)=0, distinct(18,19)=550, null(18,19)=0]
+│ │ │ │ ├── key: (20)
+│ │ │ │ └── fd: (20)-->(18,19)
+│ │ │ └── filters
+│ │ │ ├── u:18 = 3 [type=bool, outer=(18), constraints=(/18: [/3 - /3]; tight), fd=()-->(18)]
+│ │ │ └── v:19 = 4 [type=bool, outer=(19), constraints=(/19: [/4 - /4]; tight), fd=()-->(19)]
+│ │ └── filters (true)
+│ └── inner-join (cross)
+│ ├── columns: x:23(int!null) y:24(int) s:25(string!null) d:26(decimal!null) u:29(int!null) v:30(int!null) rowid:31(int!null)
+│ ├── stats: [rows=8191.818, distinct(23,31)=8191.82, null(23,31)=0]
+│ ├── key: (23,31)
+│ ├── fd: ()-->(25,29,30), (23)-->(24,26), (26)-->(23,24)
+│ ├── index-join xysd
+│ │ ├── columns: x:23(int!null) y:24(int) s:25(string!null) d:26(decimal!null)
+│ │ ├── stats: [rows=500, distinct(23)=500, null(23)=0, distinct(25)=1, null(25)=0]
+│ │ ├── key: (23)
+│ │ ├── fd: ()-->(25), (23)-->(24,26), (26)-->(23,24)
+│ │ └── scan xysd@xysd_s_d_key
+│ │ ├── columns: x:23(int!null) s:25(string!null) d:26(decimal!null)
+│ │ ├── constraint: /-25/26: [/'bar' - /'bar']
+│ │ ├── stats: [rows=500, distinct(25)=1, null(25)=0]
+│ │ ├── key: (23)
+│ │ └── fd: ()-->(25), (23)-->(26), (26)-->(23)
+│ ├── select
+│ │ ├── columns: u:29(int!null) v:30(int!null) rowid:31(int!null)
+│ │ ├── stats: [rows=16.38364, distinct(29)=1, null(29)=0, distinct(30)=1, null(30)=0, distinct(31)=16.3836, null(31)=0, distinct(29,30)=1, null(29,30)=0]
+│ │ ├── key: (31)
+│ │ ├── fd: ()-->(29,30)
+│ │ ├── scan uv
+│ │ │ ├── columns: u:29(int) v:30(int!null) rowid:31(int!null)
+│ │ │ ├── stats: [rows=10000, distinct(29)=500, null(29)=0, distinct(30)=100, null(30)=0, distinct(31)=10000, null(31)=0, distinct(29,30)=550, null(29,30)=0]
+│ │ │ ├── key: (31)
+│ │ │ └── fd: (31)-->(29,30)
+│ │ └── filters
+│ │ ├── u:29 = 5 [type=bool, outer=(29), constraints=(/29: [/5 - /5]; tight), fd=()-->(29)]
+│ │ └── v:30 = 6 [type=bool, outer=(30), constraints=(/30: [/6 - /6]; tight), fd=()-->(30)]
+│ └── filters (true)
+└── aggregations
+├── const-agg [as=y:2, type=int, outer=(2)]
+│ └── y:2 [type=int]
+├── const-agg [as=s:3, type=string, outer=(3)]
+│ └── s:3 [type=string]
+├── const-agg [as=d:4, type=decimal, outer=(4)]
+│ └── d:4 [type=decimal]
+├── const-agg [as=u:7, type=int, outer=(7)]
+│ └── u:7 [type=int]
+└── const-agg [as=v:8, type=int, outer=(8)]
+└── v:8 [type=int]
 
 # Test selectivity of ORed join predicates
 # Estimate of # rows should be low, and nowhere near the no-stats