-
Notifications
You must be signed in to change notification settings - Fork 3.9k
/
Copy pathjoin.opt
427 lines (413 loc) · 15.4 KB
/
join.opt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
# =============================================================================
# join.opt contains exploration rules for the Join operator.
# =============================================================================
# ReorderJoins matches the first expression of a join group and adds to the memo
# all valid join orderings that do not introduce cross joins. If the join has
# join hints or is the result of a previous join reordering, the join tree is
# not reordered. For more information, see the comment in join_order_builder.go.
#
# Citations: [8]
[ReorderJoins, Explore]
(InnerJoin | SemiJoin | AntiJoin | LeftJoin | FullJoin
* & (ShouldReorderJoins (Root))
)
=>
(ReorderJoins)
# CommuteLeftJoin creates a Join with the left and right inputs swapped.
# This is symmetric with the CommuteRightJoin normalization rule.
[CommuteLeftJoin, Explore]
(LeftJoin $left:* $right:* $on:* $private:*)
=>
(RightJoin $right $left $on (CommuteJoinFlags $private))
# CommuteSemiJoin generates an InnerJoin that is equivalent to the SemiJoin.
# SemiJoins impose a partial order on the joining tables. We can convert a
# SemiJoin into an InnerJoin by applying a DistinctOn operator on the selected
# rows of the RHS and then relaxing the partial join order restriction.
#
# This allows the join orders (A SemiJoin B) and (Distinct(B*) InnerJoin A) to be
# both considered by the optimizer. This is useful as a different join order may
# allow other rules to trigger. A common case is that, it would allow the inner
# join to use a lookup join in some cases (For example, a different join order
# would allow the use of a lookup join if A has much higher cardinality than B).
#
# We only do this when the On conditions guarantee that for each row in the LHS
# there is at most one unique matching row in the RHS. We need this because a
# SemiJoin emits a maximum of one row for every matching row in the LHS.
# This is an important difference between the behavior of a SemiJoin and an
# InnerJoin. For each row in the LHS, an InnerJoin emits a matching row for every
# row in the RHS where the conditions are met. For example consider the tables:
#
# lhs rhs
# +-----+ +------+
# 1 10
# 2 20
#
# If we do an InnerJoin on the table where the On condition is (lhs < rhs),
# you'll notice that each of the lhs rows are matched twice. And so the output
# of the InnerJoin would contain 2 rows for each matching row in the LHS.
# In order to guarantee that there is at most 1 matching row for every row in
# the LHS, we only commute a SemiJoin into an InnerJoin when the On conditions
# are only composed of equalities.
#
# Note: We only consider the columns of the RHS that are used in the On
# conditions (B* in the example above). And so we can be certain that
# ((Distinct(RHS*) InnerJoin LHS) will have at most 1 matching row for each row
# in the LHS if the On conditions are simple equalities.
#
# Note that if the join was a result of a join reordering, the new join will not
# be reordered again (because it inherits the the SkipReorderJoins flag in the
# private).
#
# Citations: [7] (see section 2.1.1)
[CommuteSemiJoin, Explore]
(SemiJoin
$left:*
$right:*
$on:* & (IsSimpleEquality $on)
$private:* & (NoJoinHints $private)
)
=>
(Project
(InnerJoin
$left
(DistinctOn
$right
[]
(MakeGrouping
(IntersectionCols
(OutputCols $right)
(FilterOuterCols $on)
)
(EmptyOrdering)
)
)
$on
$private
)
[]
(OutputCols $left)
)
# ConvertSemiToInnerJoin applies in the cases where CommuteSemiJoin does not:
# when the ON condition is not a simple equality. In this case, we must perform
# a DisinctOn operation *after* the inner join, followed by a project to remove
# the right-side columns.
#
# Similar to CommuteSemiJoin, this rule allows semi joins to be commuted. This
# rule is also useful because it allows us to generate lookup joins and
# inverted lookup joins for cases where the index is not covering. For inverted
# lookup joins, the GenerateInvertedJoins* rules will also apply, and should
# be lower cost since they avoid the DistinctOn operation.
[ConvertSemiToInnerJoin, Explore]
(SemiJoin
$left:*
$right:*
$on:* & ^(IsSimpleEquality $on)
$private:* & (NoJoinHints $private)
)
=>
(Project
(DistinctOn
(InnerJoin
$newLeft:(EnsureKey $left)
$right
$on
$private
)
(MakeAggCols ConstAgg (NonKeyCols $newLeft))
(MakeGrouping (KeyCols $newLeft) (EmptyOrdering))
)
[]
(OutputCols $left)
)
# GenerateMergeJoins creates MergeJoin operators for the join, using the
# interesting orderings property.
[GenerateMergeJoins, Explore]
(JoinNonApply $left:* $right:* $on:* $private:*)
=>
(GenerateMergeJoins (OpName) $left $right $on $private)
# GenerateLookupJoins creates LookupJoin operators for all indexes (of the Scan
# table) which allow it (including non-covering indexes). See the
# GenerateLookupJoins custom function for more details.
[GenerateLookupJoins, Explore]
(InnerJoin | LeftJoin | SemiJoin | AntiJoin
$left:*
(Scan $scanPrivate:*) & (IsCanonicalScan $scanPrivate)
$on:*
$private:*
)
=>
(GenerateLookupJoins (OpName) $left $scanPrivate $on $private)
# TODO(rytaft): The following uses paired-joins, in GenerateInvertedJoins.
# Using paired-joins may be desirable in more cases than just spatial queries.
# For example, it could enable the use of a lookup join with a non-covering index.
# GenerateInvertedJoins creates InvertedJoin operators for all inverted
# indexes (of the Scan table) which allow it. See the GenerateInvertedJoins
# custom function for more details.
[GenerateInvertedJoins, Explore]
(InnerJoin | LeftJoin | SemiJoin | AntiJoin
$left:*
(Scan $scanPrivate:*) &
(IsCanonicalScan $scanPrivate) &
(HasInvertedIndexes $scanPrivate)
$on:*
$private:*
)
=>
(GenerateInvertedJoins (OpName) $left $scanPrivate $on $private)
# GenerateInvertedJoinsFromSelect is similar to GenerateInvertedJoins, but
# applies when the input is a Select.
[GenerateInvertedJoinsFromSelect, Explore]
(InnerJoin | LeftJoin | SemiJoin | AntiJoin
$left:*
(Select
(Scan $scanPrivate:*) &
(IsCanonicalScan $scanPrivate) &
(HasInvertedIndexes $scanPrivate)
$filters:*
)
$on:*
$private:*
)
=>
(GenerateInvertedJoins
(OpName)
$left
$scanPrivate
(ConcatFilters $on $filters)
$private
)
# GenerateLookupJoinWithFilter creates a LookupJoin alternative for a Join which
# has a Select->Scan combination as its right input. The filter can get merged
# with the ON condition (this is correct for inner, left, and semi/anti join).
[GenerateLookupJoinsWithFilter, Explore]
(InnerJoin | LeftJoin | SemiJoin | AntiJoin
$left:*
(Select
(Scan $scanPrivate:*) & (IsCanonicalScan $scanPrivate)
$filters:*
)
$on:*
$private:*
)
=>
(GenerateLookupJoins
(OpName)
$left
$scanPrivate
(ConcatFilters $on $filters)
$private
)
# PushJoinIntoIndexJoin pushes an InnerJoin into an IndexJoin. The IndexJoin is
# replaced with a LookupJoin, since it now must output columns from the right
# side of the InnerJoin as well as from the original lookup table. This can
# be useful when the InnerJoin that is pushed down reduces cardinality, since an
# index lookup can be expensive.
#
# Matching conditions:
# 1. The right input of the InnerJoin does not have outer columns.
# 2. The ON condition of the InnerJoin only references columns from its right
# input and the input of the IndexJoin.
# 3. The InnerJoin does not have any join hints.
#
# TODO(drewk): match LeftJoins as well if a good use case is found.
[PushJoinIntoIndexJoin, Explore]
(InnerJoin
$left:(IndexJoin $indexInput:* $indexPrivate:*)
$right:* & ^(HasOuterCols $right)
$on:* & (FiltersBoundBy $on (OutputCols2 $indexInput $right))
$joinPrivate:* & (NoJoinHints $joinPrivate)
)
=>
(LookupJoin
(InnerJoin $indexInput $right $on $joinPrivate)
[]
(ConvertIndexToLookupJoinPrivate
$indexPrivate
(OutputCols2 $left $right)
)
)
# HoistProjectFromInnerJoin lifts a Project from under an InnerJoin; the join
# can then be subject to other rules, most importantly allowing it to become a
# lookup join.
#
# As long as the projections are non-volatile, it is equivalent to calculate
# them on every output row.
#
# Note that if the join was a result of a join reordering, the new join will not
# be reordered again (because it inherits the the SkipReorderJoins flag in the
# private).
#
# TODO(radu): we could make the rule work even when the ON condition depends on
# a projection (by inlining the projection).
[HoistProjectFromInnerJoin, Explore]
(InnerJoin
$left:*
$proj:(Project
$right:*
$projections:* & ^(HasVolatileProjection $projections)
$passthrough:*
)
$on:* &
^(ColsIntersect
(ProjectionCols $projections)
(FilterOuterCols $on)
)
$private:*
)
=>
(Project
(InnerJoin $left $right $on $private)
$projections
(UnionCols $passthrough (OutputCols $left))
)
# HoistProjectFromLeftJoin pulls a project from under an LeftJoin; the join can
# then be subject to other rules, most importantly allowing it to become a
# lookup join.
#
# This rule is similar to HoistProjectFromInnerJoin, except that we have to
# handle "outer" rows correctly and make sure we project NULLs, regardless of
# the expressions. First we find a canary column from the right input which is
# null iff the output row is a null-extended left row. Then, we wrap each
# projection in a CASE statement which is null if the canary column is null and
# equivalent to the original projection if the canary column is not null".
#
# Note that if the join was a result of a join reordering, the new join will not
# be reordered again (because it inherits the the SkipReorderJoins flag in the
# private).
#
# TODO(radu): we could make the rule work even when the ON condition depends on
# a projection (by inlining the projection).
#
# TODO(radu): we could make the rule augment an input Scan with a non-null
# column from the table.
[HoistProjectFromLeftJoin, Explore]
(LeftJoin
$left:*
(Project
$right:*
$projections:* & ^(HasVolatileProjection $projections)
$passthrough:*
)
$on:* &
^(ColsIntersect
(ProjectionCols $projections)
(FilterOuterCols $on)
) &
(FoundCanaryColumn
$canaryCol:(FindLeftJoinCanaryColumn $right $on)
)
$private:*
)
=>
(Project
(LeftJoin $left $right $on $private)
(MakeProjectionsForOuterJoin $canaryCol $projections)
(UnionCols $passthrough (OutputCols $left))
)
# GenerateLocalityOptimizedAntiJoin converts an anti join into a locality
# optimized anti join if possible. A locality optimized anti join is implemented
# as a nested pair of anti lookup joins and is designed to avoid communicating
# with remote nodes (relative to the gateway region) if at all possible.
#
# A locality optimized anti join can be planned under the following conditions:
# - The anti join can be planned as a lookup join.
# - The lookup join scans multiple spans in the lookup index for each input
# row, with some spans targeting partitions on local nodes (relative to the
# gateway region), and some targeting partitions on remote nodes. It is not
# known which span(s) will contain the matching row(s).
#
# The result of GenerateLocalityOptimizedAntiJoin will be a nested pair of anti
# lookup joins in which the first lookup join is an anti join targeting the
# local values from the original join, and the second lookup join is an anti
# join targeting the remote values. Because of the way anti join is defined, a
# row will only be returned by the first anti join if a match is *not* found
# locally. If a match is found, no row will be returned and therefore the second
# lookup join will not need to search the remote nodes. This nested pair of anti
# joins is logically equivalent to the original, single anti join.
#
# This is a useful optimization if there is locality of access in the workload,
# such that rows tend to be accessed from the region where they are located. If
# there is no locality of access, using a locality optimized anti join could be
# a slight pessimization, since rows residing in remote regions will be found
# slightly more slowly than they would be otherwise.
#
# For example, suppose we have a multi-region database with regions 'us-east1',
# 'us-west1' and 'europe-west1', and we have the following tables and query,
# issued from 'us-east1':
#
# CREATE TABLE parent (
# p_id INT PRIMARY KEY
# ) LOCALITY REGIONAL BY ROW;
#
# CREATE TABLE child (
# c_id INT PRIMARY KEY,
# c_p_id INT REFERENCES parent (p_id)
# ) LOCALITY REGIONAL BY ROW;
#
# SELECT * FROM child WHERE NOT EXISTS (
# SELECT * FROM parent WHERE p_id = c_p_id
# ) AND c_id = 10;
#
# Normally, this would produce the following plan:
#
# anti-join (lookup parent)
# ├── lookup columns are key
# ├── lookup expr: (p_id = c_p_id) AND (crdb_region IN ('europe-west1', 'us-east1', 'us-west1'))
# ├── scan child
# │ └── constraint: /7/5
# │ ├── [/'europe-west1'/10 - /'europe-west1'/10]
# │ ├── [/'us-east1'/10 - /'us-east1'/10]
# │ └── [/'us-west1'/10 - /'us-west1'/10]
# └── filters (true)
#
# but if the session setting locality_optimized_partitioned_index_scan is enabled,
# the optimizer will produce this plan, using locality optimized search, both for
# the scan of child and for the lookup join with parent. See the rule
# GenerateLocalityOptimizedScan for details about how the optimization is applied
# for scans.
#
# anti-join (lookup parent)
# ├── lookup columns are key
# ├── lookup expr: (p_id = c_p_id) AND (crdb_region IN ('europe-west1', 'us-west1'))
# ├── anti-join (lookup parent)
# │ ├── lookup columns are key
# │ ├── lookup expr: (p_id = c_p_id) AND (crdb_region = 'us-east1')
# │ ├── locality-optimized-search
# │ │ ├── scan child
# │ │ │ └── constraint: /13/11: [/'us-east1'/10 - /'us-east1'/10]
# │ │ └── scan child
# │ │ └── constraint: /18/16
# │ │ ├── [/'europe-west1'/10 - /'europe-west1'/10]
# │ │ └── [/'us-west1'/10 - /'us-west1'/10]
# │ └── filters (true)
# └── filters (true)
#
# As long as child.c_id = 10 and the matching row in parent are both located in
# 'us-east1', the second plan will be much faster. But if they are located in
# one of the other regions, the first plan would be slightly faster.
[GenerateLocalityOptimizedAntiJoin, Explore]
(LookupJoin
$input:*
$on:*
$private:* &
(LocalAndRemoteLookupExprsSucceeded
$localAndRemoteLookupExprs:(GetLocalityOptimizedAntiJoinLookupExprs
$input
$private
)
)
)
=>
(LookupJoin
(LookupJoin
$input
$on
(CreateLocalityOptimizedAntiLookupJoinPrivate
(LocalLookupExpr $localAndRemoteLookupExprs) $private
)
)
$on
(CreateLocalityOptimizedAntiLookupJoinPrivate
(RemoteLookupExpr $localAndRemoteLookupExprs) $private
)
)