Skip to content

Commit

Permalink
opt: do not build cross-joins on inputs of lookup joins
Browse files Browse the repository at this point in the history
Rules that generate lookup joins no longer build cross-joins on the
input of the lookup join to constrain an index column to a set of
constant values. Instead, a lookup expression is used which generates
multiple lookup spans during execution.

Building a cross-join added extra complexity to the optimizer logic and
has been shown to be error prone (see cockroachdb#78681 and cockroachdb#79384). Also, a
cross-join could significantly increase the number of input rows, so it
was likely less efficient than using a lookup expression.

If an index column is constrained to a single constant value, we still
project the value in the input of the lookup join so that it can be used
as an equality column. This is more performant than using a lookup
expression with an equality condition (see cockroachdb#66726).

This commit the some other notable changes to the lookup join generation
logic:

  1. `inputProjections` is now initialized with a specific capacity to
     ensure that future appends will not cause allocations.
  2. The logic has been reordered. The case when a single constant value
     is projected is handled before the multiple constant value or range
     cases.
  3. An unnecessary projection is no longer added for single constant
     values when we revert to using a lookup expression. An explicit
     test was added for this.
  4. Fix `isCanonicalLookupJoinFilter` so that expressions with
     inequalities and either an equality or `IN` are not used as lookup
     expressions. The `multiSpanGenerator` does not support these types
     of expressions, and would panic if it encountered them. This bug has
     existed for a while, but has been hidden because of the preference
     for the cross-join strategy. Note that there are several cases
     where a lookup join contains filters in the `ON` expression that
     are redundant with the lookup expression. I'm planning on
     addressing this in a future commit.

Note that this commit does not affect inverted joins. Cross-joins can
still be generated on inverted join inputs.

Release note: None
  • Loading branch information
mgartner committed Apr 11, 2022
1 parent cbbfe24 commit 12c0e9b
Show file tree
Hide file tree
Showing 14 changed files with 756 additions and 1,025 deletions.
Original file line number Diff line number Diff line change
Expand Up @@ -2080,43 +2080,25 @@ SELECT * FROM [EXPLAIN UPSERT INTO regional_by_row_table_virt (pk, a, b) VALUES
│ │
│ └── • error if rows
│ │
│ └── • limit
│ │ count: 1
│ └── • lookup join (semi)
│ │ table: regional_by_row_table_virt@regional_by_row_table_virt_v_key
│ │ lookup condition: (v_comp = v) AND (crdb_region IN ('ap-southeast-2', 'ca-central-1', 'us-east-1'))
│ │ pred: (upsert_pk != pk) OR (upsert_crdb_region != crdb_region)
│ │
│ └── • lookup join
│ │ table: regional_by_row_table_virt@regional_by_row_table_virt_v_key
│ │ equality: (lookup_join_const_col_@30, v_comp) = (crdb_region,v)
│ │ equality cols are key
│ │ pred: (upsert_pk != pk) OR (upsert_crdb_region != crdb_region)
│ │
│ └── • cross join
│ │
│ ├── • values
│ │ size: 1 column, 3 rows
│ │
│ └── • scan buffer
│ label: buffer 1
│ └── • scan buffer
│ label: buffer 1
└── • constraint-check
└── • error if rows
└── • limit
│ count: 1
└── • lookup join (semi)
│ table: regional_by_row_table_virt@regional_by_row_table_virt_expr_key
│ lookup condition: (crdb_internal_idx_expr_comp = crdb_internal_idx_expr) AND (crdb_region IN ('ap-southeast-2', 'ca-central-1', 'us-east-1'))
│ pred: (upsert_pk != pk) OR (upsert_crdb_region != crdb_region)
└── • lookup join
│ table: regional_by_row_table_virt@regional_by_row_table_virt_expr_key
│ equality: (lookup_join_const_col_@44, crdb_internal_idx_expr_comp) = (crdb_region,crdb_internal_idx_expr)
│ equality cols are key
│ pred: (upsert_pk != pk) OR (upsert_crdb_region != crdb_region)
└── • cross join
├── • values
│ size: 1 column, 3 rows
└── • scan buffer
label: buffer 1
└── • scan buffer
label: buffer 1

statement ok
INSERT INTO regional_by_row_table_virt (pk, a, b) VALUES (1, 1, 1)
Expand Down
2 changes: 1 addition & 1 deletion pkg/sql/logictest/testdata/logic_test/lookup_join_spans
Original file line number Diff line number Diff line change
Expand Up @@ -462,7 +462,7 @@ FROM metric_values as v
INNER JOIN metrics as m
ON metric_id=id
WHERE
time < '2020-01-01 00:00:10+00:00' AND
time < '2020-01-01 00:00:10+00:00' AND
name='cpu' AND
v.nullable = m.nullable
ORDER BY value
Expand Down
30 changes: 9 additions & 21 deletions pkg/sql/opt/exec/execbuilder/testdata/lookup_join_spans
Original file line number Diff line number Diff line change
Expand Up @@ -915,27 +915,15 @@ vectorized: true
│ columns: (s_i_id, ol_o_id)
│ estimated row count: 196 (missing stats)
└── • project
└── • lookup join (inner)
│ columns: (s_i_id, ol_o_id, ol_i_id)
│ estimated row count: 196 (missing stats)
│ table: order_line@ol_io
│ lookup condition: (s_i_id = ol_i_id) AND (ol_o_id IN (20, 21))
│ pred: (ol_o_id IN (19, 20, 21)) AND (ol_o_id >= 20)
└── • lookup join (inner)
│ columns: (s_i_id, "lookup_join_const_col_@5", ol_o_id, ol_i_id)
│ table: order_line@ol_io
│ equality: (s_i_id, lookup_join_const_col_@5) = (ol_i_id,ol_o_id)
└── • cross join (inner)
│ columns: (s_i_id, "lookup_join_const_col_@5")
│ estimated row count: 2,000 (missing stats)
├── • scan
│ columns: (s_i_id)
│ estimated row count: 1,000 (missing stats)
│ table: stock@stock_pkey
│ spans: FULL SCAN
└── • values
columns: ("lookup_join_const_col_@5")
size: 1 column, 2 rows
row 0, expr 0: 20
row 1, expr 0: 21
└── • scan
columns: (s_i_id)
estimated row count: 1,000 (missing stats)
table: stock@stock_pkey
spans: FULL SCAN
402 changes: 77 additions & 325 deletions pkg/sql/opt/exec/execbuilder/testdata/unique

Large diffs are not rendered by default.

12 changes: 5 additions & 7 deletions pkg/sql/opt/exec/explain/testdata/gists_tpce
Original file line number Diff line number Diff line change
Expand Up @@ -21,8 +21,8 @@ WHERE tr_b_id = b_id
GROUP BY b_name
ORDER BY 2 DESC
----
hash: 14732118561700026222
plan-gist: AgGUAQQAPAAAAAGqAQQAAwIAABQAogEEAgAUAJgBBAIAFACsAQQCAAkAAgIAARQAhgECAgEHBAsCBwQRBgQ=
hash: 13421148216330463175
plan-gist: AgGUAQQAPAAAAAGqAQQAAwIAABQAogEEAgAUAJgBBAIAFACsAQQCAAkAAgIAARQAhgEEAAEHBAsCBwQRBgQ=
explain(shape):
• sort
│ order: -sum
Expand All @@ -35,10 +35,9 @@ explain(shape):
└── • render
└── • lookup join
│ table: broker@broker_pkey
│ equality: (tr_b_id) = (b_id)
│ table: broker@broker_b_name_idx
│ equality cols are key
pred: b_name IN _
lookup condition: (tr_b_id = b_id) AND (b_name IN _)
└── • hash join
│ equality: (tr_s_symb) = (s_symb)
Expand Down Expand Up @@ -77,8 +76,7 @@ explain(gist):
└── • render
└── • lookup join
│ table: broker@broker_pkey
│ equality: (tr_s_symb) = (b_id)
│ table: broker@broker_b_name_idx
│ equality cols are key
└── • hash join
Expand Down
161 changes: 72 additions & 89 deletions pkg/sql/opt/xform/join_funcs.go
Original file line number Diff line number Diff line change
Expand Up @@ -446,81 +446,61 @@ func (c *CustomFuncs) generateLookupJoinsImpl(
// join implements logic equivalent to simple equality between
// columns (where NULL never equals anything).
foundVals, allIdx, ok := c.findJoinFilterConstants(allFilters, idxCol)
var foundRange bool
if !ok {
// Also allow a limited form of range condition filters.
allIdx, foundRange = c.findJoinFilterRange(allFilters, idxCol)
if !foundRange {
break
}
if ok && len(foundVals) == 1 {
// If a single constant value was found, project it in the input
// and use it as an equality column.
idxColType := c.e.f.Metadata().ColumnMeta(idxCol).Type
constColID := c.e.f.Metadata().AddColumn(
fmt.Sprintf("lookup_join_const_col_@%d", idxCol),
idxColType,
)
inputProjections = append(inputProjections, c.e.f.ConstructProjectionsItem(
c.e.f.ConstructConstVal(foundVals[0], idxColType),
constColID,
))
constFilters = append(constFilters, allFilters[allIdx])
lookupJoin.KeyCols = append(lookupJoin.KeyCols, constColID)
rightSideCols = append(rightSideCols, idxCol)
continue
}

if len(foundVals) > 1 {
if joinType == opt.LeftJoinOp || joinType == opt.SemiJoinOp || joinType == opt.AntiJoinOp {
// We cannot use the method constructJoinWithConstants to
// create a cross join for left, semi, or anti joins,
// because constructing a cross join with foundVals will
// increase the size of the input. As a result, non-matching
// input rows will show up more than once in the output,
// which is incorrect (see #59615 and #78681).
shouldBuildMultiSpanLookupJoin = true
break
}
if j == 0 && projectedVirtualCols.Empty() && index.PartitionCount() > 1 {
// If this is the first index column and there is more than one
// partition, we may be able to build a locality optimized lookup
// join. This requires a multi-span lookup join as a starting point.
// See GenerateLocalityOptimizedLookupJoin for details.
//
// Note that we do not currently support locality optimized
// lookup joins for indexes on virtual columns.
shouldBuildMultiSpanLookupJoin = true
break
}
var foundRange bool
if !ok {
// If constant values were not found, try to find a filter that
// constrains this index column to a range.
_, foundRange = c.findJoinFilterRange(allFilters, idxCol)
}

if foundRange {
// If more than one constant value or a range to constrain the index
// column was found, use a LookupExpr rather than KeyCols.
if len(foundVals) > 1 || foundRange {
shouldBuildMultiSpanLookupJoin = true
break
}

// We will join these constant values with the input to make
// equality columns for the lookup join.
if constFilters == nil {
constFilters = make(memo.FiltersExpr, 0, numIndexKeyCols-j)
}

idxColType := c.e.f.Metadata().ColumnMeta(idxCol).Type
constColAlias := fmt.Sprintf("lookup_join_const_col_@%d", idxCol)
join, constColID := c.constructJoinWithConstants(
lookupJoin.Input,
foundVals,
idxColType,
constColAlias,
)

lookupJoin.Input = join
lookupJoin.KeyCols = append(lookupJoin.KeyCols, constColID)
rightSideCols = append(rightSideCols, idxCol)
constFilters = append(constFilters, allFilters[allIdx])
// Either multiple constant values or a range were found, or the
// index column cannot be constrained. In all cases, we cannot
// continue on to the next index column, so we break out of the
// loop.
break
}

if shouldBuildMultiSpanLookupJoin {
// Some of the index columns were constrained to multiple constant values
// or a range expression, and we did not use the method
// constructJoinWithConstants to create a cross join as the input (either
// because it would have been incorrect or because it would have
// eliminated the opportunity to apply other optimizations such as
// locality optimized search; see above).
// Some of the index columns were constrained to multiple constant
// values or a range expression, so we cannot build a lookup join
// with KeyCols. As an alternative, we store all the filters needed
// for the lookup in LookupExpr, which will be used to construct
// spans at execution time. Each input row will generate multiple
// spans to lookup in the index.
//
// For example, if the index cols are (region, id) and the
// LookupExpr is `region in ('east', 'west') AND id = input.id`,
// each input row will generate two spans to be scanned in the
// lookup:
//
// As an alternative, we store all the filters needed for the lookup in
// LookupExpr, which will be used to construct spans at execution time.
// The result is that each input row will generate multiple spans to
// lookup in the index. For example, if the index cols are (region, id)
// and the LookupExpr is `region in ('east', 'west') AND id = input.id`,
// each input row will generate two spans to be scanned in the lookup:
// [/'east'/<id> - /'east'/<id>] [/'west'/<id> - /'west'/<id>]
// where <id> is the value of input.id for the current input row.
// [/'east'/<id> - /'east'/<id>]
// [/'west'/<id> - /'west'/<id>]
//
// Where <id> is the value of input.id for the current input row.
var eqFilters memo.FiltersExpr
extractEqualityFilter := func(leftCol, rightCol opt.ColumnID) memo.FiltersItem {
return memo.ExtractJoinEqualityFilter(
Expand All @@ -534,9 +514,9 @@ func (c *CustomFuncs) generateLookupJoinsImpl(

// Reset KeyCols since we're not using it anymore.
lookupJoin.KeyCols = opt.ColList{}
// Reset input since we don't need any constant values that may have
// been joined on the input above.
lookupJoin.Input = input
// Reset the input projections since we don't need any constant
// values projected.
inputProjections = nil
}

if len(lookupJoin.KeyCols) == 0 && len(lookupJoin.LookupExpr) == 0 {
Expand Down Expand Up @@ -805,9 +785,9 @@ func (c *CustomFuncs) findFiltersForIndexLookup(
constFilters = make(memo.FiltersExpr, 0, numIndexKeyCols-j)
}

// Ensure that the constant filter is an equality, IN or inequality
// expression. These are the only types of expressions currently supported
// by the lookupJoiner for building lookup spans.
// Construct a constant filter as an equality, IN expression, or
// inequality. These are the only types of expressions currently
// supported by the lookupJoiner for building lookup spans.
constFilter := filters[allIdx]
if !c.isCanonicalLookupJoinFilter(constFilter) {
if len(values) > 0 {
Expand Down Expand Up @@ -835,31 +815,36 @@ func (c *CustomFuncs) findFiltersForIndexLookup(
// isCanonicalLookupJoinFilter returns true for the limited set of expr's that are
// supported by the lookup joiner at execution time.
func (c *CustomFuncs) isCanonicalLookupJoinFilter(filter memo.FiltersItem) bool {
var checkExpr func(expr opt.Expr) bool
checkExpr = func(expr opt.Expr) bool {
isVar := func(expr opt.Expr) bool {
_, ok := expr.(*memo.VariableExpr)
return ok
}
var isCanonicalInequality func(expr opt.Expr) bool
isCanonicalInequality = func(expr opt.Expr) bool {
switch t := expr.(type) {
case *memo.RangeExpr:
return checkExpr(t.And)
return isCanonicalInequality(t.And)
case *memo.AndExpr:
return checkExpr(t.Left) && checkExpr(t.Right)
return isCanonicalInequality(t.Left) && isCanonicalInequality(t.Right)
case *memo.GeExpr:
return checkExpr(t.Left) && checkExpr(t.Right)
return isCanonicalInequality(t.Left) && isCanonicalInequality(t.Right)
case *memo.GtExpr:
return checkExpr(t.Left) && checkExpr(t.Right)
return isCanonicalInequality(t.Left) && isCanonicalInequality(t.Right)
case *memo.LeExpr:
return checkExpr(t.Left) && checkExpr(t.Right)
return isCanonicalInequality(t.Left) && isCanonicalInequality(t.Right)
case *memo.LtExpr:
return checkExpr(t.Left) && checkExpr(t.Right)
case *memo.VariableExpr:
return true
case *memo.EqExpr:
return checkExpr(t.Left) && checkExpr(t.Right)
case *memo.InExpr:
return checkExpr(t.Left) && memo.CanExtractConstTuple(t.Right)
return isCanonicalInequality(t.Left) && isCanonicalInequality(t.Right)
}
return opt.IsConstValueOp(expr)
return isVar(expr) || opt.IsConstValueOp(expr)
}
switch t := filter.Condition.(type) {
case *memo.EqExpr:
return isVar(t.Left) && opt.IsConstValueOp(t.Right)
case *memo.InExpr:
return isVar(t.Left) && memo.CanExtractConstTuple(t.Right)
default:
return isCanonicalInequality(t)
}
return checkExpr(filter.Condition)
}

// makeConstFilter builds a filter that constrains the given column to the given
Expand Down Expand Up @@ -1364,9 +1349,7 @@ func (c *CustomFuncs) findJoinFilterRange(
constraintCol := constraint.Columns.Get(0).ID()
// See comment in findFiltersForIndexLookup for why we check filter here.
// We only support 1 span in the execution engine so check that.
if constraintCol != col ||
constraint.Spans.Count() != 1 ||
!c.isCanonicalLookupJoinFilter(filters[filterIdx]) {
if constraintCol != col || constraint.Spans.Count() != 1 {
continue
}
return filterIdx, true
Expand Down
6 changes: 5 additions & 1 deletion pkg/sql/opt/xform/join_funcs_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -171,12 +171,16 @@ func TestCustomFuncs_isCanonicalFilter(t *testing.T) {
},
{name: "and-eq-lt",
filter: "i = 10 AND i < 10",
want: true,
want: false,
},
{name: "or-eq-lt",
filter: "i = 10 OR i < 10",
want: false,
},
{name: "and-in-lt",
filter: "i IN (10, 20, 30) AND i > 10",
want: false,
},
}
fut := xform.TestingIsCanonicalLookupJoinFilter
for _, tt := range tests {
Expand Down
10 changes: 6 additions & 4 deletions pkg/sql/opt/xform/testdata/external/tpce
Original file line number Diff line number Diff line change
Expand Up @@ -80,15 +80,17 @@ sort
│ │ │ │ │ │ ├── key columns: [41] = [21]
│ │ │ │ │ │ ├── lookup columns are key
│ │ │ │ │ │ ├── fd: (21)-->(26), (47)-->(49), (44)==(47), (47)==(44), (21)==(41), (41)==(21)
│ │ │ │ │ │ ├── inner-join (lookup broker)
│ │ │ │ │ │ ├── inner-join (lookup broker@broker_b_name_idx)
│ │ │ │ │ │ │ ├── columns: tr_s_symb:41!null tr_qty:42!null tr_bid_price:43!null tr_b_id:44!null b_id:47!null b_name:49!null
│ │ │ │ │ │ │ ├── key columns: [44] = [47]
│ │ │ │ │ │ │ ├── lookup expression
│ │ │ │ │ │ │ │ └── filters
│ │ │ │ │ │ │ │ ├── tr_b_id:44 = b_id:47 [outer=(44,47), constraints=(/44: (/NULL - ]; /47: (/NULL - ]), fd=(44)==(47), (47)==(44)]
│ │ │ │ │ │ │ │ └── b_name:49 IN ('Broker1', 'Broker10', 'Broker11', 'Broker12', 'Broker13', 'Broker14', 'Broker15', 'Broker16', 'Broker17', 'Broker18', 'Broker19', 'Broker2', 'Broker20', 'Broker21', 'Broker22', 'Broker23', 'Broker24', 'Broker25', 'Broker26', 'Broker27', 'Broker28', 'Broker29', 'Broker3', 'Broker30', 'Broker4', 'Broker5', 'Broker6', 'Broker7', 'Broker8', 'Broker9') [outer=(49), constraints=(/49: [/'Broker1' - /'Broker1'] [/'Broker10' - /'Broker10'] [/'Broker11' - /'Broker11'] [/'Broker12' - /'Broker12'] [/'Broker13' - /'Broker13'] [/'Broker14' - /'Broker14'] [/'Broker15' - /'Broker15'] [/'Broker16' - /'Broker16'] [/'Broker17' - /'Broker17'] [/'Broker18' - /'Broker18'] [/'Broker19' - /'Broker19'] [/'Broker2' - /'Broker2'] [/'Broker20' - /'Broker20'] [/'Broker21' - /'Broker21'] [/'Broker22' - /'Broker22'] [/'Broker23' - /'Broker23'] [/'Broker24' - /'Broker24'] [/'Broker25' - /'Broker25'] [/'Broker26' - /'Broker26'] [/'Broker27' - /'Broker27'] [/'Broker28' - /'Broker28'] [/'Broker29' - /'Broker29'] [/'Broker3' - /'Broker3'] [/'Broker30' - /'Broker30'] [/'Broker4' - /'Broker4'] [/'Broker5' - /'Broker5'] [/'Broker6' - /'Broker6'] [/'Broker7' - /'Broker7'] [/'Broker8' - /'Broker8'] [/'Broker9' - /'Broker9']; tight)]
│ │ │ │ │ │ │ ├── lookup columns are key
│ │ │ │ │ │ │ ├── fd: (47)-->(49), (44)==(47), (47)==(44)
│ │ │ │ │ │ │ ├── scan trade_request@trade_request_tr_b_id_tr_s_symb_idx
│ │ │ │ │ │ │ │ └── columns: tr_s_symb:41!null tr_qty:42!null tr_bid_price:43!null tr_b_id:44!null
│ │ │ │ │ │ │ └── filters
│ │ │ │ │ │ │ └── b_name:49 IN ('Broker1', 'Broker10', 'Broker11', 'Broker12', 'Broker13', 'Broker14', 'Broker15', 'Broker16', 'Broker17', 'Broker18', 'Broker19', 'Broker2', 'Broker20', 'Broker21', 'Broker22', 'Broker23', 'Broker24', 'Broker25', 'Broker26', 'Broker27', 'Broker28', 'Broker29', 'Broker3', 'Broker30', 'Broker4', 'Broker5', 'Broker6', 'Broker7', 'Broker8', 'Broker9') [outer=(49), constraints=(/49: [/'Broker1' - /'Broker1'] [/'Broker10' - /'Broker10'] [/'Broker11' - /'Broker11'] [/'Broker12' - /'Broker12'] [/'Broker13' - /'Broker13'] [/'Broker14' - /'Broker14'] [/'Broker15' - /'Broker15'] [/'Broker16' - /'Broker16'] [/'Broker17' - /'Broker17'] [/'Broker18' - /'Broker18'] [/'Broker19' - /'Broker19'] [/'Broker2' - /'Broker2'] [/'Broker20' - /'Broker20'] [/'Broker21' - /'Broker21'] [/'Broker22' - /'Broker22'] [/'Broker23' - /'Broker23'] [/'Broker24' - /'Broker24'] [/'Broker25' - /'Broker25'] [/'Broker26' - /'Broker26'] [/'Broker27' - /'Broker27'] [/'Broker28' - /'Broker28'] [/'Broker29' - /'Broker29'] [/'Broker3' - /'Broker3'] [/'Broker30' - /'Broker30'] [/'Broker4' - /'Broker4'] [/'Broker5' - /'Broker5'] [/'Broker6' - /'Broker6'] [/'Broker7' - /'Broker7'] [/'Broker8' - /'Broker8'] [/'Broker9' - /'Broker9']; tight)]
│ │ │ │ │ │ │ └── filters (true)
│ │ │ │ │ │ └── filters (true)
│ │ │ │ │ └── filters (true)
│ │ │ │ └── filters (true)
Expand Down
Loading

0 comments on commit 12c0e9b

Please sign in to comment.