100242: xform: constrain index scans on computed columns in more cases r=msirek a=msirek

IN list predicates on index columns that are inputs to computed
column expressions currently cannot constrain the scan when the
computed columns form a prefix of the index key, e.g.
```sql
CREATE TABLE t1 (
    id VARCHAR(32) NOT NULL,
    b INT2 NOT NULL AS (CAST(substr(id, 2:::INT8, 1:::INT8) AS INT8)) VIRTUAL,
    CONSTRAINT pkey PRIMARY KEY (b ASC, id ASC));

-- This uses a constrained scan.
SELECT * FROM t1 WHERE id IN ('12345');

-- This does not.
SELECT * FROM t1 WHERE id IN ('12345', '6789');
```

Function `computedColFilters` derives filters on computed columns by
calling `findConstantFilterCols` on all required and optional filters to
find the constants in single-column single-span constraints, then
converts them into single-column single-span predicates on computed
columns.
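
For the table above, a sketch of that existing derivation (the derived
filter and span shown are illustrative):
```sql
-- id is constrained to a single constant, so the computed expression can
-- be folded to derive a filter on b: substr('12345', 2, 1) = '2', so b = 2.
SELECT * FROM t1 WHERE id = '12345';
-- The index prefix (b, id) can then be constrained to a single span,
-- roughly [/2/'12345' - /2/'12345'].
```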

This change introduces a new, similar function, `combineComputedColFilters`,
which combines the steps of finding span key values and building new
predicates from them within the same nested loops, so that a derived
predicate can be ANDed with the span keys it was derived from, or nested
under an OR term, when needed.
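
Illustrating with the table above (the derived form is a sketch of the
idea, not the literal expression the optimizer builds):
```sql
-- Each span key of the IN list derives its own computed-column value,
-- ANDed with the key it came from, and the results are OR'ed together:
SELECT * FROM t1 WHERE id IN ('12345', '6789');
-- Derived: (b = 2 AND id = '12345') OR (b = 7 AND id = '6789')
-- since substr('12345', 2, 1) = '2' and substr('6789', 2, 1) = '7'.
```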

New terms can be generated from spans with a single key based on one or
more columns, though each `FiltersItem` is processed in isolation, so
there are cases that `computedColFilters` can handle which
`combineComputedColFilters` cannot.
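
A hypothetical illustration of that limitation, using a computed column
over two base columns:
```sql
-- c depends on both a and b (table and schema are hypothetical).
CREATE TABLE t2 (
    a INT NOT NULL,
    b INT NOT NULL,
    c INT AS (a + b) VIRTUAL,
    INDEX (c, a, b));

-- The constants for a and b arrive in two separate FiltersItems, so
-- combineComputedColFilters, which looks at one FiltersItem at a time,
-- cannot combine them; computedColFilters, which gathers constants
-- across all filters, can still derive c = 3 here.
SELECT * FROM t2 WHERE a = 1 AND b = 2;
```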

Fixes: #100206
Fixes: #99557
Fixes: #83343

Release note (performance improvement): This adds support for
constrained scans using computed columns which are part of an index
when there is an IN list predicate on the columns that appear
in the computed column expression.

----
opt: track columns in computed column expressions in metadata

This commit adds tracking of the columns present in computed column
expressions to table metadata, in `ColsInComputedColsExpressions`.

Function `combineComputedColFilters` is updated to skip checking
constraints that do not involve columns referenced in any computed column
expression of the table.
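
For example (hypothetical, using `t1` from the first commit, where `id`
is the only column appearing in a computed column expression):
```sql
-- id is in ColsInComputedColsExpressions, so this constraint is examined:
SELECT * FROM t1 WHERE id IN ('12345', '6789');
-- b is a computed column but not an input to one, so a constraint on b
-- alone involves no tracked column and derivation is skipped outright:
SELECT * FROM t1 WHERE b IN (2, 7);
```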

Release note: None

----
sql: add optimizer_use_improved_computed_column_filters_derivation

This commit adds the `optimizer_use_improved_computed_column_filters_derivation`
session setting that defaults to `true`. When `true`, the optimizer will
derive filters on computed columns in more cases. Previously, filters
could only be derived when the expression defining the computed column
involved a single column and that column was equated with a single
constant value in a WHERE clause filter.

Release note: None
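
Usage (the setting name comes from this commit; session scope shown):
```sql
-- Enabled by default; disable per session to fall back to the previous
-- derivation behavior:
SET optimizer_use_improved_computed_column_filters_derivation = false;
```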

103323: copy: fix vectorized copy handling of column families r=yuzefovich a=cucaroach

Before handing off KV batches to the KV layer we sort them, but when
multiple column families are in use this sorting garbles the keys, making
it invalid to reuse them as the prefix keys for later families. Fix this
by saving a copy of the keys when finishing the first family, before
sorting.

To test this, the copy-from-kvtrace command is improved to sort the KVs
so that the row and vectorized implementations produce consistent results.

Fixes: #103220

Release note (bug fix): COPY in 23.1.0 and beta versions would incorrectly
encode data with multiple column families. The data must be dropped and
re-imported to be encoded correctly.
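
A minimal reproduction, mirroring the regression test added in
pkg/sql/copy/testdata/copy_from below:
```sql
-- Two column families: each row's PK prefix is reused across families,
-- which went wrong once the first family's keys had been sorted.
CREATE TABLE tfam (
    id INT PRIMARY KEY, c1 INT NOT NULL, c2 INT NOT NULL,
    FAMILY cf1 (id, c1), FAMILY cf2 (c2));
COPY tfam FROM STDIN;
-- Feeding rows out of PK order (2,1,2 then 1,2,1) exercised the bad
-- prefix reuse in the vectorized COPY path.
```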


Co-authored-by: Mark Sirek <[email protected]>
Co-authored-by: Tommy Reilly <[email protected]>
3 people committed May 15, 2023
3 parents f56f57e + 952e6ad + 5be7f64 commit 190aa54
Showing 24 changed files with 1,569 additions and 854 deletions.
@@ -2616,40 +2616,35 @@ OFFSET
----
│ ├── scan regional_by_row_table_as4@a_idx
│ │ ├── constraint: /11/10/9
│ │ │ ├── [/'ap-southeast-2'/1 - /'ap-southeast-2'/2]
│ │ │ ├── [/'ap-southeast-2'/4 - /'ap-southeast-2'/6]
│ │ │ ├── [/'ap-southeast-2'/8 - /'ap-southeast-2'/8]
│ │ │ ├── [/'ap-southeast-2'/10 - /'ap-southeast-2'/12]
│ │ │ ├── [/'ap-southeast-2'/14 - /'ap-southeast-2'/19]
│ │ │ ├── [/'ap-southeast-2'/22 - /'ap-southeast-2'/25]
│ │ │ ├── [/'ap-southeast-2'/28 - /'ap-southeast-2'/28]
│ │ │ ├── [/'ap-southeast-2'/6 - /'ap-southeast-2'/6]
│ │ │ ├── [/'ap-southeast-2'/12 - /'ap-southeast-2'/12]
│ │ │ ├── [/'ap-southeast-2'/15 - /'ap-southeast-2'/15]
│ │ │ ├── [/'ap-southeast-2'/18 - /'ap-southeast-2'/18]
│ │ │ ├── [/'ap-southeast-2'/24 - /'ap-southeast-2'/24]
│ │ │ ├── [/'ap-southeast-2'/30 - /'ap-southeast-2'/30]
│ │ │ ├── [/'ap-southeast-2'/33 - /'ap-southeast-2'/34]
│ │ │ └── [/'ap-southeast-2'/39 - /'ap-southeast-2'/40]
│ │ │ ├── [/'ap-southeast-2'/33 - /'ap-southeast-2'/33]
│ │ │ └── [/'ap-southeast-2'/39 - /'ap-southeast-2'/39]
│ │ ├── limit: 5
│ │ └── flags: force-index=a_idx
│ └── scan regional_by_row_table_as4@a_idx
│ ├── constraint: /16/15/14
│ │ ├── [/'ca-central-1'/1 - /'ca-central-1'/2]
│ │ ├── [/'ca-central-1'/4 - /'ca-central-1'/6]
│ │ ├── [/'ca-central-1'/8 - /'ca-central-1'/8]
│ │ ├── [/'ca-central-1'/10 - /'ca-central-1'/12]
│ │ ├── [/'ca-central-1'/14 - /'ca-central-1'/19]
│ │ ├── [/'ca-central-1'/22 - /'ca-central-1'/25]
│ │ ├── [/'ca-central-1'/1 - /'ca-central-1'/1]
│ │ ├── [/'ca-central-1'/4 - /'ca-central-1'/4]
│ │ ├── [/'ca-central-1'/10 - /'ca-central-1'/10]
│ │ ├── [/'ca-central-1'/16 - /'ca-central-1'/16]
│ │ ├── [/'ca-central-1'/19 - /'ca-central-1'/19]
│ │ ├── [/'ca-central-1'/22 - /'ca-central-1'/22]
│ │ ├── [/'ca-central-1'/25 - /'ca-central-1'/25]
│ │ ├── [/'ca-central-1'/28 - /'ca-central-1'/28]
│ │ ├── [/'ca-central-1'/30 - /'ca-central-1'/30]
│ │ ├── [/'ca-central-1'/33 - /'ca-central-1'/34]
│ │ ├── [/'ca-central-1'/39 - /'ca-central-1'/40]
│ │ ├── [/'us-east-1'/1 - /'us-east-1'/2]
│ │ ├── [/'us-east-1'/4 - /'us-east-1'/6]
│ │ ├── [/'ca-central-1'/34 - /'ca-central-1'/34]
│ │ ├── [/'ca-central-1'/40 - /'ca-central-1'/40]
│ │ ├── [/'us-east-1'/2 - /'us-east-1'/2]
│ │ ├── [/'us-east-1'/5 - /'us-east-1'/5]
│ │ ├── [/'us-east-1'/8 - /'us-east-1'/8]
│ │ ├── [/'us-east-1'/10 - /'us-east-1'/12]
│ │ ├── [/'us-east-1'/14 - /'us-east-1'/19]
│ │ ├── [/'us-east-1'/22 - /'us-east-1'/25]
│ │ ├── [/'us-east-1'/28 - /'us-east-1'/28]
│ │ ├── [/'us-east-1'/30 - /'us-east-1'/30]
│ │ ├── [/'us-east-1'/33 - /'us-east-1'/34]
│ │ └── [/'us-east-1'/39 - /'us-east-1'/40]
│ │ ├── [/'us-east-1'/11 - /'us-east-1'/11]
│ │ ├── [/'us-east-1'/14 - /'us-east-1'/14]
│ │ ├── [/'us-east-1'/17 - /'us-east-1'/17]
│ │ └── [/'us-east-1'/23 - /'us-east-1'/23]
│ ├── limit: 5
│ └── flags: force-index=a_idx
└── aggregations
25 changes: 23 additions & 2 deletions pkg/sql/colenc/encode.go
@@ -59,6 +59,8 @@ type BatchEncoder struct {
// Slice of keys we can reuse across each call to Prepare and between each
// column family.
keys []roachpb.Key
// Slice of key prefixes so we don't have to re-encode the PK for each family.
savedPrefixes []roachpb.Key
// Slice of value we can reuse across each call to Prepare and between each
// column family.
values [][]byte
@@ -205,6 +207,8 @@ func (b *BatchEncoder) resetBuffers() {
b.extraKeys[row] = b.extraKeys[row][:0]
b.lastColIDs[row] = 0
}

b.savedPrefixes = nil
}

func intMax(a, b int) int {
@@ -389,6 +393,14 @@ func (b *BatchEncoder) encodePK(ctx context.Context, ind catalog.Index) error {
}
}

// If we have more than one family we have to copy the keys in order to
// re-use their prefixes because the putter routines will sort
// and mutate the kys slice.
if len(families) > 1 && b.savedPrefixes == nil {
b.savedPrefixes = make([]roachpb.Key, len(kys))
copy(b.savedPrefixes, kys)
}

// TODO(cucaroach): For updates overwrite makes this a plain put.
b.p.CPutTuplesEmpty(kys, values)

@@ -537,6 +549,15 @@ func (b *BatchEncoder) encodeSecondaryIndexWithFamilies(
continue
}
}

// If we have more than one family we have to copy the keys in order to
// re-use their prefixes because the putter routines will sort
// and mutate the kys slice.
if len(familyIDs) > 1 && b.savedPrefixes == nil {
b.savedPrefixes = make([]roachpb.Key, len(kys))
copy(b.savedPrefixes, kys)
}

// If we are looking at family 0, encode the data as BYTES, as it might
// include encoded primary key columns. For other families,
// use the tuple encoding for the value.
@@ -651,8 +672,8 @@ func (b *BatchEncoder) initFamily(familyIndex, familyID int) {
continue
}
offset := row * b.keyBufSize
// Save old slice.
prefix := kys[row][:b.keyPrefixOffsets[row]]
// Get a slice pointing to prefix bytes.
prefix := b.savedPrefixes[row][:b.keyPrefixOffsets[row]]
// Set slice to new space.
kys[row] = keyBuf[offset : offset : b.keyBufSize+offset]
// Append prefix.
21 changes: 21 additions & 0 deletions pkg/sql/colenc/encode_test.go
@@ -395,6 +395,27 @@ func TestColFamDropPKNot(t *testing.T) {
checkEqual(t, kvs1, kvs2)
}

func TestColFamilies(t *testing.T) {
defer leaktest.AfterTest(t)()
defer log.Scope(t).Close(t)
ctx := context.Background()
s, db, kvdb := serverutils.StartServer(t, testArgs)
defer s.Stopper().Stop(ctx)
r := sqlutils.MakeSQLRunner(db)
r.Exec(t, "CREATE TABLE t (id INT PRIMARY KEY, c1 INT NOT NULL, c2 INT NOT NULL, FAMILY cf1 (id, c1), FAMILY cf2(c2))")
sv := &s.ClusterSettings().SV
desc := desctestutils.TestingGetTableDescriptor(
kvdb, keys.SystemSQLCodec, "defaultdb", "public", "t")

row1 := []tree.Datum{tree.NewDInt(2), tree.NewDInt(1), tree.NewDInt(2)}
row2 := []tree.Datum{tree.NewDInt(1), tree.NewDInt(2), tree.NewDInt(1)}
kvs1, err1 := buildRowKVs([]tree.Datums{row1, row2}, desc, desc.PublicColumns(), sv)
require.NoError(t, err1)
kvs2, err2 := buildVecKVs([]tree.Datums{row1, row2}, desc, desc.PublicColumns(), sv)
require.NoError(t, err2)
checkEqual(t, kvs1, kvs2)
}

// TestColIDToRowIndexNull tests case where insert cols is subset of public columns.
func TestColIDToRowIndexNull(t *testing.T) {
defer leaktest.AfterTest(t)()
8 changes: 5 additions & 3 deletions pkg/sql/copy/copy_test.go
@@ -19,6 +19,7 @@ import (
"math/rand"
"net/url"
"regexp"
"sort"
"strconv"
"strings"
"testing"
@@ -155,16 +156,17 @@ func TestDataDriven(t *testing.T) {
}()
require.NoError(t, err)
vals := make([]driver.Value, 1)
var results strings.Builder
var results []string
for err = nil; err == nil; {
err = rows.Next(vals)
if err == io.EOF {
break
}
require.NoError(t, err)
results.WriteString(fmt.Sprintf("%v\n", vals[0]))
results = append(results, fmt.Sprintf("%v", vals[0]))
}
return results.String()
sort.Strings(results)
return strings.Join(results, "\n")
}
case "copy-to", "copy-to-error":
var buf bytes.Buffer
52 changes: 52 additions & 0 deletions pkg/sql/copy/testdata/copy_from
@@ -660,3 +660,55 @@ COPY tcomp FROM STDIN
1 (1, 2)
----
CPut /Table/<>/1/1/0 -> /TUPLE/

# Regression test for #103220
exec-ddl
CREATE TABLE tfam (id INT PRIMARY KEY, c1 INT NOT NULL, c2 INT NOT NULL, FAMILY cf1 (id, c1), FAMILY cf2(c2))
----

copy-from-kvtrace
COPY tfam FROM STDIN QUOTE '"' CSV
2,1,2
1,2,1
----
CPut /Table/<>/1/1/0 -> /TUPLE/2:2:Int/2
CPut /Table/<>/1/1/1/1 -> /INT/1
CPut /Table/<>/1/2/0 -> /TUPLE/2:2:Int/1
CPut /Table/<>/1/2/1/1 -> /INT/2

query
SELECT * FROM tfam
----
1|2|1
2|1|2


exec-ddl
CREATE TABLE tfam2 (id INT PRIMARY KEY, c1 INT NOT NULL, c2 INT NOT NULL, c3 INT NOT NULL, FAMILY cf1 (id, c1), FAMILY cf2(c2), FAMILY cf3(c3), INDEX(c2,c1,c3))
----

copy-from-kvtrace
COPY tfam2 FROM STDIN QUOTE '"' CSV
2,1,2,3
1,2,1,4
3,5,2,1
----
CPut /Table/<>/1/1/0 -> /TUPLE/2:2:Int/2
CPut /Table/<>/1/1/1/1 -> /INT/1
CPut /Table/<>/1/1/2/1 -> /INT/4
CPut /Table/<>/1/2/0 -> /TUPLE/2:2:Int/1
CPut /Table/<>/1/2/1/1 -> /INT/2
CPut /Table/<>/1/2/2/1 -> /INT/3
CPut /Table/<>/1/3/0 -> /TUPLE/2:2:Int/5
CPut /Table/<>/1/3/1/1 -> /INT/2
CPut /Table/<>/1/3/2/1 -> /INT/1
InitPut /Table/<>/2/1/2/4/1/0 -> /BYTES/
InitPut /Table/<>/2/2/1/3/2/0 -> /BYTES/
InitPut /Table/<>/2/2/5/1/3/0 -> /BYTES/

query
SELECT * FROM tfam2
----
1|2|1|4
2|1|2|3
3|5|2|1
4 changes: 4 additions & 0 deletions pkg/sql/exec_util.go
@@ -3542,6 +3542,10 @@ func (m *sessionDataMutator) SetOptimizerHoistUncorrelatedEqualitySubqueries(val
m.data.OptimizerHoistUncorrelatedEqualitySubqueries = val
}

func (m *sessionDataMutator) SetOptimizerUseImprovedComputedColumnFiltersDerivation(val bool) {
m.data.OptimizerUseImprovedComputedColumnFiltersDerivation = val
}

func (m *sessionDataMutator) SetEnableCreateStatsUsingExtremes(val bool) {
m.data.EnableCreateStatsUsingExtremes = val
}
