sql: Add column family support for secondary indexes
This PR adds support for stored columns in secondary indexes to respect
column families.

Secondary indexes now respect the family definitions applied
to tables, and split a row's secondary index k/v pairs into multiple
pairs depending on the family and stored column configurations.

This PR describes an extension of the primary index column family
encoding to secondary indexes, and implements it.

This encoding was implemented by updating how secondary indexes are
encoded and decoded in `EncodeSecondaryIndex` and the fetchers.

This change will not take effect until all nodes in the cluster are
running at least version 20.1.

Release note (sql change): Allow stored columns in secondary indexes
to respect the column family definitions of the table the index is on.
rohany committed Dec 4, 2019
1 parent 9ea9c6a commit 1bcc7b8
Showing 27 changed files with 1,205 additions and 519 deletions.
2 changes: 1 addition & 1 deletion docs/generated/settings/settings.html
@@ -60,6 +60,6 @@
<tr><td><code>trace.debug.enable</code></td><td>boolean</td><td><code>false</code></td><td>if set, traces for recent requests can be seen in the /debug page</td></tr>
<tr><td><code>trace.lightstep.token</code></td><td>string</td><td><code></code></td><td>if set, traces go to Lightstep using this token</td></tr>
<tr><td><code>trace.zipkin.collector</code></td><td>string</td><td><code></code></td><td>if set, traces go to the given Zipkin instance (example: '127.0.0.1:9411'); ignored if trace.lightstep.token is set</td></tr>
<tr><td><code>version</code></td><td>custom validation</td><td><code>19.2-3</code></td><td>set the active cluster version in the format '<major>.<minor>'</td></tr>
<tr><td><code>version</code></td><td>custom validation</td><td><code>19.2-4</code></td><td>set the active cluster version in the format '<major>.<minor>'</td></tr>
</tbody>
</table>
58 changes: 50 additions & 8 deletions docs/tech-notes/encoding.md
@@ -283,6 +283,12 @@ Users also specify whether a secondary index should be unique. Unique
secondary indexes constrain the table data not to have two rows where,
for each indexed column, the data therein are non-null and equal.

As of #42073 (after version 19.2), secondary indexes have been extended to
support column families. These families are the same as the ones defined
on the table, and they apply to the stored columns in the index.
Like in primary indexes, column family 0 on a secondary index will always be
present for a row so that each row in the index has at least one k/v entry.

### Key encoding

The main encoding function for secondary indexes is
@@ -299,8 +305,8 @@ mirroring the primary index encoding:
5. If the index is non-unique or the row has a NULL in an indexed
column, and the index uses the old format for stored columns, data
from where the row intersects the stored columns
6. Zero (instead of the column family ID; all secondary KV pairs are
sentinels).
6. The column family ID.
7. When the previous field is nonzero (non-sentinel), its length in bytes.

Unique indexes relegate the data in extra columns to KV values so that
the KV layer detects constraint violations. The special case for an
@@ -311,21 +317,37 @@ achieved by including the non-indexed primary key data. For the sake of
simplicity, data in stored columns are also included.

### Value encoding
KV values for secondary indexes are encoded using the following rules:

KV values for secondary indexes have value type `BYTES` and consist of:
If the value corresponds to column family 0:

The KV value will have value type `BYTES` and will consist of:
1. If the index is unique, data from where the row intersects the
non-indexed primary key (implicit) columns, encoded as in the KV key
2. If the index is unique, and the index uses the old format for stored
columns, data from where the row intersects the stored columns,
encoded as in the KV key
3. If needed, `TUPLE`-encoded bytes for non-null composite and stored
column data (new format).
column data in family 0 (new format).

Since column family 0 is always included, it contains extra information
that the index stores in the value, such as composite column values and
stored primary key columns. All of these fields are optional, so the
`BYTES` value may be empty. Note that, in a unique index, rows with a NULL
in an indexed column have their implicit column data stored in both the
KV key and the KV value. (Ditto for stored column data in the old format.)

For indexes with more than one column family, the remaining column families'
KV values will have value type `TUPLE` and will consist of all stored
columns in that family in the `TUPLE` encoded format.

All of these fields are optional, so the `BYTES` value may be empty.
Note that, in a unique index, rows with a NULL in an indexed column have
their implicit column data stored in both the KV key and the KV value.
(Ditto for stored column data in the old format.)
### Backwards Compatibility With Indexes Encoded Without Families

Index descriptors hold a version bit that denotes which encoding
format the descriptor was written in. The default value of the bit denotes
the original secondary index encoding; indexes created once all nodes in a
cluster are running version 20.1 or later are given the version representing
secondary indexes with column families.
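The version choice described above can be sketched as a comparison against the cluster version gate that this commit adds (`VersionSecondaryIndexColumnFamilies`, 19.2-4). The `clusterVersion` type, the constant names, and the values 0 and 1 for the version bit are all hypothetical stand-ins, not the descriptor's actual fields.

```go
package main

import "fmt"

// clusterVersion is a hypothetical stand-in for roachpb.Version.
type clusterVersion struct {
	Major, Minor, Unstable int32
}

// less reports whether v orders before o.
func (v clusterVersion) less(o clusterVersion) bool {
	if v.Major != o.Major {
		return v.Major < o.Major
	}
	if v.Minor != o.Minor {
		return v.Minor < o.Minor
	}
	return v.Unstable < o.Unstable
}

// Hypothetical values for the index descriptor's version bit.
const (
	baseIndexFormatVersion            = 0 // original encoding (the default)
	secondaryIndexFamilyFormatVersion = 1 // stored columns split by family
)

// secondaryIndexColumnFamilies mirrors the version gate added in this commit.
var secondaryIndexColumnFamilies = clusterVersion{Major: 19, Minor: 2, Unstable: 4}

// indexVersionFor picks the encoding version for a newly created index based
// on the cluster's active version.
func indexVersionFor(active clusterVersion) int {
	if active.less(secondaryIndexColumnFamilies) {
		return baseIndexFormatVersion
	}
	return secondaryIndexFamilyFormatVersion
}

func main() {
	fmt.Println(indexVersionFor(clusterVersion{Major: 19, Minor: 2, Unstable: 3})) // 0
	fmt.Println(indexVersionFor(clusterVersion{Major: 19, Minor: 2, Unstable: 4})) // 1
}
```

Gating on the active cluster version rather than the node's binary version keeps older nodes, which only understand the original format, from encountering indexes they cannot decode.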

### Example dump

@@ -453,6 +475,26 @@ Index ID 3 is the non-unique secondary index `i3`.
^------ ^ ^-
Indexed column Implicit column BYTES

### Example dump with families
```
CREATE TABLE t (
a INT, b INT, c INT, d INT, e INT, f INT,
PRIMARY KEY (a, b),
UNIQUE INDEX i (d, e) STORING (c, f),
FAMILY (a, b, c), FAMILY (d, e), FAMILY (f)
);
INSERT INTO t VALUES (1, 2, 3, 4, 5, 6);
/Table/52/2/4/5/0/1572546219.386986000,0 : 0xBDD6D93003898A3306
^-- ^ ^_^_______
Indexed cols Column family 0 BYTES Stored PK cols + column c
// Notice that /Table/52/2/4/5/1/1/ is not present: family 1 holds only the indexed columns d and e, which are already encoded in the key
/Table/52/2/4/5/2/1/1572546219.386986000,0 : 0x46CC99AE0A630C
^__ ^_^___
Column Family 2 TUPLE column f
```

### Composite encoding

Secondary indexes use key encoding for all indexed columns, implicit
2 changes: 1 addition & 1 deletion pkg/ccl/importccl/read_import_mysql.go
@@ -448,7 +448,7 @@ func mysqlTableToCockroach(
stmt.Defs = append(stmt.Defs, c)
}

desc, err := MakeSimpleTableDescriptor(evalCtx.Ctx(), nil, stmt, parentID, id, fks, time.WallTime)
desc, err := MakeSimpleTableDescriptor(evalCtx.Ctx(), evalCtx.Settings, stmt, parentID, id, fks, time.WallTime)
if err != nil {
return nil, nil, err
}
2 changes: 1 addition & 1 deletion pkg/server/settingsworker.go
@@ -44,7 +44,7 @@ func (s *Server) refreshSettings() {
{
types := []types.T{tbl.Columns[0].Type}
nameRow := make([]sqlbase.EncDatum, 1)
_, matches, err := sqlbase.DecodeIndexKey(tbl, &tbl.PrimaryIndex, types, nameRow, nil, kv.Key)
_, matches, _, err := sqlbase.DecodeIndexKey(tbl, &tbl.PrimaryIndex, types, nameRow, nil, kv.Key)
if err != nil {
return errors.Wrap(err, "failed to decode key")
}
9 changes: 9 additions & 0 deletions pkg/settings/cluster/cockroach_versions.go
@@ -48,6 +48,7 @@ const (
VersionStart20_1
VersionContainsEstimatesCounter
VersionChangeReplicasDemotion
VersionSecondaryIndexColumnFamilies

// Add new versions here (step one of two).

@@ -320,6 +321,14 @@ var versionsSingleton = keyedVersions([]keyedVersion{
Key: VersionChangeReplicasDemotion,
Version: roachpb.Version{Major: 19, Minor: 2, Unstable: 3},
},
{
// VersionSecondaryIndexColumnFamilies is https://github.com/cockroachdb/cockroach/pull/42073.
//
// It allows secondary indexes to respect table level column family definitions.
Key: VersionSecondaryIndexColumnFamilies,
Version: roachpb.Version{Major: 19, Minor: 2, Unstable: 4},
},

// Add new versions here (step two of two).

})
5 changes: 3 additions & 2 deletions pkg/settings/cluster/versionkey_string.go


59 changes: 35 additions & 24 deletions pkg/sql/colencoding/key_encoding.go
@@ -40,7 +40,7 @@ func DecodeIndexKeyToCols(
types []types.T,
colDirs []sqlbase.IndexDescriptor_Direction,
key roachpb.Key,
) (remainingKey roachpb.Key, matches bool, _ error) {
) (remainingKey roachpb.Key, matches bool, foundNull bool, _ error) {
var decodedTableID sqlbase.ID
var decodedIndexID sqlbase.IndexID
var err error
@@ -54,24 +54,28 @@ func DecodeIndexKeyToCols(
if i != 0 {
key, decodedTableID, decodedIndexID, err = sqlbase.DecodeTableIDIndexID(key)
if err != nil {
return nil, false, err
return nil, false, false, err
}
if decodedTableID != ancestor.TableID || decodedIndexID != ancestor.IndexID {
// We don't match. Return a key with the table ID / index ID we're
// searching for, so the caller knows what to seek to.
curPos := len(origKey) - len(key)
key = sqlbase.EncodeTableIDIndexID(origKey[:curPos], ancestor.TableID, ancestor.IndexID)
return key, false, nil
return key, false, false, nil
}
}

length := int(ancestor.SharedPrefixLen)
key, err = DecodeKeyValsToCols(vecs, idx, indexColIdx[:length], types[:length], colDirs[:length],
// We don't care about whether this call to DecodeKeyValsToCols found a null or not, because
// it is an interleaving ancestor.
var isNull bool
key, isNull, err = DecodeKeyValsToCols(vecs, idx, indexColIdx[:length], types[:length], colDirs[:length],
nil /* unseen */, key)
if err != nil {
return nil, false, err
return nil, false, false, err
}
indexColIdx, types, colDirs = indexColIdx[length:], types[length:], colDirs[length:]
foundNull = foundNull || isNull

// Consume the interleaved sentinel.
var ok bool
@@ -81,38 +85,40 @@ func DecodeIndexKeyToCols(
// one so the caller can seek to it.
curPos := len(origKey) - len(key)
key = encoding.EncodeInterleavedSentinel(origKey[:curPos])
return key, false, nil
return key, false, false, nil
}
}

key, decodedTableID, decodedIndexID, err = sqlbase.DecodeTableIDIndexID(key)
if err != nil {
return nil, false, err
return nil, false, false, err
}
if decodedTableID != desc.ID || decodedIndexID != index.ID {
// We don't match. Return a key with the table ID / index ID we're
// searching for, so the caller knows what to seek to.
curPos := len(origKey) - len(key)
key = sqlbase.EncodeTableIDIndexID(origKey[:curPos], desc.ID, index.ID)
return key, false, nil
return key, false, false, nil
}
}

key, err = DecodeKeyValsToCols(vecs, idx, indexColIdx, types, colDirs, nil /* unseen */, key)
var isNull bool
key, isNull, err = DecodeKeyValsToCols(vecs, idx, indexColIdx, types, colDirs, nil /* unseen */, key)
if err != nil {
return nil, false, err
return nil, false, false, err
}
foundNull = foundNull || isNull

// We're expecting a column family id next (a varint). If
// interleavedSentinel is actually next, then this key is for a child
// table.
if _, ok := encoding.DecodeIfInterleavedSentinel(key); ok {
curPos := len(origKey) - len(key)
key = encoding.EncodeNullDescending(origKey[:curPos])
return key, false, nil
return key, false, false, nil
}

return key, true, nil
return key, true, foundNull, nil
}

// DecodeKeyValsToCols decodes the values that are part of the key, writing the
@@ -123,6 +129,7 @@ func DecodeIndexKeyToCols(
// i will be removed from the set to facilitate tracking whether or not columns
// have been observed during decoding.
// See the analog in sqlbase/index_encoding.go.
// DecodeKeyValsToCols additionally returns whether a NULL was encountered when decoding.
func DecodeKeyValsToCols(
vecs []coldata.Vec,
idx uint16,
@@ -131,7 +138,8 @@ func DecodeKeyValsToCols(
directions []sqlbase.IndexDescriptor_Direction,
unseen *util.FastIntSet,
key []byte,
) ([]byte, error) {
) ([]byte, bool, error) {
foundNull := false
for j := range types {
enc := sqlbase.IndexDescriptor_ASC
if directions != nil {
@@ -141,33 +149,36 @@ func DecodeKeyValsToCols(
i := indexColIdx[j]
if i == -1 {
// Don't need the coldata - skip it.
key, err = skipTableKey(&types[j], key, enc)
key, err = SkipTableKey(&types[j], key, enc)
} else {
if unseen != nil {
unseen.Remove(i)
}
key, err = decodeTableKeyToCol(vecs[i], idx, &types[j], key, enc)
var isNull bool
key, isNull, err = decodeTableKeyToCol(vecs[i], idx, &types[j], key, enc)
foundNull = isNull || foundNull
}
if err != nil {
return nil, err
return nil, false, err
}
}
return key, nil
return key, foundNull, nil
}

// decodeTableKeyToCol decodes a value encoded by EncodeTableKey, writing the result
// to the idx'th slot of the input colexec.Vec.
// See the analog, DecodeTableKey, in sqlbase/column_type_encoding.go.
// decodeTableKeyToCol also returns whether or not the decoded value was NULL.
func decodeTableKeyToCol(
vec coldata.Vec, idx uint16, valType *types.T, key []byte, dir sqlbase.IndexDescriptor_Direction,
) ([]byte, error) {
) ([]byte, bool, error) {
if (dir != sqlbase.IndexDescriptor_ASC) && (dir != sqlbase.IndexDescriptor_DESC) {
return nil, errors.AssertionFailedf("invalid direction: %d", log.Safe(dir))
return nil, false, errors.AssertionFailedf("invalid direction: %d", log.Safe(dir))
}
var isNull bool
if key, isNull = encoding.DecodeIfNull(key); isNull {
vec.Nulls().SetNull(idx)
return key, nil
return key, true, nil
}
var rkey []byte
var err error
@@ -236,16 +247,16 @@ func decodeTableKeyToCol(
}
vec.Timestamp()[idx] = t
default:
return rkey, errors.AssertionFailedf("unsupported type %+v", log.Safe(valType))
return rkey, false, errors.AssertionFailedf("unsupported type %+v", log.Safe(valType))
}
return rkey, err
return rkey, false, err
}

// skipTableKey skips a value of type valType in key, returning the remainder
// SkipTableKey skips a value of type valType in key, returning the remainder
// of the key.
// TODO(jordan): each type could be optimized here.
// TODO(jordan): should use this approach in the normal row fetcher.
func skipTableKey(
func SkipTableKey(
valType *types.T, key []byte, dir sqlbase.IndexDescriptor_Direction,
) ([]byte, error) {
if (dir != sqlbase.IndexDescriptor_ASC) && (dir != sqlbase.IndexDescriptor_DESC) {