db: deprecate older format versions
This change deprecates format versions below `FormatFlushableIngest`
(the 23.1 version) and sstable formats below `TableFormatPebblev1`.

As part of this change, we remove code that deals with split keys;
further simplifications will be made separately (e.g. removing atomic
unit logic, simplifying the truncation iterator).

We also remove all code related to the `CURRENT` file.

The plan is to merge this after we tag a `v1.0.0` release, which will
for the time being be the recommended version series for users other
than CockroachDB.

Informs #3064
RaduBerinde committed Dec 19, 2023
1 parent 48b54c2 commit bbf7dc4
Showing 81 changed files with 577 additions and 2,367 deletions.
76 changes: 43 additions & 33 deletions README.md
@@ -86,17 +86,22 @@ differences.

## RocksDB Compatibility

Pebble strives for forward compatibility with RocksDB 6.2.1 (the latest
version of RocksDB used by CockroachDB). Forward compatibility means
that a DB generated by RocksDB can be used by Pebble. Currently, Pebble
provides bidirectional compatibility with RocksDB (a Pebble generated DB
can be used by RocksDB) when using its FormatMostCompatible format. New
functionality that is backwards incompatible is gated behind new format
major versions. In general, Pebble only provides compatibility with the
subset of functionality and configuration used by CockroachDB. The scope
of RocksDB functionality and configuration is too large to adequately
test and document all the incompatibilities. The list below contains
known incompatibilities.
Pebble strives for forward compatibility with RocksDB 6.2.1 (the latest version
of RocksDB used by CockroachDB). Forward compatibility means that a DB generated
by RocksDB 6.2.1 can be upgraded for use by Pebble. Pebble versions in the `v1`
series may open DBs generated by RocksDB 6.2.1. Since its introduction, Pebble
has adopted various backwards-incompatible format changes that are gated behind
new 'format major versions'. The Pebble `master` branch does not support opening
DBs generated by RocksDB. DBs generated by RocksDB may only be used with recent
versions of Pebble after migrating them through format major version upgrades
using previous versions of Pebble. See the below section of format major
versions.

Even the RocksDB-compatible versions of Pebble only provide compatibility with
the subset of functionality and configuration used by CockroachDB. The scope of
RocksDB functionality and configuration is too large to adequately test and
document all the incompatibilities. The list below contains known
incompatibilities.

* Pebble's use of WAL recycling is only compatible with RocksDB's
`kTolerateCorruptedTailRecords` WAL recovery mode. Older versions of
@@ -119,9 +124,14 @@ known incompatibilities.

Over time Pebble has introduced new physical file formats. Backwards
incompatible changes are made through the introduction of 'format major
versions'. By default, when Pebble opens a database, it defaults to
`FormatMostCompatible`. This version is bi-directionally compatible with RocksDB
6.2.1 (with the caveats described above).
versions'. By default, when Pebble opens a database, it defaults to the lowest
supported version. In `v1`, this is `FormatMostCompatible`, which is
bi-directionally compatible with RocksDB 6.2.1 (with the caveats described
above).

Databases created by RocksDB or Pebble versions `v1` and earlier must be upgraded
to a compatible format major version before running newer Pebble versions. Newer
Pebble versions will refuse to open databases in no longer supported formats.

To opt into new formats, a user may set `FormatMajorVersion` on the
[`Options`](https://pkg.go.dev/github.com/cockroachdb/pebble#Options)
@@ -132,24 +142,25 @@ upgrade the format major version at runtime using
Format major version upgrades are permanent; There is no option to
return to an earlier format.
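
As an illustration of the one-way upgrade rule described above, here is a minimal, self-contained Go sketch. It does not use Pebble itself: the `store` type, its `Ratchet` method, and `errDowngrade` are hypothetical stand-ins, and only the three version numbers are taken from the README table in this diff.

```go
package main

import (
	"errors"
	"fmt"
)

// FormatMajorVersion is a local stand-in for the idea of a format major
// version. The numeric values are copied from the README table; the rest
// of this file is hypothetical and is not Pebble's implementation.
type FormatMajorVersion uint64

const (
	FormatFlushableIngest        FormatMajorVersion = 13
	FormatDeleteSizedAndObsolete FormatMajorVersion = 15
	FormatVirtualSSTables        FormatMajorVersion = 16
)

// store is a toy model of an open database that remembers its version.
type store struct {
	version FormatMajorVersion
}

var errDowngrade = errors.New("format major version cannot be lowered")

// Ratchet raises the store's version to target. Requests for a lower
// version are rejected, modelling the "upgrades are permanent" rule.
func (s *store) Ratchet(target FormatMajorVersion) error {
	if target < s.version {
		return errDowngrade
	}
	s.version = target
	return nil
}

func main() {
	s := &store{version: FormatFlushableIngest}

	// Upgrading is allowed.
	fmt.Println(s.Ratchet(FormatVirtualSSTables), s.version)

	// Going back is refused and leaves the version unchanged.
	fmt.Println(s.Ratchet(FormatDeleteSizedAndObsolete), s.version)
}
```

In real code the equivalent steps would go through the `FormatMajorVersion` field on `Options` and the runtime upgrade call that the README links to; the sketch only models the monotonic behaviour.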

The table below outlines the history of format major versions:

| Name | Value | Migration |
|------------------------------------|-------|------------|
| FormatMostCompatible | 1 | No |
| FormatVersioned | 3 | No |
| FormatSetWithDelete | 4 | No |
| FormatBlockPropertyCollector | 5 | No |
| FormatSplitUserKeysMarked | 6 | Background |
| FormatSplitUserKeysMarkedCompacted | 7 | Blocking |
| FormatRangeKeys | 8 | No |
| FormatMinTableFormatPebblev1 | 9 | No |
| FormatPrePebblev1Marked | 10 | Background |
| FormatSSTableValueBlocks | 12 | No |
| FormatFlushableIngest | 13 | No |
| FormatPrePebblev1MarkedCompacted | 14 | Blocking |
| FormatDeleteSizedAndObsolete | 15 | No |
| FormatVirtualSSTables | 16 | No |
The table below outlines the history of format major versions, along with what
range of Pebble versions support that format.

| Name | Value | Migration | Pebble support |
|------------------------------------|-------|------------|----------------|
| FormatMostCompatible | 1 | No | v1 |
| FormatVersioned | 3 | No | v1 |
| FormatSetWithDelete | 4 | No | v1 |
| FormatBlockPropertyCollector | 5 | No | v1 |
| FormatSplitUserKeysMarked | 6 | Background | v1 |
| FormatSplitUserKeysMarkedCompacted | 7 | Blocking | v1 |
| FormatRangeKeys | 8 | No | v1 |
| FormatMinTableFormatPebblev1 | 9 | No | v1 |
| FormatPrePebblev1Marked | 10 | Background | v1 |
| FormatSSTableValueBlocks | 12 | No | v1 |
| FormatFlushableIngest | 13 | No | v1, master |
| FormatPrePebblev1MarkedCompacted | 14 | Blocking | v1, master |
| FormatDeleteSizedAndObsolete | 15 | No | v1, master |
| FormatVirtualSSTables | 16 | No | v1, master |
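
The "Pebble support" column can be read as a lower bound that `master` enforces when opening a database. The Go sketch below models that check under stated assumptions: `minSupportedOnMaster` and `checkOpenable` are hypothetical names, the error text is invented for illustration, and the version numbers come from the table above.

```go
package main

import "fmt"

// FormatMajorVersion is a local stand-in type; values follow the table.
type FormatMajorVersion uint64

const (
	FormatMostCompatible     FormatMajorVersion = 1
	FormatSSTableValueBlocks FormatMajorVersion = 12
	FormatFlushableIngest    FormatMajorVersion = 13
	FormatVirtualSSTables    FormatMajorVersion = 16
)

// minSupportedOnMaster is the oldest version the table marks as supported
// by master (hypothetical constant name).
const minSupportedOnMaster = FormatFlushableIngest

// checkOpenable reports whether a database stored at onDisk could be opened
// by a build that enforces minSupportedOnMaster.
func checkOpenable(onDisk FormatMajorVersion) error {
	if onDisk < minSupportedOnMaster {
		return fmt.Errorf(
			"format major version %d is no longer supported; upgrade to at least %d with a v1 release first",
			onDisk, minSupportedOnMaster)
	}
	return nil
}

func main() {
	for _, v := range []FormatMajorVersion{
		FormatMostCompatible, FormatSSTableValueBlocks,
		FormatFlushableIngest, FormatVirtualSSTables,
	} {
		fmt.Printf("version %d: %v\n", v, checkOpenable(v))
	}
}
```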

Upgrading to a format major version with 'Background' in the migration
column may trigger background activity to rewrite physical file
@@ -172,7 +183,6 @@ versions for CockroachDB releases.
| 22.2 | FormatMostCompatible | FormatPrePebblev1Marked |
| 23.1 | FormatSplitUserKeysMarkedCompacted | FormatFlushableIngest |
| 23.2 | FormatSplitUserKeysMarkedCompacted | FormatVirtualSSTables |
| 24.1 plan | FormatSSTableValueBlocks | |

## Pedigree

6 changes: 0 additions & 6 deletions batch.go
@@ -485,9 +485,6 @@ func (b *Batch) refreshMemTableSize() error {
}
b.memTableSize += memTableEntrySize(len(key), len(value))
}
if b.countRangeKeys > 0 && b.minimumFormatMajorVersion < FormatRangeKeys {
b.minimumFormatMajorVersion = FormatRangeKeys
}
return nil
}

@@ -968,9 +965,6 @@ func (b *Batch) rangeKeySetDeferred(startLen, internalValueLen int) *DeferredBat

func (b *Batch) incrementRangeKeysCount() {
b.countRangeKeys++
if b.minimumFormatMajorVersion < FormatRangeKeys {
b.minimumFormatMajorVersion = FormatRangeKeys
}
if b.index != nil {
b.rangeKeys = nil
b.rangeKeysSeqNum = 0
7 changes: 1 addition & 6 deletions checkpoint.go
@@ -411,17 +411,12 @@ func (d *DB) writeCheckpointManifest(
return err
}

// Recent format versions use an atomic marker for setting the
// active manifest. Older versions use the CURRENT file. The
// setCurrentFunc function will return a closure that will
// take the appropriate action for the database's format
// version.
var manifestMarker *atomicfs.Marker
manifestMarker, _, err := atomicfs.LocateMarker(fs, destDirPath, manifestMarkerName)
if err != nil {
return err
}
if err := setCurrentFunc(formatVers, manifestMarker, fs, destDirPath, destDir)(manifestFileNum); err != nil {
if err := manifestMarker.Move(base.MakeFilename(fileTypeManifest, manifestFileNum)); err != nil {
return err
}
return manifestMarker.Close()
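
For readers unfamiliar with the marker approach that replaces the `CURRENT` file in this hunk, the following Go sketch shows the general idea: the active manifest is encoded in a marker file's name, and the marker with the highest iteration number wins. The `marker.<name>.<iteration>.<value>` layout and the helper names `markerFilename` and `currentValue` are assumptions made for this sketch, not a statement of what `atomicfs` does internally.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// markerFilename builds a marker file name. The layout
// "marker.<name>.<iteration>.<value>" is assumed for illustration.
func markerFilename(name string, iter uint64, value string) string {
	return fmt.Sprintf("marker.%s.%06d.%s", name, iter, value)
}

// currentValue scans directory entries and returns the value carried by
// the marker with the highest iteration number for the given name.
func currentValue(entries []string, name string) (value string, ok bool) {
	var best uint64
	for _, e := range entries {
		parts := strings.SplitN(e, ".", 4)
		if len(parts) != 4 || parts[0] != "marker" || parts[1] != name {
			continue // not a marker for this name
		}
		iter, err := strconv.ParseUint(parts[2], 10, 64)
		if err != nil {
			continue // malformed iteration number
		}
		if !ok || iter > best {
			best, value, ok = iter, parts[3], true
		}
	}
	return value, ok
}

func main() {
	// A directory listing in which a newer marker was written before the
	// older one was cleaned up.
	entries := []string{
		markerFilename("manifest", 1, "MANIFEST-000001"),
		markerFilename("manifest", 2, "MANIFEST-000005"),
		"OPTIONS-000003",
	}
	fmt.Println(currentValue(entries, "manifest"))
}
```

Because the value lives in the file name, switching manifests needs only a file creation followed by a removal, with no in-place rewrite of a shared file.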
21 changes: 4 additions & 17 deletions compaction.go
@@ -1277,22 +1277,16 @@ func (c *compaction) newInputIter(
newIters tableNewIters, newRangeKeyIter keyspan.TableNewSpanIter, snapshots []uint64,
) (_ internalIterator, retErr error) {
// Validate the ordering of compaction input files for defense in depth.
// TODO(jackson): Some of the CheckOrdering calls may be adapted to pass
// ProhibitSplitUserKeys if we thread the active format major version in. Or
// if we remove support for earlier FMVs, we can remove the parameter
// altogether.
if len(c.flushing) == 0 {
if c.startLevel.level >= 0 {
err := manifest.CheckOrdering(c.cmp, c.formatKey,
manifest.Level(c.startLevel.level), c.startLevel.files.Iter(),
manifest.AllowSplitUserKeys)
manifest.Level(c.startLevel.level), c.startLevel.files.Iter())
if err != nil {
return nil, err
}
}
err := manifest.CheckOrdering(c.cmp, c.formatKey,
manifest.Level(c.outputLevel.level), c.outputLevel.files.Iter(),
manifest.AllowSplitUserKeys)
manifest.Level(c.outputLevel.level), c.outputLevel.files.Iter())
if err != nil {
return nil, err
}
@@ -1302,9 +1296,7 @@ func (c *compaction) newInputIter(
}
for _, info := range c.startLevel.l0SublevelInfo {
err := manifest.CheckOrdering(c.cmp, c.formatKey,
info.sublevel, info.Iter(),
// NB: L0 sublevels have never allowed split user keys.
manifest.ProhibitSplitUserKeys)
info.sublevel, info.Iter())
if err != nil {
return nil, err
}
@@ -1316,8 +1308,7 @@ func (c *compaction) newInputIter(
}
interLevel := c.extraLevels[0]
err := manifest.CheckOrdering(c.cmp, c.formatKey,
manifest.Level(interLevel.level), interLevel.files.Iter(),
manifest.AllowSplitUserKeys)
manifest.Level(interLevel.level), interLevel.files.Iter())
if err != nil {
return nil, err
}
@@ -3173,10 +3164,6 @@ func (d *DB) runCompaction(
}

writerOpts := d.opts.MakeWriterOptions(c.outputLevel.level, tableFormat)
if formatVers < FormatBlockPropertyCollector {
// Cannot yet write block properties.
writerOpts.BlockPropertyCollectors = nil
}

// prevPointKey is a sstable.WriterOption that provides access to
// the last point key written to a writer's sstable. When a new
7 changes: 2 additions & 5 deletions compaction_iter.go
@@ -753,12 +753,9 @@ func (i *compactionIter) setNext() {
i.valid = true
i.maybeZeroSeqnum(i.curSnapshotIdx)

// There are two cases where we can early return and skip the remaining
// If this key is already a SETWITHDEL we can early return and skip the remaining
// records in the stripe:
// - If the DB does not SETWITHDEL.
// - If this key is already a SETWITHDEL.
if i.formatVersion < FormatSetWithDelete ||
i.iterKey.Kind() == InternalKeyKindSetWithDelete {
if i.iterKey.Kind() == InternalKeyKindSetWithDelete {
i.skip = true
return
}
7 changes: 1 addition & 6 deletions compaction_iter_test.go
@@ -90,9 +90,6 @@ func TestCompactionIter(t *testing.T) {
// The input to the data-driven test is dependent on the format major
// version we are testing against.
fileFunc := func(formatVersion FormatMajorVersion) string {
if formatVersion < FormatSetWithDelete {
return "testdata/compaction_iter"
}
if formatVersion < FormatDeleteSizedAndObsolete {
return "testdata/compaction_iter_set_with_del"
}
@@ -330,9 +327,7 @@ func TestCompactionIter(t *testing.T) {
// Rather than testing against all format version, we test against the
// significant boundaries.
formatVersions := []FormatMajorVersion{
FormatMostCompatible,
FormatSetWithDelete - 1,
FormatSetWithDelete,
FormatMinSupported,
internalFormatNewest,
}
for _, formatVersion := range formatVersions {
52 changes: 15 additions & 37 deletions compaction_test.go
@@ -1480,63 +1480,41 @@ func TestManualCompaction(t *testing.T) {

testCases := []struct {
testData string
minVersion FormatMajorVersion
maxVersion FormatMajorVersion // inclusive
minVersion FormatMajorVersion // inclusive, FormatMinSupported if unspecified.
maxVersion FormatMajorVersion // inclusive, internalFormatNewest if unspecified.
verbose bool
}{
{
testData: "testdata/manual_compaction",
minVersion: FormatMostCompatible,
maxVersion: FormatSetWithDelete - 1,
testData: "testdata/singledel_manual_compaction_set_with_del",
},
{
testData: "testdata/manual_compaction_set_with_del",
minVersion: FormatBlockPropertyCollector,
// This test exercises split user keys.
maxVersion: FormatSplitUserKeysMarkedCompacted - 1,
},
{
testData: "testdata/singledel_manual_compaction",
minVersion: FormatMostCompatible,
maxVersion: FormatSetWithDelete - 1,
},
{
testData: "testdata/singledel_manual_compaction_set_with_del",
minVersion: FormatSetWithDelete,
maxVersion: internalFormatNewest,
},
{
testData: "testdata/manual_compaction_range_keys",
minVersion: FormatRangeKeys,
maxVersion: internalFormatNewest,
verbose: true,
},
{
testData: "testdata/manual_compaction_file_boundaries",
minVersion: FormatBlockPropertyCollector,
// This test exercises split user keys.
maxVersion: FormatSplitUserKeysMarkedCompacted - 1,
testData: "testdata/manual_compaction_range_keys",
verbose: true,
},
{
testData: "testdata/manual_compaction_file_boundaries_delsized",
minVersion: FormatDeleteSizedAndObsolete,
maxVersion: internalFormatNewest,
},
{
testData: "testdata/manual_compaction_set_with_del_sstable_Pebblev4",
minVersion: FormatDeleteSizedAndObsolete,
maxVersion: internalFormatNewest,
},
{
testData: "testdata/manual_compaction_multilevel",
minVersion: FormatMostCompatible,
maxVersion: internalFormatNewest,
testData: "testdata/manual_compaction_multilevel",
},
}

for _, tc := range testCases {
t.Run(tc.testData, func(t *testing.T) {
runTest(t, tc.testData, tc.minVersion, tc.maxVersion, tc.verbose)
minVersion, maxVersion := tc.minVersion, tc.maxVersion
if minVersion == 0 {
minVersion = FormatMinSupported
}
if maxVersion == 0 {
maxVersion = internalFormatNewest
}

runTest(t, tc.testData, minVersion, maxVersion, tc.verbose)
})
}
}
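
The new test table relies on Go zero values: a test case that leaves `minVersion` or `maxVersion` unset gets the full supported range. A standalone sketch of that defaulting pattern follows. The numeric values given to `FormatMinSupported` and `internalFormatNewest` are assumptions inferred from the commit message and the README table, and `versionRange` is a hypothetical helper that does not appear in the diff.

```go
package main

import "fmt"

type FormatMajorVersion uint64

const (
	// Assumed values for illustration only.
	FormatMinSupported   FormatMajorVersion = 13
	internalFormatNewest FormatMajorVersion = 16
)

type testCase struct {
	testData   string
	minVersion FormatMajorVersion // inclusive; zero means FormatMinSupported
	maxVersion FormatMajorVersion // inclusive; zero means internalFormatNewest
}

// versionRange resolves unset (zero) bounds to the defaults. This only works
// because zero is never a valid format major version.
func versionRange(tc testCase) (lo, hi FormatMajorVersion) {
	lo, hi = tc.minVersion, tc.maxVersion
	if lo == 0 {
		lo = FormatMinSupported
	}
	if hi == 0 {
		hi = internalFormatNewest
	}
	return lo, hi
}

func main() {
	cases := []testCase{
		{testData: "testdata/manual_compaction_multilevel"},
		{testData: "testdata/manual_compaction_file_boundaries_delsized", minVersion: 15},
	}
	for _, tc := range cases {
		lo, hi := versionRange(tc)
		fmt.Printf("%s: versions %d through %d\n", tc.testData, lo, hi)
	}
}
```

The trade-off is that a zero bound can no longer be expressed explicitly, which is acceptable here because the table never needs one.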
39 changes: 0 additions & 39 deletions data_test.go
@@ -504,10 +504,6 @@ func runBuildRemoteCmd(td *datadriven.TestData, d *DB, storage remote.Storage) e
switch cmdArg.Key {
case "format":
switch cmdArg.Vals[0] {
case "leveldb":
tableFormat = sstable.TableFormatLevelDB
case "rocksdbv2":
tableFormat = sstable.TableFormatRocksDBv2
case "pebblev1":
tableFormat = sstable.TableFormatPebblev1
case "pebblev2":
@@ -594,10 +590,6 @@ func runBuildCmd(td *datadriven.TestData, d *DB, fs vfs.FS) error {
switch cmdArg.Key {
case "format":
switch cmdArg.Vals[0] {
case "leveldb":
tableFormat = sstable.TableFormatLevelDB
case "rocksdbv2":
tableFormat = sstable.TableFormatRocksDBv2
case "pebblev1":
tableFormat = sstable.TableFormatPebblev1
case "pebblev2":
@@ -1304,37 +1296,6 @@ func runIngestExternalCmd(td *datadriven.TestData, d *DB, locator string) error
return nil
}

func runForceIngestCmd(td *datadriven.TestData, d *DB) error {
var paths []string
var level int
for _, arg := range td.CmdArgs {
switch arg.Key {
case "paths":
paths = append(paths, arg.Vals...)
case "level":
var err error
level, err = strconv.Atoi(arg.Vals[0])
if err != nil {
return err
}
}
}
_, err := d.ingest(paths, func(
tableNewIters,
keyspan.TableNewSpanIter,
IterOptions,
*Comparer,
*version,
int,
map[*compaction]struct{},
*fileMetadata,
bool,
) (int, *fileMetadata, error) {
return level, nil, nil
}, nil /* shared */, KeyRange{}, nil /* external */)
return err
}

func runLSMCmd(td *datadriven.TestData, d *DB) string {
d.mu.Lock()
defer d.mu.Unlock()