Merge #120784
120784: storage: add storage.sstable.compression_algorithm cluster setting r=jbowens a=jbowens

Introduce a new cluster setting that allows the operator to configure the compression algorithm used when compressing sstable blocks. This allows operators to opt into zstd (as opposed to the previous fixed choice of snappy). zstd typically achieves better compression ratios than snappy, and operators may find that they can achieve higher node densities by enabling zstd. Future releases may change the default compression algorithm.

In a side-by-side comparison of a 10000-warehouse tpcc import, the zstd cluster achieved a higher import speed of 146 MiB/s versus snappy's 135 MiB/s. The zstd cluster's physical database size was also roughly 30% smaller.

<img width="976" alt="Screenshot 2024-03-20 at 3 07 13 PM" src="https://github.com/cockroachdb/cockroach/assets/867352/a92a4d6a-135d-4e5c-9b38-794123e8fcec">
<img width="983" alt="Screenshot 2024-03-20 at 3 06 49 PM" src="https://github.com/cockroachdb/cockroach/assets/867352/275f9ecf-1783-4b7b-8159-a734a6275dea">

Informs #105568.
Epic: none
Release note (ops change): Add `storage.sstable.compression_algorithm` cluster setting that configures the compression algorithm to use when compressing sstable blocks.

Co-authored-by: Jackson Owens <[email protected]>
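
The selection logic this commit describes can be sketched in isolation. The block below is a minimal, self-contained illustration, not the CockroachDB implementation: the `Compression` type, its constants, `resolveCompression`, and the `versionGateActive` flag are hypothetical stand-ins for `pebble.Compression`, the enum setting value, and the 24.1 cluster-version check. It shows the one behavior that is easy to miss: selecting zstd before the version gate is active falls back to the default algorithm.

```go
package main

import "fmt"

// Compression is a hypothetical stand-in for pebble.Compression.
type Compression int

const (
	DefaultCompression Compression = iota
	SnappyCompression
	ZstdCompression
)

// String makes the sketch's output readable.
func (c Compression) String() string {
	switch c {
	case SnappyCompression:
		return "snappy"
	case ZstdCompression:
		return "zstd"
	default:
		return "default"
	}
}

// Enum values as listed for the setting: snappy = 1, zstd = 2.
const (
	algorithmSnappy int64 = 1
	algorithmZstd   int64 = 2
)

// resolveCompression maps the enum value of the setting to an algorithm.
// versionGateActive is an assumed boolean standing in for the cluster-version
// check; zstd is only returned once the gate is active.
func resolveCompression(setting int64, versionGateActive bool) Compression {
	switch setting {
	case algorithmSnappy:
		return SnappyCompression
	case algorithmZstd:
		if versionGateActive {
			return ZstdCompression
		}
		return DefaultCompression
	default:
		return DefaultCompression
	}
}

func main() {
	fmt.Println(resolveCompression(algorithmSnappy, false))
	fmt.Println(resolveCompression(algorithmZstd, false))
	fmt.Println(resolveCompression(algorithmZstd, true))
}
```

Under these assumptions, an operator who sets the value to zstd on a cluster that has not finalized the upgrade would keep getting the default algorithm until the gate becomes active, without needing to set the value again.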
craig[bot] and jbowens committed Mar 21, 2024
2 parents 19d874a + 99f8a25 commit 8dcbdc4
Showing 5 changed files with 54 additions and 1 deletion.
1 change: 1 addition & 0 deletions docs/generated/settings/settings-for-tenants.txt
@@ -328,6 +328,7 @@ sql.txn.read_committed_isolation.enabled boolean true set to true to allow trans
sql.txn_fingerprint_id_cache.capacity integer 100 the maximum number of txn fingerprint IDs stored application
storage.max_sync_duration duration 20s maximum duration for disk operations; any operations that take longer than this setting trigger a warning log entry or process crash system-visible
storage.max_sync_duration.fatal.enabled boolean true if true, fatal the process when a disk operation exceeds storage.max_sync_duration application
storage.sstable.compression_algorithm enumeration snappy "determines the compression algorithm to use when compressing sstable data blocks; supported values: ""snappy"", ""zstd"" [snappy = 1, zstd = 2]" system-visible
storage.value_blocks.enabled boolean true set to true to enable writing of value blocks in sstables application
timeseries.storage.resolution_10s.ttl duration 240h0m0s the maximum age of time series data stored at the 10 second resolution. Data older than this is subject to rollup and deletion. system-visible
timeseries.storage.resolution_30m.ttl duration 2160h0m0s the maximum age of time series data stored at the 30 minute resolution. Data older than this is subject to deletion. system-visible
1 change: 1 addition & 0 deletions docs/generated/settings/settings.html
@@ -281,6 +281,7 @@
<tr><td><div id="setting-storage-ingest-split-enabled" class="anchored"><code>storage.ingest_split.enabled</code></div></td><td>boolean</td><td><code>true</code></td><td>set to false to disable ingest-time splitting that lowers write-amplification</td><td>Dedicated/Self-Hosted</td></tr>
<tr><td><div id="setting-storage-max-sync-duration" class="anchored"><code>storage.max_sync_duration</code></div></td><td>duration</td><td><code>20s</code></td><td>maximum duration for disk operations; any operations that take longer than this setting trigger a warning log entry or process crash</td><td>Serverless/Dedicated/Self-Hosted (read-only)</td></tr>
<tr><td><div id="setting-storage-max-sync-duration-fatal-enabled" class="anchored"><code>storage.max_sync_duration.fatal.enabled</code></div></td><td>boolean</td><td><code>true</code></td><td>if true, fatal the process when a disk operation exceeds storage.max_sync_duration</td><td>Serverless/Dedicated/Self-Hosted</td></tr>
<tr><td><div id="setting-storage-sstable-compression-algorithm" class="anchored"><code>storage.sstable.compression_algorithm</code></div></td><td>enumeration</td><td><code>snappy</code></td><td>determines the compression algorithm to use when compressing sstable data blocks; supported values: &#34;snappy&#34;, &#34;zstd&#34; [snappy = 1, zstd = 2]</td><td>Serverless/Dedicated/Self-Hosted (read-only)</td></tr>
<tr><td><div id="setting-storage-value-blocks-enabled" class="anchored"><code>storage.value_blocks.enabled</code></div></td><td>boolean</td><td><code>true</code></td><td>set to true to enable writing of value blocks in sstables</td><td>Serverless/Dedicated/Self-Hosted</td></tr>
<tr><td><div id="setting-storage-wal-failover-unhealthy-op-threshold" class="anchored"><code>storage.wal_failover.unhealthy_op_threshold</code></div></td><td>duration</td><td><code>100ms</code></td><td>the latency of a WAL write considered unhealthy and triggers a failover to a secondary WAL location</td><td>Dedicated/Self-Hosted</td></tr>
<tr><td><div id="setting-timeseries-storage-enabled" class="anchored"><code>timeseries.storage.enabled</code></div></td><td>boolean</td><td><code>true</code></td><td>if set, periodic timeseries data is stored within the cluster; disabling is not recommended unless you are storing the data elsewhere</td><td>Dedicated/Self-Hosted</td></tr>
2 changes: 1 addition & 1 deletion pkg/ccl/backupccl/file_sst_sink_test.go
@@ -46,7 +46,7 @@ func TestFileSSTSinkExtendOneFile(t *testing.T) {

 	getKeys := func(prefix string, n int) []byte {
 		var b bytes.Buffer
-		sst := storage.MakeBackupSSTWriter(ctx, nil, &b)
+		sst := storage.MakeBackupSSTWriter(ctx, cluster.MakeTestingClusterSettings(), &b)
 		for i := 0; i < n; i++ {
 			require.NoError(t, sst.PutUnversioned([]byte(fmt.Sprintf("%s%08d", prefix, i)), nil))
 		}
49 changes: 49 additions & 0 deletions pkg/storage/pebble.go
@@ -132,6 +132,50 @@ var IngestAsFlushable = settings.RegisterBoolSetting(
 	util.ConstantWithMetamorphicTestBool(
 		"storage.ingest_as_flushable.enabled", true))

+const (
+	compressionAlgorithmSnappy int64 = 1
+	compressionAlgorithmZstd   int64 = 2
+)
+
+// compressionAlgorithm determines the compression algorithm used to compress
+// data blocks when writing sstables. Users should call getCompressionAlgorithm
+// rather than calling compressionAlgorithm.Get directly.
+var compressionAlgorithm = settings.RegisterEnumSetting(
+	// NB: We can't use settings.SystemOnly today because we may need to read the
+	// value from within a tenant building an sstable for AddSSTable.
+	settings.SystemVisible,
+	"storage.sstable.compression_algorithm",
+	`determines the compression algorithm to use when compressing sstable data blocks;`+
+		` supported values: "snappy", "zstd"`,
+	// TODO(jackson): Consider using a metamorphic constant here, but many tests
+	// will need to override it because they depend on a deterministic sstable
+	// size.
+	"snappy",
+	map[int64]string{
+		compressionAlgorithmSnappy: "snappy",
+		compressionAlgorithmZstd:   "zstd",
+	},
+	settings.WithPublic,
+)
+
+func getCompressionAlgorithm(ctx context.Context, settings *cluster.Settings) pebble.Compression {
+	switch compressionAlgorithm.Get(&settings.SV) {
+	case compressionAlgorithmSnappy:
+		return pebble.SnappyCompression
+	case compressionAlgorithmZstd:
+		// Pre-24.1 Pebble's implementation of zstd had bugs that could cause
+		// in-memory corruption. We require that the cluster version is 24.1 which
+		// implies that all nodes are running 24.1 code and will never run code
+		// < 24.1 again.
+		if settings.Version.ActiveVersionOrEmpty(ctx).IsActive(clusterversion.V24_1) {
+			return pebble.ZstdCompression
+		}
+		return pebble.DefaultCompression
+	default:
+		return pebble.DefaultCompression
+	}
+}
+
 // DO NOT set storage.single_delete.crash_on_invariant_violation.enabled or
 // storage.single_delete.crash_on_ineffectual.enabled to true.
 //
@@ -1025,6 +1069,11 @@ func newPebble(ctx context.Context, cfg PebbleConfig) (p *Pebble, err error) {
 	}
 	opts.FS = cfg.Env
 	opts.Lock = cfg.Env.DirectoryLock
+	for _, l := range opts.Levels {
+		l.Compression = func() sstable.Compression {
+			return getCompressionAlgorithm(ctx, cfg.Settings)
+		}
+	}
 	opts.EnsureDefaults()

 	// The context dance here is done so that we have a clean context without
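
A note on the `for _, l := range opts.Levels` loop in the hunk above: in Go, the second range variable is a copy of the slice element. If `opts.Levels` holds struct values rather than pointers (an assumption about the Pebble options type, not something this diff shows), assigning to `l.Compression` would update only the copy. The sketch below uses a hypothetical `levelOptions` type and helper names to contrast the two loop forms; it says nothing definitive about how Pebble behaves.

```go
package main

import "fmt"

// levelOptions is a hypothetical stand-in for a per-level options struct.
type levelOptions struct {
	Compression func() string
}

// setByValue mirrors the `for _, l := range levels` form: l is a copy of
// each element, so the assignment does not reach the slice.
func setByValue(levels []levelOptions, f func() string) {
	for _, l := range levels {
		l.Compression = f
		_ = l // the modified copy is discarded at the end of each iteration
	}
}

// setByIndex assigns through the slice element and so persists.
func setByIndex(levels []levelOptions, f func() string) {
	for i := range levels {
		levels[i].Compression = f
	}
}

func main() {
	zstd := func() string { return "zstd" }

	a := make([]levelOptions, 2)
	setByValue(a, zstd)
	fmt.Println(a[0].Compression == nil) // true: the slice element is unchanged

	b := make([]levelOptions, 2)
	setByIndex(b, zstd)
	fmt.Println(b[0].Compression()) // zstd
}
```

If the element type were a pointer, both forms would behave the same, so whether this matters for the commit depends on the actual definition of `Levels`.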
2 changes: 2 additions & 0 deletions pkg/storage/sst_writer.go
@@ -82,6 +82,7 @@ func MakeIngestionWriterOptions(ctx context.Context, cs *cluster.Settings) sstab
 		format = sstable.TableFormatPebblev4
 	}
 	opts := DefaultPebbleOptions().MakeWriterOptions(0, format)
+	opts.Compression = getCompressionAlgorithm(ctx, cs)
 	opts.MergerName = "nullptr"
 	return opts
 }
@@ -117,6 +118,7 @@ func MakeBackupSSTWriter(ctx context.Context, cs *cluster.Settings, f io.Writer)
 	// block checksums and more index entries are just overhead and smaller blocks
 	// reduce compression ratio.
 	opts.BlockSize = 128 << 10
+	opts.Compression = getCompressionAlgorithm(ctx, cs)
 	opts.MergerName = "nullptr"
 	return SSTWriter{
 		fw: sstable.NewWriter(&noopFinishAbort{f}, opts),