kvserver: AddSSTable option to write at current timestamp #70422
In CockroachDB, there exist processes that build and ingest sstables. These sstables have timestamp-suffixed MVCC keys. Today, these keys' timestamps are dated in the past and rewrite history, which violates invariants in parts of the system. We would like to support ingesting these sstables with recent, invariant-maintaining MVCC timestamps. However, ingestion is used during bulk operations, and rewriting every key of a large sstable with a recent MVCC timestamp is infeasibly expensive.

This change introduces a facility for constructing an sstable with a placeholder suffix. A caller using this facility specifies a SuffixPlaceholder write option and must also configure a Comparer with a non-nil Split function. When configured with a suffix placeholder, the sstable writer requires that every key's suffix (as determined by Split) exactly match the provided SuffixPlaceholder. An sstable constructed this way is incomplete and cannot be read unless explicitly permitted through the AllowUnreplacedSuffix option.

To complete an sstable constructed with a suffix placeholder, a caller may call ReplaceSuffix, providing the original placeholder value and the replacement value. The placeholder and replacement values must be of equal length. ReplaceSuffix performs an O(1) write to record the replacement value. After a suffix replacement the sstable is complete and sstable readers may read it. Readers detect the sstable property and apply a block transform that replaces suffix placeholders with the replacement value on the fly as blocks are loaded.

Informs cockroachdb/cockroach#70422.
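To make the placeholder mechanism described above concrete, here is a minimal toy sketch in Go. It is not Pebble's API: `splitAt`, `checkPlaceholder`, and `replaceSuffixes` are hypothetical names, the `@`-delimited suffix convention is an assumption made only for this illustration, and the "block" is just a slice of keys. It shows the two rules from the commit message: keys that carry a suffix must carry exactly the placeholder, and the replacement must have the same length as the placeholder.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

// splitAt returns the length of the key's prefix, i.e. the index at which
// the suffix begins. Assumption for this sketch only: the suffix starts at
// the last '@' byte, and a key without '@' has no suffix.
func splitAt(key []byte) int {
	if i := bytes.LastIndexByte(key, '@'); i >= 0 {
		return i
	}
	return len(key)
}

// checkPlaceholder models the writer-side rule: a key that has a suffix
// must have a suffix equal to the placeholder.
func checkPlaceholder(key, placeholder []byte) error {
	suffix := key[splitAt(key):]
	if len(suffix) > 0 && !bytes.Equal(suffix, placeholder) {
		return fmt.Errorf("key %q: suffix %q does not match placeholder %q",
			key, suffix, placeholder)
	}
	return nil
}

var errLength = errors.New("placeholder and replacement must have equal lengths")

// replaceSuffixes models the reader-side block transform: it returns copies
// of the keys with the placeholder suffix swapped for the replacement.
// Keys without a suffix are copied unchanged.
func replaceSuffixes(keys [][]byte, placeholder, replacement []byte) ([][]byte, error) {
	if len(placeholder) != len(replacement) {
		return nil, errLength
	}
	out := make([][]byte, 0, len(keys))
	for _, k := range keys {
		if err := checkPlaceholder(k, placeholder); err != nil {
			return nil, err
		}
		n := splitAt(k)
		nk := make([]byte, 0, len(k))
		nk = append(nk, k[:n]...)
		if n < len(k) {
			nk = append(nk, replacement...)
		}
		out = append(out, nk)
	}
	return out, nil
}
```

Because the replacement has the same length as the placeholder, every key keeps its size, so offsets within a block stay valid. That is presumably what lets the real implementation record the replacement once and transform blocks lazily instead of rewriting the file.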
We'll need to figure out how this should interact with concurrent transactions and locks/intents. See relevant discussion in #71676 (comment). Also related to #71697.
The read-time timestamp synthesis ran into challenges because of the variable-length MVCC timestamp encoding, see cockroachdb/pebble#1314 (review). We're exploring rewriting the SST timestamps during …
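For readers unfamiliar with why a variable-length encoding gets in the way of an equal-length suffix swap, here is a small sketch. The layout is an assumption, a simplified model of an MVCC timestamp suffix (8-byte big-endian wall time, a 4-byte logical counter only when it is non-zero, and a trailing length byte); `encodeTimestampSuffix` is a hypothetical helper, not a CockroachDB function. The point is only that two timestamps can encode to different lengths.

```go
package main

import "encoding/binary"

// encodeTimestampSuffix models a variable-length timestamp suffix.
// ASSUMED layout, simplified for illustration:
//   8 bytes  wall time, big-endian
//   4 bytes  logical counter, present only when non-zero
//   1 byte   total suffix length, including this byte
func encodeTimestampSuffix(wallNanos uint64, logical uint32) []byte {
	var wall [8]byte
	binary.BigEndian.PutUint64(wall[:], wallNanos)
	buf := append([]byte{}, wall[:]...)
	if logical != 0 {
		var l [4]byte
		binary.BigEndian.PutUint32(l[:], logical)
		buf = append(buf, l[:]...)
	}
	// The length byte counts itself, hence len(buf)+1.
	return append(buf, byte(len(buf)+1))
}

// sameLength reports whether one suffix could replace the other under an
// equal-length constraint.
func sameLength(a, b []byte) bool { return len(a) == len(b) }
```

Under this model, a placeholder written without a logical component is 9 bytes, while a replacement timestamp with a non-zero logical component is 13 bytes, so `sameLength` is false and an in-place, equal-length substitution is not possible without padding or a different placeholder.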
As described in #69380, the fact that AddSSTable can write keys with timestamps far in the past is problematic, since it violates invariants that other parts of the system rely on, such as MVCC immutability and closed timestamps. To avoid this, AddSSTable should in the typical case write at a current (given) HLC timestamp. However, rewriting the keys with a new timestamp is costly, and the whole point of AddSSTable is to efficiently write bulk data. To get around this, we should introduce support for synthesizing the timestamp of the added keys at read-time, and only rewrite them during Pebble compactions.

Implementation details are intentionally underspecified here, and need to be explored. See also the RFC in #69380 for further info. Functionality that relies on writing at historical timestamps will need to be updated separately with this new behavior.
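Since the implementation is left open, the following is only a conceptual sketch of "synthesize at read time, rewrite at compaction", not a proposal for the actual design. All names (`entry`, `ingestedSST`, `writeTS`, `read`, `compact`) are hypothetical, and timestamps are plain integers rather than HLC timestamps.

```go
package main

// entry is a toy MVCC record; ts is the timestamp physically stored in the file.
type entry struct {
	key   string
	ts    uint64
	value string
}

// ingestedSST is a toy ingested file. writeTS is the timestamp assigned at
// ingestion time; zero means no synthesis is needed.
type ingestedSST struct {
	entries []entry
	writeTS uint64
}

// read returns the entries as readers should observe them: the stored
// timestamps are overridden by writeTS, and the stored data is not modified.
func (s *ingestedSST) read() []entry {
	out := make([]entry, len(s.entries))
	copy(out, s.entries)
	if s.writeTS != 0 {
		for i := range out {
			out[i].ts = s.writeTS
		}
	}
	return out
}

// compact materializes the synthesized timestamps into a new file, so that
// later reads of the result need no synthesis.
func (s *ingestedSST) compact() *ingestedSST {
	return &ingestedSST{entries: s.read()}
}
```

In this model ingestion stays cheap because it only sets `writeTS`; the per-key rewrite cost is deferred to `compact`, which has to rewrite the data anyway.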
Epic CRDB-2624