c2c: add fingerprinting for internal testing #89336
cc @cockroachdb/disaster-recovery
From an offline conversation with @dt and @stevendanna:
Until now we were planning to use a primitive such as

Each ExportRequest constructs an SST and ships it as part of the ExportResponse. For the purposes of fingerprinting, we could teach ExportRequest to funnel KVs into an fnv64 hasher instead of the sstWriter and return an XOR'ed hash of the span as part of the ExportResponse. This would eliminate the cost of transferring SSTs over the wire.

Furthermore, ExportRequests are splittable, so we can lean entirely on the DistSender to split, send and combine all the responses (XOR the hashes) without setting up a DistSQL flow. Since we hash every KV (along with its timestamp) and combine the hashes with XOR, the resulting hash is order agnostic and can be computed on the destination cluster without having to maintain any complicated batching logic.
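The order-agnostic property described above can be illustrated with a minimal sketch. The `kv` type, `hashKV` and `fingerprint` are hypothetical names invented for this example and are not CockroachDB APIs; the byte layout fed to the hasher is also an assumption. Only `hash/fnv` and `encoding/binary` from the Go standard library are used.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/fnv"
)

// kv is a hypothetical stand-in for an MVCC point key: key, timestamp, value.
type kv struct {
	key   string
	ts    int64
	value string
}

// hashKV returns the FNV-64 hash of one KV, covering key, timestamp and value.
// The encoding (length prefix, big-endian timestamp) is an assumption made for
// this sketch, not the encoding used by the real implementation.
func hashKV(e kv) uint64 {
	h := fnv.New64()
	var buf [8]byte
	binary.BigEndian.PutUint64(buf[:], uint64(len(e.key)))
	h.Write(buf[:])
	h.Write([]byte(e.key))
	binary.BigEndian.PutUint64(buf[:], uint64(e.ts))
	h.Write(buf[:])
	h.Write([]byte(e.value))
	return h.Sum64()
}

// fingerprint XORs the per-KV hashes of a span into a single aggregate.
func fingerprint(kvs []kv) uint64 {
	var fp uint64
	for _, e := range kvs {
		fp ^= hashKV(e)
	}
	return fp
}

func main() {
	a := kv{"a", 1, "x"}
	b := kv{"b", 2, "y"}
	c := kv{"c", 3, "z"}

	whole := fingerprint([]kv{a, b, c})
	reordered := fingerprint([]kv{c, a, b})
	// Two sub-spans fingerprinted independently, then combined by the sender.
	combined := fingerprint([]kv{a}) ^ fingerprint([]kv{b, c})

	fmt.Println(whole == reordered, whole == combined) // prints "true true"
}
```

Because XOR is commutative and associative, the per-range results can be combined in any order and with any split points. One consequence worth keeping in mind is that two identical KVs cancel each other out under XOR.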
This is a refactor-only change that pulls out the logic in `MVCCExportToSST` into `mvccExportToWriter` that accepts a `storage.Writer` interface. This will allow us to pass in a `FingerprintWriter` in a future commit.

Informs: cockroachdb#89336

Release note: None
This change introduces a fingerprintWriter that hashes every key/timestamp and value for point keys, and combines their hashes via a XOR into a running aggregate. Range keys are not fingerprinted but instead written to a pebble SST that is returned to the caller. This is because range keys do not have a stable, discrete identity and so it is up to the caller to define a deterministic fingerprinting scheme across all returned range keys.

The fingerprintWriter is used by `MVCCExportFingerprint`, which exports a fingerprint for point keys in the key range [StartKey, EndKey) over the interval (StartTS, EndTS]. The export logic used by `MVCCExportFingerprint` is the same that drives `MVCCExportToSST`. The former writes to a fingerprintWriter while the latter writes to an sstWriter. Currently, this method only supports using an `fnv64` hasher to fingerprint each KV.

This change does not wire `MVCCExportFingerprint` to ExportRequest command evaluation. This will be done as a followup.

Informs: cockroachdb#89336

Release note: None
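A rough sketch of the shape these two commits describe is shown below: one export loop that writes to an interface, and a fingerprinting implementation of that interface. Every name here (`exportWriter`, `pointKV`, `rangeKey`, `exportToWriter`) is hypothetical, the real `storage.Writer` interface is much larger, and a plain slice stands in for the pebble SST that carries range keys back to the caller.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// pointKV and rangeKey are simplified, hypothetical stand-ins for MVCC data.
type pointKV struct {
	key   string
	ts    int64
	value string
}

type rangeKey struct {
	start, end string
	ts         int64
}

// exportWriter is an assumed minimal interface for this sketch only.
type exportWriter interface {
	putPoint(kv pointKV)
	putRange(rk rangeKey)
}

// fingerprintWriter XORs a hash of each point key into a running aggregate
// and passes range keys through untouched.
type fingerprintWriter struct {
	fingerprint uint64
	rangeKeys   []rangeKey // stands in for the SST returned to the caller
}

func (f *fingerprintWriter) putPoint(kv pointKV) {
	h := fnv.New64()
	// Hash key, timestamp and value together; the format is an assumption.
	fmt.Fprintf(h, "%q/%d/%q", kv.key, kv.ts, kv.value)
	f.fingerprint ^= h.Sum64()
}

func (f *fingerprintWriter) putRange(rk rangeKey) {
	f.rangeKeys = append(f.rangeKeys, rk)
}

// exportToWriter feeds w every point key in [startKey, endKey) whose timestamp
// lies in (startTS, endTS]. As a simplification, all range keys are forwarded.
func exportToWriter(points []pointKV, ranges []rangeKey,
	startKey, endKey string, startTS, endTS int64, w exportWriter) {
	for _, kv := range points {
		inSpan := kv.key >= startKey && kv.key < endKey
		inTime := kv.ts > startTS && kv.ts <= endTS
		if inSpan && inTime {
			w.putPoint(kv)
		}
	}
	for _, rk := range ranges {
		w.putRange(rk)
	}
}

func main() {
	points := []pointKV{{"a", 5, "v1"}, {"b", 10, "v2"}, {"c", 20, "v3"}, {"z", 7, "v4"}}
	ranges := []rangeKey{{"a", "c", 8}}

	w := &fingerprintWriter{}
	// "a"@5 is excluded (StartTS is exclusive); "z" is outside the key span.
	exportToWriter(points, ranges, "a", "d", 5, 20, w)
	fmt.Printf("fingerprint=%x rangeKeys=%d\n", w.fingerprint, len(w.rangeKeys))
}
```

Keeping the export loop independent of the writer means an SST-producing writer and a fingerprinting writer can share the same iteration and filtering logic, which is the motivation the refactor commit gives.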
This change adds a `crdb_internal.fingerprint` builtin that accepts a `startTime`, `endTime`, `startKey` and `endKey` to define the interval the user wants to fingerprint. The builtin is powered by sending an ExportRequest with the defined intervals but with the `ExportFingerprint` option set to true.

Setting this option on the ExportRequest means that instead of writing all point and range keys to an SST and sending them back to the client, command evaluation will use the newly introduced `fingerprintWriter` (cockroachdb#90848) when exporting keys. This writer computes an `fnv64` hash of the key/timestamp and value for each point key and maintains a running XOR aggregate of all the point keys processed as part of the ExportRequest. Range keys are not fingerprinted during command evaluation, but instead returned to the client in a pebble SST. This is because range keys do not have a stable, discrete identity and so it is up to the caller to define a deterministic fingerprinting scheme across all returned range keys.

The ExportRequest sent as part of this builtin does not set any DistSender limit, thereby allowing concurrent execution across ranges. We are not concerned about the ExportResponses growing too large since the SSTs will only contain range keys, which should be few in number. If this assumption proves incorrect in the future, we can revisit setting a `TargetBytes` on the header of the BatchRequest.

Fixes: cockroachdb#89336

Release note (sql change): introduces a `crdb_internal.fingerprint` builtin that can be used to generate a `fnv64` fingerprint of keys (and optionally their revisions) in a given key/time interval.
`crdb_internal.scan(crdb_internal.tenant_span($1))` to support revisions
This can be marked as done once some of our C2C tests use the
The following technical approach was replaced with:
...but the goal is the same.
Previous plan
`crdb_internal.scan(crdb_internal.tenant_span($1))` returns the raw keys and values from the specified span. We would like to extend this generator to return timestamp-ordered revisions of each key. While this is useful as a standalone tool, the motivation behind this change is to drive the on-demand fingerprinting that we are developing for c2c replication. At a high level, this primitive will be executed by each processor on a pre-defined chunk of spans, the output of which will be fed to a checksum algorithm and sent downstream in the DistSQL flow.

As part of this issue, we should investigate whether we can add a mode to `ExportRequest` (a KV request that is already capable of reading and returning timestamp-ordered revisions of keys) that does not write these keys to an SST but returns them in a kvBuf slice that we can read from. We should benchmark and trace which parts of `ExportRequest` used during backup are "slow" and "unnecessary" for simply reading and returning revisions of all rows in a given time bound.

Epic: CRDB-21075
Jira issue: CRDB-20208