storage: add MVCCExportFingerprint method and fingerprintWriter #90848

adityamaru · 2022-10-28T16:32:02Z

This change introduces a fingerprintWriter that
hashes every key/timestamp and value for point keys, and
combines their hashes via a XOR into a running aggregate.
Range keys are not fingerprinted but instead written to a pebble SST that is
returned to the caller. This is because range keys do not have a stable,
discrete identity and so it is up to the caller to define a deterministic
fingerprinting scheme across all returned range keys.

The fingerprintWriter is used by MVCCExportFingerprint that
exports a fingerprint for point keys in the keyrange
[StartKey, EndKey) over the interval (StartTS, EndTS].
The export logic used by MVCCExportFingerprint is the same
that drives MVCCExportToSST. The former writes to a fingerprintWriter
while the latter writes to an sstWriter. Currently, this
method only support using an fnv64 hasher to fingerprint each KV.

This change does not wire MVCCExportFingerprint to ExportRequest
command evaluation. This will be done as a followup.

Informs: #89336

Release note: None

cockroach-teamcity · 2022-10-28T16:32:19Z

This change is

erikgrinaker · 2022-10-31T14:22:22Z

Just a heads-up that I'm still sick, so PR reviews are on a best-effort basis, but will try to get to this soon.

adityamaru · 2022-10-31T14:39:44Z

Just a heads-up that I'm still sick, so PR reviews are on a best-effort basis

No problem! no real rush here.

This change adds a `crdb_internal.fingerprint` builtin that accepts a `startTime`, `endTime`, `startKey` and `endKey` to define the interval the user wants to fingerprint. The builtin is powered by sending an ExportRequest with the defined intervals but with the `ExportFingerprint` option set to true. Setting this option on the ExportRequest means that instead of writing all point and rangekeys to an SST and sending them back to the client, command evaluation will use the newly introduced `fingerprintWriter` (cockroachdb#90848) when exporting keys. This writer computes an `fnv64` hash of the key/timestamp, value for each point key and maintains a running XOR aggregate of all the point keys processed as part of the ExportRequest. Rangekeys are not fingerprinted during command evaluation, but instead returned to the client in a pebble SST. This is because range keys do not have a stable, discrete identity and so it is up to the caller to define a deterministic ingerprinting scheme across all returned range keys. The ExportRequest sent as part of this builtin does not set any DistSender limit, thereby allowing concurrent execution across ranges. We are not concerned about the ExportResponses growing too large since the SSTs will only contain rangekeys that should be few in number. If this assumption is proved incorrect in the future, we can revisit setting a `TargetBytes` to the header of the BatchRequest. Fixes: cockroachdb#89336 Release note (sql change): introduces a `crdb_internal.fingerprint` builtin that can be used to generate a `fnv64` fingerprint of keys (and optionally their revisions) in a given key/time interval.

stevendanna

This seems like good incremental progress.

What are we thinking about regarding pagination and return size accounting? Right now, it looks like if we set TargetSize > 0 then the fingerprinting will return based on a size calculation that includes point keys that aren't actually be returned. Perhaps this might be what we want anyway just as a crude way to give admission control a change to slow down our scan of the data.

stevendanna · 2022-11-02T16:02:25Z

pkg/storage/fingerprint_writer.go

+	defer f.hasher.Reset()
+
+	// Hash the key/timestamp and value of the RawMVCC.
+	_, err := f.hasher.Write(key.Key)


This can happen in a follow-up, but I think for the purpose of tenant fingerprinting, we'll eventually want a way to specify whether the tenant prefix of the key and the checksum of the value are included in the fingerprint.

Oh good point, I hadn't thought about this. Let me file an issue for this so I don't forget - #91150

pkg/storage/fingerprint_writer.go

stevendanna · 2022-11-02T16:04:36Z

pkg/storage/fingerprint_writer.go

+
+// ApplyBatchRepr implements the Writer interface.
+func (f *fingerprintWriter) ApplyBatchRepr(repr []byte, sync bool) error {
+	panic("unimplemented")


I'm fine either way, but in production I'm not sure the fingerprinting would ever be worth panic'ing the node for. Perhaps this could be an assertion error?

Also, given how much of this interface is unimplemented, it kinda makes me wonder if we want a smaller interface that we could use as the argument to MVCCExportToSST. Perhaps called ExportWriter or something.

Good point, I like the idea of introducing a trimmed down ExportWriter interface. fwiw I don't think this would be a situation where we see an unexpected panic because these methods are called by us in mvccExportToWriter and we only use 3 of the many at the moment. It is possible someone forgets to test a change with the fingerprint option set on the ExportRequest so I am onboard with introducing a trimmed down interface.

added an ExportWriter interface that is passed into mvccExportToWriter so that we can avoid all those stub methods in the fingerprint writer.

adityamaru · 2022-11-02T18:25:10Z

What are we thinking about regarding pagination and return size accounting?

The plan is to actually not set any distsender limits, to begin with, which should mean that there is no size based pagination. This allows for range requests to be sent and processed concurrently by the dist sender. I could be convinced that this is not ideal but the assumption we're working under is that rangekeys are few and small in size, and so we never run the risk of our SSTs becoming too large in the ExportResponse even with 100k+ ranges.

Perhaps this might be what we want anyway just as a crude way to give admission control a change to slow down our scan of the data.

Are you referring to the ElasticCPULimiter here? I don't think thats tied to our TargetBytes?

stevendanna · 2022-11-03T17:40:50Z

The plan is to actually not set any distsender limits, to begin with, which should mean that there is no size based pagination. This allows for range requests to be sent and processed concurrently by the dist sender. I could be convinced that this is not ideal but the assumption we're working under is that rangekeys are few and small in size, and so we never run the risk of our SSTs becoming too large in the ExportResponse even with 100k+ ranges.

Works for me.

Are you referring to the ElasticCPULimiter here? I don't think thats tied to our TargetBytes?

No, I was just thinking about the admission queue in general. The fact that the size limit + pagination would force us to periodically stop working and send another request which then needs to be re-admitted (I think?) rather than having one request that is allowed to scan for as long as it pleases. But I'll admit to not knowing much about how admission control works currently. This was just speculation really.

sumeerbhola

(drive-by)
Even if the sst is small because only rangekeys are being written to it, we shouldn't be doing large scans without any CPU resource constraint. That is, this ought to integrate with the elastic cpu limiting, which means it needs to have the ability to resume. I assume we already have that as part of of the refactoring below that reuses the mvccExportToWriter for this fingerprinting and regular export.
One would need to plumb the ElasticCPUWorkHandle by changing the code in

cockroach/pkg/kv/kvserver/kvadmission/kvadmission.go

Line 247 in 02a5bec

if ba.IsSingleExportRequest() {

@irfansharif

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dt, @erikgrinaker, and @stevendanna)

irfansharif

The code now at least looks like it's already integrated with elastic CPU control (Aditya and I talked in person BTW -- it was part of why they're re-using the ExportRequest type instead of a special-cased finger-print only request). I assume the requests we're generating are batches with a single export request, so the conditional above evaluates to true?

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru, @dt, @erikgrinaker, and @stevendanna)

pkg/storage/mvcc.go line 5783 at r6 (raw file):

	ctx context.Context, cs *cluster.Settings, reader Reader, opts MVCCExportOptions, dest io.Writer,
) (roachpb.BulkOpSummary, MVCCKey, uint64, error) {
	ctx, span := tracing.ChildSpan(ctx, "storage.MVCCExportToSST")

"storage.MVCCExportFingerprint" as the tag name.

adityamaru · 2022-11-07T13:36:28Z

I assume the requests we're generating are batches with a single export request, so the conditional above evaluates to true?

Yup, a follow-up PR that will teach ExportRequest to use the fingerprint writer is expected to send batches with a single export request so the conditional should be true. I expect the CPU-based pagination to work exactly as it does for backups. Happy to write a test in that follow-up PR where it's all hooked up.

adityamaru · 2022-11-07T13:36:59Z

friendly ping @erikgrinaker @stevendanna

erikgrinaker

friendly ping

Yep, sorry for the delay.

pkg/storage/mvcc.go

pkg/storage/fingerprint_writer.go

erikgrinaker · 2022-11-07T13:40:22Z

pkg/storage/fingerprint_writer.go

+	if err := f.hash(key.Key); err != nil {
+		return err
+	}
+	if err := f.hash([]byte(key.Timestamp.String())); err != nil {


This doesn't seem great, since the string representation isn't guaranteed to be stable nor lossless. How about hashing the encoded MVCC timestamp instead? We can use e.g. encodeMVCCTimestampToBuf() to avoid allocations.

makes sense, changed to encoding the buf instead. In the follow up that introduces the builtin that actually uses the fingerprint mode I was writing queries like:

WITH sfp AS (SELECT xor_agg(fnv64(key, ts::BYTES, value)) AS scanfp from crdb_internal.scan(crdb_internal.table_span('t'::regclass::int))) SELECT count(*) FROM crdb_internal.fingerprint(crdb_internal.table_span('t'::regclass::int),'$before_insert'::TIMESTAMPTZ, false) AS efp, sfp WHERE efp = sfp.scanfp;`

to validate that a scan and an exportrequest returned the same fingerprint. This won't work out of the box cause crdb_internal.scan returns the String() version of the timestamp. Maybe we can add an overload to crdb_internal.scan that returns the encoded MVCC timestamp instead.

Right, that's annoying. Wonder why it doesn't return a proper timestamp type. Might as well just parse it back to a timestamp in the test, I suppose.

By the way, I thought encodeMVCCTimestampToBuf would resize the buffer, but I see that it doesn't because of how it's used in the key encoding functions. This code is very performance-sensitive, so I can't add in additional size checks in the hot path. But here's a patch which adds EncodeMVCCTimestampToBuf that does resize the buffer (the name duplication isn't great, but at least it's consistent with EncodeMVCCKeyToBuf / encodeMVCCKeyToBuf).

encodets.patch.txt

thanks for the patch, added it in.

erikgrinaker · 2022-11-07T13:43:57Z

pkg/storage/mvcc_test.go

+// This test uses a `fingerprintOracle` to verify that the fingerprint generated
+// by `MVCCExportFingerprint` is what we would get if we iterated over an SST
+// with all keys and computed our own fingerprint.
+func TestMVCCExportFingerprint(t *testing.T) {


Can you add some test cases to TestMVCCHistories as well? We already have comprehensive export tests there in mvcc_histories/export, including an example dataset (but let's use a separate file for fingerprint tests).

Nice! Didn't know this test existed.

I added the same tests as in mvcc_histories/export but instead of printing point keys I validate the fingerprint against an oracle. Apologies if this file is hard to review but I ran a diff with some sed between mvcc_histories/export and mvcc_histories/export_fingerprint and the rangekeys outputted in both are identical as expected.

Thanks! These should output the actual fingerprint as well, to assert the stability of the fingerprint. I don't think it's necessary to output pointkeys_fingerprint_verified, since the test will fail if it doesn't match so it doesn't add anything, but no strong opinion.

I have a slight preference for adding a fingerprint option to the existing export command rather than adding a new command. Essentially, if e.hasArg("fingerprint") { ... }. In particular because the current approach has a bug: anexport command following an export_fingerprint will generate a fingerprint rather than an export. Either way, the command list comment at the top of the file needs to be updated.

good point, changed to what you recommended.

pkg/storage/fingerprint_writer.go

craig · 2022-11-09T23:45:44Z

This PR was included in a batch that was canceled, it will be automatically retried

craig · 2022-11-10T00:27:31Z

Build failed (retrying...):

Bazel Essential CI (Cockroach)

craig · 2022-11-10T01:04:49Z

Build failed (retrying...):

Bazel Essential CI (Cockroach)

craig · 2022-11-10T02:09:17Z

Build failed:

Bazel Essential CI (Cockroach)

This is a refactor only change that pulls out the logic in `MVCCExportToSST` into `mvccExportToWriter` that accepts a `storage.Writer` interface. This will allow us to pass in a `FingerprintWriter` in a future commit. Informs: cockroachdb#89336 Release note: None

This change introduces a fingerprintWriter that hashes every key/timestamp and value for point keys, and combines their hashes via a XOR into a running aggregate. Range keys are not fingerprinted but instead written to a pebble SST that is returned to the caller. This is because range keys do not have a stable, discrete identity and so it is up to the caller to define a deterministic fingerprinting scheme across all returned range keys. The fingerprintWriter is used by `MVCCExportFingerprint` that exports a fingerprint for point keys in the keyrange [StartKey, EndKey) over the interval (StartTS, EndTS]. The export logic used by `MVCCExportFingerprint` is the same that drives `MVCCExportToSST`. The former writes to a fingerprintWriter while the latter writes to an sstWriter. Currently, this method only support using an `fnv64` hasher to fingerprint each KV. This change does not wire `MVCCExportFingerprint` to ExportRequest command evaluation. This will be done as a followup. Informs: cockroachdb#89336 Release note: None

adityamaru · 2022-11-10T15:29:15Z

bors r+

craig · 2022-11-10T16:25:07Z

Build failed (retrying...):

Bazel Essential CI (Cockroach)

craig · 2022-11-10T18:17:31Z

Build succeeded:

Bazel Essential CI (Cockroach)

This change adds a `crdb_internal.fingerprint` builtin that accepts a `startTime`, `endTime`, `startKey` and `endKey` to define the interval the user wants to fingerprint. The builtin is powered by sending an ExportRequest with the defined intervals but with the `ExportFingerprint` option set to true. Setting this option on the ExportRequest means that instead of writing all point and rangekeys to an SST and sending them back to the client, command evaluation will use the newly introduced `fingerprintWriter` (cockroachdb#90848) when exporting keys. This writer computes an `fnv64` hash of the key/timestamp, value for each point key and maintains a running XOR aggregate of all the point keys processed as part of the ExportRequest. Rangekeys are not fingerprinted during command evaluation, but instead returned to the client in a pebble SST. This is because range keys do not have a stable, discrete identity and so it is up to the caller to define a deterministic ingerprinting scheme across all returned range keys. The ExportRequest sent as part of this builtin does not set any DistSender limit, thereby allowing concurrent execution across ranges. We are not concerned about the ExportResponses growing too large since the SSTs will only contain rangekeys that should be few in number. If this assumption is proved incorrect in the future, we can revisit setting a `TargetBytes` to the header of the BatchRequest. Fixes: cockroachdb#89336 Release note: None

91124: builtins: add crdb_internal.fingerprint builtin r=adityamaru a=adityamaru This change adds a `crdb_internal.fingerprint` builtin that accepts a `startTime`, `endTime`, `startKey` and `endKey` to define the interval the user wants to fingerprint. The builtin is powered by sending an ExportRequest with the defined intervals but with the `ExportFingerprint` option set to true. Setting this option on the ExportRequest means that instead of writing all point and rangekeys to an SST and sending them back to the client, command evaluation will use the newly introduced `fingerprintWriter` (#90848) when exporting keys. This writer computes an `fnv64` hash of the key/timestamp, value for each point key and maintains a running XOR aggregate of all the point keys processed as part of the ExportRequest. Rangekeys are not fingerprinted during command evaluation, but instead returned to the client in a pebble SST. This is because range keys do not have a stable, discrete identity and so it is up to the caller to define a deterministic ingerprinting scheme across all returned range keys. The ExportRequest sent as part of this builtin does not set any DistSender limit, thereby allowing concurrent execution across ranges. We are not concerned about the ExportResponses growing too large since the SSTs will only contain rangekeys that should be few in number. If this assumption is proved incorrect in the future, we can revisit setting a `TargetBytes` to the header of the BatchRequest. Fixes: #89336 Release note: None Co-authored-by: adityamaru <[email protected]>

adityamaru requested review from dt, erikgrinaker and stevendanna October 28, 2022 16:32

adityamaru requested review from a team as code owners October 28, 2022 16:32

adityamaru mentioned this pull request Nov 1, 2022

builtins: add timestamp to crdb_internal.scan/list_sql_keys_in_range #90956

Merged

adityamaru mentioned this pull request Nov 2, 2022

builtins: add crdb_internal.fingerprint builtin #91124

Merged

stevendanna reviewed Nov 2, 2022

View reviewed changes

adityamaru mentioned this pull request Nov 2, 2022

batcheval: add option to trim tenant prefix and value metadata before fingerprinting #91150

Closed

adityamaru force-pushed the export-hasher branch from 43ee667 to 0fb467f Compare November 2, 2022 20:10

adityamaru requested a review from stevendanna November 2, 2022 20:11

sumeerbhola reviewed Nov 4, 2022

View reviewed changes

adityamaru force-pushed the export-hasher branch from 0fb467f to dea6f01 Compare November 4, 2022 20:28

irfansharif reviewed Nov 4, 2022

View reviewed changes

erikgrinaker reviewed Nov 7, 2022

View reviewed changes

stevendanna approved these changes Nov 7, 2022

View reviewed changes

adityamaru force-pushed the export-hasher branch from dea6f01 to 7e56760 Compare November 7, 2022 18:30

blathers-crl bot requested a review from irfansharif November 7, 2022 18:30

adityamaru requested a review from erikgrinaker November 7, 2022 18:37

adityamaru added 2 commits November 10, 2022 08:59

adityamaru force-pushed the export-hasher branch from 84cdd0b to b0c957d Compare November 10, 2022 13:59

blathers-crl bot requested a review from irfansharif November 10, 2022 13:59

craig bot merged commit 58e91e2 into cockroachdb:master Nov 10, 2022

adityamaru deleted the export-hasher branch November 11, 2022 20:00

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

storage: add MVCCExportFingerprint method and fingerprintWriter #90848

storage: add MVCCExportFingerprint method and fingerprintWriter #90848

adityamaru commented Oct 28, 2022

cockroach-teamcity commented Oct 28, 2022

erikgrinaker commented Oct 31, 2022 •

edited

Loading

adityamaru commented Oct 31, 2022

stevendanna left a comment

stevendanna Nov 2, 2022

adityamaru Nov 2, 2022 •

edited

Loading

stevendanna Nov 2, 2022

adityamaru Nov 2, 2022

adityamaru Nov 2, 2022

adityamaru commented Nov 2, 2022 •

edited

Loading

stevendanna commented Nov 3, 2022

sumeerbhola left a comment

irfansharif left a comment

adityamaru commented Nov 7, 2022

adityamaru commented Nov 7, 2022

erikgrinaker left a comment

erikgrinaker Nov 7, 2022 •

edited

Loading

adityamaru Nov 7, 2022

erikgrinaker Nov 8, 2022

erikgrinaker Nov 8, 2022

adityamaru Nov 9, 2022

erikgrinaker Nov 7, 2022

adityamaru Nov 7, 2022

erikgrinaker Nov 8, 2022 •

edited

Loading

adityamaru Nov 9, 2022

craig bot commented Nov 9, 2022

craig bot commented Nov 10, 2022

craig bot commented Nov 10, 2022

craig bot commented Nov 10, 2022

adityamaru commented Nov 10, 2022

craig bot commented Nov 10, 2022

craig bot commented Nov 10, 2022

storage: add MVCCExportFingerprint method and fingerprintWriter #90848

storage: add MVCCExportFingerprint method and fingerprintWriter #90848

Conversation

adityamaru commented Oct 28, 2022

cockroach-teamcity commented Oct 28, 2022

erikgrinaker commented Oct 31, 2022 • edited Loading

adityamaru commented Oct 31, 2022

stevendanna left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

adityamaru Nov 2, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

adityamaru commented Nov 2, 2022 • edited Loading

stevendanna commented Nov 3, 2022

sumeerbhola left a comment

Choose a reason for hiding this comment

irfansharif left a comment

Choose a reason for hiding this comment

adityamaru commented Nov 7, 2022

adityamaru commented Nov 7, 2022

erikgrinaker left a comment

Choose a reason for hiding this comment

erikgrinaker Nov 7, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

erikgrinaker Nov 8, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

craig bot commented Nov 9, 2022

craig bot commented Nov 10, 2022

craig bot commented Nov 10, 2022

craig bot commented Nov 10, 2022

adityamaru commented Nov 10, 2022

craig bot commented Nov 10, 2022

craig bot commented Nov 10, 2022

erikgrinaker commented Oct 31, 2022 •

edited

Loading

adityamaru Nov 2, 2022 •

edited

Loading

adityamaru commented Nov 2, 2022 •

edited

Loading

erikgrinaker Nov 7, 2022 •

edited

Loading

erikgrinaker Nov 8, 2022 •

edited

Loading