-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
storage: add MVCCExportFingerprint method and fingerprintWriter #90848
Conversation
Just a heads-up that I'm still sick, so PR reviews are on a best-effort basis, but will try to get to this soon. |
No problem! no real rush here. |
This change adds a `crdb_internal.fingerprint` builtin that accepts a `startTime`, `endTime`, `startKey` and `endKey` to define the interval the user wants to fingerprint. The builtin is powered by sending an ExportRequest with the defined intervals but with the `ExportFingerprint` option set to true. Setting this option on the ExportRequest means that instead of writing all point and rangekeys to an SST and sending them back to the client, command evaluation will use the newly introduced `fingerprintWriter` (cockroachdb#90848) when exporting keys. This writer computes an `fnv64` hash of the key/timestamp, value for each point key and maintains a running XOR aggregate of all the point keys processed as part of the ExportRequest. Rangekeys are not fingerprinted during command evaluation, but instead returned to the client in a pebble SST. This is because range keys do not have a stable, discrete identity and so it is up to the caller to define a deterministic ingerprinting scheme across all returned range keys. The ExportRequest sent as part of this builtin does not set any DistSender limit, thereby allowing concurrent execution across ranges. We are not concerned about the ExportResponses growing too large since the SSTs will only contain rangekeys that should be few in number. If this assumption is proved incorrect in the future, we can revisit setting a `TargetBytes` to the header of the BatchRequest. Fixes: cockroachdb#89336 Release note (sql change): introduces a `crdb_internal.fingerprint` builtin that can be used to generate a `fnv64` fingerprint of keys (and optionally their revisions) in a given key/time interval.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like good incremental progress.
What are we thinking about regarding pagination and return size accounting? Right now, it looks like if we set TargetSize > 0 then the fingerprinting will return based on a size calculation that includes point keys that aren't actually be returned. Perhaps this might be what we want anyway just as a crude way to give admission control a change to slow down our scan of the data.
pkg/storage/fingerprint_writer.go
Outdated
defer f.hasher.Reset() | ||
|
||
// Hash the key/timestamp and value of the RawMVCC. | ||
_, err := f.hasher.Write(key.Key) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can happen in a follow-up, but I think for the purpose of tenant fingerprinting, we'll eventually want a way to specify whether the tenant prefix of the key and the checksum of the value are included in the fingerprint.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh good point, I hadn't thought about this. Let me file an issue for this so I don't forget - #91150
pkg/storage/fingerprint_writer.go
Outdated
|
||
// ApplyBatchRepr implements the Writer interface. | ||
func (f *fingerprintWriter) ApplyBatchRepr(repr []byte, sync bool) error { | ||
panic("unimplemented") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine either way, but in production I'm not sure the fingerprinting would ever be worth panic'ing the node for. Perhaps this could be an assertion error?
Also, given how much of this interface is unimplemented, it kinda makes me wonder if we want a smaller interface that we could use as the argument to MVCCExportToSST. Perhaps called ExportWriter
or something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, I like the idea of introducing a trimmed down ExportWriter
interface. fwiw I don't think this would be a situation where we see an unexpected panic because these methods are called by us in mvccExportToWriter
and we only use 3 of the many at the moment. It is possible someone forgets to test a change with the fingerprint option set on the ExportRequest so I am onboard with introducing a trimmed down interface.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added an ExportWriter interface that is passed into mvccExportToWriter
so that we can avoid all those stub methods in the fingerprint writer.
The plan is to actually not set any distsender limits, to begin with, which should mean that there is no size based pagination. This allows for range requests to be sent and processed concurrently by the dist sender. I could be convinced that this is not ideal but the assumption we're working under is that rangekeys are few and small in size, and so we never run the risk of our SSTs becoming too large in the ExportResponse even with 100k+ ranges.
Are you referring to the ElasticCPULimiter here? I don't think thats tied to our |
43ee667
to
0fb467f
Compare
Works for me.
No, I was just thinking about the admission queue in general. The fact that the size limit + pagination would force us to periodically stop working and send another request which then needs to be re-admitted (I think?) rather than having one request that is allowed to scan for as long as it pleases. But I'll admit to not knowing much about how admission control works currently. This was just speculation really. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(drive-by)
Even if the sst is small because only rangekeys are being written to it, we shouldn't be doing large scans without any CPU resource constraint. That is, this ought to integrate with the elastic cpu limiting, which means it needs to have the ability to resume. I assume we already have that as part of of the refactoring below that reuses the mvccExportToWriter
for this fingerprinting and regular export.
One would need to plumb the ElasticCPUWorkHandle
by changing the code in
if ba.IsSingleExportRequest() { |
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dt, @erikgrinaker, and @stevendanna)
0fb467f
to
dea6f01
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code now at least looks like it's already integrated with elastic CPU control (Aditya and I talked in person BTW -- it was part of why they're re-using the ExportRequest type instead of a special-cased finger-print only request). I assume the requests we're generating are batches with a single export request, so the conditional above evaluates to true?
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru, @dt, @erikgrinaker, and @stevendanna)
pkg/storage/mvcc.go
line 5783 at r6 (raw file):
ctx context.Context, cs *cluster.Settings, reader Reader, opts MVCCExportOptions, dest io.Writer, ) (roachpb.BulkOpSummary, MVCCKey, uint64, error) { ctx, span := tracing.ChildSpan(ctx, "storage.MVCCExportToSST")
"storage.MVCCExportFingerprint" as the tag name.
Yup, a follow-up PR that will teach ExportRequest to use the fingerprint writer is expected to send batches with a single export request so the conditional should be true. I expect the CPU-based pagination to work exactly as it does for backups. Happy to write a test in that follow-up PR where it's all hooked up. |
friendly ping @erikgrinaker @stevendanna |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
friendly ping
Yep, sorry for the delay.
pkg/storage/fingerprint_writer.go
Outdated
if err := f.hash(key.Key); err != nil { | ||
return err | ||
} | ||
if err := f.hash([]byte(key.Timestamp.String())); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't seem great, since the string representation isn't guaranteed to be stable nor lossless. How about hashing the encoded MVCC timestamp instead? We can use e.g. encodeMVCCTimestampToBuf()
to avoid allocations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makes sense, changed to encoding the buf instead. In the follow up that introduces the builtin that actually uses the fingerprint mode I was writing queries like:
WITH sfp AS (SELECT xor_agg(fnv64(key, ts::BYTES, value)) AS scanfp from crdb_internal.scan(crdb_internal.table_span('t'::regclass::int)))
SELECT count(*) FROM crdb_internal.fingerprint(crdb_internal.table_span('t'::regclass::int),'$before_insert'::TIMESTAMPTZ, false) AS efp, sfp WHERE efp = sfp.scanfp;`
to validate that a scan and an exportrequest returned the same fingerprint. This won't work out of the box cause crdb_internal.scan
returns the String()
version of the timestamp. Maybe we can add an overload to crdb_internal.scan
that returns the encoded MVCC timestamp instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, that's annoying. Wonder why it doesn't return a proper timestamp type. Might as well just parse it back to a timestamp in the test, I suppose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By the way, I thought encodeMVCCTimestampToBuf
would resize the buffer, but I see that it doesn't because of how it's used in the key encoding functions. This code is very performance-sensitive, so I can't add in additional size checks in the hot path. But here's a patch which adds EncodeMVCCTimestampToBuf
that does resize the buffer (the name duplication isn't great, but at least it's consistent with EncodeMVCCKeyToBuf
/ encodeMVCCKeyToBuf
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the patch, added it in.
// This test uses a `fingerprintOracle` to verify that the fingerprint generated | ||
// by `MVCCExportFingerprint` is what we would get if we iterated over an SST | ||
// with all keys and computed our own fingerprint. | ||
func TestMVCCExportFingerprint(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add some test cases to TestMVCCHistories
as well? We already have comprehensive export tests there in mvcc_histories/export
, including an example dataset (but let's use a separate file for fingerprint tests).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! Didn't know this test existed.
I added the same tests as in mvcc_histories/export
but instead of printing point keys I validate the fingerprint against an oracle. Apologies if this file is hard to review but I ran a diff with some sed between mvcc_histories/export
and mvcc_histories/export_fingerprint
and the rangekeys outputted in both are identical as expected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! These should output the actual fingerprint as well, to assert the stability of the fingerprint. I don't think it's necessary to output pointkeys_fingerprint_verified
, since the test will fail if it doesn't match so it doesn't add anything, but no strong opinion.
I have a slight preference for adding a fingerprint
option to the existing export
command rather than adding a new command. Essentially, if e.hasArg("fingerprint") { ... }
. In particular because the current approach has a bug: anexport
command following an export_fingerprint
will generate a fingerprint rather than an export. Either way, the command list comment at the top of the file needs to be updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point, changed to what you recommended.
dea6f01
to
7e56760
Compare
This PR was included in a batch that was canceled, it will be automatically retried |
Build failed (retrying...): |
Build failed (retrying...): |
Build failed: |
This is a refactor only change that pulls out the logic in `MVCCExportToSST` into `mvccExportToWriter` that accepts a `storage.Writer` interface. This will allow us to pass in a `FingerprintWriter` in a future commit. Informs: cockroachdb#89336 Release note: None
This change introduces a fingerprintWriter that hashes every key/timestamp and value for point keys, and combines their hashes via a XOR into a running aggregate. Range keys are not fingerprinted but instead written to a pebble SST that is returned to the caller. This is because range keys do not have a stable, discrete identity and so it is up to the caller to define a deterministic fingerprinting scheme across all returned range keys. The fingerprintWriter is used by `MVCCExportFingerprint` that exports a fingerprint for point keys in the keyrange [StartKey, EndKey) over the interval (StartTS, EndTS]. The export logic used by `MVCCExportFingerprint` is the same that drives `MVCCExportToSST`. The former writes to a fingerprintWriter while the latter writes to an sstWriter. Currently, this method only support using an `fnv64` hasher to fingerprint each KV. This change does not wire `MVCCExportFingerprint` to ExportRequest command evaluation. This will be done as a followup. Informs: cockroachdb#89336 Release note: None
84cdd0b
to
b0c957d
Compare
bors r+ |
Build failed (retrying...): |
Build succeeded: |
This change adds a `crdb_internal.fingerprint` builtin that accepts a `startTime`, `endTime`, `startKey` and `endKey` to define the interval the user wants to fingerprint. The builtin is powered by sending an ExportRequest with the defined intervals but with the `ExportFingerprint` option set to true. Setting this option on the ExportRequest means that instead of writing all point and rangekeys to an SST and sending them back to the client, command evaluation will use the newly introduced `fingerprintWriter` (cockroachdb#90848) when exporting keys. This writer computes an `fnv64` hash of the key/timestamp, value for each point key and maintains a running XOR aggregate of all the point keys processed as part of the ExportRequest. Rangekeys are not fingerprinted during command evaluation, but instead returned to the client in a pebble SST. This is because range keys do not have a stable, discrete identity and so it is up to the caller to define a deterministic ingerprinting scheme across all returned range keys. The ExportRequest sent as part of this builtin does not set any DistSender limit, thereby allowing concurrent execution across ranges. We are not concerned about the ExportResponses growing too large since the SSTs will only contain rangekeys that should be few in number. If this assumption is proved incorrect in the future, we can revisit setting a `TargetBytes` to the header of the BatchRequest. Fixes: cockroachdb#89336 Release note: None
This change adds a `crdb_internal.fingerprint` builtin that accepts a `startTime`, `endTime`, `startKey` and `endKey` to define the interval the user wants to fingerprint. The builtin is powered by sending an ExportRequest with the defined intervals but with the `ExportFingerprint` option set to true. Setting this option on the ExportRequest means that instead of writing all point and rangekeys to an SST and sending them back to the client, command evaluation will use the newly introduced `fingerprintWriter` (cockroachdb#90848) when exporting keys. This writer computes an `fnv64` hash of the key/timestamp, value for each point key and maintains a running XOR aggregate of all the point keys processed as part of the ExportRequest. Rangekeys are not fingerprinted during command evaluation, but instead returned to the client in a pebble SST. This is because range keys do not have a stable, discrete identity and so it is up to the caller to define a deterministic ingerprinting scheme across all returned range keys. The ExportRequest sent as part of this builtin does not set any DistSender limit, thereby allowing concurrent execution across ranges. We are not concerned about the ExportResponses growing too large since the SSTs will only contain rangekeys that should be few in number. If this assumption is proved incorrect in the future, we can revisit setting a `TargetBytes` to the header of the BatchRequest. Fixes: cockroachdb#89336 Release note: None
This change adds a `crdb_internal.fingerprint` builtin that accepts a `startTime`, `endTime`, `startKey` and `endKey` to define the interval the user wants to fingerprint. The builtin is powered by sending an ExportRequest with the defined intervals but with the `ExportFingerprint` option set to true. Setting this option on the ExportRequest means that instead of writing all point and rangekeys to an SST and sending them back to the client, command evaluation will use the newly introduced `fingerprintWriter` (cockroachdb#90848) when exporting keys. This writer computes an `fnv64` hash of the key/timestamp, value for each point key and maintains a running XOR aggregate of all the point keys processed as part of the ExportRequest. Rangekeys are not fingerprinted during command evaluation, but instead returned to the client in a pebble SST. This is because range keys do not have a stable, discrete identity and so it is up to the caller to define a deterministic ingerprinting scheme across all returned range keys. The ExportRequest sent as part of this builtin does not set any DistSender limit, thereby allowing concurrent execution across ranges. We are not concerned about the ExportResponses growing too large since the SSTs will only contain rangekeys that should be few in number. If this assumption is proved incorrect in the future, we can revisit setting a `TargetBytes` to the header of the BatchRequest. Fixes: cockroachdb#89336 Release note: None
91124: builtins: add crdb_internal.fingerprint builtin r=adityamaru a=adityamaru This change adds a `crdb_internal.fingerprint` builtin that accepts a `startTime`, `endTime`, `startKey` and `endKey` to define the interval the user wants to fingerprint. The builtin is powered by sending an ExportRequest with the defined intervals but with the `ExportFingerprint` option set to true. Setting this option on the ExportRequest means that instead of writing all point and rangekeys to an SST and sending them back to the client, command evaluation will use the newly introduced `fingerprintWriter` (#90848) when exporting keys. This writer computes an `fnv64` hash of the key/timestamp, value for each point key and maintains a running XOR aggregate of all the point keys processed as part of the ExportRequest. Rangekeys are not fingerprinted during command evaluation, but instead returned to the client in a pebble SST. This is because range keys do not have a stable, discrete identity and so it is up to the caller to define a deterministic ingerprinting scheme across all returned range keys. The ExportRequest sent as part of this builtin does not set any DistSender limit, thereby allowing concurrent execution across ranges. We are not concerned about the ExportResponses growing too large since the SSTs will only contain rangekeys that should be few in number. If this assumption is proved incorrect in the future, we can revisit setting a `TargetBytes` to the header of the BatchRequest. Fixes: #89336 Release note: None Co-authored-by: adityamaru <[email protected]>
This change introduces a fingerprintWriter that
hashes every key/timestamp and value for point keys, and
combines their hashes via a XOR into a running aggregate.
Range keys are not fingerprinted but instead written to a pebble SST that is
returned to the caller. This is because range keys do not have a stable,
discrete identity and so it is up to the caller to define a deterministic
fingerprinting scheme across all returned range keys.
The fingerprintWriter is used by
MVCCExportFingerprint
thatexports a fingerprint for point keys in the keyrange
[StartKey, EndKey) over the interval (StartTS, EndTS].
The export logic used by
MVCCExportFingerprint
is the samethat drives
MVCCExportToSST
. The former writes to a fingerprintWriterwhile the latter writes to an sstWriter. Currently, this
method only support using an
fnv64
hasher to fingerprint each KV.This change does not wire
MVCCExportFingerprint
to ExportRequestcommand evaluation. This will be done as a followup.
Informs: #89336
Release note: None