
storage: add MVCCExportFingerprint method and fingerprintWriter #90848

Merged
merged 2 commits into cockroachdb:master from adityamaru:export-hasher on Nov 10, 2022

Conversation

@adityamaru (Contributor)

This change introduces a fingerprintWriter that
hashes every key/timestamp and value for point keys, and
combines their hashes via an XOR into a running aggregate.
Range keys are not fingerprinted but instead written to a pebble SST that is
returned to the caller. This is because range keys do not have a stable,
discrete identity and so it is up to the caller to define a deterministic
fingerprinting scheme across all returned range keys.

The fingerprintWriter is used by MVCCExportFingerprint, which
exports a fingerprint for point keys in the key range
[StartKey, EndKey) over the interval (StartTS, EndTS].
The export logic used by MVCCExportFingerprint is the same
that drives MVCCExportToSST: the former writes to a fingerprintWriter
while the latter writes to an sstWriter. Currently, this
method only supports using an fnv64 hasher to fingerprint each KV.

This change does not wire MVCCExportFingerprint to ExportRequest
command evaluation. This will be done as a follow-up.

Informs: #89336

Release note: None
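A minimal, self-contained sketch of the XOR-of-hashes scheme described above. The type and method names are illustrative only, and fnv.New64 is an assumed choice of FNV variant; the real implementation lives in pkg/storage/fingerprint_writer.go.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// pointKeyFingerprinter hashes each point key's key bytes, encoded timestamp,
// and value independently, then XORs the 64-bit digests into a running
// aggregate. XOR makes the result order-independent, so point keys can be
// visited in any order (or across ranges) and the partial results re-combined.
type pointKeyFingerprinter struct {
	xorAgg uint64
}

func (p *pointKeyFingerprinter) add(key, encodedTS, value []byte) {
	h := fnv.New64()
	h.Write(key)
	h.Write(encodedTS)
	h.Write(value)
	p.xorAgg ^= h.Sum64()
}

func main() {
	p := &pointKeyFingerprinter{}
	p.add([]byte("a"), []byte{1}, []byte("v1"))
	p.add([]byte("b"), []byte{2}, []byte("v2"))
	fmt.Printf("fingerprint: %016x\n", p.xorAgg)
}
```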

@adityamaru adityamaru requested review from a team as code owners October 28, 2022 16:32
@cockroach-teamcity (Member)

This change is Reviewable

@erikgrinaker (Contributor) commented Oct 31, 2022

Just a heads-up that I'm still sick, so PR reviews are on a best-effort basis, but will try to get to this soon.

@adityamaru (Contributor, Author)

Just a heads-up that I'm still sick, so PR reviews are on a best-effort basis

No problem! No real rush here.

adityamaru added a commit to adityamaru/cockroach that referenced this pull request Nov 2, 2022
This change adds a `crdb_internal.fingerprint` builtin
that accepts a `startTime`, `endTime`, `startKey` and `endKey`
to define the interval the user wants to fingerprint. The builtin
is powered by sending an ExportRequest with the defined intervals
but with the `ExportFingerprint` option set to true.

Setting this option on the ExportRequest means that instead of
writing all point and rangekeys to an SST and sending them back to
the client, command evaluation will use the newly introduced
`fingerprintWriter` (cockroachdb#90848) when exporting keys. This writer
computes an `fnv64` hash of the key/timestamp and value of each point key
and maintains a running XOR aggregate of all the point keys processed
as part of the ExportRequest. Rangekeys are not fingerprinted during
command evaluation, but instead returned to the client in a
pebble SST. This is because range keys do not have a stable,
discrete identity and so it is up to the caller to define a deterministic
fingerprinting scheme across all returned range keys.

The ExportRequest sent as part of this builtin does not set any DistSender
limit, thereby allowing concurrent execution across ranges. We are not
concerned about the ExportResponses growing too large since the SSTs
will only contain rangekeys that should be few in number. If this assumption
proves incorrect in the future, we can revisit setting `TargetBytes`
in the header of the BatchRequest.

Fixes: cockroachdb#89336

Release note (sql change): introduces a `crdb_internal.fingerprint` builtin
that can be used to generate a `fnv64` fingerprint of keys (and optionally their revisions)
in a given key/time interval.
@stevendanna (Collaborator) left a comment

This seems like good incremental progress.

What are we thinking about regarding pagination and return size accounting? Right now, it looks like if we set TargetSize > 0 then the fingerprinting will return based on a size calculation that includes point keys that aren't actually being returned. Perhaps this might be what we want anyway, just as a crude way to give admission control a chance to slow down our scan of the data.

defer f.hasher.Reset()

// Hash the key/timestamp and value of the RawMVCC.
_, err := f.hasher.Write(key.Key)
Collaborator

This can happen in a follow-up, but I think for the purpose of tenant fingerprinting, we'll eventually want a way to specify whether the tenant prefix of the key and the checksum of the value are included in the fingerprint.

@adityamaru (Contributor, Author) commented Nov 2, 2022

Oh good point, I hadn't thought about this. Let me file an issue for this so I don't forget - #91150
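A purely hypothetical sketch of what such a knob could look like; neither the type nor the field names come from this PR or the linked issue.

```go
// fingerprintOptions is a hypothetical options struct for the follow-up
// discussed above; the real API may look entirely different.
type fingerprintOptions struct {
	// StripTenantPrefix would drop the tenant prefix from each key before
	// hashing, so that the same data fingerprints identically across tenants.
	StripTenantPrefix bool
	// StripValueChecksum would exclude the value's checksum from the hashed
	// bytes, since the checksum covers the key and therefore differs across
	// tenants even when the logical value is the same.
	StripValueChecksum bool
}
```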

pkg/storage/fingerprint_writer.go (outdated review thread; resolved)

// ApplyBatchRepr implements the Writer interface.
func (f *fingerprintWriter) ApplyBatchRepr(repr []byte, sync bool) error {
panic("unimplemented")
Collaborator

I'm fine either way, but in production I'm not sure the fingerprinting would ever be worth panicking the node for. Perhaps this could be an assertion error?

Also, given how much of this interface is unimplemented, it kinda makes me wonder if we want a smaller interface that we could use as the argument to MVCCExportToSST. Perhaps called ExportWriter or something.

Contributor Author

Good point, I like the idea of introducing a trimmed-down ExportWriter interface. FWIW I don't think this would be a situation where we see an unexpected panic, because these methods are called by us in mvccExportToWriter and we only use 3 of the many at the moment. It is possible someone forgets to test a change with the fingerprint option set on the ExportRequest, so I am on board with introducing a trimmed-down interface.

Contributor Author

Added an ExportWriter interface that is passed into mvccExportToWriter so that we can avoid all those stub methods in the fingerprint writer.
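For illustration, roughly what a trimmed-down writer interface could look like; the method set and types below are assumptions for this sketch, not necessarily the exact interface that landed in pkg/storage.

```go
package sketch

// Simplified stand-ins for storage.MVCCKey and storage.MVCCRangeKey.
type mvccKey struct {
	Key      []byte
	WallTime int64
	Logical  int32
}

type mvccRangeKey struct {
	StartKey, EndKey []byte
	WallTime         int64
}

// exportWriter is the small surface the shared export loop needs, so a
// fingerprinting writer does not have to stub out (or panic in) the many
// unrelated methods of the full storage.Writer interface.
type exportWriter interface {
	// PutRawMVCC writes a versioned point key with an already-encoded value.
	PutRawMVCC(key mvccKey, value []byte) error
	// PutRawMVCCRangeKey writes a range key with an already-encoded value.
	PutRawMVCCRangeKey(rangeKey mvccRangeKey, value []byte) error
	// PutUnversioned writes a metadata key that has no timestamp.
	PutUnversioned(key, value []byte) error
}
```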

@adityamaru (Contributor, Author) commented Nov 2, 2022

What are we thinking about regarding pagination and return size accounting?

The plan is to not set any DistSender limits to begin with, which should mean that there is no size-based pagination. This allows range requests to be sent and processed concurrently by the DistSender. I could be convinced that this is not ideal, but the assumption we're working under is that range keys are few and small in size, and so we never run the risk of our SSTs becoming too large in the ExportResponse even with 100k+ ranges.

Perhaps this might be what we want anyway, just as a crude way to give admission control a chance to slow down our scan of the data.

Are you referring to the ElasticCPULimiter here? I don't think that's tied to our TargetBytes?

@stevendanna (Collaborator)

The plan is to not set any DistSender limits to begin with, which should mean that there is no size-based pagination. This allows range requests to be sent and processed concurrently by the DistSender. I could be convinced that this is not ideal, but the assumption we're working under is that range keys are few and small in size, and so we never run the risk of our SSTs becoming too large in the ExportResponse even with 100k+ ranges.

Works for me.

Are you referring to the ElasticCPULimiter here? I don't think that's tied to our TargetBytes?

No, I was just thinking about the admission queue in general: the size limit + pagination would force us to periodically stop working and send another request, which then needs to be re-admitted (I think?), rather than having one request that is allowed to scan for as long as it pleases. But I'll admit to not knowing much about how admission control works currently. This was just speculation, really.

@sumeerbhola (Collaborator) left a comment

(drive-by)
Even if the SST is small because only rangekeys are being written to it, we shouldn't be doing large scans without any CPU resource constraint. That is, this ought to integrate with the elastic CPU limiting, which means it needs to have the ability to resume. I assume we already have that as part of the refactoring below that reuses mvccExportToWriter for both this fingerprinting and regular export.
One would need to plumb the ElasticCPUWorkHandle by changing the code in

if ba.IsSingleExportRequest() {

@irfansharif

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @dt, @erikgrinaker, and @stevendanna)

@irfansharif (Contributor) left a comment

The code now at least looks like it's already integrated with elastic CPU control (Aditya and I talked in person BTW -- it was part of why they're re-using the ExportRequest type instead of a special-cased fingerprint-only request). I assume the requests we're generating are batches with a single export request, so the conditional above evaluates to true?

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru, @dt, @erikgrinaker, and @stevendanna)


pkg/storage/mvcc.go line 5783 at r6 (raw file):

	ctx context.Context, cs *cluster.Settings, reader Reader, opts MVCCExportOptions, dest io.Writer,
) (roachpb.BulkOpSummary, MVCCKey, uint64, error) {
	ctx, span := tracing.ChildSpan(ctx, "storage.MVCCExportToSST")

"storage.MVCCExportFingerprint" as the tag name.

@adityamaru (Contributor, Author)

I assume the requests we're generating are batches with a single export request, so the conditional above evaluates to true?

Yup, a follow-up PR that will teach ExportRequest to use the fingerprint writer is expected to send batches with a single export request so the conditional should be true. I expect the CPU-based pagination to work exactly as it does for backups. Happy to write a test in that follow-up PR where it's all hooked up.

@adityamaru (Contributor, Author)

friendly ping @erikgrinaker @stevendanna

@erikgrinaker (Contributor) left a comment

friendly ping

Yep, sorry for the delay.

pkg/storage/mvcc.go (outdated review thread; resolved)
pkg/storage/fingerprint_writer.go (outdated review thread; resolved)
if err := f.hash(key.Key); err != nil {
return err
}
if err := f.hash([]byte(key.Timestamp.String())); err != nil {
@erikgrinaker (Contributor) commented Nov 7, 2022

This doesn't seem great, since the string representation isn't guaranteed to be stable or lossless. How about hashing the encoded MVCC timestamp instead? We can use e.g. encodeMVCCTimestampToBuf() to avoid allocations.

Contributor Author

Makes sense, changed to encoding into a buf instead. In the follow-up that introduces the builtin that actually uses the fingerprint mode, I was writing queries like:

WITH sfp AS (SELECT xor_agg(fnv64(key, ts::BYTES, value)) AS scanfp FROM crdb_internal.scan(crdb_internal.table_span('t'::regclass::int)))
  SELECT count(*) FROM crdb_internal.fingerprint(crdb_internal.table_span('t'::regclass::int), '$before_insert'::TIMESTAMPTZ, false) AS efp, sfp WHERE efp = sfp.scanfp;

to validate that a scan and an ExportRequest returned the same fingerprint. This won't work out of the box because crdb_internal.scan returns the String() version of the timestamp. Maybe we can add an overload to crdb_internal.scan that returns the encoded MVCC timestamp instead.

Contributor

Right, that's annoying. Wonder why it doesn't return a proper timestamp type. Might as well just parse it back to a timestamp in the test, I suppose.

Contributor

By the way, I thought encodeMVCCTimestampToBuf would resize the buffer, but I see that it doesn't because of how it's used in the key encoding functions. This code is very performance-sensitive, so we can't add additional size checks in the hot path. But here's a patch which adds EncodeMVCCTimestampToBuf that does resize the buffer (the name duplication isn't great, but at least it's consistent with EncodeMVCCKeyToBuf / encodeMVCCKeyToBuf).

encodets.patch.txt

Contributor Author

Thanks for the patch, added it in.
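A compilable sketch of the resulting hashing step, with an assumed, simplified timestamp layout (8-byte wall time plus 4-byte logical, big-endian); the actual code uses the EncodeMVCCTimestampToBuf helper from the patch above rather than this hand-rolled encoding.

```go
package sketch

import (
	"encoding/binary"
	"hash/fnv"
)

// hashPointKey hashes a point key using a stable binary encoding of the
// timestamp instead of its String() form, which is neither guaranteed to be
// stable nor lossless.
func hashPointKey(key []byte, wallTime int64, logical int32, value []byte) uint64 {
	var tsBuf [12]byte
	binary.BigEndian.PutUint64(tsBuf[:8], uint64(wallTime))
	binary.BigEndian.PutUint32(tsBuf[8:], uint32(logical))

	h := fnv.New64()
	h.Write(key)      // raw key bytes
	h.Write(tsBuf[:]) // binary-encoded timestamp
	h.Write(value)    // raw value bytes
	return h.Sum64()
}
```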

// This test uses a `fingerprintOracle` to verify that the fingerprint generated
// by `MVCCExportFingerprint` is what we would get if we iterated over an SST
// with all keys and computed our own fingerprint.
func TestMVCCExportFingerprint(t *testing.T) {
Contributor

Can you add some test cases to TestMVCCHistories as well? We already have comprehensive export tests there in mvcc_histories/export, including an example dataset (but let's use a separate file for fingerprint tests).

Contributor Author

Nice! Didn't know this test existed.

I added the same tests as in mvcc_histories/export, but instead of printing point keys I validate the fingerprint against an oracle. Apologies if this file is hard to review, but I ran a diff with some sed between mvcc_histories/export and mvcc_histories/export_fingerprint, and the range keys output in both are identical, as expected.

@erikgrinaker (Contributor) commented Nov 8, 2022

Thanks! These should output the actual fingerprint as well, to assert the stability of the fingerprint. I don't think it's necessary to output pointkeys_fingerprint_verified, since the test will fail if it doesn't match so it doesn't add anything, but no strong opinion.

I have a slight preference for adding a fingerprint option to the existing export command rather than adding a new command. Essentially, if e.hasArg("fingerprint") { ... }. In particular because the current approach has a bug: an export command following an export_fingerprint will generate a fingerprint rather than an export. Either way, the command list comment at the top of the file needs to be updated.

Contributor Author

Good point, changed to what you recommended.

pkg/storage/fingerprint_writer.go (outdated review thread; resolved)
craig bot commented Nov 9, 2022

This PR was included in a batch that was canceled; it will be automatically retried.

craig bot commented Nov 10, 2022

Build failed (retrying...):

craig bot commented Nov 10, 2022

Build failed (retrying...):

craig bot commented Nov 10, 2022

Build failed:

This is a refactor-only change that pulls out the
logic in `MVCCExportToSST` into `mvccExportToWriter`, which
accepts a `storage.Writer` interface. This will allow us
to pass in a `FingerprintWriter` in a future commit.

Informs: cockroachdb#89336

Release note: None
This change introduces a fingerprintWriter that
hashes every key/timestamp and value for point keys, and
combines their hashes via an XOR into a running aggregate.
Range keys are not fingerprinted but instead written to a pebble SST that is
returned to the caller. This is because range keys do not have a stable,
discrete identity and so it is up to the caller to define a deterministic
fingerprinting scheme across all returned range keys.

The fingerprintWriter is used by `MVCCExportFingerprint`, which
exports a fingerprint for point keys in the key range
[StartKey, EndKey) over the interval (StartTS, EndTS].
The export logic used by `MVCCExportFingerprint` is the same
that drives `MVCCExportToSST`: the former writes to a fingerprintWriter
while the latter writes to an sstWriter. Currently, this
method only supports using an `fnv64` hasher to fingerprint each KV.

This change does not wire `MVCCExportFingerprint` to ExportRequest
command evaluation. This will be done as a follow-up.

Informs: cockroachdb#89336

Release note: None
@adityamaru (Contributor, Author)

bors r+

craig bot commented Nov 10, 2022

Build failed (retrying...):

craig bot commented Nov 10, 2022

Build succeeded:

@craig craig bot merged commit 58e91e2 into cockroachdb:master Nov 10, 2022
@adityamaru adityamaru deleted the export-hasher branch November 11, 2022 20:00
adityamaru added a commit to adityamaru/cockroach that referenced this pull request Nov 17, 2022
adityamaru added a commit to adityamaru/cockroach that referenced this pull request Nov 23, 2022
adityamaru added a commit to adityamaru/cockroach that referenced this pull request Nov 24, 2022
craig bot pushed a commit that referenced this pull request Nov 24, 2022
91124: builtins: add crdb_internal.fingerprint builtin r=adityamaru a=adityamaru
