-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
backupccl: create stripped crdb_internal.fingerprint overload #98759
Conversation
pkg/storage/fingerprint_writer.go
Outdated
_, tenantPrefix, err := keys.DecodeTenantPrefix(startKey) | ||
if err != nil { | ||
return fingerprintWriter{}, err | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe we are decoding this so that we can use StripTenantPrefix below which asserts that the prefix is the same. But what if I want to use this from the system tenant on a span that crosses tenant boundaries. Are we saying that isn't a valid use case. If so, we should include some reasoning for why not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, this is a good point. Though I wonder if using a codec's StipTenantPrefix
is more performant than DecodeTenantPrefix
. I can run some benchmarks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there's no noticeable perf difference between StripTenantPrefix
vs DecodeTenantPrefix
, according to a macro benchmark (fingerpinting a 15 gb table). So I suppose I shouldn't add this restriction to limit the cmd to a single tenant key span -- i'll need to create a slightly more complicated helper to strip the index prefix then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a bunch key stripping utilities. If you like this approach, I'll some unit tests to the first commit and rebase.
808d613
to
5fc09b4
Compare
unrelated CI flake. RFAL! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have some drive-by nits, hope you don't mind 😄
Reviewed 13 of 13 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru, @msbutler, and @stevendanna)
-- commits
line 2 at r2:
nit: missing Release note: None
.
pkg/keys/sql.go
line 70 at r2 (raw file):
func StripTenantPrefixE(key roachpb.Key) ([]byte, error) { var err error // Strip the tenant prefix
nit: missing period.
pkg/sql/sem/builtins/builtins.go
line 7500 at r3 (raw file):
ReturnType: tree.FixedReturnType(types.Int), Fn: func(ctx context.Context, evalCtx *eval.Context, args tree.Datums) (tree.Datum, error) { skipTimestamp := *args[1].(*tree.DBool)
nit: it would be good to add an explicit length check on args
to be 2 here (and 3 in the other overload) as well as to check the types of each arg (i.e. return an error if args[1]
is not a DBool
).
pkg/sql/sem/builtins/builtins.go
line 7501 at r3 (raw file):
Fn: func(ctx context.Context, evalCtx *eval.Context, args tree.Datums) (tree.Datum, error) { skipTimestamp := *args[1].(*tree.DBool) return fingerprint(ctx, evalCtx, args, hlc.Timestamp{}, false, bool(skipTimestamp))
nit: add inlined comment for what false
stands here for.
pkg/sql/sem/builtins/builtins.go
line 7520 at r3 (raw file):
startTimestamp := hlc.Timestamp{WallTime: startTime.UnixNano()} allRevisions := *args[2].(*tree.DBool) return fingerprint(ctx, evalCtx, args, startTimestamp, bool(allRevisions), false)
nit: ditto for inlined comment for false
.
pkg/sql/sem/builtins/fingerprint_builtins.go
line 37 at r3 (raw file):
ctx context.Context, evalCtx *eval.Context, args tree.Datums,
nit: since we only use args[0]
and we assume it's a DArray
, should we just take in DArray
explicitly here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall I think will be useful. I've left some nitpicks.
On the various Strip functions: I could go either way. They introduce a bit of duplication, but it is nice to be able to operate on a roachpb.Key without specific codec. If you like them, I'd say keep them.
|
||
|
||
run ok | ||
put k=a ts=2 v=a2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NB: If you want to test the table prefix stripping as well, if you do something like put k=/b ts=2 v=b tenant-prefix=10 init-checksum
you'll get a key that looks like a table key.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about this, but I think I've got plenty of coverage for the index prefix stripping stuff in other tests. This unit test specifically exercises the timestamp stripping.
5fc09b4
to
45643a7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru, @stevendanna, and @yuzefovich)
pkg/sql/sem/builtins/builtins.go
line 7500 at r3 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: it would be good to add an explicit length check on
args
to be 2 here (and 3 in the other overload) as well as to check the types of each arg (i.e. return an error ifargs[1]
is not aDBool
).
done.
pkg/sql/sem/builtins/builtins.go
line 7501 at r3 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: add inlined comment for what
false
stands here for.
done
pkg/sql/sem/builtins/builtins.go
line 7520 at r3 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: ditto for inlined comment for
false
.
done
pkg/sql/sem/builtins/fingerprint_builtins.go
line 37 at r3 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: since we only use
args[0]
and we assume it's aDArray
, should we just take inDArray
explicitly here?
i refactored this so we parse the span outside fingerprint().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru, @stevendanna, and @yuzefovich)
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: missing
Release note: None
.
will plan to clean up the commit message once I squash everything!
The codec normally used to decode and encode crdb keys requires a tenant prefix upon creation. For callers that want to manipulate keys across tenants, the per tenant codec provides a poor experience. This patch provides a few helper functions to strip key prefixes without a codec. Informs cockroachdb#98570 Release note: None
This patch adds a variant of crdb_internal.fingerprint(), which creates a "stripped" fingerprint of the target span. Namely, `crdb_internal.fingerpint(span,true)` will return a fingerprint that is agnostic to the mvcc timestamps and the index prefix of the key, and considers only the latest mvcc history of the key span. This cmd should only get used over user table space, i.e. on keys with an index prefix. For example, suppose the user fingerprinted a table at some system time, then backed up and restored it to that same system time. The restored table should have the same fingerprint! crdb_internal.fingerpint(span,false) is equivalent to crdb_internal.fingerpint(span,NULL,LATEST). This fingerprint variant is signicantly more scalable than SHOW EXPERIMENTAL FINGERPRINT, as it uses export requests compared to a simple scan. Fixes cockroachdb#98570 Release note: None
45643a7
to
cf91202
Compare
TFTRs!!! bors r=stevendanna |
Build succeeded: |
This patch adds a variant of crdb_internal.fingerprint(), which creates a "stripped" fingerprint of the target span. Namely,
crdb_internal.fingerping(span,true)
will return a fingerprint that is agnostic to the mvcc timestamps and the index prefix of the key, and considers only the latest mvcc history of the key span.For example, suppose the user fingerprinted a table at some system time, then backed up and restored it to that same system time. The restored table should have the same fingerprint!
This fingerprint variant is signicantly more scalable than SHOW EXPERIMENTAL FINGERPRINT, as it uses export requests compared to a simple scan.
Fixes #98570
Release note: None