-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
builtins: add timestamp to crdb_internal.scan/list_sql_keys_in_range #90956
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You probably need to alter the rangeKeyIterator code. I think you probably need to change this:
// Values implements the tree.ValueGenerator interface.
func (rk *rangeKeyIterator) Values() (tree.Datums, error) {
kv := rk.kvs[rk.index]
rk.buf[0] = tree.NewDString(kv.Key.String())
rk.buf[1] = tree.NewDString(kv.Value.PrettyPrint())
return rk.buf[:], nil
}
to make sure it only returns the first two elements of the buffer. (or change the return type of that function as well.
This change adds a third return column to the builtins that corresponds to the timestamp at which the value for a particular key was written. Release note (sql change): `crdb_internal.scan` and `crdb_internal.list_sql_keys_in_range` return the timestamp at which the value for a key was written, in addition to the raw key and value.
f3e22e6
to
0af0ea9
Compare
heh, you're too quick! Decided to add the ts to that builtin too, let me know if you'd rather I go the other way. |
crdb_internal.scan
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reviewed the uses of sql_keys_in_range
in Slack and didn't see anyone using it in a way that would be seriously impacted by the new column, so I think adding this new column there is OK by me.
Left one question re the return type that I'll leave to your judgement.
@@ -2065,13 +2065,13 @@ const spanKeyIteratorChunkBytes = 8 << 20 // 8MiB | |||
var rangeKeyIteratorType = types.MakeLabeledTuple( | |||
// TODO(rohany): These could be bytes if we don't want to display the | |||
// prettified versions of the key and value. | |||
[]*types.T{types.String, types.String}, | |||
[]string{"key", "value"}, | |||
[]*types.T{types.String, types.String, types.String}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder: Should we make this new column types.Decimal? That is the type of crdb_internal_mvcc_timestamp column is. I could go either way..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In https://github.com/cockroachdb/cockroach/pull/90848/files#diff-ed2b0d994e60d34e9f853343fd43ee6e14ca234027786774a143e5b73b5a2698R111 im currently hashing the hlc.Timestamp.String() representation of an MVCC timestamp. As discussed offline I wanted uniformity between the output of this builtin and what we hash so that in the future we can write statements such as:
WITH sfp AS (SELECT xor_agg(fnv64(key, ts::BYTES, value)) AS scanfp from crdb_internal.scan(crdb_internal.table_span('t'::regclass::int)))
SELECT count(*) FROM crdb_internal.fingerprint(crdb_internal.table_span('t'::regclass::int),'$before_insert'::TIMESTAMPTZ, false) AS efp, sfp WHERE efp = sfp.scanfp;
to test that our fingerprinting is working as expected. We could go either way here:
- hash the decimal representation of the timestamp and change scan to return a decimal instead
- hash the string representation of the timestamp and keep hashing behaviour what it is at the moment
In the interest of moving things ahead, I'll leave this as a string for now but while reviewing #90848 if we decide decimal is a better representation in both places, I'll make the switch there.
TFTR! bors r=stevendanna |
Build succeeded: |
This change adds a third return column to the builtins
that corresponds to the timestamp at which the value for
a particular key was written.
Release note (sql change):
crdb_internal.scan
andcrdb_internal.list_sql_keys_in_range
returnthe timestamp at which the value for a key was written, in
addition to the raw key and value.