Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

builtins: add timestamp to crdb_internal.scan/list_sql_keys_in_range #90956

Merged
merged 1 commit into from
Nov 1, 2022

Conversation

adityamaru
Copy link
Contributor

@adityamaru adityamaru commented Oct 30, 2022

This change adds a third return column to the builtins
that corresponds to the timestamp at which the value for
a particular key was written.

Release note (sql change): crdb_internal.scan and
crdb_internal.list_sql_keys_in_range return
the timestamp at which the value for a key was written, in
addition to the raw key and value.

@adityamaru adityamaru requested review from stevendanna and a team October 30, 2022 15:03
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Collaborator

@stevendanna stevendanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You probably need to alter the rangeKeyIterator code. I think you probably need to change this:

// Values implements the tree.ValueGenerator interface.
func (rk *rangeKeyIterator) Values() (tree.Datums, error) {
	kv := rk.kvs[rk.index]
	rk.buf[0] = tree.NewDString(kv.Key.String())
	rk.buf[1] = tree.NewDString(kv.Value.PrettyPrint())
	return rk.buf[:], nil
}

to make sure it only returns the first two elements of the buffer. (or change the return type of that function as well.

This change adds a third return column to the builtins
that corresponds to the timestamp at which the value for
a particular key was written.

Release note (sql change): `crdb_internal.scan` and
`crdb_internal.list_sql_keys_in_range` return
the timestamp at which the value for a key was written, in
addition to the raw key and value.
@adityamaru
Copy link
Contributor Author

You probably need to alter the rangeKeyIterator code. I think you probably need to change this:

heh, you're too quick! Decided to add the ts to that builtin too, let me know if you'd rather I go the other way.

@adityamaru adityamaru changed the title builtins: add timestamp to crdb_internal.scan builtins: add timestamp to crdb_internal.scan/list_sql_keys_in_range Oct 30, 2022
Copy link
Collaborator

@stevendanna stevendanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reviewed the uses of sql_keys_in_range in Slack and didn't see anyone using it in a way that would be seriously impacted by the new column, so I think adding this new column there is OK by me.

Left one question re the return type that I'll leave to your judgement.

@@ -2065,13 +2065,13 @@ const spanKeyIteratorChunkBytes = 8 << 20 // 8MiB
var rangeKeyIteratorType = types.MakeLabeledTuple(
// TODO(rohany): These could be bytes if we don't want to display the
// prettified versions of the key and value.
[]*types.T{types.String, types.String},
[]string{"key", "value"},
[]*types.T{types.String, types.String, types.String},
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder: Should we make this new column types.Decimal? That is the type of crdb_internal_mvcc_timestamp column is. I could go either way..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In https://github.com/cockroachdb/cockroach/pull/90848/files#diff-ed2b0d994e60d34e9f853343fd43ee6e14ca234027786774a143e5b73b5a2698R111 im currently hashing the hlc.Timestamp.String() representation of an MVCC timestamp. As discussed offline I wanted uniformity between the output of this builtin and what we hash so that in the future we can write statements such as:

WITH sfp AS (SELECT xor_agg(fnv64(key, ts::BYTES, value)) AS scanfp from crdb_internal.scan(crdb_internal.table_span('t'::regclass::int)))
  SELECT count(*) FROM crdb_internal.fingerprint(crdb_internal.table_span('t'::regclass::int),'$before_insert'::TIMESTAMPTZ, false) AS efp, sfp WHERE efp = sfp.scanfp;

to test that our fingerprinting is working as expected. We could go either way here:

  1. hash the decimal representation of the timestamp and change scan to return a decimal instead
  2. hash the string representation of the timestamp and keep hashing behaviour what it is at the moment

In the interest of moving things ahead, I'll leave this as a string for now but while reviewing #90848 if we decide decimal is a better representation in both places, I'll make the switch there.

@adityamaru
Copy link
Contributor Author

TFTR!

bors r=stevendanna

@craig
Copy link
Contributor

craig bot commented Nov 1, 2022

Build succeeded:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants