Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: new functions crdb_internal.ranges_in_span and crdb_internal.tenant_ranges_per_table #96011

Closed
wants to merge 1 commit into from

Conversation

THardy98
Copy link

@THardy98 THardy98 commented Jan 26, 2023

Part of: #94332
Part of: #94330

This PR introduces two new SRFs:
crdb_internal.ranges_in_span(start_key, end_key) - returns the set of ranges encompassing this span in the form of (range_id, start_key, end_key)

Example response:

select * from crdb_internal.ranges_in_span('\xf670', '\xf671');
  range_id | start_key | end_key
-----------+-----------+----------
        91 | \xf66f    | \xffff

crdb_internal.tenant_ranges_per_table() - returns the tenant's tables, each table row contains the set of ranges encompassing the table, each row takes the form of (database_name, database_id, table_name, table_id, range_count, range_ids)

Example response:

select * from crdb_internal.tenant_ranges_per_table() where database_name = 'movr';
  database_name | database_id |         table_name         | table_id | range_count |          range_ids
----------------+-------------+----------------------------+----------+-------------+-------------------------------
  movr          |         104 | users                      |      106 |           9 | {76,60,58,56,65,55,59,57,66}
  movr          |         104 | vehicles                   |      107 |           9 | {77,64,63,62,86,61,87,85,75}
  movr          |         104 | rides                      |      108 |           9 | {74,73,72,69,71,67,70,68,88}
  movr          |         104 | vehicle_location_histories |      109 |           1 | {89}
  movr          |         104 | promo_codes                |      110 |           1 | {90}
  movr          |         104 | user_promo_codes           |      111 |           1 | {91}

crdb_internal.ranges_in_span is a QOL improvement, for convienence.
crdb_internal.tenant_ranges_per_table is intended to be used as a method to retrieve a large number of tables' range data quickly, particularly for the database details page (where we surface the range count of each table).

Tested with 100 databases, each with 100 tables, each table having 1 range, crdb_internal.tenant_ranges_per_table is able to retrieve the range ids & range count for the ~10,000 tables in ~1.5s.

select * from crdb_internal.tenant_ranges_per_table() where database_name = 'movr';
  database_name | database_id |         table_name         | table_id | range_count |          range_ids
----------------+-------------+----------------------------+----------+-------------+-------------------------------
...
  movr"100      |         310 | users17                   |    10411 |           1 | {10101}
(10048 rows)

Time: 1.126s total (execution 1.121s / network 0.005s)

Release note(sql change): introduce two SRFs crdb_internal.ranges_in_span and crdb_internal.tenant_ranges_per_table

@THardy98 THardy98 requested a review from a team January 26, 2023 17:11
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@THardy98 THardy98 requested a review from a team January 26, 2023 17:12
@THardy98 THardy98 force-pushed the srfs_table_ranges branch 6 times, most recently from 35e6926 to 3e7c95b Compare January 26, 2023 21:45
Copy link
Contributor

@ecwall ecwall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @THardy98)


pkg/sql/logictest/testdata/logic_test/ranges_in_span line 1 at r3 (raw file):

# LogicTest: local

When new functions are added, we should make sure they are secondary-tenant compatible.

It looks like you are already using rangedesc.Iterator which is good, but # LogicTest: local will prevent the test from being run under a config that uses a secondary tenant (for example the 3node-tenant config).

What happens if you remove this and run it under the 3node-tenant config?

Copy link
Author

@THardy98 THardy98 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ecwall)


pkg/sql/logictest/testdata/logic_test/ranges_in_span line 1 at r3 (raw file):

Previously, ecwall (Evan Wall) wrote…

When new functions are added, we should make sure they are secondary-tenant compatible.

It looks like you are already using rangedesc.Iterator which is good, but # LogicTest: local will prevent the test from being run under a config that uses a secondary tenant (for example the 3node-tenant config).

What happens if you remove this and run it under the 3node-tenant config?

Looks like a test failure:

RangeIterator failed to seek to /Meta2/Table/3/NULL: rpc error: code = Unauthenticated desc = requested key /Meta2/Table/3/NULL not fully contained in tenant keyspace /Tenant/1{0-1}

Would it be acceptable to swallow this error and continue iterating, effectively skip any iterations we aren't authed for? Or is there a better way to handle this, I'm not familiar with how we handle this.

@THardy98 THardy98 requested review from ecwall and knz January 30, 2023 16:17
Copy link
Contributor

@ecwall ecwall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz and @THardy98)


pkg/sql/sem/builtins/generator_builtins.go line 2941 at r6 (raw file):

}

func getRangeIteratorWithinSpan(

Can you add GetRangeIterator(context.Context, roachpb.Span) to eval.Planner (see GetRangeDescByID)? Then you don't need to pass in the db from here


pkg/sql/logictest/testdata/logic_test/ranges_in_span line 1 at r3 (raw file):

Previously, THardy98 (Thomas Hardy) wrote…

Looks like a test failure:

RangeIterator failed to seek to /Meta2/Table/3/NULL: rpc error: code = Unauthenticated desc = requested key /Meta2/Table/3/NULL not fully contained in tenant keyspace /Tenant/1{0-1}

Would it be acceptable to swallow this error and continue iterating, effectively skip any iterations we aren't authed for? Or is there a better way to handle this, I'm not familiar with how we handle this.

That means you are trying to access keys outside of the tenant's keyspace which is /Tenant/1{0-1} meaning [/Tenant/10, /Tenant/11). Can you use a span like '/Tenant/10', '/Tenant/11'instead (you have to figure out the bytes for this)?

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the direction this is going but there are two main design questions remaining open:

  • how do you intend this to behave when a single large table contains thousands of ranges?
  • there's memory accounting missing in this approach

Reviewed 1 of 6 files at r1, 1 of 5 files at r3, 1 of 1 files at r5, 5 of 5 files at r6, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ecwall and @THardy98)


-- commits line 9 at r6:
Can you make this more granular, so that it takes as argument which tables to retrieve ranges for. If there's 100 databases and you inspect just 1, you want this SRF to produce only the data for that db.


pkg/sql/sem/builtins/generator_builtins.go line 3032 at r6 (raw file):

// Start implements the tree.ValueGenerator interface.
func (trpti *tenantRangesPerTableIterator) Start(ctx context.Context, txn *kv.Txn) error {
	rangeIter, err := getRangeIteratorWithinSpan(ctx, txn.DB(), trpti.span)

This method would be better served by a call into a function implemented in pkg/sql off (*planner) that inspects the descriptor collection and retrieves descriptors one by one. As implemented here, it still retrieves all descriptors in RAM (via the vtable) before it starts iterating.


pkg/sql/sem/builtins/generator_builtins.go line 3085 at r6 (raw file):

		End:   interval.Comparable(tableStartKey.PrefixEnd()),
	}
	tableOverlappingRanges := trpti.rangeIntervalTree.Get(tableRange)

Wowzer, what do you intend to do here if there are 10000 ranges in 1 table? E.g. a large table?

I think this code would be better served by a range iterator in the mapping from table -> range, with 1 result row per range.

Copy link
Author

@THardy98 THardy98 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ecwall and @knz)


pkg/sql/logictest/testdata/logic_test/ranges_in_span line 1 at r3 (raw file):

Previously, ecwall (Evan Wall) wrote…

That means you are trying to access keys outside of the tenant's keyspace which is /Tenant/1{0-1} meaning [/Tenant/10, /Tenant/11). Can you use a span like '/Tenant/10', '/Tenant/11'instead (you have to figure out the bytes for this)?

I'm confused as to why this happens though, particularly for tenant_ranges_per_table where we explicitly pass in the tenant's span. Why would we be accessing keys outside the tenant's keyspace?

Copy link
Contributor

@ecwall ecwall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz and @THardy98)


pkg/sql/logictest/testdata/logic_test/ranges_in_span line 1 at r3 (raw file):

Previously, THardy98 (Thomas Hardy) wrote…

I'm confused as to why this happens though, particularly for tenant_ranges_per_table where we explicitly pass in the tenant's span. Why would we be accessing keys outside the tenant's keyspace?

Yeah /Meta2 is not inside the tenant's keyspace so I am not sure why that is being read. What is the stack trace when this error happens?

…tenant_ranges_per_table`

Part of: cockroachdb#94332
Part of: cockroachdb#94330

This PR introduces two new SRFs:
`crdb_internal.ranges_in_span(start_key, end_key)` - returns the set of ranges encompassing this span in the form of `(range_id, start_key, end_key)`
`crdb_internal.tenant_ranges_per_table()` - returns the tenant's tables, each table row contains the set of ranges encompassing the table, each row takes the form of `(database_name, database_id, table_name, table_id, range_count, range_ids)`

The former SRF is a QOL improvement.
The latter SRF is intended to be used as a method to retrieve a large number of tables' range data quickly, particularly for the database details page (where we surface the range count of each table).

The latter SRF is able to the range ids & range count for ~10,000 tables in ~1.5s.

Release note(sql change): introduce two SRFs `crdb_internal.ranges_in_span` and `crdb_internal.tenant_ranges_per_table`
THardy98 pushed a commit to THardy98/cockroach that referenced this pull request Feb 22, 2023
Part of: cockroachdb#96011

This PR introduces a new SRF: `crdb_internal.ranges_in_span(start_key,
end_key)` - returns the set of ranges encompassing this span in the form
of `(range_id, start_key, end_key)`

Release note (sql change): introduce new builtin function
`crdb_internal.ranges_in_span(start_key, end_key), returns the set of
ranges encompassing the given start and end key.
@THardy98
Copy link
Author

THardy98 commented Feb 23, 2023

Closing this PR in favour of opening a new PR using the new SpanStats RPC implementation (see #96223) to surface table span statistics.

Appreciate the time spent on reviews for this PR.

I've separated builtins into their own PRs here for your review:

@THardy98 THardy98 closed this Feb 23, 2023
THardy98 pushed a commit to THardy98/cockroach that referenced this pull request Feb 24, 2023
Part of: cockroachdb#96011

This PR introduces a new SRF: `crdb_internal.ranges_in_span(start_key,
end_key)` - returns the set of ranges encompassing this span in the form
of `(range_id, start_key, end_key)`

Release note (sql change): introduce new builtin function
`crdb_internal.ranges_in_span(start_key, end_key), returns the set of
ranges encompassing the given start and end key.
THardy98 pushed a commit to THardy98/cockroach that referenced this pull request Feb 28, 2023
Part of: cockroachdb#96011

This PR introduces a new SRF: `crdb_internal.ranges_in_span(start_key,
end_key)` - returns the set of ranges encompassing this span in the form
of `(range_id, start_key, end_key)`

Release note (sql change): introduce new builtin function
`crdb_internal.ranges_in_span(start_key, end_key), returns the set of
ranges encompassing the given start and end key.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants