server: enable `data_distribution` API for secondary tenants. #128578

shubhamdhama · 2024-08-08T12:22:10Z

Previously, this endpoint scanned over meta KVs and used their startKey to
determine the table they belong to. However, this approach does not work
for coalesced ranges.

The fix, inspired by the SHOW CLUSTER RANGES query, uses the
crdb_internal.table_spans table and joins it with the
crdb_internal.ranges_no_leases table.

There is still some work left that can be addressed in follow-up PR:

We might want to add a summation of ranges at various levels because the
sum of replicas of all tables in a database can be more than the actual
number of ranges that belong to that database due to coalescing.

Overall, this PR also fixes the broken behavior to a large extent of this
endpoint since range coalescing is enabled.

Informs: #97942
Fixes: #109501, #106897
Epic: CRDB-12100
Release note: None

cockroach-teamcity · 2024-08-08T12:22:20Z

This change is

Background: I wanted to include deleted tables in `crdb_internal.table_spans` and add a new column `is_deleted`. However, to achieve this, I needed to add a boolean to `forEachTableDesc` and all the invoked functions so that `forEachTableDescWithTableLookupInternalFromDescriptors` doesn't skip them. The argument count for these functions was already excessive, so I made the following changes to improve readability and maintainability. Details: Previously, there were around four different overloads of functions: - forEachTableDesc - forEachTableDescAll - forEachTableDescWithTableLookup - forEachTableDescAllWithTableLookup - forEachTableDescWithTableLookupInternal This added excessive nesting and reduced readability. In many cases, functions with the prefix `TableLookup` were used even though they no longer utilized `tableLookupFn`. In these proposed changes: - I aim to improve readability by merging these functions. - To avoid repeatedly declaring the same definition, I moved all arguments to a new struct. This way, we only define what we need. ```diff - func(ctx context.Context, db catalog.DatabaseDescriptor, - sc catalog.SchemaDescriptor, table catalog.TableDescriptor) error { + func(ctx context.Context, descCtx tableDescContext) error { + sc, table := descCtx.schema, descCtx.table ``` Now, we only have `forEachTableDesc` to iterate over table descriptors. Informs: cockroachdb#128578, cockroachdb#97942 Epic: CRDB-12100 Release note: None

shubhamdhama · 2024-08-27T06:58:01Z

We might want to put this behind some capability.

Background: I wanted to include deleted tables in `crdb_internal.table_spans` and add a new column `is_deleted`. However, to achieve this, I needed to add a boolean to `forEachTableDesc` and all the invoked functions so that `forEachTableDescWithTableLookupInternalFromDescriptors` doesn't skip them. The argument count for these functions was already excessive, so I made the following changes to improve readability and maintainability. Details: Previously, there were around four different overloads of functions: - forEachTableDesc - forEachTableDescAll - forEachTableDescWithTableLookup - forEachTableDescAllWithTableLookup - forEachTableDescWithTableLookupInternal This added excessive nesting and reduced readability. In many cases, functions with the prefix `TableLookup` were used even though they no longer utilized `tableLookupFn`. In these proposed changes: - I aim to improve readability by merging these functions. - To avoid repeatedly declaring the same definition, I moved all arguments to a new struct. This way, we only define what we need. ```diff - func(ctx context.Context, db catalog.DatabaseDescriptor, - sc catalog.SchemaDescriptor, table catalog.TableDescriptor) error { + func(ctx context.Context, descCtx tableDescContext) error { + sc, table := descCtx.schema, descCtx.table ``` Now, we only have `forEachTableDesc` to iterate over table descriptors. Informs: cockroachdb#128578, cockroachdb#97942 Epic: CRDB-12100 Release note: None

128755: sql: refactor to combine `forEachTableDesc*` functions. r=rafiss a=shubhamdhama Background: I wanted to include deleted tables in `crdb_internal.table_spans` and add a new column `is_deleted`. However, to achieve this, I needed to add a boolean to `forEachTableDesc` and all the invoked functions so that `forEachTableDescWithTableLookupInternalFromDescriptors` doesn't skip them. The argument count for these functions was already excessive, so I made the following changes to improve readability and maintainability. Details: Previously, there were around four different overloads of functions: - forEachTableDesc - forEachTableDescAll - forEachTableDescWithTableLookup - forEachTableDescAllWithTableLookup - forEachTableDescWithTableLookupInternal This added excessive nesting and reduced readability. In many cases, functions with the prefix `TableLookup` were used even though they no longer utilized `tableLookupFn`. In these proposed changes: - I aim to improve readability by merging these functions. - To avoid repeatedly declaring the same definition, I moved all arguments to a new struct. This way, we only define what we need. ```diff - func(ctx context.Context, db catalog.DatabaseDescriptor, - sc catalog.SchemaDescriptor, table catalog.TableDescriptor) error { + func(ctx context.Context, descCtx tableDescContext) error { + sc, table := descCtx.schema, descCtx.table ``` Now, we only have `forEachTableDesc` to iterate over table descriptors. Informs: #128578, #97942 Epic: CRDB-12100 Release note: None Co-authored-by: Shubham Dhama <[email protected]>

This addition is useful for listing table spans that are deleted but not yet garbage collected. Informs: cockroachdb#128578, cockroachdb#128755 Epic: CRDB-12100 Release note (sql change): This adds a new column to `crdb_internal.table_spans` to indicate whether a table is dropped. Rows for dropped tables will be removed once they are garbage collected.

128788: sql: add `dropped` column to `crdb_internal.table_spans`. r=rafiss a=shubhamdhama This addition is useful for listing table spans that are deleted but not yet garbage collected. Informs: #128578, #128755 Epic: CRDB-12100 Release note (sql change): This adds a new column to `crdb_internal.table_spans` to indicate whether a table is dropped. Rows for dropped tables will be removed once they are garbage collected. 129229: kvserver/rangefeed: rename Send to SendUnbuffered r=nvanbenschoten a=wenyihu6 **kvserver/rangefeed: move helpers to processor_helpers_test** This patch moves the helper functions in processor_test.go to processor_helpers_test.go. Part of: #126560 Release note: none --- **kvserver/rangefeed: rename Send to SendUnbuffered for Stream** This patch renames Send to SendUnbuffered in Stream to distinguish it from SendBuffered future patches are going to introduce. Part of: #129813 Release note: none --- **kvserver/rangefeed: rename SendIsThreadSafe to SendUnbufferedIsThreadSafe for Stream** This patch renames SendIsThreadSafe to SendUnbufferedIsThreadSafe in Stream. Part of: #129813 Release note: none Co-authored-by: Shubham Dhama <[email protected]> Co-authored-by: Wenyi Hu <[email protected]>

Previously, this endpoint scanned over meta KVs and used their startKey to determine the table they belong to. However, this approach does not work for coalesced ranges. The fix, inspired by the `SHOW CLUSTER RANGES` query, uses the `crdb_internal.table_spans` table and joins it with the `crdb_internal.ranges_no_leases` table. There is still some work left that can be addressed in follow-up PR: - We might want to add a summation of ranges at various levels because the sum of replicas of all tables in a database can be more than the actual number of ranges that belong to that database due to coalescing. Overall, this PR also fixes the broken behavior to a large extent of this endpoint since range coalescing is enabled. Informs: cockroachdb#97942 Fixes: cockroachdb#109501, cockroachdb#106897 Epic: CRDB-12100 Release note: None

dhartunian · 2024-09-09T13:46:36Z

We might want to put this behind some capability.

I'm not sure this is necessary, can you say more about why you think a capability is needed? Are you accessing some information that's not already tenant-scoped? Since you're using the adminServer not the systemAdminServer, the APIs should already account for tenant scoping automatically.

shubhamdhama · 2024-09-09T14:46:48Z

I'm not sure this is necessary, can you say more about why you think a capability is needed? Are you accessing some information that's not already tenant-scoped? Since you're using the adminServer not the systemAdminServer, the APIs should already account for tenant scoping automatically.

Yes, I wasn't sure whether we would need it or not, but you are right – we don't. The information we are accessing is through SQL, so we are fine on that front, and since it's an admin endpoint, so we are good at the front too.

On an unrelated note, once this PR is merged, I will create an issue for the following:

We might want to add a summation of ranges at various levels because the
sum of replicas of all tables in a database can be more than the actual
number of ranges that belong to that database due to coalescing.

Essentially, this PR has enabled the endpoint for secondary tenants and improved the result too (i.e., they are not as misleading as before).
However, there are few aspects for which I believe observability team would be better equipped to lead, particularly how should this data distribution UI page should adapt to range coalescing. For instance, a single range might intersect with three tables, meaning all three tables would count that same range. This creates a challenge in accurately displaying the total number of ranges. I suggest adding some more fields to the result protos, but that's an implementation detail.

dhartunian

can you just file a small follow up to enable it on the tenant DB Console page? it's disabled here:

cockroach/pkg/ui/workspaces/db-console/src/views/reports/containers/debug/index.tsx

Lines 304 to 309 in 33a791d

    
           <DebugPanelLink 
        
             name="Data Distribution and Zone Configs" 
        
             url="#/data-distribution" 
        
             note="View the distribution of table data across nodes and verify zone configuration." 
        
             disabled={disable_kv_level_advanced_debug} 
        
           />

However, there are few aspects for which I believe observability team would be better equipped to lead, particularly how should this data distribution UI page should adapt to range coalescing. For instance, a single range might intersect with three tables, meaning all three tables would count that same range. This creates a challenge in accurately displaying the total number of ranges. I suggest adding some more fields to the result protos, but that's an implementation detail.

Yes, we've been discussing what to do about stuff like this as we need to figure out some improvements for the hot ranges page already. You can leave that followup with the obs team.

Reviewed 2 of 3 files at r1, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @kyle-a-wong and @stevendanna)

stevendanna

I've left some potential cleanups, but I'm OK with those being in follow-up PRs and/or put into someone's backlog.

pkg/server/admin.go

pkg/server/serverpb/admin.proto

stevendanna

👍 Thanks for doing that extra cleanup.

Not recommending a change, but just leaving this here as you might find it amusing. The query could be made to return 1 row per table with something like:

   WITH tables AS (
        SELECT
            t.schema_name, t.name AS table_name, t.database_name,
            t.table_id, t.drop_time, s.start_key, s.end_key
        FROM
            "".crdb_internal.tables t
            JOIN "".crdb_internal.table_spans s ON t.table_id = s.descriptor_id
        WHERE
            t.database_name IS NOT NULL
   )
    SELECT
        table_id, table_name, schema_name, database_name, drop_time, array_agg(replica) as replicas
    FROM
    (SELECT
        t.table_id, t.table_name, t.schema_name, t.database_name, t.drop_time, unnest(r.replicas) as replica
     FROM
        tables t
        JOIN "".crdb_internal.ranges_no_leases r ON t.start_key < r.end_key
        AND t.end_key > r.start_key
    ) GROUP BY table_id, table_name, schema_name, database_name, drop_time;

I have no idea how this would perform in a big cluster vs what you have.

shubhamdhama · 2024-09-11T17:39:52Z

Yes, it should make that code more fun. Thanks for playing around with it.
This reminds me of a book in my list: https://theartofpostgresql.com "Turn thousands of lines of code into simple queries".

shubhamdhama · 2024-09-11T17:41:54Z

Thanks everyone for the review ✌️

bors r=dhartunian,stevendanna

craig · 2024-09-11T18:34:15Z

Build succeeded:

shubhamdhama requested review from a team as code owners August 8, 2024 12:22

shubhamdhama requested review from kyle-a-wong and removed request for a team August 8, 2024 12:22

shubhamdhama requested review from stevendanna and dhartunian August 8, 2024 15:11

shubhamdhama mentioned this pull request Aug 10, 2024

sql: refactor to combine forEachTableDesc* functions. #128755

Merged

shubhamdhama changed the title ~~server: Enable data_distribution API for secondary tenants.~~ server: enable data_distribution API for secondary tenants. Aug 12, 2024

shubhamdhama mentioned this pull request Aug 12, 2024

sql: add dropped column to crdb_internal.table_spans. #128788

Merged

shubhamdhama force-pushed the data-distribution-enable-for-mt-97942 branch from d9ee422 to 7bdabc2 Compare September 2, 2024 09:52

shubhamdhama force-pushed the data-distribution-enable-for-mt-97942 branch from 7bdabc2 to 0b57de4 Compare September 9, 2024 06:17

shubhamdhama mentioned this pull request Sep 9, 2024

serverccl: TestAdminAPIDataDistributionPartitioning fails on secondary tenants #106897

Closed

dhartunian approved these changes Sep 11, 2024

View reviewed changes

stevendanna approved these changes Sep 11, 2024

View reviewed changes

pkg/server/admin.go Outdated Show resolved Hide resolved

pkg/server/admin.go Outdated Show resolved Hide resolved

stevendanna reviewed Sep 11, 2024

View reviewed changes

pkg/server/serverpb/admin.proto Show resolved Hide resolved

server: remove ZoneConfigID and eleminate tableInfosByTableID.

cc17b55

shubhamdhama force-pushed the data-distribution-enable-for-mt-97942 branch from 471e5cf to cc17b55 Compare September 11, 2024 16:36

stevendanna approved these changes Sep 11, 2024

View reviewed changes

craig bot merged commit ce44508 into cockroachdb:master Sep 11, 2024
22 of 23 checks passed

shubhamdhama linked an issue Sep 12, 2024 that may be closed by this pull request

serverccl: TestAdminAPIDataDistributionPartitioning fails on secondary tenants #106897

Closed

This was referenced Sep 13, 2024

dbconsole,ui: enable "Data Distribution" for secondary tenants #130648

Open

server: make /data-distribution compatible with range coalescing #97942

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

server: enable `data_distribution` API for secondary tenants. #128578

server: enable `data_distribution` API for secondary tenants. #128578

shubhamdhama commented Aug 8, 2024 •

edited

Loading

cockroach-teamcity commented Aug 8, 2024

shubhamdhama commented Aug 27, 2024

dhartunian commented Sep 9, 2024

shubhamdhama commented Sep 9, 2024

dhartunian left a comment

stevendanna left a comment

stevendanna left a comment

shubhamdhama commented Sep 11, 2024

shubhamdhama commented Sep 11, 2024

craig bot commented Sep 11, 2024

	<DebugPanelLink
	name="Data Distribution and Zone Configs"
	url="#/data-distribution"
	note="View the distribution of table data across nodes and verify zone configuration."
	disabled={disable_kv_level_advanced_debug}
	/>

server: enable data_distribution API for secondary tenants. #128578

server: enable data_distribution API for secondary tenants. #128578

Conversation

shubhamdhama commented Aug 8, 2024 • edited Loading

cockroach-teamcity commented Aug 8, 2024

shubhamdhama commented Aug 27, 2024

dhartunian commented Sep 9, 2024

shubhamdhama commented Sep 9, 2024

dhartunian left a comment

Choose a reason for hiding this comment

stevendanna left a comment

Choose a reason for hiding this comment

stevendanna left a comment

Choose a reason for hiding this comment

shubhamdhama commented Sep 11, 2024

shubhamdhama commented Sep 11, 2024

craig bot commented Sep 11, 2024

server: enable `data_distribution` API for secondary tenants. #128578

server: enable `data_distribution` API for secondary tenants. #128578

shubhamdhama commented Aug 8, 2024 •

edited

Loading