-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
server: enable data_distribution
API for secondary tenants.
#128578
server: enable data_distribution
API for secondary tenants.
#128578
Conversation
Background: I wanted to include deleted tables in `crdb_internal.table_spans` and add a new column `is_deleted`. However, to achieve this, I needed to add a boolean to `forEachTableDesc` and all the invoked functions so that `forEachTableDescWithTableLookupInternalFromDescriptors` doesn't skip them. The argument count for these functions was already excessive, so I made the following changes to improve readability and maintainability. Details: Previously, there were around four different overloads of functions: - forEachTableDesc - forEachTableDescAll - forEachTableDescWithTableLookup - forEachTableDescAllWithTableLookup - forEachTableDescWithTableLookupInternal This added excessive nesting and reduced readability. In many cases, functions with the prefix `TableLookup` were used even though they no longer utilized `tableLookupFn`. In these proposed changes: - I aim to improve readability by merging these functions. - To avoid repeatedly declaring the same definition, I moved all arguments to a new struct. This way, we only define what we need. ```diff - func(ctx context.Context, db catalog.DatabaseDescriptor, - sc catalog.SchemaDescriptor, table catalog.TableDescriptor) error { + func(ctx context.Context, descCtx tableDescContext) error { + sc, table := descCtx.schema, descCtx.table ``` Now, we only have `forEachTableDesc` to iterate over table descriptors. Informs: cockroachdb#128578, cockroachdb#97942 Epic: CRDB-12100 Release note: None
data_distribution
API for secondary tenants.
|
Background: I wanted to include deleted tables in `crdb_internal.table_spans` and add a new column `is_deleted`. However, to achieve this, I needed to add a boolean to `forEachTableDesc` and all the invoked functions so that `forEachTableDescWithTableLookupInternalFromDescriptors` doesn't skip them. The argument count for these functions was already excessive, so I made the following changes to improve readability and maintainability. Details: Previously, there were around four different overloads of functions: - forEachTableDesc - forEachTableDescAll - forEachTableDescWithTableLookup - forEachTableDescAllWithTableLookup - forEachTableDescWithTableLookupInternal This added excessive nesting and reduced readability. In many cases, functions with the prefix `TableLookup` were used even though they no longer utilized `tableLookupFn`. In these proposed changes: - I aim to improve readability by merging these functions. - To avoid repeatedly declaring the same definition, I moved all arguments to a new struct. This way, we only define what we need. ```diff - func(ctx context.Context, db catalog.DatabaseDescriptor, - sc catalog.SchemaDescriptor, table catalog.TableDescriptor) error { + func(ctx context.Context, descCtx tableDescContext) error { + sc, table := descCtx.schema, descCtx.table ``` Now, we only have `forEachTableDesc` to iterate over table descriptors. Informs: cockroachdb#128578, cockroachdb#97942 Epic: CRDB-12100 Release note: None
128755: sql: refactor to combine `forEachTableDesc*` functions. r=rafiss a=shubhamdhama Background: I wanted to include deleted tables in `crdb_internal.table_spans` and add a new column `is_deleted`. However, to achieve this, I needed to add a boolean to `forEachTableDesc` and all the invoked functions so that `forEachTableDescWithTableLookupInternalFromDescriptors` doesn't skip them. The argument count for these functions was already excessive, so I made the following changes to improve readability and maintainability. Details: Previously, there were around four different overloads of functions: - forEachTableDesc - forEachTableDescAll - forEachTableDescWithTableLookup - forEachTableDescAllWithTableLookup - forEachTableDescWithTableLookupInternal This added excessive nesting and reduced readability. In many cases, functions with the prefix `TableLookup` were used even though they no longer utilized `tableLookupFn`. In these proposed changes: - I aim to improve readability by merging these functions. - To avoid repeatedly declaring the same definition, I moved all arguments to a new struct. This way, we only define what we need. ```diff - func(ctx context.Context, db catalog.DatabaseDescriptor, - sc catalog.SchemaDescriptor, table catalog.TableDescriptor) error { + func(ctx context.Context, descCtx tableDescContext) error { + sc, table := descCtx.schema, descCtx.table ``` Now, we only have `forEachTableDesc` to iterate over table descriptors. Informs: #128578, #97942 Epic: CRDB-12100 Release note: None Co-authored-by: Shubham Dhama <[email protected]>
This addition is useful for listing table spans that are deleted but not yet garbage collected. Informs: cockroachdb#128578, cockroachdb#128755 Epic: CRDB-12100 Release note (sql change): This adds a new column to `crdb_internal.table_spans` to indicate whether a table is dropped. Rows for dropped tables will be removed once they are garbage collected.
This addition is useful for listing table spans that are deleted but not yet garbage collected. Informs: cockroachdb#128578, cockroachdb#128755 Epic: CRDB-12100 Release note (sql change): This adds a new column to `crdb_internal.table_spans` to indicate whether a table is dropped. Rows for dropped tables will be removed once they are garbage collected.
128788: sql: add `dropped` column to `crdb_internal.table_spans`. r=rafiss a=shubhamdhama This addition is useful for listing table spans that are deleted but not yet garbage collected. Informs: #128578, #128755 Epic: CRDB-12100 Release note (sql change): This adds a new column to `crdb_internal.table_spans` to indicate whether a table is dropped. Rows for dropped tables will be removed once they are garbage collected. 129229: kvserver/rangefeed: rename Send to SendUnbuffered r=nvanbenschoten a=wenyihu6 **kvserver/rangefeed: move helpers to processor_helpers_test** This patch moves the helper functions in processor_test.go to processor_helpers_test.go. Part of: #126560 Release note: none --- **kvserver/rangefeed: rename Send to SendUnbuffered for Stream** This patch renames Send to SendUnbuffered in Stream to distinguish it from SendBuffered future patches are going to introduce. Part of: #129813 Release note: none --- **kvserver/rangefeed: rename SendIsThreadSafe to SendUnbufferedIsThreadSafe for Stream** This patch renames SendIsThreadSafe to SendUnbufferedIsThreadSafe in Stream. Part of: #129813 Release note: none Co-authored-by: Shubham Dhama <[email protected]> Co-authored-by: Wenyi Hu <[email protected]>
d9ee422
to
7bdabc2
Compare
Previously, this endpoint scanned over meta KVs and used their startKey to determine the table they belong to. However, this approach does not work for coalesced ranges. The fix, inspired by the `SHOW CLUSTER RANGES` query, uses the `crdb_internal.table_spans` table and joins it with the `crdb_internal.ranges_no_leases` table. There is still some work left that can be addressed in follow-up PR: - We might want to add a summation of ranges at various levels because the sum of replicas of all tables in a database can be more than the actual number of ranges that belong to that database due to coalescing. Overall, this PR also fixes the broken behavior to a large extent of this endpoint since range coalescing is enabled. Informs: cockroachdb#97942 Fixes: cockroachdb#109501, cockroachdb#106897 Epic: CRDB-12100 Release note: None
7bdabc2
to
0b57de4
Compare
I'm not sure this is necessary, can you say more about why you think a capability is needed? Are you accessing some information that's not already tenant-scoped? Since you're using the |
Yes, I wasn't sure whether we would need it or not, but you are right – we don't. The information we are accessing is through SQL, so we are fine on that front, and since it's an admin endpoint, so we are good at the front too. On an unrelated note, once this PR is merged, I will create an issue for the following:
Essentially, this PR has enabled the endpoint for secondary tenants and improved the result too (i.e., they are not as misleading as before). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you just file a small follow up to enable it on the tenant DB Console page? it's disabled here:
cockroach/pkg/ui/workspaces/db-console/src/views/reports/containers/debug/index.tsx
Lines 304 to 309 in 33a791d
<DebugPanelLink | |
name="Data Distribution and Zone Configs" | |
url="#/data-distribution" | |
note="View the distribution of table data across nodes and verify zone configuration." | |
disabled={disable_kv_level_advanced_debug} | |
/> |
However, there are few aspects for which I believe observability team would be better equipped to lead, particularly how should this data distribution UI page should adapt to range coalescing. For instance, a single range might intersect with three tables, meaning all three tables would count that same range. This creates a challenge in accurately displaying the total number of ranges. I suggest adding some more fields to the result protos, but that's an implementation detail.
Yes, we've been discussing what to do about stuff like this as we need to figure out some improvements for the hot ranges page already. You can leave that followup with the obs team.
Reviewed 2 of 3 files at r1, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @kyle-a-wong and @stevendanna)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've left some potential cleanups, but I'm OK with those being in follow-up PRs and/or put into someone's backlog.
471e5cf
to
cc17b55
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Thanks for doing that extra cleanup.
Not recommending a change, but just leaving this here as you might find it amusing. The query could be made to return 1 row per table with something like:
WITH tables AS (
SELECT
t.schema_name, t.name AS table_name, t.database_name,
t.table_id, t.drop_time, s.start_key, s.end_key
FROM
"".crdb_internal.tables t
JOIN "".crdb_internal.table_spans s ON t.table_id = s.descriptor_id
WHERE
t.database_name IS NOT NULL
)
SELECT
table_id, table_name, schema_name, database_name, drop_time, array_agg(replica) as replicas
FROM
(SELECT
t.table_id, t.table_name, t.schema_name, t.database_name, t.drop_time, unnest(r.replicas) as replica
FROM
tables t
JOIN "".crdb_internal.ranges_no_leases r ON t.start_key < r.end_key
AND t.end_key > r.start_key
) GROUP BY table_id, table_name, schema_name, database_name, drop_time;
I have no idea how this would perform in a big cluster vs what you have.
Yes, it should make that code more fun. Thanks for playing around with it. |
Thanks everyone for the review ✌️ bors r=dhartunian,stevendanna |
Previously, this endpoint scanned over meta KVs and used their startKey to
determine the table they belong to. However, this approach does not work
for coalesced ranges.
The fix, inspired by the
SHOW CLUSTER RANGES
query, uses thecrdb_internal.table_spans
table and joins it with thecrdb_internal.ranges_no_leases
table.There is still some work left that can be addressed in follow-up PR:
sum of replicas of all tables in a database can be more than the actual
number of ranges that belong to that database due to coalescing.
Overall, this PR also fixes the broken behavior to a large extent of this
endpoint since range coalescing is enabled.
Informs: #97942
Fixes: #109501, #106897
Epic: CRDB-12100
Release note: None