server: make /data-distribution compatible with range coalescing #97942

Open
dhartunian opened this issue Mar 2, 2023 · 7 comments · Fixed by #97945
Labels
A-observability-inf C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-observability

Comments

dhartunian commented Mar 2, 2023

Current implementation returns an error.

Epic: CRDB-12100

Jira issue: CRDB-24980

dhartunian added the C-enhancement label Mar 2, 2023
dhartunian added a commit to dhartunian/cockroach that referenced this issue Mar 7, 2023
Previously, the Advanced Debug page on the DB console would show a lot
of options to the tenant that were either unsupported, unimplemented,
or not valid in that context. This change hides those options for now
to create a smoother Console experience for tenants.

The changes made are currently implemented via a server-controlled
feature flag that is enabled by default on tenants, and not on the
system tenant.

Additionally, an alert banner is shown when viewing "Advanced Debug"
as the system tenant with a note mentioning that the system tenant
sees more options available.

The summary of disabled items is as follows:
* Problem ranges: hidden since implementation depends on liveness
  Issue to fix: cockroachdb#97941
* Data distribution: hidden due to error on tenants
  Issue to fix: cockroachdb#97942
* Stores section: not applicable to tenants
* Problem Ranges section: see above
* Raft and Key Visualizer links: not applicable to tenants
* Closed timestamps: not applicable to tenants
* Tracing active operations: ui impl proxies to nodes, needs changes
  Issue to fix: cockroachdb#97943
* Debug requests/events: not applicable to tenants
* Enqueue range: not applicable to tenants
* Node status: not implemented on tenants
* Single node's ranges: not implemented on tenants
* Hot Ranges (legacy): not implemented on tenants
* Single Node Specific: not implemented on tenants
* Cluster wide: not applicable to tenants
* Allocator: not applicable to tenants

Resolves: cockroachdb#80595
Epic: CRDB-12100

Release note: None
craig bot pushed a commit that referenced this issue Mar 9, 2023
97945: ui: disable invalid options for tenant advanced debug r=stevendanna,knz a=dhartunian

(Same description as the commit message above.)

98167: sqlstats: flush stmt and txns in parallel r=xinhaoz a=xinhaoz

When we have a lot of in-memory data, the flush can take some time to complete. Why not flush stmts and txns in parallel, so that we at least get some data into both system tables while flushing, instead of waiting for the stmt flush to complete before starting the txn flush, which can delay data being shown in the app.

Epic: none

Release note: None

Co-authored-by: David Hartunian <[email protected]>
Co-authored-by: Xin Hao Zhang <[email protected]>
craig bot closed this as completed in 027f1a5 Mar 9, 2023
dhartunian reopened this Mar 9, 2023
irfansharif commented Mar 16, 2023

We'd have to teach the underlying API to deal with coalesced ranges (#98820), and reflect it in the visualization as well.

abarganier commented
Some notes after discussing with @dhartunian:

  • Alongside this, it would be nice to have a tenant capability that gates range-level debug information (e.g. the ability to view advanced debug pages such as problem ranges, /ranges, etc.)
  • /data-distribution can be gated behind the new capability as well (if it hasn't already been tenant-scoped)
  • Some range-level pages already exist at a tenant-scoped level, while others show range information across the entire cluster. Identify those that fall into the latter category, and gate them behind the new capability.

abarganier self-assigned this Mar 27, 2023
craig bot pushed a commit that referenced this issue Mar 30, 2023
98689: workload: jitter the teardown of connections to prevent thundering herd r=sean- a=sean-

workload: jitter the teardown of connections to prevent thundering herd

This change upgrades workload's use of pgx from v4 to v5 in order to allow jittering the teardown of connections. It sets a max connection age of 5min and jitters the teardown by 30s. Upgrading to pgx v5 also adds non-blocking pgxpool connection acquisition.

workload: add flags to manage the age and lifecycle of connection pool

Add flags to all workload types to specify:

* the max connection age: `--max-conn-lifetime duration`
* the max connection age jitter: `--max-conn-lifetime-jitter duration`
* the max connection idle time: `--max-conn-idle-time duration`
* the connection health check interval: `--conn-healthcheck-period duration`
* the min number of connections in the pool: `--min-conns int`

workload: add support for remaining pgx query modes

Add support for `pgx.QueryExecModeCacheDescribe` and `pgx.QueryExecModeDescribeExec`.  Previously, only three of the five query modes were available.

workload: fix race condition when recording histogram data

Release note (cli change): workload jitters teardown of connections to prevent thundering herd impacting P99 latency results.

Release note (cli change): workload utility now has flags to tune the connection pool used for testing.  See `--conn-healthcheck-period`, `--min-conns`, and the `--max-conn-*` flags for details.

Release note (cli change): workload now supports every [PostgreSQL query mode](https://github.com/jackc/pgx/blob/fa5fbed497bc75acee05c1667a8760ce0d634cba/conn.go#L167-L182) available via the underlying pgx driver.

99142: server: fix `DataDistribution` server error when creating a tenant r=zachlite a=zachlite

This is a stop-gap commit that enables the DataDistribution endpoint to handle the parts of the key space belonging to secondary tenants without error.

Despite no error, the result returned for secondary tenants is not correct. The DataDistribution endpoint was written before #79700, and therefore doesn't know that multiple tables can exist within a range. 

Additionally, the results for the system tenant will be incorrect soon because #81008 is in progress.
Improvements are tracked by #97942

Fixes: #97993
Release note: None

99494: changefeedccl: Do not require rangefeed when running initial scan only. r=miretskiy a=miretskiy

Fixes #99470

Release note: None

Co-authored-by: Sean Chittenden <[email protected]>
Co-authored-by: zachlite <[email protected]>
Co-authored-by: Yevgeniy Miretskiy <[email protected]>
abarganier commented
Given that this endpoint doesn't take coalesced ranges into account, and tenants' ranges are always coalesced, it doesn't make sense to implement it for secondary tenants.

Instead, we're going to make the endpoint system-tenant-only until KV updates it to take coalesced ranges into account (or nukes it entirely).

craig bot pushed a commit that referenced this issue Apr 7, 2023
99663: sql: update connExecutor logic for pausable portals r=ZhouXing19 a=ZhouXing19

This PR replaces #96358 and is part of the initial implementation of multiple active portals.

----

This PR is to add limited support for multiple active portals. Now portals satisfying all following restrictions can be paused and resumed (i.e., with other queries interleaving it):

1. Not an internal query;
2. Read-only query;
3. No sub-queries or post-queries.

And such a portal will only have the statement executed with a _non-distributed_ plan. 

This feature is gated by a session variable `multiple_active_portals_enabled`. When it's set `true`, all portals that satisfy the restrictions above will automatically become "pausable" when being created via the pgwire `Bind` stmt. 
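For example, a session could opt in like so (a minimal sketch; the session variable name is taken from the description above):

```sql
SET multiple_active_portals_enabled = true;
```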

The core idea of this implementation is 
1. Add a `switchToAnotherPortal` status to the result-consumption state machine. When we receive an `ExecPortal` message for a different portal, we simply return the control to the connExecutor. (#99052)
2. Persist `flow`, `queryID`, `span`, and `instrumentationHelper` for the portal, and reuse them when we re-execute the portal. This is to ensure we _continue_ the fetching rather than starting all over. (#99173)
3. To enable 2, we need to delay the clean-up of resources till we close the portal. For this we introduced the stacks of cleanup functions. (This PR)

Note that we kept the implementation of the original "un-pausable" portal, as we'd like to limit this new functionality to a small set of statements. Eventually some of it (e.g. the limitedCommandResult's lifecycle) should be replaced with the new code.

Also, we don't support distributed plan yet, as it involves much more complicated changes. See `Start with an entirely local plan` section in the [design doc](https://docs.google.com/document/d/1SpKTrTqc4AlGWBqBNgmyXfTweUUsrlqIaSkmaXpznA8/edit). Support for this will come as a follow-up.

Epic: CRDB-17622

Release note (sql change): initial support for multiple active portals. Now with session variable `multiple_active_portals_enabled` set to true,  portals satisfying all following restrictions can be executed in an interleaving manner:  1. Not an internal query; 2. Read-only query; 3. No sub-queries or post-queries. And such a portal will only have the statement executed with an entirely local plan. 





99947: ui: small fixes to DB Console charts shown for secondary tenants r=dhartunian a=abarganier

#97995 updated the
DB Console to filter out KV-specific charts from the metrics page
when viewing DB Console as a secondary application tenant.

The PR missed a couple small details. This patch cleans those
up with the following:

- Removes KV latency charts for app tenants
- Adds a single storage graph for app tenants showing livebytes
- Removes the "Capacity" chart on the Overview dashboard for app
  tenants

Release note: none

Epic: https://cockroachlabs.atlassian.net/browse/CRDB-12100

NB: Please only review the final commit. 1st commit is being reviewed separately @ #99860

100188: changefeedccl: pubsub sink refactor to batching sink r=rickystewart a=samiskin

Epic: https://cockroachlabs.atlassian.net/browse/CRDB-13237

This change is a followup to #99086 which moves the Pubsub sink to the batching sink framework.

The changes involve:
1. Moves the Pubsub code to match the `SinkClient` interface, switching to the lower-level v1 pubsub API that lets us publish batches manually
2. Removes the extra call to json.Marshal
3. Moves to using the `pstest` package for validating results in unit tests
4. Adds topic handling to the batching sink, where batches are created per-topic
5. Adds a pubsub_sink_config since it can now handle Retry and Flush config settings
6. Adds metrics to the old pubsub as well, for the purpose of comparing the two versions

At default settings, this resulted in a peak throughput of 90k messages per second on a single node at 27.6% CPU usage, putting it at a similar level to Kafka.

Running pubsub v2 across all of TPCC (nodes ran out of ranges at different speeds):
<img width="637" alt="Screenshot 2023-03-30 at 3 38 25 PM" src="https://user-images.githubusercontent.com/6236424/229863386-edaee27d-9762-4806-bab6-e18b8a6169d6.png">

Running pubsub v1 (barely visible, 2k messages per second) followed by v2 on tpcc.order_line (in v2 only 2 nodes ended up having ranges assigned to them):
<img width="642" alt="Screenshot 2023-04-04 at 12 53 45 PM" src="https://user-images.githubusercontent.com/6236424/229863507-1883ea45-d8ce-437b-9b9c-550afec68752.png">

In the following graphs from the cloud console, where v1 was run followed by v2, you can see that the main reason v1 was slow was that it wasn't able to batch different keys together.
<img width="574" alt="Screenshot 2023-04-04 at 12 59 51 PM" src="https://user-images.githubusercontent.com/6236424/229864083-758c0814-d53c-447e-84c3-471cf5d56c44.png">

Publish requests remained the same despite way more messages in v2
<img width="1150" alt="Screenshot 2023-04-04 at 1 46 51 PM" src="https://user-images.githubusercontent.com/6236424/229875314-6e07177e-62c4-4c15-b13f-f75e8143e011.png">



Release note (performance improvement): pubsub sink changefeeds can now support higher throughputs by enabling the changefeed.new_pubsub_sink_enabled cluster setting.
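For example (a minimal sketch; the setting name comes from the release note above):

```sql
SET CLUSTER SETTING changefeed.new_pubsub_sink_enabled = true;
```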

100620: pkg/server: move DataDistribution to systemAdminServer r=dhartunian a=abarganier

The DataDistribution endpoint reports replica counts by database and table. When it was built, it operated on the assumption that a range would only ever contain a single table's data.

Now that we have coalesced ranges, a single range can span multiple tables. Unfortunately, the DataDistribution endpoint does not take this fact into account, meaning it reports garbled and inaccurate data, unless the `spanconfig.storage_coalesce_adjacent.enabled` setting is set to false (see #98820).

For secondary tenants, ranges are *always* coalesced, so this endpoint in its current state could never report meaningful data for a tenant.

Given all of this, we have decided to make this endpoint only available for the system tenant. This patch
accomplishes this by moving the endpoint away from the adminServer and into the systemAdminServer, making it effectively unimplemented for secondary tenants.

Release note: none

Informs: #97942

Co-authored-by: Jane Xing <[email protected]>
Co-authored-by: Alex Barganier <[email protected]>
Co-authored-by: Shiranka Miskin <[email protected]>
abarganier commented
Marking as closed - see #100620 which disables this endpoint for secondary tenants.

Context here: #97942 (comment)

knz commented May 8, 2023

The feature will become broken for the system tenant as well in v23.2, because we will enable range coalescing by default.

I am reopening this issue now to decide what to do with this. We have only 2 options:

  • either remove the feature entirely
  • or, make it work with range coalescing

The current approach (ref), which keeps the feature but informs the user in DB Console that "it is broken for coalesced ranges", is simply not acceptable. The feature will break in 23.2 for all clusters if we don't take action.

knz reopened this May 8, 2023
knz changed the title from "server: implement /data-distribution for tenants" to "server: make /data-distribution compatible with range coalescing" May 8, 2023
shubhamdhama self-assigned this Aug 6, 2024
shubhamdhama commented Aug 6, 2024

So, if I understand correctly, the data_distribution endpoint returns the count of range replicas per node for each table (along with some zone config information, which is not broken).

The problem seems to be here:

cockroach/pkg/server/admin.go

Lines 3039 to 3094 in d527c4c

```go
// Get replica counts.
if err := s.db.Txn(ctx, func(txnCtx context.Context, txn *kv.Txn) error {
	acct := s.memMonitor.MakeBoundAccount()
	defer acct.Close(txnCtx)
	kvs, err := kvclient.ScanMetaKVs(ctx, txn, roachpb.Span{
		Key:    keys.SystemSQLCodec.TablePrefix(keys.MaxReservedDescID + 1),
		EndKey: keys.MaxKey,
	})
	if err != nil {
		return err
	}
	// Group replicas by table and node, accumulate counts.
	var rangeDesc roachpb.RangeDescriptor
	for _, kv := range kvs {
		if err := acct.Grow(txnCtx, int64(len(kv.Key)+len(kv.Value.RawBytes))); err != nil {
			return err
		}
		if err := kv.ValueProto(&rangeDesc); err != nil {
			return err
		}
		// TODO(embrown): Tables can use one codec since they
		// seem to all share the same id.
		_, tenID, err := keys.DecodeTenantPrefix(rangeDesc.StartKey.AsRawKey())
		if err != nil {
			return err
		}
		// A range descriptor for a secondary tenant may not contain
		// a table prefix. Often, the start key for a tenant will be just
		// the tenant prefix itself, e.g. `/Tenant/2`. Once the tenant prefix
		// is stripped inside `DecodeTablePrefix`, nothing (aka `/Min`) is left.
		keySansPrefix, _ := keys.MakeSQLCodec(tenID).StripTenantPrefix(rangeDesc.StartKey.AsRawKey())
		if keys.MinKey.Equal(keySansPrefix) {
			// There's no table prefix to be decoded.
			// Try the next descriptor.
			continue
		}
		_, tableID, err := keys.MakeSQLCodec(tenID).DecodeTablePrefix(rangeDesc.StartKey.AsRawKey())
		if err != nil {
			return err
		}
		for _, replicaDesc := range rangeDesc.Replicas().Descriptors() {
			tableInfo, found := tableInfosByTableID[tableID]
			if !found {
				// This is a database, skip.
				continue
			}
			tableInfo.ReplicaCountByNodeId[replicaDesc.NodeID]++
		}
	}
	return nil
}); err != nil {
	return nil, err
}
```

We retrieve all the meta KVs and use the start key to determine the table ID. However, for coalesced ranges, we can have data from multiple tables within the same range.
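To make the problem concrete, here is a hedged sketch of a diagnostic query (it borrows the `crdb_internal.ranges_no_leases` / `crdb_internal.table_spans` overlap condition from the query further below, and is not part of the endpoint) that lists ranges spanning more than one table — exactly the case that start-key attribution cannot handle:

```sql
-- Ranges whose span overlaps more than one table span: for these,
-- attributing the whole range to the table at its start key is wrong.
SELECT r.range_id, count(*) AS overlapping_tables
FROM crdb_internal.ranges_no_leases r
JOIN "".crdb_internal.table_spans s
    ON s.start_key < r.end_key AND s.end_key > r.start_key
GROUP BY r.range_id
HAVING count(*) > 1;
```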

Initially, it seemed like the work would be easy, since I could have simply used SHOW CLUSTER RANGES. But that command doesn't show details for a deleted table (one that hasn't been garbage collected yet), which this endpoint currently supports for uncoalesced ranges.

So I explored the queries that `SHOW CLUSTER ...` uses underneath:

```sql
WITH ranges AS (
    SELECT
        r.range_id,
        r.start_key,
        r.end_key,
        r.replicas,
        r.replica_localities,
        r.voting_replicas,
        r.non_voting_replicas,
        r.learner_replicas,
        r.split_enforced_until,
        s.descriptor_id AS table_id,
        s.start_key AS object_start_key,
        s.end_key AS object_end_key,
        IF (s.start_key < r.start_key,
            r.start_key,
            s.start_key) AS per_range_start_key,
        IF (s.end_key < r.end_key,
            s.end_key,
            r.end_key) AS per_range_end_key
    FROM
        crdb_internal.ranges_no_leases r
        LEFT OUTER JOIN "".crdb_internal.table_spans s ON s.start_key < r.end_key AND s.end_key > r.start_key
),
named_ranges AS (
    SELECT
        r.*,
        database_name,
        t.schema_name,
        t.name AS table_name
    FROM
        ranges r
        LEFT OUTER JOIN "".crdb_internal.tables t ON r.table_id = t.table_id
),
intermediate AS (
    SELECT
        r.*
    FROM
        named_ranges r
)
SELECT
    crdb_internal.pretty_key(r.start_key, -1) AS start_key,
    crdb_internal.pretty_key(r.end_key, -1) AS end_key,
    range_id,
    database_name,
    schema_name,
    table_name,
    table_id,
    crdb_internal.pretty_key(per_range_start_key, -1) AS table_start_key,
    crdb_internal.pretty_key(per_range_end_key, -1) AS table_end_key,
    replicas,
    replica_localities,
    voting_replicas,
    non_voting_replicas,
    learner_replicas,
    split_enforced_until
FROM
    intermediate r
ORDER BY
    r.start_key,
    r.per_range_start_key
```

While `SELECT * FROM "".crdb_internal.tables` contains such deleted tables, `SELECT * FROM "".crdb_internal.table_spans` does not, which is a bit weird because if you add a `... WHERE descriptor_id` clause it does show that particular deleted table.
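To illustrate (a sketch; `104` is a hypothetical descriptor ID standing in for a dropped-but-not-yet-GC'ed table):

```sql
-- The unfiltered scan omits the dropped table's span...
SELECT * FROM "".crdb_internal.table_spans;

-- ...but filtering on the (hypothetical) descriptor ID of the
-- dropped table does return its span.
SELECT * FROM "".crdb_internal.table_spans WHERE descriptor_id = 104;
```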

Now, I wonder if there is a simpler way at the KV layer to achieve our goal instead of using SQL queries. If not, I can think of creating a new version of, or modifying, `crdb_internal.table_spans` so that I can use it in a JOIN.

shubhamdhama added a commit to shubhamdhama/cockroach that referenced this issue Aug 10, 2024
Background:

I wanted to include deleted tables in `crdb_internal.table_spans` and add a
new column `is_deleted`. However, to achieve this, I needed to add a
boolean to `forEachTableDesc` and all the invoked functions so that
`forEachTableDescWithTableLookupInternalFromDescriptors` doesn't skip them.
The argument count for these functions was already excessive, so I made the
following changes to improve readability and maintainability.

Details:

Previously, there were several overlapping variants of these functions:
- forEachTableDesc
- forEachTableDescAll
- forEachTableDescWithTableLookup
- forEachTableDescAllWithTableLookup
- forEachTableDescWithTableLookupInternal

This added excessive nesting and reduced readability. In many cases,
functions with `TableLookup` in their name were used even though they no
longer utilized `tableLookupFn`.

In these proposed changes:
- I aim to improve readability by merging these functions.
- To avoid repeatedly declaring the same definition, I moved all arguments
  to a new struct. This way, we only define what we need.

```diff
- func(ctx context.Context, db catalog.DatabaseDescriptor,
- 		 sc catalog.SchemaDescriptor, table catalog.TableDescriptor) error {
+ func(ctx context.Context, descCtx tableDescContext) error {
+     sc, table := descCtx.schema, descCtx.table
```

Now, we only have `forEachTableDesc` to iterate over table descriptors.

Informs: cockroachdb#128578, cockroachdb#97942
Epic: CRDB-12100
Release note: None
shubhamdhama added a commit to shubhamdhama/cockroach that referenced this issue Aug 29, 2024
(Same commit message as the commit above.)
craig bot pushed a commit that referenced this issue Aug 29, 2024
128755: sql: refactor to combine `forEachTableDesc*` functions. r=rafiss a=shubhamdhama

(Same description as the commit message above.)


Co-authored-by: Shubham Dhama <[email protected]>
shubhamdhama added a commit to shubhamdhama/cockroach that referenced this issue Sep 9, 2024
Previously, this endpoint scanned over meta KVs and used their startKey to
determine the table they belong to. However, this approach does not work
for coalesced ranges.

The fix, inspired by the `SHOW CLUSTER RANGES` query, uses the
`crdb_internal.table_spans` table and joins it with the
`crdb_internal.ranges_no_leases` table.

There is still some work left that can be addressed in a follow-up PR:

- We might want to add a summation of ranges at various levels because the
  sum of replicas of all tables in a database can be more than the actual
  number of ranges that belong to that database due to coalescing.

Overall, this PR also largely fixes this endpoint's behavior, which has been
broken since range coalescing was enabled.

Informs: cockroachdb#97942
Fixes: cockroachdb#109501, cockroachdb#106897
Epic: CRDB-12100
Release note: None
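A minimal sketch of what that join-based counting could look like (an assumed shape inferred from the commit message, not the exact query in the PR; it also assumes the `replicas` array can be unnested into per-node IDs, whereas the real endpoint may map store IDs to nodes):

```sql
-- Attribute each range to every table whose span overlaps it, then
-- count replicas per table and per node.
SELECT
    s.descriptor_id AS table_id,
    node_id,
    count(*) AS replica_count
FROM crdb_internal.ranges_no_leases r
JOIN "".crdb_internal.table_spans s
    ON s.start_key < r.end_key AND s.end_key > r.start_key,
    unnest(r.replicas) AS node_id
GROUP BY s.descriptor_id, node_id;
```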
craig bot pushed a commit that referenced this issue Sep 11, 2024
128578: server: enable `data_distribution` API for secondary tenants. r=dhartunian,stevendanna a=shubhamdhama

(Same description as the commit message above.)


Co-authored-by: Shubham Dhama <[email protected]>
shubhamdhama removed their assignment Sep 13, 2024
shubhamdhama commented
While this endpoint is now enabled for secondary tenants, there are still some follow-ups needed to make it user-friendly and compatible with range coalescing: #128578 (comment). I'm removing my assignment as discussed (#128578 (review)).
