Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kvserver,server,sql: provide efficient mechanism to retrieve data size information for span #84105

Closed
ajwerner opened this issue Jul 8, 2022 · 3 comments
Assignees
Labels
A-disaster-recovery A-kv-distribution Relating to rebalancing and leasing. A-kv-observability A-kv-transactions Relating to MVCC and the transactional model. A-multitenancy Related to multi-tenancy C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)

Comments

@ajwerner
Copy link
Contributor

ajwerner commented Jul 8, 2022

Jira: CRDB-17463
Epic: CRDB-22711

Is your feature request related to a problem? Please describe.

SQL observability desires access to fine-grained size information for tables, and, probably before too long even indexes and index partitions. This shows up in a few places. One is via crdb_internal.ranges where we request some data on a per range basis. Another is in the admin server in TableStats and DatabaseStatsThere are two primary mechanisms by which this information has been retrieved in the past, both of which have problems:

  1. serverpb.SpanStatsRequest -- used to fetch table stats
    1. This request is a per-node request which relatively efficiently aggregates the MVCC stats and approximate disk bytes for a given key span, if the number of ranges is large relative to the number of nodes. Conversely, the fact that the request gets sent to every node means that it might be more expensive if the number of nodes is large relative to the number of ranges.
    2. The approximate disk bytes out of pebble are particularly valuable because they are the only way to get a sense for physical byte usage, and they can be safely aggregated to produce a meaningful value. That being said, while I might think of the disk usage as a good thing to know, it's definitely more complicated to explain and reason about than logical bytes.
    3. One can sort of reason about aggregating the MVCC stats too, but would need to divide properly by the replication factor.
  2. roachpb.RangeStatsRequest -- used inside some of the virtual tables
    1. This request is a per-range request, that means it is likely to be more expensive if the number of nodes is small relative to the number of ranges.
    2. This request is generally sent sequentially, which means it can be very slow.
    3. This request is non-transactional and only needs to go to a local replica, so if there is a replica for every relevant range in the local region, it can be fast.

An issue with both of these requests is that neither provides a mechanism to compute MVCC statistics for a span which is smaller than a range; both provide only MVCC stats for all of the ranges which intersect the span in question. In multi-tenant cockroach, this poses a serious problem today because we pack tables into a range in that setting. We would like to pack tables together also in "regular" cockroach (#81008), which would break existing mechanisms. This will also get in the way of collecting information for indexes which are already collocated inside a span.

Another relevant piece of context is that in many cases which care about data size, we also care about placement in terms of regions of the replicas and of the leaseholder. The RangeStatsResponse carries RangeInfo which has the lease information and the descriptor.

Up to this point, the above mentioned requests only look at in-memory data structures; there is no scanning of data required to produce responses. This is nice from an efficiency perspective, but seems untenable. We'll need to provide tools to compute sub-range MVCC statistics.

Describe the solution you'd like

As a first pass, keeping the client-API range-oriented does not seem a like a huge problem -- we do not have evidence that the number of RPCs causes big problems. It's also the case that if a span intersects a great many ranges, most of the ranges will be fully contained by the span -- so today's in-memory behavior continues to apply. A proposal, which may be a weak one, would be to introduce a new request which sits in-between roachpb.RangeStatsRequest and serverpb.SpanStatsRequest: roachpb.SpanStatsRequest. This request will leverage DistSender parallelism to perform its work. When the relevant portion of the span does not fully cover the range, the actual MVCC stats for that span will be computed by scanning the data. This should only end up happening for up to two ranges. The response can contain placement and leaseholder information for each range which intersects. This information could be aggregated when merging the responses or presented on a per-range basis.

Describe alternatives you've considered

One alternative would be to stick to a store-oriented API.

Additional context

Reducing RPCs

Ideally, we'd need to only send a number of RPCs proportional to the number of KV nodes containing data intersecting the span[s] in question. An open question is whether the O(ranges) RPCs is actively a problem given the ability of the distsender to parallelize and the fact that there likely needs to be independent locking (#34999). It is possible that we may want to allow for multiple spans in the same request, but that decision is out of scope for this initial issue statement. Likely we should push the problem of fairness and cost of parallelism into the distsender to coordinate with admission control.

Aggregating internally

One observation is that in many of the higher level use cases for these requests, we don't really care about individual ranges, we just care about spans and their aggregate statistics. This is particularly true in #84090. Indeed there may be value in obfuscating the individual ranges and nodes from the client.

@ajwerner ajwerner added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Jul 8, 2022
@ajwerner
Copy link
Contributor Author

ajwerner commented Jul 8, 2022

This new request or mechanism then should be utilized in database and tables details pages. Probably also the request should return approximate disk bytes like the existing serverpb.SpanStatsRequest. Note that I'm deferring any efficiency considerations here in the case of large numbers of ranges. I'm hopeful that parallelism makes it good enough.

@msbutler
Copy link
Collaborator

msbutler commented Jul 21, 2022

I think this tool could also be used to more accurately display logical and physical table size in SHOW BACKUP. We currently estimate logical table size, see step three in #82274 (comment)

@blathers-crl
Copy link

blathers-crl bot commented Jul 21, 2022

cc @cockroachdb/bulk-io

@zachlite zachlite self-assigned this Jan 25, 2023
zachlite pushed a commit to zachlite/cockroach that referenced this issue Jan 30, 2023
…tats,

proposed in cockroachdb#84105. The goal is to provide accurate table/index stats since
the assumption that 1 range can contain at most 1 table/index no longer holds.

Part of: https://cockroachlabs.atlassian.net/browse/CRDB-22711
Release note: None
zachlite pushed a commit to zachlite/cockroach that referenced this issue Feb 1, 2023
…tats,

proposed in cockroachdb#84105. The goal is to provide accurate table/index stats since
the assumption that 1 range can contain at most 1 table/index no longer holds.

Part of: https://cockroachlabs.atlassian.net/browse/CRDB-22711
Release note: None

add interface and abstract away access to KV

rename accessor implementation
zachlite pushed a commit to zachlite/cockroach that referenced this issue Feb 7, 2023
…tats,

proposed in cockroachdb#84105. The goal is to provide accurate table/index stats since
the assumption that 1 range can contain at most 1 table/index no longer holds.

Part of: https://cockroachlabs.atlassian.net/browse/CRDB-22711
Release note: None

add interface and abstract away access to KV

rename accessor implementation

dial nodes that have replicas

fix tests

remove newlines

add Accessor implementation for secondary tenants

revert ui changes
zachlite pushed a commit to zachlite/cockroach that referenced this issue Feb 7, 2023
This commit provides a re-implementation of fetching the MVCC stats
and range count for a given span. The previous implementation relied on
the assumption that 1 range can contain at most 1 table/index. Since
cockroachdbGH-79700, this assumption is no longer true.

This re-implementation has three components. The first is spanstats.Accessor,
which provides an interface for accessing span stats from KV.

The second is spanstatsaccessor.LocalAccessor, which provides an implementation
of the interface for callers that are co-located on a KV node.

The third is an implementation by kvtenantccl.Connector, which provides an
implementation of the interface for non co-located callers, like SQL pods.

The interaction between these components is illustrated below:

 System Tenant
+----------------------------------------------------+
|                                                    |
|                                                    |
|                                    KV Node fan-out |
|                                     +----------+   |
|                                     |          |   |
|     +-------------------------------v----+     |   |
|     |                                    |     |   |
|     |  serverpb.InternalSpanStatsServer  +-----+   |
|     |                                    |         |
|     +----------------+-------------------+         |
|                      |                             |
|         roachpb.InternalSpanStatsResponse          |
|                      |                             |
|                      |                             |
|                      v                             |
|    +----------------------------------+            |
|    |                                  |            |
|    |  spanstatsaccessor.LocalAccessor |            |
|    |                                  |            |
|    +-----------------+----------------+            |
|                      |                             |
|                      +---------------------+       |
|         roachpb.InternalSpanStatsResponse  |       |
|                      |                     |       |
|                      v                     v       |
|         +-------------------------+  +-----------+ |
|         |                         |  |           | |
|         |  serverpb.StatusServer  |  | SQLServer | |
|         |                         |  |           | |
|         +------------+------------+  +-----------+ |
|                      |                             |
|                      |                             |
+----------------------+-----------------------------+
                       |
                       |  serverpb.SpanStatsResponse
                       |
                       |
                       |
                       |
                       |
 Secondary Tenant      |
+----------------------+------------------------------+
|                      |                              |
|         +------------v-------------+                |
|         |                          |                |
|         |   kvtenantccl.Connector  |                |
|         |                          |                |
|         +------------+-------------+                |
|                      |                              |
|                      |                              |
|                      +--------------------+         |
|          roachpb.InternalSpanStatsResponse|         |
|                      |                    |         |
|                      |                    |         |
|    +-----------------v-------+     +------v------+  |
|    |                         |     |             |  |
|    |  serverpb.StatusServer  |     |  SQLServer  |  |
|    |                         |     |             |  |
|    +-------------------------+     +-------------+  |
|                                                     |
|                                                     |
|                                                     |
+-----------------------------------------------------+

Resolves cockroachdb#84105
Part of: https://cockroachlabs.atlassian.net/browse/CRDB-22711
Release note (backward-incompatible change): The SpanStatsRequest message
field 'node_id' has changed from type 'string' to type 'int32'.
zachlite pushed a commit to zachlite/cockroach that referenced this issue Feb 8, 2023
This commit provides a re-implementation of fetching the MVCC stats
and range count for a given span. The previous implementation relied on
the assumption that 1 range can contain at most 1 table/index. Since
cockroachdbGH-79700, this assumption is no longer true.

This re-implementation allows secondary tenants to access span stats
via the serverpb.TenantStatusServer interface.

If a roachpb.SpanStatsRequest has a node_id value of 0, instead of a specific
node id, the response will yield cluster-wide values. To achieve this,
the server does a fan-out to nodes that are known to have replicas for the
span requested.

The interaction between tenants is illustrated below:

 System Tenant
+----------------------------------------------------+
|                                                    |
|                                                    |
|                                    KV Node fan-out |
|                                     +----------+   |
|                                     |          |   |
|     +-------------------------------v----+     |   |
|     |                                    |     |   |
|     |      server.systemStatusServer     +-----+   |
|     |                                    |         |
|     +----------------+-------------------+         |
|                      |                             |
|                      |                             |
|                      |     via TenantStatusServer  |
|                      +--------------+              |
|                      |              |              |
|                      |              |              |
|                      |              v              |
|                      |        +-----------+        |
|                      |        |           |        |
|                      |        | SQLServer |        |
|                      |        |           |        |
|                      |        +-----------+        |
|                      |                             |
+----------------------v-----------------------------+
                       |
                       |
                       |
                       |   roachpb.SpanStatsResponse
                       |
                       |
 Secondary Tenant      |
+----------------------+------------------------------+
|                      |                              |
|         +------------v-------------+                |
|         |            |             |                |
|         |   kvtenantccl.Connector  |                |
|         |                          |                |
|         +------------+-------------+                |
|                      |                              |
|                      |    via TenantStatusServer    |
|                      +--------------------+         |
|                      |                    |         |
|                      |                    |         |
|                      |                    |         |
|    +-----------------v-------+     +------v------+  |
|    |                         |     |             |  |
|    |    server.statusServer  |     |  SQLServer  |  |
|    |                         |     |             |  |
|    +-------------------------+     +-------------+  |
|                                                     |
|                                                     |
|                                                     |
+-----------------------------------------------------+

Resolves cockroachdb#84105
Part of: https://cockroachlabs.atlassian.net/browse/CRDB-22711
Release note (backward-incompatible change): The SpanStatsRequest message
field 'node_id' has changed from type 'string' to type 'int32'.

refactor away from accessor interface

remove print
zachlite pushed a commit to zachlite/cockroach that referenced this issue Feb 9, 2023
This commit provides a re-implementation of fetching the MVCC stats
and range count for a given span. The previous implementation relied on
the assumption that 1 range can contain at most 1 table/index. Since
cockroachdbGH-79700, this assumption is no longer true.

This re-implementation allows secondary tenants to access span stats
via the serverpb.TenantStatusServer interface.

If a roachpb.SpanStatsRequest has a node_id value of 0, instead of a specific
node id, the response will yield cluster-wide values. To achieve this,
the server does a fan-out to nodes that are known to have replicas for the
span requested.

The interaction between tenants is illustrated below:

 System Tenant
+----------------------------------------------------+
|                                                    |
|                                                    |
|                                    KV Node fan-out |
|                                     +----------+   |
|                                     |          |   |
|     +-------------------------------v----+     |   |
|     |                                    |     |   |
|     |      server.systemStatusServer     +-----+   |
|     |                                    |         |
|     +----------------+-------------------+         |
|                      |                             |
|                      |                             |
|                      |     via TenantStatusServer  |
|                      +--------------+              |
|                      |              |              |
|                      |              |              |
|                      |              v              |
|                      |        +-----------+        |
|                      |        |           |        |
|                      |        | SQLServer |        |
|                      |        |           |        |
|                      |        +-----------+        |
|                      |                             |
+----------------------v-----------------------------+
                       |
                       |
                       |
                       |   roachpb.SpanStatsResponse
                       |
                       |
 Secondary Tenant      |
+----------------------+------------------------------+
|                      |                              |
|         +------------v-------------+                |
|         |            |             |                |
|         |   kvtenantccl.Connector  |                |
|         |                          |                |
|         +------------+-------------+                |
|                      |                              |
|                      |    via TenantStatusServer    |
|                      +--------------------+         |
|                      |                    |         |
|                      |                    |         |
|                      |                    |         |
|    +-----------------v-------+     +------v------+  |
|    |                         |     |             |  |
|    |    server.statusServer  |     |  SQLServer  |  |
|    |                         |     |             |  |
|    +-------------------------+     +-------------+  |
|                                                     |
|                                                     |
|                                                     |
+-----------------------------------------------------+

Resolves cockroachdb#84105
Part of: https://cockroachlabs.atlassian.net/browse/CRDB-22711
Release note (backward-incompatible change): The SpanStatsRequest message
field 'node_id' has changed from type 'string' to type 'int32'.

refactor away from accessor interface

remove print
zachlite pushed a commit to zachlite/cockroach that referenced this issue Feb 13, 2023
This commit provides a re-implementation of fetching the MVCC stats
and range count for a given span. The previous implementation relied on
the assumption that 1 range can contain at most 1 table/index. Since
cockroachdbGH-79700, this assumption is no longer true.

This re-implementation allows secondary tenants to access span stats
via the serverpb.TenantStatusServer interface.

If a roachpb.SpanStatsRequest has a node_id value of 0, instead of a specific
node id, the response will yield cluster-wide values. To achieve this,
the server does a fan-out to nodes that are known to have replicas for the
span requested.

The interaction between tenants is illustrated below:

 System Tenant
+----------------------------------------------------+
|                                                    |
|                                                    |
|                                    KV Node fan-out |
|                                     +----------+   |
|                                     |          |   |
|     +-------------------------------v----+     |   |
|     |                                    |     |   |
|     |      server.systemStatusServer     +-----+   |
|     |                                    |         |
|     +----------------+-------------------+         |
|                      |                             |
|                      |                             |
|                      |     via TenantStatusServer  |
|                      +--------------+              |
|                      |              |              |
|                      |              |              |
|                      |              v              |
|                      |        +-----------+        |
|                      |        |           |        |
|                      |        | SQLServer |        |
|                      |        |           |        |
|                      |        +-----------+        |
|                      |                             |
+----------------------v-----------------------------+
                       |
                       |
                       |
                       |   roachpb.SpanStatsResponse
                       |
                       |
 Secondary Tenant      |
+----------------------+------------------------------+
|                      |                              |
|         +------------v-------------+                |
|         |            |             |                |
|         |   kvtenantccl.Connector  |                |
|         |                          |                |
|         +------------+-------------+                |
|                      |                              |
|                      |    via TenantStatusServer    |
|                      +--------------------+         |
|                      |                    |         |
|                      |                    |         |
|                      |                    |         |
|    +-----------------v-------+     +------v------+  |
|    |                         |     |             |  |
|    |    server.statusServer  |     |  SQLServer  |  |
|    |                         |     |             |  |
|    +-------------------------+     +-------------+  |
|                                                     |
|                                                     |
|                                                     |
+-----------------------------------------------------+

Resolves cockroachdb#84105
Part of: https://cockroachlabs.atlassian.net/browse/CRDB-22711
Release note (backward-incompatible change): The SpanStatsRequest message
field 'node_id' has changed from type 'string' to type 'int32'.
@craig craig bot closed this as completed in 4fee2a4 Feb 15, 2023
irfansharif added a commit to irfansharif/cockroach that referenced this issue May 7, 2023
Fixes cockroachdb#81008.

We built the basic infrastructure to coalesce ranges across table
boundaries back in 22.2 as part of cockroachdb#66063. We've enabled this
optimization for secondary tenants since then, but hadn't for the system
tenant because of two primary blockers:

- cockroachdb#93617: SHOW RANGES was broken by coalesced ranges.
- cockroachdb#84105: APIs to compute sizes for schema objects (used in our UI) was
  broken by coalesced ranges.

In both these cases we baked in assumptions about there being a minimum
of one-{table,index,partition}-per-range. These blockers didn't apply to
secondary tenants at the time since they didn't have access to SHOW
RANGES, nor the UI pages where these schema statistics were displayed.

We've addressed both these blockers in the 23.1 cycle as part of
bridging the compatibility between secondary tenants and yesteryear's
system tenant.
- cockroachdb#93644 revised SHOW RANGES and crdb_internal.ranges{,_no_leases}, both
  internally and its external UX, to accommodate ranges straddling
  table/database boundaries.
- cockroachdb#96223 re-worked our SpanStats API to work in the face of coalesced
  ranges, addressing cockroachdb#84105.

Release note (general change): CockroachDB would previously use separate
ranges for each table, index, or partition. This is no longer true --
it's possible now to have multiple tables, indexes, and partitions to
get packed into the same range. For users with many such "schema
objects", this will reduce the total range count in their clusters. This
is especially true if individual tables, indexes, or partitions are
smaller than the default configured maximum range size (controlled using
zone configs, specifically the range_max_bytes parameter).
We made this change to improve scalability with respect to the number of
schema objects, since the underlying range count now no longer a
bottleneck. Users upgrading from 22.2, once finalizing their upgrade,
may observe a round of range merges and snapshot transfers (to power
said range merges) as a result of this change. If users want to opt-out
of this optimization, they can use the following cluster setting:

  SET CLUSTER SETTING spanconfig.storage_coalesce_adjacent.enabled = false;
irfansharif added a commit to irfansharif/cockroach that referenced this issue May 8, 2023
Fixes cockroachdb#81008.

We built the basic infrastructure to coalesce ranges across table
boundaries back in 22.2 as part of cockroachdb#66063. We've enabled this
optimization for secondary tenants since then, but hadn't for the system
tenant because of two primary blockers:

- cockroachdb#93617: SHOW RANGES was broken by coalesced ranges.
- cockroachdb#84105: APIs to compute sizes for schema objects (used in our UI) was
  broken by coalesced ranges.

In both these cases we baked in assumptions about there being a minimum
of one-{table,index,partition}-per-range. These blockers didn't apply to
secondary tenants at the time since they didn't have access to SHOW
RANGES, nor the UI pages where these schema statistics were displayed.

We've addressed both these blockers in the 23.1 cycle as part of
bridging the compatibility between secondary tenants and yesteryear's
system tenant.
- cockroachdb#93644 revised SHOW RANGES and crdb_internal.ranges{,_no_leases}, both
  internally and its external UX, to accommodate ranges straddling
  table/database boundaries.
- cockroachdb#96223 re-worked our SpanStats API to work in the face of coalesced
  ranges, addressing cockroachdb#84105.

Release note (general change): CockroachDB would previously use separate
ranges for each table, index, or partition. This is no longer true --
it's possible now to have multiple tables, indexes, and partitions to
get packed into the same range. For users with many such "schema
objects", this will reduce the total range count in their clusters. This
is especially true if individual tables, indexes, or partitions are
smaller than the default configured maximum range size (controlled using
zone configs, specifically the range_max_bytes parameter).
We made this change to improve scalability with respect to the number of
schema objects, since the underlying range count now no longer a
bottleneck. Users upgrading from 22.2, once finalizing their upgrade,
may observe a round of range merges and snapshot transfers (to power
said range merges) as a result of this change. If users want to opt-out
of this optimization, they can use the following cluster setting:

  SET CLUSTER SETTING spanconfig.storage_coalesce_adjacent.enabled = false;
craig bot pushed a commit that referenced this issue May 8, 2023
98820: spanconfig: enable range coalescing by default r=irfansharif a=irfansharif

Fixes #81008.

We built the basic infrastructure to coalesce ranges across table boundaries back in 22.2 as part of #66063. We've enabled this optimization for secondary tenants since then, but hadn't for the system tenant because of two primary blockers:

- #93617: SHOW RANGES was broken by coalesced ranges.
- #84105: APIs to compute sizes for schema objects (used in our UI) was broken by coalesced ranges.

In both these cases we baked in assumptions about there being a minimum of one-{table,index,partition}-per-range. These blockers didn't apply to secondary tenants at the time since they didn't have access to SHOW RANGES, nor the UI pages where these schema statistics were displayed.

We've addressed both these blockers in the 23.1 cycle as part of bridging the compatibility between secondary tenants and yesteryear's system tenant.
- #93644 revised SHOW RANGES and crdb_internal.ranges{,_no_leases}, both internally and its external UX, to accommodate ranges straddling table/database boundaries.
- #96223 re-worked our SpanStats API to work in the face of coalesced ranges, addressing #84105.

Release note (general change): CockroachDB would previously use separate ranges for each table, index, or partition. This is no longer true -- it's possible now to have multiple tables, indexes, and partitions to get packed into the same range. For users with many such "schema objects", this will reduce the total range count in their clusters. This is especially true if individual tables, indexes, or partitions are smaller than the default configured maximum range size (controlled using zone configs, specifically the range_max_bytes parameter). We made this change to improve scalability with respect to the number of schema objects, since the underlying range count now no longer a bottleneck. Users upgrading from 22.2, once finalizing their upgrade, may observe a round of range merges and snapshot transfers (to power said range merges) as a result of this change. If users want to opt-out of this optimization, they can use the following cluster setting:
```
  SET CLUSTER SETTING spanconfig.storage_coalesce_adjacent.enabled = false; 
```

Co-authored-by: irfan sharif <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-disaster-recovery A-kv-distribution Relating to rebalancing and leasing. A-kv-observability A-kv-transactions Relating to MVCC and the transactional model. A-multitenancy Related to multi-tenancy C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
Projects
No open projects
Archived in project
Development

No branches or pull requests

4 participants