-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kvserver,server,sql: provide efficient mechanism to retrieve data size information for span #84105
Comments
This new request or mechanism then should be utilized in database and tables details pages. Probably also the request should return approximate disk bytes like the existing |
I think this tool could also be used to more accurately display logical and physical table size in SHOW BACKUP. We currently estimate logical table size, see step three in #82274 (comment) |
cc @cockroachdb/bulk-io |
…tats, proposed in cockroachdb#84105. The goal is to provide accurate table/index stats since the assumption that 1 range can contain at most 1 table/index no longer holds. Part of: https://cockroachlabs.atlassian.net/browse/CRDB-22711 Release note: None
…tats, proposed in cockroachdb#84105. The goal is to provide accurate table/index stats since the assumption that 1 range can contain at most 1 table/index no longer holds. Part of: https://cockroachlabs.atlassian.net/browse/CRDB-22711 Release note: None add interface and abstract away access to KV rename accessor implementation
…tats, proposed in cockroachdb#84105. The goal is to provide accurate table/index stats since the assumption that 1 range can contain at most 1 table/index no longer holds. Part of: https://cockroachlabs.atlassian.net/browse/CRDB-22711 Release note: None add interface and abstract away access to KV rename accessor implementation dial nodes that have replicas fix tests remove newlines add Accessor implementation for secondary tenants revert ui changes
This commit provides a re-implementation of fetching the MVCC stats and range count for a given span. The previous implementation relied on the assumption that 1 range can contain at most 1 table/index. Since cockroachdbGH-79700, this assumption is no longer true. This re-implementation has three components. The first is spanstats.Accessor, which provides an interface for accessing span stats from KV. The second is spanstatsaccessor.LocalAccessor, which provides an implementation of the interface for callers that are co-located on a KV node. The third is an implementation by kvtenantccl.Connector, which provides an implementation of the interface for non co-located callers, like SQL pods. The interaction between these components is illustrated below: System Tenant +----------------------------------------------------+ | | | | | KV Node fan-out | | +----------+ | | | | | | +-------------------------------v----+ | | | | | | | | | serverpb.InternalSpanStatsServer +-----+ | | | | | | +----------------+-------------------+ | | | | | roachpb.InternalSpanStatsResponse | | | | | | | | v | | +----------------------------------+ | | | | | | | spanstatsaccessor.LocalAccessor | | | | | | | +-----------------+----------------+ | | | | | +---------------------+ | | roachpb.InternalSpanStatsResponse | | | | | | | v v | | +-------------------------+ +-----------+ | | | | | | | | | serverpb.StatusServer | | SQLServer | | | | | | | | | +------------+------------+ +-----------+ | | | | | | | +----------------------+-----------------------------+ | | serverpb.SpanStatsResponse | | | | | Secondary Tenant | +----------------------+------------------------------+ | | | | +------------v-------------+ | | | | | | | kvtenantccl.Connector | | | | | | | +------------+-------------+ | | | | | | | | +--------------------+ | | roachpb.InternalSpanStatsResponse| | | | | | | | | | | +-----------------v-------+ +------v------+ | | | | | | | | | serverpb.StatusServer | | SQLServer | | | | | | | | | +-------------------------+ +-------------+ | | | | | | | +-----------------------------------------------------+ Resolves cockroachdb#84105 Part of: https://cockroachlabs.atlassian.net/browse/CRDB-22711 Release note (backward-incompatible change): The SpanStatsRequest message field 'node_id' has changed from type 'string' to type 'int32'.
This commit provides a re-implementation of fetching the MVCC stats and range count for a given span. The previous implementation relied on the assumption that 1 range can contain at most 1 table/index. Since cockroachdbGH-79700, this assumption is no longer true. This re-implementation allows secondary tenants to access span stats via the serverpb.TenantStatusServer interface. If a roachpb.SpanStatsRequest has a node_id value of 0, instead of a specific node id, the response will yield cluster-wide values. To achieve this, the server does a fan-out to nodes that are known to have replicas for the span requested. The interaction between tenants is illustrated below: System Tenant +----------------------------------------------------+ | | | | | KV Node fan-out | | +----------+ | | | | | | +-------------------------------v----+ | | | | | | | | | server.systemStatusServer +-----+ | | | | | | +----------------+-------------------+ | | | | | | | | | via TenantStatusServer | | +--------------+ | | | | | | | | | | | v | | | +-----------+ | | | | | | | | | SQLServer | | | | | | | | | +-----------+ | | | | +----------------------v-----------------------------+ | | | | roachpb.SpanStatsResponse | | Secondary Tenant | +----------------------+------------------------------+ | | | | +------------v-------------+ | | | | | | | | kvtenantccl.Connector | | | | | | | +------------+-------------+ | | | | | | via TenantStatusServer | | +--------------------+ | | | | | | | | | | | | | | +-----------------v-------+ +------v------+ | | | | | | | | | server.statusServer | | SQLServer | | | | | | | | | +-------------------------+ +-------------+ | | | | | | | +-----------------------------------------------------+ Resolves cockroachdb#84105 Part of: https://cockroachlabs.atlassian.net/browse/CRDB-22711 Release note (backward-incompatible change): The SpanStatsRequest message field 'node_id' has changed from type 'string' to type 'int32'. refactor away from accessor interface remove print
This commit provides a re-implementation of fetching the MVCC stats and range count for a given span. The previous implementation relied on the assumption that 1 range can contain at most 1 table/index. Since cockroachdbGH-79700, this assumption is no longer true. This re-implementation allows secondary tenants to access span stats via the serverpb.TenantStatusServer interface. If a roachpb.SpanStatsRequest has a node_id value of 0, instead of a specific node id, the response will yield cluster-wide values. To achieve this, the server does a fan-out to nodes that are known to have replicas for the span requested. The interaction between tenants is illustrated below: System Tenant +----------------------------------------------------+ | | | | | KV Node fan-out | | +----------+ | | | | | | +-------------------------------v----+ | | | | | | | | | server.systemStatusServer +-----+ | | | | | | +----------------+-------------------+ | | | | | | | | | via TenantStatusServer | | +--------------+ | | | | | | | | | | | v | | | +-----------+ | | | | | | | | | SQLServer | | | | | | | | | +-----------+ | | | | +----------------------v-----------------------------+ | | | | roachpb.SpanStatsResponse | | Secondary Tenant | +----------------------+------------------------------+ | | | | +------------v-------------+ | | | | | | | | kvtenantccl.Connector | | | | | | | +------------+-------------+ | | | | | | via TenantStatusServer | | +--------------------+ | | | | | | | | | | | | | | +-----------------v-------+ +------v------+ | | | | | | | | | server.statusServer | | SQLServer | | | | | | | | | +-------------------------+ +-------------+ | | | | | | | +-----------------------------------------------------+ Resolves cockroachdb#84105 Part of: https://cockroachlabs.atlassian.net/browse/CRDB-22711 Release note (backward-incompatible change): The SpanStatsRequest message field 'node_id' has changed from type 'string' to type 'int32'. refactor away from accessor interface remove print
This commit provides a re-implementation of fetching the MVCC stats and range count for a given span. The previous implementation relied on the assumption that 1 range can contain at most 1 table/index. Since cockroachdbGH-79700, this assumption is no longer true. This re-implementation allows secondary tenants to access span stats via the serverpb.TenantStatusServer interface. If a roachpb.SpanStatsRequest has a node_id value of 0, instead of a specific node id, the response will yield cluster-wide values. To achieve this, the server does a fan-out to nodes that are known to have replicas for the span requested. The interaction between tenants is illustrated below: System Tenant +----------------------------------------------------+ | | | | | KV Node fan-out | | +----------+ | | | | | | +-------------------------------v----+ | | | | | | | | | server.systemStatusServer +-----+ | | | | | | +----------------+-------------------+ | | | | | | | | | via TenantStatusServer | | +--------------+ | | | | | | | | | | | v | | | +-----------+ | | | | | | | | | SQLServer | | | | | | | | | +-----------+ | | | | +----------------------v-----------------------------+ | | | | roachpb.SpanStatsResponse | | Secondary Tenant | +----------------------+------------------------------+ | | | | +------------v-------------+ | | | | | | | | kvtenantccl.Connector | | | | | | | +------------+-------------+ | | | | | | via TenantStatusServer | | +--------------------+ | | | | | | | | | | | | | | +-----------------v-------+ +------v------+ | | | | | | | | | server.statusServer | | SQLServer | | | | | | | | | +-------------------------+ +-------------+ | | | | | | | +-----------------------------------------------------+ Resolves cockroachdb#84105 Part of: https://cockroachlabs.atlassian.net/browse/CRDB-22711 Release note (backward-incompatible change): The SpanStatsRequest message field 'node_id' has changed from type 'string' to type 'int32'.
Fixes cockroachdb#81008. We built the basic infrastructure to coalesce ranges across table boundaries back in 22.2 as part of cockroachdb#66063. We've enabled this optimization for secondary tenants since then, but hadn't for the system tenant because of two primary blockers: - cockroachdb#93617: SHOW RANGES was broken by coalesced ranges. - cockroachdb#84105: APIs to compute sizes for schema objects (used in our UI) was broken by coalesced ranges. In both these cases we baked in assumptions about there being a minimum of one-{table,index,partition}-per-range. These blockers didn't apply to secondary tenants at the time since they didn't have access to SHOW RANGES, nor the UI pages where these schema statistics were displayed. We've addressed both these blockers in the 23.1 cycle as part of bridging the compatibility between secondary tenants and yesteryear's system tenant. - cockroachdb#93644 revised SHOW RANGES and crdb_internal.ranges{,_no_leases}, both internally and its external UX, to accommodate ranges straddling table/database boundaries. - cockroachdb#96223 re-worked our SpanStats API to work in the face of coalesced ranges, addressing cockroachdb#84105. Release note (general change): CockroachDB would previously use separate ranges for each table, index, or partition. This is no longer true -- it's possible now to have multiple tables, indexes, and partitions to get packed into the same range. For users with many such "schema objects", this will reduce the total range count in their clusters. This is especially true if individual tables, indexes, or partitions are smaller than the default configured maximum range size (controlled using zone configs, specifically the range_max_bytes parameter). We made this change to improve scalability with respect to the number of schema objects, since the underlying range count now no longer a bottleneck. Users upgrading from 22.2, once finalizing their upgrade, may observe a round of range merges and snapshot transfers (to power said range merges) as a result of this change. If users want to opt-out of this optimization, they can use the following cluster setting: SET CLUSTER SETTING spanconfig.storage_coalesce_adjacent.enabled = false;
Fixes cockroachdb#81008. We built the basic infrastructure to coalesce ranges across table boundaries back in 22.2 as part of cockroachdb#66063. We've enabled this optimization for secondary tenants since then, but hadn't for the system tenant because of two primary blockers: - cockroachdb#93617: SHOW RANGES was broken by coalesced ranges. - cockroachdb#84105: APIs to compute sizes for schema objects (used in our UI) was broken by coalesced ranges. In both these cases we baked in assumptions about there being a minimum of one-{table,index,partition}-per-range. These blockers didn't apply to secondary tenants at the time since they didn't have access to SHOW RANGES, nor the UI pages where these schema statistics were displayed. We've addressed both these blockers in the 23.1 cycle as part of bridging the compatibility between secondary tenants and yesteryear's system tenant. - cockroachdb#93644 revised SHOW RANGES and crdb_internal.ranges{,_no_leases}, both internally and its external UX, to accommodate ranges straddling table/database boundaries. - cockroachdb#96223 re-worked our SpanStats API to work in the face of coalesced ranges, addressing cockroachdb#84105. Release note (general change): CockroachDB would previously use separate ranges for each table, index, or partition. This is no longer true -- it's possible now to have multiple tables, indexes, and partitions to get packed into the same range. For users with many such "schema objects", this will reduce the total range count in their clusters. This is especially true if individual tables, indexes, or partitions are smaller than the default configured maximum range size (controlled using zone configs, specifically the range_max_bytes parameter). We made this change to improve scalability with respect to the number of schema objects, since the underlying range count now no longer a bottleneck. Users upgrading from 22.2, once finalizing their upgrade, may observe a round of range merges and snapshot transfers (to power said range merges) as a result of this change. If users want to opt-out of this optimization, they can use the following cluster setting: SET CLUSTER SETTING spanconfig.storage_coalesce_adjacent.enabled = false;
98820: spanconfig: enable range coalescing by default r=irfansharif a=irfansharif Fixes #81008. We built the basic infrastructure to coalesce ranges across table boundaries back in 22.2 as part of #66063. We've enabled this optimization for secondary tenants since then, but hadn't for the system tenant because of two primary blockers: - #93617: SHOW RANGES was broken by coalesced ranges. - #84105: APIs to compute sizes for schema objects (used in our UI) was broken by coalesced ranges. In both these cases we baked in assumptions about there being a minimum of one-{table,index,partition}-per-range. These blockers didn't apply to secondary tenants at the time since they didn't have access to SHOW RANGES, nor the UI pages where these schema statistics were displayed. We've addressed both these blockers in the 23.1 cycle as part of bridging the compatibility between secondary tenants and yesteryear's system tenant. - #93644 revised SHOW RANGES and crdb_internal.ranges{,_no_leases}, both internally and its external UX, to accommodate ranges straddling table/database boundaries. - #96223 re-worked our SpanStats API to work in the face of coalesced ranges, addressing #84105. Release note (general change): CockroachDB would previously use separate ranges for each table, index, or partition. This is no longer true -- it's possible now to have multiple tables, indexes, and partitions to get packed into the same range. For users with many such "schema objects", this will reduce the total range count in their clusters. This is especially true if individual tables, indexes, or partitions are smaller than the default configured maximum range size (controlled using zone configs, specifically the range_max_bytes parameter). We made this change to improve scalability with respect to the number of schema objects, since the underlying range count now no longer a bottleneck. Users upgrading from 22.2, once finalizing their upgrade, may observe a round of range merges and snapshot transfers (to power said range merges) as a result of this change. If users want to opt-out of this optimization, they can use the following cluster setting: ``` SET CLUSTER SETTING spanconfig.storage_coalesce_adjacent.enabled = false; ``` Co-authored-by: irfan sharif <[email protected]>
Jira: CRDB-17463
Epic: CRDB-22711
Is your feature request related to a problem? Please describe.
SQL observability desires access to fine-grained size information for tables, and, probably before too long even indexes and index partitions. This shows up in a few places. One is via
crdb_internal.ranges
where we request some data on a per range basis. Another is in the admin server inTableStats
andDatabaseStats
There are two primary mechanisms by which this information has been retrieved in the past, both of which have problems:serverpb.SpanStatsRequest
-- used to fetch table statsroachpb.RangeStatsRequest
-- used inside some of the virtual tablesAn issue with both of these requests is that neither provides a mechanism to compute MVCC statistics for a span which is smaller than a range; both provide only MVCC stats for all of the ranges which intersect the span in question. In multi-tenant cockroach, this poses a serious problem today because we pack tables into a range in that setting. We would like to pack tables together also in "regular" cockroach (#81008), which would break existing mechanisms. This will also get in the way of collecting information for indexes which are already collocated inside a span.
Another relevant piece of context is that in many cases which care about data size, we also care about placement in terms of regions of the replicas and of the leaseholder. The
RangeStatsResponse
carriesRangeInfo
which has the lease information and the descriptor.Up to this point, the above mentioned requests only look at in-memory data structures; there is no scanning of data required to produce responses. This is nice from an efficiency perspective, but seems untenable. We'll need to provide tools to compute sub-range MVCC statistics.
Describe the solution you'd like
As a first pass, keeping the client-API range-oriented does not seem a like a huge problem -- we do not have evidence that the number of RPCs causes big problems. It's also the case that if a span intersects a great many ranges, most of the ranges will be fully contained by the span -- so today's in-memory behavior continues to apply. A proposal, which may be a weak one, would be to introduce a new request which sits in-between
roachpb.RangeStatsRequest
andserverpb.SpanStatsRequest
:roachpb.SpanStatsRequest
. This request will leverageDistSender
parallelism to perform its work. When the relevant portion of the span does not fully cover the range, the actual MVCC stats for that span will be computed by scanning the data. This should only end up happening for up to two ranges. The response can contain placement and leaseholder information for each range which intersects. This information could be aggregated when merging the responses or presented on a per-range basis.Describe alternatives you've considered
One alternative would be to stick to a store-oriented API.
Additional context
Reducing RPCs
Ideally, we'd need to only send a number of RPCs proportional to the number of KV nodes containing data intersecting the span[s] in question. An open question is whether the O(ranges) RPCs is actively a problem given the ability of the distsender to parallelize and the fact that there likely needs to be independent locking (#34999). It is possible that we may want to allow for multiple spans in the same request, but that decision is out of scope for this initial issue statement. Likely we should push the problem of fairness and cost of parallelism into the distsender to coordinate with admission control.
Aggregating internally
One observation is that in many of the higher level use cases for these requests, we don't really care about individual ranges, we just care about spans and their aggregate statistics. This is particularly true in #84090. Indeed there may be value in obfuscating the individual ranges and nodes from the client.
The text was updated successfully, but these errors were encountered: