spanconfig: enable range coalescing by default #98820
Conversation
050822f to c58ff8c
Hm, looks like #97942 is another dependency. Our data distribution page in the advanced debug page is making the same minimum-one-table-per-range assumption here (+cc @dhartunian). Womp womp. (See line 3081 at b6e0f76.)
c58ff8c to 914d273
CI failure is #98555.
914d273 to b19d25c
98979: sql: backward-compatibility for SHOW RANGES / crdb_internal.ranges r=cucaroach,fqazi a=knz

Requested by `@mwang1026` and `@nvanbenschoten`. Fixes #97861. Fixes #99073. Needed for #98820. Epic: CRDB-24928

See the individual commits for details; excerpt from the 2nd commit:

TLDR: the pre-v23.1 behavior of SHOW RANGES and `crdb_internal.ranges{_no_leases}` is made available conditional on a new cluster setting `sql.show_ranges_deprecated_behavior.enabled`. It is set to true by default in v23.1; however, any use of the feature will prompt a SQL notice with the following message:

```
NOTICE: attention! the pre-23.1 behavior of SHOW RANGES and crdb_internal.ranges{,_no_leases} is deprecated!
HINT: Consider enabling the new functionality by setting 'sql.show_ranges_deprecated_behavior.enabled' to 'false'. The new SHOW RANGES statement has more options. Refer to the online documentation or execute 'SHOW RANGES ??' for details.
```

The deprecated behavior is also disabled in the test suite and by default in `cockroach demo`, i.e. the tests continue to exercise the new behavior.

Release note (backward-incompatible change): The pre-v23.1 output produced by `SHOW RANGES`, `crdb_internal.ranges`, and `crdb_internal.ranges_no_leases` is deprecated. The new interface and functionality for `SHOW RANGES` can be enabled by changing the cluster setting `sql.show_ranges_deprecated_behavior.enabled` to `false`. It will become the default in v23.2.

Co-authored-by: Raphael 'kena' Poss <[email protected]>
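To make the toggle concrete, here is a minimal sketch of opting into the new behavior; the setting name comes from the commit message above, while the database name is just an illustration:

```sql
-- While sql.show_ranges_deprecated_behavior.enabled is true (the v23.1
-- default), SHOW RANGES emits the deprecation NOTICE quoted above.
-- Opting into the new, coalescing-aware output:
SET CLUSTER SETTING sql.show_ranges_deprecated_behavior.enabled = false;

-- The new statement scopes ranges to a database or table.
SHOW RANGES FROM DATABASE defaultdb;
```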
99663: sql: update connExecutor logic for pausable portals r=ZhouXing19 a=ZhouXing19

This PR replaces #96358 and is part of the initial implementation of multiple active portals.

This PR adds limited support for multiple active portals. Portals satisfying all of the following restrictions can now be paused and resumed (i.e., with other queries interleaving them):

1. Not an internal query;
2. Read-only query;
3. No sub-queries or post-queries.

Such a portal will only have its statement executed with a _non-distributed_ plan. This feature is gated by a session variable `multiple_active_portals_enabled`. When it's set to `true`, all portals that satisfy the restrictions above automatically become "pausable" when created via the pgwire `Bind` stmt.

The core idea of this implementation is:

1. Add a `switchToAnotherPortal` status to the result-consumption state machine. When we receive an `ExecPortal` message for a different portal, we simply return control to the connExecutor. (#99052)
2. Persist `flow`, `queryID`, `span`, and `instrumentationHelper` for the portal, and reuse them when we re-execute the portal. This ensures we _continue_ the fetching rather than starting all over. (#99173)
3. To enable 2, we need to delay the clean-up of resources till we close the portal. For this we introduced stacks of cleanup functions. (This PR)

Note that we kept the implementation of the original "un-pausable" portal, as we'd like to limit this new functionality to a small set of statements. Eventually some of the old code (e.g. the limitedCommandResult's lifecycle) should be replaced with the new code. Also, we don't support distributed plans yet, as they involve much more complicated changes. See the `Start with an entirely local plan` section in the [design doc](https://docs.google.com/document/d/1SpKTrTqc4AlGWBqBNgmyXfTweUUsrlqIaSkmaXpznA8/edit). Support for this will come as a follow-up.

Epic: CRDB-17622

Release note (sql change): initial support for multiple active portals. With the session variable `multiple_active_portals_enabled` set to true, portals satisfying all of the following restrictions can be executed in an interleaving manner: 1. Not an internal query; 2. Read-only query; 3. No sub-queries or post-queries. Such a portal will only have its statement executed with an entirely local plan.

99947: ui: small fixes to DB Console charts shown for secondary tenants r=dhartunian a=abarganier

#97995 updated the DB Console to filter out KV-specific charts from the metrics page when viewing DB Console as a secondary application tenant. The PR missed a couple of small details. This patch cleans those up with the following:

- Removes KV latency charts for app tenants
- Adds a single storage graph for app tenants showing livebytes
- Removes the "Capacity" chart on the Overview dashboard for app tenants

Release note: none

Epic: https://cockroachlabs.atlassian.net/browse/CRDB-12100

NB: Please only review the final commit. The 1st commit is being reviewed separately @ #99860

100188: changefeedccl: pubsub sink refactor to batching sink r=rickystewart a=samiskin

Epic: https://cockroachlabs.atlassian.net/browse/CRDB-13237

This change is a followup to #99086 which moves the Pubsub sink to the batching sink framework. The changes involve:

1. Moving the Pubsub code to match the `SinkClient` interface, using the lower-level v1 pubsub API that lets us publish batches manually
2. Removing the extra call to json.Marshal
3. Moving to the `pstest` package for validating results in unit tests
4. Adding topic handling to the batching sink, where batches are created per-topic
5. Adding a pubsub_sink_config since it can now handle Retry and Flush config settings
6. Adding metrics to the old pubsub as well, for the purpose of comparing the two versions

At default settings, this resulted in a peak of 90k messages per second on a single node with throughput at 27.6% cpu usage, putting it at a similar level to kafka.

Running pubsub v2 across all of TPCC (nodes ran out of ranges at different speeds): (screenshot: https://user-images.githubusercontent.com/6236424/229863386-edaee27d-9762-4806-bab6-e18b8a6169d6.png)

Running pubsub v1 (barely visible, 2k messages per second) followed by v2 on tpcc.order_line (in v2 only 2 nodes ended up having ranges assigned to them): (screenshot: https://user-images.githubusercontent.com/6236424/229863507-1883ea45-d8ce-437b-9b9c-550afec68752.png)

In the following graphs from the cloud console, where v1 was run followed by v2, you can see that the main reason v1 was slow is that it wasn't able to batch different keys together: (screenshot: https://user-images.githubusercontent.com/6236424/229864083-758c0814-d53c-447e-84c3-471cf5d56c44.png)

Publish requests remained the same despite far more messages in v2: (screenshot: https://user-images.githubusercontent.com/6236424/229875314-6e07177e-62c4-4c15-b13f-f75e8143e011.png)

Release note (performance improvement): pubsub sink changefeeds can now support higher throughputs by enabling the changefeed.new_pubsub_sink_enabled cluster setting.

100620: pkg/server: move DataDistribution to systemAdminServer r=dhartunian a=abarganier

The DataDistribution endpoint reports replica counts by database and table. When it was built, it operated on the assumption that a range would only ever contain a single table's data. Now that we have coalesced ranges, a single range can span multiple tables. Unfortunately, the DataDistribution endpoint does not take this fact into account, meaning it reports garbled and inaccurate data unless the `spanconfig.storage_coalesce_adjacent.enabled` setting is set to false (see #98820).

For secondary tenants, ranges are *always* coalesced, so this endpoint in its current state could never report meaningful data for a tenant. Given all of this, we have decided to make this endpoint available only for the system tenant. This patch accomplishes that by moving the endpoint away from the adminServer and into the systemAdminServer, making it effectively unimplemented for secondary tenants.

Release note: none

Informs: #97942

Co-authored-by: Jane Xing <[email protected]>
Co-authored-by: Alex Barganier <[email protected]>
Co-authored-by: Shiranka Miskin <[email protected]>
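For reference, a hedged sketch of the user-facing switches named in the messages above; both setting names are quoted from the release notes, and the statements are illustrative rather than a prescribed workflow:

```sql
-- Session-level opt-in for pausable portals (99663); only read-only,
-- non-internal queries without sub-queries or post-queries qualify.
SET multiple_active_portals_enabled = true;

-- Cluster-level opt-in for the batching pubsub sink (100188).
SET CLUSTER SETTING changefeed.new_pubsub_sink_enabled = true;
```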
b19d25c to b2c550f
Discussed offline with David H. + Alex B., who are ok with just the following warning in the advanced debug page. @arulajmani, can I get a review?
b2c550f to 494b561
@@ -182,6 +182,15 @@ export class DataDistributionPage extends React.Component<
  <BackToAdvanceDebug history={this.props.history} />
  <section className="section">
    <h1 className="base-heading">Data Distribution</h1>
    <p style={{ maxWidth: 300, paddingTop: 10 }}>
Is there a tracking issue to fix the data distribution screen or remove it outright?
There was #97942, but the Obs-Inf folks closed it. I'll defer to them on whether they want something to fix this and/or to remove it outright.
Reviewed 10 of 11 files at r1, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner and @irfansharif)
pkg/spanconfig/spanconfigstore/span_store.go
line 159 at r1 (raw file):
if roachpb.Key(rem).Compare(systemTableUpperBound) < 0 || !(StorageCoalesceAdjacentSetting.Get(&s.settings.SV) && s.settings.Version.IsActive(ctx, clusterversion.V23_2_EnableRangeCoalescingForSystemTenant)) {
Seems like the range coalescing cluster setting would be a no-op in 23.1 <-> 23.2 mixed version clusters; is the thinking that this is okay because it was a hidden cluster setting previously?
Fixes cockroachdb#81008.

We built the basic infrastructure to coalesce ranges across table boundaries back in 22.2 as part of cockroachdb#66063. We've enabled this optimization for secondary tenants since then, but hadn't for the system tenant because of two primary blockers:

- cockroachdb#93617: SHOW RANGES was broken by coalesced ranges.
- cockroachdb#84105: APIs to compute sizes for schema objects (used in our UI) were broken by coalesced ranges.

In both these cases we baked in assumptions about there being a minimum of one {table,index,partition} per range. These blockers didn't apply to secondary tenants at the time since they didn't have access to SHOW RANGES, nor the UI pages where these schema statistics were displayed. We've addressed both blockers in the 23.1 cycle as part of bridging the compatibility between secondary tenants and yesteryear's system tenant:

- cockroachdb#93644 revised SHOW RANGES and crdb_internal.ranges{,_no_leases}, both internally and in their external UX, to accommodate ranges straddling table/database boundaries.
- cockroachdb#96223 re-worked our SpanStats API to work in the face of coalesced ranges, addressing cockroachdb#84105.

Release note (general change): CockroachDB would previously use separate ranges for each table, index, or partition. This is no longer true -- it's now possible for multiple tables, indexes, and partitions to get packed into the same range. For users with many such "schema objects", this will reduce the total range count in their clusters. This is especially true if individual tables, indexes, or partitions are smaller than the default configured maximum range size (controlled using zone configs, specifically the range_max_bytes parameter). We made this change to improve scalability with respect to the number of schema objects, since the underlying range count is no longer a bottleneck. Users upgrading from 22.2, once finalizing their upgrade, may observe a round of range merges and snapshot transfers (to power said range merges) as a result of this change. If users want to opt out of this optimization, they can use the following cluster setting:

SET CLUSTER SETTING spanconfig.storage_coalesce_adjacent.enabled = false;
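A small sketch of the opt-out path from the release note above; `defaultdb` is illustrative, and exact range counts depend on data volume and zone configs:

```sql
-- With coalescing enabled (the new default), adjacent small tables can
-- share a range, so this count may be lower than the table count.
SELECT count(*) FROM [SHOW RANGES FROM DATABASE defaultdb];

-- Opt out to restore splits at table/index/partition boundaries.
SET CLUSTER SETTING spanconfig.storage_coalesce_adjacent.enabled = false;
```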
494b561 to 7a57e7e
TFTR.
bors r+
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner, @arulajmani, and @knz)
pkg/spanconfig/spanconfigstore/span_store.go
line 36 at r1 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
Can you make this one public too then? Thanks!
Done.
pkg/spanconfig/spanconfigstore/span_store.go
line 159 at r1 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
Seems like the range coalescing cluster setting would be a no-op in 23.1 <-> 23.2 mixed version clusters; is the thinking that this is okay because it was a hidden cluster setting previously?
Yes.
Build succeeded.
101825: sql: add crdb_internal.pretty_value r=Xiang-Gu,msbutler a=stevendanna

This composes nicely with crdb_internal.scan to allow inspection of table data, complementing other command line tools like cockroach debug pebble.

Epic: none
Release note: None

102907: workload: remove initial prefix from bank workload payload r=herkolategan,srosenberg a=renatolabs

An `initial-` prefix is added to the payload column of the `bank` table when the workload is initialized. It was introduced about 6 years ago [1] and its purpose at the time is not clear. There are two main problems with it:

* the `initial-` prefix suggests the payload may be updated, but that actually never happens.
* as currently implemented, it assumes that the `payload-bytes` command line flag is at least `len([]byte("initial-"))`. Passing a lower value to that flag leads to a panic. This is an implicit assumption that should not exist.

This changes the row generation function so that `payload-bytes` bytes are randomly generated and inserted into the `payload` column, without the `initial-` prefix.

[1] d49d535

Epic: none
Release note: None

102961: sql: default sql.show_ranges_deprecated_behavior.enabled to true r=irfansharif a=irfansharif

We enabled range coalescing by default in 23.2 as part of #98820. Alongside that change, we want to flip the SHOW RANGES behavior to be compatible with coalesced ranges, which this commit does. See #93644 and the accompanying release notes.

Release note (backward-incompatible change): The pre-v23.1 output produced by SHOW RANGES, crdb_internal.ranges, and crdb_internal.ranges_no_leases was deprecated in 23.1 and is now replaced by default with output that's compatible with coalesced ranges (i.e. ranges that pack multiple tables/indexes/partitions into individual ranges). See the 23.1 release notes for SHOW RANGES for more details.

Co-authored-by: Steven Danna <[email protected]>
Co-authored-by: Renato Costa <[email protected]>
Co-authored-by: irfan sharif <[email protected]>
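As a rough illustration of the composition mentioned in #101825 -- the exact signatures of `crdb_internal.scan`, `crdb_internal.table_span`, and `crdb_internal.pretty_key` are assumptions here, not confirmed by this thread, and `t` is a hypothetical table:

```sql
-- Hypothetical usage: pretty-print raw KV pairs for a table's span.
-- (Assumes crdb_internal.table_span(table_id) returns a [start, end)
-- span and crdb_internal.scan(span) yields (key, value) byte pairs.)
SELECT crdb_internal.pretty_key(key, 0) AS k,
       crdb_internal.pretty_value(value) AS v
FROM crdb_internal.scan(crdb_internal.table_span('t'::regclass::oid::int))
LIMIT 10;
```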