server: make /data-distribution compatible with range coalescing #97942
97945: ui: disable invalid options for tenant advanced debug r=stevendanna,knz a=dhartunian

Previously, the Advanced Debug page on the DB Console showed many options to the tenant that were unsupported, unimplemented, or not valid in that context. This change hides those options for now to create a smoother Console experience for tenants. The changes are currently implemented via a server-controlled feature flag that is enabled by default on tenants, and not on the system tenant. Additionally, an alert banner is shown when viewing "Advanced Debug" as the system tenant, noting that the system tenant sees more options. The summary of disabled items is as follows:

* Problem ranges: hidden since the implementation depends on liveness. Issue to fix: #97941
* Data distribution: hidden due to an error on tenants. Issue to fix: #97942
* Stores section: not applicable to tenants
* Problem Ranges section: see above
* Raft and Key Visualizer links: not applicable to tenants
* Closed timestamps: not applicable to tenants
* Tracing active operations: UI implementation proxies to nodes, needs changes. Issue to fix: #97943
* Debug requests/events: not applicable to tenants
* Enqueue range: not applicable to tenants
* Node status: not implemented on tenants
* Single node's ranges: not implemented on tenants
* Hot Ranges (legacy): not implemented on tenants
* Single Node Specific: not implemented on tenants
* Cluster wide: not applicable to tenants
* Allocator: not applicable to tenants

Resolves: #80595
Epic: CRDB-12100
Release note: None

98167: sqlstats: flush stmt and txns in parallel r=xinhaoz a=xinhaoz

When we have a lot of in-memory data, the flush can take some time to complete. Why not flush statements and transactions in parallel, so that we at least get some data into both system tables while flushing, instead of waiting for the statement flush to complete before starting the transaction flush, which can delay data being shown in the app.

Epic: none
Release note: None

Co-authored-by: David Hartunian <[email protected]>
Co-authored-by: Xin Hao Zhang <[email protected]>
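As an illustration of the parallel-flush idea in 98167, here is a minimal, self-contained Go sketch; `flushStmtStats` and `flushTxnStats` are hypothetical stand-ins for the real sqlstats flush routines, not the actual CockroachDB functions:

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// flushStmtStats and flushTxnStats stand in for the real sqlstats
// flush routines; the names here are hypothetical.
func flushStmtStats(ctx context.Context) error { fmt.Println("stmt stats flushed"); return nil }
func flushTxnStats(ctx context.Context) error  { fmt.Println("txn stats flushed"); return nil }

// flushAll writes statement and transaction stats concurrently, so
// neither system table has to wait for the other flush to finish.
func flushAll(ctx context.Context) {
	var wg sync.WaitGroup
	wg.Add(2)
	go func() {
		defer wg.Done()
		if err := flushStmtStats(ctx); err != nil {
			fmt.Println("stmt flush error:", err)
		}
	}()
	go func() {
		defer wg.Done()
		if err := flushTxnStats(ctx); err != nil {
			fmt.Println("txn flush error:", err)
		}
	}()
	wg.Wait()
}

func main() { flushAll(context.Background()) }
```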
We'd have to teach the underlying API to deal with coalesced ranges (#98820), and reflect it in the visualization as well.
Some notes after discussing with @dhartunian:
98689: workload: jitter the teardown of connections to prevent thundering herd r=sean- a=sean-

workload: jitter the teardown of connections to prevent thundering herd

This change upgrades workload's use of pgx from v4 to v5 in order to allow jittering the teardown of connections. This change sets a max connection age of 5min and jitters the teardown by 30s. Upgrading to pgx v5 also adds non-blocking pgxpool connection acquisition.

workload: add flags to manage the age and lifecycle of connection pool

Add flags to all workload types to specify:

* the max connection age: `--max-conn-lifetime duration`
* the max connection age jitter: `--max-conn-lifetime-jitter duration`
* the max connection idle time: `--max-conn-idle-time duration`
* the connection health check interval: `--conn-healthcheck-period duration`
* the min number of connections in the pool: `--min-conns int`

workload: add support for remaining pgx query modes

Add support for `pgx.QueryExecModeCacheDescribe` and `pgx.QueryExecModeDescribeExec`. Previously, only three of the five query modes were available.

workload: fix race condition when recording histogram data

Release note (cli change): workload jitters teardown of connections to prevent thundering herd impacting P99 latency results.
Release note (cli change): workload utility now has flags to tune the connection pool used for testing. See `--conn-healthcheck-period`, `--min-conns`, and the `--max-conn-*` flags for details.
Release note (cli change): workload now supports every [PostgreSQL query mode](https://github.com/jackc/pgx/blob/fa5fbed497bc75acee05c1667a8760ce0d634cba/conn.go#L167-L182) available via the underlying pgx driver.

99142: server: fix `DataDistribution` server error when creating a tenant r=zachlite a=zachlite

This is a stop-gap commit that enables the DataDistribution endpoint to handle the parts of the key space belonging to secondary tenants without error. Despite no error, the result returned for secondary tenants is not correct. The DataDistribution endpoint was written before #79700, and therefore doesn't know that multiple tables can exist within a range. Additionally, the results for the system tenant will be incorrect soon because #81008 is in progress. Improvements are tracked by #97942.

Fixes: #97993
Release note: None

99494: changefeedccl: Do not require rangefeed when running initial scan only. r=miretskiy a=miretskiy

Fixes #99470
Release note: None

Co-authored-by: Sean Chittenden <[email protected]>
Co-authored-by: zachlite <[email protected]>
Co-authored-by: Yevgeniy Miretskiy <[email protected]>
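For reference, a minimal sketch of how those flags map onto pgx v5's pool configuration; the `pgxpool.Config` fields shown are real v5 settings, while the connection string and values are placeholders:

```go
package main

import (
	"context"
	"log"
	"time"

	"github.com/jackc/pgx/v5/pgxpool"
)

func main() {
	// Placeholder connection string; adjust for your cluster.
	cfg, err := pgxpool.ParseConfig("postgres://root@localhost:26257/defaultdb?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	// These mirror the new workload flags described above.
	cfg.MaxConnLifetime = 5 * time.Minute        // --max-conn-lifetime
	cfg.MaxConnLifetimeJitter = 30 * time.Second // --max-conn-lifetime-jitter
	cfg.MaxConnIdleTime = time.Minute            // --max-conn-idle-time
	cfg.HealthCheckPeriod = 30 * time.Second     // --conn-healthcheck-period
	cfg.MinConns = 4                             // --min-conns

	pool, err := pgxpool.NewWithConfig(context.Background(), cfg)
	if err != nil {
		log.Fatal(err)
	}
	defer pool.Close()
}
```

Jittering `MaxConnLifetime` spreads reconnects over a window instead of a single instant, which is what prevents a fleet of workers from tearing down and re-dialing in lockstep.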
Given that this endpoint doesn't take coalesced ranges into account, and tenants' ranges are always coalesced, it doesn't make sense to implement it for secondary tenants. Instead, we're going to make the endpoint system-tenant-only until KV updates it to take coalesced ranges into account (or nukes it entirely).
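A minimal sketch of what "system-tenant-only" can look like at the RPC layer, assuming a gRPC-style admin service; the request/response types and server struct below are simplified stand-ins, not CockroachDB's actual signatures:

```go
package main

import (
	"context"
	"fmt"

	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// Placeholder types standing in for the real
// serverpb.DataDistribution{Request,Response} protos.
type DataDistributionRequest struct{}
type DataDistributionResponse struct{}

type tenantAdminServer struct{}

// On a secondary tenant, the endpoint simply reports itself as
// unimplemented until it is taught about coalesced ranges.
func (s *tenantAdminServer) DataDistribution(
	ctx context.Context, _ *DataDistributionRequest,
) (*DataDistributionResponse, error) {
	return nil, status.Error(codes.Unimplemented,
		"DataDistribution is only supported on the system tenant")
}

func main() {
	s := &tenantAdminServer{}
	_, err := s.DataDistribution(context.Background(), &DataDistributionRequest{})
	fmt.Println(err) // rpc error: code = Unimplemented desc = ...
}
```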
99663: sql: update connExecutor logic for pausable portals r=ZhouXing19 a=ZhouXing19

This PR replaces #96358 and is part of the initial implementation of multiple active portals.

This PR adds limited support for multiple active portals. Now portals satisfying all of the following restrictions can be paused and resumed (i.e., with other queries interleaving them):

1. Not an internal query;
2. Read-only query;
3. No sub-queries or post-queries.

Such a portal will only have the statement executed with a _non-distributed_ plan. This feature is gated by a session variable `multiple_active_portals_enabled`. When it's set to `true`, all portals that satisfy the restrictions above automatically become "pausable" when created via the pgwire `Bind` stmt.

The core ideas of this implementation are:

1. Add a `switchToAnotherPortal` status to the result-consumption state machine. When we receive an `ExecPortal` message for a different portal, we simply return control to the connExecutor. (#99052)
2. Persist `flow`, `queryID`, `span`, and `instrumentationHelper` for the portal, and reuse them when we re-execute the portal. This ensures we _continue_ the fetching rather than starting all over. (#99173)
3. To enable 2, we need to delay the cleanup of resources until we close the portal. For this we introduced stacks of cleanup functions. (This PR)

Note that we kept the implementation of the original "un-pausable" portal, as we'd like to limit this new functionality to a small set of statements. Eventually some of that code (e.g. the limitedCommandResult's lifecycle) should be replaced with the new code. Also, we don't support distributed plans yet, as they involve much more complicated changes. See the `Start with an entirely local plan` section in the [design doc](https://docs.google.com/document/d/1SpKTrTqc4AlGWBqBNgmyXfTweUUsrlqIaSkmaXpznA8/edit). Support for this will come as a follow-up.

Epic: CRDB-17622

Release note (sql change): initial support for multiple active portals. Now with the session variable `multiple_active_portals_enabled` set to true, portals satisfying all of the following restrictions can be executed in an interleaving manner: 1. Not an internal query; 2. Read-only query; 3. No sub-queries or post-queries. Such a portal will only have the statement executed with an entirely local plan.

99947: ui: small fixes to DB Console charts shown for secondary tenants r=dhartunian a=abarganier

#97995 updated the DB Console to filter out KV-specific charts from the metrics page when viewing the DB Console as a secondary application tenant. The PR missed a couple of small details. This patch cleans those up with the following:

- Removes KV latency charts for app tenants
- Adds a single storage graph for app tenants showing livebytes
- Removes the "Capacity" chart on the Overview dashboard for app tenants

Release note: none
Epic: https://cockroachlabs.atlassian.net/browse/CRDB-12100

NB: Please only review the final commit. The 1st commit is being reviewed separately @ #99860

100188: changefeedccl: pubsub sink refactor to batching sink r=rickystewart a=samiskin

Epic: https://cockroachlabs.atlassian.net/browse/CRDB-13237

This change is a follow-up to #99086 which moves the Pubsub sink to the batching sink framework. The changes involve:

1. Moving the Pubsub code to match the `SinkClient` interface, using the lower-level v1 pubsub API that lets us publish batches manually
2. Removing the extra call to json.Marshal
3. Moving to the `pstest` package for validating results in unit tests
4. Adding topic handling to the batching sink, where batches are created per-topic
5. Adding a pubsub_sink_config since it can now handle Retry and Flush config settings
6. Adding metrics to the old pubsub as well, for the purpose of comparing the two versions

At default settings, this resulted in a peak of 90k messages per second on a single node with throughput at 27.6% cpu usage, putting it at a similar level to kafka.

Running pubsub v2 across all of TPCC (nodes ran out of ranges at different speeds):

![Screenshot 2023-03-30 at 3 38 25 PM](https://user-images.githubusercontent.com/6236424/229863386-edaee27d-9762-4806-bab6-e18b8a6169d6.png)

Running pubsub v1 (barely visible, 2k messages per second) followed by v2 on tpcc.order_line (in v2 only 2 nodes ended up having ranges assigned to them):

![Screenshot 2023-04-04 at 12 53 45 PM](https://user-images.githubusercontent.com/6236424/229863507-1883ea45-d8ce-437b-9b9c-550afec68752.png)

In the following graphs from the cloud console, where v1 was run followed by v2, you can see that the main reason v1 was slow is that it wasn't able to batch different keys together.

![Screenshot 2023-04-04 at 12 59 51 PM](https://user-images.githubusercontent.com/6236424/229864083-758c0814-d53c-447e-84c3-471cf5d56c44.png)

Publish requests remained the same despite way more messages in v2:

![Screenshot 2023-04-04 at 1 46 51 PM](https://user-images.githubusercontent.com/6236424/229875314-6e07177e-62c4-4c15-b13f-f75e8143e011.png)

Release note (performance improvement): pubsub sink changefeeds can now support higher throughputs by enabling the changefeed.new_pubsub_sink_enabled cluster setting.

100620: pkg/server: move DataDistribution to systemAdminServer r=dhartunian a=abarganier

The DataDistribution endpoint reports replica counts by database and table. When it was built, it operated off the assumption that a range would only ever contain a single table's data within. Now that we have coalesced ranges, a single range can span multiple tables. Unfortunately, the DataDistribution endpoint does not take this fact into account, meaning it reports garbled and inaccurate data, unless the `spanconfig.storage_coalesce_adjacent.enabled` setting is set to false (see #98820).

For secondary tenants, ranges are *always* coalesced, so this endpoint in its current state could never report meaningful data for a tenant. Given all of this, we have decided to make this endpoint available only for the system tenant. This patch accomplishes this by moving the endpoint away from the adminServer and into the systemAdminServer, making it effectively unimplemented for secondary tenants.

Release note: none
Informs: #97942

Co-authored-by: Jane Xing <[email protected]>
Co-authored-by: Alex Barganier <[email protected]>
Co-authored-by: Shiranka Miskin <[email protected]>
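To make the per-topic batching described in item 4 of the pubsub list above concrete, here is a small, self-contained sketch; the `batcher` type and the flush threshold are assumptions for illustration, not the actual `SinkClient` implementation:

```go
package main

import "fmt"

// batch accumulates messages destined for one topic; a stand-in for
// the real sink's per-topic buffer.
type batch struct {
	topic    string
	messages []string
}

// batcher groups incoming messages by topic and flushes a topic's
// batch once it reaches flushSize.
type batcher struct {
	flushSize int
	pending   map[string]*batch
}

func newBatcher(flushSize int) *batcher {
	return &batcher{flushSize: flushSize, pending: make(map[string]*batch)}
}

func (b *batcher) add(topic, msg string) {
	p, ok := b.pending[topic]
	if !ok {
		p = &batch{topic: topic}
		b.pending[topic] = p
	}
	p.messages = append(p.messages, msg)
	if len(p.messages) >= b.flushSize {
		b.flush(topic)
	}
}

func (b *batcher) flush(topic string) {
	p := b.pending[topic]
	if p == nil || len(p.messages) == 0 {
		return
	}
	// In the real sink this would be one publish RPC per batch.
	fmt.Printf("publishing %d messages to %s\n", len(p.messages), topic)
	delete(b.pending, topic)
}

func main() {
	b := newBatcher(2)
	b.add("orders", "m1")
	b.add("customers", "m2")
	b.add("orders", "m3") // triggers a flush of the "orders" batch
}
```

Batching per topic, rather than per key, is what lets different keys share a publish request, which the graphs above identify as the main reason v1 was slow.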
Marking as closed - see #100620 which disables this endpoint for secondary tenants. Context here: #97942 (comment)
The feature will become broken for the system tenant as well in v23.2, because we will enable range coalescing by default. I am reopening this issue now to decide what to do with it. We have only two options:
The current approach (ref), which keeps the feature but informs the user in DB Console that "it is broken for coalesced ranges", is simply not acceptable. The feature will break in 23.2 for all clusters if we don't take action.
So, if I understand correctly, the problem seems to be here: Lines 3039 to 3094 in d527c4c
We retrieve all the meta KVs and use the start key to determine the table ID. However, for coalesced ranges, we can have data from multiple tables within the same range. Initially, it seemed like the work would be easy, so I explored the queries that `SHOW CLUSTER RANGES` uses underneath.
Now, I wonder if there is a simpler way at the KV layer to achieve our goal instead of using SQL queries. If not, I can think of creating a new version of the endpoint or modifying the existing one.
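To make the failure mode concrete, here is a toy sketch of the start-key approach; the key encoding below is a simplified stand-in for CockroachDB's real (varint-based) key format:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// makeKey builds a toy key: a big-endian table ID followed by a row
// suffix. Real CockroachDB keys use varint-encoded prefixes, but the
// failure mode is the same.
func makeKey(tableID uint32, row string) []byte {
	k := make([]byte, 4)
	binary.BigEndian.PutUint32(k, tableID)
	return append(k, row...)
}

// tableIDFromStartKey mimics what the old endpoint did: derive the
// owning table from the range's start key alone.
func tableIDFromStartKey(startKey []byte) uint32 {
	return binary.BigEndian.Uint32(startKey[:4])
}

func main() {
	// A coalesced range spanning the end of table 52 and all of table 53.
	rangeStart := makeKey(52, "zzz")
	rangeEnd := makeKey(54, "")

	// The start key says "table 52", silently dropping table 53's data.
	fmt.Printf("range [%x, %x) attributed to table %d\n",
		rangeStart, rangeEnd, tableIDFromStartKey(rangeStart))
}
```

A coalesced range that begins in table 52 but also holds table 53's rows gets attributed entirely to table 52, which is exactly the garbled reporting described above.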
128755: sql: refactor to combine `forEachTableDesc*` functions. r=rafiss a=shubhamdhama

Background: I wanted to include deleted tables in `crdb_internal.table_spans` and add a new column `is_deleted`. However, to achieve this, I needed to add a boolean to `forEachTableDesc` and all the invoked functions so that `forEachTableDescWithTableLookupInternalFromDescriptors` doesn't skip them. The argument count for these functions was already excessive, so I made the following changes to improve readability and maintainability.

Details: Previously, there were five different overloads of these functions:

- forEachTableDesc
- forEachTableDescAll
- forEachTableDescWithTableLookup
- forEachTableDescAllWithTableLookup
- forEachTableDescWithTableLookupInternal

This added excessive nesting and reduced readability. In many cases, functions with `TableLookup` in the name were used even though they no longer utilized `tableLookupFn`. In these proposed changes:

- I aim to improve readability by merging these functions.
- To avoid repeatedly declaring the same definition, I moved all arguments to a new struct. This way, we only define what we need.

```diff
- func(ctx context.Context, db catalog.DatabaseDescriptor,
-   sc catalog.SchemaDescriptor, table catalog.TableDescriptor) error {
+ func(ctx context.Context, descCtx tableDescContext) error {
+   sc, table := descCtx.schema, descCtx.table
```

Now, we only have `forEachTableDesc` to iterate over table descriptors.

Informs: #128578, #97942
Epic: CRDB-12100
Release note: None

Co-authored-by: Shubham Dhama <[email protected]>
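Outside the CockroachDB codebase, the same options-struct pattern looks roughly like the sketch below; all types here are simplified stand-ins for `tableDescContext` and the catalog descriptors:

```go
package main

import (
	"context"
	"fmt"
)

// Simplified stand-ins for the catalog descriptor types.
type databaseDesc struct{ name string }
type tableDesc struct{ name string }

// tableDescContext bundles what used to be a long positional argument
// list into one struct, so callbacks read only the fields they need.
type tableDescContext struct {
	database *databaseDesc
	table    *tableDesc
}

// forEachTableDescOptions replaces the forEachTableDesc* overloads:
// behavior differences become fields instead of separate functions.
type forEachTableDescOptions struct {
	includeDropped bool
}

func forEachTableDesc(
	ctx context.Context,
	opts forEachTableDescOptions,
	fn func(ctx context.Context, descCtx tableDescContext) error,
) error {
	tables := []tableDescContext{
		{database: &databaseDesc{"defaultdb"}, table: &tableDesc{"users"}},
	}
	for _, tc := range tables {
		// In the real code, opts.includeDropped would control whether
		// dropped descriptors are visited.
		if err := fn(ctx, tc); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	_ = forEachTableDesc(context.Background(),
		forEachTableDescOptions{includeDropped: true},
		func(ctx context.Context, descCtx tableDescContext) error {
			fmt.Println(descCtx.database.name, descCtx.table.name)
			return nil
		})
}
```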
128578: server: enable `data_distribution` API for secondary tenants. r=dhartunian,stevendanna a=shubhamdhama

Previously, this endpoint scanned over meta KVs and used their startKey to determine the table they belong to. However, this approach does not work for coalesced ranges. The fix, inspired by the `SHOW CLUSTER RANGES` query, uses the `crdb_internal.table_spans` table and joins it with the `crdb_internal.ranges_no_leases` table. There is still some work left that can be addressed in a follow-up PR:

- We might want to add a summation of ranges at various levels, because the sum of replicas of all tables in a database can be more than the actual number of ranges that belong to that database, due to coalescing.

Overall, this PR also fixes, to a large extent, the behavior of this endpoint that has been broken since range coalescing was enabled.

Informs: #97942
Fixes: #109501, #106897
Epic: CRDB-12100
Release note: None

Co-authored-by: Shubham Dhama <[email protected]>
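A hedged sketch of the query shape this description suggests; the exact SQL in the PR may differ, and the overlap-join condition below is an assumption based on the span semantics of the two `crdb_internal` tables:

```go
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/jackc/pgx/v5"
)

// An assumed approximation of the join described above: attribute a
// range to every table whose span overlaps it, rather than only to
// the table owning the range's start key.
const dataDistributionQuery = `
SELECT t.descriptor_id AS table_id,
       count(DISTINCT r.range_id) AS ranges
FROM crdb_internal.table_spans AS t
JOIN crdb_internal.ranges_no_leases AS r
  ON t.start_key < r.end_key AND r.start_key < t.end_key
GROUP BY t.descriptor_id`

func main() {
	// Placeholder connection string.
	conn, err := pgx.Connect(context.Background(),
		"postgres://root@localhost:26257/defaultdb?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close(context.Background())

	rows, err := conn.Query(context.Background(), dataDistributionQuery)
	if err != nil {
		log.Fatal(err)
	}
	defer rows.Close()
	for rows.Next() {
		var tableID, ranges int64
		if err := rows.Scan(&tableID, &ranges); err != nil {
			log.Fatal(err)
		}
		fmt.Println(tableID, ranges)
	}
}
```

Note how, under this shape, a single coalesced range contributes a count to every table it overlaps, which is also why per-database sums can exceed the real range count, as the follow-up item above points out.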
While this endpoint is enabled for secondary tenants, there are still some follow-ups needed to make it user-friendly and compatible with range coalescing: #128578 (comment). I'm removing my assignment as discussed (#128578 (review)).
The current implementation returns an error.
Epic: CRDB-12100
Jira issue: CRDB-24980