-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kvserver: improve the accounting of AddSSTableRequest
s' QPS
#73731
Comments
Following #72085, once all callers are migrated to use |
Can't AddSSTableRequests contain the number of keys the SST includes (and perhaps the total length of the keys and/or bytes in case we want to track something based on that later)? That doesn't seem difficult to achieve and then we can properly solve the problem. |
Actually, I think they already do most of the time: Notably, this is always provided during bulk ingestion: cockroach/pkg/kv/bulk/sst_batcher.go Lines 470 to 472 in 2318073
|
I was under the impression that we recorded reads (QPS) and writes (WPS) separately.
The proposal here seems to be to better include |
I did a proof of concept of the initial proposal in https://github.com/erikgrinaker/cockroach/tree/sst-qps, where both the number of requests in a batch and also the number of keys in an SST count towards QPS. Note that index backfills do not include MVCC stats by default because they do not hit this branch: cockroach/pkg/kv/bulk/sst_batcher.go Lines 208 to 210 in 2cce39c
This was a marked improvement, where load did get distributed across other nodes as well, although this took a fairly long time to converge (>30 minutes), and even after that the original three nodes were still running about twice as hot than the others. See this internal Slack thread for results and graphs. @aayushshah15 points out that this may also be due to fallout from #65379. Either way, this approach seems viable, but will need additional testing and tuning. For posterity, this was tested as follows:
|
73786: kvserver: document writeStats semantics r=nvanbenschoten a=tbg Because they're weird. We are summing up these two: https://github.com/cockroachdb/cockroach/blob/afb50d6227e78daa72ac8fa08222bc58a5767648/pkg/kv/kvserver/replica_application_state_machine.go#L929-L928 https://github.com/cockroachdb/cockroach/blob/afb50d6227e78daa72ac8fa08222bc58a5767648/pkg/kv/kvserver/replica_application_state_machine.go#L644-L646 Oddness: - the first and the second often measure the same thing, so we're double counting: a `Put(x)` will result in a mutation but may also result in a key being added, so we're recording that effect twice. - a `Put(x)` on an existing key will only be counted in `b.mutations` so it counts only once. - AddSSTable is only reflected in the second one, but the `KeyCount` could be an estimate (or so I believe; not 100% sure). - `b.mutations` also contains update to `RangeAppliedState`, so it's counting at least one mutation per batch. Overall, the writes per second have to be taken with the grain of salt that they will in practice over-count by a factor of anywhere between one and three, the extreme case being a `Put` that creates a new key, which will have a contribution of one `RangeAppliedState` write, one `KeyCount`, and the actual mutation in the batch. Touches #73731. Touches #42277. Release note: None Co-authored-by: Tobias Grieger <[email protected]>
73878: sql: introduce MVCC-compliant index backfiller r=ajwerner a=stevendanna Previously, the index backfilling process depended upon non-MVCC compliant AddSSTable calls which potentially rewrote previously read historical values. To support an MVCC-compliant AddSSTable that writes at the _current_ timestamp, this change implements a new backfilling process described in the following RFC: https://github.com/cockroachdb/cockroach/blob/master/docs/RFCS/20211004_incremental_index_backfiller.md In summary, the new index backfilling process depends on backfilling the new index when it is in a BACKFILLING state (added in #72281). In this state it receives no writes or deletes. Writes that occur during the backfilling process are captured by a "temporary index." This temporary index uses the DeletePreservingEncoding to ensure it captures deletes as well as writes. After the of bulk backfill using the MVCC-compliant AddSSTable, the index is moved into a MERGING state (added in #75663) in which it receives writes and deletes. Writes previously captured by the temporary index are then transactionally merged into the newly added index. This feature is currently behind a new boolean cluster setting which default to true. Schema changes that contains both old and new-style backfills are rejected. Reverting the default to false will require updating various tests since many tests depend on the exact index IDs of newly added indexes. Release note: None 76252: kvserver: account for AddSSTableRequests in QPS r=kvoli a=kvoli Previously, Queries-Per-Second (QPS) was calculated uniformly per `BatchRequest` as 1. This patch introduces variable QPS calculation for `AddSSTableRequest`, which use an order of magnitude more resources than other request types. This patch introduces the `kv.replica_stats.addsst_request_size_factor` cluster setting. This setting is used to attribute QPS to `AddSSTableRequest` sizes. The calculation is done as QPS = 1 + size(AddSSTableRequest) / factor. When `kv.replica_stats.addsst_request_size_factor` is less than 1, or no `AddSSTableRequest` exists within a `BatchRequest`, then QPS = 1; the current behavior today. resolves #73731 Release note (performance improvement): Introduced `kv.replica_stats.addsst_request_size_factor` cluster setting. This setting is used to tune Queries-Per-Second (QPS) sensitivity to large imports. By default, this setting is disabled. When enabled, the size of any AddSSTableRequest will contribute to QPS in inverse relation to this settings magnitude. By default this setting configured to a conservative 50,000; every 50 kilobytes will be accounted for as an additional 1 QPS. 76617: bazel: make `CrdbTestBuild` `const` r=RaduBerinde a=rickystewart This partially reverts #72838. That change was made to avoid thrashing the cache when swapping between test- and non-test configurations. Today we have `dev cache` which can retain build artifacts across the different configurations, so this doesn't really serve a purpose any more. Indeed, you can now swap between `bazel build pkg/cmd/cockroach-short` and `bazel build pkg/cmd/cockroach-short --config=test` very quickly with Bazel downloading old versions of the built libraries if the built-in cache gets wiped. We still filter out the `go:build` lines in `crdb_test_{off,on}.go` so we don't have to set `gotags` for test and non-test, which still saves a lot of time and unnecessary recompilation. We have a check for this in CI so no one should be able to add a build constraint without us knowing. Release note: None 76627: util/tracing: improve span recording interface r=andreimatei a=andreimatei Before this patch, the Span's recording interface was left over from a time when there were only one recording mode: verbose. We now have two modes: verbose recording and structured recording. They can be enabled at span creation time through the WithRecording(<type>) option. This patch changes the Span's SetVerbose() method to expose the two options. Release note: None 76667: roachtest/test: fix ORM testing due to removed cluster setting r=rafiss a=otan `sql.catalog.unsafe_skip_system_config_trigger.enabled` got removed recently and was part of an alpha. Let's clean it up in ORMs too. Resolves #76654 Resolves #76655 Resolves #76656 Resolves #76657 Resolves #76658 Resolves #76659 Resolves #76660 Resolves #76661 Resolves #76662 Resolves #76663 Resolves #76664 Resolves #76665 Resolves #76666 Release note: None Co-authored-by: Steven Danna <[email protected]> Co-authored-by: Rui Hu <[email protected]> Co-authored-by: Austen McClernon <[email protected]> Co-authored-by: Ricky Stewart <[email protected]> Co-authored-by: Andrei Matei <[email protected]> Co-authored-by: Oliver Tan <[email protected]>
Previously, Queries-Per-Second (QPS) was calculated uniformly per `BatchRequest` as 1. This patch introduces variable QPS calculation for `AddSSTableRequest`, which use an order of magnitude more resources than other request types. This patch introduces the `kv.replica_stats.addsst_request_size_factor` cluster setting. This setting is used to attribute QPS to `AddSSTableRequest` sizes. The calculation is done as QPS = 1 + size(AddSSTableRequest) / factor. When `kv.replica_stats.addsst_request_size_factor` is less than 1, or no `AddSSTableRequest` exists within a `BatchRequest`, then QPS = 1; the current behavior today. resolves cockroachdb#73731 Release note (performance improvement): Introduced `kv.replica_stats.addsst_request_size_factor` cluster setting. This setting is used to tune Queries-Per-Second (QPS) sensitivity to large imports. By default, this setting is disabled. When enabled, the size of any AddSSTableRequest will contribute to QPS in inverse relation to this settings magnitude. By default this setting configured to a conservative 50,000; every 50 kilobytes will be accounted for as an additional 1 QPS.
Currently, our QPS accounting is a little broken because we only record the count of incoming
BatchRequest
s into a replica (not the actual count of requests in those batches). One of the consequences of this is thatAddSSTTableRequest
s only count as 1 QPS even though their actual load on the receiving hardware is orders of magnitude higher than that. This means that during an import, without any background traffic in the cluster, CRDB'sStoreRebalancer
(which is responsible for QPS based lease and replica rebalancing) doesn't even kick in (because it doesn't try resolving tiny QPS differences between stores). We've seen at least one support case where this all lead to severe load imbalance among the nodes in a customer's cluster.One idea by @nvanbenschoten was to consider
AddSSTableRequests
to be something like 1000 requests when recording them in the leaseholder replica'sleaseholderStats
.This is likely a small change but we'd want to run a few experiments to validate it (same as #50620).
cc. @cockroachdb/kv-notifications
The text was updated successfully, but these errors were encountered: