
kvserver: improve the accounting of AddSSTableRequests' QPS #73731

Closed
aayushshah15 opened this issue Dec 13, 2021 · 6 comments · Fixed by #76252
Assignees
Labels
A-kv-distribution Relating to rebalancing and leasing. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-kv KV Team

Comments

@aayushshah15 (Contributor)

Currently, our QPS accounting is somewhat broken because we only record the count of incoming BatchRequests into a replica, not the actual number of requests in those batches. One consequence is that AddSSTableRequests count as just 1 QPS even though their actual load on the receiving hardware is orders of magnitude higher. This means that during an import, without any background traffic in the cluster, CRDB's StoreRebalancer (which is responsible for QPS-based lease and replica rebalancing) doesn't even kick in, because it doesn't try to resolve tiny QPS differences between stores. We've seen at least one support case where this led to severe load imbalance among the nodes in a customer's cluster.

One idea by @nvanbenschoten was to consider AddSSTableRequests to be something like 1000 requests when recording them in the leaseholder replica's leaseholderStats.

This is likely a small change but we'd want to run a few experiments to validate it (same as #50620).
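A minimal sketch of the idea, assuming a hypothetical `batchQPS` helper and a made-up `addSSTRequestCost` constant (neither is CockroachDB's actual API; the real accounting lives in the leaseholder replica's stats):

```go
package main

import "fmt"

// addSSTRequestCost is the synthetic per-request cost the proposal
// would assign to an AddSSTableRequest (1000 is the suggested value;
// this constant and batchQPS are illustrative only).
const addSSTRequestCost = 1000

// batchQPS returns the QPS contribution of one BatchRequest. Today
// every batch counts as 1; under the proposal each AddSSTableRequest
// in the batch would instead count as addSSTRequestCost.
func batchQPS(numAddSSTRequests int) float64 {
	if numAddSSTRequests == 0 {
		return 1 // current behavior: one query per batch
	}
	return float64(numAddSSTRequests * addSSTRequestCost)
}

func main() {
	fmt.Println(batchQPS(0)) // ordinary batch contributes 1
	fmt.Println(batchQPS(1)) // a batch carrying one SST contributes 1000
}
```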

cc. @cockroachdb/kv-notifications

@aayushshah15 aayushshah15 added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-kv-distribution Relating to rebalancing and leasing. labels Dec 13, 2021
@erikgrinaker (Contributor)

Following #72085, once all callers are migrated to use WriteAtRequestTimestamp, we'll know during evaluation how many keys the SST contains, and could propagate this back into the stats. As a fallback, we may want a linear cost based on the SST byte size, since they can vary quite a bit.

@blathers-crl blathers-crl bot added the T-kv KV Team label Dec 13, 2021
@nvanbenschoten (Member)

@dt mentioned that he landed something related in #42277. Unfortunately, that PR updated the writeStats on each replica, but only the leaseholderStats is used to make load-based rebalancing decisions, so we wouldn't have expected that change to help here.

@tbg (Member) commented Dec 14, 2021

One idea by @nvanbenschoten was to consider AddSSTableRequests to be something like 1000 requests when recording them in the leaseholder replica's leaseholderStats.

Can't AddSSTableRequests contain the number of keys the SST includes (and perhaps the total length of the keys and/or bytes in case we want to track something based on that later)? That doesn't seem difficult to achieve and then we can properly solve the problem.

@erikgrinaker (Contributor) commented Dec 14, 2021

Can't AddSSTableRequests contain the number of keys the SST includes (and perhaps the total length of the keys and/or bytes in case we want to track something based on that later)? That doesn't seem difficult to achieve and then we can properly solve the problem.

Actually, I think they already do most of the time:
https://github.com/cockroachdb/cockroach/blob/0c8383cbc91da37846584e08eff5060e0e14cdfa/pkg/roachpb/api.proto#L1681-L1685

Notably, this is always provided during bulk ingestion:

err = db.AddSSTable(ctx, item.start, item.end, item.sstBytes, false, /* disallowConflicts */
!item.disallowShadowingBelow.IsEmpty(), item.disallowShadowingBelow, &item.stats,
ingestAsWriteBatch, batchTs, false /* writeAtBatchTs */)

@erikgrinaker (Contributor) commented Dec 17, 2021

AddSSTableRequests only count as 1 QPS

I was under the impression that we recorded reads (QPS) and writes (WPS) separately. AddSSTable requests are already accounted for in WPS:

b.r.writeStats.recordCount(float64(added), 0)

The proposal here seems to be to better include AddSSTable writes in QPS. Wouldn't it be better to change the store rebalancer to take WPS into account for rebalancing decisions too, separately from QPS, instead of including all writes in QPS? Nodes can generally handle much larger amounts of reads than writes, and so even if a replica is seeing a high write volume, it may not get split/rebalanced because it's drowned out by an even higher read volume -- so it seems to make sense to keep these signals separate.
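Keeping the two signals separate could look something like the following sketch (`loadStats`, `recordReads`, `recordWrites`, and `hot` are illustrative names, not the actual kvserver types):

```go
package main

import "fmt"

// loadStats tracks read and write load separately so a rebalancer can
// weigh them independently, rather than folding writes into QPS
// (illustrative sketch only; not the actual replica stats API).
type loadStats struct {
	qps float64 // queries (reads) per second
	wps float64 // writes per second
}

func (s *loadStats) recordReads(n float64)  { s.qps += n }
func (s *loadStats) recordWrites(n float64) { s.wps += n }

// hot reports whether either signal alone exceeds its threshold, so a
// write-heavy replica is not drowned out by read-heavy neighbors.
func (s *loadStats) hot(qpsMax, wpsMax float64) bool {
	return s.qps > qpsMax || s.wps > wpsMax
}

func main() {
	var s loadStats
	s.recordReads(100)
	s.recordWrites(5000) // e.g. keys ingested via AddSSTable
	fmt.Println(s.hot(1000, 1000)) // write-hot even though read-cold
}
```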

@erikgrinaker (Contributor)

I did a proof of concept of the initial proposal in https://github.com/erikgrinaker/cockroach/tree/sst-qps, where both the number of requests in a batch and the number of keys in an SST count towards QPS. Note that index backfills do not include MVCC stats by default because they do not hit this branch:

if !b.disallowShadowingBelow.IsEmpty() {
	b.updateMVCCStats(key, value)
}

This was a marked improvement: load did get distributed across the other nodes as well, although it took a fairly long time to converge (>30 minutes), and even after that the original three nodes were still running about twice as hot as the others. See this internal Slack thread for results and graphs. @aayushshah15 points out that this may also be due to fallout from #65379.

Either way, this approach seems viable, but will need additional testing and tuning. For posterity, this was tested as follows:

export GCE_PROJECT=cockroach-ephemeral
cluster=grinaker-index
roachprod create ${cluster} --nodes=10 --gce-machine-type=n2-standard-4 --gce-zones=europe-west1-c --local-ssd=false --gce-pd-volume-size=500 --lifetime=168h
roachprod stage ${cluster} cockroach
roachprod start ${cluster} --racks=10

CREATE TABLE trade (t_id INT8 NOT NULL PRIMARY KEY, t_dts TIMESTAMP NOT NULL, t_st_id VARCHAR(4) NOT NULL, t_tt_id VARCHAR(3) NOT NULL, t_is_cash BOOL NOT NULL, t_s_symb VARCHAR(15) NOT NULL, t_qty INT4 NOT NULL CHECK (t_qty > 0), t_bid_price DECIMAL(8,2) NOT NULL CHECK (t_bid_price > 0), t_ca_id INT8 NOT NULL, t_exec_name VARCHAR(49) NOT NULL, t_trade_price DECIMAL(8,2), t_chrg DECIMAL(10,2) NOT NULL CHECK (t_chrg >= 0), t_comm DECIMAL(10,2) NOT NULL CHECK (t_comm >= 0), t_tax DECIMAL(10,2) NOT NULL CHECK (t_tax  >= 0), t_lifo BOOL NOT NULL, FAMILY (t_id), FAMILY (t_comm, t_dts, t_st_id, t_trade_price, t_exec_name, t_tt_id, t_is_cash, t_s_symb, t_qty, t_bid_price, t_ca_id, t_chrg, t_lifo), FAMILY (t_tax));
IMPORT INTO trade CSV DATA ('gs://cockroach-fixtures/tpce-csv/customers=500000/*/Trade.txt?AUTH=implicit') WITH delimiter = '|', DETACHED;
CREATE INDEX ON trade (t_ca_id, t_dts DESC) STORING (t_st_id, t_tt_id, t_is_cash, t_s_symb, t_qty, t_bid_price, t_exec_name, t_trade_price, t_chrg);

craig bot pushed a commit that referenced this issue Dec 20, 2021
73786: kvserver: document writeStats semantics r=nvanbenschoten a=tbg

Because they're weird. We are summing up these two:

https://github.com/cockroachdb/cockroach/blob/afb50d6227e78daa72ac8fa08222bc58a5767648/pkg/kv/kvserver/replica_application_state_machine.go#L929-L928

https://github.com/cockroachdb/cockroach/blob/afb50d6227e78daa72ac8fa08222bc58a5767648/pkg/kv/kvserver/replica_application_state_machine.go#L644-L646

Oddness:

- the first and the second often measure the same thing, so we're double
  counting: a `Put(x)` will result in a mutation but may also result in
  a key being added, so we're recording that effect twice.
- a `Put(x)` on an existing key will only be counted in `b.mutations` so
  it counts only once.
- AddSSTable is only reflected in the second one, but the `KeyCount`
  could be an estimate (or so I believe; not 100% sure).
- `b.mutations` also contains updates to `RangeAppliedState`, so it's
  counting at least one mutation per batch.

Overall, the writes per second have to be taken with a grain of salt:
in practice they will over-count by a factor of anywhere between one
and three, the extreme case being a `Put` that creates a new key,
which will have a contribution of one `RangeAppliedState` write, one
`KeyCount`, and the actual mutation in the batch.

Touches #73731.
Touches #42277.

Release note: None


Co-authored-by: Tobias Grieger <[email protected]>
@kvoli kvoli self-assigned this Feb 1, 2022
craig bot pushed a commit that referenced this issue Feb 16, 2022
73878: sql: introduce MVCC-compliant index backfiller r=ajwerner a=stevendanna

Previously, the index backfilling process depended upon non-MVCC
compliant AddSSTable calls which potentially rewrote previously read
historical values.

To support an MVCC-compliant AddSSTable that writes at the _current_
timestamp, this change implements a new backfilling process described
in the following RFC:

https://github.com/cockroachdb/cockroach/blob/master/docs/RFCS/20211004_incremental_index_backfiller.md

In summary, the new index backfilling process depends on backfilling
the new index when it is in a BACKFILLING state (added in #72281). In
this state it receives no writes or deletes. Writes that occur during
the backfilling process are captured by a "temporary index."  This
temporary index uses the DeletePreservingEncoding to ensure it
captures deletes as well as writes.

After the bulk backfill using the MVCC-compliant AddSSTable, the
index is moved into a MERGING state
(added in #75663) in which it receives writes and deletes. Writes
previously captured by the temporary index are then transactionally
merged into the newly added index.

This feature is currently behind a new boolean cluster setting which
defaults to true. Schema changes that contain both old- and new-style
backfills are rejected.

Reverting the default to false will require updating various tests
since many tests depend on the exact index IDs of newly added indexes.

Release note: None

76252: kvserver: account for AddSSTableRequests in QPS r=kvoli a=kvoli

Previously, Queries-Per-Second (QPS) was calculated uniformly per
`BatchRequest` as 1. This patch introduces variable QPS calculation
for `AddSSTableRequest`, which uses an order of magnitude more
resources than other request types.

This patch introduces the
`kv.replica_stats.addsst_request_size_factor` cluster setting. This
setting is used to attribute QPS to `AddSSTableRequest` sizes. The
calculation is done as QPS = 1 + size(AddSSTableRequest) / factor.
When `kv.replica_stats.addsst_request_size_factor` is less than 1, or
no `AddSSTableRequest` exists within a `BatchRequest`, then QPS = 1;
the current behavior today.
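The described calculation can be sketched as follows (`batchQPS` and `defaultSizeFactor` are illustrative names mirroring the commit's formula, not the actual implementation):

```go
package main

import "fmt"

// defaultSizeFactor mirrors the described default of the
// kv.replica_stats.addsst_request_size_factor cluster setting:
// every 50,000 bytes of SST adds one unit of QPS.
const defaultSizeFactor = 50000

// batchQPS sketches the commit's formula, QPS = 1 + sstBytes/factor,
// falling back to the flat 1-per-batch behavior when the factor is
// below 1 or the batch carries no AddSSTableRequest.
func batchQPS(sstBytes, factor int64) float64 {
	if factor < 1 || sstBytes == 0 {
		return 1
	}
	return 1 + float64(sstBytes)/float64(factor)
}

func main() {
	fmt.Println(batchQPS(0, defaultSizeFactor))      // plain batch: 1
	fmt.Println(batchQPS(100000, defaultSizeFactor)) // 100 KB SST: 3
	fmt.Println(batchQPS(100000, 0))                 // factor disabled: 1
}
```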

resolves #73731

Release note (performance improvement):
Introduced `kv.replica_stats.addsst_request_size_factor` cluster
setting. This setting is used to tune Queries-Per-Second (QPS)
sensitivity to large imports. By default, this setting is disabled.
When enabled, the size of any AddSSTableRequest will contribute to
QPS in inverse relation to this setting's magnitude. By default this
setting is configured to a conservative 50,000: every 50 kilobytes
will be accounted for as an additional 1 QPS.

76617: bazel: make `CrdbTestBuild` `const` r=RaduBerinde a=rickystewart

This partially reverts #72838. That change was made to avoid thrashing
the cache when swapping between test- and non-test configurations.
Today we have `dev cache` which can retain build artifacts across the
different configurations, so this doesn't really serve a purpose any
more. Indeed, you can now swap between `bazel build
pkg/cmd/cockroach-short` and `bazel build pkg/cmd/cockroach-short
--config=test` very quickly with Bazel downloading old versions of the
built libraries if the built-in cache gets wiped.

We still filter out the `go:build` lines in `crdb_test_{off,on}.go` so
we don't have to set `gotags` for test and non-test, which still saves a
lot of time and unnecessary recompilation. We have a check for this in
CI so no one should be able to add a build constraint without us
knowing.

Release note: None

76627: util/tracing: improve span recording interface r=andreimatei a=andreimatei

Before this patch, the Span's recording interface was left over from a
time when there were only one recording mode: verbose. We now have two
modes: verbose recording and structured recording. They can be enabled
at span creation time through the WithRecording(<type>) option. This
patch changes the Span's SetVerbose() method to expose the two options.

Release note: None

76667: roachtest/test: fix ORM testing due to removed cluster setting r=rafiss a=otan

`sql.catalog.unsafe_skip_system_config_trigger.enabled` got removed
recently and was part of an alpha. Let's clean it up in ORMs too.

Resolves #76654
Resolves #76655
Resolves #76656
Resolves #76657
Resolves #76658
Resolves #76659
Resolves #76660
Resolves #76661
Resolves #76662
Resolves #76663
Resolves #76664
Resolves #76665
Resolves #76666

Release note: None

Co-authored-by: Steven Danna <[email protected]>
Co-authored-by: Rui Hu <[email protected]>
Co-authored-by: Austen McClernon <[email protected]>
Co-authored-by: Ricky Stewart <[email protected]>
Co-authored-by: Andrei Matei <[email protected]>
Co-authored-by: Oliver Tan <[email protected]>
@craig craig bot closed this as completed in 27bfca4 Feb 16, 2022
RajivTS pushed a commit to RajivTS/cockroach that referenced this issue Mar 6, 2022