kv: DistSender batch slicing is quadratic with batch size #68536

nvanbenschoten · 2021-08-06T14:51:57Z

CPU profiles like the following demonstrate high CPU utilization in DistSender. The profile is exemplary of the cost of a BatchRequest with a high fanout factor.

If we look closely, we can see that two of the most pronounced functions are kvcoord.next and kvcoord.truncate. The first of these is called here and the second here. We can also assume that if this was a reverse scan, kvcoord.prev, called here would replace kvcoord.next as a pronounced use of CPU.

The problem is that all three functions defined in pkg/kv/kvclient/kvcoord/batch.go are linear with respect to batch size, and they are called in a loop that iterates over ranges touched by a batch. This means that with a large batch and a high fanout factor (i.e. each request in the batch touches a different range), the cost of these functions is quadratic with respect to the number of requests in the batch. This is fine for small batches, but not great for large batches, like the ones issued by a lookup join.

We should explore what we can do here to make this efficient. Solutions may include micro-optimizing the internals of these functions (with microbenchmarks) or restructuring their use to avoid the quadratic cost.

Jira issue: CRDB-9072

The text was updated successfully, but these errors were encountered:

yuzefovich · 2021-09-24T18:15:31Z

Possibly related issue #70706.

81845: roachtest: include some cluster configs in test failure reports r=srosenberg a=renatolabs This commit changes the 'Parameters' section of a test failure. Previously, it would only include the contents of the `TAGS` and `GOFLAGS` environment variables. It now adds cluster configuration values obtained from the `ClusterSpec`. These parameters are prefixed with `ROACHTEST` (e.g., `ROACHTEST_cloud`). For now, just the cloud name, number of CPUs and SSDs are included. In the future, we can expand this work to support other relevant cluster configuration values as needed to debug test failures. Resolves: #80799. Release note: None. 81934: kvcoord: optimize Truncate r=yuzefovich a=yuzefovich **kvcoord: introduce benchmark for Truncate function** Release note: None **kvcoord: optimize Truncate** This commit optimizes the `Truncate` function by accessing the header of the request directly (instead of having to construct `roachpb.Span`) as well as keeping track of whether the request has been changed (instead of performing key comparison later). ``` name old time/op new time/op delta Truncate/reqs=32/type=get-24 3.81µs ± 4% 3.42µs ± 1% -10.32% (p=0.000 n=9+9) Truncate/reqs=32/type=scan-24 6.35µs ±11% 5.81µs ±11% -8.56% (p=0.019 n=10+10) Truncate/reqs=1024/type=get-24 98.9µs ± 2% 85.8µs ± 1% -13.24% (p=0.000 n=10+10) Truncate/reqs=1024/type=scan-24 176µs ± 2% 155µs ± 2% -12.17% (p=0.000 n=10+10) Truncate/reqs=32768/type=get-24 3.23ms ± 1% 2.78ms ± 0% -13.86% (p=0.000 n=10+9) Truncate/reqs=32768/type=scan-24 5.64ms ± 1% 5.02ms ± 0% -11.01% (p=0.000 n=10+9) name old alloc/op new alloc/op delta Truncate/reqs=32/type=get-24 744B ± 0% 744B ± 0% ~ (all equal) Truncate/reqs=32/type=scan-24 2.08kB ±37% 2.05kB ±36% ~ (p=0.897 n=10+10) Truncate/reqs=1024/type=get-24 24.6kB ± 0% 24.6kB ± 0% ~ (all equal) Truncate/reqs=1024/type=scan-24 73.2kB ± 2% 73.7kB ± 2% ~ (p=0.167 n=9+9) Truncate/reqs=32768/type=get-24 1.21MB ± 0% 1.21MB ± 0% ~ (p=0.370 n=10+10) Truncate/reqs=32768/type=scan-24 2.82MB ± 0% 2.84MB ± 0% +0.48% (p=0.000 n=10+10) name old allocs/op new allocs/op delta Truncate/reqs=32/type=get-24 10.0 ± 0% 10.0 ± 0% ~ (all equal) Truncate/reqs=32/type=scan-24 33.6 ±25% 34.8 ±26% ~ (p=0.646 n=10+10) Truncate/reqs=1024/type=get-24 20.0 ± 0% 20.0 ± 0% ~ (all equal) Truncate/reqs=1024/type=scan-24 692 ± 5% 705 ± 6% ~ (p=0.167 n=9+9) Truncate/reqs=32768/type=get-24 40.0 ± 0% 40.0 ± 0% ~ (all equal) Truncate/reqs=32768/type=scan-24 21.6k ± 2% 22.0k ± 1% +1.73% (p=0.000 n=10+10) ``` Addresses: #68536. Release note: None 81938: kvcoord: don't construct roachpb.Spans needlessly in several spots r=yuzefovich a=yuzefovich This should be a minor performance improvement, but I don't know of good benchmarks to verify that, yet the change doesn't seem controversial either. Release note: None Co-authored-by: Renato Costa <[email protected]> Co-authored-by: Yahor Yuzefovich <[email protected]>

yuzefovich · 2022-06-15T05:04:37Z

I have thought about this problem for some time, and I think the best solution is to sort the requests upfront (because it allows us to only compare the range boundaries against a contiguous subset of all requests), and then keep track of the fully-processed requests (in order to skip key comparisons altogether) as well as the yet-to-be-processed suffixes of other requests such that the ordering of requests is maintained across the truncation. This solution is implemented in #82865.

In terms of time complexity, the existing qudratic approach is O(N * R) where N is the number of original requests and R is the number of ranges those requests touch. The proposed solution is actually worse in the absolute worse case - O(N * (log N + R)) because of the need to sort the requests up front. However, in practice, it should be much faster and behave like O(N * log N + T * R) where T is the average number of requests that intersect with a single range.

nvanbenschoten added C-performance Perf of queries or internals. Solution not expected to change functional behavior. A-kv-client Relating to the KV client and the KV interface. T-kv KV Team labels Aug 6, 2021

jordanlewis mentioned this issue Aug 6, 2021

[wip] kvcoord: improve truncate() and next() #68564

Closed

This was referenced May 26, 2022

kvcoord: optimize Truncate #81934

Merged

kvcoord: permit parallelization of scans with limits #54680

Open

yuzefovich self-assigned this Jun 11, 2022

blathers-crl bot added the T-sql-queries SQL Queries Team label Jun 11, 2022

yuzefovich mentioned this issue Jun 15, 2022

kvcoord: optimize batch truncation loop #82865

Merged

craig bot closed this as completed in 90b15cf Jul 6, 2022

exalate-issue-sync bot removed the T-sql-queries SQL Queries Team label Feb 13, 2023

mgartner added this to SQL Queries Jul 24, 2023

mgartner moved this to Done in SQL Queries Jul 24, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

kv: DistSender batch slicing is quadratic with batch size #68536

kv: DistSender batch slicing is quadratic with batch size #68536

nvanbenschoten commented Aug 6, 2021 •

edited by cockroach-jira-scripts

Loading

yuzefovich commented Sep 24, 2021

yuzefovich commented Jun 15, 2022

kv: DistSender batch slicing is quadratic with batch size #68536

kv: DistSender batch slicing is quadratic with batch size #68536

Comments

nvanbenschoten commented Aug 6, 2021 • edited by cockroach-jira-scripts Loading

yuzefovich commented Sep 24, 2021

yuzefovich commented Jun 15, 2022

nvanbenschoten commented Aug 6, 2021 •

edited by cockroach-jira-scripts

Loading