kv: performance regression due to new call to `collectSpansRead` #91374

nvanbenschoten · 2022-11-06T18:46:04Z

We accidentally introduced a regression in d6d4ccb due to the new call to collectSpansRead when recomputing a request's access span (post limiting). collectSpansRead is allocation heavy and computationally expensive. In addition, we're also failing to Release the *spanset.SpanSet objects that the function returns, so we're leaking from its memory pool (top level objects and recycled inner slices).

➜ benchdiff --new=d6d4ccb --count=20 --post-checkout='dev generate go' --run='KV/././rows=1$$' ./pkg/sql/tests

name                        old time/op    new time/op    delta
KV/Scan/SQL/rows=1-10         89.2µs ± 1%    90.2µs ± 1%   +1.08%  (p=0.000 n=19+20)
KV/Update/SQL/rows=1-10        164µs ± 1%     167µs ± 2%   +1.63%  (p=0.000 n=18+19)
KV/Delete/SQL/rows=1-10        139µs ± 2%     142µs ± 1%   +2.06%  (p=0.000 n=19+20)
KV/Insert/SQL/rows=1-10        122µs ± 1%     126µs ± 1%   +2.77%  (p=0.000 n=19+18)
KV/Update/Native/rows=1-10    65.5µs ± 2%    67.6µs ± 2%   +3.16%  (p=0.000 n=19+20)
KV/Delete/Native/rows=1-10    43.2µs ± 2%    44.6µs ± 1%   +3.29%  (p=0.000 n=20+19)
KV/Insert/Native/rows=1-10    41.8µs ± 2%    43.3µs ± 2%   +3.54%  (p=0.000 n=18+19)
KV/Scan/Native/rows=1-10      17.2µs ± 2%    17.9µs ± 2%   +4.38%  (p=0.000 n=19+20)

name                        old alloc/op   new alloc/op   delta
KV/Delete/SQL/rows=1-10       50.9kB ± 0%    52.3kB ± 1%   +2.70%  (p=0.000 n=20+20)
KV/Scan/SQL/rows=1-10         24.5kB ± 0%    25.3kB ± 0%   +2.93%  (p=0.000 n=19+19)
KV/Insert/SQL/rows=1-10       43.4kB ± 0%    44.8kB ± 0%   +3.44%  (p=0.000 n=20+20)
KV/Update/SQL/rows=1-10       49.8kB ± 0%    52.0kB ± 0%   +4.42%  (p=0.000 n=18+18)
KV/Delete/Native/rows=1-10    16.0kB ± 0%    17.3kB ± 1%   +8.35%  (p=0.000 n=15+20)
KV/Insert/Native/rows=1-10    15.8kB ± 0%    17.3kB ± 0%   +9.40%  (p=0.000 n=18+19)
KV/Scan/Native/rows=1-10      7.54kB ± 0%    8.26kB ± 0%   +9.52%  (p=0.000 n=20+20)
KV/Update/Native/rows=1-10    22.2kB ± 0%    24.5kB ± 0%  +10.14%  (p=0.000 n=19+20)

name                        old allocs/op  new allocs/op  delta
KV/Scan/SQL/rows=1-10            257 ± 1%       265 ± 0%   +3.24%  (p=0.000 n=19+15)
KV/Delete/SQL/rows=1-10          384 ± 0%       402 ± 0%   +4.55%  (p=0.000 n=20+16)
KV/Insert/SQL/rows=1-10          348 ± 0%       366 ± 0%   +5.07%  (p=0.000 n=18+16)
KV/Update/SQL/rows=1-10          484 ± 0%       509 ± 0%   +5.25%  (p=0.000 n=18+18)
KV/Delete/Native/rows=1-10       121 ± 0%       138 ± 0%  +14.05%  (p=0.000 n=19+20)
KV/Insert/Native/rows=1-10       120 ± 0%       137 ± 0%  +14.17%  (p=0.000 n=20+20)
KV/Update/Native/rows=1-10       168 ± 0%       193 ± 0%  +15.12%  (p=0.000 n=20+20)
KV/Scan/Native/rows=1-10        50.0 ± 0%      59.0 ± 0%  +18.00%  (p=0.000 n=20+18)

An earlier revision of the PR included a new getTrueSpans function, which avoided the call to collectSpansRead. I think we'll want to do something like this. However, even in getTrueSpans, we were still constructing a *spanset.SpanSet. We should explore whether we need to construct a *spanset.SpanSet or whether we can get away with sometime simpler. Notice that recordBatchForLoadBasedSplitting calls BoundarySpan on this spanset, which reduces the full set to a single span. Can we compute the boundary span directly in O(1) memory (i.e. no heap allocations)? Can we push this into the span function passed to Decider.Record so that we don't even pay this cost unless it is needed?

NOTE: this was not included in the v22.2 release, so it's not a release blocker.

cc. @KaiSun314 @kvoli

Jira issue: CRDB-21237

The text was updated successfully, but these errors were encountered:

This commit adds `Release` calls for both SpanSets returned by `collectSpansRead` on the optimistic eval validation path. The failure to recycle these SpanSets was causing a leak from the `SpanSet` memory pool of both top level objects and the recycled inner slices. We have the same issue at the other caller of `collectSpansRead`, which is being tracked more broadly in cockroachdb#91374. Release note: None Epic: None

aayushshah15 · 2022-11-06T19:56:47Z

My bad on not picking up on the potential for regression here. I have a meta question: is there anything we can do at the CI level that would flag these kinds of regressions in microbenchmarks? As it stands, it's way too easy to introduce these regressions in the read/write eval path and nothing that tells the PR author about them.

kvoli · 2022-11-06T23:40:19Z

Computing in constant memory seems fine with the previous approach. Lazily evaluating the boundaries inside the decider, which would only be once initiated is a great idea.

90709: sql/schemachanger: inject DML statements into end to end tests r=fqazi a=fqazi Fixes: #83304 Previously, the declarative schema changer tests only ran DDL statements for the schema change and had no mechanism for determining correct behaviour if queries were run concurrently at each phase. To address this, this patch adds a new framework to these tests which allows us to inject DML (inserts / selects) at various stages of a declarative schema change. This gives us a more powerful framework for validating behaviour. Release note: None The first commit here can be ignored, and its a separate PR to make sure this runs stable. A PR is already open for it. 91045: kvserver: separate repl q decision from action r=nvanbenschoten a=kvoli Same as #90529. with a fix to stop logging an error on replicate queue metrics unsupported action. ``` dev test pkg/cli -f TestPartialZip -v --stress --race ... 1135 runs so far, 0 failures, over 2m55s 1168 runs so far, 0 failures, over 3m0s ``` Previously, the replicate queue would both plan and apply changes for in-process replicas within the processOneChange function. This was problematic for simulation as it was not possible to call processOneChange directly to apply the simulated result, without blocking the goroutine. This patch separates processOneChange into planning (PlanOneChange), the application of the change (applyChange) and post application tracking (TrackChangeOutcome). resolves: #90533 Release note: None 91375: kv: recycle SpanSets returned during optimistic eval validation r=nvanbenschoten a=nvanbenschoten This commit adds `Release` calls for both SpanSets returned by `collectSpansRead` on the optimistic eval validation path. The failure to recycle these SpanSets was causing a leak from the `SpanSet` memory pool of both top level objects and the recycled inner slices. We have the same issue at the other caller of `collectSpansRead`, which is being tracked more broadly in #91374. Release note: None Epic: None Co-authored-by: Faizan Qazi <[email protected]> Co-authored-by: Austen McClernon <[email protected]> Co-authored-by: Nathan VanBenschoten <[email protected]>

nvanbenschoten added C-performance Perf of queries or internals. Solution not expected to change functional behavior. A-kv Anything in KV that doesn't belong in a more specific category. labels Nov 6, 2022

nvanbenschoten assigned KaiSun314 Nov 6, 2022

nvanbenschoten mentioned this issue Nov 6, 2022

kv: recycle SpanSets returned during optimistic eval validation #91375

Merged

KaiSun314 mentioned this issue Nov 8, 2022

kvserver: Fix performance regression due to new call to collectSpansRead #91462

Merged

craig bot closed this as completed in 0d9669a Nov 23, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

kv: performance regression due to new call to `collectSpansRead` #91374

kv: performance regression due to new call to `collectSpansRead` #91374

nvanbenschoten commented Nov 6, 2022 •

edited by cockroach-jira-scripts

Loading

aayushshah15 commented Nov 6, 2022

kvoli commented Nov 6, 2022

kv: performance regression due to new call to collectSpansRead #91374

kv: performance regression due to new call to collectSpansRead #91374

Comments

nvanbenschoten commented Nov 6, 2022 • edited by cockroach-jira-scripts Loading

aayushshah15 commented Nov 6, 2022

kvoli commented Nov 6, 2022

kv: performance regression due to new call to `collectSpansRead` #91374

kv: performance regression due to new call to `collectSpansRead` #91374

nvanbenschoten commented Nov 6, 2022 •

edited by cockroach-jira-scripts

Loading