kvclient: regression in BenchmarkColBatchScan due to dist sender metrics #111142

yuzefovich · 2023-09-22T21:27:03Z

I bisected most of the regression on BenchmarkColBatchScan from pkg/sql/colflow between 23.1 and master to 3502ca2

name                        old time/op    new time/op    delta
ColBatchScan/rows=16-24       72.4µs ± 7%    96.4µs ± 5%  +33.18%  (p=0.000 n=10+10)
ColBatchScan/rows=256-24       168µs ± 4%     195µs ± 6%  +15.65%  (p=0.000 n=9+9)
ColBatchScan/rows=4096-24     1.40ms ± 1%    1.41ms ± 1%   +1.23%  (p=0.001 n=8+8)
ColBatchScan/rows=65536-24    21.8ms ± 1%    21.7ms ± 1%     ~     (p=0.481 n=10+10)

name                        old speed      new speed      delta
ColBatchScan/rows=16-24     3.54MB/s ± 7%  2.66MB/s ± 5%  -24.96%  (p=0.000 n=10+10)
ColBatchScan/rows=256-24    24.3MB/s ± 4%  21.0MB/s ± 5%  -13.52%  (p=0.000 n=9+9)
ColBatchScan/rows=4096-24   46.9MB/s ± 1%  46.3MB/s ± 1%   -1.21%  (p=0.001 n=8+8)
ColBatchScan/rows=65536-24  48.1MB/s ± 1%  48.2MB/s ± 1%     ~     (p=0.469 n=10+10)

name                        old alloc/op   new alloc/op   delta
ColBatchScan/rows=16-24       8.21kB ±11%    8.78kB ± 9%     ~     (p=0.053 n=10+9)
ColBatchScan/rows=256-24      28.0kB ± 1%    28.8kB ± 1%   +3.15%  (p=0.000 n=9+9)
ColBatchScan/rows=4096-24      195kB ±14%     196kB ±14%     ~     (p=0.195 n=8+8)
ColBatchScan/rows=65536-24    2.19MB ± 1%    2.19MB ± 1%     ~     (p=0.815 n=9+8)

name                        old allocs/op  new allocs/op  delta
ColBatchScan/rows=16-24          114 ±10%       119 ± 6%     ~     (p=0.076 n=10+9)
ColBatchScan/rows=256-24         158 ± 1%       168 ± 1%   +6.55%  (p=0.000 n=9+9)
ColBatchScan/rows=4096-24       246 ±130%      254 ±127%   +3.61%  (p=0.046 n=8+8)
ColBatchScan/rows=65536-24       292 ±24%       326 ±39%  +11.82%  (p=0.035 n=8+8)

My guess (based on a quick glance at the code) is that this is due to having to compute ba.Size() and br.Size(), so we might not be able to do much about it, but will defer to KV.

Jira issue: CRDB-31787

The text was updated successfully, but these errors were encountered:

yuzefovich · 2023-10-02T19:22:18Z

This regression can also be observed on BenchmarkKV/Scan/Native in pkg/sql/tests:

name                          old time/op    new time/op    delta
KV/Scan/Native/rows=1-24        29.3µs ± 1%    43.6µs ± 1%  +48.61%  (p=0.000 n=10+9)
KV/Scan/Native/rows=10-24       32.1µs ± 2%    46.5µs ± 2%  +44.65%  (p=0.000 n=10+10)
KV/Scan/Native/rows=100-24      50.1µs ± 1%    66.3µs ± 2%  +32.34%  (p=0.000 n=9+10)
KV/Scan/Native/rows=1000-24      227µs ± 3%     252µs ± 1%  +10.88%  (p=0.000 n=10+9)
KV/Scan/Native/rows=10000-24    1.91ms ± 4%    2.02ms ± 2%   +5.73%  (p=0.000 n=9+9)

name                          old alloc/op   new alloc/op   delta
KV/Scan/Native/rows=1-24        8.35kB ± 0%    9.33kB ± 0%  +11.72%  (p=0.000 n=10+10)
KV/Scan/Native/rows=10-24       9.79kB ± 0%   10.77kB ± 0%   +9.96%  (p=0.000 n=10+9)
KV/Scan/Native/rows=100-24      22.6kB ± 0%    23.7kB ± 0%   +4.79%  (p=0.000 n=8+10)
KV/Scan/Native/rows=1000-24      176kB ± 0%     177kB ± 0%   +0.71%  (p=0.000 n=9+10)
KV/Scan/Native/rows=10000-24    1.55MB ± 0%    1.55MB ± 0%   +0.11%  (p=0.000 n=10+10)

name                          old allocs/op  new allocs/op  delta
KV/Scan/Native/rows=1-24          61.0 ± 0%      74.0 ± 0%  +21.31%  (p=0.000 n=10+8)
KV/Scan/Native/rows=10-24         66.0 ± 0%      79.0 ± 0%  +19.70%  (p=0.000 n=8+8)
KV/Scan/Native/rows=100-24        74.1 ± 1%      88.3 ± 1%  +19.12%  (p=0.000 n=8+10)
KV/Scan/Native/rows=1000-24        126 ± 4%       143 ± 6%  +13.81%  (p=0.000 n=9+10)
KV/Scan/Native/rows=10000-24       597 ± 3%       622 ± 2%   +4.09%  (p=0.000 n=10+10)

113069: kvserver: add BenchmarkNodeLivenessScanStorage to measure liveness scan r=andrewbaptist,jbowens a=sumeerbhola Node liveness scans, like the one done in MaybeGossipNodeLivenessRaftMuLocked, while holding raftMu, are performance sensitive, and slowness has caused production issues (https://github.com/cockroachlabs/support/issues/2665, https://github.com/cockroachlabs/support/issues/2107). This benchmark measures the scan performance both when DELs (due to GC) have not been compacted away, and when they have. It also sets up a varying number of live versions since decommissioned nodes will have a single live version. Results on M1 macbook on master with dead-keys=false and compacted=true: ``` NodeLivenessScanStorage/num-live=2/compacted=true-10 26.80µ ± 9% NodeLivenessScanStorage/num-live=5/compacted=true-10 30.34µ ± 3% NodeLivenessScanStorage/num-live=10/compacted=true-10 38.88µ ± 8% NodeLivenessScanStorage/num-live=1000/compacted=true-10 861.5µ ± 3% ``` When compacted=false the scan takes ~10ms, which is > 100x slower, but probably acceptable for this workload. ``` NodeLivenessScanStorage/num-live=2/compacted=false-10 9.430m ± 5% NodeLivenessScanStorage/num-live=5/compacted=false-10 9.534m ± 4% NodeLivenessScanStorage/num-live=10/compacted=false-10 9.456m ± 2% NodeLivenessScanStorage/num-live=1000/compacted=false-10 10.34m ± 7% ``` dead-keys=true (and compacted=false) defeats the NextPrefix optimization, since the next prefix can have all its keys deleted and the iterator has to step through all of them (it can't be sure that all the keys for that next prefix are deleted). This case should not occur in the liveness range, as we don't remove decommissioned entries, but is included for better understanding. ``` NodeLivenessScanStorage/num-live=2/dead-keys=true/compacted=false-10 58.33m ``` Compared to v22.2, the results are sometimes > 10x faster, when the pebbleMVCCScanner seek optimization in v22.2 was defeated. ``` │ sec/op │ sec/op vs base │ NodeLivenessScanStorage/num-live=2/compacted=false-10 117.280m ± 2% 9.430m ± 5% -91.96% (p=0.002 n=6) NodeLivenessScanStorage/num-live=5/compacted=false-10 117.298m ± 0% 9.534m ± 4% -91.87% (p=0.002 n=6) NodeLivenessScanStorage/num-live=10/compacted=false-10 12.009m ± 0% 9.456m ± 2% -21.26% (p=0.002 n=6) NodeLivenessScanStorage/num-live=1000/compacted=false-10 13.04m ± 0% 10.34m ± 7% -20.66% (p=0.002 n=6) │ block-bytes/op │ block-bytes/op vs base │ NodeLivenessScanStorage/num-live=2/compacted=false-10 14.565Mi ± 0% 8.356Mi ± 0% -42.63% (p=0.002 n=6) NodeLivenessScanStorage/num-live=5/compacted=false-10 14.570Mi ± 0% 8.361Mi ± 0% -42.61% (p=0.002 n=6) NodeLivenessScanStorage/num-live=10/compacted=false-10 11.094Mi ± 0% 8.368Mi ± 0% -24.57% (p=0.002 n=6) NodeLivenessScanStorage/num-live=1000/compacted=false-10 12.235Mi ± 0% 8.990Mi ± 0% -26.53% (p=0.002 n=6) │ B/op │ B/op vs base │ NodeLivenessScanStorage/num-live=2/compacted=false-10 42.83Ki ± 4% 41.87Ki ± 0% -2.22% (p=0.002 n=6) NodeLivenessScanStorage/num-live=5/compacted=false-10 43.28Ki ± 3% 41.84Ki ± 0% -3.32% (p=0.002 n=6) NodeLivenessScanStorage/num-live=10/compacted=false-10 37.59Ki ± 0% 41.92Ki ± 0% +11.51% (p=0.002 n=6) NodeLivenessScanStorage/num-live=1000/compacted=false-10 37.67Ki ± 1% 42.66Ki ± 0% +13.23% (p=0.002 n=6) │ allocs/op │ allocs/op vs base │ NodeLivenessScanStorage/num-live=2/compacted=false-10 105.00 ± 8% 85.00 ± 0% -19.05% (p=0.002 n=6) NodeLivenessScanStorage/num-live=5/compacted=false-10 107.00 ± 5% 85.00 ± 0% -20.56% (p=0.002 n=6) NodeLivenessScanStorage/num-live=10/compacted=false-10 74.00 ± 1% 85.00 ± 0% +14.86% (p=0.002 n=6) NodeLivenessScanStorage/num-live=1000/compacted=false-10 79.00 ± 1% 92.00 ± 1% +16.46% (p=0.002 n=6) ``` Relates to https://github.com/cockroachlabs/support/issues/2665 Epic: none Release note: None 113229: kv,server,roachpb: avoid error overhead for x-locality comparison r=pavelkalinnikov a=kvoli Cross locality traffic instrumentation was added to raft, snapshots and batch requests to quantify the amount of cross region/zone traffic. Errors would be returned from `CompareWithLocality` when the region, or zone locality flags were set in an unsupported manner according to our documentation. These error allocations added overhead (cpu/mem) when hit. Alter `CompareWithLocality` to return booleans in place of an error to reduce overhead. Resolves: #111148 Resolves: #111142 Informs: #111561 Release note: None Co-authored-by: sumeerbhola <[email protected]> Co-authored-by: Austen McClernon <[email protected]>

Cross locality traffic instrumentation was added to raft, snapshots and batch requests to quantify the amount of cross region/zone traffic. Errors would be returned from `CompareWithLocality` when the region, or zone locality flags were set in an unsupported manner according to our documentation. These error allocations added overhead (cpu/mem) when hit. Alter `CompareWithLocality` to return booleans in place of an error to reduce overhead. Resolves: #111148 Resolves: #111142 Informs: #111561 Release note: None

kvoli · 2023-10-30T18:36:47Z

Will be closed on #113302 merging.

kvoli · 2023-10-30T19:09:49Z

Closed on #113302.

yuzefovich mentioned this issue Sep 22, 2023

server: regression in BenchmarkColBatchScan due to x-region, x-zone metrics #111148

Closed

kvoli self-assigned this Sep 28, 2023

nvanbenschoten added GA-blocker and removed release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. labels Oct 25, 2023

kvoli mentioned this issue Oct 26, 2023

kvserver: 23.2 replication microbenchmark regressions #111561

Closed

7 tasks

kvoli mentioned this issue Oct 27, 2023

kv,server,roachpb: avoid error overhead for x-locality comparison #113229

Merged

craig bot closed this as completed in a760335 Oct 30, 2023

blathers-crl bot mentioned this issue Oct 30, 2023

release-23.2: kv,server,roachpb: avoid error overhead for x-locality comparison #113302

Merged

kvoli reopened this Oct 30, 2023

kvoli closed this as completed Oct 30, 2023

github-project-automation bot added this to KV Aug 28, 2024

github-project-automation bot moved this to Closed in KV Aug 28, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

kvclient: regression in BenchmarkColBatchScan due to dist sender metrics #111142

kvclient: regression in BenchmarkColBatchScan due to dist sender metrics #111142

yuzefovich commented Sep 22, 2023 •

edited by cockroach-jira-scripts

Loading

yuzefovich commented Oct 2, 2023

kvoli commented Oct 30, 2023

kvoli commented Oct 30, 2023

kvclient: regression in BenchmarkColBatchScan due to dist sender metrics #111142

kvclient: regression in BenchmarkColBatchScan due to dist sender metrics #111142

Comments

yuzefovich commented Sep 22, 2023 • edited by cockroach-jira-scripts Loading

yuzefovich commented Oct 2, 2023

kvoli commented Oct 30, 2023

kvoli commented Oct 30, 2023

yuzefovich commented Sep 22, 2023 •

edited by cockroach-jira-scripts

Loading