release-23.1: server: support multi-span statistics endpoint #101877

blathers-crl · 2023-04-19T21:41:37Z

Backport 1/1 commits from #101378 on behalf of @THardy98.

/cc @cockroachdb/release

Epic: None
Extends: #96223

This PR extends the implementation of our SpanStats RPC endpoint to fetch stats for multiple spans at once. By extending the endpoint, we amortize the cost of the RPC's node fanout across all requested spans, whereas previously we issued one fanout per requested span. Additionally, this change batches KV layer requests for ranges fully contained by a span, instead of issuing one request per fully contained range.
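To make the two optimizations concrete, here is a minimal Go sketch. It is not the server code from this PR. `Span`, `RangeDesc`, `planRequests` and `rpcCounts` are hypothetical names, and the sketch assumes each fanout contacts every node exactly once. It shows how the ranges touching a span split into fully contained ranges, which can share one batched KV request, and boundary ranges, which still need their own request. It also shows how many node RPCs a per-span fanout issues compared with a single multi-span fanout.

```go
package main

import (
	"bytes"
	"fmt"
)

// Span is a hypothetical half-open key interval [Start, End).
type Span struct{ Start, End []byte }

// RangeDesc is a hypothetical stand-in for a range descriptor.
type RangeDesc struct {
	ID         int
	Start, End []byte
}

// contains reports whether range r lies entirely within span s.
func contains(s Span, r RangeDesc) bool {
	return bytes.Compare(s.Start, r.Start) <= 0 && bytes.Compare(r.End, s.End) <= 0
}

// overlaps reports whether range r and span s share at least one key.
func overlaps(s Span, r RangeDesc) bool {
	return bytes.Compare(r.Start, s.End) < 0 && bytes.Compare(s.Start, r.End) < 0
}

// planRequests groups the ranges touching s into fully contained ranges
// (candidates for one batched KV request) and boundary ranges that each
// still need an individual request. Ranges outside s are skipped.
func planRequests(s Span, ranges []RangeDesc) (batched, individual []int) {
	for _, r := range ranges {
		switch {
		case contains(s, r):
			batched = append(batched, r.ID)
		case overlaps(s, r):
			individual = append(individual, r.ID)
		}
	}
	return batched, individual
}

// rpcCounts compares node-level RPCs for one fanout per span versus a
// single fanout that carries every requested span.
func rpcCounts(numNodes, numSpans int) (perSpan, multiSpan int) {
	return numNodes * numSpans, numNodes
}

func main() {
	ranges := []RangeDesc{
		{1, []byte("a"), []byte("c")},
		{2, []byte("c"), []byte("f")},
		{3, []byte("f"), []byte("k")},
		{4, []byte("k"), []byte("p")},
		{5, []byte("p"), []byte("z")},
	}
	batched, individual := planRequests(Span{[]byte("b"), []byte("m")}, ranges)
	fmt.Println(batched, individual) // [2 3] [1 4]

	perSpan, multiSpan := rpcCounts(3, 10)
	fmt.Println(perSpan, multiSpan) // 30 3
}
```

For the 3-node, 10-span benchmark shape, this toy count gives 30 node RPCs for per-span fanouts versus 3 for a single multi-span fanout. The per-range work is unaffected by the fanout change.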

Note that we do not deprecate the start_key and end_key fields, as they are used to determine whether an old node is calling a node that uses the new proto format.

The changes here explicitly do not support mixed-version clusters.

BENCHMARK RESULTS

Here are some benchmark results from running:

BENCHTIMEOUT=72h PKG=./pkg/server BENCHES=BenchmarkSpanStats ./scripts/bench HEAD^ HEAD

Note that HEAD is actually a temporary change that reverts to the old logic (one request per span), and HEAD^ is the new logic (multi-span request). As such, the reported increases in latency and memory are actually reductions delivered by the new logic.

name                                                                                                              old time/op    new time/op    delta
SpanStats/3node/BenchmarkSpanStats_-_span_stats_for_3_node_cluster,_collecting_10_spans_with_25_ranges_each-24      10.3ms ± 2%    24.5ms ± 2%   +137.38%  (p=0.000 n=10+10)
SpanStats/3node/BenchmarkSpanStats_-_span_stats_for_3_node_cluster,_collecting_10_spans_with_50_ranges_each-24      17.1ms ± 2%    31.3ms ± 1%    +83.29%  (p=0.000 n=10+10)
SpanStats/3node/BenchmarkSpanStats_-_span_stats_for_3_node_cluster,_collecting_10_spans_with_100_ranges_each-24     30.5ms ± 2%   102.7ms ± 3%   +236.55%  (p=0.000 n=10+10)
SpanStats/3node/BenchmarkSpanStats_-_span_stats_for_3_node_cluster,_collecting_100_spans_with_25_ranges_each-24      1.75s ± 5%     2.10s ± 2%    +19.89%  (p=0.000 n=10+8)
SpanStats/3node/BenchmarkSpanStats_-_span_stats_for_3_node_cluster,_collecting_100_spans_with_50_ranges_each-24      3.00s ± 1%     3.43s ± 1%    +14.35%  (p=0.000 n=8+9)
SpanStats/3node/BenchmarkSpanStats_-_span_stats_for_3_node_cluster,_collecting_100_spans_with_100_ranges_each-24     5.01s ± 1%     5.53s ± 1%    +10.44%  (p=0.000 n=9+9)
SpanStats/3node/BenchmarkSpanStats_-_span_stats_for_3_node_cluster,_collecting_200_spans_with_25_ranges_each-24      9.66s ± 1%    10.63s ± 1%    +10.10%  (p=0.000 n=9+9)
SpanStats/3node/BenchmarkSpanStats_-_span_stats_for_3_node_cluster,_collecting_200_spans_with_50_ranges_each-24      15.2s ± 1%     16.2s ± 0%     +6.61%  (p=0.000 n=9+9)
SpanStats/3node/BenchmarkSpanStats_-_span_stats_for_3_node_cluster,_collecting_200_spans_with_100_ranges_each-24     17.4s ± 1%     18.6s ± 1%     +7.31%  (p=0.000 n=9+9)

name                                                                                                              old alloc/op   new alloc/op   delta
SpanStats/3node/BenchmarkSpanStats_-_span_stats_for_3_node_cluster,_collecting_10_spans_with_25_ranges_each-24      3.91MB ± 2%   18.55MB ± 1%   +374.43%  (p=0.000 n=9+9)
SpanStats/3node/BenchmarkSpanStats_-_span_stats_for_3_node_cluster,_collecting_10_spans_with_50_ranges_each-24      6.95MB ± 2%   21.18MB ± 1%   +204.85%  (p=0.000 n=8+8)
SpanStats/3node/BenchmarkSpanStats_-_span_stats_for_3_node_cluster,_collecting_10_spans_with_100_ranges_each-24     13.3MB ± 1%   134.6MB ± 1%   +912.92%  (p=0.000 n=8+8)
SpanStats/3node/BenchmarkSpanStats_-_span_stats_for_3_node_cluster,_collecting_100_spans_with_25_ranges_each-24     1.99GB ± 4%    2.27GB ± 4%    +14.11%  (p=0.000 n=8+9)
SpanStats/3node/BenchmarkSpanStats_-_span_stats_for_3_node_cluster,_collecting_100_spans_with_50_ranges_each-24     4.16GB ± 2%    4.43GB ± 3%     +6.57%  (p=0.000 n=9+10)
SpanStats/3node/BenchmarkSpanStats_-_span_stats_for_3_node_cluster,_collecting_100_spans_with_100_ranges_each-24    7.50GB ± 1%    7.75GB ± 1%     +3.27%  (p=0.000 n=10+10)
SpanStats/3node/BenchmarkSpanStats_-_span_stats_for_3_node_cluster,_collecting_200_spans_with_25_ranges_each-24     11.8GB ± 0%    12.4GB ± 0%     +4.70%  (p=0.000 n=7+9)
SpanStats/3node/BenchmarkSpanStats_-_span_stats_for_3_node_cluster,_collecting_200_spans_with_50_ranges_each-24     21.1GB ± 2%    21.6GB ± 1%     +2.70%  (p=0.000 n=10+10)
SpanStats/3node/BenchmarkSpanStats_-_span_stats_for_3_node_cluster,_collecting_200_spans_with_100_ranges_each-24    25.8GB ± 0%    26.4GB ± 0%     +2.29%  (p=0.000 n=8+10)

name                                                                                                              old allocs/op  new allocs/op  delta
SpanStats/3node/BenchmarkSpanStats_-_span_stats_for_3_node_cluster,_collecting_10_spans_with_25_ranges_each-24       26.9k ± 0%     90.1k ± 2%   +235.04%  (p=0.000 n=9+9)
SpanStats/3node/BenchmarkSpanStats_-_span_stats_for_3_node_cluster,_collecting_10_spans_with_50_ranges_each-24       51.8k ± 3%    114.9k ± 1%   +121.89%  (p=0.000 n=8+8)
SpanStats/3node/BenchmarkSpanStats_-_span_stats_for_3_node_cluster,_collecting_10_spans_with_100_ranges_each-24       106k ± 3%     1426k ± 1%  +1240.14%  (p=0.000 n=8+8)
SpanStats/3node/BenchmarkSpanStats_-_span_stats_for_3_node_cluster,_collecting_100_spans_with_25_ranges_each-24      23.2M ± 5%     23.9M ± 3%     +3.19%  (p=0.003 n=9+9)
SpanStats/3node/BenchmarkSpanStats_-_span_stats_for_3_node_cluster,_collecting_100_spans_with_50_ranges_each-24      48.7M ± 2%     49.4M ± 2%       ~     (p=0.075 n=10+10)
SpanStats/3node/BenchmarkSpanStats_-_span_stats_for_3_node_cluster,_collecting_100_spans_with_100_ranges_each-24     87.9M ± 1%     88.6M ± 1%       ~     (p=0.075 n=10+10)
SpanStats/3node/BenchmarkSpanStats_-_span_stats_for_3_node_cluster,_collecting_200_spans_with_25_ranges_each-24       140M ± 0%      142M ± 0%     +1.04%  (p=0.000 n=8+9)
SpanStats/3node/BenchmarkSpanStats_-_span_stats_for_3_node_cluster,_collecting_200_spans_with_50_ranges_each-24       248M ± 1%      249M ± 1%     +0.65%  (p=0.001 n=10+10)
SpanStats/3node/BenchmarkSpanStats_-_span_stats_for_3_node_cluster,_collecting_200_spans_with_100_ranges_each-24      306M ± 1%      308M ± 0%     +0.57%  (p=0.002 n=10+9)
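Because the columns are swapped, a benchstat delta of d (as a fraction) means the old logic costs (1 + d) times the new logic. Measured against the old logic, the saving is therefore d / (1 + d), not d. The small Go helper below, a sketch whose name `reductionPct` is hypothetical, does that conversion for a few of the time/op deltas above.

```go
package main

import "fmt"

// reductionPct converts a benchstat delta (in percent) into the percentage
// by which the new logic reduces the old logic's cost. It assumes, as in the
// runs above, that the "old" column holds the new logic (HEAD^) and the "new"
// column holds the old logic (HEAD), so d = (oldLogic-newLogic)/newLogic and
// the reduction is (oldLogic-newLogic)/oldLogic = d/(1+d).
func reductionPct(deltaPct float64) float64 {
	d := deltaPct / 100
	return 100 * d / (1 + d)
}

func main() {
	// Deltas copied from the time/op table above.
	for _, delta := range []float64{137.38, 83.29, 236.55, 6.61} {
		fmt.Printf("+%.2f%% delta -> %.1f%% reduction\n", delta, reductionPct(delta))
	}
}
```

For example, the +137.38% time/op delta for 10 spans with 25 ranges each corresponds to roughly a 57.9% latency reduction, while the +6.61% delta for 200 spans with 50 ranges each is only about a 6.2% reduction.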

There are some notable improvements, particularly for requests covering spans with fewer ranges. Past a certain point, the raw number of ranges becomes the bottleneck, despite the reduced number of fanouts. I'm not sure whether there is a better way to fetch range statistics, but I think the improvement here is enough for this PR. If improvements for fetching range statistics are identified, they can be made in a follow-up PR and backported.

Release note (sql change): span statistics are unavailable on mixed-version clusters

Release justification: performance improvement fetching span statistics in bulk


blathers-crl · 2023-04-19T21:41:39Z

cockroach-teamcity · 2023-04-19T21:41:53Z


maryliag

Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @j82w, @THardy98, and @zachlite)

THardy98 · 2023-04-26T16:33:52Z

TYFR :)

blathers-crl bot requested review from a team as code owners April 19, 2023 21:41

blathers-crl bot requested a review from a team April 19, 2023 21:41

blathers-crl bot requested a review from a team as a code owner April 19, 2023 21:41

blathers-crl bot requested a review from a team April 19, 2023 21:41

blathers-crl bot requested a review from a team as a code owner April 19, 2023 21:41

blathers-crl bot requested a review from a team April 19, 2023 21:41

blathers-crl bot added blathers-backport This is a backport that Blathers created automatically. O-robot Originated from a bot. labels Apr 19, 2023

blathers-crl bot force-pushed the blathers/backport-release-23.1-101378 branch from cff1a58 to b0a7afc Compare April 19, 2023 21:41

blathers-crl bot requested review from srosenberg and smg260 and removed request for a team April 19, 2023 21:41

blathers-crl bot force-pushed the blathers/backport-release-23.1-101378 branch from ece2060 to 84dbb3e Compare April 19, 2023 21:41

blathers-crl bot assigned THardy98 Apr 19, 2023

blathers-crl bot requested review from dhartunian, j82w, renatolabs, THardy98 and zachlite April 19, 2023 21:41

THardy98 removed request for a team April 20, 2023 12:49

THardy98 removed request for srosenberg, smg260, dhartunian and renatolabs April 20, 2023 12:49

maryliag approved these changes Apr 26, 2023


THardy98 merged commit 25b7026 into release-23.1 Apr 26, 2023

THardy98 deleted the blathers/backport-release-23.1-101378 branch April 26, 2023 16:34

cockroach-teamcity mentioned this pull request Apr 27, 2023

PR #101877 - server: extend span statistics endpoint cockroachdb/docs#16855

Open

THardy98 mentioned this pull request Apr 27, 2023

server: tenant span stats handle different cluster version #102143

Merged
