extend span stats #98490

THardy98 · 2023-03-13T13:56:14Z

Extends: #96223

This PR extends the implementation of our SpanStats RPC endpoint to fetch stats for multiple spans at once. By extending the endpoint, we amortize the cost of the RPC's node fanout across all requested spans, whereas previously, we were issuing a fanout per span requested. Additionally, this change batches KV layer requests for ranges fully contained by the span, instead of issuing a request per fully contained range.

Changes were made to the span_stats.proto to accommodate these changes, namely:

the addition of the spans field in the SpanStatsRequest, to allow for multiple spans
the addion of the span_to_stats field in the SpanStatsResponse, to allow for multiple span statistics responses, indexed by their corresponding span string

No protobuf fields were removed and the RPC logic handles both old and new request/response formats in the case of a mixed-version cluster. Requests initiated from a 23.1 node expect the request to use the new format (i.e. use spans instead of StartKey/EndKey), but will handle fanout calls from a 22.2 node using the old format.

This change introduces two cluster settings:

server.span_stats.span_batch_limit: the maximum number of spans allowed in a request payload for span statistics, default value of 500
server.span_stats.range_batch_limit: the maximum batch size when fetching ranges statistics for a span, default value of 100

Their defaults are conservative from my testing, but can be modified in need be.

Here are some benchmark results from running:
BENCHTIMEOUT=72h PKG=./pkg/server BENCHES=BenchmarkSpanStats ./scripts/bench HEAD^ HEAD

note that HEAD is actually a temp change to revert to old logic (request per span) and HEAD^ is the new logic (multi-span request)

name                                                                                                              old time/op    new time/op    delta
SpanStats/3node/BenchmarkSpanStats_-_span_stats_for_3_node_cluster,_collecting_10_spans_with_25_ranges_each-24      8.97ms ± 2%   35.39ms ± 2%   +294.62%  (p=0.000 n=10+10)
SpanStats/3node/BenchmarkSpanStats_-_span_stats_for_3_node_cluster,_collecting_10_spans_with_50_ranges_each-24      16.0ms ± 2%    42.7ms ± 2%   +167.86%  (p=0.000 n=10+10)
SpanStats/3node/BenchmarkSpanStats_-_span_stats_for_3_node_cluster,_collecting_10_spans_with_100_ranges_each-24     32.5ms ± 2%   119.9ms ± 2%   +268.36%  (p=0.000 n=10+10)
SpanStats/3node/BenchmarkSpanStats_-_span_stats_for_3_node_cluster,_collecting_100_spans_with_25_ranges_each-24      1.33s ± 2%     2.11s ± 1%    +58.16%  (p=0.000 n=9+9)
SpanStats/3node/BenchmarkSpanStats_-_span_stats_for_3_node_cluster,_collecting_100_spans_with_50_ranges_each-24      2.81s ± 2%     3.62s ± 2%    +29.04%  (p=0.000 n=9+9)
SpanStats/3node/BenchmarkSpanStats_-_span_stats_for_3_node_cluster,_collecting_100_spans_with_100_ranges_each-24     5.08s ± 1%     5.97s ± 2%    +17.58%  (p=0.000 n=9+9)
SpanStats/3node/BenchmarkSpanStats_-_span_stats_for_3_node_cluster,_collecting_200_spans_with_25_ranges_each-24      8.05s ± 1%     9.81s ± 1%    +21.82%  (p=0.000 n=8+9)
SpanStats/3node/BenchmarkSpanStats_-_span_stats_for_3_node_cluster,_collecting_200_spans_with_50_ranges_each-24      14.3s ± 2%     15.6s ± 1%     +9.15%  (p=0.000 n=10+9)
SpanStats/3node/BenchmarkSpanStats_-_span_stats_for_3_node_cluster,_collecting_200_spans_with_100_ranges_each-24     17.8s ± 4%     19.4s ± 1%     +8.93%  (p=0.000 n=10+9)

name                                                                                                              old alloc/op   new alloc/op   delta
SpanStats/3node/BenchmarkSpanStats_-_span_stats_for_3_node_cluster,_collecting_10_spans_with_25_ranges_each-24      5.45MB ± 3%   33.65MB ± 1%   +517.58%  (p=0.000 n=9+9)
SpanStats/3node/BenchmarkSpanStats_-_span_stats_for_3_node_cluster,_collecting_10_spans_with_50_ranges_each-24      8.38MB ± 2%   35.58MB ± 0%   +324.72%  (p=0.000 n=8+8)
SpanStats/3node/BenchmarkSpanStats_-_span_stats_for_3_node_cluster,_collecting_10_spans_with_100_ranges_each-24     14.8MB ± 1%   149.6MB ± 2%   +913.76%  (p=0.000 n=9+10)
SpanStats/3node/BenchmarkSpanStats_-_span_stats_for_3_node_cluster,_collecting_100_spans_with_25_ranges_each-24     1.96GB ± 1%    2.49GB ± 1%    +27.01%  (p=0.000 n=9+8)
SpanStats/3node/BenchmarkSpanStats_-_span_stats_for_3_node_cluster,_collecting_100_spans_with_50_ranges_each-24     4.20GB ± 6%    4.68GB ± 4%    +11.57%  (p=0.000 n=10+10)
SpanStats/3node/BenchmarkSpanStats_-_span_stats_for_3_node_cluster,_collecting_100_spans_with_100_ranges_each-24    7.50GB ± 2%    7.98GB ± 1%     +6.43%  (p=0.000 n=10+9)
SpanStats/3node/BenchmarkSpanStats_-_span_stats_for_3_node_cluster,_collecting_200_spans_with_25_ranges_each-24     11.8GB ± 1%    12.9GB ± 1%     +9.23%  (p=0.000 n=8+9)
SpanStats/3node/BenchmarkSpanStats_-_span_stats_for_3_node_cluster,_collecting_200_spans_with_50_ranges_each-24     21.0GB ± 1%    21.7GB ± 2%     +2.95%  (p=0.000 n=10+10)
SpanStats/3node/BenchmarkSpanStats_-_span_stats_for_3_node_cluster,_collecting_200_spans_with_100_ranges_each-24    26.0GB ± 2%    26.5GB ± 0%     +2.15%  (p=0.000 n=10+8)

name                                                                                                              old allocs/op  new allocs/op  delta
SpanStats/3node/BenchmarkSpanStats_-_span_stats_for_3_node_cluster,_collecting_10_spans_with_25_ranges_each-24       32.4k ± 0%    148.5k ± 1%   +358.30%  (p=0.000 n=8+9)
SpanStats/3node/BenchmarkSpanStats_-_span_stats_for_3_node_cluster,_collecting_10_spans_with_50_ranges_each-24       57.4k ± 3%    173.4k ± 1%   +202.02%  (p=0.000 n=8+8)
SpanStats/3node/BenchmarkSpanStats_-_span_stats_for_3_node_cluster,_collecting_10_spans_with_100_ranges_each-24       112k ± 0%     1482k ± 0%  +1223.11%  (p=0.000 n=8+8)
SpanStats/3node/BenchmarkSpanStats_-_span_stats_for_3_node_cluster,_collecting_100_spans_with_25_ranges_each-24      22.9M ± 1%     24.2M ± 0%     +5.93%  (p=0.000 n=9+8)
SpanStats/3node/BenchmarkSpanStats_-_span_stats_for_3_node_cluster,_collecting_100_spans_with_50_ranges_each-24      48.7M ± 3%     50.1M ± 3%     +2.82%  (p=0.003 n=10+10)
SpanStats/3node/BenchmarkSpanStats_-_span_stats_for_3_node_cluster,_collecting_100_spans_with_100_ranges_each-24     87.8M ± 2%     89.4M ± 2%     +1.76%  (p=0.000 n=10+10)
SpanStats/3node/BenchmarkSpanStats_-_span_stats_for_3_node_cluster,_collecting_200_spans_with_25_ranges_each-24       139M ± 0%      142M ± 0%     +2.03%  (p=0.000 n=7+9)
SpanStats/3node/BenchmarkSpanStats_-_span_stats_for_3_node_cluster,_collecting_200_spans_with_50_ranges_each-24       247M ± 1%      250M ± 0%     +1.09%  (p=0.000 n=10+9)
SpanStats/3node/BenchmarkSpanStats_-_span_stats_for_3_node_cluster,_collecting_200_spans_with_100_ranges_each-24      306M ± 1%      309M ± 0%     +0.91%  (p=0.000 n=10+9)

Some notable improvements particularly with requests for spans with fewer ranges. After a point, the raw number of ranges becomes the bottleneck, despite reducing the number of fanouts. Not sure if there is a better way to fetch range statistics but I think the improvement here is enough for this PR. If improvements for fetching range statistics are identified, they can be done in a follow up PR and backported.

I've added mixed_version_tenant_span_stats to test on a mixed-version cluster, but this only tests when the gateway node is 23.1 (as the tenant_span_stats builtin only existed on 23.1, and prior to that there was no way to get span stats via SQL that I know of). My thinking is the TestLocalSpanStats unit test covers the case where we receive old format requests coming from a 22.2 node.

NOTE FOR REVIEWERS

I'm getting an error with invalid span keys, this is what I see:

Time for span /Table/255{-/PrefixEnd}: 125.657µs. Node ID: 3.   <------ this is the previous span
error getting disk bytes invalid key-range specified (start > end) /Table/25{6-7} /Table/256 /Table/257
error visiting stores (getting disk bytes) invalid key-range specified (start > end)
ENCOUNTERED AN ERROR WITH NODE ID: 3. ERROR: rpc error: code = Unknown desc = invalid key-range specified (start > end)

I'm not really sure what to make of this, I don't understand why/what would cause the span's start/end keys to be in this state. This only seems to reproduce when there are a large number of spans being requested (~200+). Tables were generated via generate_test_objects

blathers-crl · 2023-03-13T13:56:19Z

It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

_{🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.}

cockroach-teamcity · 2023-03-13T13:56:26Z

This change is

zachlite

This is really cool. Nice work on this Thomas!

I've introduced an arbitrary value for the spanRequestLimit at 1000. I'm not sure if this is excessive or what an appropriate value would be (should this be a cluster setting or a const?). Additionally, it feels like the RPC should be the one that enforces this limit, I've thought of a couple options but not sure which would be preferred (alternative options also welcome):

Perhaps its worth running some timing tests with different levels of spanRequestLimit while a cluster is under some sustained load. A non-public cluster setting could make tuning it easier. I agree that 1000 is arbitrary, but I can't make an informed guess without seeing some timings as a function of spanRequestLimit. I don't think it needs to be a perfect science, but I'd be interested in seeing the effect of different orders of magnitude. I'm thinking 1e2 to 1e5 🤷

Enforce a strict limit. The RPC checks if the number of spans provided in the request exceeds the limit, if so, return with an error

Since the request has a payload (spans), I think it's just way simpler to force the caller to limit the size of their request payload.

I've introduced an arbitrary value for rangeStatsBatchSize at 100. Again, not sure if this is excessive or what an appropriate value would be (again, should this also be a cluster setting?).

I'd suggest the same approach as above.

I'm getting an error with invalid span keys, this is what I see from my prints:

I'll need to take a closer look at this

Tentatively, I've mentioned this before, but what do people think about including node ids and regions from this endpoint?

I don't love it. To be honest, I would defer to others on this.

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @j82w and @THardy98)

pkg/server/span_stats_server.go line 209 at r1 (raw file):

		}
		// If we still have some remaining ranges, request range stats for the current batch.
		if len(fullyContainedKeysBatch) > 0 {

I think it would be good to add a flushBatchedContainedKeys function or something

knz · 2023-03-15T22:42:54Z

regarding the key ordering bug. This looks like a legit bug and needs to be investigated.
regarding the pagination limits. I recommend using cluster settings when in doubt. Will make debugging easier when times come (future you will thank past you when he can set the values to 1 on a live cluster)
regarding the API semantics. Please do not discuss and come to a decision casually in your code editor and in review. Please come and talk with the obs inf team. We need to look at big picture here.

dhartunian

Tentatively, I've mentioned this before, but what do people think about including node ids and regions from this endpoint? I understand there is hesitancy to expose non-tenant-level ideas like node ids, but the alternative we're using at the moment to surface table node IDs/regions on the console is replicas and replica_localitieson theranges_no_leasestable (which doesn't actually give us node IDs), and we're already doing the work to gather this information innodeIDsAndRangeCountForSpan`.

I would keep the payloads separate since node-level info is gated via tenant capabilities.

In general, I think this change should be vetted from the perspective of upgrades and mixed-version clusters. I'm not entirely sure what our goals are in terms of keeping all the builtins functioning as expected etc. We should either:

Retain backwards compatible payloads and process using new code when the spans array is nonempty in the request
OR Add new endpoints that do things in a new way, and cut over to using them in a separate version.

Reviewed 4 of 10 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @j82w and @THardy98)

pkg/roachpb/span_stats.proto line 27 at r1 (raw file):

  string node_id = 1 [(gogoproto.customname) = "NodeID"];
  repeated Span spans = 4 [(gogoproto.nullable) = false];
  reserved 2, 3;

consider keeping start_key and end_key here and keeping the handler backwards compatible.

pkg/rpc/auth_tenant.go line 202 at r1 (raw file):

	tenID roachpb.TenantID, args *roachpb.SpanStatsRequest,
) error {
	var err error

nit: don't need this var

THardy98

Added some handling in the RPC to cover mixed-version cases.

From my understanding, there are effectively 2 cases we need to consider:

When the version of the gateway node is < 23.1 and fans out to a node >= 23.1
- in this case, the v23.1 node needs to recognize that the request payload is limited to a single span, populated using the start/end key proto fields. Additionally, the response payload is again limited to a single span populated using the old response fields (range_count, approximate_disk_bytes, total_stats), instead of the new span_to_stats mapping.
When the version of the gateway node is >= 23.1 and fans out to a node < 23.1
- in this case, the v23.1 node needs to ensure that it's passing a backwards-compatible request payload to nodes < 23.1. This is done by passing the old request payload using the start/end key proto fields for each span that lives on the node. Additionally, the v23.1 node needs to recognize that the response payload from the < 23.1 nodes will be for a single span using the old response fields, and aggregate them into the final response accordingly.

Added SpanStatsBatchLimit and RangeStatsBatchLimit cluster settings to roachpb.span_stats.go. Callers of SpanStats are expected to limit the size of their payload based on the SpanStatsBatchLimit cluster setting. I wanted to add them to span_stats_server.go but callers outside the server package are likely to run into import cycles. Adding them to the roachpb package mitigates these import cycles issues.

regarding the key ordering bug. This looks like a legit bug and needs to be investigated.

@knz @zachlite I am able to reproduce this error on this PR reliably creating 200 tables using generate_test_objects under a single database and calling crdb_internal.tenant_span_stats(database's id). I've noticed that I cannot reproduce this on master with the same flow, even when generating more tables (i.e. >200), so it must be something introduced in this PR. Could it be due to conversions between Span and RSpan?

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @dhartunian, @j82w, and @zachlite)

pkg/roachpb/span_stats.proto line 27 at r1 (raw file):

Previously, dhartunian (David Hartunian) wrote…

consider keeping start_key and end_key here and keeping the handler backwards compatible.

I've reinstated all formerly removed proto fields for backwards compatibility.

pkg/rpc/auth_tenant.go line 202 at r1 (raw file):

Previously, dhartunian (David Hartunian) wrote…

nit: don't need this var

Removed.

pkg/server/span_stats_server.go line 209 at r1 (raw file):

Previously, zachlite wrote…

I think it would be good to add a flushBatchedContainedKeys function or something

Done.

Santamaura · 2023-03-20T15:58:28Z

@THardy98 I had a general question: is the intent to get this in for 23.1 or are we targeting post 23.1?

THardy98 · 2023-03-20T16:38:24Z

@THardy98 I had a general question: is the intent to get this in for 23.1 or are we targeting post 23.1?

I would like to get this in for 23.1 but will be tight.

THardy98 · 2023-03-24T23:58:48Z

RFAL.

I am still apprehensive with the key ordering bug mentioned above and in the PR desc.

Extends: cockroachdb#96223 This PR extends the implementation of our SpanStats RPC endpoint to fetch stats for multiple spans at once. By extending the endpoint, we amortize the cost of the RPC's node fanout across all requested spans, whereas previously, we were issuing a fanout per span requested. Additionally, this change batches KV layer requests for ranges fully contained by the span, instead of issuing a request per fully contained range. Release note: None https://cockroachlabs.atlassian.net/browse/DOC-1355 #Informs: cockroachdb#33316 #Epic: CRDB-8035

THardy98 · 2023-04-12T19:53:16Z

Closed in favour of: #101378

THardy98 requested review from zachlite, j82w and a team March 13, 2023 13:56

THardy98 requested a review from a team as a code owner March 13, 2023 13:56

THardy98 requested a review from a team March 13, 2023 13:56

THardy98 requested review from a team as code owners March 13, 2023 13:56

THardy98 requested a review from a team March 13, 2023 13:56

THardy98 requested review from a team and removed request for a team March 13, 2023 13:56

zachlite reviewed Mar 15, 2023

View reviewed changes

dhartunian reviewed Mar 16, 2023

View reviewed changes

THardy98 force-pushed the extend_span_stats branch from 09a0df1 to 5a79465 Compare March 18, 2023 23:32

THardy98 commented Mar 19, 2023

View reviewed changes

THardy98 force-pushed the extend_span_stats branch from 5a79465 to 8fa20cb Compare March 19, 2023 00:30

THardy98 requested review from dhartunian and zachlite March 20, 2023 12:49

THardy98 force-pushed the extend_span_stats branch from 8fa20cb to 07e2e3a Compare March 22, 2023 17:34

THardy98 requested a review from a team as a code owner March 22, 2023 17:34

THardy98 force-pushed the extend_span_stats branch 16 times, most recently from 14d281e to a7ff80c Compare March 24, 2023 23:56

THardy98 force-pushed the extend_span_stats branch from a7ff80c to 7657a73 Compare March 27, 2023 13:53

THardy98 force-pushed the extend_span_stats branch from 7657a73 to f0c2ffb Compare March 27, 2023 13:58

THardy98 closed this Apr 12, 2023

zachlite mentioned this pull request Jun 27, 2023

server: allow an arbitrary number of spans in a span stats request #105638

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

extend span stats #98490

extend span stats #98490

THardy98 commented Mar 13, 2023 •

edited

Loading

blathers-crl bot commented Mar 13, 2023

cockroach-teamcity commented Mar 13, 2023

zachlite left a comment

knz commented Mar 15, 2023

dhartunian left a comment

THardy98 left a comment •

edited

Loading

Santamaura commented Mar 20, 2023

THardy98 commented Mar 20, 2023

THardy98 commented Mar 24, 2023 •

edited

Loading

THardy98 commented Apr 12, 2023

extend span stats #98490

extend span stats #98490

Conversation

THardy98 commented Mar 13, 2023 • edited Loading

blathers-crl bot commented Mar 13, 2023

cockroach-teamcity commented Mar 13, 2023

zachlite left a comment

Choose a reason for hiding this comment

knz commented Mar 15, 2023

dhartunian left a comment

Choose a reason for hiding this comment

THardy98 left a comment • edited Loading

Choose a reason for hiding this comment

Santamaura commented Mar 20, 2023

THardy98 commented Mar 20, 2023

THardy98 commented Mar 24, 2023 • edited Loading

THardy98 commented Apr 12, 2023

THardy98 commented Mar 13, 2023 •

edited

Loading

THardy98 left a comment •

edited

Loading

THardy98 commented Mar 24, 2023 •

edited

Loading