136258: kvserver: add TestFlowControlSendQueueRangeSplitMerge test r=sumeerbhola a=kvoli

Add a new rac2 flow control integration test,
`TestFlowControlSendQueueRangeSplitMerge`. This test takes the following steps:

```sql
-- We will exhaust the tokens across all streams while admission is blocked on
-- n3, using a single 4 MiB (deduction, the write itself is small) write. Then,
-- we will write a 1 MiB put to the range, split it, write a 1 MiB put to the
-- LHS range, merge the ranges, and write a 1 MiB put to the merged range. We
-- expect that at each stage where a send queue develops n1->s3, the send queue
-- will be flushed by the range merge and range split operations.
```

Note that the RHS is not written to post-split, pre-merge. See the relevant
comments; this will be resolved via #136649, or some variation thereof, which
enforces timely replication on subsume requests.

Part of: #132614

Release note: None

136648: rpc: reuse gRPC streams across unary BatchRequest RPCs r=tbg a=nvanbenschoten

Closes #136572.

This commit introduces pooling of gRPC streams that are used to send requests
and receive corresponding responses in a manner that mimics unary RPC
invocation. Pooling these streams allows for reuse of gRPC resources across
calls, as opposed to native unary RPCs, which create a new stream and throw it
away for each request (see grpc.invoke).

The new pooling mechanism is used for the Internal/Batch RPC method, which is
the dominant RPC method used to communicate between the KV client and KV
server. A new Internal/BatchStream RPC method is introduced to allow a client
to send and receive BatchRequest/BatchResponse pairs over a long-lived, pooled
stream. A pool of these streams is then maintained alongside each gRPC
connection. The pool grows and shrinks dynamically based on demand.

The change demonstrates a large performance improvement in both
microbenchmarks and full system benchmarks, which reveals just how expensive
the gRPC stream setup on each unary RPC is.
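To make the borrow/send/recv/return life cycle concrete, here is a minimal,
self-contained Go sketch of such a stream pool. Every name in it
(`batchStream`, `streamPool`, `sendBatch`, `echoStream`) is an illustrative
stand-in rather than the actual implementation, and unlike the real pool it
only grows on demand; shrinking idle streams over time is omitted.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// batchStream abstracts the client side of a long-lived bidirectional
// stream used unary-style: one request out, one response back per call.
type batchStream interface {
	Send(req string) error // stands in for Send(*BatchRequest)
	Recv() (string, error) // stands in for Recv() (*BatchResponse, error)
}

// streamPool keeps idle streams for reuse instead of opening (and
// discarding) a new gRPC stream for every unary request.
type streamPool struct {
	mu      sync.Mutex
	idle    []batchStream
	newFunc func(context.Context) (batchStream, error)
}

func (p *streamPool) get(ctx context.Context) (batchStream, error) {
	p.mu.Lock()
	if n := len(p.idle); n > 0 {
		s := p.idle[n-1]
		p.idle = p.idle[:n-1]
		p.mu.Unlock()
		return s, nil
	}
	p.mu.Unlock()
	// Pool is empty: open a new stream; the pool grows on demand.
	return p.newFunc(ctx)
}

func (p *streamPool) put(s batchStream) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.idle = append(p.idle, s)
}

// sendBatch mimics a unary RPC over a pooled stream: borrow, send,
// receive, and return the stream to the pool only on success. A stream
// that saw an error is simply dropped rather than reused.
func (p *streamPool) sendBatch(ctx context.Context, req string) (string, error) {
	s, err := p.get(ctx)
	if err != nil {
		return "", err
	}
	if err := s.Send(req); err != nil {
		return "", err
	}
	resp, err := s.Recv()
	if err != nil {
		return "", err
	}
	p.put(s)
	return resp, nil
}

// echoStream is a trivial in-memory stream for demonstration.
type echoStream struct{}

func (echoStream) Send(string) error     { return nil }
func (echoStream) Recv() (string, error) { return "BatchResponse", nil }

func main() {
	p := &streamPool{newFunc: func(context.Context) (batchStream, error) {
		return echoStream{}, nil
	}}
	resp, err := p.sendBatch(context.Background(), "BatchRequest")
	if err != nil {
		panic(err)
	}
	fmt.Println(resp) // the stream now sits idle in the pool for the next call
}
```

The key design point, as described above, is that the per-call cost drops to a
pool lookup plus one Send/Recv pair, instead of stream setup and teardown on
every request.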
### Microbenchmarks:

```
name                                            old time/op    new time/op    delta
Sysbench/KV/1node_remote/oltp_point_select-10   45.9µs ± 1%    28.8µs ± 2%    -37.31%  (p=0.000 n=9+8)
Sysbench/KV/1node_remote/oltp_read_only-10       958µs ± 6%     709µs ± 1%    -26.00%  (p=0.000 n=9+9)
Sysbench/SQL/1node_remote/oltp_read_only-10     3.65ms ± 6%    2.81ms ± 8%    -23.06%  (p=0.000 n=8+9)
Sysbench/KV/1node_remote/oltp_read_write-10     1.77ms ± 5%    1.38ms ± 1%    -22.09%  (p=0.000 n=10+8)
Sysbench/KV/1node_remote/oltp_write_only-10      688µs ± 4%     557µs ± 1%    -19.11%  (p=0.000 n=9+9)
Sysbench/SQL/1node_remote/oltp_point_select-10   181µs ± 8%     159µs ± 2%    -12.10%  (p=0.000 n=8+9)
Sysbench/SQL/1node_remote/oltp_write_only-10    2.16ms ± 4%    1.92ms ± 3%    -11.08%  (p=0.000 n=9+9)
Sysbench/SQL/1node_remote/oltp_read_write-10    5.89ms ± 2%    5.36ms ± 1%     -8.89%  (p=0.000 n=9+9)

name                                            old alloc/op   new alloc/op   delta
Sysbench/KV/1node_remote/oltp_point_select-10   16.3kB ± 0%     6.4kB ± 0%    -60.70%  (p=0.000 n=8+10)
Sysbench/KV/1node_remote/oltp_write_only-10      359kB ± 1%     256kB ± 1%    -28.92%  (p=0.000 n=10+10)
Sysbench/SQL/1node_remote/oltp_write_only-10     748kB ± 0%     548kB ± 1%    -26.78%  (p=0.000 n=8+10)
Sysbench/SQL/1node_remote/oltp_point_select-10  40.9kB ± 0%    30.8kB ± 0%    -24.74%  (p=0.000 n=9+10)
Sysbench/KV/1node_remote/oltp_read_write-10     1.11MB ± 1%    0.88MB ± 1%    -21.17%  (p=0.000 n=9+10)
Sysbench/SQL/1node_remote/oltp_read_write-10    2.00MB ± 0%    1.65MB ± 0%    -17.60%  (p=0.000 n=9+10)
Sysbench/KV/1node_remote/oltp_read_only-10       790kB ± 0%     655kB ± 0%    -17.11%  (p=0.000 n=9+9)
Sysbench/SQL/1node_remote/oltp_read_only-10     1.33MB ± 0%    1.19MB ± 0%    -10.97%  (p=0.000 n=10+9)

name                                            old allocs/op  new allocs/op  delta
Sysbench/KV/1node_remote/oltp_point_select-10      210 ± 0%        61 ± 0%    -70.95%  (p=0.000 n=10+10)
Sysbench/KV/1node_remote/oltp_read_only-10       3.98k ± 0%     1.88k ± 0%    -52.68%  (p=0.019 n=6+8)
Sysbench/KV/1node_remote/oltp_read_write-10      7.10k ± 0%     3.47k ± 0%    -51.07%  (p=0.000 n=10+9)
Sysbench/KV/1node_remote/oltp_write_only-10      3.10k ± 0%     1.58k ± 0%    -48.89%  (p=0.000 n=10+9)
Sysbench/SQL/1node_remote/oltp_write_only-10     6.73k ± 0%     3.82k ± 0%    -43.30%  (p=0.000 n=10+10)
Sysbench/SQL/1node_remote/oltp_read_write-10     14.4k ± 0%      9.2k ± 0%    -36.29%  (p=0.000 n=9+10)
Sysbench/SQL/1node_remote/oltp_point_select-10     429 ± 0%       277 ± 0%    -35.46%  (p=0.000 n=9+10)
Sysbench/SQL/1node_remote/oltp_read_only-10      7.52k ± 0%     5.37k ± 0%    -28.60%  (p=0.000 n=10+10)
```

### Roachtests:

```
name                                            old queries/s  new queries/s  delta
sysbench/oltp_read_write/nodes=3/cpu=8/conc=64      17.6k ± 7%     19.2k ± 2%   +9.22%  (p=0.008 n=5+5)

name                                            old avg_ms/op  new avg_ms/op  delta
sysbench/oltp_read_write/nodes=3/cpu=8/conc=64       72.9 ± 7%      66.6 ± 2%   -8.57%  (p=0.008 n=5+5)

name                                            old p95_ms/op  new p95_ms/op  delta
sysbench/oltp_read_write/nodes=3/cpu=8/conc=64        116 ± 8%       106 ± 3%   -9.02%  (p=0.016 n=5+5)
```

### Manual tests:

Running in a similar configuration to
`sysbench/oltp_read_write/nodes=3/cpu=8/conc=64`, but with benchmarking-related
cluster settings (before and after) to reduce variance.

```
-- Before
Mean: 19771.03
Median: 19714.22
Standard Deviation: 282.96
Coefficient of variance: .0143

-- After
Mean: 21908.23
Median: 21923.03
Standard Deviation: 200.88
Coefficient of variance: .0091
```

----

Release note (performance improvement): gRPC streams are now pooled across
unary intra-cluster RPCs, allowing for reuse of gRPC resources to reduce the
cost of remote key-value layer access. This pooling can be disabled using the
`rpc.batch_stream_pool.enabled` cluster setting.
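A note on the "Coefficient of variance" figures in the manual tests above:
they are the standard deviation divided by the mean, so the lower post-change
value indicates less run-to-run variance relative to throughput. A quick
sanity check of the arithmetic (the code is purely illustrative):

```go
package main

import "fmt"

func main() {
	// Coefficient of variation = standard deviation / mean.
	fmt.Printf("before: %.4f\n", 282.96/19771.03) // ≈ 0.0143, matching the report
	fmt.Printf("after:  %.4f\n", 200.88/21908.23) // ≈ 0.0092 (reported as .0091, truncated)
}
```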
137019: roachtest: increase the token return time with disk bandwidth limit r=kvoli a=andrewbaptist

Previously, the test would wait 10m for tokens to be returned. Without the
disk bandwidth limit set, they typically return almost immediately, but with a
limit they can take ~30m to return in some cases, even after the workload is
stopped and the system is idle. This change fixes some of the
perturbation/metamorphic/* tests that are hitting this slow token return.

Epic: none
Fixes: #136982
Fixes: #136553
Informs: #137017

Release note: None

137044: kvserver: deflake TestConsistencyQueueRecomputeStats r=miraradeva a=miraradeva

The test manually adds voters and expects a leaseholder to be established
before forcing a stats re-computation (which runs on the leaseholder). With
leader leases, it might take an extra election timeout for the leader lease to
be established after adding the new voters, so the test flaked if the
re-computation ran (and failed) before the leaseholder was ready.

This commit makes the test retry the re-computation until a leaseholder is
established (see the retry sketch below).

Fixes: #136596

Release note: None

137059: catalog/lease: deflake TestDescriptorRefreshOnRetry r=rafiss a=rafiss

The test was flaky because the background thread that refreshes leases could
run and cause the acquisition counts to be off.

fixes #137033

Release note: None

137099: kvcoord: deflake TestDistSenderReplicaStall r=miraradeva a=miraradeva

The test runs with expiration leases, but when fortification is enabled the
lease doesn't move off the stalled replica because the deadlocked leader
doesn't step down while it's receiving store liveness support. This commit
ensures fortification is off when expiration leases are used for the test.

Fixes: #136564

Release note: None

137118: crosscluster/logical: update udf test to expect at-least-once r=dt a=dt

We don't provide exactly-once delivery, so we don't want to test for it.

Release note: none.
Epic: none.
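The retry fix for TestConsistencyQueueRecomputeStats follows the usual
retry-until-success deflaking pattern. Below is a self-contained Go sketch of
it: `retrySoon` is a simplified stand-in for CockroachDB's
`testutils.SucceedsSoon` helper, and the simulated leaseholder delay is
hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// retrySoon retries fn until it succeeds or the deadline passes,
// mimicking testutils.SucceedsSoon.
func retrySoon(d time.Duration, fn func() error) error {
	deadline := time.Now().Add(d)
	var err error
	for {
		if err = fn(); err == nil {
			return nil
		}
		if time.Now().After(deadline) {
			return err
		}
		time.Sleep(10 * time.Millisecond)
	}
}

func main() {
	// Simulate the leaseholder becoming ready only after a delay, as when
	// a leader lease needs an extra election timeout to be established.
	attempts := 0
	err := retrySoon(time.Second, func() error {
		attempts++
		if attempts < 3 {
			return errors.New("no leaseholder yet; stats re-computation failed")
		}
		return nil // re-computation succeeds once the lease is established
	})
	fmt.Println(err, "after", attempts, "attempts")
}
```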
Co-authored-by: Austen McClernon <[email protected]>
Co-authored-by: Nathan VanBenschoten <[email protected]>
Co-authored-by: Tobias Grieger <[email protected]>
Co-authored-by: Andrew Baptist <[email protected]>
Co-authored-by: Mira Radeva <[email protected]>
Co-authored-by: Rafi Shamim <[email protected]>
Co-authored-by: David Taylor <[email protected]>