Skip to content

Commit

Permalink
kv: circuit-break requests to unavailable replicas
Browse files Browse the repository at this point in the history
Fixes cockroachdb#33007.
Closes cockroachdb#61311.

This PR uses the circuit breaker package introduced in cockroachdb#73641 to address
issue cockroachdb#33007: When a replica becomes unavailable, it should eagerly
refuse traffic that it believes would simply hang.

Concretely, every request to the Replica gets a cancelable context that
is sensitive to the circuit breaker tripping (relying on context
cancellation makes sure we don't end up with a second channel that needs
to be plumbed to everyone and their dog; Go makes it really difficult to
join two cancellation chains); if the breaker is tripped when the
request arrives, it is refused right away. In either case, the outgoing
error is augmented to carry information about the tripped breaker. This
isn't 100% trivial, since retreating from in-flight replication
typically causes an `AmbiguousResultError`, and while we could avoid it
in some cases we can't in all. A similar problem occurs when the
canceled request was waiting on a lease, in which case the result is a
NotLeaseholderError.

For any request that made it in, if it enters replication but does not
manage to succeed within the timeout specified by the
`kv.replica_circuit_breaker.slow_replication_threshold` cluster setting,
the breaker is tripped, cancelling all inflight and future requests
until the breaker heals. Perhaps surprisingly, the existing "slowness"
detection (the informational "have been waiting ... for proposal"
warning in `executeWriteBatch`) was moved deeper into replication
(`refreshProposalsLocked`), where it now trips the breaker.  This has
the advantage of providing a unified path for lease requests (which
don't go through `executeWriteBatch`) and pipelined writes (which return
before waiting on the inflight replication process). To make this work,
we need to pick a little fight with how leases are (not) refreshed
(cockroachdb#74711) and we need to remember the ticks at which a proposal was first
inserted (despite potential reproposals).

A tripped breaker deploys a background probe which sends a
`ProbeRequest` (introduced in cockroachdb#72972) to determine when to heal; this is
roughly the case whenever replicating a command through Raft is possible
for the Replica, either by appending to the log as the Raft leader, or
by forwarding to the Raft leader. A tiny bit of special casing allows
requests made by the probe to bypass the breaker.

As of this PR, the cluster setting defaults to zero (disabling the
entire mechanism) until some necessary follow-up items have been
addressed (see cockroachdb#74705). For example, the breaker-sensitive context
cancelation is a toy implementation that comes with too much of a
performance overhead (one extra goroutine per request); cockroachdb#74707 will
address that.  Other things not done here that we certainly want in the
22.1 release are UI work (cockroachdb#74713) and metrics (cockroachdb#74505).

The release note is deferred to cockroachdb#74705 (where breakers are turned on).

Release note: None
  • Loading branch information
tbg committed Jan 13, 2022
1 parent 8d22176 commit e86f955
Show file tree
Hide file tree
Showing 20 changed files with 956 additions and 78 deletions.
1 change: 1 addition & 0 deletions docs/generated/settings/settings.html
Original file line number Diff line number Diff line change
Expand Up @@ -40,6 +40,7 @@
<tr><td><code>kv.range_split.by_load_enabled</code></td><td>boolean</td><td><code>true</code></td><td>allow automatic splits of ranges based on where load is concentrated</td></tr>
<tr><td><code>kv.range_split.load_qps_threshold</code></td><td>integer</td><td><code>2500</code></td><td>the QPS over which, the range becomes a candidate for load based splitting</td></tr>
<tr><td><code>kv.rangefeed.enabled</code></td><td>boolean</td><td><code>false</code></td><td>if set, rangefeed registration is enabled</td></tr>
<tr><td><code>kv.replica_circuit_breaker.slow_replication_threshold</code></td><td>duration</td><td><code>0s</code></td><td>duration after which slow proposals trip the per-Replica circuit breaker (zero duration disables breakers)</td></tr>
<tr><td><code>kv.replication_reports.interval</code></td><td>duration</td><td><code>1m0s</code></td><td>the frequency for generating the replication_constraint_stats, replication_stats_report and replication_critical_localities reports (set to 0 to disable)</td></tr>
<tr><td><code>kv.snapshot_rebalance.max_rate</code></td><td>byte size</td><td><code>32 MiB</code></td><td>the rate limit (bytes/sec) to use for rebalance and upreplication snapshots</td></tr>
<tr><td><code>kv.snapshot_recovery.max_rate</code></td><td>byte size</td><td><code>32 MiB</code></td><td>the rate limit (bytes/sec) to use for recovery snapshots</td></tr>
Expand Down
4 changes: 4 additions & 0 deletions pkg/kv/kvserver/BUILD.bazel
Original file line number Diff line number Diff line change
Expand Up @@ -34,6 +34,7 @@ go_library(
"replica_application_state_machine.go",
"replica_backpressure.go",
"replica_batch_updates.go",
"replica_circuit_breaker.go",
"replica_closedts.go",
"replica_command.go",
"replica_consistency.go",
Expand Down Expand Up @@ -162,6 +163,7 @@ go_library(
"//pkg/util",
"//pkg/util/admission",
"//pkg/util/bufalloc",
"//pkg/util/circuit",
"//pkg/util/contextutil",
"//pkg/util/ctxgroup",
"//pkg/util/encoding",
Expand Down Expand Up @@ -221,6 +223,7 @@ go_test(
"client_rangefeed_test.go",
"client_relocate_range_test.go",
"client_replica_backpressure_test.go",
"client_replica_circuit_breaker_test.go",
"client_replica_gc_test.go",
"client_replica_test.go",
"client_spanconfigs_test.go",
Expand Down Expand Up @@ -374,6 +377,7 @@ go_test(
"//pkg/ts/tspb",
"//pkg/util",
"//pkg/util/caller",
"//pkg/util/circuit",
"//pkg/util/contextutil",
"//pkg/util/ctxgroup",
"//pkg/util/encoding",
Expand Down
Loading

0 comments on commit e86f955

Please sign in to comment.