Skip to content

Commit

Permalink
kvserver: allow circuit-breaker to serve reads
Browse files Browse the repository at this point in the history
This commit revamps an earlier implementation (cockroachdb#71806) of per-Replica
circuit breakers (cockroachdb#33007). The earlier implementation relied on context
cancellation and coarsely failed all requests addressing the Replica
when the breaker was tripped.

This had two downsides: First, there was a (small) performance overhead
for implementing the cancellation that was paid even in the common case
of a healthy Replica. Second, and more importantly, the coarseness meant
that we'd potentially fail many requests that would otherwise succeed,
and in particular follower reads.

@nvanbenschoten suggested in cockroachdb#74799 that latching could be extended with
the concept of "poisoning" and that this could result in fine-grained
circuit breaker behavior where only requests that are truly affected by
unavailability (at the replication layer) would be rejected.

This commit implements that strategy:

A request's latches are poisoned if its completion is predicated on the
replication layer being healthy.  In other words, when the breaker
trips, all inflight proposals have their latches poisoned and new
proposals are failed fast. However, and this is the big difference,
reads can potentially still be accepted in either of two scenarios:

- a valid follower read remains valid regardless of the circuit breaker
  status, and also regardless of inflight proposals (follower reads
  don't observe latches).
- a read that can be served under the current lease and which does
  not conflict with any of the stuck proposals in the replication
  layer (= poisoned latches) can also be served.

In short, reads only fail fast if they encounter a poisoned latch or
need to request a lease. (If they opted out of fail-fast behavior,
they behave as today).

Latch poisoning is added as a first-class concept in the `concurrency`
package, and a structured error `PoisonError` is introduced. This error
in particular contains the span and timestamp of the poisoned latch that
prompted the fail-fast.

Lease proposals now always use `poison.Policy_Wait`, leaving the
fail-fast behavior to the caller. This simplifies things since multiple
callers with their own `poison.Policy` can multiplex onto a single
inflight lease proposal.

Addresses cockroachdb#74799.

Release note: None
Release justification: 22.1 project work
  • Loading branch information
tbg committed Mar 3, 2022
1 parent bd5def5 commit d2041fe
Show file tree
Hide file tree
Showing 37 changed files with 1,103 additions and 794 deletions.
1 change: 1 addition & 0 deletions pkg/gen/protobuf.bzl
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,7 @@ PROTOBUF_SRCS = [
"//pkg/kv/kvnemesis:kvnemesis_go_proto",
"//pkg/kv/kvserver/closedts/ctpb:ctpb_go_proto",
"//pkg/kv/kvserver/concurrency/lock:lock_go_proto",
"//pkg/kv/kvserver/concurrency/poison:poison_go_proto",
"//pkg/kv/kvserver/kvserverpb:kvserverpb_go_proto",
"//pkg/kv/kvserver/liveness/livenesspb:livenesspb_go_proto",
"//pkg/kv/kvserver/loqrecovery/loqrecoverypb:loqrecoverypb_go_proto",
Expand Down
3 changes: 1 addition & 2 deletions pkg/kv/kvserver/BUILD.bazel
Original file line number Diff line number Diff line change
Expand Up @@ -37,7 +37,6 @@ go_library(
"replica_backpressure.go",
"replica_batch_updates.go",
"replica_circuit_breaker.go",
"replica_circuit_breaker_cancelstorage.go",
"replica_closedts.go",
"replica_command.go",
"replica_consistency.go",
Expand Down Expand Up @@ -124,6 +123,7 @@ go_library(
"//pkg/kv/kvserver/closedts/sidetransport",
"//pkg/kv/kvserver/closedts/tracker",
"//pkg/kv/kvserver/concurrency",
"//pkg/kv/kvserver/concurrency/poison",
"//pkg/kv/kvserver/constraint",
"//pkg/kv/kvserver/gc",
"//pkg/kv/kvserver/idalloc",
Expand Down Expand Up @@ -227,7 +227,6 @@ go_test(
"client_rangefeed_test.go",
"client_relocate_range_test.go",
"client_replica_backpressure_test.go",
"client_replica_circuit_breaker_bench_test.go",
"client_replica_circuit_breaker_test.go",
"client_replica_gc_test.go",
"client_replica_test.go",
Expand Down
124 changes: 0 additions & 124 deletions pkg/kv/kvserver/client_replica_circuit_breaker_bench_test.go

This file was deleted.

Loading

0 comments on commit d2041fe

Please sign in to comment.