Merge #86883

86883: kvserver: cancel consistency checks more reliably r=tbg a=pavelkalinnikov

This PR increases the chance of propagating the cancelation signal to replicas,
to prevent them from running abandoned consistency check tasks. Specifically:

- The computation is aborted if the collection request is canceled.
- The computation is not started if the collection request gave up recently.
- The initiator runs all requests in parallel to reduce asynchrony, and to be
  able to cancel all the requests explicitly, instead of skipping some of them
  (see the sketch below).
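
As an illustration of that last point, here is a minimal Go sketch (not the
actual kvserver code) of running all `CollectChecksum` long polls in parallel
under one shared cancelable context, so the first failure, or the initiator
giving up, aborts the rest. The `replica`, `result`, and `collectChecksum`
names are placeholders:

```go
package sketch

import (
	"context"

	"golang.org/x/sync/errgroup"
)

// replica and result are stand-ins for the real kvserver types.
type replica struct{ id int }
type result struct{ checksum []byte }

// collectChecksum stands in for the CollectChecksum long poll to one replica.
func collectChecksum(ctx context.Context, r replica, checksumID string) (result, error) {
	select {
	case <-ctx.Done(): // the shared context was canceled: give up promptly
		return result{}, ctx.Err()
	default:
		return result{checksum: []byte(checksumID)}, nil
	}
}

// collectAll polls every replica in parallel under one shared cancelable
// context: the first failure (or the initiator giving up) cancels all
// outstanding requests instead of leaving them dangling.
func collectAll(ctx context.Context, replicas []replica, checksumID string) ([]result, error) {
	g, ctx := errgroup.WithContext(ctx) // the first error cancels the shared ctx
	results := make([]result, len(replicas))
	for i, r := range replicas {
		i, r := i, r // capture loop variables
		g.Go(func() error {
			res, err := collectChecksum(ctx, r, checksumID)
			if err != nil {
				return err
			}
			results[i] = res
			return nil
		})
	}
	if err := g.Wait(); err != nil {
		return nil, err
	}
	return results, nil
}
```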

---
### Background

Consistency checks are initiated by the `ComputeChecksum` command in the Raft
log, and run until completion under a background context. The result is
collected by the initiator via the `CollectChecksum` long poll. The task is
synchronized with the collection handler via a map of `replicaChecksum` structs.
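
As a rough picture of that synchronization (a hypothetical shape, not the real
`replicaChecksum` struct), the handler and the task can rendezvous on a
per-checksum entry keyed by checksum ID:

```go
package sketch

import (
	"context"
	"sync"
)

// checksumEntry is a simplified stand-in for kvserver's replicaChecksum: a
// rendezvous point between the CollectChecksum handler and the computation
// task, which may arrive in either order.
type checksumEntry struct {
	started chan struct{} // closed once the computation task has begun
	result  chan []byte   // carries the checksum; closed empty on failure
}

type checksumTracker struct {
	mu      sync.Mutex
	entries map[string]*checksumEntry
}

// entry returns the rendezvous point for a checksum ID, creating it if
// whichever side got here first hasn't registered it yet.
func (t *checksumTracker) entry(id string) *checksumEntry {
	t.mu.Lock()
	defer t.mu.Unlock()
	e, ok := t.entries[id]
	if !ok {
		e = &checksumEntry{
			started: make(chan struct{}),
			result:  make(chan []byte, 1),
		}
		t.entries[id] = e
	}
	return e
}

// collect is the handler-side wait: it blocks until the task posts a result
// or the caller's context is canceled (long-poll timeout, caller crash).
func (t *checksumTracker) collect(ctx context.Context, id string) ([]byte, error) {
	e := t.entry(id)
	select {
	case sum := <-e.result:
		return sum, nil
	case <-ctx.Done():
		return nil, ctx.Err()
	}
}
```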

Currently, the replica initiating the consistency check sends a collection
request to itself first, and only then to the other replicas in parallel. This
results in substantial asynchrony on the receiving replica between the request
handler and the computation task. The current workaround is to keep the
checksum computation results in memory for `replicaChecksumGCInterval`, so that
they can be returned to late-arriving requests. However, there is **no symmetry**
here: if the computation starts late instead, it doesn't learn about a
previously failed request.

The reason the initiator blocks on its local checksum first is that it computes
the "master checksum", which is then attached to all other requests. However,
this field is only used by the receiving end to log an inconsistency error. The
actual killing of an inconsistent replica happens in the second phase of the
protocol, after the initiating replica commits another Raft message with the
`Terminate` field populated. So, there is **no strong reason to keep this
blocking behaviour**.

When the `CollectChecksum` handler exits due to a canceled context (for example,
the request timed out, or the remote caller crashed), the background task
continues to run. If it was not yet running, it may still start in the future.
In either case, the consistency check pool (which has a limited size and
processing rate) spends resources on running dangling checks and rejects useful
ones.
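
One way to express the desired behaviour, sketched here with made-up names and
with locking deliberately elided, is to let the collection handler own a
per-checksum cancel hook that both aborts a running computation and prevents a
late-starting one from running at all:

```go
package sketch

import "context"

// checksumTask is a deliberately simplified (and unsynchronized) model of the
// cancelation hand-off: the entry for a checksum keeps the task's cancel
// function, so a handler that gives up can stop a running computation, and a
// computation that starts late can see that nobody is waiting anymore.
type checksumTask struct {
	cancelTask context.CancelFunc // cancels the computation, if one is running
	abandoned  bool               // the waiter gave up before the task started
}

// waiterGaveUp is called by the CollectChecksum handler when its own context
// is canceled (request timeout, or the remote initiator crashed).
func (c *checksumTask) waiterGaveUp() {
	if c.cancelTask != nil {
		c.cancelTask() // abort the in-flight computation
		return
	}
	c.abandoned = true // make sure a future computation never starts
}

// startComputation is called when the ComputeChecksum command is applied. It
// refuses to start a task whose collection request has already given up.
func (c *checksumTask) startComputation(parent context.Context) (context.Context, bool) {
	if c.abandoned {
		return nil, false
	}
	ctx, cancel := context.WithCancel(parent)
	c.cancelTask = cancel
	return ctx, true
}
```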

If the initiating replica fails to compute its local checksum, it does not send
collection requests (or any cancelation signal) to the other replicas. This is
problematic because the checksum tasks still run on all replicas, which opens
the possibility of accumulating many such dangling checks.

---

Part of #77432

Release justification: performance and stability improvement

Release note (bug fix): A consistency check is now skipped/stopped when its
remote initiator gives up on it. Previously, such checks would still attempt to
run and, due to the limited size of the worker pool, could prevent useful
checks from running. In addition, consistency check requests are now sent in
parallel, and the cancelation signal propagates more reliably.

Co-authored-by: Pavel Kalinnikov <[email protected]>
craig[bot] and pav-kv committed Sep 9, 2022
2 parents 7c01c81 + 6d8ace3 commit 203c078
Showing 8 changed files with 260 additions and 272 deletions.
9 changes: 6 additions & 3 deletions pkg/kv/kvserver/api.proto
@@ -36,15 +36,18 @@ message CollectChecksumRequest {
bytes checksum_id = 3 [(gogoproto.nullable) = false,
(gogoproto.customname) = "ChecksumID",
(gogoproto.customtype) = "github.com/cockroachdb/cockroach/pkg/util/uuid.UUID"];
bytes checksum = 4;
reserved 4;
// If true then the response must include the snapshot of the data from which
// the checksum is computed.
bool with_snapshot = 5;
}

message CollectChecksumResponse {
// The checksum is the sha512 hash of the requested computation. It is empty
// if the computation failed.
bytes checksum = 1;
// snapshot is set if the roachpb.ComputeChecksumRequest had snapshot = true
// and the response checksum is different from the request checksum.
// snapshot is set if the with_snapshot in CollectChecksumRequest is true. For
// example, it can be set by the caller when it has detected an inconsistency.
//
// TODO(tschottdorf): with larger ranges, this is no longer tenable.
// See https://github.com/cockroachdb/cockroach/issues/21128.
9 changes: 5 additions & 4 deletions pkg/kv/kvserver/consistency_queue.go
@@ -66,8 +66,9 @@ const consistencyCheckRateMinWait = 100 * time.Millisecond
// replication factor of 7 could run 7 concurrent checks on every node.
//
// Note that checksum calculations below Raft are not tied to the caller's
// context (especially on followers), and will continue to run even after the
// caller has given up on them, which may cause them to build up.
// context, and may continue to run even after the caller has given up on them,
// which may cause them to build up. Although we do best effort to cancel the
// running task on the receiving end when the incoming request is aborted.
//
// CHECK_STATS checks do not count towards this limit, as they are cheap and the
// DistSender will parallelize them across all ranges (notably when calling
@@ -76,8 +77,8 @@ const consistencyCheckAsyncConcurrency = 7

// consistencyCheckAsyncTimeout is a below-Raft timeout for asynchronous
// consistency check calculations. These are not tied to the caller's context,
// and thus will continue to run even after the caller has given up on them, so
// we give them an upper timeout to prevent them from running forever.
// and thus may continue to run even if the caller has given up on them, so we
// give them an upper timeout to prevent them from running forever.
const consistencyCheckAsyncTimeout = time.Hour

var testingAggressiveConsistencyChecks = envutil.EnvOrDefaultBool("COCKROACH_CONSISTENCY_AGGRESSIVE", false)
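
As an aside, a rough model of the pool these comments describe might look as
follows. This is an illustrative sketch only, not the stopper-based
implementation in kvserver; instantiated as `newBoundedPool(7)` with a
`time.Hour` timeout, it mirrors `consistencyCheckAsyncConcurrency` and
`consistencyCheckAsyncTimeout` above:

```go
package sketch

import (
	"context"
	"errors"
	"time"
)

// boundedPool caps the number of concurrent checksum computations, each under
// an upper timeout; a computation that cannot get a slot immediately is
// rejected rather than queued.
type boundedPool struct {
	slots chan struct{}
}

func newBoundedPool(maxConcurrent int) *boundedPool {
	return &boundedPool{slots: make(chan struct{}, maxConcurrent)}
}

var errPoolFull = errors.New("too many concurrent consistency checks")

// run starts task asynchronously if a slot is free, bounding its lifetime so
// an abandoned check cannot hold a slot forever.
func (p *boundedPool) run(parent context.Context, timeout time.Duration, task func(context.Context)) error {
	select {
	case p.slots <- struct{}{}: // acquire a slot without blocking
	default:
		return errPoolFull
	}
	go func() {
		defer func() { <-p.slots }() // release the slot when the task ends
		ctx, cancel := context.WithTimeout(parent, timeout)
		defer cancel()
		task(ctx)
	}()
	return nil
}
```
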
2 changes: 1 addition & 1 deletion pkg/kv/kvserver/replica.go
@@ -552,7 +552,7 @@ type Replica struct {
lastUpdateTimes lastUpdateTimesMap

// Computed checksum at a snapshot UUID.
checksums map[uuid.UUID]replicaChecksum
checksums map[uuid.UUID]*replicaChecksum

// proposalQuota is the quota pool maintained by the lease holder where
// incoming writes acquire quota from a fixed quota pool before going
8 changes: 7 additions & 1 deletion pkg/kv/kvserver/replica_application_result.go
@@ -20,6 +20,8 @@ import (
"github.com/cockroachdb/cockroach/pkg/storage/enginepb"
"github.com/cockroachdb/cockroach/pkg/util/hlc"
"github.com/cockroachdb/cockroach/pkg/util/log"
"github.com/cockroachdb/cockroach/pkg/util/stop"
"github.com/cockroachdb/errors"
)

// replica_application_*.go files provide concrete implementations of
@@ -334,7 +336,11 @@ func (r *Replica) handleVersionResult(ctx context.Context, version *roachpb.Vers
}

func (r *Replica) handleComputeChecksumResult(ctx context.Context, cc *kvserverpb.ComputeChecksum) {
r.computeChecksumPostApply(ctx, *cc)
err := r.computeChecksumPostApply(ctx, *cc)
// Don't log errors caused by the store quiescing, they are expected.
if err != nil && !errors.Is(err, stop.ErrUnavailable) {
log.Errorf(ctx, "failed to start ComputeChecksum task %s: %v", cc.ChecksumID, err)
}
}

func (r *Replica) handleChangeReplicasResult(