Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
kvserver: prevent build-up of abandoned consistency checks
We've seen in the events leading up to cockroachdb#75448 that a build-up of consistency check computations on a node can severely impact node performance. This commit attempts to address the main source of that, while re-working the code for easier maintainability. The way the consistency checker works is by replicating a command through Raft that, on each Replica, triggers an async checksum computations the results of which the caller collects via `CollectChecksum` requests addressed to each `Replica`. If for any reason, the caller does *not* wait to collect the checksums but instead moves on to run another consistency check (perhaps on another Range), these inflight computations can build up over time. This was the main issue in cockroachdb#75448; we were accidentally canceling the context on the leaseholder "right away", failing the consistency check (but leaving it running on all other replicas), and moving on to the next Range. As a result, some (but with spread out leaseholders, ultimately all) Replicas ended up with dozens of consistency check computations, starving I/O and CPU. We "addressed" this by avoiding this errant ctx cancellation (cockroachdb#75448 but longer-term cockroachdb#75656), but this isn't a holistic fix yet. In this commit, we make three main changes: - give the inflight consistency check computations a clean API, which makes it much easier to understand "how it works". - when returning from CollectChecksum (either on success or error, notably including context cancellation), cancel the corresponding consistency check. This solves the problem, *assuming* that CollectChecksum is reliably issued to each Replica. - reliably issue CollectChecksum to each Replica on which a computation may have been triggered. When the caller's context is canceled, still do the call with a one-second-timeout one-off Context which should be good enough to make it to the Replica and short-circuit the call. Release note: None
- Loading branch information