-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kvserver: use non-cancelable contexts in raft application #75656
Comments
Yeah, in cockroach/pkg/kv/kvserver/replica_raft.go Lines 518 to 526 in 852a80c
We should either do something similar for command application, or derive a non-cancellable context. |
We've seen in the events leading up to cockroachdb#75448 that a build-up of consistency check computations on a node can severely impact node performance. This commit attempts to address the main source of that, while re-working the code for easier maintainability. The way the consistency checker works is by replicating a command through Raft that, on each Replica, triggers an async checksum computations the results of which the caller collects via `CollectChecksum` requests addressed to each `Replica`. If for any reason, the caller does *not* wait to collect the checksums but instead moves on to run another consistency check (perhaps on another Range), these inflight computations can build up over time. This was the main issue in cockroachdb#75448; we were accidentally canceling the context on the leaseholder "right away", failing the consistency check (but leaving it running on all other replicas), and moving on to the next Range. As a result, some (but with spread out leaseholders, ultimately all) Replicas ended up with dozens of consistency check computations, starving I/O and CPU. We "addressed" this by avoiding this errant ctx cancellation (cockroachdb#75448 but longer-term cockroachdb#75656), but this isn't a holistic fix yet. In this commit, we make three main changes: - give the inflight consistency check computations a clean API, which makes it much easier to understand "how it works". - when returning from CollectChecksum (either on success or error, notably including context cancellation), cancel the corresponding consistency check. This solves the problem, *assuming* that CollectChecksum is reliably issued to each Replica. - reliably issue CollectChecksum to each Replica on which a computation may have been triggered. When the caller's context is canceled, still do the call with a one-second-timeout one-off Context which should be good enough to make it to the Replica and short-circuit the call. Release note: None
Wrote a bit of code here, putting this here so I remember: https://github.com/cockroachdb/cockroach/compare/master...tbg:no-downstream-ctx?expand=1 |
I updated the above prototype a little bit (removing the ctx altogether), but I don't think it's what we want for stability. Instead, I'll send a surgical PR that makes sure we don't set the caller's ctx into the proposal. |
Previously, for local proposals, we were threading the caller's context down into Raft. This has caused a few bugs related to context cancellation sensitivity - the execution of a raft command should not be impacted by whether or not the client's context has already finished. This commit derives the proposal context from the Replica's annotated context, using a tracing span derived from the client context. That way, the derived context is not cancelable, but contains the same tracing Span (thus providing similar utility) as before. Notably, the trace span should also already contain the log tags present in the client's context. The tracing has coverage through (at least) a few tests that make assertions about trace events from below-raft showing up in a trace collected by the client, which I verified by omitting the Span: ``` TestRaftSSTableSideloadingProposal TestReplicaAbandonProposal TestReplicaRetryRaftProposal TestReplicaBurstPendingCommandsAndRepropose TestReplicaReproposalWithNewLeaseIndexError TestFailureToProcessCommandClearsLocalResult TestDBAddSSTable ``` See cockroachdb#75656. Release note: None
Previously, for local proposals, we were threading the caller's context down into Raft. This has caused a few bugs related to context cancellation sensitivity - the execution of a raft command should not be impacted by whether or not the client's context has already finished. This commit derives the proposal context from the Replica's annotated context, using a tracing span derived from the client context. That way, the derived context is not cancelable, but contains the same tracing Span (thus providing similar utility) as before. Notably, the trace span should also already contain the log tags present in the client's context. The tracing has coverage through (at least) a few tests that make assertions about trace events from below-raft showing up in a trace collected by the client, which I verified by omitting the Span: ``` TestRaftSSTableSideloadingProposal TestReplicaAbandonProposal TestReplicaRetryRaftProposal TestReplicaBurstPendingCommandsAndRepropose TestReplicaReproposalWithNewLeaseIndexError TestFailureToProcessCommandClearsLocalResult TestDBAddSSTable ``` See #75656. Release note: None
Describe the problem
As of d064059, raft application on the proposal happens under the proposer's context, which may well be canceled before or mid-way through the application. However, raft application should not care about the cancellation, and we just diagnosed a bug where it did.
Also, we had problems with cancellation on the raft goroutine in the past and check against it, but evidently not well enough yet.
Expected behavior
The raft apply loop should use a ctx that is not cancelable. It makes sense to derive certain parts from the proposer - such as a trace span - but not the cancellation policy. That is, in
cockroach/pkg/kv/kvserver/replica_application_decoder.go
Lines 139 to 171 in 852a80c
we should not use a child context of
cmd.proposal.ctx
but create one derived fromctx (the worker ctx)
orcontext.Background()
.Additional data / screenshots
Should also write a test that uses a
TestingPostApplyHook
to ensure that the ctx is not cancelable, or at least that the cancellation doesn't carry over to the child. (Or more simply, that a value stashed in the proposer context is not visible at apply time).@erikgrinaker didn't we put in some invariant check about this sometime recently?
Jira issue: CRDB-12769
The text was updated successfully, but these errors were encountered: