release-20.2: kv: don't leak raft application tracing spans on or after ErrRemoved #54267
Backport 3/3 commits from #54140.
The cherry-picks were all clean.
/cc @cockroachdb/release
Fixes #53677.
This change ensures that we properly finish tracing spans of Raft commands that return ErrRemoved errors in ApplySideEffects. It then ensures that we properly finish tracing spans of Raft commands that follow a command that returns an `ErrRemoved`. Before this, these commands would be abandoned and would never be finished. The effects of this are theoretically even worse than those fixed in the previous commit because these leaked commands could be locally proposed, so we may be abandoning a local proposer indefinitely.
It's not clear that we ever saw an instance of this. Because of the lease requirements, it seems rare for a local proposal to end up in the same CommittedEntries batch as a command that removes a replica, but it doesn't seem impossible, especially if the local proposal was a RequestLease request.
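The behavior described above can be sketched in a few lines of Go. This is a minimal illustration under stated assumptions, not the code from this PR: the `span`, `command`, `applySideEffects`, and `applyBatch` names are hypothetical stand-ins, and `errRemoved` only mimics the role of `ErrRemoved`. The point it shows is that when one command in a batch reports removal, its own span and the spans of every command after it still get finished.

```go
package main

import (
	"errors"
	"fmt"
)

// errRemoved is a stand-in for the ErrRemoved error named in the PR text.
var errRemoved = errors.New("replica removed")

// span is a hypothetical stand-in for a tracing span.
type span struct{ finished bool }

// Finish marks the span as finished.
func (s *span) Finish() { s.finished = true }

// command is a hypothetical stand-in for a committed Raft command.
type command struct {
	id             int
	removesReplica bool
	sp             *span
}

// applySideEffects simulates side-effect application. It returns errRemoved
// when the command removes the replica (an assumption made for this sketch).
func applySideEffects(c *command) error {
	if c.removesReplica {
		return errRemoved
	}
	return nil
}

// applyBatch applies commands in order. Every command's span is finished,
// including the command that returns errRemoved and all commands after it,
// which are never applied.
func applyBatch(cmds []*command) error {
	for i, c := range cmds {
		err := applySideEffects(c)
		c.sp.Finish() // finish on success and on failure
		if errors.Is(err, errRemoved) {
			// The rest of the batch is abandoned; finish its spans anyway
			// so that nothing leaks.
			for _, rest := range cmds[i+1:] {
				rest.sp.Finish()
			}
			return err
		}
	}
	return nil
}

func main() {
	cmds := []*command{
		{id: 1, sp: &span{}},
		{id: 2, removesReplica: true, sp: &span{}},
		{id: 3, sp: &span{}}, // follows the removal; never applied
	}
	err := applyBatch(cmds)
	fmt.Println("apply error:", err)
	for _, c := range cmds {
		fmt.Printf("command %d span finished: %t\n", c.id, c.sp.finished)
	}
}
```

The sketch only covers span cleanup. It does not model acknowledging a local proposer, which the description notes is the more serious consequence of abandoning a command.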
I was originally intending to do something more dramatic and make `replicaStateMachine.ApplySideEffects` responsible for acknowledging proposers in all cases, but doing so turned out to be pretty invasive so I was concerned that it would be harder to backport to v20.2 and to v20.1. I may revisit that in the future.
Release justification: low risk, high benefit changes to existing functionality.