kvserver: fix trace span use-after-finish in apply pipeline #105877
Merged
Reported internally [1]. It's possible for a proposal to be inserted into the
proposal buffer just before it is applied, and to be flushed when it is
applied. In this case, we would, in `propBuf.FlushLockedWithRaftGroup`, log to
the trace span, which is a problem since that span got finished when the
proposal got applied.

While this crash happened under `useReproposalsV2==true`, I'm unsure why this
would be a new problem with `useReproposalsV2`, and it doesn't seem to be
particularly common either [2]. However, I found a small gap in our handling of
applied proposals, where one could actually show up in
`FlushLockedWithRaftGroup` and explain this crash.

I tightened the existing protections to drop the proposal before it reaches
this point more reliably. Now we should no longer see this crash; if we did
see it again, this would indicate a more serious problem in the trace span
lifecycle.
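
To illustrate the shape of the problem and of the guard, here is a minimal, self-contained sketch. It is not the code in this PR: `span`, `proposal`, `propBuf`, `flush`, and `demo` are hypothetical toy types and functions, and the real `propBuf.FlushLockedWithRaftGroup` does considerably more. The sketch assumes only what is described above: applying a proposal finishes its trace span, and the flush path logs to that span.

```go
package main

// span is a toy stand-in for a trace span (hypothetical, not the real
// tracing type). Recording after finish panics, mimicking the crash.
type span struct {
	finished bool
	events   []string
}

func (s *span) record(msg string) {
	if s.finished {
		panic("use of span after finish")
	}
	s.events = append(s.events, msg)
}

func (s *span) finish() { s.finished = true }

// proposal is a toy proposal that owns a span.
type proposal struct {
	id      int
	applied bool
	sp      *span
}

// apply marks the proposal as applied and finishes its span.
func (p *proposal) apply() {
	p.applied = true
	p.sp.finish()
}

// propBuf is a toy proposal buffer.
type propBuf struct {
	pending []*proposal
}

func (b *propBuf) insert(p *proposal) { b.pending = append(b.pending, p) }

// flush drains the buffer and returns the IDs of the proposals it flushed.
// Proposals that were already applied are dropped before their span is
// touched; without this check, record would panic for them.
func (b *propBuf) flush() []int {
	var flushed []int
	for _, p := range b.pending {
		if p.applied {
			continue // span is already finished; do not log to it
		}
		p.sp.record("flushing proposal")
		flushed = append(flushed, p.id)
	}
	b.pending = nil
	return flushed
}

// demo inserts two proposals, applies the first before the flush, and
// returns the IDs that the flush actually handled.
func demo() []int {
	b := &propBuf{}
	p1 := &proposal{id: 1, sp: &span{}}
	p2 := &proposal{id: 2, sp: &span{}}
	b.insert(p1)
	b.insert(p2)
	p1.apply() // applied while still sitting in the buffer
	return b.flush()
}
```

In this toy model, `demo()` flushes only the second proposal; removing the `p.applied` check makes `flush` panic on the first one, which is the analogue of the reported crash.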
Epic: CRDB-25287
Release note: None
Footnotes
[1] https://cockroachlabs.slack.com/archives/C0KB9Q03D/p1688058044517609
[2] `./dev test --stress ./pkg/ccl/changefeedccl/ --filter TestChangefeedExactlyOnceExport --cpus 6`: 2456 runs so far, 0 failures, over 5m0s