kvserver: fix trace span use-after-finish in apply pipeline #105877

Merged
merged 1 commit into from
Jun 30, 2023


tbg
Member

@tbg tbg commented Jun 30, 2023

Reported internally [1]. It's possible for a proposal to be inserted into the
proposal buffer just before it is applied, and to be flushed when it is
applied. In this case, we would, in propBuf.FlushLockedWithRaftGroup, log to
the trace span, which is a problem since that span got finished when the
proposal got applied.

While this crash happened under useReproposalsV2==true, I'm unsure why this
would be a new problem with useReproposalsV2, and it doesn't seem to be
particularly common either [2]. However, I found a small gap in our handling of
applied proposals, through which an already-applied proposal could actually
show up in FlushLockedWithRaftGroup, which would explain this crash.

I tightened the existing protections so that an applied proposal is dropped
more reliably before it reaches this point. We should no longer see this crash;
if we do see it again, that would indicate a more serious problem in the trace
span lifecycle.
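
For readers unfamiliar with this code path, the race and the guard can be sketched roughly as below. This is a simplified model and not the actual patch: `span`, `proposal`, `propBuf`, `apply`, the `applied` flag, and the `skipApplied` parameter are all hypothetical stand-ins, and the real `propBuf.FlushLockedWithRaftGroup` has a different signature and does considerably more.

```go
package main

import "errors"

// errUseAfterFinish models the crash: recording into a finished span.
var errUseAfterFinish = errors.New("use of span after Finish")

// span is a hypothetical stand-in for a trace span.
type span struct {
	finished bool
	events   []string
}

// Record appends an event, or reports an error if the span is finished.
func (s *span) Record(msg string) error {
	if s.finished {
		return errUseAfterFinish
	}
	s.events = append(s.events, msg)
	return nil
}

// Finish marks the span as finished; later Record calls fail.
func (s *span) Finish() { s.finished = true }

// proposal is a hypothetical stand-in for a Raft proposal with a span.
type proposal struct {
	sp      *span
	applied bool
}

// propBuf is a hypothetical, much simplified proposal buffer.
type propBuf struct {
	pending []*proposal
}

// Insert queues a proposal for the next flush.
func (b *propBuf) Insert(p *proposal) { b.pending = append(b.pending, p) }

// apply models command application: the proposal's span is finished here.
func apply(p *proposal) {
	p.applied = true
	p.sp.Finish()
}

// Flush drains the buffer and logs to each proposal's span. With
// skipApplied set, already-applied proposals are dropped before their
// (finished) span is touched, which is the kind of guard described above.
func (b *propBuf) Flush(skipApplied bool) []error {
	var errs []error
	for _, p := range b.pending {
		if skipApplied && p.applied {
			continue // drop: the span was finished on application
		}
		if err := p.sp.Record("flushing proposal to Raft"); err != nil {
			errs = append(errs, err)
		}
	}
	b.pending = b.pending[:0]
	return errs
}
```

In this model, inserting a proposal, applying it, and then calling `Flush(false)` reports a use-after-finish, while `Flush(true)` drops the proposal without touching its span.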

Epic: CRDB-25287
Release note: None

Footnotes

  1. https://cockroachlabs.slack.com/archives/C0KB9Q03D/p1688058044517609

  2. ./dev test --stress ./pkg/ccl/changefeedccl/ --filter TestChangefeedExactlyOnceExport --cpus 6: 2456 runs so far, 0 failures, over 5m0s

@blathers-crl

blathers-crl bot commented Jun 30, 2023

It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cockroach-teamcity
Member

This change is Reviewable

@tbg tbg requested a review from erikgrinaker June 30, 2023 09:11
@tbg tbg marked this pull request as ready for review June 30, 2023 09:12
@tbg tbg requested a review from a team June 30, 2023 09:12
@tbg
Member Author

tbg commented Jun 30, 2023

TFTQR!

bors r=erikgrinaker

@craig
Contributor

craig bot commented Jun 30, 2023

Build succeeded:
