Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kvserver: fix double span Finish on reproposals #108775

Merged
merged 4 commits into from
Aug 21, 2023

Conversation

pav-kv
Copy link
Collaborator

@pav-kv pav-kv commented Aug 15, 2023

Since #106750, ProposalData is being copied on superseding reproposals. The caller only knows about the original proposal, and only detaches its context / tracing span when abandoning the request (e.g. on timeout). By the time of abandoning the request, there might have been a few superseding reproposals, and the context / tracing span is being used by multiple ProposalData objects.

In addition to rewriting the original proposal.ctx, we should do the same for all the ProposalData objects corresponding to the same request. This commit unbinds the context for older proposals when reproposing them, and unbinds the latest reproposal context when cleaning up / abandoning the request.

Fixes #107521
Fixes #108534
Fixes #108241
Fixes #108663
Fixes #108696
Fixes #108837

Epic: CRDB-25287
Release note: none

@blathers-crl
Copy link

blathers-crl bot commented Aug 15, 2023

It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@pav-kv pav-kv force-pushed the fix-reproposal-span-finish branch 3 times, most recently from 8d61ee6 to 6e98b9c Compare August 16, 2023 09:00
@pav-kv pav-kv requested a review from erikgrinaker August 16, 2023 09:01
@pav-kv pav-kv marked this pull request as ready for review August 16, 2023 09:01
@pav-kv pav-kv requested a review from a team August 16, 2023 09:01
@pav-kv pav-kv force-pushed the fix-reproposal-span-finish branch from 6e98b9c to 24eb9bb Compare August 16, 2023 09:24
Copy link
Contributor

@erikgrinaker erikgrinaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM in principle, just some minor tweaks.

pkg/kv/kvserver/replica_application_result.go Show resolved Hide resolved
pkg/kv/kvserver/replica_application_result.go Outdated Show resolved Hide resolved
pkg/kv/kvserver/replica_application_result.go Show resolved Hide resolved
// proposals for post-processing.
//
// TODO(pavelkalinnikov): prove and assert that origP.reproposal was nil.
origP.reproposal = newProposal
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will prevent garbage collection of all the intermediate reproposals. I don't think we hold on to them from anywhere else? Seems like it would be worthwhile to avoid.

I'm guessing the old proposal is discarded after we hit this code path? If so, we really only need the first and last proposal in the chain. We could e.g. have a backreference to the original proposal and update its forward reference to the latest reproposal, or we could have a cancel function that's propagated down to the active reproposal.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reworked this bit. It became a bit more ugly, but the GC benefit seems worth it. Longer term, we should improve the lifecycle.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, had to drop the invariant check.

@pav-kv pav-kv force-pushed the fix-reproposal-span-finish branch from afa3a59 to 9becf3b Compare August 18, 2023 15:06
@pav-kv pav-kv requested a review from erikgrinaker August 18, 2023 15:10
Copy link
Contributor

@erikgrinaker erikgrinaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but let's squash the chain/seed commits. We should also have a basic test for the proposal linking, if we can easily add one.

pkg/kv/kvserver/replica_raft.go Outdated Show resolved Hide resolved
pkg/kv/kvserver/replica_proposal.go Outdated Show resolved Hide resolved
@pav-kv pav-kv force-pushed the fix-reproposal-span-finish branch from 9becf3b to 8a24ab2 Compare August 21, 2023 11:09
Copy link
Collaborator Author

@pav-kv pav-kv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed inline comments, and squashed commits.

pkg/kv/kvserver/replica_proposal.go Outdated Show resolved Hide resolved
pkg/kv/kvserver/replica_raft.go Outdated Show resolved Hide resolved
@pav-kv pav-kv force-pushed the fix-reproposal-span-finish branch from 8a24ab2 to d3f93d1 Compare August 21, 2023 11:13
@pav-kv pav-kv requested a review from erikgrinaker August 21, 2023 11:14
@pav-kv
Copy link
Collaborator Author

pav-kv commented Aug 21, 2023

@erikgrinaker Ah hold on, I'm looking if I can add a test.

pav-kv added 3 commits August 21, 2023 12:48
Since cockroachdb#106750, ProposalData is being copied on superseding reproposals. The
caller only knows about the original proposal, and only detaches its context /
tracing span when abandoning the request (e.g. on timeout). By the time of
abandoning the request, there might have been a few superseding reproposals,
and the context / tracing span is being used by multiple ProposalData objects.

In addition to rewriting the original proposal.ctx, we should do the same for
all the ProposalData objects corresponding to the same request. This commit
unbinds the context for older proposals when reproposing them, and unbinds the
latest reproposal context when cleaning up / abandoning the request.

Epic: CRDB-25287
Release note: none
Epic: none
Release note: none
Epic: none
Release note: none
@pav-kv pav-kv force-pushed the fix-reproposal-span-finish branch from d3f93d1 to b83edaf Compare August 21, 2023 11:48
Epic: none
Release note: none
@pav-kv
Copy link
Collaborator Author

pav-kv commented Aug 21, 2023

@erikgrinaker Added a test, PTAL.

@pav-kv pav-kv requested a review from erikgrinaker August 21, 2023 12:44
Copy link
Contributor

@erikgrinaker erikgrinaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks. It's possible that we prematurely clear the seed proposal's context here, which could possibly lose some trace events for the remainder of the failed application, but even if we do we likely don't care about them at this point anyway

@pav-kv pav-kv added the backport-23.1.x Flags PRs that need to be backported to 23.1 label Aug 21, 2023
@pav-kv
Copy link
Collaborator Author

pav-kv commented Aug 21, 2023

bors r=erikgrinaker

@pav-kv pav-kv removed the backport-23.1.x Flags PRs that need to be backported to 23.1 label Aug 21, 2023
@pav-kv
Copy link
Collaborator Author

pav-kv commented Aug 21, 2023

We don't need to backport this, this refactoring is post-23.1.

@craig
Copy link
Contributor

craig bot commented Aug 21, 2023

Build succeeded:

@craig craig bot merged commit 2ac44e6 into cockroachdb:master Aug 21, 2023
@pav-kv pav-kv deleted the fix-reproposal-span-finish branch August 21, 2023 14:22
kvoli added a commit to kvoli/cockroach that referenced this pull request Sep 6, 2023
`TestMultiRegionDataDriven` was skipped in cockroachdb#108107 due to finish
(tracing span) being called twice in raft reproposals cockroachdb#107521, which
 cockroachdb#108775 fixed.

Unskip the top level `TestMultiRegionDataDriven` test. Note
`/secondary_region` is still skipped due to a known allocator bug.

Informs: cockroachdb#98020
Epic: none
Release note: None
craig bot pushed a commit that referenced this pull request Sep 6, 2023
110063: ccl/multiregionccl: unskip test multi region dd parent test r=rafiss a=kvoli

`TestMultiRegionDataDriven` was skipped in #108107 due to finish
(tracing span) being called twice in raft reproposals #107521, which
 #108775 fixed.

Unskip the top level `TestMultiRegionDataDriven` test. Note
`/secondary_region` is still skipped due to a known allocator bug.

Informs: #98020
Epic: none
Release note: None

Co-authored-by: Austen McClernon <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants