Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kv/kvserver/rangefeed: fix catchupIter construction synchronization #69613

Conversation

ajwerner
Copy link
Contributor

The catchupIter could have been constructed after the state of the underlying
store has changed. In general this doesn't seem like a disaster unless the
range has been removed or the gc threshold has changed and data is gone.

So, maybe it is a disaster.

Release justification: fixes for high-priority or high-severity bugs in
existing functionality

Release note: None

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@ajwerner ajwerner changed the title [WIP] kv/kvserver/rangefeed: fix catchupIter construction synchronization kv/kvserver/rangefeed: fix catchupIter construction synchronization Aug 30, 2021
@ajwerner ajwerner marked this pull request as ready for review August 30, 2021 21:35
@ajwerner ajwerner force-pushed the ajwerner/synchronize-catchup-iter-construction branch from 7574979 to 72ef548 Compare August 30, 2021 21:35
@ajwerner ajwerner requested a review from a team as a code owner August 30, 2021 21:35
@ajwerner
Copy link
Contributor Author

Any idea whether this needs a release note, and, if so, what it should contain?

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Any idea whether this needs a release note, and, if so, what it should contain?

We have no evidence that this has ever actually caused an issue, so I don't think a Release note (bug fix) is necessary, but if you wanted to include one, I'd be vague and just say that a "potential race condition" has been fixed.

Reviewed 3 of 3 files at r1, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner)


pkg/kv/kvserver/replica_rangefeed.go, line 379 at r1 (raw file):

	rtsIter := func() storage.SimpleMVCCIterator {
		// Assert that we still hold the raftMu when this is called to ensure
		// that the catchUpIter reads from the current snapshot. The replica

s/catchUpIter/rtsIter/


pkg/kv/kvserver/rangefeed/processor.go, line 276 at r1 (raw file):

				}
			}
			if err := stopper.RunAsyncTask(ctx, "rangefeed: output loop", runOutputLoop); err != nil {

Do we need to close the catchupIter (if non-nil) on the error path here? Or maybe bake that ownership into r.disconnect?

@ajwerner ajwerner force-pushed the ajwerner/synchronize-catchup-iter-construction branch from 72ef548 to c3927b0 Compare August 30, 2021 21:59
Copy link
Contributor Author

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added another commit to fix the inconsistent capitalization of catchup and catchUp.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)


pkg/kv/kvserver/replica_rangefeed.go, line 379 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

s/catchUpIter/rtsIter/

Done.


pkg/kv/kvserver/rangefeed/processor.go, line 276 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Do we need to close the catchupIter (if non-nil) on the error path here? Or maybe bake that ownership into r.disconnect?

Good catch. Delegated the closing of the iterator to disconnect in the error paths.

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 3 of 3 files at r2, 5 of 5 files at r3, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner)

The catchupIter could have been constructed after the state of the underlying
store has changed. In general this doesn't seem like a disaster unless the
range has been removed or the gc threshold has changed and data is gone.

So, maybe it is a disaster.

Release justification: fixes for high-priority or high-severity bugs in
existing functionality

Release note: None
Release justification: non-production code change

Release note: None
@ajwerner ajwerner force-pushed the ajwerner/synchronize-catchup-iter-construction branch from c3927b0 to 9ff249e Compare August 31, 2021 02:28
@ajwerner
Copy link
Contributor Author

TFTR!

bors r=nvanbenschoten

@craig
Copy link
Contributor

craig bot commented Aug 31, 2021

Build succeeded:

@blathers-crl
Copy link

blathers-crl bot commented Aug 31, 2021

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


error creating merge commit from 2e70853 to blathers/backport-release-20.2-69613: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 20.2.x failed. See errors above.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants