Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage: fix seek bug in pebbleMVCCScanner #60460

Merged
merged 1 commit into from
Feb 12, 2021

Conversation

sumeerbhola
Copy link
Collaborator

@sumeerbhola sumeerbhola commented Feb 11, 2021

The seek key in pebbleMVCCScanner.seekVersion was an
unsafe key that preceded a number of calls to Next.
This used to work because the byte slice in the
MVCCKey stayed the same as we were at a different
version of the key. With the intentInterleavingIter
this is not always true: the unsafe byte slice may
have been constructed using the intentIter, which may
no longer be in the same position.

This was responsible for failures in the cdc/bank
roachtest and occasional failures in
TestChangefeedNemeses/sinkless, when separated
intents were enabled.

Some testing with mangling of unsafe keys revealed
another bug, in rangefeed.registration, that is also
fixed here.

Informs #41720

Release note: None

@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Collaborator

@petermattis petermattis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

That looks like it must have been unpleasant to track down. Would it be worthwhile to have a debug version of MVCCIterator that randomly exercises this behavior of mutating the returned unsafe key after a positioning operation.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @itsbilal and @nvanbenschoten)

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: very nice find.

Is this something we'll want to backport, or are we confident that it can't cause issues with any of the existing iterator implementations?

Reviewed 2 of 2 files at r1.
Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (waiting on @itsbilal and @sumeerbhola)


pkg/storage/pebble_mvcc_scanner.go, line 648 at r1 (raw file):

	// the MVCCIterator will be at a different version of the same key, it is
	// free to mutate the backing for p.curUnsafeKey.Key in an arbitrary manner.
	// So assign to this copy, to make is stable.

s/is/it/

Copy link
Collaborator Author

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTRs!

That looks like it must have been unpleasant to track down.

yep, 3.5 days

Would it be worthwhile to have a debug version of MVCCIterator that randomly exercises this behavior of mutating the returned unsafe key after a positioning operation.

For some reason I was vacillating about this before, because this is the only place I know of in CockroachDB that uses the next optimization, and anyone else using an old key when the key has changed would be immediately wrong. But you suggesting this has convinced me -- I'll add it in a followup PR.

Is this something we'll want to backport, or are we confident that it can't cause issues with any of the existing iterator implementations?

This won't occur with the existing iterator, so I don't think we need a backport.

Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (waiting on @itsbilal)


pkg/storage/pebble_mvcc_scanner.go, line 648 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

s/is/it/

Done

@sumeerbhola
Copy link
Collaborator Author

TestMakeIdleMonitor looks like a completely unrelated test flake that I am unable to reproduce locally

@sumeerbhola
Copy link
Collaborator Author

bors r+

@craig
Copy link
Contributor

craig bot commented Feb 11, 2021

Build failed:

@sumeerbhola
Copy link
Collaborator Author

The temporary checking I added around reuse of pebbleMVCCScanner.keyBuf did not reveal any bugs. And now the CI is also green -- I am betting this is unrelated flakiness.

I did find an UnsafeKey() reuse bug in fad6a33 when running with mangling of unsafe keys, that was causing a test failure there, that I am going to add to this PR.

The seek key in pebbleMVCCScanner.seekVersion was an
unsafe key that preceded a number of calls to Next.
This used to work because the byte slice in the
MVCCKey stayed the same as we were at a different
version of the key. With the intentInterleavingIter
this is not always true: the unsafe byte slice may
have been constructed using the intentIter, which may
no longer be in the same position.

This was responsible for failures in the cdc/bank
roachtest and occasional failures in
TestChangefeedNemeses/sinkless, when separated
intents were enabled.

Some testing with mangling of unsafe keys revealed
another bug, in rangefeed.registration, that is also
fixed here.

Informs cockroachdb#41720

Release note: None
@sumeerbhola
Copy link
Collaborator Author

bors r+

@craig
Copy link
Contributor

craig bot commented Feb 12, 2021

Build succeeded:

@craig craig bot merged commit 2ff1de1 into cockroachdb:master Feb 12, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants