Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

backupccl: stop splitting at invalid split keys #67497

Merged
merged 1 commit into from
Jul 27, 2021

Conversation

pbardea
Copy link
Contributor

@pbardea pbardea commented Jul 12, 2021

Restore could perform invalid splits during the split and scatter phase.
This commit ensures that splits are always performed at safe keys.

Fixes #67488.

Release note (bug fix): Fix a bug where restores of data with multiple
column families could be split illegally (within a single SQL row). This
could result in temporary data unavailability until the ranges on either
side of the invalid split are merged.

@pbardea pbardea requested review from dt and a team July 12, 2021 16:57
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@pbardea
Copy link
Contributor Author

pbardea commented Jul 12, 2021

Working on the tests.

I believe that this needs to go back to 20.1.

@pbardea pbardea removed request for a team and dt July 12, 2021 18:10
@pbardea pbardea marked this pull request as draft July 12, 2021 18:10
@jordanlewis
Copy link
Member

Glad the fix was so small. Are there tests cases we can add that are more general that prevent us from making this mistake in the future? Why does the AdminSplit command permit invalid SQL split keys at all?

@pbardea
Copy link
Contributor Author

pbardea commented Jul 12, 2021

Are there tests cases we can add that are more general that prevent us from making this mistake in the future?

I think there's defense that we can do here in layers. The first is a regression case that creates a backup with entries split along these boundaries.

I'm still thinking of more general test cases we can add. We would benefit with more backup/restore testing with column families that operate on larger-data sizes (so that there are several entries to work with). I think a combination of updating the default bank payload that we use for the tests to incorporate column families and also reducing the size of the entries produced by backup (metamorphic constant?) could help activate paths that could have caught something like this.

Why does the AdminSplit command permit invalid SQL split keys at all?

I looked into ensuring that AdminSplit validates invalid split keys by using EnsureSafeSplitKey, but EnsureSafeSplitKey already knows quite a bit about the SQL layer which feels very leaky for the KV layer to know so much about decoding SQL rows... That being said, it's already used by storage.MVCCFindSplitKey, which in turn is used by size-based splits, so it feels like that ship has already sailed.

@pbardea pbardea force-pushed the illegal-split-key branch 5 times, most recently from 1264d65 to 5be34fa Compare July 15, 2021 13:28
@pbardea pbardea self-assigned this Jul 16, 2021
Restore could perform invalid splits during the split and scatter phase.
This commit ensures that splits are always performed at safe keys during
restore.

Release note (bug fix): Fix a bug where restores of data with multiple
column families could be split illegally (within a single SQL row). This
could result in temporary data unavailability until the ranges on either
side of the invalid split are merged.
@pbardea pbardea force-pushed the illegal-split-key branch from 5be34fa to 5bea8fe Compare July 19, 2021 00:58
@pbardea pbardea requested a review from adityamaru July 21, 2021 14:17
@pbardea pbardea marked this pull request as ready for review July 21, 2021 14:18
@pbardea
Copy link
Contributor Author

pbardea commented Jul 21, 2021

@adityamaru This change is pretty much the same that David stamped above with an added test that I wanted to get eyes on. It also ignores any errors from EnsureSafeSplitKey since not all keys are "table keys" (e.g. /Table/51 will produce an error from EnsureSafeSplitKey but is fine to split at). We similarly ignore it in SST_Batcher.

I think a change like this is safer to backport, but a follow up will be to try and get this check into AdminSplit.

@pbardea pbardea requested a review from dt July 22, 2021 14:51
@pbardea
Copy link
Contributor Author

pbardea commented Jul 26, 2021

TFTR
bors r+

@craig
Copy link
Contributor

craig bot commented Jul 26, 2021

Build failed:

@pbardea
Copy link
Contributor Author

pbardea commented Jul 27, 2021

bors r+

@craig
Copy link
Contributor

craig bot commented Jul 27, 2021

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Jul 27, 2021

Build succeeded:

@craig craig bot merged commit 1375a40 into cockroachdb:master Jul 27, 2021
@adityamaru
Copy link
Contributor

blathers backport 21.1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

restore: RESTORE issues illegal split in the middle of a SQL row
5 participants