Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

release-23.1: kvserver: avoid load based splits in middle of SQL row #103876

Merged
merged 4 commits into from
Jun 6, 2023

Conversation

blathers-crl[bot]
Copy link

@blathers-crl blathers-crl bot commented May 25, 2023

Backport 3/3 commits from #103690 on behalf of @kvoli.
Backport 1/1 commits from #104082 on behalf of @kvoli.

/cc @cockroachdb/release


It was possible for a SQL row to be torn across two ranges due to the
load-based splitter not rejecting potentially unsafe split keys. It is
impossible to determine with just the sampled request keys, whether a
key is certainly unsafe or safe, so a split key is returned regardless of error.

This PR side steps this problem by re-using the
adminSplitWithDescriptor command to find the first real key, after or
at the provided args.SplitKey. This ensures that the split key will
always be a real key whilst not requiring any checks in the splitter
itself.

The updated adminSplitWithDescriptor is local only and requires opting
into finding the first safe key by setting findFirstSafeKey to true.

As such, all safe split key checks are also removed from the split
pkg, with a warning added that the any split key returned is unsafe.

Note that the weighted load based split finder, used for CPU splits
did not suffer from returning potentially unsafe splits due to e4f003b.

However, it was possible that no load-based split key was ever found
when using the weighted finder. This was because we discard potentially
unsafe samples, which could have been safe split points.

This PR reverts commit e4f003b, as the
safe split key is enforced elsewhere, mentioned above.

Resolves: #103483
Resolves: #103672

Release note (bug fix): It was possible for a SQL row to be split across
two ranges. When this occurred, SQL queries could return unexpected
errors. This bug is resolved by these changes, as we now sample the real
keys, rather than just request keys to determine load-based split points.


Release justification: Serious bug fix.

@blathers-crl blathers-crl bot requested review from a team as code owners May 25, 2023 07:05
@blathers-crl blathers-crl bot requested review from srosenberg and smg260 and removed request for a team May 25, 2023 07:05
@blathers-crl
Copy link
Author

blathers-crl bot commented May 25, 2023

Thanks for opening a backport.

Please check the backport criteria before merging:

  • Patches should only be created for serious issues or test-only changes.
  • Patches should not break backwards-compatibility.
  • Patches should change as little code as possible.
  • Patches should not change on-disk formats or node communication protocols.
  • Patches should not add new functionality.
  • Patches must not add, edit, or otherwise modify cluster versions; or add version gates.
If some of the basic criteria cannot be satisfied, ensure that the exceptional criteria are satisfied within.
  • There is a high priority need for the functionality that cannot wait until the next release and is difficult to address in another way.
  • The new functionality is additive-only and only runs for clusters which have specifically “opted in” to it (e.g. by a cluster setting).
  • New code is protected by a conditional check that is trivial to verify and ensures that it only runs for opt-in clusters.
  • The PM and TL on the team that owns the changed code have signed off that the change obeys the above rules.

Add a brief release justification to the body of your PR to justify this backport.

Some other things to consider:

  • What did we do to ensure that a user that doesn’t know & care about this backport, has no idea that it happened?
  • Will this work in a cluster of mixed patch versions? Did we test that?
  • If a user upgrades a patch version, uses this feature, and then downgrades, what happens?

@blathers-crl blathers-crl bot force-pushed the blathers/backport-release-23.1-103690 branch from dd7da1a to a1a5540 Compare May 25, 2023 07:05
@blathers-crl blathers-crl bot added blathers-backport This is a backport that Blathers created automatically. O-robot Originated from a bot. labels May 25, 2023
@blathers-crl blathers-crl bot requested a review from raggar May 25, 2023 07:05
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@kvoli kvoli force-pushed the blathers/backport-release-23.1-103690 branch from a1a5540 to 1e6cb0a Compare June 5, 2023 16:52
@kvoli
Copy link
Collaborator

kvoli commented Jun 5, 2023

Added in #104082 on-top of the original PR commits. Should I merge into the previous commit
kvserver: avoid load-based splitting between rows or leave as is?

Also rebased on master. I think the patch has sufficiently baked for a backport to 23.1 cc @nvanbenschoten.

@kvoli
Copy link
Collaborator

kvoli commented Jun 5, 2023

Created tracking issue for backports #104353

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I merge into the previous commit
kvserver: avoid load-based splitting between rows or leave as is?

You can leave as is to preserve the commit history.

I think the patch has sufficiently baked for a backport to 23.1

Agreed. LGTM

// 1. Within [startKey,endKey).
// 2. No less than desiredSplitKey.
// 3. Greater than the first key in [startKey,endKey]; or greater than all the
// first row's keys if a table range. .
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"range. ."

Feel free to clean this up if you happen to be working in this area on master. No need for the fix to be backported, of course.

kvoli added 4 commits June 5, 2023 19:40
Previously, there was no way to peak the contents of the load based
splitter samples when inspecting nodes. This commit adds string methods
for the `UnweightedFinder`, `WeightedFinder` and `Decider`.

This commit also swaps the order of the should split check to avoid
computation. As a result the output of `cpu_decider_cartesian` changed
slightly as the no split key logging message is now ordered differently.

Informs: #103672
Informs: #103483

Release note: None
It was possible that no load-based split key was ever found when using
the weighted finder. This was because we discard potentially unsafe
samples, which could have been safe split points.

This reverts commit e4f003b.

Resolves: #103672
It was possible for a SQL row to be torn across two ranges due to the
load-based splitter not rejecting potentially unsafe split keys. It is
impossible to determine with keys sampled from response spans, whether a
key is certainly unsafe or safe.

This commit side steps this problem by re-using the
`adminSplitWithDescriptor` command to find the first real key, after or
at the provided `args.SplitKey`. This ensures that the split key will
always be a real key whilst not requiring any checks in the splitter
itself.

The updated `adminSplitWithDescriptor` is local only and requires opting
into finding the first safe key by setting `findFirstSafeKey` to `true`.

As such, all safe split key checks are also removed from the `split`
pkg, with a warning added that the any split key returned is unsafe.

Resolves: #103483

Release note (bug fix): It was possible for a SQL row to be split across
two ranges. When this occurred, SQL queries could return unexpected
errors. This bug is resolved by these changes, as we now inspect the real
keys, rather than just request keys to determine load-based split points.
It was possible that a load based split was suggested for `meta1`, which
would call `MVCCFirstSplitKey` and panic as the `meta1` start key
`\Min` is local, whilst the `meta1` end key is global `0x02 0xff 0xff`.

Add a check that the start key is greater than the `meta1` end key before
processing in `MVCCFirstSplitKey` to prevent the panic.

Note `meta1` would never be split regardless, as
`storage.IsValidSplitKey` would fail after finding a split key.

Also note that if the desired split key is a local key, the same problem
doesn't exist as the minimum split key would be used to seek the first
split key instead.

Fixes: #104007

Release note: None
@kvoli kvoli force-pushed the blathers/backport-release-23.1-103690 branch from 1e6cb0a to 7f93988 Compare June 5, 2023 23:40
@kvoli kvoli merged commit 9f37611 into release-23.1 Jun 6, 2023
@kvoli kvoli deleted the blathers/backport-release-23.1-103690 branch June 6, 2023 01:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
blathers-backport This is a backport that Blathers created automatically. O-robot Originated from a bot.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants