Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kv: respect exhausted key limit during ranged intent resolution #47492

Merged

Conversation

nvanbenschoten
Copy link
Member

Fixes #47471.
Fixes #40935.

This commit fixes a long-standing bug where ranged intent resolution would not respect the MaxSpanRequestKeys set on a batch once the limit had already been exhausted by other requests in the same batch. Instead of treating the limit as exhausted, ranged intent resolution would consider the limit nonexistent (unbounded). This bug was triggering an assertion in DistSender. We became more likely to hit this issue in v20.1 because we started performing ranged intent resolution more often due to implicit SELECT FOR UPDATE.

This commit fixes the bug in two ways:

  1. it addresses the root cause, updating MVCCResolveWriteIntentRangeUsingIter to properly respect the limit placed on the request when it is exhauted.
  2. it disables the assertion in DistSender when it detects that we are hitting this bug. This ensures that we don't hit the assertion in mixed version clusters (see roachtest: schemachange/mixed/tpcc failed #40935). By the time we're in DistSender, the damage is already done and has already potentially resulted in a large Raft entry. Maintaining the assertion doesn't do us any good.

Release notes (bug fix): a bug that could could trigger an assertion with the text "received X results, limit was Y" has been fixed. The underlying bug was only performance related and could not cause user-visible correctness violations.

Release justification: fixes a medium-priority bug in existing functionality. The bug could result in an assertion failure and a node crashing. Even though this was an old bug (present in many releases before v20.1), it became a lot easier to hit in v20.1 because we started performing ranged intent resolution more often due to implicit SELECT FOR UPDATE.

Fixes cockroachdb#47471.
Fixes cockroachdb#40935.

This commit fixes a long-standing bug where ranged intent resolution
would not respect the MaxSpanRequestKeys set on a batch once the limit
had already been exhausted by other requests in the same batch. Instead
of treating the limit as exhausted, ranged intent resolution would
consider the limit nonexistent (unbounded). This bug was triggering an
assertion in DistSender. We became more likely to hit this issue in
v20.1 because we started performing ranged intent resolution more often
due to implicit SELECT FOR UPDATE.

This commit fixes the bug in two ways:
1. it addresses the root cause, updating MVCCResolveWriteIntentRangeUsingIter
  to properly respect the limit placed on the request when it is exhauted.
2. it disables the assertion in DistSender when it detects that we are hitting
  this bug. This ensures that we don't hit the assertion in mixed version
  clusters (see cockroachdb#40935). By the time we're in DistSender, the damage is
  already done and has already potentially resulted in a large Raft entry.
  Maintaining the assertion doesn't do us any good.

Release notes (bug fix): a bug that could could trigger an assertion
with the text "received X results, limit was Y" has been fixed. The
underlying bug was only performance related and could not cause
user-visible correctness violations.

Release justification: fixes a medium-priority bug in existing
functionality. The bug could result in an assertion failure and a node
crashing. Even though this was an old bug (present in many releases
before v20.1), it became a lot easier to hit in v20.1 because we started
performing ranged intent resolution more often due to implicit SELECT
FOR UPDATE.
@nvanbenschoten nvanbenschoten added the release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. label Apr 14, 2020
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@blathers-crl
Copy link

blathers-crl bot commented Apr 14, 2020

❌ The GitHub CI (Cockroach) build has failed on 1ab9eca7.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

Copy link
Contributor

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei)

@nvanbenschoten
Copy link
Member Author

TFTR!

I'm going to remove the new assertion from the backport because I don't want to risk having gotten than wrong at this stage in the game.

bors r+

@craig
Copy link
Contributor

craig bot commented Apr 14, 2020

Build succeeded

@craig craig bot merged commit 18b0247 into cockroachdb:master Apr 14, 2020
@nvanbenschoten nvanbenschoten deleted the nvanbenschoten/resIntentRangeLimit branch April 15, 2020 02:48
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this pull request Aug 4, 2020
…tSender

This hack was added in cockroachdb#47492 to work around the bug that it was fixing
in mixed-version clusters.

Addresses a TODO.
craig bot pushed a commit that referenced this pull request Aug 7, 2020
52364: keys: remove TODO on SystemConfigSpan r=nvanbenschoten a=nvanbenschoten

This was addressed by #51379.

52365: kv: enable MaxSpanRequestKeys assertion for ResolveIntentRange in DistSender r=nvanbenschoten a=nvanbenschoten

This hack was added in #47492 to work around the bug that it was fixing in mixed-version clusters.

Addresses a TODO.

Co-authored-by: Nathan VanBenschoten <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

roachtest: tpccbench/nodes=9/cpu=4/chaos/partition failed roachtest: schemachange/mixed/tpcc failed
3 participants