roachtest: tpccbench/nodes=9/cpu=4/chaos/partition failed #47471
Comments
It's interesting that 184 is exactly twice 92. I wonder if we're doing something dumb here.
Oh, I think I might see what's going on here. cc @tbg, as this class of bug might prove to be an interesting thread to pull on in #46652.
47492: kv: respect exhausted key limit during ranged intent resolution r=nvanbenschoten a=nvanbenschoten

Fixes #47471.
Fixes #40935.

This commit fixes a long-standing bug where ranged intent resolution would not respect the MaxSpanRequestKeys set on a batch once the limit had already been exhausted by other requests in the same batch. Instead of treating the limit as exhausted, ranged intent resolution would consider the limit nonexistent (unbounded). This bug was triggering an assertion in DistSender. We became more likely to hit this issue in v20.1 because we started performing ranged intent resolution more often due to implicit SELECT FOR UPDATE.

This commit fixes the bug in two ways:
1. It addresses the root cause, updating MVCCResolveWriteIntentRangeUsingIter to properly respect the limit placed on the request when it is exhausted.
2. It disables the assertion in DistSender when it detects that we are hitting this bug. This ensures that we don't hit the assertion in mixed-version clusters (see #40935). By the time we're in DistSender, the damage is already done and has potentially resulted in a large Raft entry. Maintaining the assertion doesn't do us any good.

Release notes (bug fix): a bug that could trigger an assertion with the text "received X results, limit was Y" has been fixed. The underlying bug was only performance related and could not cause user-visible correctness violations.

Release justification: fixes a medium-priority bug in existing functionality. The bug could result in an assertion failure and a node crashing. Even though this was an old bug (present in many releases before v20.1), it became a lot easier to hit in v20.1 because we started performing ranged intent resolution more often due to implicit SELECT FOR UPDATE.

Co-authored-by: Nathan VanBenschoten <[email protected]>
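To make the failure mode in the PR description concrete, here is a minimal, self-contained Go sketch of the "exhausted limit treated as unbounded" pattern. It is not CockroachDB code: the function names (`resolveBuggy`, `resolveFixed`, `runBatch`) and the sentinel convention (0 means "no limit", a negative value means "limit already exhausted") are assumptions made up for illustration, and the key counts are arbitrary rather than taken from the failing test.

```go
package main

import "fmt"

// Assumed convention for this sketch only (not taken from the source):
//   limit == 0  -> no limit was requested
//   limit  > 0  -> at most `limit` more keys may be touched
//   limit  < 0  -> a limit was requested and is already used up

// resolveBuggy mimics the bug class: any non-positive limit is treated
// as "unbounded", so an exhausted limit is silently ignored.
func resolveBuggy(pending, limit int) int {
	if limit <= 0 || pending < limit {
		return pending
	}
	return limit
}

// resolveFixed distinguishes "no limit" from "limit exhausted" and
// does no work in the exhausted case.
func resolveFixed(pending, limit int) int {
	switch {
	case limit < 0:
		return 0
	case limit == 0 || pending < limit:
		return pending
	default:
		return limit
	}
}

// runBatch evaluates the requests in order against one shared key
// budget and returns the total number of keys touched. Each entry in
// requests is the number of keys that request would like to resolve.
func runBatch(requests []int, maxKeys int, resolve func(pending, limit int) int) int {
	limit := maxKeys
	total := 0
	for _, pending := range requests {
		n := resolve(pending, limit)
		total += n
		if limit > 0 {
			limit -= n
			if limit == 0 {
				limit = -1 // mark the budget as exhausted, not absent
			}
		}
	}
	return total
}

func main() {
	requests := []int{10, 10}
	fmt.Println("buggy:", runBatch(requests, 10, resolveBuggy))
	fmt.Println("fixed:", runBatch(requests, 10, resolveFixed))
}
```

In this sketch the buggy variant lets the second request run unbounded once the first has consumed the budget, so the batch returns more results than its limit, which is the kind of condition a "received X results, limit was Y" assertion would catch downstream.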
(roachtest).tpccbench/nodes=9/cpu=4/chaos/partition failed on master@29c0efdcc5edb5d100449a093b25df107f1df2d6:
Artifacts: /tpccbench/nodes=9/cpu=4/chaos/partition
Related:
#47317 roachtest: tpccbench/nodes=9/cpu=4/chaos/partition failed (labels: C-test-failure, O-roachtest, O-robot, branch-provisional_202004061746_v19.2.6, release-blocker)
#45722 roachtest: tpccbench/nodes=9/cpu=4/chaos/partition failed (labels: C-test-failure, O-roachtest, O-robot, branch-release-19.1, release-blocker)
#44368 roachtest: tpccbench/nodes=9/cpu=4/chaos/partition failed (labels: C-test-failure, O-roachtest, O-robot, branch-release-19.2, release-blocker)
See this test on roachdash
powered by pkg/cmd/internal/issues