-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
backport-2.1: kv: try next replica on RangeNotFoundError #31250
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
bdarnell
approved these changes
Oct 11, 2018
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r1, 8 of 10 files at r3, 1 of 3 files at r4, 6 of 6 files at r5.
Reviewable status: complete! 0 of 0 LGTMs obtained
tbg
force-pushed
the
backport2.1-31013
branch
3 times, most recently
from
October 11, 2018 20:41
78c9981
to
2ebe157
Compare
This was hiding the output if the invocation itself failed, which is when you wanted it most. Release note: None
Release note: None
Release note: None
Previously, if a Batch RPC came back with a RangeNotFoundError, we would immediately stop trying to send to more replicas, evict the range descriptor, and start a new attempt after a back-off. This new attempt could end up using the same replica, so if the RangeNotFoundError persisted for some amount of time, so would the unsuccessful retries for requests to it as DistSender doesn't aggressively shuffle the replicas. It turns out that there are such situations, and the election-after-restart roachtest spuriously hit one of them: 1. new replica receives a preemptive snapshot and the ConfChange 2. cluster restarts 3. now the new replica is in this state until the range wakes up, which may not happen for some time. 4. the first request to the range runs into the above problem @nvanbenschoten: I think there is an issue to be filed about the tendency of DistSender to get stuck in unfortunate configurations. Fixes cockroachdb#30613. Release note (bug fix): Avoid repeatedly trying a replica that was found to be in the process of being added.
Whenever a successful response is received from an RPC that we know has to contact the leaseholder to succeed, update the leaseholder cache. The immediate motivation for this is to be able to land the preceding commits, which greatly exacerbated (as in, added a much faster failure mode to) ``` make stress PKG=./pkg/sql/logictest TESTS=TestPlannerLogic/5node-dist/distsql_interleaved_join ``` However, the change is one we've wanted to make for a while; our caching and in particular the eviction of leaseholders has been deficient essentially ever since it was first introduced. Touches cockroachdb#31068. Release note: None
tbg
force-pushed
the
backport2.1-31013
branch
from
October 12, 2018 19:39
2ebe157
to
fdca0bd
Compare
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Backport 5/5 commits from #31013.
/cc @cockroachdb/release
Previously, if a Batch RPC came back with a RangeNotFoundError, we would
immediately stop trying to send to more replicas, evict the range
descriptor, and start a new attempt after a back-off.
This new attempt could end up using the same replica, so if the
RangeNotFoundError persisted for some amount of time, so would the
unsuccessful retries for requests to it as DistSender doesn't aggressively
shuffle the replicas.
It turns out that there are such situations, and the election-after-restart
roachtest spuriously hit one of them:
up, which may not happen for some time. 4. the first request to the range
runs into the above problem
@nvanbenschoten: I think there is an issue to be filed about the tendency
of DistSender to get stuck in unfortunate configurations.
Fixes #30613.
Release note (bug fix): Avoid repeatedly trying a replica that was found to
be in the process of being added.