release-21.2: kvclient: ignore stale lease information from lagging replicas #88738
Thanks for opening a backport. Please check the backport criteria before merging:
If some of the basic criteria cannot be satisfied, ensure that the exceptional criteria are satisfied within.
Add a brief release justification to the body of your PR to justify this backport. Some other things to consider:
This commit changes the `DistSender`'s range descriptor cache so that it no longer evicts an entry based on incompatible lease information in a `NotLeaseHolderError` (NLHE) when that error comes from a replica with a stale view of the range's descriptor (characterized by an older `DescriptorGeneration` on the replica).

Previously this was hazardous: if we received an NLHE that pointed to a replica that did not belong in the cached descriptor, we'd trigger a cache eviction. This assumed that the replica returning the error had a fresher view of the range than what we had in the cache, which is not always true. As a result, we'd keep doing range lookups and subsequent evictions until the lagging replica caught up to the current state of the range.

Release note (bug fix): A bug that caused high SQL tail latencies during background rebalancing in the cluster has been fixed.
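The eviction rule described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the actual kvclient code: `ReplicaID`, `RangeDescriptor`, `NotLeaseHolderError`, and `shouldEvict` are hypothetical, simplified stand-ins, and the handling of an equal generation is an assumption.

```go
package main

// All types below are hypothetical stand-ins for illustration only;
// they are not CockroachDB's real definitions.

// ReplicaID identifies a replica of a range.
type ReplicaID int

// RangeDescriptor is a simplified descriptor: a generation counter
// plus the set of replicas that belong to the range.
type RangeDescriptor struct {
	Generation int64
	Replicas   []ReplicaID
}

// contains reports whether id is a member of the descriptor.
func (d RangeDescriptor) contains(id ReplicaID) bool {
	for _, r := range d.Replicas {
		if r == id {
			return true
		}
	}
	return false
}

// NotLeaseHolderError is a simplified NLHE carrying the replica the
// sender believes holds the lease and the generation of the sender's
// own view of the range descriptor.
type NotLeaseHolderError struct {
	Leaseholder          ReplicaID
	DescriptorGeneration int64
}

// shouldEvict reports whether the cached descriptor should be evicted
// in response to nlhe.
func shouldEvict(cached RangeDescriptor, nlhe NotLeaseHolderError) bool {
	if nlhe.DescriptorGeneration < cached.Generation {
		// The sender's view is older than the cache: its lease
		// information is stale, so ignore it and keep the entry.
		return false
	}
	// The sender is at least as fresh as the cache (assumption: an
	// equal generation is treated like a newer one). Evict only if
	// it points at a replica the cached descriptor does not know.
	return !cached.contains(nlhe.Leaseholder)
}
```

With a cached descriptor at generation 5 containing replicas 1, 2, and 3, an NLHE from a generation-3 replica pointing at replica 4 is ignored, whereas the same hint from a generation-6 replica triggers an eviction.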
Reviewed 6 of 6 files at r1, 5 of 5 files at r2, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @arulajmani)
We recently changed the NLHE to carry the range descriptor generation, to avoid thrashing the range cache when the replica has a stale view of the range. In cockroachdb#75742, we saw issues caused by the dist sender having a stale range descriptor. This patch switches from sending just the descriptor generation back in the NLHE to shipping back the entire range descriptor. In the future, we may want to solve the issue above by updating the range cache with the fresher range descriptor, thus skipping a cache eviction and range descriptor lookup.

References cockroachdb#75742

Release note: None
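The possible future direction mentioned above (updating the cache from the descriptor carried in the NLHE) might look something like the sketch below. This is not what the patch does; `cacheEntry`, `maybeRefresh`, and the simplified `RangeDescriptor` and `NotLeaseHolderError` types are hypothetical names introduced purely for illustration.

```go
package main

// Hypothetical, simplified types for illustration only.

// RangeDescriptor holds a generation counter and replica membership.
type RangeDescriptor struct {
	Generation int64
	Replicas   []int
}

// NotLeaseHolderError here carries the sender's entire descriptor
// rather than only its generation.
type NotLeaseHolderError struct {
	RangeDesc RangeDescriptor
}

// cacheEntry is a stand-in for one range cache entry.
type cacheEntry struct {
	desc RangeDescriptor
}

// maybeRefresh replaces the cached descriptor when the error carries
// a strictly newer one, and reports whether it did so. A refresh of
// this kind would avoid an eviction and a follow-up range lookup.
func (e *cacheEntry) maybeRefresh(nlhe NotLeaseHolderError) bool {
	if nlhe.RangeDesc.Generation <= e.desc.Generation {
		// The sender is not fresher than the cache; keep the entry.
		return false
	}
	e.desc = nlhe.RangeDesc
	return true
}
```

Comparing generations before overwriting keeps a lagging replica from rolling the cache back to an older descriptor.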
Backport:
Please see individual PRs for details.
Release justification: bug fix / stability improvement.
/cc @cockroachdb/release