kvserver: underreplicated ranges block lease transfers #106102
Comments
Lease preference satisfaction needs better test coverage and observability. A lease transfer is not considered by the replicate queue when it has another action to take (upreplication, removal, etc.) where the leaseholder isn't removed. This state persists until the … f2c3e12 … A fix could … Edit after some git history digging: …
I tested out the repro on …
There was no regression. Non-snapshot errors on any … In cases where the first … The situation isn't that hard to get out of currently; it just requires unblocking the cause of the errors, which is usually a lack of available targets, e.g., the dead node was the only valid replica target for some (voter) constraint disjunction.
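The failure mode behind "lack of available targets" can be sketched in Go under a simplified model where a store is a set of locality attributes and a constraint conjunct is a set of required key=value pairs. `Store`, `Conjunction`, and `allocateTarget` are hypothetical names for illustration, not CockroachDB's allocator API. The sketch shows how, once the dead node was the only store satisfying a conjunct, no live target remains and allocation errors out.

```go
package main

import (
	"errors"
	"fmt"
)

// Store is a hypothetical, simplified store: ID, liveness, and locality
// attributes such as "rack" -> "0".
type Store struct {
	ID    int
	Live  bool
	Attrs map[string]string
}

// Conjunction is a hypothetical set of required key=value attributes; a
// constraint disjunction is satisfied if any one conjunction matches.
type Conjunction map[string]string

// satisfies reports whether the store carries every attribute in c.
func satisfies(s Store, c Conjunction) bool {
	for k, v := range c {
		if s.Attrs[k] != v {
			return false
		}
	}
	return true
}

var errNoTarget = errors.New("no valid allocation target")

// allocateTarget returns a live store that does not already hold a replica
// and satisfies c, or errNoTarget if there is none.
func allocateTarget(stores []Store, existing map[int]bool, c Conjunction) (int, error) {
	for _, s := range stores {
		if s.Live && !existing[s.ID] && satisfies(s, c) {
			return s.ID, nil
		}
	}
	return 0, errNoTarget
}

func main() {
	stores := []Store{
		{ID: 1, Live: false, Attrs: map[string]string{"rack": "0"}}, // the dead node
		{ID: 2, Live: true, Attrs: map[string]string{"rack": "0"}},
		{ID: 3, Live: true, Attrs: map[string]string{"rack": "1"}},
	}
	// Stores 2 and 3 already hold replicas, so the only other rack=0 store
	// (store 1) is dead and nothing can satisfy the rack=0 conjunct.
	existing := map[int]bool{2: true, 3: true}
	_, err := allocateTarget(stores, existing, Conjunction{"rack": "0"})
	fmt.Println(err) // prints "no valid allocation target"
}
```

In this model, "unblocking the cause of the errors" means making a live, unused store that satisfies the conjunct available (or relaxing the constraint), at which point `allocateTarget` returns that store instead of an error.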
This is now resolved via the lease queue #119155.
See the lease preference setup in #106100 (comment), where we have 5 nodes with RF=5 across 3 racks, with lease preferences set to rack=0 (2 nodes). When we kill a node with rack=0, leases are randomly scattered across all surviving replicas. The replicate queue will (slowly) begin to move leases back to the preferred region.
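The lease preference check in this setup can be sketched in Go, assuming a single preference of rack=0 and an assumed placement of the other three nodes across racks 1 and 2 (the issue only states that rack=0 has 2 nodes). `Replica`, `preferredLeaseTarget`, and `leaseSatisfiesPreference` are hypothetical names. Real lease preferences are an ordered list of constraint conjunctions, which this collapses to a single rack match.

```go
package main

import "fmt"

// Replica is a hypothetical replica: its store, the store's rack, and liveness.
type Replica struct {
	StoreID int
	Rack    string
	Live    bool
}

// preferredLeaseTarget returns the first live replica in the preferred rack,
// and whether such a replica exists.
func preferredLeaseTarget(replicas []Replica, preferredRack string) (Replica, bool) {
	for _, r := range replicas {
		if r.Live && r.Rack == preferredRack {
			return r, true
		}
	}
	return Replica{}, false
}

// leaseSatisfiesPreference reports whether the leaseholder is already in the
// preferred rack.
func leaseSatisfiesPreference(leaseholder Replica, preferredRack string) bool {
	return leaseholder.Rack == preferredRack
}

func main() {
	// RF=5 over 5 nodes; rack=0 has 2 nodes and node 1 (rack=0) was killed.
	// The rack=1 / rack=2 split of the remaining nodes is an assumption.
	replicas := []Replica{
		{1, "0", false}, {2, "0", true},
		{3, "1", true}, {4, "1", true},
		{5, "2", true},
	}
	leaseholder := replicas[3] // lease scattered to store 4 (rack=1)
	if !leaseSatisfiesPreference(leaseholder, "0") {
		if t, ok := preferredLeaseTarget(replicas, "0"); ok {
			fmt.Println("transfer lease to store", t.StoreID) // prints "transfer lease to store 2"
		}
	}
}
```

With one of the two rack=0 nodes dead, the surviving rack=0 replica is the only valid preferred target, so every scattered lease has to find its way back to that single store.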
However, as soon as the killed node is marked as dead, the replicate queue stops moving leases to the preferred region. It's erroring out on upreplication, since there are no stores that can take the replica, but this appears to also short-circuit the lease transfers, so the leases are now permanently stuck in a non-preferred region.
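The short-circuit described above can be illustrated with a minimal Go sketch of the control flow. All names (`rangeState`, `upreplicate`, `processShortCircuit`, `processDecoupled`) are hypothetical and are not CockroachDB's replicate queue API. The decoupled variant only illustrates the idea of evaluating lease preferences independently of replication errors; it is not how the lease queue in #119155 is implemented.

```go
package main

import (
	"errors"
	"fmt"
)

var errNoTarget = errors.New("no valid upreplication target")

// rangeState is a hypothetical snapshot of what the queue sees for one range.
type rangeState struct {
	underreplicated  bool // a replica on the dead node still needs replacing
	targetAvailable  bool // some live store can accept the new replica
	leaseInPreferred bool // the leaseholder satisfies the lease preference
}

// upreplicate fails when no store can take the replica.
func upreplicate(r *rangeState) error {
	if !r.targetAvailable {
		return errNoTarget
	}
	r.underreplicated = false
	return nil
}

// transferLeaseToPreferred stands in for moving the lease to a preferred replica.
func transferLeaseToPreferred(r *rangeState) {
	r.leaseInPreferred = true
}

// processShortCircuit mirrors the reported behavior: the replication error
// returns before any lease transfer is considered.
func processShortCircuit(r *rangeState) error {
	if r.underreplicated {
		if err := upreplicate(r); err != nil {
			return err // lease transfer is never reached
		}
	}
	if !r.leaseInPreferred {
		transferLeaseToPreferred(r)
	}
	return nil
}

// processDecoupled still reports the replication error but evaluates the
// lease preference regardless of whether upreplication succeeded.
func processDecoupled(r *rangeState) error {
	var replErr error
	if r.underreplicated {
		replErr = upreplicate(r)
	}
	if !r.leaseInPreferred {
		transferLeaseToPreferred(r)
	}
	return replErr
}

func main() {
	a := &rangeState{underreplicated: true}
	fmt.Println(processShortCircuit(a), a.leaseInPreferred) // prints "no valid upreplication target false"
	b := &rangeState{underreplicated: true}
	fmt.Println(processDecoupled(b), b.leaseInPreferred) // prints "no valid upreplication target true"
}
```

The decoupled variant keeps the replication error visible (so the underreplication stays observable) while no longer letting it gate lease preference enforcement, which is what leaves leases stuck in the non-preferred racks.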
Jira issue: CRDB-29402