kvserver: add setting to use expiration-based leases
This patch adds the experimental cluster setting `kv.expiration_leases_only.enabled`, which uses expiration-based leases for all ranges. The setting is marked as experimental because, while we believe the system will function correctly, it has performance implications that need to be mapped out and optimized.

Expiration-based leases are compelling because they are much more robust than epoch-based leases, and better handle failure modes such as partial/asymmetric network partitions, disk stalls, deadlocks, etc. They require a Raft roundtrip to extend a lease, which ensures that the lease must be functional, while epoch leases only require an RPC request to the liveness leaseholder regardless of whether the lease actually works.

Except for the meta and liveness ranges, expiration leases are only extended when a request is processed in the last half of the lease interval (i.e. in the last 3 seconds of the 6 second lease). Otherwise, we allow the lease to expire. This reduces the cost of idle ranges in large clusters, since we avoid the periodic lease extension writes for every range, and can let the ranges quiesce as usual. However, it incurs a synchronous lease acquisition on the next request to the range.

Because expiration leases incur one Raft write per range per lease extension, as well as a lease acquisition for idle ranges, they currently come with significant performance overhead. In TPC-E experiments at 100,000 customers with various transaction types, p50 latencies increased 5-70%, p99 latencies increased 20-80%, and pMax latencies increased 0-1000%. A kv95 workload on 10,000 ranges with active leases showed a throughput reduction of about 10%, most likely due to the ~3,000 lease extension writes per second.

When the setting is changed, leases are asynchronously switched to the appropriate type (either expiration-based or epoch-based) via the replicate queue. This can take up to the `ScanInterval` to complete, 10 minutes by default (more on nodes with large numbers of ranges).

Epic: none
Release note: None
1 parent a64df6f, commit 04f4284
Showing 7 changed files with 221 additions and 49 deletions.