storage: Lease transfer throttling can lead to bad leaseholder balance after restart #19355
Labels: C-performance (perf of queries or internals; solution not expected to change functional behavior), S-1-stability (severe stability issues that can be fixed by upgrading, but usually don't resolve by restarting)
Repro steps:
The reason all the leases get acquired by the node that the load is hitting is that we don't eagerly acquire leases for ranges; we only take them as required by incoming requests. Since all the incoming requests go to the same node, that node ends up taking all the leases.
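To illustrate that on-demand behavior, here's a minimal sketch (not CockroachDB's actual code; the types and method names are hypothetical) of a replica that only takes a lease when a request forces it to, which is why the node receiving the traffic accumulates the leases:

```go
package main

import "fmt"

// replica is a hypothetical stand-in for a range replica.
type replica struct {
	rangeID  int64
	hasLease bool
}

// requestLease stands in for the work that actually acquires a lease.
func (r *replica) requestLease() error {
	fmt.Printf("range %d: acquiring lease on demand\n", r.rangeID)
	return nil
}

// serve only acquires the lease when an incoming request needs it; there
// is no eager acquisition path, so lease placement follows client traffic.
func (r *replica) serve(req string) error {
	if !r.hasLease {
		if err := r.requestLease(); err != nil {
			return err
		}
		r.hasLease = true
	}
	fmt.Printf("range %d: serving %q\n", r.rangeID, req)
	return nil
}

func main() {
	r := &replica{rangeID: 42}
	_ = r.serve("scan /table/1")
}
```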
We throttle lease transfers to one per second, which means that transferring thousands of leases takes hours; transferring 100k leases would take more than a day. In practice, I've been seeing a steady stream of about 0.9 lease transfers per second on my cluster, with the overloaded node going from 16k leases down to 10.75k in about 95 minutes.
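As a rough sanity check on those numbers (assuming a flat throttle; the constants below are just the figures quoted above), the drain time scales linearly with the number of leases to move:

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	const transfersPerSec = 0.9 // observed effective rate
	excess := []float64{
		16000 - 10750, // leases moved in the ~95 minute window above
		100000,        // the 100k-lease example
	}
	for _, n := range excess {
		d := time.Duration(n/transfersPerSec) * time.Second
		fmt.Printf("%6.0f leases at %.1f/s -> %s\n", n, transfersPerSec, d)
	}
	// Prints roughly 1h37m for the first case and ~30.9h for the second.
}
```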
Overloading all the leaseholder work onto this one node has real repercussions for the performance of the cluster -- my cluster has gained back a couple hundred qps (~5%) as the leases have spread out.
The right approach here may be to go back to one of @petermattis's recurring suggestions: change our rate limiting based on how far we are from the desired state. When one node has significantly more leases than another, don't throttle lease transfers nearly as much as when the nodes are relatively well balanced.
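A minimal sketch of what that adaptive throttle could look like (this is not an existing CockroachDB API; the names and the shape of the scaling function are assumptions for illustration only), where the allowed transfer rate grows with how far the node is above the mean lease count:

```go
package main

import (
	"fmt"
	"time"
)

// transferInterval returns how long to wait between lease transfers off a
// node, based on how far its lease count is above the cluster mean. The 1s
// baseline matches the current fixed throttle; the scaling factor is a guess.
func transferInterval(nodeLeases, meanLeases int) time.Duration {
	const baseline = time.Second
	if meanLeases == 0 || nodeLeases <= meanLeases {
		return baseline
	}
	// A node holding 4x the mean gets to transfer 4x as fast, capped so we
	// never go completely unthrottled.
	factor := float64(nodeLeases) / float64(meanLeases)
	if factor > 100 {
		factor = 100
	}
	return time.Duration(float64(baseline) / factor)
}

func main() {
	fmt.Println(transferInterval(16000, 4000)) // heavily imbalanced: 250ms
	fmt.Println(transferInterval(4200, 4000))  // nearly balanced: ~952ms
	fmt.Println(transferInterval(3900, 4000))  // below the mean: 1s
}
```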
We could also take a simpler approach and just not rate limit lease transfers for the first N seconds that a node is running. I think the adaptive approach above will be beneficial in other scenarios as well, though, such as when a single node in the cluster restarts and we'd like to get leases back onto it more quickly.
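And a sketch of the simpler alternative (again with hypothetical names): skip the throttle entirely for the first N seconds after the store starts, then fall back to the existing fixed rate:

```go
package main

import (
	"fmt"
	"time"

	"golang.org/x/time/rate"
)

// allowTransfer is a hypothetical gate in front of the lease-transfer
// throttle: for the first gracePeriod after the store starts, transfers go
// through unthrottled so a freshly restarted cluster can rebalance quickly;
// afterwards the normal fixed rate limit applies.
func allowTransfer(storeStart time.Time, limiter *rate.Limiter) bool {
	const gracePeriod = 5 * time.Minute // the "first N seconds" knob
	if time.Since(storeStart) < gracePeriod {
		return true
	}
	return limiter.Allow()
}

func main() {
	limiter := rate.NewLimiter(rate.Limit(1), 1) // the existing 1/sec throttle
	fmt.Println(allowTransfer(time.Now(), limiter))
}
```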