Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage: Lease transfer throttling can lead to bad leaseholder balance after restart #19355

Closed
a-robinson opened this issue Oct 18, 2017 · 3 comments
Labels
C-performance Perf of queries or internals. Solution not expected to change functional behavior. S-1-stability Severe stability issues that can be fixed by upgrading, but usually don’t resolve by restarting
Milestone

Comments

@a-robinson
Copy link
Contributor

Repro steps:

  1. Build up a 3 node cluster with a lot (at least thousands) of ranges (as in Severe performance degradation when shutting one of 3 nodes down. #19236).
  2. Stop generating load
  3. Restart all nodes
  4. Direct uniformly distributed load at a single node
  5. Watch as all the leases get acquired by that one node and take hours to distribute more evenly

The reason all the leases get acquired by the node that the load is hitting is because we don't eagerly acquire leases for range, we only take them as required by incoming requests. Since all the incoming requests are going to the same node, that node ends up taking all the leases.

We throttle lease transfers to one per second, which means that transferring thousands of leases literally takes hours. Transferring 100k leases would take more than a day. In practice, I've been seeing a steady stream of 0.9 lease transfers per second on my cluster, going from 16k to 10.75k in about 95 minutes.

Overloading all the leaseholder work onto this one node has real repercussions for the performance of the cluster -- my cluster gained a couple hundred qps (~5%) back as the leases have been spreading out.

The right approach here may be to go back to one of @petermattis's recurring suggestions of changing our rate limiting based on how far away we are from the desired state. When one node has significantly more leases than another, don't throttle lease transfers nearly as much as when the nodes are relatively more balanced.

We could also take a simpler approach, and just not rate limit lease transfers for the first N seconds that a node is running. I think the above approach will be beneficial in other scenarios as well, though, like when just one node in the cluster restarts and we'd like to get leases back onto it more quickly.

@a-robinson a-robinson added C-performance Perf of queries or internals. Solution not expected to change functional behavior. S-1-stability Severe stability issues that can be fixed by upgrading, but usually don’t resolve by restarting labels Oct 18, 2017
@tbg tbg added the A-kv-distribution Relating to rebalancing and leasing. label May 15, 2018
@tbg tbg added this to the 2.1 milestone Jul 22, 2018
@tbg tbg added A-coreperf and removed A-disaster-recovery A-kv-transactions Relating to MVCC and the transactional model. A-kv-distribution Relating to rebalancing and leasing. A-kv-client Relating to the KV client and the KV interface. A-storage Relating to our storage engine (Pebble) on-disk storage. A-kv-replication Relating to Raft, consensus, and coordination. labels Jul 31, 2018
@nvanbenschoten
Copy link
Member

@a-robinson mind updating this issue and mentioning how load-based lease rebalancing plays into this? Is that able to rebalance leases at a higher frequency than once per second? Should it be able to if it notices large load imbalances?

I assume there's nothing more to do here in 2.1, right?

@a-robinson
Copy link
Contributor Author

You're right that load-based lease rebalancing does not obey the 1 second throttling, and in cases like this there's a good chance it'll solve the problem for us. I'll try repro'ing.

@a-robinson
Copy link
Contributor Author

Well look at that:

screen shot 2018-10-01 at 1 06 41 pm

screen shot 2018-10-01 at 1 06 45 pm

The green spikes in the bottom graph are load-based lease transfers.

I think we can consider this complete. Without much load on the system, the leases may take longer to balance, but it also probably won't be affecting performance in most such cases.

Fixed by #28340

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-performance Perf of queries or internals. Solution not expected to change functional behavior. S-1-stability Severe stability issues that can be fixed by upgrading, but usually don’t resolve by restarting
Projects
None yet
Development

No branches or pull requests

3 participants