storage: transfer leases on overfull stores #9465
Conversation
@bdarnell I recall you had concerns about using

Cc @cockroachdb/stability

At least #7996 needs fixing.

From #6929 (comment):
@@ -162,17 +165,31 @@ func (rq *replicateQueue) process(
 	log.Event(ctx, "removing a replica")
 	// We require the lease in order to process replicas, so
 	// repl.store.StoreID() corresponds to the lease-holder's store ID.
-	removeReplica, err := rq.allocator.RemoveTarget(desc.Replicas, repl.store.StoreID())
+	removeReplica, err := rq.allocator.RemoveTarget(desc.Replicas, 0)
This is the only non-test user of the second argument to RemoveTarget, so if we do this, we could remove that argument completely. However, as I said in #9462, we need to have a strong preference for rebalancing ranges for which we don't have the lease, or we reintroduce #5737 and the associated availability problems.
I don't think we'll see the problem from #5737 as we'll never remove the lease holder. Instead, if the allocator wants to remove the lease holder we'll first transfer the lease.
Let me think about whether we could make the preference for removing a non-lease holder stronger.
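For readers following along, here is a rough sketch of the shape of the change under discussion: if the allocator's remove target is the lease holder's own store, transfer the lease to another existing replica instead of removing it, and let a later queue pass do the removal. The helper names (TransferLeaseTarget, AdminTransferLease) are assumptions for this sketch, not necessarily the PR's actual API.

```go
// Fragment in the style of replicateQueue.process; TransferLeaseTarget and
// AdminTransferLease are assumed helper names, not the PR's exact code.
removeReplica, err := rq.allocator.RemoveTarget(desc.Replicas, 0)
if err != nil {
	return err
}
if removeReplica.StoreID == repl.store.StoreID() {
	// The allocator picked the lease holder. We can't remove ourselves while
	// holding the lease, so move the lease to another existing replica; a
	// later queue pass can then remove the replica from this overfull store.
	target := rq.allocator.TransferLeaseTarget(desc.Replicas, repl.store.StoreID())
	if target == 0 {
		return nil // no suitable transfer target right now
	}
	log.Event(ctx, "transferring lease before removal")
	return repl.AdminTransferLease(target)
}
log.Event(ctx, "removing a replica")
```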
 	if removeReplica.StoreID == repl.store.StoreID() {
-		return nil
+		var targetID roachpb.StoreID
+		for _, replica := range desc.Replicas {
Choosing the first non-self replica as the target seems problematic. That store might be overloaded too, and could lead to ping-ponging between the first two stores in the list. We should at least randomize the choice, and ideally choose the most under-loaded one.
Yeah, this was too naive. Finding the least loaded store from the existing replicas (excluding the lease holder) is straightforward, though I'm unable to upload the change right now.
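A minimal, self-contained sketch of that idea — pick the least loaded store among the range's existing replicas, excluding the lease holder — using range count as the load signal. The plain-int types and function name are illustrative only; the real allocator would use roachpb.StoreID and the gossiped store capacities.

```go
package main

import (
	"fmt"
	"math"
)

// leastLoadedTarget picks, from a range's existing replica stores, the store
// with the fewest ranges, excluding the current lease holder.
func leastLoadedTarget(replicaStores []int, leaseHolder int, rangeCount map[int]int) (int, bool) {
	best, bestCount, found := 0, math.MaxInt32, false
	for _, s := range replicaStores {
		if s == leaseHolder {
			continue
		}
		if c, ok := rangeCount[s]; ok && c < bestCount {
			best, bestCount, found = s, c, true
		}
	}
	return best, found
}

func main() {
	// Store 1 is the lease holder; stores 2 and 3 hold the other replicas.
	counts := map[int]int{1: 120, 2: 95, 3: 80}
	if target, ok := leastLoadedTarget([]int{1, 2, 3}, 1, counts); ok {
		fmt.Printf("transfer lease to store %d\n", target) // store 3
	}
}
```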
Reviewed 1 of 1 files at r1.

storage/replicate_queue.go, line 181 at r1 (raw file):
incorrect number of arguments

storage/replicate_queue.go, line 183 at r1 (raw file):
incorrect number of arguments

Comments from Reviewable
You should rebase this to pick up #9442.
Force-pushed from 08b852e to 5973a12.
Review status: 0 of 5 files reviewed at latest revision, 4 unresolved discussions, some commit checks failed.

storage/replicate_queue.go, line 168 at r1 (raw file):
Reviewed 5 of 5 files at r2.

storage/allocator.go, line 327 at r2 (raw file):
what's the point of tracking

storage/client_split_test.go, line 700 at r2 (raw file):
how come these were needed?

storage/replicate_queue.go, line 177 at r2 (raw file):
extremely minor nit: you can blindly return

Comments from Reviewable
Force-pushed from 5973a12 to 0431f13.
Review status: all files reviewed at latest revision, 5 unresolved discussions, some commit checks failed.

storage/allocator.go, line 327 at r2 (raw file):
Reviewed 2 of 2 files at r3.

Comments from Reviewable
Between the production use of TransferLease and the possibility for new unavailability and thrashing, this should probably be post-code-yellow.

Reviewed 3 of 5 files at r2, 2 of 2 files at r3.

storage/allocator.go, line 330 at r3 (raw file):
A store could be overfull compared to the list used in TransferLeaseSource and still be underfull compared to the replicas of this particular range. If

storage/allocator_test.go, line 751 at r3 (raw file):
s/lease loaded/least loaded/ (and below)

storage/replicate_queue.go, line 168 at r1 (raw file):
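A toy illustration (not code from the PR) of the point above: a store can sit above the cluster-wide mean range count and still be the lightest of this particular range's replica stores, so the two notions of "overfull" can disagree.

```go
package main

import "fmt"

// mean returns the average range count across the given stores.
func mean(counts map[int]int, stores []int) float64 {
	total := 0
	for _, s := range stores {
		total += counts[s]
	}
	return float64(total) / float64(len(stores))
}

func main() {
	// Hypothetical per-store range counts.
	counts := map[int]int{1: 150, 2: 140, 3: 130, 4: 40, 5: 40}
	allStores := []int{1, 2, 3, 4, 5}
	replicaStores := []int{1, 2, 3} // this range's replicas all sit on heavy stores

	clusterMean := mean(counts, allStores)   // 100
	rangeMean := mean(counts, replicaStores) // 140

	// Store 3 is overfull relative to the whole cluster...
	fmt.Println(float64(counts[3]) > clusterMean) // true
	// ...yet underfull relative to this range's own replicas, so transferring
	// the lease to it can still be the locally right choice.
	fmt.Println(float64(counts[3]) < rangeMean) // true
}
```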
Force-pushed from 0431f13 to 64e17b7.
I'm now feeling more aggressive about getting this in. Note that the recent restart of gamma more than doubled throughput now that we're using more than just a single node to process traffic. @tschottdorf and @andreimatei have a bit of work to do to productionize lease transfer. I'm not going to merge this PR until that is done.

Review status: all files reviewed at latest revision, 4 unresolved discussions, all commit checks successful.

storage/allocator.go, line 330 at r3 (raw file):
Reviewed 1 of 1 files at r4.

Comments from Reviewable
I think I had an incomplete understanding of the goal here - I thought it was to unblock the rebalance operations to get replicas more evenly distributed, and that we didn't really care about balancing leadership except as necessary to unblock the rebalancing. If we want to balance leadership (and see it improving performance), then it does seem more important, although I no longer think the replicate queue is the right place to do this (unless we go all the way to giving the replicate queue multiple target metrics to optimize). Instead, we could make balancer-like decisions in redirectOnOrAcquireLeaderLease.

Review status: all files reviewed at latest revision, 2 unresolved discussions, some commit checks failed.

storage/allocator.go, line 330 at r3 (raw file):
Force-pushed from 64e17b7 to 511c106.
Hmm, let me take a deeper look at your suggestion. This PR started with the goal of unblocking rebalancing. The effect on performance was not considered (though it could have been predicted).

Review status: 4 of 6 files reviewed at latest revision, 3 unresolved discussions, some commit checks pending.

storage/replicate_queue.go, line 168 at r1 (raw file):
Bouncing lease renewals in

Review status: 4 of 6 files reviewed at latest revision, 3 unresolved discussions, some commit checks pending.

Comments from Reviewable
Force-pushed from 511c106 to 008d728.
If we're acquiring a fresh lease in
Ok, the actual mechanism of transferring leases in

A smaller practical problem with adding lease holder rebalancing to

Lastly, we seem to have some sort of performance hiccup when transferring leases. With the current state of this PR I sometimes see 5-6 sec query times that correspond with a lease being transferred. These are very reliably reproduced in a 2-3 min
I'd like to add my 2c here: it seems to me that, if the replicate_queue decides it wants to rebalance a range to optimize whatever metric it's trying to optimize, it should always have the ability to do so. Relying on the

Also, dist-sql is hoping to be able to control who becomes the lease holder for ranges that don't have an active lease by just sending requests to them. If
@petermattis I think for balancing lease ownership (whether we do it when acquiring the lease or in the rebalance queue) we just have to allow a large enough deviation from the mean that we won't thrash too much in the interval between store capacity gossips. We could also use some very crude heuristics to avoid the "unable to rebalance anything at all" case: refuse to acquire or renew any more leases when we hold the lease on 80% of our replicas. This doesn't require any cross-node communication.

@andreimatei I don't see how distsql would ever be able to completely control lease placement - what if two distsql transactions have different plans? Please leave final decisions about replica and lease placement to the lower levels and limit distsql's involvement to providing hints when the lease is idle.
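A minimal sketch of the crude local heuristic described above — decline to acquire or renew leases once this store already holds the lease for roughly 80% of its replicas — which needs only locally known counts and no cross-node communication. The function name and threshold are illustrative, not from the PR.

```go
package main

import "fmt"

// shouldDeclineLease implements the crude local heuristic: if this store
// already holds leases for at least maxFraction of its replicas, decline to
// acquire or renew additional leases so other stores can pick them up.
func shouldDeclineLease(leasesHeld, replicaCount int, maxFraction float64) bool {
	if replicaCount == 0 {
		return false
	}
	return float64(leasesHeld)/float64(replicaCount) >= maxFraction
}

func main() {
	// A store with 100 replicas that already holds 85 leases would decline.
	fmt.Println(shouldDeclineLease(85, 100, 0.8)) // true
	fmt.Println(shouldDeclineLease(40, 100, 0.8)) // false
}
```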
Using redirectOnOrAcquireLeaderLease for this purpose right now would
Force-pushed from 008d728 to 0356ff3.
@petermattis you said you sometimes see a performance hiccup after a LeaseTransfer. Maybe it's #8816.
Yes, perhaps. My hiccups occur right after transferring a lease. We seem to be reproposing raft commands but not making progress because there is no raft leader. It's not clear how we got into that state. The one instance I've looked at so far shows that we transferred the lease for the RHS of a split immediately after the split. I'm wondering if the campaigning after a split is interacting with the raft leadership transfer and a lease transfer. I'm adding some more debug logging and should know more soon.
Note that if #8816 is indeed the problem, I have a PR out to improve the state of things. It was waiting for the code to decolor.
@andreimatei where is this PR you speak of?
#8837, linked from the bug.
Remove the hack to campaign the RHS on one and only one of the nodes. The hack is no longer necessary now that we're campaigning idle ranges when the first proposal is received. By removing the hack, we're effectively leaving the RHS of a split as a lazily loaded Raft group. Tweaked multiTestContext so that it allows eager campaigning of idle replicas immediately upon startup. Discovered while investigating performance hiccups in cockroachdb#9465.
Remove the hack to campaign the RHS on one and only one of the nodes. The hack is no longer necessary now that we're campaigning idle ranges when the first proposal is received. By removing the hack, we're effectively leaving the RHS of a split as a lazily loaded Raft group. Tweaked Store.Bootstrap so that it allows eager campaigning of idle replicas immediately upon startup. Discovered while investigating performance hiccups in cockroachdb#9465.
If the remove-target is the lease holder, transfer the lease to another store to allow removal of the replica from the overfull store. Fixes cockroachdb#9462.
Force-pushed from 9b0203d to 813bace.
Closing this PR. I'll be pulling out pieces into separate PRs and more closely following the approach outlined in #10262.
If the remove-target is the lease holder, transfer the lease to another
store to allow removal of the replica from the overfull store.
Fixes #9462.