storage: don't campaign the RHS after splits #9550
Conversation
LGTM 🐶
// Raft processing is initialized lazily; issue a no-op write request to
// ensure that the Raft group has been started.
incArgs := incrementArgs(key2, 0)
if _, err := client.SendWrapped(rg2(store0), nil, &incArgs); err != nil {
for overall style: pass context.Background() (I'm assuming this is a context.Context arg)
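For reference, the suggestion applied to the quoted snippet would look roughly like the following. This is a sketch under the reviewer's own assumption that the second argument of client.SendWrapped is a context.Context; incrementArgs, rg2, store0, and t come from the surrounding test, and an import of "context" would be needed.

```go
incArgs := incrementArgs(key2, 0)
// Pass an explicit context instead of nil, per the style suggestion.
if _, err := client.SendWrapped(rg2(store0), context.Background(), &incArgs); err != nil {
	t.Fatal(err)
}
```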
Well, this was copied from elsewhere in this file. Someone should clean up all the calls to SendWrapped:
~/Development/go/src/github.com/cockroachdb/cockroach pmattis/dont-campaign-after-split git grep 'SendWrapped(' | wc -l
214
return
}
- } else if len(split.RightDesc.Replicas) == 1 {
+ if len(split.RightDesc.Replicas) == 1 {
// TODO(peter): In single-node clusters, we enqueue the right-hand side of
// the split (the new range) for Raft processing so that the corresponding
// Raft group is created. This shouldn't be necessary for correctness, but
Is this part still true? Can the test just touch the RHS?
It is still true that TestStatusSummaries fails without it. And I just tried to fix that test again with only partial success. I really need to look into this further, but not for this PR.
Sounds good to me.
Filed #9553 for you.
The campaign has led to some interference, but it's actually generally useful. The first RPC would otherwise have to wait for twice as many round trips as usual (one for the election, then one for the actual proposal). I still think this is worth removing for increased clarity and to reduce the surface these election checkmates can take up.
LGTM
Force-pushed from 09fd6d8 to 4d8123a
Remove the hack to campaign the RHS on one and only one of the nodes. The hack is no longer necessary now that we're campaigning idle ranges when the first proposal is received. By removing the hack, we're effectively leaving the RHS of a split as a lazily loaded Raft group. Tweaked Store.Bootstrap so that it allows eager campaigning of idle replicas immediately upon startup. Discovered while investigating performance hiccups in cockroachdb#9465.
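For readers following along, the mechanism this commit relies on ("campaigning idle ranges when the first proposal is received") can be pictured roughly as below. This is a minimal sketch against the etcd/raft RawNode API; the replica type, its fields, and the campaigned flag are assumptions for illustration, not CockroachDB's actual code.

```go
package example

import "github.com/coreos/etcd/raft"

// replica stands in for the real Replica type; only what the sketch needs
// is shown here.
type replica struct {
	raftGroup  *raft.RawNode
	campaigned bool
}

// propose campaigns a lazily created, leaderless Raft group (such as the
// RHS of a split) before submitting its first proposal, so the request
// doesn't stall waiting for an election timeout.
func (r *replica) propose(data []byte) error {
	if !r.campaigned && r.raftGroup.Status().Lead == raft.None {
		if err := r.raftGroup.Campaign(); err != nil {
			return err
		}
		r.campaigned = true
	}
	return r.raftGroup.Propose(data)
}
```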
LGTM. Was this causing the performance hiccups in #9465 or just something you noticed while investigating how splits work?
Yes, this was causing performance hiccups in #9465. I definitely saw contested elections due to the RHS of a split campaigning concurrently with a lease transfer.
Hmm, it seems problematic if the rebalancer is picking up ranges so soon after a split that they haven't completed their initial campaign. Rebalancer-initiated lease transfers could conflict with other elections too. Maybe we should not transfer leases of ranges whose leadership is not determined yet.
Yeah, that's a good idea. I also saw instances in testing where we added a node for rebalancing, then split the range (which contained 4 nodes) and transferred the lease for the RHS of the split in order to down-replicate. We don't rebalance ranges that need splitting. Perhaps we shouldn't split ranges that need down-replication.
I'd rather not introduce more dependencies between queues without stronger justification. I was suggesting something simpler: #9465 tries to transfer the lease whenever it wants to move a range and we are the lease holder. I propose we amend that condition to require that we hold the lease and we know the raft leader (since acquiring a lease requires a raft leader to exist, the main time we'd hold a lease without knowing the leader is immediately after a split, since the RHS lease is pre-seeded from the LHS). If we don't know the raft leader, skip the range and try again after either the leader election has resolved or the lease has expired.
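A hedged sketch of that condition, with all names invented for illustration (this is not the rebalancer's actual code):

```go
package example

import "github.com/coreos/etcd/raft"

// shouldTransferLease returns false unless we both hold the lease and know
// the current Raft leader, skipping ranges (such as the freshly split RHS,
// whose lease is pre-seeded from the LHS) where no leader has been elected.
func shouldTransferLease(holdsLease bool, status *raft.Status) bool {
	if !holdsLease {
		return false
	}
	if status == nil || status.Lead == raft.None {
		// Retry later, once the election resolves or the lease expires.
		return false
	}
	return true
}
```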
Fair enough. The two queues with the strongest dependencies are
Possibly, especially once we have merging (which is going to have to be even more tightly integrated with replication). But I'd be opposed to using that integration to introduce a rule like "don't split a range that needs to be downreplicated". The problem here is not really related to replication, it's just that you only want to transfer leadership when you can be confident that that transfer will be minimally disruptive, which means that the current leadership must be stable.
Re-reading what I wrote, I see my suggestion was mildly confusing. I wasn't proposing introducing the rule "don't split a range that needs to be down-replicated" in order to solve the problem with lease transfers of ranges without a leader. I was noting that splitting was happening on ranges that needed down-replication, which was mildly surprising and perhaps something we should avoid (it is doing work on a replica that will very soon be removed), completely independently of the bad interaction the lease transfer has on the RHS of a newly split (and leader-less) range.
The test prohibited lease requests for the RHS of the split, but in fact such requests can happen if the test runs for long enough (and the lease installed by the split expires). This was exacerbated by cockroachdb#9550, which for this test (but likely in some generality) means that the RHS of the split does not campaign eagerly and locks up for the duration of a Raft election trigger.
Extend the change in cockroachdb#9550 to allow eagerly campaigning replicas after startup to include clusters created via TestCluster. Fixes cockroachdb#10160
Remove the hack to campaign the RHS on one and only one of the
nodes. The hack is no longer necessary now that we're campaigning idle
ranges when the first proposal is received. By removing the hack, we're
effectively leaving the RHS of a split as a lazily loaded Raft
group. Tweaked multiTestContext so that it allows eager campaigning of
idle replicas immediately upon startup.
Discovered while investigating performance hiccups in #9465.
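The "eager campaigning of idle replicas immediately upon startup" tweak can be pictured as a small gate like the one below. This is purely an illustrative sketch under the assumption that campaigning of idle replicas is normally held off for a while after a store starts; every name here is made up, not the real Store or multiTestContext code.

```go
package example

import "time"

// campaignGate decides whether a store may campaign its idle replicas.
// Test harnesses (like the multiTestContext mentioned above) would set
// campaignEagerly to skip the assumed post-startup hold-off.
type campaignGate struct {
	startTime       time.Time
	holdOff         time.Duration // assumed delay after startup
	campaignEagerly bool
}

func (g *campaignGate) canCampaignIdleReplica(now time.Time) bool {
	return g.campaignEagerly || now.Sub(g.startTime) >= g.holdOff
}
```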