storage: work around can't-swap-leaseholder #40363
Conversation
Reviewed 1 of 1 files at r1, 1 of 1 files at r2, 1 of 1 files at r3.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @tbg)
pkg/storage/replicate_queue.go, line 515 at r3 (raw file):
// likely to be the leaseholder), then this removal would fail. Instead, this
// method will attempt to transfer the lease away, and returns true to indicate
// to the caller that it should not pursue the current replication change further.
"because it is no longer the leaseholder"
pkg/storage/replicate_queue.go, line 739 at r3 (raw file):
// only, which should succeed, and the next time we touch this
// range, we will have one more replica and hopefully it will
// take the lease and remove the current leaseholder.
I'm surprised that this case doesn't hit an error when it calls maybeTransferLeaseAway. Could you mention what we expect to happen when you call that?
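As a minimal sketch of the add-only fallback that the quoted comment describes (under the assumption that the removal target is the sole, leaseholder replica), the toy below uses hypothetical names such as `rangeState` and `rebalanceOnce`; it is not the replicate queue's API.

```go
package main

import "fmt"

// rangeState is a toy model of a range: the stores holding a replica and the
// current leaseholder.
type rangeState struct {
	replicas    []int
	leaseholder int
}

// rebalanceOnce sketches the decision described above: attempt an atomic
// add+remove ("swap"), but if the removal target is the leaseholder and there
// is no other replica that could take the lease (replication factor one),
// fall back to adding the new replica only and requeue the range.
func rebalanceOnce(r *rangeState, addTarget, removeTarget int) (requeue bool) {
	if removeTarget == r.leaseholder && len(r.replicas) == 1 {
		r.replicas = append(r.replicas, addTarget)
		fmt.Printf("added s%d only; range is over-replicated, requeueing\n", addTarget)
		// Next time the range is touched, there is one more replica which
		// will hopefully take the lease and remove the current leaseholder.
		return true
	}
	// Otherwise the swap can be issued as a single atomic change.
	out := r.replicas[:0]
	for _, id := range r.replicas {
		if id != removeTarget {
			out = append(out, id)
		}
	}
	r.replicas = append(out, addTarget)
	fmt.Printf("swapped s%d for s%d atomically\n", removeTarget, addTarget)
	return false
}

func main() {
	r := &rangeState{replicas: []int{1}, leaseholder: 1}
	fmt.Println("requeue:", rebalanceOnce(r, 2, 1)) // replication factor one: add-only path
}
```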
There may be nothing to roll back, so don't log unconditionally.

Release note: None
This was showing up a lot in TestInitialPartitioning. If we're trying to remove something but nothing needs to be removed, that seems OK (though there is some question of why we're hitting this regularly).

Release note: None
As of cockroachdb#40284, the replicate queue was issuing swaps (atomic add+remove) during rebalancing. TestInitialPartitioning helpfully points out (once you flip atomic rebalancing on) that when the replication factor is one, there is no way to perform such an atomic swap because it will necessarily have to remove the leaseholder.

To work around this restriction (which, by the way, we dislike - see cockroachdb#40333), fall back to just adding a replica in this case without also removing one. In the next scanner cycle (which should happen immediately since we requeue the range) the range will be over-replicated and hopefully the lease will be transferred over and then the original leaseholder removed. I would be very doubtful that this all works, but it is how things worked until cockroachdb#40284, so this PR really just falls back to the previous behavior in cases where we can't do better.

Release note: None
TFTR!
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @nvanbenschoten)
bors r=nvanbenschoten
40363: storage: work around can't-swap-leaseholder r=nvanbenschoten a=tbg

As of #40284, the replicate queue was issuing swaps (atomic add+remove) during rebalancing. TestInitialPartitioning helpfully points out (once you flip atomic rebalancing on) that when the replication factor is one, there is no way to perform such an atomic swap because it will necessarily have to remove the leaseholder.

To work around this restriction (which, by the way, we dislike - see #40333), fall back to just adding a replica in this case without also removing one. In the next scanner cycle (which should happen immediately since we requeue the range) the range will be over-replicated and hopefully the lease will be transferred over and then the original leaseholder removed. I would be very doubtful that this all works, but it is how things worked until #40284, so this PR really just falls back to the previous behavior in cases where we can't do better.

Release note: None

Co-authored-by: Tobias Schottdorf <[email protected]>
Build succeeded
40370: storage: prepare for kv.atomic_replication_changes=true r=nvanbenschoten a=tbg

First three commits are #40363.

----

This PR enables atomic replication changes by default. But most of it is just dealing with the fallout of doing so:

1. we don't handle removal of multiple learners well at the moment. This will be fixed more holistically in #40268, but it's not worth waiting for that because it's easy for us to just avoid the problem.
2. tests that carry out splits become quite flaky because at the beginning of a split, we transition out of a joint config if we see one, and due to the initial upreplication we often do. If we lose the race against the replicate queue, the split catches an error for no good reason. I took this as an opportunity to refactor the descriptor comparisons and to make this specific case a noop, but making it easier to avoid this general class of conflict where it's avoidable in the future.

There are probably some more problems that will only become apparent over time, but it's quite simple to turn the cluster setting off again and to patch things up if we do.

Release note (general change): atomic replication changes are now enabled by default.

Co-authored-by: Tobias Schottdorf <[email protected]>
As of #40284, the replicate queue was issuing swaps (atomic add+remove)
during rebalancing. TestInitialPartitioning helpfully points out (once you
flip atomic rebalancing on) that when the replication factor is one, there
is no way to perform such an atomic swap because it will necessarily have
to remove the leaseholder.
To work around this restriction (which, by the way, we dislike - see
#40333), fall back to just adding a replica in this case without also
removing one. In the next scanner cycle (which should happen immediately
since we requeue the range) the range will be over-replicated and hopefully
the lease will be transferred over and then the original leaseholder
removed. I would be very doubtful that this all works, but it is how things
worked until #40284, so this PR really just falls back to the previous
behavior in cases where we can't do better.
Release note: None
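To make the intended sequence concrete, here is a toy end-to-end simulation of the passes described above: add-only, then lease transfer, then removal of the original leaseholder. Everything here is illustrative; `rng`, `scanPass`, and the store IDs are made up, and in the real system the final pass runs on the new leaseholder's store rather than in the same loop.

```go
package main

import "fmt"

type rng struct {
	replicas    map[int]bool // store ID -> holds a replica
	leaseholder int
}

// scanPass models one pass of the replicate queue over the range while it is
// being rebalanced from removeTarget to addTarget. It returns true when the
// range should be requeued for another pass.
func scanPass(r *rng, addTarget, removeTarget int) bool {
	switch {
	case removeTarget == r.leaseholder && len(r.replicas) == 1:
		// Pass 1: the swap would remove the sole replica, which holds the
		// lease, so fall back to adding the new replica only.
		r.replicas[addTarget] = true
		fmt.Println("pass 1: added replica, range over-replicated, requeue")
		return true
	case removeTarget == r.leaseholder:
		// Pass 2: still the leaseholder, so transfer the lease away rather
		// than removing ourselves, and stop pursuing the change here.
		for id := range r.replicas {
			if id != r.leaseholder {
				r.leaseholder = id
				break
			}
		}
		fmt.Printf("pass 2: transferred lease to s%d, requeue\n", r.leaseholder)
		return true
	default:
		// Pass 3 (on the new leaseholder): remove the over-replicated
		// original replica, restoring the target replication factor.
		delete(r.replicas, removeTarget)
		fmt.Printf("pass 3: removed s%d\n", removeTarget)
		return false
	}
}

func main() {
	r := &rng{replicas: map[int]bool{1: true}, leaseholder: 1}
	for requeue := true; requeue; {
		requeue = scanPass(r, 2, 1) // rebalance the range from s1 to s2
	}
	fmt.Printf("done: leaseholder=s%d, replicas=%v\n", r.leaseholder, r.replicas)
}
```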