release-22.2.0: kv: bypass lease transfer safety checks during joint consensus #89621

nvanbenschoten · 2022-10-09T19:27:27Z

Backport 3/3 commits from #89340.

/cc @cockroachdb/release

This commit adds logic to bypass lease transfer safety checks (added in 034611b) when in a joint configuration and transferring the lease from a VOTER_DEMOTING to a VOTER_INCOMING. We do so because we could get stuck without a path to exit the joint configuration if we rejected this lease transfer while waiting to confirm that the target is up-to-date on its log. That confirmation may never arrive if the target is dead or partitioned away, and while we'd rather not transfer the lease to a dead node, at least we have a mechanism to recover from that state. We also just sent the VOTER_INCOMING a snapshot (as a LEARNER, before promotion), so it is unlikely that the replica is actually dead or behind on its log.
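
The bypass condition described above can be sketched roughly as follows. This is a simplified illustration, not the code in this PR: `ReplicaType`, `bypassSafetyChecks`, `checkLeaseTransfer`, and `errTargetNotConfirmed` are hypothetical names, and the real safety check is reduced to a single boolean.

```go
package main

import "errors"

// ReplicaType is a simplified, hypothetical stand-in for a replica's role
// in a range descriptor.
type ReplicaType int

const (
	VoterFull ReplicaType = iota
	VoterIncoming
	VoterDemoting
	Learner
)

// errTargetNotConfirmed is a hypothetical error returned when the safety
// check cannot confirm that the target is caught up on its log.
var errTargetNotConfirmed = errors.New("lease transfer rejected: target not confirmed up-to-date on its log")

// bypassSafetyChecks reports whether the transfer is the one special case
// described above: a joint configuration in which the lease moves from a
// VOTER_DEMOTING to a VOTER_INCOMING.
func bypassSafetyChecks(inJointConfig bool, from, to ReplicaType) bool {
	return inJointConfig && from == VoterDemoting && to == VoterIncoming
}

// checkLeaseTransfer returns nil if the transfer may proceed. The parameter
// targetConfirmedCaughtUp models the outcome of the safety check (an
// assumption made for this sketch).
func checkLeaseTransfer(inJointConfig bool, from, to ReplicaType, targetConfirmedCaughtUp bool) error {
	if bypassSafetyChecks(inJointConfig, from, to) {
		// Rejecting here could leave the range stuck in the joint config.
		return nil
	}
	if !targetConfirmedCaughtUp {
		return errTargetNotConfirmed
	}
	return nil
}
```

In this sketch, every other transfer, including transfers inside a joint configuration that do not go from the demoting voter to the incoming voter, still has to pass the safety check.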

A better alternative here would be to introduce a mechanism to choose an alternate lease transfer target after some amount of time, if the lease transfer to the VOTER_INCOMING cannot be confirmed to be safe. We may do this in the future, but given the proximity to the release and given that this matches the behavior in v22.1, we choose this approach for now.

Release note: None

Release justification: Needed to resolve release blocker.

Fixes: #88667
See also #89564

Also simplify a bit of logic.

This commit adds a new test which ensures that the lease transfer performed during a joint config replication change that is replacing the existing leaseholder does not get stuck even if the existing leaseholder cannot prove that the incoming leaseholder is caught up on its log. It does so by killing the incoming leaseholder before it receives the lease and ensuring that the range is able to exit the joint configuration. Currently, the range exits by bypassing safety checks during the lease transfer, sending the lease to the dead incoming voter, letting the lease expire, acquiring the lease on one of the non-demoting voters, and exiting. The details here may change in the future, but the goal of this test will not.
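
The exit sequence the test exercises can be modelled with a toy simulation like the one below. It is a sketch under heavy simplifying assumptions, not the test added in this PR: the `replica` type, the role strings, and `simulateLeaveJoint` are all hypothetical, and lease expiration is treated as instantaneous.

```go
package main

// replica is a simplified, hypothetical model of a voter in a joint config.
type replica struct {
	id   string
	role string // "full", "demoting", or "incoming"
	live bool
}

// simulateLeaveJoint walks through the sequence described above: the lease
// is sent to the incoming voter with safety checks bypassed; if that voter
// is dead, the lease expires and a live non-demoting voter acquires it; the
// range then exits the joint configuration.
func simulateLeaveJoint(replicas []replica) (steps []string, leaseholder string, exited bool) {
	var incoming, fallback *replica
	for i := range replicas {
		r := &replicas[i]
		switch {
		case r.role == "incoming":
			incoming = r
		case r.role == "full" && r.live && fallback == nil:
			fallback = r
		}
	}
	if incoming == nil {
		return steps, "", false
	}
	steps = append(steps, "transfer lease to "+incoming.id+" (safety checks bypassed)")
	holder := incoming
	if !incoming.live {
		steps = append(steps, "lease on "+incoming.id+" expires")
		if fallback == nil {
			// No live non-demoting voter can take over in this model.
			return steps, "", false
		}
		steps = append(steps, fallback.id+" acquires lease")
		holder = fallback
	}
	steps = append(steps, "exit joint configuration")
	return steps, holder.id, true
}
```

With a dead incoming voter and at least one live full voter, the model ends with the lease on that full voter and the joint configuration exited, which mirrors the outcome the test asserts.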

blathers-crl · 2022-10-09T19:27:30Z

cockroach-teamcity · 2022-10-09T19:27:39Z

nvanbenschoten added 3 commits October 9, 2022 12:26

kv: improve logging around maybeTransferLeaseDuringLeaveJoint

fc78a8a

Also simplify a bit of logic.

nvanbenschoten requested a review from shralex October 9, 2022 19:27

nvanbenschoten requested a review from a team as a code owner October 9, 2022 19:27

shralex approved these changes Oct 9, 2022

nvanbenschoten merged commit 6ba7a6b into cockroachdb:release-22.2.0 Oct 10, 2022

This was referenced Oct 10, 2022

roachtest: import/nodeShutdown/coordinator failed #89626

Closed

roachtest: restore/nodeShutdown/worker failed #88469

Closed

roachtest: restore/nodeShutdown/coordinator failed #85879

Closed

nvanbenschoten deleted the backport22.2.0-89340 branch October 17, 2022 18:56
