-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
release-22.2.0: kv: bypass lease transfer safety checks during joint consensus #89621
Merged
nvanbenschoten
merged 3 commits into
cockroachdb:release-22.2.0
from
nvanbenschoten:backport22.2.0-89340
Oct 10, 2022
Merged
release-22.2.0: kv: bypass lease transfer safety checks during joint consensus #89621
nvanbenschoten
merged 3 commits into
cockroachdb:release-22.2.0
from
nvanbenschoten:backport22.2.0-89340
Oct 10, 2022
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Also simplify a bit of logic.
Fixes cockroachdb#88667. This commit adds logic to bypass lease transfer safety checks (added in 034611b) when in a joint configuration and transferring the lease from a VOTER_DEMOTING to a VOTER_INCOMING. We do so because we could get stuck without a path to exit the joint configuration if we rejected this lease transfer while waiting to confirm that the target is up-to-date on its log. That confirmation may never arrive if the target is dead or partitioned away, and while we'd rather not transfer the lease to a dead node, at least we have a mechanism to recovery from that state. We also just sent the VOTER_INCOMING a snapshot (as a LEARNER, before promotion), so it is unlikely that the replica is actually dead or behind on its log. A better alternative here would be to introduce a mechanism to choose an alternate lease transfer target after some amount of time, if the lease transfer to the VOTER_INCOMING cannot be confirmed to be safe. We may do this in the future, but given the proximity to the release and given that this matches the behavior in v22.1, we choose this approach for now. Release note: None Release justification: Needed to resolve release blocker.
This commit adds a new test which ensures that the lease transfer performed during a joint config replication change that is replacing the existing leaseholder does not get stuck even if the existing leaseholder cannot prove that the incoming leaseholder is caught up on its log. It does so by killing the incoming leaseholder before it receives the lease and ensuring that the range is able to exit the joint configuration. Currently, the range exits by bypassing safety checks during the lease transfer, sending the lease to the dead incoming voter, letting the lease expire, acquiring the lease on one of the non-demoting voters, and exiting. The details here may change in the future, but the goal of this test will not.
Thanks for opening a backport. Please check the backport criteria before merging:
If some of the basic criteria cannot be satisfied, ensure that the exceptional criteria are satisfied within.
Add a brief release justification to the body of your PR to justify this backport. Some other things to consider:
|
shralex
approved these changes
Oct 9, 2022
This was referenced Oct 10, 2022
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Backport 3/3 commits from #89340.
/cc @cockroachdb/release
This commit adds logic to bypass lease transfer safety checks (added in 034611b) when in a joint configuration and transferring the lease from a VOTER_DEMOTING to a VOTER_INCOMING. We do so because we could get stuck without a path to exit the joint configuration if we rejected this lease transfer while waiting to confirm that the target is up-to-date on its log. That confirmation may never arrive if the target is dead or partitioned away, and while we'd rather not transfer the lease to a dead node, at least we have a mechanism to recovery from that state. We also just sent the VOTER_INCOMING a snapshot (as a LEARNER, before promotion), so it is unlikely that the replica is actually dead or behind on its log.
A better alternative here would be to introduce a mechanism to choose an alternate lease transfer target after some amount of time, if the lease transfer to the VOTER_INCOMING cannot be confirmed to be safe. We may do this in the future, but given the proximity to the release and given that this matches the behavior in v22.1, we choose this approach for now.
Release note: None
Release justification: Needed to resolve release blocker.
Fixes: #88667
See also #89564