Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

release-22.2.0: kvserver: don't allow VOTER_DEMOTING to acquire lease after transfer #89595

Merged
merged 1 commit into from
Oct 9, 2022

Conversation

blathers-crl[bot]
Copy link

@blathers-crl blathers-crl bot commented Oct 8, 2022

Backport 1/1 commits from #89564 on behalf of @shralex.

/cc @cockroachdb/release


This PR restricts the case when a VOTER_DEMOTING_LEARNER can aquire the lease in a joint configuration to only the case where it was the last leaseholder. Since it is being removed, we only want it to get the lease if no other replica can aquire it, the scenario described in #83687

This fix solves a potential starvation scenario where a VOTER_DEMOTING_LEARNER keeps transferring the lease to the VOTER_INCOMING, succeeding, but then re-acquiring because the VOTER_INCOMING is dead and the lease expires. In this case, we would want another replica to pick up the lease, which would allow us to exit the joint configuration.

This PR also removes the leaseHolderRemovalAllowed parameter of CheckCanReceiveLease,
since it is always true since 22.2.

Release note (bug fix): narrows down the conditions under which a VOTER_DEMOTING_LEARNER can acquire the lease in a joint configuration to a) there has to be an VOTER_INCOMING in the configuration and b) the VOTER_DEMOTING_LEARNER was the last leaseholder. This prevents it from acquiring the lease unless it is the only one that can acquire it, because transferring the lease away is necessary before exiting the joint config (without the fix the system can be stuck in a joint configuration in some rare situations).

Release justification: solves a serious bug.

Fixes: #88667
See also #89340


Release justification:

This PR restricts the case when a VOTER_DEMOTING_LEARNER can
aquire the lease in a joint configuration to only the case where
it was the last leaseholder. Since it is being removed, we only
want it to get the lease if no other replica can aquire it,
the scenario described in #83687

This fix solves a potential starvation scenario where a VOTER_DEMOTING_LEARNER keeps
transferring the lease to the VOTER_INCOMING, succeeding, but then re-acquiring
because the VOTER_INCOMING is dead and the lease expires. In this case, we would
want another replica to pick up the lease, which would allow us to exit the joint configuration.

This PR also removes the leaseHolderRemovalAllowed parameter of CheckCanReceiveLease,
since it is always true since 22.2.

Release note (bug fix): narrows down the conditions under which a VOTER_DEMOTING_LEARNER
can acquire the lease in a joint configuration to a) there has to be an VOTER_INCOMING
in the configuration and b) the VOTER_DEMOTING_LEARNER was the last leaseholder. This
prevents it from acquiring the lease unless it is the only one that can acquire it,
because transferring the lease away is necessary before exiting the joint config (without
the fix the system can be stuck in a joint configuration in some rare situations).

Fixes: #88667
See also #89340
@blathers-crl blathers-crl bot requested a review from a team as a code owner October 8, 2022 02:27
@blathers-crl blathers-crl bot force-pushed the blathers/backport-release-22.2.0-89564 branch from 042a320 to c406fca Compare October 8, 2022 02:27
@blathers-crl
Copy link
Author

blathers-crl bot commented Oct 8, 2022

Thanks for opening a backport.

Please check the backport criteria before merging:

  • Patches should only be created for serious issues or test-only changes.
  • Patches should not break backwards-compatibility.
  • Patches should change as little code as possible.
  • Patches should not change on-disk formats or node communication protocols.
  • Patches should not add new functionality.
  • Patches must not add, edit, or otherwise modify cluster versions; or add version gates.
If some of the basic criteria cannot be satisfied, ensure that the exceptional criteria are satisfied within.
  • There is a high priority need for the functionality that cannot wait until the next release and is difficult to address in another way.
  • The new functionality is additive-only and only runs for clusters which have specifically “opted in” to it (e.g. by a cluster setting).
  • New code is protected by a conditional check that is trivial to verify and ensures that it only runs for opt-in clusters.
  • The PM and TL on the team that owns the changed code have signed off that the change obeys the above rules.

Add a brief release justification to the body of your PR to justify this backport.

Some other things to consider:

  • What did we do to ensure that a user that doesn’t know & care about this backport, has no idea that it happened?
  • Will this work in a cluster of mixed patch versions? Did we test that?
  • If a user upgrades a patch version, uses this feature, and then downgrades, what happens?

@blathers-crl blathers-crl bot added blathers-backport This is a backport that Blathers created automatically. O-robot Originated from a bot. labels Oct 8, 2022
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@shralex
Copy link
Contributor

shralex commented Oct 9, 2022

Failures seem unrelated. TestMultiRangeScanWithPagination consistently passes locally.

@shralex shralex merged commit 776e9f8 into release-22.2.0 Oct 9, 2022
@shralex shralex deleted the blathers/backport-release-22.2.0-89564 branch October 9, 2022 00:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
blathers-backport This is a backport that Blathers created automatically. O-robot Originated from a bot.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants