Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

release-22.1: kvcoord: optionally update range cache with descriptor returned in NLHE #98798

Closed

Conversation

arulajmani
Copy link
Collaborator

Backport 4/4 commits from #85503.

/cc @cockroachdb/release


We recently started shipping the range descriptors back on
NotLeaseHolderError. Prior to this patch, we only ever used their
generation to elide certain lease updates if they originated from a
replica that had an older (stale) view of this world.

This patch goes a step further in making use of the returned range
descriptor -- we now update the client's range cache if the returned
range descriptor is newer than what existed on the client. We do this
by picking the freshest range descriptor and lease, independently, from
the range descriptor/lease between what exists in the client's range
cache and what was returned in the NLHE. Though unlikely, independently
choosing the freshest lease/range descriptor can lead to cases where the
lease is not compatible with the range descriptor. If we detect this to
be the case, we empty out the lease and simply cache the freshest range
descriptor.

Previously, we always accepted speculative leases to be more recent than
anything already in the cache. Now, we discard speculative leases if
they're coming from a replica that has an older view of the world.
Morally, these semantics are the right way to conceptualize #72772,
given that hazard could only ever exist for speculative leases. #82802
also falls out as a special case of this conceptualization, so closing
that issue simply entails adding a regression test.

We also get rid of optimizations where we would try to identify "stale"
range descriptors when updating lease information. These no longer make
sense given we don't just update the lease, we also update the range
descriptor. These optimizations also didn't account for all possible
cases, such as when the leaseholder was present on the range
descriptor, but as a LEARNER. See #75742 for more details about how
this hazard manifests.

With this new approach we also address #82802. Instead of invalidating
the routing when a descriptor is identified as stale and using that as
a proxy to bail early when routing to replicas, we instead switch to a
more direct approach -- if at any point we detect the leaseholder isn't
included on the transport, we exit the routing logic early, and retry
at a layer above. Given our new range cache update semantics using
descriptors on NLHE errors, it's quite likely this retry circumvents a
range descriptor lookup -- instead, we'd expect this to simply amount to
trying again with a newly constructed transport. The rationale is that
even though the client may have an arbitrarily stale view of the range,
we never expect it to regress. Thus, if the leaseholder was updated on
the client and it doesn't exist on the transport, it must be the case
that the transport was constructed using a stale range descriptor and
stale leaseholder information. As such, there is likely not much value
in trying to exhaust the transport (and potentially getting caught in
backoff loops). We're much better served by bailing early and trying
again with a fresh transport.

Fixes #82802
Fixes #75742

Release note: None
Release justification: Fixes for high priority bugs in existing
functionality.

Previously, this test was constructing a NLHE with a structure we
wouldn't really expect. We instead move to running two versions of
this test -- one with a speculative lease and another where the lease
returned has an older lease sequence than what is cached.

Release note: None
Release justification: Non-production code change
Previously, we were logging that we had evicted a range descriptor
from the cache even though we may not have -- we only evict if the
cache doesn't have a newer descriptor than the one we're trying to
evict. We now only log if we indeed evicted the range descriptor;
otherwise, we do not.

Release note: None
Release justification: Low risk, high benefit change
We recently started shipping the range descriptors back on
NotLeaseHolderError. Prior to this patch, we only ever used their
generation to elide certain lease updates if they originated from a
replica that had an older (stale) view of this world.

This patch goes a step further in making use of the returned range
descriptor -- we now update the client's range cache if the returned
range descriptor is newer than what existed on the client. We do this
by picking the freshest range descriptor and lease, independently, from
the range descriptor/lease what exist in the client's range cache and
what was returned in the NLHE. Though unlikely, independently choosing
the freshest lease/range descriptor can lead to cases where the lease is
not compatible with the range descriptor. If we detect this to be the
case, we empty out the lease and simply cache the freshest range
descriptor.

Previously, we always accepted speculative leases to be more recent than
anything already in the cache. Now, we discard speculative leases if
they're coming from a replica that has an older view of the world.
Morally, these semantics are the right way to conceptualize cockroachdb#72772,
given that hazard could only ever exist for speculative leases. cockroachdb#82802
also falls out as a special case of this conceptualization, and this
patch adds a regression test for the scenario where an uninitialized
replica returns a NLHE with a speculative lease pointing to a replica
that isn't part of the range.

We also get rid of optimizations where we would try to identify "stale"
range descriptors when updating lease information. These no longer make
sense given we don't just update the lease, we also update the range
descriptor. These optimizations also didn't account for all possible
cases, such as when the leaseholder was present on the range
descriptor, but as a LEARNER. See cockroachdb#75742 for more details about how
this hazard manifests.

With this new approach we also address cockroachdb#75742. Instead of invalidating
the routing when a descriptor is identified as stale and using that as
a proxy to bail early when routing to replicas, we instead switch to a
more direct approach -- if at any point we detect the leaseholder isn't
included on the transport, we exit the routing logic early, and retry
at a layer above. Given our new range cache update semantics using
descriptors on NLHE errors, it's quite likely this retry circumvents a
range descriptor lookup from meta2 -- instead, we'd expect this to
simply amount to trying again with a newly constructed transport. The
rationale is that even though the client may have an arbitrarily stale
view of the range, we never expect it to regress. Thus, if the
leaseholder was updated on the client and it doesn't exist on the
transport, it must be the case that the transport was constructed using
a stale range descriptor and stale leaseholder information. As such,
there is likely not much value in trying to exhaust the transport
(and potentially getting caught in backoff loops). We're much better
served by bailing early and trying again with a fresh transport.

Fixes cockroachdb#82802
Fixes cockroachdb#75742

Release note: None
Release justification: Fixes for high priority bugs in existing
functionality.
Previously, if a request was routed to an uninitialized replica, the
returned NLHE would contain a speculative lease  pointing to the creator
of the uninitialized replica.

The prior commit makes it such that lease/leaseholder information in
NLHEs is only considered if accompanied by a non-stale range descriptor.
As uninitialized replicas always return empty range descriptors, which
are considered stale, the speculative lease they returned would never
be acted upon. Given this, we omit speculative leases in NLHEs returned
by uninitialized replicas. The idea being the client will simply move
on to the next replica in its transport which will very likely have
more useful leaseholder information for the client.

Release note: None
Release justification: Low risk, high benefit change
@arulajmani arulajmani requested a review from a team as a code owner March 16, 2023 18:30
@blathers-crl
Copy link

blathers-crl bot commented Mar 16, 2023

Thanks for opening a backport.

Please check the backport criteria before merging:

  • Patches should only be created for serious issues or test-only changes.
  • Patches should not break backwards-compatibility.
  • Patches should change as little code as possible.
  • Patches should not change on-disk formats or node communication protocols.
  • Patches should not add new functionality.
  • Patches must not add, edit, or otherwise modify cluster versions; or add version gates.
If some of the basic criteria cannot be satisfied, ensure that the exceptional criteria are satisfied within.
  • There is a high priority need for the functionality that cannot wait until the next release and is difficult to address in another way.
  • The new functionality is additive-only and only runs for clusters which have specifically “opted in” to it (e.g. by a cluster setting).
  • New code is protected by a conditional check that is trivial to verify and ensures that it only runs for opt-in clusters.
  • The PM and TL on the team that owns the changed code have signed off that the change obeys the above rules.

Add a brief release justification to the body of your PR to justify this backport.

Some other things to consider:

  • What did we do to ensure that a user that doesn’t know & care about this backport, has no idea that it happened?
  • Will this work in a cluster of mixed patch versions? Did we test that?
  • If a user upgrades a patch version, uses this feature, and then downgrades, what happens?

@cockroach-teamcity
Copy link
Member

This change is Reviewable

ajwerner and others added 2 commits March 16, 2023 14:43
We were logging `lease holder unknown` when the deprecated field was not
populated.

Release note: None
v22.1 binaries assume that the leaseholder is unknown when logging
NLHE errors if the (Deprecated)LeaseHolder field is unset -- regardless
of if the Lease is set or not. We broke this logging in 0402f47
(for mixed version clusters) when we stopped shipping back leaseholder
information (in favour of only shipping lease information) on NLHEs.
This patch fixes this by populating the (Deprecated)LeaseHolder field
when constructing NLHEs.

Release note: None
@arulajmani
Copy link
Collaborator Author

Last two commits from #91423 and #91515.

@irfansharif
Copy link
Contributor

Ping, this possibly showed up here: https://github.com/cockroachlabs/support/issues/2239. Are you planning to merge this backport?

@arulajmani
Copy link
Collaborator Author

I think I'm going to close this backport, given 22.1 is nearing its end. This isn't a trivial backport, so landing it doesn't feel worth the risk.

@arulajmani arulajmani closed this Apr 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants