-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kvserver: serve follower reads before checking lease #57992
Comments
…r reads Fixes cockroachdb#60830. This was broken in c44b357 and started causing problems after 0668efb. A non-empty LeaseStatus during follower reads was tripping up logic that assumed that a VALID LeaseStatus indicated that the serving replica was the leaseholder. This assumption used to hold, but unintentionally did not anymore. Returning an empty LeaseStatus on follower reads instead of expanding the meaning of a VALID LeaseStatus to include follower reads seems like the right change. This is because we want to get to a place where the lease is only checked if a replica does not have a sufficient closed timestamp, so follower reads should never even need to consult the lease - see cockroachdb#57992.
60901: kv: return empty LeaseStatus from leaseGoodToGoRLocked during follower reads r=nvanbenschoten a=nvanbenschoten Fixes #60830. This was broken in c44b357 and started causing problems after 0668efb. A non-empty LeaseStatus during follower reads was tripping up logic that assumed that a VALID LeaseStatus indicated that the serving replica was the leaseholder. This assumption used to hold, but unintentionally did not anymore. Returning an empty LeaseStatus on follower reads instead of expanding the meaning of a VALID LeaseStatus to include follower reads seems like the right change. This is because we want to get to a place where the lease is only checked if a replica does not have a sufficient closed timestamp, so follower reads should never even need to consult the lease - see #57992. Co-authored-by: Nathan VanBenschoten <[email protected]>
I tried switching around the conditions for serving as a leaseholder and serving as a follower below, and hit a snag: cockroach/pkg/kv/kvserver/replica.go Lines 1312 to 1317 in 679ee6e
As things stand, if I just invert the conditions and have the leaseholder evaluate a request under an empty lease status, that request just spins in case it runs into a lock. This is because, when hitting the lock and trying to handle it via
To the lock table, the situation looks as if a request evaluating under an old lease hit an intent and the expectation of the highlighted code is that the request will be re-evaluated under a newer lease and thus take a different code path (I think). However, without further changes, the request continues to evaluate as a follower read again and again. I'm wondering what the moral is here. It seems to me that there are benefits from evaluating under a lease when there is a lease: the encountered locks can be added to the lock table. I guess the alternative is to arrange things such that a request evaluating on the leaseholder as a "follower read" handles locks the same way that a true follower would - I believe on follower the lock table is disabled and thus not involved in waiting for locks, and so the reader pushes directly. But, I assume there are benefits to going through the lock table (which I'd be curious to hear spelled out). I see notes, though, suggesting that maybe we actually want to go more in the other direction - enabling the lock table on followers. cockroach/pkg/kv/kvserver/concurrency/concurrency_control.go Lines 765 to 773 in 9157c7b
The note says that this might happen with the migration of replicated locks in to the lock table, so maybe it's close? I want take the opportunity to discuss this issue more conceptually, to have it click more for me. The issue is motivated as:
I believe what this passage refers to is the fact that, currently, a replica doesn't activate the follower reads code path when the lease it knows about is invalid. This is because of the following code: cockroach/pkg/kv/kvserver/replica_follower_read.go Lines 68 to 70 in e024445
This code doesn't handle the InvalidLeaseError (produced here) and, even for the NotLeaseholderError , it nitpicks the error's fields. This code is crap, as far as I can tell. The only reason the code has to care about the specifics of the lease error has to do with the following line:
This is about the "old closed timestamp code", and is going away momentarily. So, I can simply remove the error checks inside |
Before this patch, requests were not eligible for follower reads when the lease was invalid (or when the follower replica in question thought that there's no valid lease because it's behind). The reasons for this bizarre state of affairs stem historically from the intertwining of leases and closed timestamps - the current leaseholder had to be known in order to ask it for a closed timestamp. This entanglement has now gone away, so we can serve follower reads regardless of the lease status as nature intended. Touches cockroachdb#57992 Release note: None
That all sounds right to me. Once we stop considering the error type in |
Before this patch, requests were not eligible for follower reads when the lease was invalid (or when the follower replica in question thought that there's no valid lease because it's behind). The reasons for this bizarre state of affairs stem historically from the intertwining of leases and closed timestamps - the current leaseholder had to be known in order to ask it for a closed timestamp. This entanglement has now gone away, so we can serve follower reads regardless of the lease status as nature intended. Closes cockroachdb#57992 Release note: None
Before this patch, requests were not eligible for follower reads when the lease was invalid (or when the follower replica in question thought that there's no valid lease because it's behind). The reasons for this bizarre state of affairs stem historically from the intertwining of leases and closed timestamps - the current leaseholder had to be known in order to ask it for a closed timestamp. This entanglement has now gone away, so we can serve follower reads regardless of the lease status as nature intended. Closes cockroachdb#57992 Release note: None Release justification: needed for bounded staleness read availability during partial failures.
Extension of #57397.
Once closed timestamps are separated away from lease state, it becomes possible to serve follower reads on ranges that have no active leaseholder. To do so, we'd want to check whether a read can be served by a replica based on the replica's closed timestamp before checking whether the replica has or knows about the current Range lease.
An important property that this should give us is that once a replica begins serving follower reads at a timestamp, it will always continue to serve follower reads at that timestamp. This means that even if the replica becomes completely partitioned away from the rest of its range, it will continue to be available for (increasingly) stale reads.
Epic: CRDB-2559
The text was updated successfully, but these errors were encountered: