kvserver: ignore leaseholder replica type in DistSender #85140
Conversation
Force-pushed from deadd68 to 8d4913b
Force-pushed from 8d4913b to 9677035
The DistSender could fail to prioritize a newly discovered leaseholder from a `NotLeaseHolderError` if the leaseholder had a non-`VOTER` replica type. Instead, it would continue to try replicas in order until possibly exhausting the transport and backing off, leading to increased tail latencies. This applies in particular to 22.1, where we allowed `VOTER_INCOMING` replicas to acquire the lease (see 22b4fb5).

The primary reason is that `grpcTransport.MoveToFront()` would fail to compare the new leaseholder replica descriptor with the one in its range descriptor. There are two reasons why this can happen:

1. `ReplicaDescriptor.ReplicaType` is a pointer, where the zero value `nil` is equivalent to `VOTER`. The equality comparison used in `MoveToFront()` is `==`, but pointer equality compares the memory address rather than the value.

2. The transport keeps using the range descriptor it was created with, and does not update it as we receive updated range descriptors. This means that the transport may e.g. have a `nil` replica type while the leaseholder has a `VOTER_INCOMING` replica type.

This patch fixes both issues by adding `ReplicaDescriptor.IsSame()`, which compares replica identities while ignoring the type.

Release note (bug fix): Fixed a bug where new leaseholders (with a `VOTER_INCOMING` type) would not always be detected properly during query execution, leading to occasional increased tail latencies due to unnecessary internal retries.
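To make the first failure mode concrete, here is a minimal, self-contained Go sketch. The types below are simplified stand-ins, not the actual `roachpb` definitions; it shows how `==` on a descriptor with a pointer-typed replica type compares pointers rather than semantic identity, and how an identity-only comparison avoids the problem.

```go
package main

import "fmt"

// ReplicaType is a simplified stand-in for the real enum; nil is treated as VOTER.
type ReplicaType int32

const (
	VOTER ReplicaType = iota
	VOTER_INCOMING
)

// ReplicaDescriptor is a simplified stand-in for roachpb.ReplicaDescriptor.
type ReplicaDescriptor struct {
	NodeID    int32
	StoreID   int32
	ReplicaID int32
	Type      *ReplicaType // pointer field: == compares the pointers, not the values
}

// IsSame compares replica identity while ignoring the type, mirroring the fix.
func (r ReplicaDescriptor) IsSame(o ReplicaDescriptor) bool {
	return r.NodeID == o.NodeID && r.StoreID == o.StoreID && r.ReplicaID == o.ReplicaID
}

func main() {
	incoming := VOTER_INCOMING

	// Same replica identity, but one copy is stale (nil type) while the
	// leaseholder reported by the NotLeaseHolderError carries VOTER_INCOMING.
	stale := ReplicaDescriptor{NodeID: 1, StoreID: 1, ReplicaID: 3}
	fresh := ReplicaDescriptor{NodeID: 1, StoreID: 1, ReplicaID: 3, Type: &incoming}

	fmt.Println(stale == fresh)      // false: the pointer fields differ
	fmt.Println(stale.IsSame(fresh)) // true: identity matches, type ignored
}
```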
Force-pushed from 9677035 to 0251570
Reviewed 5 of 5 files at r1, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten and @pavelkalinnikov)
// IsSame returns true if the two replica descriptors refer to the same replica,
// ignoring the replica type.
func (r ReplicaDescriptor) IsSame(o ReplicaDescriptor) bool {
	return r.NodeID == o.NodeID && r.StoreID == o.StoreID && r.ReplicaID == o.ReplicaID
}
What are the relations between node, store, and replica?
Is `ReplicaID` local to a pair of (`NodeID`, `StoreID`)? Can it be compared without any of the two?
One-to-many all the way down. One node can have many stores (disks), one store can have many replicas.
The node ID must be unique within the cluster. The store ID must be unique per node. The replica ID must be unique within the range. The range ID is sort of implied here, in that the replica descriptor is stored within a range descriptor.
In order to route a request to a replica we need to know all four. A replica ID won't move between nodes/stores -- if we have to move it, we'll create a new replica (with a new ID) on a different node, populate it, and then delete the old one.
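To make that scoping concrete, here is a rough sketch using hypothetical simplified types (stand-in declarations, not the real `roachpb` definitions):

```go
// Sketch of the ID hierarchy described above.
package sketch

type (
	NodeID    int32 // unique within the cluster
	StoreID   int32 // unique per node (one store per disk)
	RangeID   int64 // implied by the enclosing range descriptor
	ReplicaID int32 // unique within a range
)

// Routing a request to a replica needs the full "address"; the range is
// implied by the descriptor the replica is listed in.
type ReplicaDescriptor struct {
	NodeID    NodeID
	StoreID   StoreID
	ReplicaID ReplicaID
}

// A range descriptor lists the replicas of one range.
type RangeDescriptor struct {
	RangeID  RangeID
	Replicas []ReplicaDescriptor
}
```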
You're right in that we technically only need to compare the replica ID here, since we're operating within the context of a single range. We check the node ID and store ID just to make sure we have the right "address" for it, I suppose.
Ok, so: (rangeID, replicaID) -> (nodeID, storeID), and rangeID is implied by the caller of `IsSame` (otherwise, due to locality of replicaID, it would be comparing apples to tables). Hence, is it enough to just compare `ReplicaID`s? From what you said it follows that if rangeID and replicaID are the same then nodeID and storeID are the same too.
Ah, you answered this already. We were racing with the last 2 comments :)
In principle it should be, yeah. But it's cheap enough to check the node and store too, so I don't see a strong reason not to. If there should be a mismatch for whatever reason, then there's no real point in trying to contact the replica anyway, because the transport will send it to the wrong place.
TFTR! bors r=arulajmani
This PR was included in a batch that was canceled, it will be automatically retried.
Build failed (retrying...)
Build failed (retrying...)
Build failed (retrying...)
Build failed (retrying...)
Build succeeded.
Encountered an error creating backports. Some common things that can go wrong:
You might need to create your backport manually using the backport tool.

error creating merge commit from 0251570 to blathers/backport-release-22.1-85140: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

You may need to manually resolve merge conflicts with the backport tool. Backport to branch 22.1.x failed. See errors above.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.
The DistSender could fail to prioritize a newly discovered leaseholder from a `NotLeaseHolderError` if the leaseholder had a non-`VOTER` replica type. Instead, it would continue to try replicas in order until possibly exhausting the transport and backing off, leading to increased tail latencies. This applies in particular to 22.1, where we allowed `VOTER_INCOMING` replicas to acquire the lease (see 22b4fb5).

The primary reason is that `grpcTransport.MoveToFront()` would fail to compare the new leaseholder replica descriptor with the one in its range descriptor. There are two reasons why this can happen:

1. `ReplicaDescriptor.ReplicaType` is a pointer, where the zero value `nil` is equivalent to `VOTER`. The equality comparison used in `MoveToFront()` is `==`, but pointer equality compares the memory address rather than the value.

2. The transport keeps using the range descriptor it was created with, and does not update it as we receive updated range descriptors. This means that the transport may e.g. have a `nil` replica type while the leaseholder has a `VOTER_INCOMING` replica type.

This patch fixes both issues by adding `ReplicaDescriptor.IsSame()`, which compares replica identities while ignoring the type.

Resolves #85060.
Touches #74546.

Release note (bug fix): Fixed a bug where new leaseholders (with a `VOTER_INCOMING` type) would not always be detected properly during query execution, leading to occasional increased tail latencies due to unnecessary internal retries.
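To tie the two issues together, here is a hedged sketch of the kind of move-to-front lookup described above. The `transport` type and descriptor below are hypothetical stand-ins; the real `grpcTransport` and `roachpb.ReplicaDescriptor` differ in detail.

```go
// Sketch only: illustrates why the leaseholder lookup must compare replica
// identity rather than the full descriptor.
package sketch

type ReplicaType int32

type ReplicaDescriptor struct {
	NodeID, StoreID, ReplicaID int32
	Type                       *ReplicaType // nil is equivalent to VOTER
}

// IsSame compares replica identity while ignoring the type.
func (r ReplicaDescriptor) IsSame(o ReplicaDescriptor) bool {
	return r.NodeID == o.NodeID && r.StoreID == o.StoreID && r.ReplicaID == o.ReplicaID
}

// transport holds the replica list captured when it was created; it is not
// refreshed as newer range descriptors arrive.
type transport struct {
	replicas []ReplicaDescriptor
	next     int // index of the next replica to try
}

// moveToFront prioritizes the newly discovered leaseholder. A plain ==
// comparison would miss it whenever the stored descriptor and the leaseholder
// disagree on the pointer-typed replica type (e.g. nil vs. VOTER_INCOMING);
// matching by identity does not.
func (t *transport) moveToFront(leaseholder ReplicaDescriptor) {
	for i := t.next; i < len(t.replicas); i++ {
		if t.replicas[i].IsSame(leaseholder) {
			t.replicas[i], t.replicas[t.next] = t.replicas[t.next], t.replicas[i]
			return
		}
	}
}
```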