
kvserver: ignore leaseholder replica type in DistSender #85140

Merged

Conversation

erikgrinaker (Contributor) commented Jul 27, 2022

The DistSender could fail to prioritize a newly discovered leaseholder
from a `NotLeaseHolderError` if the leaseholder had a non-`VOTER`
replica type. Instead, it would keep trying replicas in order, possibly
exhausting the transport and backing off, leading to increased tail
latencies. This applies in particular to 22.1, where we allowed
`VOTER_INCOMING` replicas to acquire the lease (see 22b4fb5).

The primary reason is that `grpcTransport.MoveToFront()` would fail to
match the new leaseholder's replica descriptor against the one in its
range descriptor. There are two reasons why this can happen:

1. `ReplicaDescriptor.ReplicaType` is a pointer, whose zero value
   `nil` is equivalent to `VOTER`. The equality comparison used in
   `MoveToFront()` is `==`, but pointer equality compares the memory
   address rather than the value.

2. The transport keeps using the range descriptor it was created with,
   and does not update it as we receive newer range descriptors. This
   means that the transport may, for example, have a `nil` replica type
   while the leaseholder has a `VOTER_INCOMING` replica type.

This patch fixes both issues by adding `ReplicaDescriptor.IsSame()`,
which compares replica identities while ignoring the type.
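
As a rough illustration of the first issue, here is a minimal, self-contained sketch; the types, field names, and enum values below are simplified stand-ins rather than the actual `roachpb` definitions, and the real `MoveToFront()` code is not reproduced:

```go
package main

import "fmt"

// ReplicaType and ReplicaDescriptor are simplified stand-ins for the roachpb
// types; the field names and enum values are illustrative only.
type ReplicaType int32

const (
	VOTER ReplicaType = iota
	VOTER_INCOMING
)

type ReplicaDescriptor struct {
	NodeID    int32
	StoreID   int32
	ReplicaID int32
	Type      *ReplicaType // nil is treated as VOTER
}

// IsSame compares replica identity while ignoring the type, as in this patch.
func (r ReplicaDescriptor) IsSame(o ReplicaDescriptor) bool {
	return r.NodeID == o.NodeID && r.StoreID == o.StoreID && r.ReplicaID == o.ReplicaID
}

func main() {
	incoming := VOTER_INCOMING
	// Descriptor the transport was created with (type not set, i.e. nil == VOTER).
	existing := ReplicaDescriptor{NodeID: 1, StoreID: 1, ReplicaID: 3}
	// Leaseholder descriptor reported by a NotLeaseHolderError after the lease change.
	leaseholder := ReplicaDescriptor{NodeID: 1, StoreID: 1, ReplicaID: 3, Type: &incoming}

	fmt.Println(existing == leaseholder)      // false: the pointer fields differ by address
	fmt.Println(existing.IsSame(leaseholder)) // true: same node, store, and replica ID
}
```

Comparing the identity fields directly sidesteps both the pointer-address comparison and any stale type carried by the transport's older descriptor.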

Resolves #85060.
Touches #74546.

Release note (bug fix): Fixed a bug where new leaseholders (with a
`VOTER_INCOMING` type) would not always be detected properly during
query execution, leading to occasional increased tail latencies due
to unnecessary internal retries.

erikgrinaker requested review from pav-kv, nvanbenschoten, and a team on July 27, 2022 15:40
erikgrinaker requested a review from a team as a code owner on July 27, 2022 15:40
erikgrinaker self-assigned this on Jul 27, 2022
cockroach-teamcity (Member) commented:

This change is Reviewable

erikgrinaker force-pushed the distsender-ignore-leaseholder-type branch from deadd68 to 8d4913b on July 27, 2022 15:41
erikgrinaker force-pushed the distsender-ignore-leaseholder-type branch from 8d4913b to 9677035 on July 27, 2022 15:46
erikgrinaker force-pushed the distsender-ignore-leaseholder-type branch from 9677035 to 0251570 on July 27, 2022 19:23
arulajmani (Collaborator) left a comment

:lgtm:

Reviewed 5 of 5 files at r1, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten and @pavelkalinnikov)

// IsSame returns true if the two replica descriptors refer to the same replica,
// ignoring the replica type.
func (r ReplicaDescriptor) IsSame(o ReplicaDescriptor) bool {
	return r.NodeID == o.NodeID && r.StoreID == o.StoreID && r.ReplicaID == o.ReplicaID
}
Collaborator:

What are the relations between node, store, and replica?

Collaborator:

Is ReplicaID local to a (NodeID, StoreID) pair? Can it be compared without either of the two?

Contributor Author:

One-to-many all the way down. One node can have many stores (disks), one store can have many replicas.

The node ID must be unique within the cluster. The store ID must be unique per node. The replica ID must be unique within the range. The range ID is sort of implied here, in that the replica descriptor is stored within a range descriptor.

In order to route a request to a replica we need to know all four. A replica ID won't move between nodes/stores -- if we have to move it, we'll create a new replica (with a new ID) on a different node, populate it, and then delete the old one.
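
As a rough sketch of the hierarchy described above, using hypothetical type names rather than the actual `roachpb` definitions:

```go
// Package sketch illustrates the one-to-many ID hierarchy: a node has many
// stores, a store holds many replicas, and each replica belongs to one range.
package sketch

type NodeID int32    // unique within the cluster
type StoreID int32   // unique per node
type RangeID int64   // implied by the enclosing range descriptor
type ReplicaID int32 // unique within a single range

// Routing a request to a replica needs all four identifiers; a replica never
// moves between nodes or stores, so these stay fixed for its lifetime.
type replicaAddr struct {
	RangeID   RangeID
	NodeID    NodeID
	StoreID   StoreID
	ReplicaID ReplicaID
}
```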

Contributor Author:

You're right in that we technically only need to compare the replica ID here, since we're operating within the context of a single range. We check the node ID and store ID just to make sure we have the right "address" for it, I suppose.

Collaborator:

Ok, so: (rangeID, replicaID) -> (nodeID, storeID), and rangeID is implied by the caller of IsSame (otherwise, due to locality of replicaID, it would be comparing apples to tables). Hence, is it enough to just compare ReplicaIDs? From what you said it follows that if rangeID and replicaID are the same then nodeID and storeID are the same too.

Collaborator:

Ah, you answered this already. We were racing with the last 2 comments :)

Contributor Author:

In principle it should be, yeah. But it's cheap enough to check the node and store too, so I don't see a strong reason not to. If there should be a mismatch for whatever reason, then there's no real point in trying to contact the replica anyway, because the transport will send it to the wrong place.

erikgrinaker (Contributor Author) commented:

TFTR!

bors r=arulajmani


craig bot commented Jul 28, 2022

This PR was included in a batch that was canceled, it will be automatically retried


craig bot commented Jul 28, 2022

Build failed (retrying...):


craig bot commented Jul 28, 2022

Build failed (retrying...):


craig bot commented Jul 28, 2022

Build failed (retrying...):


craig bot commented Jul 29, 2022

Build failed (retrying...):


craig bot commented Jul 29, 2022

Build succeeded:

craig bot merged commit 9351986 into cockroachdb:master on Jul 29, 2022

blathers-crl bot commented Jul 29, 2022

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


error creating merge commit from 0251570 to blathers/backport-release-22.1-85140: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 22.1.x failed. See errors above.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

erikgrinaker deleted the distsender-ignore-leaseholder-type branch on August 1, 2022 15:19

Successfully merging this pull request may close these issues.

kvserver: incorrect DistSender leaseholder processing for non-VOTERs