Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Consistent Hashing for tablet selection #11959

Conversation

arthurschreiber
Copy link
Contributor

@arthurschreiber arthurschreiber commented Dec 14, 2022

Description

This pull request proposes changing the tablet selection logic in vtgate from a random selection (with preference to tablets in local cells), to using a consistent hashing scheme based on the session UUID (with preference to tablets in local cells).

When tablet selection is performed, we want to re-use the same tablet for the same session. We do this by making use of the session UUID. Each MySQL connection coming into vtgate is assigned a random UUID. The session UUID is fixed for the lifetime of the connection, with the exception of performing a session reset (which also resets the session UUID).

Tablets are put into a consistent hash, and a tablet is selected based on the session UUID.

This leads to the same tablet being selected over the lifetime of an incoming MySQL connection, until the point where the tablet topology changes (i.e. a new tablet is added, or an existing tablet becomes unhealthy). When this happens, the consistent hashing scheme will automatically redistribute some of the incoming connections to other available replicas, but most connections should stay unaffected.

Implementation details:

  • xxhash is used for hashing values.
  • "replicas"/"virtual node count" is set at 10. I'm not sure what a good value for this is.

Note: I haven't performed any benchmarking for this. My intention behind this PR is to first gather input on what other people think about this change, and then polishing the implementation.

Related Issue(s)

#11971

Checklist

  • "Backport to:" labels have been added if this change should be back-ported
  • Tests were added or are not required
  • Documentation was added or is not required

Deployment Notes

@vitess-bot
Copy link
Contributor

vitess-bot bot commented Dec 14, 2022

Review Checklist

Hello reviewers! 👋 Please follow this checklist when reviewing this Pull Request.

General

  • Ensure that the Pull Request has a descriptive title.
  • If this is a change that users need to know about, please apply the release notes (needs details) label so that merging is blocked unless the summary release notes document is included.

If a new flag is being introduced:

  • Is it really necessary to add this flag?
  • Flag names should be clear and intuitive (as far as possible)
  • Help text should be descriptive.
  • Flag names should use dashes (-) as word separators rather than underscores (_).

If a workflow is added or modified:

  • Each item in Jobs should be named in order to mark it as required.
  • If the workflow should be required, the maintainer team should be notified.

Bug fixes

  • There should be at least one unit or end-to-end test.
  • The Pull Request description should include a link to an issue that describes the bug.

Non-trivial changes

  • There should be some code comments as to why things are implemented the way they are.

New/Existing features

  • Should be documented, either by modifying the existing documentation or creating new documentation.
  • New features should have a link to a feature request issue or an RFC that documents the use cases, corner cases and test cases.

Backward compatibility

  • Protobuf changes should be wire-compatible.
  • Changes to _vt tables and RPCs need to be backward compatible.
  • vtctl command output order should be stable and awk-able.
  • RPC changes should be compatible with vitess-operator
  • If a flag is removed, then it should also be removed from VTop, if used there.

This changes tablet selection from being purely random (with a preference for local tablets) to being based on the result of a consistent hash using the session's UUID.

This effectively means that incoming queries via MySQL connections will be routed to the same tablet unless there is a change in the topology (e.g. a new tablet being added, or a tablet being detected as unhealthy). But even if such a topolgy change is detected, only a subset of all incoming connections will start using a different tablet.

We have clusters that have a large amount of replicas that we can pick from, but picking a random replica on each query can lead to a very inconsistent view of our data, especially if there's a high variance of replication lag between replicas.

Signed-off-by: Arthur Schreiber <[email protected]>
@arthurschreiber arthurschreiber force-pushed the arthur/consistent-hashing-for-tablet-selection branch from 07666bb to d57f40e Compare December 14, 2022 16:10
@@ -1283,6 +1283,7 @@ type ExecuteOptions struct {
// TransactionAccessMode specifies the access modes to be used while starting the transaction i.e. READ WRITE/READ ONLY/WITH CONSISTENT SNAPSHOT
// If not specified, the transaction will be started with the default access mode on the connection.
TransactionAccessMode []ExecuteOptions_TransactionAccessMode `protobuf:"varint,14,rep,packed,name=transaction_access_mode,json=transactionAccessMode,proto3,enum=query.ExecuteOptions_TransactionAccessMode" json:"transaction_access_mode,omitempty"`
SessionUUID string `protobuf:"bytes,15,opt,name=SessionUUID,proto3" json:"SessionUUID,omitempty"`
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not too happy about adding the SessionUUID here, but I couldn't find a better way to pass it through to the tablet selection logic. If anyone has a better idea how that could be solved, I'm all ears. 🙇‍♂️

Copy link
Member

@harshit-gangal harshit-gangal Dec 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another way is to pass SessionUUID explicitly to all the methods.

@@ -84,7 +85,7 @@ func TestTabletGatewayBeginExecute(t *testing.T) {

func TestTabletGatewayShuffleTablets(t *testing.T) {
hc := discovery.NewFakeHealthCheck(nil)
tg := NewTabletGateway(context.Background(), hc, nil, "local")
tg := NewTabletGateway(context.Background(), hc, nil, "cell1")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really understand how the existing shuffleTablets test cases where even passing? 🤔

Comment on lines +319 to 324
// Old tablet selection implementation
if sessionUUID == "" {
th = gw.selectRandomTablet(tablets, invalidTablets)
} else {
th = gw.selectConsistentTablet(sessionUUID, tablets, invalidTablets)
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could put the tablet selection logic behind a CLI flag, if there's any concern in changing this logic.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that should be the way to move forward by putting behind a flag like tablet-selection-algorithm

}

if len(localTablets) > 0 {
hash := NewConsistentHash()
Copy link
Contributor Author

@arthurschreiber arthurschreiber Dec 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of re-generating the hashes here on each call, we could generate them whenever we get notified of a topology event (via a topology watcher) and then clone the hash struct and it's data when we need to make modifications during tablet selection.

That'd reduce the number of calls to xxhash down to just one (for the session UUID). And even that could be changed to just happen once per query.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all should be done as part of health check subscription calls.

}

func (ch *ConsistentHash) Get(sessionUUID string) *discovery.TabletHealth {
hash := xxhash.Sum64String(sessionUUID)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can be done once on the connection, it is not required on every query.

@systay
Copy link
Collaborator

systay commented Dec 20, 2022

@arthurschreiber You are giving me mixed signals. [WIP] in the title of the PR, but still marked as ready for review. Which is it? :)

@demmer
Copy link
Member

demmer commented Jan 17, 2023

I happened to stumble upon this PR and it actually relates to some other (very early) work that I've been thinking about:
slackhq#41

Again that isn't yet ready for public comment and I'm not even sure if it works yet since I just wrote it up this weekend. But the reason I mention it is that both share the idea that we want a different tablet selection process than random sorting with local cell preference.

@github-actions
Copy link
Contributor

This PR is being marked as stale because it has been open for 30 days with no activity. To rectify, you may do any of the following:

  • Push additional commits to the associated branch.
  • Remove the stale label.
  • Add a comment indicating why it is not stale.

If no action is taken within 7 days, this PR will be closed.

@github-actions github-actions bot added the Stale Marks PRs as stale after a period of inactivity, which are then closed after a grace period. label Feb 17, 2023
@github-actions
Copy link
Contributor

This PR was closed because it has been stale for 7 days with no activity.

@github-actions github-actions bot closed this Feb 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Stale Marks PRs as stale after a period of inactivity, which are then closed after a grace period.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants