-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Consistent Hashing for tablet selection #11959
[WIP] Consistent Hashing for tablet selection #11959
Conversation
Review ChecklistHello reviewers! 👋 Please follow this checklist when reviewing this Pull Request. General
If a new flag is being introduced:
If a workflow is added or modified:
Bug fixes
Non-trivial changes
New/Existing features
Backward compatibility
|
This changes tablet selection from being purely random (with a preference for local tablets) to being based on the result of a consistent hash using the session's UUID. This effectively means that incoming queries via MySQL connections will be routed to the same tablet unless there is a change in the topology (e.g. a new tablet being added, or a tablet being detected as unhealthy). But even if such a topolgy change is detected, only a subset of all incoming connections will start using a different tablet. We have clusters that have a large amount of replicas that we can pick from, but picking a random replica on each query can lead to a very inconsistent view of our data, especially if there's a high variance of replication lag between replicas. Signed-off-by: Arthur Schreiber <[email protected]>
07666bb
to
d57f40e
Compare
@@ -1283,6 +1283,7 @@ type ExecuteOptions struct { | |||
// TransactionAccessMode specifies the access modes to be used while starting the transaction i.e. READ WRITE/READ ONLY/WITH CONSISTENT SNAPSHOT | |||
// If not specified, the transaction will be started with the default access mode on the connection. | |||
TransactionAccessMode []ExecuteOptions_TransactionAccessMode `protobuf:"varint,14,rep,packed,name=transaction_access_mode,json=transactionAccessMode,proto3,enum=query.ExecuteOptions_TransactionAccessMode" json:"transaction_access_mode,omitempty"` | |||
SessionUUID string `protobuf:"bytes,15,opt,name=SessionUUID,proto3" json:"SessionUUID,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not too happy about adding the SessionUUID
here, but I couldn't find a better way to pass it through to the tablet selection logic. If anyone has a better idea how that could be solved, I'm all ears. 🙇♂️
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another way is to pass SessionUUID explicitly to all the methods.
@@ -84,7 +85,7 @@ func TestTabletGatewayBeginExecute(t *testing.T) { | |||
|
|||
func TestTabletGatewayShuffleTablets(t *testing.T) { | |||
hc := discovery.NewFakeHealthCheck(nil) | |||
tg := NewTabletGateway(context.Background(), hc, nil, "local") | |||
tg := NewTabletGateway(context.Background(), hc, nil, "cell1") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really understand how the existing shuffleTablets
test cases where even passing? 🤔
// Old tablet selection implementation | ||
if sessionUUID == "" { | ||
th = gw.selectRandomTablet(tablets, invalidTablets) | ||
} else { | ||
th = gw.selectConsistentTablet(sessionUUID, tablets, invalidTablets) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could put the tablet selection logic behind a CLI flag, if there's any concern in changing this logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe that should be the way to move forward by putting behind a flag like tablet-selection-algorithm
} | ||
|
||
if len(localTablets) > 0 { | ||
hash := NewConsistentHash() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of re-generating the hashes here on each call, we could generate them whenever we get notified of a topology event (via a topology watcher) and then clone the hash struct and it's data when we need to make modifications during tablet selection.
That'd reduce the number of calls to xxhash
down to just one (for the session UUID). And even that could be changed to just happen once per query.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all should be done as part of health check subscription calls.
} | ||
|
||
func (ch *ConsistentHash) Get(sessionUUID string) *discovery.TabletHealth { | ||
hash := xxhash.Sum64String(sessionUUID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this can be done once on the connection, it is not required on every query.
@arthurschreiber You are giving me mixed signals. |
I happened to stumble upon this PR and it actually relates to some other (very early) work that I've been thinking about: Again that isn't yet ready for public comment and I'm not even sure if it works yet since I just wrote it up this weekend. But the reason I mention it is that both share the idea that we want a different tablet selection process than random sorting with local cell preference. |
This PR is being marked as stale because it has been open for 30 days with no activity. To rectify, you may do any of the following:
If no action is taken within 7 days, this PR will be closed. |
This PR was closed because it has been stale for 7 days with no activity. |
Description
This pull request proposes changing the tablet selection logic in
vtgate
from a random selection (with preference to tablets in local cells), to using a consistent hashing scheme based on the session UUID (with preference to tablets in local cells).When tablet selection is performed, we want to re-use the same tablet for the same session. We do this by making use of the session UUID. Each MySQL connection coming into
vtgate
is assigned a random UUID. The session UUID is fixed for the lifetime of the connection, with the exception of performing a session reset (which also resets the session UUID).Tablets are put into a consistent hash, and a tablet is selected based on the session UUID.
This leads to the same tablet being selected over the lifetime of an incoming MySQL connection, until the point where the tablet topology changes (i.e. a new tablet is added, or an existing tablet becomes unhealthy). When this happens, the consistent hashing scheme will automatically redistribute some of the incoming connections to other available replicas, but most connections should stay unaffected.
Implementation details:
xxhash
is used for hashing values.Note: I haven't performed any benchmarking for this. My intention behind this PR is to first gather input on what other people think about this change, and then polishing the implementation.
Related Issue(s)
#11971
Checklist
Deployment Notes