-
Notifications
You must be signed in to change notification settings - Fork 225
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fallback to follower when leader is busy #916
Conversation
Signed-off-by: you06 <[email protected]>
@@ -1191,9 +1195,11 @@ func (c *RegionCache) reloadRegion(regionID uint64) { | |||
// ignore error and use old region info. | |||
logutil.Logger(bo.GetCtx()).Error("load region failure", | |||
zap.Uint64("regionID", regionID), zap.Error(err)) | |||
c.mu.RLock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this an extra fix?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, a possible data race.
} | ||
|
||
func (state *tryFollower) onSendSuccess(selector *replicaSelector) { | ||
if !selector.region.switchWorkLeaderToPeer(selector.targetReplica().peer) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The former naming and meaning of the switchWorkLeaderToPeer
function is quite confusing, I don't understand what's the purpose of it..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The former usage of tryFollower
is after the failure of accessKnownLeader
, in this case, if one of the follower can serve the leader-read request, it's the new leader, so switch the leader to this peer.
@@ -888,6 +902,22 @@ func (s *replicaSelector) updateLeader(leader *metapb.Peer) { | |||
s.region.invalidate(StoreNotFound) | |||
} | |||
|
|||
// For some reason, the leader is unreachable by now, try followers instead. | |||
func (s *replicaSelector) fallback2Follower(ctx *RPCContext) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By now is the only situation that would be used the stale read fallback -> leader -> fallback replicas
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, when fallbacking to replica from leader, it's a follower read request, not stale read.
/cc @crazycs520 PTAL |
Signed-off-by: you06 <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Co-authored-by: cfzjywxk <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it necessary to give it a hint or constraint on which follower to try (first)? For example we may want it to do the follower read in its local zone as much as possible?
Signed-off-by: you06 <[email protected]>
Implemented this strategy, PTAL. |
Signed-off-by: you06 <[email protected]>
|
||
if len(state.labels) > 0 { | ||
idx, selectReplica := filterReplicas(func(selectReplica *replica) bool { | ||
return selectReplica.store.IsLabelsMatch(state.labels) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the selectReplica.isExhausted(1)
check missing here? How about putting it into the default filterReplicas
and pass a nil
checker function if there's no labels?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The replica may be exhausted by data-is-not-ready, which does not affect follower read.
* fallback to follower when leader is busy Signed-off-by: you06 <[email protected]> * add comment Signed-off-by: you06 <[email protected]> * Update internal/locate/region_request.go Co-authored-by: cfzjywxk <[email protected]> * after fallback to replica read from leader, retry local follower first Signed-off-by: you06 <[email protected]> * address comment Signed-off-by: you06 <[email protected]> --------- Signed-off-by: you06 <[email protected]> Co-authored-by: cfzjywxk <[email protected]>
* fallback to follower when leader is busy Signed-off-by: you06 <[email protected]> Co-authored-by: cfzjywxk <[email protected]>
* fallback to follower when leader is busy Signed-off-by: you06 <[email protected]> Co-authored-by: cfzjywxk <[email protected]>
* fallback to follower when leader is busy Signed-off-by: you06 <[email protected]> Co-authored-by: cfzjywxk <[email protected]> Signed-off-by: you06 <[email protected]>
* fallback to follower when leader is busy Signed-off-by: you06 <[email protected]> Co-authored-by: cfzjywxk <[email protected]> Co-authored-by: cfzjywxk <[email protected]>
* fallback to follower when leader is busy Signed-off-by: you06 <[email protected]> Co-authored-by: cfzjywxk <[email protected]> Co-authored-by: cfzjywxk <[email protected]>
* fallback to follower when leader is busy Signed-off-by: you06 <[email protected]> Co-authored-by: cfzjywxk <[email protected]> Co-authored-by: cfzjywxk <[email protected]>
* fallback to follower when leader is busy Signed-off-by: you06 <[email protected]> Co-authored-by: cfzjywxk <[email protected]> Co-authored-by: cfzjywxk <[email protected]>
* reload region cache when store is resolved from invalid status (#843) Signed-off-by: you06 <[email protected]> Co-authored-by: disksing <[email protected]> * fallback to follower when leader is busy (#916) (#923) * fallback to follower when leader is busy Signed-off-by: you06 <[email protected]> Co-authored-by: cfzjywxk <[email protected]> Co-authored-by: cfzjywxk <[email protected]> * Resume max retry time check for stale read retry with leader option(#903) (#911) * Resume max retry time check for stale read retry with leader option Signed-off-by: cfzjywxk <[email protected]> * add cancel Signed-off-by: cfzjywxk <[email protected]> --------- Signed-off-by: cfzjywxk <[email protected]> * add region cache state test & fix some issues of replica selector (#910) Signed-off-by: you06 <[email protected]> remove duplicate code Signed-off-by: you06 <[email protected]> * enable workflow for tidb-7.1 Signed-off-by: you06 <[email protected]> * update Signed-off-by: you06 <[email protected]> update Signed-off-by: you06 <[email protected]> fix test Signed-off-by: you06 <[email protected]> fix test Signed-off-by: you06 <[email protected]> * lint Signed-off-by: you06 <[email protected]> * lint Signed-off-by: you06 <[email protected]> * fix flaky test Signed-off-by: you06 <[email protected]> --------- Signed-off-by: you06 <[email protected]> Signed-off-by: cfzjywxk <[email protected]> Co-authored-by: disksing <[email protected]> Co-authored-by: cfzjywxk <[email protected]> Co-authored-by: cfzjywxk <[email protected]>
Fallback to follower when leader is busy.
Inject data-is-not-ready for stale read and server-is-busy for leader, so fallback to leader will be stucked.
With this patch, server-is-busy on leader will try followers.