
Only fallback to leader read with data-is-not-ready error #907

Closed
you06 wants to merge 3 commits from the reset-stale-read-6.5 branch.

Conversation

@you06 you06 commented Jul 25, 2023

Inject a server-is-busy error for both stale read and leader read, so the direct fallback to the leader won't succeed.

(screenshot: fail)

This patch makes the request continue trying other replicas (the duration is high because of backoff).

(screenshot: success)
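For context on what the screenshots demonstrate, the kind of error injection described above can be sketched as a wrapper around the TiKV RPC client that answers stale-read and leader-read requests with a server-is-busy region error, so the replica selector has to keep trying the remaining replicas. This is a hedged sketch, not the PR's actual test code (which lives in internal/locate/region_request3_test.go); busyInjector, its Get-only assumption, and the wrapping seam are illustrative assumptions.

package locate

import (
	"context"
	"time"

	"github.com/pingcap/kvproto/pkg/errorpb"
	"github.com/pingcap/kvproto/pkg/kvrpcpb"
	"github.com/tikv/client-go/v2/tikvrpc"
)

// busyInjector is a hypothetical wrapper used only for illustration; it
// assumes Get requests and an inner client exposing SendRequest/Close.
type busyInjector struct {
	inner tikvrpc.Client
}

func (c *busyInjector) SendRequest(ctx context.Context, addr string, req *tikvrpc.Request, timeout time.Duration) (*tikvrpc.Response, error) {
	// Stale reads carry StaleRead=true; a plain leader read has neither
	// StaleRead nor ReplicaRead set. Reject both with server-is-busy so the
	// direct fallback to the leader cannot succeed either.
	if req.StaleRead || !req.ReplicaRead {
		return &tikvrpc.Response{Resp: &kvrpcpb.GetResponse{
			RegionError: &errorpb.Error{ServerIsBusy: &errorpb.ServerIsBusy{Reason: "injected"}},
		}}, nil
	}
	return c.inner.SendRequest(ctx, addr, req, timeout)
}

func (c *busyInjector) Close() error { return c.inner.Close() }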

@cfzjywxk commented:

@you06 Please add descriptions and test results.

@cfzjywxk commented:

/cc @crazycs520 PTAL

@crazycs520 left a comment:

LGTM

@cfzjywxk cfzjywxk requested review from cfzjywxk, zyguan and ekexium July 27, 2023 12:18
Outdated review threads on internal/locate/region_request.go and internal/locate/region_request3_test.go were marked resolved.
@cfzjywxk commented:

@you06 Please resolve the conflicts.

@@ -604,7 +605,7 @@ func (state *accessFollower) isCandidate(idx AccessIndex, replica *replica) bool
 	// The request can only be sent to the leader.
 	((state.option.leaderOnly && idx == state.leaderIdx) ||
 		// Choose a replica with matched labels.
-		(!state.option.leaderOnly && (state.tryLeader || idx != state.leaderIdx) && replica.store.IsLabelsMatch(state.option.labels))) &&
+		(!state.option.leaderOnly && (state.tryLeader || idx != state.leaderIdx) && (state.lastIdx >= 0 || replica.store.IsLabelsMatch(state.option.labels)))) &&
 	// Make sure the replica is not unreachable.
 	replica.store.getLivenessState() != unreachable
A reviewer (Contributor) commented:

It's better to split this check into separate checks, like on the master branch:

func (state *accessFollower) isCandidate(idx AccessIndex, replica *replica) bool {
	if !condition1 {
		return false
	}
	if !condition2 {
		return false
	}
	if condition3 {
		return true
	}
	return false
}

Otherwise it's quite difficult to maintain.

The author (@you06) replied:

Done, and I did this for the master branch before.

@@ -604,7 +605,7 @@ func (state *accessFollower) isCandidate(idx AccessIndex, replica *replica) bool
 	// The request can only be sent to the leader.
 	((state.option.leaderOnly && idx == state.leaderIdx) ||
 		// Choose a replica with matched labels.
-		(!state.option.leaderOnly && (state.tryLeader || idx != state.leaderIdx) && replica.store.IsLabelsMatch(state.option.labels))) &&
+		(!state.option.leaderOnly && (state.tryLeader || idx != state.leaderIdx) && (state.lastIdx >= 0 || replica.store.IsLabelsMatch(state.option.labels)))) &&
A reviewer (Contributor) commented:

Why add (state.lastIdx >= 0 || here?

The author (@you06) replied:

I changed this condition to leaderExhaust.

The try order is local follower -> leader -> remote follower.

When leaderExhaust is true, we need to ignore the labels so that a remote follower can be selected.
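For readers following along, the ordering described above corresponds roughly to the check below. This is a sketch pieced together from the diff hunks quoted in this thread, not the actual patch; the leaderExhaust parameter and the split into early returns follow the review discussion, and helpers such as isEpochStale and isExhausted are assumed from the surrounding region_request.go code.

// Sketch only: a leaderExhaust-aware candidate check, based on the snippets
// quoted in this review thread.
func (state *accessFollower) isCandidate(idx AccessIndex, replica *replica, leaderExhaust bool) bool {
	// Skip replicas whose region epoch is stale or whose retry budget is spent.
	if replica.isEpochStale() || replica.isExhausted(1) {
		return false
	}
	// Skip replicas on unreachable stores.
	if replica.store.getLivenessState() == unreachable {
		return false
	}
	if state.option.leaderOnly {
		// The request can only be sent to the leader.
		return idx == state.leaderIdx
	}
	if !state.tryLeader && idx == state.leaderIdx {
		// The leader must not be used for this request.
		return false
	}
	// Normally the replica must match the requested labels (the local
	// follower). Once the leader is exhausted, ignore the labels so that a
	// remote follower becomes eligible.
	return leaderExhaust || replica.store.IsLabelsMatch(state.option.labels)
}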

@cfzjywxk commented on Jul 28, 2023:

Is it working as expected when the label-matched local peer picked first is actually the leader and it's unavailable for both the stale-read and follower-read requests?
It's better to comment on this in detail here.

you06 and others added the following commits on July 28, 2023 (each signed off by you06 <[email protected]>):

typo
add test
remove comment
remove comment
update PR
Update internal/locate/region_request.go (Co-authored-by: cfzjywxk <[email protected]>)

@you06 force-pushed the reset-stale-read-6.5 branch from a408188 to 31ad539 on July 28, 2023 10:00.
 for i := 0; i < len(selector.replicas) && !state.option.leaderOnly; i++ {
 	idx := AccessIndex((int(state.lastIdx) + i) % len(selector.replicas))
 	selectReplica := selector.replicas[idx]
-	if state.isCandidate(idx, selectReplica) {
+	if state.isCandidate(idx, selectReplica, leaderExhaust) {
A reviewer (Contributor) commented:

The leaderExhaust value, which is the result of state.IsLeaderExhausted, should not be used here in the isCandidate function: leaderExhaust checks isExhausted with 2, while isCandidate checks only 1. leaderExhaust is designed to be used only in the if selector.targetIdx < 0 branch (L574), to allow an extra retry on the leader peer when no other peers are available.
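To make the point about the two different retry budgets concrete, here is a rough sketch; the signatures follow the snippets quoted in this thread rather than the actual code, and candidateAttemptsSpent is a hypothetical helper introduced only for illustration.

// The leader may be attempted twice: once by the stale read itself and once
// by the leader-read fallback, so IsLeaderExhausted checks a budget of 2.
func (state *accessFollower) IsLeaderExhausted(leader *replica) bool {
	return leader.isExhausted(2)
}

// isCandidate, by contrast, rules a replica out after a single failed
// attempt. A hypothetical helper capturing just that part of the check:
func candidateAttemptsSpent(replica *replica) bool {
	return replica.isEpochStale() || replica.isExhausted(1)
}

Feeding IsLeaderExhausted's result into isCandidate therefore conflates the leader's budget of two attempts with the per-replica budget of one, which is why the reviewer wants leaderExhaust confined to the fallback branch.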

@cfzjywxk commented on Jul 28, 2023:

I'm still trying to figure out the results of the different combinations here and whether they all work as expected...
The combinations include:

  1. Whether it is a stale read or a follower read.
  2. Whether the local peer matching the labels is the leader peer or not.
  3. The case where no fallback happens.
  4. After falling back to leader read, if the leader is still unavailable, whether the error is returned to the upper layer as expected.

Quite difficult to maintain by now...
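One way to keep these combinations from becoming unmaintainable is to enumerate them in a table-driven test. The skeleton below is only a sketch of that idea; the test name, struct fields, and case list are illustrative assumptions, not code from this PR.

package locate

import "testing"

// TestStaleReadFallbackCombinations is a hypothetical skeleton enumerating
// the combinations listed in the review comment above.
func TestStaleReadFallbackCombinations(t *testing.T) {
	cases := []struct {
		name            string
		staleRead       bool // 1. stale read vs. follower read
		localPeerIsLead bool // 2. the label-matched local peer is the leader
		leaderAvailable bool // 3/4. whether a fallback is needed and can succeed
	}{
		{"stale read, local follower, leader up", true, false, true},
		{"stale read, local peer is leader, leader down", true, true, false},
		{"follower read, local follower, leader up", false, false, true},
		{"follower read, local peer is leader, leader down", false, true, false},
	}
	for _, c := range cases {
		c := c
		t.Run(c.name, func(t *testing.T) {
			// TODO: build the mock cluster for this combination, inject the
			// configured availability, send the request, and assert which
			// replica is selected and whether any error reaches the caller.
			_ = c
		})
	}
}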

@cfzjywxk commented Aug 4, 2023:

@you06 Is this to be closed?

@you06 (author) replied on Aug 4, 2023:

Yes. With the retry policies of #910 and #916, we do not care about the type of the retriable error, and a remote stale read does not make sense, so it is reset anyway.

@you06 you06 closed this Aug 4, 2023