Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

reduce unnecessary tikvServerBusy backoff when able to try next replica #1184

Merged
merged 21 commits into from
Mar 4, 2024

Conversation

crazycs520
Copy link
Contributor

@crazycs520 crazycs520 commented Feb 27, 2024

close #1166

PR 923 introduce the skip tikvServerBusy backoff logic, but it it only work for stale-read, and if serverIsBusy.EstimatedWaitMs > 0 is true, it can't skip.

This PR expanded the scope of skips tikvServerBusy backoff logic, make non-leader read requests can skip tikvServerBusy backoff, and if serverIsBusy.EstimatedWaitMs > 0 is true, non-leader read requests can skip too.

Signed-off-by: crazycs520 <[email protected]>
Signed-off-by: crazycs520 <[email protected]>
Signed-off-by: crazycs520 <[email protected]>
Signed-off-by: crazycs520 <[email protected]>
Signed-off-by: crazycs520 <[email protected]>
Copy link
Contributor

@you06 you06 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

// the state is changed in accessFollower.next when leader is unavailable.
func (s *replicaSelector) canFallback2Follower() bool {
// canSkipServerIsBusyBackoff returns true if the request can be sent to next replica and can skip ServerIsBusy backoff.
func (s *replicaSelector) canSkipServerIsBusyBackoff() bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can reuse this function handling region unavailable and changing it's name in the future. Not this PR's work.

Copy link
Contributor

@MyonKeminta MyonKeminta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but I'm thinking perhaps the better structure is to determine whether to backoff just before the next attempt, instead of just after encountering error, so that what @zyguan tried to do in #990 becomes a unified way.
Btw I just noticed that it seems that this PR and #990 looks trying to solve the same problem 🤔 cc @zyguan

@crazycs520
Copy link
Contributor Author

/hold

@crazycs520
Copy link
Contributor Author

LGTM, but I'm thinking perhaps the better structure is to determine whether to backoff just before the next attempt, instead of just after encountering error, so that what @zyguan tried to do in #990 becomes a unified way. Btw I just noticed that it seems that this PR and #990 looks trying to solve the same problem 🤔 cc @zyguan

Nice catch. #990 introduces pending backoff for fast retry, which is great idea. After discussing with @zyguan, I introduce the pending backoff idea into this PR and unified the fast retry logic. And later, we can add more backoff to pending to do fast retry.

Signed-off-by: crazycs520 <[email protected]>
internal/locate/region_request.go Outdated Show resolved Hide resolved
internal/locate/region_request.go Show resolved Hide resolved
Copy link
Contributor

@MyonKeminta MyonKeminta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rest LGTM

internal/locate/region_request.go Outdated Show resolved Hide resolved
@crazycs520
Copy link
Contributor Author

/unhold

@MyonKeminta MyonKeminta removed the hold label Mar 1, 2024
internal/locate/region_cache.go Outdated Show resolved Hide resolved
internal/locate/region_request.go Show resolved Hide resolved
Signed-off-by: crazycs520 <[email protected]>

// canFastRetry returns true if the request can be sent to next replica.
func (s *replicaSelector) canFastRetry() bool {
accessLeader, ok := s.state.(*accessKnownLeader)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we just do fast retry when tikv_client_read_timeout is configured? It looks a little bit dangerous to use

if not fast retry
   return false
return true

as most of the request would be fast retried .

Besides, maybe we could consult slow information to decide whether fast retry should work. /cc @zyguan @MyonKeminta

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see the original logic of canFallback2Follower is trying to do fast backoff when retry next replica, so fast backoff is not only work for tikv_client_read_timeout is configured.
cc @you06

Copy link
Contributor

@cfzjywxk cfzjywxk Mar 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see the original logic of canFallback2Follower is trying to do fast backoff when retry next replica, so fast backoff is not only work for tikv_client_read_timeout is configured. cc @you06

@crazycs520 It the current logic the same? If so we could continue first.

Copy link
Contributor Author

@crazycs520 crazycs520 Mar 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic is not same with before, this PR is for the fix issue #1166, and keep some optimization from #990 and #923

@cfzjywxk cfzjywxk merged commit 50c4085 into tikv:master Mar 4, 2024
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Reduce unnecessary tikvServerBusy backoff
5 participants