Activate Follower-Handle #7431

Closed
3 tasks done
CabinfeverB opened this issue Nov 24, 2023 · 2 comments
Labels
report/customer Customers have encountered this bug. type/development The issue belongs to a development tasks

Comments

@CabinfeverB
Member

CabinfeverB commented Nov 24, 2023

Development Task

PD serves as the central node and provides services for the cluster. Until now, the PD leader has carried all of these services, and follower resources have gone unused.

  • Region API

Summary

An important function of PD is to provide routing information for data replicas. For example, GetRegion is on a hot path in a TiDB cluster. PD provides a region sync mechanism: the PD leader synchronizes regions whose meta information has changed to the followers. We can therefore handle region requests on a PD follower server.

Additional knowledge

Sync region

PD provides a region sync mechanism: the PD leader synchronizes regions whose meta information has changed to the followers.
The followers hold a basicCluster, which contains the region tree.
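
To make the consequence of syncing concrete, here is a minimal sketch of a follower-side region index that is updated only by regions pushed from the leader. It is not PD's region tree: `Region`, `regionIndex`, `apply`, and `search` are hypothetical names, and a sorted slice stands in for the real tree.

```go
package main

import (
	"bytes"
	"sort"
)

// Region is a simplified, hypothetical stand-in for PD's region metadata.
type Region struct {
	ID       uint64
	StartKey []byte // inclusive
	EndKey   []byte // exclusive; empty means "to the end of the key space"
}

// regionIndex is a toy replacement for the region tree:
// non-overlapping regions kept sorted by start key.
type regionIndex struct {
	regions []Region
}

// overlaps reports whether the key ranges of a and b intersect.
func overlaps(a, b Region) bool {
	aStartsBeforeBEnds := len(b.EndKey) == 0 || bytes.Compare(a.StartKey, b.EndKey) < 0
	bStartsBeforeAEnds := len(a.EndKey) == 0 || bytes.Compare(b.StartKey, a.EndKey) < 0
	return aStartsBeforeBEnds && bStartsBeforeAEnds
}

// apply installs a region received from the leader and drops every
// previously known region whose range overlaps it.
func (idx *regionIndex) apply(r Region) {
	kept := idx.regions[:0]
	for _, old := range idx.regions {
		if !overlaps(old, r) {
			kept = append(kept, old)
		}
	}
	kept = append(kept, r)
	sort.Slice(kept, func(i, j int) bool {
		return bytes.Compare(kept[i].StartKey, kept[j].StartKey) < 0
	})
	idx.regions = kept
}

// search returns the region whose range contains key, if the follower knows one.
func (idx *regionIndex) search(key []byte) (Region, bool) {
	// Find the first region that starts strictly after key; the candidate is the one before it.
	i := sort.Search(len(idx.regions), func(i int) bool {
		return bytes.Compare(idx.regions[i].StartKey, key) > 0
	})
	if i == 0 {
		return Region{}, false
	}
	r := idx.regions[i-1]
	if len(r.EndKey) != 0 && bytes.Compare(key, r.EndKey) >= 0 {
		return Region{}, false
	}
	return r, true
}
```

In this sketch, if the leader has synced only one half of a split, a lookup in the other half finds nothing until the second half arrives, which is the kind of lag a caller has to tolerate when reading from a follower.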

Follower forward mechanism

When enable-forwarding is enabled (the default value in TiDB is false), the PD client periodically checks the connectivity and health status of the PD leader. If the check fails, the PD client tries to send requests with pd-forwarded-host to the followers, and the followers forward the requests to the leader.
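
The routing decision described above can be sketched roughly as follows. This is an illustration only: `route` and `pickRoute` are hypothetical names, a plain map stands in for gRPC metadata, and setting the pd-forwarded-host value to the leader address is an assumption.

```go
package main

// forwardedHostKey is the metadata key named in the text above.
const forwardedHostKey = "pd-forwarded-host"

// route describes where a request is sent and which metadata it carries.
// The map is a stand-in for gRPC metadata (assumption for this sketch).
type route struct {
	addr     string
	metadata map[string]string
}

// pickRoute sends the request straight to the leader while it is healthy.
// Otherwise, if forwarding is enabled, it targets a follower and asks it
// to forward the request to the leader.
func pickRoute(leaderAddr string, leaderHealthy, enableForwarding bool, followers []string) (route, bool) {
	if leaderHealthy {
		return route{addr: leaderAddr}, true
	}
	if !enableForwarding || len(followers) == 0 {
		return route{}, false
	}
	return route{
		addr:     followers[0],
		metadata: map[string]string{forwardedHostKey: leaderAddr},
	}, true
}
```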

Region Cache in TiKV client-go

The region cache deletes a region item in any of the following situations:

  1. GC triggered by TTL.
  2. When reloading a region, the cache inserts the new region item and deletes the old one.
  3. client-go runs ScanRegions for all regions as a result of GC. It ignores the region cache and sends the request to PD.
    Therefore, if a region is unavailable in client-go, the client does not delete the old region first and then look for a new region.

Design

Region data may reach the followers with a delay, so a follower's data is not always up to date. To reduce potential retries on the associated path, callers need to choose between followers and the leader according to the usage scenario.
Specifically, if the caller is fetching a region for the first time, it can get it from a follower (we can assume that most of the region information on the followers is up to date). If the caller needs to update region information it already has, it must get the latest version, so it should ask the leader.
Given the additional knowledge above, this change is easy to make in client-go.
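
The selection rule in this design can be written down as a small decision function. The names `target`, `anyMember`, `leaderOnly`, and `chooseTarget` are hypothetical and do not come from client-go.

```go
package main

// target says which PD members may answer a region request.
type target int

const (
	leaderOnly target = iota // the answer must be the latest
	anyMember                // a follower's possibly delayed answer is acceptable
)

// chooseTarget applies the rule from the design: a first-time lookup
// (nothing cached yet) may go to a follower, while a refresh of an
// existing entry must go to the leader.
func chooseTarget(hasCachedRegion, followerHandleEnabled bool) target {
	if followerHandleEnabled && !hasCachedRegion {
		return anyMember
	}
	return leaderOnly
}
```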

Implementation

Client-go Implementation
  1. In findRegionByKey, if the region is nil, the region cache can get the region with the follower-handle option enabled. Otherwise, it uses no option.
  2. We should check the epoch in the region cache.
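
One way to read the epoch check is that the cache should refuse a region whose epoch is older than the one it already holds, since a follower may answer with delayed data. The sketch below assumes an epoch with ConfVer and Version counters and a cache keyed by region ID; the comparison rule and all type and function names are assumptions for illustration, not client-go's actual code.

```go
package main

// RegionEpoch is a simplified epoch with two monotonically increasing counters.
type RegionEpoch struct {
	ConfVer uint64
	Version uint64
}

// Region is a hypothetical, trimmed-down region record.
type Region struct {
	ID    uint64
	Epoch RegionEpoch
}

// isStale reports whether incoming is older than cached in either counter.
func isStale(cached, incoming RegionEpoch) bool {
	return incoming.ConfVer < cached.ConfVer || incoming.Version < cached.Version
}

// regionCache is a toy cache keyed by region ID (assumption for this sketch).
type regionCache struct {
	byID map[uint64]Region
}

func newRegionCache() *regionCache {
	return &regionCache{byID: make(map[uint64]Region)}
}

// insert stores r unless the cache already holds a newer epoch for the same
// region. It returns false when r was rejected as stale.
func (c *regionCache) insert(r Region) bool {
	if old, ok := c.byID[r.ID]; ok && isStale(old.Epoch, r.Epoch) {
		return false
	}
	c.byID[r.ID] = r
	return true
}
```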
PD Client Implementation
  1. Only call BuildForwardContext if we need to forward the request to the leader. (client: avoid to add redundant grpc metadata #7471)
  2. For the GetRegion interface, add FollowerHandleOption to the parameters. (client: add follower option #7465)
  3. Maintain follower health status in the client. If the gRPC call returns an error, the error in the response is not nil, or the region in the response is nil, remove this follower from the candidate servers. In that case, the PD client should get the region from the leader rather than return nil and an error. (client: Introduce ServiceClient #7489; syncer: add region syncer client status #7461)
  4. If FollowerHandle is true, the PD client sends requests to the leader or followers one by one and adds follower-handle to the gRPC metadata. (client: Introduce ServiceClient #7489; vars: add pd_enable_follower_handle_region to support get region from pd follower pingcap/tidb#49231)
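
Items 3 and 4 can be sketched as a client that rotates over healthy followers and falls back to the leader when a follower fails. This is a simplified model, not the ServiceClient from #7489: `member`, `client`, and `pickFollower` are hypothetical, a function value stands in for the gRPC call, and rotating over followers only (rather than leader and followers) is an assumption.

```go
package main

import "errors"

// Region is a hypothetical, trimmed-down region record.
type Region struct{ ID uint64 }

// member is a hypothetical handle to one PD server. getRegion stands in for
// the gRPC call; its error covers both transport and response errors.
type member struct {
	name      string
	getRegion func(key []byte) (*Region, error)
}

var errNoLeader = errors.New("no leader configured")

type client struct {
	leader    member
	followers []member
	unhealthy map[string]bool // followers removed from the candidate set
	next      int             // round-robin cursor over followers
}

func newClient(leader member, followers ...member) *client {
	return &client{leader: leader, followers: followers, unhealthy: make(map[string]bool)}
}

// pickFollower returns the next healthy follower in round-robin order.
func (c *client) pickFollower() (member, bool) {
	for i := 0; i < len(c.followers); i++ {
		f := c.followers[(c.next+i)%len(c.followers)]
		if !c.unhealthy[f.name] {
			c.next = (c.next + i + 1) % len(c.followers)
			return f, true
		}
	}
	return member{}, false
}

// GetRegion tries a follower first when followerHandle is set. On an error
// or a nil region it marks the follower unhealthy and asks the leader
// instead of returning the failure to the caller.
func (c *client) GetRegion(key []byte, followerHandle bool) (*Region, error) {
	if followerHandle {
		if f, ok := c.pickFollower(); ok {
			r, err := f.getRegion(key)
			if err == nil && r != nil {
				return r, nil
			}
			c.unhealthy[f.name] = true
		}
	}
	if c.leader.getRegion == nil {
		return nil, errNoLeader
	}
	return c.leader.getRegion(key)
}
```

A real client would also need a way to put a recovered follower back into the candidate set; the sketch leaves that out.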
Server Implementation
  1. If the PD server is a follower, check whether the gRPC context holds ForwardMetadataKey. If it does, forward the request to the leader as before. If not, check for active-follower. If it is present, check the status of the sync client first. If the stream is broken, return an error. Otherwise, try to get the region from the region tree and return it. (*: follower support to handle GetRegion and other region api #7432)
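
The follower-side flow above can be sketched as a single handler. This is a model under stated assumptions, not the code from #7432: a map stands in for gRPC metadata, the key strings are taken from the text (pd-forwarded-host, follower-handle) and their exact spelling on the wire is assumed, the region lookup is an exact-key map instead of the region tree, and returning a not-leader error when neither key is present is an assumption about the pre-existing behavior.

```go
package main

import "errors"

// Region is a hypothetical, trimmed-down region record.
type Region struct{ ID uint64 }

// Metadata keys as named in the text; exact values are assumptions.
const (
	forwardKey        = "pd-forwarded-host"
	followerHandleKey = "follower-handle"
)

var (
	errSyncBroken = errors.New("region sync stream is broken")
	errNotLeader  = errors.New("not leader")
)

// request carries the key to look up and a stand-in for gRPC metadata.
type request struct {
	key      []byte
	metadata map[string]string
}

// follower models a PD follower server.
type follower struct {
	syncStreamHealthy bool
	regions           map[string]*Region // toy lookup in place of the region tree
	forwardToLeader   func(req request) (*Region, error)
}

// GetRegion follows the order described above: forward if asked to,
// otherwise serve locally only when the client opted in and the sync
// stream from the leader is healthy.
func (f *follower) GetRegion(req request) (*Region, error) {
	if _, ok := req.metadata[forwardKey]; ok {
		return f.forwardToLeader(req)
	}
	if _, ok := req.metadata[followerHandleKey]; !ok {
		return nil, errNotLeader
	}
	if !f.syncStreamHealthy {
		return nil, errSyncBroken
	}
	return f.regions[string(req.key)], nil
}
```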
Metrics

#7619

@CabinfeverB CabinfeverB added the type/development The issue belongs to a development tasks label Nov 24, 2023
ti-chi-bot bot added a commit that referenced this issue Nov 29, 2023
ref #7431

Signed-off-by: Cabinfever_B <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
ti-chi-bot bot added a commit that referenced this issue Nov 30, 2023
ref #7431

Signed-off-by: Cabinfever_B <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
ti-chi-bot bot added a commit that referenced this issue Dec 21, 2023
ref #7431, ref #7576

Signed-off-by: Cabinfever_B <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
ti-chi-bot bot added a commit that referenced this issue Dec 29, 2023
ref #7431

Signed-off-by: Cabinfever_B <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
pingandb pushed a commit to pingandb/pd that referenced this issue Jan 18, 2024
…7432)

ref tikv#7431

Signed-off-by: Cabinfever_B <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
Signed-off-by: pingandb <[email protected]>
@CabinfeverB
Member Author

Since we have no plans to support more APIs at this time, I am closing this issue.

@seiya-annie

/found customer

@ti-chi-bot ti-chi-bot bot added the report/customer Customers have encountered this bug. label Jun 4, 2024