
grpc: Limit the qps for all GRPC interfaces. #7466

Closed
8 of 20 tasks
AndreMouche opened this issue Nov 28, 2023 · 3 comments
Labels
type/enhancement The issue or PR belongs to an enhancement.


AndreMouche (Member) commented Nov 28, 2023

Enhancement Task

Currently, rate limiting is implemented for only some of the gRPC interfaces:

  • GetMembers

    pd/server/grpc_service.go, lines 416 to 417 at a6e855e:

    func (s *GrpcServer) GetMembers(context.Context, *pdpb.GetMembersRequest) (*pdpb.GetMembersResponse, error) {
    if s.GetServiceMiddlewarePersistOptions().IsGRPCRateLimitEnabled() {
  • GetStore

    pd/server/grpc_service.go, lines 646 to 648 at a6e855e:

    // GetStore implements gRPC PDServer.
    func (s *GrpcServer) GetStore(ctx context.Context, request *pdpb.GetStoreRequest) (*pdpb.GetStoreResponse, error) {
    if s.GetServiceMiddlewarePersistOptions().IsGRPCRateLimitEnabled() {
  • GetAllStores

    pd/server/grpc_service.go, lines 749 to 751 at a6e855e:

    // GetAllStores implements gRPC PDServer.
    func (s *GrpcServer) GetAllStores(ctx context.Context, request *pdpb.GetAllStoresRequest) (*pdpb.GetAllStoresResponse, error) {
    if s.GetServiceMiddlewarePersistOptions().IsGRPCRateLimitEnabled() {
  • StoreHeartbeat

    pd/server/grpc_service.go, lines 794 to 796 at a6e855e:

    // StoreHeartbeat implements gRPC PDServer.
    func (s *GrpcServer) StoreHeartbeat(ctx context.Context, request *pdpb.StoreHeartbeatRequest) (*pdpb.StoreHeartbeatResponse, error) {
    if s.GetServiceMiddlewarePersistOptions().IsGRPCRateLimitEnabled() {
  • GetRegion
    https://github.com/tikv/pd/blob/a6e855eef6744adfac232833769219be4f806756/server/grpc_service.go#L1265C4-L1267
  • GetPrevRegion

    pd/server/grpc_service.go, lines 1309 to 1311 at a6e855e:

    // GetPrevRegion implements gRPC PDServer
    func (s *GrpcServer) GetPrevRegion(ctx context.Context, request *pdpb.GetRegionRequest) (*pdpb.GetRegionResponse, error) {
    if s.GetServiceMiddlewarePersistOptions().IsGRPCRateLimitEnabled() {
  • GetRegionByID

    pd/server/grpc_service.go, lines 1354 to 1356 at a6e855e:

    // GetRegionByID implements gRPC PDServer.
    func (s *GrpcServer) GetRegionByID(ctx context.Context, request *pdpb.GetRegionByIDRequest) (*pdpb.GetRegionResponse, error) {
    if s.GetServiceMiddlewarePersistOptions().IsGRPCRateLimitEnabled() {
  • ScanRegions

    pd/server/grpc_service.go, lines 1398 to 1400 at a6e855e:

    // ScanRegions implements gRPC PDServer.
    func (s *GrpcServer) ScanRegions(ctx context.Context, request *pdpb.ScanRegionsRequest) (*pdpb.ScanRegionsResponse, error) {
    if s.GetServiceMiddlewarePersistOptions().IsGRPCRateLimitEnabled() {

The following APIs, however, do not support rate limiting:

  • AskSplit
  • AskBatchSplit
  • ReportSplit
  • ReportBatchSplit
  • GetClusterConfig
  • PutClusterConfig
  • ScatterRegion
  • GetGCSafePoint
  • SyncRegions
  • UpdateGCSafePoint
  • UpdateServiceGCSafePoint
  • GetOperator

Because these APIs are not subject to flow control, they are vulnerable to attack and can easily overload PD.
For example, suppose I run the following SQL in TiDB:

CREATE TABLE employees (
    id int unsigned NOT NULL,
    fname varchar(30),
    lname varchar(30),
    hired date NOT NULL DEFAULT '1970-01-01',
    separated date DEFAULT '9999-12-31',
    job_code int,
    store_id int NOT NULL
) PARTITION BY RANGE (id)
INTERVAL (100) FIRST PARTITION LESS THAN (1000) LAST PARTITION LESS THAN (100000) MAXVALUE PARTITION;

mysql> split partition table employees between (0) and (100000000) regions 1000;
split partition table employees between (0) and (10000000) regions 1000;

Then the QPS of AskBatchSplit becomes extremely high, and the leader PD's memory and CPU usage spike instantly.
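A rough count shows why this is so heavy. Assuming INTERVAL partitioning creates one partition for each upper bound from 1000 to 100000 in steps of 100, plus the MAXVALUE partition, and assuming `SPLIT PARTITION TABLE` splits every partition into the requested number of Regions, the statement asks for close to a million new Regions. The helper name `intervalPartitions` is hypothetical:

```go
package main

import "fmt"

// intervalPartitions counts the partitions created by
//
//	PARTITION BY RANGE (...) INTERVAL (interval)
//	FIRST PARTITION LESS THAN (first) LAST PARTITION LESS THAN (last)
//
// assuming one partition per upper bound first, first+interval, ..., last,
// plus one more when a MAXVALUE partition is requested.
func intervalPartitions(first, last, interval int, maxValue bool) int {
	n := (last-first)/interval + 1
	if maxValue {
		n++
	}
	return n
}

func main() {
	parts := intervalPartitions(1000, 100000, 100, true)
	const regionsPerPartition = 1000 // REGIONS 1000 in the SPLIT statement
	fmt.Println("partitions:", parts)                             // partitions: 992
	fmt.Println("regions requested:", parts*regionsPerPartition) // regions requested: 992000
}
```

Under these assumptions, every one of those splits needs new IDs from PD, which is what drives the AskBatchSplit traffic.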

There are at least two things we can do:

  1. Add rate limiting for this API.
  2. Limit the batch size of batch split.

AndreMouche added the type/enhancement label on Nov 28, 2023
CabinfeverB (Member) commented:

The batch size is already limited by TiDB to 1000.
Can you post some metrics, such as CPU, memory, and goroutine count? @AndreMouche

rleungx (Member) commented Sep 18, 2024

This is already supported, so I'm closing this issue.

rleungx closed this as completed on Sep 18, 2024

rleungx (Member) commented Nov 7, 2024

Tracked by #5739.
