Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

autoid client reset connection operation is not concurrency-safe #50519

Closed
tiancaiamao opened this issue Jan 17, 2024 · 2 comments · Fixed by #50522
Closed

autoid client reset connection operation is not concurrency-safe #50519

tiancaiamao opened this issue Jan 17, 2024 · 2 comments · Fixed by #50522
Assignees
Labels
affects-6.5 This bug affects the 6.5.x(LTS) versions. affects-7.1 This bug affects the 7.1.x(LTS) versions. affects-7.5 This bug affects the 7.5.x(LTS) versions. affects-7.6 report/community The community has encountered this bug. report/customer Customers have encountered this bug. severity/major sig/sql-infra SIG: SQL Infra type/bug The issue is confirmed as a bug.

Comments

@tiancaiamao
Copy link
Contributor

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

Original found in a customer's environment, the log message shows:

2024-01-12 20:59:17 [2024/01/12 12:59:17.591 +00:00] [INFO] [autoid_service.go:173] ["reset grpc connection"] [category="autoid client"] [reason="rpc error: code = Canceled desc = grpc: the client connection is closing"]
2024-01-12 20:59:17 [2024/01/12 12:59:17.618 +00:00] [INFO] [autoid_service.go:173] ["reset grpc connection"] [category="autoid client"] [reason="rpc error: code = Canceled desc = grpc: the client connection is closing"]
2024-01-12 20:59:17 [2024/01/12 12:59:17.791 +00:00] [INFO] [autoid_service.go:173] ["reset grpc connection"] [category="autoid client"] [reason="rpc error: code = Canceled desc = grpc: the client connection is closing"]

I can reproduce with the following test:

func TestXXX(t *testing.T) {
	etcdLogCfg := zap.NewProductionConfig()
	etcdLogCfg.Level = zap.NewAtomicLevelAt(zap.ErrorLevel)
	etcdCli, err := clientv3.New(clientv3.Config{
		LogConfig:        &etcdLogCfg,
		Endpoints:        []string{"127.0.0.1:2379"},
		AutoSyncInterval: 30 * time.Second,
		DialTimeout:      5 * time.Second,
		DialOptions: []grpc.DialOption{
		},
	})
	if err != nil {
		panic(err)
	}

	var qps uint64
	clidisc := autoid.NewClientDiscover(etcdCli)
	for i:=0; i<7000; i++ {
		go func() {
			sp := autoid.NewSinglePointAlloc(&requirement{nil, clidisc}, 47, 32, false)
			for {
				_, _, err := sp.Alloc(context.Background(), 1, 1, 1)
				if err != nil {
					panic(err)
				}
				atomic.AddUint64(&qps, 1)
			}
		}()
	}
	for {
		time.Sleep(time.Second)
		fmt.Println("==== qps ====", atomic.LoadUint64(&qps))
		atomic.StoreUint64(&qps, 0)
	}
}

With a modified tidb, mock error for 1/1000 percent of the requests:

git diff pkg/autoid_service/autoid.go

@@ -409,6 +410,11 @@ func (s *Service) AllocAutoID(ctx context.Context, req *autoid.AutoIDRequest) (*
                logutil.BgLogger().Info("Current service is not request keyspace leader.", zap.Uint32("req-keyspace-id", req.KeyspaceID), zap.Uint32("service-keyspace-id", serviceKeyspaceID))
                return nil, errors.Trace(errors.New("not leader"))
        }
+
+       if rand.Intn(1000) == 1 {
+               return nil, errors.New("mock error!!")
+       }
+

2. What did you expect to see? (Required)

QPS is stable.

3. What did you see instead (Required)

A lot of error log:

[2024/01/17 20:58:37.029 +08:00] [INFO] [autoid_service.go:202] ["reset grpc connection"] [category="autoid client"] [reason="rpc error: code = Canceled desc = grpc: the client connection is closing"] [version=0]
[2024/01/17 20:58:37.029 +08:00] [INFO] [autoid_service.go:202] ["reset grpc connection"] [category="autoid client"] [reason="rpc error: code = Canceled desc = grpc: the client connection is closing"] [version=0]
[2024/01/17 20:58:37.029 +08:00] [INFO] [autoid_service.go:202] ["reset grpc connection"] [category="autoid client"] [reason="rpc error: code = Canceled desc = grpc: the client connection is closing"] [version=0]
[2024/01/17 20:58:37.029 +08:00] [INFO] [autoid_service.go:202] ["reset grpc connection"] [category="autoid client"] [reason="rpc error: code = Canceled desc = grpc: the client connection is closing"] [version=0]
[2024/01/17 20:58:37.029 +08:00] [INFO] [autoid_service.go:202] ["reset grpc connection"] [category="autoid client"] [reason="rpc error: code = Canceled desc = grpc: the client connection is closing"] [version=0]
[2024/01/17 20:58:37.029 +08:00] [INFO] [autoid_service.go:202] ["reset grpc connection"] [category="autoid client"] [reason="rpc error: code = Canceled desc = grpc: the client connection is closing"] [version=0]
[2024/01/17 20:58:37.029 +08:00] [INFO] [autoid_service.go:202] ["reset grpc connection"] [category="autoid client"] [reason="rpc error: code = Canceled desc = grpc: the client connection is closing"] [version=0]
[2024/01/17 20:58:37.029 +08:00] [INFO] [autoid_service.go:202] ["reset grpc connection"] [category="autoid client"] [reason="rpc error: code = Canceled desc = grpc: the client connection is closing"] [version=0]
[2024/01/17 20:58:37.029 +08:00] [INFO] [autoid_service.go:202] ["reset grpc connection"] [category="autoid client"] [reason="rpc error: code = Canceled desc = grpc: the client connection is closing"] [version=0]
[2024/01/17 20:58:37.029 +08:00] [INFO] [autoid_service.go:202] ["reset grpc connection"] [category="autoid client"] [reason="rpc error: code = Canceled desc = grpc: the client connection is closing"] [version=0]
[2024/01/17 20:58:37.029 +08:00] [INFO] [autoid_service.go:202] ["reset grpc connection"] [category="autoid client"] [reason="rpc error: code = Canceled desc = grpc: the client connection is closing"] [version=0]
[2024/01/17 20:58:37.029 +08:00] [INFO] [autoid_service.go:202] ["reset grpc connection"] [category="autoid client"] [reason="rpc error: code = Canceled desc = grpc: the client connection is closing"] [version=0]
[2024/01/17 20:58:37.029 +08:00] [INFO] [autoid_service.go:202] ["reset grpc connection"] [category="autoid client"] [reason="rpc error: code = Canceled desc = grpc: the client connection is closing"] [version=0]
[2024/01/17 20:58:37.029 +08:00] [INFO] [autoid_service.go:202] ["reset grpc connection"] [category="autoid client"] [reason="rpc error: code = Canceled desc = grpc: the client connection is closing"] [version=0]
[2024/01/17 20:58:37.029 +08:00] [INFO] [autoid_service.go:202] ["reset grpc connection"] [category="autoid client"] [reason="rpc error:

QPS is unstable:

==== qps ==== 4557
==== qps ==== 62
==== qps ==== 1008
==== qps ==== 0
==== qps ==== 0
==== qps ==== 0
==== qps ==== 507
==== qps ==== 0
==== qps ==== 0
==== qps ==== 0
==== qps ==== 0
==== qps ==== 0
==== qps ==== 99
==== qps ==== 1800
==== qps ==== 3992
==== qps ==== 6757
==== qps ==== 9484
==== qps ==== 9091
==== qps ==== 10187
==== qps ==== 81
==== qps ==== 0
==== qps ==== 0
==== qps ==== 0
==== qps ==== 236
==== qps ==== 112
==== qps ==== 2441
==== qps ==== 168

4. What is your TiDB version? (Required)

master 245951f

@tiancaiamao tiancaiamao added the type/bug The issue is confirmed as a bug. label Jan 17, 2024
@tiancaiamao tiancaiamao self-assigned this Jan 17, 2024
@tiancaiamao tiancaiamao added affects-7.1 This bug affects the 7.1.x(LTS) versions. affects-6.5 This bug affects the 6.5.x(LTS) versions. affects-7.5 This bug affects the 7.5.x(LTS) versions. affects-7.6 and removed may-affects-5.4 This bug maybe affects 5.4.x versions. may-affects-6.1 may-affects-6.5 may-affects-7.1 may-affects-7.5 labels Jan 18, 2024
@jebter jebter added the sig/sql-infra SIG: SQL Infra label Jan 18, 2024
@seiya-annie
Copy link

/found customer

@ti-chi-bot ti-chi-bot bot added the report/customer Customers have encountered this bug. label Jun 4, 2024
@seiya-annie
Copy link

/found community

@ti-chi-bot ti-chi-bot bot added the report/community The community has encountered this bug. label Jun 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
affects-6.5 This bug affects the 6.5.x(LTS) versions. affects-7.1 This bug affects the 7.1.x(LTS) versions. affects-7.5 This bug affects the 7.5.x(LTS) versions. affects-7.6 report/community The community has encountered this bug. report/customer Customers have encountered this bug. severity/major sig/sql-infra SIG: SQL Infra type/bug The issue is confirmed as a bug.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants