pd-tso-bench: client updateMember can't recover after deleting all PD/API pods without a grace period. #6681
Comments
Maybe we need to add gRPC keepalive params in pd-tso-bench.
The bug seems to come from gRPC (grpc/grpc-go#4785): when the API server is restarted, the channel connectivity goes into TRANSIENT_FAILURE.
Does it happen on the client side or the server side?
The guess that the client used a higher version of gRPC was wrong. We only need to add keepalive params.
close #6681 Signed-off-by: lhy1024 <[email protected]> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
Enhancement Task
What did I do?
In dev, run:
./pd-tso-bench -v -duration 250000s -pd "http://serverless-cluster-pd-0.serverless-cluster-pd-peer.tidb-serverless.svc:2379" -client 1 -c 1 -interval 10s
Kill all PD/API pods at 11:59:18 PDT
~ kubectl delete pod serverless-cluster-pd-0 serverless-cluster-pd-1 serverless-cluster-pd-2 -n tidb-serverless --force --grace-period=0
Warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
pod "serverless-cluster-pd-0" force deleted
pod "serverless-cluster-pd-1" force deleted
pod "serverless-cluster-pd-2" force deleted
~ date
Mon Jun 26 11:59:18 PDT 2023
Pod 0 started at 11:59:25 PDT and was ready to serve at 12:00:03 PDT:
starting pd-server ...
/pd-server --data-dir=/var/lib/pd --name=serverless-cluster-pd-0 --peer-urls=http://0.0.0.0:2380 --advertise-peer-urls=http://serverless-cluster-pd-0.serverless-cluster-pd-peer.tidb-serverless.svc:2380 --client-urls=http://0.0.0.0:2379 --advertise-client-urls=http://serverless-cluster-pd-0.serverless-cluster-pd-peer.tidb-serverless.svc:2379 --config=/etc/pd/pd.toml --join=http://serverless-cluster-pd-0.serverless-cluster-pd-peer.tidb-serverless.svc:2380,http://serverless-cluster-pd-1.serverless-cluster-pd-peer.tidb-serverless.svc:2380,http://serverless-cluster-pd-2.serverless-cluster-pd-peer.tidb-serverless.svc:2380
[2023/06/26 18:59:25.773 +00:00] [INFO] [versioninfo.go:89] ["Welcome to Placement Driver (API SERVICE)"]
...
[2023/06/26 19:00:03.123 +00:00] [INFO] [manager.go:74] ["Key visual service is started"]
The PD client's updateMember can't recover.
See the log here: https://gist.githubusercontent.com/binshi-bing/d669ed80e48073f4923c51b29ce95642/raw/7b339f6c319333453e9e17dc136393f4a551a5ec/gistfile1.txt