
pd-tso-bench: client updateMember can't recover after deleting all PD/API pods w/o graceful period. #6681

Closed
binshi-bing opened this issue Jun 26, 2023 · 4 comments · Fixed by #6699
Labels
type/enhancement The issue or PR belongs to an enhancement.

Comments

@binshi-bing
Contributor

binshi-bing commented Jun 26, 2023

Enhancement Task

What did I do?

In dev, run:
./pd-tso-bench -v -duration 250000s -pd "http://serverless-cluster-pd-0.serverless-cluster-pd-peer.tidb-serverless.svc:2379" -client 1 -c 1 -interval 10s

Kill all PD/API pods at 11:59:18 PDT
~  kubectl delete pod serverless-cluster-pd-0 serverless-cluster-pd-1 serverless-cluster-pd-2 -n tidb-serverless --force --grace-period=0
Warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
pod "serverless-cluster-pd-0" force deleted
pod "serverless-cluster-pd-1" force deleted
pod "serverless-cluster-pd-2" force deleted
~  date  ✔  10376  11:59:17
Mon Jun 26 11:59:18 PDT 2023

Pod 0 started at 11:59:25 PDT and was ready to serve at 12:00:03 PDT:
starting pd-server ...
/pd-server --data-dir=/var/lib/pd --name=serverless-cluster-pd-0 --peer-urls=http://0.0.0.0:2380 --advertise-peer-urls=http://serverless-cluster-pd-0.serverless-cluster-pd-peer.tidb-serverless.svc:2380 --client-urls=http://0.0.0.0:2379 --advertise-client-urls=http://serverless-cluster-pd-0.serverless-cluster-pd-peer.tidb-serverless.svc:2379 --config=/etc/pd/pd.toml --join=http://serverless-cluster-pd-0.serverless-cluster-pd-peer.tidb-serverless.svc:2380,http://serverless-cluster-pd-1.serverless-cluster-pd-peer.tidb-serverless.svc:2380,http://serverless-cluster-pd-2.serverless-cluster-pd-peer.tidb-serverless.svc:2380
[2023/06/26 18:59:25.773 +00:00] [INFO] [versioninfo.go:89] ["Welcome to Placement Driver (API SERVICE)"]
...
[2023/06/26 19:00:03.123 +00:00] [INFO] [manager.go:74] ["Key visual service is started"]

The PD client's updateMember cannot recover.
See the log here: https://gist.githubusercontent.com/binshi-bing/d669ed80e48073f4923c51b29ce95642/raw/7b339f6c319333453e9e17dc136393f4a551a5ec/gistfile1.txt

@binshi-bing binshi-bing added the type/enhancement The issue or PR belongs to an enhancement. label Jun 26, 2023
@lhy1024
Contributor

lhy1024 commented Jun 27, 2023

Maybe we need to add gRPC keepalive parameters to pd-tso-bench.

@lhy1024
Contributor

lhy1024 commented Jun 27, 2023

It seems the bug comes from gRPC: grpc/grpc-go#4785.

When the API server is restarted, the channel connectivity goes into TRANSIENT_FAILURE.

@rleungx
Member

rleungx commented Jun 28, 2023

Does it happen on the client side or the server side?

@lhy1024
Contributor

lhy1024 commented Jun 28, 2023

My earlier guess, that the client used a higher version of gRPC, was wrong. We only need to add keepalive parameters.

ti-chi-bot bot added a commit that referenced this issue Jun 28, 2023
close #6681

Signed-off-by: lhy1024 <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
@binshi-bing binshi-bing changed the title pd client: updateMember can't recover after deleting all PD/API pods w/o graceful period. pd-tso-bench: client updateMember can't recover after deleting all PD/API pods w/o graceful period. Jun 29, 2023
rleungx pushed a commit to rleungx/pd that referenced this issue Aug 2, 2023
close tikv#6681

Signed-off-by: lhy1024 <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>