TiCDC lag exceeds 10 minutes and TiCDC crashes when a data IO hang is injected into the PD leader #9054
Comments
/remove-area dm
/severity major
#9106 alleviates this problem: in testing after merging this PR, the CDC stuck issue still occurred about 50% of the time.
/assign @fubinzh
What is the root cause of the lag? TiCDC doesn't depend on the PD leader's IO; it just pushes the checkpoint to PD asynchronously. cc @nongfushanquan
TiCDC nodes have to keep a connection with PD, but in this scenario PD can't read the leader's information from etcd, which may be caused by the following issue:
It's an etcd issue; we can't fix it for now.
/remove-label affects-7.5
If PD is able to upgrade its etcd version to 3.4.31, it's likely that the issue will be resolved.
What did you do?
1. Run TPC-C with 10 threads and 1000 warehouses.
2. After 10 minutes, simulate an IO hang on the PD leader while its pod stays alive.
   Fault start time: 2023-05-15 12:05:38
3. After another 10 minutes, recover from the fault.
   Fault recovery time: 2023-05-15 12:15:38
What did you expect to see?
Changefeed lag stays below 30s.
What did you see instead?
1. Changefeed lag exceeded 10 minutes after the fault was injected.
2. TiCDC crashed.
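To make the expected-versus-observed lag concrete, here is a minimal sketch of how changefeed lag can be derived from a checkpoint TSO. It assumes PD's TSO layout (wall-clock milliseconds in the high bits, an 18-bit logical counter in the low bits) and measures lag as current wall-clock time minus the checkpoint's physical time. The helper names (`compose_tso`, `tso_to_datetime`, `checkpoint_lag`, `lag_within_slo`) are hypothetical and are not TiCDC's API. The fault timestamps from this report are treated as UTC, which is also an assumption because the report gives no timezone.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a PD TSO keeps wall-clock milliseconds in the high bits and an
# 18-bit logical counter in the low bits.
LOGICAL_BITS = 18
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def compose_tso(physical: datetime, logical: int = 0) -> int:
    """Build a TSO from a wall-clock time (hypothetical helper for illustration)."""
    physical_ms = (physical - EPOCH) // timedelta(milliseconds=1)
    return (physical_ms << LOGICAL_BITS) | logical


def tso_to_datetime(ts: int) -> datetime:
    """Extract the physical (wall-clock) part of a TSO as a UTC datetime."""
    physical_ms = ts >> LOGICAL_BITS
    return EPOCH + timedelta(milliseconds=physical_ms)


def checkpoint_lag(checkpoint_ts: int, now: datetime) -> timedelta:
    """How far the changefeed checkpoint trails the current wall clock."""
    return now - tso_to_datetime(checkpoint_ts)


def lag_within_slo(checkpoint_ts: int, now: datetime,
                   slo: timedelta = timedelta(seconds=30)) -> bool:
    """True if the lag is below the 30s bound expected in this report."""
    return checkpoint_lag(checkpoint_ts, now) < slo


if __name__ == "__main__":
    # Fault window from the report, interpreted as UTC (an assumption).
    fault_start = datetime(2023, 5, 15, 12, 5, 38, tzinfo=timezone.utc)
    fault_end = datetime(2023, 5, 15, 12, 15, 38, tzinfo=timezone.utc)

    # Suppose the checkpoint stopped advancing when the fault was injected.
    stuck_checkpoint = compose_tso(fault_start)
    lag = checkpoint_lag(stuck_checkpoint, fault_end)
    print(lag, lag_within_slo(stuck_checkpoint, fault_end))
```

If the checkpoint stays frozen at the fault start time, the computed lag at recovery time is the full 10-minute fault window, far above the 30s bound; the report's observed lag of more than 10 minutes is consistent with the checkpoint not advancing for at least that long.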
Versions of the cluster
Git hash: 1335f98cdbaf77239bbcbc6b61561e4254449ffe