limit the maximum number of cached txns in mysql worker #10896

Closed
CharlesCheung96 opened this issue Apr 10, 2024 · 0 comments · Fixed by #10892
Labels
affects-6.5 This bug affects the 6.5.x(LTS) versions. affects-7.1 This bug affects the 7.1.x(LTS) versions. affects-7.5 This bug affects the 7.5.x(LTS) versions. affects-8.1 This bug affects the 8.1.x(LTS) versions. area/ticdc Issues or PRs related to TiCDC. component/sink Sink component. severity/moderate type/enhancement The issue or PR belongs to an enhancement.

CharlesCheung96 commented Apr 10, 2024

Is your feature request related to a problem?

Ref: the sink to MySQL (cdc) workload skew issue.

PR #10376 tries to fix the skew problem by sending a transaction to a random worker once all the transactions it depends on have been executed. As a result, among conflicting transactions only one can be executing across all workers at any time; in other words, they are executed serially (one by one). During real-time streaming, conflicting transactions are also executed serially in the upstream cluster, so it is a reasonable choice for TiCDC to execute them serially as well.

Txn N(row1)... ------> Txn C(row1)------> Txn B(row1)------> Txn A(row1) 
                                                                    |
                                                                    |-----> worker 1

However, for other common scenarios, this approach can be problematic:

  1. Incremental scan: while historical data is being synchronized, the upstream cluster may keep executing conflicting transactions on the same rows. The MySQL sink must then execute both the historical transactions and the new real-time transactions for those rows on the same serial chain, so it needs at least twice the serial throughput to catch up and reduce the replication lag. Serial execution never satisfies this condition.
Incremental Scan: Txn N(row1)... ------> Txn C(row1)------> Txn B(row1)------> Txn A(row1) 
                                                                                       |
                                                                                       |-----> worker 1
                                                                                       | 
Real-time streaming: Txn N(row1)... ------> Txn C(row1)------> Txn B(row1)------> Txn A(row1) 
  2. Cross-regional replication: the throughput of a single MySQL worker is limited by network latency, so executing transactions one by one may achieve much lower throughput than the upstream cluster does.

New Proposal

A better option is a compromise that replaces one-by-one execution with batch-by-batch execution:

  1. Batching can effectively improve the throughput of a single worker, so the fast dependency-resolving optimization in the conflict detector should be preserved.

  2. At the same time, to avoid workload skew, we could limit the maximum number of transactions cached in a single worker. When the limit is exceeded, the conflict detector should wait for all transactions cached in that worker to complete before sending a new event to it.

CharlesCheung96 added type/feature Issues about a new feature component/sink Sink component. type/enhancement The issue or PR belongs to an enhancement. area/ticdc Issues or PRs related to TiCDC. and removed type/feature Issues about a new feature labels Apr 10, 2024
CharlesCheung96 added affects-6.5 This bug affects the 6.5.x(LTS) versions. affects-7.1 This bug affects the 7.1.x(LTS) versions. affects-7.5 This bug affects the 7.5.x(LTS) versions. labels Apr 10, 2024
CharlesCheung96 added type/bug The issue is confirmed as a bug. and removed type/enhancement The issue or PR belongs to an enhancement. labels Apr 18, 2024
CharlesCheung96 added affects-8.1 This bug affects the 8.1.x(LTS) versions. affects-6.5 This bug affects the 6.5.x(LTS) versions. and removed affects-6.5 This bug affects the 6.5.x(LTS) versions. labels Apr 24, 2024
flowbehappy added type/enhancement The issue or PR belongs to an enhancement. and removed type/bug The issue is confirmed as a bug. labels Apr 29, 2024