
puller(ticdc): should split all update kv entries in puller when sink module is in safe mode #11231

Closed
lidezhu opened this issue Jun 3, 2024 · 1 comment · Fixed by #11224
Assignees
Labels
affects-6.5 This bug affects the 6.5.x(LTS) versions. affects-7.1 This bug affects the 7.1.x(LTS) versions. affects-7.5 This bug affects the 7.5.x(LTS) versions. affects-8.1 This bug affects the 8.1.x(LTS) versions. area/ticdc Issues or PRs related to TiCDC. type/enhancement The issue or PR belongs to an enhancement. type/regression

Comments

@lidezhu
Collaborator

lidezhu commented Jun 3, 2024

Previous Problem

In #10919, we addressed the issue of downstream data inconsistency caused by the potentially incorrect order of UPDATE events received by TiCDC.

Take the following SQL statements as an example:

CREATE TABLE t (a INT PRIMARY KEY, b INT);
INSERT INTO t VALUES (1, 1);
INSERT INTO t VALUES (2, 2);

BEGIN;
UPDATE t SET a = 3 WHERE a = 2;
UPDATE t SET a = 2 WHERE a = 1;
COMMIT;

In this example, the two UPDATE statements within the transaction have a sequential execution dependency: the primary key a is first changed from 2 to 3, and then from 1 to 2. After this transaction is executed, the records in the upstream database are (2, 1) and (3, 2).

However, the order of UPDATE events received by TiCDC might differ from the actual execution order of the upstream transaction. For example:

UPDATE t SET a = 2 WHERE a = 1;
UPDATE t SET a = 3 WHERE a = 2;

Previously, TiCDC simply split these UPDATE events into DELETE and INSERT events before sending them downstream. After the split, the actual execution order of these events in the downstream is as follows:

BEGIN;
DELETE FROM t WHERE a = 1;
REPLACE INTO t VALUES (2, 1);
DELETE FROM t WHERE a = 2;
REPLACE INTO t VALUES (3, 2);
COMMIT;

After the downstream executes the transaction, the records in the database are (3, 2), which are different from the records in the upstream database ((2, 1) and (3, 2)), indicating a data inconsistency issue.

To fix this problem, we need a mechanism to reorder the DELETE and INSERT events after the split.

For example, if we can guarantee that all DELETE events are executed before the INSERT events within the same transaction, the data inconsistency problem can be avoided:

BEGIN;
DELETE FROM t WHERE a = 1;
DELETE FROM t WHERE a = 2;
REPLACE INTO t VALUES (2, 1);
REPLACE INTO t VALUES (3, 2);
COMMIT;

After the downstream executes the transaction, the records in the downstream database are the same as those in the upstream database, which are (2, 1) and (3, 2), ensuring data consistency.
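The split-and-reorder idea above can be sketched as follows. This is a hedged illustration only, not TiCDC's actual implementation; the `Event` dict shape and the helper name are hypothetical.

```python
# Hypothetical sketch: split each UPDATE event into a DELETE (of the old row)
# and an INSERT (of the new row), then emit all DELETEs before all INSERTs
# within one transaction, so key "handoffs" between rows cannot conflict.

def split_and_reorder(events):
    """events: list of dicts like {"type": "update", "old": row, "new": row}
    or {"type": "delete"/"insert", "row": row}. Returns reordered events."""
    deletes, inserts = [], []
    for e in events:
        if e["type"] == "update":
            deletes.append({"type": "delete", "row": e["old"]})
            inserts.append({"type": "insert", "row": e["new"]})
        elif e["type"] == "delete":
            deletes.append(e)
        else:  # insert
            inserts.append(e)
    return deletes + inserts

# The out-of-order transaction from the example above:
txn = [
    {"type": "update", "old": (1, 1), "new": (2, 1)},
    {"type": "update", "old": (2, 2), "new": (3, 2)},
]
result = split_and_reorder(txn)
```

Applied to the example transaction, the two deletes (for keys 1 and 2) come out before the two replaces, which matches the safe ordering shown above.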

Previous Solution

However, reordering the DELETE and INSERT events after the split is not easy to accomplish.

The whole process by which TiCDC processes a row of data and sends it to a MySQL-compatible downstream is as follows:

  1. TiCDC obtains KV data from TiKV, stores it on the local hard disk, and sorts it by transaction CommitTS and event type (DELETE > UPDATE > INSERT);
  2. The Sink module pulls KV data from the local hard disk in units of transactions and decodes the KV data into Event data (including the column name, type, data, etc. of each column) according to the Table Schema;
  3. The Sink module translates the Event data into SQL and sends it to the downstream;
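The sort order in step 1 might be sketched as a compound key: CommitTS first, then event type, with DELETE ordered before UPDATE and INSERT. The tuple key below is an illustrative assumption, not TiCDC's actual on-disk sorting format.

```python
# Hypothetical sketch of step 1's sort order: KV entries are ordered by
# (CommitTS, event type), with DELETE < UPDATE < INSERT so that within a
# transaction, deletes are read back first.
TYPE_ORDER = {"delete": 0, "update": 1, "insert": 2}

def sort_key(entry):
    return (entry["commit_ts"], TYPE_ORDER[entry["type"]])

entries = [
    {"commit_ts": 420, "type": "insert"},
    {"commit_ts": 410, "type": "update"},
    {"commit_ts": 420, "type": "delete"},
]
entries.sort(key=sort_key)
```

With this key, the two entries sharing CommitTS 420 come back with the DELETE before the INSERT, regardless of arrival order.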

Splitting the UPDATE event in the sink module ensures that only the events that update the primary key or a non-null unique key are split. However, in the sink module the event data must be fully loaded into memory, so in large transaction scenarios it is infeasible to load all of a transaction's data into memory for splitting and re-sorting.

If the UPDATE event is split before TiCDC writes the KV data to the local disk, it is hard to know whether the event updates the primary key or a non-null unique key, so we can only split all UPDATE events. However, splitting all UPDATE events leads to performance degradation in some scenarios (about 41% degradation in sysbench oltp_update_non_index).

Therefore, we adopted a compromise in #10919: when a new table sink starts to write data to the downstream, we fetch the current timestamp from PD as replicateTS, and split only the UPDATE KV entries whose commitTS is smaller than replicateTS.
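The compromise can be sketched as a simple predicate evaluated per KV entry in the puller. The dict-based entry and the function name are assumptions of this sketch, not TiCDC's real types.

```python
# Hypothetical sketch of the #10919 compromise: when a table sink starts,
# a replicateTS is fetched from PD; the puller then splits only the UPDATE
# KV entries whose commitTS is smaller than that replicateTS (i.e. entries
# that may be replayed out of order after a restart).

def split_in_puller(entry, replicate_ts):
    # entry is an illustrative dict, not TiCDC's real KV entry type
    return entry["type"] == "update" and entry["commit_ts"] < replicate_ts
```

Entries committed after the sink started pass through unsplit, which is why the common case keeps its original performance.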

After the fix, when using the MySQL sink, TiCDC does not split UPDATE events in most cases. Consequently, primary key or unique key conflicts may occur during changefeed runtime, causing the changefeed to restart automatically. After the restart, TiCDC splits the conflicting UPDATE events into DELETE and INSERT events before writing them to the local disk. This ensures that all events within the same transaction are correctly ordered, with all DELETE events preceding INSERT events, thus completing data replication correctly.

Current Issue

In the previous solution, we rely on automatically restarting the changefeed to fix data conflict errors and prevent the data inconsistency problem caused by TiCDC receiving the UPDATE events of a transaction in an incorrect order.

Although the puller can split the conflicting UPDATE events after a restart and the changefeed then continues to run correctly, and tests show that the restart has no noticeable impact on latency, some users may be unhappy with this behavior if their workload contains a lot of conflicting data that causes the changefeed to restart occasionally. So we need a way to avoid the restart when necessary.

TiCDC has a config called safe-mode, which currently defaults to false.
If safe-mode is true:

  1. All INSERT events are translated to REPLACE sql;
  2. All UPDATE events are split into DELETE and INSERT events in the sink module (this behavior was removed in *(ticdc): split old update kv entry after restarting changefeed #10919, as described in the Previous Problem section above);

Because performance degradation is already expected in safe-mode (REPLACE is much slower than INSERT), we propose splitting all UPDATE KV entries before writing them to disk when safe-mode is true. Users can then enable safe-mode to prevent the changefeed from restarting due to data conflicts while also preventing the data inconsistency problem.
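Under this proposal, the puller's split decision would simply gain a safe-mode branch on top of the existing replicateTS check. A hedged sketch, with hypothetical names:

```python
# Hypothetical sketch of the proposed behavior: the puller splits an UPDATE
# KV entry either when it predates replicateTS (the existing #10919 behavior)
# or unconditionally when safe-mode is enabled, so no key conflict can ever
# force a changefeed restart.

def should_split_update(commit_ts, replicate_ts, safe_mode):
    # safe-mode: split every UPDATE so all DELETEs can be ordered first
    return safe_mode or commit_ts < replicate_ts
```

With safe-mode off, only pre-replicateTS entries are split; with safe-mode on, every UPDATE entry is split regardless of its commitTS.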

@lidezhu lidezhu self-assigned this Jun 3, 2024
@lidezhu lidezhu added area/ticdc Issues or PRs related to TiCDC. type/enhancement The issue or PR belongs to an enhancement. labels Jun 3, 2024
@lidezhu lidezhu changed the title ticdc: should all split kv entry in puller when sink module is in safe mode puller(ticdc): should all split kv entry in puller when sink module is in safe mode Jun 3, 2024
@lidezhu lidezhu changed the title puller(ticdc): should all split kv entry in puller when sink module is in safe mode puller(ticdc): should split all update kv entry in puller when sink module is in safe mode Jun 3, 2024
@lidezhu lidezhu changed the title puller(ticdc): should split all update kv entry in puller when sink module is in safe mode puller(ticdc): should split all update kv entries in puller when sink module is in safe mode Jun 3, 2024
@lidezhu lidezhu added affects-6.5 This bug affects the 6.5.x(LTS) versions. affects-7.1 This bug affects the 7.1.x(LTS) versions. affects-7.5 This bug affects the 7.5.x(LTS) versions. affects-8.1 This bug affects the 8.1.x(LTS) versions. labels Oct 11, 2024
@ti-chi-bot ti-chi-bot bot closed this as completed in f1d2ee6 Oct 12, 2024
@kennedy8312

/type regression
