
puller(ticdc): should split all update kv entries in puller when sink module is in safe mode #11231

Closed
lidezhu opened this issue Jun 3, 2024 · 1 comment · Fixed by #11224
Assignees
Labels
affects-6.5 This bug affects the 6.5.x(LTS) versions. affects-7.1 This bug affects the 7.1.x(LTS) versions. affects-7.5 This bug affects the 7.5.x(LTS) versions. affects-8.1 This bug affects the 8.1.x(LTS) versions. area/ticdc Issues or PRs related to TiCDC. type/enhancement The issue or PR belongs to an enhancement. type/regression

Comments

@lidezhu
Collaborator

lidezhu commented Jun 3, 2024

Previous Problem

In #10919, we addressed the issue of downstream data inconsistency caused by the potentially incorrect order of UPDATE events received by TiCDC.

Take the following SQL statements as an example:

CREATE TABLE t (a INT PRIMARY KEY, b INT);
INSERT INTO t VALUES (1, 1);
INSERT INTO t VALUES (2, 2);

BEGIN;
UPDATE t SET a = 3 WHERE a = 2;
UPDATE t SET a = 2 WHERE a = 1;
COMMIT;

In this example, the two UPDATE statements within the transaction have a sequential execution dependency: the primary key a is first changed from 2 to 3, and then from 1 to 2. After this transaction is executed, the records in the upstream database are (2, 1) and (3, 2).

However, the order of UPDATE events received by TiCDC might differ from the actual execution order of the upstream transaction. For example:

UPDATE t SET a = 2 WHERE a = 1;
UPDATE t SET a = 3 WHERE a = 2;

Previously, TiCDC simply split these UPDATE events into DELETE and INSERT events before sending them downstream. After the split, the actual execution order of these events in the downstream is as follows:

BEGIN;
DELETE FROM t WHERE a = 1;
REPLACE INTO t VALUES (2, 1);
DELETE FROM t WHERE a = 2;
REPLACE INTO t VALUES (3, 2);
COMMIT;

After the downstream executes the transaction, the records in the database are (3, 2), which are different from the records in the upstream database ((2, 1) and (3, 2)), indicating a data inconsistency issue.

To fix this problem, we need a mechanism to reorder the DELETE and INSERT events after the split.

For example, if we can guarantee that all DELETE events are executed before the INSERT events within the same transaction, the data inconsistency problem can be avoided:

BEGIN;
DELETE FROM t WHERE a = 1;
DELETE FROM t WHERE a = 2;
REPLACE INTO t VALUES (2, 1);
REPLACE INTO t VALUES (3, 2);
COMMIT;

After the downstream executes the transaction, the records in the downstream database are the same as those in the upstream database, which are (2, 1) and (3, 2), ensuring data consistency.
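The split-and-reorder idea above can be sketched as follows. This is a hedged illustration only, not TiCDC's actual implementation; the `Event` dict shape and the helper name are hypothetical.

```python
# Hypothetical sketch: split each UPDATE event into a DELETE (of the old row)
# and an INSERT (of the new row), then emit all DELETEs before all INSERTs
# within one transaction, so key "handoffs" between rows cannot conflict.

def split_and_reorder(events):
    """events: list of dicts like {"type": "update", "old": row, "new": row}
    or {"type": "delete"/"insert", "row": row}. Returns reordered events."""
    deletes, inserts = [], []
    for e in events:
        if e["type"] == "update":
            deletes.append({"type": "delete", "row": e["old"]})
            inserts.append({"type": "insert", "row": e["new"]})
        elif e["type"] == "delete":
            deletes.append(e)
        else:  # insert
            inserts.append(e)
    return deletes + inserts

# The out-of-order transaction from the example above:
txn = [
    {"type": "update", "old": (1, 1), "new": (2, 1)},
    {"type": "update", "old": (2, 2), "new": (3, 2)},
]
result = split_and_reorder(txn)
```

Applied to the example transaction, the two deletes (for keys 1 and 2) come out before the two replaces, which matches the safe ordering shown above.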

Previous Solution

However, reordering the DELETE and INSERT events after the split is not easy to accomplish.

The whole process by which TiCDC processes a row of data and sends it to a MySQL-compatible downstream is as follows:

  1. TiCDC obtains KV data from TiKV, stores it on the local hard disk, and sorts it by transaction CommitTS and event type (DELETE > UPDATE > INSERT);
  2. The Sink module pulls KV data from the local hard disk in units of transactions and decodes the KV data into Event data (including the column name, type, data, etc. of each column) according to the Table Schema;
  3. The Sink module translates the Event data into SQL and sends it to the downstream;
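The sort order in step 1 might be sketched as a compound key: CommitTS first, then event type, with DELETE ordered before UPDATE and INSERT. The tuple key below is an illustrative assumption, not TiCDC's actual on-disk sorting format.

```python
# Hypothetical sketch of step 1's sort order: KV entries are ordered by
# (CommitTS, event type), with DELETE < UPDATE < INSERT so that within a
# transaction, deletes are read back first.
TYPE_ORDER = {"delete": 0, "update": 1, "insert": 2}

def sort_key(entry):
    return (entry["commit_ts"], TYPE_ORDER[entry["type"]])

entries = [
    {"commit_ts": 420, "type": "insert"},
    {"commit_ts": 410, "type": "update"},
    {"commit_ts": 420, "type": "delete"},
]
entries.sort(key=sort_key)
```

With this key, the two entries sharing CommitTS 420 come back with the DELETE before the INSERT, regardless of arrival order.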

Splitting the UPDATE event in the sink module ensures that only the events that update the primary key or a non-null unique key are split. However, in the sink module the event data must be fully loaded into memory, so in large transaction scenarios it is infeasible to load all of a transaction's data into memory for splitting and re-sorting.

If the UPDATE event is split before TiCDC writes the KV data to the local disk, it is hard to know whether the event updates the primary key or a non-null unique key, so we can only split all UPDATE events. However, splitting all UPDATE events leads to performance degradation in some scenarios (about 41% degradation in sysbench oltp_update_non_index).

Therefore, we adopted a compromise in #10919: when a new table sink starts to write data to the downstream, we fetch the current timestamp from PD as replicateTS, and split only the UPDATE KV entries whose commitTS is smaller than replicateTS.
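The compromise can be sketched as a simple predicate evaluated per KV entry in the puller. The dict-based entry and the function name are assumptions of this sketch, not TiCDC's real types.

```python
# Hypothetical sketch of the #10919 compromise: when a table sink starts,
# a replicateTS is fetched from PD; the puller then splits only the UPDATE
# KV entries whose commitTS is smaller than that replicateTS (i.e. entries
# that may be replayed out of order after a restart).

def split_in_puller(entry, replicate_ts):
    # entry is an illustrative dict, not TiCDC's real KV entry type
    return entry["type"] == "update" and entry["commit_ts"] < replicate_ts
```

Entries committed after the sink started pass through unsplit, which is why the common case keeps its original performance.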

After the fix, when using the MySQL sink, TiCDC does not split UPDATE events in most cases. Consequently, primary key or unique key conflicts may occur during changefeed runtime, causing the changefeed to restart automatically. After the restart, TiCDC splits the conflicting UPDATE events into DELETE and INSERT events before writing them to the local disk. This ensures that all events within the same transaction are correctly ordered, with all DELETE events preceding INSERT events, thus completing data replication correctly.

Current Issue

In the previous solution, we rely on automatically restarting the changefeed to fix data conflict errors and prevent the data inconsistency problem caused by TiCDC receiving the UPDATE events of a transaction in an incorrect order.

Although the puller can split the conflicting UPDATE events after a restart and the changefeed then continues to run correctly, and tests show that the restart has no noticeable impact on latency, some users may be unhappy with this behavior if their workload contains a lot of conflicting data that causes the changefeed to restart occasionally. So we need a way to avoid the restart when necessary.

TiCDC has a config called safe-mode, which currently defaults to false.
If safe-mode is true:

  1. All INSERT events are translated to REPLACE sql;
  2. All UPDATE events are split into DELETE and INSERT events in the sink module (this behavior was removed in *(ticdc): split old update kv entry after restarting changefeed #10919, as described in the Previous Problem section above);

Because performance degradation is already expected in safe-mode (REPLACE is much slower than INSERT), we propose splitting all UPDATE KV entries before writing them to disk when safe-mode is true. Users can then enable safe-mode to prevent the changefeed from restarting due to data conflicts while also preventing the data inconsistency problem.
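Under this proposal, the puller's split decision would simply gain a safe-mode branch on top of the existing replicateTS check. A hedged sketch, with hypothetical names:

```python
# Hypothetical sketch of the proposed behavior: the puller splits an UPDATE
# KV entry either when it predates replicateTS (the existing #10919 behavior)
# or unconditionally when safe-mode is enabled, so no key conflict can ever
# force a changefeed restart.

def should_split_update(commit_ts, replicate_ts, safe_mode):
    # safe-mode: split every UPDATE so all DELETEs can be ordered first
    return safe_mode or commit_ts < replicate_ts
```

With safe-mode off, only pre-replicateTS entries are split; with safe-mode on, every UPDATE entry is split regardless of its commitTS.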

@lidezhu lidezhu self-assigned this Jun 3, 2024
@lidezhu lidezhu added area/ticdc Issues or PRs related to TiCDC. type/enhancement The issue or PR belongs to an enhancement. labels Jun 3, 2024
@lidezhu lidezhu changed the title ticdc: should all split kv entry in puller when sink module is in safe mode puller(ticdc): should all split kv entry in puller when sink module is in safe mode Jun 3, 2024
@lidezhu lidezhu changed the title puller(ticdc): should all split kv entry in puller when sink module is in safe mode puller(ticdc): should split all update kv entry in puller when sink module is in safe mode Jun 3, 2024
@lidezhu lidezhu changed the title puller(ticdc): should split all update kv entry in puller when sink module is in safe mode puller(ticdc): should split all update kv entries in puller when sink module is in safe mode Jun 3, 2024
@lidezhu lidezhu added affects-6.5 This bug affects the 6.5.x(LTS) versions. affects-7.1 This bug affects the 7.1.x(LTS) versions. affects-7.5 This bug affects the 7.5.x(LTS) versions. affects-8.1 This bug affects the 8.1.x(LTS) versions. labels Oct 11, 2024
@ti-chi-bot ti-chi-bot bot closed this as completed in f1d2ee6 Oct 12, 2024
@kennedy8312

/type regression
