*(ticdc): split old update kv entry after restarting changefeed #10919
Codecov Report
@@              Coverage Diff               @@
##             master    #10919      +/-    ##
- Coverage   57.4327%  57.4095%  -0.0233%
  Files           851       851
  Lines        125230    125467      +237
+ Hits          71923     72030      +107
- Misses        47914     48021      +107
- Partials       5393      5416       +23
What problem does this PR solve?
Issue Number: close #10918
What is changed and how it works?
When the downstream is MySQL-compatible, we change the logic for handling update events at restart:
- [...] thresholdTs later);
- [...] thresholdTs, it splits it into a delete event and a replace event, and then sends them downstream;
- [...] thresholdTs, it splits it into a delete kv entry and an insert kv entry, and then writes them into the sorter;
- [...] thresholdTs, all delete events are sent downstream before insert events.
When the downstream is MySQL-compatible, we also change the logic for handling update events after the restart stage above finishes:
Previously, when the sink module met a transaction with multiple update events that change the primary key or a not-null unique key, we always split them into delete events and replace events. This may cause data inconsistency, as in the following example:
Suppose a table t has the schema create table t(a int primary key), and it has two rows, a=1 and a=2.
If a transaction contains two update events: [...]
In the ideal scenario, we expect these two events to be split into the following events: [...]
After the transaction, table t has two rows, a=2 and a=3.
But inside cdc, we cannot get the original order of these two update events, so they may be split into the following events: [...]
After the transaction, table t has only one row, a=3. (Data inconsistency happens!)
So we do not split any update events inside the sink module when the downstream is MySQL. This may cause a duplicate key entry error when the update events inside a transaction are executed in the wrong order.
This error causes the changefeed to restart and enter the restart stage described above, where the update events are split inside the puller and the delete events are sent before the insert events.
When applying redo log
When applying the redo log, update events that change the handle key are split into delete events and insert events, and the insert events are cached until all delete events in the same transaction have been emitted. If there are too many insert events (more than 50), they are written to a temporary local file.
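As a rough illustration of the redo-log behaviour described above, the following sketch emits deletes immediately, caches the inserts, and spills the cache to a temporary file once it grows past 50 entries. It is a minimal sketch under stated assumptions, not the tiflow implementation: keyUpdate, event, applyTxn, maxCachedInserts, and the JSON-lines spill format are all hypothetical, and the transaction is assumed to contain only handle-key updates.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"os"
)

// keyUpdate is a hypothetical update that changes the handle key.
type keyUpdate struct{ Old, New int }

// event is a hypothetical, simplified row event produced by the split.
type event struct {
	Op  string `json:"op"` // "delete" or "insert"
	Key int    `json:"key"`
}

// maxCachedInserts mirrors the limit of 50 from the description; the name is illustrative.
const maxCachedInserts = 50

// applyTxn splits every update into delete + insert. Deletes are emitted at once;
// inserts are held back (in memory, or in a temp file when there are too many)
// until every delete of the transaction has been emitted.
func applyTxn(updates []keyUpdate, emit func(event)) error {
	var cached []event
	var spill *os.File
	defer func() {
		if spill != nil {
			spill.Close()
			os.Remove(spill.Name())
		}
	}()

	for _, u := range updates {
		emit(event{Op: "delete", Key: u.Old})
		cached = append(cached, event{Op: "insert", Key: u.New})
		if len(cached) > maxCachedInserts {
			if spill == nil {
				f, err := os.CreateTemp("", "redo-inserts-*.jsonl")
				if err != nil {
					return err
				}
				spill = f
			}
			enc := json.NewEncoder(spill)
			for _, e := range cached {
				if err := enc.Encode(e); err != nil {
					return err
				}
			}
			cached = cached[:0]
		}
	}

	// All deletes are out; replay spilled inserts first, then the in-memory rest.
	if spill != nil {
		if _, err := spill.Seek(0, io.SeekStart); err != nil {
			return err
		}
		dec := json.NewDecoder(spill)
		for {
			var e event
			err := dec.Decode(&e)
			if err == io.EOF {
				break
			}
			if err != nil {
				return err
			}
			emit(e)
		}
	}
	for _, e := range cached {
		emit(e)
	}
	return nil
}

func main() {
	updates := make([]keyUpdate, 0, 60)
	for i := 0; i < 60; i++ {
		updates = append(updates, keyUpdate{Old: i, New: i + 1000})
	}
	var out []event
	if err := applyTxn(updates, func(e event) { out = append(out, e) }); err != nil {
		panic(err)
	}
	fmt.Println(len(out), out[0].Op, out[len(out)-1].Op)
}
```

With 60 updates the cache crosses the limit once, so the first 51 inserts take the temp-file path and the remaining 9 stay in memory; in both cases they reach emit only after the last delete.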
Check List
Tests
Questions
Will it cause performance regression or break compatibility?
Do you need to update user documentation, design documentation or monitoring documentation?
Release note