
*(ticdc): split old update kv entry after restarting changefeed (#10919) #11029

Merged

Conversation

@ti-chi-bot (Member) commented May 7, 2024

This is an automated cherry-pick of #10919

What problem does this PR solve?

Issue Number: close #10918

What is changed and how it works?

When the downstream is MySQL-compatible, we change the logic for handling update events at restart:

  1. When starting a changefeed, we always get the current timestamp from PD (referred to as thresholdTs below);
  2. Previous behavior: when the sink module received an update event whose commit ts was before thresholdTs, it split the event into a delete event and a replace event, and then sent them downstream;
  3. Current behavior: when the puller module receives an update kv entry whose commit ts is before thresholdTs, it splits the entry into a delete kv entry and an insert kv entry, and then writes them into the sorter;
  4. This change makes sure that, inside a transaction whose commit ts is before thresholdTs, all delete events are sent downstream before insert events;
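The puller-side rule in the list above can be sketched as follows. This is a minimal illustration with made-up types (`kvEntry`, `splitOldUpdate`); the real TiCDC structures and function names differ:

```go
package main

import "fmt"

// kvEntry is a hypothetical, simplified kv entry; the real TiCDC types differ.
type kvEntry struct {
	op       string // "update", "delete", or "insert"
	key      string
	oldValue string
	value    string
	commitTs uint64
}

// splitOldUpdate mimics the puller-side rule: an update entry whose commitTs
// is below thresholdTs is split into a delete entry (carrying the old value)
// followed by an insert entry (carrying the new value). Newer updates pass
// through unchanged.
func splitOldUpdate(e kvEntry, thresholdTs uint64) []kvEntry {
	if e.op != "update" || e.commitTs >= thresholdTs {
		return []kvEntry{e}
	}
	del := kvEntry{op: "delete", key: e.key, value: e.oldValue, commitTs: e.commitTs}
	ins := kvEntry{op: "insert", key: e.key, value: e.value, commitTs: e.commitTs}
	return []kvEntry{del, ins}
}

func main() {
	out := splitOldUpdate(kvEntry{op: "update", key: "k1", oldValue: "v0", value: "v1", commitTs: 90}, 100)
	fmt.Println(len(out), out[0].op, out[1].op) // 2 delete insert
}
```

Because the split happens before the sorter, the sorter can order the resulting delete entries ahead of the insert entries within the same commit ts.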

When the downstream is MySQL-compatible, we also change the logic for handling update events after the restart stage above finishes.

Previously, when the sink module met a transaction with multiple update events that change the primary key or a non-null unique key, it always split them into delete events and replace events. This may cause a data inconsistency problem, as the following example shows:
Suppose a table t has the schema create table t(a int primary key) and two rows, a=1 and a=2.
If a transaction contains two update events:

1. update t set a = 3 where a = 2;
2. update t set a = 2 where a = 1;

In the ideal scenario, we expect these two events to be split into the following events:

1. delete from t where a = 2;
2. replace into t values(3);
3. delete from t where a = 1;
4. replace into t values(2);

After the transaction, table t has two rows, a=2 and a=3.

But inside cdc, we cannot recover the original order of these two update events, so they may instead be split into the following events:

1. delete from t where a = 1;
2. replace into t values(2);
3. delete from t where a = 2;
4. replace into t values(3);

After the transaction, table t has only one row, a=3 (data inconsistency happens!).
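The two orderings above can be replayed against a simple set of primary keys to confirm the divergence. This is an illustrative Go sketch, not TiCDC code; `rowOp` and `apply` are made up for the example:

```go
package main

import "fmt"

// rowOp models one split statement: a delete or a replace on a primary key.
type rowOp struct {
	op  string // "delete" or "replace"
	key int
}

// apply replays delete/replace operations against a set of primary keys,
// mimicking how a MySQL downstream would execute the split events.
func apply(rows map[int]bool, ops []rowOp) map[int]bool {
	for _, o := range ops {
		switch o.op {
		case "delete":
			delete(rows, o.key)
		case "replace":
			rows[o.key] = true
		}
	}
	return rows
}

func main() {
	// Ideal order: delete 2, replace 3, delete 1, replace 2.
	good := apply(map[int]bool{1: true, 2: true},
		[]rowOp{{"delete", 2}, {"replace", 3}, {"delete", 1}, {"replace", 2}})
	// Wrong order: delete 1, replace 2, delete 2, replace 3.
	bad := apply(map[int]bool{1: true, 2: true},
		[]rowOp{{"delete", 1}, {"replace", 2}, {"delete", 2}, {"replace", 3}})
	fmt.Println(len(good), len(bad)) // good keeps {2,3}; bad keeps only {3}
}
```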

So when the downstream is MySQL, we no longer split any update events inside the sink module; this may cause a duplicate entry error when the update events inside a transaction are executed in the wrong order.
This error causes the changefeed to restart and enter the restart stage described above: the update events are split inside the puller, and the delete events are sent before the insert events.

When applying redo log

When applying redo log, we split update events that update the handle key into delete events and insert events, and cache the insert events until all delete events in the same transaction have been emitted. If there are too many insert events (more than 50), they are written to a temporary local file;
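The delete-first reordering during redo apply can be sketched as follows. The types and the `reorderTxn` helper are hypothetical, and the temp-file spill path for large insert caches is omitted:

```go
package main

import "fmt"

// event is a simplified row event produced by splitting updates during redo apply.
type event struct {
	op  string // "delete" or "insert"
	key int
}

// insertCacheLimit mirrors the threshold in the description; above it the
// real implementation spills cached inserts to a temporary local file.
const insertCacheLimit = 50

// reorderTxn emits all delete events of one transaction first, then the
// cached insert events, so deletes always reach the downstream before inserts.
func reorderTxn(events []event) []event {
	var deletes, inserts []event
	for _, e := range events {
		if e.op == "delete" {
			deletes = append(deletes, e)
		} else {
			inserts = append(inserts, e)
		}
	}
	return append(deletes, inserts...)
}

func main() {
	txn := []event{{"delete", 2}, {"insert", 3}, {"delete", 1}, {"insert", 2}}
	out := reorderTxn(txn)
	fmt.Println(out[0].op, out[1].op, out[2].op, out[3].op) // delete delete insert insert
}
```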

Check List

Tests

  • Integration test
  • Unit test

Questions

Will it cause performance regression or break compatibility?
Do you need to update user documentation, design documentation or monitoring documentation?

Release note

Fix potential risk of data inconsistency when there are dependencies between update statements in the same transaction.

@ti-chi-bot ti-chi-bot added lgtm release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. type/cherry-pick-for-release-7.1 This PR is cherry-picked to release-7.1 from a source PR. labels May 7, 2024
@ti-chi-bot ti-chi-bot added the cherry-pick-approved Cherry pick PR approved by release team. label May 20, 2024
@lidezhu (Collaborator) commented May 24, 2024

/test cdc-integration-kafka-test

@lidezhu (Collaborator) commented May 24, 2024

/test cdc-integration-kafka-test

@lidezhu (Collaborator) commented May 24, 2024

/test cdc-integration-kafka-test

@lidezhu (Collaborator) commented May 26, 2024

/test all

ti-chi-bot bot (Contributor) commented May 27, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: lidezhu

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added the approved label May 27, 2024
@lidezhu (Collaborator) commented May 27, 2024

/hold

@ti-chi-bot ti-chi-bot bot merged commit 192fc75 into pingcap:release-7.1 May 27, 2024
12 checks passed
@ti-chi-bot ti-chi-bot bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 27, 2024
@lidezhu (Collaborator) commented May 27, 2024

/unhold

@ti-chi-bot ti-chi-bot bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 27, 2024
@lidezhu lidezhu deleted the cherry-pick-10919-to-release-7.1 branch May 27, 2024 01:00