ddl: Fix potential data lost of alter_partition_by #8337

Merged: 12 commits, Nov 10, 2023

Contributor

@JaySon-Huang JaySon-Huang commented Nov 8, 2023

What problem does this PR solve?

Issue Number: close #8206

Problem Summary:

Introduced by #7822

When executing alter table xxx partition by ... to turn a non-partitioned table into a partitioned table, TiFlash may see a non-partitioned table turn into a partitioned table that uses the same table_id. The previous implementation skipped this case:

template <typename Getter, typename NameMapper>
void SchemaBuilder<Getter, NameMapper>::applyPartitionDiff(
    const TiDB::DBInfoPtr & db_info,
    const TableInfoPtr & table_info,
    const ManageableStoragePtr & storage)
{
    const auto & orig_table_info = storage->getTableInfo();
    if (!orig_table_info.isLogicalPartitionTable())
    {
        LOG_ERROR(
            log,
            "old table in TiFlash not partition table {} with database_id={}, table_id={}",
            name_mapper.debugCanonicalName(*db_info, orig_table_info),
            db_info->id,
            orig_table_info.id);
        return;
    }

TiFlash then mistakenly drops the old table along with all its partitions, even though those partitions are now attached to a new logical table. This leads to data loss after `alter table xxx partition ...`.

What is changed and how it works?

  • Use tidb_isolation_read_engines instead of hints in test cases
  • Allow turning a non-partitioned table into a partitioned table
  • When applying a SchemaDiff for alter partition, first create the new table and overwrite the partition id mapping, and only then drop the old table
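
Why the ordering in the last item matters can be illustrated with a toy model. This is a minimal sketch, not TiFlash code: `ToySchemaState`, `createTable`, `dropTable`, `rowsAfterAlter`, and all ids are hypothetical names made up for illustration, and it assumes that dropping a logical table also drops every partition still mapped to it.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

using TableID = std::int64_t;

// Hypothetical, simplified stand-in for the partition id bookkeeping.
struct ToySchemaState
{
    // physical partition id -> owning logical table id
    std::unordered_map<TableID, TableID> partition_to_logical;
    // physical partition id -> number of rows stored for it
    std::unordered_map<TableID, int> rows;

    void createTable(TableID logical_id, const std::vector<TableID> & partition_ids)
    {
        for (TableID part_id : partition_ids)
        {
            partition_to_logical[part_id] = logical_id; // overwrite the owner
            rows.emplace(part_id, 0); // keeps existing rows if already present
        }
    }

    // Assumption: dropping a logical table drops all partitions mapped to it.
    void dropTable(TableID logical_id)
    {
        for (auto it = partition_to_logical.begin(); it != partition_to_logical.end();)
        {
            if (it->second == logical_id)
            {
                rows.erase(it->first);
                it = partition_to_logical.erase(it);
            }
            else
            {
                ++it;
            }
        }
    }
};

// Returns the rows left in partition 101 after moving it from
// logical table 100 to logical table 200.
int rowsAfterAlter(bool create_new_first)
{
    ToySchemaState state;
    state.createTable(100, {101, 102});
    state.rows[101] = 5;

    if (create_new_first)
    {
        state.createTable(200, {101, 102}); // remap partitions first
        state.dropTable(100);               // nothing is mapped to 100 any more
    }
    else
    {
        state.dropTable(100);               // partitions are dropped with it
        state.createTable(200, {101, 102});
    }
    return state.rows[101];
}
```

In this model, creating the new table first leaves the old logical table with no partitions to take down with it, whereas dropping first discards the rows before the remap happens.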

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

None

@ti-chi-bot ti-chi-bot bot added needs-cherry-pick-release-7.1 Should cherry pick this PR to release-7.1 branch. release-note-none Denotes a PR that doesn't merit a release note. needs-cherry-pick-release-7.5 Should cherry pick this PR to release-7.5 branch. labels Nov 8, 2023
@ti-chi-bot ti-chi-bot bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Nov 8, 2023
@JaySon-Huang
Contributor Author

/run-all-tests

@JaySon-Huang
Contributor Author

JaySon-Huang commented Nov 8, 2023

/hold

https://ci.pingcap.net/blue/organizations/jenkins/tiflash-ghpr-integration-tests/detail/tiflash-ghpr-integration-tests/15008/pipeline/115/

[2023-11-08T11:35:49.082Z] fullstack-test2/ddl/alter_partition_by.test: Running
[2023-11-08T11:36:15.550Z]   File: fullstack-test2/ddl/alter_partition_by.test
[2023-11-08T11:36:15.550Z]   Error line: 180
[2023-11-08T11:36:15.550Z]   Error: set session tidb_isolation_read_engines='tiflash'; select count(*) from test.t2 partition (p0);
[2023-11-08T11:36:15.550Z]   Result:
[2023-11-08T11:36:15.550Z]     count(*)
[2023-11-08T11:36:15.550Z]     0
[2023-11-08T11:36:15.550Z]   Expected:
[2023-11-08T11:36:15.550Z]     +----------+
[2023-11-08T11:36:15.550Z]     | count(*) |
[2023-11-08T11:36:15.550Z]     +----------+
[2023-11-08T11:36:15.550Z]     |        5 |
[2023-11-08T11:36:15.550Z]     +----------+

@ti-chi-bot ti-chi-bot bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 8, 2023
@JaySon-Huang
Contributor Author

/run-all-tests

@ti-chi-bot ti-chi-bot bot added needs-1-more-lgtm Indicates a PR needs 1 more LGTM. approved labels Nov 8, 2023
@JaySon-Huang
Contributor Author

/run-all-tests

@ti-chi-bot ti-chi-bot bot added lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Nov 9, 2023
Contributor

ti-chi-bot bot commented Nov 9, 2023

[LGTM Timeline notifier]

Timeline:

  • 2023-11-08 14:11:41.556037286 +0000 UTC m=+3653499.143147416: ☑️ agreed by Lloyd-Pottiger.
  • 2023-11-09 03:07:58.730113346 +0000 UTC m=+3700076.317223492: ☑️ agreed by hongyunyan.

@Lloyd-Pottiger
Contributor

/run-all-tests

@JaySon-Huang
Contributor Author

/run-integration-test

@ti-chi-bot ti-chi-bot bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Nov 9, 2023
@JaySon-Huang
Contributor Author

/run-all-tests

@JaySon-Huang
Contributor Author

/run-all-tests

@JaySon-Huang
Contributor Author

/run-all-tests

@JaySon-Huang JaySon-Huang changed the title from "tests: Fix unstable test of alter_partition_by" to "tests: Fix potential data lost of alter_partition_by" Nov 9, 2023
@JaySon-Huang JaySon-Huang changed the title from "tests: Fix potential data lost of alter_partition_by" to "ddl: Fix potential data lost of alter_partition_by" Nov 9, 2023
Signed-off-by: JaySon-Huang <[email protected]>
@ti-chi-bot ti-chi-bot bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Nov 10, 2023
@JaySon-Huang
Contributor Author

/run-all-tests

@@ -252,9 +252,14 @@ void SchemaBuilder<Getter, NameMapper>::applyDiff(const SchemaDiff & diff)
}
else
{
/// The new non-partitioned table will have a new id
// Create the new table.
// If the new table is a partition table, this will also overwrite
Contributor

It may be better to separate the logic for ActionAlterTablePartitioning and ActionRemovePartitioning.

Contributor Author

For ActionRemovePartitioning, TiFlash will:

  • Receive a schema diff that adds the new non-partitioned table as a table_id in partition.adding_definitions. TiFlash should add the new table id to the mapping and handle apply snapshot for it.
  • Then receive a schema diff that turns the new non-partitioned table into a normal table and removes the old partitioned table.

So it is the same logic as ActionAlterTablePartitioning in TiFlash.
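
The shared handling for both actions can be sketched as a branch on the two ids carried by the diff. This is only an illustration under stated assumptions: `ToyDiff` and `planSteps` are hypothetical, the step names are plain strings rather than real calls, and `dropOldTable` is a placeholder name for whatever removes the old table.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

using TableID = std::int64_t;

// Hypothetical, trimmed-down schema diff: only the two ids the branch inspects.
struct ToyDiff
{
    TableID table_id;
    TableID old_table_id;
};

// Returns the names of the steps a builder would take for one diff.
std::vector<std::string> planSteps(const ToyDiff & diff)
{
    if (diff.table_id == diff.old_table_id)
    {
        // Same logical table: only internal additions of new partitions.
        return {"applyPartitionDiff"};
    }
    // A new logical table replaces the old one: create it first so the
    // partition id mapping is overwritten, then remove the old table.
    return {"applyCreateTable", "dropOldTable"};
}
```

Under this reading, the first diff of either action keeps the same logical id and takes the first branch, and the final diff carries a new id and takes the second.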

auto new_db_info = getter.getDatabase(database_id);
applyCreateStorageInstance(new_db_info, table_info);

for (const auto & part_def : table_info->partition.definitions)
Contributor

When we call applyPartitionDiff(new_db_info, table_info, storage), we will also call emplacePartitionTableId based on the definitions. Why do we also add it here?

This comment was marked as off-topic.

Contributor Author

I get what you mean, let me check it

Contributor Author

Actually, we create the Storage instance from table_info at line 402 and then try to execute applyPartitionDiff. At that point nothing can differ between table_info and storage.table_info, so calling applyPartitionDiff is redundant.
That makes this function simply the same as applyCreateTable, so I have removed the function applyPartitionAlter.

case SchemaActionType::ActionRemovePartitioning:
{
    if (diff.table_id == diff.old_table_id)
    {
        /// Only internal additions of new partitions
        applyPartitionDiff(diff.schema_id, diff.table_id);
    }
    else
    {
        // Create the new table.
        // If the new table is a partition table, this will also overwrite
        // the partition id mapping to the new logical table
        applyCreateTable(diff.schema_id, diff.table_id);

@JaySon-Huang
Contributor Author

/run-all-tests

Contributor

ti-chi-bot bot commented Nov 10, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hongyunyan, Lloyd-Pottiger

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [Lloyd-Pottiger,hongyunyan]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@JaySon-Huang
Contributor Author

/hold cancel

@ti-chi-bot ti-chi-bot bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 10, 2023
@ti-chi-bot ti-chi-bot bot merged commit 27de3d3 into pingcap:master Nov 10, 2023
@ti-chi-bot
Member

In response to a cherrypick label: new pull request created to branch release-7.1: #8355.

ti-chi-bot pushed a commit to ti-chi-bot/tiflash that referenced this pull request Nov 10, 2023
@ti-chi-bot
Member

In response to a cherrypick label: failed to apply #8337 on top of branch "release-7.5":

failed to git commit: exit status 1

@JaySon-Huang JaySon-Huang deleted the fix_unstable_it branch November 10, 2023 07:21
@JaySon-Huang JaySon-Huang removed the needs-cherry-pick-release-7.1 Should cherry pick this PR to release-7.1 branch. label Nov 22, 2023
Labels
approved lgtm needs-cherry-pick-release-7.5 Should cherry pick this PR to release-7.5 branch. release-note-none Denotes a PR that doesn't merit a release note. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

The integration test is not stable
4 participants