Added support for new DDL actions for remove and add table partitioning #7822

Merged
merged 15 commits into from
Sep 11, 2023

Conversation

mjonss
Contributor

@mjonss mjonss commented Jul 19, 2023

What problem does this PR solve?

Issue Number: close #7823
Problem Summary:
ALTER TABLE t REMOVE PARTITIONING
and
ALTER TABLE t PARTITION BY ...
introduce new DDL actions, which are not yet supported by TiFlash.
Handling them is the same as for the other partitioning management DDLs, so it is easy to implement.

What is changed and how it works?

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

None

@ti-chi-bot ti-chi-bot bot added do-not-merge/needs-linked-issue release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed do-not-merge/needs-linked-issue labels Jul 19, 2023
@Lloyd-Pottiger
Contributor

/run-all-tests

@@ -219,6 +219,24 @@ void SchemaBuilder<Getter, NameMapper>::applyDiff(const SchemaDiff & diff)
applyPartitionDiff(diff.schema_id, diff.table_id);
break;
}
case SchemaActionType::ActionAlterTablePartitioning:
{
if (diff.table_id == diff.old_table_id)

Contributor

I'm curious about when diff.table_id == diff.old_table_id and when diff.table_id != diff.old_table_id. Could you please give an example here?

Contributor Author

It depends on the state change: only in the last one (StateDeleteReorganization -> StatePublic) is the table ID changed to the new one.

Example (added custom logs):
tidb.log

[2023/08/01 23:11:59.568 +02:00] [WARN] [ddl_worker.go:1431] [updateSchemaVersion] [category=ddl] [TableId=96] [SchemaState="delete only"]
[2023/08/01 23:11:59.637 +02:00] [WARN] [ddl_worker.go:1431] [updateSchemaVersion] [category=ddl] [TableId=96] [SchemaState="delete only"]
[2023/08/01 23:11:59.702 +02:00] [WARN] [ddl_worker.go:1431] [updateSchemaVersion] [category=ddl] [TableId=96] [SchemaState="write only"]
[2023/08/01 23:11:59.798 +02:00] [WARN] [ddl_worker.go:1431] [updateSchemaVersion] [category=ddl] [TableId=96] [SchemaState="write reorganization"]
[2023/08/01 23:11:59.865 +02:00] [WARN] [ddl_worker.go:1431] [updateSchemaVersion] [category=ddl] [TableId=96] [SchemaState="delete reorganization"]
[2023/08/01 23:11:59.865 +02:00] [WARN] [ddl_worker.go:1441] [StateDeleteReorganization] [category=ddl] [TableId=105] [OldTableID=96]

tiflash.log

[2023/08/01 23:11:59.818 +02:00] [WARN] [SchemaBuilder.cpp:238] ["ActionRemovePartitioning table_id 96 old 96"] [source="keyspace=4294967295"] [thread_id=60]
[2023/08/01 23:11:59.819 +02:00] [WARN] [SchemaBuilder.cpp:238] ["ActionRemovePartitioning table_id 96 old 96"] [source="keyspace=4294967295"] [thread_id=60]
[2023/08/01 23:11:59.820 +02:00] [WARN] [SchemaBuilder.cpp:238] ["ActionRemovePartitioning table_id 96 old 96"] [source="keyspace=4294967295"] [thread_id=60]
[2023/08/01 23:11:59.821 +02:00] [WARN] [SchemaBuilder.cpp:238] ["ActionRemovePartitioning table_id 96 old 96"] [source="keyspace=4294967295"] [thread_id=60]
[2023/08/01 23:12:55.086 +02:00] [WARN] [SchemaBuilder.cpp:238] ["ActionRemovePartitioning table_id 105 old 96"] [source="keyspace=4294967295"] [thread_id=61]
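
To make the two cases in the logs above concrete, here is a minimal sketch of how a handler could branch on the IDs. It is not the PR's actual code: `SchemaDiff` is reduced to three fields, and `planPartitioningDiff` and the step strings are hypothetical names used only for illustration. The assumption is that an unchanged table ID means an ordinary partition update, while a changed ID means the old table metadata is dropped and the table is created under the new ID.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for the schema diff that TiFlash receives.
struct SchemaDiff
{
    int64_t schema_id;
    int64_t table_id;
    int64_t old_table_id;
};

// Returns a description of the steps this sketch would take for one diff.
// The real handler would call into the schema builder instead of building strings.
std::vector<std::string> planPartitioningDiff(const SchemaDiff & diff)
{
    std::vector<std::string> steps;
    if (diff.table_id == diff.old_table_id)
    {
        // Intermediate schema states: the table keeps its ID, so only the
        // partition metadata needs to be refreshed.
        steps.push_back("apply partition diff on table " + std::to_string(diff.table_id));
    }
    else
    {
        // Final state (StateDeleteReorganization -> StatePublic): the table
        // now has a new ID, so drop the old metadata and create the new table.
        steps.push_back("drop table metadata " + std::to_string(diff.old_table_id));
        steps.push_back("create table metadata " + std::to_string(diff.table_id));
    }
    return steps;
}
```

With the IDs from the logs, `planPartitioningDiff({1, 96, 96})` yields a single partition-diff step, and `planPartitioningDiff({1, 105, 96})` yields a drop of 96 followed by a create of 105 (the schema ID 1 is made up).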

Contributor

So in the first three stages, does partition.definitions change?

Contributor Author

In the first (StateNone->StateDeleteOnly), it adds the new partitions to TableInfo.Partition.AddingDefinitions and the removed ones to TableInfo.Partition.DroppingDefinitions.

In the second (StateDeleteOnly->StateWriteOnly), it only waits for the partitions to be available in TiFlash (if the table should have any TiFlash replicas).

In the third (StateWriteOnly->StateWriteReorganization), no changes or checks.

In the fourth (StateWriteReorganization->StateDeleteReorganization), no metadata changes (but it does all the data copying and recreates the indexes).

In the fifth (StateDeleteReorganization->StatePublic), it depends on the actual command (reorganize, remove partitioning or alter table partition by):

  • Reorganize: it modifies TableInfo.Partition.Definitions by removing the DroppingDefinitions and adding the AddingDefinitions, and then resets AddingDefinitions and DroppingDefinitions.
  • Remove partitioning: it changes the table ID to the one used for AddingDefinitions. It really drops the old partitioned table metadata, sets the table ID in TableInfo to the ID of the single AddingDefinitions entry, removes TableInfo.Partition, and then creates the table (metadata) with the updated TableInfo.
  • Alter Table Partition By: it changes the table ID and replaces TableInfo.Partition with the new partitioning type, expression and definitions. It also does this by first dropping the table (metadata), changing TableInfo.Partition and the table ID, and then (re)creating the table with the updated TableInfo.

Note that the data cleanup is done later, asynchronously, and that the old table data must remain accessible during the schema version the DDL got when transitioning to StatePublic, since there may be clients still on the schema version of StateDeleteReorganization that need to read and (double) write the data.
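
As a rough illustration of the final transition described above, the following C++17 sketch models the metadata change for the reorganize and remove-partitioning cases. The structs are hypothetical, heavily simplified mirrors of TableInfo and TableInfo.Partition, the function names are invented for this example, and appending the added definitions at the end of the list is an assumption of the sketch rather than something stated in this thread.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified mirrors of the metadata named in the comment above.
struct PartitionDefinition
{
    int64_t id;
    std::string name;
};

struct PartitionInfo
{
    std::vector<PartitionDefinition> definitions;
    std::vector<PartitionDefinition> adding_definitions;
    std::vector<PartitionDefinition> dropping_definitions;
};

struct TableInfo
{
    int64_t id;
    std::optional<PartitionInfo> partition; // empty for a non-partitioned table
};

// Reorganize: Definitions loses the dropped partitions and gains the added
// ones, then both work lists are reset. The table ID is unchanged.
void finishReorganize(TableInfo & table)
{
    assert(table.partition.has_value());
    PartitionInfo & p = *table.partition;
    std::vector<PartitionDefinition> kept;
    for (const auto & def : p.definitions)
    {
        const bool dropped = std::any_of(
            p.dropping_definitions.begin(),
            p.dropping_definitions.end(),
            [&def](const PartitionDefinition & d) { return d.id == def.id; });
        if (!dropped)
            kept.push_back(def);
    }
    // Assumption of this sketch: added definitions go at the end.
    kept.insert(kept.end(), p.adding_definitions.begin(), p.adding_definitions.end());
    p.definitions = std::move(kept);
    p.adding_definitions.clear();
    p.dropping_definitions.clear();
}

// Remove partitioning: the table takes over the ID of the single adding
// definition and the partition info is removed entirely.
void finishRemovePartitioning(TableInfo & table)
{
    assert(table.partition.has_value());
    assert(table.partition->adding_definitions.size() == 1);
    table.id = table.partition->adding_definitions.front().id;
    table.partition.reset();
}
```

For the table in the logs earlier in this thread, a TableInfo with ID 96 whose single adding definition has ID 105 would end up as a non-partitioned table with ID 105 after `finishRemovePartitioning`. The sketch only covers the metadata rewrite; dropping and recreating the table metadata, and the later asynchronous data cleanup, are outside its scope.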

Contributor Author

After some additional testing, I found that I had also missed dropping the old table (metadata) in TiFlash, not just creating the new table :)

Contributor

@hongyunyan hongyunyan left a comment

LGTM

@ti-chi-bot ti-chi-bot bot added needs-1-more-lgtm Indicates a PR needs 1 more LGTM. approved labels Aug 7, 2023
@JaySon-Huang
Contributor

/run-all-tests

@JaySon-Huang
Contributor

/run-all-tests

@JaySon-Huang
Contributor

/run-all-tests

@ti-chi-bot ti-chi-bot bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Aug 29, 2023
@mjonss
Contributor Author

mjonss commented Aug 29, 2023

/test-unit-test

@mjonss
Contributor Author

mjonss commented Aug 29, 2023

/run-all-tests

@mjonss
Contributor Author

mjonss commented Sep 1, 2023

The failing integration test fullstack-test2/ddl/rename_pk.test also fails locally, without any code changes. Is this test also failing in other PRs? @hongyunyan any ideas on how to fix it?

@mjonss
Contributor Author

mjonss commented Sep 1, 2023

/run-all-tests

@JaySon-Huang
Contributor

/run-all-tests

@JaySon-Huang
Contributor

blocked by pingcap/tidb#46645

[2023-09-04T14:20:07.289Z] fullstack-test2/ddl/multi_alter_with_write.test: OK [0.033 s]
[2023-09-04T14:20:07.289Z] fullstack-test2/ddl/remove_partitioning.test: Running
[2023-09-04T14:20:11.493Z] fullstack-test2/ddl/remove_partitioning.test: OK [4.101 s]
[2023-09-04T14:20:11.493Z] fullstack-test2/ddl/rename_pk.test: Running
[2023-09-04T14:20:13.380Z]   File: fullstack-test2/ddl/rename_pk.test
[2023-09-04T14:20:13.380Z]   Error line: 23
[2023-09-04T14:20:13.380Z]   Error: set session tidb_isolation_read_engines='tiflash'; select * from test.t order by pk;
[2023-09-04T14:20:13.380Z]   Result:
[2023-09-04T14:20:13.380Z]     ERROR 1815 (HY000) at line 1: Internal : Can't find a proper physical plan for this query
[2023-09-04T14:20:13.380Z]   Expected:
[2023-09-04T14:20:13.380Z]     +----+
[2023-09-04T14:20:13.380Z]     | pk |
[2023-09-04T14:20:13.380Z]     +----+
[2023-09-04T14:20:13.380Z]     |  1 |
[2023-09-04T14:20:13.380Z]     |  2 |
[2023-09-04T14:20:13.380Z]     +----+

@bb7133
Member

bb7133 commented Sep 5, 2023

blocked by pingcap/tidb#46645

@JaySon-Huang @mjonss Does this mean that a recently merged PR broke the TiFlash CI?


func> wait_table test t

mysql> set @@global.tidb_allow_mpp = 0;

Contributor

Suggested change
mysql> set @@global.tidb_allow_mpp = 0;

Contributor

@JaySon-Huang JaySon-Huang Sep 11, 2023

Is tidb_allow_mpp = 0 necessary for your test cases? It will make the following tests fail.

I've pushed a commit that removes it, to see whether the tests pass without it.

@JaySon-Huang
Contributor

/run-all-tests

@JaySon-Huang
Contributor

/run-unit-test

@JaySon-Huang
Contributor

/run-integration-test

@ti-chi-bot ti-chi-bot bot added the lgtm label Sep 11, 2023
@ti-chi-bot
Contributor

ti-chi-bot bot commented Sep 11, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hongyunyan, JaySon-Huang

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [JaySon-Huang,hongyunyan]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot removed the needs-1-more-lgtm Indicates a PR needs 1 more LGTM. label Sep 11, 2023
@ti-chi-bot
Contributor

ti-chi-bot bot commented Sep 11, 2023

[LGTM Timeline notifier]

Timeline:

  • 2023-08-07 05:53:48.041197625 +0000 UTC m=+683111.983546155: ☑️ agreed by hongyunyan.
  • 2023-09-11 08:28:19.934525353 +0000 UTC m=+257881.859081748: ☑️ agreed by JaySon-Huang.

@JaySon-Huang
Contributor

/run-integration-test

@mjonss
Contributor Author

mjonss commented Sep 11, 2023

/run-integration-test

Labels
approved lgtm release-note-none Denotes a PR that doesn't merit a release note. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support ALTER TABLE t {REMOVE PARTITIONING | PARTITION BY ... }
5 participants