ddl: Do not physical drop table after tiflash replica is set to 0 (#9440) #9441
This is an automated cherry-pick of #9440
What problem does this PR solve?
Issue Number: close #9438
Problem Summary:
In v8.1.0 and v8.1.1, if the tiflash replica num is set to 0, applyDropTable(database_id, table_id, "SetTiFlashReplica-0") will be executed and add a tombstone_ts to the IStorage instance.
https://github.com/pingcap/tiflash/blob/v8.1.1/dbms/src/TiDB/Schema/SchemaBuilder.cpp#L392-L407
If all the regions are removed from the tiflash instance and the tombstone_ts exceeds the gc_safepoint, then we will generate an InterpreterDropQuery to physically drop the IStorage instance.
https://github.com/pingcap/tiflash/blob/v8.1.1/dbms/src/TiDB/Schema/SchemaSyncService.cpp#L304-L354
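The two conditions above can be read as a single predicate over per-table state. The following is a minimal sketch of that reading, not the code in SchemaSyncService.cpp: the TableGcState struct and the eligibleForPhysicalDrop function are hypothetical names, and interpreting "exceeds the gc_safepoint" as "tombstone_ts is older than gc_safepoint" is an assumption.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the per-table state inspected by the GC.
// It is NOT the real IStorage interface.
struct TableGcState
{
    uint64_t tombstone_ts = 0;    // 0 means the table has no tombstone
    size_t region_peer_count = 0; // region peers still on this tiflash instance
};

// Returns true when both conditions from the description hold.
// Assumption: "tombstone_ts exceeds the gc_safepoint" is modelled as
// tombstone_ts < gc_safepoint (the safepoint has advanced past the tombstone).
bool eligibleForPhysicalDrop(const TableGcState & table, uint64_t gc_safepoint)
{
    const bool tombstoned = table.tombstone_ts != 0;
    const bool past_safepoint = tombstoned && table.tombstone_ts < gc_safepoint;
    const bool no_region_peer = table.region_peer_count == 0;
    return past_safepoint && no_region_peer;
}
```

Note that a predicate like this only describes the state at the moment it is evaluated; nothing in the sketch keeps that state from changing before a drop is carried out.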
However, data loss can occur because of a concurrency issue:
1. In SchemaSyncService::gcImpl, a table is judged as both "tombstone_ts exceeds the gc_safepoint" and "no region peer exists", so an InterpreterDropQuery is generated.
2. A region peer for the table is created on the tiflash instance before the InterpreterDropQuery gets executed.
3. The InterpreterDropQuery gets executed, and all the data in the StorageDeltaMerge gets physically removed. But the region still exists in the raft layer, so queries after that will meet data loss.
What is changed and how it works?
Check List
Tests
Side effects
Documentation
Release note