-
Notifications
You must be signed in to change notification settings - Fork 412
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FLASH-1026] Synchronization between drop table and remove region #538
Conversation
Signed-off-by: JaySon-Huang <[email protected]>
I think the fundamental difference, that causes this problem, between TMT and DT, is that DT has to remove corresponding storage data (by writing a delete range(?)) when removing a region whereas TMT does not (TMT removes corresponding data during compaction/merge). Am I understanding it right? If so, could you explain why DT has to remove storage data instantly? |
Actually DT don't remove storage data instantly, For TMT, it will do final optimize on storage if all its regions are removed, or background thread to calculate its parts satisfied threshold or not. |
My actual question was, why does DT have to "mark" this range for reclaim, rather than doing it the "tmt" way, which is leaving it to compaction to detect that "this region has been removed, we don't need it anymore"? |
@zanmato1984 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
/run-integration-tests |
2 similar comments
/run-integration-tests |
/run-integration-tests |
Signed-off-by: JaySon-Huang <[email protected]>
dd51b82
to
deec32b
Compare
/merge |
/run-all-tests |
@JaySon-Huang merge failed. |
/run-all-tests |
@JaySon-Huang merge failed. |
/merge |
/run-all-tests |
@JaySon-Huang merge failed. |
/merge |
/run-all-tests |
…) (#539) * In InterpreterDropQuery, call shutdown within table lock * Temporary check Signed-off-by: JaySon-Huang <[email protected]>
* fix daily test fail (#520) * fix daily test fail * fix * Add fullstack test for engine DeltaTree (#524) ## Add fullstack test for engine DeltaTree. * Refine `tests/docker/run.sh` and split `tests/docker/docker-compose.yaml` into `tests/docker/{gtest/mock-test/cluster/tiflash-dt/tiflash-tmt}.yaml` `fullstack/ddl`,`fullstack-test/fault-inject` will be enabled in #526 ## Others * Add column `tidb_table_id` in `system.tables` * Add some debugging info Signed-off-by: JaySon-Huang <[email protected]> * Bugfix: schrodinger bank2 fail (#521) * Flush committed data in Region after resolve locks * Stop append into last packs after split. * Remove last_cache in Delta to reduce code complexity. * Add system table: dt_tables and dt_segments, for debug. * [FLASH-1008] Support br restore & ingest sst (#529) * [flash-1018]fix bug of datetime default value (#534) * fix bug of datetime default value * address comment * Using SegmentSnapshotPtr instead of SegmentSnapshot (#532) * [Flash-664] Enable DDL for engine DeltaTree (#526) - [x] Enable unittest in gtest_dbms - [x] Enable mock test in tests/delta-merge-test - [x] Enable fullstack-test/ddl - [x] Enable fullstack-test/inject (Imported in #443) - [x] Refine exception while read / write to DeltaTree (FLASH-994) * Use `Exception::addMessage` to add more diagnostics for locate which table is wrong (commit: 716ae4a) - [x] shutdown should cancel all background tasks (FLASH-995) (commit: 7470c2f) - [x] Run schrodinger/sddl test ## Others * Add atomic-rename table test in `tests/fullstack-test/fault-inject/rename-table.test`, but did not enable this. We will fix it later. * "dt" engine ONLY support disable_bg_flush = true. If background flush is enabled, read will not triggle schema sync. Which means that we may not get the right result with out-dated schema. * Found that 'zero' value of type year is not match with tikv (FLASH-1023) Signed-off-by: JaySon-Huang <[email protected]> * [FLASH-1027] Fix: proxy override system signal listening (#541) * remove signal listening from proxy. * while terminating, stop all learner read. * [FLASH-1026] Synchronization between drop table and remove region (#538) * Set default storage engine to DT (#547) * Do region flush in Region::handleWriteRaftCmd (#542) * [FLASH-1028] Region merge should not remove data (#544) * Remove region after region merge should not remove data Signed-off-by: JaySon-Huang <[email protected]> * Fix region init index Signed-off-by: JaySon-Huang <[email protected]> * Add region_merge.test for DT Signed-off-by: JaySon-Huang <[email protected]> * Fix different behavior between DT and TMT Signed-off-by: JaySon-Huang <[email protected]> * Fix mock remove region Signed-off-by: JaySon-Huang <[email protected]> Co-authored-by: pingcap-github-bot <[email protected]> * clean useless code Co-authored-by: JaySon <[email protected]> Co-authored-by: Flowyi <[email protected]> Co-authored-by: Tong Zhigao <[email protected]> Co-authored-by: Han Fei <[email protected]> Co-authored-by: pingcap-github-bot <[email protected]>
* implement join * fix * update tipb * update tipb * save work: refine dag interpreter, introduce dag interpreter query block * fix bug * comment useless code * refine code * fix bug * refine code * fix bug * save work * tiny fix * save work * support remote read * update tipb * refine code * fix bug * update client-c * refine cop read * update client-c to support cop reading from tiflash * refine code * support batch cop * fix build error * fix daily test fail * some bug fix * log dag execution time without encode to chunk * fix bug * fix bug * log dag execution time without encode * make encode multi processors * fix * parallel encode * refine code of batch coprocessor * refine code * delete useless code * update kvproto and client-c * [flash 1002]refine coprocessor read (#530) * refind coprocessor * fix * try fix ci * support key ranges in batch coprocessor (#533) * save work * save work * save work * save work * dt support key ranges in dag request * fix bug * fix bug * fix bug * add some comments * fix bug * address comments * merge master branch (#556) * fix daily test fail (#520) * fix daily test fail * fix * Add fullstack test for engine DeltaTree (#524) ## Add fullstack test for engine DeltaTree. * Refine `tests/docker/run.sh` and split `tests/docker/docker-compose.yaml` into `tests/docker/{gtest/mock-test/cluster/tiflash-dt/tiflash-tmt}.yaml` `fullstack/ddl`,`fullstack-test/fault-inject` will be enabled in #526 ## Others * Add column `tidb_table_id` in `system.tables` * Add some debugging info Signed-off-by: JaySon-Huang <[email protected]> * Bugfix: schrodinger bank2 fail (#521) * Flush committed data in Region after resolve locks * Stop append into last packs after split. * Remove last_cache in Delta to reduce code complexity. * Add system table: dt_tables and dt_segments, for debug. * [FLASH-1008] Support br restore & ingest sst (#529) * [flash-1018]fix bug of datetime default value (#534) * fix bug of datetime default value * address comment * Using SegmentSnapshotPtr instead of SegmentSnapshot (#532) * [Flash-664] Enable DDL for engine DeltaTree (#526) - [x] Enable unittest in gtest_dbms - [x] Enable mock test in tests/delta-merge-test - [x] Enable fullstack-test/ddl - [x] Enable fullstack-test/inject (Imported in #443) - [x] Refine exception while read / write to DeltaTree (FLASH-994) * Use `Exception::addMessage` to add more diagnostics for locate which table is wrong (commit: 716ae4a) - [x] shutdown should cancel all background tasks (FLASH-995) (commit: 7470c2f) - [x] Run schrodinger/sddl test ## Others * Add atomic-rename table test in `tests/fullstack-test/fault-inject/rename-table.test`, but did not enable this. We will fix it later. * "dt" engine ONLY support disable_bg_flush = true. If background flush is enabled, read will not triggle schema sync. Which means that we may not get the right result with out-dated schema. * Found that 'zero' value of type year is not match with tikv (FLASH-1023) Signed-off-by: JaySon-Huang <[email protected]> * [FLASH-1027] Fix: proxy override system signal listening (#541) * remove signal listening from proxy. * while terminating, stop all learner read. * [FLASH-1026] Synchronization between drop table and remove region (#538) * Set default storage engine to DT (#547) * Do region flush in Region::handleWriteRaftCmd (#542) * [FLASH-1028] Region merge should not remove data (#544) * Remove region after region merge should not remove data Signed-off-by: JaySon-Huang <[email protected]> * Fix region init index Signed-off-by: JaySon-Huang <[email protected]> * Add region_merge.test for DT Signed-off-by: JaySon-Huang <[email protected]> * Fix different behavior between DT and TMT Signed-off-by: JaySon-Huang <[email protected]> * Fix mock remove region Signed-off-by: JaySon-Huang <[email protected]> Co-authored-by: pingcap-github-bot <[email protected]> * clean useless code Co-authored-by: JaySon <[email protected]> Co-authored-by: Flowyi <[email protected]> Co-authored-by: Tong Zhigao <[email protected]> Co-authored-by: Han Fei <[email protected]> Co-authored-by: pingcap-github-bot <[email protected]> * fix type mismatch bug in broadcast join * broadcast join support join keys with different data type (#580) * fix type mismatch bug in broadcast join * refine code * refine code * some improvement for broadcast join (#600) * some improvement for broadcast join * format code * refine code * address comment * fix bug * address comments * fix bug * fix bug * make TiFlash backward compatible to old tipb (#653) * 1. re-enable exec info in dag response, 2. support old style dag request * basic support for execute summary * refine support of executor time for join plan * format code * address comments * fix bug * refine code * update header * Fix execute details regression after merge master (#678) * refine code * fix bug * fix bug * format code * update kvproto * fmt code * update client-c * update tipb Co-authored-by: xufei <[email protected]> Co-authored-by: xufei <[email protected]> Co-authored-by: JaySon <[email protected]> Co-authored-by: Flowyi <[email protected]> Co-authored-by: Tong Zhigao <[email protected]> Co-authored-by: pingcap-github-bot <[email protected]>
* implement join * fix * update tipb * update tipb * save work: refine dag interpreter, introduce dag interpreter query block * fix bug * comment useless code * refine code * fix bug * refine code * fix bug * save work * tiny fix * save work * support remote read * update tipb * refine code * fix bug * update client-c * refine cop read * update client-c to support cop reading from tiflash * refine code * support batch cop * fix build error * fix daily test fail * some bug fix * log dag execution time without encode to chunk * fix bug * fix bug * log dag execution time without encode * make encode multi processors * fix * parallel encode * refine code of batch coprocessor * refine code * delete useless code * update kvproto and client-c * [flash 1002]refine coprocessor read (#530) * refind coprocessor * fix * try fix ci * support key ranges in batch coprocessor (#533) * save work * save work * save work * save work * dt support key ranges in dag request * fix bug * fix bug * fix bug * add some comments * fix bug * address comments * merge master branch (#556) * fix daily test fail (#520) * fix daily test fail * fix * Add fullstack test for engine DeltaTree (#524) ## Add fullstack test for engine DeltaTree. * Refine `tests/docker/run.sh` and split `tests/docker/docker-compose.yaml` into `tests/docker/{gtest/mock-test/cluster/tiflash-dt/tiflash-tmt}.yaml` `fullstack/ddl`,`fullstack-test/fault-inject` will be enabled in #526 ## Others * Add column `tidb_table_id` in `system.tables` * Add some debugging info Signed-off-by: JaySon-Huang <[email protected]> * Bugfix: schrodinger bank2 fail (#521) * Flush committed data in Region after resolve locks * Stop append into last packs after split. * Remove last_cache in Delta to reduce code complexity. * Add system table: dt_tables and dt_segments, for debug. * [FLASH-1008] Support br restore & ingest sst (#529) * [flash-1018]fix bug of datetime default value (#534) * fix bug of datetime default value * address comment * Using SegmentSnapshotPtr instead of SegmentSnapshot (#532) * [Flash-664] Enable DDL for engine DeltaTree (#526) - [x] Enable unittest in gtest_dbms - [x] Enable mock test in tests/delta-merge-test - [x] Enable fullstack-test/ddl - [x] Enable fullstack-test/inject (Imported in #443) - [x] Refine exception while read / write to DeltaTree (FLASH-994) * Use `Exception::addMessage` to add more diagnostics for locate which table is wrong (commit: 716ae4a) - [x] shutdown should cancel all background tasks (FLASH-995) (commit: 7470c2f) - [x] Run schrodinger/sddl test ## Others * Add atomic-rename table test in `tests/fullstack-test/fault-inject/rename-table.test`, but did not enable this. We will fix it later. * "dt" engine ONLY support disable_bg_flush = true. If background flush is enabled, read will not triggle schema sync. Which means that we may not get the right result with out-dated schema. * Found that 'zero' value of type year is not match with tikv (FLASH-1023) Signed-off-by: JaySon-Huang <[email protected]> * [FLASH-1027] Fix: proxy override system signal listening (#541) * remove signal listening from proxy. * while terminating, stop all learner read. * [FLASH-1026] Synchronization between drop table and remove region (#538) * Set default storage engine to DT (#547) * Do region flush in Region::handleWriteRaftCmd (#542) * [FLASH-1028] Region merge should not remove data (#544) * Remove region after region merge should not remove data Signed-off-by: JaySon-Huang <[email protected]> * Fix region init index Signed-off-by: JaySon-Huang <[email protected]> * Add region_merge.test for DT Signed-off-by: JaySon-Huang <[email protected]> * Fix different behavior between DT and TMT Signed-off-by: JaySon-Huang <[email protected]> * Fix mock remove region Signed-off-by: JaySon-Huang <[email protected]> Co-authored-by: pingcap-github-bot <[email protected]> * clean useless code Co-authored-by: JaySon <[email protected]> Co-authored-by: Flowyi <[email protected]> Co-authored-by: Tong Zhigao <[email protected]> Co-authored-by: Han Fei <[email protected]> Co-authored-by: pingcap-github-bot <[email protected]> * fix type mismatch bug in broadcast join * broadcast join support join keys with different data type (#580) * fix type mismatch bug in broadcast join * refine code * refine code * some improvement for broadcast join (#600) * some improvement for broadcast join * format code * refine code * address comment * fix bug * address comments * fix bug * fix bug * make TiFlash backward compatible to old tipb (#653) * 1. re-enable exec info in dag response, 2. support old style dag request * basic support for execute summary * refine support of executor time for join plan * format code * address comments * fix bug * refine code * update header * Fix execute details regression after merge master (#678) * refine code * fix bug * fix bug * format code * update kvproto * fmt code * update client-c * update tipb Co-authored-by: xufei <[email protected]> Co-authored-by: xufei <[email protected]> Co-authored-by: JaySon <[email protected]> Co-authored-by: Flowyi <[email protected]> Co-authored-by: Tong Zhigao <[email protected]> Co-authored-by: pingcap-github-bot <[email protected]> Co-authored-by: xufei <[email protected]> Co-authored-by: xufei <[email protected]> Co-authored-by: JaySon <[email protected]> Co-authored-by: Flowyi <[email protected]> Co-authored-by: Tong Zhigao <[email protected]> Co-authored-by: pingcap-github-bot <[email protected]>
While running
tests/fullstack-test/fault-inject/drop-table.test
, tiflash may crash because of the synchronization between drop table and remove region.Problem
After
truncate table test.t ;
, SchemaBuilder will create aInterpreterDropQuery
to drop table. It callshutdown
without lock.https://github.com/pingcap/tics/blob/master/dbms/src/Interpreters/InterpreterDropQuery.cpp#L126-L129
At the same time, since the table is dropped, the region of that table is removed.
RegionTable::removeRegion
get a lock on DT table, and try to write a deleteRange on that table.https://github.com/pingcap/tics/blob/master/dbms/src/Storages/Transaction/RegionTable.cpp#L247-L273
When
storage->deleteRange()
callbackground_task_handle->wake()
, which has been reset byshutdown()
in InterpreterDropQuery, TiFlash will crash.Solution
In InterpreterDropQuery, I move
storage->shutdown()
after it get lock on storage. shutdown is supposed to be "wait till finish or cancel all background task of storage".