Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DTFile ingested may be deleted by accident #2040

Closed
JaySon-Huang opened this issue Jun 1, 2021 · 1 comment · Fixed by #2054
Closed

DTFile ingested may be deleted by accident #2040

JaySon-Huang opened this issue Jun 1, 2021 · 1 comment · Fixed by #2054
Assignees
Labels
severity/major type/bug The issue is confirmed as a bug.

Comments

@JaySon-Huang
Copy link
Contributor

JaySon-Huang commented Jun 1, 2021

Found by test on k8s env.

Add shuffle-region by pd-ctl: scheduler add shuffle-region-scheduler

A bug introduced by #1347, #1439

ingestPacks could be fail, and may lead to broken ref page

https://github.com/pingcap/tics/blob/dce81ce066981f0304fd1ad5512d301a780a2615/dbms/src/Storages/DeltaMerge/DeltaMergeStore.cpp#L630-L683

log files:
test.log.zip

Invalid Page 38262 -Ref-> 38255
[2021/06/01 05:07:01.743 +00:00] [DEBUG] [<unknown>] ["DeltaMergeStore[db_1.t_59]: Loading dt files"] [thread_id=4]
[2021/06/01 05:07:01.763 +00:00] [INFO] [<unknown>] ["PageStorage: db_1.t_59.log begin to restore data from disk. [path=/data0/db/data/t_59/log] [num_writers=4]"] [thread_id=4]
[2021/06/01 05:07:01.769 +00:00] [INFO] [<unknown>] ["PageStorage: db_1.t_59.log restore 0 puts and 38 refs and 0 deletes and 38 upserts from checkpoint PageFile_202_2 sequence: 27489"] [thread_id=4]
[2021/06/01 05:07:01.782 +00:00] [INFO] [<unknown>] ["PageStorage: db_1.t_59.log restore 28 pages, write batch sequence: 35299, 6759 puts and 6130 refs and 12861 deletes and 70 upserts"] [thread_id=4]
[2021/06/01 05:07:01.782 +00:00] [INFO] [<unknown>] ["PageStorage: db_1.t_59.data begin to restore data from disk. [path=/data0/db/data/t_59/data] [num_writers=1]"] [thread_id=4]
[2021/06/01 05:07:01.817 +00:00] [WARN] [<unknown>] ["PageStorage: db_1.t_59.data Ignore invalid RefPage in DeltaVersionEditAcceptor::applyInplace, RefPage38262 to non-exist Page38255"] [thread_id=4]
[2021/06/01 05:07:01.817 +00:00] [INFO] [<unknown>] ["PageStorage: db_1.t_59.data restore 14 pages, write batch sequence: 63499, 32965 puts and 5058 refs and 38042 deletes and 0 upserts"] [thread_id=4]
[2021/06/01 05:07:01.817 +00:00] [INFO] [<unknown>] ["PageStorage: db_1.t_59.meta begin to restore data from disk. [path=/data0/db/data/t_59/meta] [num_writers=2]"] [thread_id=4]
[2021/06/01 05:07:01.822 +00:00] [INFO] [<unknown>] ["PageStorage: db_1.t_59.meta restore 0 puts and 6 refs and 0 deletes and 6 upserts from checkpoint PageFile_3_0 sequence: 70714"] [thread_id=4]
[2021/06/01 05:07:01.849 +00:00] [INFO] [<unknown>] ["PageStorage: db_1.t_59.meta restore 6 pages, write batch sequence: 117995, 70705 puts and 6 refs and 39 deletes and 6 upserts"] [thread_id=4]
[2021/06/01 05:07:01.879 +00:00] [ERROR] [<unknown>] ["DB::DM::DeltaMergeStore::DeltaMergeStore(DB::Context&, bool, const String&, const String&, const ColumnDefines&, const DB::DM::ColumnDefine&, bool, size_t, const DB::DM::DeltaMergeStore::Settings&): Code: 0, e.displayText() = DB::Exception: Can not find path for DMFile [id=38262], e.what() = DB::Exception, Stack trace:\n\n0. /tiflash/tiflash(StackTrace::StackTrace()+0x15) [0x3623e35]\n1. /tiflash/tiflash(DB::Exception::Exception(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int)+0x25) [0x361a705]\n2. /tiflash/tiflash(DB::StableDiskDelegator::getDTFilePath[abi:cxx11](unsigned long) const+0x173) [0x7057733]\n3. /tiflash/tiflash(DB::DM::DeltaPackFile::deserializeMetadata(DB::DM::DMContext&, DB::DM::RowKeyRange const&, DB::ReadBuffer&)+0xb3) [0x716a4c3]\n4. /tiflash/tiflash(DB::DM::deserializePacks_V3(DB::DM::DMContext&, DB::DM::RowKeyRange const&, DB::ReadBuffer&, unsigned long)+0x63c) [0x717441c]\n5. /tiflash/tiflash(DB::DM::deserializePacks(DB::DM::DMContext&, DB::DM::RowKeyRange const&, DB::ReadBuffer&)+0x8a) [0x7161e5a]\n6. /tiflash/tiflash(DB::DM::DeltaValueSpace::restore(DB::DM::DMContext&, DB::DM::RowKeyRange const&, unsigned long)+0xea) [0x7177f8a]\n7. /tiflash/tiflash(DB::DM::Segment::restoreSegment(DB::DM::DMContext&, unsigned long)+0xd79) [0x70e8a79]\n8. /tiflash/tiflash(DB::DM::DeltaMergeStore::DeltaMergeStore(DB::Context&, bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<DB::DM::ColumnDefine, std::allocator<DB::DM::ColumnDefine> > const&, DB::DM::ColumnDefine const&, bool, unsigned long, DB::DM::DeltaMergeStore::Settings const&)+0x92e) [0x70c5e3e]\n9. /tiflash/tiflash(DB::StorageDeltaMerge::StorageDeltaMerge(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::optional<std::reference_wrapper<TiDB::TableInfo const> >, DB::ColumnsDescription const&, std::shared_ptr<DB::IAST> const&, unsigned long, DB::Context&)+0x10be) [0x7073c4e]\n10. /tiflash/tiflash() [0x71214b3]\n11. /tiflash/tiflash() [0x7121d97]\n12. /tiflash/tiflash(DB::StorageFactory::get(DB::ASTCreateQuery&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, DB::Context&, DB::Context&, DB::ColumnsDescription const&, bool, bool) const+0x1ba) [0x7083cda]\n13. /tiflash/tiflash(DB::createTableFromDefinition(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, DB::Context&, bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x10b) [0x68897eb]\n14. /tiflash/tiflash(DB::DatabaseLoading::loadTable(DB::Context&, DB::IDatabase&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool)+0x2da) [0x688b69a]\n15. /tiflash/tiflash() [0x687f759]\n16. /tiflash/tiflash(ThreadPool::worker()+0x166) [0x7984146]\n17. /tiflash/tiflash() [0x84335ee]\n18. /lib64/libpthread.so.0(+0x7dd4) [0x7fe1d56fbdd4]\n19. /lib64/libc.so.6(clone+0x6c) [0x7fe1d4cb2eac]\n"] [thread_id=4]
[2021/06/01 05:07:01.904 +00:00] [INFO] [<unknown>] ["DeltaMergeStore[db_1.t_57]: Release DeltaMerge Store start [db_1.t_57]"] [thread_id=1]
[2021/06/01 05:07:01.904 +00:00] [INFO] [<unknown>] ["DeltaMergeStore[db_1.t_57]: Release DeltaMerge Store end [db_1.t_57]"] [thread_id=1]
[2021/06/01 05:07:03.733 +00:00] [DEBUG] [<unknown>] ["Application: Destroyed global context."] [thread_id=1]
[2021/06/01 05:07:03.733 +00:00] [INFO] [<unknown>] ["Application: let tiflash proxy shutdown"] [thread_id=1]
[2021/06/01 05:07:03.733 +00:00] [INFO] [<unknown>] ["Application: wait for tiflash proxy thread to join"] [thread_id=1]
[2021/06/01 05:07:04.006 +00:00] [INFO] [<unknown>] ["Application: tiflash proxy thread is joined"] [thread_id=1]
[2021/06/01 05:07:04.007 +00:00] [ERROR] [<unknown>] ["Application: DB::Exception: Cannot create table from metadata file /data0/db/metadata/db_1/t_59.sql, error: DB::Exception: Can not find path for DMFile [id=38262], stack trace:\n0. /tiflash/tiflash(StackTrace::StackTrace()+0x15) [0x3623e35]\n1. /tiflash/tiflash(DB::Exception::Exception(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int)+0x25) [0x361a705]\n2. /tiflash/tiflash(DB::StableDiskDelegator::getDTFilePath[abi:cxx11](unsigned long) const+0x173) [0x7057733]\n3. /tiflash/tiflash(DB::DM::DeltaPackFile::deserializeMetadata(DB::DM::DMContext&, DB::DM::RowKeyRange const&, DB::ReadBuffer&)+0xb3) [0x716a4c3]\n4. /tiflash/tiflash(DB::DM::deserializePacks_V3(DB::DM::DMContext&, DB::DM::RowKeyRange const&, DB::ReadBuffer&, unsigned long)+0x63c) [0x717441c]\n5. /tiflash/tiflash(DB::DM::deserializePacks(DB::DM::DMContext&, DB::DM::RowKeyRange const&, DB::ReadBuffer&)+0x8a) [0x7161e5a]\n6. /tiflash/tiflash(DB::DM::DeltaValueSpace::restore(DB::DM::DMContext&, DB::DM::RowKeyRange const&, unsigned long)+0xea) [0x7177f8a]\n7. /tiflash/tiflash(DB::DM::Segment::restoreSegment(DB::DM::DMContext&, unsigned long)+0xd79) [0x70e8a79]\n8. /tiflash/tiflash(DB::DM::DeltaMergeStore::DeltaMergeStore(DB::Context&, bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<DB::DM::ColumnDefine, std::allocator<DB::DM::ColumnDefine> > const&, DB::DM::ColumnDefine const&, bool, unsigned long, DB::DM::DeltaMergeStore::Settings const&)+0x92e) [0x70c5e3e]\n9. /tiflash/tiflash(DB::StorageDeltaMerge::StorageDeltaMerge(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::optional<std::reference_wrapper<TiDB::TableInfo const> >, DB::ColumnsDescription const&, std::shared_ptr<DB::IAST> const&, unsigned long, DB::Context&)+0x10be) [0x7073c4e]\n10. /tiflash/tiflash() [0x71214b3]\n11. /tiflash/tiflash() [0x7121d97]\n12. /tiflash/tiflash(DB::StorageFactory::get(DB::ASTCreateQuery&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, DB::Context&, DB::Context&, DB::ColumnsDescription const&, bool, bool) const+0x1ba) [0x7083cda]\n13. /tiflash/tiflash(DB::createTableFromDefinition(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, DB::Context&, bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x10b) [0x68897eb]\n14. /tiflash/tiflash(DB::DatabaseLoading::loadTable(DB::Context&, DB::IDatabase&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool)+0x2da) [0x688b69a]\n15. /tiflash/tiflash() [0x687f759]\n16. /tiflash/tiflash(ThreadPool::worker()+0x166) [0x7984146]\n17. /tiflash/tiflash() [0x84335ee]\n18. /lib64/libpthread.so.0(+0x7dd4) [0x7fe1d56fbdd4]\n19. /lib64/libc.so.6(clone+0x6c) [0x7fe1d4cb2eac]\n"] [thread_id=1]
[2021/06/01 05:07:04.007 +00:00] [INFO] [<unknown>] ["Application: shutting down"] [thread_id=1]

@JaySon-Huang JaySon-Huang self-assigned this Jun 1, 2021
@JaySon-Huang JaySon-Huang added the type/bug The issue is confirmed as a bug. label Jun 1, 2021
@JaySon-Huang JaySon-Huang reopened this Jun 2, 2021
@JaySon-Huang
Copy link
Contributor Author

JaySon-Huang commented Jun 2, 2021

Reproduce in a lower level: we may try to add ref page to a deleted page. After TiFlash restart, we can not find related DTFile.

Dump the data pages using ./page_ctl ./tmp 1

-- ingest dmf_24581 to seg [1]
PageFile_2_0, type: Formal      seq:     41186  PUT     pid:    24851                   2               0               0               0               0       0000000000000000
-- seg [1]    merge delta, epoch 14963, new stable 24583
PageFile_2_0, type: Formal      seq:     41187  PUT     pid:    24853                   2               0               0               0               0       0000000000000000
-- <del>ingest dmf_24581 to seg [134], with ref id 24584</del>
-- seg [1] add ref for non-compacted delta pack
PageFile_2_0, type: Formal      seq:     41188  REF     24854   24851
-- seg [1] remove old stable, delta
PageFile_2_0, type: Formal      seq:     41189  DEL     24852
PageFile_2_0, type: Formal      seq:     41189  DEL     24851
PageFile_2_0, type: Formal      seq:     41189  DEL     24850 -- old stable for seg[1], epoch 14962
-- seg [1]   merge delta, epoch 14964, new stable 24855
PageFile_2_0, type: Formal      seq:     41190  PUT     pid:    24855                   2               0               0               0               0       0000000000000000
-- seg [1] remove old stable, delta
PageFile_2_0, type: Formal      seq:     41191  DEL     24854
PageFile_2_0, type: Formal      seq:     41191  DEL     24853 -- old stable for seg[1], epoch 14963
-- seg [134] merge delta, epoch 39, new stable 24843
PageFile_2_0, type: Formal      seq:     41192  PUT     pid:    24843                   2               0               0               0               0       0000000000000000
-- seg [134] add ref for non-compacted delta pack
PageFile_2_0, type: Formal      seq:     41193  REF     24857   24838
PageFile_2_0, type: Formal      seq:     41193  REF     24858   24845
-- seg [134] remove old stable, delta
PageFile_2_0, type: Formal      seq:     41194  DEL     24839
PageFile_2_0, type: Formal      seq:     41194  DEL     24840
PageFile_2_0, type: Formal      seq:     41194  DEL     24841
PageFile_2_0, type: Formal      seq:     41194  DEL     24838
PageFile_2_0, type: Formal      seq:     41194  DEL     24845
PageFile_2_0, type: Formal      seq:     41194  DEL     24830 -- old stable for seg[134], epoch 38
-- ingest dmf_24581 to seg [134], with ref id 24859  -- BOOM! we will try to add a ref page to a "deleted" page
PageFile_2_0, type: Formal      seq:     41195  REF     24859   24851

PageFile_2_0, type: Formal      seq:     41196  PUT     pid:    24856                   2               0               0               0               0       0000000000000000
PageFile_2_0, type: Formal      seq:     41197  DEL     24855
PageFile_2_0, type: Formal      seq:     41198  PUT     pid:    24860                   2               0               0               0               0       0000000000000000
PageFile_2_0, type: Formal      seq:     41199  DEL     24856
PageFile_2_0, type: Formal      seq:     41200  PUT     pid:    24862                   2               0               0               0               0       0000000000000000
PageFile_2_0, type: Formal      seq:     41201  DEL     24860
PageFile_2_0, type: Formal      seq:     41202  PUT     pid:    24863                   2               0               0               0               0       0000000000000000
PageFile_2_0, type: Formal      seq:     41203  PUT     pid:    24864                   2               0               0               0               0       0000000000000000
PageFile_2_0, type: Formal      seq:     41204  DEL     24863
PageFile_2_0, type: Formal      seq:     41204  DEL     24862
PageFile_2_0, type: Formal      seq:     41205  PUT     pid:    24865                   2               0               0               0               0       0000000000000000
PageFile_2_0, type: Formal      seq:     41206  PUT     pid:    24866                   2               0               0               0               0       0000000000000000
PageFile_2_0, type: Formal      seq:     41207  DEL     24864
PageFile_2_0, type: Formal      seq:     41208  PUT     pid:    24867                   2               0               0               0               0       0000000000000000
PageFile_2_0, type: Formal      seq:     41209  DEL     24866
PageFile_2_0, type: Formal      seq:     41210  PUT     pid:    24868                   2               0               0               0               0       0000000000000000
-- seg [134] merge delta, epoch 40, new stable 24861
PageFile_2_0, type: Formal      seq:     41211  PUT     pid:    24861                   2               0               0               0               0       0000000000000000
-- seg [134] add ref for non-compacted delta
PageFile_2_0, type: Formal      seq:     41212  REF     24871   24851
PageFile_2_0, type: Formal      seq:     41212  REF     24872   24865
-- seg [134] remove compacted stable, delta
PageFile_2_0, type: Formal      seq:     41213  DEL     24857
PageFile_2_0, type: Formal      seq:     41213  DEL     24858
PageFile_2_0, type: Formal      seq:     41213  DEL     24859
PageFile_2_0, type: Formal      seq:     41213  DEL     24865
PageFile_2_0, type: Formal      seq:     41213  DEL     24843 -- old stable for seg[134], epoch 39

PageFile_2_0, type: Formal      seq:     41214  PUT     pid:    24870                   2               0               0               0               0       0000000000000000
PageFile_2_0, type: Formal      seq:     41215  REF     24874   24868
PageFile_2_0, type: Formal      seq:     41216  DEL     24868
PageFile_2_0, type: Formal      seq:     41216  DEL     24867

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
severity/major type/bug The issue is confirmed as a bug.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants