
Failed to parse checkpoint manifest #7160

Closed
JaySon-Huang opened this issue Mar 24, 2023 · 3 comments · Fixed by #7170
Labels
severity/moderate type/bug The issue is confirmed as a bug.


@JaySon-Huang
Contributor

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduction steps (Required)

Deploy TiFlash in disaggregated mode with S3 storage on AWS.

2. What did you expect to see? (Required)

3. What did you see instead? (Required)

Some manifest files contain a protobuf message larger than 1 GiB, and S3GCManager fails to read them:

[2023/03/24 08:49:10.532 +00:00] [INFO] [S3GCManager.cpp:149] ["latest manifest, gc_store_id=172 upload_seq=486 key=s172/manifest/mf_486"] [thread_id=14]
[2023/03/24 08:49:10.532 +00:00] [INFO] [S3GCManager.cpp:512] ["Reading manifest, key=s172/manifest/mf_486"] [thread_id=14]
[2023/03/24 08:49:16.765 +00:00] [ERROR] [Exception.cpp:90] ["Code: 49, e.displayText() = DB::Exception: Check prefix_size < MAX_SIZE failed: Expect total message to be < 1GiB, size=1401166563, e.what() = DB::Exception, Stack trace:\n\n\n       0x1b4d7c3\tStackTrace::StackTrace() [tiflash+28628931]\n                \tdbms/src/Common/StackTrace.cpp:23\n       0x1b4b5c6\tDB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int) [tiflash+28620230]\n                \tdbms/src/Common/Exception.h:46\n       0x7428625\tDB::PS::V3::details::readMessageWithLength(DB::ReadBuffer&, google::protobuf::MessageLite&) [tiflash+121800229]\n                \tdbms/src/Storages/Page/V3/CheckpointFile/ProtoHelper.cpp:31\n       0x741a61b\tDB::PS::V3::CPManifestFileReader::readEdits(std::__1::unordered_map<std::__1::basic_string_view<char, std::__1::char_traits<char> >, std::__1::shared_ptr<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const>, std::__1::hash<std::__1::basic_string_view<char, std::__1::char_traits<char> > >, std::__1::equal_to<std::__1::basic_string_view<char, std::__1::char_traits<char> > >, std::__1::allocator<std::__1::pair<std::__1::basic_string_view<char, std::__1::char_traits<char> > const, std::__1::shared_ptr<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const> > > >&) [tiflash+121742875]\n                \tdbms/src/Storages/Page/V3/CheckpointFile/CPManifestFileReader.cpp:40\n       0x7e99ba8\tDB::S3::S3GCManager::getValidLocksFromManifest(std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&) [tiflash+132750248]\n                \tdbms/src/Storages/S3/S3GCManager.cpp:520\n       0x7e9942c\tDB::S3::S3GCManager::runForStore(unsigned long) [tiflash+132748332]\n                \tdbms/src/Storages/S3/S3GCManager.cpp:152\n       0x7e98483\tDB::S3::S3GCManager::runOnAllStores() [tiflash+132744323]\n                \tdbms/src/Storages/S3/S3GCManager.cpp:126\n       0x7e9d7b1\tstd::__1::__function::__func<DB::S3::S3GCManagerService::S3GCManagerService(DB::Context&, std::__1::shared_ptr<pingcap::pd::IClient>, std::__1::shared_ptr<DB::OwnerManager>, std::__1::shared_ptr<DB::S3::IS3LockClient>, DB::S3::S3GCConfig const&)::$_23, std::__1::allocator<DB::S3::S3GCManagerService::S3GCManagerService(DB::Context&, std::__1::shared_ptr<pingcap::pd::IClient>, std::__1::shared_ptr<DB::OwnerManager>, std::__1::shared_ptr<DB::S3::IS3LockClient>, DB::S3::S3GCConfig const&)::$_23>, bool()>::operator()() [tiflash+132765617]\n                \t/data3/jaysonhuang/tiflash-env-13/sysroot/bin/../include/c++/v1/__functional/function.h:345\n       0x7cdbc7d\tDB::BackgroundProcessingPool::threadFunction(unsigned long) [tiflash+130923645]\n                \tdbms/src/Storages/BackgroundProcessingPool.cpp:234\n       0x7cdc665\tvoid* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, DB::BackgroundProcessingPool::BackgroundProcessingPool(int, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >)::$_1> >(void*) [tiflash+130926181]\n                \t/data3/jaysonhuang/tiflash-env-13/sysroot/bin/../include/c++/v1/thread:291\n  0x7f6ba5451609\tstart_thread [libpthread.so.0+34313]\n                \t/build/glibc-SzIz7B/glibc-2.31/nptl/pthread_create.c:477\n  0x7f6ba5298133\tclone [libc.so.6+1175859]\n                \t/build/glibc-SzIz7B/glibc-2.31/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:95"] [source="void DB::BackgroundProcessingPool::threadFunction(size_t)"] [thread_id=14]
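The failing check is in `readMessageWithLength` (ProtoHelper.cpp:31), which reads a size prefix and then the serialized message body. Below is a minimal sketch of that length-prefixed pattern. It assumes a hypothetical 8-byte little-endian length prefix and uses plain `std::string` payloads instead of `google::protobuf::MessageLite`; the real TiFlash framing may differ. The reader rejects any size of 1 GiB or more before allocating a buffer, so a 1,401,166,563-byte message like the one in `mf_486` is refused.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical cap mirroring the "Expect total message to be < 1GiB" check.
constexpr std::uint64_t kMaxMessageSize = 1ULL << 30; // 1 GiB

// Assumed framing (not necessarily TiFlash's): 8-byte little-endian length.
void writeLengthPrefix(std::ostream & out, std::uint64_t size)
{
    for (int i = 0; i < 8; ++i)
        out.put(static_cast<char>((size >> (8 * i)) & 0xFF));
}

// Writes the length prefix followed by the serialized message bytes.
void writeMessageWithLength(std::ostream & out, const std::string & payload)
{
    writeLengthPrefix(out, payload.size());
    out.write(payload.data(), static_cast<std::streamsize>(payload.size()));
}

std::uint64_t readLengthPrefix(std::istream & in)
{
    std::uint64_t size = 0;
    for (int i = 0; i < 8; ++i)
    {
        const auto c = in.get();
        if (c == std::istream::traits_type::eof())
            throw std::runtime_error("truncated length prefix");
        size |= static_cast<std::uint64_t>(static_cast<unsigned char>(c)) << (8 * i);
    }
    return size;
}

std::string readMessageWithLength(std::istream & in)
{
    const std::uint64_t size = readLengthPrefix(in);
    // Reject oversized messages before allocating a buffer for them.
    if (size >= kMaxMessageSize)
        throw std::runtime_error("Expect total message to be < 1GiB, size=" + std::to_string(size));
    std::string payload(size, '\0');
    in.read(payload.data(), static_cast<std::streamsize>(size));
    if (static_cast<std::uint64_t>(in.gcount()) != size)
        throw std::runtime_error("truncated message body");
    return payload;
}
```

Because the limit is only enforced on read in this sketch, nothing stops a writer from producing a message its own reader rejects, which matches what this issue shows happening with `mf_486`.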

4. What is your TiFlash version? (Required)

master

@JaySon-Huang JaySon-Huang added type/bug The issue is confirmed as a bug. severity/moderate labels Mar 24, 2023
@JaySon-Huang
Contributor Author

ref #6882

@JaySon-Huang
Contributor Author

The manifest file (./mf_486) is 432 MiB after compression. It contains 22,501,014 records, and serializing all of them into a single protobuf message takes 1.30 GiB (1,401,166,563 bytes, decompressed). The number of locks is 1,532.

[2023/03/24 19:42:36.367 +08:00] [INFO] [ProtoHelper.cpp:34] ["reading pb with size=1401166563(1.30 GiB)"] [thread_id=1]
[2023/03/24 19:43:18.852 +08:00] [INFO] [gtest_file_read_write.cpp:122] ["num of records=22501014"] [source=CheckpointFileTest] [thread_id=1]
[2023/03/24 19:43:20.789 +08:00] [INFO] [ProtoHelper.cpp:34] ["reading pb with size=0(0.00 B)"] [thread_id=1]
[2023/03/24 19:43:20.789 +08:00] [INFO] [ProtoHelper.cpp:34] ["reading pb with size=64399(62.89 KiB)"] [thread_id=1]
[2023/03/24 19:43:20.791 +08:00] [INFO] [gtest_file_read_write.cpp:129] ["num of locks=1532"] [source=CheckpointFileTest] [thread_id=1]
[2023/03/24 19:43:20.791 +08:00] [INFO] [ProtoHelper.cpp:34] ["reading pb with size=0(0.00 B)"] [thread_id=1]
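The numbers above fit together: 1,401,166,563 bytes / 2^30 ≈ 1.305 GiB, or about 62 bytes per record on average. At that average, roughly 17.2 million records fit under 1 GiB, so the 22.5 million records need at least two messages. One generic way for a writer to stay under the cap is to pack records into batches and write each batch as its own length-prefixed message. The sketch below is a hypothetical illustration: `splitIntoBatches` is not a TiFlash function, and this is not necessarily how #7170 fixes the issue. It counts only record payload bytes, so per-record protobuf framing overhead has to be covered by choosing `max_batch_bytes` with headroom below the limit.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: greedily packs serialized records into batches whose
// summed payload size stays strictly below max_batch_bytes. Each batch would
// then be written as its own length-prefixed message instead of one giant
// message holding every record. Assumes each single record is smaller than
// max_batch_bytes; an oversized record still ends up alone in its own batch.
std::vector<std::vector<std::string>> splitIntoBatches(
    const std::vector<std::string> & records,
    std::uint64_t max_batch_bytes)
{
    std::vector<std::vector<std::string>> batches;
    std::vector<std::string> current;
    std::uint64_t current_bytes = 0;
    for (const auto & rec : records)
    {
        // Flush the current batch if adding this record would reach the budget.
        if (!current.empty() && current_bytes + rec.size() >= max_batch_bytes)
        {
            batches.push_back(std::move(current));
            current.clear();
            current_bytes = 0;
        }
        current.push_back(rec);
        current_bytes += rec.size();
    }
    if (!current.empty())
        batches.push_back(std::move(current));
    return batches;
}
```

On the reading side, the reader would loop over the batches and merge their records, so no single message ever has to hold all 22.5 million records.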

@JaySon-Huang
Contributor Author

Fixed by #7170
