coredumped on rocksdb::CopyFile during cold backup. #311

Closed
neverchanje opened this issue Mar 29, 2019 · 7 comments
Labels
type/bug This issue reports a bug.

Comments

@neverchanje
Contributor

c3srv-browser, 2019/3/20

#0  0x000000000088bc95 in rocksdb::CopyFile(rocksdb::Env*, std::string const&, std::string const&, unsigned long, bool) ()
#1  0x00000000008a4c03 in rocksdb::CheckpointImpl::CreateCheckpointQuick(std::string const&, unsigned long*) ()
#2  0x0000000000570f0a in pegasus::server::pegasus_server_impl::copy_checkpoint_to_dir_unsafe (this=this@entry=0x2f9b57000, 
    checkpoint_dir=0x9895d8018 "/home/work/ssd5/pegasus/c3srv-browser/replica/reps/16.3.pegasus/backup/backup_tmp.every_day.1553888104417.1553888105354", checkpoint_decree=0x7fb9b3e92ff8)
    at /home/work/qinzuoyan/Pegasus/pegasus/src/server/pegasus_server_impl.cpp:1937
#3  0x0000000000571280 in pegasus::server::pegasus_server_impl::copy_checkpoint_to_dir (this=0x2f9b57000, checkpoint_dir=<optimized out>, last_decree=<optimized out>)
    at /home/work/qinzuoyan/Pegasus/pegasus/src/server/pegasus_server_impl.cpp:1909
#4  0x00007fb9fb2113e7 in dsn::replication::replica::local_create_backup_checkpoint (this=0x2f6c0a480, backup_context=...)
    at /home/work/qinzuoyan/Pegasus/pegasus/rdsn/src/dist/replication/lib/replica_backup.cpp:683
#5  0x00007fb9fb211b04 in operator() (__closure=<optimized out>) at /home/work/qinzuoyan/Pegasus/pegasus/rdsn/src/dist/replication/lib/replica_backup.cpp:653
#6  std::_Function_handler<void(), dsn::replication::replica::wait_async_checkpoint_for_backup(dsn::replication::cold_backup_context_ptr)::__lambda23>::_M_invoke(const std::_Any_data &) (
    __functor=...) at /home/work/qinzuoyan/Pegasus/toolchain/output/include/c++/4.8.2/functional:2071
#7  0x00007fb9fb31dce9 in dsn::task::exec_internal (this=this@entry=0x9a4597ac5) at /home/work/qinzuoyan/Pegasus/pegasus/rdsn/src/core/core/task.cpp:180
#8  0x00007fb9fb39e42d in dsn::task_worker::loop (this=0x23eb4a0) at /home/work/qinzuoyan/Pegasus/pegasus/rdsn/src/core/core/task_worker.cpp:211
#9  0x00007fb9fb39e5f9 in dsn::task_worker::run_internal (this=0x23eb4a0) at /home/work/qinzuoyan/Pegasus/pegasus/rdsn/src/core/core/task_worker.cpp:191
#10 0x00007fb9f8158600 in std::(anonymous namespace)::execute_native_thread_routine (__p=<optimized out>)
    at /home/qinzuoyan/git.xiaomi/pegasus/toolchain/objdir/../gcc-4.8.2/libstdc++-v3/src/c++11/thread.cc:84
#11 0x00007fb9f8dc5dc5 in start_thread () from /lib64/libpthread.so.0
#12 0x00007fb9f78c273d in clone () from /lib64/libc.so.6
@neverchanje added the type/bug label on Mar 29, 2019
@vagetablechicken
Contributor

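Tracking down this bug still requires RocksDB's symbol table, but RocksDB is currently always compiled in release mode.
Switching to debug mode drops the -O3 optimization, which may hurt performance.
RocksDB also has a RelWithDebInfo mode, defined as:

//Flags used by the CXX compiler during RELWITHDEBINFO builds.
CMAKE_CXX_FLAGS_RELWITHDEBINFO:STRING=-O2 -g -DNDEBUG

We could consider using this mode to get more information and help locate the bug.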

@neverchanje
Contributor Author

neverchanje commented Aug 30, 2019

c4srv-msg, c4-hadoop-pegasus-srv-st22, 2019/8/30

Coredump stack

(gdb) bt
#0  0x00000000008f0c95 in rocksdb::CopyFile(rocksdb::Env*, std::string const&, std::string const&, unsigned long, bool) ()
#1  0x0000000000901a43 in rocksdb::CheckpointImpl::CreateCheckpointQuick(std::string const&, unsigned long*) ()
#2  0x00000000006ea09a in pegasus::server::pegasus_server_impl::copy_checkpoint_to_dir_unsafe (this=this@entry=0x70513800, 
    checkpoint_dir=0x2da800d08 "/home/work/ssd5/pegasus/c4srv-msg/replica/reps/14.4.pegasus/backup/backup_tmp.every_day.1567108847214.1567108847817", 
    checkpoint_decree=0x7ffc890e4ff8) at /home/wutao1/pegasus-release/src/server/pegasus_server_impl.cpp:1949
#3  0x00000000006ea410 in pegasus::server::pegasus_server_impl::copy_checkpoint_to_dir (this=0x70513800, checkpoint_dir=<optimized out>, last_decree=<optimized out>)
    at /home/wutao1/pegasus-release/src/server/pegasus_server_impl.cpp:1921
#4  0x00007ffccec8a8e7 in dsn::replication::replica::local_create_backup_checkpoint (this=0x1d0f0900, backup_context=...)
    at /home/wutao1/pegasus-release/rdsn/src/dist/replication/lib/replica_backup.cpp:683
#5  0x00007ffccec8b004 in operator() (__closure=<optimized out>) at /home/wutao1/pegasus-release/rdsn/src/dist/replication/lib/replica_backup.cpp:653
#6  std::_Function_handler<void(), dsn::replication::replica::wait_async_checkpoint_for_backup(dsn::replication::cold_backup_context_ptr)::__lambda24>::_M_invoke(const std::_Any_data &) (__functor=...) at /home/wutao1/app/include/c++/4.8.2/functional:2071
#7  0x00007ffcceda0cd9 in dsn::task::exec_internal (this=this@entry=0x4e7c6c600) at /home/wutao1/pegasus-release/rdsn/src/core/core/task.cpp:180
#8  0x00007ffccedb4a6d in dsn::task_worker::loop (this=0x27fb290) at /home/wutao1/pegasus-release/rdsn/src/core/core/task_worker.cpp:211
#9  0x00007ffccedb4c39 in dsn::task_worker::run_internal (this=0x27fb290) at /home/wutao1/pegasus-release/rdsn/src/core/core/task_worker.cpp:191
#10 0x00007ffccbba7600 in std::(anonymous namespace)::execute_native_thread_routine (__p=<optimized out>)
    at /home/qinzuoyan/git.xiaomi/pegasus/toolchain/objdir/../gcc-4.8.2/libstdc++-v3/src/c++11/thread.cc:84
#11 0x00007ffccc81adc5 in start_thread () from /lib64/libpthread.so.0
#12 0x00007ffccb31173d in clone () from /lib64/libc.so.6

Server version

Pegasus Server 1.11.6 (9f4e5ae) release

RelWithDebInfo enabled.

@vagetablechicken
Contributor

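The released build should be BUILD_TYPE=release, right? See https://github.com/XiaoMi/pegasus/blob/master/run.sh#L232. So RocksDB should actually be a release build. I ran gdb on the host that has the core file and also got "no symbol table info available". If it were RelWithDebInfo, the symbols should be there.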
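
One way to settle which build type a build tree actually used is to read it back from CMake's cache. This is a minimal sketch, assuming the RocksDB build directory and its CMakeCache.txt are still available. The function names `cmake_cache_value` and `build_has_debug_flag` are hypothetical helpers, not part of Pegasus or RocksDB, and the check looks only at the per-build-type flags, not at the base CMAKE_CXX_FLAGS.

```shell
#!/bin/sh
# Print the value of one entry from a CMakeCache.txt file.
# Cache entries have the form NAME:TYPE=VALUE; comment lines start with // or #.
# Usage: cmake_cache_value <CMakeCache.txt> <NAME>
cmake_cache_value() {
    sed -n "s/^$2:[A-Z]*=//p" "$1"
}

# Succeed if the C++ flags for the configured build type contain an
# option starting with -g (for example -g, -g3 or -ggdb).
# Usage: build_has_debug_flag <CMakeCache.txt>
build_has_debug_flag() {
    # CMake names the per-type flag variable with the upper-cased build type.
    build_type=$(cmake_cache_value "$1" CMAKE_BUILD_TYPE | tr '[:lower:]' '[:upper:]')
    [ -n "$build_type" ] || return 1
    flags=$(cmake_cache_value "$1" "CMAKE_CXX_FLAGS_${build_type}")
    case " $flags " in
        *" -g"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, `build_has_debug_flag build/CMakeCache.txt && echo "built with debug info"` (the `build/` path is a placeholder). A binary built this way can still be stripped afterwards, so the cache only tells you what the compiler was asked to do.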

@foreverneverer
Contributor

c3srv-browser c3-hadoop-pegasus-srv-st161.bj, 2019/9/9

Coredump stack

(gdb) bt
#0  0x00000000008f0c95 in rocksdb::CopyFile(rocksdb::Env*, std::string const&, std::string const&, unsigned long, bool) ()
#1  0x0000000000901a43 in rocksdb::CheckpointImpl::CreateCheckpointQuick(std::string const&, unsigned long*) ()
#2  0x00000000006ea09a in pegasus::server::pegasus_server_impl::copy_checkpoint_to_dir_unsafe (this=this@entry=0x2b8098800, 
    checkpoint_dir=0x492b780a8 "/home/work/ssd5/pegasus/c3srv-browser/replica/reps/2.29.pegasus/backup/backup_tmp.every_day.1567971025551.1567971027178", checkpoint_decree=0x7fdbe8fe1ff8)
    at /home/wutao1/pegasus-release/src/server/pegasus_server_impl.cpp:1949
#3  0x00000000006ea410 in pegasus::server::pegasus_server_impl::copy_checkpoint_to_dir (this=0x2b8098800, checkpoint_dir=<optimized out>, last_decree=<optimized out>)
    at /home/wutao1/pegasus-release/src/server/pegasus_server_impl.cpp:1921
#4  0x00007fdc2f3888e7 in dsn::replication::replica::local_create_backup_checkpoint (this=0x323f06d00, backup_context=...)
    at /home/wutao1/pegasus-release/rdsn/src/dist/replication/lib/replica_backup.cpp:683
#5  0x00007fdc2f389004 in operator() (__closure=<optimized out>) at /home/wutao1/pegasus-release/rdsn/src/dist/replication/lib/replica_backup.cpp:653
#6  std::_Function_handler<void(), dsn::replication::replica::wait_async_checkpoint_for_backup(dsn::replication::cold_backup_context_ptr)::__lambda24>::_M_invoke(const std::_Any_data &) (__functor=...)
    at /home/wutao1/app/include/c++/4.8.2/functional:2071
#7  0x00007fdc2f49ecd9 in dsn::task::exec_internal (this=this@entry=0xf43baf2ee) at /home/wutao1/pegasus-release/rdsn/src/core/core/task.cpp:180
#8  0x00007fdc2f4b2a6d in dsn::task_worker::loop (this=0x2389340) at /home/wutao1/pegasus-release/rdsn/src/core/core/task_worker.cpp:211
#9  0x00007fdc2f4b2c39 in dsn::task_worker::run_internal (this=0x2389340) at /home/wutao1/pegasus-release/rdsn/src/core/core/task_worker.cpp:191
#10 0x00007fdc2c2a5600 in std::(anonymous namespace)::execute_native_thread_routine (__p=<optimized out>)
    at /home/qinzuoyan/git.xiaomi/pegasus/toolchain/objdir/../gcc-4.8.2/libstdc++-v3/src/c++11/thread.cc:84
#11 0x00007fdc2cf18dc5 in start_thread () from /lib64/libpthread.so.0
#12 0x00007fdc2ba0f73d in clone () from /lib64/libc.so.6

Server version

pegasus-server-1.11.6-9f4e5ae-glibc2.12-release

@vagetablechicken
Contributor

Let's build a debug version and upgrade as soon as possible.

@acelyc111
Member

> Let's build a debug version and upgrade as soon as possible.

Adding -g to the release build would work too, right?

@vagetablechicken
Contributor

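> > Let's build a debug version and upgrade as soon as possible.
>
> Adding -g to the release build would work too, right?

The goal at the time was to track down the bug, so I preferred a lower -O optimization level. Otherwise, with values showing up as <optimized out>, there is not much we can find out.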
