data race in bucket merge path #4324

Closed
marta-lokhova opened this issue May 15, 2024 · 0 comments

It looks like core is racing on shutdown:

  • In ~ApplicationImpl, we shut down the main IO service before the background threads are joined.
  • As a result, a background thread may still spin up a merge, which uses the IO context, while the application is in a half-shutdown state.

Glancing at the code, the background thread uses Application, BucketManager, and the application's clock (specifically, its IO context, to create a new buffered stream, which seems to race with the main thread's IO context shutdown). It's not clear whether this also affects the normal merge path, so that is worth investigating as well.
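
To illustrate the ordering problem described above, here is a minimal, self-contained sketch of one way to avoid it: drain and join the background worker before shutting down anything it may touch. This is not stellar-core code and not necessarily what the eventual fix does. `MainContext`, `SketchApp`, and `runSketch` are hypothetical names, and the task queue is a plain standard-library stand-in for the asio worker context.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical stand-in for the clock's IO context that a merge uses.
struct MainContext
{
    std::atomic<bool> stopped{false};
    std::atomic<int> usesAfterStop{0};

    void
    use()
    {
        // A use after shutdown() models the racy access in the report.
        if (stopped.load())
        {
            ++usesAfterStop;
        }
    }

    void
    shutdown()
    {
        stopped.store(true);
    }
};

// Hypothetical application with a single background worker thread.
class SketchApp
{
  public:
    explicit SketchApp(MainContext& ctx)
        : mMain(ctx), mWorker([this] { run(); })
    {
    }

    ~SketchApp()
    {
        {
            std::lock_guard<std::mutex> lock(mMutex);
            mStopping = true; // refuse new work from here on
        }
        mCond.notify_all();
        mWorker.join();         // worker drains the queue, then exits
        assert(mQueue.empty()); // nothing can run after this point
        mMain.shutdown();       // safe: no background user remains
    }

    bool
    postOnBackgroundThread(std::function<void()> task)
    {
        {
            std::lock_guard<std::mutex> lock(mMutex);
            if (mStopping)
            {
                return false;
            }
            mQueue.push_back(std::move(task));
        }
        mCond.notify_one();
        return true;
    }

  private:
    void
    run()
    {
        for (;;)
        {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mMutex);
                mCond.wait(lock,
                           [this] { return mStopping || !mQueue.empty(); });
                if (mQueue.empty())
                {
                    return; // stopping and fully drained
                }
                task = std::move(mQueue.front());
                mQueue.pop_front();
            }
            task(); // run outside the lock
        }
    }

    MainContext& mMain;
    std::mutex mMutex;
    std::condition_variable mCond;
    std::deque<std::function<void()>> mQueue;
    bool mStopping{false};
    std::thread mWorker; // declared last so it starts after the rest
};

struct SketchResult
{
    int executed;
    int usesAfterStop;
};

// Posts `tasks` fake merges, destroys the app, and reports what happened.
SketchResult
runSketch(int tasks)
{
    MainContext ctx;
    std::atomic<int> executed{0};
    {
        SketchApp app(ctx);
        for (int i = 0; i < tasks; ++i)
        {
            app.postOnBackgroundThread([&ctx, &executed] {
                ctx.use();
                ++executed;
            });
        }
    } // ~SketchApp: drain, join, then shut down the context
    return {executed.load(), ctx.usesAfterStop.load()};
}
```

In this sketch, swapping the `join()` and `shutdown()` calls in the destructor would let a queued task observe a stopped context, which is the shape of the race reported here. An alternative remedy is to stop the background work from touching the main context at all.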

tsan output below:

==================
WARNING: ThreadSanitizer: data race on vptr (ctor/dtor vs virtual call) (pid=39872)
  Write of size 8 at 0x0001183ca818 by main thread:
    #0 stellar::ApplicationImpl::~ApplicationImpl() ApplicationImpl.cpp:669 (stellar-core:arm64+0x1003ba918)
    #1 stellar::ApplicationLoopbackOverlay::~ApplicationLoopbackOverlay() Simulation.h:136 (stellar-core:arm64+0x100c0fa84)
    #2 std::__1::__shared_ptr_emplace<stellar::ApplicationLoopbackOverlay, std::__1::allocator<stellar::ApplicationLoopbackOverlay>>::__on_zero_shared() shared_ptr.h:324 (stellar-core:arm64+0x100c0fa24)
    #3 stellar::Simulation::Node::~Node() Simulation.h:109 (stellar-core:arm64+0x100c0e3cc)
    #4 std::__1::__tree<std::__1::__value_type<stellar::PublicKey, stellar::Simulation::Node>, std::__1::__map_value_compare<stellar::PublicKey, std::__1::__value_type<stellar::PublicKey, stellar::Simulation::Node>, std::__1::less<stellar::PublicKey>, true>, std::__1::allocator<std::__1::__value_type<stellar::PublicKey, stellar::Simulation::Node>>>::destroy(std::__1::__tree_node<std::__1::__value_type<stellar::PublicKey, stellar::Simulation::Node>, void*>*) __tree:1811 (stellar-core:arm64+0x100c0ea88)
    #5 stellar::Simulation::~Simulation() Simulation.cpp:63 (stellar-core:arm64+0x100c07afc)
    #6 stellar::Simulation::~Simulation() Simulation.cpp:51 (stellar-core:arm64+0x100c07e88)
    #7 std::__1::__shared_ptr_emplace<stellar::Simulation, std::__1::allocator<stellar::Simulation>>::__on_zero_shared() shared_ptr.h:324 (stellar-core:arm64+0x1007ae434)
    #8 C_A_T_C_H_T_E_S_T_243() HerderTests.cpp:4717 (stellar-core:arm64+0x1007587f0)
    #9 C_A_T_C_H_T_E_S_T_243() HerderTests.cpp:4594 (stellar-core:arm64+0x1007562cc)
    #10 Catch::RunContext::invokeActiveTestCase() catch.hpp:13025 (stellar-core:arm64+0x100cc0d04)
    #11 Catch::RunContext::runCurrentTest(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>&) catch.hpp:12998 (stellar-core:arm64+0x100cbd8c8)
    #12 Catch::RunContext::runTest(Catch::TestCase const&) catch.hpp:12759 (stellar-core:arm64+0x100cbc834)
    #13 Catch::Session::runInternal() catch.hpp:13562 (stellar-core:arm64+0x100cc56d8)
    #14 Catch::Session::run() catch.hpp:13518 (stellar-core:arm64+0x100cc481c)
    #15 stellar::runTest(stellar::CommandLineArgs const&) test.cpp:438 (stellar-core:arm64+0x100cef370)
    #16 std::__1::__function::__func<int (*)(stellar::CommandLineArgs const&), std::__1::allocator<int (*)(stellar::CommandLineArgs const&)>, int (stellar::CommandLineArgs const&)>::operator()(stellar::CommandLineArgs const&) function.h:364 (stellar-core:arm64+0x100463fd4)
    #17 stellar::handleCommandLine(int, char* const*) CommandLine.cpp:1911 (stellar-core:arm64+0x100434f34)
    #18 stellar::handleCommandLine(int, char* const*) CommandLine.cpp:1823 (stellar-core:arm64+0x1004315f4)
    #19 <null> <null> (0x00019644a0e0)

  Previous read of size 8 at 0x0001183ca818 by thread T1177:
    #0 std::__1::__packaged_task_func<stellar::FutureBucket::startMerge(stellar::Application&, unsigned int, bool, unsigned int)::$_2, std::__1::allocator<stellar::FutureBucket::startMerge(stellar::Application&, unsigned int, bool, unsigned int)::$_2>, std::__1::shared_ptr<stellar::Bucket> ()>::operator()() future:1706 (stellar-core:arm64+0x1000d9e00)
    #1 std::__1::packaged_task<std::__1::shared_ptr<stellar::Bucket> ()>::operator()() future:1969 (stellar-core:arm64+0x1000d7c30)
    #2 std::__1::__function::__func<std::__1::__bind<void (std::__1::packaged_task<std::__1::shared_ptr<stellar::Bucket> ()>::*)(), std::__1::shared_ptr<std::__1::packaged_task<std::__1::shared_ptr<stellar::Bucket> ()>>&>, std::__1::allocator<std::__1::__bind<void (std::__1::packaged_task<std::__1::shared_ptr<stellar::Bucket> ()>::*)(), std::__1::shared_ptr<std::__1::packaged_task<std::__1::shared_ptr<stellar::Bucket> ()>>&>>, void ()>::operator()() function.h:364 (stellar-core:arm64+0x1000dae3c)
    #3 asio::detail::executor_op<asio::detail::binder0<stellar::ApplicationImpl::postOnBackgroundThread(std::__1::function<void ()>&&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>)::$_45>, std::__1::allocator<void>, asio::detail::scheduler_operation>::do_complete(void*, asio::detail::scheduler_operation*, std::__1::error_code const&, unsigned long) executor_op.hpp:70 (stellar-core:arm64+0x1003c4ff4)
    #4 asio::detail::scheduler::do_run_one(asio::detail::conditionally_enabled_mutex::scoped_lock&, asio::detail::scheduler_thread_info&, std::__1::error_code const&) scheduler.ipp:492 (stellar-core:arm64+0x10142e7f0)
    #5 asio::detail::scheduler::run(std::__1::error_code&) scheduler.ipp:209 (stellar-core:arm64+0x10141ddcc)
    #6 asio::io_context::run() io_context.ipp:63 (stellar-core:arm64+0x10141dbf8)
    #7 void* std::__1::__thread_proxy[abi:ue170006]<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, stellar::ApplicationImpl::ApplicationImpl(stellar::VirtualClock&, stellar::Config const&)::$_49>>(void*) thread.h:238 (stellar-core:arm64+0x1003c6264)

  Location is heap block of size 2032 at 0x0001183ca800 allocated by main thread:
    #0 operator new(unsigned long) <null>:60804740 (libclang_rt.tsan_osx_dynamic.dylib:arm64e+0x84420)
    #1 std::__1::shared_ptr<stellar::ApplicationLoopbackOverlay> stellar::Application::create<stellar::ApplicationLoopbackOverlay, stellar::Simulation&>(stellar::VirtualClock&, stellar::Config const&, stellar::Simulation&, bool, bool) Application.h:316 (stellar-core:arm64+0x100c0f764)
    #2 stellar::Simulation::addNode(stellar::SecretKey, stellar::SCPQuorumSet, stellar::Config const*, bool, unsigned int, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&) Simulation.cpp:140 (stellar-core:arm64+0x100c08580)
    #3 C_A_T_C_H_T_E_S_T_243() HerderTests.cpp:4611 (stellar-core:arm64+0x10075665c)
    #4 C_A_T_C_H_T_E_S_T_243() HerderTests.cpp:4594 (stellar-core:arm64+0x1007562cc)
    #5 Catch::RunContext::invokeActiveTestCase() catch.hpp:13025 (stellar-core:arm64+0x100cc0d04)
    #6 Catch::RunContext::runCurrentTest(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>&) catch.hpp:12998 (stellar-core:arm64+0x100cbd8c8)
    #7 Catch::RunContext::runTest(Catch::TestCase const&) catch.hpp:12759 (stellar-core:arm64+0x100cbc834)
    #8 Catch::Session::runInternal() catch.hpp:13562 (stellar-core:arm64+0x100cc56d8)
    #9 Catch::Session::run() catch.hpp:13518 (stellar-core:arm64+0x100cc481c)
    #10 stellar::runTest(stellar::CommandLineArgs const&) test.cpp:438 (stellar-core:arm64+0x100cef370)
    #11 std::__1::__function::__func<int (*)(stellar::CommandLineArgs const&), std::__1::allocator<int (*)(stellar::CommandLineArgs const&)>, int (stellar::CommandLineArgs const&)>::operator()(stellar::CommandLineArgs const&) function.h:364 (stellar-core:arm64+0x100463fd4)
    #12 stellar::handleCommandLine(int, char* const*) CommandLine.cpp:1911 (stellar-core:arm64+0x100434f34)
    #13 stellar::handleCommandLine(int, char* const*) CommandLine.cpp:1823 (stellar-core:arm64+0x1004315f4)
    #14 <null> <null> (0x00019644a0e0)

  Thread T1177 (tid=24080104, running) created by main thread at:
    #0 pthread_create <null>:60804740 (libclang_rt.tsan_osx_dynamic.dylib:arm64e+0x3062c)
    #1 stellar::ApplicationImpl::ApplicationImpl(stellar::VirtualClock&, stellar::Config const&) ApplicationImpl.cpp:178 (stellar-core:arm64+0x1003b3e50)
    #2 stellar::TestApplication::TestApplication(stellar::VirtualClock&, stellar::Config const&) TestUtils.cpp:156 (stellar-core:arm64+0x100c74260)
    #3 std::__1::shared_ptr<stellar::ApplicationLoopbackOverlay> stellar::Application::create<stellar::ApplicationLoopbackOverlay, stellar::Simulation&>(stellar::VirtualClock&, stellar::Config const&, stellar::Simulation&, bool, bool) Application.h:316 (stellar-core:arm64+0x100c0f7e4)
    #4 stellar::Simulation::addNode(stellar::SecretKey, stellar::SCPQuorumSet, stellar::Config const*, bool, unsigned int, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&) Simulation.cpp:140 (stellar-core:arm64+0x100c08580)
    #5 C_A_T_C_H_T_E_S_T_243() HerderTests.cpp:4611 (stellar-core:arm64+0x10075665c)
    #6 C_A_T_C_H_T_E_S_T_243() HerderTests.cpp:4594 (stellar-core:arm64+0x1007562cc)
    #7 Catch::RunContext::invokeActiveTestCase() catch.hpp:13025 (stellar-core:arm64+0x100cc0d04)
    #8 Catch::RunContext::runCurrentTest(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>&) catch.hpp:12998 (stellar-core:arm64+0x100cbd8c8)
    #9 Catch::RunContext::runTest(Catch::TestCase const&) catch.hpp:12759 (stellar-core:arm64+0x100cbc834)
    #10 Catch::Session::runInternal() catch.hpp:13562 (stellar-core:arm64+0x100cc56d8)
    #11 Catch::Session::run() catch.hpp:13518 (stellar-core:arm64+0x100cc481c)
    #12 stellar::runTest(stellar::CommandLineArgs const&) test.cpp:438 (stellar-core:arm64+0x100cef370)
    #13 std::__1::__function::__func<int (*)(stellar::CommandLineArgs const&), std::__1::allocator<int (*)(stellar::CommandLineArgs const&)>, int (stellar::CommandLineArgs const&)>::operator()(stellar::CommandLineArgs const&) function.h:364 (stellar-core:arm64+0x100463fd4)
    #14 stellar::handleCommandLine(int, char* const*) CommandLine.cpp:1911 (stellar-core:arm64+0x100434f34)
    #15 stellar::handleCommandLine(int, char* const*) CommandLine.cpp:1823 (stellar-core:arm64+0x1004315f4)
    #16 <null> <null> (0x00019644a0e0)

SUMMARY: ThreadSanitizer: data race on vptr (ctor/dtor vs virtual call) ApplicationImpl.cpp:669 in stellar::ApplicationImpl::~ApplicationImpl()
==================
latobarita added a commit that referenced this issue May 22, 2024
…ket-merge

Fix #4324 by switching io_context in merge, also reduce contact with app

Reviewed-by: marta-lokhova