Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DocDB] TSAN race condition in YBTransaction::Impl::PrepareHeartbeatRPC #17957

Closed
1 task done
mbautin opened this issue Jun 27, 2023 · 6 comments
Closed
1 task done
Assignees
Labels
area/docdb YugabyteDB core features kind/bug This issue is a bug kind/failing-test Tests and testing infra priority/medium Medium priority issue

Comments

@mbautin
Copy link
Contributor

mbautin commented Jun 27, 2023

Jira Link: DB-7026

Description

https://gist.githubusercontent.com/mbautin/55aecd59151b584cd1c7073d2ceffe84/raw

./yb_build.sh tsan --cxx-test pg_single_tserver-test --gtest_filter PgSingleTServerTest.ScanWithPackedRow

Warning: Please confirm that this issue does not contain any sensitive information

  • I confirm this issue does not contain any sensitive information.
@mbautin mbautin added area/docdb YugabyteDB core features status/awaiting-triage Issue awaiting triage labels Jun 27, 2023
@yugabyte-ci yugabyte-ci added kind/bug This issue is a bug priority/medium Medium priority issue labels Jun 27, 2023
@rthallamko3
Copy link
Contributor

rthallamko3 commented Jun 27, 2023

WARNING: ThreadSanitizer: data race (pid=15463)
  Read of size 8 at 0x7b6000190410 by thread T74:
    #0 yb::client::YBTransaction::Impl::PrepareHeartbeatRPC(std::chrono::time_point<yb::CoarseMonoClock, std::chrono::duration<long long, std::ratio<1l, 1000000000l>>>, scoped_refptr<yb::client::internal::RemoteTablet> const&, yb::TransactionStatus, std::function<void (yb::Status const&, yb::tserver::UpdateTransactionRequestPB const&, yb::tserver::UpdateTransactionResponsePB const&)>, std::optional<yb::UnsignedIntSet<unsigned int>>, std::unordered_map<string, yb::client::YBTransaction::Impl::TabletState, std::hash<string>, std::equal_to<string>, std::allocator<std::pair<string const, yb::client::YBTransaction::Impl::TabletState>>> const&) ${BUILD_ROOT}/../../src/yb/client/transaction.cc:1148:9 (libyb_client.so+0x4a4c75)
    #1 yb::client::YBTransaction::Impl::SendHeartbeat(yb::TransactionStatus, yb::StronglyTypedUuid<yb::TransactionId_Tag> const&, std::weak_ptr<yb::client::YBTransaction> const&, yb::StronglyTypedBool<yb::client::(anonymous namespace)::SendHeartbeatToNewTablet_Tag>) ${BUILD_ROOT}/../../src/yb/client/transaction.cc:1725:13 (libyb_client.so+0x484823)
    #2 yb::client::YBTransaction::Impl::HeartbeatDone(yb::Status, yb::tserver::UpdateTransactionRequestPB const&, yb::tserver::UpdateTransactionResponsePB const&, yb::TransactionStatus, std::shared_ptr<yb::client::YBTransaction> const&, yb::StronglyTypedBool<yb::client::(anonymous namespace)::SendHeartbeatToNewTablet_Tag>)::'lambda'(yb::Status const&)::operator()(yb::Status const&) const ${BUILD_ROOT}/../../src/yb/client/transaction.cc:1810:17 (libyb_client.so+0x48618b)
    #3 yb::rpc::ScheduledTask<yb::client::YBTransaction::Impl::HeartbeatDone(yb::Status, yb::tserver::UpdateTransactionRequestPB const&, yb::tserver::UpdateTransactionResponsePB const&, yb::TransactionStatus, std::shared_ptr<yb::client::YBTransaction> const&, yb::StronglyTypedBool<yb::client::(anonymous namespace)::SendHeartbeatToNewTablet_Tag>)::'lambda'(yb::Status const&)>::Run(yb::Status const&) ${BUILD_ROOT}/../../src/yb/rpc/scheduler.h:55:5 (libyb_client.so+0x48618b)
    #4 yb::rpc::Scheduler::Impl::HandleTimer(boost::system::error_code const&)::'lambda'()::operator()() const ${BUILD_ROOT}/../../src/yb/rpc/scheduler.cc:142:57 (libyrpc.so+0x1693d9)
    #5 void boost::asio::asio_handler_invoke<yb::rpc::Scheduler::Impl::HandleTimer(boost::system::error_code const&)::'lambda'()>(yb::rpc::Scheduler::Impl::HandleTimer(boost::system::error_code const&)::'lambda'()&, ...) /opt/yb-build/thirdparty/yugabyte-db-thirdparty-v20230621185651-6777477baa-centos7-x86_64-clang15/installed/tsan/include/boost/asio/handler_invoke_hook.hpp:88:3 (libyrpc.so+0x1693d9)
    #6 void boost_asio_handler_invoke_helpers::invoke<yb::rpc::Scheduler::Impl::HandleTimer(boost::system::error_code const&)::'lambda'()

  Previous write of size 8 at 0x7b6000190410 by thread T130 (mutexes: write M0):
    #0 yb::client::YBTransaction::Impl::CompleteInit(yb::IsolationLevel) ${BUILD_ROOT}/../../src/yb/client/transaction.cc:1080:12 (libyb_client.so+0x4a657b)
    #1 yb::client::YBTransaction::Impl::Init(yb::IsolationLevel, yb::ReadHybridTime const&) ${BUILD_ROOT}/../../src/yb/client/transaction.cc:314:5 (libyb_client.so+0x489705)
    #2 yb::client::YBTransaction::Init(yb::IsolationLevel, yb::ReadHybridTime const&) ${BUILD_ROOT}/../../src/yb/client/transaction.cc:2228:17 (libyb_client.so+0x480b6a)
    #3 yb::tserver::PgClientSession::BeginTransactionIfNecessary(yb::tserver::PgPerformOptionsPB const&, std::chrono::time_point<yb::CoarseMonoClock, std::chrono::duration<long long, std::ratio<1l, 1000000000l>>>) ${BUILD_ROOT}/../../src/yb/tserver/pg_client_session.cc:1214:5 (libtserver.so+0x1deeb0)
    #4 yb::tserver::PgClientSession::SetupSession(yb::tserver::PgPerformRequestPB const&, std::chrono::time_point<yb::CoarseMonoClock, std::chrono::duration<long long, std::ratio<1l, 1000000000l>>>, yb::HybridTime) ${BUILD_ROOT}/../../src/yb/tserver/pg_client_session.cc:1078:5 (libtserver.so+0x1e16b4)
    #5 yb::Status yb::tserver::PgClientSession::DoPerform<std::shared_ptr<yb::tserver::RpcPerformQuery>>(std::shared_ptr<yb::tserver::RpcPerformQuery> const&, std::chrono::time_point<yb::CoarseMonoClock, std::chrono::duration<long long, std::ratio<1l, 1000000000l>>>, yb::rpc::RpcContext*) ${BUILD_ROOT}/../../src/yb/tserver/pg_client_session.cc:943:23 (libtserver.so+0x1f02cf)
    #6 yb::tserver::PgClientSession::Perform(yb::tserver::PgPerformRequestPB*, yb::tserver::PgPerformResponsePB*, yb::rpc::RpcContext*) ${BUILD_ROOT}/../../src/yb/tserver/pg_client_session.cc:911:17 (libtserver.so+0x1e0839)

@yugabyte-ci yugabyte-ci added kind/failing-test Tests and testing infra and removed status/awaiting-triage Issue awaiting triage labels Jun 27, 2023
@mbautin
Copy link
Contributor Author

mbautin commented Jun 27, 2023

The start_ field is set in CompleteInit but it is also used when preparing the heartbeat RPC. When the transaction is taken from a pool, it is already periodically sending heartbeat RPCs to the status tablet.

@mbautin
Copy link
Contributor Author

mbautin commented Jun 27, 2023

It would be good to add proper GUARDED_BY thread safety analysis annotation to this field.

@rthallamko3 rthallamko3 added priority/high High Priority and removed priority/medium Medium priority issue labels Jun 27, 2023
@bmatican
Copy link
Contributor

Potentially related issue in https://detective-gcp.dev.yugabyte.com/stability/test?branch=master&build_type=all&class=SealTxnTest&fail_tag=asan_heap_use_after_free&name=Update&platform=linux

==22080==ERROR: AddressSanitizer: heap-use-after-free on address 0x614000510110 at pc 0x7fa9b1e8d1d5 bp 0x7fa8d3fcb190 sp 0x7fa8d3fcb188
READ of size 8 at 0x614000510110 thread T234 (rpc_tp_TabletSe)
#0 0x7fa9b1e8d1d4 in std::shared_ptr::operator->[abi:v160006]() const /opt/yb-build/thirdparty/yugabyte-db-thirdparty-v20230621185529-6777477baa-almalinux8-x86_64-clang16/installed/asan/libcxx/include/c++/v1/__memory/shared_ptr.h:857:16
#1 0x7fa9b1e8d1d4 in yb::tablet::TransactionCoordinator::Impl::GetStatus(google::protobuf::RepeatedPtrField const&, std::chrono::time_point>>, yb::tserver::GetTransactionStatusResponsePB*) ${BUILD_ROOT}/../../src/yb/tablet/transaction_coordinator.cc:1204:37
...

freed by thread T262 (consensus [work) here:
#0 0x5555a9b28edd in operator delete(void*) /opt/yb-build/llvm/yb-llvm-v16.0.6-yb-1-1687337167-5c765d34-almalinux8-x86_64-build/src/llvm-project/compiler-rt/lib/asan/asan_new_delete.cpp:152:3
#1 0x7fa9b1e6ad22 in void std::__libcpp_operator_delete[abi:v160006](void*) /opt/yb-build/thirdparty/yugabyte-db-thirdparty-v20230621185529-6777477baa-almalinux8-x86_64-clang16/installed/asan/libcxx/include/c++/v1/new:276:3
#2 0x7fa9b1e6ad22 in void std::__do_deallocate_handle_size[abi:v160006]<>(void*, unsigned long) /opt/yb-build/thirdparty/yugabyte-db-thirdparty-v20230621185529-6777477baa-almalinux8-x86_64-clang16/installed/asan/libcxx/include/c++/v1/new:300:10
#3 0x7fa9b1e6ad22 in std::__libcpp_deallocate[abi:v160006](void*, unsigned long, unsigned long) /opt/yb-build/thirdparty/yugabyte-db-thirdparty-v20230621185529-6777477baa-almalinux8-x86_64-clang16/installed/asan/libcxx/include/c++/v1/new:316:14
#4 0x7fa9b1e6ad22 in std::allocator>>>>>>::deallocate[abi:v160006](boost::multi_index::detail::hashed_index_node>>>>>*, unsigned long) /opt/yb-build/thirdparty/yugabyte-db-thirdparty-v20230621185529-6777477baa-almalinux8-x86_64-clang16/installed/asan/libcxx/include/c++/v1/__memory/allocator.h:131:13
#5 0x7fa9b1e6ad22 in std::allocator_traits>>>>>>>::deallocate[abi:v160006](std::allocator>>>>>>&, boost::multi_index::detail::hashed_index_node>>>>>*, unsigned long) /opt/yb-build/thirdparty/yugabyte-db-thirdparty-v20230621185529-6777477baa-almalinux8-x86_64-clang16/installed/asan/libcxx/include/c++/v1/__memory/allocator_traits.h:288:13
#6 0x7fa9b1e6ad22 in boost::multi_index::multi_index_container const&, &yb::tablet::(anonymous namespace)::TransactionState::id() const>, mpl_::na, mpl_::na, mpl_::na>, boost::multi_index::ordered_non_unique, boost::multi_index::const_mem_fun, mpl_::na>, boost::multi_index::ordered_non_unique, boost::multi_index::const_mem_fun, mpl_::na>, boost::multi_index::ordered_non_unique, boost::multi_index::const_mem_fun, mpl_::na>, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, std::allocator>::deallocate_node(boost::multi_index::detail::hashed_index_node>>>>>*) /opt/yb-build/thirdparty/yugabyte-db-thirdparty-v20230621185529-6777477baa-almalinux8-x86_64-clang16/installed/asan/include/boost/multi_index_container.hpp:645:5
#7 0x7fa9b1e6ad22 in boost::multi_index::multi_index_container const&, &yb::tablet::(anonymous namespace)::TransactionState::id() const>, mpl_::na, mpl_::na, mpl_::na>, boost::multi_index::ordered_non_unique, boost::multi_index::const_mem_fun, mpl_::na>, boost::multi_index::ordered_non_unique, boost::multi_index::const_mem_fun, mpl_::na>, boost::multi_index::ordered_non_unique, boost::multi_index::const_mem_fun, mpl_::na>, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, std::allocator>::delete_node_(boost::multi_index::detail::hashed_index_node>>>>>*) /opt/yb-build/thirdparty/yugabyte-db-thirdparty-v20230621185529-6777477baa-almalinux8-x86_64-clang16/installed/asan/include/boost/multi_index_container.hpp:956:5
#8 0x7fa9b1e6ad22 in boost::multi_index::multi_index_container const&, &yb::tablet::(anonymous namespace)::TransactionState::id() const>, mpl_::na, mpl_::na, mpl_::na>, boost::multi_index::ordered_non_unique, boost::multi_index::const_mem_fun, mpl_::na>, boost::multi_index::ordered_non_unique, boost::multi_index::const_mem_fun, mpl_::na>, boost::multi_index::ordered_non_unique, boost::multi_index::const_mem_fun, mpl_::na>, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, std::allocator>::erase_(boost::multi_index::detail::hashed_index_node>>>>>*) /opt/yb-build/thirdparty/yugabyte-db-thirdparty-v20230621185529-6777477baa-almalinux8-x86_64-clang16/installed/asan/include/boost/multi_index_container.hpp:950:5
#9 0x7fa9b1e6ad22 in boost::multi_index::detail::index_base const&, &yb::tablet::(anonymous namespace)::TransactionState::id() const>, mpl_::na, mpl_::na, mpl_::na>, boost::multi_index::ordered_non_unique, boost::multi_index::const_mem_fun, mpl_::na>, boost::multi_index::ordered_non_unique, boost::multi_index::const_mem_fun, mpl_::na>, boost::multi_index::ordered_non_unique, boost::multi_index::const_mem_fun, mpl_::na>, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, std::allocator>::final_erase_(boost::multi_index::detail::hashed_index_node>>>>>*) /opt/yb-build/thirdparty/yugabyte-db-thirdparty-v20230621185529-6777477baa-almalinux8-x86_64-clang16/installed/asan/include/boost/multi_index/detail/index_base.hpp:288:49
#10 0x7fa9b1e6ad22 in boost::multi_index::detail::hashed_index const&, &yb::tablet::(anonymous namespace)::TransactionState::id() const>, boost::hash const>, std::equal_to const>, boost::multi_index::detail::nth_layer<1, yb::tablet::(anonymous namespace)::TransactionState, boost::multi_index::indexed_by const&, &yb::tablet::(anonymous namespace)::TransactionState::id() const>, mpl_::na, mpl_::na, mpl_::na>, boost::multi_index::ordered_non_unique, boost::multi_index::const_mem_fun, mpl_::na>, boost::multi_index::ordered_non_unique, boost::multi_index::const_mem_fun, mpl_::na>, boost::multi_index::ordered_non_unique, boost::multi_index::const_mem_fun, mpl_::na>, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, std::allocator>, boost::mpl::vector0, boost::multi_index::detail::hashed_unique_tag>::erase(boost::multi_index::detail::hashed_index_iterator>>>>>, boost::multi_index::detail::bucket_array>, boost::multi_index::detail::hashed_unique_tag, boost::multi_index::detail::hashed_index_global_iterator_tag>) /opt/yb-build/thirdparty/yugabyte-db-thirdparty-v20230621185529-6777477baa-almalinux8-x86_64-clang16/installed/asan/include/boost/multi_index/hashed_index.hpp:373:11
#11 0x7fa9b1e6ad22 in yb::tablet::TransactionCoordinator::Impl::CheckCompleted(boost::multi_index::detail::hashed_index_iterator>>>>>, boost::multi_index::detail::bucket_array>, boost::multi_index::detail::hashed_unique_tag, boost::multi_index::detail::hashed_index_global_iterator_tag>) ${BUILD_ROOT}/../../src/yb/tablet/transaction_coordinator.cc:1868:29
#12 0x7fa9b1e801df in yb::tablet::TransactionCoordinator::Impl::ProcessReplicated(yb::tablet::TransactionCoordinator::ReplicatedData const&) ${BUILD_ROOT}/../../src/yb/tablet/transaction_coordinator.cc:1379:7
#13 0x7fa9b1e57c65 in yb::tablet::TransactionCoordinator::ProcessReplicated(yb::tablet::TransactionCoordinator::ReplicatedData const&) ${BUILD_ROOT}/../../src/yb/tablet/transaction_coordinator.cc:1934:17
#14 0x7fa9b1ff84fa in yb::tablet::UpdateTxnOperation::DoReplicated(long, yb::Status*) ${BUILD_ROOT}/../../src/yb/tablet/operations/update_txn_operation.cc:77:45
#15 0x7fa9b1f63dd0 in yb::tablet::Operation::Replicated(long, yb::StronglyTypedBool) ${BUILD_ROOT}/../../src/yb/tablet/operations/operation.cc:82:3
#16 0x7fa9b1fa6fe0 in yb::tablet::OperationDriver::ApplyTask(long, std::vector>*) ${BUILD_ROOT}/../../src/yb/tablet/operations/operation_driver.cc:404:31
#17 0x7fa9b1fa643b in yb::tablet::OperationDriver::ReplicationFinished(yb::Status const&, long, std::vector>*)
...

I wonder if this is something specific to this test (as we don't use the txn sealing feature), or if it's just a generic issue.

@mbautin
Copy link
Contributor Author

mbautin commented Jun 28, 2023

@bmatican I think ASAN error you posted is probably unrelated, because it is in TransactionCoordinator, while the stack trace in this issue is in YBTransaction.

@bmatican
Copy link
Contributor

Ok, my bad Mikhail. I took it out into a standalone GH issue: #17983

@yugabyte-ci yugabyte-ci added priority/medium Medium priority issue and removed priority/high High Priority labels Jul 12, 2023
es1024 added a commit that referenced this issue Jul 13, 2023
…reHeartbeatRPC

Summary:
This diff fixes a race condition introduced in 320cffb/D25351 between setting the `start_`
variable (time transaction was taken from the transaction pool) and reading the `start_` variable
during heartbeat in order to send it to the transaction coordinator for `GetOldTransactions` RPC
support by making `start_` atomic.

The value is only used for `GetOldTransactions`, was already not sent when `start_` has not been set
yet, and transaction coordinator code already expects cases where it has not been set, so aside
from the possibility of a torn read, this is a race condition with fairly isolated impact.
Jira: DB-7026

Test Plan: `ybd tsan --cxx-test pg_single_tserver-test --gtest_filter PgSingleTServerTest.ScanWithPackedRow`

Reviewers: sergei

Reviewed By: sergei

Subscribers: rsami, ybase, mbautin

Differential Revision: https://phorge.dev.yugabyte.com/D26741
@es1024 es1024 closed this as completed Jul 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/docdb YugabyteDB core features kind/bug This issue is a bug kind/failing-test Tests and testing infra priority/medium Medium priority issue
Projects
None yet
Development

No branches or pull requests

5 participants