-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[docdb] Various race conditions when doing concurrent TRUNCATE and write operations #3288
Comments
See for the user setup test case (in Java): https://github.com/lhotari/yugabyte-bugs-repro/blob/master/src/test/java/com/github/lhotari/dbcontainer/yugabyte/truncatetable/AbstractTruncateTableCrashBugTest.java We should probably port this into our test suite as well and confirm it repros, as expected! Should make it far easier to debug and fix then! |
Another stack seen with same workload -
|
Original race condition stack -
|
Stack from TSAN test
|
Stack when new test
|
Jotting down a recent ASAN failure from master highlighting use-after-free issue post shutdown: https://detective-gcp.dev.yugabyte.com/job/github-yugabyte-db-centos-master-clang-asan%2F626%2Fartifact%2Fbuild%2Fasan-clang-dynamic-ninja%2Fyb-test-logs%2Ftests-client__snapshot-txn-test%2FSnapshotTxnTest_MultiWriteWithRestart.log?max_lines=3000&start_line=9001 I'll change this issue description to repurpose it to be TRUNCATE specific and spin off a separate issue for the background flush in rocksdb not being waited on during destructor. |
Stack from #3288 (comment) is the result of race between the following two accesses -
|
…condition between insert and truncate; disable rocksdb flush on truncate Summary: - Prevent rocksdb instance from calling `ListenFilesChanged` callback after being detached from `Tablet` Full stack available at - #3288 (comment) Follow Up Work: #3476 - Prevent race between `Tablet::AcquireLocksAndPerformDocOperations() -> Tablet::StartDocWriteOperation()` and `Tablet::Truncate()` by incrementing `pending_op_counter_` Full stack available at - #3288 (comment) - Do not flush rocksdb memtable when user truncates table to prevent following crash on flush - ``` #0 0x000056371c706b00 in ?? () #1 0x00007f5229291099 in std::__invoke_impl<void, void (yb::tablet::Tablet::*&)(), yb::tablet::Tablet*&> (__t=<optimized out>, __f=<optimized out>) at /usr/include/c++/7/bits/invoke.h:73 #2 std::__invoke<void (yb::tablet::Tablet::*&)(), yb::tablet::Tablet*&> (__fn=<optimized out>) at /usr/include/c++/7/bits/invoke.h:95 #3 std::_Bind<void (yb::tablet::Tablet::*(yb::tablet::Tablet*))()>::__call<void, , 0ul>(std::tuple<>&&, std::_Index_tuple<0ul>) (__args=..., this=<optimized out>) at /usr/include/c++/7/functional:467 #4 std::_Bind<void (yb::tablet::Tablet::*(yb::tablet::Tablet*))()>::operator()<, void>() (this=<optimized out>) at /usr/include/c++/7/functional:551 #5 std::_Function_handler<void (), std::_Bind<void (yb::tablet::Tablet::*(yb::tablet::Tablet*))()> >::_M_invoke(std::_Any_data const&) (__functor=...) at /usr/include/c++/7/bits/std_function.h:316 #6 0x00007f5225aee92c in std::function<void ()>::operator()() const (this=0x56371c5534f0) at /usr/include/c++/7/bits/std_function.h:706 #7 rocksdb::DBImpl::FilesChanged (this=this@entry=0x56371c552b00) at ../../src/yb/rocksdb/db/db_impl.cc:4359 #8 0x00007f5225b14b1b in rocksdb::DBImpl::BackgroundCallFlush (this=this@entry=0x56371c552b00, cfd=cfd@entry=0x0) at ../../src/yb/rocksdb/db/db_impl.cc:3285 ``` Full stack available at - #3288 (comment) - WORKAROUND for race condition (#3288 (comment)) by using `ScopedPendingOperation` in `Tablet::ShouldApplyWrite()` to prevent it from seeing a null `regular_db_` due to a concurrent `Tablet::Truncate()` Full stack available at - #3288 (comment) Follow Up Work: #3477 Test Plan: ./yb_build.sh debug --cxx-test pg_libpq-test --gtest_filter PgLibPqTest.ConcurrentInsertTruncateForeignKey ./yb_build.sh debug --cxx-test snapshot-txn-test --gtest_filter SnapshotTxnTest.MultiWriteWithRestart Reviewers: bogdan, sergei, mikhail Reviewed By: mikhail Subscribers: kannan, ybase Differential Revision: https://phabricator.dev.yugabyte.com/D7808
When executing concurrent write and truncate operations, we have noticed several potential race conditions, leading to SEGVs on accessing null pointers. Describing these below:
Truncate
, the Tablet code will shutdown rocksdb, reset its unique_ptr and re-create it. However, the Tablet also sets up a background listener callback throughListenFilesChanged
. Due to [docdb] Rocksdb shutdown code does not wait for all pending flushes #3476, it is possible that the rocksdb we are closing is currently flushing/compacting a file, and, when the file is created and the callback has to be called, our rocksdb instance has already been destroyed.To address this situation, we consider the following:
On the leader write path, when calling
AcquireLocksAndPerformDocOperations-> StartDocWriteOperation
, we eventually require accessing the intents rocksdb for resolving conflicts. However, this path does not leverage thepending_op_counter_
, which is what we currently use to ensure we do not operate on invalid state.We also noticed a race condition on followers, between processing write operations (checking rocksdb::NeedsDelay) and applying truncate operations (essentially resetting the rocksdb instance). This requires further investigation into our async model of processing and applying operations, so have filed [docdb] Investigate raft race between incoming write operations and background applies #3477. For now, we can use the same
pending_op_counter_
mechanism to ensure that the call toTablet::ShouldApplyWrite
is similarly serialized, like above.The text was updated successfully, but these errors were encountered: