[docdb] race condition on tablet shutdown with accessing rocksdb instance #6960
Labels
area/build-framework
Building and packaging third-party dependencies (repo: yugabyte/yugabyte-db-thirdparty)
area/docdb
YugabyteDB core features
kind/bug
This issue is a bug
Comments
robertsami added a commit that referenced this issue on Feb 9, 2021

… in StillHasParentDataAfterSplit

Summary:
We noticed tsan failures caused by concurrent access of RocksDB pointers on a tablet by the heartbeater while said RocksDB pointers were being invalidated (by a TRUNCATE, DROP, or other shutdown which might invoke ResetRocksDBs).

This diff synchronizes the Tablet::StillHasParentDataAfterSplit with the existing ScopedRWOperationPause invoked by TRUNCATE/DROP/ALTER operations by registering a ScopedRWOperation in the method call. In order to make this change, the signature of StillHasParentDataAfterSplit was also changed to return a Result<bool>, which then triggered some slightly more complex changes. Specifically:
1. We expose a new DefinitelyHasNoParentDataAfterSplit which returns false if StillHasParentDataAfterSplit fails or is true
2. ts_tablet_manager.cc uses DefinitelyHasNoParentDataAfterSplit to maintain a list of tablets which "maybe" should not be moved by the load balancer, and reports this list to the LB conservatively including any tablets for which DefinitelyHasNoParentDataAfterSplit returned false, which may include cases where StillHasParentDataAfterSplit simply failed
3. TabletSplitHeartbeatDataProvider will skip reporting tablets for which DefinitelyHasNoParentDataAfterSplit is false

Test Plan:
Try running tests which were triggering TSAN failures before:
`ybd tsan --cxx-test client_snapshot-txn-test --gtest_filter SnapshotTxnTest.TruncateDuringShutdown -n 100`
`ybd tsan --cxx-test client_backup-txn-test --gtest_filter BackupTxnTest.Simple -n 100`

Reviewers: timur, nicolas
Reviewed By: timur, nicolas
Subscribers: mpolitov, sergei, nicolas, ybase, timur, bogdan
Differential Revision: https://phabricator.dev.yugabyte.com/D10457
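
The synchronization described in the summary can be illustrated with a minimal, self-contained sketch. This is not YugabyteDB's actual `ScopedRWOperation` or `ScopedRWOperationPause` code. `OperationCounter`, `ScopedOperation`, `SketchTablet`, `FakeDb`, and `CheckResult` are hypothetical names invented for illustration, and `CheckResult` only stands in for `Result<bool>`. The idea shown is that a reader registers itself before touching the database pointer, and the shutdown path blocks new readers and waits for in-flight ones before resetting the pointer.

```cpp
#include <cassert>
#include <condition_variable>
#include <memory>
#include <mutex>

// Hypothetical stand-in for the tablet's RocksDB instance.
struct FakeDb {
  bool has_parent_data = true;
};

// Hypothetical stand-in for Result<bool>: ok == false means the check could not run.
struct CheckResult {
  bool ok;
  bool value;
};

// Counts in-flight operations and lets a shutdown path block new ones.
class OperationCounter {
 public:
  // Registers an operation; fails if a pause is in effect.
  bool TryBegin() {
    std::lock_guard<std::mutex> lock(mu_);
    if (paused_) return false;
    ++active_;
    return true;
  }

  void End() {
    std::lock_guard<std::mutex> lock(mu_);
    assert(active_ > 0);
    if (--active_ == 0) drained_.notify_all();
  }

  // Rejects new operations, then waits until in-flight ones have finished.
  void PauseAndDrain() {
    std::unique_lock<std::mutex> lock(mu_);
    paused_ = true;
    drained_.wait(lock, [this] { return active_ == 0; });
  }

 private:
  std::mutex mu_;
  std::condition_variable drained_;
  int active_ = 0;
  bool paused_ = false;
};

// RAII registration, loosely analogous in spirit to a scoped read/write operation.
class ScopedOperation {
 public:
  explicit ScopedOperation(OperationCounter* counter)
      : counter_(counter->TryBegin() ? counter : nullptr) {}
  ~ScopedOperation() {
    if (counter_ != nullptr) counter_->End();
  }
  ScopedOperation(const ScopedOperation&) = delete;
  ScopedOperation& operator=(const ScopedOperation&) = delete;

  bool ok() const { return counter_ != nullptr; }

 private:
  OperationCounter* counter_;
};

class SketchTablet {
 public:
  // Reads the database only while registered, so Shutdown() cannot reset it mid-read.
  CheckResult StillHasParentDataAfterSplit() {
    ScopedOperation op(&counter_);
    if (!op.ok()) return {false, false};  // shutting down: report failure
    return {true, db_->has_parent_data};
  }

  // Analogous to a TRUNCATE/DROP path that invalidates the database pointer.
  void Shutdown() {
    counter_.PauseAndDrain();
    db_.reset();
  }

 private:
  OperationCounter counter_;
  std::unique_ptr<FakeDb> db_{new FakeDb()};
};
```

In this sketch, a call to `StillHasParentDataAfterSplit()` after `Shutdown()` returns a failed `CheckResult` instead of dereferencing a reset pointer, which is why the return type has to be able to carry a failure.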
robertsami added a commit that referenced this issue on Feb 23, 2021

…ing doc_db data in StillHasParentDataAfterSplit

Summary:
We noticed tsan failures caused by concurrent access of RocksDB pointers on a tablet by the heartbeater while said RocksDB pointers were being invalidated (by a TRUNCATE, DROP, or other shutdown which might invoke ResetRocksDBs).

This diff synchronizes the Tablet::StillHasParentDataAfterSplit with the existing ScopedRWOperationPause invoked by TRUNCATE/DROP/ALTER operations by registering a ScopedRWOperation in the method call. In order to make this change, the signature of StillHasParentDataAfterSplit was also changed to return a Result<bool>, which then triggered some slightly more complex changes. Specifically:
1. We expose a new DefinitelyHasNoParentDataAfterSplit which returns false if StillHasParentDataAfterSplit fails or is true
2. ts_tablet_manager.cc uses DefinitelyHasNoParentDataAfterSplit to maintain a list of tablets which "maybe" should not be moved by the load balancer, and reports this list to the LB conservatively including any tablets for which DefinitelyHasNoParentDataAfterSplit returned false, which may include cases where StillHasParentDataAfterSplit simply failed
3. TabletSplitHeartbeatDataProvider will skip reporting tablets for which DefinitelyHasNoParentDataAfterSplit is false

Original commit: D10457 / e6c3c57

Test Plan:
Jenkins: rebase: 2.4
Try running tests which were triggering TSAN failures before:
`ybd tsan --cxx-test client_snapshot-txn-test --gtest_filter SnapshotTxnTest.TruncateDuringShutdown -n 100`
`ybd tsan --cxx-test client_backup-txn-test --gtest_filter BackupTxnTest.Simple -n 100`

Reviewers: timur, nicolas, bogdan
Reviewed By: bogdan
Subscribers: mpolitov, ybase, bogdan
Differential Revision: https://phabricator.dev.yugabyte.com/D10676
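
The conservative reporting rule in points 1 and 2 of the summary can be sketched as follows. This is an illustration under assumptions, not the code in ts_tablet_manager.cc. `CheckResult` (a stand-in for `Result<bool>`), `TabletInfo`, and `TabletsToKeepInPlace` are hypothetical names, and here `DefinitelyHasNoParentDataAfterSplit` is a free function taking the check result, whereas the commit describes it only by name and behavior.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for Result<bool>: ok == false means the check failed.
struct CheckResult {
  bool ok;
  bool value;
};

// True only when the check succeeded AND reported no parent data.
// A failed check and a "still has parent data" answer both yield false.
inline bool DefinitelyHasNoParentDataAfterSplit(const CheckResult& still_has) {
  return still_has.ok && !still_has.value;
}

// Hypothetical per-tablet record pairing an id with the outcome of
// StillHasParentDataAfterSplit for that tablet.
struct TabletInfo {
  std::string id;
  CheckResult still_has_parent_data;
};

// Builds the conservative list: every tablet that is not definitely free of
// parent data, including tablets whose check simply failed.
inline std::vector<std::string> TabletsToKeepInPlace(
    const std::vector<TabletInfo>& tablets) {
  std::vector<std::string> result;
  for (const TabletInfo& tablet : tablets) {
    if (!DefinitelyHasNoParentDataAfterSplit(tablet.still_has_parent_data)) {
      result.push_back(tablet.id);
    }
  }
  return result;
}
```

Treating a failed check the same as "still has parent data" errs on the side of not moving a tablet, which matches the "maybe should not be moved" wording in the summary.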
polarweasel pushed a commit to lizayugabyte/yugabyte-db that referenced this issue on Mar 9, 2021

…_db data in StillHasParentDataAfterSplit

Summary:
We noticed tsan failures caused by concurrent access of RocksDB pointers on a tablet by the heartbeater while said RocksDB pointers were being invalidated (by a TRUNCATE, DROP, or other shutdown which might invoke ResetRocksDBs).

This diff synchronizes the Tablet::StillHasParentDataAfterSplit with the existing ScopedRWOperationPause invoked by TRUNCATE/DROP/ALTER operations by registering a ScopedRWOperation in the method call. In order to make this change, the signature of StillHasParentDataAfterSplit was also changed to return a Result<bool>, which then triggered some slightly more complex changes. Specifically:
1. We expose a new DefinitelyHasNoParentDataAfterSplit which returns false if StillHasParentDataAfterSplit fails or is true
2. ts_tablet_manager.cc uses DefinitelyHasNoParentDataAfterSplit to maintain a list of tablets which "maybe" should not be moved by the load balancer, and reports this list to the LB conservatively including any tablets for which DefinitelyHasNoParentDataAfterSplit returned false, which may include cases where StillHasParentDataAfterSplit simply failed
3. TabletSplitHeartbeatDataProvider will skip reporting tablets for which DefinitelyHasNoParentDataAfterSplit is false

Test Plan:
Try running tests which were triggering TSAN failures before:
`ybd tsan --cxx-test client_snapshot-txn-test --gtest_filter SnapshotTxnTest.TruncateDuringShutdown -n 100`
`ybd tsan --cxx-test client_backup-txn-test --gtest_filter BackupTxnTest.Simple -n 100`

Reviewers: timur, nicolas
Reviewed By: timur, nicolas
Subscribers: mpolitov, sergei, nicolas, ybase, timur, bogdan
Differential Revision: https://phabricator.dev.yugabyte.com/D10457
ttyusupov added the area/build-framework and kind/bug labels on Jul 29, 2021
TSAN report
Sample tests
Note: This is a product issue, since the race occurs during TRUNCATE (and likely during DROP TABLE as well).