[BACKPORT 2.20][#23243] docdb: Fix tablet bootstrap stuck when replaying truncate operation

Summary:
Original commit: c8cbcbf / D37152

**Issue:**
Tablet bootstrap can deadlock if it needs to replay a truncate operation. The sequence of events leading to the deadlock is:

| Thread 1 (Main bootstrap thread) | Thread 2 (Load Transactions) |
|---|---|
| 1. Begin tablet bootstrap | |
| 2. During OpenTablet, if transactions are enabled (YSQL table, or YCQL table with transactions enabled), acquire the `start_latch` by setting its value to 1; another thread (Thread 2) is created for transaction load | |
| | 3. Execute transaction load; acquire `pending_op_counter_blocking_rocksdb_shutdown_start_` to prevent RocksDB shutdown |
| 4. Replay tablet truncate operation; wait for `pending_op_counter_blocking_rocksdb_shutdown_start_` to be released in order to shut down RocksDB | |
| | 5. Transaction load completed |
| | 6. Call the `LoadFinished` function, which starts waiting for `start_latch` to be 0 |
| 7. Bootstrap complete; release the `start_latch` by setting its value to 0 | |
| | 8. Release `pending_op_counter_blocking_rocksdb_shutdown_start_` |

Thread 1 is stuck at step 4, waiting for step 8 to be executed.
Thread 2 is stuck at step 6, waiting for step 7 to be executed.

Result: Tablet bootstrap is stuck replaying the truncate operation.

This issue has been present since D29000 (commit id: 5159eb3), which changed the code to destroy the executor instance (which holds the operation counter) only after FinishLoad.

**Fix:**
Reset `pending_op_counter_blocking_rocksdb_shutdown_start_` before calling `loader_.FinishLoad(status)`. Also reset both regular_iterator and intent_iterator, as they hold refs to RocksDB's SuperVersion.
This is not a clean fix, but it is safe, because FinishLoad doesn't need protection from the op counter: it acquires its own op counter when processing the pending applies.
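The ordering problem described above can be reproduced in a small standalone model. The sketch below is not YugabyteDB code: `ZeroWaiter` and `ReplayTruncateSucceeds` are hypothetical names, the latch and the op counter are both modeled as a plain counter that waiters block on until it reaches zero, and the timeout on the truncate wait exists only so that the demo terminates instead of hanging. Step 3 is performed before the loader thread starts to keep the interleaving deterministic.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for both `start_latch` and the pending-op counter:
// a counter that waiters block on until it drops to zero.
class ZeroWaiter {
 public:
  explicit ZeroWaiter(int initial) : count_(initial) {}

  void Increment() {
    std::lock_guard<std::mutex> lock(mu_);
    ++count_;
  }

  void Decrement() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      --count_;
    }
    cv_.notify_all();
  }

  // Blocks until the counter is zero.
  void Wait() {
    std::unique_lock<std::mutex> lock(mu_);
    cv_.wait(lock, [this] { return count_ == 0; });
  }

  // Returns false if the counter is still non-zero after `timeout`.
  bool WaitFor(std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> lock(mu_);
    return cv_.wait_for(lock, timeout, [this] { return count_ == 0; });
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  int count_;
};

// Returns true if the modeled truncate replay (step 4) could proceed, i.e.
// the op counter was released while Thread 1 was waiting for it.
bool ReplayTruncateSucceeds(bool release_counter_before_finish_load) {
  ZeroWaiter start_latch(1);  // Step 2: latch set to 1 by Thread 1.
  ZeroWaiter pending_ops(0);
  pending_ops.Increment();    // Step 3: loader holds the op counter.

  std::thread loader([&] {
    // Step 5: transaction load completed.
    if (release_counter_before_finish_load) {
      pending_ops.Decrement();  // Fixed order: release the counter first...
      start_latch.Wait();       // ...then wait for bootstrap (step 6).
    } else {
      start_latch.Wait();       // Step 6: waits for step 7.
      pending_ops.Decrement();  // Step 8: never reached while Thread 1 waits.
    }
  });

  // Step 4: Thread 1 waits for the op counter before shutting down RocksDB.
  // The real wait is unbounded; the timeout only lets this model terminate.
  bool truncate_ok = pending_ops.WaitFor(std::chrono::milliseconds(100));

  start_latch.Decrement();  // Step 7: bootstrap done, release the latch.
  loader.join();
  return truncate_ok;
}
```

Calling `ReplayTruncateSucceeds(false)` models the original ordering, where the wait at step 4 times out; `ReplayTruncateSucceeds(true)` models the ordering after the fix, where the counter is released before the loader starts waiting on the latch.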

**Affected Version 2.20.1**
Starting from D29000 (commit id: 5159eb3), tablet bootstrap will get stuck when replaying truncate operation.
Jira: DB-12175

Test Plan: ./yb_build.sh --cxx-test pgwrapper_pg_single_tserver-test --gtest_filter PgSingleTServerTest.BootstrapReplayTruncate

Reviewers: bkolagani, timur, sergei, mbautin, rthallam

Reviewed By: rthallam

Subscribers: ybase, rthallam, slingam, yql, mbautin

Differential Revision: https://phorge.dev.yugabyte.com/D37653

1 parent 59a5ccf
commit 4267f5a
Showing 6 changed files with 59 additions and 3 deletions.