-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
[#23890] docdb: Add filtering for bootstrap intent iterators based on…
… min_replay_txn_start_ht Summary: This diff adds filtering for intent SST files during the transaction loading step of tablet bootstrap, using the newly introduced `min_replay_txn_start_ht`. When CDC is enabled, we persist intent SST files longer than they are otherwise needed, until CDC has streamed all transactions in the SST file and moved the retention barrier far enough ahead. This can lead to a large buildup of intent SST files which are not actually needed at bootstrap time. 8b23a4e / D35639 added changes to save `min_running_ht` periodically, and add intent SST file filtering during bootstrap time based on periodically saved values of `min_running_ht`. This can lead to data loss, if there is a transaction T such that the following is true: - T has been applied (APPLIED written to WALs) - T has intents that have been flushed to disk (this is rare but possible when CDC is disabled since in the ideal non-CDC case we never flush intents) - Changes made by T on regulardb have not been flushed - The metadata record for T is on an intent SST file whose max HT is less than min_running_ht after T apply (i.e. intentsdb flush happened between T writes and apply) - Tablet bootstrap state has been saved after T has committed These conditions will result in a `min_running_ht > T.start_time` being written to disk, and loaded during tablet bootstrap. Since regulardb changes have not been flushed, WAL replay will start from a point that includes T. However, transaction loader will not load T, because its metadata record has been excluded due to the SST file filter. This results in changes made by T being dropped, even though it has successfully committed. This change introduces a new `min_replay_txn_start_ht` and changes the intent SST file filter to be based off of periodically saved values of this new `min_replay_txn_start_ht`. `min_replay_txn_start_ht` is defined as the minimum of: - `min_running_ht` - `start_ht` of any transaction which may be read during WAL replay WAL replay begins at `bootstrap_start_op_id = min(intentsdb flushed_op_id, rocksdb flushed_op_id, retryable requests last_flushed_op_id)`. We calculate `min_replay_txn_start_ht` by maintaining a set of `(applied_op_id, start_ht)` for recently applied transactions. Transactions are added into this set when they are applied and cleaned from memory (removed from `transactions_`) and are removed when `bootstrap_start_op_id` is increased past `applied_op_id`. `min_replay_txn_start_ht` is then the minimum of `start_ht` of this set and `min_running_ht`. Since `replay_start_op_id` is only updated after flushes to disk, this ensures that any transaction whose metadata record is filtered out by the intent SST file filter will not be incorrectly loaded during WAL replay, since such a transaction would have `apply_op_id < replay_start_op_id` (the `replay_start_op_id` calculated at bootstrap time), so none of its records are read by WAL replay. **Upgrade/Rollback safety:** The `min_running_ht` field in `TabletBootstrapStatePB` was retired and a new `min_replay_txn_start_ht` field was added. There are no autoflags added because `min_replay_txn_start_ht` is only used for an optimization (intent SST file filtering) so the lack of its presence post-upgrade does not change correctness, and its presence post-rollback is simply ignored. `min_running_ht` was only used for a incorrect implementation of the optimization which was off by default, so the lack of its presence post-rollback does not harm correctness (and actually improves it if optimization was turned on) and its presence after upgrade is ignored. A different field was used for this change to ensure that values of `min_running_ht` set before upgrade are not used, since it is unsafe to use it. Jira: DB-12794 Test Plan: Added test case to reproduce the data loss scenario when filter was using `min_running_ht`: ``` ./yb_build.sh --cxx_test pgwrapper_pg_mini-test --gtest_filter PgMiniTestSingleNode.TestBootstrapOnAppliedTransactionWithIntents ``` Also confirmed that CDC stress tests stop failing after these changes. Reviewers: sergei, qhu Reviewed By: sergei Subscribers: rthallam, ybase, yql Differential Revision: https://phorge.dev.yugabyte.com/D37792
- Loading branch information
Showing
18 changed files
with
409 additions
and
68 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.