[BACKPORT 2024.1.3][#21116] docdb: Persist transaction promotion state
Summary:
In the transaction promotion path, we did not properly replicate and save the promotion: an empty batch with the pre-promotion transaction metadata was replicated and written into WALs. As a result, after leader stepdowns and during tablet bootstrap, changes made on tablets first touched before promotion and changes made on tablets first touched after promotion are treated as separate transactions with the same id but different status tablets, resulting in data loss. For example, a leader stepdown on a participant tablet touched before promotion, occurring after the transaction on the old status tablet has been aborted, causes the changes to be cleaned up even if the transaction has committed or later commits.

This revision changes the promotion path to send an UpdateTransaction(PROMOTING) with the new status tablet to the participant tablet, which is then replicated and written to WALs. This entirely replaces the old UpdateTransactionStatusLocation RPC calls. The empty batch write in the UpdateTransactionStatusLocation path was also removed, as it was effectively useless.

**Upgrade/Rollback safety:**
The change to send UpdateTransaction(PROMOTING) is gated by the auto flag `replicate_transaction_promotion` to ensure that we don't send or write the new value until the upgrade has been finalized; the old UpdateTransactionStatusLocation sending code is used until then. The old UpdateTransactionStatusLocation handling code will be left intact until after 2024.2.

Jira: DB-10148

Original commit: 6109c23 / D38718

Test Plan:
Jenkins. Added new test:
- `./yb_build.sh --cxx-test pgwrapper_geo_transactions_promotion-test --gtest_filter GeoTransactionsPromotionTest.TestParticipantLeaderStepDown -n 100`

Jenkins run with transaction_promotion_use_update_transaction turned off (pre-upgrade case) done on D38793.

Ran ysql/sz.ol.geo.append Jepsen workload with 600s timeout 20x without failures.

Reviewers: sergei

Reviewed By: sergei

Subscribers: svc_phabricator, yql, ybase, rthallam

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D39831
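
The failure mode described in the summary can be illustrated with a toy model. This is a minimal sketch, not YugabyteDB code: `Participant`, `WalRecord`, `RecordType`, `StatusTabletAfterRestart`, and the tablet names are all hypothetical, and the boolean parameter only stands in for the auto flag gate. It assumes that bootstrap rebuilds a transaction's status tablet purely from what was written to the WAL, and shows that the old empty batch leaves the pre-promotion status tablet in place after a restart, while a replicated PROMOTING record preserves the new one.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified stand-ins; these are not YugabyteDB's real types.
enum class RecordType { kWriteBatch, kPromoting };

struct WalRecord {
  RecordType type;
  std::string txn_id;
  std::string status_tablet;  // status tablet named in the record's metadata
};

// Toy participant tablet: only the WAL survives a leader change or restart.
class Participant {
 public:
  // A write records the transaction's current status tablet in the WAL.
  void Write(const std::string& txn_id, const std::string& status_tablet) {
    live_[txn_id] = status_tablet;
    wal_.push_back({RecordType::kWriteBatch, txn_id, status_tablet});
  }

  // Handle a promotion. `replicate_promotion` models the auto flag gate.
  void Promote(const std::string& txn_id, const std::string& new_tablet,
               bool replicate_promotion) {
    auto it = live_.find(txn_id);
    assert(it != live_.end());  // the tablet must already know the transaction
    if (replicate_promotion) {
      // New path: the promotion itself is persisted with the new status tablet.
      wal_.push_back({RecordType::kPromoting, txn_id, new_tablet});
    } else {
      // Old path: an empty batch carrying the pre-promotion metadata.
      wal_.push_back({RecordType::kWriteBatch, txn_id, it->second});
    }
    it->second = new_tablet;  // in-memory state is correct in both paths
  }

  // Rebuild the per-transaction status tablet from the WAL alone.
  std::unordered_map<std::string, std::string> Bootstrap() const {
    std::unordered_map<std::string, std::string> result;
    for (const auto& record : wal_) {
      if (record.type == RecordType::kPromoting) {
        result[record.txn_id] = record.status_tablet;  // promotion overrides
      } else {
        result.emplace(record.txn_id, record.status_tablet);  // keep first seen
      }
    }
    return result;
  }

 private:
  std::unordered_map<std::string, std::string> live_;
  std::vector<WalRecord> wal_;
};

// Returns the status tablet a restarted participant would use for txn "t1".
std::string StatusTabletAfterRestart(bool replicate_promotion) {
  Participant tablet;
  tablet.Write("t1", "local_status");
  tablet.Promote("t1", "global_status", replicate_promotion);
  return tablet.Bootstrap().at("t1");
}
```

In this model, `StatusTabletAfterRestart(false)` resolves to the pre-promotion status tablet, which is where an aborted record would trigger cleanup of the transaction's changes, while `StatusTabletAfterRestart(true)` resolves to the promoted one.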
Showing 10 changed files with 358 additions and 138 deletions.