Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
[BACKPORT 2.12.9][#13770] CDCSDK: Intents are getting GCed after Tablet LEADER changes

Summary:
"Original commit: 8eaec8c/D19149"

The CDC Service retains intents for a tablet based on the flag //cdc_intent_retention_ms// (default value: 4 hours). To track intent expiration at the tablet level, a separate thread, //UpdatePeersAndMetrics//, periodically does the following:

1. Reads the //cdc_state// table using the method //PopulateTabletCheckPointInfo//. Each row in the cdc_state table is uniquely identified by the (tablet_id, stream_id) pair, so for each tablet_id it calculates the minimum checkpoint among all active streams on that tablet, and the maximum remaining time among those streams by reading the tablet LEADER cache. The result is a map, //std::unordered_map<TabletId, TabletCDCCheckpointInfo>//.
2. Passes that map to the method //UpdateTabletPeersWithMinReplicatedIndex//, which internally does the following:
   a. Picks each tablet_id from the map. If the local tablet peer is not the LEADER, it continues with the next tablet_id in the map.
   b. If the local tablet peer is the LEADER, it sends the minimum checkpoint and the remaining intent expiration time to the FOLLOWER tablets.

These two operations are not atomic. The maximum remaining time computed in step 1 (the //PopulateTabletCheckPointInfo// call) is based on the tablet LEADER cache at that moment, but by the time step 2 (//UpdateTabletPeersWithMinReplicatedIndex//) runs, the same tablet may have become a FOLLOWER, or vice versa.

In our scenario, there is a single stream (stream_1), a single tablet (tablet_1), and 3 tservers (TS1, TS2, TS3):
i) Initially the tablet LEADER is on TS1. The client calls GetChanges, so the UpdatePeersAndMetrics thread is enabled on TS1.
ii) A LEADER switch to TS2 happens, so the UpdatePeersAndMetrics thread also becomes active on TS2.
iii) TS1, now a FOLLOWER, still has UpdatePeersAndMetrics active and periodically performs the two steps described above.
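The two-step loop and the race between its steps can be sketched in C++ as follows. This is an illustrative model, not the YugabyteDB implementation: `CdcStateRow`, `CheckpointInfoSketch`, `TserverSketch`, `PopulateCheckpointsPreFix`, `UpdatePeersSketch`, and `ReplayRaceOnTs1` are hypothetical names, the checkpoint is simplified to a single log index, and the cache is modeled as a map keyed by "tablet/stream". It reproduces the pre-fix behavior where only the LEADER cache supplies the remaining time.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

using TabletId = std::string;
using StreamId = std::string;

// One cdc_state row, uniquely identified by (tablet_id, stream_id).
// Simplification (assumption): the checkpoint is one log index, not a full OpId.
struct CdcStateRow {
  TabletId tablet_id;
  StreamId stream_id;
  std::int64_t checkpoint_index;
};

// Hypothetical stand-in for TabletCDCCheckpointInfo.
struct CheckpointInfoSketch {
  std::int64_t min_checkpoint_index = std::numeric_limits<std::int64_t>::max();
  std::int64_t max_active_time_remaining_ms = 0;
};

// Hypothetical model of one tserver's local state.
struct TserverSketch {
  std::unordered_set<TabletId> leader_tablets;
  // Remaining retention time per "tablet/stream" key, held in the LEADER cache.
  std::unordered_map<std::string, std::int64_t> leader_cache_remaining_ms;
  // Remaining time this tserver, acting as LEADER, last sent to its FOLLOWERS.
  std::unordered_map<TabletId, std::int64_t> sent_remaining_ms;

  bool IsLeader(const TabletId& tablet) const {
    return leader_tablets.count(tablet) > 0;
  }
};

std::string CacheKey(const TabletId& tablet, const StreamId& stream) {
  return tablet + "/" + stream;
}

// Step 1, pre-fix behavior: per tablet, take the minimum checkpoint over all
// streams and the maximum remaining time. The remaining time is read only
// when this tserver is the LEADER, so a FOLLOWER records 0.
std::unordered_map<TabletId, CheckpointInfoSketch> PopulateCheckpointsPreFix(
    const std::vector<CdcStateRow>& rows, const TserverSketch& ts) {
  std::unordered_map<TabletId, CheckpointInfoSketch> result;
  for (const auto& row : rows) {
    CheckpointInfoSketch& info = result[row.tablet_id];
    info.min_checkpoint_index =
        std::min(info.min_checkpoint_index, row.checkpoint_index);
    std::int64_t remaining = 0;
    if (ts.IsLeader(row.tablet_id)) {
      auto it = ts.leader_cache_remaining_ms.find(
          CacheKey(row.tablet_id, row.stream_id));
      if (it != ts.leader_cache_remaining_ms.end()) remaining = it->second;
    }
    info.max_active_time_remaining_ms =
        std::max(info.max_active_time_remaining_ms, remaining);
  }
  return result;
}

// Step 2: only the LEADER sends the aggregated values to its FOLLOWERS.
// Leadership is checked again here, not when the map was built in step 1.
void UpdatePeersSketch(
    const std::unordered_map<TabletId, CheckpointInfoSketch>& infos,
    TserverSketch& ts) {
  for (const auto& [tablet_id, info] : infos) {
    if (!ts.IsLeader(tablet_id)) continue;  // not the LEADER: next tablet
    ts.sent_remaining_ms[tablet_id] = info.max_active_time_remaining_ms;
  }
}

// Replays the failure scenario on TS1 and returns the remaining time it sends.
std::int64_t ReplayRaceOnTs1() {
  TserverSketch ts1;  // FOLLOWER of tablet_1 while step 1 runs
  std::vector<CdcStateRow> rows = {{"tablet_1", "stream_1", 42}};
  auto infos = PopulateCheckpointsPreFix(rows, ts1);  // records 0 remaining
  ts1.leader_tablets.insert("tablet_1");  // TS1 becomes LEADER between steps
  UpdatePeersSketch(infos, ts1);          // sends 0 to the FOLLOWERS
  return ts1.sent_remaining_ms.at("tablet_1");
}
```

Calling `ReplayRaceOnTs1()` returns 0: the map was built while TS1 was a FOLLOWER, but it is sent after TS1 became the LEADER, which is exactly the window the two non-atomic steps leave open.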
In our FAILURE scenario, TS1, which is a FOLLOWER, creates the map during the //PopulateTabletCheckPointInfo// call with the remaining active time set to 0, because it does not host the tablet LEADER. Before the //UpdateTabletPeersWithMinReplicatedIndex// call, TS1 becomes the LEADER, so, following step 2 above, the new LEADER sends a remaining expiration time of 0 to its FOLLOWERS, causing all of the intents to be GCed.

To handle this scenario, step 1 (//PopulateTabletCheckPointInfo//) now calculates the maximum active time from the local tablet peer's cache, whether that peer is the LEADER or a FOLLOWER, and the stream active time is periodically updated in the FOLLOWERS' caches so that they stay in sync.

Test Plan: Ran all the C and Java test cases.

Reviewers: abharadwaj, srangavajjula, skumar

Reviewed By: skumar

Subscribers: ycdcxcluster

Differential Revision: https://phabricator.dev.yugabyte.com/D19181
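The fix described in the summary can be sketched in the same simplified model. Again, this is a hypothetical illustration, not the actual patch: `PeerSketch`, `SyncActiveTimeToFollower`, `PopulateCheckpointsFixed`, and `ReplayScenarioWithFix` are invented names, and the periodically synced "stream active time" is modeled directly as a remaining-time value (the real code may store a timestamp instead). The value 3600000 ms is purely illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

using TabletId = std::string;
using StreamId = std::string;

// One cdc_state row (checkpoint simplified to a single log index).
struct CdcStateRow {
  TabletId tablet_id;
  StreamId stream_id;
  std::int64_t checkpoint_index;
};

// Hypothetical stand-in for TabletCDCCheckpointInfo.
struct CheckpointInfoSketch {
  std::int64_t min_checkpoint_index = std::numeric_limits<std::int64_t>::max();
  std::int64_t max_active_time_remaining_ms = 0;
};

// Hypothetical tserver model: the cache exists on every tablet peer,
// whether it is the LEADER or a FOLLOWER.
struct PeerSketch {
  std::unordered_set<TabletId> leader_tablets;
  std::unordered_map<std::string, std::int64_t> cache_remaining_ms;
  std::unordered_map<TabletId, std::int64_t> sent_remaining_ms;

  bool IsLeader(const TabletId& tablet) const {
    return leader_tablets.count(tablet) > 0;
  }
};

std::string CacheKey(const TabletId& tablet, const StreamId& stream) {
  return tablet + "/" + stream;
}

// Periodic sync (simplified): the LEADER copies the per-stream remaining
// time for one tablet into a FOLLOWER's cache. A non-LEADER does nothing.
void SyncActiveTimeToFollower(const PeerSketch& leader, const TabletId& tablet,
                              PeerSketch& follower) {
  if (!leader.IsLeader(tablet)) return;
  const std::string prefix = tablet + "/";
  for (const auto& [key, remaining] : leader.cache_remaining_ms) {
    if (key.compare(0, prefix.size(), prefix) == 0) {
      follower.cache_remaining_ms[key] = remaining;
    }
  }
}

// Step 1 with the fix: read the remaining time from the local cache
// regardless of whether this peer is the LEADER or a FOLLOWER.
std::unordered_map<TabletId, CheckpointInfoSketch> PopulateCheckpointsFixed(
    const std::vector<CdcStateRow>& rows, const PeerSketch& peer) {
  std::unordered_map<TabletId, CheckpointInfoSketch> result;
  for (const auto& row : rows) {
    CheckpointInfoSketch& info = result[row.tablet_id];
    info.min_checkpoint_index =
        std::min(info.min_checkpoint_index, row.checkpoint_index);
    auto it = peer.cache_remaining_ms.find(CacheKey(row.tablet_id, row.stream_id));
    std::int64_t remaining =
        (it == peer.cache_remaining_ms.end()) ? 0 : it->second;
    info.max_active_time_remaining_ms =
        std::max(info.max_active_time_remaining_ms, remaining);
  }
  return result;
}

// Step 2 is unchanged: only the LEADER sends values to its FOLLOWERS.
void UpdatePeersSketch(
    const std::unordered_map<TabletId, CheckpointInfoSketch>& infos,
    PeerSketch& peer) {
  for (const auto& [tablet_id, info] : infos) {
    if (!peer.IsLeader(tablet_id)) continue;
    peer.sent_remaining_ms[tablet_id] = info.max_active_time_remaining_ms;
  }
}

// Replays the same LEADER change with the fix in place.
std::int64_t ReplayScenarioWithFix() {
  PeerSketch ts2;  // current LEADER of tablet_1
  ts2.leader_tablets.insert("tablet_1");
  ts2.cache_remaining_ms[CacheKey("tablet_1", "stream_1")] = 3600000;
  PeerSketch ts1;  // FOLLOWER of tablet_1
  SyncActiveTimeToFollower(ts2, "tablet_1", ts1);  // periodic sync
  std::vector<CdcStateRow> rows = {{"tablet_1", "stream_1", 42}};
  auto infos = PopulateCheckpointsFixed(rows, ts1);  // non-zero remaining
  ts1.leader_tablets.insert("tablet_1");  // TS1 becomes LEADER between steps
  UpdatePeersSketch(infos, ts1);
  return ts1.sent_remaining_ms.at("tablet_1");
}
```

With the FOLLOWER cache kept in sync, `ReplayScenarioWithFix()` sends the synced 3600000 ms instead of 0, so a LEADER change between the two steps no longer resets the intent retention window.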
1 parent d788192
commit 1170d3a
7 changed files with 77 additions and 14 deletions.