Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CDCSDK] Intents are getting GCed after Tablet LEADER changes #13770

Closed
sureshdash2022-yb opened this issue Aug 25, 2022 · 0 comments
Closed
Assignees
Labels

Comments

@sureshdash2022-yb
Copy link
Contributor

CDC Service retains intents for a tablet based on the flag cdc_intent_retention_ms (default value is 4 hours), to track the intent expiration per tablet level, there is a separate thread UpdatePeersAndMetrics which periodically does the following:-

Read cdc_state table using the method PopulateTabletCheckPointInfo.Each row in the cdc_state table is uniquely identified by the tablet_id, and stream_id pair. so it calculates the minimum checkpoint among all active streams belonging to the same tablet_id. and max remaining time among all streams by reading the tablet LEADER cache and creating map std::unordered_map<TabletId, TabletCDCCheckpointInfo>.
The above map is passed to the method UpdateTabletPeersWithMinReplicatedIndex, which internally does the following:- a. pick each tablet_id from the map, if the current Tablet is not a LEADER then it continues for the next tablet_id in the map. b. If the current tablet is a LEADER then it sends the minimum checkpoint as well as intent expiration time remaining for the FOLLOWERS tablets.
Above 2 operations are not atomic, that means during step:-1 (PopulateTabletCheckPointInfo call) whatever maximum remaining time we have calculated based on the tablet LEADER cache reference and created a map. but when it comes to step:-2(UpdateTabletPeersWithMinReplicatedIndex) the same tablet_id may become FOLLOWER or vice versa.

In our scenario, we have a single stream (stream_1) and single tablet (tabet_1) and 3 tservers(TS-1, TS-2, TS-3).

i). Initially the Tablet LEADER is present in TS1, the client is called GetChanges, so the UpdatePeersAndMetrics thread will be enabled in TS1.
ii). Now there is a LEADER switch happening to TS2, so the UpdatePeersAndMetrics thread will be active in TS2.
iii) now TS1 which is now a FOLLOWER, has UpdatePeersAndMetrics activated which will periodically do the above 2 steps described.
In our FAILURE scenario TS1 which is FOLLOWER, create a map as part PopulateTabletCheckPointInfo call with active time remaining set 0, because it doesn't contain the tablet LEADER. Before UpdateTabletPeersWithMinReplicatedIndex call TS1 is becoming LEADER so as part of the above description for step:-2, the LEADER tablet will send the remaining expiration as 0, causing whole intents GCed.

@sureshdash2022-yb sureshdash2022-yb self-assigned this Aug 25, 2022
sureshdash2022-yb added a commit that referenced this issue Aug 25, 2022
Summary:
CDC Service retains intents for a tablet based on the flag //cdc_intent_retention_ms// (default value is 4 hours), to track the intent expiration per tablet level, there is a separate thread //UpdatePeersAndMetrics// which periodically does the following:- 
1. Read //cdc_state// table using the method //PopulateTabletCheckPointInfo//.Each row in the cdc_state table is uniquely identified by the tablet_id, and stream_id pair. so it calculates the minimum checkpoint among all active streams belonging to the same tablet_id. and max remaining time among all streams by reading the tablet LEADER cache and creating map // std::unordered_map<TabletId, TabletCDCCheckpointInfo>. //

2. The above map is passed to the method //UpdateTabletPeersWithMinReplicatedIndex//, which internally does the following:-
    a. pick each tablet_id from the map, if the current Tablet is not a LEADER then it continues for the next tablet_id in the map.
    b. If the current tablet is a LEADER then it sends the minimum checkpoint as well as intent expiration time remaining for the FOLLOWERS tablets.

Above 2 operations are not atomic, that means during step:-1 (//PopulateTabletCheckPointInfo// call) whatever maximum remaining time we have calculated based on the tablet LEADER cache reference and created a map. but when it comes to step:-2(UpdateTabletPeersWithMinReplicatedIndex) the same tablet_id may become FOLLOWER or vice versa.

In our scenario, we have a single stream (stream_1) and single tablet (tabet_1) and 3 tservers(TS-1, TS-2, TS-3).
   i). Initially the Tablet LEADER is present in TS1, the client is called GetChanges, so the UpdatePeersAndMetrics thread will be enabled in TS1. 
  ii). Now there is a LEADER switch happening to TS2, so the UpdatePeersAndMetrics thread will be active in TS2. 
  iii) now TS1 which is now a FOLLOWER, has UpdatePeersAndMetrics activated which will periodically do the above 2 steps described.

In our FAILURE scenario TS1 which is FOLLOWER, create a map as part //PopulateTabletCheckPointInfo// call with active time remaining set 0, because it doesn't contain the tablet LEADER. Before //UpdateTabletPeersWithMinReplicatedIndex// call TS1 is becoming LEADER so as part of the above description for step:-2, the LEADER tablet will send the remaining expiration as 0, causing whole intents GCed.

To handle this scenario, in step:-1(//PopulateTabletCheckPointInfo//), the maximum active time we will calculate based on tablet cache(either it can be LEADER or FOLLOWER) and in regular interval stream active time we update in the FOLLOWERS cache so that they are in sync.

Test Plan: Running all the c and java testcase

Reviewers: skumar, abharadwaj, srangavajjula

Reviewed By: abharadwaj, srangavajjula

Subscribers: ycdcxcluster

Differential Revision: https://phabricator.dev.yugabyte.com/D19149
sureshdash2022-yb added a commit that referenced this issue Aug 25, 2022
…et LEADER changes

Summary:
"Original commit: 8eaec8c/D19149"
CDC Service retains intents for a tablet based on the flag //cdc_intent_retention_ms// (default value is 4 hours), to track the intent expiration per tablet level, there is a separate thread //UpdatePeersAndMetrics// which periodically does the following:- 
1. Read //cdc_state// table using the method //PopulateTabletCheckPointInfo//.Each row in the cdc_state table is uniquely identified by the tablet_id, and stream_id pair. so it calculates the minimum checkpoint among all active streams belonging to the same tablet_id. and max remaining time among all streams by reading the tablet LEADER cache and creating map // std::unordered_map<TabletId, TabletCDCCheckpointInfo>. //

2. The above map is passed to the method //UpdateTabletPeersWithMinReplicatedIndex//, which internally does the following:-
    a. pick each tablet_id from the map, if the current Tablet is not a LEADER then it continues for the next tablet_id in the map.
    b. If the current tablet is a LEADER then it sends the minimum checkpoint as well as intent expiration time remaining for the FOLLOWERS tablets.

Above 2 operations are not atomic, that means during step:-1 (//PopulateTabletCheckPointInfo// call) whatever maximum remaining time we have calculated based on the tablet LEADER cache reference and created a map. but when it comes to step:-2(UpdateTabletPeersWithMinReplicatedIndex) the same tablet_id may become FOLLOWER or vice versa.

In our scenario, we have a single stream (stream_1) and single tablet (tabet_1) and 3 tservers(TS-1, TS-2, TS-3).
   i). Initially the Tablet LEADER is present in TS1, the client is called GetChanges, so the UpdatePeersAndMetrics thread will be enabled in TS1. 
  ii). Now there is a LEADER switch happening to TS2, so the UpdatePeersAndMetrics thread will be active in TS2. 
  iii) now TS1 which is now a FOLLOWER, has UpdatePeersAndMetrics activated which will periodically do the above 2 steps described.

In our FAILURE scenario TS1 which is FOLLOWER, create a map as part //PopulateTabletCheckPointInfo// call with active time remaining set 0, because it doesn't contain the tablet LEADER. Before //UpdateTabletPeersWithMinReplicatedIndex// call TS1 is becoming LEADER so as part of the above description for step:-2, the LEADER tablet will send the remaining expiration as 0, causing whole intents GCed.

To handle this scenario, in step:-1(//PopulateTabletCheckPointInfo//), the maximum active time we will calculate based on tablet cache(either it can be LEADER or FOLLOWER) and in regular interval stream active time we update in the FOLLOWERS cache so that they are in sync.

Test Plan: Running all the c and java testcase

Reviewers: abharadwaj, srangavajjula, skumar

Reviewed By: skumar

Subscribers: ycdcxcluster

Differential Revision: https://phabricator.dev.yugabyte.com/D19181
sureshdash2022-yb added a commit that referenced this issue Aug 25, 2022
… LEADER changes

Summary:
"Original commit: 8eaec8c/D19149"
CDC Service retains intents for a tablet based on the flag //cdc_intent_retention_ms// (default value is 4 hours), to track the intent expiration per tablet level, there is a separate thread //UpdatePeersAndMetrics// which periodically does the following:- 
1. Read //cdc_state// table using the method //PopulateTabletCheckPointInfo//.Each row in the cdc_state table is uniquely identified by the tablet_id, and stream_id pair. so it calculates the minimum checkpoint among all active streams belonging to the same tablet_id. and max remaining time among all streams by reading the tablet LEADER cache and creating map // std::unordered_map<TabletId, TabletCDCCheckpointInfo>. //

2. The above map is passed to the method //UpdateTabletPeersWithMinReplicatedIndex//, which internally does the following:-
    a. pick each tablet_id from the map, if the current Tablet is not a LEADER then it continues for the next tablet_id in the map.
    b. If the current tablet is a LEADER then it sends the minimum checkpoint as well as intent expiration time remaining for the FOLLOWERS tablets.

Above 2 operations are not atomic, that means during step:-1 (//PopulateTabletCheckPointInfo// call) whatever maximum remaining time we have calculated based on the tablet LEADER cache reference and created a map. but when it comes to step:-2(UpdateTabletPeersWithMinReplicatedIndex) the same tablet_id may become FOLLOWER or vice versa.

In our scenario, we have a single stream (stream_1) and single tablet (tabet_1) and 3 tservers(TS-1, TS-2, TS-3).
   i). Initially the Tablet LEADER is present in TS1, the client is called GetChanges, so the UpdatePeersAndMetrics thread will be enabled in TS1. 
  ii). Now there is a LEADER switch happening to TS2, so the UpdatePeersAndMetrics thread will be active in TS2. 
  iii) now TS1 which is now a FOLLOWER, has UpdatePeersAndMetrics activated which will periodically do the above 2 steps described.

In our FAILURE scenario TS1 which is FOLLOWER, create a map as part //PopulateTabletCheckPointInfo// call with active time remaining set 0, because it doesn't contain the tablet LEADER. Before //UpdateTabletPeersWithMinReplicatedIndex// call TS1 is becoming LEADER so as part of the above description for step:-2, the LEADER tablet will send the remaining expiration as 0, causing whole intents GCed.

To handle this scenario, in step:-1(//PopulateTabletCheckPointInfo//), the maximum active time we will calculate based on tablet cache(either it can be LEADER or FOLLOWER) and in regular interval stream active time we update in the FOLLOWERS cache so that they are in sync.

Test Plan: Running all the c and java testcase

Reviewers: srangavajjula, skumar, abharadwaj

Reviewed By: skumar, abharadwaj

Subscribers: ycdcxcluster

Differential Revision: https://phabricator.dev.yugabyte.com/D19182
sureshdash2022-yb added a commit that referenced this issue Aug 26, 2022
… LEADER changes

Summary:
"Original commit: 8eaec8c/D19149"
CDC Service retains intents for a tablet based on the flag //cdc_intent_retention_ms// (default value is 4 hours), to track the intent expiration per tablet level, there is a separate thread //UpdatePeersAndMetrics// which periodically does the following:- 
1. Read //cdc_state// table using the method //PopulateTabletCheckPointInfo//.Each row in the cdc_state table is uniquely identified by the tablet_id, and stream_id pair. so it calculates the minimum checkpoint among all active streams belonging to the same tablet_id. and max remaining time among all streams by reading the tablet LEADER cache and creating map // std::unordered_map<TabletId, TabletCDCCheckpointInfo>. //

2. The above map is passed to the method //UpdateTabletPeersWithMinReplicatedIndex//, which internally does the following:-
    a. pick each tablet_id from the map, if the current Tablet is not a LEADER then it continues for the next tablet_id in the map.
    b. If the current tablet is a LEADER then it sends the minimum checkpoint as well as intent expiration time remaining for the FOLLOWERS tablets.

Above 2 operations are not atomic, that means during step:-1 (//PopulateTabletCheckPointInfo// call) whatever maximum remaining time we have calculated based on the tablet LEADER cache reference and created a map. but when it comes to step:-2(UpdateTabletPeersWithMinReplicatedIndex) the same tablet_id may become FOLLOWER or vice versa.

In our scenario, we have a single stream (stream_1) and single tablet (tabet_1) and 3 tservers(TS-1, TS-2, TS-3).
   i). Initially the Tablet LEADER is present in TS1, the client is called GetChanges, so the UpdatePeersAndMetrics thread will be enabled in TS1. 
  ii). Now there is a LEADER switch happening to TS2, so the UpdatePeersAndMetrics thread will be active in TS2. 
  iii) now TS1 which is now a FOLLOWER, has UpdatePeersAndMetrics activated which will periodically do the above 2 steps described.

In our FAILURE scenario TS1 which is FOLLOWER, create a map as part //PopulateTabletCheckPointInfo// call with active time remaining set 0, because it doesn't contain the tablet LEADER. Before //UpdateTabletPeersWithMinReplicatedIndex// call TS1 is becoming LEADER so as part of the above description for step:-2, the LEADER tablet will send the remaining expiration as 0, causing whole intents GCed.

To handle this scenario, in step:-1(//PopulateTabletCheckPointInfo//), the maximum active time we will calculate based on tablet cache(either it can be LEADER or FOLLOWER) and in regular interval stream active time we update in the FOLLOWERS cache so that they are in sync.

Test Plan: Running all the c and java testcase

Reviewers: abharadwaj, srangavajjula, skumar

Reviewed By: skumar

Subscribers: ycdcxcluster

Differential Revision: https://phabricator.dev.yugabyte.com/D19188
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

1 participant