Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Browse files
Browse the repository at this point in the history
…rom existing CDCSDK stream Summary: Some non-eligible tables like indexes etc. created after creation of a CDC stream were getting added to the CDC stream due to [[ https://phorge.dev.yugabyte.com/D35856 | this missing logic in addition of dynamic tables codepath ]]. We do not hold retention barriers on tablets of such tables until and unless they split, in which case, we start holding retention barriers on the children tablets. This leads to heavy resource usage over time and since the CDC stream never polls on tables of such tables, retention barriers are not lifted until the active time of these tablets exceed `cdc_intent_retention_ms`. Therefore, to prevent resource consumption from such tables, we want to achieve the following: 1. Remove these non-eligible tables from the stream metadata so that any further tablet splitting on these tables do not lead to addition of children tablets in cdc state table. 2. Release retention barriers on the existing tablets that are part of the cdc state table and finally remove these state table entries. We have followed the same pattern of achieving the above tasks via a background thread, exactly similar to addition of dynamic table in CDC streams. **Working:** - //FindAllNonUserTablesInCDCSDKStream// - On a master restart/leadership change, while loading CDCSDK streams into memory, we will compute the set difference between tables present in stream metadata and tables in the namespace that are eligible for a CDC stream. This set difference will give us the set of non-eligible tables that were not supposed to get added to the CDC stream, but got added because of the above mentioned bug. These non-eligible tables will be added to `namespace_to_cdcsdk_non_user_table_map_` which is further processed in catalog manager background thread by //FindCDCSDKStreamsForNonUserTables//. The bg thread of catalog manager (CatalogManagerBgTasks), with the following methods handles the actual table removal: - //FindCDCSDKStreamsForNonUserTables//: This method is run in every subsequent iteration of the bg thread of catalog manager. It scans the cdc_stream_map_ and finds all streams in ACTIVE/DELETING METADATA state which have the non-eligible table entry in stream metadata, and collects the details to be further processed by //RemoveNonUserTablesForCDCSDKStreams//. - //RemoveNonUserTablesForCDCSDKStreams//: This method is run after FindCDCSDKStreamsForNonUserTables and does the following for each stream that contains the non-eligible table entry: 1. Update the checkpoint of cdc state entries related to non-eligible table to OpId max. Incase of colocated tables, entries with a colocated_table_id will be deleted. 2. Removes the table from stream metadata and cdcsdk_tables_to_stream_map_. Once the table is removed from all relevant CDC streams, then we remove the table entry from `namespace_to_cdcsdk_non_user_table_map_`. Note: 1. To enable this cleanup of non-eligible tables, user has to set the master flag `enable_cleanup_of_non_eligible_tables_from_cdcsdk_stream`. 2. In single iteration of the bg thread, we only process two non-eligible tables across all namespaces. This processing limit is configurable and we are reusing the existing flag `cdcsdk_table_processing_limit_per_run`. Additionally, in the tablet split codepath, before adding cdc state entries for children tables, we will now check if the table is a non-eligible table for CDC stream or not. This also helps in preventing a race condition when a tablet of a non-eligible is split and concurrently, there was a master restart/leadership changes and we are trying to remove the table from stream metadata. Jira: DB-11778, DB-11733, DB-11676 Test Plan: Jenkins: urgent ./yb_build.sh --cxx-test integration-tests_cdcsdk_ysql-test --gtest_filter CDCSDKYsqlTest.TestNonEligibleTableRemovalFromNonConsistentSnapshotCDCStream ./yb_build.sh --cxx-test integration-tests_cdcsdk_ysql-test --gtest_filter CDCSDKYsqlTest.TestNonEligibleTableRemovalFromConsistentSnapshotCDCStream ./yb_build.sh --cxx-test integration-tests_cdcsdk_ysql-test --gtest_filter CDCSDKYsqlTest.TestChildTabletsOfNonEligibleTableDoNotGetAddedToNonConsistentSnapshotStream ./yb_build.sh --cxx-test integration-tests_cdcsdk_ysql-test --gtest_filter CDCSDKYsqlTest.TestChildTabletsOfNonEligibleTableDoNotGetAddedToConsistentSnapshotStream Reviewers: asrinivasan, stiwary, skumar Reviewed By: stiwary Subscribers: ycdcxcluster Tags: #jenkins-ready Differential Revision: https://phorge.dev.yugabyte.com/D36031
- Loading branch information