2.23.1.0-b89

Summary:
We recently introduced the capability to remove a user table from a CDC stream ([[ https://phorge.dev.yugabyte.com/D35870 | diff ]]). That change required dynamic table addition to be disabled on the particular CDC stream before a table could be removed from it. The restriction was primarily needed to prevent the removed table from being treated as a dynamic table on a master restart/leadership change.

This diff eliminates the need to disable dynamic table addition for removal of tables from a CDC stream. The idea is to stop treating removed tables as dynamic tables, so that they are not added back to the CDC stream on a master restart/leadership change.

  - The existing `table_id` proto field in stream metadata will now represent the qualified list of tables for a CDC stream.

  - A new proto field `unqualified_table_id` is introduced in the stream metadata to store the list of user-created tables that have been removed from a CDC stream (via the UpdatePeersAndMetrics bg thread or a yb-admin command). These are the tables that fall into the not-of-interest/expired category. Non-eligible tables (those without a pk) that were never added to the CDC stream are not recorded in the `unqualified_table_id` list.

All tables (non-dynamic/dynamic) are first added to the qualified table list. A table is taken out of the qualified list and added to the unqualified table list only when it is removed from the stream.

Semantics of the fields are as follows:

  - The two lists (qualified & unqualified) must be mutually exclusive at all times.

  - A table that is part of the unqualified list cannot be added back to the qualified list.

  - Once a table is shifted to the unqualified table list, //eventually// all its cdc state entries will be updated to max checkpoint. Deletion of such entries is not guaranteed, as it depends on the presence of the tablet peer.
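
The list semantics above can be sketched as a small in-memory model. This is an illustrative sketch only, not the catalog manager implementation: the `StreamMetadata` class and its method names are hypothetical, and only the field names `table_id` and `unqualified_table_id` come from the stream metadata proto.

```python
from dataclasses import dataclass, field


@dataclass
class StreamMetadata:
    """Hypothetical stand-in for the CDC stream metadata proto."""
    table_id: list = field(default_factory=list)              # qualified tables
    unqualified_table_id: list = field(default_factory=list)  # removed tables

    def add_table(self, table: str) -> bool:
        """Add a table to the qualified list; refuse tables already unqualified."""
        if table in self.unqualified_table_id:
            return False  # an unqualified table never moves back
        if table not in self.table_id:
            self.table_id.append(table)
        return True

    def remove_table(self, table: str) -> None:
        """Shift a table from the qualified list to the unqualified list."""
        if table in self.table_id:
            self.table_id.remove(table)
        if table not in self.unqualified_table_id:
            self.unqualified_table_id.append(table)

    def lists_are_disjoint(self) -> bool:
        """Check that no table appears in both lists."""
        return not set(self.table_id) & set(self.unqualified_table_id)


stream = StreamMetadata()
stream.add_table("t1")
stream.add_table("t2")
stream.remove_table("t2")  # t2 is now unqualified for this stream
```

Calling `remove_table` twice for the same table is harmless in this model, which mirrors the "if present" / "if not already present" wording used for the metadata update.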

Both the qualified & unqualified table lists will be displayed in the output of the following yb-admin commands:

  - list_change_data_streams

```
/yb-admin --master_addresses 127.0.0.1:7100 list_change_data_streams
CDC Streams:
streams {
  stream_id: "b226f9a40508f0b511469a61a12321d7"
  table_id: "000033c3000030008000000000004000"
  options {
    key: "id_type"
    value: "NAMESPACEID"
  }
  ..
  ..
  namespace_id: "000033c3000030008000000000000000"
  cdcsdk_consistent_snapshot_time: 7062474127424495616
  cdcsdk_consistent_snapshot_option: USE_SNAPSHOT
  stream_creation_time: 1724236847015741
  unqualified_table_id: "000033c300003000800000000000400a"
  unqualified_table_id: "000033c3000030008000000000004005"
}
```

  - get_change_data_stream_info

```
/yb-admin --master_addresses 127.0.0.1:7100 get_change_data_stream_info b226f9a40508f0b511469a61a12321d7
CDC DB Stream Info:
table_info {
  stream_id: "b226f9a40508f0b511469a61a12321d7"
  table_id: "000033c3000030008000000000004000"
}
namespace_id: "000033c3000030008000000000000000"
unqualified_table_info {
  stream_id: "b226f9a40508f0b511469a61a12321d7"
  table_id: "000033c300003000800000000000400a"
}
unqualified_table_info {
  stream_id: "b226f9a40508f0b511469a61a12321d7"
  table_id: "000033c3000030008000000000004005"
}
```
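
For scripting against the `list_change_data_streams` output shown above, the two lists can be pulled apart with a couple of regular expressions. This is a minimal sketch that assumes the textual layout in the sample output; the helper name `split_table_ids` is hypothetical, and the pattern does not apply to the nested `unqualified_table_info` blocks printed by `get_change_data_stream_info`.

```python
import re


def split_table_ids(output: str):
    """Return (qualified, unqualified) table ids from list_change_data_streams text."""
    # Anchor at line start so "unqualified_table_id" does not match the first pattern.
    qualified = re.findall(
        r'^[ \t]*table_id: "([0-9a-f]+)"', output, flags=re.MULTILINE)
    unqualified = re.findall(
        r'^[ \t]*unqualified_table_id: "([0-9a-f]+)"', output, flags=re.MULTILINE)
    return qualified, unqualified


sample = '''streams {
  stream_id: "b226f9a40508f0b511469a61a12321d7"
  table_id: "000033c3000030008000000000004000"
  unqualified_table_id: "000033c300003000800000000000400a"
  unqualified_table_id: "000033c3000030008000000000004005"
}'''

qualified, unqualified = split_table_ids(sample)
```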

**Implementation changes:**

  - Dynamic table addition codepath: When CDC streams are loaded on a master restart/leadership change, the set of dynamic tables to be added to a CDC stream will be computed via the following modified formula that will now consider tables present in the unqualified_table_id list.
     **New set difference = tables in the namespace - qualified tables in stream metadata (`table_id`) - unqualified tables in stream metadata (`unqualified_table_id`)**. Only those tables present in the new set difference will be considered for dynamic table addition.

  - Remove table codepath: The RemoveUserTableFromCDCSDKStream RPC no longer requires dynamic table addition to be disabled for table removal.
    During modification of stream metadata, the table will now be removed from the `table_id` proto field (if present) and added to the `unqualified_table_id` field (if not already present).

  - Drop Table codepath: In addition to the drop of qualified tables, a stream will now also be considered for metadata cleanup when one of its unqualified tables is dropped. On drop of an unqualified table, its table_id will be removed from the stream's `unqualified_table_id` field and its corresponding cdc state table entries (if any) will be deleted from the cdc state table.
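
The dynamic table addition formula and the drop-table cleanup can be illustrated with plain sets and lists. This is a sketch under stated assumptions rather than the master's actual code: the function names, the short table ids, and the shape of the cdc state rows (a dict with a `table_id` key) are all hypothetical.

```python
def dynamic_table_candidates(namespace_tables, qualified, unqualified):
    """New set difference: namespace tables minus qualified minus unqualified."""
    return set(namespace_tables) - set(qualified) - set(unqualified)


def cleanup_on_drop(qualified, unqualified, cdc_state_rows, dropped):
    """Remove a dropped table from whichever list holds it and delete its state rows."""
    qualified = [t for t in qualified if t != dropped]
    unqualified = [t for t in unqualified if t != dropped]
    cdc_state_rows = [row for row in cdc_state_rows if row["table_id"] != dropped]
    return qualified, unqualified, cdc_state_rows


namespace_tables = ["t_4000", "t_400a", "t_4005", "t_new"]
qualified = ["t_4000"]
unqualified = ["t_400a", "t_4005"]

# Previous formula: removed tables look like dynamic tables and would be re-added.
old_candidates = set(namespace_tables) - set(qualified)
# New formula: only the genuinely new table is a candidate.
new_candidates = dynamic_table_candidates(namespace_tables, qualified, unqualified)

rows = [
    {"table_id": "t_400a", "tablet_id": "tablet-1"},
    {"table_id": "t_4000", "tablet_id": "tablet-2"},
]
qualified_after, unqualified_after, rows_after = cleanup_on_drop(
    qualified, unqualified, rows, "t_400a")
```

Comparing `old_candidates` with `new_candidates` shows why the earlier design needed dynamic table addition disabled: without the extra subtraction, every removed table would reappear as a candidate after a master restart/leadership change.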

**Upgrade/Rollback safety:**
A new auto flag `cdcsdk_enable_dynamic_table_addition_with_table_cleanup`, with default value true, is introduced for accessing the new `unqualified_table_id` proto field.

The following protos have been modified:
**Master**
CDCStreamInfoPB - Added a repeated field `unqualified_table_id`
SysCDCStreamEntryPB - Added a repeated field `unqualified_table_id`
GetCDCDBStreamInfoResponsePB - Added a new repeated field to hold information of unqualified table ids

**CDC Service**
GetCDCDBStreamInfoResponsePB - Added a new repeated field to hold information of unqualified table ids
Jira: DB-12498

Test Plan:
Existing ctests for user table removal
./yb_build.sh --cxx-test cdcsdk_ysql-test --gtest_filter CDCSDKYsqlTest.TestCleanupOfUnqualifiedTableOnDrop
./yb_build.sh --cxx-test cdcsdk_ysql-test --gtest_filter CDCSDKYsqlTest.TestUserTableRemovalWithDynamicTableAddition

Reviewers: xCluster, hsunder, skumar, sumukh.phalgaonkar, asrinivasan, stiwary

Reviewed By: sumukh.phalgaonkar

Subscribers: asrivastava, ybase, ycdcxcluster

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D37448