[CDCSDK] Mutex lock error in CDCStreamLoader leading to master crash loop #23278
shamanthchandra-yb added the labels priority/high (High Priority), area/cdcsdk (CDC SDK), and status/awaiting-triage (Issue awaiting triage) on Jul 24, 2024.
yugabyte-ci added the label kind/bug (This issue is a bug) and removed status/awaiting-triage (Issue awaiting triage) on Jul 24, 2024.
yugabyte-ci changed the title from "[CDCSDK] Mutex lock error in CDCStreamLoader leading to tserver crash loop" to "[CDCSDK] Mutex lock error in CDCStreamLoader leading to master crash loop" on Aug 5, 2024.
siddharth2411 added a commit that referenced this issue on Aug 5, 2024:
…hile loading CDC stream
Summary: When a table present under a CDC stream is dropped, a background thread removes it from the CDC stream metadata. Suppose that, before the background thread could clean up, there was a master restart or a master leadership change. In either scenario, while loading the CDC streams, we check all tables present in the CDC stream metadata for ineligibility. The table schema is one of the objects scanned during this check. To get the table schema, we fetch the `TableInfo` object from master. This step crashed the master: because the table had been dropped, fetching `TableInfo` returned a nullptr.
Jira: DB-12205
Test Plan: ./yb_build.sh --cxx-test cdcsdk_ysql-test --gtest_filter CDCSDKYsqlTest.TestNonEligibleTablesCleanupWhenDropTableCleanupIsDisabled
Reviewers: hsunder, asrinivasan, stiwary, skumar
Reviewed By: skumar
Subscribers: ybase
Tags: #jenkins-ready
Differential Revision: https://phorge.dev.yugabyte.com/D37053
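To illustrate the failure mode the commit describes, here is a minimal C++ sketch of an ineligibility check run while loading a stream, with a guard for a table that was dropped before the background cleanup ran. Every name here (`TableInfo`, `Catalog`, `GetTableInfo`, `FindTablesToCleanup`) is hypothetical, and `has_primary_key` is only a stand-in eligibility criterion. None of this reflects YugabyteDB's actual master code, and the real fix in D37053 may handle dropped tables differently.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the master's per-table metadata.
struct TableInfo {
  std::string table_id;
  bool has_primary_key;  // hypothetical eligibility criterion for this sketch
};

// Hypothetical catalog: GetTableInfo returns nullptr once a table is dropped.
class Catalog {
 public:
  void AddTable(const std::string& id, bool has_pk) {
    tables_[id] = std::make_shared<TableInfo>(TableInfo{id, has_pk});
  }
  void DropTable(const std::string& id) { tables_.erase(id); }
  std::shared_ptr<TableInfo> GetTableInfo(const std::string& id) const {
    auto it = tables_.find(id);
    return it == tables_.end() ? nullptr : it->second;
  }

 private:
  std::unordered_map<std::string, std::shared_ptr<TableInfo>> tables_;
};

// Returns the IDs of tables in the stream metadata that should be removed
// from the stream: tables that were dropped and tables that are ineligible.
std::vector<std::string> FindTablesToCleanup(
    const Catalog& catalog, const std::vector<std::string>& stream_table_ids) {
  std::vector<std::string> to_cleanup;
  for (const auto& id : stream_table_ids) {
    std::shared_ptr<TableInfo> table = catalog.GetTableInfo(id);
    if (table == nullptr) {
      // Dropped before the background thread cleaned it up: do not
      // dereference the null pointer; queue the table for cleanup instead.
      to_cleanup.push_back(id);
      continue;
    }
    if (!table->has_primary_key) {
      // Table still exists but fails the (hypothetical) eligibility check.
      to_cleanup.push_back(id);
    }
  }
  return to_cleanup;
}
```

With `t1` eligible, `t2` ineligible, and `t3` dropped, `FindTablesToCleanup` returns `{"t2", "t3"}` instead of dereferencing a null pointer for `t3`. Treating a missing `TableInfo` as "needs cleanup" rather than as a fatal error is one plausible design; skipping the table silently would also avoid the crash but would leave stale entries in the stream metadata.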
siddharth2411 added a commit that referenced this issue on Aug 6, 2024:
…with drop table while loading CDC stream
Summary:
**Backport Description:** Faced minor merge conflicts because some code has been refactored in the latest master.
**Original Description:**
Original commit: 64e1bf8 / D37053
When a table present under a CDC stream is dropped, a background thread removes it from the CDC stream metadata. Suppose that, before the background thread could clean up, there was a master restart or a master leadership change. In either scenario, while loading the CDC streams, we check all tables present in the CDC stream metadata for ineligibility. The table schema is one of the objects scanned during this check. To get the table schema, we fetch the `TableInfo` object from master. This step crashed the master: because the table had been dropped, fetching `TableInfo` returned a nullptr.
Jira: DB-12205
Test Plan: ./yb_build.sh --cxx-test cdcsdk_ysql-test --gtest_filter CDCSDKYsqlTest.TestNonEligibleTablesCleanupWhenDropTableCleanupIsDisabled
Reviewers: asrinivasan, stiwary, skumar
Reviewed By: stiwary
Subscribers: ybase
Tags: #jenkins-ready
Differential Revision: https://phorge.dev.yugabyte.com/D37067
jasonyb pushed a commit that referenced this issue on Aug 7, 2024:
Summary:
50931bf [#23273] yugabyted: Fix `yugabyted configure_read_replica` commands.
64e1bf8 [#23278] CDCSDK: Handle non-eligible tables cleanup with drop table while loading CDC stream
ce80f7a [#13358] YSQL: Refactor DDL Atomicity Stress Test
Excluded:
6d40d27 [#23407] YSQL: clean up compound BNL logic
5cb74a7 [PLAT-14164] New Alert for clock drift
f39c76c [PLAT-14800] Fix yb.allow_db_version_more_than_yba_version being insufficient for YBA/DB version checks
a42549e [#23377] DocDB: Implement the way to apply vector index updates to DocDB
3923ec5 [PLAT-14749][Platform]Add a warning message to image upgrade dialog
709cd92 [PLAT-14848] postgres.service file did not have RestartSec filled out
da10672 [#23069] docdb: implemented per-iterator readahead for sequential reads
f439c8a [PLAT-14852]: Do not raise error when JWT_JWKS_URL has valid value and JWT has empty keyset
Test Plan: Jenkins: rebase: pg15-cherrypicks
Reviewers: jason, tfoucher
Tags: #jenkins-ready
Differential Revision: https://phorge.dev.yugabyte.com/D37095
siddharth2411 added a commit that referenced this issue on Aug 8, 2024:
…with drop table while loading CDC stream
Summary:
**Backport description:** Minor merge conflicts in the test's base class because of a missing flag.
**Original description:**
Original commit: 64e1bf8 / D37053
When a table present under a CDC stream is dropped, a background thread removes it from the CDC stream metadata. Suppose that, before the background thread could clean up, there was a master restart or a master leadership change. In either scenario, while loading the CDC streams, we check all tables present in the CDC stream metadata for ineligibility. The table schema is one of the objects scanned during this check. To get the table schema, we fetch the `TableInfo` object from master. This step crashed the master: because the table had been dropped, fetching `TableInfo` returned a nullptr.
Jira: DB-12205
Test Plan: ./yb_build.sh --cxx-test cdcsdk_ysql-test --gtest_filter CDCSDKYsqlTest.TestNonEligibleTablesCleanupWhenDropTableCleanupIsDisabled
Reviewers: asrinivasan, stiwary, skumar
Reviewed By: stiwary
Subscribers: ybase
Tags: #jenkins-ready
Differential Revision: https://phorge.dev.yugabyte.com/D37090
siddharth2411 added a commit that referenced this issue on Aug 14, 2024:
…th drop table while loading CDC stream
Summary:
**Backport description:** Faced minor merge conflicts because some code has been refactored in the latest master.
**Original description:**
Original commit: 64e1bf8 / D37053
When a table present under a CDC stream is dropped, a background thread removes it from the CDC stream metadata. Suppose that, before the background thread could clean up, there was a master restart or a master leadership change. In either scenario, while loading the CDC streams, we check all tables present in the CDC stream metadata for ineligibility. The table schema is one of the objects scanned during this check. To get the table schema, we fetch the `TableInfo` object from master. This step crashed the master: because the table had been dropped, fetching `TableInfo` returned a nullptr.
Jira: DB-12205
Test Plan: ./yb_build.sh --cxx-test cdcsdk_ysql-test --gtest_filter CDCSDKYsqlTest.TestNonEligibleTablesCleanupWhenDropTableCleanupIsDisabled
Reviewers: asrinivasan, stiwary, skumar
Reviewed By: stiwary
Subscribers: ybase
Tags: #jenkins-ready
Differential Revision: https://phorge.dev.yugabyte.com/D37091
Jira Link: DB-12205
Description
Test name: test_cdc_main_without_tablet_splitting
We suspect this could be related to the changes in #22773.
The Slack thread and the stress-run link are in the Jira ticket.
Source connector version: 1.9.5.y.220.4
Connector configuration
YugabyteDB version: 2.23.0.0-b625
Issue Type: kind/bug