[YSQL] Deadlock found via DDL atomicity stress test #22882

Closed
1 task done
myang2021 opened this issue Jun 14, 2024 · 0 comments
Labels: 2024.1 Backport Required, area/ysql Yugabyte SQL (YSQL), kind/bug This issue is a bug, priority/medium Medium priority issue

myang2021 commented Jun 14, 2024

Jira Link: DB-11784

Description

Fix deadlock in DDL atomicity

While working on diff https://phorge.dev.yugabyte.com/D32682, I found a deadlock
that caused a test to be flaky. The deadlock involves three threads.

Thread A is executing `yb::RWCLock::UpgradeToCommitLock`:

```
void RWCLock::UpgradeToCommitLock() {
  lock_.lock();
  DCHECK(write_locked_);
#ifndef NDEBUG
  bool first_wait = true;
  while (reader_count_ > 0) {
    if (!no_readers_.TimedWait(first_wait ? kFirstWait : kSecondWait)) {
      std::ostringstream ss;
      ss << "Too long commit lock wait, num readers: " << reader_count_
         << ", current thread stack: " << GetStackTrace();
```

Thread A blocks because `reader_count_ > 0` while it holds the write lock:

```
  DCHECK(write_locked_);
```

Thread A is executing the workflow of an ALTER TABLE statement with the call chain:

```
yb::RWCLock::UpgradeToCommitLock
yb::CowObject<yb::master::PersistentTableInfo>::CommitMutation
yb::CowWriteLock<yb::master::PersistentTableInfo>::Commit
yb::master::CatalogManager::ClearYsqlDdlTxnState
yb::master::CatalogManager::HandleSuccessfulYsqlDdlTxn
yb::master::CatalogManager::YsqlDdlTxnCompleteCallbackInternal
```

From the test log, I see that the thread (Thread B) that caused `reader_count_ > 0` is
calling `GetTableSchemaInternal`:

Thread B:

```
Status CatalogManager::GetTableSchemaInternal(const GetTableSchemaRequestPB* req,
                                              GetTableSchemaResponsePB* resp,
                                              bool get_fully_applied_indexes) {
  VLOG(1) << "Servicing GetTableSchema request for " << req->ShortDebugString();

  // Lookup the table and verify if it exists.
  TRACE("Looking up table");
  scoped_refptr<TableInfo> table = VERIFY_RESULT(FindTable(req->table()));

  TRACE("Locking table");
  auto l = table->LockForRead();

......
  auto nsinfo = FindNamespaceById(table->namespace_id());
```

Thread B causes `reader_count_ > 0` via `table->LockForRead()`, and it is blocked
inside `FindNamespaceById` waiting for a shared lock on `mutex_` of `CatalogManager`:

```
Result<scoped_refptr<NamespaceInfo>> CatalogManager::FindNamespaceById(
    const NamespaceId& id) const {
  SharedLock lock(mutex_);
  return FindNamespaceByIdUnlocked(id);
}
```

The `mutex_` is held exclusively by Thread C in `CatalogManager::CreateTable`:

```
  {
    LockGuard lock(mutex_);
    auto ns_lock = ns->LockForRead();
......
        if (IsIndex(req)) {
          TRACE("Locking indexed table");
          indexed_table_write_lock = indexed_table->LockForWrite();
        }
......
  }
```

Note that Thread C exclusively holds `mutex_` via `LockGuard` and is blocked in
`LockForWrite`, which calls `RWCLock::WriteLock` and cannot proceed because
`write_locked_` is set (owned by Thread A):

```
void RWCLock::WriteLock() {
  ThreadRestrictions::AssertWaitAllowed();

  MutexLock l(lock_);
  // Wait for any other mutations to finish.
#ifndef NDEBUG
  bool first_wait = true;
  while (write_locked_) {
    if (!no_mutators_.TimedWait(first_wait ? kFirstWait : kSecondWait)) {
```

In summary, we have a deadlock cycle:

- Thread A (holds `write_locked_`) waits for Thread B to release `reader_count_`.
- Thread B (holds `reader_count_`) waits for Thread C to release `mutex_`.
- Thread C (holds `mutex_`) waits for Thread A to release `write_locked_`.
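
For illustration, here is a minimal, self-contained model of the same wait-for cycle using only standard C++ primitives. The names (`table_lock`, `catalog_mutex`, `no_readers`, `no_mutators`) are simplified stand-ins for the `RWCLock` and `CatalogManager` members; this is a sketch of the cycle, not YugabyteDB code, and running it is expected to hang with exactly the A -> B -> C -> A dependency described above.

```
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <shared_mutex>
#include <thread>

using namespace std::chrono_literals;

std::mutex table_lock;                 // models RWCLock::lock_
std::condition_variable no_readers;    // models RWCLock::no_readers_
std::condition_variable no_mutators;   // models RWCLock::no_mutators_
int reader_count = 0;                  // models RWCLock::reader_count_
bool write_locked = false;             // models RWCLock::write_locked_
std::shared_mutex catalog_mutex;       // models CatalogManager::mutex_

void ThreadA() {  // ALTER TABLE: WriteLock() already taken, now UpgradeToCommitLock()
  {
    std::lock_guard<std::mutex> l(table_lock);
    write_locked = true;               // holds the write side
  }
  std::this_thread::sleep_for(100ms);  // let Thread B take its read lock first
  std::unique_lock<std::mutex> l(table_lock);
  no_readers.wait(l, [] { return reader_count == 0; });   // waits for Thread B
}

void ThreadB() {  // GetTableSchemaInternal: LockForRead(), then FindNamespaceById()
  {
    std::lock_guard<std::mutex> l(table_lock);
    ++reader_count;                    // table->LockForRead()
  }
  std::this_thread::sleep_for(200ms);  // let Thread C take catalog_mutex first
  std::shared_lock<std::shared_mutex> ns(catalog_mutex);  // waits for Thread C
  {
    std::lock_guard<std::mutex> l(table_lock);
    --reader_count;                    // never reached
  }
  no_readers.notify_all();
}

void ThreadC() {  // CreateTable: LockGuard(mutex_), then indexed_table->LockForWrite()
  std::this_thread::sleep_for(150ms);
  std::unique_lock<std::shared_mutex> g(catalog_mutex);   // LockGuard lock(mutex_)
  std::unique_lock<std::mutex> l(table_lock);
  no_mutators.wait(l, [] { return !write_locked; });      // waits for Thread A
}

int main() {
  std::thread a(ThreadA), b(ThreadB), c(ThreadC);
  a.join(); b.join(); c.join();        // never returns: A -> B -> C -> A cycle
}
```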

Issue Type

kind/bug

Warning: Please confirm that this issue does not contain any sensitive information

  • I confirm this issue does not contain any sensitive information.
@myang2021 myang2021 added the area/ysql Yugabyte SQL (YSQL) label Jun 14, 2024
@myang2021 myang2021 self-assigned this Jun 14, 2024
@yugabyte-ci yugabyte-ci added kind/bug This issue is a bug priority/medium Medium priority issue labels Jun 14, 2024
myang2021 added a commit that referenced this issue Jun 18, 2024
Summary:
While working on https://phorge.dev.yugabyte.com/D32682, the test
PgDdlAtomicityColocatedStressTest.TestFailDdlRollback showed logs like the following:

```
[m-1] W0614 20:35:35.849383 26140 ysql_ddl_handler.cc:551] YsqlDdlTxnCompleteCallback failed: Corruption (yb/util/uuid.cc:312): Invalid length of binary data with TransactionId '': 0 (expected 16)
```

The relevant code is:
```
      const string pb_txn_id = table->LockForRead()->pb_transaction_id();
      return background_tasks_thread_pool_->SubmitFunc(
        [this, pb_txn_id, is_committed, epoch]() {
            WARN_NOT_OK(YsqlDdlTxnCompleteCallback(pb_txn_id, is_committed, epoch),
                        "YsqlDdlTxnCompleteCallback failed");
        }
      );
```

and
```
Status CatalogManager::YsqlDdlTxnCompleteCallback(const string& pb_txn_id,
                                                  bool is_committed,
                                                  const LeaderEpoch& epoch) {

  SleepFor(MonoDelta::FromMicroseconds(RandomUniformInt<int>(0,
    FLAGS_TEST_ysql_max_random_delay_before_ddl_verification_usecs)));

  auto txn = VERIFY_RESULT(FullyDecodeTransactionId(pb_txn_id));

```

It is possible that the `pb_txn_id` of `table` has already been cleared, in which case
`pb_txn_id` is an empty string. The error is reported when the empty string is
passed to `FullyDecodeTransactionId`.

I made a fix to not call `YsqlDdlTxnCompleteCallback` if all the tables in the
DDL transaction verifier state have had their `pb_txn_id` cleared. In other
words, we only call `YsqlDdlTxnCompleteCallback` when we can find a table
in the DDL transaction verifier state that still has a `pb_txn_id`, and that
`pb_txn_id` is the transaction id we pass to `YsqlDdlTxnCompleteCallback` to process
all the tables in this DDL transaction.
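
The following is a minimal sketch of that guard (illustrative only, not the literal patch; `tables_in_txn` is a hypothetical name for the collection of tables tracked by the verifier state):

```
// Illustrative sketch only: skip scheduling the callback when no table tracked
// by the DDL verifier state still carries a transaction id.
string pb_txn_id;
for (const auto& table : tables_in_txn) {
  auto txn_id = table->LockForRead()->pb_transaction_id();
  if (!txn_id.empty()) {
    pb_txn_id = txn_id;  // found a table whose pb_txn_id is not yet cleared
    break;
  }
}
if (pb_txn_id.empty()) {
  // Every table's pb_txn_id has already been cleared; there is nothing to verify.
  return Status::OK();
}
return background_tasks_thread_pool_->SubmitFunc(
    [this, pb_txn_id, is_committed, epoch]() {
      WARN_NOT_OK(YsqlDdlTxnCompleteCallback(pb_txn_id, is_committed, epoch),
                  "YsqlDdlTxnCompleteCallback failed");
    });
```
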
Jira: DB-11784

Test Plan:
Tested in the context of https://phorge.dev.yugabyte.com/D32682.

./yb_build.sh --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityColocatedStressTest.TestFailDdlRollback -n 20 --tp 2

Reviewers: fizaa

Reviewed By: fizaa

Subscribers: ybase, yql

Differential Revision: https://phorge.dev.yugabyte.com/D35879
karthik-ramanathan-3006 pushed a commit to karthik-ramanathan-3006/yugabyte-db that referenced this issue Jun 24, 2024
Summary:
While working on https://phorge.dev.yugabyte.com/D32682, the test
PgDdlAtomicityColocatedStressTest.TestFailDdlRollback showed logs like the following:

```
[m-1] W0614 20:35:35.849383 26140 ysql_ddl_handler.cc:551] YsqlDdlTxnCompleteCallback failed: Corruption (yb/util/uuid.cc:312): Invalid length of binary data with TransactionId '': 0 (expected 16)
```

The relevant code is:
```
      const string pb_txn_id = table->LockForRead()->pb_transaction_id();
      return background_tasks_thread_pool_->SubmitFunc(
        [this, pb_txn_id, is_committed, epoch]() {
            WARN_NOT_OK(YsqlDdlTxnCompleteCallback(pb_txn_id, is_committed, epoch),
                        "YsqlDdlTxnCompleteCallback failed");
        }
      );
```

and
```
Status CatalogManager::YsqlDdlTxnCompleteCallback(const string& pb_txn_id,
                                                  bool is_committed,
                                                  const LeaderEpoch& epoch) {

  SleepFor(MonoDelta::FromMicroseconds(RandomUniformInt<int>(0,
    FLAGS_TEST_ysql_max_random_delay_before_ddl_verification_usecs)));

  auto txn = VERIFY_RESULT(FullyDecodeTransactionId(pb_txn_id));

```

It is possible that the `pb_txn_id` of `table` has already been cleared, in which case
`pb_txn_id` is an empty string. The error is reported when the empty string is
passed to `FullyDecodeTransactionId`.

I made a fix to not call `YsqlDdlTxnCompleteCallback` if all the tables in the
DDL transaction verifier state have had their `pb_txn_id` cleared. In other
words, we only call `YsqlDdlTxnCompleteCallback` when we can find a table
in the DDL transaction verifier state that still has a `pb_txn_id`, and that
`pb_txn_id` is the transaction id we pass to `YsqlDdlTxnCompleteCallback` to process
all the tables in this DDL transaction.
Jira: DB-11784

Test Plan:
Tested in the context of https://phorge.dev.yugabyte.com/D32682.

./yb_build.sh --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityColocatedStressTest.TestFailDdlRollback -n 20 --tp 2

Reviewers: fizaa

Reviewed By: fizaa

Subscribers: ybase, yql

Differential Revision: https://phorge.dev.yugabyte.com/D35879
myang2021 added a commit that referenced this issue Jun 26, 2024
Summary:
While working on diff https://phorge.dev.yugabyte.com/D32682, I found a deadlock
that caused a test to be flaky. The deadlock involves three threads.

Thread A is executing yb::RWCLock::UpgradeToCommitLock:
```
void RWCLock::UpgradeToCommitLock() {
  lock_.lock();
  DCHECK(write_locked_);
#ifndef NDEBUG
  bool first_wait = true;
  while (reader_count_ > 0) {
    if (!no_readers_.TimedWait(first_wait ? kFirstWait : kSecondWait)) {
      std::ostringstream ss;
      ss << "Too long commit lock wait, num readers: " << reader_count_
         << ", current thread stack: " << GetStackTrace();
```

Thread A blocks because `reader_count_ > 0`, while it holds the write lock:

```
  DCHECK(write_locked_);
```

Thread A is executing the workflow of an ALTER TABLE statement with the call chain:

```
yb::RWCLock::UpgradeToCommitLock
yb::CowObject<yb::master::PersistentTableInfo>::CommitMutation
yb::CowWriteLock<yb::master::PersistentTableInfo>::Commit
yb::master::CatalogManager::ClearYsqlDdlTxnState
yb::master::CatalogManager::HandleSuccessfulYsqlDdlTxn
yb::master::CatalogManager::YsqlDdlTxnCompleteCallbackInternal

```
From the test log, I see that the thread (Thread B) that caused `reader_count_ > 0` is
calling `GetTableSchemaInternal`:

Thread B:

```
Status CatalogManager::GetTableSchemaInternal(const GetTableSchemaRequestPB* req,
                                              GetTableSchemaResponsePB* resp,
                                              bool get_fully_applied_indexes) {
  VLOG(1) << "Servicing GetTableSchema request for " << req->ShortDebugString();

  // Lookup the table and verify if it exists.
  TRACE("Looking up table");
  scoped_refptr<TableInfo> table = VERIFY_RESULT(FindTable(req->table()));

  TRACE("Locking table");
  auto l = table->LockForRead();

......
  auto nsinfo = FindNamespaceById(table->namespace_id());
```

Thread B causes `reader_count_ > 0` via `table->LockForRead()` and it is blocked
inside `FindNamespaceById` in order to get a shared lock on `mutex_` of `CatalogManager`:

```
Result<scoped_refptr<NamespaceInfo>> CatalogManager::FindNamespaceById(
    const NamespaceId& id) const {
  SharedLock lock(mutex_);
  return FindNamespaceByIdUnlocked(id);
}
```

The `mutex_` is held exclusively by Thread C from `CatalogManager::CreateTable`:
```
  {
    LockGuard lock(mutex_);
    auto ns_lock = ns->LockForRead();
......
        if (IsIndex(req)) {
          TRACE("Locking indexed table");
          indexed_table_write_lock = indexed_table->LockForWrite();
        }
......
  }
```

Note that Thread C exclusively holds `mutex_` via `LockGuard` and is blocked in
`LockForWrite`, which calls `RWCLock::WriteLock` and cannot proceed because
`write_locked_` is set (owned by Thread A).

```
void RWCLock::WriteLock() {
  ThreadRestrictions::AssertWaitAllowed();

  MutexLock l(lock_);
  // Wait for any other mutations to finish.
#ifndef NDEBUG
  bool first_wait = true;
  while (write_locked_) {
    if (!no_mutators_.TimedWait(first_wait ? kFirstWait : kSecondWait)) {
```

In summary, we have a deadlock cycle:
Thread A (holds write_locked_) waits for Thread B to release reader_count_
Thread B (holds reader_count_) waits for Thread C to release mutex_
Thread C (holds mutex_) waits for Thread A to release write_locked_

To break the deadlock cycle, I made changes in Thread C so that it releases
the exclusive lock on `mutex_` before calling `LockForWrite`. In addition,
it does not need to re-acquire `mutex_` after `LockForWrite`, because the rest
of the code in the relevant scope does not need to be protected by an exclusive
lock on `mutex_`.
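
A sketch of the principle behind this change, using standard primitives rather than the actual `CatalogManager` types (an assumption-laden illustration, not the literal patch to `CatalogManager::CreateTable`):

```
// Take the broad catalog mutex only for the lookups that need it, release it,
// and only then block on the per-table write lock.  This removes the
// "holds mutex_, waits for write_locked_" edge of the deadlock cycle.
#include <mutex>
#include <shared_mutex>

std::shared_mutex catalog_mutex;  // models CatalogManager::mutex_
std::mutex table_write_lock;      // simplified stand-in for the table's write side

void CreateTableSketch() {
  {
    std::unique_lock<std::shared_mutex> guard(catalog_mutex);  // LockGuard lock(mutex_)
    // ... namespace checks and table lookups that genuinely need mutex_ ...
  }  // catalog_mutex released before any potentially long wait

  // indexed_table->LockForWrite() equivalent: this may still wait on a concurrent
  // ALTER TABLE, but catalog readers are no longer blocked behind this thread,
  // so the committing thread can make progress and eventually release the lock.
  std::unique_lock<std::mutex> write_guard(table_write_lock);
  // ... proceed with the index setup that needs the table write lock ...
}
```
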
Jira: DB-11784

Test Plan: Tested in the context of https://phorge.dev.yugabyte.com/D32682.

Reviewers: hsunder, asrivastava, zdrudi

Reviewed By: asrivastava, zdrudi

Subscribers: zdrudi, ybase, yql

Differential Revision: https://phorge.dev.yugabyte.com/D35843
jasonyb pushed a commit that referenced this issue Jun 27, 2024
Summary:
 411a32e [DB-11813] Rename ysql_conn_mgr_idle_or_pending_clients metric name
 c2e13ef [#15682] YSQL: Fix stack_is_too_deep function in ASAN
 ef31455 [PLAT-14188] Fixing upgrade disk availability check
 db6b1b7 [#23004] YCQL: Fix tserver crash due to NULL pointer dereference
 0ada80a [PLAT-14433] Use correct kubeconfig for edit provider validation
 eccbc10 [PLAT-14414] Enable Kubernetes provider validation by default
 199f679 [PLAT-14324]: Move all node agent based flags from BETA to INTERNAL in Provider Conf keys file
 86a865d [PLAT-14443] YBA Installer wait for ready time configurable.
 ac184a8 [#22882] YSQL: Fix deadlock in DDL atomicity
 a4218fb [Docs] Sort feature to tables (Where fulfills the criteria) (#22836)
 2f267ca [#22996] xCluster: Add SOURCE_UNREACHABLE and SYSTEM_ERROR enums

Skipped due to conflict:
dee7691 [#21534] docdb: Set owner correctly for cloned databases
34632ba [PLAT-14495] Set up the node_exporter for ynp
7c99ff9 [#22876][#22773] CDCSDK: Add new yb-admin command to remove user table from CDCSDK stream
4e9a81c [#22876][#22835][#22773] CDCSDK: Remove non-eligible tables for CDC from existing CDCSDK stream
f2e574e [#23013] xClusterDDLRepl: Allow table_ids for GetXClusterStreams

Test Plan: Jenkins: rebase: pg15-cherrypicks

Reviewers: tfoucher, sanketh, jenkins-bot

Subscribers: yql

Differential Revision: https://phorge.dev.yugabyte.com/D36184
myang2021 added a commit that referenced this issue Jun 27, 2024
Summary:
While working on https://phorge.dev.yugabyte.com/D32682, the test
PgDdlAtomicityColocatedStressTest.TestFailDdlRollback showed logs like the following:

```
[m-1] W0614 20:35:35.849383 26140 ysql_ddl_handler.cc:551] YsqlDdlTxnCompleteCallback failed: Corruption (yb/util/uuid.cc:312): Invalid length of binary data with TransactionId '': 0 (expected 16)
```

The relevant code is:
```
      const string pb_txn_id = table->LockForRead()->pb_transaction_id();
      return background_tasks_thread_pool_->SubmitFunc(
        [this, pb_txn_id, is_committed, epoch]() {
            WARN_NOT_OK(YsqlDdlTxnCompleteCallback(pb_txn_id, is_committed, epoch),
                        "YsqlDdlTxnCompleteCallback failed");
        }
      );
```

and
```
Status CatalogManager::YsqlDdlTxnCompleteCallback(const string& pb_txn_id,
                                                  bool is_committed,
                                                  const LeaderEpoch& epoch) {

  SleepFor(MonoDelta::FromMicroseconds(RandomUniformInt<int>(0,
    FLAGS_TEST_ysql_max_random_delay_before_ddl_verification_usecs)));

  auto txn = VERIFY_RESULT(FullyDecodeTransactionId(pb_txn_id));

```

It is possible that the `pb_txn_id` of `table` has already been cleared, in which case
`pb_txn_id` is an empty string. The error is reported when the empty string is
passed to `FullyDecodeTransactionId`.

I made a fix to not call `YsqlDdlTxnCompleteCallback` if all the tables in the
DDL transaction verifier state have had their `pb_txn_id` cleared. In other
words, we only call `YsqlDdlTxnCompleteCallback` when we can find a table
in the DDL transaction verifier state that still has a `pb_txn_id`, and that
`pb_txn_id` is the transaction id we pass to `YsqlDdlTxnCompleteCallback` to process
all the tables in this DDL transaction.
Jira: DB-11784

Original commit: ec88a5d / D35879

Test Plan:
Tested in the context of https://phorge.dev.yugabyte.com/D32682.

./yb_build.sh --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityColocatedStressTest.TestFailDdlRollback -n 20 --tp 2

Reviewers: fizaa

Reviewed By: fizaa

Subscribers: yql, ybase

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D36192
myang2021 added a commit that referenced this issue Jun 28, 2024
Summary:
While working on diff https://phorge.dev.yugabyte.com/D32682, I found a deadlock
that caused a test to be flaky. The deadlock involves three threads.

Thread A is executing yb::RWCLock::UpgradeToCommitLock:
```
void RWCLock::UpgradeToCommitLock() {
  lock_.lock();
  DCHECK(write_locked_);
  bool first_wait = true;
  while (reader_count_ > 0) {
    if (!no_readers_.TimedWait(first_wait ? kFirstWait : kSecondWait)) {
      std::ostringstream ss;
      ss << "Too long commit lock wait, num readers: " << reader_count_
         << ", current thread stack: " << GetStackTrace();
```

Thread A blocks because `reader_count_ > 0`, while it holds the write lock:

```
  DCHECK(write_locked_);
```

Thread A is executing the workflow of an ALTER TABLE statement with the call chain:

```
yb::RWCLock::UpgradeToCommitLock
yb::CowObject<yb::master::PersistentTableInfo>::CommitMutation
yb::CowWriteLock<yb::master::PersistentTableInfo>::Commit
yb::master::CatalogManager::ClearYsqlDdlTxnState
yb::master::CatalogManager::HandleSuccessfulYsqlDdlTxn
yb::master::CatalogManager::YsqlDdlTxnCompleteCallbackInternal

```
From the test log, I see that the thread (Thread B) that caused `reader_count_ > 0` is
calling `GetTableSchemaInternal`:

Thread B:

```
Status CatalogManager::GetTableSchemaInternal(const GetTableSchemaRequestPB* req,
                                              GetTableSchemaResponsePB* resp,
                                              bool get_fully_applied_indexes) {
  VLOG(1) << "Servicing GetTableSchema request for " << req->ShortDebugString();

  // Lookup the table and verify if it exists.
  TRACE("Looking up table");
  scoped_refptr<TableInfo> table = VERIFY_RESULT(FindTable(req->table()));

  TRACE("Locking table");
  auto l = table->LockForRead();

......
  auto nsinfo = FindNamespaceById(table->namespace_id());
```

Thread B causes `reader_count_ > 0` via `table->LockForRead()` and it is blocked
inside `FindNamespaceById` in order to get a shared lock on `mutex_` of `CatalogManager`:

```
Result<scoped_refptr<NamespaceInfo>> CatalogManager::FindNamespaceById(
    const NamespaceId& id) const {
  SharedLock lock(mutex_);
  return FindNamespaceByIdUnlocked(id);
}
```

The `mutex_` is held exclusively by Thread C from `CatalogManager::CreateTable`:
```
  {
    LockGuard lock(mutex_);
    auto ns_lock = ns->LockForRead();
......
        if (IsIndex(req)) {
          TRACE("Locking indexed table");
          indexed_table_write_lock = indexed_table->LockForWrite();
        }
......
  }
```

Note that Thread C exclusively holds `mutex_` via `LockGuard` and is blocked in
`LockForWrite`, which calls `RWCLock::WriteLock` and cannot proceed because
`write_locked_` is set (owned by Thread A).

```
void RWCLock::WriteLock() {
  ThreadRestrictions::AssertWaitAllowed();

  MutexLock l(lock_);
  // Wait for any other mutations to finish.
  bool first_wait = true;
  while (write_locked_) {
    if (!no_mutators_.TimedWait(first_wait ? kFirstWait : kSecondWait)) {
```

In summary, we have a deadlock cycle:
Thread A (holds write_locked_) waits for Thread B to release reader_count_
Thread B (holds reader_count_) waits for Thread C to release mutex_
Thread C (holds mutex_) waits for Thread A to release write_locked_

To break the deadlock cycle, I made changes in Thread C so that it releases
the exclusive lock on `mutex_` before calling `LockForWrite`. In addition,
it does not need to re-acquire `mutex_` after `LockForWrite`, because the rest
of the code in the relevant scope does not need to be protected by an exclusive
lock on `mutex_`.
Jira: DB-11784

Original commit: ac184a8 / D35843

Test Plan: Tested in the context of https://phorge.dev.yugabyte.com/D32682.

Reviewers: hsunder, asrivastava, zdrudi

Reviewed By: asrivastava

Subscribers: yql, ybase, zdrudi

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D36229