Skip to content

Commit

Permalink
[BACKPORT 2024.1][#23923] YSQL: Fix DDL atomicity check failure
Browse files Browse the repository at this point in the history
Summary:
The unit test Colocation/YbAdminSnapshotScheduleTestWithYsqlParam.PgsqlAddUniqueConstraint/1
in asan build fails a lot due to a newly added assertion by commit
a6981f2 like:

```
F20240916 20:59:08 ../../src/yb/master/catalog_entity_info.cc:531] Check failed: !l->is_being_created_by_ysql_ddl_txn()
```

After debugging, I found that the assertion failure is related to a DDL statement
that does alter table add a unique constraint. It creates an index for
the base table. The DDL transaction involves two tables: the base table
and the index. If we compare the base table's schema to decide whether
the PG side of the DDL transaction has committed or aborted, we will
find that the base table's schema does not change. So it is not possible
to use base table's schema to make a decision of the transaction's
commit/abort state. In this case we must use the index for schema
comparison because it is newly created.

The bug is that the base table's schema is used for schema comparison
and we returned std::nullopt to signify that we do not know whether the
DDL transaction is committed or aborted. However, the std::nullopt is
not only used by the base table, but also used by the index to call
`TableInfo::IsBeingDroppedDueToDdlTxn` because there are two tables in
the DDL transaction and we just loop through them using the same
std::nullopt. For the index, it is being created so the first DCHECK
below fails:

```
  if (!txn_success.has_value()) {
    // This means the DDL txn only increments the schema version and does not change
    // the DocDB schema at all. It cannot be one of the following 2 cases.
    DCHECK(!l->is_being_created_by_ysql_ddl_txn());
    DCHECK(!l->is_being_deleted_by_ysql_ddl_txn());
    return false;
  }
```

I fixed the DCHECK failure by changing the API of `TableInfo::IsBeingDroppedDueToDdlTxn` to
not use std::optional<bool> for its argument `txn_success`. The function used to
take a simple `bool txn_success` as argument. I get it back and only call the
function when the argument isn't `std::nullopt`. In other words, I now only call
the function when the caller has already figured out the DDL transaction's
commit/abort status.

Added a new unit test that would fail the assertion before the fix.

NOTE: this diff only fixes the DCHECK failure so that we are not worse than before.
However we are not better than before either. There is still a flaw with the current
DDL atomicity work flow based upon schema comparison when the DDL transaction
involves multiple tables. Only the first table's schema is used for comparison
on the grounds that we only need to compare one table's schema against PG catalog
to determine the DDL transaction's committed/aborted status. While this assumption
is generally true, it fails when the first table's schema does not change and only involves
a schema version increment. Currently it is not straightforward to revise the work flow
to pick the right table which can be used to tell the DDL transaction's status based upon
schema comparison. Therefore if the PG side has aborted the DDL transaction, schema
comparison based method will conclude that there is no schema change of the base table
and therefore it simply clears the DDL verification state for **both the base table and the index**,
which is equivalent to a successful commit. This isn't right and the DocDB index will be left
as garbage that isn't cleaned up. This issue is tracked separately by #23988
Jira: DB-12825

Original commit: 8d228a8 / D38107

Test Plan:
./yb_build.sh debug --cxx-test pg_ddl_atomicity-test --gtest_filter PgDdlAtomicityTest.TestAlterTableAddUniqueConstraint -n 10

./yb_build.sh asan --cxx-test tools_yb-admin-snapshot-schedule-test --gtest_filter Colocation/YbAdminSnapshotScheduleTestWithYsqlParam.PgsqlAddUniqueConstraint/1 --clang17 -n 10 --tp 1

Reviewers: fizaa

Reviewed By: fizaa

Subscribers: yql, ybase

Differential Revision: https://phorge.dev.yugabyte.com/D38224
  • Loading branch information
myang2021 committed Sep 20, 2024
1 parent 793356b commit 5298877
Show file tree
Hide file tree
Showing 4 changed files with 34 additions and 16 deletions.
14 changes: 3 additions & 11 deletions src/yb/master/catalog_entity_info.cc
Original file line number Diff line number Diff line change
Expand Up @@ -518,24 +518,16 @@ TableType TableInfo::GetTableType() const {
return LockForRead()->pb.table_type();
}

bool TableInfo::IsBeingDroppedDueToDdlTxn(
const std::string& pb_txn_id, std::optional<bool> txn_success) const {
bool TableInfo::IsBeingDroppedDueToDdlTxn(const std::string& pb_txn_id, bool txn_success) const {
auto l = LockForRead();
if (l->pb_transaction_id() != pb_txn_id) {
return false;
}
if (!txn_success.has_value()) {
// This means the DDL txn only increments the schema version and does not change
// the DocDB schema at all. It cannot be one of the following 2 cases.
DCHECK(!l->is_being_created_by_ysql_ddl_txn());
DCHECK(!l->is_being_deleted_by_ysql_ddl_txn());
return false;
}
// The table can be dropped in 2 cases due to a DDL:
// 1. This table was created by a transaction that subsequently aborted.
// 2. This is a successful transaction that DROPs the table.
return (l->is_being_created_by_ysql_ddl_txn() && !*txn_success) ||
(l->is_being_deleted_by_ysql_ddl_txn() && *txn_success);
return (l->is_being_created_by_ysql_ddl_txn() && !txn_success) ||
(l->is_being_deleted_by_ysql_ddl_txn() && txn_success);
}

void TableInfo::AddTablet(const TabletInfoPtr& tablet) {
Expand Down
3 changes: 1 addition & 2 deletions src/yb/master/catalog_entity_info.h
Original file line number Diff line number Diff line change
Expand Up @@ -608,8 +608,7 @@ class TableInfo : public RefCountedThreadSafe<TableInfo>,
return GetTableType() == REDIS_TABLE_TYPE;
}

bool IsBeingDroppedDueToDdlTxn(
const std::string& txn_id_pb, std::optional<bool> txn_success) const;
bool IsBeingDroppedDueToDdlTxn(const std::string& txn_id_pb, bool txn_success) const;

// Add a tablet to this table.
void AddTablet(const TabletInfoPtr& tablet);
Expand Down
6 changes: 3 additions & 3 deletions src/yb/master/ysql_ddl_handler.cc
Original file line number Diff line number Diff line change
Expand Up @@ -255,13 +255,13 @@ Status CatalogManager::YsqlDdlTxnCompleteCallback(const string& pb_txn_id,
<< " and is no longer bound by txn " << pb_txn_id;
continue;
}
if (table->is_index()) {
if (table->is_index() && is_committed.has_value()) {
// This is an index. If the indexed table is being deleted or marked for deletion, then skip
// doing anything as the deletion of the table will delete this index.
const auto& indexed_table_id = table->indexed_table_id();
auto indexed_table = VERIFY_RESULT(FindTableById(indexed_table_id));
if (table->IsBeingDroppedDueToDdlTxn(pb_txn_id, is_committed) &&
indexed_table->IsBeingDroppedDueToDdlTxn(pb_txn_id, is_committed)) {
if (table->IsBeingDroppedDueToDdlTxn(pb_txn_id, *is_committed) &&
indexed_table->IsBeingDroppedDueToDdlTxn(pb_txn_id, *is_committed)) {
LOG(INFO) << "Skipping DDL transaction verification for index " << table->ToString()
<< " as the indexed table " << indexed_table->ToString()
<< " is also being dropped";
Expand Down
27 changes: 27 additions & 0 deletions src/yb/yql/pgwrapper/pg_ddl_atomicity-test.cc
Original file line number Diff line number Diff line change
Expand Up @@ -1775,5 +1775,32 @@ TEST_F(PgDdlAtomicityTest, TestCreateColocatedTable) {
ASSERT_OK(conn.Execute("CREATE TABLE foo(id int)"));
}

TEST_F(PgDdlAtomicityTest, TestAlterTableAddUniqueConstraint) {
auto conn = ASSERT_RESULT(Connect());
auto client = ASSERT_RESULT(cluster_->CreateClient());
ASSERT_OK(cluster_->SetFlagOnTServers(
"report_ysql_ddl_txn_status_to_master", "false"));
ASSERT_OK(conn.Execute("CREATE TABLE foo(x character varying(20))"));
// Adding a unique constraint does not change the table schema of the base table foo.
// If we use schema comparison of foo, we will not be able to tell whether the
// DDL transaction has committed at PG side or not. In this case we must use schema
// comparison of the index x_unique instead.
ASSERT_EQ(ASSERT_RESULT(client->ListTables("x_unique")).size(), 0);
ASSERT_OK(conn.Execute("ALTER TABLE foo ADD CONSTRAINT x_unique UNIQUE(x)"));
ASSERT_EQ(ASSERT_RESULT(client->ListTables("x_unique")).size(), 1);

ASSERT_OK(conn.Execute("CREATE TABLE bar (y character varying(20))"));
ASSERT_EQ(ASSERT_RESULT(client->ListTables("y_unique")).size(), 0);
ASSERT_OK(conn.Execute("SET yb_test_fail_next_ddl=true"));
// As of 2024-09-17, the aborted ALTER TABLE bar statement leaves an orphan index
// inside DocDB that is not garbage collected. But its existence will not prevent
// a retry of the same statement to succeed, which will create another index with
// the same name y_unique (but with a different table id).
ASSERT_NOK(conn.Execute("ALTER TABLE bar ADD CONSTRAINT y_unique UNIQUE(y)"));
ASSERT_EQ(ASSERT_RESULT(client->ListTables("y_unique")).size(), 1);
ASSERT_OK(conn.Execute("ALTER TABLE bar ADD CONSTRAINT y_unique UNIQUE(y)"));
ASSERT_EQ(ASSERT_RESULT(client->ListTables("y_unique")).size(), 2);
}

} // namespace pgwrapper
} // namespace yb

0 comments on commit 5298877

Please sign in to comment.