Skip to content

Commit

Permalink
[#21199] YSQL: Refactor TRUNCATE to use table rewrite
Browse files Browse the repository at this point in the history
Summary:
This diff changes the `TRUNCATE` implementation (for both colocated and non-colocated tables) to use table rewrite.
The new implementation is guarded by the auto-flag for table rewrite - `yb_enable_alter_table_rewrite`.

Specific code changes:

- Move the logic for creating a new DocDB table (relfilenode) from `cluster.c` to `YbRelationSetNewRelfileNode` in `pg_yb_utils.c`.
- Change the `TRUNCATE` implementation: Simply drop the old DocDB table and create a new one using `YbRelationSetNewRelfileNode()`. After the new DocDB table (relfilenode) is created, `reindex_relation` re-creates the DocDB tables associated with the secondary indexes. Note: the PG OIDs of the table and its indexes remain unchanged.
- `TRUNCATE` operations will now increment the catalog version, as they change the metadata of the affected tables (`pg_class.relfilenode`).
- Changes in `index.c` (for `reindex_index()` and `RelationSetNewRelfilenode()`) -- move `YBCDropIndex()` out of `RelationSetNewRelfilenode()` and into the caller (`reindex_index()`). We do not need to perform a `YBCDropIndex()` in the case of a table rewrite, because dropping the base DocDB table will automatically drop all the secondary indexes. Additionally, execute `reindex_index()` for PK indexes. Although PK indexes do not need a new "relfilenode" in YB, we still want to reset their `pg_class.reltuples` entry (done in `RelationSetNewRelfilenode()`).
- Since the new `TRUNCATE` approach does not require table level tombstones for colocated tables, we can, in the future, avoid tombstone checks on colocated tables. For this purpose, the diff also introduces a new tablet metadata field - `skip_table_tombstone_check`. This field is set for all newly created DocDB tables. In a subsequent diff, this field will also be set upon compaction, so that colocated tables that were created (and possibly truncated) on an older version of YB will not require tombstone checks.
- Add an `is_truncate` field to create table requests -- used to allow `TRUNCATE`s on CDC tables if `enable_truncate_cdcsdk_table` is set.
- Allow truncate with PITR. The existing flag `enable_truncate_on_pitr_table` now only applies to YCQL tables.
- The flag `enable_delete_truncate_xcluster_replicated_table` now only applies to YCQL. Deletion is always allowed for YSQL tables in xCluster (existing behavior) and now, `TRUNCATE` is always disallowed.

Test changes:
- `TestMasterMetrics.java` : `TRUNCATE` uses table rewrite, so the relevant metrics are now under `Create_Tablet_Attempt` / `Create_Tablet_Task`.
- `TestPgSavepoints.java` : add an extra commit to tests that use `TRUNCATE` so that we don't mix `TRUNCATE` with other DMLs (this would lead to the transaction being aborted).
- `TestYbBackup.java` : can no longer use `TRUNCATE` in a test that performs a backup/restore into the same DB, as we do not support a restore into the same DB after DDLs have been performed.
-  `colocation-test.cc`: Deleted `CreateAndTruncateOnSeparateTables`. `TRUNCATE` bumps catalog version, so it is not expected to work concurrently with a `CREATE INDEX`.
- `colocation-test.cc`: Refactor `InsertOnTablesWithFK`, `TransactionsOnTablesWithFK` : De-duplicate code between these two tests. Also, rewrite the test logic to fix potential flakiness.
- `pg_fkey-test.cc`: All the tests in this file check RPC counts, which now change due to a different implementation of `TRUNCATE`. Simply change the `TRUNCATE` in `SetUp()` to a `DELETE` to fix this.
- `pg_libpq-test.cc`: Reduce the number of iterations in `ConcurrentInsertAndDeleteOnTablesWithForeignKey` to prevent TSAN failures due to timeout.
- `pg_tablet_shutdown-test.cc`: Disable retry on cache refresh so that we can correctly test abort due to tablet shutdown.
- Refer to the "Test Plan" for new tests.

Upgrade/Downgrade safety:
- This feature is scoped under the existing `LocalPersisted` autoflag `yb_enable_alter_table_rewrite`. The feature will only be turned on after upgrade is finalized.

Jira: DB-10129

Test Plan:
Regress tests: `yb_truncate` and `yb_pg_truncate`
Atomicity tests: `PgLibPqTableRewrite.TestTableRewriteRollback`, `PgLibPqTableRewrite.TestTableRewriteSuccess`
Backup/restore tests:
`YBBackupTestWithTableRewrite.TestYSQLBackupAndRestoreAfterRewrite`, `TestYSQLBackupAndRestoreAfterFailedRewrite`
PITR test: `YbAdminSnapshotScheduleTestWithYsqlParam.PgsqlTestTruncate`
xCluster test: `XClusterYsqlTest.TestTableRewriteOperations`
CDC test: `CDCSDKYsqlTest.TestTableRewriteOperations`

Reviewers: myang, asrivastava, timur, qhu, bogdan

Reviewed By: myang, timur

Subscribers: timur, arybochkin, yql, ybase, bogdan

Differential Revision: https://phorge.dev.yugabyte.com/D32071
  • Loading branch information
fizaaluthra committed Mar 21, 2024
1 parent c1bd57d commit fad94f7
Show file tree
Hide file tree
Showing 56 changed files with 591 additions and 336 deletions.
13 changes: 5 additions & 8 deletions java/yb-pgsql/src/test/java/org/yb/pgsql/TestMasterMetrics.java
Original file line number Diff line number Diff line change
Expand Up @@ -39,9 +39,6 @@ public class TestMasterMetrics extends BasePgSQLTest {
protected static final String CREATE_TASK_METRIC = "Create_Tablet_Task";
protected static final String ALTER_ATTEMPT_METRIC = "Alter_Table_Attempt";
protected static final String ALTER_TASK_METRIC = "Alter_Table_Task";
protected static final String TRUNCATE_ATTEMPT_METRIC = "Truncate_Tablet_Attempt";
protected static final String TRUNCATE_TASK_METRIC = "Truncate_Tablet_Task";

@Override
public int getTestMethodTimeoutSec() {
return 1800;
Expand Down Expand Up @@ -100,16 +97,16 @@ public void testMasterMetrics() throws Exception {
assertTrue(miniCluster.getNumTServers() <=
getMetricsCount(ALTER_ATTEMPT_METRIC) - alterAttempt);


long truncateTask = getMetricsCount(TRUNCATE_TASK_METRIC);
long truncateAttempt = getMetricsCount(TRUNCATE_ATTEMPT_METRIC);
// Truncates use table rewrites, so they should also be counted as create tasks.
long truncateTask = getMetricsCount(CREATE_TASK_METRIC);
long truncateAttempt = getMetricsCount(CREATE_ATTEMPT_METRIC);
try (Statement statement = connection.createStatement()) {
statement.execute("TRUNCATE TABLE test");
}
assertTrue(miniCluster.getNumTServers() <=
getMetricsCount(TRUNCATE_TASK_METRIC) - truncateTask);
getMetricsCount(CREATE_TASK_METRIC) - truncateTask);
assertTrue(miniCluster.getNumTServers() <=
getMetricsCount(TRUNCATE_ATTEMPT_METRIC) - truncateAttempt);
getMetricsCount(CREATE_ATTEMPT_METRIC) - truncateAttempt);

}
}
Original file line number Diff line number Diff line change
Expand Up @@ -246,6 +246,7 @@ public void testStackedSavepoints() throws Exception {
assertEquals(OptionalInt.empty(), getSingleValue(conn, 3));
assertEquals(OptionalInt.empty(), getSingleValue(conn, 5));
assertEquals(OptionalInt.empty(), getSingleValue(conn, 7));
conn.commit();
statement.execute("truncate t");
conn.commit();
}
Expand All @@ -272,6 +273,7 @@ public void testSavepointCommitWithAbort() throws Exception {
assertEquals(getSingleValue(conn, 1), OptionalInt.of(2));
assertEquals(getSingleValue(conn, 3), OptionalInt.empty());
assertEquals(getSingleValue(conn, 5), OptionalInt.of(6));
conn.commit();
statement.execute("truncate t");
conn.commit();
}
Expand Down Expand Up @@ -314,6 +316,7 @@ public void testSavepointLargeApplyWithAborts() throws Exception {
}
}
}
conn.commit();
statement.execute("truncate t");
conn.commit();
}
Expand Down
4 changes: 2 additions & 2 deletions java/yb-pgsql/src/test/java/org/yb/pgsql/TestYbBackup.java
Original file line number Diff line number Diff line change
Expand Up @@ -715,9 +715,9 @@ private void doColocatedDatabaseRestoreToOriginalDB() throws Exception {
"--keyspace", "ysql." + initialDBName);
backupDir = new JSONObject(output).getString("snapshot_url");

// Truncate tables after taking the snapshot.
// Delete all rows from the tables after taking a snapshot.
for (int j = 1; j <= num_tables; ++j) {
stmt.execute("TRUNCATE TABLE test_tbl" + String.valueOf(j));
stmt.execute("DELETE FROM test_tbl" + String.valueOf(j));
}

// Restore back into this same database, this way all the ids will happen to be the same.
Expand Down
1 change: 1 addition & 0 deletions src/postgres/src/backend/bootstrap/ybcbootstrap.c
Original file line number Diff line number Diff line change
Expand Up @@ -145,6 +145,7 @@ void YBCCreateSysCatalogTable(const char *table_name,
false /* is_matview */,
InvalidOid /* pg_table_oid */,
InvalidOid /* old_relfilenode_oid */,
false /* is_truncate */,
&yb_stmt));

/* Add all key columns first, then the regular columns */
Expand Down
40 changes: 23 additions & 17 deletions src/postgres/src/backend/catalog/index.c
Original file line number Diff line number Diff line change
Expand Up @@ -4072,9 +4072,11 @@ reindex_index(Oid indexId, bool skip_constraint_checks, char persistence,

/*
* YB pk indexes share the same storage as their tables, so it is not
* possible to reindex them.
* possible to reindex them. However, this code-path may be internally
* invoked by table rewrite, and we need to reset the index's reltuples.
*/
if (iRel->rd_index->indisprimary && IsYBRelation(iRel))
if (!is_yb_table_rewrite && iRel->rd_index->indisprimary &&
IsYBRelation(iRel))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("cannot reindex nontemporary pk indexes"),
Expand Down Expand Up @@ -4412,23 +4414,27 @@ reindex_relation(Oid relid, int flags, int options, bool is_yb_table_rewrite,
Relation iRel = index_open(indexOid, AccessExclusiveLock);
if (is_pg_class)
RelationSetIndexList(rel, doneIndexes, InvalidOid);
/*
* For YB relations, we can ignore the primary key index because
* it is an implicit part of the DocDB table.
*/
if (!(IsYBRelation(iRel) && iRel->rd_index->indisprimary))
{
index_close(iRel, AccessExclusiveLock);
reindex_index(indexOid,
!(flags & REINDEX_REL_CHECK_CONSTRAINTS),
persistence, options, is_yb_table_rewrite,
yb_copy_split_options);
}
else
if (IsYBRelation(iRel))
{
index_close(iRel, AccessExclusiveLock);
RemoveReindexPending(indexOid);
if (!is_yb_table_rewrite && !iRel->rd_index->indisprimary)
/*
* Drop the old DocDB table associated with this index.
* This is only required for secondary indexes, because a
* primary index in YB doesn't have a DocDB table separate
* from the base relation's table.
* If this is a table rewrite, the indexes on the table
* will automatically be dropped when the table is dropped.
* Note: The drop isn't finalized until after the txn
* commits/aborts.
*/
YBCDropIndex(iRel);
}
index_close(iRel, AccessExclusiveLock);
reindex_index(indexOid,
!(flags & REINDEX_REL_CHECK_CONSTRAINTS),
persistence, options, is_yb_table_rewrite,
yb_copy_split_options);

CommandCounterIncrement();

/* Index should no longer be in the pending list */
Expand Down
33 changes: 3 additions & 30 deletions src/postgres/src/backend/commands/cluster.c
Original file line number Diff line number Diff line change
Expand Up @@ -725,36 +725,9 @@ make_new_heap(Oid OIDOldHeap, Oid NewTableSpace, char relpersistence,
Assert(OIDNewHeap != InvalidOid);

if (IsYugaByteEnabled() && relpersistence != RELPERSISTENCE_TEMP)
{
CreateStmt *dummyStmt = makeNode(CreateStmt);
dummyStmt->relation = makeRangeVar(NULL, NewHeapName, -1);
Relation pg_constraint = heap_open(ConstraintRelationId,
RowExclusiveLock);
YbATCopyPrimaryKeyToCreateStmt(OldHeap, pg_constraint, dummyStmt);
heap_close(pg_constraint, RowExclusiveLock);
if (yb_copy_split_options)
{
YbGetTableProperties(OldHeap);
dummyStmt->split_options = YbGetSplitOptions(OldHeap);
}
bool is_null;
HeapTuple tuple = SearchSysCache1(RELOID,
ObjectIdGetDatum(RelationGetRelid(OldHeap)));
Datum datum = SysCacheGetAttr(RELOID,
tuple, Anum_pg_class_reloptions, &is_null);
if (!is_null)
dummyStmt->options = untransformRelOptions(datum);
ReleaseSysCache(tuple);
YBCCreateTable(dummyStmt, RelationGetRelationName(OldHeap),
OldHeap->rd_rel->relkind, OldHeapDesc, OIDNewHeap,
namespaceid,
YbGetTableProperties(OldHeap)->tablegroup_oid,
InvalidOid, NewTableSpace, OIDOldHeap,
OldHeap->rd_rel->relfilenode);

if (yb_test_fail_table_rewrite_after_creation)
elog(ERROR, "Injecting error.");
}
YbRelationSetNewRelfileNode(OldHeap, OIDNewHeap,
yb_copy_split_options,
false /* is_truncate */);

ReleaseSysCache(tuple);

Expand Down
9 changes: 5 additions & 4 deletions src/postgres/src/backend/commands/tablecmds.c
Original file line number Diff line number Diff line change
Expand Up @@ -1045,7 +1045,8 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId,
YBCCreateTable(stmt, relname, relkind, descriptor, relationId,
namespaceId, tablegroupId, colocation_id, tablespaceId,
InvalidOid /* pgTableId */,
InvalidOid /* oldRelfileNodeId */);
InvalidOid /* oldRelfileNodeId */,
false /* isTruncate */);
}

/*
Expand Down Expand Up @@ -1874,13 +1875,13 @@ ExecuteTruncateGuts(List *explicit_rels, List *relids, List *relids_logged,
* truncate it in-place, because a rollback would cause the whole
* table or the current physical file to be thrown away anyway.
*/
if (IsYBRelation(rel))
if (IsYBRelation(rel) && !yb_enable_alter_table_rewrite)
{
// Call YugaByte API to truncate tables.
YbTruncate(rel);
}
else if (rel->rd_createSubid == mySubid ||
rel->rd_newRelfilenodeSubid == mySubid)
rel->rd_newRelfilenodeSubid == mySubid || !IsYBRelation(rel))
{
/* Immediate, non-rollbackable truncation is OK */
heap_truncate_one_rel(rel);
Expand Down Expand Up @@ -1938,7 +1939,7 @@ ExecuteTruncateGuts(List *explicit_rels, List *relids, List *relids_logged,
* Reconstruct the indexes to match, and we're done.
*/
reindex_relation(heap_relid, REINDEX_REL_PROCESS_TOAST, 0,
false /* is_yb_table_rewrite */,
true /* is_yb_table_rewrite */,
false /* yb_copy_split_options */);
}

Expand Down
3 changes: 2 additions & 1 deletion src/postgres/src/backend/commands/ybccmds.c
Original file line number Diff line number Diff line change
Expand Up @@ -560,7 +560,7 @@ void
YBCCreateTable(CreateStmt *stmt, char *tableName, char relkind, TupleDesc desc,
Oid relationId, Oid namespaceId, Oid tablegroupId,
Oid colocationId, Oid tablespaceId, Oid pgTableId,
Oid oldRelfileNodeId)
Oid oldRelfileNodeId, bool isTruncate)
{
bool is_internal_rewrite = oldRelfileNodeId != InvalidOid;
if (relkind != RELKIND_RELATION && relkind != RELKIND_PARTITIONED_TABLE &&
Expand Down Expand Up @@ -767,6 +767,7 @@ YBCCreateTable(CreateStmt *stmt, char *tableName, char relkind, TupleDesc desc,
is_matview,
pgTableId,
oldRelfileNodeId,
isTruncate,
&handle));

CreateTableAddColumns(handle, desc, primary_key, is_colocated_via_database,
Expand Down
36 changes: 28 additions & 8 deletions src/postgres/src/backend/utils/cache/relcache.c
Original file line number Diff line number Diff line change
Expand Up @@ -5383,16 +5383,36 @@ RelationSetNewRelfilenode(Relation relation, char persistence,

if (IsYBRelation(relation))
{
/* Currently, this function is only used during reindex in YB. */
Assert(relation->rd_rel->relkind == RELKIND_INDEX);
/*
* Drop the old DocDB table associated with this index.
* Note: The drop isn't finalized until after the txn commits/aborts.
* Currently, this function is only used during reindex/truncate in YB.
*/
YBCDropIndex(relation);
/* Create a new DocDB table for the index. */
YbIndexSetNewRelfileNode(relation, newrelfilenode,
yb_copy_split_options);
Assert(relation->rd_rel->relkind == RELKIND_INDEX ||
relation->rd_rel->relkind == RELKIND_RELATION);

if (relation->rd_rel->relkind == RELKIND_INDEX &&
!relation->rd_index->indisprimary)
/*
* Note: caller is responsible for dropping the old DocDB table
* associated with the index, if required.
* Create a new DocDB table for the secondary index.
* This is not required for primary indexes, as they are
* an implicit part of the base table.
*/
YbIndexSetNewRelfileNode(relation, newrelfilenode,
yb_copy_split_options);
else if (relation->rd_rel->relkind == RELKIND_RELATION)
{
/*
* Drop the old DocDB table associated with this relation.
* Note: The drop isn't finalized until after the txn
* commits/aborts.
*/
YBCDropTable(relation);
/* Create a new DocDB table for the relation. */
YbRelationSetNewRelfileNode(relation, newrelfilenode,
yb_copy_split_options,
true /* is_truncate */);
}
}
else
{
Expand Down
59 changes: 58 additions & 1 deletion src/postgres/src/backend/utils/misc/pg_yb_utils.c
Original file line number Diff line number Diff line change
Expand Up @@ -84,6 +84,7 @@
#include "libpq/hba.h"
#include "libpq/libpq.h"
#include "libpq/libpq-be.h"
#include "nodes/makefuncs.h"
#include "optimizer/cost.h"
#include "parser/parse_utilcmd.h"
#include "tcop/utility.h"
Expand Down Expand Up @@ -1709,7 +1710,6 @@ YbDdlModeOptional YbGetDdlMode(
case T_DefineStmt: // CREATE OPERATOR/AGGREGATE/COLLATION/etc
case T_CommentStmt: // COMMENT (create new comment)
case T_RuleStmt: // CREATE RULE
case T_TruncateStmt: // TRUNCATE changes system catalog in case of non-YB (i.e. TEMP) tables
case T_YbCreateProfileStmt:
/*
* Simple add objects are not breaking changes, and they do not even require
Expand All @@ -1719,6 +1719,20 @@ YbDdlModeOptional YbGetDdlMode(
is_breaking_change = false;
break;

case T_TruncateStmt:
/*
* TRUNCATE (on YB relations) using the old approach does not
* make any system catalog changes, so it doesn't require a
* version increment.
* TRUNCATE using the new rewrite approach changes the
* relfilenode field in pg_class, so it requires a version
* increment.
*/
if (!yb_enable_alter_table_rewrite)
is_version_increment = false;
is_breaking_change = false;
break;

case T_ViewStmt: // CREATE VIEW
is_breaking_change = false;

Expand Down Expand Up @@ -4542,3 +4556,46 @@ YbGetRedactedQueryString(const char* query, int query_len,
*redacted_query = RedactPasswordIfExists(*redacted_query);
*redacted_query_len = strlen(*redacted_query);
}

/*
* In YB, a "relfilenode" corresponds to a DocDB table.
* This function creates a new DocDB table for the given table,
* with UUID corresponding to the given relfileNodeId.
*/
void
YbRelationSetNewRelfileNode(Relation rel, Oid newRelfileNodeId,
bool yb_copy_split_options, bool is_truncate)
{
CreateStmt *dummyStmt = makeNode(CreateStmt);
dummyStmt->relation =
makeRangeVar(NULL, RelationGetRelationName(rel), -1);
Relation pg_constraint = heap_open(ConstraintRelationId,
RowExclusiveLock);
YbATCopyPrimaryKeyToCreateStmt(rel, pg_constraint, dummyStmt);
heap_close(pg_constraint, RowExclusiveLock);
if (yb_copy_split_options)
{
YbGetTableProperties(rel);
dummyStmt->split_options = YbGetSplitOptions(rel);
}
bool is_null;
HeapTuple tuple = SearchSysCache1(RELOID,
ObjectIdGetDatum(RelationGetRelid(rel)));
Datum datum = SysCacheGetAttr(RELOID,
tuple, Anum_pg_class_reloptions, &is_null);
if (!is_null)
dummyStmt->options = untransformRelOptions(datum);
ReleaseSysCache(tuple);
YBCCreateTable(dummyStmt, RelationGetRelationName(rel),
rel->rd_rel->relkind, RelationGetDescr(rel),
newRelfileNodeId,
RelationGetNamespace(rel),
YbGetTableProperties(rel)->tablegroup_oid,
InvalidOid, rel->rd_rel->reltablespace,
RelationGetRelid(rel),
rel->rd_rel->relfilenode,
is_truncate);

if (yb_test_fail_table_rewrite_after_creation)
elog(ERROR, "Injecting error.");
}
3 changes: 2 additions & 1 deletion src/postgres/src/include/commands/ybccmds.h
Original file line number Diff line number Diff line change
Expand Up @@ -62,7 +62,8 @@ extern void YBCCreateTable(CreateStmt *stmt,
Oid colocationId,
Oid tablespaceId,
Oid pgTableId,
Oid oldRelfileNodeId);
Oid oldRelfileNodeId,
bool isTruncate);

extern void YBCDropTable(Relation rel);

Expand Down
3 changes: 3 additions & 0 deletions src/postgres/src/include/pg_yb_utils.h
Original file line number Diff line number Diff line change
Expand Up @@ -1108,4 +1108,7 @@ extern SortByDir YbSortOrdering(SortByDir ordering, bool is_colocated, bool is_t
extern void YbGetRedactedQueryString(const char* query, int query_len,
const char** redacted_query, int* redacted_query_len);

extern void YbRelationSetNewRelfileNode(Relation rel, Oid relfileNodeId,
bool yb_copy_split_options,
bool is_truncate);
#endif /* PG_YB_UTILS_H */
Loading

0 comments on commit fad94f7

Please sign in to comment.