-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[YSQL] YSQL DDL Atomicity #13358
Labels
area/ysql
Yugabyte SQL (YSQL)
kind/enhancement
This is an enhancement of an existing feature
priority/medium
Medium priority issue
Comments
deeps1991
added
area/ysql
Yugabyte SQL (YSQL)
status/awaiting-triage
Issue awaiting triage
labels
Jul 19, 2022
yugabyte-ci
added
kind/bug
This issue is a bug
priority/medium
Medium priority issue
labels
Jul 19, 2022
This was referenced Jul 19, 2022
Adding other folks who might be interested in this issue @phendric-uk @jasonriddell |
yugabyte-ci
added
kind/enhancement
This is an enhancement of an existing feature
and removed
kind/bug
This issue is a bug
labels
Oct 5, 2022
deeps1991
added a commit
that referenced
this issue
Nov 17, 2022
Summary: Currently the metadata for YSQL Tables is stored in both the PG catalog and DocDB schema. Sometimes when a DDL transaction fails, we can hit an issue where these two sources of metadata go out of sync. This is because the PG catalog is modified using the DocDB transactions framework, but the DocDB schema is stored in the Kudu-inherited structures in the YB-Master sys catalog. Thus when a transaction is aborted, only the changes to the PG catalog rollback, but the changes made to the DocDB schema cannot be rolled-back. This patch contains the first phase of changes required to fix this issue. We now enforce that only one DDL that modifies the DocDB schema can occur on a table. This simplifies the problem because, if a DDL transaction starts, the DocDB schema now only needs to maintain two versions of the schema - the current committed schema, and the uncommitted schema of the ongoing transaction. This patch then extends the Transaction GC framework introduced in commit: 88e9d59 to also handle ALTER operations and DROP operations. When a DDL transaction modifies the DocDB schema, the transaction metadata, and the current schema of the table is stored in the SysTablesEntryPB. YB-Master monitors the state of the transaction. Once the transaction ends, YB-Master detects whether the transaction was successful or not by comparing the DocDB schema and PG catalog. If the transaction is aborted, YB-Master will rollback the changes made to the DocDB schema using the state present in the SysTablesEntryPB. **Correctness** Thus, the two schemas can be described to be "eventually consistent" because we can have periods of time where both the schemas do not match. However this mostly not affect correctness because of the following two properties: - When inconsistent, the DocDB schema always has more columns/constraints than PG schema. - Clients always use the PG schema (which is guaranteed to not return uncommitted state) to prepare their read/write requests. These two properties ensure that we can't have orphaned data or failed integrity checks or use DDL entities created by uncommitted transactions because of DocDB schema being inconsistent. **Feature Flag:** ysql_ddl_rollback_enabled Note that this flag is shared across pg backends and YB-Master. This flag is used by the PG backends to determine whether or not to perform best-effort rollback for ALTER TABLE ADD column introduced in 5a6bfcd Note that we do not support changing this flag //**during**// a DDL operation. If this flag is changed to false after a DDL but before the rollback completes, then all future DDL operations will ignore the presence of YsqlTxnVerifierState. Alter operations in particular will clear the YsqlTxnVerifierState as well. Notes: - Since this patch contains only the first phase of changes, this feature is disabled by default using the flag `ysql_ddl_rollback_enabled` - This patch only handles Create Table, Alter Table Add Column, Alter Table Add Primary Key, Alter Table Rename Table/Column and Drop Table - This patch does not have support for databases, materialized views, indexes, colocated tables, PG catalog tables and tablegroups. - GUC `yb_test_fail_next_ddl` has been changed to fail any upcoming DDL operation, not just CREATE TABLE ddl operations. Design Doc: https://docs.google.com/document/d/1YAUnSV9X7Gv2ZVpILUe15NbXbVwgFvpV8IRaj6wFMZo/edit?usp=sharing Test Plan: ybd --cxx-test pg_ddl_atomicity-test Reviewers: ena, nicolas, myang Reviewed By: nicolas, myang Subscribers: zyu, pjain, jmaley, hsunder, yguan, dmitry, ybase, yql Differential Revision: https://phabricator.dev.yugabyte.com/D18398
jayant07-yb
pushed a commit
to jayant07-yb/yugabyte-db
that referenced
this issue
Dec 7, 2022
Summary: Currently the metadata for YSQL Tables is stored in both the PG catalog and DocDB schema. Sometimes when a DDL transaction fails, we can hit an issue where these two sources of metadata go out of sync. This is because the PG catalog is modified using the DocDB transactions framework, but the DocDB schema is stored in the Kudu-inherited structures in the YB-Master sys catalog. Thus when a transaction is aborted, only the changes to the PG catalog rollback, but the changes made to the DocDB schema cannot be rolled-back. This patch contains the first phase of changes required to fix this issue. We now enforce that only one DDL that modifies the DocDB schema can occur on a table. This simplifies the problem because, if a DDL transaction starts, the DocDB schema now only needs to maintain two versions of the schema - the current committed schema, and the uncommitted schema of the ongoing transaction. This patch then extends the Transaction GC framework introduced in commit: 88e9d59 to also handle ALTER operations and DROP operations. When a DDL transaction modifies the DocDB schema, the transaction metadata, and the current schema of the table is stored in the SysTablesEntryPB. YB-Master monitors the state of the transaction. Once the transaction ends, YB-Master detects whether the transaction was successful or not by comparing the DocDB schema and PG catalog. If the transaction is aborted, YB-Master will rollback the changes made to the DocDB schema using the state present in the SysTablesEntryPB. **Correctness** Thus, the two schemas can be described to be "eventually consistent" because we can have periods of time where both the schemas do not match. However this mostly not affect correctness because of the following two properties: - When inconsistent, the DocDB schema always has more columns/constraints than PG schema. - Clients always use the PG schema (which is guaranteed to not return uncommitted state) to prepare their read/write requests. These two properties ensure that we can't have orphaned data or failed integrity checks or use DDL entities created by uncommitted transactions because of DocDB schema being inconsistent. **Feature Flag:** ysql_ddl_rollback_enabled Note that this flag is shared across pg backends and YB-Master. This flag is used by the PG backends to determine whether or not to perform best-effort rollback for ALTER TABLE ADD column introduced in 5a6bfcd Note that we do not support changing this flag //**during**// a DDL operation. If this flag is changed to false after a DDL but before the rollback completes, then all future DDL operations will ignore the presence of YsqlTxnVerifierState. Alter operations in particular will clear the YsqlTxnVerifierState as well. Notes: - Since this patch contains only the first phase of changes, this feature is disabled by default using the flag `ysql_ddl_rollback_enabled` - This patch only handles Create Table, Alter Table Add Column, Alter Table Add Primary Key, Alter Table Rename Table/Column and Drop Table - This patch does not have support for databases, materialized views, indexes, colocated tables, PG catalog tables and tablegroups. - GUC `yb_test_fail_next_ddl` has been changed to fail any upcoming DDL operation, not just CREATE TABLE ddl operations. Design Doc: https://docs.google.com/document/d/1YAUnSV9X7Gv2ZVpILUe15NbXbVwgFvpV8IRaj6wFMZo/edit?usp=sharing Test Plan: ybd --cxx-test pg_ddl_atomicity-test Reviewers: ena, nicolas, myang Reviewed By: nicolas, myang Subscribers: zyu, pjain, jmaley, hsunder, yguan, dmitry, ybase, yql Differential Revision: https://phabricator.dev.yugabyte.com/D18398
deeps1991
added a commit
that referenced
this issue
Jan 21, 2023
…atus to YB-Master Summary: In Part-1, for DDL operations that change the DocDB schema, YB-Master asynchronously polls the transaction status tablet to fetch the status of those DDL transactions. Once the transaction is determined to have completed, Yb-Master performs the relatively expensive operation of comparing the DocDB schema and PG schema to determine whether the transaction was a success to know whether to roll-back the DocDB changes. This behavior is optimized in this patch, wherein at the end of the transaction, YSQL sends the status of the transaction to YB-Master. This way YB-Master can stop polling the transaction status tablet and also skip the relatively expensive schema comparison. If the YSQL backend crashes or is unable to send the status of the transaction to YB-Master, it will still fall-back to the behavior of Part-1 and compare the DocDB and PG schema to know whether the transaction succeeded or failed. **Flags** The above behavior can be controlled by the runtime safe TServer flag **report_ysql_ddl_txn_status_to_master**. It is turned off for now as ysql_ddl_rollback enabled is false Test Plan: ybd --cxx-test pg_ddl_atomicity Reviewers: nicolas, myang, dmitry Reviewed By: myang, dmitry Subscribers: lnguyen, yql, ybase Differential Revision: https://phabricator.dev.yugabyte.com/D18915
1 task
deeps1991
added a commit
that referenced
this issue
Jun 26, 2023
…OLUMN Summary: The current workflow for ALTER TABLE DROP COLUMN is as follows: 1. Start YSQL transaction to update the PG catalog tables 2. Send AlterTable request to the YB-Master. This aborts any ongoing transactions on the table and drops the column. Any future transactions on the table with the old schema fail due to schema version mismatch. 3. YSQL polls the YB-Master until the above Alter operation is complete on all TServers and commits the transaction. The above workflow has the issue that if the YSQL backend crashes during Step 3, it is possible that the changes made to the PG catalog are aborted, but the column (and data) are already dropped on the DocDB side. It is not safe to merely postpone the actual column drop until the end of the transaction. In that case, we could have transactions using the dropped column until either of the following conditions are met: (1) The new table schema is propagated to all the TServers. This is not instantaneous and can be subject to internal or network delays (2) The new catalog version is propagated to all the PG backends (which can take a long time, because Alter Table operations do not cause breaking_catalog_version increment. This means caches will be refreshed on the backends only on transaction boundaries). This patch introduces the following algorithm for dropping a column: 1. Start YSQL transaction to update the PG catalog tables 2. Send AlterTable request to the YB-Master. This aborts any ongoing transactions on the table and marks the column for deletion. The TServers will fail any future operations on this table that access this dropped column. 3. YSQL polls the YB-Master until the above Alter operation is complete on all TServers and completes the transaction. 4. YB-Master detects that this transaction is complete (either by RPC from YSQL backend ([[ https://phabricator.dev.yugabyte.com/D18915 | Part2]]) or through its background task polling the txn status tablet ([[ https://phabricator.dev.yugabyte.com/D18398 | Part1 ]]). 5. If success, it drops the column. If abort, it rolls-back to the previous schema. **Design doc:** https://docs.google.com/document/d/1YAUnSV9X7Gv2ZVpILUe15NbXbVwgFvpV8IRaj6wFMZo/edit#heading=h.mc087k1zv9cr **Note** 1. DROP column now has 2 schema changes. This can cause an increase in the number of SchemaVersionMismatch errors. Ideally, merely marking a column for deletion should not cause schema version mismatch. If the operation is touching a column, the tablet server will drop the operation anyway. Clients on schema n-1 need not fail with SchemaVersionMismatch if they are not accessing this dropped column. This is out of scope for the current diff, which is intended mainly to solve the possibility of data loss when dropping a column. This improvement will be implemented as part of a separate diff. 2. This diff will not catch the case where a query accessing the dropped column falls on an index. This will be covered as part of 2 upcoming changes in DDL atomicity: a) Enabling DDL rollback for indexes b) Implement 2-phase rollback for indexes Jira: DB-2996 Test Plan: ybd --cxx-test pg_drop_column_test ybd --cxx-test pg_ddl_atomicity Manually run TestAlterTableWithConcurrentTxn.java with ysql_ddl_rollback enabled. Reviewers: hsunder, myang Reviewed By: hsunder, myang Subscribers: pjain, mihnea, tverona, amitanand, ena, jason, ybase, yql Differential Revision: https://phorge.dev.yugabyte.com/D19373
deeps1991
added a commit
that referenced
this issue
Aug 17, 2023
…ocatedTables/Tablegroups Summary: Commit 6e604ba introduced the YSQL DDL Atomicity framework. However the framework explicitly handled only regular tables. This diff introduces DDL Atomicity support for Indexes, Matviews, Colocated Tables and Tablegroups and contains the following changes: (i) Remove all the checks disabling DDL Atomicity support for entities other than tables. (ii) If DDL Atomicity is enabled, disable the old method of handling DROP INDEX, DROP TABLEGROUP (The old way involved sending the DocDB request to Drop the entity only after transaction commit) (iii) The framework in ysql_transaction_ddl must know whether to check pg_class or pg_tablegroup based on the entity for which we are performing DDL verification (iv) YB-Master hits a deadlock when an index and the indexed table are dropped at the same time. This is because dropping an index acquires a lock on the index and then on the indexed table. Dropping the indexed table acquires a lock on the indexed table and then the index. This case is extremely difficult to hit through YSQL, however if CREATE TABLE with UNIQUE CONSTRAINT runs, we will create a table and index in the same transaction. If this transaction fails, the atomicity framework left as is will issue a DROP for the table and index separately at the same time. This will run into deadlock due to the inverted lock acquisition. To prevent this, introduce separate handling for create-table-with-index/drop-table-with-index - enqueue a transaction verification task only for the indexed table if a CREATE/DROP transaction operates on both the table and the index. Additionally this diff also fixes the behavior of the ysql_ddl_rollback_enabled flag. Earlier this flag could be modified at either the TServer/YB-Master. Now the flag can only be set at the TServer, and will be propagated to YB-Master as part of the RequestPB proto for the affected DDLs. Thus the value of this flag will be checked only once at the beginning of a DDL transaction. Jira: DB-2996 Test Plan: ybd --cxx-test pg_ddl_atomicity ybd --cxx-test pg_drop_column_test Reviewers: hsunder, myang Reviewed By: hsunder, myang Subscribers: ybase, jhe, ena, yql, bogdan Differential Revision: https://phorge.dev.yugabyte.com/D22148
deeps1991
added a commit
that referenced
this issue
Sep 25, 2023
Summary: **Problem** This patch handles the case where YB-Backup is performed when Ddl verification is still underway. Currently there is a check in the yb_backup.py script that loops on the catalog version to verify that no Ddls have happened during the backupprocess. However, if it is only the DDL verification that happened after a successful/failed DDL then this will not be picked up the catalog version check. Thus, it is possible for backup to have run when the PG catalog and DocDB schema were not in sync with each other. This could lead to the snapshot containing a table with different metadata compared to the table created on the target side based on the PG metadata (as it uses ysql_dump). Currently we try to match the source table and the target table by comparing their schemas. Since they may mismatch, we can no longer match the tables. Technically the source side does contain the YsqlDdlTxnVerifierState which means the snapshot contains both the current and previous schemas. However it is possible that we find a match for both the current and previous schemas. Example: Say rename failed for a table because the name was already in use. It could be possible for the rename to have succeeded on the DocDB side, leading to 2 tables with the same name on YB-Master. If backup starts at this point, the snapshot would contain two tables with the same name. When we iterate through the tables in the snapshot, we would see this table with the transaction verification state with 2 schemas. However we would find a table that matches the current schema (the other table with the same name) and a table that matches the previous schema (the old table before rename). We wouldn't have a simple way of finding out which table is the matching table. **Fix** Fail backup if we find tables in the snapshot with YsqlDdlTxnVerifierState on them. **Other minor fixes** The new tests caught the following problems, so their one-line fixes have been sneaked into this diff 1.While matching schemas DDL Atomicity should handle the presence of column **ybidxbasectid** which is a DocDB only column used in UNIQUE indexes 2.Missed a continue statement in YsqlTransactionDdl::MatchPgDocDBSchemaColumns which was leading to issues in the happy path when ALTER TABLE DROP COLUMN was successful. It was not caught earlier as existing tests only dropped the last column in the schema. **API/Gflag changes** 1.GetTableSchema RPC now additionally returns any YsqlDdlTxnVerifierState for this table. Currently this is only used by tests to wait for DDL verification to end. 2.**FLAGS_retry_if_ddl_txn_verification_pending** is used to indicate whether CatalogManager::AlterTable and CatalogManager::DeleteTable should best-effort wait for any existing YsqlDdlTxnVerifierState on the table to be resolved before throwing an error (before this patch it threw error immediately). This flag helps the case when sql scripts (such as running output of ysql_dump/PG regress tests) run 2 DDLs on a table in quick succession. In this case, without this best-effort wait, we might fail the sequentially occurring second DDL simply because DDL verification has not ended. It is not easy for customers to introduce retries in such DDL scripts, so this best effort check should be useful. This fix is required in this diff as it is needed for backup tests to finish successfully (as they run ysql_dump) Jira: DB-2996 Test Plan: ./yb_build.sh debug --cxx-test yb-backup-cross-feature-test --gtest_filter "*YBDdlAtomicityBackupTest*" ./yb_build.sh debug --cxx-test pg_ddl_atomicity-test I made some proto changes to YsqlDdlTxnVerifierState structure. It should theoretically not affect upgrade as it does not change the actual number/type of fields in the proto but I ran an upgrade test anyway. 1)Revert this diff. Enable DDL rollback but disable DDL verification. Run some DDLs. This will lead to YB-Master writing SysTableEntryProto with the old format into disk 2)Stop the cluster. Un-revert the diff and start the cluster with new changes 3)Verify that DDLs were verified and all future DML and DDL succeed with the new proto changes Reviewers: jhe, myang, loginov Reviewed By: jhe, loginov Subscribers: ybase, yql Differential Revision: https://phorge.dev.yugabyte.com/D27966
deeps1991
pushed a commit
that referenced
this issue
Oct 20, 2023
…pport Summary: Original commit: 6ff2aea / D27966 **Problem** This patch handles the case where YB-Backup is performed when Ddl verification is still underway. Currently there is a check in the yb_backup.py script that loops on the catalog version to verify that no Ddls have happened during the backupprocess. However, if it is only the DDL verification that happened after a successful/failed DDL then this will not be picked up the catalog version check. Thus, it is possible for backup to have run when the PG catalog and DocDB schema were not in sync with each other. This could lead to the snapshot containing a table with different metadata compared to the table created on the target side based on the PG metadata (as it uses ysql_dump). Currently we try to match the source table and the target table by comparing their schemas. Since they may mismatch, we can no longer match the tables. Technically the source side does contain the YsqlDdlTxnVerifierState which means the snapshot contains both the current and previous schemas. However it is possible that we find a match for both the current and previous schemas. Example: Say rename failed for a table because the name was already in use. It could be possible for the rename to have succeeded on the DocDB side, leading to 2 tables with the same name on YB-Master. If backup starts at this point, the snapshot would contain two tables with the same name. When we iterate through the tables in the snapshot, we would see this table with the transaction verification state with 2 schemas. However we would find a table that matches the current schema (the other table with the same name) and a table that matches the previous schema (the old table before rename). We wouldn't have a simple way of finding out which table is the matching table. **Fix** Fail backup if we find tables in the snapshot with YsqlDdlTxnVerifierState on them. **Other minor fixes** The new tests caught the following problems, so their one-line fixes have been sneaked into this diff 1.While matching schemas DDL Atomicity should handle the presence of column **ybidxbasectid** which is a DocDB only column used in UNIQUE indexes 2.Missed a continue statement in YsqlTransactionDdl::MatchPgDocDBSchemaColumns which was leading to issues in the happy path when ALTER TABLE DROP COLUMN was successful. It was not caught earlier as existing tests only dropped the last column in the schema. **API/Gflag changes** 1.GetTableSchema RPC now additionally returns any YsqlDdlTxnVerifierState for this table. Currently this is only used by tests to wait for DDL verification to end. 2.**FLAGS_retry_if_ddl_txn_verification_pending** is used to indicate whether CatalogManager::AlterTable and CatalogManager::DeleteTable should best-effort wait for any existing YsqlDdlTxnVerifierState on the table to be resolved before throwing an error (before this patch it threw error immediately). This flag helps the case when sql scripts (such as running output of ysql_dump/PG regress tests) run 2 DDLs on a table in quick succession. In this case, without this best-effort wait, we might fail the sequentially occurring second DDL simply because DDL verification has not ended. It is not easy for customers to introduce retries in such DDL scripts, so this best effort check should be useful. This fix is required in this diff as it is needed for backup tests to finish successfully (as they run ysql_dump) Jira: DB-2996 Test Plan: Jenkins: urgent ./yb_build.sh debug --cxx-test yb-backup-cross-feature-test --gtest_filter "*YBDdlAtomicityBackupTest*" ./yb_build.sh debug --cxx-test pg_ddl_atomicity-test I made some proto changes to YsqlDdlTxnVerifierState structure. It should theoretically not affect upgrade as it does not change the actual number/type of fields in the proto but I ran an upgrade test anyway. 1)Revert this diff. Enable DDL rollback but disable DDL verification. Run some DDLs. This will lead to YB-Master writing SysTableEntryProto with the old format into disk 2)Stop the cluster. Un-revert the diff and start the cluster with new changes 3)Verify that DDLs were verified and all future DML and DDL succeed with the new proto changes Reviewers: jhe, myang, loginov Reviewed By: jhe Subscribers: yql, ybase, bogdan Tags: #jenkins-ready Differential Revision: https://phorge.dev.yugabyte.com/D29521
myang2021
added a commit
that referenced
this issue
May 17, 2024
Summary: The way the test YBDdlAtomicityBackupTest::RunDdlAtomicityTest is written, all the DDLs will be inflight during the backup - so backup will fail. It's better to test the case where backup fails if any single DDL is inflight. This diff changes the test to randomly pick one DDL to be inflight and verify the backup fails. Jira: DB-2996 Test Plan: ./yb_build.sh release --cxx-test yb-backup-cross-feature-test --gtest_filter YBDdlAtomicityBackupTest.DdlRollbackAtomicityTest -n 20 Reviewers: fizaa Reviewed By: fizaa Subscribers: ybase, yql Differential Revision: https://phorge.dev.yugabyte.com/D35140
myang2021
added a commit
that referenced
this issue
May 17, 2024
…cityTest test change Summary: The way the test YBDdlAtomicityBackupTest::RunDdlAtomicityTest is written, all the DDLs will be inflight during the backup - so backup will fail. It's better to test the case where backup fails if any single DDL is inflight. This diff changes the test to randomly pick one DDL to be inflight and verify the backup fails. Jira: DB-2996 Original commit: 5ebfef8 / D35140 Test Plan: ./yb_build.sh release --cxx-test yb-backup-cross-feature-test --gtest_filter YBDdlAtomicityBackupTest.DdlRollbackAtomicityTest -n 20 Reviewers: fizaa Reviewed By: fizaa Subscribers: yql, ybase Tags: #jenkins-ready Differential Revision: https://phorge.dev.yugabyte.com/D35150
jharveysmith
pushed a commit
that referenced
this issue
May 24, 2024
…pport Summary: Original commit: 20b330695d58fa4840ab558e1c1134ec74f97721 / D27966 **Problem** This patch handles the case where YB-Backup is performed when Ddl verification is still underway. Currently there is a check in the yb_backup.py script that loops on the catalog version to verify that no Ddls have happened during the backupprocess. However, if it is only the DDL verification that happened after a successful/failed DDL then this will not be picked up the catalog version check. Thus, it is possible for backup to have run when the PG catalog and DocDB schema were not in sync with each other. This could lead to the snapshot containing a table with different metadata compared to the table created on the target side based on the PG metadata (as it uses ysql_dump). Currently we try to match the source table and the target table by comparing their schemas. Since they may mismatch, we can no longer match the tables. Technically the source side does contain the YsqlDdlTxnVerifierState which means the snapshot contains both the current and previous schemas. However it is possible that we find a match for both the current and previous schemas. Example: Say rename failed for a table because the name was already in use. It could be possible for the rename to have succeeded on the DocDB side, leading to 2 tables with the same name on YB-Master. If backup starts at this point, the snapshot would contain two tables with the same name. When we iterate through the tables in the snapshot, we would see this table with the transaction verification state with 2 schemas. However we would find a table that matches the current schema (the other table with the same name) and a table that matches the previous schema (the old table before rename). We wouldn't have a simple way of finding out which table is the matching table. **Fix** Fail backup if we find tables in the snapshot with YsqlDdlTxnVerifierState on them. **Other minor fixes** The new tests caught the following problems, so their one-line fixes have been sneaked into this diff 1.While matching schemas DDL Atomicity should handle the presence of column **ybidxbasectid** which is a DocDB only column used in UNIQUE indexes 2.Missed a continue statement in YsqlTransactionDdl::MatchPgDocDBSchemaColumns which was leading to issues in the happy path when ALTER TABLE DROP COLUMN was successful. It was not caught earlier as existing tests only dropped the last column in the schema. **API/Gflag changes** 1.GetTableSchema RPC now additionally returns any YsqlDdlTxnVerifierState for this table. Currently this is only used by tests to wait for DDL verification to end. 2.**FLAGS_retry_if_ddl_txn_verification_pending** is used to indicate whether CatalogManager::AlterTable and CatalogManager::DeleteTable should best-effort wait for any existing YsqlDdlTxnVerifierState on the table to be resolved before throwing an error (before this patch it threw error immediately). This flag helps the case when sql scripts (such as running output of ysql_dump/PG regress tests) run 2 DDLs on a table in quick succession. In this case, without this best-effort wait, we might fail the sequentially occurring second DDL simply because DDL verification has not ended. It is not easy for customers to introduce retries in such DDL scripts, so this best effort check should be useful. This fix is required in this diff as it is needed for backup tests to finish successfully (as they run ysql_dump) Jira: DB-2996 Test Plan: Jenkins: urgent ./yb_build.sh debug --cxx-test yb-backup-cross-feature-test --gtest_filter "*YBDdlAtomicityBackupTest*" ./yb_build.sh debug --cxx-test pg_ddl_atomicity-test I made some proto changes to YsqlDdlTxnVerifierState structure. It should theoretically not affect upgrade as it does not change the actual number/type of fields in the proto but I ran an upgrade test anyway. 1)Revert this diff. Enable DDL rollback but disable DDL verification. Run some DDLs. This will lead to YB-Master writing SysTableEntryProto with the old format into disk 2)Stop the cluster. Un-revert the diff and start the cluster with new changes 3)Verify that DDLs were verified and all future DML and DDL succeed with the new proto changes Reviewers: jhe, myang, loginov Reviewed By: jhe Subscribers: yql, ybase, bogdan Tags: #jenkins-ready Differential Revision: https://phorge.dev.yugabyte.com/D29521
svarnau
pushed a commit
that referenced
this issue
May 25, 2024
Summary: The way the test YBDdlAtomicityBackupTest::RunDdlAtomicityTest is written, all the DDLs will be inflight during the backup - so backup will fail. It's better to test the case where backup fails if any single DDL is inflight. This diff changes the test to randomly pick one DDL to be inflight and verify the backup fails. Jira: DB-2996 Test Plan: ./yb_build.sh release --cxx-test yb-backup-cross-feature-test --gtest_filter YBDdlAtomicityBackupTest.DdlRollbackAtomicityTest -n 20 Reviewers: fizaa Reviewed By: fizaa Subscribers: ybase, yql Differential Revision: https://phorge.dev.yugabyte.com/D35140
svarnau
pushed a commit
that referenced
this issue
May 29, 2024
…cityTest test change Summary: The way the test YBDdlAtomicityBackupTest::RunDdlAtomicityTest is written, all the DDLs will be inflight during the backup - so backup will fail. It's better to test the case where backup fails if any single DDL is inflight. This diff changes the test to randomly pick one DDL to be inflight and verify the backup fails. Jira: DB-2996 Original commit: 5ebfef8 / D35140 Test Plan: ./yb_build.sh release --cxx-test yb-backup-cross-feature-test --gtest_filter YBDdlAtomicityBackupTest.DdlRollbackAtomicityTest -n 20 Reviewers: fizaa Reviewed By: fizaa Subscribers: yql, ybase Tags: #jenkins-ready Differential Revision: https://phorge.dev.yugabyte.com/D35150
myang2021
added a commit
that referenced
this issue
Jul 24, 2024
Summary: This diff adds a new DDL atomicity stress test. The stress test setup a test table with some data, then spawns 4 concurrent test threads. The first 3 of these 4 threads continuously run a pair of complementing DDL statements for 10 iterations, the 4th thread runs a update DML statement for 10 iterations. Thread 1: alter table add and drop column Thread 2: add and drop columns with default values Thread 3: create/drop an index on this table Thread 4: thread continuously updates the rows on this table. Because there are many concurrency between DDLs and DMLs, there will be execution errors that are expected. The test lists all such expected errors. If an error is one of these expected errors, the test will retry the failed statement until eventually succeed. The test fails if an unexpected error is encountered. When all the 4 threads have completed, the test does a number of verifications: 1. the table does not contain any additional columns 2. no indexes are present on this table 3. the rows on this table are updated correctly The test adds some error injections that are controlled by new test gflags. This diff also fixes a tsan data race that showed up in the newly added stress tests by re-acquiring CatalogManager's mutex_ which is already released earlier in order to fix a deadlock (commit ac184a8). Note that if we add more test scenarios with more concurrent DDL threads, we may see more expected errors that are not already listed and the test would fail. In that case we need to debug the failure. If it turns out that the error is expected, we need to add the new expected error to the list of expected errors. If the error indicates a bug, then we need to fix the bug to allow the test to pass. The stress test has two sets: non-colocated table and colocated table. A third set for partitioned table was taken out of this diff. Partitioned table stress test will be addressed by a separate bug fix diff. Jira: DB-2996 Test Plan: ./yb_build.sh debug --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest.BasicTest -n 20 --tp 1 ./yb_build.sh debug --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest.TestTxnVerificationFailure -n 20 --tp 1 ./yb_build.sh debug --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest.TestFailCatalogWrites -n 20 --tp 1 ./yb_build.sh debug --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest.TestFailDdlRollback -n 20 --tp 1 ./yb_build.sh debug --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest.TestFailDdlVerification -n 20 --tp 1 ./yb_build.sh debug --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityColocatedStressTest.BasicTest -n 20 --tp 1 ./yb_build.sh debug --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityColocatedStressTest.TestTxnVerificationFailure -n 20 --tp 1 ./yb_build.sh debug --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityColocatedStressTest.TestFailCatalogWrites -n 20 --tp 1 ./yb_build.sh debug --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityColocatedStressTest.TestFailDdlRollback -n 20 --tp 1 ./yb_build.sh debug --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityColocatedStressTest.TestFailDdlVerification -n 20 --tp 1 Reviewers: hsunder, zdrudi, fizaa Reviewed By: zdrudi Subscribers: ybase, yql Differential Revision: https://phorge.dev.yugabyte.com/D32682
jasonyb
pushed a commit
that referenced
this issue
Jul 24, 2024
Summary: Excluded: 7414711 [#20972] YSQL: Fix new local variable names in PG code 3484013 [#20766] YSQL: Remove time stuff from the GetTableKeyRanges functions. 8b23629 [#18360] YSQL: fix yb_hash_code compared to an out of range constant 4ee3511 [PLAT-13438]: Add support for preview gflags fe5e675 [#23218] YSQL: Closing the backend connection on receiving the error "Database might have been dropped by another user" 0901f6c [doc] Default databases (#23261) afaf353 [PLAT-14722]Upgrade YBC client and server versio to 2.2.0.0-b3 fdffc33 [#13358] YSQL: DDL Atomicity Stress Test f0b3c46 [ #21499] YSQL: Skipping the test with connection manager as they are unsuitable 5ed864d [#23260] CDCSDK: Propagate stream expiry & GC error to walsender e37d791 [PLAT-14627][PLAT-14729][PLAT-14726]Process group-based specific gflags comprehensively across multiple callsites in YBA 0e17c97 [PLAT-14737]Missing/incorrect universe status loader positions on dashboard page 6477bad [PLAT-12434] YNP fix XDG_RUNTIME_DIR to be set in .bashrc for non login shells f3ffb18 [docs] Tablet metadata and ASH updates (#23221) Test Plan: Jenkins: rebase: pg15-cherrypicks Reviewers: jason, tfoucher Subscribers: yql Tags: #jenkins-ready Differential Revision: https://phorge.dev.yugabyte.com/D36814
myang2021
added a commit
that referenced
this issue
Jul 30, 2024
Summary: This diff adds a new type of DDL atomicity stress test for partitioned table into the existing DDL atomicity stress test suite. These new tests exposed a bug that caused them to be flaky (manifested as test timeout after 10 minutes). The bug happens when the DDL transaction is rolled back. In the DDL transaction verifier state, we have `verifier_state->tables` to represent all the tables involved in this transaction. Once we know that the transaction is aborted, we call `YsqlDdlTxnCompleteCallbackInternal` to process all the tables in verifier_state->tables. Then we set the verifier state to YsqlDdlVerificationState::kDdlPostProcessing to indicate that the transaction is finished. But it is possible that verifier_state->tables does not already have all the tables in the txn. For example, a txn involves three tables t1, t2, t3. After t1 and t2 are added to txn, the txn is aborted due to a conflict. In this case `YsqlDdlTxnCompleteCallbackInternal` is only called on t1 and t2 and the state is set to kDdlPostProcessing before t3 gets added. Later when t3 gets added. We currently returns Status::OK() if the state == kDdlPostProcessing, and t3 will never be processed, leading to test failure due to timeout. The fix is always process all the tables in verifier_state->tables. In the above example we re-process t1 and t2 (they will turn into no-op), and process t3. Jira: DB-2996 Test Plan: ./yb_build.sh debug --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityPartitionedTablesStressTest.BasicTest -n 20 --tp 1 ./yb_build.sh debug --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityPartitionedTablesStressTest.TestTxnVerificationFailure -n 20 --tp 1 ./yb_build.sh debug --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityPartitionedTablesStressTest.TestFailCatalogWrites -n 20 --tp 1 ./yb_build.sh debug --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityPartitionedTablesStressTest.TestFailDdlRollback -n 20 --tp 1 ./yb_build.sh debug --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityPartitionedTablesStressTest.TestFailDdlVerification -n 20 --tp 1 ./yb_build.sh debug --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest.BasicTest -n 20 --tp 1 ./yb_build.sh debug --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest.TestTxnVerificationFailure -n 20 --tp 1 ./yb_build.sh debug --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest.TestFailCatalogWrites -n 20 --tp 1 ./yb_build.sh debug --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest.TestFailDdlRollback -n 20 --tp 1 ./yb_build.sh debug --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest.TestFailDdlVerification -n 20 --tp 1 ./yb_build.sh debug --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityColocatedStressTest.BasicTest -n 20 --tp 1 ./yb_build.sh debug --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityColocatedStressTest.TestTxnVerificationFailure -n 20 --tp 1 ./yb_build.sh debug --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityColocatedStressTest.TestFailCatalogWrites -n 20 --tp 1 ./yb_build.sh debug --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityColocatedStressTest.TestFailDdlRollback -n 20 --tp 1 ./yb_build.sh debug --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityColocatedStressTest.TestFailDdlVerification -n 20 --tp 1 Reviewers: hsunder, fizaa Reviewed By: fizaa Subscribers: yql, ybase Differential Revision: https://phorge.dev.yugabyte.com/D36862
myang2021
added a commit
that referenced
this issue
Jul 30, 2024
Summary: This diff adds a new DDL atomicity stress test. The stress test setup a test table with some data, then spawns 4 concurrent test threads. The first 3 of these 4 threads continuously run a pair of complementing DDL statements for 10 iterations, the 4th thread runs a update DML statement for 10 iterations. Thread 1: alter table add and drop column Thread 2: add and drop columns with default values Thread 3: create/drop an index on this table Thread 4: thread continuously updates the rows on this table. Because there are many concurrency between DDLs and DMLs, there will be execution errors that are expected. The test lists all such expected errors. If an error is one of these expected errors, the test will retry the failed statement until eventually succeed. The test fails if an unexpected error is encountered. When all the 4 threads have completed, the test does a number of verifications: 1. the table does not contain any additional columns 2. no indexes are present on this table 3. the rows on this table are updated correctly The test adds some error injections that are controlled by new test gflags. This diff also fixes a tsan data race that showed up in the newly added stress tests by re-acquiring CatalogManager's mutex_ which is already released earlier in order to fix a deadlock (commit ac184a8). Note that if we add more test scenarios with more concurrent DDL threads, we may see more expected errors that are not already listed and the test would fail. In that case we need to debug the failure. If it turns out that the error is expected, we need to add the new expected error to the list of expected errors. If the error indicates a bug, then we need to fix the bug to allow the test to pass. The stress test has two sets: non-colocated table and colocated table. A third set for partitioned table was taken out of this diff. Partitioned table stress test will be addressed by a separate bug fix diff. Jira: DB-2996 Original commit: fdffc33 / D32682 Test Plan: ./yb_build.sh debug --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest.BasicTest -n 20 --tp 1 ./yb_build.sh debug --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest.TestTxnVerificationFailure -n 20 --tp 1 ./yb_build.sh debug --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest.TestFailCatalogWrites -n 20 --tp 1 ./yb_build.sh debug --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest.TestFailDdlRollback -n 20 --tp 1 ./yb_build.sh debug --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest.TestFailDdlVerification -n 20 --tp 1 ./yb_build.sh debug --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityColocatedStressTest.BasicTest -n 20 --tp 1 ./yb_build.sh debug --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityColocatedStressTest.TestTxnVerificationFailure -n 20 --tp 1 ./yb_build.sh debug --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityColocatedStressTest.TestFailCatalogWrites -n 20 --tp 1 ./yb_build.sh debug --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityColocatedStressTest.TestFailDdlRollback -n 20 --tp 1 ./yb_build.sh debug --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityColocatedStressTest.TestFailDdlVerification -n 20 --tp 1 Reviewers: hsunder, zdrudi, fizaa Reviewed By: zdrudi Subscribers: yql, ybase Tags: #jenkins-ready Differential Revision: https://phorge.dev.yugabyte.com/D36927
jasonyb
pushed a commit
that referenced
this issue
Jul 31, 2024
Summary: 80c02fc [#23296] xCluster: Skip BackfillMetadataForXRepl for SequencesSystemTable d4eeace [#23295] YSQL: Fix test failure when the whole of TestPgAuthorization is run. baf343f [#13358] YSQL: Fix bug in partitioned table DDL atomicity stress test 7bf0eed [#23294] YSQL, ASH: Fix flaky test 'TestYbAsh#testEmptyCircularBuffer' 17828e2 [PLAT-13286] Make ReprovisionNode task retryable 34e6e7a [#23250] YSQL: UT org.yb.pgsql.TestPgMemoryGC#testPgMemoryGcOrderBy is failing with connection manager 3104eb7 [PLAT-14204] Azure Provider - Allow the Resource Group for the Network to be overridden per Region. e27c3b9 [#18360] YSQL: Rework yb_hash_code comparision with an out of range constants 2463ee0 [PLAT-14709] Fix DevSpace command for shared Kubernetes clusters PgDdlAtomicityPartitionedTablesStressTest.* flaky fails with ``` [ts-1] 2024-07-30 15:46:42.577 UTC [16209] ERROR: Restart read required at: { read: { days: 19934 time: 15:46:42.510692 } local_limit: { days: 19934 time: 15:46:42.510692 } global_limit: <min> in_txn_limit: <max> serial_no: 0 } [ts-1] 2024-07-30 15:46:42.577 UTC [16209] CONTEXT: Catalog Version Mismatch: A DDL occurred while processing this query. Try again. [ts-1] 2024-07-30 15:46:42.577 UTC [16209] STATEMENT: CREATE INDEX idx_0 ON test_table(key) [ts-1] TRAP: FailedAssertion("entry->refcount == 1", File: "../../../../../../../src/postgres/src/backend/utils/cache/yb_inheritscache.c", Line: 337, PID: 16209) [ts-1] postgres: postgres yugabyte 127.0.0.1(43848) CREATE INDEX(ExceptionalCondition+0x96)[0xb1f06d] [ts-1] postgres: postgres yugabyte 127.0.0.1(43848) CREATE INDEX(YbPgInheritsCacheDelete+0x125)[0xaf8e4c] [ts-1] postgres: postgres yugabyte 127.0.0.1(43848) CREATE INDEX(YbPgInheritsCacheInvalidate+0x88)[0xaf8f5f] [ts-1] postgres: postgres yugabyte 127.0.0.1(43848) CREATE INDEX[0xaf90f7] [ts-1] postgres: postgres yugabyte 127.0.0.1(43848) CREATE INDEX(InvalidateSystemCachesExtended+0x63)[0xafdd3c] [ts-1] postgres: postgres yugabyte 127.0.0.1(43848) CREATE INDEX(CallSystemCacheCallbacks+0x13)[0xafddad] [ts-1] postgres: postgres yugabyte 127.0.0.1(43848) CREATE INDEX[0x9a13fb] [ts-1] postgres: postgres yugabyte 127.0.0.1(43848) CREATE INDEX(PostgresMain+0x66c)[0x9a8642] [ts-1] postgres: postgres yugabyte 127.0.0.1(43848) CREATE INDEX[0x8e964d] [ts-1] postgres: postgres yugabyte 127.0.0.1(43848) CREATE INDEX(PostmasterMain+0x1d74)[0x8ebca2] [ts-1] postgres: postgres yugabyte 127.0.0.1(43848) CREATE INDEX(main+0x0)[0x81ed89] [ts-1] postgres: postgres yugabyte 127.0.0.1(43848) CREATE INDEX(main+0x20)[0x81eda9] [ts-1] /lib64/libc.so.6(__libc_start_main+0xe5)[0x7f1a00fcd7e5] [ts-1] postgres: postgres yugabyte 127.0.0.1(43848) CREATE INDEX(_start+0x2e)[0x4bbdfe] ``` Test Plan: Jenkins: rebase: pg15-cherrypicks Reviewers: jason, tfoucher Tags: #jenkins-ready Differential Revision: https://phorge.dev.yugabyte.com/D36922
myang2021
added a commit
that referenced
this issue
Jul 31, 2024
…icity stress test Summary: This diff adds a new type of DDL atomicity stress test for partitioned table into the existing DDL atomicity stress test suite. These new tests exposed a bug that caused them to be flaky (manifested as test timeout after 10 minutes). The bug happens when the DDL transaction is rolled back. In the DDL transaction verifier state, we have `verifier_state->tables` to represent all the tables involved in this transaction. Once we know that the transaction is aborted, we call `YsqlDdlTxnCompleteCallbackInternal` to process all the tables in verifier_state->tables. Then we set the verifier state to YsqlDdlVerificationState::kDdlPostProcessing to indicate that the transaction is finished. But it is possible that verifier_state->tables does not already have all the tables in the txn. For example, a txn involves three tables t1, t2, t3. After t1 and t2 are added to txn, the txn is aborted due to a conflict. In this case `YsqlDdlTxnCompleteCallbackInternal` is only called on t1 and t2 and the state is set to kDdlPostProcessing before t3 gets added. Later when t3 gets added. We currently returns Status::OK() if the state == kDdlPostProcessing, and t3 will never be processed, leading to test failure due to timeout. The fix is always process all the tables in verifier_state->tables. In the above example we re-process t1 and t2 (they will turn into no-op), and process t3. Jira: DB-2996 Original commit: baf343f / D36862 Test Plan: ./yb_build.sh debug --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityPartitionedTablesStressTest.BasicTest -n 20 --tp 1 ./yb_build.sh debug --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityPartitionedTablesStressTest.TestTxnVerificationFailure -n 20 --tp 1 ./yb_build.sh debug --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityPartitionedTablesStressTest.TestFailCatalogWrites -n 20 --tp 1 ./yb_build.sh debug --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityPartitionedTablesStressTest.TestFailDdlRollback -n 20 --tp 1 ./yb_build.sh debug --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityPartitionedTablesStressTest.TestFailDdlVerification -n 20 --tp 1 ./yb_build.sh debug --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest.BasicTest -n 20 --tp 1 ./yb_build.sh debug --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest.TestTxnVerificationFailure -n 20 --tp 1 ./yb_build.sh debug --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest.TestFailCatalogWrites -n 20 --tp 1 ./yb_build.sh debug --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest.TestFailDdlRollback -n 20 --tp 1 ./yb_build.sh debug --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest.TestFailDdlVerification -n 20 --tp 1 ./yb_build.sh debug --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityColocatedStressTest.BasicTest -n 20 --tp 1 ./yb_build.sh debug --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityColocatedStressTest.TestTxnVerificationFailure -n 20 --tp 1 ./yb_build.sh debug --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityColocatedStressTest.TestFailCatalogWrites -n 20 --tp 1 ./yb_build.sh debug --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityColocatedStressTest.TestFailDdlRollback -n 20 --tp 1 ./yb_build.sh debug --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityColocatedStressTest.TestFailDdlVerification -n 20 --tp 1 Reviewers: hsunder, fizaa Reviewed By: fizaa Subscribers: ybase, yql Tags: #jenkins-ready Differential Revision: https://phorge.dev.yugabyte.com/D36933
myang2021
added a commit
that referenced
this issue
Aug 5, 2024
Summary: This diff addresses two previous review comments regarding the DDL atomicity stress test: (1) Use parameterized test to repeat the same test logic across different environments / fixtures instead of virtual methods on test classes. (2) Make a connection for each test thread instead of making a connection per request, this increases the concurrency of the test. As a result of (1), I was able to add one more test scenario naturally/easily: colocated + partitioned tables. I also introduced a global_conn that is used for test table/data setup and test result verification. Added one more allowed error message "current transaction is expired or aborted" in addition to the existing "expired or aborted by a conflict". Jira: DB-2996 Test Plan: ./yb_build.sh fastdebug --gcc11 --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/0 -n 20 --tp 1 ./yb_build.sh fastdebug --gcc11 --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/1 -n 20 --tp 1 ./yb_build.sh fastdebug --gcc11 --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/2 -n 20 --tp 1 ./yb_build.sh fastdebug --gcc11 --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/3 -n 20 --tp 1 ./yb_build.sh fastdebug --gcc11 --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/4 -n 20 --tp 1 ./yb_build.sh fastdebug --gcc11 --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/5 -n 20 --tp 1 ./yb_build.sh fastdebug --gcc11 --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/6 -n 20 --tp 1 ./yb_build.sh fastdebug --gcc11 --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/7 -n 20 --tp 1 ./yb_build.sh fastdebug --gcc11 --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/8 -n 20 --tp 1 ./yb_build.sh fastdebug --gcc11 --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/9 -n 20 --tp 1 ./yb_build.sh fastdebug --gcc11 --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/10 -n 20 --tp 1 ./yb_build.sh fastdebug --gcc11 --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/11 -n 20 --tp 1 ./yb_build.sh fastdebug --gcc11 --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/12 -n 20 --tp 1 ./yb_build.sh fastdebug --gcc11 --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/13 -n 20 --tp 1 ./yb_build.sh fastdebug --gcc11 --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/14 -n 20 --tp 1 ./yb_build.sh fastdebug --gcc11 --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/15 -n 20 --tp 1 ./yb_build.sh fastdebug --gcc11 --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/16 -n 20 --tp 1 ./yb_build.sh fastdebug --gcc11 --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/17 -n 20 --tp 1 ./yb_build.sh fastdebug --gcc11 --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/18 -n 20 --tp 1 ./yb_build.sh fastdebug --gcc11 --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/19 -n 20 --tp 1 Reviewers: zdrudi Reviewed By: zdrudi Subscribers: yql Differential Revision: https://phorge.dev.yugabyte.com/D36996
myang2021
added a commit
that referenced
this issue
Aug 6, 2024
Summary: This diff addresses two previous review comments regarding the DDL atomicity stress test: (1) Use parameterized test to repeat the same test logic across different environments / fixtures instead of virtual methods on test classes. (2) Make a connection for each test thread instead of making a connection per request, this increases the concurrency of the test. As a result of (1), I was able to add one more test scenario naturally/easily: colocated + partitioned tables. I also introduced a global_conn that is used for test table/data setup and test result verification. Added one more allowed error message "current transaction is expired or aborted" in addition to the existing "expired or aborted by a conflict". Jira: DB-2996 Original commit: ce80f7a / D36996 Test Plan: ./yb_build.sh fastdebug --gcc11 --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/0 -n 20 --tp 1 ./yb_build.sh fastdebug --gcc11 --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/1 -n 20 --tp 1 ./yb_build.sh fastdebug --gcc11 --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/2 -n 20 --tp 1 ./yb_build.sh fastdebug --gcc11 --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/3 -n 20 --tp 1 ./yb_build.sh fastdebug --gcc11 --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/4 -n 20 --tp 1 ./yb_build.sh fastdebug --gcc11 --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/5 -n 20 --tp 1 ./yb_build.sh fastdebug --gcc11 --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/6 -n 20 --tp 1 ./yb_build.sh fastdebug --gcc11 --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/7 -n 20 --tp 1 ./yb_build.sh fastdebug --gcc11 --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/8 -n 20 --tp 1 ./yb_build.sh fastdebug --gcc11 --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/9 -n 20 --tp 1 ./yb_build.sh fastdebug --gcc11 --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/10 -n 20 --tp 1 ./yb_build.sh fastdebug --gcc11 --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/11 -n 20 --tp 1 ./yb_build.sh fastdebug --gcc11 --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/12 -n 20 --tp 1 ./yb_build.sh fastdebug --gcc11 --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/13 -n 20 --tp 1 ./yb_build.sh fastdebug --gcc11 --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/14 -n 20 --tp 1 ./yb_build.sh fastdebug --gcc11 --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/15 -n 20 --tp 1 ./yb_build.sh fastdebug --gcc11 --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/16 -n 20 --tp 1 ./yb_build.sh fastdebug --gcc11 --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/17 -n 20 --tp 1 ./yb_build.sh fastdebug --gcc11 --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/18 -n 20 --tp 1 ./yb_build.sh fastdebug --gcc11 --sj --cxx-test pg_ddl_atomicity_stress-test --gtest_filter=PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/19 -n 20 --tp 1 Reviewers: zdrudi Reviewed By: zdrudi Subscribers: yql Tags: #jenkins-ready Differential Revision: https://phorge.dev.yugabyte.com/D37068
jasonyb
pushed a commit
that referenced
this issue
Aug 7, 2024
Summary: 50931bf [#23273] yugabyted: Fix `yugabyted configure_read_replica` commands. 64e1bf8 [#23278] CDCSDK: Handle non-eligible tables cleanup with drop table while loading CDC stream ce80f7a [#13358] YSQL: Refactor DDL Atomicity Stress Test Excluded: 6d40d27 [#23407] YSQL: clean up compound BNL logic 5cb74a7 [PLAT-14164] New Alert for clock drift f39c76c [PLAT-14800] Fix yb.allow_db_version_more_than_yba_version being insufficient for YBA/DB version checks a42549e [#23377] DocDB: Implement the way to apply vector index updates to DocDB 3923ec5 [PLAT-14749][Platform]Add a warning message to image upgrade dialog 709cd92 [PLAT-14848] postgres.service file did not have RestartSec filled out da10672 [#23069] docdb: implemented per-iterator readahead for sequential reads f439c8a [PLAT-14852]: Do not raise error when JWT_JWKS_URL has valid value and JWT has empty keyset Test Plan: Jenkins: rebase: pg15-cherrypicks Reviewers: jason, tfoucher Tags: #jenkins-ready Differential Revision: https://phorge.dev.yugabyte.com/D37095
myang2021
added a commit
that referenced
this issue
Aug 8, 2024
Summary: The DDL atomicity stress tests failed more on pg15 branch with an error like: ``` WARNING: ThreadSanitizer: data race (pid=180911) Write of size 8 at 0x7b2c000257b8 by thread T17 (mutexes: write M0): #0 profile_open_file prof_file.c (libkrb5.so.3+0xf45b3) #1 profile_init_flags <null> (libkrb5.so.3+0xfb056) #2 k5_os_init_context <null> (libkrb5.so.3+0xe5546) #3 krb5_init_context_profile <null> (libkrb5.so.3+0xabc90) #4 krb5_init_context <null> (libkrb5.so.3+0xabbd5) #5 krb5_gss_init_context init_sec_context.c (libgssapi_krb5.so.2+0x448da) #6 acquire_cred_from acquire_cred.c (libgssapi_krb5.so.2+0x39159) #7 krb5_gss_acquire_cred_from acquire_cred.c (libgssapi_krb5.so.2+0x39072) #8 gss_add_cred_from <null> (libgssapi_krb5.so.2+0x1fcd3) #9 gss_acquire_cred_from <null> (libgssapi_krb5.so.2+0x1f69d) #10 gss_acquire_cred <null> (libgssapi_krb5.so.2+0x1f431) #11 pg_GSS_have_cred_cache ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/../../../../../../src/postgres/src/interfaces/libpq/fe-gssapi-common.c:68:10 (libpq.so.5+0x543fe) #12 PQconnectPoll ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/../../../../../../src/postgres/src/interfaces/libpq/fe-connect.c:2909:22 (libpq.so.5+0x359ca) #13 connectDBComplete ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/../../../../../../src/postgres/src/interfaces/libpq/fe-connect.c:2241:10 (libpq.so.5+0x30807) #14 PQconnectdb ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/../../../../../../src/postgres/src/interfaces/libpq/fe-connect.c:719:10 (libpq.so.5+0x30af1) #15 yb::pgwrapper::PGConn::Connect(string const&, std::chrono::time_point<yb::CoarseMonoClock, std::chrono::duration<long long, std::ratio<1l, 1000000000l>>>, bool, string const&) ${YB_SRC_ROOT}/src/yb/yql/pgwrapper/libpq_utils.cc:348:24 (libpq_utils.so+0x13c5b) #16 yb::pgwrapper::PGConn::Connect(string const&, bool, string const&) ${YB_SRC_ROOT}/src/yb/yql/pgwrapper/libpq_utils.h:254:12 (libpq_utils.so+0x1a77e) #17 yb::pgwrapper::PGConnBuilder::Connect(bool) const ${YB_SRC_ROOT}/src/yb/yql/pgwrapper/libpq_utils.cc:743:10 (libpq_utils.so+0x1a77e) #18 yb::pgwrapper::LibPqTestBase::ConnectToDBAsUser(string const&, string const&, bool) ${YB_SRC_ROOT}/src/yb/yql/pgwrapper/libpq_test_base.cc:54:6 (libpg_wrapper_test_base.so+0x26f34) #19 yb::pgwrapper::LibPqTestBase::ConnectToDB(string const&, bool) ${YB_SRC_ROOT}/src/yb/yql/pgwrapper/libpq_test_base.cc:44:10 (libpg_wrapper_test_base.so+0x26b1e) #20 yb::pgwrapper::LibPqTestBase::Connect(bool) ${YB_SRC_ROOT}/src/yb/yql/pgwrapper/libpq_test_base.cc:40:10 (libpg_wrapper_test_base.so+0x26b1e) #21 yb::pgwrapper::PgDdlAtomicityStressTest::Connect() ${YB_SRC_ROOT}/src/yb/yql/pgwrapper/pg_ddl_atomicity_stress-test.cc:147:25 (pg_ddl_atomicity_stress-test+0x136d6c) #22 yb::pgwrapper::PgDdlAtomicityStressTest::TestDdl(std::vector<string, std::allocator<string>> const&, int) ${YB_SRC_ROOT}/src/yb/yql/pgwrapper/pg_ddl_atomicity_stress-test.cc:165:15 (pg_ddl_atomicity_stress-test+0x136df5) #23 yb::pgwrapper::PgDdlAtomicityStressTest_StressTest_Test::TestBody()::$_2::operator()() const ${YB_SRC_ROOT}/src/yb/yql/pgwrapper/pg_ddl_atomicity_stress-test.cc:316:5 (pg_ddl_atomicity_stress-test+0x13d2eb) ``` It appears that the function `yb::pgwrapper::LibPqTestBase::Connect` isn't thread safe. I restructured the code to make the connections in a single thread and then pass them to various concurrent threads for testing. Jira: DB-2996 Test Plan: ./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/0 --clang17 ./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/1 --clang17 ./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/2 --clang17 ./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/3 --clang17 ./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/4 --clang17 ./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/5 --clang17 ./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/6 --clang17 ./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/7 --clang17 ./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/8 --clang17 ./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/9 --clang17 ./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/10 --clang17 ./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/11 --clang17 ./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/12 --clang17 ./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/13 --clang17 ./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/14 --clang17 ./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/15 --clang17 ./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/16 --clang17 ./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/17 --clang17 ./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/18 --clang17 ./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/19 --clang17 Verified that no more tsan errors. Reviewers: fizaa Reviewed By: fizaa Subscribers: yql Differential Revision: https://phorge.dev.yugabyte.com/D37111
myang2021
added a commit
that referenced
this issue
Aug 8, 2024
… in tsan build Summary: The DDL atomicity stress tests failed more on pg15 branch with an error like: ``` WARNING: ThreadSanitizer: data race (pid=180911) Write of size 8 at 0x7b2c000257b8 by thread T17 (mutexes: write M0): #0 profile_open_file prof_file.c (libkrb5.so.3+0xf45b3) #1 profile_init_flags <null> (libkrb5.so.3+0xfb056) #2 k5_os_init_context <null> (libkrb5.so.3+0xe5546) #3 krb5_init_context_profile <null> (libkrb5.so.3+0xabc90) #4 krb5_init_context <null> (libkrb5.so.3+0xabbd5) #5 krb5_gss_init_context init_sec_context.c (libgssapi_krb5.so.2+0x448da) #6 acquire_cred_from acquire_cred.c (libgssapi_krb5.so.2+0x39159) #7 krb5_gss_acquire_cred_from acquire_cred.c (libgssapi_krb5.so.2+0x39072) #8 gss_add_cred_from <null> (libgssapi_krb5.so.2+0x1fcd3) #9 gss_acquire_cred_from <null> (libgssapi_krb5.so.2+0x1f69d) #10 gss_acquire_cred <null> (libgssapi_krb5.so.2+0x1f431) #11 pg_GSS_have_cred_cache ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/../../../../../../src/postgres/src/interfaces/libpq/fe-gssapi-common.c:68:10 (libpq.so.5+0x543fe) #12 PQconnectPoll ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/../../../../../../src/postgres/src/interfaces/libpq/fe-connect.c:2909:22 (libpq.so.5+0x359ca) #13 connectDBComplete ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/../../../../../../src/postgres/src/interfaces/libpq/fe-connect.c:2241:10 (libpq.so.5+0x30807) #14 PQconnectdb ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/../../../../../../src/postgres/src/interfaces/libpq/fe-connect.c:719:10 (libpq.so.5+0x30af1) #15 yb::pgwrapper::PGConn::Connect(string const&, std::chrono::time_point<yb::CoarseMonoClock, std::chrono::duration<long long, std::ratio<1l, 1000000000l>>>, bool, string const&) ${YB_SRC_ROOT}/src/yb/yql/pgwrapper/libpq_utils.cc:348:24 (libpq_utils.so+0x13c5b) #16 yb::pgwrapper::PGConn::Connect(string const&, bool, string const&) ${YB_SRC_ROOT}/src/yb/yql/pgwrapper/libpq_utils.h:254:12 (libpq_utils.so+0x1a77e) #17 yb::pgwrapper::PGConnBuilder::Connect(bool) const ${YB_SRC_ROOT}/src/yb/yql/pgwrapper/libpq_utils.cc:743:10 (libpq_utils.so+0x1a77e) #18 yb::pgwrapper::LibPqTestBase::ConnectToDBAsUser(string const&, string const&, bool) ${YB_SRC_ROOT}/src/yb/yql/pgwrapper/libpq_test_base.cc:54:6 (libpg_wrapper_test_base.so+0x26f34) #19 yb::pgwrapper::LibPqTestBase::ConnectToDB(string const&, bool) ${YB_SRC_ROOT}/src/yb/yql/pgwrapper/libpq_test_base.cc:44:10 (libpg_wrapper_test_base.so+0x26b1e) #20 yb::pgwrapper::LibPqTestBase::Connect(bool) ${YB_SRC_ROOT}/src/yb/yql/pgwrapper/libpq_test_base.cc:40:10 (libpg_wrapper_test_base.so+0x26b1e) #21 yb::pgwrapper::PgDdlAtomicityStressTest::Connect() ${YB_SRC_ROOT}/src/yb/yql/pgwrapper/pg_ddl_atomicity_stress-test.cc:147:25 (pg_ddl_atomicity_stress-test+0x136d6c) #22 yb::pgwrapper::PgDdlAtomicityStressTest::TestDdl(std::vector<string, std::allocator<string>> const&, int) ${YB_SRC_ROOT}/src/yb/yql/pgwrapper/pg_ddl_atomicity_stress-test.cc:165:15 (pg_ddl_atomicity_stress-test+0x136df5) #23 yb::pgwrapper::PgDdlAtomicityStressTest_StressTest_Test::TestBody()::$_2::operator()() const ${YB_SRC_ROOT}/src/yb/yql/pgwrapper/pg_ddl_atomicity_stress-test.cc:316:5 (pg_ddl_atomicity_stress-test+0x13d2eb) ``` It appears that the function `yb::pgwrapper::LibPqTestBase::Connect` isn't thread safe. I restructured the code to make the connections in a single thread and then pass them to various concurrent threads for testing. Jira: DB-2996 Original commit: bd4874b / D37111 Test Plan: ./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/0 --clang17 ./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/1 --clang17 ./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/2 --clang17 ./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/3 --clang17 ./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/4 --clang17 ./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/5 --clang17 ./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/6 --clang17 ./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/7 --clang17 ./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/8 --clang17 ./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/9 --clang17 ./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/10 --clang17 ./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/11 --clang17 ./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/12 --clang17 ./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/13 --clang17 ./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/14 --clang17 ./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/15 --clang17 ./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/16 --clang17 ./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/17 --clang17 ./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/18 --clang17 ./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/19 --clang17 Verified that no more tsan errors. Reviewers: fizaa Reviewed By: fizaa Subscribers: yql Tags: #jenkins-ready Differential Revision: https://phorge.dev.yugabyte.com/D37167
jasonyb
pushed a commit
that referenced
this issue
Aug 9, 2024
Summary: 43d4b49 [PLAT-14771]: Restrict user actions on releases page based on appropriate RBAC permissions 1473a23 [PLAT-14792]: Master and TServer nodes displays empty tool tip if nodes are unreachable Excluded: 78a2bc7 [#23114] YSQL: Refactoring the yb_pg_pgaudit.sql/.out test. da9b281 [doc][ybm] Prometheus integration (#23292) bd4874b [#13358] YSQL: Fix DDL atomicity stress test failure in tsan build edd8e3f [PLAT-14524] Undo all @JsonProperty annotations added earlier to fix APIs Excluded: 4a2657e [PLAT-14787] Implement a master tablet lb api for YW b9d2e9d [#23445]yugabyted: Node not starting with DNS name and `--secure` option f171e13 [#23447]yugabyted: Node doesn't restart with `--secure` enabled d4f036d [PLAT-14790] Region metadata is not populated when provisioned nodes are added via Node Agent when using assisted Manual Provisioning (provision_instance.py script) 71ab66f [doc][ybm] Backup and restore clarifications (#23400) 4768023 [doc] ysql_yb_bnl_batch_size flag (#23397) 3d4bc2a [#23117] YSQL: Enable ALTER VIEW in parser Excluded: 03bbbed Bumping version to 2.23.1.0 on branch master 622046d [#23335] DocDB: Set field close timestamp for the log segment being copied Excluded: f69b08f [#1999] YSQL: fix temp table INSERT ON CONFLICT Excluded: efd4cb7 [#23429] YSQL: fix INSERT ON CONFLICT TupleDesc ref leak Excluded: 9e7181f [PLAT-14785] Add REST APIs for job scheduler (auto-master failover) d56903c [PLAT-14850]API Token authentication loops through the users and checks token against each of these. Test Plan: Jenkins: rebase: pg15-cherrypicks Reviewers: jason, tfoucher Tags: #jenkins-ready Differential Revision: https://phorge.dev.yugabyte.com/D37201
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
area/ysql
Yugabyte SQL (YSQL)
kind/enhancement
This is an enhancement of an existing feature
priority/medium
Medium priority issue
Jira Link: DB-2996
Description
Currently YSQL table metadata is stored in both the PG catalog and DocDB sys-catalog. This ticket tracks building infrastructure to roll-back any DocDB side schema changes if the DDL transaction is aborted.
The basic idea is to update the DocDB schema and PG catalog in a 2 phase manner. We bind the DocDB schema change to its corresponding YSQL transaction, maintain both the previous and the in-flux changes in the schema and poll the status of the YSQL transaction. Just two versions of schema is sufficient, because we do not support Concurrent DDL on a table. Once the transaction completes, based on the transaction status, roll-back or roll-forward the changes in the DocDB schema such that they are both consistent.
Preview/Early Adoption
GA
Longer Term/Post GA
The text was updated successfully, but these errors were encountered: