-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[YSQL] YB does not release LWLocks on AbortTransaction #19509
Labels
Comments
timothy-e
added
area/ysql
Yugabyte SQL (YSQL)
status/awaiting-triage
Issue awaiting triage
labels
Oct 12, 2023
yugabyte-ci
added
kind/bug
This issue is a bug
priority/medium
Medium priority issue
and removed
status/awaiting-triage
Issue awaiting triage
labels
Oct 12, 2023
There are a seperate type of locks, identified by LOCKTAG that we don't use in YB, so function calls have been disabled. Unfortunately, the LWLockReleaseAll function was accidentally disabled as well. Fix: diff --git a/src/postgres/src/backend/access/transam/xact.c b/src/postgres/src/backend/access/transam/xact.c
index 138dc4002f..3fe664da7d 100644
--- a/src/postgres/src/backend/access/transam/xact.c
+++ b/src/postgres/src/backend/access/transam/xact.c
@@ -2691,15 +2691,13 @@ AbortTransaction(void)
AtAbort_Memory();
AtAbort_ResourceOwner();
- if (YBIsPgLockingEnabled()) {
- /*
- * Release any LW locks we might be holding as quickly as possible.
- * (Regular locks, however, must be held till we finish aborting.)
- * Releasing LW locks is critical since we might try to grab them again
- * while cleaning up!
- */
- LWLockReleaseAll();
- }
+ /*
+ * Release any LW locks we might be holding as quickly as possible.
+ * (Regular locks, however, must be held till we finish aborting.)
+ * Releasing LW locks is critical since we might try to grab them again
+ * while cleaning up!
+ */
+ LWLockReleaseAll();
/* Clear wait information and command progress indicator */
pgstat_report_wait_end(); |
dr0pdb
added a commit
that referenced
this issue
Oct 29, 2023
… of replication slots Summary: Introduce support for creating, viewing, and dropping replication slots in YSQL. This is part of the project to support Publication/Replication slot API in YSQL (#18724). There are two interfaces for the support for create and drop: 1. Functions: - `pg_create_logical_replication_slot` - `pg_drop_replication_slot` 2. Walsender commands: - CREATE_REPLICATION_SLOT - DROP_REPLICATION_SLOT Both create and drop statements go to yb-master directly. Most of the PG code isn't applicable to YSQL yet and hence it is skipped. For viewing replication slots, we have a view `pg_replication_slots` which is backed by the function `pg_get_replication_slots`. The schema of the view has been modified by adding an extra yb-specific column `yb_stream_id` which is a text. Limitations: 1. Only `yboutput` plugin is supported. It'll only be relevant once we add support for consuming replication slots but we are enforcing it from this diff onwards Apart from the above, this diff fixes two issues: 1. #19509 - Cleanup of held locks in case of an `ereport(elevel >= ERROR)`. This diff fixes that by making sure that we call `LWLockReleaseAll` in `src/postgres/src/backend/access/transam/xact.c` in case of an error. Thanks to Timothy Elgersma. 2. Skipping cache refresh in case of an error in the execution of a replication command. `src/postgres/src/backend/tcop/postgres.c`. This is ok because we only cache DMLs and none of the replication commands are DMLs. We need to do that because the check `yb_is_dml_command` tries to parse the query to check whether it is a DML or not but it doesn't support replication commands. So any `ereport(elevel >= ERROR)` in the execution of a replication command would lead to a syntax error. TODOs for future: 1. This diff creates a CDC stream with CDCRecordType as `CHANGE`. We need to extend the `pg_create_logical_replication_slot` and `CREATE_REPLICATION_SLOT` syntax to take the CDCRecordType. It'll be done in a future diff 2. DROP_REPLICATION_SLOT commands allows waiting for a slot to become inactive before dropping it. It is unsupported currently and will be done in a future diff 3. Temporary replication slots are unsupported. Will be added in future once we also support consumption via Walsender Upgrade\Rollback safety: These changes rely on sys-catalog changes done in yb-master. As a result, all the commands are disabled during upgrade using an autoflag yb_enable_replication_commands (LocalPersisted) and will only be enabled once the user has committed to the new version. The autoflag was introduced during the Publication syntax support and is being reused here since these are both part of the same project: https://phorge.dev.yugabyte.com/D28721 Jira: DB-8008, DB-8009, DB-8305 Test Plan: New unit test ``` ./yb_build.sh --java-test 'org.yb.pgsql.TestPgReplicationSlot' ``` New Regress test ``` ./yb_build.sh --java-test 'org.yb.pgsql.TestPgRegressReplicationSlot' ``` I've also updated most of the CDCSDK tests to now use the ReplicationSlot commands to create a CDCSDK stream instead of an RPC. Remaining tests will be updated in future diffs Reviewers: dsrinivasan, skumar, asrinivasan, aagrawal Reviewed By: dsrinivasan Subscribers: ycdcxcluster, bogdan, ybase, yql Tags: #jenkins-ready Differential Revision: https://phorge.dev.yugabyte.com/D29194
Closed by 7e22b28 |
Reopening to address on 2.14, 2.16, 2.18, 2.20 |
timothy-e
changed the title
[YSQL] Deadlock on duplicate replication slot names
[YSQL] YB does not release LWLocks on AbortTransaction
Nov 2, 2023
timothy-e
pushed a commit
that referenced
this issue
Nov 7, 2023
…tTransaction Summary: Original commit: 7e22b28 / D29194 This revision backports the fix for releasing LWLocks on AbortTransaction. It is imported from the replication slot support revision: https://phorge.dev.yugabyte.com/D29194. The issue was introduced in 3f536a8 / D5518 Jira: DB-8305 Test Plan: ```./yb_build.sh --cxx-test pgwrapper_pg_libpq-test --gtest_filter PgLibPqTest.TestLWPgBackendErrorAfterLWLockAcquire``` Reviewers: telgersma Reviewed By: telgersma Subscribers: yql Tags: #jenkins-ready Differential Revision: https://phorge.dev.yugabyte.com/D30022
dr0pdb
added a commit
that referenced
this issue
Nov 8, 2023
…tTransaction Summary: Original commit: 7e22b28 / D29194 This revision backports the fix for releasing LWLocks on AbortTransaction. It is imported from the replication slot support revision: https://phorge.dev.yugabyte.com/D29194. The issue was introduced in 3f536a8 / D5518 Jira: DB-8305 Test Plan: ``` ./yb_build.sh --cxx-test pgwrapper_pg_libpq-test --gtest_filter PgLibPqTest.TestLWPgBackendErrorAfterLWLockAcquire ``` Reviewers: telgersma Reviewed By: telgersma Subscribers: yql Tags: #jenkins-ready Differential Revision: https://phorge.dev.yugabyte.com/D29997
dr0pdb
added a commit
that referenced
this issue
Nov 10, 2023
…tTransaction Summary: Original commit: 7e22b28 / D29194 This revision backports the fix for releasing LWLocks on AbortTransaction. It is imported from the replication slot support revision: https://phorge.dev.yugabyte.com/D29194. The issue was introduced in 3f536a8 / D5518 Jira: DB-8305 Test Plan: ``` ./yb_build.sh --cxx-test pgwrapper_pg_libpq-test --gtest_filter PgLibPqTest.TestLWPgBackendErrorAfterLWLockAcquire ``` Reviewers: telgersma Reviewed By: telgersma Subscribers: yql Tags: #jenkins-ready Differential Revision: https://phorge.dev.yugabyte.com/D29995
dr0pdb
added a commit
that referenced
this issue
Nov 10, 2023
…tTransaction Summary: Original commit: 7e22b28 / D29194 This revision backports the fix for releasing LWLocks on AbortTransaction. It is imported from the replication slot support revision: https://phorge.dev.yugabyte.com/D29194. The issue was introduced in 3f536a8 / D5518 Jira: DB-8305 Test Plan: ``` ./yb_build.sh --cxx-test pgwrapper_pg_libpq-test --gtest_filter PgLibPqTest.TestLWPgBackendErrorAfterLWLockAcquire ``` Reviewers: telgersma Reviewed By: telgersma Subscribers: yql Tags: #jenkins-ready Differential Revision: https://phorge.dev.yugabyte.com/D29996
1 task
jharveysmith
pushed a commit
that referenced
this issue
May 24, 2024
…tTransaction Summary: Original commit: e6c21c81ebaff62465fb48c665846795d034d1b1 / D29194 This revision backports the fix for releasing LWLocks on AbortTransaction. It is imported from the replication slot support revision: https://phorge.dev.yugabyte.com/D29194. The issue was introduced in 3b3f38c / D5518 Jira: DB-8305 Test Plan: ```./yb_build.sh --cxx-test pgwrapper_pg_libpq-test --gtest_filter PgLibPqTest.TestLWPgBackendErrorAfterLWLockAcquire``` Reviewers: telgersma Reviewed By: telgersma Subscribers: yql Tags: #jenkins-ready Differential Revision: https://phorge.dev.yugabyte.com/D30022
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
Jira Link: DB-8305
Description
In PostgreSQL when ereport is called with
level = ERROR
that automatically invokes the AbortTransaction. Currently as part of AbortTransaction, lwlocks held by the Process are not released.As the locks are not released, that prevents the subsequent acquisition of the lock and causes stuck PostgreSQL processes.
All the lwlocks should be released as part of AbortTransaction. This can happen in any path when lwlock is acquired and ereport is called with
level = ERROR
.One Example Scenario which leads to this:
Deadlock when creating replication slots with duplicate names:
Issue Type
kind/bug
Warning: Please confirm that this issue does not contain any sensitive information
The text was updated successfully, but these errors were encountered: