-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[DocDB] PgTableSizeTest_PartitionedTableSize fails on TSAN build due to function yb::client::YBSubTransaction::active() #15930
Labels
area/docdb
YugabyteDB core features
kind/bug
This issue is a bug
kind/failing-test
Tests and testing infra
priority/medium
Medium priority issue
Comments
deeps1991
added
area/docdb
YugabyteDB core features
status/awaiting-triage
Issue awaiting triage
labels
Feb 1, 2023
yugabyte-ci
added
kind/bug
This issue is a bug
priority/medium
Medium priority issue
labels
Feb 1, 2023
yugabyte-ci
added
kind/failing-test
Tests and testing infra
and removed
status/awaiting-triage
Issue awaiting triage
labels
Feb 6, 2023
yugabyte-ci
added
priority/high
High Priority
and removed
priority/medium
Medium priority issue
labels
Feb 24, 2023
basavaraj29
added a commit
that referenced
this issue
Apr 7, 2023
…izeTest_PartitionedTableSize Summary: As we allow buffering of operations in YSQL, multiple batches are launched in async mode without completion of the previous batch (only when there is no dependency among these operations). There can be only one outstanding batch executing `PgClientSession::SetupSession`, while there can be many outstanding batches executing `YBTransaction::Impl::Prepare` as part of callback from `LookupByKeyRpc`. All these batches belong to the same subtxn id. This can be confirmed as we wait for result of all previously launched in-flight ops in `PgSession::SetActiveSubTransaction`. In the current implementation, it leads to data race issues with YBSubTransaction. A previously launched batch is trying to access `highest_subtransaction_id_` during the `Prepare` to populate in-flight ops metadata, while a subsequent batch is trying to set the same field `highest_subtransaction_id_`. Though the writer thread tries to overwrite `highest_subtransaction_id_` to the same old value, this leads to a read-write conflict To address the data race, we now set subtxn metadata for the batch (batch of ops) by setting it during `Batcher::FlushAsync`. Batcher then launches `YBTransaction::Impl::Prepare` for the underlying transaction, which sets only the transaction metadata. The diff also addresses an anomaly with `active_sub_transaction_id_` passed from `pg_session`. Postgres assigns subtransaction id(s) starting from 1. but in the existing implementation, we see that `active_sub_transaction_id_` starts from 0 and then bumps up to 2 on savepoint creation (value as seen in the requests at `pg_client_session.cc`). In `client/transaction.cc`, we check if savepoint has been created, and if not, leave the subtxn metadata unpopulated. Down the stream, it is assumed that the subtransaction belonged to id 1 since the subtxn metadata was left empty. To avoid this confusion, we change the default value of `active_sub_transaction_id_` and populate the subtxn metadata pb only when subtxn is not in its default state. Not enabling the test to run in tsan mode for now, as there are a few more race issues that need to be addressed with the pggate code. For instance, the below stack trace points out an issue. (filed github [[ #16390 | issue ]]) ``` WARNING: ThreadSanitizer: data race (pid=1415195) Read of size 8 at 0x7fb746055e18 by thread T1: #0 YBCPgIsYugaByteEnabled /nfusr/dev-server/bkolagani/code/yugabyte-db/build/tsan-clang15-dynamic-ninja/../../src/yb/yql/pggate/ybc_pggate.cc:1368:10 (libyb_pggate.so+0x8f21a) #1 IsYugaByteEnabled /nfusr/dev-server/bkolagani/code/yugabyte-db/src/postgres/src/backend/utils/misc/../../../../../../../src/postgres/src/backend/utils/misc/pg_yb_utils.c:177:9 (postgres+0xc77565) #2 die /nfusr/dev-server/bkolagani/code/yugabyte-db/src/postgres/src/backend/tcop/../../../../../../src/postgres/src/backend/tcop/postgres.c:2752:6 (postgres+0xa4adc1) #3 __tsan::CallUserSignalHandler(__tsan::ThreadState*, bool, bool, int, __sanitizer::__sanitizer_siginfo*, void*) /opt/yb-build/llvm/yb-llvm-v15.0.3-yb-1-1667030060-0b8d1183-almalinux8-x86_64-build/src/llvm-project/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp:2025:5 (postgres+0x415a3a) ... Previous write of size 8 at 0x7fb746055e18 by main thread: #0 YBCDestroyPgGate /nfusr/dev-server/bkolagani/code/yugabyte-db/build/tsan-clang15-dynamic-ninja/../../src/yb/yql/pggate/ybc_pggate.cc:196:11 (libyb_pggate.so+0x866ae) #1 YBOnPostgresBackendShutdown /nfusr/dev-server/bkolagani/code/yugabyte-db/src/postgres/src/backend/utils/misc/../../../../../../../src/postgres/src/backend/utils/misc/pg_yb_utils.c:609:2 (postgres+0xc79003) #2 proc_exit /nfusr/dev-server/bkolagani/code/yugabyte-db/src/postgres/src/backend/storage/ipc/../../../../../../../src/postgres/src/backend/storage/ipc/ipc.c:153:3 (postgres+0xa080cc) ``` Test Plan: Jenkins ``` ./yb_build.sh --gtest_filter PgTableSizeTest.PartitionedTableSize ``` Reviewers: esheng, rthallam, pjain, rsami Reviewed By: rthallam, pjain, rsami Subscribers: bogdan, ybase Differential Revision: https://phabricator.dev.yugabyte.com/D23412
premkumr
pushed a commit
to premkumr/yugabyte-db
that referenced
this issue
Apr 10, 2023
…n TableSizeTest_PartitionedTableSize Summary: As we allow buffering of operations in YSQL, multiple batches are launched in async mode without completion of the previous batch (only when there is no dependency among these operations). There can be only one outstanding batch executing `PgClientSession::SetupSession`, while there can be many outstanding batches executing `YBTransaction::Impl::Prepare` as part of callback from `LookupByKeyRpc`. All these batches belong to the same subtxn id. This can be confirmed as we wait for result of all previously launched in-flight ops in `PgSession::SetActiveSubTransaction`. In the current implementation, it leads to data race issues with YBSubTransaction. A previously launched batch is trying to access `highest_subtransaction_id_` during the `Prepare` to populate in-flight ops metadata, while a subsequent batch is trying to set the same field `highest_subtransaction_id_`. Though the writer thread tries to overwrite `highest_subtransaction_id_` to the same old value, this leads to a read-write conflict To address the data race, we now set subtxn metadata for the batch (batch of ops) by setting it during `Batcher::FlushAsync`. Batcher then launches `YBTransaction::Impl::Prepare` for the underlying transaction, which sets only the transaction metadata. The diff also addresses an anomaly with `active_sub_transaction_id_` passed from `pg_session`. Postgres assigns subtransaction id(s) starting from 1. but in the existing implementation, we see that `active_sub_transaction_id_` starts from 0 and then bumps up to 2 on savepoint creation (value as seen in the requests at `pg_client_session.cc`). In `client/transaction.cc`, we check if savepoint has been created, and if not, leave the subtxn metadata unpopulated. Down the stream, it is assumed that the subtransaction belonged to id 1 since the subtxn metadata was left empty. To avoid this confusion, we change the default value of `active_sub_transaction_id_` and populate the subtxn metadata pb only when subtxn is not in its default state. Not enabling the test to run in tsan mode for now, as there are a few more race issues that need to be addressed with the pggate code. For instance, the below stack trace points out an issue. (filed github [[ yugabyte#16390 | issue ]]) ``` WARNING: ThreadSanitizer: data race (pid=1415195) Read of size 8 at 0x7fb746055e18 by thread T1: #0 YBCPgIsYugaByteEnabled /nfusr/dev-server/bkolagani/code/yugabyte-db/build/tsan-clang15-dynamic-ninja/../../src/yb/yql/pggate/ybc_pggate.cc:1368:10 (libyb_pggate.so+0x8f21a) #1 IsYugaByteEnabled /nfusr/dev-server/bkolagani/code/yugabyte-db/src/postgres/src/backend/utils/misc/../../../../../../../src/postgres/src/backend/utils/misc/pg_yb_utils.c:177:9 (postgres+0xc77565) #2 die /nfusr/dev-server/bkolagani/code/yugabyte-db/src/postgres/src/backend/tcop/../../../../../../src/postgres/src/backend/tcop/postgres.c:2752:6 (postgres+0xa4adc1) yugabyte#3 __tsan::CallUserSignalHandler(__tsan::ThreadState*, bool, bool, int, __sanitizer::__sanitizer_siginfo*, void*) /opt/yb-build/llvm/yb-llvm-v15.0.3-yb-1-1667030060-0b8d1183-almalinux8-x86_64-build/src/llvm-project/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp:2025:5 (postgres+0x415a3a) ... Previous write of size 8 at 0x7fb746055e18 by main thread: #0 YBCDestroyPgGate /nfusr/dev-server/bkolagani/code/yugabyte-db/build/tsan-clang15-dynamic-ninja/../../src/yb/yql/pggate/ybc_pggate.cc:196:11 (libyb_pggate.so+0x866ae) #1 YBOnPostgresBackendShutdown /nfusr/dev-server/bkolagani/code/yugabyte-db/src/postgres/src/backend/utils/misc/../../../../../../../src/postgres/src/backend/utils/misc/pg_yb_utils.c:609:2 (postgres+0xc79003) #2 proc_exit /nfusr/dev-server/bkolagani/code/yugabyte-db/src/postgres/src/backend/storage/ipc/../../../../../../../src/postgres/src/backend/storage/ipc/ipc.c:153:3 (postgres+0xa080cc) ``` Test Plan: Jenkins ``` ./yb_build.sh --gtest_filter PgTableSizeTest.PartitionedTableSize ``` Reviewers: esheng, rthallam, pjain, rsami Reviewed By: rthallam, pjain, rsami Subscribers: bogdan, ybase Differential Revision: https://phabricator.dev.yugabyte.com/D23412
basavaraj29
added a commit
that referenced
this issue
Apr 26, 2023
…ction in TableSizeTest_PartitionedTableSize" Summary: This reverts commit 8bcb9b4 for master branch. Test Plan: Jenkins Reviewers: rsami, rthallam Reviewed By: rthallam Subscribers: ybase, bogdan Differential Revision: https://phabricator.dev.yugabyte.com/D24799
basavaraj29
added a commit
that referenced
this issue
Apr 26, 2023
…ith YBSubTransaction in TableSizeTest_PartitionedTableSize" Summary: This reverts commit 8bcb9b4 for 2.18 branch. Original commit: e5620ba / D24799 Test Plan: Jenkins Reviewers: rsami, rthallam Reviewed By: rthallam Subscribers: bogdan, ybase Differential Revision: https://phabricator.dev.yugabyte.com/D24837
re-opening this due to #16988 |
Closed
1 task
yugabyte-ci
added
priority/medium
Medium priority issue
and removed
priority/high
High Priority
labels
May 31, 2023
basavaraj29
added a commit
that referenced
this issue
May 2, 2024
…es subtxn id semantics at pg/pg_client_service Summary: As we allow buffering of operations in YSQL, multiple batches belonging to the same subtxn id are launched in parallel in an async manner. There is only one outstanding batch executing `PgClientSession::SetupSession`, while there can be many outstanding batches executing `YBTransaction::Impl::Prepare` as part of callback from `LookupByKeyRpc`. In the existing implementation, it leads to data race issues with `YBSubTransaction`. A previously launched batch is trying to access `highest_subtransaction_id_` during `Prepare` to populate in-flight ops metadata, while a subsequent batch is trying to set the same field `highest_subtransaction_id_`. Though the writer thread tries to overwrite `highest_subtransaction_id_` with the same old value, it leads to a data race. Also, for RC transactions, we seem to hit a similar race while executing `RollbackToSubTransaction`. Seems like the pg backend issues a rollback on error before all inflight operations complete. ``` Status PgApiImpl::RollbackToSubTransaction(SubTransactionId id) { pg_session_->DropBufferedOperations(); return pg_session_->RollbackToSubTransaction(id); } ``` **Change 1** To address the data race, we now set subtxn metadata for the batch (batch of ops) by setting it during `Batcher::FlushAsync`. Batcher then launches `YBTransaction::Impl::Prepare` for the underlying transaction, which sets only the transaction metadata. If the batch fails, the subtxn metadata is copied to the retry batcher. **Change 2** This diff also addresses an anomaly with `active_sub_transaction_id_` passed from `pg_session`. Postgres assigns subtransaction id(s) starting from 1. But in the existing implementation, we see that `active_sub_transaction_id_` starts from 0 and then bumps up to 2 on savepoint creation (value as seen in the requests at `pg_client_session.cc`). In `client/transaction.cc`, we leave the subtxn metadata unpopulated if no savepoint has been created yet. The downstream code assumes that the subtxn belonged to id 1 since the subtxn metadata was unpopulated. To avoid this confusion, we change the default value of `active_sub_transaction_id_` and populate the subtxn metadata pb only when subtxn is not in its default state. Enabling test `PgTableSizeTest.PartitionedTableSize` to run in tsan mode, as the most of the race issues with pggate are resolved now. Note: The earlier version of the [[ https://phorge.dev.yugabyte.com/D23412 | fix ]] lead to a data loss [[ #16988 | issue ]] as the subtxn metadata was not being populated for retry batchers (which are quite probable in case of tablet splits). Hence, these ops were wrongly being treated as belonging to `kMinSubtransactionId` by downstream code, which might have lead to the issue. Added a test that validates the reasoning. **Additional note** We discussed offline that this change shouldn't have upgrade/downgrade consequences. Since pg process is a child process of the tserver process, changes at pg_client_session/service and pg_txn_manager would have taken effect in combination. So the subtxn metadata in the actual rpc to docdb would remain unchanged. Jira: DB-5343, DB-10441 Test Plan: Jenkins ``` ./yb_build.sh tsan --gtest_filter PgTableSizeTest.PartitionedTableSize -n 100 ./yb_build.sh --gtest_filter PgTabletSplitTest.SplitAmidstRunningTransaction ./yb_builds tsan --cxx-test='TEST_F(GeoPartitionedReadCommiittedTest, TestPromotionAmidstConflicts) {' ``` Reviewers: pjain, esheng, rsami, rthallam Reviewed By: pjain Subscribers: yql, ybase, bogdan Differential Revision: https://phorge.dev.yugabyte.com/D25071
svarnau
pushed a commit
that referenced
this issue
May 25, 2024
…es subtxn id semantics at pg/pg_client_service Summary: As we allow buffering of operations in YSQL, multiple batches belonging to the same subtxn id are launched in parallel in an async manner. There is only one outstanding batch executing `PgClientSession::SetupSession`, while there can be many outstanding batches executing `YBTransaction::Impl::Prepare` as part of callback from `LookupByKeyRpc`. In the existing implementation, it leads to data race issues with `YBSubTransaction`. A previously launched batch is trying to access `highest_subtransaction_id_` during `Prepare` to populate in-flight ops metadata, while a subsequent batch is trying to set the same field `highest_subtransaction_id_`. Though the writer thread tries to overwrite `highest_subtransaction_id_` with the same old value, it leads to a data race. Also, for RC transactions, we seem to hit a similar race while executing `RollbackToSubTransaction`. Seems like the pg backend issues a rollback on error before all inflight operations complete. ``` Status PgApiImpl::RollbackToSubTransaction(SubTransactionId id) { pg_session_->DropBufferedOperations(); return pg_session_->RollbackToSubTransaction(id); } ``` **Change 1** To address the data race, we now set subtxn metadata for the batch (batch of ops) by setting it during `Batcher::FlushAsync`. Batcher then launches `YBTransaction::Impl::Prepare` for the underlying transaction, which sets only the transaction metadata. If the batch fails, the subtxn metadata is copied to the retry batcher. **Change 2** This diff also addresses an anomaly with `active_sub_transaction_id_` passed from `pg_session`. Postgres assigns subtransaction id(s) starting from 1. But in the existing implementation, we see that `active_sub_transaction_id_` starts from 0 and then bumps up to 2 on savepoint creation (value as seen in the requests at `pg_client_session.cc`). In `client/transaction.cc`, we leave the subtxn metadata unpopulated if no savepoint has been created yet. The downstream code assumes that the subtxn belonged to id 1 since the subtxn metadata was unpopulated. To avoid this confusion, we change the default value of `active_sub_transaction_id_` and populate the subtxn metadata pb only when subtxn is not in its default state. Enabling test `PgTableSizeTest.PartitionedTableSize` to run in tsan mode, as the most of the race issues with pggate are resolved now. Note: The earlier version of the [[ https://phorge.dev.yugabyte.com/D23412 | fix ]] lead to a data loss [[ #16988 | issue ]] as the subtxn metadata was not being populated for retry batchers (which are quite probable in case of tablet splits). Hence, these ops were wrongly being treated as belonging to `kMinSubtransactionId` by downstream code, which might have lead to the issue. Added a test that validates the reasoning. **Additional note** We discussed offline that this change shouldn't have upgrade/downgrade consequences. Since pg process is a child process of the tserver process, changes at pg_client_session/service and pg_txn_manager would have taken effect in combination. So the subtxn metadata in the actual rpc to docdb would remain unchanged. Jira: DB-5343, DB-10441 Test Plan: Jenkins ``` ./yb_build.sh tsan --gtest_filter PgTableSizeTest.PartitionedTableSize -n 100 ./yb_build.sh --gtest_filter PgTabletSplitTest.SplitAmidstRunningTransaction ./yb_builds tsan --cxx-test='TEST_F(GeoPartitionedReadCommiittedTest, TestPromotionAmidstConflicts) {' ``` Reviewers: pjain, esheng, rsami, rthallam Reviewed By: pjain Subscribers: yql, ybase, bogdan Differential Revision: https://phorge.dev.yugabyte.com/D25071
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
area/docdb
YugabyteDB core features
kind/bug
This issue is a bug
kind/failing-test
Tests and testing infra
priority/medium
Medium priority issue
Jira Link: DB-5343
Description
PgTableSizeTest_PartitionedTableSize.log
The text was updated successfully, but these errors were encountered: