Architecting for HA - Transactions #1

Merged: 9 commits, Mar 10, 2023

premkumr (Owner):

  • Adding a page in the Develop section for handling error codes during Transaction Processing

karan-yb and others added 9 commits March 9, 2023 19:15
Summary:
This change reverts commit `4a6b2774ae3005655e66cf5675b14b05d6eccae2`, which introduced the Iterate() API, for two reasons:
* There are no performance gains, since the Iterate() call ends up using virtual methods again.
* The callback reduces readability.
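To illustrate the first point, here is a minimal sketch built on a hypothetical `RowIterator` interface (not the actual DocDB iterator API). A callback-based `Iterate()` still drives its loop through the same virtual `HasNext()`/`Next()` calls, so it adds a `std::function` call per row without removing any virtual dispatch; the pull-style loop that the revert restores is also easier to read.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical row iterator interface used only for this illustration.
class RowIterator {
 public:
  virtual ~RowIterator() = default;
  virtual bool HasNext() = 0;
  virtual int64_t Next() = 0;
};

class VectorRowIterator : public RowIterator {
 public:
  explicit VectorRowIterator(std::vector<int64_t> rows) : rows_(std::move(rows)) {}
  bool HasNext() override { return pos_ < rows_.size(); }
  int64_t Next() override { return rows_[pos_++]; }

 private:
  std::vector<int64_t> rows_;
  std::size_t pos_ = 0;
};

// Callback style (the reverted approach): the loop inside Iterate() still
// makes the same virtual calls, plus one std::function call per row.
void Iterate(RowIterator& it, const std::function<void(int64_t)>& callback) {
  while (it.HasNext()) callback(it.Next());
}

int64_t SumCallback(RowIterator& it) {
  int64_t sum = 0;
  Iterate(it, [&sum](int64_t row) { sum += row; });
  return sum;
}

// Pull style (restored): the caller drives the loop directly.
int64_t SumPull(RowIterator& it) {
  int64_t sum = 0;
  while (it.HasNext()) sum += it.Next();
  return sum;
}
```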

Test Plan:
./yb_build.sh --sj
./build/latest/tests-docdb/docrowwiseiterator-test
./build/latest/tests-pgwrapper/create_initial_sys_catalog_snapshot
./build/latest/tests-pgwrapper/pg_mini-test --gtest_filter=PgMiniTest.BigRead
./build/latest/tests-pgwrapper/pg_mini-test --gtest_filter=PgMiniTest.BigReadWithCompaction
./build/latest/tests-pgwrapper/pg_mini-test --gtest_filter=PgMiniTest.Scan

Reviewers: rsami, dmitry

Reviewed By: dmitry

Subscribers: rthallam, bogdan

Differential Revision: https://phabricator.dev.yugabyte.com/D23398
…ter creation failed

Summary:
**Context**
If an xCluster creation task fails, the user must not be able to select which tables they want to restart. Only
restarting the whole config is allowed.

**Change**
Skip the table selection step if xCluster config is in `failed` status.
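A minimal sketch of the step selection, assuming (as the test plan below suggests) that an empty table list in the restart payload means "restart the whole config". The enum names, `RestartSteps()` and `BuildRestartRequest()` are hypothetical; the real change lives in the YBA UI.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class XClusterStatus { kRunning, kFailed };
enum class RestartStep { kSelectTables, kConfigureBootstrap };

struct RestartRequest {
  std::vector<std::string> table_uuids;  // empty: restart the whole config
};

// A failed config skips table selection and goes straight to bootstrap setup.
std::vector<RestartStep> RestartSteps(XClusterStatus status) {
  if (status == XClusterStatus::kFailed) {
    return {RestartStep::kConfigureBootstrap};
  }
  return {RestartStep::kSelectTables, RestartStep::kConfigureBootstrap};
}

// For a failed config the user's table selection is ignored.
RestartRequest BuildRestartRequest(XClusterStatus status,
                                   const std::vector<std::string>& selected) {
  RestartRequest request;
  if (status != XClusterStatus::kFailed) request.table_uuids = selected;
  return request;
}
```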

Test Plan:
Create an xCluster config and put it in failed status.
Verify that the `restart replication` action will skip the table selection step and
bring users directly to the bootstrap configuration step.
Verify that the payload contains an empty array for tableUUIDs.
Verify that the restart is successful.

Reviewers: hzare, rmadhavan

Reviewed By: rmadhavan

Subscribers: jenkins-bot, yugaware

Differential Revision: https://phabricator.dev.yugabyte.com/D23314
Summary:
**Context**
The list of providers is cached and invalidated when the UI is aware of possible changes to the cached values.
Now that the delete provider API returns a YBTask, the frontend can poll with the taskUUID and update the provider list
once the provider is deleted (without requiring users to refresh the page).
If the task errors, we display an error toast.
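The polling flow can be sketched as follows. `fetch_state` is a hypothetical stand-in for the task-status request made with the taskUUID, and `PollDeleteProviderTask()` is illustrative only; a real UI would also wait between polls.

```cpp
#include <cassert>
#include <functional>
#include <string>

enum class TaskState { kRunning, kSuccess, kFailure };

struct PollOutcome {
  bool refresh_providers = false;  // invalidate the cached provider list
  bool show_error_toast = false;
};

PollOutcome PollDeleteProviderTask(
    const std::string& task_uuid,
    const std::function<TaskState(const std::string&)>& fetch_state,
    int max_polls) {
  PollOutcome outcome;
  for (int i = 0; i < max_polls; ++i) {
    TaskState state = fetch_state(task_uuid);
    if (state == TaskState::kSuccess) {
      outcome.refresh_providers = true;  // provider is gone: refetch the list
      return outcome;
    }
    if (state == TaskState::kFailure) {
      outcome.show_error_toast = true;
      return outcome;
    }
    // Still running: a real client would sleep before the next poll.
  }
  return outcome;
}
```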

Test Plan:
- Create providers
- Delete providers and verify that the provider is removed from the provider list once the delete provider
  task is complete.

Reviewers: svarshney, rmadhavan

Reviewed By: rmadhavan

Subscribers: jenkins-bot, yugaware

Differential Revision: https://phabricator.dev.yugabyte.com/D23458
Summary:
We were not providing the region code when creating the STS and Route 53 clients; it only worked because the region was picked up from environment variables, which is not a valid setup.
Now, we validate the hosted zone (Route 53) and the identity (STS client) across all regions provided in the provider request body.
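A minimal sketch of the per-region validation loop, assuming each check can be expressed as a function that receives the region explicitly. `RegionCheck` and `ValidateAcrossRegions()` are hypothetical and do not correspond to the AWS SDK or YBA code.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical per-region check: returns an error message, or "" on success.
using RegionCheck = std::function<std::string(const std::string& region)>;

// Runs both checks for every region in the request, passing the region
// explicitly instead of relying on one picked up from environment variables.
std::vector<std::string> ValidateAcrossRegions(const std::vector<std::string>& regions,
                                               const RegionCheck& check_sts_identity,
                                               const RegionCheck& check_route53_hosted_zone) {
  std::vector<std::string> errors;
  for (const std::string& region : regions) {
    std::string err = check_sts_identity(region);
    if (!err.empty()) errors.push_back(region + " (STS identity): " + err);
    err = check_route53_hosted_zone(region);
    if (!err.empty()) errors.push_back(region + " (Route 53 hosted zone): " + err);
  }
  return errors;
}
```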

Test Plan: Reproduced this issue manually by removing the env vars, then successfully validated the provider details.

Reviewers: sb-yb, svarshney

Reviewed By: svarshney

Subscribers: jenkins-bot, yugaware

Differential Revision: https://phabricator.dev.yugabyte.com/D23441
Summary: Fixed the PitrControllerTest.testListPitrConfigs flaky UT

Test Plan: UTs

Reviewers: vkumar, vbansal, sneelakantan

Reviewed By: sneelakantan

Subscribers: jenkins-bot, sneelakantan, yugaware

Differential Revision: https://phabricator.dev.yugabyte.com/D23427
…/detach

Summary:
Include scheduled backups + relevant storage configs in attach/detach tar gz
- Allow for one stop detach -> attach from source to target platform

Examples:
Sample detach:
        python3 ./yb_attach_detach.py -u edf981dc-a11a-46d1-9518-c022b1f0bc80 detach
        -cs f33e3c9b-75ab-4c30-80ad-cba85646ea39  -ts ad09f10f-c377-4cdf-985c-898d13eae783
        -ps http://167.123.191.88:9000   -f /tmp/universe-export-spec.tar.gz -s

Sample attach:
        python3 ./yb_attach_detach.py -u edf981dc-a11a-46d1-9518-c022b1f0bc80 attach
        -cd c920db92-063c-483e-839f-e8264dbe5a0d  -td 4abb151c-3020-43fb-bd04-f6a63934e4e5
        -pd http://10.150.7.155:9000   -f /tmp/universe-export-spec.tar.gz

Sample one stop detach/attach:
        python3 ./yb_attach_detach.py -u edf981dc-a11a-46d1-9518-c022b1f0bc80 run
        -cs f33e3c9b-75ab-4c30-80ad-cba85646ea39  -ts ad09f10f-c377-4cdf-985c-898d13eae783
        -ps http://167.123.191.88:9000   -f /tmp/universe-export-spec.tar.gz -s
        -cd c920db92-063c-483e-839f-e8264dbe5a0d  -td 4abb151c-3020-43fb-bd04-f6a63934e4e5
        -pd http://10.150.7.155:9000

Test Plan:
Create GCP and AWS universes and for each universe on source platform:

1. Add a backup config (GCS or s3) on source platform.
2. Create a backup using backup config on source platform.
3. Add a scheduled backup using the config on source platform.
4. Run detach universe (see example in summary) using `yb_attach_detach.py` script
5. Run attach universe (see example in summary) using `yb_attach_detach.py` script
6. Perform a rolling restart on destination platform and make sure that universe works as expected.

Instead of running steps 4 and 5 separately, you can use the `run` command in the `yb_attach_detach.py` script, which completes both steps in one go.
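Conceptually, the one-stop `run` command chains the two steps over the same spec file. The sketch below assumes that attach should not be attempted if detach fails; that ordering guarantee, the `Step` type and `RunDetachThenAttach()` are all hypothetical and not taken from `yb_attach_detach.py`.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical step signature: returns true on success.
using Step = std::function<bool(const std::string& spec_path)>;

// Detach on the source platform, then attach the same tar.gz spec on the
// destination platform, stopping at the first failure (assumed behavior).
bool RunDetachThenAttach(const std::string& spec_path, const Step& detach, const Step& attach) {
  if (!detach(spec_path)) return false;
  return attach(spec_path);
}
```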

Reviewers: nsingh, sanketh

Reviewed By: nsingh, sanketh

Subscribers: jenkins-bot, yugaware

Differential Revision: https://phabricator.dev.yugabyte.com/D23362
Summary:
When making multiple edits on a universe simultaneously along with volume resize, the current sequence of applying the edits leads to an error. Say the user is expanding the universe from 3 nodes to 4 nodes along with changing the volume size from 100Gi to 150Gi. In this case we have two plans possible:
Plan1:
1. Expand universe from 3 nodes to 4 (apply StatefulSet with replicas=4)
2. Expand volume for all 4 nodes
Plan2:
1. Expand volume for 3 nodes of current universe
2. Expand universe from 3 nodes to 4 (note that 4th new node automatically gets the new volume size of 150Gi).

The code was following Plan 1, as implemented in D20994. So the helm upgrade in step 1 includes both changes to the chart: replicas=4 and the new PVC size of 150Gi. This leads to an error when applying the StatefulSet, since changing the PVC size is not allowed on an existing StatefulSet.

This diff is to fix this by using Plan 2. At step 1, the volume size of "current universe" can be expanded by using the `curPlacement` instead of `newPlacement`.
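A minimal sketch of the Plan 2 ordering, using a hypothetical `Placement` model and `PlanEdit()` function (not YBA's actual task code). The resize step is sized from the current placement's node count, and the replica change comes afterwards, so no single step combines a replica change with a PVC size change.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified placement: node count and per-node volume size.
struct Placement {
  int num_nodes;
  int volume_gib;
};

std::vector<std::string> PlanEdit(const Placement& cur, const Placement& next) {
  std::vector<std::string> steps;
  if (next.volume_gib != cur.volume_gib) {
    // Step 1: expand the volumes of the pods that exist today (curPlacement).
    steps.push_back("resize " + std::to_string(cur.num_nodes) + " volumes to " +
                    std::to_string(next.volume_gib) + "Gi");
  }
  if (next.num_nodes != cur.num_nodes) {
    // Step 2: change only the replica count; new pods get the new size.
    steps.push_back("set replicas=" + std::to_string(next.num_nodes));
  }
  return steps;
}
```

For the 3-node, 100Gi to 4-node, 150Gi example above, this yields a resize of the 3 existing volumes followed by `replicas=4`.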

Figured out with Vivek's help that this issue has existed since D20994 landed (i.e., nearly since this feature was implemented in 2.16).

Test Plan:
**Ran k8s iTests** - https://jenkins.dev.yugabyte.com/job/dev-itest-job/7878/. Only unrelated failures. These iTests do not cover volume resize, so just covers regression for edit universe.

**Manual verification** - passed.

For manual verification, tried the following and found it working.
1. Expand volume of a universe.
2. Expand volume of a universe along with increasing the number of nodes.
3. Expand volume of a universe along with a change in instance type.
4. Expand volume of a universe along with shrinking the number of nodes and a change in instance type.

**Note**: Found another bug during the above testing. When the universe is shrunk, the pods are deleted but the corresponding PVC is not. Since that remains in the namespace, the next call to expand volume on this universe will be stuck forever. This is because the dangling PVC is also expanded and YBA waits for a pod to restart in order to complete the PVC expansion. Since there is no pod associated with this dangling PVC, YBA keeps looping until timeout. I will track this as a separate issue to be fixed - https://yugabyte.atlassian.net/browse/PLAT-7702

Reviewers: bgandhi, hzare, vkumar, sanketh, anijhawan

Reviewed By: anijhawan

Subscribers: jenkins-bot, yugaware

Differential Revision: https://phabricator.dev.yugabyte.com/D23408
Summary:
This diff adds UI support for AWS 'quick validation'.
Backend changes landed with these commits:
868bc91 / D22481
f40d84e / D22841
5c6c883 / D22888
1358c06 / D22953

The AWS provider form will send create provider requests with validation enabled.
If the validation fails, the validation errors are presented to the user at the bottom of the page.
If the validation fails, the user is able to resubmit and skip validation.

This feature sits behind the `enableAWSProviderValidation` feature flag which is off by default.
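The submit flow described above can be sketched as follows. `SubmitProvider()` and the `validate` callback are hypothetical; the sketch also assumes that when `enableAWSProviderValidation` is off, the form does not request validation at all.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

struct SubmitResult {
  bool created;
  std::vector<std::string> validation_errors;
};

// `validate` stands in for the backend validation that runs when the
// create-provider request asks for it.
SubmitResult SubmitProvider(bool enable_aws_provider_validation, bool skip_validation,
                            const std::function<std::vector<std::string>()>& validate) {
  if (enable_aws_provider_validation && !skip_validation) {
    std::vector<std::string> errors = validate();
    if (!errors.empty()) {
      // Rejected: errors are shown at the bottom of the form, and the user
      // may resubmit with skip_validation = true.
      return {false, errors};
    }
  }
  return {true, {}};
}
```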

Additional changes:
- Adds `YBPError` and `YBBeanValidationError` type interface for API responses

Test Plan:
Create AWS provider with invalid field values.
Verify that the form submission gets rejected and the validation
errors are presented to the user at the bottom of the page.

Reviewers: rmadhavan, lsangappa, asathyan, kkannan

Reviewed By: kkannan

Subscribers: jenkins-bot, yugaware

Differential Revision: https://phabricator.dev.yugabyte.com/D23424
…gateway + fix YBA scraping

Summary:
Currently we use the host interface for yugaware<->prometheus communication, which we shouldn't do.
Hence, the prometheus and yugaware containers need to be exposed on both the docker gateway and host interfaces in the Replicated environment,
and the rest of the code needs to use internal/external URLs/IPs/ports where appropriate.
Another issue was that the prometheus port was hardcoded in some places in the backup script; the configured port is now passed everywhere.

As a result, the yugaware and prometheus containers will continue to listen on ports 9000/9443 and 9090 on the host interface for external communication,
but will also listen on ports 9100/9543 and 9190 on the docker gateway interface for all inter-container communication.
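A minimal sketch of choosing between the internal and external address of a service. The ports are the Prometheus ones listed above; the `ServiceEndpoints` struct, `ServiceUrl()` and the IP addresses in the usage check are hypothetical examples.

```cpp
#include <cassert>
#include <string>

// Hypothetical description of one container's listeners.
struct ServiceEndpoints {
  std::string host_ip;     // host interface: external clients, UI links
  std::string gateway_ip;  // docker gateway interface: container-to-container
  int external_port;
  int internal_port;
};

// Inter-container traffic uses the gateway address and internal port;
// anything user-facing uses the host address and external port.
std::string ServiceUrl(const ServiceEndpoints& e, bool inter_container) {
  if (inter_container) {
    return "http://" + e.gateway_ip + ":" + std::to_string(e.internal_port);
  }
  return "http://" + e.host_ip + ":" + std::to_string(e.external_port);
}
```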

Test Plan:
Install Replicated YBA.
Make sure yugaware and prometheus targets are scraped by Prometheus.
Make sure yugaware queries metrics and alerts from prometheus successfully.
Make sure host IP is used in Prometheus links in YBA UI.

Start YBA locally.
Make sure local prometheus URL is used in links in YBA UI.

Reviewers: vbansal, sanketh, sb-yb

Reviewed By: sb-yb

Subscribers: jenkins-bot, yugaware

Differential Revision: https://phabricator.dev.yugabyte.com/D23312
@premkumr premkumr merged commit 4467546 into premkumr:develop-txn Mar 10, 2023
premkumr pushed a commit that referenced this pull request Apr 3, 2023
Summary:
This commit addresses a memory leak in the pg_isolation_regress test suite
detected by ASAN:

```
 ==18113==ERROR: LeakSanitizer: detected memory leaks

 Direct leak of 2048 byte(s) in 1 object(s) allocated from:
     #0 0x55f21c2846d6 in realloc /opt/yb-build/llvm/yb-llvm-v15.0.3-yb-1-1667030060-0b8d1183-almalinux8-x86_64-build/src/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:85:3
     #1 0x55f21c2cb0a9 in pg_realloc ${YB_SRC_ROOT}/src/postgres/src/common/../../../../../src/postgres/src/common/fe_memutils.c:72:8
     #2 0x55f21c2c27b2 in addlitchar ${YB_SRC_ROOT}/src/postgres/src/test/isolation/specscanner.l:116:12
     #3 0x55f21c2c27b2 in spec_yylex ${YB_SRC_ROOT}/src/postgres/src/test/isolation/specscanner.l:90:6
     #4 0x55f21c2be95f in spec_yyparse ${YB_SRC_ROOT}/src/postgres/src/test/isolation/specparse.c:1190:16
     #5 0x55f21c2c5492 in main ${YB_SRC_ROOT}/src/postgres/src/test/isolation/../../../../../../src/postgres/src/test/isolation/isolationtester.c:116:2
     #6 0x7f027726ed84 in __libc_start_main (/lib64/libc.so.6+0x3ad84) (BuildId: d18afae5244bc9c85026bd7d64b276d51b452d93)

 Objects leaked above:
 0x61d000000080 (2048 bytes)

 SUMMARY: AddressSanitizer: 2048 byte(s) leaked in 1 allocation(s).
```

The memory leak is caused by the litbuf variable in specscanner.l, which is
being reallocated in the addlitchar function but not freed properly. The leak
occurs in two ways:

  1. litbuf is allocated multiple times without being freed, leading to memory
     leaks.
  2. litbuf is not properly released after spec_yyparse() is executed.

To resolve these issues, the following changes have been made:

  1. litbuf memory allocation is now initialized only when it is NULL, preventing
     multiple allocations and enabling the buffer to be reused.
  2. A spec_scanner_finish() function is introduced to clean up the allocated
     memory. This function frees the litbuf memory after spec_yyparse() is
     executed, preventing the memory leak.
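The two changes can be sketched in C-style C++ as follows. Only `litbuf`, `addlitchar` and `spec_scanner_finish` come from the description; the globals, the `start_literal()` helper, the initial size and the growth policy are hypothetical, and this is not the actual specscanner.l code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical globals mirroring the description.
static char* litbuf = nullptr;
static std::size_t litbufsize = 0;
static std::size_t litbufpos = 0;

// Called at the start of each literal (hypothetical name). The buffer is
// allocated only when it is NULL, so later literals reuse it instead of
// leaking a fresh allocation each time.
void start_literal() {
  if (litbuf == nullptr) {
    litbufsize = 16;
    litbuf = static_cast<char*>(std::malloc(litbufsize));
  }
  litbufpos = 0;
  litbuf[0] = '\0';
}

// Appends one character, growing the buffer when there is no room for the
// character plus the terminating NUL. Allocation failures are not handled
// in this sketch.
void addlitchar(char c) {
  if (litbufpos + 2 > litbufsize) {
    litbufsize *= 2;
    litbuf = static_cast<char*>(std::realloc(litbuf, litbufsize));
  }
  litbuf[litbufpos++] = c;
  litbuf[litbufpos] = '\0';
}

// Releases the buffer once parsing is done, the role the commit gives to
// spec_scanner_finish() after spec_yyparse().
void spec_scanner_finish() {
  std::free(litbuf);
  litbuf = nullptr;
  litbufsize = 0;
  litbufpos = 0;
}
```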

Test Plan:
Run pg_isolation_regress to confirm that the memory leak is resolved:

./yb_build.sh asan --java-test 'org.yb.pgsql.TestPgWithoutWaitQueuesIsolationRegress' -n 100 --tp 1

Reviewers: bogdan, pjain

Reviewed By: pjain

Subscribers: smishra, yql

Differential Revision: https://phabricator.dev.yugabyte.com/D23912
premkumr pushed a commit that referenced this pull request Apr 7, 2023
…n TableSizeTest_PartitionedTableSize

Summary:
As we allow buffering of operations in YSQL, multiple batches are launched in async mode without completion of the previous batch (only when there is no dependency among these operations). There can be only one outstanding batch executing `PgClientSession::SetupSession`, while there can be many outstanding batches executing `YBTransaction::Impl::Prepare` as part of callback from `LookupByKeyRpc`. All these batches belong to the same subtxn id. This can be confirmed as we wait for result of all previously launched in-flight ops in `PgSession::SetActiveSubTransaction`.

In the current implementation, this leads to data races on YBSubTransaction. A previously launched batch reads `highest_subtransaction_id_` during `Prepare` to populate in-flight ops metadata, while a subsequent batch sets the same field `highest_subtransaction_id_`. Even though the writer thread overwrites `highest_subtransaction_id_` with the same old value, this is still a read-write conflict.

To address the data race, we now set subtxn metadata for the batch (batch of ops) by setting it during `Batcher::FlushAsync`. Batcher then launches `YBTransaction::Impl::Prepare` for the underlying transaction, which sets only the transaction metadata.

The diff also addresses an anomaly with `active_sub_transaction_id_` passed from `pg_session`. Postgres assigns subtransaction IDs starting from 1, but in the existing implementation `active_sub_transaction_id_` starts from 0 and then bumps up to 2 on savepoint creation (as seen in the requests at `pg_client_session.cc`). In `client/transaction.cc`, we check whether a savepoint has been created and, if not, leave the subtxn metadata unpopulated. Downstream, empty subtxn metadata is assumed to mean subtransaction ID 1. To avoid this confusion, we change the default value of `active_sub_transaction_id_` and populate the subtxn metadata PB only when the subtxn is not in its default state.
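A simplified model of both fixes, using hypothetical classes rather than the YugabyteDB client API: the subtransaction metadata is captured once per batch at flush time, so a later `Prepare()` only needs transaction metadata, and it is left unset while the subtransaction is in its default state (modeled here with `std::optional`; the actual default value is not shown). The mutex only keeps the sketch itself race-free.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <optional>

// Hypothetical model of the session's subtransaction state.
class SubTransactionState {
 public:
  void SetActive(uint32_t id) {
    std::lock_guard<std::mutex> lock(mutex_);
    active_id_ = id;
  }
  std::optional<uint32_t> Active() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return active_id_;
  }

 private:
  mutable std::mutex mutex_;
  std::optional<uint32_t> active_id_;  // nullopt models the default state
};

struct BatchMetadata {
  std::optional<uint32_t> subtxn_id;  // populated only outside the default state
};

// Capture the subtxn metadata when the batch is flushed, so preparing the
// batch later never reads state that a subsequent batch may be writing.
BatchMetadata FlushAsync(const SubTransactionState& state) {
  return BatchMetadata{state.Active()};
}
```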

Not enabling the test to run in tsan mode for now, as there are a few more race issues that need to be addressed with the pggate code. For instance, the below stack trace points out an issue. (filed github [[ yugabyte#16390 | issue ]])
```
WARNING: ThreadSanitizer: data race (pid=1415195)
  Read of size 8 at 0x7fb746055e18 by thread T1:
    #0 YBCPgIsYugaByteEnabled /nfusr/dev-server/bkolagani/code/yugabyte-db/build/tsan-clang15-dynamic-ninja/../../src/yb/yql/pggate/ybc_pggate.cc:1368:10 (libyb_pggate.so+0x8f21a)
    #1 IsYugaByteEnabled /nfusr/dev-server/bkolagani/code/yugabyte-db/src/postgres/src/backend/utils/misc/../../../../../../../src/postgres/src/backend/utils/misc/pg_yb_utils.c:177:9 (postgres+0xc77565)
    #2 die /nfusr/dev-server/bkolagani/code/yugabyte-db/src/postgres/src/backend/tcop/../../../../../../src/postgres/src/backend/tcop/postgres.c:2752:6 (postgres+0xa4adc1)
    #3 __tsan::CallUserSignalHandler(__tsan::ThreadState*, bool, bool, int, __sanitizer::__sanitizer_siginfo*, void*) /opt/yb-build/llvm/yb-llvm-v15.0.3-yb-1-1667030060-0b8d1183-almalinux8-x86_64-build/src/llvm-project/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp:2025:5 (postgres+0x415a3a)
...
  Previous write of size 8 at 0x7fb746055e18 by main thread:
    #0 YBCDestroyPgGate /nfusr/dev-server/bkolagani/code/yugabyte-db/build/tsan-clang15-dynamic-ninja/../../src/yb/yql/pggate/ybc_pggate.cc:196:11 (libyb_pggate.so+0x866ae)
    #1 YBOnPostgresBackendShutdown /nfusr/dev-server/bkolagani/code/yugabyte-db/src/postgres/src/backend/utils/misc/../../../../../../../src/postgres/src/backend/utils/misc/pg_yb_utils.c:609:2 (postgres+0xc79003)
    #2 proc_exit /nfusr/dev-server/bkolagani/code/yugabyte-db/src/postgres/src/backend/storage/ipc/../../../../../../../src/postgres/src/backend/storage/ipc/ipc.c:153:3 (postgres+0xa080cc)
```

Test Plan:
Jenkins
```
./yb_build.sh --gtest_filter PgTableSizeTest.PartitionedTableSize
```

Reviewers: esheng, rthallam, pjain, rsami

Reviewed By: rthallam, pjain, rsami

Subscribers: bogdan, ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D23412
premkumr pushed two commits that referenced this pull request Apr 10, 2023, with the same commit messages as the D23912 and D23412 commits above.
premkumr pushed a commit that referenced this pull request Jul 7, 2023
…izableIsolation

Summary:
The following error was seen because `num_write_iterations` was used by
the test thread after it was destructed. The thread should stop before
`num_write_iterations` is destructed.

```
==27369==ERROR: AddressSanitizer: stack-use-after-scope on address 0x7f4570be7660 at pc 0x55b8b47bbe66 bp 0x7f44ec861290 sp 0x7f44ec861288
WRITE of size 8 at 0x7f4570be7660 thread T145
    #0 0x55b8b47bbe65 in unsigned long std::__cxx_atomic_fetch_add[abi:v160006]<unsigned long>(std::__cxx_atomic_base_impl<unsigned long>*, unsigned long, std::memory_order) /opt/yb-build/thirdparty/yugabyte-db-thirdparty-v20230621185529-6777477baa-almalinux8-x86_64-clang16/installed/asan/libcxx/include/c++/v1/atomic:1014:12
    #1 0x55b8b47bbe65 in std::__atomic_base<unsigned long, true>::fetch_add[abi:v160006](unsigned long, std::memory_order) /opt/yb-build/thirdparty/yugabyte-db-thirdparty-v20230621185529-6777477baa-almalinux8-x86_64-clang16/installed/asan/libcxx/include/c++/v1/atomic:1649:17
    #2 0x55b8b47bbe65 in yb::pgwrapper::PgSingleTServerTest_TestDeferrablePagingInSerializableIsolation_Test::TestBody()::$_0::operator()() const ${BUILD_ROOT}/../../src/yb/yql/pgwrapper/pg_single_tserver-test.cc:361:32
```
Jira: DB-6908
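The pattern behind the fix, as a minimal sketch (hypothetical function, not the actual test body): when a thread captures a stack variable by reference, the thread must be stopped and joined before that variable goes out of scope.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

uint64_t RunWriterUntil(uint64_t target) {
  std::atomic<uint64_t> num_write_iterations{0};
  std::atomic<bool> stop{false};
  // The writer captures stack variables by reference.
  std::thread writer([&num_write_iterations, &stop] {
    while (!stop.load()) num_write_iterations.fetch_add(1);
  });
  while (num_write_iterations.load() < target) std::this_thread::yield();
  stop.store(true);
  // Join before num_write_iterations and stop are destroyed; returning
  // without this would leave the thread writing to a dead stack slot.
  writer.join();
  return num_write_iterations.load();
}
```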

Test Plan:
Jenkins: test regex: .*single_tserver.*
./yb_build.sh --cxx-test pgwrapper_pg_single_tserver-test --gtest_filter PgSingleTServerTest.TestPagingInSerializableIsolation

Reviewers: dmitry

Reviewed By: dmitry

Subscribers: yql

Differential Revision: https://phorge.dev.yugabyte.com/D26571
premkumr pushed a commit that referenced this pull request Sep 1, 2023
…hich resulted in tsan race in connection

Summary:
This change fixes TSAN race issues caused by change 98586ef. The initial change incorrectly set the RPC start time to the current time when processing the response, so the stuck-call detection logic did not work correctly, which resulted in TSAN races.

tsan race details -
- The connection is reset while we are trying to get DebugString(). This is solved by not calling DebugString() until we determine that the call is actually stuck.
```
[ts-2]   Read of size 8 at 0x7b54000c0198 by thread T21 (mutexes: write M0):
[ts-2]     #0 std::shared_ptr<yb::rpc::Connection>::get[abi:v160006]() const /opt/yb-build/thirdparty/yugabyte-db-thirdparty-v20230803170255-057e0a1188-centos7-x86_64-clang16/installed/tsan/libcxx/include/c++/v1/__memory/shared_ptr.h:843:16 (libyrpc.so+0x11383f)
[ts-2]     #1 std::shared_ptr<yb::rpc::Connection>::operator bool[abi:v160006]() const /opt/yb-build/thirdparty/yugabyte-db-thirdparty-v20230803170255-057e0a1188-centos7-x86_64-clang16/installed/tsan/libcxx/include/c++/v1/__memory/shared_ptr.h:875:16 (libyrpc.so+0x11383f)
[ts-2]     #2 yb::rpc::OutboundCall::DebugString() const ${BUILD_ROOT}/../../src/yb/rpc/outbound_call.cc:613:7 (libyrpc.so+0x11383f)
[ts-2]     #3 yb::rpc::RpcController::CallStateDebugString() const ${BUILD_ROOT}/../../src/yb/rpc/rpc_controller.cc:163:19 (libyrpc.so+0x156976)
[ts-2]     #4 yb::consensus::Peer::SignalRequest(yb::consensus::RequestTriggerMode) ${BUILD_ROOT}/../../src/yb/consensus/consensus_peers.cc:188:77 (libconsensus.so+0x94c2e)

[ts-2]   Previous write of size 8 at 0x7b54000c0198 by thread T25:
[ts-2]     #0 std::enable_if<is_move_constructible<yb::rpc::Connection*>::value && is_move_assignable<yb::rpc::Connection*>::value, void>::type std::swap[abi:v160006]<yb::rpc::Connection*>(yb::rpc::Connection*&, yb::rpc::Connection*&) /opt/yb-build/thirdparty/yugabyte-db-thirdparty-v20230803170255-057e0a1188-centos7-x86_64-clang16/installed/tsan/libcxx/include/c++/v1/__utility/swap.h:41:7 (libyrpc.so+0x11166a)
[ts-2]     #1 std::shared_ptr<yb::rpc::Connection>::swap[abi:v160006](std::shared_ptr<yb::rpc::Connection>&) /opt/yb-build/thirdparty/yugabyte-db-thirdparty-v20230803170255-057e0a1188-centos7-x86_64-clang16/installed/tsan/libcxx/include/c++/v1/__memory/shared_ptr.h:805:9 (libyrpc.so+0x11166a)
[ts-2]     #2 std::shared_ptr<yb::rpc::Connection>::reset[abi:v160006]() /opt/yb-build/thirdparty/yugabyte-db-thirdparty-v20230803170255-057e0a1188-centos7-x86_64-clang16/installed/tsan/libcxx/include/c++/v1/__memory/shared_ptr.h:812:22 (libyrpc.so+0x11166a)
[ts-2]     #3 yb::rpc::OutboundCall::InvokeCallback() ${BUILD_ROOT}/../../src/yb/rpc/outbound_call.cc:372:15 (libyrpc.so+0x11166a)
[ts-2]     #4 yb::rpc::OutboundCall::SetTimedOut() ${BUILD_ROOT}/../../src/yb/rpc/outbound_call.cc:538:5 (libyrpc.so+0x112c64)
```

- Concurrent access to RpcController `call_`. The fix avoids calling the finished() function.
```
[m-2] WARNING: ThreadSanitizer: data race (pid=27931)
[m-2]   Write of size 8 at 0x7b6000040708 by thread T29 (mutexes: write M0):
[m-2]     #0 std::enable_if<is_move_constructible<yb::rpc::OutboundCall*>::value && is_move_assignable<yb::rpc::OutboundCall*>::value, void>::type std::swap[abi:v160006]<yb::rpc::OutboundCall*>(yb::rpc::OutboundCall*&, yb::rpc::OutboundCall*&) /opt/yb-build/thirdparty/yugabyte-db-thirdparty-v20230803170255-057e0a1188-centos7-x86_64-clang16/installed/tsan/libcxx/include/c++/v1/__utility/swap.h:41:7 (libyrpc.so+0x1560a4)
[m-2]     #1 std::shared_ptr<yb::rpc::OutboundCall>::swap[abi:v160006](std::shared_ptr<yb::rpc::OutboundCall>&) /opt/yb-build/thirdparty/yugabyte-db-thirdparty-v20230803170255-057e0a1188-centos7-x86_64-clang16/installed/tsan/libcxx/include/c++/v1/__memory/shared_ptr.h:805:9 (libyrpc.so+0x1560a4)
[m-2]     #2 std::shared_ptr<yb::rpc::OutboundCall>::reset[abi:v160006]() /opt/yb-build/thirdparty/yugabyte-db-thirdparty-v20230803170255-057e0a1188-centos7-x86_64-clang16/installed/tsan/libcxx/include/c++/v1/__memory/shared_ptr.h:812:22 (libyrpc.so+0x1560a4)
[m-2]     #3 yb::rpc::RpcController::Reset() ${BUILD_ROOT}/../../src/yb/rpc/rpc_controller.cc:83:9 (libyrpc.so+0x1560a4)
[m-2]     #4 yb::consensus::Peer::ProcessResponse() ${BUILD_ROOT}/../../src/yb/consensus/consensus_peers.cc:496:15 (libconsensus.so+0x97ffb)

[m-2]   Previous read of size 8 at 0x7b6000040708 by thread T7:
[m-2]     #0 std::shared_ptr<yb::rpc::OutboundCall>::get[abi:v160006]() const /opt/yb-build/thirdparty/yugabyte-db-thirdparty-v20230803170255-057e0a1188-centos7-x86_64-clang16/installed/tsan/libcxx/include/c++/v1/__memory/shared_ptr.h:843:16 (libyrpc.so+0x1561ea)
[m-2]     #1 std::shared_ptr<yb::rpc::OutboundCall>::operator bool[abi:v160006]() const /opt/yb-build/thirdparty/yugabyte-db-thirdparty-v20230803170255-057e0a1188-centos7-x86_64-clang16/installed/tsan/libcxx/include/c++/v1/__memory/shared_ptr.h:875:16 (libyrpc.so+0x1561ea)
[m-2]     #2 yb::rpc::RpcController::finished() const ${BUILD_ROOT}/../../src/yb/rpc/rpc_controller.cc:87:7 (libyrpc.so+0x1561ea)
[m-2]     #3 yb::consensus::Peer::SignalRequest(yb::consensus::RequestTriggerMode) ${BUILD_ROOT}/../../src/yb/consensus/consensus_peers.cc:183:81 (libconsensus.so+0x94934)
[m-2]     #4 yb::consensus::Peer::Init()::$_0::operator()() const ${BUILD_ROOT}/../../src/yb/consensus/consensus_peers.cc:156:25 (libconsensus.so+0x9bc67)
```
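The stuck-call check that both fixes rely on can be sketched as follows. The struct and functions are hypothetical, not the `yb::rpc` API. The start time is recorded when the call is sent and is not reset when the response is processed, and the debug string is built only once a call has actually been outstanding longer than the threshold.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <string>

using Clock = std::chrono::steady_clock;

struct OutboundCallState {
  Clock::time_point start_time;  // set when the RPC is sent; never reset on response
  bool finished = false;
};

bool IsStuck(const OutboundCallState& call, Clock::time_point now, Clock::duration threshold) {
  return !call.finished && now - call.start_time > threshold;
}

// Builds the (potentially racy) debug string only for calls that are stuck.
std::string MaybeDebugString(const OutboundCallState& call, Clock::time_point now,
                             Clock::duration threshold,
                             const std::function<std::string()>& debug_string) {
  return IsStuck(call, now, threshold) ? debug_string() : std::string();
}
```

If `start_time` were reset to "now" on every response, `IsStuck()` would rarely report a stuck call, which is the detection failure described above.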
Jira: DB-7637

Test Plan:
./yb_build.sh tsan -n 10 --cxx-test integration-tests_master_failover-itest --gtest_filter MasterFailoverTestIndexCreation/MasterFailoverTestIndexCreation.TestPauseAfterCreateIndexIssued/0
./yb_build.sh tsan -n 10 --cxx-test integration-tests_raft_consensus-itest --gtest_filter RaftConsensusITest.MultiThreadedInsertWithFailovers
Jenkins

Reviewers: mbautin

Reviewed By: mbautin

Subscribers: mbautin, ybase, bogdan

Differential Revision: https://phorge.dev.yugabyte.com/D28161
premkumr pushed a commit that referenced this pull request Sep 8, 2023
Summary:
This PR does the following:
1. Adds built-in roles to the DB for the new RBAC with all the permissions defined in `R__Sync_System_Roles.java -> getPredefinedSystemRolesMap()`.
2. Changes resource type `DEFAULT` to `OTHER`.

For #1, the permissions for these built-in roles are defined in `R__Sync_System_Roles::getPredefinedSystemRolesMap`. This is a repeatable migration that runs every time the system-defined role permissions are modified. This is done by customising the migration checksum to use the system-defined role permissions themselves.

So if the built-in roles need any modification in the pg DB, either to the permissions list or to the description, we can just modify the values in that file and the update will take place on YBA restart.
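
A minimal C++ sketch of the checksum idea, for illustration only: YBA implements this in Java, and `RoleDefinition`, `ComputeChecksum`, `ShouldRerunMigration`, and the use of FNV-1a are assumptions, not the actual code. The point is that a repeatable migration re-runs whenever its checksum changes, so deriving the checksum from the role definitions makes any edit to them trigger a re-sync.

```cpp
#include <cstdint>
#include <map>
#include <set>
#include <string>

// Hypothetical model of a predefined role: description plus permission set.
struct RoleDefinition {
  std::string description;
  std::set<std::string> permissions;
};
using RoleMap = std::map<std::string, RoleDefinition>;

// FNV-1a over a canonical serialization of all role definitions.
uint32_t ComputeChecksum(const RoleMap& roles) {
  uint32_t hash = 2166136261u;
  auto mix = [&hash](const std::string& s) {
    for (unsigned char c : s) {
      hash ^= c;
      hash *= 16777619u;
    }
    // Field separator, so that "ab"+"c" and "a"+"bc" serialize differently.
    hash ^= 0xFFu;
    hash *= 16777619u;
  };
  for (const auto& [name, def] : roles) {
    mix(name);
    mix(def.description);
    for (const auto& perm : def.permissions) mix(perm);
  }
  return hash;
}

// A repeatable migration re-runs only when the stored checksum differs.
bool ShouldRerunMigration(uint32_t stored_checksum, const RoleMap& roles) {
  return stored_checksum != ComputeChecksum(roles);
}
```

Because `std::map` and `std::set` iterate in sorted order, the checksum is deterministic across restarts, which keeps the migration from re-running when nothing has changed.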

Test Plan:
Manually tested the migration by running YBA and checking in the DB for 5 new system roles added.
Added UTs to verify the repeatable migration.
Run UTs.
Run itests.

Reviewers: vpatibandla, sneelakantan

Reviewed By: sneelakantan

Subscribers: sanketh, yugaware

Differential Revision: https://phorge.dev.yugabyte.com/D28156
premkumr pushed a commit that referenced this pull request Oct 17, 2023
Summary:
Adds MyDatabaseId to the T-server cache key to address an issue caused by different databases sharing T-server cache entries.

When a Postgres backend starts, it prefetches these 3 shared tables:
```
	YbRegisterTable(prefetcher, YB_PFETCH_TABLE_PG_AUTH_MEMBERS);
	YbRegisterTable(prefetcher, YB_PFETCH_TABLE_PG_DATABASE);
	YbRegisterTable(prefetcher, YB_PFETCH_TABLE_PG_DB_ROLE_SETTINGS);
```
These tables are then cached in the T-server cache, which is keyed by the database OID and the OIDs of these tables. Because these are shared tables, the OID of the template1 database is used. As a result, when another backend process starts up, it will issue the same prefetch request, which will result in a hit in the T-server cache (assuming the catalog version has not changed).

Here is a detailed walkthrough of how the issue manifests, by @tverona (requires D28071 and ysql_enable_read_request_caching=true):

### 1. Start with just `yugabytedb`. Connect to `yugabytedb` for the first time.
   * **a.** Relcacheinit file is built. So we preload a bunch of tables from master.
   * **b.** Create tserver cache entry for those tables (which includes `pg_database`). Key contains `yugabytedb` oid (since that’s part of the request).
   * **c.** Create `db1`.

### 2. Connect to `db1` for the first time.
   * **a.** Same flow as #1 above - we create a new relcacheinit file for `db1`.
   * **b.** We create another tserver cache entry (might be more than one, but just simplifying) with key containing `db1` oid.

### 3. Connect to `db1` for the 2nd time.
   * **a. With D28071:**
     - **i.** Relcache file is not built, since cache is not invalidated.
     - **ii.** We fetch the 3 tables (including `pg_database`) from master and create a new tserver cache entry, with a key that includes the `template0` dbid(?). Values include `db1`.
   * **b. Without D28071:**
     - **i.** We preload a bunch of tables for `db1`. We match on tserver cache entry from 2.b. We do not hit master.
   * **c.** Create `db2`.

### 4. Connect to `db2` for the first time.
   * **a.** Same flow as #1 above - we create a new relcacheinit file for `db2`.
   * **b.** We create another tserver cache entry (might be more than one, but just simplifying) with key containing `db2` oid.

### 5. Connect to `db2` for the 2nd time.
   * **a. With D28071:**
     - **i.** Relcache file is not built, since cache is not invalidated.
     - **ii.** We fetch the 3 tables (including `pg_database`) from master and match on the key in 3.a.ii. We do not hit master. We get back `pg_database` containing entries for `db1` but not `db2`.
     - **iii.** We fail later in `CheckMyDatabase`.
   * **b. Without D28071:**
     - **i.** We preload a bunch of tables for `db2`. We match on tserver cache entry from 4.b. We do not hit master.

By always including MyDatabaseId in the cache key, we avoid serving stale versions of shared relations to different databases.
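
The effect of the fix can be illustrated with a small, hypothetical model of the T-server response cache. `CacheKey`, `ResponseCache`, and the database OIDs 16384/16385 are assumptions, not the actual pggate or T-server types; the table OIDs 1261, 1262, and 2964 are those of `pg_auth_members`, `pg_database`, and `pg_db_role_setting` in stock PostgreSQL, and OID 1 is `template1`.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <tuple>
#include <vector>

using Oid = uint32_t;

// Hypothetical cache key for a prefetch request.
struct CacheKey {
  Oid request_db_oid;           // template1's OID when prefetching shared relations
  Oid my_database_id;           // the connecting database (the field this diff adds)
  std::vector<Oid> table_oids;  // the prefetched tables

  bool operator<(const CacheKey& other) const {
    return std::tie(request_db_oid, my_database_id, table_oids) <
           std::tie(other.request_db_oid, other.my_database_id, other.table_oids);
  }
};

// Hypothetical cache mapping a key to the serialized rows it returned.
class ResponseCache {
 public:
  // Returns the cached value, or calls `fetch()` and caches its result on a miss.
  template <typename Fetch>
  std::string GetOrFetch(const CacheKey& key, Fetch fetch) {
    auto it = entries_.find(key);
    if (it != entries_.end()) return it->second;
    std::string value = fetch();
    entries_.emplace(key, value);
    return value;
  }

 private:
  std::map<CacheKey, std::string> entries_;
};
```

Without `my_database_id` in the key, the db2 lookup would return db1's cached `pg_database` rows, which is the failure in step 5.a.ii above.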

**Upgrade/Rollback safety:**
Only PG to T-Server RPCs are changed.
Jira: DB-8163

Test Plan:
  # Connect to yugabyte
  # Connect to yugabyte
    # Create db1
  # Connect to db1
  # Connect to db1 <-- fails before this change with D28071, ysql_enable_read_request_caching=true

Reviewers: myang, dmitry

Reviewed By: dmitry

Subscribers: ybase, yql, tverona

Differential Revision: https://phorge.dev.yugabyte.com/D28945
premkumr pushed a commit that referenced this pull request Nov 15, 2023
…nnections

Summary:
This diff fixes two issues:
  - **PLAT-11176**: Previously, we were only passing YBA's PEM trust store from the custom CA trust store for `play.ws.ssl` TLS handshakes. Consequently, when we attempted to upload multiple CA certificates to YBA's trust store, it resulted in SSL handshake failures for the previously uploaded certificates. With this update, we have included YBA's Java trust store as well.

  - **PLAT-11170**: There was an issue with deletion of CA cert from YBA's trust store. Specifically, when we had uploaded one certificate chain and another certificate that only contained the root of the previously uploaded certificate chain, the deletion of the latter was failing. This issue has been resolved in this diff.

Test Plan:
**PLAT-11170**
  - Uploaded the root cert to YBA's trust store.
  - Created a certificate chain using the root certificate mentioned above and also uploaded it.
  - Verified that deletion of the cert uploaded in step 1 was successful.

**PLAT-11176**
  - Created an HA setup with two standby portals.
  - Each portal uses its own custom CA certs.
  - Uploaded both the cert chains to YBA's trust store.
  - Verified that the backup is successful on both the standby setups configured.

Reviewers: amalyshev

Reviewed By: amalyshev

Subscribers: yugaware

Differential Revision: https://phorge.dev.yugabyte.com/D29985
premkumr pushed a commit that referenced this pull request Nov 15, 2023
Summary:
One of the first tasks kicked off during an edit universe operation is DiskResizing.
This change makes the createResizeDiskTask function idempotent.
It will only create the disk resize tasks if the size specified is different from the current
volume on the pod.
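
A minimal sketch of the idempotency check, written in C++ for illustration (the real task lives in the Java YBA code). `VolumeInfo`, `ResizeTask`, and `CreateResizeDiskTasks` are hypothetical names, and comparing sizes per PVC is an assumption about the granularity of the check.

```cpp
#include <string>
#include <vector>

// Hypothetical view of one PVC: its name and current size in GiB.
struct VolumeInfo {
  std::string pvc_name;
  int size_gib;
};

// Hypothetical resize subtask description.
struct ResizeTask {
  std::string pvc_name;
  int new_size_gib;
};

// Creates resize subtasks only for volumes whose current size differs from
// the requested size, so re-running after a partial resize skips volumes
// that were already resized.
std::vector<ResizeTask> CreateResizeDiskTasks(const std::vector<VolumeInfo>& volumes,
                                              int requested_size_gib) {
  std::vector<ResizeTask> tasks;
  for (const auto& vol : volumes) {
    if (vol.size_gib != requested_size_gib) {
      tasks.push_back({vol.pvc_name, requested_size_gib});
    }
  }
  return tasks;
}
```

On a retry, volumes that already report the new size produce no subtask, so re-running the edit converges instead of re-issuing completed work.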

Test Plan:
Tested by making the task abortable and retryable, then retrying the edit Kubernetes universe task after aborting it in the middle of the disk resize.

```
YW 2023-11-03T06:04:18.533Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from EditKubernetesUniverse in TaskPool-6 - Creating task for disk size change from 100 to 200
YW 2023-11-03T06:04:18.587Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from ShellProcessHandler in TaskPool-6 - Starting proc (abbrev cmd) - kubectl --namespace yb-admin-test1 get pvc -l app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-a-twed -o json
YW 2023-11-03T06:04:18.587Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from ShellProcessHandler in TaskPool-6 - Starting proc (full cmd) - 'kubectl' '--namespace' 'yb-admin-test1' 'get' 'pvc' '-l' 'app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-a-twed' '-o' 'json' - logging stdout=/tmp/shell_process_out5549963527175565003tmp, stderr=/tmp/shell_process_err5819548201875528501tmp
YW 2023-11-03T06:04:19.095Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from ShellProcessHandler in TaskPool-6 - Completed proc 'kubectl --namespace yb-admin-test1 get pvc -l app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-a-twed -o json' status=success [ 508 ms ]
YW 2023-11-03T06:04:19.104Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from PlacementInfoUtil in TaskPool-6 - Incrementing RF for us-west1-a to: 1
YW 2023-11-03T06:04:19.105Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from PlacementInfoUtil in TaskPool-6 - Number of nodes in us-west1-a: 1
YW 2023-11-03T06:04:19.105Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from TaskExecutor in TaskPool-6 - Adding task #0: KubernetesCheckVolumeExpansion
YW 2023-11-03T06:04:19.105Z [DEBUG] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from TaskExecutor in TaskPool-6 - Details for task #0: KubernetesCheckVolumeExpansion details= {"platformVersion":"2.21.0.0-PRE_RELEASE","config":{"KUBECONFIG_PULL_SECRET":"/opt/yugaware/keys/7ae205f4-95ee-4aa5-b2f5-edb9ce793554/anijhawan_quay_pull_secret","KUBECONFIG":"/opt/yugaware/keys/7ae205f4-95ee-4aa5-b2f5-edb9ce793554/kubeconfig-202301","STORAGE_CLASS":"yb-standard","KUBECONFIG_PROVIDER":"gke","KUBECONFIG_IMAGE_PULL_SECRET_NAME":"anijhawan-pull-secret","KUBECONFIG_IMAGE_REGISTRY":"quay.io/yugabyte/yugabyte-itest"},"newNamingStyle":true,"namespace":"yb-admin-test1","providerUUID":"7ae205f4-95ee-4aa5-b2f5-edb9ce793554","helmReleaseName":"ybtest1-us-west1-a-twed"}
YW 2023-11-03T06:04:19.105Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from TaskExecutor in TaskPool-6 - Adding SubTaskGroup #0: KubernetesVolumeInfo
YW 2023-11-03T06:04:19.108Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from AbstractTaskBase in TaskPool-6 - Executor name: task
YW 2023-11-03T06:04:19.108Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from TaskExecutor in TaskPool-6 - Adding task #0: KubernetesCommandExecutor(347eb7be-88b5-44ed-b519-1052487e5ced)
YW 2023-11-03T06:04:19.110Z [DEBUG] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from TaskExecutor in TaskPool-6 - Details for task #0: KubernetesCommandExecutor(347eb7be-88b5-44ed-b519-1052487e5ced) details= {"platformVersion":"2.21.0.0-PRE_RELEASE","sleepAfterMasterRestartMillis":180000,"sleepAfterTServerRestartMillis":180000,"nodeExporterUser":"prometheus","universeUUID":"347eb7be-88b5-44ed-b519-1052487e5ced","enableYbc":false,"installYbc":false,"ybcInstalled":false,"encryptionAtRestConfig":{"encryptionAtRestEnabled":false,"opType":"UNDEFINED","type":"DATA_KEY"},"communicationPorts":{"masterHttpPort":7000,"masterRpcPort":7100,"tserverHttpPort":9000,"tserverRpcPort":9100,"ybControllerHttpPort":14000,"ybControllerrRpcPort":18018,"redisServerHttpPort":11000,"redisServerRpcPort":6379,"yqlServerHttpPort":12000,"yqlServerRpcPort":9042,"ysqlServerHttpPort":13000,"ysqlServerRpcPort":5433,"nodeExporterPort":9300},"extraDependencies":{"installNodeExporter":true},"providerUUID":"7ae205f4-95ee-4aa5-b2f5-edb9ce793554","universeName":"test1","commandType":"STS_DELETE","helmReleaseName":"ybtest1-us-west1-a-twed","namespace":"yb-admin-test1","isReadOnlyCluster":false,"ybSoftwareVersion":"2.19.3.0-b80","enableNodeToNodeEncrypt":true,"enableClientToNodeEncrypt":true,"serverType":"TSERVER","tserverPartition":0,"masterPartition":0,"newDiskSize":"200Gi","masterAddresses":"ybtest1-us-west1-a-twed-yb-master-0.ybtest1-us-west1-a-twed-yb-masters.yb-admin-test1.svc.cluster.local:7100,ybtest1-us-west1-b-uwed-yb-master-0.ybtest1-us-west1-b-uwed-yb-masters.yb-admin-test1.svc.cluster.local:7100,ybtest1-us-west1-c-vwed-yb-master-0.ybtest1-us-west1-c-vwed-yb-masters.yb-admin-test1.svc.cluster.local:7100","placementInfo":{"cloudList":[{"uuid":"7ae205f4-95ee-4aa5-b2f5-edb9ce793554","code":"kubernetes","regionList":[{"uuid":"80f07c68-f739-45b5-a91a-e8f8f4b0fc6d","code":"us-west1","name":"Oregon","azList":[{"uuid":"42b3fd5a-2c30-48c5-9335-d71dc60a773f","name":"us-west1-a","replicationFactor":1,"numNodesInAZ":1,"isAffinitized":true}]}]}]},"updateStrategy":"RollingUpdate","config":{"KUBECONFIG_PULL_SECRET":"/opt/yugaware/keys/7ae205f4-95ee-4aa5-b2f5-edb9ce793554/anijhawan_quay_pull_secret","KUBECONFIG":"/opt/yugaware/keys/7ae205f4-95ee-4aa5-b2f5-edb9ce793554/kubeconfig-202301","STORAGE_CLASS":"yb-standard","KUBECONFIG_PROVIDER":"gke","KUBECONFIG_IMAGE_PULL_SECRET_NAME":"anijhawan-pull-secret","KUBECONFIG_IMAGE_REGISTRY":"quay.io/yugabyte/yugabyte-itest"},"azCode":"us-west1-a","targetXClusterConfigs":[],"sourceXClusterConfigs":[]}
YW 2023-11-03T06:04:19.111Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from TaskExecutor in TaskPool-6 - Adding SubTaskGroup #1: ResizingDisk
YW 2023-11-03T06:04:19.113Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from TaskExecutor in TaskPool-6 - Setting subtask(ResizingDisk) group type to Provisioning
YW 2023-11-03T06:04:19.115Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from TaskExecutor in TaskPool-6 - Adding task #0: KubernetesCommandExecutor(347eb7be-88b5-44ed-b519-1052487e5ced)
YW 2023-11-03T06:04:19.117Z [DEBUG] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from TaskExecutor in TaskPool-6 - Details for task #0: KubernetesCommandExecutor(347eb7be-88b5-44ed-b519-1052487e5ced) details= {"platformVersion":"2.21.0.0-PRE_RELEASE","sleepAfterMasterRestartMillis":180000,"sleepAfterTServerRestartMillis":180000,"nodeExporterUser":"prometheus","universeUUID":"347eb7be-88b5-44ed-b519-1052487e5ced","enableYbc":false,"installYbc":false,"ybcInstalled":false,"encryptionAtRestConfig":{"encryptionAtRestEnabled":false,"opType":"UNDEFINED","type":"DATA_KEY"},"communicationPorts":{"masterHttpPort":7000,"masterRpcPort":7100,"tserverHttpPort":9000,"tserverRpcPort":9100,"ybControllerHttpPort":14000,"ybControllerrRpcPort":18018,"redisServerHttpPort":11000,"redisServerRpcPort":6379,"yqlServerHttpPort":12000,"yqlServerRpcPort":9042,"ysqlServerHttpPort":13000,"ysqlServerRpcPort":5433,"nodeExporterPort":9300},"extraDependencies":{"installNodeExporter":true},"providerUUID":"7ae205f4-95ee-4aa5-b2f5-edb9ce793554","universeName":"test1","commandType":"PVC_EXPAND_SIZE","helmReleaseName":"ybtest1-us-west1-a-twed","namespace":"yb-admin-test1","isReadOnlyCluster":false,"ybSoftwareVersion":"2.19.3.0-b80","enableNodeToNodeEncrypt":true,"enableClientToNodeEncrypt":true,"serverType":"TSERVER","tserverPartition":0,"masterPartition":0,"newDiskSize":"200Gi","masterAddresses":"ybtest1-us-west1-a-twed-yb-master-0.ybtest1-us-west1-a-twed-yb-masters.yb-admin-test1.svc.cluster.local:7100,ybtest1-us-west1-b-uwed-yb-master-0.ybtest1-us-west1-b-uwed-yb-masters.yb-admin-test1.svc.cluster.local:7100,ybtest1-us-west1-c-vwed-yb-master-0.ybtest1-us-west1-c-vwed-yb-masters.yb-admin-test1.svc.cluster.local:7100","placementInfo":{"cloudList":[{"uuid":"7ae205f4-95ee-4aa5-b2f5-edb9ce793554","code":"kubernetes","regionList":[{"uuid":"80f07c68-f739-45b5-a91a-e8f8f4b0fc6d","code":"us-west1","name":"Oregon","azList":[{"uuid":"42b3fd5a-2c30-48c5-9335-d71dc60a773f","name":"us-west1-a","replicationFactor":1,"numNodesInAZ":1,"isAffinitized":true}]}]}]},"updateStrategy":"RollingUpdate","config":{"KUBECONFIG_PULL_SECRET":"/opt/yugaware/keys/7ae205f4-95ee-4aa5-b2f5-edb9ce793554/anijhawan_quay_pull_secret","KUBECONFIG":"/opt/yugaware/keys/7ae205f4-95ee-4aa5-b2f5-edb9ce793554/kubeconfig-202301","STORAGE_CLASS":"yb-standard","KUBECONFIG_PROVIDER":"gke","KUBECONFIG_IMAGE_PULL_SECRET_NAME":"anijhawan-pull-secret","KUBECONFIG_IMAGE_REGISTRY":"quay.io/yugabyte/yugabyte-itest"},"azCode":"us-west1-a","targetXClusterConfigs":[],"sourceXClusterConfigs":[]}
YW 2023-11-03T06:04:19.119Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from TaskExecutor in TaskPool-6 - Adding SubTaskGroup #2: ResizingDisk
YW 2023-11-03T06:04:19.120Z [INFO] 9ec7f5dd-bdcd-4917-868e-2d7bf85e4f9e from TaskExecutor in TaskPool-6 - Setting subtask(ResizingDisk) group type to Provisioning
...
```
Verified that the disk size was increased:

```
[centos@dev-server-anijhawan-4 managed]$ kubectl -n yb-admin-test1  get pvc ybtest1-us-west1-b-uwed-datadir0-ybtest1-us-west1-b-uwed-yb-tserver-0  ybtest1-us-west1-a-twed-datadir0-ybtest1-us-west1-a-twed-yb-tserver-0  ybtest1-us-west1-c-vwed-datadir0-ybtest1-us-west1-c-vwed-yb-tserver-0  -o yaml | grep storage
      volume.beta.kubernetes.io/storage-provisioner: pd.csi.storage.gke.io
      volume.kubernetes.io/storage-provisioner: pd.csi.storage.gke.io
      volume.kubernetes.io/storage-resizer: pd.csi.storage.gke.io
        storage: 200Gi
    storageClassName: yb-standard
      storage: 200Gi
      volume.beta.kubernetes.io/storage-provisioner: pd.csi.storage.gke.io
      volume.kubernetes.io/storage-provisioner: pd.csi.storage.gke.io
      volume.kubernetes.io/storage-resizer: pd.csi.storage.gke.io
        storage: 200Gi
    storageClassName: yb-standard
      storage: 200Gi
      volume.beta.kubernetes.io/storage-provisioner: pd.csi.storage.gke.io
      volume.kubernetes.io/storage-provisioner: pd.csi.storage.gke.io
      volume.kubernetes.io/storage-resizer: pd.csi.storage.gke.io
        storage: 200Gi
    storageClassName: yb-standard
      storage: 200Gi

```

In the retry logs, we can see that the function was invoked but task creation was skipped.

```
YW 2023-11-03T06:07:10.173Z [DEBUG] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from TaskExecutor in TaskPool-7 - Invoking run() of task EditKubernetesUniverse(347eb7be-88b5-44ed-b519-1052487e5ced)
YW 2023-11-03T06:07:10.173Z [INFO] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from CustomerTaskController in application-akka.actor.default-dispatcher-2292 - Saved task uuid 66611664-a25f-4ad2-93aa-e40a7db67654 in customer tasks table for target 347eb7be-88b5-44ed-b519-1052487e5ced:test1
YW 2023-11-03T06:07:10.322Z [DEBUG] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from TransactionUtil in TaskPool-7 - Trying(1)...
YW 2023-11-03T06:07:10.333Z [DEBUG] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from UniverseTaskBase in TaskPool-7 - Cancelling any active health-checks for universe 347eb7be-88b5-44ed-b519-1052487e5ced
YW 2023-11-03T06:07:10.379Z [INFO] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from EditKubernetesUniverse in TaskPool-7 - Creating task for disk size change from 100 to 200
YW 2023-11-03T06:07:10.436Z [INFO] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from ShellProcessHandler in TaskPool-7 - Starting proc (abbrev cmd) - kubectl --namespace yb-admin-test1 get pvc -l app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-a-twed -o json
YW 2023-11-03T06:07:10.436Z [INFO] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from ShellProcessHandler in TaskPool-7 - Starting proc (full cmd) - 'kubectl' '--namespace' 'yb-admin-test1' 'get' 'pvc' '-l' 'app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-a-twed' '-o' 'json' - logging stdout=/tmp/shell_process_out15761747450556728945tmp, stderr=/tmp/shell_process_err16162390392062292532tmp
YW 2023-11-03T06:07:10.941Z [INFO] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from ShellProcessHandler in TaskPool-7 - Completed proc 'kubectl --namespace yb-admin-test1 get pvc -l app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-a-twed -o json' status=success [ 505 ms ]
YW 2023-11-03T06:07:10.982Z [INFO] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from ShellProcessHandler in TaskPool-7 - Starting proc (abbrev cmd) - kubectl --namespace yb-admin-test1 get pvc -l app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-b-uwed -o json
YW 2023-11-03T06:07:10.982Z [INFO] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from ShellProcessHandler in TaskPool-7 - Starting proc (full cmd) - 'kubectl' '--namespace' 'yb-admin-test1' 'get' 'pvc' '-l' 'app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-b-uwed' '-o' 'json' - logging stdout=/tmp/shell_process_out16328458040940971014tmp, stderr=/tmp/shell_process_err9595293916813332432tmp
YW 2023-11-03T06:07:11.487Z [INFO] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from ShellProcessHandler in TaskPool-7 - Completed proc 'kubectl --namespace yb-admin-test1 get pvc -l app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-b-uwed -o json' status=success [ 505 ms ]
YW 2023-11-03T06:07:11.526Z [INFO] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from ShellProcessHandler in TaskPool-7 - Starting proc (abbrev cmd) - kubectl --namespace yb-admin-test1 get pvc -l app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-c-vwed -o json
YW 2023-11-03T06:07:11.527Z [INFO] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from ShellProcessHandler in TaskPool-7 - Starting proc (full cmd) - 'kubectl' '--namespace' 'yb-admin-test1' 'get' 'pvc' '-l' 'app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-c-vwed' '-o' 'json' - logging stdout=/tmp/shell_process_out11035907328384396246tmp, stderr=/tmp/shell_process_err3826067280996541352tmp
YW 2023-11-03T06:07:12.031Z [INFO] ab2e48ec-a204-4449-af99-dd1db5cb15d8 from ShellProcessHandler in TaskPool-7 - Completed proc 'kubectl --namespace yb-admin-test1 get pvc -l app.kubernetes.io/name=yb-tserver,release=ybtest1-us-west1-c-vwed -o json' status=success [ 505 ms ]
```

Reviewers: sanketh, nsingh, sneelakantan, dshubin

Reviewed By: sanketh, nsingh, dshubin

Subscribers: yugaware

Differential Revision: https://phorge.dev.yugabyte.com/D29938
premkumr pushed a commit that referenced this pull request Jan 12, 2024
…wuid function

Summary:
There are several unit tests that suffer from a TSAN data race warning with the following stack:

```
WARNING: ThreadSanitizer: data race (pid=38656)
  Read of size 8 at 0x7f6f2a44b038 by thread T21:
    #0 memcpy /opt/yb-build/llvm/yb-llvm-v17.0.2-yb-1-1696896765-6a83e4b2-almalinux8-x86_64-build/src/llvm-project/compiler-rt/lib/tsan/rtl/../../sanitizer_common/sanitizer_common_interceptors_memintrinsics.inc:115:5 (pg_ddl_concurrency-test+0x9e197)
    #1 <null> <null> (libnss_sss.so.2+0x72ef) (BuildId: a17afeaa37369696ec2457ab7a311139707fca9b)
    #2 pqGetpwuid ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/thread.c:99:9 (libpq.so.5+0x4a8c9)
    #3 pqGetHomeDirectory ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/../../../../../../src/postgres/src/interfaces/libpq/fe-connect.c:6674:9 (libpq.so.5+0x2d3c7)
    #4 connectOptions2 ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/../../../../../../src/postgres/src/interfaces/libpq/fe-connect.c:1150:8 (libpq.so.5+0x2d3c7)
    #5 PQconnectStart ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/../../../../../../src/postgres/src/interfaces/libpq/fe-connect.c:791:7 (libpq.so.5+0x2c2fe)
    #6 PQconnectdb ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/../../../../../../src/postgres/src/interfaces/libpq/fe-connect.c:647:20 (libpq.so.5+0x2c279)
    #7 yb::pgwrapper::PGConn::Connect(string const&, std::chrono::time_point<yb::CoarseMonoClock, std::chrono::duration<long long, std::ratio<1l, 1000000000l>>>, bool, string const&) ${BUILD_ROOT}/../../src/yb/yql/pgwrapper/libpq_utils.cc:278:24 (libpq_utils.so+0x11d6b)
...

  Previous write of size 8 at 0x7f6f2a44b038 by thread T20 (mutexes: write M0):
    #0 mmap64 /opt/yb-build/llvm/yb-llvm-v17.0.2-yb-1-1696896765-6a83e4b2-almalinux8-x86_64-build/src/llvm-project/compiler-rt/lib/tsan/rtl/../../sanitizer_common/sanitizer_common_interceptors.inc:7485:3 (pg_ddl_concurrency-test+0xda204)
    #1 <null> <null> (libnss_sss.so.2+0x7169) (BuildId: a17afeaa37369696ec2457ab7a311139707fca9b)
    #2 pqGetpwuid ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/thread.c:99:9 (libpq.so.5+0x4a8c9)
    #3 pqGetHomeDirectory ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/../../../../../../src/postgres/src/interfaces/libpq/fe-connect.c:6674:9 (libpq.so.5+0x2d3c7)
    #4 connectOptions2 ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/../../../../../../src/postgres/src/interfaces/libpq/fe-connect.c:1150:8 (libpq.so.5+0x2d3c7)
    #5 PQconnectStart ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/../../../../../../src/postgres/src/interfaces/libpq/fe-connect.c:791:7 (libpq.so.5+0x2c2fe)
    #6 PQconnectdb ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/../../../../../../src/postgres/src/interfaces/libpq/fe-connect.c:647:20 (libpq.so.5+0x2c279)
    #7 yb::pgwrapper::PGConn::Connect(string const&, std::chrono::time_point<yb::CoarseMonoClock, std::chrono::duration<long long, std::ratio<1l, 1000000000l>>>, bool, string const&) ${BUILD_ROOT}/../../src/yb/yql/pgwrapper/libpq_utils.cc:278:24 (libpq_utils.so+0x11d6b)
...

  Location is global '??' at 0x7f6f2a44b000 (passwd+0x38)

  Mutex M0 (0x7f6f2af29380) created at:
    #0 pthread_mutex_lock /opt/yb-build/llvm/yb-llvm-v17.0.2-yb-1-1696896765-6a83e4b2-almalinux8-x86_64-build/src/llvm-project/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp:1339:3 (pg_ddl_concurrency-test+0xa464b)
    #1 <null> <null> (libnss_sss.so.2+0x70d6) (BuildId: a17afeaa37369696ec2457ab7a311139707fca9b)
    #2 pqGetpwuid ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/thread.c:99:9 (libpq.so.5+0x4a8c9)
    #3 pqGetHomeDirectory ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/../../../../../../src/postgres/src/interfaces/libpq/fe-connect.c:6674:9 (libpq.so.5+0x2d3c7)
    #4 connectOptions2 ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/../../../../../../src/postgres/src/interfaces/libpq/fe-connect.c:1150:8 (libpq.so.5+0x2d3c7)
    #5 PQconnectStart ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/../../../../../../src/postgres/src/interfaces/libpq/fe-connect.c:791:7 (libpq.so.5+0x2c2fe)
...
```

All failing tests have a common feature: all of them create connections to Postgres from multiple threads at the same time.
When creating a new connection, the `libpq` library internally calls the standard `getpwuid_r` function. This function is thread-safe, so a TSAN warning is not expected there.

The solution is to suppress the warning in the `getpwuid_r` function.
**Note:** because the `getpwuid_r` function name does not appear in the TSAN warning stack, the warning is suppressed for its caller, `pqGetpwuid`, instead.
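
One way to express such a suppression, sketched under assumptions: the TSAN runtime (compiler-rt) calls `__tsan_default_suppressions()` at startup if the binary defines it, and parses the returned string with suppressions-file syntax; an equivalent line could instead go in a file passed via `TSAN_OPTIONS=suppressions=<path>`. This message does not say which mechanism the diff uses, and `kTsanSuppressions` is a hypothetical name.

```cpp
// Suppression rules in TSAN suppressions-file syntax: one "<type>:<pattern>"
// per line. pqGetpwuid is the nearest symbolized frame above the
// unsymbolized libnss_sss.so frames that perform the reported accesses;
// TSAN matches the pattern against every frame in the report's stacks.
constexpr char kTsanSuppressions[] = "race:pqGetpwuid\n";

// If defined, the TSAN runtime calls this hook at startup and applies the
// returned rules in addition to any file given via TSAN_OPTIONS.
extern "C" const char* __tsan_default_suppressions() {
  return kTsanSuppressions;
}
```
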
Jira: DB-9523

Test Plan: Jenkins

Reviewers: sergei, bogdan

Reviewed By: sergei

Subscribers: yql

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D31646
premkumr pushed a commit that referenced this pull request Mar 14, 2024
Summary:
Restore YBC flow currently has preflight checks for:
1. DB version comparison
2. Autoflags check

This diff modifies #1 to check whether one version number is greater than the other, comparing stable to stable and preview to preview; any other combination results in an error.
Autoflags check remains the same.
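
A hedged sketch of such a version check. It assumes that in the 2.x series an even minor version (2.18, 2.20) is a stable release and an odd one (2.19, 2.21) is a preview, and that a restore is rejected when the backup comes from a newer version than the target universe. `ParseVersion`, `CheckRestoreVersion`, and `CheckResult` are hypothetical names, not the YBA API.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Parses "2.20.1.0-b97" into {2, 20, 1, 0}; the build suffix is ignored.
std::vector<int> ParseVersion(const std::string& version) {
  std::vector<int> parts;
  std::stringstream ss(version.substr(0, version.find('-')));
  std::string item;
  while (std::getline(ss, item, '.')) parts.push_back(std::stoi(item));
  return parts;
}

// Assumption: for the 2.x series, an even minor number marks a stable release.
bool IsStable(const std::vector<int>& v) { return v.size() > 1 && v[1] % 2 == 0; }

enum class CheckResult { kOk, kTrackMismatch, kBackupNewer };

CheckResult CheckRestoreVersion(const std::string& backup_version,
                                const std::string& target_version) {
  const auto backup = ParseVersion(backup_version);
  const auto target = ParseVersion(target_version);
  // Only stable-to-stable or preview-to-preview comparisons are allowed.
  if (IsStable(backup) != IsStable(target)) return CheckResult::kTrackMismatch;
  // std::vector compares lexicographically, i.e. component by component.
  if (backup > target) return CheckResult::kBackupNewer;
  return CheckResult::kOk;
}
```
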

Test Plan:
Manually test all existing flows work as usual.
Run UTs.
Run itests.

Reviewers: sanketh, vbansal

Reviewed By: vbansal

Subscribers: yugaware

Differential Revision: https://phorge.dev.yugabyte.com/D32944
premkumr pushed a commit that referenced this pull request Apr 19, 2024
… SST files only retained for CDC"

Summary:
D33131 introduced a segmentation fault that was identified in multiple tests.
```
* thread #1, name = 'yb-tserver', stop reason = signal SIGSEGV
  * frame #0: 0x00007f4d2b6f3a84 libpthread.so.0`__pthread_mutex_lock + 4
    frame #1: 0x000055d6d1e1190b yb-tserver`yb::tablet::MvccManager::SafeTimeForFollower(yb::HybridTime, std::__1::chrono::time_point<yb::CoarseMonoClock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l>>>) const [inlined] std::__1::unique_lock<std::__1::mutex>::unique_lock[abi:v170002](this=0x00007f4ccb6feaa0, __m=0x0000000000000110) at unique_lock.h:41:11
    frame #2: 0x000055d6d1e118f5 yb-tserver`yb::tablet::MvccManager::SafeTimeForFollower(this=0x00000000000000f0, min_allowed=<unavailable>, deadline=yb::CoarseTimePoint @ 0x00007f4ccb6feb08) const at mvcc.cc:500:32
    frame #3: 0x000055d6d1ef58e3 yb-tserver`yb::tablet::TransactionParticipant::Impl::ProcessRemoveQueueUnlocked(this=0x000037e27d26fb00, min_running_notifier=0x00007f4ccb6fef28) at transaction_participant.cc:1537:45
    frame #4: 0x000055d6d1efc11a yb-tserver`yb::tablet::TransactionParticipant::Impl::EnqueueRemoveUnlocked(this=0x000037e27d26fb00, id=<unavailable>, reason=<unavailable>, min_running_notifier=0x00007f4ccb6fef28, expected_deadlock_status=<unavailable>) at transaction_participant.cc:1516:5
    frame #5: 0x000055d6d1e3afbe yb-tserver`yb::tablet::RunningTransaction::DoStatusReceived(this=0x000037e2679b5218, status_tablet="d5922c26c9704f298d6812aff8f615f6", status=<unavailable>, response=<unavailable>, serial_no=56986, shared_self=std::__1::shared_ptr<yb::tablet::RunningTransaction>::element_type @ 0x000037e2679b5218) at running_transaction.cc:424:16
    frame #6: 0x000055d6d0d7db5f yb-tserver`yb::client::(anonymous namespace)::TransactionRpcBase::Finished(this=0x000037e29c80b420, status=<unavailable>) at transaction_rpc.cc:67:7
```
This diff reverts the change to unblock the tests.

The proper fix for this problem is WIP
Jira: DB-10780, DB-10466

Test Plan: Jenkins: urgent

Reviewers: rthallam

Reviewed By: rthallam

Subscribers: ybase, yql

Differential Revision: https://phorge.dev.yugabyte.com/D34245
ddhodge pushed a commit that referenced this pull request May 19, 2024
Summary:
The YSQL webserver has occasionally produced coredumps of the following form upon receiving a termination signal from postmaster.
```
                #0  0x00007fbac35a9ae3 _ZNKSt3__112__hash_tableINS_17__hash_value_typeINS_12basic_string <snip>
                #1  0x00007fbac005485d _ZNKSt3__113unordered_mapINS_12basic_string <snip> (libyb_util.so)
                #2  0x00007fbac0053180 _ZN2yb16PrometheusWriter16WriteSingleEntryERKNSt3__113unordered_mapINS1_12basic_string <snip>
                #3  0x00007fbab21ff1eb _ZN2yb6pggateL26PgPrometheusMetricsHandlerERKNS_19WebCallbackRegistry10WebRequestEPNS1_11WebResponseE (libyb_pggate_webserver.so)
                ....
                ....
```

The coredump indicates corruption of a namespace-scoped variable of type unordered_map while attempting to serve a request after a termination signal has been received.
The current code causes the webserver (postgres background worker) to call postgres' `proc_exit()` which consequently calls `exit()`.

According to the [[ https://en.cppreference.com/w/cpp/utility/program/exit | C++ standard ]], a limited amount of cleanup is performed on exit():
 - Notably, destructors of variables with automatic storage duration are not invoked. This implies that the webserver's destructor is not called, and therefore the server is not stopped.
 - Namespace-scoped variables have [[ https://en.cppreference.com/w/cpp/language/storage_duration | static storage duration ]].
 - Objects with static storage duration are destroyed.
 - This leads to possible undefined behavior: the webserver may continue running for a short time after the static variables used to serve requests have already been destroyed.

This revision explicitly stops the webserver upon receiving a termination signal, by calling its destructor.
It also adds logic to the handlers to return a `503 SERVICE_UNAVAILABLE` once termination has been initiated.
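
A minimal sketch of the shutdown pattern described above, assuming a global flag and a server owned through `std::unique_ptr`. `WebServer`, `HandleRequest`, and `OnTerminate` are hypothetical names, not the pggate webserver API. The key points are that the server is destroyed explicitly, so its destructor runs before `exit()`, and that handlers check the flag before touching shared state.

```cpp
#include <atomic>
#include <memory>

// Hypothetical stand-in for the YSQL webserver. A real server would stop and
// join its request threads in the destructor; here it only flips a flag so
// the effect is observable.
class WebServer {
 public:
  explicit WebServer(std::atomic<bool>* running) : running_(running) { *running_ = true; }
  ~WebServer() { *running_ = false; }

 private:
  std::atomic<bool>* running_;
};

std::atomic<bool> termination_requested{false};

// Handlers refuse work once shutdown has begun.
int HandleRequest() {
  if (termination_requested.load()) return 503;  // SERVICE_UNAVAILABLE
  return 200;
}

// Stops the server before proc_exit()/exit() runs, instead of relying on
// exit(), which does not destroy objects with automatic storage duration.
void OnTerminate(std::unique_ptr<WebServer>& server) {
  termination_requested.store(true);
  server.reset();  // runs ~WebServer() now
}
```

In this sketch, `OnTerminate` is meant to run from the worker's normal control flow after it notices the termination signal, not inside the signal handler itself, since destroying a server is not async-signal-safe.
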
Jira: DB-7796

Test Plan:
To test this manually, use an HTTP load generation tool like locust to bombard the YSQL Webserver with requests to an endpoint like `<address>:13000/prometheus-metrics`.
On a standard devserver, I configured locust to use 30 simultaneous users (30 requests per second) to reproduce the issue.

The following bash script can be used to detect the coredumps:
```
#!/bin/bash
ITERATIONS=50
YBDB_PATH=/path/to/code/yugabyte-db

# Count the number of dump files to avoid having to use `sudo coredumpctl`
idumps=$(ls /var/lib/systemd/coredump/ | wc -l)
for ((i = 0 ; i < $ITERATIONS ; i++ ))
do
        echo "Iteration: $(($i + 1))";
        $YBDB_PATH/bin/yb-ctl restart > /dev/null

        nservers=$(netstat -nlpt 2> /dev/null | grep 13000 | wc -l)
        if (( nservers != 1)); then
                echo "Web server has not come up. Exiting"
                exit 1;
        fi

        sleep 5s

        # Kill the webserver
        pkill -TERM -f 'YSQL webserver'

        # Count the number of coredumps
        # Please validate that the coredump produced is that of postgres/webserver
        ndumps=$(ls /var/lib/systemd/coredump/ | wc -l)
        if (( ndumps > idumps  )); then
                echo "Core dumps: $(($ndumps - $idumps))"
        else
                echo "No new core dumps found"
        fi
done
```

Run the script with the load generation tool running against the webserver in the background.
 - Without the fix in this revision, the above script produced 8 postgres/webserver core dumps in 50 iterations.
 - With the fix, no coredumps were observed.

Reviewers: telgersma, fizaa

Reviewed By: telgersma

Subscribers: ybase, smishra, yql

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D35116
premkumr pushed a commit that referenced this pull request May 28, 2024
… SST files only retained for CDC"

Summary:
D33131 introduced a segmentation fault that was identified in multiple tests.
```
* thread #1, name = 'yb-tserver', stop reason = signal SIGSEGV
  * frame #0: 0x00007f4d2b6f3a84 libpthread.so.0`__pthread_mutex_lock + 4
    frame #1: 0x000055d6d1e1190b yb-tserver`yb::tablet::MvccManager::SafeTimeForFollower(yb::HybridTime, std::__1::chrono::time_point<yb::CoarseMonoClock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l>>>) const [inlined] std::__1::unique_lock<std::__1::mutex>::unique_lock[abi:v170002](this=0x00007f4ccb6feaa0, __m=0x0000000000000110) at unique_lock.h:41:11
    frame #2: 0x000055d6d1e118f5 yb-tserver`yb::tablet::MvccManager::SafeTimeForFollower(this=0x00000000000000f0, min_allowed=<unavailable>, deadline=yb::CoarseTimePoint @ 0x00007f4ccb6feb08) const at mvcc.cc:500:32
    frame #3: 0x000055d6d1ef58e3 yb-tserver`yb::tablet::TransactionParticipant::Impl::ProcessRemoveQueueUnlocked(this=0x000037e27d26fb00, min_running_notifier=0x00007f4ccb6fef28) at transaction_participant.cc:1537:45
    frame #4: 0x000055d6d1efc11a yb-tserver`yb::tablet::TransactionParticipant::Impl::EnqueueRemoveUnlocked(this=0x000037e27d26fb00, id=<unavailable>, reason=<unavailable>, min_running_notifier=0x00007f4ccb6fef28, expected_deadlock_status=<unavailable>) at transaction_participant.cc:1516:5
    frame #5: 0x000055d6d1e3afbe yb-tserver`yb::tablet::RunningTransaction::DoStatusReceived(this=0x000037e2679b5218, status_tablet="d5922c26c9704f298d6812aff8f615f6", status=<unavailable>, response=<unavailable>, serial_no=56986, shared_self=std::__1::shared_ptr<yb::tablet::RunningTransaction>::element_type @ 0x000037e2679b5218) at running_transaction.cc:424:16
    frame #6: 0x000055d6d0d7db5f yb-tserver`yb::client::(anonymous namespace)::TransactionRpcBase::Finished(this=0x000037e29c80b420, status=<unavailable>) at transaction_rpc.cc:67:7
```
This diff reverts the change to unblock the tests.

The proper fix for this problem is a work in progress.
Jira: DB-10780, DB-10466

Test Plan: Jenkins: urgent

Reviewers: rthallam

Reviewed By: rthallam

Subscribers: ybase, yql

Differential Revision: https://phorge.dev.yugabyte.com/D34245
premkumr pushed a commit that referenced this pull request Aug 9, 2024
…ugabyte#23065)

* initial commit for logical replication docs

* title changes

* changes to view table

* fixed line break

* fixed line break

* added content for delete and update

* added more content

* replaced hyperlink todos with reminders

* added snapshot metrics

* added more content

* added more config properties to docs

* added more config properties to docs

* added more config properties to docs

* replaced postgresql instances with yugabytedb

* added properties

* added complete properties

* changed postgresql to yugabytedb

* added example for all record types

* fixed highlighting of table header

* added type representations

* added type representations

* full content in now;

* full content in now;

* changed postgres references appropriately

* added a missing keyword

* changed name

* self review comments

* self review comments

* added section for logical replication

* added section for logical replication

* modified content for monitor page

* added content for monitoring

* rebased to master;

* CDC logical replication overview (yugabyte#3)


Co-authored-by: Vaibhav Kushwaha <[email protected]>

* advanced-topic (yugabyte#5)


Co-authored-by: Vaibhav Kushwaha <[email protected]>

* removed references to incremental and ad-hoc snapshots

* replaced index page with an empty one

* addressed review comments

* added getting started section

* added section for get started

* self review comments

* self review comments

* group review comments

* added hstore and domain type docs

* Advance configurations for CDC using logical replication (#2)

* Fix overview section (yugabyte#7)

* Monitor section (yugabyte#4)


Co-authored-by: Vaibhav Kushwaha <[email protected]>

* Initial Snapshot content (yugabyte#6)

* Add getting started (#1)

* Fix for broken note (yugabyte#9)

* Fix the issue yaml parsing

Summary:
Fixes the YAML parsing issue. We changed the formatting for YAML lists; this diff updates their
usage to match.

Test Plan:
Prepared alma9 node using ynp.
Verified universe creation.

Reviewers: vbansal, asharma

Reviewed By: asharma

Subscribers: yugaware

Differential Revision: https://phorge.dev.yugabyte.com/D36711

* [PLAT-14534]Add regex match for GCP Instance template

Summary:
Added a regex match for the GCP instance template.
The regex is taken from the GCP documentation [[https://cloud.google.com/compute/docs/reference/rest/v1/instanceTemplates | here]].
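For illustration, a name check of this kind can be sketched with `std::regex`, assuming the RFC 1035-style pattern that Google documents for Compute Engine resource names: 1 to 63 characters, starting with a lowercase letter, containing only lowercase letters, digits, and hyphens, and not ending with a hyphen. `IsValidInstanceTemplateName` is a hypothetical helper; the exact regex YBA uses and where it applies it are not shown in this commit.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper. Assumed pattern: a lowercase letter, then up to 61 of
// [-a-z0-9], ending in a letter or digit, for 1-63 characters in total.
bool IsValidInstanceTemplateName(const std::string& name) {
  static const std::regex kPattern("[a-z](?:[-a-z0-9]{0,61}[a-z0-9])?");
  return std::regex_match(name, kPattern);  // regex_match requires a whole-string match
}
```

Because `std::regex_match` must consume the entire string, the pattern needs no explicit `^`/`$` anchors.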

Test Plan: Tested manually that validation fails with invalid characters.

Reviewers: #yba-api-review!, svarshney

Reviewed By: svarshney

Subscribers: yugaware

Differential Revision: https://phorge.dev.yugabyte.com/D36543

* update diagram (yugabyte#23245)

* [/PLAT-14708] Fix JSON field name in TaskInfo query

Summary: This was missed when task params were moved out of the details field.

Test Plan: Trivial - existing tests should succeed.

Reviewers: vbansal, cwang

Reviewed By: vbansal

Subscribers: yugaware

Differential Revision: https://phorge.dev.yugabyte.com/D36705

* [yugabyte#23173] DocDB: Allow large bytes to be passed to RateLimiter

Summary:
RateLimiter has a debug assert that prevents calling `Request` with more than `GetSingleBurstBytes` bytes. In release mode, this check is not performed, and any such call blocks forever. This change allows large byte counts to be requested from RateLimiter by breaking requests larger than `GetSingleBurstBytes` into multiple smaller requests.

This change is a temporary fix to allow xCluster to operate without issues. RocksDB's RateLimiter has received multiple enhancements over the years that would help avoid this and other starvation issues (for example, facebook/rocksdb@cb2476a). We should consider pulling in those changes.
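The splitting approach described above can be sketched as follows, assuming a per-burst request function that blocks until its bytes are available. `RequestLarge` and the `request_chunk` callback are hypothetical names, not the RateLimiter API; the real change lives inside RateLimiter itself.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical helper: splits `bytes` into requests no larger than
// `single_burst_bytes` (the role GetSingleBurstBytes plays in RateLimiter)
// and hands each one to `request_chunk`, which stands in for a per-burst Request.
void RequestLarge(int64_t bytes, int64_t single_burst_bytes,
                  const std::function<void(int64_t)>& request_chunk) {
  assert(single_burst_bytes > 0);  // otherwise the loop could never make progress
  while (bytes > 0) {
    const int64_t chunk = std::min(bytes, single_burst_bytes);
    request_chunk(chunk);  // each call respects the single-burst limit
    bytes -= chunk;
  }
}
```

For example, with a 1000-byte burst limit, a 2500-byte request becomes three requests of 1000, 1000, and 500 bytes, none of which trips the single-burst check.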

Fixes yugabyte#23173
Jira: DB-12112

Test Plan: RateLimiterTest.LargeRequests

Reviewers: slingam

Reviewed By: slingam

Subscribers: ybase

Differential Revision: https://phorge.dev.yugabyte.com/D36703

* [yugabyte#23179] CDCSDK: Support data types with dynamically alloted oids in CDC

Summary:
This diff adds support for data types with dynamically allotted OIDs in CDC (for example, hstore and enum arrays). Columns of such types have an invalid pg_type_oid in the DocDB schema.

In the current implementation, while decoding CDC records in `ybc_pggate`, we look up the YBCPgTypeEntity in `type_map_` and use it for decoding. However, `type_map_` does not contain entries for data types with dynamically allotted OIDs, so decoding such columns causes a segmentation fault. To prevent these crashes, CDC prevents tables with such columns from being added to the stream.

This diff removes the filtering logic and adds tables to the stream even if they have such a column. A function pointer is now passed to `YBCPgGetCDCConsistentChanges`; it takes the attribute number and the table_oid and returns the appropriate type entity by querying the `pg_type` catalog table. During decoding, if a column with an invalid pg_type_oid is encountered, the passed function is invoked to obtain the type entity used for decoding.
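A minimal sketch of that decoding fallback, assuming an invalid OID is represented as 0 (PostgreSQL's `InvalidOid`). `TypeEntity`, `TypeEntityLookupFn`, and `ResolveTypeEntity` are hypothetical stand-ins for `YBCPgTypeEntity` and the function pointer passed to `YBCPgGetCDCConsistentChanges`. In the real diff the callback queries `pg_type`; here it is simply a function supplied by the caller.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical stand-in for YBCPgTypeEntity.
struct TypeEntity {
  uint32_t type_oid;
};

constexpr uint32_t kInvalidOid = 0;  // PostgreSQL's InvalidOid

// Callback shape: (table_oid, attr_num) -> type entity, or nullptr if unknown.
using TypeEntityLookupFn = const TypeEntity* (*)(uint32_t table_oid, int32_t attr_num);

// Uses the static type map for columns with a valid OID and falls back to the
// callback for columns whose type OID was allotted dynamically.
const TypeEntity* ResolveTypeEntity(
    const std::unordered_map<uint32_t, const TypeEntity*>& type_map,
    uint32_t pg_type_oid, uint32_t table_oid, int32_t attr_num,
    TypeEntityLookupFn lookup) {
  if (pg_type_oid == kInvalidOid) {
    return lookup(table_oid, attr_num);
  }
  auto it = type_map.find(pg_type_oid);
  return it == type_map.end() ? nullptr : it->second;
}
```

A plain function pointer (rather than `std::function`) fits a C-style boundary such as the `YBC*` pggate functions; that rationale is an assumption, not something the diff states.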

**Upgrade/Rollback safety:**
This diff adds a field `optional int32 attr_num` to DatumMessagePB. These changes are protected by the autoflag `ysql_yb_enable_replication_slot_consumption` which already exists but has not yet been released.
Jira: DB-12118

Test Plan:
Jenkins: urgent

All the existing cdc tests

./yb_build.sh --java-test 'org.yb.pgsql.TestPgReplicationSlot#replicationConnectionConsumptionAllDataTypesWithYbOutput'

Reviewers: skumar, stiwary, asrinivasan, dmitry

Reviewed By: stiwary, dmitry

Subscribers: steve.varnau, skarri, yql, ybase, ycdcxcluster

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D36689

* [PLAT-14710] Do not return apiToken in response to getSessionInfo

Summary:
**Context**
The GET /session_info YBA API returns:
{
    "authToken": "…",
    "apiToken": "….",
    "apiTokenVersion": "….",
    "customerUUID": "uuid1",
    "userUUID": "useruuid1"
}

The apiToken and apiTokenVersion are supposed to be the last generated valid token. This API went through the following sequence of changes.

https://yugabyte.atlassian.net/browse/PLAT-8028 - Do not store YBA token in YBA.

After that fix, YBA no longer stores the apiToken, so it cannot return it as part of /session_info. The change for this ticket returned the hashed apiToken instead.

https://yugabyte.atlassian.net/browse/PLAT-14672 - getSessionInfo should generate and return api key in response

Since the hashed apiToken value is not useful to any client, and it broke YBM create cluster (https://yugabyte.atlassian.net/browse/CLOUDGA-22117), the first change for this ticket returned a new apiToken instead.

Note that GET /session_info is meant to get customer and user information for the currently authenticated session. This is useful for automation that starts an authenticated session from an existing or cached API token. The /session_info API does not need to return the authToken and apiToken: the client already holds one of them, because it used it to invoke /session_info. In fact, generating a new apiToken whenever /session_info is called invalidates the previous apiToken, which the client would not expect. A separate API, /api_token, regenerates the apiToken explicitly.

**Fix in this change**
So the right behaviour is for /session_info to stop sending the apiToken in the response. In fact, the current behaviour of generating a new apiToken every time will break clients (for example, the node-agent usage of /session_info here: https://github.com/yugabyte/yugabyte-db/blob/4ca56cfe27d1cae64e0e61a1bde22406e003ec04/managed/node-agent/app/server/handler.go#L19).

**Client impact of not returning apiToken in response of /session_info**

This should not impact any normal client that was using /session_info only to get the user uuid and customer uuid.

However, there might be a few clients (like YBM) that invoked /session_info to get the last generated apiToken from YBA. Unfortunately, this was a misuse of this API. YBA generates the apiToken in response to a few entry-point APIs like /register, /api_login, and /api_token. The apiToken is long-lived. YBA could choose to expire these apiTokens after a fixed (long) amount of time, but for now there is no expiration. Clients are expected to store the apiToken on their end and use it to reestablish a session with YBA whenever needed. After establishing a new session, clients would call GET /session_info to get the user UUID and customer UUID. This is being fixed in YBM with https://yugabyte.atlassian.net/browse/CLOUDGA-22117, so this PLAT change should be taken up by YBM only after CLOUDGA-22117 is fixed.

Test Plan:
* Manually verified that session_info does not return authToken
* Shubham verified that node-agent works with this fix. Thanks Shubham!

Reviewers: svarshney, dkumar, tbedi, #yba-api-review!

Reviewed By: svarshney

Subscribers: yugaware

Differential Revision: https://phorge.dev.yugabyte.com/D36712

* [docs] updates to CVE table status column (yugabyte#23225)

* updates to status column

* review comment

* format

---------

Co-authored-by: Dwight Hodge <[email protected]>

* [docs] Fix load balance keyword in drivers page (yugabyte#23253)

[docs] Fix `load_balance` -> `load-balance` in jdbc driver
[docs] Fix `load_balance` -> `loadBalance` in nodejs driver

* fixed compilation

* fix link, format

* format, links

* links, format

* format

* format

* minor edit

* best practice (yugabyte#8)

* moved sections

* moved pages

* added key concepts page

* added link to getting started

* Dynamic table doc changes (yugabyte#11)

* icons

* added box for lead link

* revert ybclient change

* revert accidental change

* revert accidental change

* revert accidental change

* fix link block for getting started page

* format

* minor edit

* links, format

* format

* links

* format

* remove reminder references

* Modified output plugin docs (yugabyte#12)

* Naming edits

* format

* review comments

* diagram

* review comment

* fix links

* format

* format

* link

* review comments

* copy to stable

* link

---------

Co-authored-by: siddharth2411 <[email protected]>
Co-authored-by: Shubham <[email protected]>
Co-authored-by: asharma-yb <[email protected]>
Co-authored-by: Dwight Hodge <[email protected]>
Co-authored-by: Naorem Khogendro Singh <[email protected]>
Co-authored-by: Hari Krishna Sunder <[email protected]>
Co-authored-by: Sumukh-Phalgaonkar <[email protected]>
Co-authored-by: Subramanian Neelakantan <[email protected]>
Co-authored-by: Aishwarya Chakravarthy <[email protected]>
Co-authored-by: Dwight Hodge <[email protected]>
Co-authored-by: ddorian <[email protected]>
Co-authored-by: Sumukh-Phalgaonkar <[email protected]>
premkumr pushed a commit that referenced this pull request Aug 9, 2024
…ount backward scans improvement

Summary:
The change updates the cost-based optimizer to take backward scan improvements into account, so that
backward scans are picked instead of forward scan + sort when the fast backward scan feature is
enabled via `FLAGS_use_fast_backward_scan`.

TAQO run results first. The Master, D36614, and PG columns are the values for 'Best Execution Plan Picked';
the last 4 columns are the numbers of queries with backward scans in the execution plan. The cost-based
optimizer is turned on for 'Master' and 'D36614'.

| Model                        | Master | D36614 | PG     | Num queries | Improved | Degraded | Plan changed
| ---------------------------- | ------ | ------ | ------ | ----------- | -------- | -------- | ------------
| basic                        |  91.04 |  91.04 |  96.64 |           0 |        0 |        0 |        0
| complex                      |  85.42 |  84.38 |  86.46 |           3 |        2 |        1 |        1
| cost-validation-joins        |  78.46 |  79.79 |  95.48 |          23 |       23 |        0 |        0
| cost-validation-misc         |  94.43 |   95.3 |  92.86 |          62 |       62 |        0 |        6
| cost-validation-single-table |  96.09 |  96.92 |   98.7 |           0 |        0 |        0 |        0
| join-order-benchmark         |  66.37 |   64.6 |  42.48 |           0 |        0 |        0 |        0
| subqueries                   |     80 |  86.67 |     80 |           0 |        0 |        0 |        0
| more-subqueries              |  77.94 |  76.47 |    100 |           1 |        1 |        0 |        0
| seek-next-estimation         |    100 |    100 |  96.88 |           0 |        0 |        0 |        0
| tpch                         |  72.73 |  68.18 |  72.73 |           0 |        0 |        0 |        0
| tuning_tests                 |  93.63 |  94.12 |  99.06 |          10 |       10 |        0 |        0

Results for selected queries, by model (queries with backward scans only):
| complex                          | Master (default) | Master (best) | D36614 (default) | D36614 (best)  | Comment
| -------------------------------- | ---------------- | ------------- | ---------------- | -------------- | ------------
| 59ebf1c77e58cb2291c35486e2f96137 |                  |               |                  |                | plan changed
| Estimated cost                   |          2285.29 |       2285.29 |          1438.33 | 20000002970.96 |
| Execution time                   |             8.20 |          8.20 |           299.57 |          10.46 |
|                                  |                  |               |                  |                |
| 4039d9f16fb8f4ed263f48d5f5232215 |                  |               |                  |                |
| Estimated cost                   |         18222.67 |      18222.67 |         14353.61 |       14353.61 |
| Execution time                   |            22.04 |         22.04 |            13.19 |          13.19 |
|                                  |                  |               |                  |                |
| 5c918663b34f55e514fc6e6edc046556 |                  |               |                  |                |
| Estimated cost                   |         27084.99 |      27084.99 |         23715.02 |       23715.02 |
| Execution time                   |            35.53 |         35.53 |            27.63 |          27.63 |
|                                  |                  |               |                  |                |
| cost-validation-joins            | Master (default) | Master (best) | D36614 (default) | D36614 (best)  | Comment
| -------------------------------- | ---------------- | ------------- | ---------------- | -------------- | ------------
| c8327c54b05e0781b1e095fefe6e314e |                  |               |                  |                |
| Estimated cost                   |           387.39 |        387.39 |           384.04 |         384.04 |
| Execution time                   |              6.3 |          6.30 |             3.16 |           3.16 |
|                                  |                  |               |                  |                |
| 4ce5afcdad024545f07198bd75eb5312 |                  |               |                  |                |
| Estimated cost                   |           361.45 |        384.96 |           361.41 |         604.03 |
| Execution time                   |           153.51 |          5.99 |            137.8 |           2.41 |
|                                  |                  |               |                  |                |
| 01c77bebc5d6fe9d1444170d92a14ec1 |                  |               |                  |                |
| Estimated cost                   |           387.77 |        387.77 |           384.44 |         384.44 |
| Execution time                   |             6.34 |          6.34 |              3.3 |           3.30 |
|                                  |                  |               |                  |                |
| 3ca0ff16775634091419ecf365bbfcf5 |                  |               |                  |                |
| Estimated cost                   |           362.67 |        386.53 |           362.34 |         389.85 |
| Execution time                   |            21.43 |          6.09 |            16.82 |           2.47 |
|                                  |                  |               |                  |                |
| e4e31014bb2b4b3ab9478f7200516310 |                  |               |                  |                |
| Estimated cost                   |           374.17 |        386.53 |           373.83 |         383.19 |
| Execution time                   |            75.73 |          5.97 |            46.86 |           3.01 |
|                                  |                  |               |                  |                |
| 7a993deaf6d49e2b053987240fe78c02 |                  |               |                  |                |
| Estimated cost                   |           362.68 |        362.68 |           362.34 |         362.34 |
| Execution time                   |            10.21 |         10.21 |              6.7 |           6.70 |
|                                  |                  |               |                  |                |
| cost-validation-misc             | Master (default) | Master (best) | D36614 (default) | D36614 (best)  | Comment
| -------------------------------- | ---------------- | ------------- | ---------------- | -------------- | ------------
| e8b548c59a52976c30bde07e5576fef3 |                  |               |                  |                | best plan changed
| Estimated cost                   |           379.01 |   10000000674 |           372.27 |         372.27 |
| Execution time                   |                1 |          0.73 |             0.65 |           0.65 |
|                                  |                  |               |                  |                |
| 7a353cc0498dfa2d44a441b37c5a1be2 |                  |               |                  |                |
| Estimated cost                   |          13766.3 |       13766.3 |         10364.91 |       10364.91 |
| Execution time                   |            13.91 |         13.91 |             7.62 |           7.62 |
|                                  |                  |               |                  |                |
| 45241b1e1c945e2d13ba488e42fa9f5f |                  |               |                  |                | plan changed
| Estimated cost                   |          1138.17 |       1327.55 |           956.81 |         956.81 |
| Execution time                   |             1.36 |          0.88 |             0.64 |           0.64 |
|                                  |                  |               |                  |                |
| df22d138bac1990b9a2b664478be3940 |                  |               |                  |                | best plan changed
| Estimated cost                   |         81398.55 |     133322.29 |         81398.55 |       99993.43 |
| Execution time                   |            92.79 |         14.05 |             80.1 |           7.41 |
|                                  |                  |               |                  |                |
| bd56ddf3c3edcbb4daed5e90222f3f43 |                  |               |                  |                | plan changed
| Estimated cost                   |          1156.06 |       1214.59 |           880.92 |         880.92 |
| Execution time                   |             1.84 |          0.93 |              0.7 |           0.70 |
|                                  |                  |               |                  |                |
| c2b94854c7334aaeab12fb6a7ad90bbe |                  |               |                  |                | best plan changed
| Estimated cost                   |          6621.15 |      16245.17 |          6621.15 |       11662.79 |
| Execution time                   |              8.3 |          1.06 |             7.53 |           0.87 |
|                                  |                  |               |                  |                |
| 8c35b5522e8da2fbb14154afbb949c17 |                  |               |                  |                | best plan changed
| Estimated cost                   |         61271.63 |     249323.79 |         61271.63 |      185166.06 |
| Execution time                   |            77.92 |          2.98 |            70.86 |           1.89 |
|                                  |                  |               |                  |                |
| 3e17a5426c1240a7cb15413483c6857b |                  |               |                  |                | best plan changed
| Estimated cost                   |         83146.63 |     286833.39 |         83146.63 |      200156.46 |
| Execution time                   |           102.32 |          3.01 |            89.26 |           1.91 |
|                                  |                  |               |                  |                |
| more-subqueries                  | Master (default) | Master (best) | D36614 (default) | D36614 (best)  | Comment
| -------------------------------- | ---------------- | ------------- | ---------------- | -------------- | ------------
| df720fdc87e9d33aa1006ede6868310f |                  |               |                  |                |
| Estimated cost                   |        432396.84 |     432396.84 |        389020.18 |      389020.18 |
| Execution time                   |           268.49 |        268.49 |           220.67 |         220.67 |
|                                  |                  |               |                  |                |
| tuning_tests                     | Master (default) | Master (best) | D36614 (default) | D36614 (best)  | Comment
| -------------------------------- | ---------------- | ------------- | ---------------- | -------------- | ------------
| a7a1a762f96b9445990d3057904a8f9b |                  |               |                  |                |
| Estimated cost                   |        102825.88 |     102825.88 |         56113.48 |       56113.48 |
| Execution time                   |            70.95 |         70.95 |            38.41 |          38.41 |
|                                  |                  |               |                  |                |
| 13026bfac847da1ec9f13fdaad5c39cf |                  |               |                  |                |
| Estimated cost                   |        117489.58 |     117489.58 |         64103.98 |       64103.98 |
| Execution time                   |            79.83 |         79.83 |            42.35 |          42.35 |
|                                  |                  |               |                  |                |
| 90e1ac9dfdbd77fa5ad96b1fe3526a41 |                  |               |                  |                |
| Estimated cost                   |        132153.27 |     132153.27 |         72094.47 |       72094.47 |
| Execution time                   |            90.19 |         90.19 |            46.56 |          46.56 |
|                                  |                  |               |                  |                |
| 90c2ad6894acac53c20efdd886e0fc03 |                  |               |                  |                |
| Estimated cost                   |        146816.97 |     146816.97 |         80084.97 |       80084.97 |
| Execution time                   |           101.19 |        101.19 |            52.69 |          52.69 |

The full report: https://taqo.dev.yugabyte.com/regression/33

Most queries with backward scans improved their execution time. However, one query,
59ebf1c77e58cb2291c35486e2f96137, shows a regression. Since other tests clearly show that the new
approach substantially improves backward scans, this may mean that other parts of the cost-based
optimizer need additional tuning (for example, the costs for seeks and nexts). This requires
additional analysis and will be covered by a separate ticket.
Query link: https://taqo.dev.yugabyte.com/reports/b5e885c8e491050e70320e4b801469b0/20240719-115328/tags/distinct.html#59ebf1c77e58cb2291c35486e2f96137

Jira: DB-11271

Test Plan:
Test case #1 (backward scan improvements are turned off).
1. Start a cluster: `./bin/yb-ctl start --rf=1`
2. Open `ysqlsh`
3. Create a table with some data:
`# CREATE TABLE ttable(h INT, r INT, c INT, PRIMARY KEY(h, r ASC));`
`# INSERT INTO ttable SELECT i, i, i FROM generate_series(1, 10) AS i;`
4. Turn CBO on: `# SET yb_enable_base_scans_cost_model TO true;`
5. Run a query `# EXPLAIN ANALYZE SELECT c, r FROM ttable WHERE h = 1 ORDER BY r DESC;`
6. Result:
```
 Sort  (cost=555.50..555.51 rows=5 width=8) (actual time=0.706..0.706 rows=1 loops=1)
   Sort Key: r DESC
   Sort Method: quicksort  Memory: 25kB
   ->  Index Scan using ttable_pkey on ttable  (cost=180.00..555.44 rows=5 width=8) (actual time=0.674..0.677 rows=1 loops=1)
         Index Cond: (h = 1)
 Planning Time: 6.147 ms
 Execution Time: 0.776 ms
 Peak Memory Usage: 60 kB
(8 rows)
```
Forward Scan + Sort is expected when fast backward scan is turned off.

Test case #2 (backward scan improvements are turned on).
1. Start a cluster: `./bin/yb-ctl start --rf=1 --tserver_flags=allowed_preview_flags_csv=use_fast_backward_scan,use_fast_backward_scan=true`
2. Open `ysqlsh`
3. Create a table with some data:
`# CREATE TABLE ttable(h INT, r INT, c INT, PRIMARY KEY(h, r ASC));`
`# INSERT INTO ttable SELECT i, i, i FROM generate_series(1, 10) AS i;`
4. Turn CBO on: `# SET yb_enable_base_scans_cost_model TO true;`
5. Run a query `# EXPLAIN ANALYZE SELECT c, r FROM ttable WHERE h = 1 ORDER BY r DESC;`
6. Result:
```
 Index Scan Backward using ttable_pkey on ttable  (cost=180.00..557.77 rows=5 width=8) (actual time=1.075..1.079 rows=1 loops=1)
   Index Cond: (h = 1)
 Planning Time: 0.073 ms
 Execution Time: 1.129 ms
 Peak Memory Usage: 24 kB
(5 rows)
```
This shows that the CBO takes backward scan improvements into account: the planner prefers Index Scan Backward over Forward Scan + Sort.

Reviewers: rthallam, gkukreja, amartsinchyk

Reviewed By: rthallam, gkukreja, amartsinchyk

Subscribers: yql

Differential Revision: https://phorge.dev.yugabyte.com/D36614
premkumr pushed a commit that referenced this pull request Aug 9, 2024
…build

Summary:
The DDL atomicity stress tests failed more often on the pg15 branch, with an error like:

```
WARNING: ThreadSanitizer: data race (pid=180911)
  Write of size 8 at 0x7b2c000257b8 by thread T17 (mutexes: write M0):
    #0 profile_open_file prof_file.c (libkrb5.so.3+0xf45b3)
    #1 profile_init_flags <null> (libkrb5.so.3+0xfb056)
    #2 k5_os_init_context <null> (libkrb5.so.3+0xe5546)
    #3 krb5_init_context_profile <null> (libkrb5.so.3+0xabc90)
    #4 krb5_init_context <null> (libkrb5.so.3+0xabbd5)
    #5 krb5_gss_init_context init_sec_context.c (libgssapi_krb5.so.2+0x448da)
    #6 acquire_cred_from acquire_cred.c (libgssapi_krb5.so.2+0x39159)
    #7 krb5_gss_acquire_cred_from acquire_cred.c (libgssapi_krb5.so.2+0x39072)
    #8 gss_add_cred_from <null> (libgssapi_krb5.so.2+0x1fcd3)
    #9 gss_acquire_cred_from <null> (libgssapi_krb5.so.2+0x1f69d)
    #10 gss_acquire_cred <null> (libgssapi_krb5.so.2+0x1f431)
    #11 pg_GSS_have_cred_cache ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/../../../../../../src/postgres/src/interfaces/libpq/fe-gssapi-common.c:68:10 (libpq.so.5+0x543fe)
    #12 PQconnectPoll ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/../../../../../../src/postgres/src/interfaces/libpq/fe-connect.c:2909:22 (libpq.so.5+0x359ca)
    #13 connectDBComplete ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/../../../../../../src/postgres/src/interfaces/libpq/fe-connect.c:2241:10 (libpq.so.5+0x30807)
    #14 PQconnectdb ${YB_SRC_ROOT}/src/postgres/src/interfaces/libpq/../../../../../../src/postgres/src/interfaces/libpq/fe-connect.c:719:10 (libpq.so.5+0x30af1)
    #15 yb::pgwrapper::PGConn::Connect(string const&, std::chrono::time_point<yb::CoarseMonoClock, std::chrono::duration<long long, std::ratio<1l, 1000000000l>>>, bool, string const&) ${YB_SRC_ROOT}/src/yb/yql/pgwrapper/libpq_utils.cc:348:24 (libpq_utils.so+0x13c5b)
    #16 yb::pgwrapper::PGConn::Connect(string const&, bool, string const&) ${YB_SRC_ROOT}/src/yb/yql/pgwrapper/libpq_utils.h:254:12 (libpq_utils.so+0x1a77e)
    #17 yb::pgwrapper::PGConnBuilder::Connect(bool) const ${YB_SRC_ROOT}/src/yb/yql/pgwrapper/libpq_utils.cc:743:10 (libpq_utils.so+0x1a77e)
    #18 yb::pgwrapper::LibPqTestBase::ConnectToDBAsUser(string const&, string const&, bool) ${YB_SRC_ROOT}/src/yb/yql/pgwrapper/libpq_test_base.cc:54:6 (libpg_wrapper_test_base.so+0x26f34)
    #19 yb::pgwrapper::LibPqTestBase::ConnectToDB(string const&, bool) ${YB_SRC_ROOT}/src/yb/yql/pgwrapper/libpq_test_base.cc:44:10 (libpg_wrapper_test_base.so+0x26b1e)
    #20 yb::pgwrapper::LibPqTestBase::Connect(bool) ${YB_SRC_ROOT}/src/yb/yql/pgwrapper/libpq_test_base.cc:40:10 (libpg_wrapper_test_base.so+0x26b1e)
    #21 yb::pgwrapper::PgDdlAtomicityStressTest::Connect() ${YB_SRC_ROOT}/src/yb/yql/pgwrapper/pg_ddl_atomicity_stress-test.cc:147:25 (pg_ddl_atomicity_stress-test+0x136d6c)
    #22 yb::pgwrapper::PgDdlAtomicityStressTest::TestDdl(std::vector<string, std::allocator<string>> const&, int) ${YB_SRC_ROOT}/src/yb/yql/pgwrapper/pg_ddl_atomicity_stress-test.cc:165:15 (pg_ddl_atomicity_stress-test+0x136df5)
    #23 yb::pgwrapper::PgDdlAtomicityStressTest_StressTest_Test::TestBody()::$_2::operator()() const ${YB_SRC_ROOT}/src/yb/yql/pgwrapper/pg_ddl_atomicity_stress-test.cc:316:5 (pg_ddl_atomicity_stress-test+0x13d2eb)
```

It appears that `yb::pgwrapper::LibPqTestBase::Connect` isn't thread-safe. I restructured the code to open the connections in a single thread
and then pass them to the concurrent test threads.
Jira: DB-2996
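The restructuring described above (open every connection on one thread, then hand each one to a worker thread) can be sketched as follows. This is a minimal illustration, not the actual test code: `FakeConn`, `ConnectSerially`, and `RunWorkers` are hypothetical names, and the sketch assumes that an already-open connection may be used by one thread at a time even though creating connections concurrently is not safe.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a test connection (e.g. PGConn). Assumption: once
// created, a connection may be used by one thread at a time.
struct FakeConn {
  int id = 0;
  int statements_run = 0;
  void Execute(const std::string& /*sql*/) { ++statements_run; }
};

// Hypothetical factory standing in for the non-thread-safe Connect() call.
// It is only ever called from the thread that runs RunWorkers().
std::unique_ptr<FakeConn> ConnectSerially(int id) {
  auto conn = std::make_unique<FakeConn>();
  conn->id = id;
  return conn;
}

// Opens all connections on the calling thread, then gives each worker thread
// exclusive use of one of them. Returns the connections for inspection.
std::vector<std::unique_ptr<FakeConn>> RunWorkers(int num_threads, int statements_per_thread) {
  std::vector<std::unique_ptr<FakeConn>> conns;
  for (int i = 0; i < num_threads; ++i) {
    conns.push_back(ConnectSerially(i));  // connections are created one at a time
  }

  std::vector<std::thread> workers;
  for (auto& conn : conns) {
    FakeConn* c = conn.get();  // ownership stays in `conns`; threads are joined below
    workers.emplace_back([c, statements_per_thread] {
      for (int j = 0; j < statements_per_thread; ++j) {
        c->Execute("SELECT 1");
      }
    });
  }
  for (auto& t : workers) {
    t.join();
  }
  return conns;
}
```

Only the (serial) creation step touches the non-thread-safe code path; the concurrency under test happens afterwards, on connections that already exist.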

Test Plan:
./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/0 --clang17
./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/1 --clang17
./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/2 --clang17
./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/3 --clang17
./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/4 --clang17
./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/5 --clang17
./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/6 --clang17
./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/7 --clang17
./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/8 --clang17
./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/9 --clang17
./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/10 --clang17
./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/11 --clang17
./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/12 --clang17
./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/13 --clang17
./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/14 --clang17
./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/15 --clang17
./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/16 --clang17
./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/17 --clang17
./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/18 --clang17
./yb_build.sh tsan --cxx-test pgwrapper_pg_ddl_atomicity_stress-test --gtest_filter PgDdlAtomicityStressTest/PgDdlAtomicityStressTest.StressTest/19 --clang17

Verified that there are no more TSAN errors.

Reviewers: fizaa

Reviewed By: fizaa

Subscribers: yql

Differential Revision: https://phorge.dev.yugabyte.com/D37111
aishwarya24 pushed a commit that referenced this pull request Sep 5, 2024
…ng the lock

Summary:
Call the callback only in the ScopeExit block, not while holding the lock.

Without this fix, a thread can deadlock by requesting a shared lock on a mutex on which it already holds an exclusive lock.

This deadlock can be triggered if there are active read/write requests to a table from more than one thread right after the table has undergone a tablet split.

With only one thread, the deadlock is unlikely, because that thread itself notices (as part of the callback) that the table's partition info is stale. Having a different thread refresh the partition version before the main thread checks whether the table version is stale is likely necessary to trigger the stack trace seen below.

For example:
```
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00005640c3eb441b in std::__1::shared_timed_mutex::lock_shared() ()
#2  0x00005640c3ffcbff in yb::client::internal::MetaCache::LookupTabletByKey(std::__1::shared_ptr<yb::client::YBTable> const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::chrono::time_point<yb::CoarseMonoClock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > >, std::__1::function<void (yb::Result<scoped_refptr<yb::client::internal::RemoteTablet> > const&)>, yb::StronglyTypedBool<yb::client::internal::FailOnPartitionListRefreshed_Tag>) ()
#3  0x00005640c3f7549a in yb::client::internal::Batcher::LookupTabletFor(yb::client::internal::InFlightOp*) ()
#4  0x00005640c401855e in yb::client::(anonymous namespace)::FlushBatcherAsync(std::__1::shared_ptr<yb::client::internal::Batcher> const&, boost::function<void (yb::client::FlushStatus*)>, yb::client::YBSession::BatcherConfig, yb::StronglyTypedBool<yb::client::internal::IsWithinTransactionRetry_Tag>) ()
#5  0x00005640c401aa76 in yb::client::(anonymous namespace)::BatcherFlushDone(std::__1::shared_ptr<yb::client::internal::Batcher> const&, yb::Status const&, boost::function<void (yb::client::FlushStatus*)>, yb::client::YBSession::BatcherConfig) ()
#6  0x00005640c401b371 in boost::detail::function::void_function_obj_invoker1<std::__1::__bind<void (*)(std::__1::shared_ptr<yb::client::internal::Batcher> const&, yb::Status const&, boost::function<void (yb::client::FlushStatus*)>, yb::client::YBSession::BatcherConfig), std::__1::shared_ptr<yb::client::internal::Batcher> const&, std::__1::placeholders::__ph<1> const&, boost::function<void (yb::client::FlushStatus*)>, yb::client::YBSession::BatcherConfig&>, void, yb::Status const&>::invoke(boost::detail::function::function_buffer&, yb::Status const&) ()
#7  0x00005640c3f70398 in yb::client::internal::Batcher::Run() ()
#8  0x00005640c3f72656 in yb::client::internal::Batcher::FlushFinished() ()
#9  0x00005640c3f74a4d in yb::client::internal::Batcher::TabletLookupFinished(yb::client::internal::InFlightOp*, yb::Result<scoped_refptr<yb::client::internal::RemoteTablet> >) ()
#10 0x00005640c3f759bc in std::__1::__function::__func<yb::client::internal::Batcher::LookupTabletFor(yb::client::internal::InFlightOp*)::$_0, std::__1::allocator<yb::client::internal::Batcher::LookupTabletFor(yb::client::internal::InFlightOp*)::$_0>, void (yb::Result<scoped_refptr<yb::client::internal::RemoteTablet> > const&)>::operator()(yb::Result<scoped_refptr<yb::client::internal::RemoteTablet> > const&) ()

#11 0x00005640c3fff05d in yb::client::internal::MetaCache::LookupTabletByKey(std::__1::shared_ptr<yb::client::YBTable> const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::chrono::time_point<yb::CoarseMonoClock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > >, std::__1::function<void (yb::Result<scoped_refptr<yb::client::internal::RemoteTablet> > const&)>,  yb::StronglyTypedBool<yb::client::internal::FailOnPartitionListRefreshed_Tag>) ()
** Is holding an exclusive lock in MetaCache::LookupTabletByKey/DoLookupTabletByKey **

#12 0x00005640c3f7549a in yb::client::internal::Batcher::LookupTabletFor(yb::client::internal::InFlightOp*) ()
#13 0x00005640c401855e in yb::client::(anonymous namespace)::FlushBatcherAsync(std::__1::shared_ptr<yb::client::internal::Batcher> const&, boost::function<void (yb::client::FlushStatus*)>, yb::client::YBSession::BatcherConfig, yb::StronglyTypedBool<yb::client::internal::IsWithinTransactionRetry_Tag>) ()
#14 0x00005640c4017130 in yb::client::YBSession::FlushAsync(boost::function<void (yb::client::FlushStatus*)>) ()
#15 0x00005640c5225a0c in yb::tserver::PgClientServiceImpl::Perform(yb::tserver::PgPerformRequestPB const*, yb::tserver::PgPerformResponsePB*, yb::rpc::RpcContext) ()
#16 0x00005640c51c4487 in std::__1::__function::__func<yb::tserver::PgClientServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_20, std::__1::allocator<yb::tserver::PgClientServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_20>, void (std::__1::shared_ptr<yb::rpc::InboundCall>)>::operator()(std::__1::shared_ptr<yb::rpc::InboundCall>&&) ()
#17 0x00005640c51d374f in yb::tserver::PgClientServiceIf::Handle(std::__1::shared_ptr<yb::rpc::InboundCall>) ()
#18 0x00005640c4f5f420 in yb::rpc::ServicePoolImpl::Handle(std::__1::shared_ptr<yb::rpc::InboundCall>) ()
#19 0x00005640c4e845af in yb::rpc::InboundCall::InboundCallTask::Run() ()
#20 0x00005640c4f6e243 in yb::rpc::(anonymous namespace)::Worker::Execute() ()
#21 0x00005640c570ecb4 in yb::Thread::SuperviseThread(void*) ()
#22 0x00007f808b7c6694 in start_thread (arg=0x7f76d8caf700) at pthread_create.c:333
#23 0x00007f808bac341d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
```
Jira: DB-12651
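A minimal sketch of the fix pattern, assuming a simplified cache guarded by a `std::shared_timed_mutex`: the callback is run from a scope-exit guard declared before the exclusive lock, so C++'s reverse destruction order releases the lock first and only then invokes the callback. `ScopeExitGuard`, `TabletCache`, and `LookupByKey` are hypothetical stand-ins, not the real `ScopeExit` helper or `MetaCache` API. The callback in the usage below re-enters the cache and takes a shared lock, which is the situation that deadlocks (formally, undefined behavior) if the callback runs while the exclusive lock is still held.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <utility>

// Minimal scope-exit guard (hypothetical): runs `f` when the guard is destroyed.
template <class F>
class ScopeExitGuard {
 public:
  explicit ScopeExitGuard(F f) : f_(std::move(f)) {}
  ~ScopeExitGuard() { f_(); }
  ScopeExitGuard(const ScopeExitGuard&) = delete;
  ScopeExitGuard& operator=(const ScopeExitGuard&) = delete;

 private:
  F f_;
};

// Hypothetical, simplified cache mapping keys to tablet names.
class TabletCache {
 public:
  using Callback = std::function<void(const std::string& tablet)>;

  void LookupByKey(const std::string& key, const Callback& callback) {
    std::string tablet;
    // Declared before the lock, so it is destroyed after the lock: the
    // callback runs only once the exclusive lock has been released.
    auto invoke = [&callback, &tablet] { callback(tablet); };
    ScopeExitGuard<decltype(invoke)> run_callback(invoke);

    std::unique_lock<std::shared_timed_mutex> lock(mutex_);
    auto it = tablets_.find(key);
    if (it == tablets_.end()) {
      it = tablets_.emplace(key, "tablet-for-" + key).first;
    }
    tablet = it->second;
  }  // `lock` is destroyed here first, then `run_callback` invokes the callback.

  // Takes a shared lock; calling it from a callback that ran under the
  // exclusive lock above would deadlock.
  std::size_t Size() const {
    std::shared_lock<std::shared_timed_mutex> lock(mutex_);
    return tablets_.size();
  }

 private:
  mutable std::shared_timed_mutex mutex_;
  std::map<std::string, std::string> tablets_;
};
```

Declaring the guard before the lock is what makes this work; swapping the two declarations would reintroduce the bug.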

Test Plan:
Jenkins
yb_build.sh --cxx-test ql-stress-test QLStressTest.ReproMetaCacheDeadlock

Reviewers: rthallam, hsunder, qhu, timur

Reviewed By: hsunder

Subscribers: svc_phabricator, ybase

Differential Revision: https://phorge.dev.yugabyte.com/D37706
premkumr pushed a commit that referenced this pull request Sep 30, 2024
Summary:
It is possible for the tablet peer's `tablet_` to be null when a RocksDB flush finishes. After a flush, we call `tablet_->MaxPersistentOpId()` to clean up recently applied transaction state, and this causes a SIGSEGV:
```
* thread #1, name = 'yb-tserver', stop reason = signal SIGSEGV
  * frame #0: 0x000055885b97311d yb-tserver`yb::ScopedRWOperation::ScopedRWOperation(yb::RWOperationCounter*, yb::StatusHolder const*, std::__1::chrono::time_point<yb::CoarseMonoClock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l>>> const&) [inlined] std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>::basic_string(this="", __str=<unavailable>) at string:898:9
    frame #1: 0x000055885b97311d yb-tserver`yb::ScopedRWOperation::ScopedRWOperation(yb::RWOperationCounter*, yb::StatusHolder const*, std::__1::chrono::time_point<yb::CoarseMonoClock, std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l>>> const&) [inlined] yb::RWOperationCounter::resource_name(this=0x0000000000000378) const at operation_counter.h:95:12
    frame #2: 0x000055885b97311d yb-tserver`yb::ScopedRWOperation::ScopedRWOperation(this=0x00007f9455305d58, counter=0x0000000000000378, abort_status_holder=<unavailable>, deadline=0x00007f9455305d98) at operation_counter.cc:190:62
    frame #3: 0x000055885b247ea6 yb-tserver`yb::tablet::Tablet::MaxPersistentOpId(bool) const [inlined] yb::ScopedRWOperation::ScopedRWOperation(this=0x00007f9455305d58, counter=<unavailable>, deadline=0x00007f9455305d98) at operation_counter.h:140:9
    frame #4: 0x000055885b247e9f yb-tserver`yb::tablet::Tablet::MaxPersistentOpId(bool) const [inlined] yb::tablet::Tablet::CreateScopedRWOperationBlockingRocksDbShutdownStart(this=0x0000000000000000, deadline=yb::CoarseTimePoint @ 0x00007f9455305d98) const at tablet.cc:3375:10
    frame #5: 0x000055885b247e90 yb-tserver`yb::tablet::Tablet::MaxPersistentOpId(this=0x0000000000000000, invalid_if_no_new_data=<unavailable>) const at tablet.cc:3540:32
    frame #6: 0x000055885b277f5e yb-tserver`yb::tablet::TabletPeer::MaxPersistentOpId(this=<unavailable>) const at tablet_peer.cc:946:23
    frame #7: 0x000055885b278e52 yb-tserver`non-virtual thunk to yb::tablet::TabletPeer::MaxPersistentOpId() const at tablet_peer.cc:0
    frame #8: 0x000055885b2dec44 yb-tserver`yb::tablet::TransactionParticipant::Impl::DoProcessRecentlyAppliedTransactions(this=0x0000153123151500, retryable_requests_flushed_op_id=<unavailable>, persist=<unavailable>) at transaction_participant.cc:2186:22
    frame #9: 0x000055885b2e0a8e yb-tserver`yb::tablet::TransactionParticipant::ProcessRecentlyAppliedTransactions() [inlined] yb::tablet::TransactionParticipant::Impl::ProcessRecentlyAppliedTransactions(this=0x0000153123151500) at transaction_participant.cc:1440:27
    frame #10: 0x000055885b2e0a63 yb-tserver`yb::tablet::TransactionParticipant::ProcessRecentlyAppliedTransactions(this=<unavailable>) at transaction_participant.cc:2629:17
    frame #11: 0x000055885b226093 yb-tserver`yb::tablet::Tablet::RocksDbListener::OnFlushCompleted(this=0x0000153110c2da58, (null)=<unavailable>, (null)=<unavailable>) at tablet.cc:503:34
    frame #12: 0x000055885af0e507 yb-tserver`rocksdb::DBImpl::BackgroundCallFlush(rocksdb::ColumnFamilyData*) at db_impl.cc:2121:19
    frame #13: 0x000055885af0e275 yb-tserver`rocksdb::DBImpl::BackgroundCallFlush(rocksdb::ColumnFamilyData*) [inlined] rocksdb::DBImpl::FlushMemTableToOutputFile(this=0x0000153123150a80, cfd=0x000015317d651600, mutable_cf_options=0x00007f94553077d8, made_progress=<unavailable>, job_context=0x00007f9455306938, log_buffer=0x00007f9455306048) at db_impl.cc:2008:3
    frame #14: 0x000055885af0d859 yb-tserver`rocksdb::DBImpl::BackgroundCallFlush(rocksdb::ColumnFamilyData*) [inlined] rocksdb::DBImpl::BackgroundFlush(this=0x0000153123150a80, made_progress=<unavailable>, job_context=0x00007f9455306938, log_buffer=0x00007f9455306048, cfd=0x000015317d651600) at db_impl.cc:3399:10
    frame #15: 0x000055885af0d21f yb-tserver`rocksdb::DBImpl::BackgroundCallFlush(this=0x0000153123150a80, cfd=<unavailable>) at db_impl.cc:3470:31
    frame #16: 0x000055885b024a53 yb-tserver`std::__1::__function::__func<rocksdb::ThreadPool::StartBGThreads()::$_0, std::__1::allocator<rocksdb::ThreadPool::StartBGThreads()::$_0>, void ()>::operator()() at thread_posix.cc:133:5
    frame #17: 0x000055885b024900 yb-tserver`std::__1::__function::__func<rocksdb::ThreadPool::StartBGThreads()::$_0, std::__1::allocator<rocksdb::ThreadPool::StartBGThreads()::$_0>, void ()>::operator()() [inlined] rocksdb::ThreadPool::StartBGThreads(this=<unavailable>)::$_0::operator()() const at thread_posix.cc:172:5
    frame #18: 0x000055885b024900 yb-tserver`std::__1::__function::__func<rocksdb::ThreadPool::StartBGThreads()::$_0, std::__1::allocator<rocksdb::ThreadPool::StartBGThreads()::$_0>, void ()>::operator()() [inlined] decltype(__f=<unavailable>)::$_0&>()()) std::__1::__invoke[abi:ue170006]<rocksdb::ThreadPool::StartBGThreads()::$_0&>(rocksdb::ThreadPool::StartBGThreads()::$_0&) at invoke.h:340:25
    frame #19: 0x000055885b024900 yb-tserver`std::__1::__function::__func<rocksdb::ThreadPool::StartBGThreads()::$_0, std::__1::allocator<rocksdb::ThreadPool::StartBGThreads()::$_0>, void ()>::operator()() [inlined] void std::__1::__invoke_void_return_wrapper<void, true>::__call[abi:ue170006]<rocksdb::ThreadPool::StartBGThreads(__args=<unavailable>)::$_0&>(rocksdb::ThreadPool::StartBGThreads()::$_0&) at invoke.h:415:5
    frame #20: 0x000055885b024900 yb-tserver`std::__1::__function::__func<rocksdb::ThreadPool::StartBGThreads()::$_0, std::__1::allocator<rocksdb::ThreadPool::StartBGThreads()::$_0>, void ()>::operator()() [inlined] std::__1::__function::__alloc_func<rocksdb::ThreadPool::StartBGThreads()::$_0, std::__1::allocator<rocksdb::ThreadPool::StartBGThreads()::$_0>, void ()>::operator(this=<unavailable>)[abi:ue170006]() at function.h:192:16
    frame #21: 0x000055885b024900 yb-tserver`std::__1::__function::__func<rocksdb::ThreadPool::StartBGThreads()::$_0, std::__1::allocator<rocksdb::ThreadPool::StartBGThreads()::$_0>, void ()>::operator(this=<unavailable>)() at function.h:363:12
    frame #22: 0x000055885b9c1543 yb-tserver`yb::Thread::SuperviseThread(void*) [inlined] std::__1::__function::__value_func<void ()>::operator(this=0x000015313de3b380)[abi:ue170006]() const at function.h:517:16
    frame #23: 0x000055885b9c152d yb-tserver`yb::Thread::SuperviseThread(void*) [inlined] std::__1::function<void ()>::operator(this=0x000015313de3b380)() const at function.h:1168:12
    frame #24: 0x000055885b9c152d yb-tserver`yb::Thread::SuperviseThread(arg=0x000015313de3b320) at thread.cc:866:3
    frame #25: 0x00007f94994d81ca libpthread.so.0`start_thread + 234
    frame #26: 0x00007f9499729e73 libc.so.6`__clone + 67
```

This diff adds a null check and returns `OpId::Min()` (i.e. don't clean anything up) if `tablet_` is null and we cannot call `MaxPersistentOpId`.
Jira: DB-12915
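The null check described above can be sketched with simplified, hypothetical stand-ins for `OpId`, `Tablet`, and `TabletPeer` (the real classes have many more members). The sketch only shows the guard itself; it assumes nothing about how `tablet_` is synchronized in the real code, and the `{0, 0}` value of `OpId::Min()` is a property of this sketch only.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>

// Hypothetical, simplified stand-ins for yb::OpId, Tablet and TabletPeer.
struct OpId {
  std::int64_t term = 0;
  std::int64_t index = 0;
  // In this sketch, the minimum OpId is {0, 0}.
  static OpId Min() { return OpId(); }
  bool operator==(const OpId& other) const {
    return term == other.term && index == other.index;
  }
};

struct Tablet {
  OpId max_persistent_op_id;
  OpId MaxPersistentOpId() const { return max_persistent_op_id; }
};

class TabletPeer {
 public:
  void SetTablet(std::shared_ptr<Tablet> tablet) { tablet_ = std::move(tablet); }

  // Returns OpId::Min() when there is no tablet, so a caller that cleans up
  // state "up to" the returned OpId cleans up nothing, instead of
  // dereferencing a null pointer.
  OpId MaxPersistentOpId() const {
    if (!tablet_) {
      return OpId::Min();
    }
    return tablet_->MaxPersistentOpId();
  }

 private:
  std::shared_ptr<Tablet> tablet_;
};
```

Returning the minimum value is the conservative choice: it can only delay cleanup of recently applied transactions, never discard state too early.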

Test Plan: Jenkins

Reviewers: sergei, rthallam

Reviewed By: sergei, rthallam

Subscribers: rthallam, ybase

Differential Revision: https://phorge.dev.yugabyte.com/D38323
ddhodge pushed a commit that referenced this pull request Oct 24, 2024
Summary:
### Issue

The test ClockSynchronizationTest.TestClockSkewError fails with a TSAN data race report:

```
WARNING: ThreadSanitizer: data race (pid=226462)
  Read of size 8 at 0x7b4000000bf0 by thread T82:
    #0 boost::intrusive_ptr<yb::Status::State>::get() const ${YB_THIRDPARTY_DIR}/installed/tsan/include/boost/smart_ptr/intrusive_ptr.hpp:181:16 (libyb_util.so+0x3c5994)
    #1 bool boost::operator==<yb::Status::State>(boost::intrusive_ptr<yb::Status::State> const&, std::nullptr_t) ${YB_THIRDPARTY_DIR}/installed/tsan/include/boost/smart_ptr/intrusive_ptr.hpp:263:14 (libyb_util.so+0x3c5994)
    #2 yb::Status::ok() const ${YB_SRC_ROOT}/src/yb/util/status.h:120:51 (libyb_util.so+0x3c5994)
    #3 yb::MockClock::Now() ${YB_SRC_ROOT}/src/yb/util/physical_time.cc:141:3 (libyb_util.so+0x3c5994)
    #4 yb::server::HybridClock::NowWithError(yb::HybridTime*, unsigned long*) ${YB_SRC_ROOT}/src/yb/server/hybrid_clock.cc:155:22 (libserver_common.so+0xa5e12)
    #5 yb::server::HybridClock::NowRange() ${YB_SRC_ROOT}/src/yb/server/hybrid_clock.cc:144:3 (libserver_common.so+0xa5ceb)
    #6 yb::ClockBase::Now() ${YB_SRC_ROOT}/src/yb/common/clock.h:26:29 (libtserver.so+0x23a77a)
    #7 yb::tserver::Heartbeater::Thread::TryHeartbeat() ${YB_SRC_ROOT}/src/yb/tserver/heartbeater.cc:437:41 (libtserver.so+0x23a77a)
    #8 yb::tserver::Heartbeater::Thread::DoHeartbeat() ${YB_SRC_ROOT}/src/yb/tserver/heartbeater.cc:650:19 (libtserver.so+0x23d05f)
    #9 yb::tserver::Heartbeater::Thread::RunThread() ${YB_SRC_ROOT}/src/yb/tserver/heartbeater.cc:697:16 (libtserver.so+0x23d74d)
    #10 decltype(*std::declval<yb::tserver::Heartbeater::Thread*&>().*std::declval<void (yb::tserver::Heartbeater::Thread::*&)()>()()) std::__invoke[abi:ue170006]<void (yb::tserver::Heartbeater::Thread::*&)(), yb::tserver::Heartbeater::Thread*&, void>(void (yb::tserver::Heartbeater::Thread::*&)(), yb::tserver::Heartbeater::Thread*&) ${YB_THIRDPARTY_DIR}/installed/tsan/libcxx/include/c++/v1/__type_traits/invoke.h:308:25 (libtserver.so+0x24206b)
...

  Previous write of size 8 at 0x7b4000000bf0 by main thread:
    #0 boost::intrusive_ptr<yb::Status::State>::swap(boost::intrusive_ptr<yb::Status::State>&) ${YB_THIRDPARTY_DIR}/installed/tsan/include/boost/smart_ptr/intrusive_ptr.hpp:210:16 (libyb_util.so+0x3c5c54)
    #1 boost::intrusive_ptr<yb::Status::State>::operator=(boost::intrusive_ptr<yb::Status::State>&&) ${YB_THIRDPARTY_DIR}/installed/tsan/include/boost/smart_ptr/intrusive_ptr.hpp:122:61 (libyb_util.so+0x3c5c54)
    #2 yb::Status::operator=(yb::Status&&) ${YB_SRC_ROOT}/src/yb/util/status.h:98:7 (libyb_util.so+0x3c5c54)
    #3 yb::MockClock::Set(yb::PhysicalTime const&) ${YB_SRC_ROOT}/src/yb/util/physical_time.cc:147:16 (libyb_util.so+0x3c5c54)
    #4 yb::ClockSynchronizationTest_TestClockSkewError_Test::TestBody() ${YB_SRC_ROOT}/src/yb/integration-tests/clock_synchronization-itest.cc:131:15 (clock_synchronization-itest+0x12e3ca)
    #5 void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) ${YB_THIRDPARTY_DIR}/src/googletest-1.12.1/googletest/src/gtest.cc:2599:10 (libgtest.so.1.12.1+0x894f9)
    #6 void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) ${YB_THIRDPARTY_DIR}/src/googletest-1.12.1/googletest/src/gtest.cc:2635:14 (libgtest.so.1.12.1+0x894f9)
    #7 testing::Test::Run() ${YB_THIRDPARTY_DIR}/src/googletest-1.12.1/googletest/src/gtest.cc:2674:5 (libgtest.so.1.12.1+0x6123f)
    #8 testing::TestInfo::Run() ${YB_THIRDPARTY_DIR}/src/googletest-1.12.1/googletest/src/gtest.cc:2853:11 (libgtest.so.1.12.1+0x62a05)
    #9 testing::TestSuite::Run() ${YB_THIRDPARTY_DIR}/src/googletest-1.12.1/googletest/src/gtest.cc:3012:30 (libgtest.so.1.12.1+0x63f04)
    #10 testing::internal::UnitTestImpl::RunAllTests() ${YB_THIRDPARTY_DIR}/src/googletest-1.12.1/googletest/src/gtest.cc:5870:44 (libgtest.so.1.12.1+0x7be3d)
...

**SUMMARY**: ThreadSanitizer: data race ${YB_THIRDPARTY_DIR}/installed/tsan/include/boost/smart_ptr/intrusive_ptr.hpp:181:16 in boost::intrusive_ptr<yb::Status::State>::get() const
```

### Fix

Do what `value_` already does: wrap `mock_status_` in `boost::atomic`.
Jira: DB-13604
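A minimal sketch of the idea, with loud assumptions: the mock status is modeled as a `std::shared_ptr<const std::string>` (null meaning OK) instead of `yb::Status`, and the standard `std::atomic_load`/`std::atomic_store` overloads for `std::shared_ptr` stand in for `boost::atomic`. `MockClockSketch` and its methods are hypothetical. The point it illustrates is that both the setter (the test's main thread) and the reader (the heartbeater thread) go through atomic operations, so there is no unsynchronized read/write pair for TSAN to report. In C++20, `std::atomic<std::shared_ptr<T>>` is the non-deprecated alternative to these free functions.

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <string>
#include <thread>
#include <utility>

// Hypothetical mock clock whose error status is written by one thread and
// read by another. Assumption: a null pointer means "OK".
class MockClockSketch {
 public:
  void SetError(std::string message) {
    std::shared_ptr<const std::string> status =
        std::make_shared<std::string>(std::move(message));
    std::atomic_store(&mock_status_, std::move(status));
  }

  void ClearError() {
    std::atomic_store(&mock_status_, std::shared_ptr<const std::string>());
  }

  // Reader side (the heartbeater thread in the report above).
  bool StatusOk() const { return std::atomic_load(&mock_status_) == nullptr; }

 private:
  std::shared_ptr<const std::string> mock_status_;  // only accessed atomically
};
```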

Test Plan:
Jenkins

Ran

```
./yb_build.sh tsan --cxx-test integration-tests_clock_synchronization-itest --gtest_filter ClockSynchronizationTest.TestClockSkewError -n 50
```

Reviewers: asrivastava

Reviewed By: asrivastava

Subscribers: ybase

Differential Revision: https://phorge.dev.yugabyte.com/D39315
premkumr pushed a commit that referenced this pull request Nov 20, 2024
…l connection manager

Summary:
Use pthread_attr_setstacksize to set the thread stack size to 512 KB in the YSQL connection manager. This fixes crashes seen on Alma 9 machines, such as:
```
#0  0x000055e430191fd7 in tcmalloc::tcmalloc_internal::PageTracker::Get(tcmalloc::tcmalloc_internal::Length) ()
#1  0x000055e430192774 in tcmalloc::tcmalloc_internal::HugePageFiller<tcmalloc::tcmalloc_internal::PageTracker>::TryGet(tcmalloc::tcmalloc_internal::Length, unsigned long)
    ()
#2  0x000055e430160b3a in tcmalloc::tcmalloc_internal::HugePageAwareAllocator::New(tcmalloc::tcmalloc_internal::Length, unsigned long) ()
#3  0x000055e4301437a2 in void* tcmalloc::tcmalloc_internal::SampleifyAllocation<tcmalloc::tcmalloc_internal::Static, tcmalloc::tcmalloc_internal::TCMallocPolicy<tcmalloc::tcmalloc_internal::MallocOomPolicy, tcmalloc::tcmalloc_internal::MallocAlignPolicy, tcmalloc::tcmalloc_internal::AllocationAccessHotPolicy, tcmalloc::tcmalloc_internal::InvokeHooksPolicy, tcmalloc::tcmalloc_internal::LocalNumaPartitionPolicy> >(tcmalloc::tcmalloc_internal::Static&, tcmalloc::tcmalloc_internal::TCMallocPolicy<tcmalloc::tcmalloc_internal::MallocOomPolicy, tcmalloc::tcmalloc_internal::MallocAlignPolicy, tcmalloc::tcmalloc_internal::AllocationAccessHotPolicy, tcmalloc::tcmalloc_internal::InvokeHooksPolicy, tcmalloc::tcmalloc_internal::LocalNumaPartitionPolicy>, unsigned long, unsigned long, unsigned long, void*, tcmalloc::tcmalloc_internal::Span*, unsigned long*) ()
#4  0x000055e43014339b in void* slow_alloc<tcmalloc::tcmalloc_internal::TCMallocPolicy<tcmalloc::tcmalloc_internal::MallocOomPolicy, tcmalloc::tcmalloc_internal::MallocAlignPolicy, tcmalloc::tcmalloc_internal::AllocationAccessHotPolicy, tcmalloc::tcmalloc_internal::InvokeHooksPolicy, tcmalloc::tcmalloc_internal::LocalNumaPartitionPolicy>, decltype(nullptr)>(tcmalloc::tcmalloc_internal::TCMallocPolicy<tcmalloc::tcmalloc_internal::MallocOomPolicy, tcmalloc::tcmalloc_internal::MallocAlignPolicy, tcmalloc::tcmalloc_internal::AllocationAccessHotPolicy, tcmalloc::tcmalloc_internal::InvokeHooksPolicy, tcmalloc::tcmalloc_internal::LocalNumaPartitionPolicy>, unsigned long, decltype(nullptr))
    ()
#5  0x000055e4301404e6 in malloc ()
#6  0x00007fc3a67b1d7e in ssl3_setup_write_buffer () from /home/centos/code/local_testing/conn_manager_ssl_auth/yugabyte-b124/bin/../lib/yb-thirdparty/libssl.so.3
#7  0x00007fc3a67ae824 in do_ssl3_write () from /home/centos/code/local_testing/conn_manager_ssl_auth/yugabyte-b124/bin/../lib/yb-thirdparty/libssl.so.3
#8  0x00007fc3a67ae3e1 in ssl3_write_bytes () from /home/centos/code/local_testing/conn_manager_ssl_auth/yugabyte-b124/bin/../lib/yb-thirdparty/libssl.so.3
#9  0x00007fc3a67cc251 in ssl3_do_write () from /home/centos/code/local_testing/conn_manager_ssl_auth/yugabyte-b124/bin/../lib/yb-thirdparty/libssl.so.3
#10 0x00007fc3a67c1a6a in state_machine () from /home/centos/code/local_testing/conn_manager_ssl_auth/yugabyte-b124/bin/../lib/yb-thirdparty/libssl.so.3
#11 0x000055e43013d303 in mm_tls_handshake_cb (handle=<optimized out>) at ../../src/odyssey/third_party/machinarium/sources/tls.c:453
#12 0x000055e43013b9e7 in mm_epoll_step (poll=0x37beffd71a60, timeout=<optimized out>) at ../../src/odyssey/third_party/machinarium/sources/epoll.c:79
#13 0x000055e43013b386 in mm_loop_step (loop=0x37beffd72980) at ../../src/odyssey/third_party/machinarium/sources/loop.c:64
#14 machine_main (arg=0x37beffd72780) at ../../src/odyssey/third_party/machinarium/sources/machine.c:56
#15 0x00007fc3a5e89c02 in start_thread (arg=<optimized out>) at pthread_create.c:443
#16 0x00007fc3a5f0ec40 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
```
Note that this 512 KB value is the same as the one used by the tserver and master processes through the min_thread_stack_size_bytes GFlag introduced in https://phorge.dev.yugabyte.com/D38053.
Jira: DB-13388
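The commit message does not show the code, so the following is only a sketch of how a thread can be created with an explicit 512 KB stack through the POSIX thread attribute API (`pthread_attr_init`, `pthread_attr_setstacksize`, `pthread_create`). `WorkerMain` and `RunWithExplicitStack` are hypothetical names; how the connection manager actually wires this into its machine threads is not shown here and is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <pthread.h>

// 512 KB, the stack size quoted in the commit message.
constexpr std::size_t kThreadStackSize = 512 * 1024;

// Hypothetical thread entry point; records that it ran.
void* WorkerMain(void* arg) {
  *static_cast<int*>(arg) = 1;
  return nullptr;
}

// Creates a thread with an explicit stack size (instead of the platform
// default), joins it, and returns the stack size read back from the
// attributes, or 0 if any pthread call failed.
std::size_t RunWithExplicitStack(int* ran) {
  pthread_attr_t attr;
  if (pthread_attr_init(&attr) != 0) {
    return 0;
  }
  std::size_t configured = 0;
  pthread_t tid;
  bool ok = pthread_attr_setstacksize(&attr, kThreadStackSize) == 0 &&
            pthread_attr_getstacksize(&attr, &configured) == 0 &&
            pthread_create(&tid, &attr, WorkerMain, ran) == 0;
  if (ok) {
    pthread_join(tid, nullptr);
  }
  pthread_attr_destroy(&attr);
  return ok ? configured : 0;
}
```

Setting the size explicitly makes the thread's stack independent of the platform's default, which is what keeps the value consistent across operating systems.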

Test Plan: Jenkins: enable connection manager, all tests

Reviewers: skumar, stiwary

Reviewed By: stiwary

Subscribers: yql

Differential Revision: https://phorge.dev.yugabyte.com/D40087
premkumr pushed a commit that referenced this pull request Dec 5, 2024
… ALTER TABLE

Summary:
In some specific cases, a failed ALTER TABLE operation does not adequately invalidate the table cache, causing a "schema mismatch" error in the subsequent command. Consider the following example:

```
CREATE TABLE pk(a int primary key);
INSERT INTO pk values (1);
CREATE TABLE fk(a int);
INSERT INTO fk values (2);
ALTER TABLE fk ADD FOREIGN KEY (a) REFERENCES pk; -- fails due to FK constraint violation
BEGIN;
SELECT * from pk; -- throws schema version mismatch error
COMMIT;
```

Before the start of ALTER TABLE, the schema version of both fk and pk is zero.

The above ALTER TABLE does the following:
1. Increments the schema version of both relations to 1 (YBCPrepareAlterTableCmd() increments the schema version of both the main relation and its dependent relations).
2. Invalidates any pre-existing table schema of the two relations (ATRewriteCatalogs()).
3. Loads the schema of both relations (version 1) to check for an FK violation, and finds one (ATRewriteTables()).
4. ybAlteredTableIds contains only the oid corresponding to fk. Hence, only fk's table cache is invalidated during error recovery. This is where the bug is (explained below).
5. DDL txn verification on master increments the schema version of both the relations to 2 (see YsqlDdlTxnAlterTableHelper()).

Because of step 4, the table cache still contains the stale entry for pk (version 1). The subsequent SELECT uses it, which leads to the "schema version mismatch, expected 2, got 1" error.

Resolution:
On a failure, in step 4, invalidate the table cache entries of all relations whose schema version is incremented at the start (step 1) and again at the end by YBResetDdlState() (step 5). This is done by including the OIDs of the dependent relations in `ybAlteredTableIds`.
Jira: DB-14126
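The bug and the fix can be modeled with a small, hypothetical simulation. None of these names are real YSQL functions; `altered_table_ids` only plays the role of `ybAlteredTableIds`, and the two maps are a toy model of the catalog's schema versions and the backend's table cache. It replays steps 1 to 5 for the `fk`/`pk` example and shows that the stale `pk` entry survives error recovery unless the dependent relation is also recorded for invalidation.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>

using Oid = std::uint32_t;

// Hypothetical, simplified model of one session.
struct Session {
  std::map<Oid, int> catalog_version;  // schema version on the master
  std::map<Oid, int> table_cache;      // schema version cached in the backend
  std::set<Oid> altered_table_ids;     // plays the role of ybAlteredTableIds

  void LoadSchema(Oid rel) { table_cache[rel] = catalog_version[rel]; }

  // Replays the failing ALTER TABLE ... ADD FOREIGN KEY on `main_rel`, which
  // also bumps the schema version of `dependent_rel` (the referenced table).
  void FailedAlterAddFk(Oid main_rel, Oid dependent_rel, bool track_dependents) {
    ++catalog_version[main_rel];  // step 1: both relations bumped to version 1
    ++catalog_version[dependent_rel];
    altered_table_ids.insert(main_rel);
    if (track_dependents) {
      altered_table_ids.insert(dependent_rel);  // the fix
    }
    table_cache.erase(main_rel);  // step 2: invalidate pre-existing entries
    table_cache.erase(dependent_rel);
    LoadSchema(main_rel);  // step 3: FK check loads version 1 of both
    LoadSchema(dependent_rel);
    for (Oid rel : altered_table_ids) {
      table_cache.erase(rel);  // step 4: error recovery invalidation
    }
    altered_table_ids.clear();
    ++catalog_version[main_rel];  // step 5: DDL verification bumps both to 2
    ++catalog_version[dependent_rel];
  }

  // True if a query on `rel` would use a stale cached schema.
  bool WouldMismatch(Oid rel) const {
    auto cached = table_cache.find(rel);
    return cached != table_cache.end() && cached->second != catalog_version.at(rel);
  }
};
```

With `track_dependents` set to false, `pk` keeps a cached version of 1 while the catalog is at 2, which corresponds to the "expected 2, got 1" mismatch; with it set to true, both entries are dropped and reloaded on next use.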

Test Plan:
   ./yb_build.sh --java-test 'org.yb.pgsql.TestPgRegressTable'

Reviewers: fizaa

Reviewed By: fizaa

Subscribers: yql

Differential Revision: https://phorge.dev.yugabyte.com/D40275
ddhodge pushed a commit that referenced this pull request Dec 14, 2024
…kOperation::GetDocPaths

Summary:
```
../../src/yb/docdb/conflict_resolution.cc:865:5: runtime error: load of value 4458368, which is not a valid value for type 'IsolationLevel'
[m-1] W1212 08:27:12.423846 99051 master_heartbeat_service.cc:426] Could not get YSQL db catalog versions for heartbeat response:
[m-1] W1212 08:27:12.425787 99052 master_heartbeat_service.cc:426] Could not get YSQL db catalog versions for heartbeat response:
    #0 0x7fd9ae920c0e in yb::docdb::(anonymous namespace)::GetWriteRequestIntents(std::vector<std::unique_ptr<yb::docdb::DocOperation, std::default_delete<yb::docdb::DocOperation>>, std::allocator<std::unique_ptr<yb::docdb::DocOperation, std::default_delete<yb::docdb::DocOperation>>>> const&, yb::dockv::KeyBytes*, yb::StronglyTypedBool<yb::dockv::PartialRangeKeyIntents_Tag>, yb::IsolationLevel) ${YB_SRC_ROOT}/src/yb/docdb/conflict_resolution.cc:865:5
    #1 0x7fd9ae9190ce in yb::docdb::(anonymous namespace)::TransactionConflictResolverContext::GetRequestedIntents(yb::docdb::(anonymous namespace)::ConflictResolver*, yb::dockv::KeyBytes*) ${YB_SRC_ROOT}/src/yb/docdb/conflict_resolution.cc:1130:22
    #2 0x7fd9ae90fcee in yb::docdb::(anonymous namespace)::TransactionConflictResolverContext::ReadConflicts(yb::docdb::(anonymous namespace)::ConflictResolver*) ${YB_SRC_ROOT}/src/yb/docdb/conflict_resolution.cc:1164:22
    #3 0x7fd9ae8fd333 in yb::docdb::(anonymous namespace)::ConflictResolver::Resolve() ${YB_SRC_ROOT}/src/yb/docdb/conflict_resolution.cc:198:26
    #4 0x7fd9ae8fc3b2 in yb::docdb::(anonymous namespace)::WaitOnConflictResolver::TryPreWait() ${YB_SRC_ROOT}/src/yb/docdb/conflict_resolution.cc:697:25
    #5 0x7fd9ae8fc3b2 in yb::docdb::(anonymous namespace)::WaitOnConflictResolver::Run() ${YB_SRC_ROOT}/src/yb/docdb/conflict_resolution.cc:670:7
    #6 0x7fd9ae8fa47d in yb::docdb::ResolveTransactionConflicts(std::vector<std::unique_ptr<yb::docdb::DocOperation, std::default_delete<yb::docdb::DocOperation>>, std::allocator<std::unique_ptr<yb::docdb::DocOperation, std::default_delete<yb::docdb::DocOperation>>>> const&, yb::docdb::ConflictManagementPolicy, yb::docdb::LWKeyValueWriteBatchPB const&, yb::HybridTime, yb::HybridTime, long, unsigned long, long, yb::docdb::DocDB const&, yb::StronglyTypedBool<yb::dockv::PartialRangeKeyIntents_Tag>, yb::TransactionStatusManager*, yb::tablet::TabletMetrics*, yb::docdb::LockBatch*, yb::docdb::WaitQueue*, std::chrono::time_point<yb::CoarseMonoClock, std::chrono::duration<long long, std::ratio<1l, 1000000000l>>>, boost::function<void (yb::Result<yb::HybridTime> const&)>) ${YB_SRC_ROOT}/src/yb/docdb/conflict_resolution.cc:1401:15
    #7 0x7fd9aff22ca9 in yb::tablet::WriteQuery::DoExecute() ${YB_SRC_ROOT}/src/yb/tablet/write_query.cc:801:10
    #8 0x7fd9aff1fda3 in yb::tablet::WriteQuery::Execute(std::unique_ptr<yb::tablet::WriteQuery, std::default_delete<yb::tablet::WriteQuery>>) ${YB_SRC_ROOT}/src/yb/tablet/write_query.cc:618:28
    #9 0x7fd9afb9b708 in yb::tablet::Tablet::AcquireLocksAndPerformDocOperations(std::unique_ptr<yb::tablet::WriteQuery, std::default_delete<yb::tablet::WriteQuery>>) ${YB_SRC_ROOT}/src/yb/tablet/tablet.cc:2147:3
    #10 0x7fd9afd166d7 in yb::tablet::TabletPeer::WriteAsync(std::unique_ptr<yb::tablet::WriteQuery, std::default_delete<yb::tablet::WriteQuery>>) ${YB_SRC_ROOT}/src/yb/tablet/tablet_peer.cc:704:12
    #11 0x7fd9b0fd4d57 in yb::tserver::TabletServiceImpl::PerformWrite(yb::tserver::WriteRequestPB const*, yb::tserver::WriteResponsePB*, yb::rpc::RpcContext*) ${YB_SRC_ROOT}/src/yb/tserver/tablet_service.cc:2325:16
    #12 0x7fd9b0fd711a in yb::tserver::TabletServiceImpl::Write(yb::tserver::WriteRequestPB const*, yb::tserver::WriteResponsePB*, yb::rpc::RpcContext) ${YB_SRC_ROOT}/src/yb/tserver/tablet_service.cc:2345:17
    #13 0x7fd9a91098ec in yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_0::operator()(std::shared_ptr<yb::rpc::InboundCall>) const::'lambda'(yb::tserver::WriteRequestPB const*, yb::tserver::WriteResponsePB*, yb::rpc::RpcContext)::operator()(yb::tserver::WriteRequestPB const*, yb::tserver::WriteResponsePB*, yb::rpc::RpcContext) const ${BUILD_ROOT}/src/yb/tserver/tserver_service.service.cc:848:9
    #14 0x7fd9a91098ec in auto yb::rpc::HandleCall<yb::rpc::RpcCallPBParamsImpl<yb::tserver::WriteRequestPB, yb::tserver::WriteResponsePB>, yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_0::operator()(std::shared_ptr<yb::rpc::InboundCall>) const::'lambda'(yb::tserver::WriteRequestPB const*, yb::tserver::WriteResponsePB*, yb::rpc::RpcContext)>(std::shared_ptr<yb::rpc::InboundCall>, yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_0::operator()(std::shared_ptr<yb::rpc::InboundCall>) const::'lambda'(yb::tserver::WriteRequestPB const*, yb::tserver::WriteResponsePB*, yb::rpc::RpcContext)) ${YB_SRC_ROOT}/src/yb/rpc/local_call.h:126:7
    #15 0x7fd9a91098ec in yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_0::operator()(std::shared_ptr<yb::rpc::InboundCall>) const ${BUILD_ROOT}/src/yb/tserver/tserver_service.service.cc:846:7
    #16 0x7fd9a91098ec in decltype(std::declval<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_0&>()(std::declval<std::shared_ptr<yb::rpc::InboundCall>>())) std::__invoke[abi:ue170006]<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_0&, std::shared_ptr<yb::rpc::InboundCall>>(yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_0&, std::shared_ptr<yb::rpc::InboundCall>&&) ${YB_THIRDPARTY_DIR}/installed/asan/libcxx/include/c++/v1/__type_traits/invoke.h:340:25
    #17 0x7fd9a91098ec in void std::__invoke_void_return_wrapper<void, true>::__call[abi:ue170006]<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_0&, std::shared_ptr<yb::rpc::InboundCall>>(yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_0&, std::shared_ptr<yb::rpc::InboundCall>&&) ${YB_THIRDPARTY_DIR}/installed/asan/libcxx/include/c++/v1/__type_traits/invoke.h:415:5
    #18 0x7fd9a91098ec in std::__function::__alloc_func<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_0, std::allocator<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_0>, void (std::shared_ptr<yb::rpc::InboundCall>)>::operator()[abi:ue170006](std::shared_ptr<yb::rpc::InboundCall>&&) ${YB_THIRDPARTY_DIR}/installed/asan/libcxx/include/c++/v1/__functional/function.h:192:16
    #19 0x7fd9a91098ec in std::__function::__func<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_0, std::allocator<yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_0>, void (std::shared_ptr<yb::rpc::InboundCall>)>::operator()(std::shared_ptr<yb::rpc::InboundCall>&&) ${YB_THIRDPARTY_DIR}/installed/asan/libcxx/include/c++/v1/__functional/function.h:363:12
    #20 0x7fd9a9108b34 in std::__function::__value_func<void (std::shared_ptr<yb::rpc::InboundCall>)>::operator()[abi:ue170006](std::shared_ptr<yb::rpc::InboundCall>&&) const ${YB_THIRDPARTY_DIR}/installed/asan/libcxx/include/c++/v1/__functional/function.h:517:16
    #21 0x7fd9a9108b34 in std::function<void (std::shared_ptr<yb::rpc::InboundCall>)>::operator()(std::shared_ptr<yb::rpc::InboundCall>) const ${YB_THIRDPARTY_DIR}/installed/asan/libcxx/include/c++/v1/__functional/function.h:1168:12
    #22 0x7fd9a9108b34 in yb::tserver::TabletServerServiceIf::Handle(std::shared_ptr<yb::rpc::InboundCall>) ${BUILD_ROOT}/src/yb/tserver/tserver_service.service.cc:831:3
    #23 0x7fd9a5b7a892 in yb::rpc::ServicePoolImpl::Handle(std::shared_ptr<yb::rpc::InboundCall>) ${YB_SRC_ROOT}/src/yb/rpc/service_pool.cc:269:19
    #24 0x7fd9a59fcea5 in yb::rpc::InboundCall::InboundCallTask::Run() ${YB_SRC_ROOT}/src/yb/rpc/inbound_call.cc:317:13
    #25 0x7fd9a5bad15d in yb::rpc::(anonymous namespace)::Worker::Execute() ${YB_SRC_ROOT}/src/yb/rpc/thread_pool.cc:115:15
    #26 0x7fd9a42ad037 in std::__function::__value_func<void ()>::operator()[abi:ue170006]() const ${YB_THIRDPARTY_DIR}/installed/asan/libcxx/include/c++/v1/__functional/function.h:517:16
    #27 0x7fd9a42ad037 in std::function<void ()>::operator()() const ${YB_THIRDPARTY_DIR}/installed/asan/libcxx/include/c++/v1/__functional/function.h:1168:12
    #28 0x7fd9a42ad037 in yb::Thread::SuperviseThread(void*) ${YB_SRC_ROOT}/src/yb/util/thread.cc:895:3
    #29 0x56176114abea in asan_thread_start(void*) ${YB_LLVM_TOOLCHAIN_DIR}/src/llvm-project/compiler-rt/lib/asan/asan_interceptors.cpp:225:31
    #30 0x7fd99f1071c9 in start_thread (/lib64/libpthread.so.0+0x81c9) (BuildId: 1962602ac5dc3011b6d697b38b05ddc244197114)
    #31 0x7fd99eb488d2 in clone (/lib64/libc.so.6+0x398d2) (BuildId: 37e4ac6a7fb96950b0e6bf72d73d94f3296c77eb)

UndefinedBehaviorSanitizer: undefined-behavior ../../src/yb/docdb/conflict_resolution.cc:865:5 in
```

IsolationLevel is left uninitialized in `PgsqlLockOperation::GetDocPaths`, so ASAN builds can hit the `not a valid value` failure.
Jira: DB-14458
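
The UBSan report comes from reading an IsolationLevel that was never assigned. The general remedy is to give the value an explicit initial value; below is a minimal C++ sketch of one way to do that, a default member initializer. The enum is a local stand-in loosely modeled on an isolation-level enum, `LockRequestOptions` and `IsValidIsolationLevel` are hypothetical, and the commit does not say which value the real fix uses, so the `NON_TRANSACTIONAL` default is an assumption.

```cpp
#include <cassert>

// Local stand-in, not copied from YugabyteDB. It is unscoped with no fixed
// underlying type, so its valid values are only 0..3; loading any other
// value (such as 4458368) is undefined behavior, which UBSan reports.
enum IsolationLevel {
  NON_TRANSACTIONAL = 0,
  SNAPSHOT_ISOLATION = 1,
  SERIALIZABLE_ISOLATION = 2,
  READ_COMMITTED = 3,
};

// Hypothetical options struct. The default member initializer means every
// construction path yields a valid enumerator, rather than whatever bytes
// happened to occupy that memory.
struct LockRequestOptions {
  IsolationLevel isolation_level = NON_TRANSACTIONAL;  // assumed default
};

// Range check for values that arrive as raw integers; it runs on the int,
// before the value is ever stored in an IsolationLevel.
bool IsValidIsolationLevel(int raw) {
  return raw >= NON_TRANSACTIONAL && raw <= READ_COMMITTED;
}
```

A default-constructed `LockRequestOptions` always holds `NON_TRANSACTIONAL`, and `IsValidIsolationLevel(4458368)` is false, so such a value is rejected before it can be loaded as the enum type.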

Test Plan: advisory_lock-test

Reviewers: bkolagani

Reviewed By: bkolagani

Subscribers: ybase, yql

Differential Revision: https://phorge.dev.yugabyte.com/D40638
premkumr pushed a commit that referenced this pull request Jan 10, 2025
…work

Summary:
Fix an ASAN failure in tests that run the isolation regress
framework: org.yb.pgsql.TestPgRegressWaitQueues,
org.yb.pgsql.TestPgRegressIsolationWithoutWaitQueues, and
org.yb.pgsql.TestPgRegressIsolation.

The failure is:
```
+=================================================================
+==31476==ERROR: LeakSanitizer: detected memory leaks
+
+Direct leak of 864 byte(s) in 4 object(s) allocated from:
+    #0 0x55fc1116466e in malloc /opt/yb-build/llvm/yb-llvm-v17.0.6-yb-1-1720414757-9b881774-almalinux8-x86_64-build/src/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:69:3
+    #1 0x7f94aa200490 in PQmakeEmptyPGresult /share/jenkins/workspace/github-yugabyte-db-alma8-master-clang17-asan/yugabyte-db/src/postgres/src/interfaces/libpq/../../../../../../src/postgres/src/interfaces/libpq/fe-exec.c:164:24
+    #2 0x7f94aa21df9d in getRowDescriptions /share/jenkins/workspace/github-yugabyte-db-alma8-master-clang17-asan/yugabyte-db/src/postgres/src/interfaces/libpq/../../../../../../src/postgres/src/interfaces/libpq/fe-protocol3.c
+    #3 0x7f94aa21c0bc in pqParseInput3 /share/jenkins/workspace/github-yugabyte-db-alma8-master-clang17-asan/yugabyte-db/src/postgres/src/interfaces/libpq/../../../../../../src/postgres/src/interfaces/libpq/fe-protocol3.c:324:11
+    #4 0x7f94aa207028 in parseInput /share/jenkins/workspace/github-yugabyte-db-alma8-master-clang17-asan/yugabyte-db/src/postgres/src/interfaces/libpq/../../../../../../src/postgres/src/interfaces/libpq/fe-exec.c:2014:2
+    #5 0x7f94aa207028 in PQgetResult /share/jenkins/workspace/github-yugabyte-db-alma8-master-clang17-asan/yugabyte-db/src/postgres/src/interfaces/libpq/../../../../../../src/postgres/src/interfaces/libpq/fe-exec.c:2100:3
+    #6 0x7f94aa208437 in PQexecFinish /share/jenkins/workspace/github-yugabyte-db-alma8-master-clang17-asan/yugabyte-db/src/postgres/src/interfaces/libpq/../../../../../../src/postgres/src/interfaces/libpq/fe-exec.c:2417:19
+    #7 0x7f94aa208437 in PQexecParams /share/jenkins/workspace/github-yugabyte-db-alma8-master-clang17-asan/yugabyte-db/src/postgres/src/interfaces/libpq/../../../../../../src/postgres/src/interfaces/libpq/fe-exec.c:2279:9
+    #8 0x55fc111a2881 in main /share/jenkins/workspace/github-yugabyte-db-alma8-master-clang17-asan/yugabyte-db/src/postgres/src/test/isolation/../../../../../../src/postgres/src/test/isolation/isolationtester.c:201:9
+    #9 0x7f94a8caa7e4 in __libc_start_main (/lib64/libc.so.6+0x3a7e4) (BuildId: 37e4ac6a7fb96950b0e6bf72d73d94f3296c77eb)
+
+Objects leaked above:
+0x511000004640 (216 bytes)
+0x511000008ec0 (216 bytes)
+0x5110000133c0 (216 bytes)
+0x5110000179c0 (216 bytes)
```
Jira: DB-13172
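
The trace shows four 216-byte objects allocated by PQmakeEmptyPGresult() under PQexecParams(), called from main() in isolationtester.c, that are never released; libpq requires every PGresult to be freed with PQclear(). The commit message does not describe the actual fix, so below is only a minimal C++ sketch of one common pattern for this class of leak, scope-bound ownership via `std::unique_ptr` with a custom deleter. `Result`, `make_result`, `clear_result`, and `run_query` are hypothetical stand-ins for PGresult, PQexecParams(), and PQclear(), so the sketch runs without a server.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-ins for a C API that returns heap-allocated results the
// caller must free, the way libpq's PQexecParams() returns a PGresult* that
// must be released with PQclear().
struct Result {
  int rows;
};

static int g_live_results = 0;  // outstanding results, for demonstration only

Result* make_result(int rows) {
  Result* r = new Result{rows};
  ++g_live_results;
  return r;
}

void clear_result(Result* r) {
  if (r != nullptr) --g_live_results;
  delete r;
}

// Owning handle: the deleter runs when the handle goes out of scope on any
// path (end of a loop iteration, early return, exception), so no result leaks.
using ResultPtr = std::unique_ptr<Result, decltype(&clear_result)>;

ResultPtr run_query(int rows) {
  return ResultPtr(make_result(rows), &clear_result);
}
```

In plain C, which is what isolationtester.c is written in, the equivalent discipline is calling PQclear() on every PGresult once it is no longer needed, including results that are only inspected for errors.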

Test Plan: Jenkins: test regex: .*RegressIsolation.*|.*RegressWaitQueues.*

Reviewers: patnaik.balivada

Reviewed By: patnaik.balivada

Subscribers: jason, yql

Differential Revision: https://phorge.dev.yugabyte.com/D40987