[YCQL][ASH] Test failures when ASH is enabled by default #23251

Closed
1 task done
abhinab-yb opened this issue Jul 22, 2024 · 0 comments
Assignees
Labels
area/ycql Yugabyte CQL (YCQL) kind/bug This issue is a bug priority/medium Medium priority issue


abhinab-yb commented Jul 22, 2024

Jira Link: DB-12182

Description

There are a few test failures when ASH is enabled by default.

org.yb.cql.TestBatchRequest#testRecreateTable2, org.yb.cql.TestIndex#testPagingSelect, org.yb.cql.TestAudit#batchWithStaleMetadata, org.yb.cql.TestIndex#testRecreateTable, org.yb.cql.TestOrderedColumns#testSingleTypeCreateOrders, org.yb.cql.TestInsertValues#testInsertIntoRecreatedTable, org.yb.cql.TestPrepareExecute#testDDLKeyspaceResolution, org.yb.cql.TestOrderedColumns#testSingleTypeScanOrders

These tests fail with the following stack trace:

ts2|pid22980|:21491 F20240715 13:09:45 ../../src/yb/yql/cql/ql/exec/executor.cc:1294] No wait state here.
ts2|pid22980|:21491 ${YB_SRC_ROOT}/src/yb/util/logging.cc:464:     @     0x7efddca1a585  yb::LogFatalHandlerSink::send(int, char const*, char const*, int, tm const*, char const*, unsigned long)
ts2|pid22980|:21491     @     0x7efdd91e4e73 
ts2|pid22980|:21491     @     0x7efdd91cd160 
ts2|pid22980|:21491     @     0x7efdd91d07b5 
ts2|pid22980|:21491     @     0x7efdd91cfe02 
ts2|pid22980|:21491 ${YB_SRC_ROOT}/src/yb/yql/cql/ql/exec/executor.cc:1294:     @     0x7efde3170ca0  yb::ql::Executor::ExecPTNode(yb::ql::PTInsertStmt const*, yb::ql::TnodeContext*)
ts2|pid22980|:21491 ${YB_SRC_ROOT}/src/yb/yql/cql/ql/exec/executor.cc:344:     @     0x7efde315d810  yb::ql::Executor::ExecTreeNode(yb::ql::TreeNode const*)
ts2|pid22980|:21491 ${YB_SRC_ROOT}/src/yb/yql/cql/ql/exec/executor.cc:269:     @     0x7efde31573e9  yb::ql::Executor::Execute(yb::ql::ParseTree const&, yb::ql::StatementParameters const&)
ts2|pid22980|:21491 ${YB_SRC_ROOT}/src/yb/yql/cql/ql/exec/executor.cc:251:     @     0x7efde315c67a  yb::ql::Executor::ExecuteAsync(std::vector<std::pair<std::reference_wrapper<yb::ql::ParseTree const>, std::reference_wrapper<yb::ql::StatementParameters const>>, std::allocator<std::pair<std::reference_wrapper<yb::ql::ParseTree const>, std::reference_wrapper<yb::ql::StatementParameters const>>>> const&, yb::Callback<void (yb::Status const&, shared_ptr<yb::ql::ExecutedResult> const&)>)
ts2|pid22980|:21491 ${YB_SRC_ROOT}/src/yb/yql/cql/ql/ql_processor.cc:437:     @     0x7efde3226235  yb::ql::QLProcessor::ExecuteAsync(std::vector<std::pair<std::reference_wrapper<yb::ql::ParseTree const>, std::reference_wrapper<yb::ql::StatementParameters const>>, std::allocator<std::pair<std::reference_wrapper<yb::ql::ParseTree const>, std::reference_wrapper<yb::ql::StatementParameters const>>>> const&, yb::Callback<void (yb::Status const&, shared_ptr<yb::ql::ExecutedResult> const&)>)
ts2|pid22980|:21491 ${YB_SRC_ROOT}/src/yb/yql/cql/cqlserver/cql_processor.cc:558:     @     0x7efde32fb89a  yb::cqlserver::CQLProcessor::ProcessRequest(yb::ql::BatchRequest const&)
ts2|pid22980|:21491 ${YB_SRC_ROOT}/src/yb/yql/cql/cqlserver/cql_processor.cc:359:     @     0x7efde32f3aef  yb::cqlserver::CQLProcessor::ProcessRequest(yb::ql::CQLRequest const&)
ts2|pid22980|:21491 ../../src/yb/yql/cql/cqlserver/cql_processor.h:181:                                                     @     0x7efde3314102  yb::cqlserver::CQLProcessor::ProcessRequestTask::Run()
ts2|pid22980|:21491 ${YB_SRC_ROOT}/src/yb/rpc/thread_pool.cc:115:     @     0x7efdde4278ed  yb::rpc::(anonymous namespace)::Worker::Execute()
ts2|pid22980|:21491 ${YB_THIRDPARTY_DIR}/installed/asan/libcxx/include/c++/v1/__functional/function.h:517:     @     0x7efddcd141a7  std::__function::__value_func<void ()>::operator()[abi:ue170006]() const
ts2|pid22980|:21491 ${YB_THIRDPARTY_DIR}/installed/asan/libcxx/include/c++/v1/__functional/function.h:1168:     @     0x7efddcd141a7  std::function<void ()>::operator()() const
ts2|pid22980|:21491 ${YB_SRC_ROOT}/src/yb/util/thread.cc:866:     @     0x7efddcd141a7  yb::Thread::SuperviseThread(void*)
ts2|pid22980|:21491 ${YB_LLVM_TOOLCHAIN_DIR}-build/src/llvm-project/compiler-rt/lib/asan/asan_interceptors.cpp:225:     @     0x55a62a9e793a  asan_thread_start
ts2|pid22980|:21491     @     0x7efdd7ba21c9 
ts2|pid22980|:21491     @     0x7efdd75e58d2 

and with this stack trace:

ts1|pid38560|:21749 F20240715 13:07:53 ../../src/yb/ash/wait_state.cc:323]  In virtual void yb::cqlserver::CQLServiceImpl::Handle(yb::rpc::InboundCallPtr) wait-state 0x511000150458 was updated to kOnCpu_Active from kOnCpu_Active but it is currently kYCQL_Parse. Not expecting concurrent updates.
ts1|pid38560|:21749 Trace so far:n/a
ts1|pid38560|:21749 ${YB_SRC_ROOT}/src/yb/util/logging.cc:464:     @     0x7fda272f4585  yb::LogFatalHandlerSink::send(int, char const*, char const*, int, tm const*, char const*, unsigned long)
ts1|pid38560|:21749     @     0x7fda23abee73 
ts1|pid38560|:21749     @     0x7fda23aa7160 
ts1|pid38560|:21749     @     0x7fda23aaa7b5 
ts1|pid38560|:21749     @     0x7fda23aa9e02 
ts1|pid38560|:21749 ${YB_SRC_ROOT}/src/yb/ash/wait_state.cc:323:     @     0x7fda286c8a12  yb::ash::ScopedWaitStatus::~ScopedWaitStatus()
ts1|pid38560|:21749 ${YB_SRC_ROOT}/src/yb/yql/cql/cqlserver/cql_service.cc:232:     @     0x7fda2dc51a9d  yb::cqlserver::CQLServiceImpl::Handle(shared_ptr<yb::rpc::InboundCall>)
ts1|pid38560|:21749 ${YB_SRC_ROOT}/src/yb/rpc/service_pool.cc:269:     @     0x7fda28ccf042  yb::rpc::ServicePoolImpl::Handle(shared_ptr<yb::rpc::InboundCall>)
ts1|pid38560|:21749 ${YB_SRC_ROOT}/src/yb/rpc/inbound_call.cc:314:     @     0x7fda28b552a5  yb::rpc::InboundCall::InboundCallTask::Run()
ts1|pid38560|:21749 ${YB_SRC_ROOT}/src/yb/rpc/thread_pool.cc:115:     @     0x7fda28d018ed  yb::rpc::(anonymous namespace)::Worker::Execute()
ts1|pid38560|:21749 ${YB_THIRDPARTY_DIR}/installed/asan/libcxx/include/c++/v1/__functional/function.h:517:     @     0x7fda275ee1a7  std::__function::__value_func<void ()>::operator()[abi:ue170006]() const
ts1|pid38560|:21749 ${YB_THIRDPARTY_DIR}/installed/asan/libcxx/include/c++/v1/__functional/function.h:1168:     @     0x7fda275ee1a7  std::function<void ()>::operator()() const
ts1|pid38560|:21749 ${YB_SRC_ROOT}/src/yb/util/thread.cc:866:     @     0x7fda275ee1a7  yb::Thread::SuperviseThread(void*)
ts1|pid38560|:21749 ${YB_LLVM_TOOLCHAIN_DIR}-build/src/llvm-project/compiler-rt/lib/asan/asan_interceptors.cpp:225:     @     0x556fc1c1c93a  asan_thread_start
ts1|pid38560|:21749     @     0x7fda2247c1c9 
ts1|pid38560|:21749     @     0x7fda21ebf8d2 
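
The second trace is consistent with a scoped guard that sets a wait state on entry and, in its destructor, expects to still find the value it set. The sketch below illustrates that pattern under simplifying assumptions. `WaitCode`, `WaitState`, `ScopedCode`, and `DetectsConcurrentUpdate` are hypothetical names, not the actual `yb::ash` code, and the conflict is reported through a flag rather than a fatal log.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical, simplified state codes (assumption; the real set is larger).
enum class WaitCode { kIdle, kOnCpuActive, kParse };

struct WaitState {
  std::atomic<WaitCode> code{WaitCode::kIdle};
};

// Sets `code` for the guard's lifetime and restores the previous value on
// destruction. If the value changed while the guard was alive, the guard
// leaves the state alone and records the conflict in `*conflict`.
class ScopedCode {
 public:
  ScopedCode(WaitState* ws, WaitCode code, bool* conflict)
      : ws_(ws), set_to_(code), conflict_(conflict) {
    assert(ws != nullptr && conflict != nullptr);
    prev_ = ws_->code.exchange(code);
  }

  ~ScopedCode() {
    WaitCode expected = set_to_;
    // Restore only if the state is still the one this guard set.
    if (!ws_->code.compare_exchange_strong(expected, prev_)) {
      *conflict_ = true;  // someone else updated the state concurrently
    }
  }

  ScopedCode(const ScopedCode&) = delete;
  ScopedCode& operator=(const ScopedCode&) = delete;

 private:
  WaitState* ws_;
  WaitCode set_to_;
  WaitCode prev_ = WaitCode::kIdle;
  bool* conflict_;
};

// Returns true if the guard observed an update it did not make.
bool DetectsConcurrentUpdate(bool interfere) {
  WaitState ws;
  bool conflict = false;
  {
    ScopedCode guard(&ws, WaitCode::kOnCpuActive, &conflict);
    if (interfere) {
      // Stands in for another thread moving the same wait state.
      ws.code.store(WaitCode::kParse);
    }
  }
  return conflict;
}
```

The direct store stands in for a second thread updating the same wait state while the guard is alive, which is the situation the "Not expecting concurrent updates" message describes.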

Issue Type

kind/bug

Warning: Please confirm that this issue does not contain any sensitive information

  • I confirm this issue does not contain any sensitive information.
@abhinab-yb abhinab-yb added area/ycql Yugabyte CQL (YCQL) status/awaiting-triage Issue awaiting triage labels Jul 22, 2024
@yugabyte-ci yugabyte-ci added kind/bug This issue is a bug priority/medium Medium priority issue labels Jul 22, 2024
@yugabyte-ci yugabyte-ci assigned abhinab-yb and unassigned abhinab-yb Jul 25, 2024
@yugabyte-ci yugabyte-ci removed the status/awaiting-triage Issue awaiting triage label Sep 2, 2024
abhinab-yb added a commit that referenced this issue Sep 2, 2024
Summary:
There seem to be some race conditions in CQL tests due
to D3169. It is not clear from that diff's summary what it
was supposed to fix, so this diff reverts the code to
reading and writing `request_` non-atomically.

Running a mini cluster with both YSQL and YCQL seems
to be slow in TSAN mode. The CQL driver seems to time out
most of the time while connecting to the cluster. This
diff fixes this by using either YSQL or YCQL, not both,
while running the tests.

This diff also enables concurrent updates in some CQL
wait states, which fixes a few non-ASH tests that fail
when ASH is enabled by default.
Jira: DB-12182

Test Plan: ./yb_build.sh --cxx-test wait_states-itest

Reviewers: amitanand, jason, hbhanawat

Reviewed By: amitanand

Subscribers: yql

Differential Revision: https://phorge.dev.yugabyte.com/D37312
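
One way to read "enables concurrent updates in some CQL wait states" is that the restore step tolerates a changed state instead of treating it as fatal. The following is a minimal sketch of that idea only. `WaitCode`, `RestoreResult`, `RestoreCode`, and the `allow_concurrent_updates` parameter are hypothetical names chosen for illustration, and nothing here is taken from the actual diff.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical, simplified state codes (assumption).
enum class WaitCode { kIdle, kOnCpuActive, kParse };

enum class RestoreResult { kRestored, kSkippedConcurrentUpdate, kConflict };

// Tries to put `prev` back, assuming the caller had earlier set `set_to`.
// If the state is no longer `set_to`, another updater got there first:
// with `allow_concurrent_updates` the newer value is simply kept, otherwise
// the mismatch is reported as a conflict (where a strict check would abort).
RestoreResult RestoreCode(std::atomic<WaitCode>* code, WaitCode set_to,
                          WaitCode prev, bool allow_concurrent_updates) {
  assert(code != nullptr);
  WaitCode expected = set_to;
  if (code->compare_exchange_strong(expected, prev)) {
    return RestoreResult::kRestored;
  }
  return allow_concurrent_updates ? RestoreResult::kSkippedConcurrentUpdate
                                  : RestoreResult::kConflict;
}
```

In both non-restoring outcomes the state is left at whatever the other updater wrote, so the only difference is whether the mismatch is treated as an error.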
jasonyb pushed a commit that referenced this issue Sep 4, 2024
Summary:
 Manually excluded: f77dd6a [#23251] YCQL, ASH: Fix CQL test failures with ASH
 1ebc289 [PLAT-13833]Upgrade ion java library
 1302a2b [PLAT-15087] Connection pooling fails when ysqlAuth is enabled
 1050ec4 [docs] Add a limitation for CDC in docs (#23586)
 Excluded: b57e3c6 [#23490] YSQL: Tighten notion of equality for update optimizations
 9b4c4b5 [PLAT-10264] collect audit logs, ha config and xcluster data in support bundle
 Excluded: 9e0c569 Revert "[PLAT-14786] Add support to node_agent install to use bind ip and node_external_fqdn"
 920989b [PLAT-14788]mask SAS token in backup config response and logs
 Excluded: 0ea4f54 [#23367] CDCSDK: Cleanup expired and not of interest tables from CDC stream
 Excluded: 0dc3a4a [#23737] YSQL: Change ysql conn mgr tests to fix them with warmup random mode
 eba9b49 [PLAT-13845] Upgrade aws sdk to 1.52+
 f44c92e [PLAT-12933] [k8s] Ability to roll N nodes at a time during upgrades for multi-AZ(region) universes
 b8f0308 [PLAT-14008] Avoid rolling YBA managed n2n certificates when not needed (e.g. during Gflag Upgrades)

Test Plan: Jenkins: rebase: pg15-cherrypicks

Reviewers: jason, tfoucher

Differential Revision: https://phorge.dev.yugabyte.com/D37738
abhinab-yb added a commit that referenced this issue Sep 5, 2024
…with ASH

Summary:
- wait_states-itest.cc
-- The tests - AshPg, AshCql, AshFlushAndCompactions, AshTestVerifyOccurrence.VerifyWaitStateEntered, AshTestWithPriorityQueue.VerifyWaitStateEntered
--- YB f77dd6a fixed and enabled TSAN tests in ASH
--- PG 55782d5 added pg_GSS_have_cred_cache in fe-gssapi-common.c, which calls the external function gss_acquire_cred from the krb5 library
--- There seems to be a data race in the krb5 library when the tests try to create concurrent PG connections. Don't enable the TSAN tests in ASH until the underlying problem is fixed

Original commit: f77dd6a / D37312
There seem to be some race conditions in CQL tests due
to D3169. It is not clear from that diff's summary what it
was supposed to fix, so this diff reverts the code to
reading and writing `request_` non-atomically.

Running a mini cluster with both YSQL and YCQL seems
to be slow in TSAN mode. The CQL driver seems to time out
most of the time while connecting to the cluster. This
diff fixes this by using either YSQL or YCQL, not both,
while running the tests.

This diff also enables concurrent updates in some CQL
wait states, which fixes a few non-ASH tests that fail
when ASH is enabled by default.
Jira: DB-12182

Test Plan: ./yb_build.sh --cxx-test wait_states-itest

Reviewers: jason, tfoucher

Reviewed By: jason

Subscribers: yql

Differential Revision: https://phorge.dev.yugabyte.com/D37760