[DocDB] YQL system.partitions cache lock contention leads to master unresponsiveness #12950

Open
hulien22 opened this issue Jun 18, 2022 · 0 comments
Assignees
Labels
area/docdb YugabyteDB core features kind/bug This issue is a bug priority/medium Medium priority issue

Comments

@hulien22
Contributor

hulien22 commented Jun 18, 2022

Jira Link: DB-2683

Description

During load balancing with a large number of tablets, we see lock contention on the YQL system.partitions cache.

The contention appears to come from the partitions vtable refresh thread taking a read lock at the same time that tablet report processing attempts to take the write lock, like so:

11 threads processing tablet reports, waiting on the write lock:
11 __clone;start_thread;yb::Thread::SuperviseThread();yb::rpc::InboundCall::InboundCallTask::Run();yb::rpc::ServicePoolImpl::Handle();yb::master::MasterHeartbeatIf::Handle();_ZNSt17_Function_handlerIFvSt10shared_ptrIN2yb3rpc11InboundCallEEEZNS1_6master17MasterHeartbeatIf11InitMethodsERK13scoped_refptrINS1_12MetricEntityEEEUlS4_E_E9_M_invokeERKSt9_Any_dataOS4_;_ZN2yb3rpc10HandleCallINS0_19RpcCallPBParamsImplINS_6master20TSHeartbeatRequestPBENS3_21TSHeartbeatResponsePBEEEZZNS3_17MasterHeartbeatIf11InitMethodsERK13scoped_refptrINS_12MetricEntityEEENKUlSt10shared_ptrINS0_11InboundCallEEE_clESF_EUlPKS4_PS5_NS0_10RpcContextEE_EEDaSF_T0_;yb::master::CatalogManager::ProcessTabletReport();yb::master::CatalogManager::ProcessTabletReportBatch();yb::master::YQLPartitionsVTable::ProcessMutatedTablets();__pthread_rwlock_wrlock_slow;(unknown)

173 reads of the system.partitions table, blocked by the write lock:
173 yb::rpc::InboundCall::InboundCallTask::Run();yb::rpc::ServicePoolImpl::Handle();yb::tserver::TabletServerServiceIf::Handle();_ZNSt17_Function_handlerIFvSt10shared_ptrIN2yb3rpc11InboundCallEEEZNS1_7tserver21TabletServerServiceIf11InitMethodsERK13scoped_refptrINS1_12MetricEntityEEEUlS4_E0_E9_M_invokeERKSt9_Any_dataOS4_;_ZN2yb3rpc10HandleCallINS0_19RpcCallPBParamsImplINS_7tserver13ReadRequestPBENS3_14ReadResponsePBEEEZZNS3_21TabletServerServiceIf11InitMethodsERK13scoped_refptrINS_12MetricEntityEEENKUlSt10shared_ptrINS0_11InboundCallEEE0_clESF_EUlPKS4_PS5_NS0_10RpcContextEE_EEDaSF_T0_;yb::tserver::TabletServiceImpl::Read();yb::tserver::TabletServiceImpl::CompleteRead();yb::tserver::TabletServiceImpl::DoRead();yb::tserver::TabletServiceImpl::DoReadImpl();yb::master::SystemTablet::HandleQLReadRequest();yb::tablet::AbstractTablet::HandleQLReadRequest();yb::docdb::QLReadOperation::Execute();yb::master::YQLVirtualTable::GetIterator();yb::master::YQLPartitionsVTable::RetrieveData();__pthread_rwlock_rdlock_slow;(unknown)
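
The two traces end in `__pthread_rwlock_wrlock_slow` and `__pthread_rwlock_rdlock_slow`, i.e. both sides are queued on the same reader-writer lock. As a minimal sketch of that exclusion (this is not YugabyteDB code; it uses `std::shared_mutex` from C++17 as a stand-in, and both function names are made up for illustration), a probe thread can check whether the other side is able to enter:

```cpp
#include <future>
#include <mutex>
#include <shared_mutex>

// Hypothetical helper: returns true if another thread can take the shared
// (read) lock while this thread holds the exclusive (write) lock.
bool ReaderCanEnterDuringWrite() {
  std::shared_mutex mutex;
  std::unique_lock<std::shared_mutex> writer(mutex);  // exclusive holder
  auto probe = std::async(std::launch::async, [&mutex] {
    if (mutex.try_lock_shared()) {
      mutex.unlock_shared();
      return true;
    }
    return false;
  });
  return probe.get();
}

// Hypothetical helper: returns true if another thread can take the exclusive
// (write) lock while this thread holds a shared (read) lock.
bool WriterCanEnterDuringRead() {
  std::shared_mutex mutex;
  std::shared_lock<std::shared_mutex> reader(mutex);  // shared holder
  auto probe = std::async(std::launch::async, [&mutex] {
    if (mutex.try_lock()) {
      mutex.unlock();
      return true;
    }
    return false;
  });
  return probe.get();
}
```

Both helpers return `false`: a reader cannot enter while a writer holds the lock, and a writer cannot enter while any reader holds it. With long critical sections on either side, every other caller queues up behind the holder, which matches the shape of the traces above.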


We need to reevaluate the changes made in 7f65b9d (GitHub issue #8978) and reduce lock contention by limiting where we take the lock or by introducing a separate lock.
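
One possible shape for the "separate lock" idea is sketched below. It is an assumption-laden illustration, not the actual `YQLPartitionsVTable` code or the fix that was adopted: `PartitionsCacheSketch`, `PartitionRows`, `Snapshot`, and `ApplyMutations` are hypothetical names. Writers serialize on their own mutex and build the next version of the cache off to the side; readers only hold a second, short-lived mutex long enough to copy a `shared_ptr`.

```cpp
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the cached rows: tablet id -> partition description.
using PartitionRows = std::map<std::string, std::string>;

class PartitionsCacheSketch {
 public:
  // Read path: the lock is held only while copying a shared_ptr, so readers
  // never wait for a writer that is busy rebuilding the rows.
  std::shared_ptr<const PartitionRows> Snapshot() const {
    std::lock_guard<std::mutex> l(ptr_mutex_);
    return rows_;
  }

  // Write path: writers queue on write_mutex_ (the separate lock), copy the
  // current rows, apply the mutations to the copy, then publish the copy
  // with a pointer swap under ptr_mutex_.
  void ApplyMutations(
      const std::vector<std::pair<std::string, std::string>>& mutated) {
    std::lock_guard<std::mutex> writers(write_mutex_);
    auto next = std::make_shared<PartitionRows>(*Snapshot());
    for (const auto& [tablet_id, row] : mutated) {
      (*next)[tablet_id] = row;
    }
    std::lock_guard<std::mutex> l(ptr_mutex_);
    rows_ = std::move(next);
  }

 private:
  std::mutex write_mutex_;        // serializes writers only
  mutable std::mutex ptr_mutex_;  // guards the rows_ pointer only
  std::shared_ptr<const PartitionRows> rows_ =
      std::make_shared<PartitionRows>();
};
```

A reader that called `Snapshot()` before a mutation keeps seeing the old rows until it asks for a new snapshot. The trade-off in this sketch is that each `ApplyMutations` call copies the whole map, so it moves cost off the read path rather than removing it; batching mutations per tablet report would amortize that copy.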

Note: this is with generate_partitions_vtable_on_changes = true and partitions_vtable_cache_refresh_secs = 0.

@hulien22 hulien22 added area/docdb YugabyteDB core features status/awaiting-triage Issue awaiting triage labels Jun 18, 2022
@hulien22 hulien22 self-assigned this Jun 18, 2022
@yugabyte-ci yugabyte-ci added kind/bug This issue is a bug priority/medium Medium priority issue labels Jun 18, 2022
@hulien22 hulien22 removed the priority/medium Medium priority issue label Jun 18, 2022
@yugabyte-ci yugabyte-ci added priority/medium Medium priority issue and removed status/awaiting-triage Issue awaiting triage labels Jun 28, 2022
@hulien22 hulien22 changed the title [DocDB] YQL system.partitions cache lock contention lead to master unresponsiveness [DocDB] YQL system.partitions cache lock contention leads to master unresponsiveness Mar 17, 2023