Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DocDB][SQLsmith] ERROR: errstart was not called / Already marked as completed #11066

Closed
def- opened this issue Jan 12, 2022 · 6 comments
Closed
Assignees
Labels
area/docdb YugabyteDB core features

Comments

@def-
Copy link
Contributor

def- commented Jan 12, 2022

Description

Code state: 518246e (January 11)
Reproduced with:

bin/yugabyted start
bin/ysqlsh
CREATE DATABASE yb_demo;
\c yb_demo
\i 'sample/schema.sql';
select
  ref_0.tmplinline as c0
from
  pg_catalog.pg_pltemplate as ref_0
where ref_0.tmplname !~ ref_0.tmplhandler
limit 77
 for share;

This unexpectedly fails with ERROR: errstart was not called. A new FATAL master log file is written containing:

F20220112 16:58:04 ../../src/yb/rpc/inbound_call.cc:168] 0x0000000116e7d0d8 -> Call yb.tserver.TabletServerService.Read 127.0.0.1:50463 => 127.0.0.1:7100 (request call id 929): Already marked as completed
    @        0x106ab9894  google::LogDestination::LogToSinks()
    @        0x106ab89cc  google::LogMessage::SendToLog()
    @        0x106ab9388  google::LogMessage::Flush()
    @        0x106ab919c  google::LogMessage::~LogMessage()
    @        0x11276a620  yb::rpc::InboundCall::RecordHandlingCompleted()
    @        0x112829114  yb::rpc::RpcContext::RespondSuccess()
    @        0x105454438  yb::tserver::SetupErrorAndRespond()
    @        0x1054546a4  yb::tserver::SetupErrorAndRespond()
    @        0x10550dbd0  yb::tserver::ReadOperationCompletionCallback::operator()()
    @        0x10550db48  _ZNSt3__1L8__invokeIRN2yb7tserver31ReadOperationCompletionCallbackEJRKNS1_6StatusEEEEDTclclsr3std3__1E7forwardIT_Efp_Espclsr3std3__1E7forwardIT0_Efp0_EEEOS8_DpOS9_
    @        0x10550dae4  std::__1::__invoke_void_return_wrapper<>::__call<>()
    @        0x10550daa4  std::__1::__function::__alloc_func<>::operator()()
    @        0x10550c610  std::__1::__function::__func<>::operator()()
    @        0x10550e260  std::__1::__function::__value_func<>::operator()()
    @        0x1054b7b44  std::__1::function<>::operator()()
    @        0x10a850968  yb::tablet::WriteQuery::Complete()
    @        0x10a850244  yb::tablet::WriteQuery::Cancel()
    @        0x10a84ffa4  yb::tablet::WriteQuery::DoStartSynchronization()
    @        0x10a5c669c  yb::tablet::WriteQuery::StartSynchronization()
    @        0x10a850b10  yb::tablet::WriteQuery::SimpleExecuteDone()
    @        0x10a8509fc  yb::tablet::WriteQuery::ExecuteDone()
    @        0x10a853df8  yb::tablet::WriteQuery::Execute()
    @        0x10a5c6508  yb::tablet::Tablet::AcquireLocksAndPerformDocOperations()
    @        0x10a6941d0  yb::tablet::TabletPeer::WriteAsync()
    @        0x1054b4684  yb::tserver::TabletServiceImpl::Read()
    @        0x109715378  _ZZZN2yb7tserver21TabletServerServiceIf11InitMethodsERK13scoped_refptrINS_12MetricEntityEEENK3$_1clENSt3__110shared_ptrINS_3rpc11InboundCallEEEENKUlPKNS0_13ReadRequestPBEPNS0_14ReadResponsePBENSA_10RpcContextEE_clESF_SH_SI_
    @        0x109715128  _ZN2yb3rpc10HandleCallINS0_19RpcCallPBParamsImplINS_7tserver13ReadRequestPBENS3_14ReadResponsePBEEEZZNS3_21TabletServerServiceIf11InitMethodsERK13scoped_refptrINS_12MetricEntityEEENK3$_1clENSt3__110shared_ptrINS0_11InboundCallEEEEUlPKS4_PS5_NS0_10RpcContextEE_EEDaSH_T0_
    @        0x109714e40  yb::tserver::TabletServerServiceIf::InitMethods()::$_1::operator()()
    @        0x109714d84  _ZNSt3__1L8__invokeIRZN2yb7tserver21TabletServerServiceIf11InitMethodsERK13scoped_refptrINS1_12MetricEntityEEE3$_1JNS_10shared_ptrINS1_3rpc11InboundCallEEEEEEDTclclsr3std3__1E7forwardIT_Efp_Espclsr3std3__1E7forwardIT0_Efp0_EEEOSF_DpOSG_
    @        0x109714d10  std::__1::__invoke_void_return_wrapper<>::__call<>()
    @        0x109714cd0  std::__1::__function::__alloc_func<>::operator()()
    @        0x109713884  std::__1::__function::__func<>::operator()()

For context: The query comes from SQLsmith, I didn't come up on my own.

@def- def- added the area/docdb YugabyteDB core features label Jan 12, 2022
@tedyu
Copy link
Contributor

tedyu commented Jan 12, 2022

The FATAL can be reproduced without creating demo DB.
Saw the following in yb-master.FATAL.details.2022-01-12T17_52_28.pid2278.txt

F20220112 17:52:28 ../../src/yb/rpc/inbound_call.cc:168] 0x0000000004234260 -> Call yb.tserver.TabletServerService.Read 127.0.0.1:35245 => 127.0.0.1:7100 (request call id 75): Already marked as completed
    @     0x7fd2aaecd502  yb::LogFatalHandlerSink::send(int, char const*, char const*, int, tm const*, char const*, unsigned long) (src/yb/util/logging.cc:472)
    @     0x7fd2aaa036eb
    @     0x7fd2aa9fd3be
    @     0x7fd2aa9fdcc7
    @     0x7fd2aa9fdb01
    @     0x7fd2ab53e4ac  yb::rpc::InboundCall::RecordHandlingCompleted() (src/yb/rpc/inbound_call.cc:168)
    @     0x7fd2ab5d7903  yb::rpc::RpcContext::RespondSuccess() (src/yb/rpc/rpc_context.cc:159)
    @     0x7fd2b095f9dd  yb::tserver::SetupErrorAndRespond(yb::tserver::TabletServerErrorPB*, yb::Status const&, yb::tserver::TabletServerErrorPB_Code, yb::rpc::RpcContext*) (src/yb/tserver/service_util.cc:52)
    @     0x7fd2b095fa99  yb::tserver::SetupErrorAndRespond(yb::tserver::TabletServerErrorPB*, yb::Status const&, yb::rpc::RpcContext*) (src/yb/tserver/service_util.cc:59)
    @     0x7fd2b09f510e  yb::tserver::ReadOperationCompletionCallback::operator()(yb::Status const&) const (src/yb/tserver/tablet_service.cc:1993)
    @     0x7fd2b09f5081  decltype(std::forward<yb::tserver::ReadOperationCompletionCallback&>(fp)(std::forward<yb::Status const&>(fp0))) std::__invoke<yb::tserver::ReadOperationCompletionCallback&, yb::Status const&>(yb::tserver::ReadOperationCompletionCallback&, yb::Status const&) (/opt/yb-build/thirdparty/yugabyte-db-thirdparty-v20211222064215-dd4872fe56-centos7-x86_64-clang12/installed/uninstrumented/libcxx/include/c++/v1/type_traits:3694)
    @     0x7fd2b09f5021  void std::__invoke_void_return_wrapper<void, true>::__call<yb::tserver::ReadOperationCompletionCallback&, yb::Status const&>(yb::tserver::ReadOperationCompletionCallback&, yb::Status const&) (/opt/yb-build/thirdparty/yugabyte-db-thirdparty-v20211222064215-dd4872fe56-centos7-x86_64-clang12/installed/uninstrumented/libcxx/include/c++/v1/__functional_base:348)
    @     0x7fd2b09f4fe1  std::__function::__alloc_func<yb::tserver::ReadOperationCompletionCallback, std::allocator<yb::tserver::ReadOperationCompletionCallback>, void (yb::Status const&)>::operator()(yb::Status const&) (/opt/yb-build/thirdparty/yugabyte-db-thirdparty-v20211222064215-dd4872fe56-centos7-x86_64-clang12/installed/uninstrumented/libcxx/include/c++/v1/functional:1558)
    @     0x7fd2b09f41a0  std::__function::__func<yb::tserver::ReadOperationCompletionCallback, std::allocator<yb::tserver::ReadOperationCompletionCallback>, void (yb::Status const&)>::operator()(yb::Status const&) (/opt/yb-build/thirdparty/yugabyte-db-thirdparty-v20211222064215-dd4872fe56-centos7-x86_64-clang12/installed/uninstrumented/libcxx/include/c++/v1/functional:1732)
    @     0x7fd2b09f5619  std::__function::__value_func<void (yb::Status const&)>::operator()(yb::Status const&) const (/opt/yb-build/thirdparty/yugabyte-db-thirdparty-v20211222064215-dd4872fe56-centos7-x86_64-clang12/installed/uninstrumented/libcxx/include/c++/v1/functional:1885)
    @     0x7fd2b09bdb4c  std::function<void (yb::Status const&)>::operator()(yb::Status const&) const (/opt/yb-build/thirdparty/yugabyte-db-thirdparty-v20211222064215-dd4872fe56-centos7-x86_64-clang12/installed/uninstrumented/libcxx/include/c++/v1/functional:2560)
    @     0x7fd2b036e8bd  yb::tablet::WriteQuery::Complete(yb::Status const&) (src/yb/tablet/write_query.cc:188)
    @     0x7fd2b036e473  yb::tablet::WriteQuery::Cancel(yb::Status const&) (src/yb/tablet/write_query.cc:181)
    @     0x7fd2b036e28b  yb::tablet::WriteQuery::DoStartSynchronization(yb::Status const&) (src/yb/tablet/write_query.cc:138)
    @     0x7fd2b01c8b8c  yb::tablet::WriteQuery::StartSynchronization(unique_ptr<yb::tablet::WriteQuery, std::default_delete<yb::tablet::WriteQuery> >, yb::Status const&) (src/yb/tablet/write_query.h:90)
    @     0x7fd2b036ea4c  yb::tablet::WriteQuery::SimpleExecuteDone(yb::Status const&) (src/yb/tablet/write_query.cc:795)
    @     0x7fd2b036e92f  yb::tablet::WriteQuery::ExecuteDone(yb::Status const&) (src/yb/tablet/write_query.cc:196)
    @     0x7fd2b0371a00  yb::tablet::WriteQuery::Execute(unique_ptr<yb::tablet::WriteQuery, std::default_delete<yb::tablet::WriteQuery> >) (src/yb/tablet/write_query.cc:392)
    @     0x7fd2b01a043a  yb::tablet::Tablet::AcquireLocksAndPerformDocOperations(unique_ptr<yb::tablet::WriteQuery, std::default_delete<yb::tablet::WriteQuery> >) (src/yb/tablet/tablet.cc:1617)
    @     0x7fd2b0248e25  yb::tablet::TabletPeer::WriteAsync(unique_ptr<yb::tablet::WriteQuery, std::default_delete<yb::tablet::WriteQuery> >) (src/yb/tablet/tablet_peer.cc:647)
    @     0x7fd2b09a0092  yb::tserver::TabletServiceImpl::Read(yb::tserver::ReadRequestPB const*, yb::tserver::ReadResponsePB*, yb::rpc::RpcContext) (src/yb/tserver/tablet_service.cc:2208)
    @     0x7fd2ad9b769b  operator() (src/yb/tserver/tserver_service.service.cc:483)
    @     0x7fd2ad9b7558  auto yb::rpc::HandleCall<yb::rpc::RpcCallPBParamsImpl<yb::tserver::ReadRequestPB, yb::tserver::ReadResponsePB>, yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_1::operator()(shared_ptr<yb::rpc::InboundCall>) const::'lambda'(yb::tserver::ReadRequestPB const*, yb::tserver::ReadResponsePB*, yb::rpc::RpcContext)>(shared_ptr<yb::rpc::InboundCall>, yb::tserver::TabletServerServiceIf::InitMethods(scoped_refptr<yb::MetricEntity> const&)::$_1::operator()(shared_ptr<yb::rpc::InboundCall>) const::'lambda'(yb::tserver::ReadRequestPB const*, yb::tserver::ReadResponsePB*, yb::rpc::RpcContext)) (src/yb/rpc/local_call.h:122)
    @     0x7fd2ad9b713d  operator() (src/yb/tserver/tserver_service.service.cc:481)

@bmatican
Copy link
Contributor

@def- @tedyu note, I downloaded 2.11.1 and was able to get the errstart issue, but not the FATAL.

Looking at the code, seems it's a DFATAL, so it only happens in debug! -- useful info for the future

LOG_IF_WITH_PREFIX(DFATAL, timing_.time_completed.Initialized()) << "Already marked as completed";

@bmatican
Copy link
Contributor

Sergei has a WIP diff for fixing this.

@def-
Copy link
Contributor Author

def- commented Jan 25, 2022

I believe this is the same issue but with release build:

delete from information_schema.sql_implementation_info
where
case when pg_catalog.pg_notification_queue_usage() >= case when information_schema.sql_implementation_info.character_value is NULL then (select stanullfrac from pg_catalog.pg_statistic limit 1 offset 6)
           else (select stanullfrac from pg_catalog.pg_statistic limit 1 offset 6)
           end
         then cast(null as bpchar) else cast(null as bpchar) end
     ~* case when (select moo from public.mvtest_vt1 limit 1 offset 2)
         is not NULL then pg_catalog.txid_status(
      cast((select pg_catalog.sum(indnatts) from pg_catalog.pg_index)
         as int8)) else pg_catalog.txid_status(
      cast((select pg_catalog.sum(indnatts) from pg_catalog.pg_index)
         as int8)) end

returning
  case when information_schema.sql_implementation_info.implementation_info_name is not NULL then information_schema.sql_implementation_info.implementation_info_name else information_schema.sql_implementation_info.implementation_info_name end
     as c0,
  information_schema.sql_implementation_info.integer_value as c1,
  case when cast(null as "char") > pg_catalog.char(
        cast((select pg_catalog.max(application_name) from pg_catalog.pg_stat_activity)
           as text)) then information_schema.sql_implementation_info.implementation_info_id else information_schema.sql_implementation_info.implementation_info_id end
     as c2,
  information_schema.sql_implementation_info.character_value as c3,
  pg_catalog.yb_mem_usage_sql_b() as c4,
  88 as c5
;

fails deterministically with:

ERROR:  transaction ID 208 is in the future

Found with SQLsmith while I was looking for something else, but might still be worth investigating if this is unexpected behavior and should check if Sergei's fix also fixes this.

@bmatican
Copy link
Contributor

@spolitov do you think this 2nd issue that @def- identified is really the same root cause?

spolitov added a commit that referenced this issue Jan 26, 2022
Summary:
The return is missing after SetupErrorAndRespond in case of read for share w/o transaction.
Refactored read handler to fix this issue and avoid similar mistakes in future.

Test Plan: ybd --gtest_filter PgMiniTest.RowLockWithoutTransaction

Reviewers: bogdan, timur

Reviewed By: timur

Subscribers: ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D14943
@def-
Copy link
Contributor Author

def- commented Jan 26, 2022

The fix fixed the original issue, from what I can tell. Haven't seen this crash happen anymore in Debug build. The "transaction ID 208 is in the future" seems unrelated and still persists. I opened a new bug: #11237

@def- def- closed this as completed Jan 26, 2022
@def- def- changed the title [DocDB] ERROR: errstart was not called / Already marked as completed [DocDB][SQLsmith] ERROR: errstart was not called / Already marked as completed Feb 3, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/docdb YugabyteDB core features
Projects
None yet
Development

No branches or pull requests

4 participants