Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PgIndexBackfillTest.ReadTime fails due to 'Check failed: cpu < pools_.size() (2 vs. 0)' #7105

Closed
tedyu opened this issue Feb 4, 2021 · 7 comments

Comments

@tedyu
Copy link
Contributor

tedyu commented Feb 4, 2021

Running ybd '--cxx-test pgwrapper_pg_index_backfill-test --gtest_filter PgIndexBackfillTest.ReadTime' gives me:

I0205 02:44:46.887485 2079658 libpq_utils.cc:162] Connected to PG (dbname='yugabyte' host=127.185.175.0 port=29915 user='postgres'), time taken: 0.371s
[ts-1] F0205 02:44:47.112234 2079728 object_pool.h:245] Check failed: cpu < pools_.size() (0 vs. 0)
[ts-1] Fatal failure details written to /nfusr/dev-server/tedyu/yugabyte-db/build/debug-gcc-dynamic-ninja/yb-test-logs/tests-pgwrapper__pg_index_backfill-test/PgIndexBackfillTest_ReadTime.fatal_failure_details.ts-1.2021-02-05T02_44_46.pid2079708.txt
[ts-1] F20210205 02:44:46 ../../src/yb/util/object_pool.h:245] Check failed: cpu < pools_.size() (2 vs. 0)
[ts-1]     @     0x7f94acdd86c7  yb::LogFatalHandlerSink::send(int, char const*, char const*, int, tm const*, char const*, unsigned long) (src/yb/util/logging.cc:474)
[ts-1]     @     0x7f94a89b5865
[ts-1]     @     0x7f94a89b2e39
[ts-1]     @     0x7f94a89b6528
[ts-1]     @     0x7f94ace77127  yb::ThreadSafeObjectPool<yb::internal::ArenaBase<yb::internal::ThreadSafeArenaTraits> >::GetCPU() const (src/yb/util/object_pool.h:245)
[ts-1]     @     0x7f94ace77127  yb::ThreadSafeObjectPool<yb::internal::ArenaBase<yb::internal::ThreadSafeArenaTraits> >::Release(yb::internal::ArenaBase<yb::internal::ThreadSafeArenaTraits>*) (src/yb/util/object_pool.h:232)
[ts-1]     @     0x7f94ace77127  yb::Trace::~Trace() (src/yb/util/trace.cc:242)
[ts-1]     @     0x7f94ace77372  yb::RefCountedThreadSafe<yb::Trace, yb::DefaultRefCountedThreadSafeTraits<yb::Trace> >::DeleteInternal(yb::Trace const*) (src/yb/gutil/ref_counted.h:222)
[ts-1]     @     0x7f94ace77372  yb::DefaultRefCountedThreadSafeTraits<yb::Trace>::Destruct(yb::Trace const*) (src/yb/gutil/ref_counted.h:184)
[ts-1]     @     0x7f94ace77372  yb::RefCountedThreadSafe<yb::Trace, yb::DefaultRefCountedThreadSafeTraits<yb::Trace> >::Release() const (src/yb/gutil/ref_counted.h:213)
[ts-1]     @     0x7f94ace77372  scoped_refptr<yb::Trace>::~scoped_refptr() (src/yb/gutil/ref_counted.h:336)

This is reproducible as of commit cdf6b7a

I will run the test with earlier commit(s) and see when the test started failing.

@jaki
Copy link
Contributor

jaki commented Feb 4, 2021

This looks like an initdb issue, not related to the test. Try reinitdb.

@tedyu
Copy link
Contributor Author

tedyu commented Feb 4, 2021

ybd --java-test 'org.yb.loadtest.TestSpark3Jsonb'
passes. So reinitdb should not be required.

@jaki
Copy link
Contributor

jaki commented Feb 4, 2021

I tested on latest master, and I don't get this error. Checked a recent jenkins test, and it also didn't have this error.

@jaki
Copy link
Contributor

jaki commented Feb 4, 2021

I've seen this kind of message before, and I believe reinitdb fixes it. Or clean build.

@tedyu
Copy link
Contributor Author

tedyu commented Feb 5, 2021

I got the following error after running reinitdb:

I0205 02:44:46.887485 2079658 libpq_utils.cc:162] Connected to PG (dbname='yugabyte' host=127.185.175.0 port=29915 user='postgres'), time taken: 0.371s
[ts-1] F0205 02:44:47.112234 2079728 object_pool.h:245] Check failed: cpu < pools_.size() (0 vs. 0)
[ts-1] Fatal failure details written to /nfusr/dev-server/tedyu/yugabyte-db/build/debug-gcc-dynamic-ninja/yb-test-logs/tests-pgwrapper__pg_index_backfill-test/PgIndexBackfillTest_ReadTime.fatal_failure_details.ts-1.2021-02-05T02_44_46.pid2079708.txt
[ts-1] F20210205 02:44:46 ../../src/yb/util/object_pool.h:245] Check failed: cpu < pools_.size() (2 vs. 0)
[ts-1]     @     0x7f94acdd86c7  yb::LogFatalHandlerSink::send(int, char const*, char const*, int, tm const*, char const*, unsigned long) (src/yb/util/logging.cc:474)
[ts-1]     @     0x7f94a89b5865
[ts-1]     @     0x7f94a89b2e39
[ts-1]     @     0x7f94a89b6528
[ts-1]     @     0x7f94ace77127  yb::ThreadSafeObjectPool<yb::internal::ArenaBase<yb::internal::ThreadSafeArenaTraits> >::GetCPU() const (src/yb/util/object_pool.h:245)
[ts-1]     @     0x7f94ace77127  yb::ThreadSafeObjectPool<yb::internal::ArenaBase<yb::internal::ThreadSafeArenaTraits> >::Release(yb::internal::ArenaBase<yb::internal::ThreadSafeArenaTraits>*) (src/yb/util/object_pool.h:232)
[ts-1]     @     0x7f94ace77127  yb::Trace::~Trace() (src/yb/util/trace.cc:242)
[ts-1]     @     0x7f94ace77372  yb::RefCountedThreadSafe<yb::Trace, yb::DefaultRefCountedThreadSafeTraits<yb::Trace> >::DeleteInternal(yb::Trace const*) (src/yb/gutil/ref_counted.h:222)
[ts-1]     @     0x7f94ace77372  yb::DefaultRefCountedThreadSafeTraits<yb::Trace>::Destruct(yb::Trace const*) (src/yb/gutil/ref_counted.h:184)
[ts-1]     @     0x7f94ace77372  yb::RefCountedThreadSafe<yb::Trace, yb::DefaultRefCountedThreadSafeTraits<yb::Trace> >::Release() const (src/yb/gutil/ref_counted.h:213)
[ts-1]     @     0x7f94ace77372  scoped_refptr<yb::Trace>::~scoped_refptr() (src/yb/gutil/ref_counted.h:336)
[ts-1]     @     0x7f94ace77372  void std::_Destroy<scoped_refptr<yb::Trace> >(scoped_refptr<yb::Trace>*) (gcc/5.5.0_4/include/c++/5.5.0/bits/stl_construct.h:93)
[ts-1]     @     0x7f94ace77372  void std::_Destroy_aux<false>::__destroy<scoped_refptr<yb::Trace>*>(scoped_refptr<yb::Trace>*, scoped_refptr<yb::Trace>*) (gcc/5.5.0_4/include/c++/5.5.0/bits/stl_construct.h:103)
[ts-1]     @     0x7f94ace77372  void std::_Destroy<scoped_refptr<yb::Trace>*>(scoped_refptr<yb::Trace>*, scoped_refptr<yb::Trace>*) (gcc/5.5.0_4/include/c++/5.5.0/bits/stl_construct.h:126)
[ts-1]     @     0x7f94ace77372  void std::_Destroy<scoped_refptr<yb::Trace>*, scoped_refptr<yb::Trace> >(scoped_refptr<yb::Trace>*, scoped_refptr<yb::Trace>*, std::allocator<scoped_refptr<yb::Trace> >&) (gcc/5.5.0_4/include/c++/5.5.0/bits/stl_construct.h:151)
[ts-1]     @     0x7f94ace77372  std::vector<scoped_refptr<yb::Trace>, std::allocator<scoped_refptr<yb::Trace> > >::~vector() (gcc/5.5.0_4/include/c++/5.5.0/bits/stl_vector.h:424)
[ts-1]     @     0x7f94ace77372  yb::Trace::~Trace() (src/yb/util/trace.cc:238)
[ts-1]     @     0x7f94a7c5313c  yb::RefCountedThreadSafe<yb::Trace, yb::DefaultRefCountedThreadSafeTraits<yb::Trace> >::DeleteInternal(yb::Trace const*) (src/yb/gutil/ref_counted.h:222)
[ts-1]     @     0x7f94a7c5313c  yb::DefaultRefCountedThreadSafeTraits<yb::Trace>::Destruct(yb::Trace const*) (src/yb/gutil/ref_counted.h:184)
[ts-1]     @     0x7f94a7c5313c  yb::RefCountedThreadSafe<yb::Trace, yb::DefaultRefCountedThreadSafeTraits<yb::Trace> >::Release() const (src/yb/gutil/ref_counted.h:213)
[ts-1]     @     0x7f94a7c5313c  scoped_refptr<yb::Trace>::~scoped_refptr() (src/yb/gutil/ref_counted.h:336)
[ts-1]     @     0x7f94a7c5313c  yb::client::YBOperation::~YBOperation() (src/yb/client/yb_op.cc:87)
[ts-1]     @     0x7f94a7c53908  yb::client::YBPgsqlOp::~YBPgsqlOp() (src/yb/client/yb_op.cc:502)
[ts-1]     @     0x7f94a7c58f34  yb::client::YBPgsqlReadOp::~YBPgsqlReadOp() (src/yb/client/yb_op.h:505)
[ts-1]     @     0x7f94a7c58f34  yb::client::YBPgsqlReadOp::~YBPgsqlReadOp() (src/yb/client/yb_op.h:505)
[ts-1]     @     0x7f94aca09524  std::default_delete<yb::client::YBPgsqlReadOp>::operator()(yb::client::YBPgsqlReadOp*) const (gcc/5.5.0_4/include/c++/5.5.0/bits/unique_ptr.h:76)
[ts-1]     @     0x7f94aca09524  std::_Sp_counted_deleter<yb::client::YBPgsqlReadOp*, std::default_delete<yb::client::YBPgsqlReadOp>, std::allocator<void>, (__gnu_cxx::_Lock_policy)2>::_M_dispose() (gcc/5.5.0_4/include/c++/5.5.0/bits/shared_ptr_base.h:466)
[ts-1]     @     0x7f94aca2e627  std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() (gcc/5.5.0_4/include/c++/5.5.0/bits/shared_ptr_base.h:150)
[ts-1]     @     0x7f94aca2e627  std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count() (gcc/5.5.0_4/include/c++/5.5.0/bits/shared_ptr_base.h:659)
[ts-1]     @     0x7f94aca2e627  std::__shared_ptr<yb::client::YBPgsqlOp, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr() (gcc/5.5.0_4/include/c++/5.5.0/bits/shared_ptr_base.h:925)
[ts-1]     @     0x7f94aca2e627  shared_ptr<yb::client::YBPgsqlOp>::~shared_ptr() (gcc/5.5.0_4/include/c++/5.5.0/bits/shared_ptr.h:93)
[ts-1]     @     0x7f94aca2e627  void std::_Destroy<shared_ptr<yb::client::YBPgsqlOp> >(shared_ptr<yb::client::YBPgsqlOp>*) (gcc/5.5.0_4/include/c++/5.5.0/bits/stl_construct.h:93)
[ts-1]     @     0x7f94aca2e627  void std::_Destroy_aux<false>::__destroy<shared_ptr<yb::client::YBPgsqlOp>*>(shared_ptr<yb::client::YBPgsqlOp>*, shared_ptr<yb::client::YBPgsqlOp>*) (gcc/5.5.0_4/include/c++/5.5.0/bits/stl_construct.h:103)
[ts-1]     @     0x7f94aca2e627  void std::_Destroy<shared_ptr<yb::client::YBPgsqlOp>*>(shared_ptr<yb::client::YBPgsqlOp>*, shared_ptr<yb::client::YBPgsqlOp>*) (gcc/5.5.0_4/include/c++/5.5.0/bits/stl_construct.h:126)
[ts-1]     @     0x7f94aca2e627  void std::_Destroy<shared_ptr<yb::client::YBPgsqlOp>*, shared_ptr<yb::client::YBPgsqlOp> >(shared_ptr<yb::client::YBPgsqlOp>*, shared_ptr<yb::client::YBPgsqlOp>*, std::allocator<shared_ptr<yb::client::YBPgsqlOp> >&) (gcc/5.5.0_4/include/c++/5.5.0/bits/stl_construct.h:151)
[ts-1]     @     0x7f94aca2e627  std::vector<shared_ptr<yb::client::YBPgsqlOp>, std::allocator<shared_ptr<yb::client::YBPgsqlOp> > >::~vector() (gcc/5.5.0_4/include/c++/5.5.0/bits/stl_vector.h:424)
[ts-1]     @     0x7f94aca2e627  yb::pggate::PgDocOp::~PgDocOp() (src/yb/yql/pggate/pg_doc_op.cc:125)

@tedyu tedyu changed the title PgIndexBackfillTest.ReadTime fails due to 'num_tablets should be greater than 0' PgIndexBackfillTest.ReadTime fails due to 'Check failed: cpu < pools_.size() (2 vs. 0)' Feb 5, 2021
amitanandaiyer added a commit that referenced this issue Feb 8, 2021
Summary:
1) Support tracing at a transaction level. Maintains a trace for each client/transaction that will include
            each RPC performed as part of the txn as a child transaction.
            This behavior can be controlled using the gflags txn_print_trace_every.
2) Carve out ybclient_print_all_traces to be separate from rpc_print_all_traces to be used on the ybclient side.
3) Support adding indentation to traces to make them more readable.
4) Add support for VTRACE and VTRACE_TO that work similar to VLOG for tracing.

Test Plan:
Check the logs with:
ybd debug --cxx-test ql-transaction-test --gtest_filter QLTransactionTest.Simple --test_args --vmodule=transaction=5 --test_args --txn_print_trace_every_n=1 --test_args --enable_tracing=true

```
10256 I0205 12:25:15.848953 11367 transaction.cc:162] { metadata: { transaction_id: 0c591819-cbb5-4553-81d9-2aa1619b77f5 isolation: SNAPSHOT_ISOLATION status_tablet: 16c9a801eb424f4f8eeeddf8c69909ed priority: 7774162232958115779 start_time: { days: 18663 time: 12:25:15.792396 } } state: kCommitted } took 56003us. Trace:
10257 0205 12:25:15.792348 (+     0us) transaction.cc:181] Init
10258 0205 12:25:15.796643 (+  4295us) transaction.cc:252] Perparing 1 ops
10259 0205 12:25:15.796704 (+    61us) transaction.cc:782] RequestStatusTablet
10260 0205 12:25:15.810836 (+ 14132us) transaction.cc:801] StatusTabletPicked
10261 0205 12:25:15.810871 (+    35us) transaction.cc:815] LookupStatusTablet
10262 0205 12:25:15.811902 (+  1031us) transaction.cc:826] LookupTabletDone
10263 0205 12:25:15.817018 (+  5116us) transaction.cc:252] Perparing 1 ops
10264 0205 12:25:15.823123 (+  6105us) transaction.cc:308] Flushed 1 ops. with Status OK
10265 0205 12:25:15.827623 (+  4500us) transaction.cc:252] Perparing 1 ops
10266 0205 12:25:15.831810 (+  4187us) transaction.cc:308] Flushed 1 ops. with Status OK
10267 0205 12:25:15.832019 (+   209us) transaction.cc:252] Perparing 1 ops
10268 0205 12:25:15.835559 (+  3540us) transaction.cc:308] Flushed 1 ops. with Status OK
10269 0205 12:25:15.838127 (+  2568us) transaction.cc:252] Perparing 1 ops
10270 0205 12:25:15.842376 (+  4249us) transaction.cc:308] Flushed 1 ops. with Status OK
10271 0205 12:25:15.842812 (+   436us) transaction.cc:252] Perparing 1 ops
10272 0205 12:25:15.845439 (+  2627us) transaction.cc:308] Flushed 1 ops. with Status OK
10273 0205 12:25:15.845662 (+   223us) transaction.cc:380] Commit
10274 0205 12:25:15.848420 (+  2758us) transaction.cc:743] CommitDone
10275 ..  Related trace:
10276 ..  0205 12:25:15.817133 (+     0us) async_rpc.cc:391] WriteRpc initiated to 5c0f03796883403abd632638d5215eac
10277 ..  0205 12:25:15.817215 (+    82us) async_rpc.cc:163] SendRpc() called.
10278 ..  0205 12:25:15.817216 (+     1us) tablet_rpc.cc:95] SelectTabletServer()
10279 ..  0205 12:25:15.817256 (+    40us) async_rpc.cc:478] SendRpcToTserver
10280 ..  0205 12:25:15.817433 (+   177us) async_rpc.cc:484] RpcDispatched Asynchronously
10281 ..  0205 12:25:15.823061 (+  5628us) tablet_rpc.cc:290] Done(OK)
10282 ..  0205 12:25:15.823075 (+    14us) async_rpc.cc:606] ProcessResponseFromTserver(OK)
10283 ..  ..  Related trace:
10284 ..  ..  0205 12:25:15.817269 (+     0us) outbound_call.cc:204] Outbound Call initiated.
10285 ..  ..  0205 12:25:15.817431 (+   162us) reactor.cc:721] Scheduled.
10286 ..  ..  0205 12:25:15.817604 (+   173us) outbound_call.cc:452] Queued.
10287 ..  ..  0205 12:25:15.817631 (+    27us) outbound_call.cc:462] Call Sent.
10288 ..  ..  0205 12:25:15.822987 (+  5356us) outbound_call.cc:408] Response received.
10289 ..  ..  0205 12:25:15.823041 (+    54us) outbound_call.cc:370] Callback called asynchronously.
10290 ..  Related trace:
10291 ..  0205 12:25:15.827658 (+     0us) async_rpc.cc:391] WriteRpc initiated to c48ee42634d94d57be4ef7a3314e5ccc
10292 ..  0205 12:25:15.827669 (+    11us) async_rpc.cc:163] SendRpc() called.
10293 ..  0205 12:25:15.827670 (+     1us) tablet_rpc.cc:95] SelectTabletServer()
10294 ..  0205 12:25:15.827672 (+     2us) async_rpc.cc:478] SendRpcToTserver
10295 ..  0205 12:25:15.827707 (+    35us) async_rpc.cc:484] RpcDispatched Asynchronously
10296 ..  0205 12:25:15.831795 (+  4088us) tablet_rpc.cc:290] Done(OK)
10297 ..  0205 12:25:15.831803 (+     8us) async_rpc.cc:606] ProcessResponseFromTserver(OK)
10298 ..  ..  Related trace:
10299 ..  ..  0205 12:25:15.827676 (+     0us) outbound_call.cc:204] Outbound Call initiated.
10300 ..  ..  0205 12:25:15.827706 (+    30us) reactor.cc:721] Scheduled.
10301 ..  ..  0205 12:25:15.827874 (+   168us) outbound_call.cc:452] Queued.
10302 ..  ..  0205 12:25:15.827933 (+    59us) outbound_call.cc:462] Call Sent.
10303 ..  ..  0205 12:25:15.831736 (+  3803us) outbound_call.cc:408] Response received.
10304 ..  ..  0205 12:25:15.831766 (+    30us) outbound_call.cc:370] Callback called asynchronously.
10305 ..  Related trace:
10306 ..  0205 12:25:15.832039 (+     0us) async_rpc.cc:391] WriteRpc initiated to 5c0f03796883403abd632638d5215eac
10307 ..  0205 12:25:15.832058 (+    19us) async_rpc.cc:163] SendRpc() called.
10308 ..  0205 12:25:15.832058 (+     0us) tablet_rpc.cc:95] SelectTabletServer()
10309 ..  0205 12:25:15.832060 (+     2us) async_rpc.cc:478] SendRpcToTserver
10310 ..  0205 12:25:15.832087 (+    27us) async_rpc.cc:484] RpcDispatched Asynchronously
10311 ..  0205 12:25:15.835545 (+  3458us) tablet_rpc.cc:290] Done(OK)
10312 ..  0205 12:25:15.835553 (+     8us) async_rpc.cc:606] ProcessResponseFromTserver(OK)
10313 ..  ..  Related trace:
10314 ..  ..  0205 12:25:15.832064 (+     0us) outbound_call.cc:204] Outbound Call initiated.
10315 ..  ..  0205 12:25:15.832087 (+    23us) reactor.cc:721] Scheduled.
10316 ..  ..  0205 12:25:15.832259 (+   172us) outbound_call.cc:452] Queued.
10317 ..  ..  0205 12:25:15.832294 (+    35us) outbound_call.cc:462] Call Sent.
10318 ..  ..  0205 12:25:15.835356 (+  3062us) outbound_call.cc:408] Response received.
10319 ..  ..  0205 12:25:15.835368 (+    12us) outbound_call.cc:370] Callback called asynchronously.
10320 ..  Related trace:
10321 ..  0205 12:25:15.838140 (+     0us) async_rpc.cc:391] WriteRpc initiated to 1c8125ff1b6846c69593eec56b114693
10322 ..  0205 12:25:15.838149 (+     9us) async_rpc.cc:163] SendRpc() called.
10323 ..  0205 12:25:15.838149 (+     0us) tablet_rpc.cc:95] SelectTabletServer()
10324 ..  0205 12:25:15.838167 (+    18us) async_rpc.cc:478] SendRpcToTserver
10325 ..  0205 12:25:15.838206 (+    39us) async_rpc.cc:484] RpcDispatched Asynchronously
10326 ..  0205 12:25:15.842364 (+  4158us) tablet_rpc.cc:290] Done(OK)
10327 ..  0205 12:25:15.842371 (+     7us) async_rpc.cc:606] ProcessResponseFromTserver(OK)
10328 ..  ..  Related trace:
10329 ..  ..  0205 12:25:15.838171 (+     0us) outbound_call.cc:204] Outbound Call initiated.
10330 ..  ..  0205 12:25:15.838205 (+    34us) reactor.cc:721] Scheduled.
10331 ..  ..  0205 12:25:15.838381 (+   176us) outbound_call.cc:452] Queued.
10332 ..  ..  0205 12:25:15.838427 (+    46us) outbound_call.cc:462] Call Sent.
10333 ..  ..  0205 12:25:15.842239 (+  3812us) outbound_call.cc:408] Response received.
10334 ..  ..  0205 12:25:15.842254 (+    15us) outbound_call.cc:370] Callback called asynchronously.
10335 ..  Related trace:
10336 ..  0205 12:25:15.842824 (+     0us) async_rpc.cc:391] WriteRpc initiated to 126f184381e94c2ea1787aa4a212129a
10337 ..  0205 12:25:15.842831 (+     7us) async_rpc.cc:163] SendRpc() called.
10338 ..  0205 12:25:15.842832 (+     1us) tablet_rpc.cc:95] SelectTabletServer()
10339 ..  0205 12:25:15.842834 (+     2us) async_rpc.cc:478] SendRpcToTserver
10340 ..  0205 12:25:15.842861 (+    27us) async_rpc.cc:484] RpcDispatched Asynchronously
10341 ..  0205 12:25:15.845426 (+  2565us) tablet_rpc.cc:290] Done(OK)
10342 ..  0205 12:25:15.845433 (+     7us) async_rpc.cc:606] ProcessResponseFromTserver(OK)
```

Reviewers: rsami, jason, sergei, zyu

Reviewed By: zyu

Subscribers: ybase, bogdan

Differential Revision: https://phabricator.dev.yugabyte.com/D10557
@amitanandaiyer
Copy link
Contributor

Seems like yb_op wasn't too happy about having child traces added to it. 78f6e99 gets around this problem by not adding anything to yb_op; instead adding the trace directly from async_rpc to transaction.

I'm guessing yb_op (in the context of pgsql) isn't too happy with the changes because of the memory allocation mechanism used there? I'm not too sure of the exact root cause. But, it will be useful/good to understand why?

@m-iancu @spolitov @bmatican Do you guys have any idea on the thread-stack that @tedyu posted above?

@tedyu
Copy link
Contributor Author

tedyu commented Feb 12, 2021

This would be fixed by D10589

mbautin added a commit that referenced this issue Feb 12, 2021
Summary:
Clear the postgres_process_memctxs global PgMemctx object map in the YBCDestroyPgGate function, right after destroying the PgApiImpl object. If we don't do this, then we are leaving the destruction order of objects whose last refcount is held by the PgMemctx objects in the global map up to chance. Some of them contain Trace objects, which hold pointers to arenas allocated from a global thread-safe pool of arenas, but that thread-safe pool could have already been destroyed by the time the global PgMemctx map is being destructed. Global order of destructors in C++ is not well defined, and we should not rely on it.

Example crash stack trace: https://gist.githubusercontent.com/mbautin/37df1b73963b9e9cb92294be69560512/raw

Also see the stack traces in #7105

It would be good to move the global PgMemctx map into the PgApiImpl object in the future. That is tracked in #7216.

Test Plan: Jenkins: enable clang 11

Reviewers: neil, mihnea, dmitry

Reviewed By: dmitry

Subscribers: yql

Differential Revision: https://phabricator.dev.yugabyte.com/D10589
@tedyu tedyu closed this as completed Feb 14, 2021
d-uspenskiy pushed a commit that referenced this issue Mar 1, 2021
…CDestroyPgGate

Summary:
Clear the postgres_process_memctxs global PgMemctx object map in the YBCDestroyPgGate function, right after destroying the PgApiImpl object. If we don't do this, then we are leaving the destruction order of objects whose last refcount is held by the PgMemctx objects in the global map up to chance. Some of them contain Trace objects, which hold pointers to arenas allocated from a global thread-safe pool of arenas, but that thread-safe pool could have already been destroyed by the time the global PgMemctx map is being destructed. Global order of destructors in C++ is not well defined, and we should not rely on it.

Example crash stack trace: https://gist.githubusercontent.com/mbautin/37df1b73963b9e9cb92294be69560512/raw

Also see the stack traces in #7105

It would be good to move the global PgMemctx map into the PgApiImpl object in the future. That is tracked in #7216.

Original commit: D10589 / f00bb9c

Test Plan: Jenkins: rebase: 2.4 enable clang 11

Reviewers: neil, mihnea

Reviewed By: mihnea

Subscribers: kannan, yql

Differential Revision: https://phabricator.dev.yugabyte.com/D10740
polarweasel pushed a commit to lizayugabyte/yugabyte-db that referenced this issue Mar 9, 2021
…sactions

Summary:
1) Support tracing at a transaction level. Maintains a trace for each client/transaction that will include
            each RPC performed as part of the txn as a child transaction.
            This behavior can be controlled using the gflags txn_print_trace_every.
2) Carve out ybclient_print_all_traces to be separate from rpc_print_all_traces to be used on the ybclient side.
3) Support adding indentation to traces to make them more readable.
4) Add support for VTRACE and VTRACE_TO that work similar to VLOG for tracing.

Test Plan:
Check the logs with:
ybd debug --cxx-test ql-transaction-test --gtest_filter QLTransactionTest.Simple --test_args --vmodule=transaction=5 --test_args --txn_print_trace_every_n=1 --test_args --enable_tracing=true

```
10256 I0205 12:25:15.848953 11367 transaction.cc:162] { metadata: { transaction_id: 0c591819-cbb5-4553-81d9-2aa1619b77f5 isolation: SNAPSHOT_ISOLATION status_tablet: 16c9a801eb424f4f8eeeddf8c69909ed priority: 7774162232958115779 start_time: { days: 18663 time: 12:25:15.792396 } } state: kCommitted } took 56003us. Trace:
10257 0205 12:25:15.792348 (+     0us) transaction.cc:181] Init
10258 0205 12:25:15.796643 (+  4295us) transaction.cc:252] Perparing 1 ops
10259 0205 12:25:15.796704 (+    61us) transaction.cc:782] RequestStatusTablet
10260 0205 12:25:15.810836 (+ 14132us) transaction.cc:801] StatusTabletPicked
10261 0205 12:25:15.810871 (+    35us) transaction.cc:815] LookupStatusTablet
10262 0205 12:25:15.811902 (+  1031us) transaction.cc:826] LookupTabletDone
10263 0205 12:25:15.817018 (+  5116us) transaction.cc:252] Perparing 1 ops
10264 0205 12:25:15.823123 (+  6105us) transaction.cc:308] Flushed 1 ops. with Status OK
10265 0205 12:25:15.827623 (+  4500us) transaction.cc:252] Perparing 1 ops
10266 0205 12:25:15.831810 (+  4187us) transaction.cc:308] Flushed 1 ops. with Status OK
10267 0205 12:25:15.832019 (+   209us) transaction.cc:252] Perparing 1 ops
10268 0205 12:25:15.835559 (+  3540us) transaction.cc:308] Flushed 1 ops. with Status OK
10269 0205 12:25:15.838127 (+  2568us) transaction.cc:252] Perparing 1 ops
10270 0205 12:25:15.842376 (+  4249us) transaction.cc:308] Flushed 1 ops. with Status OK
10271 0205 12:25:15.842812 (+   436us) transaction.cc:252] Perparing 1 ops
10272 0205 12:25:15.845439 (+  2627us) transaction.cc:308] Flushed 1 ops. with Status OK
10273 0205 12:25:15.845662 (+   223us) transaction.cc:380] Commit
10274 0205 12:25:15.848420 (+  2758us) transaction.cc:743] CommitDone
10275 ..  Related trace:
10276 ..  0205 12:25:15.817133 (+     0us) async_rpc.cc:391] WriteRpc initiated to 5c0f03796883403abd632638d5215eac
10277 ..  0205 12:25:15.817215 (+    82us) async_rpc.cc:163] SendRpc() called.
10278 ..  0205 12:25:15.817216 (+     1us) tablet_rpc.cc:95] SelectTabletServer()
10279 ..  0205 12:25:15.817256 (+    40us) async_rpc.cc:478] SendRpcToTserver
10280 ..  0205 12:25:15.817433 (+   177us) async_rpc.cc:484] RpcDispatched Asynchronously
10281 ..  0205 12:25:15.823061 (+  5628us) tablet_rpc.cc:290] Done(OK)
10282 ..  0205 12:25:15.823075 (+    14us) async_rpc.cc:606] ProcessResponseFromTserver(OK)
10283 ..  ..  Related trace:
10284 ..  ..  0205 12:25:15.817269 (+     0us) outbound_call.cc:204] Outbound Call initiated.
10285 ..  ..  0205 12:25:15.817431 (+   162us) reactor.cc:721] Scheduled.
10286 ..  ..  0205 12:25:15.817604 (+   173us) outbound_call.cc:452] Queued.
10287 ..  ..  0205 12:25:15.817631 (+    27us) outbound_call.cc:462] Call Sent.
10288 ..  ..  0205 12:25:15.822987 (+  5356us) outbound_call.cc:408] Response received.
10289 ..  ..  0205 12:25:15.823041 (+    54us) outbound_call.cc:370] Callback called asynchronously.
10290 ..  Related trace:
10291 ..  0205 12:25:15.827658 (+     0us) async_rpc.cc:391] WriteRpc initiated to c48ee42634d94d57be4ef7a3314e5ccc
10292 ..  0205 12:25:15.827669 (+    11us) async_rpc.cc:163] SendRpc() called.
10293 ..  0205 12:25:15.827670 (+     1us) tablet_rpc.cc:95] SelectTabletServer()
10294 ..  0205 12:25:15.827672 (+     2us) async_rpc.cc:478] SendRpcToTserver
10295 ..  0205 12:25:15.827707 (+    35us) async_rpc.cc:484] RpcDispatched Asynchronously
10296 ..  0205 12:25:15.831795 (+  4088us) tablet_rpc.cc:290] Done(OK)
10297 ..  0205 12:25:15.831803 (+     8us) async_rpc.cc:606] ProcessResponseFromTserver(OK)
10298 ..  ..  Related trace:
10299 ..  ..  0205 12:25:15.827676 (+     0us) outbound_call.cc:204] Outbound Call initiated.
10300 ..  ..  0205 12:25:15.827706 (+    30us) reactor.cc:721] Scheduled.
10301 ..  ..  0205 12:25:15.827874 (+   168us) outbound_call.cc:452] Queued.
10302 ..  ..  0205 12:25:15.827933 (+    59us) outbound_call.cc:462] Call Sent.
10303 ..  ..  0205 12:25:15.831736 (+  3803us) outbound_call.cc:408] Response received.
10304 ..  ..  0205 12:25:15.831766 (+    30us) outbound_call.cc:370] Callback called asynchronously.
10305 ..  Related trace:
10306 ..  0205 12:25:15.832039 (+     0us) async_rpc.cc:391] WriteRpc initiated to 5c0f03796883403abd632638d5215eac
10307 ..  0205 12:25:15.832058 (+    19us) async_rpc.cc:163] SendRpc() called.
10308 ..  0205 12:25:15.832058 (+     0us) tablet_rpc.cc:95] SelectTabletServer()
10309 ..  0205 12:25:15.832060 (+     2us) async_rpc.cc:478] SendRpcToTserver
10310 ..  0205 12:25:15.832087 (+    27us) async_rpc.cc:484] RpcDispatched Asynchronously
10311 ..  0205 12:25:15.835545 (+  3458us) tablet_rpc.cc:290] Done(OK)
10312 ..  0205 12:25:15.835553 (+     8us) async_rpc.cc:606] ProcessResponseFromTserver(OK)
10313 ..  ..  Related trace:
10314 ..  ..  0205 12:25:15.832064 (+     0us) outbound_call.cc:204] Outbound Call initiated.
10315 ..  ..  0205 12:25:15.832087 (+    23us) reactor.cc:721] Scheduled.
10316 ..  ..  0205 12:25:15.832259 (+   172us) outbound_call.cc:452] Queued.
10317 ..  ..  0205 12:25:15.832294 (+    35us) outbound_call.cc:462] Call Sent.
10318 ..  ..  0205 12:25:15.835356 (+  3062us) outbound_call.cc:408] Response received.
10319 ..  ..  0205 12:25:15.835368 (+    12us) outbound_call.cc:370] Callback called asynchronously.
10320 ..  Related trace:
10321 ..  0205 12:25:15.838140 (+     0us) async_rpc.cc:391] WriteRpc initiated to 1c8125ff1b6846c69593eec56b114693
10322 ..  0205 12:25:15.838149 (+     9us) async_rpc.cc:163] SendRpc() called.
10323 ..  0205 12:25:15.838149 (+     0us) tablet_rpc.cc:95] SelectTabletServer()
10324 ..  0205 12:25:15.838167 (+    18us) async_rpc.cc:478] SendRpcToTserver
10325 ..  0205 12:25:15.838206 (+    39us) async_rpc.cc:484] RpcDispatched Asynchronously
10326 ..  0205 12:25:15.842364 (+  4158us) tablet_rpc.cc:290] Done(OK)
10327 ..  0205 12:25:15.842371 (+     7us) async_rpc.cc:606] ProcessResponseFromTserver(OK)
10328 ..  ..  Related trace:
10329 ..  ..  0205 12:25:15.838171 (+     0us) outbound_call.cc:204] Outbound Call initiated.
10330 ..  ..  0205 12:25:15.838205 (+    34us) reactor.cc:721] Scheduled.
10331 ..  ..  0205 12:25:15.838381 (+   176us) outbound_call.cc:452] Queued.
10332 ..  ..  0205 12:25:15.838427 (+    46us) outbound_call.cc:462] Call Sent.
10333 ..  ..  0205 12:25:15.842239 (+  3812us) outbound_call.cc:408] Response received.
10334 ..  ..  0205 12:25:15.842254 (+    15us) outbound_call.cc:370] Callback called asynchronously.
10335 ..  Related trace:
10336 ..  0205 12:25:15.842824 (+     0us) async_rpc.cc:391] WriteRpc initiated to 126f184381e94c2ea1787aa4a212129a
10337 ..  0205 12:25:15.842831 (+     7us) async_rpc.cc:163] SendRpc() called.
10338 ..  0205 12:25:15.842832 (+     1us) tablet_rpc.cc:95] SelectTabletServer()
10339 ..  0205 12:25:15.842834 (+     2us) async_rpc.cc:478] SendRpcToTserver
10340 ..  0205 12:25:15.842861 (+    27us) async_rpc.cc:484] RpcDispatched Asynchronously
10341 ..  0205 12:25:15.845426 (+  2565us) tablet_rpc.cc:290] Done(OK)
10342 ..  0205 12:25:15.845433 (+     7us) async_rpc.cc:606] ProcessResponseFromTserver(OK)
```

Reviewers: rsami, jason, sergei, zyu

Reviewed By: zyu

Subscribers: ybase, bogdan

Differential Revision: https://phabricator.dev.yugabyte.com/D10557
polarweasel pushed a commit to lizayugabyte/yugabyte-db that referenced this issue Mar 9, 2021
…yPgGate

Summary:
Clear the postgres_process_memctxs global PgMemctx object map in the YBCDestroyPgGate function, right after destroying the PgApiImpl object. If we don't do this, then we are leaving the destruction order of objects whose last refcount is held by the PgMemctx objects in the global map up to chance. Some of them contain Trace objects, which hold pointers to arenas allocated from a global thread-safe pool of arenas, but that thread-safe pool could have already been destroyed by the time the global PgMemctx map is being destructed. Global order of destructors in C++ is not well defined, and we should not rely on it.

Example crash stack trace: https://gist.githubusercontent.com/mbautin/37df1b73963b9e9cb92294be69560512/raw

Also see the stack traces in yugabyte#7105

It would be good to move the global PgMemctx map into the PgApiImpl object in the future. That is tracked in yugabyte#7216.

Test Plan: Jenkins: enable clang 11

Reviewers: neil, mihnea, dmitry

Reviewed By: dmitry

Subscribers: yql

Differential Revision: https://phabricator.dev.yugabyte.com/D10589
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants