[YSQL] A large number of YSQL test failures in macOS debug mode #2509

Closed
mbautin opened this issue Oct 3, 2019 · 3 comments

@mbautin
Contributor

mbautin commented Oct 3, 2019


@mbautin mbautin changed the title A large number of YSQL test failures in macOS debug mode [YSQL] A large number of YSQL test failures in macOS debug mode Oct 3, 2019
@ndeodhar ndeodhar assigned d-uspenskiy and unassigned ndeodhar Oct 4, 2019
@d-uspenskiy
Contributor

d-uspenskiy commented Oct 8, 2019

It looks like a PostgreSQL server process spawned by the postmaster crashes with a segmentation fault on one of the tablet servers during startup:

```
[ts-1] 2019-10-03 17:38:34.944 PDT [70562] LOG: server process (PID 70669) was terminated by signal 11: Segmentation fault
[ts-1] 2019-10-03 17:38:34.944 PDT [70562] LOG: terminating any other active server processes
[ts-1] 2019-10-03 17:38:34.951 PDT [70562] LOG: all server processes terminated; reinitializing
```

The symptoms are the same across several failed builds and tests.
I have not been able to reproduce it manually.
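The `LOG` lines above are printed by the postmaster after it reaps a child process that was killed by a signal. The following is a minimal POSIX sketch of that detection, not PostgreSQL's actual reaper code; `run_crashing_child` is a hypothetical name, and the child raises `SIGSEGV` on purpose to stand in for the real crash:

```c
#include <signal.h>
#include <stdio.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: forks a child that dies from SIGSEGV and reaps it the
 * way a supervising parent would. Returns the terminating signal number,
 * 0 if the child exited normally, or -1 on error. */
static int run_crashing_child(void) {
    fflush(stdout); /* avoid duplicating buffered output in the child */
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        /* Child: disable core dumps for this demo, restore the default
         * SIGSEGV action, then crash on purpose. */
        struct rlimit no_core = {0, 0};
        (void) setrlimit(RLIMIT_CORE, &no_core);
        signal(SIGSEGV, SIG_DFL);
        raise(SIGSEGV);
        _exit(0); /* not reached: the default SIGSEGV action terminates */
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) {
        perror("waitpid");
        return -1;
    }
    if (WIFSIGNALED(status)) {
        /* The same information the postmaster logs: child PID and signal. */
        printf("server process (PID %d) was terminated by signal %d\n",
               (int) pid, WTERMSIG(status));
        return WTERMSIG(status);
    }
    return 0;
}
```

In the log above, the postmaster then goes on to terminate the remaining server processes and reinitialize.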

@d-uspenskiy
Contributor

d-uspenskiy commented Nov 7, 2019

Here is the backtrace from a core dump of the postgres process:

    2019-10-31 15:15:18,307 (main) [WARN - org.yb.util.CoreFileUtil.processCoreFile(CoreFileUtil.java:107)] Analyzing core file using the command: [/Volumes/net/v1/yb-macmini-4.dev.yugabyte.com/jenkins/jenkins-github-yugabyte-db-phabricator-15025/build-support/analyze_core_file.sh, --core, /cores/core.20754, --executable, /Volumes/net/v1/yb-macmini-4.dev.yugabyte.com/jenkins/jenkins-github-yugabyte-db-phabricator-15025/build/debug-clang-dynamic-ninja/postgres/bin/postgres]
    Found a core file at '/cores/core.20754', backtrace:
    + echo 'thread backtrace all'
    + lldb /Volumes/net/v1/yb-macmini-4.dev.yugabyte.com/jenkins/jenkins-github-yugabyte-db-phabricator-15025/build/debug-clang-dynamic-ninja/postgres/bin/postgres -c /cores/core.20754
    + egrep -v '^\[New LWP [0-9]+\]$'
    + /Volumes/net/v1/yb-macmini-4.dev.yugabyte.com/jenkins/jenkins-github-yugabyte-db-phabricator-15025/build-support/dedup_thread_stacks.py
    + tee -a /dev/null
    (lldb) target create "/Volumes/net/v1/yb-macmini-4.dev.yugabyte.com/jenkins/jenkins-github-yugabyte-db-phabricator-15025/build/debug-clang-dynamic-ninja/postgres/bin/postgres" --core "/cores/core.20754"
    Core file '/cores/core.20754' (x86_64) was loaded.
    (lldb) thread backtrace all
    error: connection.cc.o DWARF DIE at 0x00084714 (class Reactor) has a member variable 0x000848ab (process_outbound_queue_task_) whose type is a forward declaration, not a complete definition.
    Try compiling the source file with -fstandalone-debug
    * thread #1, stop reason = signal SIGSTOP
      * frame #0: 0x00007fff7bc7d86a libsystem_kernel.dylib`__psynch_cvwait + 10
        frame #1: 0x00007fff7bd3c56e libsystem_pthread.dylib`_pthread_cond_wait + 722
        frame #2: 0x00007fff78d77a0a libc++.1.dylib`std::__1::condition_variable::wait(std::__1::unique_lock<std::__1::mutex>&) + 18
        frame #3: 0x00007fff78d8096b libc++.1.dylib`std::__1::__assoc_sub_state::__sub_wait(std::__1::unique_lock<std::__1::mutex>&) + 45
        frame #4: 0x00000001101c742d libyb_client.dylib`std::__1::__assoc_state<yb::client::YBClient*>::copy(this=0x000000011914a460) at future:708:11
        frame #5: 0x00000001101c2e8d libyb_client.dylib`yb::client::AsyncClientInitialiser::client(this=<unavailable>) const at async_initializer.cc:69:25
        frame #6: 0x000000010f745133 libyb_pggate.dylib`yb::pggate::PgApiImpl::CreateSession(this=0x00000001192c5c00, pg_env=<unavailable>, database_name="yugabyte", pg_session=0x000000010f19b830) at pggate.cc:158:48
        frame #7: 0x000000010f73eda8 libyb_pggate.dylib`::YBCPgCreateSession(pg_env=0x0000000000000000, database_name="yugabyte", pg_session=0x000000010f19b830) at ybc_pggate.cc:85:29
        frame #8: 0x000000010ee47c64 postgres`YBInitPostgresBackend(program_name=<unavailable>, db_name="yugabyte", user_name="yugabyte") at pg_yb_utils.c:0
        frame #9: 0x000000010ee35ae5 postgres`InitPostgres(in_dbname="yugabyte", dboid=0, username="yugabyte", useroid=0, out_dbname=0x0000000000000000, override_allow_connections=<unavailable>) at postinit.c:0
        frame #10: 0x000000010ed4890f postgres`PostgresMain(argc=1, argv=0x000000011931a460, dbname=<unavailable>, username="yugabyte") at postgres.c:4048:2
        frame #11: 0x000000010ecd7bb6 postgres`BackendRun(port=0x000000010f098eb2) at postmaster.c:4398:2
        frame #12: 0x000000010ecd744a postgres`BackendStartup(port=<unavailable>) at postmaster.c:4064:3
        frame #13: 0x000000010ecd6c51 postgres`ServerLoop at postmaster.c:1724:7
        frame #14: 0x000000010ecd4caa postgres`PostmasterMain(argc=19, argv=0x000000011914a1e0) at postmaster.c:1387:11
        frame #15: 0x000000010ec43a9e postgres`PostgresServerProcessMain(argc=<unavailable>, argv=<unavailable>) at main.c:234:3
        frame #16: 0x000000010ec43db2 postgres`main + 34
        frame #17: 0x00007fff7bb453d5 libdyld.dylib`start + 1
      thread #2, stop reason = signal SIGSTOP
        frame #0: 0x00007fff7bc8078e libsystem_kernel.dylib`kevent + 10
        frame #1: 0x00000001121029e7 libyrpc.dylib`boost::asio::detail::kqueue_reactor::run(this=0x0000000119390000, usec=<unavailable>, ops=0x0000700001264d50) at kqueue_reactor.ipp:407:20
        frame #2: 0x00000001121026e6 libyrpc.dylib`boost::asio::detail::scheduler::do_run_one(this=0x00000001191a0120, lock=0x0000700001264d68, this_thread=0x0000700001264d40, ec=0x0000700001264db8) at scheduler.ipp:385:16
        frame #3: 0x0000000112102501 libyrpc.dylib`boost::asio::detail::scheduler::run(this=0x00000001191a0120, ec=0x0000700001264db8) at scheduler.ipp:154:10
        frame #4: 0x00000001120fe499 libyrpc.dylib`yb::rpc::IoThreadPool::Impl::Execute(this=<unavailable>) at io_thread_pool.cc:77:17
        frame #5: 0x00000001151ddf64 libyb_util.dylib`yb::Thread::SuperviseThread(arg=<unavailable>) at thread.cc:741:3
        frame #6: 0x00007fff7bd392eb libsystem_pthread.dylib`_pthread_body + 126
        frame #7: 0x00007fff7bd3c249 libsystem_pthread.dylib`_pthread_start + 66
        frame #8: 0x00007fff7bd3840d libsystem_pthread.dylib`thread_start + 13
      thread #3, stop reason = signal SIGSTOP
        frame #0: 0x00007fff7bc7d86a libsystem_kernel.dylib`__psynch_cvwait + 10
        frame #1: 0x00007fff7bd3c56e libsystem_pthread.dylib`_pthread_cond_wait + 722
        frame #2: 0x00000001121033ab libyrpc.dylib`void boost::asio::detail::posix_event::wait<boost::asio::detail::conditionally_enabled_mutex::scoped_lock>(this=0x00000001191a01a8, lock=0x00007000012e7d68) at posix_event.hpp:106:7
        frame #3: 0x0000000112102656 libyrpc.dylib`boost::asio::detail::scheduler::do_run_one(this=0x00000001191a0120, lock=0x00007000012e7d68, this_thread=0x00007000012e7d40, ec=0x00007000012e7db8) at scheduler.ipp:409:21
        frame #4: 0x0000000112102501 libyrpc.dylib`boost::asio::detail::scheduler::run(this=0x00000001191a0120, ec=0x00007000012e7db8) at scheduler.ipp:154:10
        frame #5: 0x00000001120fe499 libyrpc.dylib`yb::rpc::IoThreadPool::Impl::Execute(this=<unavailable>) at io_thread_pool.cc:77:17
        frame #6: 0x00000001151ddf64 libyb_util.dylib`yb::Thread::SuperviseThread(arg=<unavailable>) at thread.cc:741:3
        frame #7: 0x00007fff7bd392eb libsystem_pthread.dylib`_pthread_body + 126
        frame #8: 0x00007fff7bd3c249 libsystem_pthread.dylib`_pthread_start + 66
        frame #9: 0x00007fff7bd3840d libsystem_pthread.dylib`thread_start + 13
      thread #4, stop reason = signal SIGSTOP
        frame #0: 0x00007fff7bc7d86a libsystem_kernel.dylib`__psynch_cvwait + 10
        frame #1: 0x00007fff7bd3c56e libsystem_pthread.dylib`_pthread_cond_wait + 722
        frame #2: 0x00000001121033ab libyrpc.dylib`void boost::asio::detail::posix_event::wait<boost::asio::detail::conditionally_enabled_mutex::scoped_lock>(this=0x00000001191a01a8, lock=0x000070000136ad68) at posix_event.hpp:106:7
        frame #3: 0x0000000112102656 libyrpc.dylib`boost::asio::detail::scheduler::do_run_one(this=0x00000001191a0120, lock=0x000070000136ad68, this_thread=0x000070000136ad40, ec=0x000070000136adb8) at scheduler.ipp:409:21
        frame #4: 0x0000000112102501 libyrpc.dylib`boost::asio::detail::scheduler::run(this=0x00000001191a0120, ec=0x000070000136adb8) at scheduler.ipp:154:10
        frame #5: 0x00000001120fe499 libyrpc.dylib`yb::rpc::IoThreadPool::Impl::Execute(this=<unavailable>) at io_thread_pool.cc:77:17
        frame #6: 0x00000001151ddf64 libyb_util.dylib`yb::Thread::SuperviseThread(arg=<unavailable>) at thread.cc:741:3
        frame #7: 0x00007fff7bd392eb libsystem_pthread.dylib`_pthread_body + 126
        frame #8: 0x00007fff7bd3c249 libsystem_pthread.dylib`_pthread_start + 66
        frame #9: 0x00007fff7bd3840d libsystem_pthread.dylib`thread_start + 13
      thread #5, stop reason = signal SIGSTOP
        frame #0: 0x00007fff7bc7d86a libsystem_kernel.dylib`__psynch_cvwait + 10
        frame #1: 0x00007fff7bd3c56e libsystem_pthread.dylib`_pthread_cond_wait + 722
        frame #2: 0x00000001121033ab libyrpc.dylib`void boost::asio::detail::posix_event::wait<boost::asio::detail::conditionally_enabled_mutex::scoped_lock>(this=0x00000001191a01a8, lock=0x00007000013edd68) at posix_event.hpp:106:7
        frame #3: 0x0000000112102656 libyrpc.dylib`boost::asio::detail::scheduler::do_run_one(this=0x00000001191a0120, lock=0x00007000013edd68, this_thread=0x00007000013edd40, ec=0x00007000013eddb8) at scheduler.ipp:409:21
        frame #4: 0x0000000112102501 libyrpc.dylib`boost::asio::detail::scheduler::run(this=0x00000001191a0120, ec=0x00007000013eddb8) at scheduler.ipp:154:10
        frame #5: 0x00000001120fe499 libyrpc.dylib`yb::rpc::IoThreadPool::Impl::Execute(this=<unavailable>) at io_thread_pool.cc:77:17
        frame #6: 0x00000001151ddf64 libyb_util.dylib`yb::Thread::SuperviseThread(arg=<unavailable>) at thread.cc:741:3
        frame #7: 0x00007fff7bd392eb libsystem_pthread.dylib`_pthread_body + 126
        frame #8: 0x00007fff7bd3c249 libsystem_pthread.dylib`_pthread_start + 66
        frame #9: 0x00007fff7bd3840d libsystem_pthread.dylib`thread_start + 13
      thread #6, stop reason = signal SIGSTOP
        frame #0: 0x00007fff7bc7a332 libsystem_kernel.dylib`swtch_pri + 10
        frame #1: 0x00007fff7bd3be90 libsystem_pthread.dylib`sched_yield + 11
        frame #2: 0x00000001151dda05 libyb_util.dylib`yb::Thread::StartThread(category="rpc_thread_pool", name="rpc_tp_pggate_ybclient_0", functor=yb::Thread::ThreadFunctor @ 0x0000700001470460, holder=0x000000011919eea8)>, scoped_refptr<yb::Thread>*) at thread.cc:694:7
        frame #3: 0x00000001121a8f6a libyrpc.dylib`yb::Status yb::Thread::Create<void (yb::rpc::(anonymous namespace)::Worker::*)(), yb::rpc::(anonymous namespace)::Worker*>(category=<unavailable>, name=<unavailable>, f=<unavailable>, a1=<unavailable>, holder=<unavailable>)::Worker::* const&)(), yb::rpc::(anonymous namespace)::Worker* const&, scoped_refptr<yb::Thread>*) at thread.h:156:12
        frame #4: 0x00000001121a8d7a libyrpc.dylib`yb::rpc::(anonymous namespace)::Worker::Worker(this=0x000000011919eea0, share=<unavailable>, index=0)::ThreadPoolShare*, unsigned long) at thread_pool.cc:58:5
        frame #5: 0x00000001121a2d46 libyrpc.dylib`yb::rpc::ThreadPool::Impl::Enqueue(this=0x0000000119176500, task=0x000000011934cc20) at thread_pool.cc:199:35
        frame #6: 0x000000011211df76 libyrpc.dylib`yb::rpc::OutboundCall::InvokeCallback(this=0x000000011934cb60) at outbound_call.cc:369:28
        frame #7: 0x000000011211e2a0 libyrpc.dylib`yb::rpc::OutboundCall::SetResponse(this=0x000000011934cb60, resp=<unavailable>) at outbound_call.cc:427:7
        frame #8: 0x00000001120e5c7b libyrpc.dylib`yb::rpc::Connection::HandleCallResponse(this=0x00000001191a0eb8, call_data=<unavailable>) at connection.cc:312:9
        frame #9: 0x00000001121ae223 libyrpc.dylib`yb::rpc::YBOutboundConnectionContext::HandleCall(this=<unavailable>, connection=<unavailable>, call_data=<unavailable>) at yb_rpc.cc:463:22
        frame #10: 0x00000001121ae23e libyrpc.dylib`non-virtual thunk to yb::rpc::YBOutboundConnectionContext::HandleCall(this=<unavailable>, connection=<unavailable>, call_data=<unavailable>) at yb_rpc.cc:0
        frame #11: 0x00000001120de48a libyrpc.dylib`yb::rpc::BinaryCallParser::Parse(this=0x00000001192d58e8, connection=std::__1::shared_ptr<yb::rpc::Connection>::element_type @ 0x00000001191a0eb8 strong=2 weak=3, data=0x0000700001470b70, read_buffer_full=(value_ = false)) at binary_call_parser.cc:85:7
        frame #12: 0x00000001121ae553 libyrpc.dylib`yb::rpc::YBOutboundConnectionContext::ProcessCalls(this=<unavailable>, connection=<unavailable>, data=<unavailable>, read_buffer_full=<unavailable>) at yb_rpc.cc:484:19
        frame #13: 0x00000001120e5730 libyrpc.dylib`yb::rpc::Connection::ProcessReceived(this=0x00000001191a0eb8, data=0x0000700001470b70, read_buffer_full=(value_ = false)) at connection.cc:275:27
        frame #14: 0x000000011219d2c6 libyrpc.dylib`yb::rpc::TcpStream::TryProcessReceived(this=0x000000011916f2c0) at tcp_stream.cc:354:17
        frame #15: 0x000000011219cb43 libyrpc.dylib`yb::rpc::TcpStream::ReadHandler(this=0x000000011916f2c0) at tcp_stream.cc:303:31
        frame #16: 0x000000011219c994 libyrpc.dylib`yb::rpc::TcpStream::Handler(this=0x000000011916f2c0, watcher=<unavailable>, revents=1) at tcp_stream.cc:251:14
        frame #17: 0x000000011636dc4b libev.4.dylib`ev_invoke_pending + 107
        frame #18: 0x000000011636e90a libev.4.dylib`ev_run + 3226
        frame #19: 0x0000000112148319 libyrpc.dylib`yb::rpc::Reactor::RunThread(this=0x000000011934c000) at reactor.cc:481:9
        frame #20: 0x00000001151ddf64 libyb_util.dylib`yb::Thread::SuperviseThread(arg=<unavailable>) at thread.cc:741:3
        frame #21: 0x00007fff7bd392eb libsystem_pthread.dylib`_pthread_body + 126
        frame #22: 0x00007fff7bd3c249 libsystem_pthread.dylib`_pthread_start + 66
        frame #23: 0x00007fff7bd3840d libsystem_pthread.dylib`thread_start + 13
      thread #7, stop reason = signal SIGSTOP
        frame #0: 0x00007fff7bc7a332 libsystem_kernel.dylib`swtch_pri + 10
        frame #1: 0x00007fff7bd3be90 libsystem_pthread.dylib`sched_yield + 11
        frame #2: 0x000000011667d083 libgutil.dylib`base::internal::SpinLockDelay(w=<unavailable>, value=<unavailable>, loop=1) at spinlock_posix-inl.h:63:5
        frame #3: 0x000000011667cf50 libgutil.dylib`base::SpinLock::SlowLock(this=0x00000001191a8298) at spinlock.cc:151:5
        frame #4: 0x0000000115104900 libyb_util.dylib`yb::MetricEntity::FindOrCreateHistogram(this=0x00000001191a8270, proto=0x00000001121eff30) at metrics.h:1322:36
        frame #5: 0x000000011510488a libyb_util.dylib`yb::HistogramPrototype::Instantiate(this=<unavailable>, entity=<unavailable>) at metrics.cc:746:18
        frame #6: 0x00000001120e382f libyrpc.dylib`yb::rpc::Connection::Connection(this=0x00000001191a0d98, reactor=0x000000011934c240, stream=<unavailable>, direction=<unavailable>, rpc_metrics=0x00000001192ceea0, context=unique_ptr<yb::rpc::ConnectionContext, std::__1::default_delete<yb::rpc::ConnectionContext> > @ 0x00000001191a0e80) at connection.cc:87:48
        frame #7: 0x000000011215bb64 libyrpc.dylib`std::__1::__compressed_pair_elem<yb::rpc::Connection, 1, false>::__compressed_pair_elem<yb::rpc::Reactor*&&, std::__1::unique_ptr<yb::rpc::Stream, std::__1::default_delete<yb::rpc::Stream> >&&, yb::rpc::ConnectionDirection&&, yb::rpc::RpcMetrics*&&, std::__1::unique_ptr<yb::rpc::ConnectionContext, std::__1::default_delete<yb::rpc::ConnectionContext> >&&, 0ul, 1ul, 2ul, 3ul, 4ul>(this=<unavailable>, (null)=<unavailable>, __args=size=5, (null)=<unavailable>) at memory:2156:9
        frame #8: 0x000000011215b6d1 libyrpc.dylib`std::__1::__shared_ptr_emplace<yb::rpc::Connection, std::__1::allocator<yb::rpc::Connection> >::__shared_ptr_emplace<yb::rpc::Reactor*, std::__1::unique_ptr<yb::rpc::Stream, std::__1::default_delete<yb::rpc::Stream> >, yb::rpc::ConnectionDirection, yb::rpc::RpcMetrics*, std::__1::unique_ptr<yb::rpc::ConnectionContext, std::__1::default_delete<yb::rpc::ConnectionContext> > >(this=0x00000001191a0d80, __a=allocator<yb::rpc::Connection> @ 0x00007000014f3980, __args=<unavailable>, __args=<unavailable>, __args=<unavailable>, __args=<unavailable>, __args=<unavailable>) at memory:3672:16
        frame #9: 0x000000011215b32d libyrpc.dylib`std::__1::shared_ptr<yb::rpc::Connection> std::__1::shared_ptr<yb::rpc::Connection>::make_shared<yb::rpc::Reactor*, std::__1::unique_ptr<yb::rpc::Stream, std::__1::default_delete<yb::rpc::Stream> >, yb::rpc::ConnectionDirection, yb::rpc::RpcMetrics*, std::__1::unique_ptr<yb::rpc::ConnectionContext, std::__1::default_delete<yb::rpc::ConnectionContext> > >(__args=<unavailable>, __args=<unavailable>, __args=<unavailable>, __args=<unavailable>, __args=<unavailable>) at memory:4331:26
        frame #10: 0x000000011214c99f libyrpc.dylib`std::__1::enable_if<!(is_array<yb::rpc::Connection>::value), std::__1::shared_ptr<yb::rpc::Connection> >::type std::__1::make_shared<yb::rpc::Connection, yb::rpc::Reactor*, std::__1::unique_ptr<yb::rpc::Stream, std::__1::default_delete<yb::rpc::Stream> >, yb::rpc::ConnectionDirection, yb::rpc::RpcMetrics*, std::__1::unique_ptr<yb::rpc::ConnectionContext, std::__1::default_delete<yb::rpc::ConnectionContext> > >(__args=<unavailable>, __args=<unavailable>, __args=<unavailable>, __args=<unavailable>, __args=<unavailable>) at memory:4710:12
        frame #11: 0x000000011214b4bc libyrpc.dylib`yb::rpc::Reactor::FindOrStartConnection(this=0x000000011934c240, conn_id=0x000000011934cdc0, hostname="127.0.1.227", deadline=<unavailable>, conn=0x00007000014f3bb8) at reactor.cc:574:21
        frame #12: 0x000000011214acd3 libyrpc.dylib`yb::rpc::Reactor::AssignOutboundCall(this=0x000000011934c240, call=std::__1::shared_ptr<yb::rpc::OutboundCall>::element_type @ 0x000000011934cda0 strong=2 weak=2) at reactor.cc:389:14
        frame #13: 0x0000000112147896 libyrpc.dylib`yb::rpc::Reactor::ProcessOutboundQueue(this=0x000000011934c240) at reactor.cc:679:17
        frame #14: 0x0000000112152279 libyrpc.dylib`yb::rpc::FunctorReactorTask<std::__1::__bind<void (yb::rpc::Reactor::*)(), yb::rpc::Reactor*> >::Run(this=<unavailable>, reactor=0x000000011934c240) at reactor.h:136:5
        frame #15: 0x000000011214a74b libyrpc.dylib`yb::rpc::Reactor::AsyncHandler(this=0x000000011934c240, watcher=<unavailable>, revents=<unavailable>) at reactor.cc:357:11
        frame #16: 0x000000011636dc4b libev.4.dylib`ev_invoke_pending + 107
        frame #17: 0x000000011636e90a libev.4.dylib`ev_run + 3226
        frame #18: 0x0000000112148319 libyrpc.dylib`yb::rpc::Reactor::RunThread(this=0x000000011934c240) at reactor.cc:481:9
        frame #19: 0x00000001151ddf64 libyb_util.dylib`yb::Thread::SuperviseThread(arg=<unavailable>) at thread.cc:741:3
        frame #20: 0x00007fff7bd392eb libsystem_pthread.dylib`_pthread_body + 126
        frame #21: 0x00007fff7bd3c249 libsystem_pthread.dylib`_pthread_start + 66
        frame #22: 0x00007fff7bd3840d libsystem_pthread.dylib`thread_start + 13
      thread #8, stop reason = signal SIGSTOP
        frame #0: 0x00007fff7bd53e34 libsystem_trace.dylib`_os_log_cmp_key + 4
        frame #1: 0x00007fff7bbfcb74 libsystem_c.dylib`rb_tree_find_node + 53
        frame #2: 0x00007fff7bd52021 libsystem_trace.dylib`os_log_create + 368
        frame #3: 0x00007fff7bc5b127 libsystem_info.dylib`gai_log_init + 23
        frame #4: 0x00007fff7bd37ce3 libsystem_pthread.dylib`__pthread_once_handler + 65
        frame #5: 0x00007fff7bd2daab libsystem_platform.dylib`_os_once_callout + 18
        frame #6: 0x00007fff7bd37c7f libsystem_pthread.dylib`pthread_once + 56
        frame #7: 0x00007fff7bc5a4ab libsystem_info.dylib`gai_log + 27
        frame #8: 0x00007fff7bc5b33f libsystem_info.dylib`_gai_load_libnetwork_once + 63
        frame #9: 0x00007fff7bd37ce3 libsystem_pthread.dylib`__pthread_once_handler + 65
        frame #10: 0x00007fff7bd2daab libsystem_platform.dylib`_os_once_callout + 18
        frame #11: 0x00007fff7bd37c7f libsystem_pthread.dylib`pthread_once + 56
        frame #12: 0x00007fff7bc5b29b libsystem_info.dylib`_gai_load_libnetwork + 27
        frame #13: 0x00007fff7bc5b64f libsystem_info.dylib`_gai_nat64_v4_address_requires_synthesis + 31
        frame #14: 0x00007fff7bc5aaa0 libsystem_info.dylib`_gai_nat64_second_pass + 512
        frame #15: 0x00007fff7bc39847 libsystem_info.dylib`si_addrinfo + 1959
        frame #16: 0x00007fff7bc38f77 libsystem_info.dylib`_getaddrinfo_internal + 231
        frame #17: 0x00007fff7bc38e7d libsystem_info.dylib`getaddrinfo + 61
        frame #18: 0x000000011512f8e5 libyb_util.dylib`yb::GetFQDN(hostname="yb-macmini-5.dev.yugabyte.com") at net_util.cc:371:20
        frame #19: 0x0000000110244657 libyb_client.dylib`yb::client::YBClient::Data::InitLocalHostNames(this=0x000000011934c6c0) at client-internal.cc:768:12
        frame #20: 0x00000001101e753f libyb_client.dylib`yb::client::YBClientBuilder::DoBuild(this=0x00000001192c5e90, messenger=<unavailable>, client=0x0000700001576e60) at client.cc:359:3
        frame #21: 0x00000001101e7c16 libyb_client.dylib`yb::client::YBClientBuilder::Build(this=<unavailable>, messenger=<unavailable>) at client.cc:377:3
        frame #22: 0x00000001101c2cdf libyb_client.dylib`yb::client::AsyncClientInitialiser::InitClient(this=0x00000001192c5e90) at async_initializer.cc:75:35
        frame #23: 0x00000001101c6daf libyb_client.dylib`void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, std::__1::__bind<void (yb::client::AsyncClientInitialiser::*)(), yb::client::AsyncClientInitialiser*> > >(__vp=<unavailable>) at thread:352:5
        frame #24: 0x00007fff7bd392eb libsystem_pthread.dylib`_pthread_body + 126
        frame #25: 0x00007fff7bd3c249 libsystem_pthread.dylib`_pthread_start + 66
        frame #26: 0x00007fff7bd3840d libsystem_pthread.dylib`thread_start + 13
      thread #9, stop reason = signal SIGSTOP
        frame #0: 0x00007fff7bc7d86a libsystem_kernel.dylib`__psynch_cvwait + 10
        frame #1: 0x00007fff7bd3c56e libsystem_pthread.dylib`_pthread_cond_wait + 722
        frame #2: 0x00007fff78d77a0a libc++.1.dylib`std::__1::condition_variable::wait(std::__1::unique_lock<std::__1::mutex>&) + 18
        frame #3: 0x00000001121a9fd6 libyrpc.dylib`yb::rpc::(anonymous namespace)::Worker::PopTask(this=0x000000011919eea0, task=0x00007000015f9db8) at thread_pool.cc:128:13
        frame #4: 0x00000001121a9003 libyrpc.dylib`yb::rpc::(anonymous namespace)::Worker::Execute(this=0x000000011919eea0) at thread_pool.cc:98:11
        frame #5: 0x00000001151ddf64 libyb_util.dylib`yb::Thread::SuperviseThread(arg=<unavailable>) at thread.cc:741:3
        frame #6: 0x00007fff7bd392eb libsystem_pthread.dylib`_pthread_body + 126
        frame #7: 0x00007fff7bd3c249 libsystem_pthread.dylib`_pthread_start + 66
        frame #8: 0x00007fff7bd3840d libsystem_pthread.dylib`thread_start + 13
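In thread #8, `getaddrinfo()` (called from `yb::GetFQDN`) is lazily initializing macOS resolver and logging state through nested `pthread_once` calls, inside a backend that was forked from the postmaster (see thread #1). The sketch below is a generic POSIX illustration, not macOS-specific, and its names are hypothetical. It shows why running such an initializer in the parent before `fork()` changes the picture: a `pthread_once_t` that has already completed is copied into the child, so the child does not run the initializer again.

```c
#include <pthread.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for a library's lazily initialized global state. */
static pthread_once_t lib_once = PTHREAD_ONCE_INIT;
static int init_calls = 0;

static void lib_init(void) { init_calls++; }

/* Runs the once-initializer in the parent, forks, and calls pthread_once
 * again in the child. Returns the child's view of init_calls (1 means the
 * initializer did not rerun in the child), or -1 on error. */
static int init_calls_seen_by_child(void) {
    pthread_once(&lib_once, lib_init); /* parent: runs lib_init */
    fflush(NULL);
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        pthread_once(&lib_once, lib_init); /* child: already done, no-op */
        _exit(init_calls);
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```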

d-uspenskiy added a commit that referenced this issue Nov 23, 2019
Summary:
Our test environment already analyzes core dumps of most processes such
as yb-master and yb-tserver, and prints their symbolized stack traces
into the log.  To analyze a core dump of any child process, the test
environment needs to know the child process id and the full path to its
executable file, and we already have this information for masters and
tablet servers.

However, a PostgreSQL process can also crash and leave a core dump.
This diff adds a new test flag, process_info_dir. When it is
specified, each PostgreSQL process (either the postmaster or a backend)
creates a file in this directory named after its pid, containing the
path to its executable. This allows us to properly symbolize core file
stack traces for PostgreSQL processes.

Test Plan: Jenkins

Reviewers: mikhail

Reviewed By: mikhail

Subscribers: yql

Differential Revision: https://phabricator.dev.yugabyte.com/D7495
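A minimal sketch of the per-process file described in this commit, assuming the flag's value arrives as a directory path and the caller already knows its executable path. `write_process_info` and `read_process_info` are hypothetical names, not the functions added in D7495:

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

enum { PROCESS_INFO_PATH_LEN = 4096 };

/* Hypothetical: writes "<dir>/<pid>" containing exe_path. A NULL or empty
 * dir means the flag is unset, so nothing is written. Returns 0 on success. */
static int write_process_info(const char *dir, const char *exe_path) {
    if (dir == NULL || dir[0] == '\0')
        return 0;
    char path[PROCESS_INFO_PATH_LEN];
    int n = snprintf(path, sizeof(path), "%s/%d", dir, (int) getpid());
    if (n < 0 || (size_t) n >= sizeof(path))
        return -1;
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    int ok = fputs(exe_path, f) >= 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Hypothetical test-harness side: given the pid from a core file name,
 * look up the executable path to hand to the debugger. */
static int read_process_info(const char *dir, pid_t pid, char *out, size_t out_len) {
    char path[PROCESS_INFO_PATH_LEN];
    int n = snprintf(path, sizeof(path), "%s/%d", dir, (int) pid);
    if (n < 0 || (size_t) n >= sizeof(path))
        return -1;
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    char *line = fgets(out, (int) out_len, f);
    fclose(f);
    return line != NULL ? 0 : -1;
}
```

Because backends are forked from the postmaster, both would normally record the same executable path; the pid is what distinguishes their files.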
@mbautin
Contributor Author

mbautin commented Jan 2, 2020

Relevant links:

It looks like setting OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES in the tablet server process (the parent process of the postgres process) might help.
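A sketch of that suggestion, assuming the tablet server launches postgres with `fork()` and `exec()`; `spawn_with_fork_safety_disabled` is a hypothetical name. It also assumes the Objective-C runtime reads `OBJC_DISABLE_INITIALIZE_FORK_SAFETY` when a process starts, so the variable has to be in the environment postgres is started with rather than set later from inside postgres:

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical launcher: runs argv[0] with
 * OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES in its environment.
 * Returns the child's exit status, or -1 on error. */
static int spawn_with_fork_safety_disabled(char *const argv[]) {
    /* Set in the parent so the exec'd child inherits it via environ.
     * Assumption: the runtime reads it at process startup, so it would
     * apply to the postmaster and to the backends it forks. */
    if (setenv("OBJC_DISABLE_INITIALIZE_FORK_SAFETY", "YES", 1) != 0)
        return -1;
    fflush(NULL); /* avoid duplicating buffered output in the child */
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execv(argv[0], argv);
        _exit(127); /* exec failed */
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```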

mbautin added a commit that referenced this issue Jan 3, 2020
…up on macOS

Summary:
Add a DNS lookup of the local hostname to postmaster startup to force
macOS network libraries to get initialized before any fork() calls
happen.  This fixes failures of ~20 tests in macOS debug mode. Without
this, PostgreSQL backends would frequently crash with SIGSEGV and dump
cores when trying to do the same DNS lookup.

Here is a SIGSEGV stack trace that we would previously get without this
fix:

```
frame #0: 0x00007fff7bd53e34 libsystem_trace.dylib`_os_log_cmp_key + 4
frame #1: 0x00007fff7bbfcb74 libsystem_c.dylib`rb_tree_find_node + 53
frame #2: 0x00007fff7bd52021 libsystem_trace.dylib`os_log_create + 368
frame #3: 0x00007fff7bc5b127 libsystem_info.dylib`gai_log_init + 23
frame #4: 0x00007fff7bd37ce3 libsystem_pthread.dylib`__pthread_once_handler + 65
frame #5: 0x00007fff7bd2daab libsystem_platform.dylib`_os_once_callout + 18
frame #6: 0x00007fff7bd37c7f libsystem_pthread.dylib`pthread_once + 56
frame #7: 0x00007fff7bc5a4ab libsystem_info.dylib`gai_log + 27
frame #8: 0x00007fff7bc5b33f libsystem_info.dylib`_gai_load_libnetwork_once + 63
frame #9: 0x00007fff7bd37ce3 libsystem_pthread.dylib`__pthread_once_handler + 65
frame #10: 0x00007fff7bd2daab libsystem_platform.dylib`_os_once_callout + 18
frame #11: 0x00007fff7bd37c7f libsystem_pthread.dylib`pthread_once + 56
frame #12: 0x00007fff7bc5b29b libsystem_info.dylib`_gai_load_libnetwork + 27
frame #13: 0x00007fff7bc5b64f libsystem_info.dylib`_gai_nat64_v4_address_requires_synthesis + 31
frame #14: 0x00007fff7bc5aaa0 libsystem_info.dylib`_gai_nat64_second_pass + 512
frame #15: 0x00007fff7bc39847 libsystem_info.dylib`si_addrinfo + 1959
frame #16: 0x00007fff7bc38f77 libsystem_info.dylib`_getaddrinfo_internal + 231
frame #17: 0x00007fff7bc38e7d libsystem_info.dylib`getaddrinfo + 61
frame #18: 0x000000011512f8e5 libyb_util.dylib`yb::GetFQDN(hostname="...") at net_util.cc:371:20
```

Test Plan: Jenkins

Reviewers: mihnea, dmitry

Reviewed By: dmitry

Subscribers: yql

Differential Revision: https://phabricator.dev.yugabyte.com/D7757
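A minimal sketch of the idea in D7757, with hypothetical function names; the real change lives in postmaster startup rather than in a standalone helper. The parent resolves its own hostname once before it starts forking, so lazily initialized resolver state already exists when a forked child performs the same lookup:

```c
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical: best-effort lookup of the local hostname. Only the side
 * effect (initializing the system resolver in this process) matters, so
 * callers may ignore the result. Returns the getaddrinfo() error code. */
static int warm_up_resolver(void) {
    char host[256];
    if (gethostname(host, sizeof(host)) != 0)
        return EAI_FAIL;
    host[sizeof(host) - 1] = '\0'; /* may be unterminated if truncated */
    struct addrinfo hints;
    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;
    hints.ai_flags = AI_CANONNAME; /* also request the canonical name */
    struct addrinfo *res = NULL;
    int rc = getaddrinfo(host, NULL, &hints, &res);
    if (rc == 0)
        freeaddrinfo(res);
    return rc;
}

/* Hypothetical postmaster-like flow: warm up once, then fork a "backend"
 * that performs the same lookup. Returns the child's exit status or -1. */
static int fork_backend_after_warm_up(void) {
    (void) warm_up_resolver();
    fflush(NULL);
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        (void) warm_up_resolver();
        _exit(0);
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Because the lookup result is ignored, a failed lookup (for example, with no DNS available) does not abort startup; the goal is only to initialize the library state before `fork()`.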