Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Core dump in string allocation trace in LTO build #11033

Closed
mbautin opened this issue Jan 8, 2022 · 2 comments
Closed

Core dump in string allocation trace in LTO build #11033

mbautin opened this issue Jan 8, 2022 · 2 comments
Assignees

Comments

@mbautin
Copy link
Contributor

mbautin commented Jan 8, 2022

https://gist.githubusercontent.com/mbautin/f1918c1c44b002eab7820f4a486a537e/raw

(lldb) * thread #1, name = 'yb-tserver-lto', stop reason = signal SIGSEGV
  * frame #0: 0x0000000002bc949e yb-tserver-lto`std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >::replace(unsigned long, unsigned long, char const*, unsigned long) + 158
    frame #1: 0x00000000037ca9b7 yb-tserver-lto`yb::debug::TraceLog::AddTraceEventWithThreadIdAndTimestamp(char, unsigned char const*, char const*, unsigned long, int, long const&, int, char const**, unsigned char const*, unsigned long const*, scoped_refptr<yb::debug::ConvertableToTraceFormat> const*, unsigned char) [inlined] sys_futex(u=0x00000000066fe190, o=<unavailable>, v=<unavailable>, t=0x0000000000000000, u2=0x0000000000000000, v2=0) at linux_syscall_support.h:3401:14
    frame #2: 0x00000000037ca9af yb-tserver-lto`yb::debug::TraceLog::AddTraceEventWithThreadIdAndTimestamp(char, unsigned char const*, char const*, unsigned long, int, long const&, int, char const**, unsigned char const*, unsigned long const*, scoped_refptr<yb::debug::ConvertableToTraceFormat> const*, unsigned char) [inlined] base::internal::yb_SpinLockWake(w=0x00000000066fe190, all=false) at spinlock_linux-inl.h:108
    frame #3: 0x00000000037ca99f yb-tserver-lto`yb::debug::TraceLog::AddTraceEventWithThreadIdAndTimestamp(char, unsigned char const*, char const*, unsigned long, int, long const&, int, char const**, unsigned char const*, unsigned long const*, scoped_refptr<yb::debug::ConvertableToTraceFormat> const*, unsigned char) [inlined] base::SpinLock::SlowUnlock(this=0x00000000066fe190, wait_cycles=<unavailable>) at spinlock.cc:172
    frame #4: 0x00000000037ca99f yb-tserver-lto`yb::debug::TraceLog::AddTraceEventWithThreadIdAndTimestamp(char, unsigned char const*, char const*, unsigned long, int, long const&, int, char const**, unsigned char const*, unsigned long const*, scoped_refptr<yb::debug::ConvertableToTraceFormat> const*, unsigned char) [inlined] base::SpinLock::Unlock(this=0x00000000066fe190) at spinlock.h:125
    frame #5: 0x00000000037ca996 yb-tserver-lto`yb::debug::TraceLog::AddTraceEventWithThreadIdAndTimestamp(char, unsigned char const*, char const*, unsigned long, int, long const&, int, char const**, unsigned char const*, unsigned long const*, scoped_refptr<yb::debug::ConvertableToTraceFormat> const*, unsigned char) [inlined] base::SpinLockHolder::~SpinLockHolder(this=<unavailable>) at spinlock.h:166
    frame #6: 0x00000000037ca996 yb-tserver-lto`yb::debug::TraceLog::AddTraceEventWithThreadIdAndTimestamp(char, unsigned char const*, char const*, unsigned long, int, long const&, int, char const**, unsigned char const*, unsigned long const*, scoped_refptr<yb::debug::ConvertableToTraceFormat> const*, unsigned char) [inlined] yb::debug::TraceLog::EventToConsoleMessage(this=0x00000000066fe000, phase=<unavailable>, timestamp=<unavailable>) at trace_event_impl.cc:1971
    frame #7: 0x00000000037c9ee7 yb-tserver-lto`yb::debug::TraceLog::AddTraceEventWithThreadIdAndTimestamp(this=<unavailable>, phase=<unavailable>, category_group_enabled="", name=0x0000000000000000, id=<unavailable>, thread_id=<unavailable>, timestamp=0x0000000000000000, num_args=4836632, arg_names=0x00007fff0049cce0, arg_types="", arg_values=0x0000000000000000, convertable_values=0x0000000000000000, flags='\0') at trace_event_impl.cc:1884
    frame #8: 0x00000000037ca329 yb-tserver-lto`yb::debug::TraceLog::AddTraceEventWithThreadIdAndTimestamp(char, unsigned char const*, char const*, unsigned long, int, long const&, int, char const**, unsigned char const*, unsigned long const*, scoped_refptr<yb::debug::ConvertableToTraceFormat> const*, unsigned char) [inlined] std::__1::char_traits<char>::length(__s="") at __string:0:53
    frame #9: 0x00000000037ca31f yb-tserver-lto`yb::debug::TraceLog::AddTraceEventWithThreadIdAndTimestamp(char, unsigned char const*, char const*, unsigned long, int, long const&, int, char const**, unsigned char const*, unsigned long const*, scoped_refptr<yb::debug::ConvertableToTraceFormat> const*, unsigned char) [inlined] std::__1::basic_ostream<char, std::__1::char_traits<char> >& std::__1::operator<<<std::__1::char_traits<char> >(__os=0x00007fff0049c9b0, __str="") at ostream:868
    frame #10: 0x00000000037ca31f yb-tserver-lto`yb::debug::TraceLog::AddTraceEventWithThreadIdAndTimestamp(char, unsigned char const*, char const*, unsigned long, int, long const&, int, char const**, unsigned char const*, unsigned long const*, scoped_refptr<yb::debug::ConvertableToTraceFormat> const*, unsigned char) [inlined] yb::debug::TraceEvent::AppendPrettyPrinted(out=0x00007fff0049c9b0) const at trace_event_impl.cc:841
    frame #11: 0x00000000037ca2db yb-tserver-lto`yb::debug::TraceLog::AddTraceEventWithThreadIdAndTimestamp(char, unsigned char const*, char const*, unsigned long, int, long const&, int, char const**, unsigned char const*, unsigned long const*, scoped_refptr<yb::debug::ConvertableToTraceFormat> const*, unsigned char) [inlined] yb::debug::TraceLog::EventToConsoleMessage(this=0x00007fff0049ce40, phase=<unavailable>, timestamp=<unavailable>) at trace_event_impl.cc:1961
    frame #12: 0x00000000037c9ee7 yb-tserver-lto`yb::debug::TraceLog::AddTraceEventWithThreadIdAndTimestamp(this=<unavailable>, phase=<unavailable>, category_group_enabled="", name="", id=<unavailable>, thread_id=<unavailable>, timestamp=0x0000000000000000, num_args=0, arg_names=0x0000000000000000, arg_types=0x0000000000000000, arg_values=0x0000000000000000, convertable_values=0x0000000000000000, flags='p') at trace_event_impl.cc:1884
    frame #13: 0x00000000037e2192 yb-tserver-lto`yb::(anonymous namespace)::GetFilesystemStats(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) [inlined] std::__1::basic_ostream<char, std::__1::char_traits<char> >& std::__1::operator<<<std::__1::char_traits<char> >(__os=<unavailable>, __str=<unavailable>) at ostream:868:12
    frame #14: 0x00000000037e2189 yb-tserver-lto`yb::(anonymous namespace)::GetFilesystemStats(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) [inlined] yb::StatusCheck(value=<unavailable>) at status.cc:624
    frame #15: 0x00000000037e2161 yb-tserver-lto`yb::(anonymous namespace)::GetFilesystemStats(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) [inlined] yb::Result<statvfs>::Result(this=0x00007fff0049cd18, status=0x00007fff0049cc30) at result.h:100
    frame #16: 0x00000000037e2144 yb-tserver-lto`yb::(anonymous namespace)::GetFilesystemStats(path=<unavailable>) at env_posix.cc:274
    frame #17: 0x00000000037ebe0a yb-tserver-lto`_GLOBAL__sub_I_flags.cc [inlined] __cxx_global_var_init.43 at flags.cc:0
    frame #18: 0x00000000037ebde5 yb-tserver-lto`_GLOBAL__sub_I_flags.cc at flags.cc:0
    frame #19: 0x00000000034e2467 yb-tserver-lto`_GLOBAL__sub_I_abstract_tablet.cc [inlined] std::__1::char_traits<char>::copy(__n=27) at __string:388:56
    frame #20: 0x00000000034e2456 yb-tserver-lto`_GLOBAL__sub_I_abstract_tablet.cc [inlined] std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >::__init(this=Summary Unavailable, __s=<unavailable>, __sz=27) at string:1876
    frame #21: 0x00000000034e2437 yb-tserver-lto`_GLOBAL__sub_I_abstract_tablet.cc [inlined] std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >::basic_string<std::nullptr_t>(this=Summary Unavailable, __s=<unavailable>) at string:841
    frame #22: 0x00000000034e2437 yb-tserver-lto`_GLOBAL__sub_I_abstract_tablet.cc [inlined] std::__1::pair<rocksdb::Tickers, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >::pair<rocksdb::Tickers, char const (this=0x00007fff0049ce40, __u1=<unavailable>, __u2=<unavailable>) [28], false>(rocksdb::Tickers&&, char const (&) [28]) at utility:447
    frame #23: 0x00000000034e242d yb-tserver-lto`_GLOBAL__sub_I_abstract_tablet.cc [inlined] __cxx_global_var_init.11 at statistics.h:248
    frame #24: 0x00000000034e1e43 yb-tserver-lto`_GLOBAL__sub_I_abstract_tablet.cc at abstract_tablet.cc:0

@mbautin mbautin self-assigned this Jan 8, 2022
@mbautin
Copy link
Contributor Author

mbautin commented Jan 11, 2022

A workaround for this is to define INTERNAL_TRACE_EVENT_ADD_SCOPED to be a no-op.

However, trace-test starts failing when this is done.
image

@mbautin
Copy link
Contributor Author

mbautin commented Jan 11, 2022

Another similar stack trace:

(lldb) bt
* thread #1, name = 'yb-tserver', stop reason = signal SIGSEGV
  * frame #0: 0x0000000002b759de yb-tserver`std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >::basic_string(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) + 14
    frame #1: 0x000000000371f357 yb-tserver`Singleton<yb::debug::TraceLog>::Init() at trace_event_impl.cc:1158:19
    frame #2: 0x000000000371f114 yb-tserver`Singleton<yb::debug::TraceLog>::Init() [inlined] Singleton<yb::debug::TraceLog>::CreateInstance() at singleton.h:138
    frame #3: 0x000000000371f0f4 yb-tserver`Singleton<yb::debug::TraceLog>::Init() at singleton.h:131
    frame #4: 0x00000000031cca83 yb-tserver`GoogleOnceInternalInit(control=0x0000000004298518, func=<unavailable>, func_with_arg=0x0000000000000000, arg=0x0000000000000000)(), void (*)(void*), void*) at once.cc:53:7
    frame #5: 0x0000000003739283 yb-tserver`yb::(anonymous namespace)::PosixEnv::IsDirectory(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, bool*) [inlined] GoogleOnceInit(state=<unavailable>, func=<unavailable>)()) at once.h:69:5
    frame #6: 0x0000000003739274 yb-tserver`yb::(anonymous namespace)::PosixEnv::IsDirectory(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, bool*) [inlined] Singleton<yb::debug::TraceLog>::get() at singleton.h:93
    frame #7: 0x0000000003739274 yb-tserver`yb::(anonymous namespace)::PosixEnv::IsDirectory(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, bool*) [inlined] yb::debug::TraceLog::GetInstance() at trace_event_impl.cc:1125
    frame #8: 0x0000000003739274 yb-tserver`yb::(anonymous namespace)::PosixEnv::IsDirectory(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, bool*) [inlined] yb::debug::TraceLog::GetCategoryGroupEnabled(category_group=<unavailable>) at trace_event_impl.cc:1172
    frame #9: 0x0000000003739274 yb-tserver`yb::(anonymous namespace)::PosixEnv::IsDirectory(this=<unavailable>, path="/home/mbautin/code/yugabyte-db20/build/release-clang12-linuxbrew-dynamic-ninja/bin/www", is_dir=0x00007fff3772e4f7) at env_posix.cc:1234
    frame #10: 0x000000000373d305 yb-tserver`yb::env_util::GetRootDir(search_for_dir=<unavailable>) at env_util.cc:91:37
    frame #11: 0x0000000002d751de yb-tserver`_GLOBAL__sub_I_webserver_options.cc [inlined] yb::GetDefaultDocumentRoot() at webserver_options.cc:95:27
    frame #12: 0x0000000002d751c6 yb-tserver`_GLOBAL__sub_I_webserver_options.cc [inlined] __cxx_global_var_init.9 at webserver_options.cc:63
    frame #13: 0x0000000002d751c6 yb-tserver`_GLOBAL__sub_I_webserver_options.cc at webserver_options.cc:0
    frame #14: 0x0000000002bdce1d yb-tserver`__libc_csu_init(argc=1, argv=0x00007fff3772e6b8, envp=0x00007fff3772e6c8) at elf-init.c:88
    frame #15: 0x00007f31a54477b5 libc.so.6`__libc_start_main(main=(yb-tserver`main at tablet_server_main.cc:332), argc=1, argv=0x00007fff3772e6b8, init=(yb-tserver`__libc_csu_init at elf-init.c:68), fini=<unavailable>, rtld_fini=<unavailable>, stack_end=0x00007fff3772e6a8) at libc-start.c:245
    frame #16: 0x000000000260302e yb-tserver`_start + 46

mbautin added a commit that referenced this issue Jan 16, 2022
…lang 12

Summary:
Link-time optimization ( https://llvm.org/docs/LinkTimeOptimization.html ) allows for more aggressive optimizations, including inlining, compared to the shared library based model that we currently use.

This diff enables link-time optimization for the Clang 12 Linuxbrew-based release build for the yb-tserver executable only, producing a binary that statically links all object files needed by yb-tserver, including those that are included in the yb_pgbackend library. Third-party libraries are being linked statically but they are not LTO-enabled yet.

The linking of the final LTO-enabled binary is currently being done outside of the CMake build system, using the dependency_graph.py tool that can access the dependency graph of targets and object files, and therefore has all the information needed to construct the linker command line. This also gives us more flexibility customizing the linker command line compared to attempts to do this in the CMake build system. Moving this linking step to CMake may be a future project.

Refactored the dependency_graph.py script into multiple modules: dependency_graph.py, dep_graph_common.py, source_files.py, as well as lto.py (with the new LTO logic).

Also refactored master_main.cc and tablet_server_main.cc and extracted common initialization code to tserver/server_main_util.cc. It is in the tserver directory because the master code currently uses the tserver code.

For building LTO-enabled binaries, we need to use LLVM's lld linker. It has issues with our distributed compilation framework ( #11034 ). Fixing this by always running LLD-enabled linking commands locally and not on a remote build worker.

Various static initialization issues were identified as fixed as part of this work. If not fixed, these would result in the yb-tserver binary crashing immediately with a core dump.
- In consensus_queue.cc, the RpcThrottleThresholdBytesValidator function for validating the rpc_throttle_threshold_bytes flag was trying to access other flags before they were fully initialized. Moved this validation to the main program.
- The webserver_doc_root flag was calling yb::GetDefaultDocumentRoot() to determine its default value. Moved that default value determination to where the flag is being used.
- [ #11033 ] The INTERNAL_TRACE_EVENT_ADD_SCOPED macro, when invoked during static initialization, led to a crash in std::string construction. Added a new atomic trace_events_enabled for enabling trace events and only turned it on after main() started executing. The INTERNAL_TRACE_EVENT_ADD_SCOPED is a no-op before trace_events_enabled is set to true.
- [ #10964 ] The kGlobalTransactionTableName global constant of the YBTableName type relied on the statically initialized string constant, kGlobalTransactionsTableName, which turned out to be empty during initialization. As a result, the transaction status table could not be properly located. Changed kGlobalTransactionsTableName to be a `const char*`.

In addition, in the LTO-enable build, it became apparent that some symbols were duplicated between the gperftools library and the gutil part of YugabyteDB code ( #10956 ):
- AtomicOps_Internalx86CPUFeatures -- renamed to YbAtomicOps_Internalx86CPUFeatures
- RunningOnValgrind -- renamed to YbRunningOnValgrind
- ValgrindSlowdown -- renamed to YbValgrindSlowdown
- base::internal::SpinLockDelay, base::internal::SpinLockWake -- added a top-level yb namespace

To enable easily switching between regular and LTO binaries, we are updating yb-ctl to support YB_CTL_TSERVER_DAEMON_FILE_NAME and YB_CTL_MASTER_DAEMON_FILE_NAME environment variables. For example, by setting YB_CTL_TSERVER_DAEMON_FILE_NAME=yb-tserver-lto, you can tell yb-ctl to launch the tablet server using build/latest/bin/yb-tserver-lto. However, for the release package, the LTO-enabled yb-tserver executable will still be named yb-tserver, replacing the previous shared library based executable.

Another tooling change in this diff is how we handle the `--no-tests` flag passed to `yb_build.sh`. That flag results in setting the YB_DO_NOT_BUILD_TESTS environment variable to 1, and our CMake scripts skip all the test targets. However, it is easy to forget to keep specifying that flag. In this diff, we are storing the variable BUILD_TESTS in CMake's build cache, and reuse it during future CMake runs, without the developer having to specify `--no-tests`. It can be reset by setting YB_DO_NOT_BUILD_TESTS=0.

Test Plan:
Jenkins

```
# ./yb_build.sh --clang12 release
# build-support/tserver_lto.sh
# ldd build/latest/bin/yb-tserver-lto
	linux-vdso.so.1 (0x00007fff535bf000)
	libm.so.6 => /opt/yb-build/brew/linuxbrew-20181203T161736v9/lib/libm.so.6 (0x00007f1b85b7d000)
	libgcc_s.so.1 => /opt/yb-build/brew/linuxbrew-20181203T161736v9/lib/libgcc_s.so.1 (0x00007f1b85966000)
	libc.so.6 => /opt/yb-build/brew/linuxbrew-20181203T161736v9/lib/libc.so.6 (0x00007f1b855ca000)
	/opt/yb-build/brew/linuxbrew-20181203T161736v9/lib/ld.so => /lib64/ld-linux-x86-64.so.2 (0x00007f1b85e80000)
	libdl.so.2 => /opt/yb-build/brew/linuxbrew-20181203T161736v9/lib/libdl.so.2 (0x00007f1b853c6000)
	libpthread.so.0 => /opt/yb-build/brew/linuxbrew-20181203T161736v9/lib/libpthread.so.0 (0x00007f1b851a9000)
	librt.so.1 => /opt/yb-build/brew/linuxbrew-20181203T161736v9/lib/librt.so.1 (0x00007f1b84fa1000)
```
The yb-tserver-lto is ~326 MiB.

Microbenchmark
--------------
The test was done on a dual-socket Xeon E5-2670 machine (16 cores total, 32 hyper-threads) running AlmaLinux 8.5.
Details: https://gist.githubusercontent.com/mbautin/7f9784fb2ea4173539d2e2656cfe117f/raw
Results (CassandraKeyValue workload): 78K ops/sec with GCC 5.5, 85K ops/sec with Clang 12 without LTO, 104K ops/sec with Clang 12 with LTO.

Reviewers: sergei

Reviewed By: sergei

Subscribers: sergei, bogdan

Differential Revision: https://phabricator.dev.yugabyte.com/D14616
@mbautin mbautin closed this as completed Jan 16, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant