Core dump in string allocation trace in LTO build #11033
Comments
Another similar stack trace:
mbautin added a commit that referenced this issue on Jan 16, 2022
…lang 12

Summary:
Link-time optimization (https://llvm.org/docs/LinkTimeOptimization.html) allows for more aggressive optimizations, including inlining, compared to the shared library based model that we currently use. This diff enables link-time optimization for the Clang 12 Linuxbrew-based release build for the yb-tserver executable only, producing a binary that statically links all object files needed by yb-tserver, including those that are included in the yb_pgbackend library. Third-party libraries are being linked statically but they are not LTO-enabled yet.

The linking of the final LTO-enabled binary is currently being done outside of the CMake build system, using the dependency_graph.py tool that can access the dependency graph of targets and object files, and therefore has all the information needed to construct the linker command line. This also gives us more flexibility in customizing the linker command line compared to attempts to do this in the CMake build system. Moving this linking step to CMake may be a future project.

Refactored the dependency_graph.py script into multiple modules: dependency_graph.py, dep_graph_common.py, source_files.py, as well as lto.py (with the new LTO logic). Also refactored master_main.cc and tablet_server_main.cc and extracted common initialization code to tserver/server_main_util.cc. It is in the tserver directory because the master code currently uses the tserver code.

For building LTO-enabled binaries, we need to use LLVM's lld linker. It has issues with our distributed compilation framework (#11034). Fixing this by always running LLD-enabled linking commands locally and not on a remote build worker.

Various static initialization issues were identified and fixed as part of this work. If not fixed, these would result in the yb-tserver binary crashing immediately with a core dump.
- In consensus_queue.cc, the RpcThrottleThresholdBytesValidator function for validating the rpc_throttle_threshold_bytes flag was trying to access other flags before they were fully initialized. Moved this validation to the main program.
- The webserver_doc_root flag was calling yb::GetDefaultDocumentRoot() to determine its default value. Moved that default value determination to where the flag is being used.
- [#11033] The INTERNAL_TRACE_EVENT_ADD_SCOPED macro, when invoked during static initialization, led to a crash in std::string construction. Added a new atomic trace_events_enabled for enabling trace events and only turned it on after main() started executing. The INTERNAL_TRACE_EVENT_ADD_SCOPED macro is a no-op before trace_events_enabled is set to true.
- [#10964] The kGlobalTransactionTableName global constant of the YBTableName type relied on the statically initialized string constant, kGlobalTransactionsTableName, which turned out to be empty during initialization. As a result, the transaction status table could not be properly located. Changed kGlobalTransactionsTableName to be a `const char*`.

In addition, in the LTO-enabled build, it became apparent that some symbols were duplicated between the gperftools library and the gutil part of YugabyteDB code (#10956):
- AtomicOps_Internalx86CPUFeatures -- renamed to YbAtomicOps_Internalx86CPUFeatures
- RunningOnValgrind -- renamed to YbRunningOnValgrind
- ValgrindSlowdown -- renamed to YbValgrindSlowdown
- base::internal::SpinLockDelay, base::internal::SpinLockWake -- added a top-level yb namespace

To enable easily switching between regular and LTO binaries, we are updating yb-ctl to support YB_CTL_TSERVER_DAEMON_FILE_NAME and YB_CTL_MASTER_DAEMON_FILE_NAME environment variables. For example, by setting YB_CTL_TSERVER_DAEMON_FILE_NAME=yb-tserver-lto, you can tell yb-ctl to launch the tablet server using build/latest/bin/yb-tserver-lto.
However, for the release package, the LTO-enabled yb-tserver executable will still be named yb-tserver, replacing the previous shared library based executable.

Another tooling change in this diff is how we handle the `--no-tests` flag passed to `yb_build.sh`. That flag results in setting the YB_DO_NOT_BUILD_TESTS environment variable to 1, and our CMake scripts skip all the test targets. However, it is easy to forget to keep specifying that flag. In this diff, we are storing the variable BUILD_TESTS in CMake's build cache and reusing it during future CMake runs, without the developer having to specify `--no-tests`. It can be reset by setting YB_DO_NOT_BUILD_TESTS=0.

Test Plan:
Jenkins

```
# ./yb_build.sh --clang12 release
# build-support/tserver_lto.sh
# ldd build/latest/bin/yb-tserver-lto
        linux-vdso.so.1 (0x00007fff535bf000)
        libm.so.6 => /opt/yb-build/brew/linuxbrew-20181203T161736v9/lib/libm.so.6 (0x00007f1b85b7d000)
        libgcc_s.so.1 => /opt/yb-build/brew/linuxbrew-20181203T161736v9/lib/libgcc_s.so.1 (0x00007f1b85966000)
        libc.so.6 => /opt/yb-build/brew/linuxbrew-20181203T161736v9/lib/libc.so.6 (0x00007f1b855ca000)
        /opt/yb-build/brew/linuxbrew-20181203T161736v9/lib/ld.so => /lib64/ld-linux-x86-64.so.2 (0x00007f1b85e80000)
        libdl.so.2 => /opt/yb-build/brew/linuxbrew-20181203T161736v9/lib/libdl.so.2 (0x00007f1b853c6000)
        libpthread.so.0 => /opt/yb-build/brew/linuxbrew-20181203T161736v9/lib/libpthread.so.0 (0x00007f1b851a9000)
        librt.so.1 => /opt/yb-build/brew/linuxbrew-20181203T161736v9/lib/librt.so.1 (0x00007f1b84fa1000)
```

The yb-tserver-lto binary is ~326 MiB.

Microbenchmark
--------------

The test was done on a dual-socket Xeon E5-2670 machine (16 cores total, 32 hyper-threads) running AlmaLinux 8.5.

Details: https://gist.githubusercontent.com/mbautin/7f9784fb2ea4173539d2e2656cfe117f/raw

Results (CassandraKeyValue workload):
- 78K ops/sec with GCC 5.5
- 85K ops/sec with Clang 12 without LTO
- 104K ops/sec with Clang 12 with LTO
Reviewers: sergei

Reviewed By: sergei

Subscribers: sergei, bogdan

Differential Revision: https://phabricator.dev.yugabyte.com/D14616
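The fix for #11033 described in the commit message (an atomic flag that keeps trace events disabled until main() starts) can be illustrated with a minimal sketch. This is not the YugabyteDB code: the `sketch` namespace and the names `TraceEvent`, `RecordedEvents`, `EarlyTracer`, and `EnableTraceEvents` are hypothetical, and only `trace_events_enabled` is taken from the commit message. It assumes the flag is a constant-initialized `std::atomic<bool>`, so that it already reads as false while other static initializers are still running.

```cpp
#include <atomic>
#include <string>
#include <vector>

namespace sketch {

// Named after the flag in the commit message; the real definition may differ.
// std::atomic<bool> with a constant argument is constant-initialized, so this
// is already false before any dynamic static initializer runs.
std::atomic<bool> trace_events_enabled{false};

// Hypothetical event sink. The function-local static avoids relying on the
// initialization order of globals across translation units.
std::vector<std::string>& RecordedEvents() {
  static std::vector<std::string> events;
  return events;
}

// Hypothetical stand-in for the trace macro: a no-op until tracing is enabled,
// so no std::string is constructed during static initialization.
inline void TraceEvent(const char* name) {
  if (!trace_events_enabled.load(std::memory_order_acquire)) {
    return;
  }
  RecordedEvents().emplace_back(name);
}

// Simulates code that emits a trace event from a static initializer.
struct EarlyTracer {
  EarlyTracer() { TraceEvent("during_static_init"); }
};
EarlyTracer early_tracer;

// Intended to be called once at the top of main().
inline void EnableTraceEvents() {
  trace_events_enabled.store(true, std::memory_order_release);
}

}  // namespace sketch
```

In this sketch, the event emitted by `early_tracer` is dropped, and events are recorded only after `sketch::EnableTraceEvents()` has been called from main().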
https://gist.githubusercontent.com/mbautin/f1918c1c44b002eab7820f4a486a537e/raw