[#17875] Do not collect stack traces of threads doing memory allocation
Summary:
When a stack trace is captured from a signal handler while the thread receiving the signal is in the middle of a memory allocation or deallocation, the process can crash. See Google TCMalloc issue google/tcmalloc#189.

In this diff, we use the IsCurThreadInAllocDealloc malloc extension API, which we added in yugabyte/tcmalloc@677ba2d, to skip capturing the stack trace when the signal interrupts a thread that is currently allocating or deallocating memory. In such cases, we produce an empty stack trace, which is later omitted from the overall thread dump. #17889 is a follow-up issue for retrying stack trace collection in such cases.
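The guard described above can be sketched in portable C++. This is a minimal illustration under stated assumptions, not the code from this commit: `t_in_alloc_dealloc` is a hypothetical thread-local flag standing in for `tcmalloc::MallocExtension::IsCurThreadInAllocDealloc()`, `CollectFramesStub()` is a hypothetical placeholder for `StackTrace::Collect()`, and `raise()` delivers the signal to the calling thread so that the behavior is deterministic.

```cpp
#include <atomic>
#include <cerrno>
#include <csignal>

// Hypothetical stand-in for tcmalloc::MallocExtension::IsCurThreadInAllocDealloc():
// a per-thread flag that an instrumented allocator would set while inside
// malloc/free. In this sketch the caller sets it directly to simulate that state.
thread_local volatile std::sig_atomic_t t_in_alloc_dealloc = 0;

// Result slot written by the handler. Only lock-free atomic stores happen in the
// handler, which keeps it async-signal-safe.
static_assert(std::atomic<int>::is_always_lock_free, "handler needs a lock-free atomic");
std::atomic<int> g_recorded_frames{-1};

// Hypothetical placeholder for StackTrace::Collect(): a real version would unwind
// the interrupted thread's stack. Here it just reports a fixed depth.
int CollectFramesStub() { return 3; }

void HandleStackTraceSignal(int /*signum*/) {
  const int saved_errno = errno;  // A handler must leave errno unchanged.
  if (t_in_alloc_dealloc) {
    // Interrupted inside the allocator: unwinding here could crash, so record
    // an empty trace (0 frames) instead of collecting one.
    g_recorded_frames.store(0, std::memory_order_relaxed);
  } else {
    g_recorded_frames.store(CollectFramesStub(), std::memory_order_relaxed);
  }
  errno = saved_errno;
}

// Installs the handler, simulates the allocator state, and signals the calling
// thread. Per POSIX, raise() does not return until the handler has finished.
int CaptureOwnTrace(bool simulate_in_allocator) {
  std::signal(SIGUSR1, HandleStackTraceSignal);
  g_recorded_frames.store(-1);
  t_in_alloc_dealloc = simulate_in_allocator ? 1 : 0;
  std::raise(SIGUSR1);
  t_in_alloc_dealloc = 0;
  return g_recorded_frames.load();
}
```

`CaptureOwnTrace(false)` records the stub depth, while `CaptureOwnTrace(true)` records an empty trace, which is the same decision the patched `HandleStackTraceSignal` makes before calling `Collect`.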

The TCMalloc version we are upgrading to also includes yugabyte/tcmalloc@d1b0e69, which adds an option not to seed the lifetime profiler with live allocations. We now pass seed_with_live_allocs = false to StartLifetimeProfiling when capturing an allocation profile.
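To make the seed_with_live_allocs option concrete, here is a toy model. It is a sketch built on our reading of the option name, not TCMalloc's lifetime profiler: the `ToyHeap` class and its methods are hypothetical, and the model only captures which allocations a profile covers. With seeding, allocations already live when profiling starts are included; without it, only allocations made after the start are.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Toy model of profile scoping. Hypothetical; this is not TCMalloc's implementation.
class ToyHeap {
 public:
  using Id = std::uint64_t;

  Id Allocate(std::size_t bytes) {
    const Id id = next_id_++;
    live_[id] = bytes;
    if (profiling_) tracked_[id] = bytes;  // New allocations always join an active profile.
    return id;
  }

  void Free(Id id) {
    live_.erase(id);
    tracked_.erase(id);
  }

  // seed_with_live_allocs == true: allocations live at this point join the profile.
  // seed_with_live_allocs == false: the profile starts empty and sees only later ones.
  void StartLifetimeProfiling(bool seed_with_live_allocs) {
    profiling_ = true;
    tracked_.clear();
    if (seed_with_live_allocs) tracked_ = live_;
  }

  // Stops profiling and returns the bytes of tracked allocations that are still live.
  std::size_t StopAndReportTrackedBytes() {
    profiling_ = false;
    std::size_t total = 0;
    for (const auto& entry : tracked_) total += entry.second;
    tracked_.clear();
    return total;
  }

 private:
  Id next_id_ = 0;
  bool profiling_ = false;
  std::unordered_map<Id, std::size_t> live_;     // Every live allocation.
  std::unordered_map<Id, std::size_t> tracked_;  // Allocations in the current profile.
};
```

For example, after allocating 100 and 200 bytes and then starting an unseeded profile, a 30-byte allocation yields a report of 30 bytes; a seeded profile started afterwards would report 330. Both call sites in this diff pass `/* seed_with_live_allocs= */ false`, which corresponds to the unseeded case in this model.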

Test Plan: Jenkins

Reviewers: asrivastava

Reviewed By: asrivastava

Subscribers: ybase, bogdan

Differential Revision: https://phorge.dev.yugabyte.com/D26349
mbautin committed Jun 23, 2023
1 parent e8c04b7 commit 3b2d97c
Showing 6 changed files with 165 additions and 47 deletions.
86 changes: 43 additions & 43 deletions build-support/thirdparty_archives.yml
@@ -1,149 +1,149 @@
-sha_for_local_checkout: 04b5c61ec3a73ffabdd2faa1f44bebda25193963
+sha_for_local_checkout: 6777477baaa5727cb3eb0d1b8256c1bb9ab4f33e
 archives:
   - os_type: almalinux8
     architecture: x86_64
     compiler_type: clang15
     is_linuxbrew: false
-    sha: 04b5c61ec3a73ffabdd2faa1f44bebda25193963
+    sha: 6777477baaa5727cb3eb0d1b8256c1bb9ab4f33e
     lto_type:
-    tag: v20230519215502-04b5c61ec3-almalinux8-x86_64-clang15
+    tag: v20230621185546-6777477baa-almalinux8-x86_64-clang15
   - os_type: almalinux8
     architecture: x86_64
     compiler_type: clang15
     is_linuxbrew: true
-    sha: 04b5c61ec3a73ffabdd2faa1f44bebda25193963
+    sha: 6777477baaa5727cb3eb0d1b8256c1bb9ab4f33e
     lto_type:
-    tag: v20230519215522-04b5c61ec3-almalinux8-x86_64-clang15-linuxbrew
+    tag: v20230621185609-6777477baa-almalinux8-x86_64-clang15-linuxbrew
   - os_type: almalinux8
     architecture: x86_64
     compiler_type: clang15
     is_linuxbrew: true
-    sha: 04b5c61ec3a73ffabdd2faa1f44bebda25193963
+    sha: 6777477baaa5727cb3eb0d1b8256c1bb9ab4f33e
     lto_type: full
-    tag: v20230519215509-04b5c61ec3-almalinux8-x86_64-clang15-linuxbrew-full-lto
+    tag: v20230621185521-6777477baa-almalinux8-x86_64-clang15-linuxbrew-full-lto
   - os_type: almalinux8
     architecture: x86_64
     compiler_type: clang16
     is_linuxbrew: false
-    sha: 04b5c61ec3a73ffabdd2faa1f44bebda25193963
+    sha: 6777477baaa5727cb3eb0d1b8256c1bb9ab4f33e
     lto_type:
-    tag: v20230519215509-04b5c61ec3-almalinux8-x86_64-clang16
+    tag: v20230621185529-6777477baa-almalinux8-x86_64-clang16
   - os_type: almalinux8
     architecture: x86_64
     compiler_type: clang16
     is_linuxbrew: true
-    sha: 04b5c61ec3a73ffabdd2faa1f44bebda25193963
+    sha: 6777477baaa5727cb3eb0d1b8256c1bb9ab4f33e
     lto_type:
-    tag: v20230519215503-04b5c61ec3-almalinux8-x86_64-clang16-linuxbrew
+    tag: v20230621185625-6777477baa-almalinux8-x86_64-clang16-linuxbrew
   - os_type: almalinux8
     architecture: x86_64
     compiler_type: clang16
     is_linuxbrew: true
-    sha: 04b5c61ec3a73ffabdd2faa1f44bebda25193963
+    sha: 6777477baaa5727cb3eb0d1b8256c1bb9ab4f33e
     lto_type: full
-    tag: v20230519215507-04b5c61ec3-almalinux8-x86_64-clang16-linuxbrew-full-lto
+    tag: v20230621185605-6777477baa-almalinux8-x86_64-clang16-linuxbrew-full-lto
   - os_type: almalinux8
     architecture: x86_64
     compiler_type: gcc11
     is_linuxbrew: false
-    sha: 04b5c61ec3a73ffabdd2faa1f44bebda25193963
+    sha: 6777477baaa5727cb3eb0d1b8256c1bb9ab4f33e
     lto_type:
-    tag: v20230519215506-04b5c61ec3-almalinux8-x86_64-gcc11
+    tag: v20230621185524-6777477baa-almalinux8-x86_64-gcc11
   - os_type: centos7
     architecture: aarch64
     compiler_type: clang15
     is_linuxbrew: false
-    sha: 04b5c61ec3a73ffabdd2faa1f44bebda25193963
+    sha: 6777477baaa5727cb3eb0d1b8256c1bb9ab4f33e
     lto_type:
-    tag: v20230519215639-04b5c61ec3-centos7-aarch64-clang15
+    tag: v20230621185659-6777477baa-centos7-aarch64-clang15
   - os_type: centos7
     architecture: aarch64
     compiler_type: clang15
     is_linuxbrew: false
-    sha: 04b5c61ec3a73ffabdd2faa1f44bebda25193963
+    sha: 6777477baaa5727cb3eb0d1b8256c1bb9ab4f33e
     lto_type: full
-    tag: v20230519215621-04b5c61ec3-centos7-aarch64-clang15-full-lto
+    tag: v20230621185620-6777477baa-centos7-aarch64-clang15-full-lto
   - os_type: centos7
     architecture: aarch64
     compiler_type: clang16
     is_linuxbrew: false
-    sha: 04b5c61ec3a73ffabdd2faa1f44bebda25193963
+    sha: 6777477baaa5727cb3eb0d1b8256c1bb9ab4f33e
     lto_type:
-    tag: v20230519215641-04b5c61ec3-centos7-aarch64-clang16
+    tag: v20230621185700-6777477baa-centos7-aarch64-clang16
   - os_type: centos7
     architecture: aarch64
     compiler_type: clang16
     is_linuxbrew: false
-    sha: 04b5c61ec3a73ffabdd2faa1f44bebda25193963
+    sha: 6777477baaa5727cb3eb0d1b8256c1bb9ab4f33e
     lto_type: full
-    tag: v20230519215609-04b5c61ec3-centos7-aarch64-clang16-full-lto
+    tag: v20230621185616-6777477baa-centos7-aarch64-clang16-full-lto
   - os_type: centos7
     architecture: x86_64
     compiler_type: clang15
     is_linuxbrew: false
-    sha: 04b5c61ec3a73ffabdd2faa1f44bebda25193963
+    sha: 6777477baaa5727cb3eb0d1b8256c1bb9ab4f33e
     lto_type:
-    tag: v20230519215552-04b5c61ec3-centos7-x86_64-clang15
+    tag: v20230621185651-6777477baa-centos7-x86_64-clang15
   - os_type: centos7
     architecture: x86_64
     compiler_type: clang15
     is_linuxbrew: false
-    sha: 04b5c61ec3a73ffabdd2faa1f44bebda25193963
+    sha: 6777477baaa5727cb3eb0d1b8256c1bb9ab4f33e
     lto_type: full
-    tag: v20230519215632-04b5c61ec3-centos7-x86_64-clang15-full-lto
+    tag: v20230621185543-6777477baa-centos7-x86_64-clang15-full-lto
   - os_type: centos7
     architecture: x86_64
     compiler_type: clang16
     is_linuxbrew: false
-    sha: 04b5c61ec3a73ffabdd2faa1f44bebda25193963
+    sha: 6777477baaa5727cb3eb0d1b8256c1bb9ab4f33e
     lto_type:
-    tag: v20230519215625-04b5c61ec3-centos7-x86_64-clang16
+    tag: v20230621185537-6777477baa-centos7-x86_64-clang16
   - os_type: centos7
     architecture: x86_64
     compiler_type: clang16
     is_linuxbrew: false
-    sha: 04b5c61ec3a73ffabdd2faa1f44bebda25193963
+    sha: 6777477baaa5727cb3eb0d1b8256c1bb9ab4f33e
     lto_type: full
-    tag: v20230519215522-04b5c61ec3-centos7-x86_64-clang16-full-lto
+    tag: v20230622043321-6777477baa-centos7-x86_64-clang16-full-lto
   - os_type: centos7
     architecture: x86_64
     compiler_type: gcc11
     is_linuxbrew: false
-    sha: 04b5c61ec3a73ffabdd2faa1f44bebda25193963
+    sha: 6777477baaa5727cb3eb0d1b8256c1bb9ab4f33e
     lto_type:
-    tag: v20230519215529-04b5c61ec3-centos7-x86_64-gcc11
+    tag: v20230621185536-6777477baa-centos7-x86_64-gcc11
   - os_type: macos
     architecture: arm64
     compiler_type: clang
     is_linuxbrew: false
-    sha: 04b5c61ec3a73ffabdd2faa1f44bebda25193963
+    sha: 6777477baaa5727cb3eb0d1b8256c1bb9ab4f33e
     lto_type:
-    tag: v20230523091634-04b5c61ec3-macos-arm64
+    tag: v20230621193812-6777477baa-macos-arm64
   - os_type: macos
     architecture: x86_64
     compiler_type: clang
     is_linuxbrew: false
-    sha: 04b5c61ec3a73ffabdd2faa1f44bebda25193963
+    sha: 6777477baaa5727cb3eb0d1b8256c1bb9ab4f33e
     lto_type:
-    tag: v20230519215557-04b5c61ec3-macos-x86_64
+    tag: v20230621185613-6777477baa-macos-x86_64
   - os_type: ubuntu20.04
     architecture: x86_64
     compiler_type: clang15
     is_linuxbrew: false
-    sha: 04b5c61ec3a73ffabdd2faa1f44bebda25193963
+    sha: 6777477baaa5727cb3eb0d1b8256c1bb9ab4f33e
     lto_type:
-    tag: v20230519215505-04b5c61ec3-ubuntu2004-x86_64-clang15
+    tag: v20230621185516-6777477baa-ubuntu2004-x86_64-clang15
   - os_type: ubuntu22.04
     architecture: x86_64
     compiler_type: clang15
     is_linuxbrew: false
-    sha: 04b5c61ec3a73ffabdd2faa1f44bebda25193963
+    sha: 6777477baaa5727cb3eb0d1b8256c1bb9ab4f33e
     lto_type:
-    tag: v20230519215507-04b5c61ec3-ubuntu2204-x86_64-clang15
+    tag: v20230621185524-6777477baa-ubuntu2204-x86_64-clang15
   - os_type: ubuntu22.04
     architecture: x86_64
     compiler_type: gcc11
     is_linuxbrew: false
-    sha: 04b5c61ec3a73ffabdd2faa1f44bebda25193963
+    sha: 6777477baaa5727cb3eb0d1b8256c1bb9ab4f33e
     lto_type:
-    tag: v20230519215510-04b5c61ec3-ubuntu2204-x86_64-gcc11
+    tag: v20230621185622-6777477baa-ubuntu2204-x86_64-gcc11
2 changes: 1 addition & 1 deletion requirements_frozen.txt
@@ -19,7 +19,7 @@ downloadutil==1.0.2
 idna==3.4
 iniconfig==2.0.0
 jmespath==1.0.1
-llvm-installer==1.3.2
+llvm-installer==1.3.4
 mypy-extensions==1.0.0
 mypy==1.3.0
 overrides==7.3.1
2 changes: 1 addition & 1 deletion src/yb/server/pprof-path-handler_util-test.cc
@@ -96,7 +96,7 @@ TEST_F(SamplingProfilerTest, AllocationProfile) {
   const int64_t alloc_size = 30_MB;
 
   tcmalloc::MallocExtension::AllocationProfilingToken token;
-  token = tcmalloc::MallocExtension::StartLifetimeProfiling();
+  token = tcmalloc::MallocExtension::StartLifetimeProfiling(/* seed_with_live_allocs= */ false);
 
   // We expect to find this allocation in the profile if and only if only_growth is false, since
   // it is not deallocated before we stop profiling.
2 changes: 1 addition & 1 deletion src/yb/server/pprof-path-handlers_util.cc
@@ -49,7 +49,7 @@ tcmalloc::Profile GetAllocationProfile(int seconds, int64_t sample_freq_bytes) {
   auto prev_sample_rate = tcmalloc::MallocExtension::GetProfileSamplingRate();
   tcmalloc::MallocExtension::SetProfileSamplingRate(sample_freq_bytes);
   tcmalloc::MallocExtension::AllocationProfilingToken token;
-  token = tcmalloc::MallocExtension::StartLifetimeProfiling();
+  token = tcmalloc::MallocExtension::StartLifetimeProfiling(/* seed_with_live_allocs= */ false);
 
   LOG(INFO) << Format("Sleeping for $0 seconds while profile is collected.", seconds);
   SleepFor(MonoDelta::FromSeconds(seconds));
105 changes: 105 additions & 0 deletions src/yb/util/debug-util-test.cc
@@ -37,6 +37,7 @@
 #include <string>
 #include <thread>
 #include <vector>
+#include <set>
 
 #include <glog/logging.h>
 
@@ -50,6 +51,10 @@
 #include "yb/util/test_util.h"
 #include "yb/util/thread.h"
 #include "yb/util/tsan_util.h"
+#include "yb/util/test_thread_holder.h"
+#include "yb/util/lockfree.h"
+#include "yb/util/random_util.h"
+#include "yb/util/tostring.h"
 
 using std::string;
 using std::vector;
@@ -329,6 +334,106 @@ TEST_F(DebugUtilTest, TestConcurrentStackTrace) {
   }
 }
 
+TEST_F(DebugUtilTest, TestStackTraceSignalDuringAllocation) {
+  constexpr size_t kNumThreads = 10;
+  TestThreadHolder thread_holder;
+  // Each thread has a queue from which it consumes entries. Each thread will add entries to
+  // a random thread's queue.
+
+  struct Entry : public MPSCQueueEntry<Entry> {
+    char* bytes = nullptr;
+
+    explicit Entry(char* bytes_) : bytes(bytes_) {}
+    ~Entry() {
+      if (bytes) {
+        free(bytes);
+        bytes = nullptr;
+      }
+    }
+  };
+
+  std::vector<std::unique_ptr<MPSCQueue<Entry>>> queues;
+  for (size_t i = 0; i < kNumThreads; ++i) {
+    queues.push_back(std::make_unique<MPSCQueue<Entry>>());
+  }
+
+  std::mutex thread_ids_mutex;
+  std::vector<ThreadIdForStack> thread_ids;
+
+  CountDownLatch start_latch(kNumThreads);
+
+  for (size_t i = 0; i < kNumThreads; ++i) {
+    thread_holder.AddThreadFunctor([
+        &thread_ids_mutex,
+        &start_latch,
+        &thread_ids,
+        &queues,
+        thread_index = i,
+        &stop = thread_holder.stop_flag()
+    ]() {
+      {
+        std::lock_guard lock(thread_ids_mutex);
+        thread_ids.push_back(Thread::CurrentThreadIdForStack());
+      }
+      start_latch.CountDown();
+      while (!stop.load(std::memory_order_acquire)) {
+        if (RandomUniformBool()) {
+          // Allocate between 1 and 16 KB, with some random jitter.
+          size_t allocation_size =
+              (1L << RandomUniformInt(0, 10)) * RandomUniformInt(1, 16) + RandomUniformInt(1, 128);
+          char* bytes = pointer_cast<char*>(malloc(allocation_size));
+          size_t target_thread = RandomUniformInt<size_t>(0, kNumThreads - 1);
+          Entry* entry = new Entry(bytes);
+          queues[target_thread]->Push(entry);
+        } else {
+          Entry* entry = queues[thread_index]->Pop();
+          delete entry;
+        }
+      }
+    });
+  }
+  // Wait until all threads start running.
+  start_latch.Wait();
+  auto deadline = MonoTime::Now() + 10s;
+
+  // Keep dumping thread stacks.
+  while (MonoTime::Now() < deadline) {
+    for (size_t i = 0; i < 100; ++i) {
+      auto stacks = ThreadStacks(thread_ids);
+      int num_ok = 0;
+      int num_errors = 0;
+      int num_empty_stacks = 0;
+      std::set<std::string> error_statuses;
+      for (const auto& stack : stacks) {
+        if (stack.ok()) {
+          if (*stack) {
+            num_ok++;
+          } else {
+            num_empty_stacks++;
+          }
+        } else {
+          error_statuses.insert(stack.status().ToString());
+          num_errors++;
+        }
+      }
+      if (num_errors || num_empty_stacks) {
+        LOG(WARNING) << "OK stacks: " << num_ok << ", error stacks: " << num_errors
+                     << ", empty stacks: " << num_empty_stacks
+                     << ", errors statuses: " << ToString(error_statuses);
+      }
+    }
+  }
+  thread_holder.Stop();
+  thread_holder.JoinAll();
+
+  for (size_t i = 0; i < kNumThreads; ++i) {
+    auto& queue = queues[i];
+    while (auto* entry = queue->Pop()) {
+      delete entry;
+    }
+  }
+}
+
 TEST_F(DebugUtilTest, LongOperationTracker) {
   class TestLogSink : public google::LogSink {
    public:
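The allocation pattern that TestStackTraceSignalDuringAllocation generates can be sketched without YugabyteDB's utilities. This hedged sketch swaps the lock-free MPSCQueue for a mutex-protected std::deque and TestThreadHolder for plain std::thread workers with a fixed operation count; the names BufferQueue, WorkloadStats, and RunCrossThreadAllocWorkload are hypothetical. It leaves out the stack-dumping side and only shows how buffers allocated on one thread are freed on another, which keeps threads busy inside malloc/free while stacks would be dumped.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdlib>
#include <deque>
#include <mutex>
#include <random>
#include <thread>
#include <vector>

// Mutex-protected queue; a simpler, blocking stand-in for yb's lock-free MPSCQueue.
struct BufferQueue {
  std::mutex mutex;
  std::deque<char*> buffers;
};

struct WorkloadStats {
  std::size_t allocated = 0;
  std::size_t freed = 0;
};

// Each worker performs ops_per_thread random steps: either malloc a buffer and push it
// to a random worker's queue, or pop one buffer from its own queue and free it.
WorkloadStats RunCrossThreadAllocWorkload(std::size_t num_threads, std::size_t ops_per_thread) {
  std::vector<BufferQueue> queues(num_threads);
  std::atomic<std::size_t> allocated{0};
  std::atomic<std::size_t> freed{0};
  std::vector<std::thread> workers;

  for (std::size_t self = 0; self < num_threads; ++self) {
    workers.emplace_back([&, self] {
      std::mt19937_64 rng(self + 1);  // Fixed per-worker seed for reproducibility.
      std::bernoulli_distribution do_allocate(0.5);
      std::uniform_int_distribution<std::size_t> target(0, num_threads - 1);
      std::uniform_int_distribution<int> shift(0, 10);
      std::uniform_int_distribution<std::size_t> mult(1, 16), jitter(1, 128);

      for (std::size_t op = 0; op < ops_per_thread; ++op) {
        if (do_allocate(rng)) {
          // Same size formula as the test: between 2 and 16,512 bytes.
          const std::size_t size = (std::size_t{1} << shift(rng)) * mult(rng) + jitter(rng);
          char* bytes = static_cast<char*>(std::malloc(size));
          BufferQueue& queue = queues[target(rng)];
          std::lock_guard<std::mutex> lock(queue.mutex);
          queue.buffers.push_back(bytes);
          allocated.fetch_add(1, std::memory_order_relaxed);
        } else {
          char* bytes = nullptr;
          bool popped = false;
          {
            std::lock_guard<std::mutex> lock(queues[self].mutex);
            if (!queues[self].buffers.empty()) {
              bytes = queues[self].buffers.front();
              queues[self].buffers.pop_front();
              popped = true;
            }
          }
          if (popped) {
            std::free(bytes);  // Freed outside the lock; free(nullptr) is a no-op.
            freed.fetch_add(1, std::memory_order_relaxed);
          }
        }
      }
    });
  }
  for (std::thread& worker : workers) worker.join();

  // Drain leftovers after joining, as the test does after JoinAll().
  for (BufferQueue& queue : queues) {
    for (char* bytes : queue.buffers) {
      std::free(bytes);
      freed.fetch_add(1, std::memory_order_relaxed);
    }
    queue.buffers.clear();
  }
  return {allocated.load(), freed.load()};
}
```

After the workers are joined and the queues drained, every allocated buffer has been freed exactly once, so `allocated` equals `freed`.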
15 changes: 14 additions & 1 deletion src/yb/util/stack_trace.cc
@@ -30,6 +30,10 @@
 #include "yb/util/result.h"
 #include "yb/util/thread.h"
 
+#if YB_GOOGLE_TCMALLOC
+#include <tcmalloc/malloc_extension.h>
+#endif
+
 using namespace std::literals;
 
 #if defined(__APPLE__)
@@ -159,8 +163,9 @@ struct ThreadStackHelper {
 
   void RecordStackTrace(const StackTrace& stack_trace) {
     auto* entry = allocated.Pop();
+    // If entry is nullptr, that means there are not enough allocated entries. In that case, don't
+    // write a log message since we are in a signal handler.
     if (entry) {
-      // Not enough allocated entries, don't write log since we are in signal handler.
       entry->tid = Thread::CurrentThreadIdForStack();
       entry->stack = stack_trace;
       collected.Push(entry);
@@ -180,6 +185,14 @@ ThreadStackHelper thread_stack_helper;
 void HandleStackTraceSignal(int signum) {
   int old_errno = errno;
   StackTrace stack_trace;
+#if YB_GOOGLE_TCMALLOC
+  // TODO(#17889): retry in this case. For now, just produce an empty stack trace.
+  if (tcmalloc::MallocExtension::IsCurThreadInAllocDealloc()) {
+    thread_stack_helper.RecordStackTrace(stack_trace);
+    errno = old_errno;
+    return;
+  }
+#endif
   stack_trace.Collect(2);
 
   thread_stack_helper.RecordStackTrace(stack_trace);
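The comment in RecordStackTrace explains why a missing entry is dropped silently: code running in a signal handler must not allocate or log. A minimal sketch of that idea follows, under assumptions: PreallocatedSlots and StackSlot are hypothetical names, the slots live in a fixed std::array instead of ThreadStackHelper's allocated/collected queues, and a single atomic fetch_add claims a slot. The collector is assumed to read slots only after every signaled thread has reported back.

```cpp
#include <array>
#include <atomic>
#include <cstddef>

// Stand-in for one collected entry: a thread id plus a depth in place of a StackTrace.
struct StackSlot {
  int tid = 0;
  int depth = 0;
};

// Fixed-capacity storage filled from signal handlers. The storage exists before any
// signal is sent, so recording never calls malloc, takes a lock, or logs.
template <std::size_t kCapacity>
class PreallocatedSlots {
  static_assert(std::atomic<std::size_t>::is_always_lock_free,
                "claiming a slot must be lock-free to be async-signal-safe");

 public:
  // Async-signal-safe. Returns false when every slot is taken and, like
  // RecordStackTrace, drops the record silently instead of logging.
  bool Record(int tid, int depth) {
    const std::size_t index = next_.fetch_add(1, std::memory_order_relaxed);
    if (index >= kCapacity) return false;
    slots_[index].tid = tid;
    slots_[index].depth = depth;
    return true;
  }

  // Requires external synchronization: call only after all handlers have finished.
  std::size_t NumRecorded() const {
    const std::size_t claimed = next_.load(std::memory_order_acquire);
    return claimed < kCapacity ? claimed : kCapacity;
  }

  const StackSlot& At(std::size_t index) const { return slots_[index]; }

 private:
  std::array<StackSlot, kCapacity> slots_{};
  std::atomic<std::size_t> next_{0};
};
```

With a capacity of two, the third `Record` call returns false and nothing is logged, which mirrors the `if (entry)` check in RecordStackTrace.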
