Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TSAN false positives with chromium clang toolchain #953

Open
dionescu opened this issue May 8, 2018 · 18 comments
Open

TSAN false positives with chromium clang toolchain #953

dionescu opened this issue May 8, 2018 · 18 comments

Comments

@dionescu
Copy link

dionescu commented May 8, 2018

I've been trying to extend the Bazel toolchain at https://github.com/vsco/bazel-toolchains (based on Chromium clang toolchain) to add support for sanitizers. I had success with eliminating MSAN false positives after instrumenting libcxx but the same approach failed for TSAN. In particular, TSAN (with instrumented libcxx confirmed via nm) reports races when I try to run Abseil (https://github.com/abseil/abseil-cpp) synchronization tests. I added one report below (sorry not sure how to get line numbers to work).

Any ideas what may be happening here?

WARNING: ThreadSanitizer: data race (pid=15)
  Atomic write of size 4 at 0x7f0f827de150 by main thread:
    #0 __tsan_atomic32_fetch_add /b/build/slave/linux_upload_clang/build/src/third_party/llvm/compiler-rt/lib/tsan/rtl/tsan_interface_atomic.cc:616 (call_once_test+0x92540)
    #1 absl::synchronization_internal::Waiter::Post() ??:? (call_once_test+0xbd6d3)
    #2 AbslInternalPerThreadSemPost ??:? (call_once_test+0xbd391)
    #3 absl::synchronization_internal::PerThreadSem::Post(absl::base_internal::ThreadIdentity*) ??:? (call_once_test+0xbc739)
    #4 absl::Mutex::IncrementSynchSem(absl::Mutex*, absl::base_internal::PerThreadSynch*) ??:? (call_once_test+0xbbf24)
    #5 absl::Mutex::Wakeup(absl::base_internal::PerThreadSynch*) ??:? (call_once_test+0xb6de5)
    #6 absl::Mutex::UnlockSlow(absl::SynchWaitParams*) ??:? (call_once_test+0xb86f2)
    #7 absl::Mutex::Unlock() ??:? (call_once_test+0xb94ca)
    #8 absl::(anonymous namespace)::CallOnceTest_ExecutionCount_Test::TestBody() call_once_test.cc:? (call_once_test+0xb2c0a)
    #9 void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) ??:? (call_once_test+0x108f17)
    #10 void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) ??:? (call_once_test+0xfc355)
    #11 testing::Test::Run() ??:? (call_once_test+0xeaca7)
    #12 testing::TestInfo::Run() ??:? (call_once_test+0xeb4b2)
    #13 testing::TestCase::Run() ??:? (call_once_test+0xebbd1)
    #14 testing::internal::UnitTestImpl::RunAllTests() ??:? (call_once_test+0xf55b1)
    #15 bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) ??:? (call_once_test+0x10a7c7)
    #16 bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) ??:? (call_once_test+0xfe165)
    #17 testing::UnitTest::Run() ??:? (call_once_test+0xf52a0)
    #18 RUN_ALL_TESTS() ??:? (call_once_test+0xdaffb)
    #19 main ??:? (call_once_test+0xdafa8)

  Previous write of size 8 at 0x7f0f827de150 by thread T1:
    #0 memset /b/build/slave/linux_upload_clang/build/src/third_party/llvm/compiler-rt/lib/tsan/../sanitizer_common/sanitizer_common_interceptors.inc:711 (call_once_test+0x565f9)
    #1 absl::synchronization_internal::NewThreadIdentity() create_thread_identity.cc:? (call_once_test+0xbd0b7)
    #2 absl::synchronization_internal::CreateThreadIdentity() ??:? (call_once_test+0xbcfe3)
    #3 absl::synchronization_internal::GetOrCreateCurrentThreadIdentity() ??:? (call_once_test+0xbc4bf)
    #4 absl::Synch_GetPerThread() mutex.cc:? (call_once_test+0xbaf33)
    #5 absl::Synch_GetPerThreadAnnotated(absl::Mutex*) mutex.cc:? (call_once_test+0xb7d76)
    #6 absl::Mutex::LockSlowWithDeadline(absl::MuHowS const*, absl::Condition const*, absl::synchronization_internal::KernelTimeout, int) ??:? (call_once_test+0xb7726)
    #7 absl::Mutex::LockSlow(absl::MuHowS const*, absl::Condition const*, int) ??:? (call_once_test+0xb72e4)
    #8 absl::Mutex::LockWhen(absl::Condition const&) ??:? (call_once_test+0xb74a1)
    #9 absl::(anonymous namespace)::WaitAndIncrement() call_once_test.cc:? (call_once_test+0xb3044)
    #10 _ZN4absl13base_internal8Callable6InvokeIRFvvEJEEEDTclclsr3stdE7declvalIT_EEspclsr3stdE7declvalIT0_EEEEOS5_DpOS6_ ??:? (call_once_test+0xb43c3)
    #11 _ZN4absl13base_internal6InvokeIRFvvEJEEEDTclsr7InvokerIT_DpT0_E4typeE6Invokeclsr3stdE7declvalIS4_EEspclsr3stdE7declvalIS5_EEEEOS4_DpOS5_ ??:? (call_once_test+0xb42f9)
    #12 void absl::base_internal::CallOnceImpl<void (&)()>(std::__1::atomic<unsigned int>*, absl::base_internal::SchedulingMode, void (&)()) ??:? (call_once_test+0xb4218)
    #13 void absl::call_once<void (&)()>(absl::once_flag&, void (&)()) ??:? (call_once_test+0xb4130)
    #14 absl::(anonymous namespace)::ThreadBody() call_once_test.cc:? (call_once_test+0xb2f34)
    #15 void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (*)()> >(void*) ??:? (call_once_test+0xb3938)

  Thread T1 (tid=18, running) created by main thread at:
    #0 pthread_create /b/build/slave/linux_upload_clang/build/src/third_party/llvm/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:921 (call_once_test+0x4ce43)
    #1 std::__1::thread::thread<void (&)(), , void>(void (&)()) crtstuff.c:? (call_once_test+0xb36cc)
    #2 void std::__1::vector<std::__1::thread, std::__1::allocator<std::__1::thread> >::__emplace_back_slow_path<void (&)()>(void (&)()) ??:? (call_once_test+0xb355f)
    #3 absl::(anonymous namespace)::CallOnceTest_ExecutionCount_Test::TestBody() call_once_test.cc:? (call_once_test+0xb29ee)
    #4 void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) ??:? (call_once_test+0x108f17)
    #5 void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) ??:? (call_once_test+0xfc355)
    #6 testing::Test::Run() ??:? (call_once_test+0xeaca7)
    #7 testing::TestInfo::Run() ??:? (call_once_test+0xeb4b2)
    #8 testing::TestCase::Run() ??:? (call_once_test+0xebbd1)
    #9 testing::internal::UnitTestImpl::RunAllTests() ??:? (call_once_test+0xf55b1)
    #10 bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) ??:? (call_once_test+0x10a7c7)
    #11 bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) ??:? (call_once_test+0xfe165)
    #12 testing::UnitTest::Run() ??:? (call_once_test+0xf52a0)
    #13 RUN_ALL_TESTS() ??:? (call_once_test+0xdaffb)
    #14 main ??:? (call_once_test+0xdafa8)

SUMMARY: ThreadSanitizer: data race ??:? in absl::synchronization_internal::Waiter::Post()
@dvyukov
Copy link
Contributor

dvyukov commented May 8, 2018

not sure how to get line numbers to work

You need to add llvm-symbolizer binary to path. Maybe there is also a runtime flag that can be used to specify binary path. And also of course compile with -g.

@dvyukov
Copy link
Contributor

dvyukov commented May 8, 2018

I see that absl/base/internal/tsan_mutex_interface.h contains #if defined(THREAD_SANITIZER). Have you defined it? Perhaps absl needs auto-detect tsan. THREAD_SANITIZER macro is Google-ism. Can you bring this up with absl people?

@dvyukov
Copy link
Contributor

dvyukov commented May 8, 2018

But! This can also be a real bug in absl mutex. Can you obtain a report with line numbers?

@dionescu
Copy link
Author

dionescu commented May 8, 2018

I tried everything you recommended to get line numbers but was not successful. The llvm-symbolizer is in my path, I compile with -g, and I even set TSAN_SYMBOLIZER_PATH and TSAN_OPTIONS=symbolize=1.
I also tried replacing llvm-symbolizer with a dummy script and it appears it does not get called at all. What is the mechanism that invokes the symbolizer out of process? Do I need to set some rpath?

For absl, it does indeed have some logic that depends on THREAD_SANITIZER being defined. I do pass -DTHREAD_SANITIZER when I compiled and I did check by adding a line to absl that it auto-detects TSAN support correctly.

@dionescu
Copy link
Author

dionescu commented May 8, 2018

What the threads are doing:

T0(main) is unlocking a mutex here:
  // Allow WaitAndIncrement to finish executing.  Once it does, the other
  // call_once waiters will be unblocked.
  done_blocking = true;
  counters_mu.Unlock();

T1 (created by main) is doing a LockWhen in WaitAndIncrement:
void WaitAndIncrement() {
  counters_mu.Lock();
  ++call_once_invoke_count;
  counters_mu.Unlock();

  counters_mu.LockWhen(Condition(&done_blocking));
  ++call_once_finished_count;
  counters_mu.Unlock();
}

Nothing wrong with the test as far as I can tell. The race seems to trigger in absl' s mutex internals where we do 2 writes of different sizes (4 and 8 bytes).

Maybe I'm doing something wrong with how I use / Bazel configure the chromium clang toolchain. Would there be anything to look out for?

@dionescu
Copy link
Author

dionescu commented May 8, 2018

Zooming into mutex internals:

T1 calls LockWhen which is where its associated thread identity gets created. The memset in the trace sets this identity object to 0.

T0(main) unlocks the mutex and by the time this happens some identities were created with associated waker objects. T0 notifies the waker objects of the Unlock which is a write to waker and thus to thread_identity.

I'm probably missing something but I don't see how the memset write is synchronized with the write to waker state which is part of the identity.

@dvyukov
Copy link
Contributor

dvyukov commented May 9, 2018

Does symbolization work for the simplest cases? It's not a good idea to throw in all possible unknowns into an equation from the beginning. E.g. does the following work for you?

$ cat /tmp/race.c

#include <pthread.h>

int Global;

void *Thread1(void *x) {
  Global = 42;
  return NULL;
}

void *Thread2(void *x) {
  Global = 43;
  return NULL;
}

int main() {
  pthread_t t[2];
  pthread_create(&t[0], NULL, Thread1, NULL);
  pthread_create(&t[1], NULL, Thread2, NULL);
  pthread_join(t[0], NULL);
  pthread_join(t[1], NULL);
  return 0;
}

$ clang /tmp/race.c -fsanitize=thread -g
$ ./a.out

==================
WARNING: ThreadSanitizer: data race (pid=33868)
  Write of size 4 at 0x000001119220 by thread T2:
    #0 Thread2 /tmp/race.c:11:10 (a.out+0x4bd271)

  Previous write of size 4 at 0x000001119220 by thread T1:
    #0 Thread1 /tmp/race.c:6:10 (a.out+0x4bd221)

  Location is global 'Global' of size 4 at 0x000001119220 (a.out+0x000001119220)

  Thread T2 (tid=33871, running) created by main thread at:
    #0 pthread_create tsan_interceptors.cc:965 (a.out+0x44f326)
    #1 main /tmp/race.c:18:3 (a.out+0x4bd2f3)

  Thread T1 (tid=33870, finished) created by main thread at:
    #0 pthread_create tsan_interceptors.cc:965 (a.out+0x44f326)
    #1 main /tmp/race.c:17:3 (a.out+0x4bd2cd)

SUMMARY: ThreadSanitizer: data race /tmp/race.c:11:10 in Thread2
==================

@dvyukov
Copy link
Contributor

dvyukov commented May 9, 2018

Does adding report_atomic_races=0 to TSAN_OPTIONS prevent these races?
This may be a false positives related to intricate dances trying to ignore mutex guts but at the same time not producing false positives. Internally we run with report_atomic_races=0 and that can explain why we didn't hit this before.

@dionescu
Copy link
Author

dionescu commented May 9, 2018

Thanks for the help with symbolization. Turns out I had a typo in my blazerc file (s/asan/tsan) which prevented the -g compile option to be added in tsan mode. I can get line numbers now:

WARNING: ThreadSanitizer: data race (pid=15)
  Atomic write of size 4 at 0x7f1042500350 by thread T10:
    #0 __tsan_atomic32_fetch_add /b/build/slave/linux_upload_clang/build/src/third_party/llvm/compiler-rt/lib/tsan/rtl/tsan_interface_atomic.cc:616:3 (call_once_test+0x936b0)
    #1 std::__1::__atomic_base<int, true>::fetch_add(int, std::__1::memory_order) /usr/xclang/include/c++/v1/atomic:1017:17 (call_once_test+0xbec53)
    #2 absl::synchronization_internal::Waiter::Post() /proc/self/cwd/external/com_google_absl/absl/synchronization/internal/waiter.cc:155 (call_once_test+0xbec53)
    #3 AbslInternalPerThreadSemPost /proc/self/cwd/external/com_google_absl/absl/synchronization/internal/per_thread_sem.cc:67:64 (call_once_test+0xbe8f1)
    #4 absl::synchronization_internal::PerThreadSem::Post(absl::base_internal::ThreadIdentity*) /proc/self/cwd/external/com_google_absl/absl/synchronization/internal/per_thread_sem.h:100:3 (call_once_test+0xbdc29)
    #5 absl::Mutex::IncrementSynchSem(absl::Mutex*, absl::base_internal::PerThreadSynch*) /proc/self/cwd/external/com_google_absl/absl/synchronization/mutex.cc:531:3 (call_once_test+0xbd3f4)
    #6 absl::Mutex::Wakeup(absl::base_internal::PerThreadSynch*) /proc/self/cwd/external/com_google_absl/absl/synchronization/mutex.cc:1118:3 (call_once_test+0xb825d)
    #7 absl::Mutex::UnlockSlow(absl::SynchWaitParams*) /proc/self/cwd/external/com_google_absl/absl/synchronization/mutex.cc:2267:19 (call_once_test+0xb9b92)
    #8 absl::Mutex::Unlock() /proc/self/cwd/external/com_google_absl/absl/synchronization/mutex.cc:1674:11 (call_once_test+0xba96a)
    #9 absl::(anonymous namespace)::ThreadBody() /proc/self/cwd/external/com_google_absl/absl/base/call_once_test.cc:50:15 (call_once_test+0xb4091)
    #10 void std::__1::__thread_execute<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (*)()>(std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (*)()>&, std::__1::__tuple_indices<>) /usr/xclang/include/c++/v1/type_traits:4323:1 (call_once_test+0xb4af8)
    #11 void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (*)()> >(void*) /usr/xclang/include/c++/v1/thread:352 (call_once_test+0xb4af8)

  Previous write of size 8 at 0x7f1042500350 by main thread:
    #0 memset /b/build/slave/linux_upload_clang/build/src/third_party/llvm/compiler-rt/lib/tsan/../sanitizer_common/sanitizer_common_interceptors.inc:711:3 (call_once_test+0x57769)
    #1 absl::synchronization_internal::NewThreadIdentity() /proc/self/cwd/external/com_google_absl/absl/synchronization/internal/create_thread_identity.cc:93:3 (call_once_test+0xbe5f7)
    #2 absl::synchronization_internal::CreateThreadIdentity() /proc/self/cwd/external/com_google_absl/absl/synchronization/internal/create_thread_identity.cc:102:45 (call_once_test+0xbe523)
    #3 absl::synchronization_internal::GetOrCreateCurrentThreadIdentity() /proc/self/cwd/external/com_google_absl/absl/synchronization/internal/create_thread_identity.h:46:12 (call_once_test+0xbd9af)
    #4 absl::Synch_GetPerThread() /proc/self/cwd/external/com_google_absl/absl/synchronization/mutex.cc:503:30 (call_once_test+0xbc403)
    #5 absl::Synch_GetPerThreadAnnotated(absl::Mutex*) /proc/self/cwd/external/com_google_absl/absl/synchronization/mutex.cc:511:23 (call_once_test+0xb9216)
    #6 absl::Mutex::LockSlowWithDeadline(absl::MuHowS const*, absl::Condition const*, absl::synchronization_internal::KernelTimeout, int) /proc/self/cwd/external/com_google_absl/absl/synchronization/mutex.cc:1818:42 (call_once_test+0xb8ba6)
    #7 absl::Mutex::LockSlow(absl::MuHowS const*, absl::Condition const*, int) /proc/self/cwd/external/com_google_absl/absl/synchronization/mutex.cc:1729:3 (call_once_test+0xb8764)
    #8 absl::Mutex::LockWhen(absl::Condition const&) /proc/self/cwd/external/com_google_absl/absl/synchronization/mutex.cc:1468:9 (call_once_test+0xb8921)
    #9 absl::(anonymous namespace)::CallOnceTest_ExecutionCount_Test::TestBody() /proc/self/cwd/external/com_google_absl/absl/base/call_once_test.cc:76:15 (call_once_test+0xb3b8a)
    #10 void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /proc/self/cwd/external/com_google_googletest/googletest/src/gtest.cc:2417:10 (call_once_test+0x114a67)
    #11 void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /proc/self/cwd/external/com_google_googletest/googletest/src/gtest.cc:2453:14 (call_once_test+0x1072b5)
    #12 testing::Test::Run() /proc/self/cwd/external/com_google_googletest/googletest/src/gtest.cc:2491:5 (call_once_test+0xf5a97)
    #13 testing::TestInfo::Run() /proc/self/cwd/external/com_google_googletest/googletest/src/gtest.cc:2667:11 (call_once_test+0xf62a2)
    #14 testing::TestCase::Run() /proc/self/cwd/external/com_google_googletest/googletest/src/gtest.cc:2785:28 (call_once_test+0xf69c1)
    #15 testing::internal::UnitTestImpl::RunAllTests() /proc/self/cwd/external/com_google_googletest/googletest/src/gtest.cc:5047:43 (call_once_test+0x1003b1)
    #16 bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /proc/self/cwd/external/com_google_googletest/googletest/src/gtest.cc:2417:10 (call_once_test+0x116537)
    #17 bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /proc/self/cwd/external/com_google_googletest/googletest/src/gtest.cc:2453:14 (call_once_test+0x109165)
    #18 testing::UnitTest::Run() /proc/self/cwd/external/com_google_googletest/googletest/src/gtest.cc:4663:10 (call_once_test+0x1000a0)
    #19 RUN_ALL_TESTS() /proc/self/cwd/external/com_google_googletest/googletest/include/gtest/gtest.h:2329:46 (call_once_test+0xe5cfb)
    #20 main /proc/self/cwd/external/com_google_googletest/googlemock/src/gmock_main.cc:53:10 (call_once_test+0xe5ca8)


@dionescu
Copy link
Author

dionescu commented May 9, 2018

Also, adding report_atomic_races=0 to TSAN_OPTIONS prevents the race reports.

@dvyukov
Copy link
Contributor

dvyukov commented May 9, 2018

Also, adding report_atomic_races=0 to TSAN_OPTIONS prevents the race reports.

Then I think hypothesis in #953 (comment) is correct. Unfortunately, I am not sure we will have time to fix this in near future. +@kcc
Is it possible to provide default TSAN_OPTIONS in bazel, or compile in:

extern "C" const char* __tsan_default_options() {
  return " report_atomic_races=0 ";
}

?

@dionescu
Copy link
Author

dionescu commented May 10, 2018

For others running into this issue, one way to silence atomic races reports is by adding the following to your .bazelrc (assuming you run TSAN with --config=tsan):

build:tsan --test_env=TSAN_OPTIONS=report_atomic_races=0

@jlisee
Copy link

jlisee commented Jul 2, 2019

It looks like on Mac there is a specific work around in tsan for std::call_once in libc++.dylib and a STDCXX_INTERCEPTOR. Is something like this needed for more platforms or is this just specific to how things are built on Mac?

@devjgm
Copy link

devjgm commented Feb 2, 2021

To be clear, adding report_atomic_races=0 to TSAN_OPTIONS should silence the report. But is it a legit data race that we're silencing? Or is it just a false positive?

@dvyukov
Copy link
Contributor

dvyukov commented Feb 2, 2021

I think this is a false positive related to the fact that we specifically ignore most synchronization in Mutex code:
https://source.chromium.org/chromium/chromium/src/+/master:third_party/abseil-cpp/absl/synchronization/mutex.cc;l=170;drc=b6496a2a60ea1712e46c32eb1be8d2e03d99af8b;bpv=1;bpt=1
for the sake of better race reports for the rest of the code.
Unfortunately this support was developed based on a code base that uses report_atomic_races=0, thus the false positive.

@mpoeter
Copy link

mpoeter commented Aug 8, 2022

Actually I think that this is an actual race, and TSan is right to report it! I still need to dig a bit deeper, but from what I saw so far there simply is no proper happens-before relation between the initialization of the Waiter (which uses some non-atomic writes to initialize the memory) and the later fetch-add. The fact that the Waiter ctor uses an atomic store to explicitly initialize the futex is irrelevant. This reminds me very much of this issue where we concluded that it actually is a race: #1009

@prigluchenie
Copy link

prigluchenie commented Dec 5, 2022

I have a similar problem with my grpc-project. I have a suspicion that the problem is related to the abseil-cpp fixed bug
abseil/abseil-cpp@9444b11

@MBkkt
Copy link

MBkkt commented Dec 5, 2022

abseil/abseil-cpp@bb7be49
After this commit in abseil was removed suppressions.
abseil/abseil-cpp@091842b

I do pretty same for our project and now it works, so I think this issue at least for library users is resolved

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants