Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Race aggregation #1

Open
wants to merge 3 commits into
base: master
Choose a base branch
from
Open

Race aggregation #1

wants to merge 3 commits into from

Conversation

SumedhArani
Copy link
Collaborator

Race aggregation based on the memory operation.

Example for in the case of a data race happening at a specific location is shown but it may not show in the case of an array as different locations come in picture.

@simoatze
Copy link
Member

@SumedhArani Great job! I'll take a look this week and I'll try to give it a try!
Also, let's move the conversation and issue discussions in here so I'll close or the issure in the archer repository. Thanks!

@simoatze
Copy link
Member

also I think we should just keep the race-aggregation branch as a development branch, no need for PR now, once everything works than we can do a PR.

@simoatze simoatze closed this Feb 13, 2017
@dongahn
Copy link

dongahn commented Feb 13, 2017

@simoatze: typically having an experimental PR helps ppl to review code with diffing support, inline commenting etc. We won't certainly merge it right away. With this, could you reopen this?

@simoatze
Copy link
Member

and if you can keep it updated with the llvm-mirror/compiler-rt. Basically we update both master and race-aggregation branches so they are always allined to the llvm-mirror.

@simoatze simoatze reopened this Feb 13, 2017
@simoatze
Copy link
Member

@dongahn Got it, it makes more sense :)

@SumedhArani
Copy link
Collaborator Author

@simoatze Thank you! :)
I'll move the conversation here from next time onwards and when necessary.
Take your own time to review it cause anyways I'm busy with my college internal exams this week.

@dongahn
I'll post in an experimental PR whenever required.

Thank you,
Sumedh.

@dongahn
Copy link

dongahn commented Feb 14, 2017

Nice work @SumedhArani!

A couple of thoughts. Is there a way in which we can actually intercept PrintReport. E.g., using LD_PRELOAD trick of some sort. If we can, perhaps we can keep this logic into Archer itself without having to modify the compiler runtime. This way, we may be able to make this logic work on any Clang/LLVM we install on a system?

Two things that caught my eyes.

  1. There is a linear search in searching for previously seen races. This can become expensive if there is larger number of races. Hash container (e.g., map in C++ STL) is provided, this can be a better data structure to use which allow you to do this operation in O (1).

  2. Doing a reduction on an absolute address will work for reducing races within a single process but won't work across difference processes of an MPI application. Even if a same variable is used, it can be resolved to different addresses across different MPI processes. Further, this certainly won't work on addresses from multiple programs.

Is a more relative concept like module+offset or source file+line number also available through ReportSpec? That might be a bit better criteria to use?

@SumedhArani
Copy link
Collaborator Author

SumedhArani commented Feb 14, 2017

@dongahn, I know that better algorithms could be put to use but I came out with a prototype. My first thoughts were to use a map but tsan has very less documentation and they have written their own data structures (Vector instead of vector) so I found it a lot buggy to get started itself and it shows up with error when I tried to include the STL containers.

So for now I've done a linear search with the vector data structure that they have provided.

One of the plus points for map are that it if targeted properly, we can even output which races were reduced on which factor.

As for point two, I did understand the inhibitions of this type of reduction.

Yeah it's possible to do source file + line number.

I posted this prototype to verify the design of my code with your vision. If so, I'll have the remaining two scenarios as told in the issue comments up and running with necessary tweaks and also optimisations where possible.

And I don't know about LD_PRELOAD! I need to Google it!

Thanks,
Sumedh.

@SumedhArani
Copy link
Collaborator Author

SumedhArani commented Feb 14, 2017

And also @dongahn, thanks for the appreciation!!:)

It strikes me now that I can probably make use of ADT plugins provided by the llvm framework and make use of a dense map. Let's see how does it come out.

I'll keep you updated and will try that denseMap after my today's exam gets over.

@dongahn
Copy link

dongahn commented Feb 18, 2017

@SumedhArani: did you look at LD_PRELOAD to see if we can replace PrintReport even without having to make any modifications to the compiler runtime source file? I don't recall all the details of LD_PRELOAD but I think it's worth looking into it.

@SumedhArani
Copy link
Collaborator Author

@dongahn , No. Sorry I was a bit busy with my internal exams happening in my college. I'll look at it and get back to you after three days ? Had a few personal commitments over this weekend.

I'll get back to you on Monday with the same.

Thanks,
Sumedh.

@dongahn
Copy link

dongahn commented Feb 18, 2017

perfectly fine! Take as much time as you need.

@SumedhArani
Copy link
Collaborator Author

Thanks! I'll surely look into it once I'm back to college!! Thanks for being so amiable!!:)

Cheers,
Sumedh

@simoatze
Copy link
Member

@dongahn I was thinking that LD_PRELOAD wouldn't work because tsan is a static library. I think a solution would be to have PrintReport declared as weak symbol and then define a new PrintReport during the instrumentation pass inside the module where the main function is, like I did for the suppressions. I was talking to @SumedhArani and he'll looking into it.

@dongahn
Copy link

dongahn commented Feb 21, 2017

I see. Yeah it seems the weak symbol trick is the second best then.

To support existing clang/llvm installation, though, is there a way to change the attribute of and existing symbol to become weak? (binary patch...) Let me look.

If TSan accepts the small patch to change the attribute, this won't be neccessary, of course.

@simoatze
Copy link
Member

simoatze commented Feb 21, 2017

Yeah after @SumedhArani does the feasibility study, if everything works I can talk to Dmitry and see if they would accept the patch.

@dongahn
Copy link

dongahn commented Feb 21, 2017

sounds good. In the meantime, objcopy has --weaken-symbol option which may be able to rewrite the attrubute.

https://linux.die.net/man/1/objcopy

If you nm on this TSan runtime library, does this PrintReport function at least shows up as an externally visible symbol though?

@simoatze
Copy link
Member

Yes it shows up. For the test Sumedh can also just set the function as weak in the source.

@dongahn
Copy link

dongahn commented Feb 21, 2017

Yes I agree.

@SumedhArani
Copy link
Collaborator Author

After many long discussions with @simoatze as to how to go about to implement the above suggested technique, I feel that it is not so feasible to get the workaround using weak attributes.

Reasons as to why I feel so,

  1. __tsan_on_report_(the function marked as a weak symbol) does have a pointer to an object of class DescRep which internally has many members of different objects.
  2. Because of the above point, there will be a significant amount of dependent header files(in the tsan library) and their corresponding implementation to be imported (copied) into archer.
  3. This so because, the tsan implementation involves their own data structures and have not made use of data structures available in STL library.
  4. Even if decided to be implemented, we would have to update every now and then
  5. Also we would have to copy all of PrintReport as there is a call to it before the __tsan_on_report is called. (__tsan_on_report and PrintReport are called for every race detected) As found out by @simoatze we can set the log path to /dev/null of PrintReport call instead of the stderr to nullify the PrintReport call that happens before __tsan_on_report
  6. Hence, will it not become almost equivalent to shipping our own tsan library ? and defeat the purpose ?

Let me know as to what to proceed with next?

Thanks,
Sumedh.

@simoatze
Copy link
Member

simoatze commented Mar 3, 2017

@dongahn I was suggesting @SumedhArani to import the entire "compiler-rt" as a submodule of archer, in this way we can just redefine the __tsan_on_report function and for compiling purposes we will have all the header file we need. The function __tsan_on_report has a (void *) ptr to the RepDesc structure that has all the info for the report reduction, in this way we can just do everything in our shared library and don't need for any edit on the compiler-rt. Does it make sense?

@dongahn
Copy link

dongahn commented Mar 3, 2017

Nice suggestion! Are the API exposed from compiler-rt pretty stable? Just make sure the new definition of __tsan_on_report is written to only stable APIs which don't change much between different versions of compiler-rt. This way, you don't want to suffer from slight API/ABI mismatches between the imported compiler-rt and the compiler-rt used in a installed version. Also if there are ways to do API "version" checks, make sure you folks use these mechanisms.

jprotze pushed a commit that referenced this pull request May 11, 2018
atos is apparently not able to resolve symbol addresses properly on
i386-darwin reliably any more. This is causing bot flakiness:
http://lab.llvm.org:8080/green/job/clang-stage1-cmake-RA-expensive/6841

There have not been any SDK changes on the bot as of late.

/Users/buildslave/jenkins/sharedspace/clang-stage1-cmake-RA_workspace/llvm/projects/compiler-rt/test/asan/TestCases/Darwin/atos-symbolizer.cc:20:12: error: expected string not found in input
 // CHECK: #1 0x{{.*}} in main {{.*}}atos-symbolizer.cc:[[@LINE-4]]
           ^
<stdin>:35:27: note: scanning from here
 #0 0x112f56 in wrap_free (/Users/buildslave/jenkins/sharedspace/clang-stage1-cmake-RA_workspace/clang-build/lib/clang/5.0.0/lib/darwin/libclang_rt.asan_osx_dynamic.dylib:i386+0x56f56)
                          ^
<stdin>:35:27: note: with expression "@LINE-4" equal to "16"
 #0 0x112f56 in wrap_free (/Users/buildslave/jenkins/sharedspace/clang-stage1-cmake-RA_workspace/clang-build/lib/clang/5.0.0/lib/darwin/libclang_rt.asan_osx_dynamic.dylib:i386+0x56f56)
                          ^
<stdin>:36:168: note: possible intended match here
 #1 0xb6f20 in main (/Users/buildslave/jenkins/sharedspace/clang-stage1-cmake-RA_workspace/clang-build/tools/clang/runtime/compiler-rt-bins/test/asan/I386DarwinConfig/TestCases/Darwin/Output/atos-symbolizer.cc.tmp:i386+0x1f20)

git-svn-id: https://llvm.org/svn/llvm-project/compiler-rt/trunk@304674 91177308-0d34-0410-b5e6-96231b3b80d8
jprotze pushed a commit that referenced this pull request May 11, 2018
Summary:
__builtin_clz used for Log calculation returns an undefined result
when argument is 0. I noticed that issue when was testing some fuzzers:

```
/src/libfuzzer/FuzzerTracePC.h:282:33: runtime error: shift exponent 450349 is too large for 32-bit type 'uint32_t' (aka 'unsigned int')
  #0 0x43d83f in operator() /src/libfuzzer/FuzzerTracePC.h:283:33
  #1 0x43d83f in void fuzzer::TracePC::CollectFeatures<fuzzer::Fuzzer::RunOne(unsigned char const*, unsigned long, bool, fuzzer::InputInfo*, bool*)::$_1>(fuzzer::Fuzzer::RunOne(unsigned char const*, unsigned long, bool, fuzzer::InputInfo*, bool*)::$_1) const /src/libfuzzer/FuzzerTracePC.h:290
  llvm-mirror#2 0x43cbd4 in fuzzer::Fuzzer::RunOne(unsigned char const*, unsigned long, bool, fuzzer::InputInfo*, bool*) /src/libfuzzer/FuzzerLoop.cpp:445:7
  llvm-mirror#3 0x43e5f1 in fuzzer::Fuzzer::ReadAndExecuteSeedCorpora(std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, fuzzer::fuzzer_allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&) /src/libfuzzer/FuzzerLoop.cpp:706:5
  llvm-mirror#4 0x43e9e1 in fuzzer::Fuzzer::Loop(std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, fuzzer::fuzzer_allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&) /src/libfuzzer/FuzzerLoop.cpp:739:3
  llvm-mirror#5 0x432f8c in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) /src/libfuzzer/FuzzerDriver.cpp:754:6
  llvm-mirror#6 0x42ee18 in main /src/libfuzzer/FuzzerMain.cpp:20:10
  llvm-mirror#7 0x7f17ffeb182f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2082f)
  llvm-mirror#8 0x407838 in _start (/out/rotate_fuzzer+0x407838)

Reviewers: kcc

Reviewed By: kcc

Subscribers: llvm-commits, #sanitizers

Differential Revision: https://reviews.llvm.org/D41457

git-svn-id: https://llvm.org/svn/llvm-project/compiler-rt/trunk@321211 91177308-0d34-0410-b5e6-96231b3b80d8
jprotze pushed a commit that referenced this pull request Sep 5, 2018
Summary:
Running sanitized 32-bit x86 programs on glibc 2.27 crashes at startup, with:

    ERROR: AddressSanitizer: SEGV on unknown address 0xf7a8a250 (pc 0xf7f807f4 bp 0xff969fc8 sp 0xff969f7c T16777215)
    The signal is caused by a WRITE memory access.
    #0 0xf7f807f3 in _dl_get_tls_static_info (/lib/ld-linux.so.2+0x127f3)
    #1 0xf7a92599  (/lib/libasan.so.5+0x112599)
    llvm-mirror#2 0xf7a80737  (/lib/libasan.so.5+0x100737)
    llvm-mirror#3 0xf7f7e14f in _dl_init (/lib/ld-linux.so.2+0x1014f)
    llvm-mirror#4 0xf7f6eb49  (/lib/ld-linux.so.2+0xb49)

The problem is that glibc changed the calling convention for the GLIBC_PRIVATE
symbol that sanitizer uses (even when it should not, GLIBC_PRIVATE is exactly
for symbols that can change at any time, be removed etc.), see
https://sourceware.org/ml/libc-alpha/2017-08/msg00497.html

Fixes google/sanitizers#954

Patch By: Jakub Jelinek

Reviewed By: vitalybuka, Lekensteyn

Differential Revison: https://reviews.llvm.org/D44623

git-svn-id: https://llvm.org/svn/llvm-project/compiler-rt/trunk@334363 91177308-0d34-0410-b5e6-96231b3b80d8
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants