CRASH from race on detach in LOG debug-build code #4641

derekbruening · 2020-12-29T19:37:58Z

While testing a fix for #3535, running debug build I hit this crash maybe once in 100 runs:

#4  0x00007fe6e8f590a5 in getchar () at getchar.c:37
#5  0x00005640f8054db3 in handle_signal (signal=11, siginfo=0x7fe6e56cff30, ucxt=0x7fe6e56cfe00) at /home/bruening/dr/git/src/suite/tests/api/detach_signal.cpp:105
#6  <signal handler called>
#7  0x00007fe6e93d1c72 in master_signal_handler_C (sig=7, siginfo=0x7fe6e56d0770, ucxt=0x7fe6e56d0640, xsp=0x7fe6e56d0638 ";\016\071\351\346\177") at /home/bruening/dr/git/src/core/unix/signal.c:5107

   0x00007fe6e93d1c59 <+1004>:	lea    0x161388(%rip),%rax        # 0x7fe6e9532fe8 <d_r_stats>
   0x00007fe6e93d1c60 <+1011>:	mov    (%rax),%rax
   0x00007fe6e93d1c63 <+1014>:	test   %rax,%rax
   0x00007fe6e93d1c66 <+1017>:	je     0x7fe6e93d1ccc <master_signal_handler_C+1119>
   0x00007fe6e93d1c68 <+1019>:	lea    0x161379(%rip),%rax        # 0x7fe6e9532fe8 <d_r_stats>
   0x00007fe6e93d1c6f <+1026>:	mov    (%rax),%rax
==>
   0x00007fe6e93d1c72 <+1029>:	mov    0x218(%rax),%eax
   0x00007fe6e93d1c78 <+1035>:	test   %eax,%eax
   0x00007fe6e93d1c7a <+1037>:	je     0x7fe6e93d1ccc <master_signal_handler_C+1119>
   0x00007fe6e93d1c7c <+1039>:	lea    0x161365(%rip),%rax        # 0x7fe6e9532fe8 <d_r_stats>
   0x00007fe6e93d1c83 <+1046>:	mov    (%rax),%rax
   0x00007fe6e93d1c86 <+1049>:	mov    0x214(%rax),%eax
   0x00007fe6e93d1c8c <+1055>:	and    $0x10,%eax
   0x00007fe6e93d1c8f <+1058>:	test   %eax,%eax
   0x00007fe6e93d1c91 <+1060>:	je     0x7fe6e93d1ccc <master_signal_handler_C+1119>
   0x00007fe6e93d1c93 <+1062>:	callq  0x7fe6e93ae109 <get_sys_thread_id>

(gdb) p &((dr_statistics_t*)0)->loglevel
$2 = (uint *) 0x218
(gdb) p d_r_stats
$3 = (dr_statistics_t *) 0x0

Must be a race where d_r_stats is set to NULL in between the check for NULL and
the de-ref of loglevel.

I was not planning to fix it: b/c it would require eliminating all LOG calls on detach paths, which are useful for debugging, and b/c it is limited to debug build.

The text was updated successfully, but these errors were encountered:

Eliminates a race on detach where the detaching thread removes DR's SIGSEGV handler while a now-native detached thread is in the middle of having a signal delivered (natively) and invokes a TLS magic field safe read, whose SIGSEGV then goes to the application. The detaching thread is the one doing all the real cleanup, so we simply avoid any safe reads or TLS for detaching threads by recording the detacher's ID when we start the detach process. This var is not cleared until re-init, so we have no race with the end of detach. Tested on api.detach_signal with the forthcoming signal mask checks, which trigger when the handler is invoked for a DR signal instead of an app-generated signal. Without this fix, the test fails easily: about 1 in 5 runs in debug build. With this fix, it succeeds 200x in a loop. I still see one type of crash in debug build, a rare race where d_r_stats is set to NULL in between the check and use of a LOG(), but that is limited to debug and is beyond the scope of this fix and is much lower priority: I filed it as #4641. Fixes #3535

derekbruening · 2020-12-30T19:41:56Z

The best way to solve this may be to leave the useful LOG calls but move
loglevel to a static global struct that's not freed (i.e., make d_r_stats
like the options structs), or even its own special global var. IIRC it was
placed in d_r_stats back when we had a UI to tweak options via shared
memory.

derekbruening added Priority-Low Bug-DRCrash OpSys-UNIX labels Dec 29, 2020

derekbruening mentioned this issue Dec 29, 2020

i#3535: Solve detach handler race #4642

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

CRASH from race on detach in LOG debug-build code #4641

CRASH from race on detach in LOG debug-build code #4641

derekbruening commented Dec 29, 2020

derekbruening commented Dec 30, 2020

CRASH from race on detach in LOG debug-build code #4641

CRASH from race on detach in LOG debug-build code #4641

Comments

derekbruening commented Dec 29, 2020

derekbruening commented Dec 30, 2020