browser_tests hangs with drcov tool on Linux #1390

Open
derekbruening opened this issue Nov 28, 2014 · 12 comments

Comments

@derekbruening
Contributor

From [email protected] on March 13, 2014 17:05:33

/home/zhaoqin/Workspace/DynamoRIO/builds/build_x64_dbg.git/bin64/drrun -debug -checklevel 1 -disable_traces -c /home/zhaoqin/Workspace/DynamoRIO/builds/build_x64_dbg.git/clients/lib64/debug/libdrcov.so -- ./out/Release/browser_tests --no-sandbox --gtest_filter=AppListStartPageWebUITest.Basic --ui-test-action-max-timeout=80000000 --ui-test-action-timeout=40000000

ps -U zhaoqin | grep brow
7990 pts/11 00:20:53 browser_tests
8022 pts/11 00:21:02 browser_tests
9451 pts/11 00:00:01 browser_tests
9453 pts/11 00:00:18 browser_tests
9454 pts/11 00:00:00 browser_tests
9455 pts/11 00:00:01 browser_tests
9490 pts/11 00:05:06 browser_tests
9510 pts/11 00:05:14 browser_tests

Attaching to 9451 and 9453 shows them stopped at the nop after a syscall.
Attaching to 9490 and 9510 shows them looping in debug_infinite_loop.

The test runs fine without the drcov client.

Original issue: http://code.google.com/p/dynamorio/issues/detail?id=1390

@derekbruening
Contributor Author

From [email protected] on March 17, 2014 09:19:08

Tried different options:
DR: pass
DR+empty: pass
DR+bbcount: pass, but two processes are left running:
[ PASSED ] 1 test.
<Stopping application /usr/local/google/home/zhaoqin/Workspace/Chrome/chromium.git/src/out/Release/browser_tests (20899)>
Instrumentation results:
26790553 basic block executions

!ps
ps -U zhaoqin | grep brow
20924 pts/11 00:00:14 browser_tests
20949 pts/11 00:00:30 browser_tests

@derekbruening
Contributor Author

From [email protected] on March 17, 2014 12:16:30

ps -U zhaoqin | grep browser_tests
22365 pts/11 00:00:01 browser_tests
22367 pts/11 00:00:18 browser_tests
22368 pts/11 00:00:00 browser_tests
22369 pts/11 00:00:01 browser_tests
22393 pts/11 00:02:17 browser_tests
22413 pts/11 00:02:29 browser_tests

22365:

(gdb) info threads
Id Target Id Frame

* 2 Thread 0x7fb54210b700 (LWP 22366) "test_launcherWo" 0x0000000046d7d19e in ?? ()
  1 Thread 0x7fb54d35b980 (LWP 22365) "browser_tests" 0x00000000468314f1 in ?? ()
(gdb) thread 1
[Switching to thread 1 (Thread 0x7fb54d35b980 (LWP 22365))]
#0 0x00000000468314f1 in ?? ()

0x468314e3: mov $0xe8,%eax
0x468314e8: jmp 0x468314ef
0x468314ea: jmpq 0x46d8e835
0x468314ef: syscall
=> 0x468314f1: nop
0x468314f2: jmpq 0x46d8e84c

(gdb) thread 2
[Switching to thread 2 (Thread 0x7fb54210b700 (LWP 22366))]
#0 0x0000000046d7d19e in ?? ()

0x46d7d190: mov $0x23,%eax
0x46d7d195: jmp 0x46d7d19c
0x46d7d197: jmpq 0x46d91213
0x46d7d19c: syscall
=> 0x46d7d19e: nop

22367:
(gdb) info threads
Id Target Id Frame
18 Thread 0x7f34d8802700 (LWP 22372) "dconf worker" 0x000000004704d0f2 in ?? ()
17 Thread 0x7f34d8001700 (LWP 22373) "gdbus" 0x000000004727d0b2 in ?? ()
16 Thread 0x7f34d619e700 (LWP 22375) "NetworkChangeNo" 0x00000000470fd4ed in ?? ()
15 Thread 0x7f34d599d700 (LWP 22376) "inotify_reader" 0x00000000471ad155 in ?? ()
14 Thread 0x7f34d519c700 (LWP 22377) "WorkerPool/2237" 0x0000000047bf523f in ?? ()
13 Thread 0x7f34d517b700 (LWP 22378) "WorkerPool/2237" 0x0000000047c452af in ?? ()
12 Thread 0x7f34d5159700 (LWP 22379) "AudioThread" 0x0000000047cc5123 in ?? ()
11 Thread 0x7f34d32e1700 (LWP 22380) "threaded-ml" 0x0000000047e1d2ea in ?? ()
10 Thread 0x7f34d4951700 (LWP 22381) "CrShutdownDetec" 0x0000000047edd0e2 in ?? ()
9 Thread 0x7f34ceadf700 (LWP 22382) "Chrome_DBThread" 0x00000000481290e3 in ?? ()
8 Thread 0x7f34ce2de700 (LWP 22383) "Chrome_FileThre" 0x000000004670fc02 in ?? ()
7 Thread 0x7f34cbad9700 (LWP 22388) "IndexedDB" 0x0000000048359123 in ?? ()
6 Thread 0x7f34c90ce700 (LWP 22394) "BrowserBlocking" 0x0000000048da515b in ?? ()
5 Thread 0x7f34c2957700 (LWP 22406) "BrowserBlocking" 0x000000004a5a52b3 in ?? ()
4 Thread 0x7f34c3158700 (LWP 22482) "BrowserBlocking" 0x0000000049b0d2af in ?? ()
3 Thread 0x7f34c210d700 (LWP 22483) "Shutdown watchd" 0x000000004b41917b in ?? ()
2 Thread 0x7f34c87cd700 (LWP 22484) "Chrome_ProcessL" 0x0000000048599134 in ?? ()

* 1 Thread 0x7f34e6c12980 (LWP 22367) "browser_tests" 0x000000004670fc02 in ?? ()
(gdb) thread 1
[Switching to thread 1 (Thread 0x7f34e6c12980 (LWP 22367))]
#0 0x000000004670fc02 in ?? ()

0x4670fc00: syscall
=> 0x4670fc02: movabs %rax,%gs:0x0
0x4670fc0d: movabs $0x7133a6e8,%rax
0x4670fc17: jmpq 0x4670edc0

(gdb) where
#0 0x000000004670fc02 in ?? ()
#1 0x000000000000003c in ?? ()
#2 0x00007ffff34daf60 in ?? ()
#3 0x00007f34e613e070 in ?? () from /lib/x86_64-linux-gnu/libpthread.so.0
#4 0x00007f34ce2ded28 in ?? ()
#5 0x0000331bc7e3bab0 in ?? ()
#6 0x0000000000000000 in ?? ()
(gdb) x/10i 0x00007f34e613e070
0x7f34e613e070 <cleanup>: xor %edx,%edx
0x7f34e613e072 <cleanup+2>: mov %fs:0x10,%rax
0x7f34e613e07b <cleanup+11>: lock cmpxchg %rdx,(%rdi)
0x7f34e613e080 <cleanup+16>: retq

22368:
(gdb) info threads
Id Target Id Frame

* 1 Thread 0x7f34e6c12980 (LWP 22368) "browser_tests" 0x00000000467d552d in ?? ()

0x467d551f: mov $0x7,%eax
0x467d5524: jmp 0x467d552b
0x467d5526: jmpq 0x46cf2f72
0x467d552b: syscall
=> 0x467d552d: nop
0x467d552e: jmpq 0x46cf2f89

22393:
(gdb) info threads
Id Target Id Frame

* 1 Thread 0x7f29b41d6980 (LWP 22393) "browser_tests" debug_infinite_loop () at /home/zhaoqin/Workspace/DynamoRIO/dynamorio.git/core/x86/x86.asm:861
(gdb) x/10i global_do_syscall_syscall
0x712ab06a <global_do_syscall_syscall>: mov %rcx,%r10
0x712ab06d <global_do_syscall_syscall+3>: syscall
0x712ab06f <global_do_syscall_syscall+5>: jmpq 0x712ab079 <debug_infinite_loop>
0x712ab074 <global_do_syscall_syscall+10>: jmpq 0x712ab152 <dynamorio_sys_exit_group>
=> 0x712ab079 <debug_infinite_loop>: jmpq 0x712ab079 <debug_infinite_loop>

22413:
(gdb) where
#0 debug_infinite_loop () at /home/zhaoqin/Workspace/DynamoRIO/dynamorio.git/core/x86/x86.asm:861
#1 0x0000000000000000 in ?? ()
(gdb) info threads
Id Target Id Frame

* 1 Thread 0x7f10e212b980 (LWP 22413) "browser_tests" debug_infinite_loop () at /home/zhaoqin/Workspace/DynamoRIO/dynamorio.git/core/x86/x86.asm:861
(gdb) x/10i global_do_syscall_syscall
0x712ab06a <global_do_syscall_syscall>: mov %rcx,%r10
0x712ab06d <global_do_syscall_syscall+3>: syscall
0x712ab06f <global_do_syscall_syscall+5>: jmpq 0x712ab079 <debug_infinite_loop>
0x712ab074 <global_do_syscall_syscall+10>: jmpq 0x712ab152 <dynamorio_sys_exit_group>
=> 0x712ab079 <debug_infinite_loop>: jmpq 0x712ab079 <debug_infinite_loop>

(gdb) x/60gx $rbp
0x4d0aeb58: 0x000000004d0af000 0x000000004d05ea80
0x4d0aeb68: 0x000000000000003e 0x000000000000578d
0x4d0aeb78: 0x000000000000000f 0x0000000000000001
0x4d0aeb88: 0x00000000712f9704 0xfffffffffffffd48

0x712f96fc <terminate_via_kill+102>: mov %rax,%rdi
0x712f96ff <terminate_via_kill+105>: callq 0x712aaf9d <cleanup_and_terminate>
0x712f9704 <terminate_via_kill+110>: mov 0x2ed585(%rip),%rax # 0x715e6c90

So it may have come from cleanup_and_terminate.

@derekbruening
Contributor Author

From [email protected] on March 17, 2014 12:43:53

more on 22413

(gdb) info threads
Id Target Id Frame

* 1 Thread 0x7f10e212b980 (LWP 22413) "browser_tests" debug_infinite_loop () at /home/zhaoqin/Workspace/DynamoRIO/dynamorio.git/core/x86/x86.asm:861
(gdb) info reg
rax 0x0 0
rbx 0x578d 22413
rcx 0x712ab06f 1898623087
rdx 0x0 0
rsi 0xf 15
rdi 0x578d 22413
rbp 0x4d0aeb58 0x4d0aeb58
rsp 0x4d06f000 0x4d06f000
r8 0x5137e 332670
r9 0x1 1
r10 0x0 0
r11 0x246 582
r12 0x67616a0 108402336
r13 0x0 0
r14 0x0 0
r15 0x0 0
rip 0x712ab079 0x712ab079 <debug_infinite_loop>
eflags 0x246 [ PF ZF IF ]
cs 0x33 51
ss 0x2b 43
ds 0x0 0
es 0x0 0
fs 0x0 0
gs 0x0 0

(gdb) x/30gx 0x4d0aeb58
0x4d0aeb58: 0x000000004d0af000 0x000000004d05ea80
0x4d0aeb68: 0x000000000000003e 0x000000000000578d
0x4d0aeb78: 0x000000000000000f 0x0000000000000001
0x4d0aeb88: 0x00000000712f9704 0xfffffffffffffd48
0x4d0aeb98: 0x000000004d05ea80 0x000000004d0aebb0
0x4d0aeba8: 0x0000000000000000 0x000000004d0aebd0
0x4d0aebb8: 0x00000000712f97fa 0x0000000f00004000
0x4d0aebc8: 0x000000004d05ea80 0x000000004d0aed10
0x4d0aebd8: 0x00000000712fa985 0x0000000000000000

(gdb) x/50i 0x712aaf9d
0x712aaf9d <cleanup_and_terminate>: lea -0x28(%rsp),%rsp
0x712aafa2 <cleanup_and_terminate+5>: lea -0x8(%rsp),%rbp
0x712aafa7 <cleanup_and_terminate+10>: mov %r8,0x28(%rbp)
0x712aafab <cleanup_and_terminate+14>: mov %rdi,0x8(%rbp)
0x712aafaf <cleanup_and_terminate+18>: mov %rsi,0x10(%rbp)
0x712aafb3 <cleanup_and_terminate+22>: mov %rdx,0x18(%rbp)
0x712aafb7 <cleanup_and_terminate+26>: mov %rcx,0x20(%rbp)

Called from terminate_via_kill:
cleanup_and_terminate(dcontext, SYS_kill,
                      /* Pass -pid in case main thread has exited
                       * in which case will get -ESRCH
                       */
                      IF_VMX86(os_in_vmkernel_userworld() ?
                               -(int)get_process_id() :)
                      get_process_id(),
                      dcontext->sys_param0, true);

dcontext->sys_param0: 0xf
pid: 0x578d,
SYS_kill: 0x3e

dcontext: 0x000000004d05ea80
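
As a cross-check of the values listed above, the following sketch decodes the first quadwords of the dump at %rbp using the spill offsets from the cleanup_and_terminate prologue (0x8 through 0x28). The struct, its field names, and the function name are hypothetical and chosen only for illustration; the offsets and the dump values are copied from the gdb output in this comment.

```c
#include <stdint.h>

/* Hypothetical view of the arguments that the cleanup_and_terminate
 * prologue spills relative to %rbp. Field names are illustrative only. */
typedef struct {
    uint64_t dcontext;  /* 0x08(%rbp), spilled from %rdi */
    uint64_t sysnum;    /* 0x10(%rbp), spilled from %rsi */
    uint64_t sys_arg1;  /* 0x18(%rbp), spilled from %rdx */
    uint64_t sys_arg2;  /* 0x20(%rbp), spilled from %rcx */
    uint64_t exitproc;  /* 0x28(%rbp), spilled from %r8 */
} spilled_args_t;

/* The first six quadwords printed by "x/30gx 0x4d0aeb58". */
const uint64_t frame_dump[6] = {
    0x000000004d0af000, 0x000000004d05ea80,
    0x000000000000003e, 0x000000000000578d,
    0x000000000000000f, 0x0000000000000001
};

/* words_at_rbp[i] is the quadword at offset i * 8 from %rbp. */
spilled_args_t decode_spilled_args(const uint64_t *words_at_rbp)
{
    spilled_args_t args;
    args.dcontext = words_at_rbp[1];
    args.sysnum   = words_at_rbp[2];
    args.sys_arg1 = words_at_rbp[3];
    args.sys_arg2 = words_at_rbp[4];
    args.exitproc = words_at_rbp[5];
    return args;
}
```

Decoding frame_dump this way gives syscall number 62 (0x3e), first argument 22413 (0x578d, the pid), second argument 15 (0xf), and a final argument of 1, which matches the call site in terminate_via_kill.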

@derekbruening
Contributor Author

From [email protected] on March 17, 2014 13:04:40

(gdb) x/300gx 0x4d0aeb58
0x4d0aeb58: 0x000000004d0af000 0x000000004d05ea80
0x4d0aeb68: 0x000000000000003e 0x000000000000578d
0x4d0aeb78: 0x000000000000000f 0x0000000000000001
0x4d0aeb88: 0x00000000712f9704 0xfffffffffffffd48 // terminate_via_kill
0x4d0aeb98: 0x000000004d05ea80 0x000000004d0aebb0
0x4d0aeba8: 0x0000000000000000 0x000000004d0aebd0
0x4d0aebb8: 0x00000000712f97fa 0x0000000f00004000 // terminate_via_kill_from_anywhere
0x4d0aebc8: 0x000000004d05ea80 0x000000004d0aed10
0x4d0aebd8: 0x00000000712fa985 0x0000000000000000 // execute_default_action
0x4d0aebe8: 0x000000004d0e1038 0x0000000f4d05ea01
...
0x4d0aed18: 0x00000000712fb1ed 0x0000000000000000 // execute_default_from_dispatch
0x4d0aed28: 0x000000004d0e1038 0x0000000f4d0e1068
0x4d0aed38: 0x000000004d05ea80 0x000000004d0aedc0
0x4d0aed48: 0x00000000712f937c 0x0000000000000000 // execute_handler_from_dispatch
0x4d0aed58: 0x000000000000578d 0x0000000f4d0aed90
0x712f96ff <terminate_via_kill+105>: callq 0x712aaf9d <cleanup_and_terminate>
0x712f9704 <terminate_via_kill+110>: mov 0x2ed585(%rip),%rax # 0x715e6c90

0x712f97f5 <terminate_via_kill_from_anywhere+98>: callq 0x712f9696 <terminate_via_kill>
0x712f97fa <terminate_via_kill_from_anywhere+103>: mov 0x2ed48f(%rip),%rax # 0x715e6c90

0x712fa980 <execute_default_action+3166>: callq 0x712f9793 <terminate_via_kill_from_anywhere>
0x712fa985 <execute_default_action+3171>: mov 0x2ec304(%rip),%rax # 0x715e6c90

0x711b59c0 <heap_free+90>: callq 0x711b4a0e <common_heap_free>
0x711b59c5 <heap_free+95>: mov %al,-0x9(%rbp)

0x712fb1e8 <execute_default_from_dispatch+44>: callq 0x712f9d22 <execute_default_action>
0x712fb1ed <execute_default_from_dispatch+49>: leaveq

0x712f9377 <execute_handler_from_dispatch+2303>: callq 0x712fb1bc <execute_default_from_dispatch>
0x712f937c <execute_handler_from_dispatch+2308>: mov $0x1,%eax

0x712fb64f <receive_pending_signal+1120>: callq 0x712f8a78 <execute_handler_from_dispatch>
0x712fb654 <receive_pending_signal+1125>: mov %al,-0x11(%rbp)

@derekbruening
Contributor Author

From [email protected] on March 27, 2014 12:05:15

If we remove the SIGTERM and keep only the SIGKILL, the test no longer hangs, but it still fails.
Before all the processes exit, we can see something like:

ps -U zhaoqin | grep browser_tests
13808 pts/0 00:00:05 browser_tests
13828 pts/0 00:01:19 browser_tests
13845 pts/0 00:00:00 browser_tests
13846 pts/0 00:00:05 browser_tests

Processes 13845 and 13846 are children of 13828, which is itself a child of 13808.
13846 becomes a zombie. Does that mean 13846 exits earlier than its parent 13828 expects?
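
For background on the zombie question: a child that has exited stays in the process table as a zombie until its parent reaps it with one of the wait calls, so a zombie only indicates that the parent has not yet waited for it. The sketch below illustrates that sequence with standard POSIX calls. The function name is hypothetical, and nothing here is taken from Chromium or DynamoRIO.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child that exits at once, observes it while it is still
 * unreaped, then reaps it. Returns the child's exit code, or -1 if any
 * step does not behave as described. */
int zombie_then_reap(int exit_code)
{
    pid_t child = fork();
    if (child < 0)
        return -1;
    if (child == 0)
        _exit(exit_code);   /* child: exit immediately */

    /* Block until the child has exited, but leave it unreaped (WNOWAIT).
     * From here until the waitpid below, ps would list it as a zombie. */
    siginfo_t info;
    if (waitid(P_PID, (id_t)child, &info, WEXITED | WNOWAIT) != 0)
        return -1;

    /* Reap the child: this removes the zombie entry. */
    int status = 0;
    if (waitpid(child, &status, 0) != child || !WIFEXITED(status))
        return -1;

    /* A second wait fails with ECHILD because nothing is left to reap. */
    int ignored = 0;
    if (waitpid(child, &ignored, 0) != -1 || errno != ECHILD)
        return -1;

    return WEXITSTATUS(status);
}
```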

@derekbruening
Contributor Author

From [email protected] on March 27, 2014 15:50:29

-no_nudge_kills has the same problem, so there are problems even without soft_kills.
It looks like it is caused by the code in event_pre_syscall:

static bool
event_pre_syscall(void *drcontext, int sysnum)
{
#ifdef UNIX
    /* We assume execve always succeeds */
    if (sysnum == sysnum_execve) {
        event_thread_exit(drcontext);
        event_exit();
    }
#endif
    return true;
}

@derekbruening
Contributor Author

From [email protected] on April 03, 2014 14:36:34

Replacing the event_exit call with just a data dump eliminates the hang.
However, there still seems to be a browser_tests process spinning in debug_infinite_loop, which might be caused by a syscall failure.

@derekbruening
Contributor Author

From [email protected] on April 03, 2014 14:37:34

In drcov, on the pre-execve syscall event, we should iterate over all threads and dump their data in the drcov_per_thread case.
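
A minimal sketch of that idea, written as a standalone model rather than against the DynamoRIO API: every type and function name below is hypothetical, and the actual dump is reduced to setting a flag. The point is only that all threads with pending data are visited, not just the thread making the execve call.

```c
#include <stddef.h>

/* Hypothetical per-thread coverage record; not drcov's real structure. */
typedef struct {
    int    thread_id;
    size_t num_blocks;  /* basic blocks recorded for this thread */
    int    dumped;      /* nonzero once this thread's data is written */
} per_thread_t;

/* Visits every thread and dumps any data not yet written.
 * Returns the number of threads dumped by this call. */
size_t dump_all_threads(per_thread_t *threads, size_t count)
{
    size_t dumped_now = 0;
    for (size_t i = 0; i < count; i++) {
        if (!threads[i].dumped) {
            /* A real client would write threads[i]'s data to its file here. */
            threads[i].dumped = 1;
            dumped_now++;
        }
    }
    return dumped_now;
}
```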

@derekbruening
Contributor Author

From [email protected] on April 03, 2014 14:39:17

On the pre-execve syscall event, we dump the coverage data. However, the execve may fail, and we may later dump the coverage data to the same file again.
So the solution should be to modify drcov2lcov to handle multiple dumps in the same file.

@derekbruening
Contributor Author

From [email protected] on April 03, 2014 15:06:09

Re: execve failing: it may be better to handle this in the client. A failing execve is not uncommon (some apps simply try execve on each component of the path instead of checking for existence separately), and if a large app runs a lot of code and then does 30 failing execves, you are going to end up with an enormous logfile.
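
One possible way to handle this in the client, sketched as a standalone model under stated assumptions: remember the log size before the pre-execve dump and roll back to it if control returns from execve, which only happens when execve fails. This is not the fix that was committed for this issue. All names are hypothetical, the log is modelled by a byte count only, and the in-memory coverage data is assumed to be kept so that a later dump can write it again.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a coverage log; only its size is tracked. */
typedef struct {
    size_t bytes_written;   /* current size of the log */
    size_t execve_mark;     /* size just before the pending pre-execve dump */
    bool   execve_pending;  /* a pre-execve dump is not yet confirmed */
} cov_log_t;

/* Pre-execve: dump, but remember where the dump started. */
void on_pre_execve(cov_log_t *log, size_t dump_size)
{
    log->execve_mark = log->bytes_written;
    log->bytes_written += dump_size;
    log->execve_pending = true;
}

/* Reached only if execve returned, i.e. it failed: discard the dump so
 * repeated failures do not grow the log. */
void on_execve_failed(cov_log_t *log)
{
    if (log->execve_pending) {
        log->bytes_written = log->execve_mark;
        log->execve_pending = false;
    }
}
```

In this model, 30 failing execves in a row leave the log at its original size, and only a dump that is not rolled back stays in the file.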

@derekbruening
Contributor Author

From [email protected] on April 04, 2014 10:02:50

This issue was closed by revision r2632.

Status: Fixed

@derekbruening
Contributor Author

From [email protected] on April 04, 2014 11:09:08

Keep it open as there are still a few issues to be fixed.

Status: Accepted
