Linux PID attach feature causes application to segfault #5054
Comments
Is this the attach that PR #5019 only just added, which doesn't even have a regression test yet? @M3m3M4n may be able to help, but this is still an experimental feature with issues to iron out, so if you could look further into it @natarajanragavendra that would be good. Maybe look through the attach code for debug-only things; timing is another thing that debug vs release differences often come down to, but that is unlikely here.
This should not crash DR or drrun. @natarajanragavendra can you put the application in a while(1) loop and try again?
I think I know why: running -attach with nothing after it to parse as a pid results in drrun segfaulting. In @natarajanragavendra's case the app ended too fast, so pidof returned nothing and drrun segfaulted. Using -debug or not has no effect in this case.
Thanks for the quick response @M3m3M4n. I did try with a while(1) and I can see the same behaviour. The attach works with -debug but segfaults otherwise. I see the segfault occurring in the application and not in drrun.
Best for drrun to print an error msg if there's no such process though.
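To make the suggestion above concrete, here is a minimal sketch of the kind of check a launcher could run before attaching. It is not drrun's actual code: `parse_pid_arg` and `process_exists` are hypothetical helper names invented for illustration. The only real APIs used are `strtol` and `kill` with signal 0, which performs the existence and permission checks without delivering a signal.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <signal.h>
#include <stdbool.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper (not from drrun): parses a command-line argument as a
 * positive pid. Returns false for a NULL, empty, non-numeric, or out-of-range
 * argument, so a missing "$(pidof ...)" result is rejected up front. */
bool
parse_pid_arg(const char *arg, pid_t *pid_out)
{
    assert(pid_out != NULL);
    if (arg == NULL || *arg == '\0')
        return false;
    errno = 0;
    char *end = NULL;
    long val = strtol(arg, &end, 10);
    if (errno != 0 || *end != '\0' || val <= 0 || val > INT_MAX)
        return false;
    *pid_out = (pid_t)val;
    return true;
}

/* Hypothetical helper: returns true if a process with this pid exists.
 * kill() with signal 0 sends nothing; EPERM means the process exists but
 * we are not allowed to signal it. */
bool
process_exists(pid_t pid)
{
    if (kill(pid, 0) == 0)
        return true;
    return errno == EPERM;
}
```

A caller would print an error and exit if either check fails, rather than going on to attach. Note that the process can still exit between the check and the attach, so the attach call itself must also handle failure.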
The PC and callstack of the segfault could help. It's after attach (use |
I can confirm this bug: DR does crash in a release build. However, DR runs fine with a debug version built with -DDEBUG=ON. Need to investigate.
@derekbruening this one is out of my area, need your help. I did some digging: DR segfaulted after injection had completed, inside core/heap.c -> common_heap_alloc. I had to modify dr_inject_process_run to issue ptrace_cont in a loop to dump memory live. Here is the register state in the core dump: the cause is a null R12, as the PC is currently executing this instruction and R12 is tu->cur_unit in the code. It is 0, so trying to get tu->cur_unit->cur_pc failed. The backtrace looks unreliable, so the stack might have more clues:
The call chain backward is:
I don't know why running with the debug version of libdynamorio.so is perfectly fine.
The error doesn't occur with -debug -checklevel 0.
Not sure at first glance. os_thread_take_over_secondary is normally for additional threads, but this is a single-threaded app? I would look into why the (only) primary thread thinks it's not initialized at the point of that call.
As part of fixing this, please remove the docs-disabling directives I put into PR #5227 to prevent people from trying it before it works.
Another thing to be improved is that if ptrace attach capabilities are not enabled, the target process is killed with SIGKILL and there is no useful message saying that's what happened. E.g.:
|
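As an aside on the ptrace permissions point (this is not the output from the comment above, which did not survive), here is a rough sketch of how a tool could turn a failed attach into a more useful message. It assumes the restriction in question is the Yama `ptrace_scope` sysctl, which is the usual cause on Ubuntu, but the thread does not say so explicitly. `read_ptrace_scope` and `ptrace_scope_hint` are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>

/* Path of the Yama LSM setting; the file is absent if Yama is not enabled. */
#define PTRACE_SCOPE_PATH "/proc/sys/kernel/yama/ptrace_scope"

/* Hypothetical helper: reads the ptrace_scope value (0..3) from the given
 * path. Returns -1 if the file cannot be opened or parsed. */
int
read_ptrace_scope(const char *path)
{
    assert(path != NULL);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    int scope = -1;
    if (fscanf(f, "%d", &scope) != 1)
        scope = -1;
    fclose(f);
    return scope;
}

/* Hypothetical helper: maps a scope value to a hint that could be printed
 * when PTRACE_ATTACH fails. */
const char *
ptrace_scope_hint(int scope)
{
    switch (scope) {
    case 0: return "classic: same-uid processes can be attached to";
    case 1: return "restricted: only descendants can be traced without CAP_SYS_PTRACE";
    case 2: return "admin-only: attaching requires CAP_SYS_PTRACE";
    case 3: return "no attach: ptrace attach is disabled";
    default: return "ptrace_scope unavailable (Yama may not be enabled)";
    }
}
```

On a failed attach, printing `ptrace_scope_hint(read_ptrace_scope(PTRACE_SCOPE_PATH))` next to the errno string would tell the user what to look at, instead of the target silently dying.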
A further thing (this is mentioned up above as well) is that drrun crashes if no pid is passed:
|
Release build seems to work in at least some cases: with ptrace capabilities, on my machine attaching to |
I can reproduce this in an Ubuntu20 VM. It looks like the problem is that Another issue is that attach doesn't clear |
I believe this is not really release vs debug: it's the file size and the |
Fixes a number of issues with Linux attach:
+ Set xdi to zero for x86 _start relocation of libdynamorio.
+ Implement remote memset for .bss zeroing in elf_loader_map_phdrs(), fixing a crash in some builds such as Ubuntu20 release build.
+ Don't kill target if attach fails.
+ Fix crash if no pid passed.
+ Adds a useful error message on failure to look at ptrace permissions.
+ Adds a warning to use -skip_syscall if attach hangs.
+ Adds a test by porting the Windows client.attach test to Linux. Disables the mprotect syscall due to weird failures which need to be examined. Further tests of blocking syscalls and -skip_syscall are needed.
Re-enables the attach help message for drrun and the deployment docs.
Tested release build on Ubuntu20 where the .bss crash reproduced every run and is now gone.
Tested "ctest --repeat-until-fail 100 -V -R client.attach" on Ubuntu20 and on a Debian-ish system: no failures.
Issue: #38, #5054
Fixes #5054
Describe the bug
The PID attach feature on Linux causes the application to segfault. However, the attach feature works correctly with the debug build of DynamoRIO.
To Reproduce
Steps to reproduce the behavior:
Precise command line for running the application.
./a.out
Exact output or incorrect behavior.
$ ./a.out & /mnt/benchmarks/raga/dimprint/exports/bin64/drrun -attach $(pidof a.out)
[1] 65099
[1]+ Segmentation fault (core dumped) ./a.out
Please also answer these questions:
The application and PID attach work as expected
$ ./a.out & /mnt/benchmarks/raga/dimprint/exports/bin64/drrun -debug -attach $(pidof a.out)
[1] 65106
<Starting application /mnt/benchmarks/raga/a.out (65106)>
<Initial options = -no_dynamic_options -code_api -stack_size 56K -signal_stack_size 32K -max_elide_jmp 0 -max_elide_call 0 -no_inline_ignored_syscalls -native_exec_default_list '' -no_native_exec_managed_code -no_indcall2direct >
<Stopping application /mnt/benchmarks/raga/a.out (65106)>
[1]+ Done ./a.out
Expected behavior
DynamoRIO should attach to the specified PID.
Screenshots or Pasted Text
If applicable, add screenshots to help explain your problem. For text, please cut and paste the text here, delimited by lines consisting of three backticks to render it verbatim, like this:
Versions
What version of DynamoRIO are you using?
drrun version 8.0.18855 -- build 0
Does the latest build from https://github.com/DynamoRIO/dynamorio/releases solve the problem?
No
What operating system version are you running on? ("Windows 10" is not sufficient: give the release number.)
$ cat /etc/os-release
NAME="Ubuntu"
VERSION="18.04.5 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.5 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic
64-bit
$ file ./a.out
./a.out: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=3040b314526c220386b916098a9a46fbce7ebe23, not stripped
Additional context
Add any other context about the problem here.