Linux PID attach feature causes application to segfault #5054

Closed
Tracked by #5145
natarajanragavendra opened this issue Aug 17, 2021 · 15 comments · Fixed by #5230

@natarajanragavendra

natarajanragavendra commented Aug 17, 2021

Describe the bug
The PID attach feature on Linux causes the application to segfault. However, the attach feature works correctly with the debug build of DynamoRIO.

To Reproduce
Steps to reproduce the behavior:

  1. Pointer to a minimized application (ideally the source code for it and instructions on which toolchain it was built with)

```c
/* Minimal busy-loop target: repeatedly writes a 64 KB stack array, then exits. */
int main()
{
    int array [16384];

    for (int j = 0; j < 16384; j++) {
        for (int i = 0; i < 16384; i++) {
            array [i] = i;
        }
    }
    return 0;
}
```

  2. Precise command line for running the application.
    `./a.out`

  3. Exact output or incorrect behavior.

```
$ ./a.out & /mnt/benchmarks/raga/dimprint/exports/bin64/drrun -attach $(pidof a.out)
[1] 65099
[1]+ Segmentation fault (core dumped) ./a.out
```

Please also answer these questions:

  • What happens when you run without any client? The application segfaults.
  • What happens when you run with debug build ("-debug" flag to drrun/drconfig/drinject)?

The application and PID attach work as expected:

```
$ ./a.out & /mnt/benchmarks/raga/dimprint/exports/bin64/drrun -debug -attach $(pidof a.out)
[1] 65106
<Starting application /mnt/benchmarks/raga/a.out (65106)>
<Initial options = -no_dynamic_options -code_api -stack_size 56K -signal_stack_size 32K -max_elide_jmp 0 -max_elide_call 0 -no_inline_ignored_syscalls -native_exec_default_list '' -no_native_exec_managed_code -no_indcall2direct >
<Stopping application /mnt/benchmarks/raga/a.out (65106)>
[1]+ Done ./a.out
```

Expected behavior
DynamoRIO should attach to the specified PID.

Screenshots or Pasted Text
If applicable, add screenshots to help explain your problem. For text, please cut and paste the text here, delimited by lines consisting of three backticks to render it verbatim, like this:

```
paste output here
```

Versions

  • What version of DynamoRIO are you using?
    drrun version 8.0.18855 -- build 0

  • Does the latest build from https://github.com/DynamoRIO/dynamorio/releases solve the problem?
    No

  • What operating system version are you running on? ("Windows 10" is not sufficient: give the release number.)

```
$ cat /etc/os-release
NAME="Ubuntu"
VERSION="18.04.5 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.5 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic
```

  • Is your application 32-bit or 64-bit?
    64-bit

```
$ file ./a.out
./a.out: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=3040b314526c220386b916098a9a46fbce7ebe23, not stripped
```

Additional context
Add any other context about the problem here.

@natarajanragavendra natarajanragavendra changed the title APP CRASH APP CRASH: Linux PID attach feature causes application to segfault Aug 17, 2021
@natarajanragavendra natarajanragavendra changed the title APP CRASH: Linux PID attach feature causes application to segfault Linux PID attach feature causes application to segfault Aug 17, 2021
@derekbruening
Contributor

This is the attach that PR #5019 just barely added, that doesn't even have a regression test yet?

@M3m3M4n may be able to help -- but this is still an experimental feature with issues to iron out so if you could look further into it @natarajanragavendra that would be good. Maybe look through the attach code for debug-only things; timing is another thing debug vs release often hits but that is unlikely here.

@M3m3M4n
Contributor

M3m3M4n commented Aug 17, 2021

This should not crash DR or drrun. @natarajanragavendra can you put it in a while(1) and try again?

@M3m3M4n
Contributor

M3m3M4n commented Aug 17, 2021

I think I know why: running -attach without anything after it to parse as a PID results in drrun segfaulting. In @natarajanragavendra's case the app ended too fast, so pidof returned nothing and drrun segfaulted. Whether -debug is used has no effect in this case.

@natarajanragavendra
Author

natarajanragavendra commented Aug 17, 2021

> This should not crash DR or drrun. @natarajanragavendra can you put it in a while(1) and try again?

Thanks for the quick response @M3m3M4n. I did try with a while(1) and I can see the same behaviour. Attach process works with -debug but segfaults otherwise. I see the segfault occurring in the application and not in drrun.

@derekbruening
Contributor

> I think I know why, running -attach without something behind to parse as pid result in drrun segfaulting.

Best for drrun to print an error msg if there's no such process though.
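As a minimal sketch of that validation (the helper name `parse_attach_pid` is hypothetical, not drrun's actual code), a launcher could reject an empty or non-numeric argument, such as what `$(pidof a.out)` expands to once the process has exited, and use `kill(pid, 0)` to check that the process exists before attempting to attach:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <limits.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper: returns the PID on success, or -1 if the argument
 * is missing, is not a positive decimal number, or names no process. */
static pid_t parse_attach_pid(const char *arg)
{
    if (arg == NULL || *arg == '\0')
        return -1; /* e.g. "$(pidof a.out)" expanded to nothing */

    char *end = NULL;
    errno = 0;
    long val = strtol(arg, &end, 10);
    if (errno != 0 || end == arg || *end != '\0' || val <= 0 || val > INT_MAX)
        return -1; /* trailing junk, overflow, zero or negative */

    /* Signal 0 only performs the existence and permission checks. EPERM
     * still means the process exists (attach may fail later for that). */
    if (kill((pid_t)val, 0) != 0 && errno != EPERM)
        return -1; /* ESRCH: no such process */
    return (pid_t)val;
}
```

A caller would print an error message and exit when this returns -1, instead of passing an empty or bogus PID on to the injection code.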

> I did try with a while(1) and I can see the same behaviour. Attach process works with -debug but segfaults otherwise. I see the segfault occurring in the application and not in drrun.

The PC and callstack of the segfault could help. It's after attach (use -stderr_mask 15 to help see)? Does it happen with -debug -checklevel 0?

@M3m3M4n
Contributor

M3m3M4n commented Aug 17, 2021

I can confirm this bug: DR does crash with the release build. However, DR runs fine with a debug build configured with -DDEBUG=ON. Need to investigate.

@M3m3M4n
Contributor

M3m3M4n commented Aug 17, 2021

@derekbruening this one is out of my area, need your help.

I did some digging: DR segfaulted after injection had completed, inside common_heap_alloc in core/heap.c.

I had to modify dr_inject_process_run to issue ptrace_cont in a loop so that I could dump memory from the live process.

Here is the register state in the core dump:
regs

The cause is a null R12, as the PC is currently executing this instruction:
disass

R12 holds tu->cur_unit in the code. It is 0, so reading tu->cur_unit->cur_pc faults.

The backtrace looks unreliable, so the raw stack may give more clues:

```
0x7fffc937cb10: 0x00000000711dd3e0 0x0000000000000000
0x7fffc937cb20: 0x00000000711e9ff0 0x00007fffc937cba4
0x7fffc937cb30: 0x0000000000000000 0x0000000000000000
0x7fffc937cb40: 0x0000000000000000 0x000000007108422d
0x7fffc937cb50: 0x00007fffc937cba8 0x00007fffc937cba4
0x7fffc937cb60: 0x0000000000000000 0x000000007104d7b3
0x7fffc937cb70: 0x0000000000000000 0x0000000000000000
0x7fffc937cb80: 0x00007fffc937db68 0x00000000711dd320
0x7fffc937cb90: 0x00007fffc937db68 0x000000007111b611
0x7fffc937cba0: 0x0000000000000000 0x000000007104e821
0x7fffc937cbb0: 0x00007fffc937db68 0x00007fffc937db68
0x7fffc937cbc0: 0x0000000000000000 0x00007fffc937db68
0x7fffc937cbd0: 0x00007fffc937db60 0x000000007110cc56
0x7fffc937cbe0: 0x00007fffc937db68 0x0000000071200200
0x7fffc937cbf0: 0x0000561fd5ca4040 0x000000007112aaae
0x7fffc937cc00: 0x0000000000000000 0x0000000000000000
```

The call chain, innermost first, is:

  • common_heap_alloc (crash)
  • global_heap_alloc (return address 0x000000007108422d)
  • get_list_of_threads_common (0x000000007104d7b3)
  • os_thread_take_over_secondary (0x000000007111b611)
  • dynamo_start
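The manual reconstruction above can be approximated by a heuristic scan, sketched here: walk the raw stack words and keep those that fall inside the code range of libdynamorio.so, since return addresses pushed by `call` point into `.text`. The range used below, `[0x71000000, 0x71200000)`, is an assumption read off the addresses in this dump, not a value taken from the build, and the function name is hypothetical. Pointers to data in the same image can match too, so each candidate still has to be symbolized (e.g. with `addr2line`) before it is trusted.

```c
#include <stddef.h>
#include <stdint.h>

/* Copies into `out` (up to out_cap entries) every stack word that lies in
 * [text_lo, text_hi), in stack order. Matches are only candidates: data
 * pointers into the same image are indistinguishable by value alone. */
static size_t scan_return_candidates(const uint64_t *words, size_t n,
                                     uint64_t text_lo, uint64_t text_hi,
                                     uint64_t *out, size_t out_cap)
{
    size_t found = 0;
    for (size_t i = 0; i < n && found < out_cap; i++) {
        if (words[i] >= text_lo && words[i] < text_hi)
            out[found++] = words[i];
    }
    return found;
}
```

Fed the 32 words dumped above with that assumed range, the scan keeps nine candidates, among them 0x7108422d, 0x7104d7b3 and 0x7111b611 from the call chain; 0x71200200 falls just outside the range.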

I don't know why running with the debug version of libdynamorio.so is perfectly fine.

@natarajanragavendra
Author

> The PC and callstack of the segfault could help. It's after attach (use -stderr_mask 15 to help see)? Does it happen with -debug -checklevel 0?

The error doesn't occur with -debug -checklevel 0

@derekbruening
Contributor

Not sure at first glance. os_thread_take_over_secondary is normally for additional threads: but this is a single-threaded app? I would look into why the (only) primary thread thinks it's not initialized at the point of that call.

@M3m3M4n M3m3M4n mentioned this issue Oct 5, 2021
derekbruening added a commit that referenced this issue Nov 23, 2021
Hides the -attach flag from the drrun -help output.
Hides the -attach example in the documentation.

Once #5054 is fixed we can re-enable these, but for now it is just too
broken and advertising it causes more confusion than good.

Issue: #5145, #5054
derekbruening added a commit that referenced this issue Nov 24, 2021
@derekbruening
Contributor

derekbruening commented Nov 24, 2021

As part of fixing this, please remove the docs disabling directives I put into PR #5227 to prevent people from trying it before it works.

@derekbruening
Contributor

Another thing to be improved is that if ptrace attach capabilities are not enabled, the target process is killed with SIGKILL and there is no useful message saying that's what happened. E.g.:

```
$ xcalc &
[1] 400121
$ bin64/drrun -debug -attach $(pgrep xcalc)
ERROR: unable to inject: exec of |(null)| failed
[1]+  Killed                  xcalc
```
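A minimal sketch of how a launcher could turn a failed `PTRACE_ATTACH` into a useful message instead of a silent kill. It assumes the Yama LSM setting `/proc/sys/kernel/yama/ptrace_scope` (values 0 to 3, present on Ubuntu) is the usual cause of `EPERM`; the enum and function names are hypothetical, not DynamoRIO's API.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical classification of why PTRACE_ATTACH failed. */
enum attach_hint {
    HINT_NO_SUCH_PROCESS,
    HINT_YAMA_RESTRICTED, /* ptrace_scope 1 or 2 */
    HINT_YAMA_DISABLED,   /* ptrace_scope 3: no attach until reboot */
    HINT_PERMISSION,      /* other EPERM, e.g. a different user */
    HINT_OTHER
};

/* Reads the Yama ptrace restriction level; -1 if Yama is absent. */
static int read_ptrace_scope(void)
{
    FILE *f = fopen("/proc/sys/kernel/yama/ptrace_scope", "r");
    if (f == NULL)
        return -1;
    int scope = -1;
    if (fscanf(f, "%d", &scope) != 1)
        scope = -1;
    fclose(f);
    return scope;
}

/* Maps the saved errno from PTRACE_ATTACH plus the Yama level to a hint. */
static enum attach_hint classify_attach_failure(int err, int scope)
{
    if (err == ESRCH)
        return HINT_NO_SUCH_PROCESS;
    if (err != EPERM)
        return HINT_OTHER;
    if (scope == 3)
        return HINT_YAMA_DISABLED;
    if (scope == 1 || scope == 2)
        return HINT_YAMA_RESTRICTED;
    return HINT_PERMISSION;
}

static const char *attach_hint_text(enum attach_hint h)
{
    switch (h) {
    case HINT_NO_SUCH_PROCESS:
        return "no such process";
    case HINT_YAMA_RESTRICTED:
        return "ptrace attach restricted by /proc/sys/kernel/yama/ptrace_scope; "
               "set it to 0 or run with CAP_SYS_PTRACE";
    case HINT_YAMA_DISABLED:
        return "ptrace attach is disabled (ptrace_scope is 3)";
    case HINT_PERMISSION:
        return "ptrace attach not permitted: check process ownership";
    default:
        return "ptrace attach failed";
    }
}
```

A caller would save `errno` immediately after `ptrace(PTRACE_ATTACH, ...)` fails (before `read_ptrace_scope()` can overwrite it), print `attach_hint_text(classify_attach_failure(saved, read_ptrace_scope()))`, and leave the target running rather than killing it.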

@derekbruening
Contributor

derekbruening commented Nov 24, 2021

A further thing (this is mentioned up above as well) is that drrun crashes if no pid is passed:

```
$ bin64/drrun -attach
Segmentation fault
```

@derekbruening
Copy link
Contributor

Release build seems to work in at least some cases: with ptrace capabilities, on my machine attaching to xcalc works fine in both debug and release (though the default attach hangs until the mouse is over xcalc; -skip_syscall solves that w/ a warning printed by xcalc on its select failing).

@derekbruening
Copy link
Contributor

I can reproduce this in an Ubuntu20 VM.

It looks like the problem is that dynamo_initialized has a bogus value (18 in my case). This is coming from the .bss not being completely zeroed: which looks like a known problem with a FIXME in elf_loader_map_phdrs() as it has no way to memset another process. This causes initialization to be skipped, leading to a lack of dcontext and the subsequent heap crash.
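To illustrate why that zeroing matters, here is a minimal sketch of the standard ELF loading rule (not DynamoRIO's `elf_loader_map_phdrs()` itself; the helper name is hypothetical). For a `PT_LOAD` segment, bytes from `p_vaddr + p_filesz` up to `p_vaddr + p_memsz` must read as zero. Whole pages past the file contents are typically mapped as anonymous memory, which the kernel zeroes, but the tail of the last file-backed page still holds whatever file bytes follow, so that range has to be cleared explicitly. The sketch only computes the range; writing the zeros into a different process is the part that needs a remote memset.

```c
#include <elf.h>
#include <stdint.h>

/* For one PT_LOAD segment mapped at load_bias, returns how many bytes must
 * be explicitly zeroed on the last file-backed page and stores their start
 * address in *start. The range runs from the end of the file contents to
 * the end of that page, capped at the end of the segment's memory image.
 * page_size must be a power of two. */
static uint64_t bss_partial_page_zero(const Elf64_Phdr *ph, uint64_t load_bias,
                                      uint64_t page_size, uint64_t *start)
{
    uint64_t file_end = load_bias + ph->p_vaddr + ph->p_filesz;
    uint64_t mem_end = load_bias + ph->p_vaddr + ph->p_memsz;
    uint64_t page_end = (file_end + page_size - 1) & ~(page_size - 1);

    *start = file_end;
    if (ph->p_memsz <= ph->p_filesz)
        return 0;           /* no .bss in this segment */
    if (page_end > mem_end)
        page_end = mem_end; /* .bss ends inside the same page */
    return page_end - file_end;
}
```

When `p_filesz` ends exactly on a page boundary the result is 0 and nothing stale can leak into `.bss`; otherwise the leftover bytes depend on the file layout, which fits the observation that the crash depends on the build's file size rather than on release vs. debug as such.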

Another issue is that attach doesn't clear rdi which causes relocation to not happen (b/c of the skip in _start on x86 which assumes the caller (the kernel) has zeroed nearly all the registers). This will cause a problem if libdynamorio's preferred address is occupied in the target process.
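A minimal sketch of the register setup that avoids this, assuming x86-64 Linux and `struct user_regs_struct` from `<sys/user.h>` (the layout used by `PTRACE_GETREGS`/`PTRACE_SETREGS`). Before redirecting the stopped thread to the library entry point, the injector zeroes the general-purpose registers the way the kernel does at process start, so a leftover non-zero `rdi` cannot make `_start` skip relocation. The helper name and the entry and stack values in the usage are hypothetical.

```c
#include <sys/user.h>

/* Hypothetical helper (x86-64 Linux only). Edits registers fetched with
 * PTRACE_GETREGS so that, after PTRACE_SETREGS, the thread resumes at
 * `entry` on `stack_top` with all general-purpose registers zero,
 * including rdi. Segment registers, fs_base/gs_base and eflags are left
 * unchanged. */
static void prepare_start_regs(struct user_regs_struct *regs,
                               unsigned long long entry,
                               unsigned long long stack_top)
{
    regs->rax = regs->rbx = regs->rcx = regs->rdx = 0;
    regs->rsi = regs->rdi = regs->rbp = 0;
    regs->r8 = regs->r9 = regs->r10 = regs->r11 = 0;
    regs->r12 = regs->r13 = regs->r14 = regs->r15 = 0;
    regs->rip = entry;     /* resume at the library's entry point */
    regs->rsp = stack_top; /* stack prepared by the injector */
}
```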

@derekbruening derekbruening self-assigned this Nov 27, 2021
@derekbruening
Contributor

I believe this is not really release vs debug: it's the file size and the .bss bounds so it depends on the toolchain as well as release vs debug and could happen in debug if the file lined up unluckily.

derekbruening added a commit that referenced this issue Nov 28, 2021
Fixes a number of issues with Linux attach:

+ Set xdi to zero for x86 _start relocation of libdynamorio.

+ Implement remote memset for .bss zeroing in elf_loader_map_phdrs(),
  fixing a crash in some builds such as Ubuntu20 release build.

+ Don't kill target if attach fails.

+ Fix crash if no pid passed.

+ Adds a useful error message on failure to look at ptrace permissions.

+ Adds a warning to use -skip_syscall if attach hangs.

+ Adds a test by porting the Windows client.attach test to Linux.
  Disables the mprotect syscall due to weird failures which need to be
  examined.
  Further tests of blocking syscalls and -skip_syscall are needed.

Re-enables the attach help message for drrun and the deployment docs.

Tested release build on Ubuntu20 where the .bss crash reproduced every
run and is now gone.

Tested "ctest --repeat-until-fail 100 -V -R client.attach" on Ubuntu20
and on a Debian-ish system: no failures.

Issue: #38, #5054
Fixes #5054
derekbruening added a commit that referenced this issue Nov 29, 2021