Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

libipt decode fails due to missing mapped addresses #6486

Open
abhinav92003 opened this issue Dec 1, 2023 · 13 comments
Open

libipt decode fails due to missing mapped addresses #6486

abhinav92003 opened this issue Dec 1, 2023 · 13 comments
Assignees

Comments

@abhinav92003
Copy link
Contributor

I'm seeing errors like the following happen sporadically during decoding of Intel-PT kernel traces:

408:   *** postcmd failed (1): drpt2ir: [28430, IP:ffffffffc11dd000] get next
408:   instruction error: no memory mapped at this address

When it happens, it's always the same address that's unmapped. When I hard-coded that address (and another that was revealed when this error was fixed) to be copied in kcore_copy, the error went away. I couldn't find details of the address in /proc/modules or /proc/kallsyms but it is a part of the kcore code section (we copy only the memory that corresponds to the live kernel modules from /proc/modules). It's presumably some kernel code that's executing during system calls.

The errors don't happen every time but they seem to have started happening frequently enough. Not sure if it was some change in my machine's kernel that caused the unmapped instrs to be executed.

abhinav92003 added a commit that referenced this issue Dec 1, 2023
Adds encodings for kernel system call instructions to the trace in
raw2trace. Kernel system call traces are decoded using libipt which also
provides the instruction encodings. We add support to drir_t to write
these encodings to a new buffer which is re-used for all dynamic
instances of that instr even across multiple system call traces.

Fixes taken/not-taken detection for conditional branches in the syscall
trace.

Adds support in the syscall_mix tool to report the counts of each system
call's traces also. Adds sysnum to system call trace start and end
markers to achieve this.

Ran all Intel-PT tests locally:

```
$ ctest -VV -R 'SUDO'
...
The following tests passed:
	code_api|client.drpttracer_SUDO-test
	code_api|tool.drcachesim.phys_SUDO # not really PT. Just included because of ctest -R.
	code_api|tool.drcachesim.phys-threads_SUDO # not really PT. Just included because of ctest -R.
	code_api|tool.drcacheoff.phys_SUDO # not really PT. Just included because of ctest -R.
	code_api|tool.drcacheoff.kernel.simple_SUDO
	code_api|tool.drcacheoff.kernel.opcode-mix_SUDO
	code_api|tool.drcacheoff.kernel.syscall-mix_SUDO

100% tests passed, 0 tests failed out of 7

```

Found some flakiness due to #6486 in local runs of the kernel sudo
tests, which will be addressed separately.

Issue: #5505
@dolanzhao dolanzhao self-assigned this Dec 8, 2023
@dolanzhao
Copy link
Contributor

I have successfully tested the main branch on my local system and can confirm that it is functioning as expected.

@derekbruening
Copy link
Contributor

I hit this missing address error in every run in these 3 tests on my machine:

        408 - code_api|tool.drcacheoff.kernel.simple_SUDO (Failed)
        409 - code_api|tool.drcacheoff.kernel.opcode-mix_SUDO (Failed)
        410 - code_api|tool.drcacheoff.kernel.syscall-mix_SUDO (Failed)

It is a Debian-ish 6.5.6 kernel. Should be same as @abhinav92003 's machine.

@abhinav92003
Copy link
Contributor Author

FTR the following is the local workaround I'm using. It adds two addresses that showed up in error messages. They may be different on different machines of course.

$ git diff
diff --git a/clients/drcachesim/tracer/kcore_copy.cpp b/clients/drcachesim/tracer/kcore_copy.cpp
index 962ab73f7..1e2424636 100644
--- a/clients/drcachesim/tracer/kcore_copy.cpp
+++ b/clients/drcachesim/tracer/kcore_copy.cpp
@@ -370,6 +370,21 @@ kcore_copy_t::read_modules()
             last_module = module;
         }
     }
+    proc_module_t *module = (proc_module_t *)dr_global_alloc(sizeof(proc_module_t));
+    module->start = 0xffffffffc039b000;
+    module->end = 0xffffffffc039b000 + 0x10000;
+    module->next = nullptr;
+    kcore_code_segments_num_++;
+    last_module->next = module;
+    last_module = module;
+
+    module = (proc_module_t *)dr_global_alloc(sizeof(proc_module_t));
+    module->start = 0xffffffffc03777c8;
+    module->end = 0xffffffffc03777c8 + 0x10000;
+    module->next = nullptr;
+    kcore_code_segments_num_++;
+    last_module->next = module;
+    last_module = module;
     f.close();
     return true;
 }

@derekbruening
Copy link
Contributor

There are now 4 tests and all fail every time on my machine:

The following tests FAILED:
        413 - code_api|tool.drcacheoff.kernel.simple_SUDO (Failed)
        414 - code_api|tool.drcacheoff.kernel.opcode-mix_SUDO (Failed)
        415 - code_api|tool.drcacheoff.kernel.syscall-mix_SUDO (Failed)
        416 - code_api|tool.drcacheoff.kernel.invariant-checker_SUDO (Failed)

@abhinav92003
Copy link
Contributor Author

They're all because of the same underlying reason: PT raw traces containing some instr not in the range of any /proc/modules.

@abhinav92003
Copy link
Contributor Author

On the same system where the test failures reproduce, I tried using perf to trace the same test app (suite/tests/bin/simple_app), and it didn't fail (see https://perf.wiki.kernel.org/index.php/Perf_tools_support_for_Intel%C2%AE_Processor_Trace#Kernel-only_tracing for the detailed steps).

$ sudo perf record -a --kcore -e intel_pt/cyc,noretcomp/k  -- dynamorio/suite/tests/bin/simple_app
$ sudo chown -R <user> perf.data
$ perf script --insn-trace --xed &> perf_op.txt

I was able to find the address that failed in our tests in the perf output:

403:   *** postcmd failed (1): drpt2ir: [cd8, IP:ffffffffc0063064] get next
403:   instruction error: no memory mapped at this address
$ grep "ffffffffc0063064" perf_op.txt 
            perf  229533 [005] 104007.009349630:  ffffffffc4fe1057 nft_chain_nat_init+0x47 ([kernel.kallsyms]) 		callq  0xffffffffc0063064
            perf  229533 [005] 104007.009349630:  ffffffffc0063064 bpf_prog_f7765d2581983488_file_monitoring+0x0 (bpf_prog_f7765d2581983488_file_monitoring) 		nop %edi, %edx
       perf-exec  229534 [001] 104007.009526523:  ffffffffc4fe1057 nft_chain_nat_init+0x47 ([kernel.kallsyms]) 		callq  0xffffffffc0063064
       perf-exec  229534 [001] 104007.009526523:  ffffffffc0063064 bpf_prog_f7765d2581983488_file_monitoring+0x0 (bpf_prog_f7765d2581983488_file_monitoring) 		nop %edi, %edx
       perf-exec  229534 [001] 104007.009559435:  ffffffffc4fe1057 nft_chain_nat_init+0x47 ([kernel.kallsyms]) 		callq  0xffffffffc0063064
       perf-exec  229534 [001] 104007.009559435:  ffffffffc0063064 bpf_prog_f7765d2581983488_file_monitoring+0x0 (bpf_prog_f7765d2581983488_file_monitoring) 		nop %edi, %edx
      simple_app  229534 [001] 104007.010255712:  ffffffffc4fe1057 nft_chain_nat_init+0x47 ([kernel.kallsyms]) 		callq  0xffffffffc0063064
      simple_app  229534 [001] 104007.010255712:  ffffffffc0063064 bpf_prog_f7765d2581983488_file_monitoring+0x0 (bpf_prog_f7765d2581983488_file_monitoring) 		nop %edi, %edx
      simple_app  229534 [001] 104007.010305774:  ffffffffc4fe1057 nft_chain_nat_init+0x47 ([kernel.kallsyms]) 		callq  0xffffffffc0063064
      simple_app  229534 [001] 104007.010305774:  ffffffffc0063064 bpf_prog_f7765d2581983488_file_monitoring+0x0 (bpf_prog_f7765d2581983488_file_monitoring) 		nop %edi, %edx

So perf is doing it right. We need to see what we're missing.

@abhinav92003
Copy link
Contributor Author

Here's the documented logic used by perf for copying kcore: https://github.com/torvalds/linux/blob/6d0dc8559c847e2dcd66c5dd93dbab3d3d887ff5/tools/perf/util/symbol-elf.c#L2473.

Particularly,

 The kernel map starts at _stext or the lowest function symbol, and ends at _etext or the highest function symbol.

Our kcore_copy looks only at _stext and _etext though:

if (strcmp(name, "_stext") == 0) {

Based on the contents of my /proc/kallsyms, I see that if we also considered the address of the lowest and highest t,w,T,W symbols, the unmapped address reported by libipt would be covered.

@abhinav92003
Copy link
Contributor Author

Following up on our offline discussion: @dolanzhao Can you provide details of the kernel version you were able to reproduce this issue with? Do you have a fix ready that we can review and commit?

@dolanzhao
Copy link
Contributor

Can you provide details of the kernel version you were able to reproduce this issue with?

Yes. The kernel version is 6.2.0-39.

# uname -a
Linux dolan-ubuntu 6.2.0-39-generic #40~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Thu Nov 16 10:53:04 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

Do you have a fix ready that we can review and commit?

I have a draft patch, but it's not yet complete. I attempted to test this solution on Linux kernel version 6.5.
However, I encountered an issue where the std::ifstream (glibc 2.35) causes a crash, same as #5763. Due to this problem, I take some time to test the solution with a different version of the glibc(glibc 2.33 and glibc 2.36).

@abhinav92003
Copy link
Contributor Author

abhinav92003 commented Jan 22, 2024

Based on the contents of my /proc/kallsyms, I see that if we also considered the address of the lowest and highest t,w,T,W symbols, the unmapped address reported by libipt would be covered.

Looking at the perf logic more closely, perf's implementation for copying the kcore includes the missing addr because: when determining the start addr of the module map, it considers not only the addresses in /proc/modules but also the lowest module-related symbol in /proc/kallsyms1 2.

E.g., in this case, the missing IP is 0xffffffffc002bb08.

The kernel map _stext, _etext is 0xffffffffaf800000, 0xffffffffb0600000, which does not cover the addr.
lowest_module_addr, highest_module_addr from /proc/modules is 0xffffffffc0225000, 0xffffffffc5242000 which also doesn't cover the addr.

However, the lowest and highest module-related symbol in /proc/kallsyms (symbols that have the module name in brackets in their /proc/kallsyms entry) is 0xffffffffaf800000 0xffffffffc0225000, 0xffffffffc522b338; this range indeed covers the missing addr.

Our kcore_copy differs from perf also in that: we copy the specific ranges for each module4, instead of everything between the lowest and highest module symbols range3. IIRC this was intended as an optimization to reduce the size of the dumped kcore. Would probably need to fix this.

@dolanzhao Let me know if you have any comments on this. It may be easier for me to patch this issue because I can actually reproduce this error on my workstation. I gather from our last discussion that your previous comment actually refers to a different issue that affects newer hardware (fixed by #6552 by updating the libipt version we use), and that your system does not reproduce this particular issue.

Edit: striked out some incorrect observations. The missing symbol is actually not covered between the lowest and highest module-related function symbols from /proc/kallsyms. That was an incorrect observation on my part. The missing symbol is covered between the lowest and highest function symbols (as also noted in a comment above). But there seem to be more complexities as even perf does not copy this range.

@abhinav92003
Copy link
Contributor Author

As a side note: perf also has this logic to copy the "entry trampolines" that we don't do4. I didn't observe any failures in our tracing because of missing this, so will just note it in a code comment for now.

@abhinav92003
Copy link
Contributor Author

I tried using DR's dumped kcore with perf script by copying it to the perf.data dir of a perf-collected trace, and perf was still able to decode its trace. I also tried using perf's dumped kcore with our libipt decode on a DR trace (by copying perf's kcore to kernel.raw in the DR trace dir), and DR's libipt decode still failed at the same missing IP.

Also, just looking at perf's kcore copy logic, I couldn't find how it dumps the missing IP additionally. As noted in the comment above, the missing IP does lie between the lowest and highest function symbols in /proc/kallsyms; but perf does not dump that region if it finds stext and etext (which are indeed present in my /proc/kallsyms).

Also, perf's kcore does not show the missing IP (0xffffffffc01c5754) in it when I use readelf:

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  LOAD           0x0000000000001000 0xffffffffb4a00000 0x0000000000000000
                 0x0000000000e00000 0x0000000000e00000  RWE    0x1000
  LOAD           0x0000000000e01000 0xffffffffc0386000 0x0000000000000000
                 0x0000000004e8a000 0x0000000004e8a000  RWE    0x1000

Now I'm thinking: even though dumping that missing IP additionally in our kcore_copy helps workaround this issue, the real difference between perf and us may not be in the kcore copy logic.

When I grep for "bpf_prog_fb33d7816e42d685_file_monitoring_open" (the perf script output says this is the symbol at the missing IP), I could find it in the perf.data/data binary, but not in perf.data/kcore_dir. Maybe perf dumps more information in the trace, which helps it decode later.

Speculation: I'm reading that eBPF code may be JIT compiled (also the symbol above has that hex number in it which seems weird), and JIT code probably doesn't have symbols in /proc/kallsyms1.

I don't know how we can reliably identify and dump such JIT regions though.

@abhinav92003
Copy link
Contributor Author

I was able to find the symbol for the missing IP in /proc/kallsyms after relaxing /proc/sys/net/core/bpf_jit_harden. One possible workaround is that we special-case bpf, and copy the /proc/kcore regions that correspond to all "[bpf]"-related symbols in /proc/kallsyms. Still curious what perf does that allows it to get the BPF JIT code symbols/encodings in its trace.

I verified that /proc/modules hasn't changed even after relaxing bpf_jit_harden.

abhinav92003 added a commit that referenced this issue Feb 1, 2024
Fixes missing instruction encodings for some kernel code execution captured
using Intel-PT.

The root-cause seemed to be that JIT code executed by the kernel, eBPF code in
this case, does not have entries in /proc/kallsyms, so our kcore dump logic
did not include them. This fix looks for BPF related symbols in /proc/kallsyms
and includes them in the copied regions from /proc/kcore.

Note that BPF JIT symbols are not included in /proc/kallsyms by default. One
needs to set /proc/sys/net/core/bpf_jit_harden and
/proc/sys/net/core/bpf_jit_kallsyms appropriately (see
https://docs.kernel.org/admin-guide/sysctl/net.html#proc-sys-net-core-network-core-options
for more details). Added this suggestion to documentation.

Tested PT tracing related tests locally on a machine that supports Intel-PT:

$ ctest -R 'drpttracer|drcacheoff.kernel'
...
    Start 213: code_api|client.drpttracer_SUDO-test
[sudo] password for sharmaabhinav:
1/5 Test #213: code_api|client.drpttracer_SUDO-test .....................   Passed    4.29 sec
    Start 412: code_api|tool.drcacheoff.kernel.simple_SUDO
2/5 Test #412: code_api|tool.drcacheoff.kernel.simple_SUDO ..............   Passed    4.66 sec
    Start 413: code_api|tool.drcacheoff.kernel.opcode-mix_SUDO
3/5 Test #413: code_api|tool.drcacheoff.kernel.opcode-mix_SUDO ..........   Passed    4.71 sec
    Start 414: code_api|tool.drcacheoff.kernel.syscall-mix_SUDO
4/5 Test #414: code_api|tool.drcacheoff.kernel.syscall-mix_SUDO .........   Passed    4.59 sec
    Start 415: code_api|tool.drcacheoff.kernel.invariant-checker_SUDO
5/5 Test #415: code_api|tool.drcacheoff.kernel.invariant-checker_SUDO ...   Passed    5.75 sec

100% tests passed, 0 tests failed out of 5

Issue: #6486
@abhinav92003 abhinav92003 changed the title libipt decode sporadically fails due to missing mapped addresses libipt decode fails due to missing mapped addresses Feb 2, 2024
abhinav92003 added a commit that referenced this issue Feb 5, 2024
Fixes drmemtrace kernel trace libipt post-processing failures caused by
missing instruction encodings for some kernel code execution captured
using Intel-PT.

The root-cause seems to be that JIT code executed by the kernel, BPF
code in this case, does not have entries in `/proc/modules`. So, our
kcore dump logic did not include them. This fix looks for BPF related
symbols in `/proc/kallsyms` and includes them in the copied regions from
`/proc/kcore`.

Note that BPF JIT symbols are not included in `/proc/kallsyms` by
default. One needs to set `/proc/sys/net/core/bpf_jit_harden` and
`/proc/sys/net/core/bpf_jit_kallsyms` appropriately (see
https://docs.kernel.org/admin-guide/sysctl/net.html#proc-sys-net-core-network-core-options
for more details). Added this suggestion to documentation. It may be
better to not automatically make this possibly-too-intrusive change to
the user's machine in cmake. This is probably fine because the issue
is not widespread (not reproduced on public Linux distributions).

Tested PT tracing related tests locally on a machine that supports
Intel-PT:

```
$ ctest -R 'drpttracer|drcacheoff.kernel'
...
    Start 213: code_api|client.drpttracer_SUDO-test
[sudo] password for sharmaabhinav: 
1/5 Test #213: code_api|client.drpttracer_SUDO-test .....................   Passed    4.29 sec
    Start 412: code_api|tool.drcacheoff.kernel.simple_SUDO
2/5 Test #412: code_api|tool.drcacheoff.kernel.simple_SUDO ..............   Passed    4.66 sec
    Start 413: code_api|tool.drcacheoff.kernel.opcode-mix_SUDO
3/5 Test #413: code_api|tool.drcacheoff.kernel.opcode-mix_SUDO ..........   Passed    4.71 sec
    Start 414: code_api|tool.drcacheoff.kernel.syscall-mix_SUDO
4/5 Test #414: code_api|tool.drcacheoff.kernel.syscall-mix_SUDO .........   Passed    4.59 sec
    Start 415: code_api|tool.drcacheoff.kernel.invariant-checker_SUDO
5/5 Test #415: code_api|tool.drcacheoff.kernel.invariant-checker_SUDO ...   Passed    5.75 sec

100% tests passed, 0 tests failed out of 5
```

Unfortunately the decode errors do not go away completely even after
this fix, but they have become very less frequent now (tool.kernel.simple
in release build failed after 40 successful runs with this fix, which failed
every run before).

Issue: #6486
xdje42 pushed a commit that referenced this issue Feb 6, 2024
Fixes drmemtrace kernel trace libipt post-processing failures caused by
missing instruction encodings for some kernel code execution captured
using Intel-PT.

The root-cause seems to be that JIT code executed by the kernel, BPF
code in this case, does not have entries in `/proc/modules`. So, our
kcore dump logic did not include them. This fix looks for BPF related
symbols in `/proc/kallsyms` and includes them in the copied regions from
`/proc/kcore`.

Note that BPF JIT symbols are not included in `/proc/kallsyms` by
default. One needs to set `/proc/sys/net/core/bpf_jit_harden` and
`/proc/sys/net/core/bpf_jit_kallsyms` appropriately (see
https://docs.kernel.org/admin-guide/sysctl/net.html#proc-sys-net-core-network-core-options
for more details). Added this suggestion to documentation. It may be
better to not automatically make this possibly-too-intrusive change to
the user's machine in cmake. This is probably fine because the issue
is not widespread (not reproduced on public Linux distributions).

Tested PT tracing related tests locally on a machine that supports
Intel-PT:

```
$ ctest -R 'drpttracer|drcacheoff.kernel'
...
    Start 213: code_api|client.drpttracer_SUDO-test
[sudo] password for sharmaabhinav: 
1/5 Test #213: code_api|client.drpttracer_SUDO-test .....................   Passed    4.29 sec
    Start 412: code_api|tool.drcacheoff.kernel.simple_SUDO
2/5 Test #412: code_api|tool.drcacheoff.kernel.simple_SUDO ..............   Passed    4.66 sec
    Start 413: code_api|tool.drcacheoff.kernel.opcode-mix_SUDO
3/5 Test #413: code_api|tool.drcacheoff.kernel.opcode-mix_SUDO ..........   Passed    4.71 sec
    Start 414: code_api|tool.drcacheoff.kernel.syscall-mix_SUDO
4/5 Test #414: code_api|tool.drcacheoff.kernel.syscall-mix_SUDO .........   Passed    4.59 sec
    Start 415: code_api|tool.drcacheoff.kernel.invariant-checker_SUDO
5/5 Test #415: code_api|tool.drcacheoff.kernel.invariant-checker_SUDO ...   Passed    5.75 sec

100% tests passed, 0 tests failed out of 5
```

Unfortunately the decode errors do not go away completely even after
this fix, but they have become very less frequent now (tool.kernel.simple
in release build failed after 40 successful runs with this fix, which failed
every run before).

Issue: #6486
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants