i#6486 kernel tracing: Include BPF JIT code in kcore dump #6619

Merged
merged 6 commits into from
Feb 5, 2024


@abhinav92003 (Contributor) commented Feb 1, 2024

Fixes drmemtrace kernel trace libipt post-processing failures caused by missing
instruction encodings for some kernel code execution captured using Intel-PT.

The root cause appears to be that JIT code executed by the kernel (BPF code in
this case) has no entries in /proc/modules, so our kcore dump logic did not
include it. This fix looks for BPF-related symbols in /proc/kallsyms and
includes the corresponding regions in the copy from /proc/kcore.
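
For illustration only, here is a minimal C++ sketch of the idea; it is not the code in kcore_copy.cpp. It assumes the usual /proc/kallsyms line format (`<hex address> <type> <name>`, plus an optional `[module]` column) and that JITed BPF programs are listed under the `[bpf]` pseudo-module. All names (`kallsyms_entry_t`, `parse_kallsyms_line`, `is_bpf_jit_symbol`, `addresses_to_regions`) are hypothetical. Because /proc/kallsyms reports only start addresses, the sketch also assumes each symbol's code lies within the page containing its start, and merges adjacent pages into regions that could then be copied from /proc/kcore.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <set>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// One parsed /proc/kallsyms entry (hypothetical helper type).
struct kallsyms_entry_t {
    uint64_t addr = 0;
    char type = '?';
    std::string name;
    std::string module; // Empty for symbols in the core kernel image.
};

// Parses "<hex addr> <type> <name>[\t[<module>]]". Returns false if malformed.
bool
parse_kallsyms_line(const std::string &line, kallsyms_entry_t &entry)
{
    std::istringstream iss(line);
    std::string addr_str;
    if (!(iss >> addr_str >> entry.type >> entry.name))
        return false;
    char *end = nullptr;
    entry.addr = std::strtoull(addr_str.c_str(), &end, 16);
    if (end == addr_str.c_str() || *end != '\0')
        return false; // Address column is not a pure hex number.
    entry.module.clear();
    iss >> entry.module; // The module column is optional.
    return true;
}

// Assumption: JITed BPF code is tagged with the "[bpf]" pseudo-module.
bool
is_bpf_jit_symbol(const kallsyms_entry_t &entry)
{
    return entry.module == "[bpf]";
}

// Converts sorted symbol start addresses into merged, page-aligned
// [start, end) regions. Assumption: each symbol's code fits in the page
// holding its start address. A page ending exactly at 2^64 is not handled.
std::vector<std::pair<uint64_t, uint64_t>>
addresses_to_regions(const std::set<uint64_t> &addrs, uint64_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    std::vector<std::pair<uint64_t, uint64_t>> regions;
    for (uint64_t addr : addrs) {
        uint64_t start = addr & ~(page_size - 1);
        uint64_t end = start + page_size;
        // std::set iterates in ascending order, so only the last region can
        // overlap or touch the current page.
        if (!regions.empty() && start <= regions.back().second)
            regions.back().second = end;
        else
            regions.emplace_back(start, end);
    }
    return regions;
}
```

With 4 KiB pages, addresses 0x1010, 0x1ff0, 0x2004 and 0x5000 collapse into the two regions [0x1000, 0x3000) and [0x5000, 0x6000). The single-page assumption may under-cover larger BPF programs, which is one possible reason some decode errors could remain.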

Note that BPF JIT symbols are not included in /proc/kallsyms by default. To
export them, set /proc/sys/net/core/bpf_jit_harden to 0 and
/proc/sys/net/core/bpf_jit_kallsyms to 1 (see
https://docs.kernel.org/admin-guide/sysctl/net.html#proc-sys-net-core-network-core-options
for more details). I added this suggestion to the documentation rather than
having cmake make this possibly intrusive change to the user's machine
automatically. That seems acceptable because the issue is not widespread (it
has not been reproduced on public Linux distributions).
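
A small C++ sketch of a pre-flight check a tracer could run, assuming the rule stated in the kernel sysctl documentation: JITed images are exported to /proc/kallsyms only when `bpf_jit_kallsyms` is 1 and `bpf_jit_harden` is 0 (hardening disables the export). The helper names are hypothetical; this PR only documents the settings and does not add such a check.

```cpp
#include <fstream>
#include <optional>
#include <string>

// Reads one integer from a sysctl file such as
// /proc/sys/net/core/bpf_jit_kallsyms. Returns std::nullopt if the file is
// missing or does not start with an integer.
std::optional<int>
read_sysctl_int(const std::string &path)
{
    std::ifstream in(path);
    int value = 0;
    if (!(in >> value))
        return std::nullopt;
    return value;
}

// JITed BPF images appear in /proc/kallsyms only if bpf_jit_kallsyms is 1
// and JIT hardening is off.
bool
bpf_jit_symbols_exported(int jit_harden, int jit_kallsyms)
{
    return jit_harden == 0 && jit_kallsyms == 1;
}

// Hypothetical check to run before dumping kcore. Returns false if either
// setting cannot be read or the symbols would not be exported.
bool
bpf_jit_kallsyms_configured()
{
    std::optional<int> harden =
        read_sysctl_int("/proc/sys/net/core/bpf_jit_harden");
    std::optional<int> kallsyms =
        read_sysctl_int("/proc/sys/net/core/bpf_jit_kallsyms");
    if (!harden.has_value() || !kallsyms.has_value())
        return false;
    return bpf_jit_symbols_exported(*harden, *kallsyms);
}
```

If the check fails, the user can change the settings as root (for example `sysctl -w net.core.bpf_jit_harden=0` and `sysctl -w net.core.bpf_jit_kallsyms=1`). A warning like this would leave the decision with the user, consistent with not changing the machine automatically in cmake.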

Ran the PT-tracing-related tests locally on a machine that supports Intel-PT:

```
$ ctest -R 'drpttracer|drcacheoff.kernel'
...
    Start 213: code_api|client.drpttracer_SUDO-test
[sudo] password for sharmaabhinav:
1/5 Test #213: code_api|client.drpttracer_SUDO-test .....................   Passed    4.29 sec
    Start 412: code_api|tool.drcacheoff.kernel.simple_SUDO
2/5 Test #412: code_api|tool.drcacheoff.kernel.simple_SUDO ..............   Passed    4.66 sec
    Start 413: code_api|tool.drcacheoff.kernel.opcode-mix_SUDO
3/5 Test #413: code_api|tool.drcacheoff.kernel.opcode-mix_SUDO ..........   Passed    4.71 sec
    Start 414: code_api|tool.drcacheoff.kernel.syscall-mix_SUDO
4/5 Test #414: code_api|tool.drcacheoff.kernel.syscall-mix_SUDO .........   Passed    4.59 sec
    Start 415: code_api|tool.drcacheoff.kernel.invariant-checker_SUDO
5/5 Test #415: code_api|tool.drcacheoff.kernel.invariant-checker_SUDO ...   Passed    5.75 sec

100% tests passed, 0 tests failed out of 5
```

Unfortunately the decode errors do not go away completely even after this fix,
but they have become much less frequent: with this fix, tool.kernel.simple in a
release build failed only after 40 successful runs, whereas before the fix it
failed on every run.

Issue: #6486


@derekbruening (Contributor) left a comment:

I don't understand the address set to module region code.

Review threads on clients/drcachesim/tracer/kcore_copy.cpp: 5, all resolved (4 on outdated code).
@abhinav92003 (Contributor, Author) commented:

On doing more stress testing: unfortunately the decode errors do not go away
completely even after this fix, but they have become much less frequent
(tool.kernel.simple in a release build failed only after 40 successful runs).
As before, I could not find the address reported in the error message in
/proc/kallsyms.

@abhinav92003 merged commit 5cb0c61 into master on Feb 5, 2024 (15 checks passed).
@abhinav92003 deleted the i6486-dump-bpf-kcore-regions branch on February 5, 2024 at 15:17.
xdje42 pushed a commit that referenced this pull request Feb 6, 2024