Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use kprobe for unreliable recvmsg return probe #1095

Merged
merged 2 commits into from
Aug 16, 2024

Conversation

grcevski
Copy link
Contributor

There are number of reasons why a kretprobe may not fire and we've recently had issues on older kernels where we don't see some return probes. This PR adds code to run the same logic we have on the kretprobe for tcp_recvmsg on a kprobe for tcp_cleanup_rbuf. The retprobe is still there in case we encounter custom kernel builds (like we've seen before with WSL2) where the tcp_cleanup_rbuf might be inlined.

Tested with 5.10 and 6.5.

@codecov-commenter
Copy link

codecov-commenter commented Aug 16, 2024

Codecov Report

Attention: Patch coverage is 42.85714% with 4 lines in your changes missing coverage. Please review.

Project coverage is 70.65%. Comparing base (c8f170d) to head (6e8358b).

Files Patch % Lines
pkg/internal/ebpf/httpfltr/bpf_debug_x86_bpfel.go 0.00% 1 Missing ⚠️
...g/internal/ebpf/httpfltr/bpf_tp_debug_x86_bpfel.go 0.00% 1 Missing ⚠️
pkg/internal/ebpf/httpfltr/bpf_tp_x86_bpfel.go 0.00% 1 Missing ⚠️
pkg/internal/ebpf/httpfltr/bpf_x86_bpfel.go 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1095      +/-   ##
==========================================
- Coverage   73.26%   70.65%   -2.61%     
==========================================
  Files         172      172              
  Lines       13066    13073       +7     
==========================================
- Hits         9573     9237     -336     
- Misses       2944     3272     +328     
- Partials      549      564      +15     
Flag Coverage Δ
integration-test ?
k8s-integration-test 52.84% <42.85%> (-0.01%) ⬇️
oats-test 33.50% <42.85%> (+0.05%) ⬆️
unittests 45.62% <0.00%> (-0.03%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@rafaelroquetto rafaelroquetto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!
This looks really nice actually! Fingers crossed that this will solve those weird issues we've been seeing (and even if not, this approach appears to be way more robust, so definitely a win!)

@@ -502,6 +495,32 @@ int BPF_KRETPROBE(kretprobe_tcp_recvmsg, int copied_len) {
return 0;
}

SEC("kprobe/tcp_cleanup_rbuf")
int BPF_KPROBE(kprobe_tcp_cleanup_rbuf, struct sock *sk, int copied) {
u64 id = bpf_get_current_pid_tgid();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a nitpick:

Suggested change
u64 id = bpf_get_current_pid_tgid();
const u64 id = bpf_get_current_pid_tgid();

If I could make a recommendation, I'd suggest we start using const by default on our codebase moving forward, and dropping it only when required. There's a rationale about const-correctness here - it's mostly C++ oriented but it also applies to C (to an extent, in particular to preventing logic errors, improving clarity and optimisations, although I don't know how the latter goes for eBPF). Another interesting example is Rust. It makes everything const by default (and provides the mut keyword to explicitly tag variables that are not const)

@grcevski grcevski merged commit fe36f90 into grafana:main Aug 16, 2024
6 checks passed
@grcevski grcevski deleted the backup_recvmsg branch August 16, 2024 22:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants