Add fallback to no kretprobes #582
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #582      +/-   ##
==========================================
+ Coverage   79.26%   79.64%   +0.37%
==========================================
  Files          70       70
  Lines        5907     5909       +2
==========================================
+ Hits         4682     4706      +24
+ Misses       1001      978      -23
- Partials      224      225       +1
pkg/internal/ebpf/instrumenter.go
// We should make RetprobeMaxActive configurable when we make the number of concurrent requests configurable.
// By default the kernel sets it to max(10, 2 * num cpus), which is low for something like tcp_recvmsg. Max value is 4096.
// https://elixir.bootlin.com/linux/v5.19/source/kernel/kprobes.c#L2202
kp, err := link.Kretprobe(funcName, programs.End, &link.KprobeOptions{RetprobeMaxActive: 1024})
Other eBPF projects set this to either 1024 or the maximum. Let's start with 1024 and see how it works.
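For reference, the arithmetic behind the comment in the diff can be sketched in plain Go. The kernel source linked there computes the default as max(10, 2 * possible CPUs) on preemptible kernels; the helper names below (`kernelDefaultMaxActive`, `clampMaxActive`) are hypothetical and not part of this PR, and `runtime.NumCPU` is only an approximation of the kernel's possible-CPU count.

```go
package main

import (
	"fmt"
	"runtime"
)

// kretprobeMaxActiveLimit is the 4096 ceiling mentioned in the code comment.
const kretprobeMaxActiveLimit = 4096

// kernelDefaultMaxActive approximates the kernel default for a preemptible
// kernel: the larger of 10 and twice the CPU count.
// Hypothetical helper, for illustration only.
func kernelDefaultMaxActive(numCPUs int) int {
	if d := 2 * numCPUs; d > 10 {
		return d
	}
	return 10
}

// clampMaxActive keeps a requested value inside the accepted range.
// Assumption: a value of 0 means "leave the kernel default in place".
func clampMaxActive(requested int) int {
	if requested <= 0 {
		return 0
	}
	if requested > kretprobeMaxActiveLimit {
		return kretprobeMaxActiveLimit
	}
	return requested
}

func main() {
	cpus := runtime.NumCPU()
	fmt.Printf("cpus=%d kernel default=%d chosen=%d\n",
		cpus, kernelDefaultMaxActive(cpus), clampMaxActive(1024))
}
```

On an 8-CPU machine the default under these assumptions would be 16 concurrent return-probe instances, which shows why 1024 is a large step up for a busy function such as tcp_recvmsg.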
Thanks Mario!
We used to track kprobe events with a socket filter, which was difficult to work with for the cases we cared about: it was not easy to find matching PIDs or to read more than 250 bytes. We replaced the socket filter with kprobes/kretprobes on tcp_sendmsg and tcp_recvmsg.
However, as discussed in the bug report #573, kretprobes can be dropped when the probed function takes too long to return, which makes the kretprobe on tcp_recvmsg a prime candidate in high network workload scenarios.
To fix this issue, this PR brings back the socket filter, used only to create fallback information for HTTP requests. It doesn't do the full work it did before; it captures only the essential information needed when the kretprobe on tcp_recvmsg has been dropped.
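The fallback idea can be sketched in userspace Go as follows. This is a minimal illustration rather than the code in this PR: the types `connKey` and `requestInfo`, the function `resolveRequest`, and the choice of keying by port pair are all assumptions made for the example.

```go
package main

import "fmt"

// connKey identifies a connection. Hypothetical: the real key may differ.
type connKey struct {
	SrcPort, DstPort uint16
}

// requestInfo holds what we know about an HTTP request.
// PID is 0 when the source could not determine it.
type requestInfo struct {
	Method string
	Path   string
	PID    uint32
	Source string // "kretprobe" or "socket_filter"
}

// resolveRequest prefers the full record produced by the tcp_recvmsg
// kretprobe and falls back to the reduced socket filter record when the
// kretprobe did not fire for this connection.
func resolveRequest(fromKretprobe, fromSocketFilter map[connKey]requestInfo, key connKey) (requestInfo, bool) {
	if info, ok := fromKretprobe[key]; ok {
		return info, true
	}
	if info, ok := fromSocketFilter[key]; ok {
		return info, true
	}
	return requestInfo{}, false
}

func main() {
	key := connKey{SrcPort: 40000, DstPort: 80}
	// Simulate a dropped kretprobe: only the socket filter saw the request.
	kret := map[connKey]requestInfo{}
	filter := map[connKey]requestInfo{
		key: {Method: "GET", Path: "/index.html", Source: "socket_filter"},
	}
	info, ok := resolveRequest(kret, filter, key)
	fmt.Println(ok, info.Method, info.Path, info.Source)
}
```

Preferring the kretprobe record keeps the richer data (such as the PID) whenever it exists, so the socket filter only fills gaps.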
I should also mention that we have other kretprobes, but they are on sock_alloc and accept4. Those functions are typically very fast, so their probes are unlikely to be dropped, unlike probes on the receiving of network buffers.
To test this scenario, I manually removed the code from the kretprobe on tcp_recvmsg and tested with Apache2 to confirm that we still see the routes.