Add arguments for sendmmsg and recvmmsg #2027
base: master
/cc @Andreagit97 @FedeDP
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: Molter73.
Please double-check the driver/SCHEMA_VERSION file. See versioning. /hold
Thanks Mauro!
Hey, thank you! Before looking at this PR I was thinking of a tail call approach: you start with a first BPF program that sends the first message, and then you keep tail calling until there are no more messages to send (the limit is 32 if I recall correctly). This approach should work well in the modern eBPF probe (have you already considered it?), but I see the issue it could cause with the old drivers: we have no easy way to create multiple headers and send multiple events inside the same filler, since the whole architecture is built on the assumption of sending just one event per filler. I understand that creating all these helpers just for two syscalls could be overkill. Sending just the first message/tuple could be an option for the old drivers, even if I don't like it so much. At least in the kernel module, I would like to send all the messages/tuples in some way. I will try to think of something different, at least for the kernel module, but I'm not sure about the outcome...
No worries, I have a bunch of CI failures to go through before this can go in anyway, and I'm also not super happy about the implementation, so we can discuss as long as it takes.
That was my first thought too; I discarded it because:
Yeah, I'm not super happy about this implementation either, but as you mention, it would take a pretty big effort to be able to send more than one message. I'm willing to make the effort, I just need some pointers on where to start. Also, I tried adding kernel tests that grab more than one event and it looked like the call to
c9cd8b1 to 0616dfb
It looks like the scap tests are failing because a recvmmsg event in the file has no arguments and can't be parsed correctly. I'm not sure how to fix that: do I need to recreate the scap file altogether?
Yep, in the end there is probably no reason to choose the tail call approach; the loop seems to do its job 🤔
Probably it would be enough to check the offset inside the
I'm not sure it is worth reworking the whole architecture for just these 2 new events... If we find a cool way to handle it with add-only code, OK, but otherwise I have some doubts...
Uhm, the framework should be able to retrieve more than one event, see an example here.
To be honest, the best thing to do in the framework would be to scrape all the events in the buffers, save them in a C++ vector, and then search the vector for the events we need. This would give us great debugging capabilities: for example, we could print all the events seen from a specific thread and easily understand why our event is not there (maybe just a wrong event type).
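A minimal C sketch of the "collect everything, then search" idea, assuming a hypothetical `captured_evt` record that keeps only a thread id and an event type (the real framework would store full events, and the suggestion above is a C++ `std::vector`; a fixed-size array stands in for it here). `store_find` looks up the event a test expects, and `store_dump_thread` prints everything seen for one thread, which is the debugging aid described above. All names are illustrative, not the libs test framework API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical minimal event record: just enough to search by thread and type. */
struct captured_evt {
	int64_t tid;
	uint16_t type;
};

#define MAX_CAPTURED 64

/* Fixed-size stand-in for the std::vector suggested in the comment. */
struct evt_store {
	struct captured_evt evts[MAX_CAPTURED];
	size_t len;
};

/* Save one event scraped from the buffers; returns 0 on success, -1 when full. */
static int store_push(struct evt_store *s, int64_t tid, uint16_t type) {
	assert(s != NULL);
	if(s->len >= MAX_CAPTURED) {
		return -1;
	}
	s->evts[s->len].tid = tid;
	s->evts[s->len].type = type;
	s->len++;
	return 0;
}

/* Return the first saved event matching tid and type, or NULL if it was never seen. */
static const struct captured_evt *store_find(const struct evt_store *s, int64_t tid, uint16_t type) {
	for(size_t i = 0; i < s->len; i++) {
		if(s->evts[i].tid == tid && s->evts[i].type == type) {
			return &s->evts[i];
		}
	}
	return NULL;
}

/* Debug aid: print every saved event for one thread; returns how many were printed. */
static size_t store_dump_thread(const struct evt_store *s, int64_t tid) {
	size_t printed = 0;
	for(size_t i = 0; i < s->len; i++) {
		if(s->evts[i].tid == tid) {
			printf("tid=%lld type=%u\n", (long long)s->evts[i].tid, (unsigned)s->evts[i].type);
			printed++;
		}
	}
	return printed;
}
```

When a lookup fails, dumping the thread's events shows at a glance whether the expected event arrived with a different type.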
Usually, there are no issues when we only add parameters to an event in the event table. This is a particular case because the previous number of parameters was 0... We should try to understand in which method we are hitting this exception.
Alright, I'll refrain from doing anything else in this regard until we have some time to think about it.
That's weird, I'll give it a few more tries when I get a minute.
I'll try to get a stack trace; I can't remember off the top of my head the exact point where it failed.
Maybe I have an idea for the kernel module. In

```c
//----------------------------------------------------------------------------- New code
if(event_pair->exit_event_type == PPME_SOCKET_SENDMMSG_X ||
   event_pair->exit_event_type == PPME_SOCKET_RECVMMSG_X)
{
	for(...)
	{
		// we need to add a custom logic inside `record_event_all_consumers` for these syscalls to understand which tuple
		// and message we need to send.
		record_event_all_consumers(event_pair->exit_event_type, event_pair->flags, &event_data, KMOD_PROG_SYS_EXIT);
	}
	return;
}
//-----------------------------------------------------------------------------
if (event_pair->flags & UF_USED)
	record_event_all_consumers(event_pair->exit_event_type, event_pair->flags, &event_data, KMOD_PROG_SYS_EXIT);
else
	record_event_all_consumers(PPME_GENERIC_X, UF_ALWAYS_DROP, &event_data, KMOD_PROG_SYS_EXIT);
```

we could call the

As for the legacy probe, I had no great ideas; it's probably OK to send just the first message and not the others.
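The per-message fan-out proposed for the kernel module could look roughly like the userspace sketch below. It is an illustration under assumptions, not driver code: `record_fn` is a hypothetical stand-in for `record_event_all_consumers`, `msgvec` is read directly (the kmod would first have to copy each `struct mmsghdr` from user memory), and error returns are simply not expanded into per-message events. It does use the real `struct mmsghdr` from `<sys/socket.h>`, whose `msg_len` field the kernel fills with the bytes transferred for each entry.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>

/* Hypothetical stand-in for record_event_all_consumers(): called once per event. */
typedef void (*record_fn)(unsigned int index, const struct mmsghdr *mmh, void *user);

/*
 * Fan one sendmmsg/recvmmsg exit out into one event per processed message.
 * ret is the syscall return value (messages sent/received), vlen the msgvec size.
 * Returns how many events were recorded; error returns are not expanded here.
 */
static unsigned int record_per_message(long ret, const struct mmsghdr *msgvec, unsigned int vlen,
                                       record_fn record, void *user) {
	assert(record != NULL);
	if(ret <= 0 || msgvec == NULL) {
		return 0;
	}
	unsigned int n = (unsigned long)ret < vlen ? (unsigned int)ret : vlen;
	for(unsigned int i = 0; i < n; i++) {
		record(i, &msgvec[i], user); /* the i-th event carries the i-th message/tuple */
	}
	return n;
}

/* Example consumer: sums the byte counts the kernel wrote into msg_len. */
static void sum_msg_len(unsigned int index, const struct mmsghdr *mmh, void *user) {
	(void)index;
	*(unsigned int *)user += mmh->msg_len;
}
```

With `ret == 2` and a three-entry `msgvec`, only the first two entries produce events, because the syscall return value is the number of messages actually processed.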
Neat idea, Andre!
Alright then, I'll try to get the kmod implementation done when I get a minute. I also still need to go through the last errors in CI.
So here is the stack trace for the scap file unit test that is failing:
The interesting part is in these 2 lines, though:
It looks like we are trying to get the file descriptor here and, because it is not set in the scap file, an out-of-range error is thrown here. The best I can think of is catching the exception for recvmmsg and sendmmsg in parsers.cpp and re-throwing it for other syscalls, or checking before accessing it for those syscalls, which kinda defeats the purpose of
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

```diff
@@           Coverage Diff            @@
##           master    #2027    +/-   ##
=======================================
  Coverage   73.69%   73.70%
=======================================
  Files         253      253
  Lines       31915    31929     +14
  Branches     5638     5632      -6
=======================================
+ Hits        23521    23533     +12
+ Misses       8394     8376     -18
- Partials        0       20     +20
```

Flags with carried forward coverage won't be shown.
2f69cd7 to f96a41b
Hey @FedeDP! I haven't had time to do the proposed implementation to support multiple events in the kernel module; I might come back to it later. This PR has become way bigger than I intended, and I would appreciate a review (even a quick one) before I keep adding to it, hope that's OK. I also expect CI not to be super happy, so I'll look into any issues I stumble upon. I would also appreciate help understanding whether these changes require a major or minor version bump to the schema and/or API driver versions.
f96a41b to 80aedd7
The PR LGTM, considering that we are missing the kmod multi-event implementation, of course :)
But yes, I think this is in good shape.
driver/event_table.c

```
@@ -326,8 +334,20 @@ const struct ppm_event_info g_event_info[] = {
                        {"data", PT_BYTEBUF, PF_NA},
                        {"tuple", PT_SOCKTUPLE, PF_NA},
                        {"msgcontrol", PT_BYTEBUF, PF_NA}}},
[PPME_SOCKET_RECVMMSG_E] = {"recvmmsg", EC_IO_READ | EC_SYSCALL, EF_NONE, 0},
[PPME_SOCKET_RECVMMSG_X] = {"recvmmsg", EC_IO_READ | EC_SYSCALL, EF_NONE, 0},
[PPME_SOCKET_RECVMMSG_E] = {"recvmmsg",
```
Is there a reason why we send `fd` in the `recvmmsg_e` event? Can't we do as in `sendmmsg_e` and send 0 parameters?
Only because I implemented both using separate enter and exit events, but I only had an issue with sendmmsg, since it was sending the socket tuple on the enter side. So I moved everything to the exit side for that one and didn't bother with recvmmsg. I'll move it to the exit side too so it's consistent.
```c
int BPF_PROG(recvmmsg_x, struct pt_regs *regs, long ret) {
struct ringbuf_struct ringbuf;
if(!ringbuf__reserve_space(&ringbuf, ctx, RECVMMSG_X_SIZE, PPME_SOCKET_RECVMMSG_X)) {
typedef struct recvmmsg_data_s {
```
```diff
- typedef struct recvmmsg_data_s {
+ typedef struct {
```
Less confusing IMHO :)
userspace/libsinsp/parsers.cpp

```
@@ -666,8 +669,11 @@ bool sinsp_parser::reset(sinsp_evt *evt) {
//
int fd_location = get_fd_location(etype);
ASSERT(evt->get_param_info(fd_location)->type == PT_FD);
evt->get_tinfo()->m_lastevent_fd = evt->get_param(fd_location)->as<int64_t>();
evt->set_fd_info(evt->get_tinfo()->get_fd(evt->get_tinfo()->m_lastevent_fd));
if((etype != PPME_SOCKET_SENDMMSG_E && etype != PPME_SOCKET_RECVMMSG_E) ||
```
I think that if we remove the `fd` param from `PPME_SOCKET_RECVMMSG_E` (and remove the `EF_USES_FD` flag too, of course) we can skip this check.
Oh, and indeed the CI is super happy :) BTW, I just started kernel tests against this PR: https://github.com/falcosecurity/libs/actions/runs/11380128079
ARM64 and AMD64 results: (not preserved)
New failure spotted for modern_ebpf on amazonlinux 5.10 and fedora 5.8 :/
Err: (not preserved)
Signed-off-by: Mauro Ezequiel Moltrasio <[email protected]>
Signed-off-by: Mauro Ezequiel Moltrasio <[email protected]>
Due to limitations with the verifier, it won't be possible to iterate over all messages, so the implementation is best effort and only the first message is actually processed. Signed-off-by: Mauro Ezequiel Moltrasio <[email protected]>
The current implementation is incomplete: only the first message is processed. To allow multiple messages to be processed, the kmod needs to support adding multiple headers to the ring buffer from the filler. Signed-off-by: Mauro Ezequiel Moltrasio <[email protected]>
These fields were added in newer kernels and can be used to check for the availability of some newer helpers. Signed-off-by: Mauro Ezequiel Moltrasio <[email protected]>
Signed-off-by: Mauro Ezequiel Moltrasio <[email protected]>
56a540f to 60f9fd6
I really like the implementation, thank you!
```c
 * @param auxmap pointer to the auxmap in which we have already written the entire event.
 * @param ctx BPF prog context
 */
static __always_inline void auxmap__submit_event(struct auxiliary_map *auxmap, void *ctx) {
```
This patch should no longer be necessary thanks to #2150: we no longer call the `drop` filler here.
I'll rebase and remove all this once that PR is merged.
```c
auxmap__store_s64_param(auxmap, ret);

/* Parameter 2: fd (type: PT_FD) */
auxmap__store_empty_param(auxmap);
```
Here we can probably send the first parameter of the syscall, `sockfd`:

```c
int sendmmsg(int sockfd, struct mmsghdr *msgvec, unsigned int vlen,
             int flags);
```

So something like:

```c
int32_t socket_fd = (int32_t)args[0];
auxmap__store_s64_param(auxmap, (int64_t)socket_fd);
```
```c
SEC("tp_btf/sys_exit")
int BPF_PROG(sendmmsg_x, struct pt_regs *regs, long ret) {
	if(ret < 0) {
```
Not sure if `ret == 0` is possible, but I would add it:

```diff
- if(ret < 0) {
+ if(ret <= 0) {
```
```c
	uint32_t nr_loops = ret < 1024 ? ret : 1024;
	bpf_loop(nr_loops, handle_exit, &data, 0);
	return 0;
}
```
I would use an if/else here, if the verifier likes it
I prefer if conditions that return early to not have else branches, I think it's more readable, but W/E, I'll change it.
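A small helper capturing both review suggestions on this snippet (treat `ret == 0` like an error, and use an if/else chain instead of an early return) might look like this. `mmsg_nr_loops` and `MAX_MMSG_LOOPS` are hypothetical names; the 1024 cap is taken from the `nr_loops` clamp in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Cap on per-syscall iterations, taken from the nr_loops clamp in the patch. */
#define MAX_MMSG_LOOPS 1024

/*
 * Hypothetical helper: how many msgvec entries to walk for a sendmmsg/recvmmsg
 * exit. ret <= 0 (error, or nothing transferred) means there is nothing to do.
 */
static uint32_t mmsg_nr_loops(long ret) {
	uint32_t nr_loops;
	if(ret <= 0) {
		nr_loops = 0;
	} else if(ret < MAX_MMSG_LOOPS) {
		nr_loops = (uint32_t)ret;
	} else {
		nr_loops = MAX_MMSG_LOOPS;
	}
	assert(nr_loops <= MAX_MMSG_LOOPS);
	return nr_loops;
}
```

Keeping the computation in one function leaves a single `bpf_loop` call site, whichever style the verifier ends up preferring.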
```c
static long handle_exit(uint32_t index, void *ctx) {
	sendmmsg_exit_t *data = (sendmmsg_exit_t *)ctx;
	struct mmsghdr mmh;
```
```diff
- struct mmsghdr mmh;
+ struct mmsghdr mmh = {};
```
```
@@ -3901,7 +3929,9 @@ void sinsp_parser::parse_rw_exit(sinsp_evt *evt) {
//
// If the fd still doesn't contain tuple info (because the socket is a datagram one
// or because some event was lost), add it here.
//
if(!retrieve_enter_event(enter_evt, evt)) {
if(etype == PPME_SOCKET_SENDMMSG_X) {
enter_evt = evt;
```
I would add a comment saying that the tuple is in the exit event, so we use this workaround.
It would be interesting to also test the case of multiple messages received. BTW, thank you for all the tests; I know it is quite noisy to write tests for networking syscalls... I recently introduced new functions like `client_to_server_ipv4_tcp` and `client_to_server_ipv4_udp`, maybe they could be useful.
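For the multiple-messages case, here is a plain userspace check of `recvmmsg` semantics. It does not use the libs test framework or the `client_to_server_*` helpers, and it assumes an `AF_UNIX` datagram `socketpair` is an acceptable stand-in for a UDP client/server pair: two datagrams are queued, then a single `recvmmsg` call with `vlen = 2` returns both, with each `msg_len` set to the size of its datagram. `MSG_DONTWAIT` keeps the call from blocking if fewer messages are queued. `recv_two_with_recvmmsg` is a hypothetical name.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <time.h>
#include <unistd.h>

/*
 * Queue two datagrams on a socketpair, then drain both with one recvmmsg()
 * call. Returns the number of messages received (or -1 on setup failure)
 * and stores each message length in lens[].
 */
static int recv_two_with_recvmmsg(unsigned int lens[2]) {
	assert(lens != NULL);
	int sv[2];
	if(socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) != 0) {
		return -1;
	}
	if(send(sv[0], "hello", 5, 0) != 5 || send(sv[0], "hi", 2, 0) != 2) {
		close(sv[0]);
		close(sv[1]);
		return -1;
	}

	char bufs[2][16];
	struct iovec iov[2];
	struct mmsghdr msgs[2];
	memset(msgs, 0, sizeof(msgs));
	for(int i = 0; i < 2; i++) {
		iov[i].iov_base = bufs[i];
		iov[i].iov_len = sizeof(bufs[i]);
		msgs[i].msg_hdr.msg_iov = &iov[i];
		msgs[i].msg_hdr.msg_iovlen = 1;
	}

	/* MSG_DONTWAIT: return what is already queued instead of blocking. */
	int n = recvmmsg(sv[1], msgs, 2, MSG_DONTWAIT, NULL);
	for(int i = 0; i < n; i++) {
		lens[i] = msgs[i].msg_len; /* kernel-filled byte count per message */
	}
	close(sv[0]);
	close(sv[1]);
	return n;
}
```

A driver test built on the same pattern would expect one exit event per received message (or just the first one, for the best-effort drivers).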
```c
CHECK_RES(res);

/* Parameter 2: fd (type: PT_FD) */
res = bpf_push_empty_param(data);
```
Like in the other drivers, we probably want to push the `fd` of the syscall here.
```c
val = bpf_syscall_get_argument(data, 1);
if(bpf_probe_read_user(&mmh, sizeof(mmh), (void *)val))
	return PPM_FAILURE_INVALID_USER_MEMORY;
```
Can we use the usual comments, like `/* Parameter 1: res (type: PT_ERRNO) */`, just to make clear what we are pushing?
```c
vlen = bpf_syscall_get_argument(data, 2);

/* Retrieve the message header */
if(bpf_probe_read_user(&mmh, sizeof(mmh), (void *)(mmh_ptr))) {
```
So if `mmh_ptr` is invalid, we rely on the fact that `mmh` is zeroed and we won't enter any of the following branches, is that true?
I missed this question the first time I went through the review, sorry.
`mmh_ptr` is read from the arguments; as such, I assume it will always point to a valid position in memory, because the syscall was successful and we haven't returned control to userspace yet, so the memory must still be there. If, for whatever reason, the pointer was invalid or the memory was no longer readable without faulting (e.g. the page was swapped out), I would expect `bpf_probe_read_user` to fail and the event to be dropped altogether (the if condition will be true).
Am I missing something here?
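The two lines of defence discussed in this thread can be illustrated in userspace, assuming a hypothetical `fake_read_user` that, like `bpf_probe_read_user`, returns non-zero on failure (here: on a `NULL` source). The early return on a failed read is what drops the event; the zero initializer (the review suggests `= {}`, which clang accepts as an extension and C23 standardizes) additionally guarantees that `msg_name` is `NULL` if the struct were ever used without a successful read. `header_has_dest` is also a hypothetical name.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical stand-in for bpf_probe_read_user(): non-zero on failure. */
static long fake_read_user(void *dst, size_t size, const void *src) {
	assert(dst != NULL);
	if(src == NULL) {
		return -EFAULT; /* what a faulting read would report */
	}
	memcpy(dst, src, size);
	return 0;
}

/*
 * Returns -1 if the header can't be read (the event would be dropped),
 * 1 if it carries a destination address, 0 otherwise.
 */
static int header_has_dest(const struct mmsghdr *user_ptr) {
	struct mmsghdr mmh = {0}; /* zeroed: msg_name stays NULL unless a read succeeds */
	if(fake_read_user(&mmh, sizeof(mmh), user_ptr)) {
		return -1;
	}
	return mmh.msg_hdr.msg_name != NULL && mmh.msg_hdr.msg_namelen > 0;
}
```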
Regarding the verifier failures, using a vanilla 5.8 kernel I can see the following error on this branch 🤔
I used a default Ubuntu debug kernel, not the Fedora one from our CI, but the error is probably the same if you use virtme-ng.
I've seen this error on some older kernels before, but I couldn't figure out how to fix it. I think it's related to the fact that the callbacks used in
Uhm, if it is related to the legacy loop, maybe we could replace it with a chain of tail calls plus a per-CPU map to save the state. The maximum number of tail calls is 32 but, in the end, this is the same limit we have with the legacy loop, so we should be fine.
It's not the legacy loop per se, it's the fact that I could maybe abuse macros, but all the code from
Uhm, got it. What about something like this?

```c
static __always_inline long handle_exit__inline(uint32_t index, void *ctx) {
	/* current body of handle_exit */
}

static long handle_exit(uint32_t index, void *ctx) {
	return handle_exit__inline(index, ctx);
}
```

From the legacy loop we can call the inlined one directly.
I'll try it.
Tried this and now I'm getting this error on a 6.11 kernel:
Uhm, so we moved the issue from 5.8 to 6.11... I need to check, but this:

```c
if(bpf_core_enum_value_exists(enum bpf_func_id, BPF_FUNC_loop)) {
	uint32_t nr_loops = ret < 1024 ? ret : 1024;
	bpf_loop(nr_loops, handle_exit, &data, 0);
	return 0;
} else {
	// Tail calls
}
```

In this way if
This is possible; the only catch is that the allowed stack size is reduced to 256 bytes, I think. But it will still not work on 5.8, because the ld_imm64 instruction used to load the callback is not implemented there, which prevents the call from happening. I can try to test it, but implementing this same logic with tail calls sounds like a nightmare to me.
What type of PR is this?
/kind feature
Any specific area of the project related to this PR?
/area driver-kmod
/area driver-bpf
/area driver-modern-bpf
/area tests
Does this PR require a change in the driver versions?
What this PR does / why we need it:
Add argument processing for the sendmmsg and recvmmsg syscalls.
These are quite tricky: they behave the same way sendmsg and recvmsg do, but allow multiple messages to be sent/received in a single syscall. This breaks some invariants on how Falco processes events; for instance, a sendmmsg call on a connectionless UDP socket could send messages to multiple destinations, which would require multiple socket tuples to represent in userspace. To work around this, I proposed issuing multiple events from the kernel. This has led me to do 2 implementations:
`bpf_loop` if available, or does a best effort with a regular loop otherwise.
The implementation is far from perfect and I'm not super happy with it, but it's the best I've managed to come up with so far. Any suggestions for improvement are welcome.
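To illustrate why one `sendmmsg` call can need several socket tuples, the sketch below counts distinct IPv4 destinations in a `msgvec` built for an unconnected UDP socket. It only inspects `msg_name` and sends nothing; `count_distinct_dests` and `entry_dest` are hypothetical helpers, and entries without an IPv4 `msg_name` are ignored for simplicity.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>

/* Same destination = same family, port and IPv4 address. */
static int same_dest(const struct sockaddr_in *a, const struct sockaddr_in *b) {
	return a->sin_family == b->sin_family && a->sin_port == b->sin_port &&
	       a->sin_addr.s_addr == b->sin_addr.s_addr;
}

/* Returns msg_name as an IPv4 address, or NULL if the entry has none. */
static const struct sockaddr_in *entry_dest(const struct mmsghdr *m) {
	const struct sockaddr_in *sin = m->msg_hdr.msg_name;
	if(sin == NULL || m->msg_hdr.msg_namelen < sizeof(*sin) || sin->sin_family != AF_INET) {
		return NULL;
	}
	return sin;
}

/*
 * Hypothetical helper: number of distinct IPv4 destinations in msgvec, i.e. how
 * many different socket tuples one sendmmsg() on an unconnected UDP socket spans.
 */
static unsigned int count_distinct_dests(const struct mmsghdr *vec, unsigned int vlen) {
	assert(vec != NULL || vlen == 0);
	unsigned int distinct = 0;
	for(unsigned int i = 0; i < vlen; i++) {
		const struct sockaddr_in *cur = entry_dest(&vec[i]);
		if(cur == NULL) {
			continue;
		}
		int seen = 0;
		for(unsigned int j = 0; j < i && !seen; j++) {
			const struct sockaddr_in *prev = entry_dest(&vec[j]);
			seen = prev != NULL && same_dest(prev, cur);
		}
		distinct += seen ? 0 : 1;
	}
	return distinct;
}
```

For example, three messages addressed to two different ports on 127.0.0.1 would be sent by a single syscall but map to two distinct tuples.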
Which issue(s) this PR fixes:
Fixes #1636
Special notes for your reviewer:
Does this PR introduce a user-facing change?: