Add arguments for sendmmsg and recvmmsg #2027

Molter73 · 2024-08-27T16:22:57Z

What type of PR is this?

/kind feature

Any specific area of the project related to this PR?

/area API-version

/area build

/area CI

/area driver-kmod

/area driver-bpf

/area driver-modern-bpf

/area libscap

/area libpman

/area libsinsp

/area tests

Does this PR require a change in the driver versions?

/version driver-API-version-major

/version driver-API-version-minor

/version driver-API-version-patch

/version driver-SCHEMA-version-major

/version driver-SCHEMA-version-minor

/version driver-SCHEMA-version-patch

What this PR does / why we need it:

Add argument processing for the sendmmsg and recvmmsg syscalls.

These are quite tricky, they behave the same way sendmsg and recvmsg do, but allowing for multiple messages to be sent/received in a single syscall. This breaks some invariants on how Falco processes events, for instance, a sendmmsg call could send messages to multiple destinations in connectionless UDP, which we would need multiple socket tuples to represent in userspace. To work around this I proposed issuing multiple events from the kernel. This has lead me to do 2 implementations:

The kernel module and legacy eBPF probe only process the first message in the syscall. This is due to both verifier issues on the eBPF probe and the impossibility to issue multiple event headers in a single syscall.
The modern BPF probe processes all events by using bpf_loop if available, or does a best effort with a regular loop otherwise.

The implementation is far from perfect and I'm not super happy with it, but it's the best I've managed to come up with so far. Any suggestions for improvement are welcome.

Which issue(s) this PR fixes:

Fixes #1636

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

Add arguments for sendmmsg and recvmmsg syscalls.

Molter73 · 2024-08-27T16:23:14Z

/cc @Andreagit97 @FedeDP

poiana · 2024-08-27T16:23:15Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Molter73

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [Molter73]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

github-actions · 2024-08-27T16:23:47Z

Please double check driver/SCHEMA_VERSION file. See versioning.

/hold

FedeDP · 2024-08-28T06:56:31Z

Thanks Mauro!
Putting this in the TBD milestone because we won't have enough time to review/improve it in the next ~10days, and then we will have to tag libs and drivers for the next Falco release. But we will make sure to come at it as soon as possible.
/milestone TBD

Andreagit97 · 2024-08-28T08:22:07Z

Hei, thank you! Before looking at this PR I was thinking of a tail call approach, so you start with a first bpf program that sends the first message and then you tail call until you have no more messages to send (the limit is 32 if I recall well). This approach should work well in the modern ebpf (have you already considered it?) but I see the issue that it could cause with the old drivers... we have no easy ways to create multiple headers and send multiple events inside the same filler, all the architecture is built on the assumption of sending just one event per filler... I understand that creating all these helpers just for two syscalls could be an overkill. Send just the first message/tuple could be a hypothesis for the old drivers, even if I don't like it so much. At least in the kernel module, I would like to send all the messages/tuples in some way. I will try to think if we can do something different at least for the kernel module, but I'm not sure about the outcome...

Molter73 · 2024-08-28T08:44:18Z

Thanks Mauro! Putting this in the TBD milestone because we won't have enough time to review/improve it in the next ~10days, and then we will have to tag libs and drivers for the next Falco release. But we will make sure to come at it as soon as possible. /milestone TBD

No worries, I have a bunch of CI failures to go through before this can go in anyways and I'm also not super happy about the implementation so we can discuss as long as it takes.

I was thinking of a tail call approach, so you start with a first bpf program that sends the first message and then you tail call until you have no more messages to send (the limit is 32 if I recall well). This approach should work well in the modern ebpf (have you already considered it?)

That was my first thought too, I discarded it because:

The cap is 33 tail calls, so it would be around 32 after the initial tail call into the filler. This would not allow us to capture all events anyways in the modern bpf.
I couldn't find an easy way to keep track of what number of event we are actually on, that's probably a skill issue on my side, probably using a map or some part of the context would be enough.

but I see the issue that it could cause with the old drivers... we have no easy ways to create multiple headers and send multiple events inside the same filler, all the architecture is built on the assumption of sending just one event per filler... I understand that creating all these helpers just for two syscalls could be an overkill. Send just the first message/tuple could be a hypothesis for the old drivers, even if I don't like it so much. At least in the kernel module, I would like to send all the messages/tuples in some way. I will try to think if we can do something different at least for the kernel module, but I'm not sure about the outcome...

Yeah, I'm not super happy about this implementation either, but as you mention, it would take a pretty big effort to be able to send more than one message. I'm willing to do the effort, just need some pointers on where to start.

Also, I tried adding kernel tests that grab more than one event and it looked like the call to evt_test->assert_event_presence(); was not moving the event forward and instead was validating the previous one, or rather, it could tell there was another event in the buffer, but the internal values were not updated, any ideas why this could happen? Or is the framework meant to work with just one event? Anyways, I'll keep looking into it because I think it's a valuable test to have for these syscalls.

Molter73 · 2024-08-28T14:07:58Z

Looks like the scap tests are failing because a recvmmsg event doesn't have arguments in the file and it's not able to be parsed correctly, not sure how to fix that, do I need to recreate the scap file altogether?

Andreagit97 · 2024-08-30T09:49:54Z

That was my first thought too, I discarded it because:

The cap is 33 tail calls, so it would be around 32 after the initial tail call into the filler. This would not allow us to capture all events anyways in the modern bpf.

Yep probably there is no reason why we should choose the tail call approach in the end, the loop seems to do its job 🤔

I couldn't find an easy way to keep track of what number of event we are actually on, that's probably a skill issue on my side, probably using a map or some part of the context would be enough.

Probably it would be enough to check the offset inside the auxmap struct, more in detail the payload_pos to check how much data we have already pushed, or if we need other information, probably yes we need to use or some scratch space in the same map or add an additional map. But again probably is not necessary to go this way

Yeah, I'm not super happy about this implementation either, but as you mention, it would take a pretty big effort to be able to send more than one message. I'm willing to do the effort, just need some pointers on where to start.

I'm not sure it is worth reworking all the architecture for just these 2 new events... if we find a cool way to handle it with add only code ok, but otherwise, I have some doubts...

Also, I tried adding kernel tests that grab more than one event and it looked like the call to evt_test->assert_event_presence(); was not moving the event forward and instead was validating the previous one, or rather, it could tell there was another event in the buffer, but the internal values were not updated, any ideas why this could happen? Or is the framework meant to work with just one event? Anyways, I'll keep looking into it because I think it's a valuable test to have for these syscalls.

Uhm the framework should be able to retrieve more than one event, see here an example

libs/test/drivers/test_suites/actions_suite/ring_buffer.cpp

Line 33 in 5ed00b2

evt_test->assert_event_presence(CURRENT_PID, PPME_SYSCALL_CLOSE_E);

. If the events are in order we should be able to retrieve them...

To be honest, the best thing to do in the framework would be to scrape all the events in the buffers, save them in a cpp vector, and then search the events we need in the vector. This would provide us with great debug capabilities since we can print all the events we have seen from a specific thread for example and we could easily understand why our event is not there, maybe just a wrong event type

Andreagit97 · 2024-08-30T09:56:07Z

Usually, there are no issues when we only add parameters to an event in the event table. this is a particular case because the previous number of parameters was 0... we should try to understand in which method we are facing this exception

[ RUN      ] scap_file_kexec_arm64.tail_lineage
Trying to open the right engine!
unknown file: Failure
C++ exception with description "vector::_M_range_check: __n (which is 0) >= this->size() (which is 0)" thrown in the test body.
Trying to open the right engine!
[  FAILED  ] scap_file_kexec_arm64.tail_lineage (28 ms)

Molter73 · 2024-08-30T10:07:25Z

I'm not sure it is worth reworking all the architecture for just these 2 new events... if we find a cool way to handle it with add only code ok, but otherwise, I have some doubts...

Alright, I refrain from doing anything else in this regard until we have some time to think about it.

Uhm the framework should be able to retrieve more than one event, see here an example

That's weird, I'll give it a few more tries when I get a minute.

Usually, there are no issues when we only add parameters to an event in the event table. this is a particular case because the previous number of parameters was 0... we should try to understand in which method we are facing this exception

I'll try to get a stack trace, I can't remember the exact point it failed at off the top of my head.

Andreagit97 · 2024-09-02T08:58:31Z

Maybe I add an idea for the kernel module. In TRACEPOINT_PROBE(syscall_exit_probe, struct pt_regs *regs, long ret) we can add the following code:

        //----------------------------------------------------------------------------- New code
	if(event_pair->exit_event_type == PPME_SOCKET_SENDMMSG_X || 
		event_pair->exit_event_type == PPME_SOCKET_RECVMMSG_X)
	{
		for(...)
		{
			// we need to add a custom logic inside `record_event_all_consumers` for these syscalls to understand which tuple
			// and message we need to send.
			record_event_all_consumers(event_pair->exit_event_type, event_pair->flags, &event_data, KMOD_PROG_SYS_EXIT);
		}
                return;
	}
        //-----------------------------------------------------------------------------
	if (event_pair->flags & UF_USED)
		record_event_all_consumers(event_pair->exit_event_type, event_pair->flags, &event_data, KMOD_PROG_SYS_EXIT);
	else
		record_event_all_consumers(PPME_GENERIC_X, UF_ALWAYS_DROP, &event_data, KMOD_PROG_SYS_EXIT);

we could call the record_event_all_consumers in a for loop and each time we could send a different SENDMMSG event. We probably need to enrich the event_data struct with some information on how many messages we have already sent and what is the next message to send.

For what concern the legacy probe I had no great ideas, probably it's ok to send just the first message and not the others

FedeDP · 2024-09-02T09:00:54Z

Neat idea andre!
Re old bpf probe, i agree, don't spend too much time on it for now; perhaps we can fix it in the future if needed. Also, having a working solution that let's say kills (through the verifier) like 30% of our support matrix is not gonna work IMHO; if we find a solution, we should fine on that does not break too many verifiers; i think that is the hardest part.

Molter73 · 2024-09-02T11:18:55Z

Alright then, I'll try to get the implementation for kmod done when I get a minute. I also still need to go through the last errors in CI.

Molter73 · 2024-09-03T11:04:39Z

So here is the stacktrace for the scap file unit test that is failing:

#0  0x00007ffff5ea8664 in __pthread_kill_implementation () from /lib64/libc.so.6
#1  0x00007ffff5e4fc4e in raise () from /lib64/libc.so.6
#2  0x00007ffff5e37902 in abort () from /lib64/libc.so.6
#3  0x00007ffff60a5da9 in __gnu_cxx::__verbose_terminate_handler() [clone .cold] () from /lib64/libstdc++.so.6
#4  0x00007ffff60b7c4c in __cxxabiv1::__terminate(void (*)()) () from /lib64/libstdc++.so.6
#5  0x00007ffff60a5951 in std::terminate() () from /lib64/libstdc++.so.6
#6  0x00007ffff60b7ed8 in __cxa_throw () from /lib64/libstdc++.so.6
#7  0x00007ffff60a88df in std::__throw_out_of_range_fmt(char const*, ...) () from /lib64/libstdc++.so.6
#8  0x0000000000e753c9 in std::vector<sinsp_evt_param, std::allocator<sinsp_evt_param> >::_M_range_check (
    this=this@entry=0x7ffff4317a00, __n=__n@entry=0) at /usr/include/c++/14/bits/stl_vector.h:1160
#9  0x0000000000e6bde2 in std::vector<sinsp_evt_param, std::allocator<sinsp_evt_param> >::at (this=0x7ffff4317a00, __n=0)
    at /usr/include/c++/14/bits/stl_vector.h:1180
#10 sinsp_evt::get_param (this=this@entry=0x7ffff43179b8, id=id@entry=0)
    at /home/mmoltras/go/src/github.com/falcosecurity/libs/userspace/libsinsp/event.cpp:198
#11 0x0000000001174707 in sinsp_parser::reset (this=this@entry=0x515000009180, evt=evt@entry=0x7ffff43179b8)
    at /home/mmoltras/go/src/github.com/falcosecurity/libs/userspace/libsinsp/parsers.cpp:718
#12 0x000000000117ee53 in sinsp_parser::process_event (this=0x515000009180, evt=0x7ffff43179b8)
    at /home/mmoltras/go/src/github.com/falcosecurity/libs/userspace/libsinsp/parsers.cpp:90
#13 0x0000000001072167 in sinsp::next (this=0x7ffff4317840, puevt=0x7ffff3c77720)
    at /home/mmoltras/go/src/github.com/falcosecurity/libs/userspace/libsinsp/sinsp.cpp:1322
#14 0x0000000000c56032 in scap_file_test_helpers::capture_search_evt_by_type_and_tid (
    inspector=inspector@entry=0x7ffff4317840, type=type@entry=293, tid=tid@entry=107370)
    at /home/mmoltras/go/src/github.com/falcosecurity/libs/userspace/libsinsp/test/helpers/scap_file_helpers.cpp:45
#15 0x0000000000c5357d in scap_file_kexec_x86_tail_lineage_Test::TestBody (this=<optimized out>)
    at /home/mmoltras/go/src/github.com/falcosecurity/libs/userspace/libsinsp/test/scap_files/kexec_x86/kexec_x86.cpp:39
#16 0x00000000016c0064 in testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void> (
    object=object@entry=0x5020000082f0, method=&virtual testing::Test::TestBody(),
    location=location@entry=0x16cdb91 "the test body")
    at /home/mmoltras/go/src/github.com/falcosecurity/libs/build/googletest-src/googletest/src/gtest.cc:2657
#17 0x00000000016b409a in testing::Test::Run (this=this@entry=0x5020000082f0)
    at /home/mmoltras/go/src/github.com/falcosecurity/libs/build/googletest-src/googletest/src/gtest.cc:2674
#18 0x00000000016b4260 in testing::TestInfo::Run (this=0x5120000304c0)
    at /home/mmoltras/go/src/github.com/falcosecurity/libs/build/googletest-src/googletest/src/gtest.cc:2853
#19 0x00000000016b43d2 in testing::TestSuite::Run (this=0x512000030640)
    at /home/mmoltras/go/src/github.com/falcosecurity/libs/build/googletest-src/googletest/src/gtest.cc:3012
#20 0x00000000016b7f5e in testing::internal::UnitTestImpl::RunAllTests (this=0x516000000680)
    at /home/mmoltras/go/src/github.com/falcosecurity/libs/build/googletest-src/googletest/src/gtest.cc:5870
#21 0x00000000016c042f in testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> (
    object=0x516000000680,
    method=(bool (testing::internal::UnitTestImpl::*)(testing::internal::UnitTestImpl * const)) 0x16b7c0e <testing::internal::UnitTestImpl::RunAllTests()>, location=location@entry=0x17c52d0 "auxiliary test code (environments or event listeners)")
    at /home/mmoltras/go/src/github.com/falcosecurity/libs/build/googletest-src/googletest/src/gtest.cc:2657
#22 0x00000000016b40e4 in testing::UnitTest::Run (this=0x2162540 <testing::UnitTest::GetInstance()::instance>)
    at /home/mmoltras/go/src/github.com/falcosecurity/libs/build/googletest-src/googletest/src/gtest.cc:5444
#23 0x000000000118d5ec in RUN_ALL_TESTS ()
    at /home/mmoltras/go/src/github.com/falcosecurity/libs/build/googletest-src/googletest/include/gtest/gtest.h:2293
#24 0x000000000118d5d5 in main (argc=<optimized out>, argv=0x7fffffffe2e8)
    at /home/mmoltras/go/src/github.com/falcosecurity/libs/build/googletest-src/googletest/src/gtest_main.cc:51

The interesting part is in these 2 lines though:

#10 sinsp_evt::get_param (this=this@entry=0x7ffff43179b8, id=id@entry=0)
    at /home/mmoltras/go/src/github.com/falcosecurity/libs/userspace/libsinsp/event.cpp:198
#11 0x0000000001174707 in sinsp_parser::reset (this=this@entry=0x515000009180, evt=evt@entry=0x7ffff43179b8)
    at /home/mmoltras/go/src/github.com/falcosecurity/libs/userspace/libsinsp/parsers.cpp:718

Looks like we are trying to get the file descriptor here and, because it is not set in the scap file, an out of range error is thrown here.

Best I can think of is catching the exception for recvmmsg and sendmmsg in parsers.cpp, and re-throw for other syscalls, or check before accessing for those syscalls, which kinda defeats the purpose of at, WDYT?

codecov · 2024-09-03T11:30:41Z

Codecov Report

Attention: Patch coverage is 78.94737% with 4 lines in your changes missing coverage. Please review.

Project coverage is 73.70%. Comparing base (d8d345a) to head (60f9fd6).
Report is 59 commits behind head on master.

Files with missing lines	Patch %	Lines
userspace/libsinsp/parsers.cpp	78.94%	4 Missing ⚠️

Additional details and impacted files

@@           Coverage Diff           @@
##           master    #2027   +/-   ##
=======================================
  Coverage   73.69%   73.70%           
=======================================
  Files         253      253           
  Lines       31915    31929   +14     
  Branches     5638     5632    -6     
=======================================
+ Hits        23521    23533   +12     
+ Misses       8394     8376   -18     
- Partials        0       20   +20

Flag	Coverage Δ
libsinsp	`73.70% <78.94%> (+<0.01%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Molter73 · 2024-10-03T11:25:22Z

Hey @FedeDP!

I haven't had time to do the implementation proposed to support multiple events in the kernel module, I might come back to it at a later point, that PR has become way bigger than I intended it to be and would appreciate a review (even if it's a quick one) before I keep adding to it, hope that's ok.

I also expect CI to not be super happy, so I'll look into any issues I might stumble with and would appreciate help understanding if these changes require a major or minor version bump to the schema and/api driver versions.

FedeDP

The PR LGTM; considering that we are missing the kmod multi-event implementation of course :)
But yes, i think this is in a good shape.

FedeDP · 2024-10-17T06:58:52Z

driver/event_table.c

@@ -326,8 +334,20 @@ const struct ppm_event_info g_event_info[] = {
                                    {"data", PT_BYTEBUF, PF_NA},
                                    {"tuple", PT_SOCKTUPLE, PF_NA},
                                    {"msgcontrol", PT_BYTEBUF, PF_NA}}},
-        [PPME_SOCKET_RECVMMSG_E] = {"recvmmsg", EC_IO_READ | EC_SYSCALL, EF_NONE, 0},
-        [PPME_SOCKET_RECVMMSG_X] = {"recvmmsg", EC_IO_READ | EC_SYSCALL, EF_NONE, 0},
+        [PPME_SOCKET_RECVMMSG_E] = {"recvmmsg",


Is there a reason why we send fd in the recvmmsg_e event? Can't we do like sendmmsg_e and sending 0 parameters?

Just that I implemented both using separate enter and exit events, but only had an issue on sendmmsg because it was sending the socket tuple on the enter side, so I decided to move everything on that one and didn't bother with recvmmsg. I'll move it to the exit side so it's consistent.

FedeDP · 2024-10-17T07:01:14Z

driver/modern_bpf/programs/tail_called/events/syscall_dispatched_events/recvmmsg.bpf.c

-int BPF_PROG(recvmmsg_x, struct pt_regs *regs, long ret) {
-	struct ringbuf_struct ringbuf;
-	if(!ringbuf__reserve_space(&ringbuf, ctx, RECVMMSG_X_SIZE, PPME_SOCKET_RECVMMSG_X)) {
+typedef struct recvmmsg_data_s {


Suggested change

typedef struct recvmmsg_data_s {

typedef struct {

Less confusing IMHO :)

FedeDP · 2024-10-17T07:05:46Z

userspace/libsinsp/parsers.cpp

@@ -666,8 +669,11 @@ bool sinsp_parser::reset(sinsp_evt *evt) {
 			//
 			int fd_location = get_fd_location(etype);
 			ASSERT(evt->get_param_info(fd_location)->type == PT_FD);
-			evt->get_tinfo()->m_lastevent_fd = evt->get_param(fd_location)->as<int64_t>();
-			evt->set_fd_info(evt->get_tinfo()->get_fd(evt->get_tinfo()->m_lastevent_fd));
+			if((etype != PPME_SOCKET_SENDMMSG_E && etype != PPME_SOCKET_RECVMMSG_E) ||


I think that if we remove fd param from PPME_SOCKET_RECVMMSG_E (and remove the EF_USES_FD flag too of course) we can skip this check.

FedeDP · 2024-10-17T07:13:52Z

Oh and indeed the CI is super happy :)

BTW i just started kernel tests against this PR: https://github.com/falcosecurity/libs/actions/runs/11380128079
Will post the results once it finishes.

FedeDP · 2024-10-17T07:30:02Z

ARM64:

KERNEL	CMAKE-CONFIGURE	KMOD BUILD	KMOD SCAP-OPEN	BPF-PROBE BUILD	BPF-PROBE SCAP-OPEN	MODERN-BPF SCAP-OPEN
amazonlinux2-5.4	🟢	🟢	🟢	🟢	🟢	🟡
amazonlinux2022-5.15	🟢	🟢	🟢	🟢	🟢	🟢
fedora-6.2	🟢	🟢	🟢	🟢	🟢	🟢
oraclelinux-4.14	🟢	🟢	🟢	🟡	🟡	🟡
oraclelinux-5.15	🟢	🟢	🟢	🟢	🟢	🟢
ubuntu-6.5	🟢	🟢	🟢	🟢	🟢	🟢

AMD64:

KERNEL	CMAKE-CONFIGURE	KMOD BUILD	KMOD SCAP-OPEN	BPF-PROBE BUILD	BPF-PROBE SCAP-OPEN	MODERN-BPF SCAP-OPEN
amazonlinux2-4.19	🟢	🟢	🟢	🟢	🟢	🟡
amazonlinux2-5.10	🟢	🟢	🟢	🟢	🟢	❌
amazonlinux2-5.15	🟢	🟢	🟢	🟢	🟢	🟢
amazonlinux2-5.4	🟢	🟢	🟢	🟢	🟢	🟡
amazonlinux2022-5.15	🟢	🟢	🟢	🟢	🟢	🟢
amazonlinux2023-6.1	🟢	🟢	🟢	🟢	🟢	🟢
archlinux-6.0	🟢	🟢	🟢	🟢	🟢	🟢
archlinux-6.7	🟢	🟢	🟢	🟢	🟢	🟢
centos-3.10	🟢	🟢	🟢	🟡	🟡	🟡
centos-4.18	🟢	🟢	🟢	🟢	🟢	🟢
centos-5.14	🟢	🟢	🟢	🟢	🟢	🟢
fedora-5.17	🟢	🟢	🟢	🟢	🟢	🟢
fedora-5.8	🟢	🟢	🟢	🟢	🟢	❌
fedora-6.2	🟢	🟢	🟢	🟢	🟢	🟢
oraclelinux-3.10	🟢	🟢	🟢	🟡	🟡	🟡
oraclelinux-4.14	🟢	🟢	🟢	🟢	🟢	🟡
oraclelinux-5.15	🟢	🟢	🟢	🟢	🟢	🟢
oraclelinux-5.4	🟢	🟢	🟢	🟢	🟢	🟡
ubuntu-4.15	🟢	🟢	🟢	🟢	🟢	🟡
ubuntu-5.8	🟢	🟢	🟢	🟢	🟢	🟡
ubuntu-6.5	🟢	🟢	🟢	🟢	🟢	🟢

New failure spotted for modern_ebpf for amazonlinux 5.10 and fedora 5.8 :/
Not many info though:
Msg:

non-zero return code

Err:

 (22)

Signed-off-by: Mauro Ezequiel Moltrasio <[email protected]>

Due to limitations with the verifier, it won't be possible to iterate over all messages, so the implementation is best effort and only the first message is actually processed. Signed-off-by: Mauro Ezequiel Moltrasio <[email protected]>

The current implementation is not complete, only the first message is processed. In order to allow for multiple messages to be processed the kmod needs to allow for multiple headers to be added to the ringbuffer from the filler. Signed-off-by: Mauro Ezequiel Moltrasio <[email protected]>

The added fields were added in newer kernels and can be used to check for access of some newer helpers. Signed-off-by: Mauro Ezequiel Moltrasio <[email protected]>

Signed-off-by: Mauro Ezequiel Moltrasio <[email protected]>

Andreagit97

i really like the implementation, thank you!

Andreagit97 · 2024-11-07T13:53:42Z

driver/modern_bpf/helpers/store/auxmap_store_params.h

+ * @param auxmap pointer to the auxmap in which we have already written the entire event.
+ * @param ctx BPF prog context
+ */
+static __always_inline void auxmap__submit_event(struct auxiliary_map *auxmap, void *ctx) {


This patch should no longer be necessary thanks to this #2150. We don't call anymore the drop filler here

I'll rebase and remove all this once that PR is merged.

Andreagit97 · 2024-11-07T14:01:16Z

driver/modern_bpf/programs/tail_called/events/syscall_dispatched_events/sendmmsg.bpf.c

+		auxmap__store_s64_param(auxmap, ret);
+
+		/* Parameter 2: fd (type: PT_FD) */
+		auxmap__store_empty_param(auxmap);


here we can probably send the first parameter of the syscall sockfd

int sendmmsg(int sockfd, struct mmsghdr *msgvec, unsigned int vlen, int flags);

So something like

int32_t socket_fd = (int32_t)args[0]; auxmap__store_s64_param(auxmap, (int64_t)socket_fd);

Andreagit97 · 2024-11-07T14:03:46Z

driver/modern_bpf/programs/tail_called/events/syscall_dispatched_events/sendmmsg.bpf.c

+
+SEC("tp_btf/sys_exit")
+int BPF_PROG(sendmmsg_x, struct pt_regs *regs, long ret) {
+	if(ret < 0) {


not sure if ret==0 is possible but i would add it

Suggested change

if(ret < 0) {

if(ret <= 0) {

Andreagit97 · 2024-11-07T14:05:48Z

driver/modern_bpf/programs/tail_called/events/syscall_dispatched_events/sendmmsg.bpf.c

+		uint32_t nr_loops = ret < 1024 ? ret : 1024;
+		bpf_loop(nr_loops, handle_exit, &data, 0);
+		return 0;
+	}


I would use an if/else here, if the verifier likes it

I prefer if conditions that return early to not have else branches, I think it's more readable, but W/E, I'll change it.

Andreagit97 · 2024-11-07T14:20:38Z

driver/modern_bpf/programs/tail_called/events/syscall_dispatched_events/sendmmsg.bpf.c

+
+static long handle_exit(uint32_t index, void *ctx) {
+	sendmmsg_exit_t *data = (sendmmsg_exit_t *)ctx;
+	struct mmsghdr mmh;


Suggested change

struct mmsghdr mmh;

struct mmsghdr mmh = {};

Andreagit97 · 2024-11-07T14:57:23Z

userspace/libsinsp/parsers.cpp

@@ -3901,7 +3929,9 @@ void sinsp_parser::parse_rw_exit(sinsp_evt *evt) {
 				// If the fd still doesn't contain tuple info (because the socket is a datagram one
 				// or because some event was lost), add it here.
 				//
-				if(!retrieve_enter_event(enter_evt, evt)) {
+				if(etype == PPME_SOCKET_SENDMMSG_X) {
+					enter_evt = evt;


i would add a comment saying that the tuple is in the exit so we use this workaround

Andreagit97 · 2024-11-07T15:14:13Z

test/drivers/test_suites/syscall_exit_suite/recvmmsg_x.cpp

it would be interesting to test also the case of multiple messages received. BTW thank you for all tests i know that is quite noisy to write tests for networking syscalls... I recently introduced new functions like client_to_server_ipv4_tcp client_to_server_ipv4_udp maybe they could be useful

Andreagit97 · 2024-11-07T15:15:33Z

driver/bpf/fillers.h

+		CHECK_RES(res);
+
+		/* Parameter 2: fd (type: PT_FD) */
+		res = bpf_push_empty_param(data);


like in the other drivers we probably want to push the fd of the syscall here

Andreagit97 · 2024-11-07T15:19:36Z

driver/bpf/fillers.h

+	val = bpf_syscall_get_argument(data, 1);
+	if(bpf_probe_read_user(&mmh, sizeof(mmh), (void *)val))
+		return PPM_FAILURE_INVALID_USER_MEMORY;
+


can we use the usual comments like /* Parameter 1: res (type: PT_ERRNO) */ just to understand what we are pushing

Andreagit97 · 2024-11-07T15:26:02Z

driver/bpf/fillers.h

+	vlen = bpf_syscall_get_argument(data, 2);
+
+	/* Retrieve the message header */
+	if(bpf_probe_read_user(&mmh, sizeof(mmh), (void *)(mmh_ptr))) {


so if mmh_ptr we rely on the fact that mmh is zeroed and we won't enter all the following branches, is it true?

I missed this question the first time I went through the review, sorry.

mmh_ptr is read from the arguments, as such, I assume it will always be pointing to a valid position in memory because the syscall was successful here and we haven't returned control to userspace, so the memory must still be there. If, for whatever reason, the pointer was invalid or it was no longer readable due to a cache miss or such a situation, I would expect bpf_probe_read_user to fail and the event to be dropped altogether, (the if condition will be true).

Am I missing something here?

Andreagit97 · 2024-11-07T15:42:03Z

For what concerns the verifier failures, using a vanilla kernel 5.8 I can see the following error on this branch 🤔

libbpf: prog 'recvmmsg_x': BPF program load failed: Invalid argument
libbpf: prog 'recvmmsg_x': -- BEGIN PROG LOAD LOG --
unrecognized bpf_ld_imm64 insn
processed 0 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
-- END PROG LOAD LOG --
libbpf: prog 'recvmmsg_x': failed to load: -22
libbpf: failed to load object 'bpf_probe'
libbpf: failed to load BPF skeleton 'bpf_probe': -22
 (22)
libpman: failed to load BPF object (errno: 22 | message: Invalid argument)

I used a default Ubuntu debug kernel not the Fedora one in our CI but probably the error is the same if you use virtme-ng vng -r v5.8 is enough to obtain a repro

Molter73 · 2024-11-07T16:04:28Z

For what concerns the verifier failures, using a vanilla kernel 5.8 I can see the following error on this branch 🤔
libbpf: prog 'recvmmsg_x': BPF program load failed: Invalid argument
libbpf: prog 'recvmmsg_x': -- BEGIN PROG LOAD LOG --
unrecognized bpf_ld_imm64 insn
processed 0 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
-- END PROG LOAD LOG --
libbpf: prog 'recvmmsg_x': failed to load: -22
libbpf: failed to load object 'bpf_probe'
libbpf: failed to load BPF skeleton 'bpf_probe': -22
 (22)
libpman: failed to load BPF object (errno: 22 | message: Invalid argument)
I used a default Ubuntu debug kernel not the Fedora one in our CI but probably the error is the same if you use virtme-ng vng -r v5.8 is enough to obtain a repro

I've seen this error on some older kernels before, but I couldn't figure out how to fix it, I think it's related to the fact that the callbacks used in bpf_loop are not inlined and the regular for loop attempts to make the call, except that wasn't possible in 5.8 due to that instruction not being implemented. I'm not really sure how we could work around this.

Andreagit97 · 2024-11-07T19:25:16Z

Uhm if it is related to the legacy loop maybe we could replace it with a chain of tail calls + a per-cpu map to save the state. The maximum number of tail calls is 32 but in the end, this is the same limit we have with the legacy loop so we should be fine

Molter73 · 2024-11-08T09:58:05Z

Uhm if it is related to the legacy loop maybe we could replace it with a chain of tail calls + a per-cpu map to save the state. The maximum number of tail calls is 32 but in the end, this is the same limit we have with the legacy loop so we should be fine

It's not the legacy loop per se, it's the fact that bpf_loop requires a callback, so the handle_exit function cannot be inlined. If we turn it into a tail call chain, but we still want to re-use the handle_exit method, the verifier will still complain. The alternative would be to completely duplicate the code, but that's just painful.

I could maybe abuse macros, but all the code from handle_exit into one and then use the macro on the legacy loop and handle_exit... But that would physically hurt me.

Andreagit97 · 2024-11-08T11:36:30Z

uhm got it, what about something like this?

static always_inline long handle_exit__inline(uint32_t index, void *ctx)  {}

static long handle_exit(uint32_t index, void *ctx) {
     handle_exit__inline(index,ctx);
}

from the legacy loop we can call directly the inlined one

Molter73 · 2024-11-08T16:52:32Z

uhm got it, what about something like this?
static always_inline long handle_exit__inline(uint32_t index, void *ctx)  {}

static long handle_exit(uint32_t index, void *ctx) {
     handle_exit__inline(index,ctx);
}
from the legacy loop we can call directly the inlined one

I'll try it

Molter73 · 2024-11-13T16:39:17Z

uhm got it, what about something like this?
static always_inline long handle_exit__inline(uint32_t index, void *ctx)  {}

static long handle_exit(uint32_t index, void *ctx) {
     handle_exit__inline(index,ctx);
}
from the legacy loop we can call directly the inlined one
I'll try it

Tried this and now I'm getting this error on a 6.11 kernel:

* Configure modern BPF probe tests!
* Using buffer dim: 8388608
libbpf: prog 'recvmmsg_x': BPF program load failed: Permission denied
libbpf: prog 'recvmmsg_x': -- BEGIN PROG LOAD LOG --
combined stack size of 2 calls is 656. Too large
processed 5830 insns (limit 1000000) max_states_per_insn 4 total_states 250 peak_states 250 mark_read 57
-- END PROG LOAD LOG --
libbpf: prog 'recvmmsg_x': failed to load: -13
libbpf: failed to load object 'bpf_probe'
libbpf: failed to load BPF skeleton 'bpf_probe': -13
libpman: failed to load BPF object (errno: 13 | message: Permission denied)Unable to open the engine:

Andreagit97 · 2024-11-15T14:47:24Z

uhm so we moved the issue from 5.8 to 6.11... I need to check but this combined stack size of 2 calls is 656. Too large sounds like the verifier thinks it's possible to have tail calls + a call to a static function not inline, have you tried with an if/else.

if(bpf_core_enum_value_exists(enum bpf_func_id, BPF_FUNC_loop)) {
		uint32_t nr_loops = ret < 1024 ? ret : 1024;
		bpf_loop(nr_loops, handle_exit, &data, 0);
		return 0;
} else {
   // Tail calls
}

In this way if BPF_FUNC_loop is defined the verifier should prune the else considering it dead code

Molter73 · 2024-11-18T11:40:42Z

it's possible to have tail calls + a call to a static function not inline

This is possible, the only catch is the allowed stack size is reduced to 256 bytes I think, but it will still not work on 5.8 because of the ld_imm64 instruction is not implemented, preventing the call from happening. I can try to test it, but implementing this same logic with tail calls sounds like a nightmare to me.

poiana added kind/feature New feature or request release-note dco-signoff: yes area/driver-kmod area/driver-bpf area/driver-modern-bpf area/tests labels Aug 27, 2024

poiana added the size/XXL label Aug 27, 2024

poiana added the approved label Aug 27, 2024

poiana requested review from Andreagit97, FedeDP and hbrueckner August 27, 2024 16:23

poiana added the do-not-merge/hold label Aug 27, 2024

poiana added this to the TBD milestone Aug 28, 2024

Molter73 force-pushed the add-mmsg-args branch from c9cd8b1 to 0616dfb Compare August 28, 2024 14:06

Molter73 force-pushed the add-mmsg-args branch from 2f69cd7 to f96a41b Compare October 3, 2024 11:21

Molter73 marked this pull request as ready for review October 3, 2024 11:22

poiana removed the do-not-merge/work-in-progress label Oct 3, 2024

poiana requested a review from mstemm October 3, 2024 11:22

Molter73 force-pushed the add-mmsg-args branch from f96a41b to 80aedd7 Compare October 3, 2024 13:19

FedeDP reviewed Oct 17, 2024

View reviewed changes

poiana added dco-signoff: no and removed dco-signoff: yes labels Oct 17, 2024

Molter73 added 6 commits October 17, 2024 13:21

feat(modern_bpf): add arguments to recvmmsg and sendmmsg

a5532b3

Signed-off-by: Mauro Ezequiel Moltrasio <[email protected]>

feat(tests): add tests for recvmmsg and sendmmsg

950d9eb

Signed-off-by: Mauro Ezequiel Moltrasio <[email protected]>

feat(bpf): add arguments to recvmmsg and sendmmsg

68c1fd2

Due to limitations with the verifier, it won't be possible to iterate over all messages, so the implementation is best effort and only the first message is actually processed. Signed-off-by: Mauro Ezequiel Moltrasio <[email protected]>

chore(modern_bpf): add fields to enum bpf_func_id

ba208b1

The added fields were added in newer kernels and can be used to check for access of some newer helpers. Signed-off-by: Mauro Ezequiel Moltrasio <[email protected]>

cleanup(drivers): move all recvmmsg arguments to the exit event

60f9fd6

Signed-off-by: Mauro Ezequiel Moltrasio <[email protected]>

Molter73 force-pushed the add-mmsg-args branch from 56a540f to 60f9fd6 Compare October 17, 2024 11:22

poiana added dco-signoff: yes and removed dco-signoff: no labels Oct 17, 2024

Andreagit97 reviewed Nov 7, 2024

View reviewed changes

Add arguments for sendmmsg and recvmmsg #2027

Are you sure you want to change the base?

Add arguments for sendmmsg and recvmmsg #2027

Conversation

Molter73 commented Aug 27, 2024

Molter73 commented Aug 27, 2024

poiana commented Aug 27, 2024

github-actions bot commented Aug 27, 2024

FedeDP commented Aug 28, 2024

Andreagit97 commented Aug 28, 2024

Molter73 commented Aug 28, 2024

Molter73 commented Aug 28, 2024 • edited Loading

Andreagit97 commented Aug 30, 2024

Andreagit97 commented Aug 30, 2024

Molter73 commented Aug 30, 2024

Andreagit97 commented Sep 2, 2024

FedeDP commented Sep 2, 2024

Molter73 commented Sep 2, 2024

Molter73 commented Sep 3, 2024 • edited Loading

codecov bot commented Sep 3, 2024 • edited Loading

Codecov Report

Molter73 commented Oct 3, 2024

FedeDP left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

FedeDP commented Oct 17, 2024

FedeDP commented Oct 17, 2024 • edited Loading

Andreagit97 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Andreagit97 commented Nov 7, 2024 • edited Loading

Molter73 commented Nov 7, 2024

Andreagit97 commented Nov 7, 2024

Molter73 commented Nov 8, 2024

Andreagit97 commented Nov 8, 2024

Molter73 commented Nov 8, 2024

Molter73 commented Nov 13, 2024

Andreagit97 commented Nov 15, 2024

Molter73 commented Nov 18, 2024

Molter73 commented Aug 28, 2024 •

edited

Loading

Molter73 commented Sep 3, 2024 •

edited

Loading

codecov bot commented Sep 3, 2024 •

edited

Loading

FedeDP commented Oct 17, 2024 •

edited

Loading

Andreagit97 commented Nov 7, 2024 •

edited

Loading