systemd seccomp filters cause systemd-timesyncd core dumps on Kernel 6.11 when an app uses eBPF modules #1205

XZVB12 · 2024-10-04T20:32:14Z

Include the following information:

OpenSnitch version: 1.6.6
OS: Manjaro linux
Version: 24.1
Window Manager: KDE / Kwin
Kernel version: 6.11.0 - 6.11.1

To Reproduce
Describe in detail as much as you can what happened.

Steps to reproduce the behavior:

build eBPF modules,for example http://aur.archlinux.org/packages/opensnitch-ebpf-module
reboot
systemctl status systemd-timesyncd
See error

This issue appears on 6.11 kernel and does not affect older version.
A more detailed discussion: systemd/systemd#34615

Removing eBPF modules solves the problem.

gustavo-iniguez-goya · 2024-10-07T23:14:17Z

Hi @XZVB12 ,

I haven't analyzed nor try to reproduce this issue, but per similar cases we've had in the past, this is usually a bug in the other program (systemd-timesyncd in this case). We used to crash Transmission for example, or even the kernel, because there were some conditions not handled correctly (delayed connections, connections unexpectedly blocked/killed, ...)

I'll try to reproduce it, but before that some questions:

is this a "regular" kernel or a tuned kernel (liquorix, etc)?
do you have a rule to allow systemd-timesyncd?
could you change the LogLevel to DEBUG, and make systemd-timesyncd crash again? then copy /var/log/opensnitchd.log to /tmp , get the full-time of the coredump (ls -l --full-time /var/lib/coredump/...) and try to correlate the date of the core with the opensnitch logs, to see if we can obtain some hints of what was doing the daemon.

On the other hand, @Flat commented on systemd/systemd#34615 (comment):

I also notice it was giving me an issue with python running with systemd's SyscallFilters enabled and that is how I figured out what was causing it due to strace showing me uretprobe() = -1 ENOSYS (Function not implemented)

Could you post the output of ~ $ sudo opensnitchd -check-requirements? in order to see if the uprobes have been removed from the kernel 6.11 configuration.

Thank you!

XZVB12 · 2024-10-08T09:09:29Z

is this a "regular" kernel or a tuned kernel (liquorix, etc)?

Hi.Yes, this is "regular" Manjaro Linux kernel

do you have a rule to allow systemd-timesyncd?

Yes.

could you change the LogLevel to DEBUG, and make systemd-timesyncd crash again? then copy /var/log/opensnitchd.log to /tmp , get the full-time of the coredump (ls -l --full-time /var/lib/coredump/...) and try to correlate the date of the core with the opensnitch logs, to see if we can obtain some hints of what was doing the daemon.

I’ll try it later

Could you post the output of ~ $ sudo opensnitchd -check-requirements? in order to see if the uprobes have been removed from the kernel 6.11 configuration.

Checking system requirements for kernel version 6.11.2-2-MANJARO
------------------------------------------------------------------------------

Checking => CONFIG_KPROBES=y
Checking => CONFIG_KPROBES_ON_FTRACE=y
Checking => CONFIG_HAVE_KPROBES=y
Checking => CONFIG_HAVE_KPROBES_ON_FTRACE=y
Checking => CONFIG_KPROBE_EVENTS=y

* kprobes        ✔

Checking => CONFIG_UPROBES=y
Checking => CONFIG_UPROBE_EVENTS=y

* uprobes        ✔

Checking => CONFIG_FTRACE=y

* ftrace         ✔

Checking => CONFIG_HAVE_SYSCALL_TRACEPOINTS=y
Checking => CONFIG_FTRACE_SYSCALLS=y

* syscalls       ✔

Checking => CONFIG_NETFILTER_NETLINK_QUEUE=[my]
Checking => CONFIG_NFT_QUEUE=[my]
Checking => CONFIG_NETFILTER_XT_TARGET_NFQUEUE=[my]

* nfqueue        ✔

Checking => CONFIG_NETFILTER_NETLINK=[my]
Checking => CONFIG_NETFILTER_NETLINK_QUEUE=[my]
Checking => CONFIG_NETFILTER_NETLINK_ACCT=[my]

* netlink        ✔

Checking => CONFIG_INET_DIAG=[my]
Checking => CONFIG_INET_TCP_DIAG=[my]
Checking => CONFIG_INET_UDP_DIAG=[my]
Checking => CONFIG_INET_DIAG_DESTROY=[my]

* net diagnostics        ✔

XZVB12 · 2024-10-08T10:34:13Z

could you change the LogLevel to DEBUG, and make systemd-timesyncd crash again?

According to systemd-coredump event occurred at 13:25:27

sudo coredumpctl info 3603
PID: 3603 (systemd-timesyn)
UID: 977 (systemd-timesync)
GID: 977 (systemd-timesync)
Signal: 11 (SEGV)
Timestamp: Tue 2024-10-08 13:25:27

For this time in opensnitch log there are the following entries:


�[2m[2024-10-08 13:25:27]�[0m �[2m�[30m�[100m DBG �[0m [eBPF exit event] -> 3587
�[2m[2024-10-08 13:25:27]�[0m �[2m�[30m�[100m DBG �[0m [eBPF exit event inCache] -> 3587
�[2m[2024-10-08 13:25:27]�[0m �[2m�[30m�[100m DBG �[0m [eBPF exit event] -> 3590
�[2m[2024-10-08 13:25:27]�[0m �[2m�[30m�[100m DBG �[0m [eBPF exit event inCache] -> 3590
�[2m[2024-10-08 13:25:27]�[0m �[2m�[30m�[100m DBG �[0m [eBPF exec event] ppid: 1, pid: 3603, /usr/bin/opensnitchd -> [/usr/lib/systemd/systemd-executor --deserialize 50 --log-level info --log-target journal-or-kmsg]
�[2m[2024-10-08 13:25:27]�[0m �[2m�[30m�[100m DBG �[0m [eBPF event inCache] -> 3603
�[2m[2024-10-08 13:25:27]�[0m �[2m�[30m�[100m DBG �[0m new connection udp => 41641:192.168.0.2 -> 1.1.1.1 (0.arch.pool.ntp.org):53 uid: 977, mark: 0
�[2m[2024-10-08 13:25:27]�[0m �[2m�[30m�[100m DBG �[0m [ebpf conn] not in cache, but in execEvents: udp41641192.168.0.21.1.1.153, 3603 -> /usr/lib/systemd/systemd-timesyncd
�[2m[2024-10-08 13:25:27]�[0m �[2m�[30m�[100m DBG �[0m [ebpf conn] adding item to cache: udp41641192.168.0.21.1.1.153
�[2m[2024-10-08 13:25:27]�[0m �[2m�[30m�[100m DBG �[0m �[1m�[32mâ���[0m�[0m �[1m/usr/lib/systemd/systemd-timesyncd�[0m -> 41641:�[1m192.168.0.2�[0m => �[1m0.arch.pool.ntp.org (1.1.1.1)�[0m:53, mark: 0 (�[32mallow-always-simple-usr-lib-systemd-systemd-timesyncd�[0m)
�[2m[2024-10-08 13:25:27]�[0m �[2m�[30m�[100m DBG �[0m new connection udp => 41641:192.168.0.2 -> 1.1.1.1 (0.arch.pool.ntp.org):53 uid: 977, mark: 0
�[2m[2024-10-08 13:25:27]�[0m �[2m�[30m�[100m DBG �[0m error deleting ebpf entry: unable to delete element: no such file or directory
�[2m[2024-10-08 13:25:27]�[0m �[2m�[30m�[100m DBG �[0m [ebpf conn] in cache: udp41641192.168.0.21.1.1.153, 3603 -> /usr/lib/systemd/systemd-timesyncd
�[2m[2024-10-08 13:25:27]�[0m �[2m�[30m�[100m DBG �[0m �[1m�[32mâ���[0m�[0m �[1m/usr/lib/systemd/systemd-timesyncd�[0m -> 41641:�[1m192.168.0.2�[0m => �[1m0.arch.pool.ntp.org (1.1.1.1)�[0m:53, mark: 0 (�[32mallow-always-simple-usr-lib-systemd-systemd-timesyncd�[0m)
�[2m[2024-10-08 13:25:27]�[0m �[2m�[30m�[100m DBG �[0m New DNS record: 89.221.207.113 -> 0.arch.pool.ntp.org
�[2m[2024-10-08 13:25:27]�[0m �[2m�[30m�[100m DBG �[0m New DNS record: 82.142.168.18 -> 0.arch.pool.ntp.org
�[2m[2024-10-08 13:25:27]�[0m �[2m�[30m�[100m DBG �[0m New DNS record: 91.206.16.3 -> 0.arch.pool.ntp.org
�[2m[2024-10-08 13:25:27]�[0m �[2m�[30m�[100m DBG �[0m New DNS record: 162.159.200.123 -> 0.arch.pool.ntp.org
�[2m[2024-10-08 13:25:27]�[0m �[2m�[30m�[100m DBG �[0m [eBPF exec event] ppid: 1, pid: 3606, /usr/bin/opensnitchd -> [/usr/lib/systemd/systemd-executor --deserialize 113 --log-level info --log-target journal-or-kmsg]
�[2m[2024-10-08 13:25:27]�[0m �[2m�[30m�[100m DBG �[0m [eBPF event inCache] -> 3606
�[2m[2024-10-08 13:25:27]�[0m �[2m�[30m�[100m DBG �[0m [eBPF exit event] -> 3603
�[2m[2024-10-08 13:25:27]�[0m �[2m�[30m�[100m DBG �[0m [eBPF exit event inCache] -> 3603
�[2m[2024-10-08 13:25:27]�[0m �[2m�[30m�[100m DBG �[0m [eBPF exit event] -> 3606
�[2m[2024-10-08 13:25:27]�[0m �[2m�[30m�[100m DBG �[0m [eBPF exit event inCache] -> 3606
�[2m[2024-10-08 13:25:27]�[0m �[2m�[30m�[100m DBG �[0m client.disconnect()
�[2m[2024-10-08 13:25:27]�[0m �[2m�[30m�[100m DBG �[0m UI auth: simple

The error I see here: �[2m[2024-10-08 13:25:27]�[0m �[2m�[30m�[100m DBG �[0m error deleting ebpf entry: unable to delete element: no such file or directory

I attached a log file with information where systemd-timesync starts, so you should easily find . Maybe it will help more
opensnitchd cut.log

gustavo-iniguez-goya · 2024-10-08T12:19:23Z

thank you @XZVB12 !

I've managed to reproduce it on manjaro with systemd-timesyncd 256.6 and kernel 6.11.0.6-MANJARO.

The problematic ebpf module is opensnitch-dns.o , if you rename it, systemd-timesyncd will stop crashing:
~ $ sudo mv /usr/lib/opensnitchd/ebpf/opensnitch-dns.o /usr/lib/opensnitchd/ebpf/opensnitch-dns.o.bak

I'll try to understand why systemd-timesyncd is crashing and if we can do anything about it.

gustavo-iniguez-goya · 2024-10-08T14:00:40Z

we hook 2 libc functions: gethostbyname and getaddrinfo.

systemd-timesyncd crashes when a ebpf hook is attached to uretprobe getaddrinfo (just hooking it, even without reading any variable or register). The hooks for gethostbyname or uprobe getaddrinfo work just fine.

Here's a bpftrace script that causes systemd-timesyncd to crash, without opensnitch running:

uprobe-getaddrinfo.bt.txt

(rename it to .bt, and execute it with bpftrace uprobe-getaddrinfo.bt)

so in my opinion, even if we're indirectly causing it, the bug is not in our code. This segfault will occur with any software that hooks libc's getaddrinfo uretprobe.

Flat · 2024-10-08T14:03:09Z

I'd just like to add that I've also had the same issue happen with python using the archlinux reflector script, https://wiki.archlinux.org/title/Reflector so it does not seem to be solely systemd-timesyncd

XZVB12 · 2024-10-08T14:27:13Z

so in my opinion, even if we're indirectly causing it, the bug is not in our code. This segfault will occur with any software that hooks libc's getaddrinfo uretprobe.

Any idea what might be the reason? At this point, the kernel is 6.11.2 , and the problem still persists and apparently not noticed. Although I initially noticed this bug in the -RC versions of the kernel, I just didn’t report it before the final release

gustavo-iniguez-goya · 2024-10-08T15:09:44Z

no idea @XZVB12 , a change in the 6.11.x kernels maybe.

I've just tested it on kernel 6.10.11-2-MANJARO and systemd-timesyncd doesn't crash, so maybe some of the changes of uretprobes might be the reason of the crashes:

91bd008d4e2b4962ecb9a10e40c2fb666b0aeb92
Merge tag 'probes-v6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

https://lore.kernel.org/lkml/[email protected]/T/

https://cdn.kernel.org/pub/linux/kernel/v6.x/ChangeLog-6.11

Flat · 2024-10-08T15:29:29Z

Sounds like it could be related to this

Add uretprobe syscall which speeds up the uretprobe 10-30% faster. This
syscall is automatically used from user-space trampolines which are
generated by the uretprobe. If this syscall is used by normal
user program, it will cause SIGILL. Note that this is currently only
implemented on x86_64.

XZVB12 · 2024-10-08T20:27:29Z

Sounds like it could be related to this

So it’s not a bug now, but a new way of "speeds up" ?

gustavo-iniguez-goya · 2024-10-08T21:37:33Z

it could be a bug in the 6.11 kernel (or systemd maybe).

let's summarize what's happening:

a user-space app hooks some uretprobes (with this bpftrace script for example https://github.com/user-attachments/files/17294279/uprobe-getaddrinfo.bt.txt).
the app and the uretprobes work as expected.
restarting systemd-timesyncd (256.6, not tested with other versions) crashes with references in the stacktrace to epoll_wait(), sd_event_wait()/_run() and sd_event_loop(). (our syscallfilters by default deny "uretprobe()" syscall, but some bpf tools want it systemd/systemd#34615 (comment))
the problem can be reproduced with kernel 6.11, but no with 6.10.11 (the only one I've tested).
launching systemd-timesyncd manually from a root shell doesn't crash.
Using systemctl restart systemd-timesyncd crashes in loop.

Workaround 1: rename opensnitchd-dns.o and restart opensnitch

~ $ sudo mv /usr/lib/opensnitchd/ebpf/opensnitch-dns.o /usr/lib/opensnitchd/ebpf/opensnitch-dns.o.bak
~ $ sudo systemctl restart opensnitchd

Workaround 2: #1205 (comment)

I am pretty sure it is due to the SystemCallFilter option set in the .service file used to launch via systemd. If you remove that line it will not crash.

gustavo-iniguez-goya · 2024-10-09T18:45:41Z

@XZVB12 , even if this is a kernel bug, the process which is crashing is systemd-timesyncd, so there seems to be a condition that they're not controlling, when systemd-timesyncd is launched by systemd.

So maybe they should take a look at this problem. They can use the steps above to reproduce it.

Flat · 2024-10-09T19:14:16Z

@XZVB12 , even if this is a kernel bug, the process which is crashing is systemd-timesyncd, so there seems to be a condition that they're not controlling, when systemd-timesyncd is launched by systemd.

So maybe they should take a look at this problem. They can use the steps above to reproduce it.

I am pretty sure it is due to the SystemCallFilter option set in the .service file used to launch via systemd. If you remove that line it will not crash. The odd part is it should not be a SIGSEV it should be EPERM or EACCES. Adding uretprobe syscall to the filters does not help either.

zquestz · 2024-10-09T21:20:09Z

Just a heads up, this is crashing stubby on Arch as well.

zquestz · 2024-10-09T21:28:18Z

Just did a test, and removing the SystemCallFilter fixes both the issues on my machine. Oddly enough, you can also get them running again by reinstalling the glibc package... For some reason the SystemCallFilter doesn't trip if you reinstall glibc then run the systemd services...

XZVB12 · 2024-10-10T09:15:21Z

Just a heads up, this is crashing stubby on Arch as well.

It seems that any distribution with the 6.11 kernel will be affected by this problem

So maybe they should take a look at this problem. They can use the steps above to reproduce it.

I reopened the discussion and gave links to new information . Perhaps you could also join the discussion, maybe it would be easier to communicate directly.

gustavo-iniguez-goya · 2024-10-11T10:55:13Z

Thank you @XZVB12 and @Flat for handling and analyzing this issue.

Hopefully it'll be resolved in the next systemd release.

UnconnectedBedna · 2024-11-15T21:27:11Z

This is back again, but this time the boot freezes for 60s when trying to sync, and then systemd-timesyncd fails with core dump.

Same thing happens on 6.6 LTS, (freezes for one minute) BUT the service does NOT fail, only gets stuck for 60s.

After boot:

sudo /usr/lib/systemd/systemd-timesyncd

Contacted time server 192.121.108.99:123 (0.arch.pool.ntp.org).
Initial clock synchronization to Fri 2024-11-15 23:06:08.458151 CET.

Works just fine, so this is something at boot time.
I do have opensnitch-ebpf-module installed.

If you want me to make a new post I can do that, but since this seems to be some kind of regression, I started by posting here.

Flat · 2024-11-15T22:23:58Z

The original issue still hasn't been resolved. libseccomp will need a new release with the syscalls added. seccomp/libseccomp#439 once this has been merged and libseccomp updated it should resolve this issue.

UnconnectedBedna · 2024-11-15T22:30:21Z

Ah, that is what is happening.
Strange that it started working on my system, and then regressed again.

I DID try to check seccomp first before posting here with the links referenced (seccomp/libseccomp#438 > seccomp/libseccomp#435) and they looked merged, but clearly there is more judging by the link you provided. Ie, my "research" was sloppy.

Thank you for the clarification!
I'll just hold on for the devs to "catch up", I can wait one minute for the boot to complete.. xD

XZVB12 mentioned this issue Oct 4, 2024

our syscallfilters by default deny "uretprobe()" syscall, but some bpf tools want it systemd/systemd#34615

Closed

gustavo-iniguez-goya changed the title ~~opensnitch eBPF modules causes systemd-timesyncd core dump on Kernel 6.11~~ systemd seccomp filters cause systemd-timesyncd core dumps on Kernel 6.11 when an app uses eBPF modules Oct 10, 2024

XZVB12 mentioned this issue Oct 11, 2024

RFE: add the uretprobe() syscall to the list for kernel 6.11.* seccomp/libseccomp#438

Closed

gustavo-iniguez-goya closed this as completed Oct 11, 2024

JohnyPeaN mentioned this issue Nov 17, 2024

SEGFAULT with HTTPS requests on 6.11 linux kernel nginx/nginx#226

Open

2 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

systemd seccomp filters cause systemd-timesyncd core dumps on Kernel 6.11 when an app uses eBPF modules #1205

systemd seccomp filters cause systemd-timesyncd core dumps on Kernel 6.11 when an app uses eBPF modules #1205

XZVB12 commented Oct 4, 2024 •

edited

Loading

gustavo-iniguez-goya commented Oct 7, 2024

XZVB12 commented Oct 8, 2024 •

edited

Loading

XZVB12 commented Oct 8, 2024

gustavo-iniguez-goya commented Oct 8, 2024

gustavo-iniguez-goya commented Oct 8, 2024

Flat commented Oct 8, 2024

XZVB12 commented Oct 8, 2024 •

edited

Loading

gustavo-iniguez-goya commented Oct 8, 2024

Flat commented Oct 8, 2024

XZVB12 commented Oct 8, 2024

gustavo-iniguez-goya commented Oct 8, 2024 •

edited

Loading

gustavo-iniguez-goya commented Oct 9, 2024

Flat commented Oct 9, 2024

zquestz commented Oct 9, 2024

zquestz commented Oct 9, 2024

XZVB12 commented Oct 10, 2024

gustavo-iniguez-goya commented Oct 11, 2024

UnconnectedBedna commented Nov 15, 2024 •

edited

Loading

Flat commented Nov 15, 2024

UnconnectedBedna commented Nov 15, 2024 •

edited

Loading

systemd seccomp filters cause systemd-timesyncd core dumps on Kernel 6.11 when an app uses eBPF modules #1205

systemd seccomp filters cause systemd-timesyncd core dumps on Kernel 6.11 when an app uses eBPF modules #1205

Comments

XZVB12 commented Oct 4, 2024 • edited Loading

gustavo-iniguez-goya commented Oct 7, 2024

XZVB12 commented Oct 8, 2024 • edited Loading

XZVB12 commented Oct 8, 2024

gustavo-iniguez-goya commented Oct 8, 2024

gustavo-iniguez-goya commented Oct 8, 2024

Flat commented Oct 8, 2024

XZVB12 commented Oct 8, 2024 • edited Loading

gustavo-iniguez-goya commented Oct 8, 2024

Flat commented Oct 8, 2024

XZVB12 commented Oct 8, 2024

gustavo-iniguez-goya commented Oct 8, 2024 • edited Loading

gustavo-iniguez-goya commented Oct 9, 2024

Flat commented Oct 9, 2024

zquestz commented Oct 9, 2024

zquestz commented Oct 9, 2024

XZVB12 commented Oct 10, 2024

gustavo-iniguez-goya commented Oct 11, 2024

UnconnectedBedna commented Nov 15, 2024 • edited Loading

Flat commented Nov 15, 2024

UnconnectedBedna commented Nov 15, 2024 • edited Loading

XZVB12 commented Oct 4, 2024 •

edited

Loading

XZVB12 commented Oct 8, 2024 •

edited

Loading

XZVB12 commented Oct 8, 2024 •

edited

Loading

gustavo-iniguez-goya commented Oct 8, 2024 •

edited

Loading

UnconnectedBedna commented Nov 15, 2024 •

edited

Loading

UnconnectedBedna commented Nov 15, 2024 •

edited

Loading