Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test: run dbus-broker under ASan and UBsan #359

Draft
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

mrc0mmand
Copy link
Contributor

Let's introduce a test that runs dbus-broker under Address Sanitizer and Undefined Behavior Sanitizer, while running other tests against it.

The setup to achieve this is slightly convoluted, since we need to run (and restart) sanitized dbus-broker without nuking the host machine. For that we setup an nspawn-container that re-uses host's rootfs (to some degree) and overlays our additions on top of that. This way we can test (not-only) the full user-space boot with sanitized dbus-broker without risking "damage" to the host machine.

@mrc0mmand mrc0mmand force-pushed the test-run-dbus-broker-under-sanitizers branch 3 times, most recently from 69331b9 to 4dd4726 Compare April 17, 2024 18:08
@mrc0mmand
Copy link
Contributor Author

mrc0mmand commented Apr 17, 2024

@evverx as promised here's a PoC of a test that runs dbus-broker under ASan and UBSan while we hammer it with other tests on the side. It got slightly more involved to protect the host machine, and LSan needed some extra care as well (see the comments in the test code).

It's different from the original idea of running existing tests on a sanitized build, since I currently have no clue how to incorporate this into the whole Packit/TestingFarm infra (and I'm also not quite sure how well it would handle dying dbus and collecting artifacts from such machines).

If I run the test in one of my Arch VMs against the latest upstream, it reports a leak after I run dfuzzer just on the D-Bus control interface, so it looks like it's doing something :)

# TMT_TEST_DATA=~/logs DBUS_BROKER_TREE=$PWD test/integration/fuzz/sanitizers/test.sh
...
[688762.734598] dbus-broker[66]: A security policy denied :1.16 to send method call /org/freedesktop/DBus:org.freedesktop.DBus.Peer.GetMachineId to org.freedesktop.DBus.
[688762.735575] dbus-broker[66]: A security policy denied :1.16 to send method call /org/freedesktop/DBus:org.freedesktop.DBus.Peer.Ping to org.freedesktop.DBus.
[688763.062724] dbus-broker[66]: Dispatched 1954 messages @ 198(±297)μs / message.
[688763.063133] systemd[1]: Stopping D-Bus System Message Bus...
[688763.105139] dbus-broker-launch[65]: =================================================================
[688763.105437] dbus-broker-launch[65]: ==65==ERROR: LeakSanitizer: detected memory leaks
[688763.105437] dbus-broker-launch[65]: Direct leak of 880 byte(s) in 10 object(s) allocated from:
[688763.105564] dbus-broker-launch[65]:     #0 0x75fe014e2cc1 in __interceptor_calloc /usr/src/debug/gcc/gcc/libsanitizer/asan/asan_malloc_linux.cpp:77
[688763.105564] dbus-broker-launch[65]:     #1 0x5b9d8d5fe8f7 in policy_record_new_xmit ../src/launch/policy.c:56
[688763.105564] dbus-broker-launch[65]:     #2 0x5b9d8d609364 in policy_import_send ../src/launch/policy.c:460
[688763.105564] dbus-broker-launch[65]:     #3 0x5b9d8d60f44f in policy_import ../src/launch/policy.c:696
[688763.105564] dbus-broker-launch[65]:     #4 0x5b9d8d5c018b in launcher_load_policy ../src/launch/launcher.c:1096
[688763.105564] dbus-broker-launch[65]:     #5 0x5b9d8d5c1943 in launcher_reload_config ../src/launch/launcher.c:1195
[688763.105564] dbus-broker-launch[65]:     #6 0x5b9d8d5c240c in bus_method_reload_config ../src/launch/launcher.c:1296
[688763.105564] dbus-broker-launch[65]:     #7 0x75fe0131b45c  (/usr/lib/libsystemd.so.0+0x2f45c) (BuildId: 540b0b05a1f8668a797c3ff583cd0b19ff6962f4)
[688763.105564] dbus-broker-launch[65]:     #8 0x75fe01334b1a  (/usr/lib/libsystemd.so.0+0x48b1a) (BuildId: 540b0b05a1f8668a797c3ff583cd0b19ff6962f4)
[688763.105734] dbus-broker-launch[65]: Direct leak of 88 byte(s) in 1 object(s) allocated from:
[688763.105877] dbus-broker-launch[65]:     #0 0x75fe014e2cc1 in __interceptor_calloc /usr/src/debug/gcc/gcc/libsanitizer/asan/asan_malloc_linux.cpp:77
[688763.105877] dbus-broker-launch[65]:     #1 0x5b9d8d5fe8f7 in policy_record_new_xmit ../src/launch/policy.c:56
[688763.105877] dbus-broker-launch[65]:     #2 0x5b9d8d609364 in policy_import_send ../src/launch/policy.c:460
[688763.105877] dbus-broker-launch[65]:     #3 0x5b9d8d60f44f in policy_import ../src/launch/policy.c:696
[688763.105877] dbus-broker-launch[65]:     #4 0x5b9d8d5c018b in launcher_load_policy ../src/launch/launcher.c:1096
[688763.105877] dbus-broker-launch[65]:     #5 0x5b9d8d5c313f in launcher_run ../src/launch/launcher.c:1347
[688763.105877] dbus-broker-launch[65]:     #6 0x5b9d8d5b270c in run ../src/launch/main.c:151
[688763.105877] dbus-broker-launch[65]:     #7 0x5b9d8d5b2949 in main ../src/launch/main.c:174
[688763.105877] dbus-broker-launch[65]:     #8 0x75fe00b1cccf  (/usr/lib/libc.so.6+0x25ccf) (BuildId: c0caa0b7709d3369ee575fcd7d7d0b0fc48733af)
[688763.106017] dbus-broker-launch[65]: SUMMARY: AddressSanitizer: 968 byte(s) leaked in 11 allocation(s).
[688763.115803] systemd[1]: dbus-broker.service: Main process exited, code=exited, status=1/FAILURE
[688763.115956] systemd[1]: dbus-broker.service: Failed with result 'exit-code'.
[688763.116408] systemd[1]: Stopped D-Bus System Message Bus.
[688763.116595] systemd[1]: dbus-broker.service: Consumed 2.112s CPU time, 142.2M memory peak, 0B memory swap peak

But it doesn't appear to ... appear in CI, interesting.

@evverx
Copy link
Contributor

evverx commented Apr 18, 2024

If I run the test in one of my Arch VMs against the latest upstream, it reports a leak after I run dfuzzer just on the D-Bus control interface, so it looks like it's doing something. But it doesn't appear to ... appear in CI, interesting

I'm guessing the policy triggering it isn't included in the Fedora base image used by the CI. It should probably be possible to track that policy down by removing the policies on the Arch VM one by one.

Environment=ASAN_OPTIONS=$ASAN_OPTIONS
Environment=UBSAN_OPTIONS=$UBSAN_OPTIONS
# Useful for debugging LSan errors, but it's very verbose, hence disabled by default
#Environment=LSAN_OPTIONS=verbosity=1:log_threads=1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I understand the errors were caused by dropping privileges when it was run as dbus. With <user>root</user> it should probably be fine to remove it.

# To make the test a bit more robust without too much effort, let's use systemd-nspawn to run an ephemeral
# container on top of the current rootfs. To get the "sanitized" dbus-broker into that container, we need to
# prepare a special rootfs with just the sanitized dbus-broker (and a couple of other things) which we then
# simply overlay on top of the ephemeral rootfs in the container.
Copy link
Contributor

@evverx evverx Apr 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't say I fully understand what's going on here :-) but assuming the next step would be to run it all under Valgrind too I wonder if it's possible to somehow split the script and move the test suite itself to a separate bash script that could be run in two different places?

That being said I think it would make sense to avoid complicating things by splitting the script and make Asan/UBSan work first.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be relatively easy to factor the container shenanigans out into a separate helper, I'll check what's the actual reality :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have one more question. Is it run on VMs/bare-metal machines where coredumps can be collected? I'm not sure what should happen when dbus-broker crashes/segfaults or whatever.

Copy link
Contributor Author

@mrc0mmand mrc0mmand Apr 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, good point. Until recently systemd-coredump would leave coredumps from containers on the host, but since systemd/systemd@a108c43 it forwads the coredumps to the container to handle, so this might need some extra attention as well. We could either disable the forwarding (CoredumpReceive=no in the respective [email protected] instance) and just use coredumpctl on the host, configure systemd-coredump to store the coredumps in the journal (since the container journal is already exported on the host), or bind mount a directory from the host to /var/lib/systemd/coredump so the coredumps don't vanish together with the container.

I like the first option the most (and since we use host's /usr we shouldn't have any issues with mismatches symbols), but I'll play with this a bit to see how well it works in practice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Apologies for dropping the ball on this, got very busy with some internal stuff. Hopefully I'll get back to this soon.)

@mrc0mmand mrc0mmand force-pushed the test-run-dbus-broker-under-sanitizers branch 3 times, most recently from d03a43d to 1c7a0f8 Compare May 13, 2024 15:00
@mrc0mmand mrc0mmand force-pushed the test-run-dbus-broker-under-sanitizers branch 8 times, most recently from 6bb14bb to 8f32458 Compare May 14, 2024 10:11
We need all the listed packages so upgrade the dependency appropriately.
@mrc0mmand mrc0mmand force-pushed the test-run-dbus-broker-under-sanitizers branch 3 times, most recently from b86c1d0 to 718bb38 Compare May 14, 2024 19:36
@mrc0mmand
Copy link
Contributor Author

@evverx I factored out the common parts into a separate utility script, prepped another test that runs dbus-broker under Valgrind, and it seems to work (or at least Valgrind seem to complain a lot). However, it will need a bit more polish I'll move the last two commits into a separate branch with the next push (which might take a bit, as I'll like to gather some coverage reports for the sanitized dbus-broker to see how we could improve the ASan+UBSan test), so it doesn't block the sanitizer test.

@evverx
Copy link
Contributor

evverx commented May 15, 2024

@mrc0mmand I agree that Valgrind shouldn't block this PR. As far as I can remember the launcher should be tweaked too to run dbus-broker under Valgrind and some syscalls should be instrumented. The backtraces came with PID fds as far as I can remember.

- dbus-daemon
- dfuzzer
- expat-devel
- gcc
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be better to build it with clang because it comes with more checks. (Ideally it would be great to build it with both but if it's either gcc or clang I'd pick clang :-))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hehe, I'm just playing around with sancov (which works best with clang), so I'm inclined to using clang by default as well :)

# issues:
#
# 1) We need to restart dbus-broker (and hence the machine we're currently running on)
# 2) If dbus-broker crashes due to ASan/UBSan error, the whole machine is hosed
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I understand it should be applicable to "plain" builds too in the sense that if dbus-broker crashes/gets stuck there the testbed is hosed as well. I wonder if this container trick should be used in the integration tests generally? (Just to clarify I'm not saying it should be implemented here and I don't fully understand what's going on under the hood. I'm guessing it isn't possible to prevent the package from being installed and started)

Copy link
Contributor Author

@mrc0mmand mrc0mmand May 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about that as well, but I'm not sure if it's worth the trouble. I opted into the container shenanigans just because both the ASan and the Valgrind tests change how dbus-broker is invoked (and under which user), which has slightly higher probability of going south (and some services don't like restarting dbus in general).

However, even if dbus dies the machine still remains (somewhat) usable (and this scenario should be very unlikely in "plain" tests). I plan on adding an "at exit" test/task that'll run after all tests (or maybe after each test) and collects possible coredumps and other useful logs, so even if we manage to crash dbus-broker in a "plain" test, we should still have necessary dumps and logs available to debug it.

mkdir -p "/run/systemd/system/systemd-nspawn@$CONTAINER_NAME.service.d"
cat >"/run/systemd/system/systemd-nspawn@$CONTAINER_NAME.service.d/override.conf" <<EOF
[Service]
# We'll handle the coredumps on the host instead
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's really cool that it's possible to collect coredumps on the host there. GH Actions are run in containers so coredumps go to the underlying host actions have no access to. I guess it's convincing enough to switch to this testing infrastructure :-)

@mrc0mmand mrc0mmand force-pushed the test-run-dbus-broker-under-sanitizers branch from 718bb38 to 596eecb Compare May 15, 2024 16:03
@mrc0mmand mrc0mmand force-pushed the test-run-dbus-broker-under-sanitizers branch 3 times, most recently from c7495e3 to 50c0c52 Compare May 20, 2024 13:11
mrc0mmand added 3 commits May 20, 2024 17:34
_exit() calls skip at-exit hooks, which also skips a call to
__gcov_dump() when collecting coverage with gcov, resulting in
inaccurate coverage reports. To mitigate this, define a custom _exit()
function which injects __gcov_dump() just before _exit(), and use a
macro to override the already existing _exit() function.

To make this work without a bunch of includes scattered across the
codebase, inject the coverage-specific include into the compiler command
line when -Db_coverage=true is used.
Let's introduce a test that runs dbus-broker under Address Sanitizer and
Undefined Behavior Sanitizer, while running other tests against it.

The setup to achieve this is slightly convoluted, since we need to run
(and restart) sanitized dbus-broker without nuking the host machine. For
that we setup an nspawn-container that re-uses host's rootfs (to some
degree) and overlays our additions on top of that. This way we can test
(not-only) the full user-space boot with sanitized dbus-broker without
risking "damage" to the host machine.
@mrc0mmand mrc0mmand force-pushed the test-run-dbus-broker-under-sanitizers branch from 50c0c52 to ddedfdd Compare May 20, 2024 15:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants