test: run dbus-broker under ASan and UBsan #359
@evverx as promised here's a PoC of a test that runs dbus-broker under ASan and UBSan while we hammer it with other tests on the side. It got slightly more involved to protect the host machine, and LSan needed some extra care as well (see the comments in the test code). It's different from the original idea of running existing tests on a sanitized build, since I currently have no clue how to incorporate this into the whole Packit/TestingFarm infra (and I'm also not quite sure how well it would handle dying dbus and collecting artifacts from such machines). If I run the test in one of my Arch VMs against the latest upstream, it reports a leak after I run dfuzzer just on the D-Bus control interface, so it looks like it's doing something :)
But it doesn't appear to ... appear in CI, interesting.
I'm guessing the policy triggering it isn't included in the Fedora base image used by the CI. It should probably be possible to track that policy down by removing the policies on the Arch VM one by one.
Environment=ASAN_OPTIONS=$ASAN_OPTIONS
Environment=UBSAN_OPTIONS=$UBSAN_OPTIONS
# Useful for debugging LSan errors, but it's very verbose, hence disabled by default
#Environment=LSAN_OPTIONS=verbosity=1:log_threads=1
As far as I understand, the errors were caused by dropping privileges when it was run as dbus. With <user>root</user> it should probably be fine to remove it.
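For reference, the way such Environment= lines typically end up in a unit can be sketched as below. This is only an illustration, not the PR's actual script: the helper name write_sanitizer_dropin, the file name sanitizers.conf, and the concrete ASAN_OPTIONS/UBSAN_OPTIONS values are assumptions. The point it shows is that an unquoted heredoc expands $ASAN_OPTIONS and $UBSAN_OPTIONS when the file is generated, so the unit ends up with literal values.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not from the PR): write a drop-in that exports
# sanitizer options to a service.
# $1 = drop-in directory, e.g. /run/systemd/system/dbus-broker.service.d
write_sanitizer_dropin() {
    local dropin_dir="${1:?missing drop-in directory}"
    # Assumed example values; the PR's real option strings are not shown here.
    local ASAN_OPTIONS="abort_on_error=1:detect_leaks=1"
    local UBSAN_OPTIONS="print_stacktrace=1:halt_on_error=1"

    mkdir -p "$dropin_dir"
    # Unquoted EOF: the shell substitutes the variables now, at generation
    # time, so the resulting file contains the literal option strings.
    cat >"$dropin_dir/sanitizers.conf" <<EOF
[Service]
Environment=ASAN_OPTIONS=$ASAN_OPTIONS
Environment=UBSAN_OPTIONS=$UBSAN_OPTIONS
# Useful for debugging LSan errors, but very verbose, hence disabled by default
#Environment=LSAN_OPTIONS=verbosity=1:log_threads=1
EOF
}
```

Calling write_sanitizer_dropin with a scratch directory first makes it easy to inspect the generated file before pointing it at a real unit directory.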
# To make the test a bit more robust without too much effort, let's use systemd-nspawn to run an ephemeral
# container on top of the current rootfs. To get the "sanitized" dbus-broker into that container, we need to
# prepare a special rootfs with just the sanitized dbus-broker (and a couple of other things) which we then
# simply overlay on top of the ephemeral rootfs in the container.
I can't say I fully understand what's going on here :-) but assuming the next step would be to run it all under Valgrind too I wonder if it's possible to somehow split the script and move the test suite itself to a separate bash script that could be run in two different places?
That being said, I think it would make sense to avoid complicating things by splitting the script and make ASan/UBSan work first.
It should be relatively easy to factor the container shenanigans out into a separate helper, I'll check what's the actual reality :)
I have one more question. Is it run on VMs/bare-metal machines where coredumps can be collected? I'm not sure what should happen when dbus-broker crashes/segfaults or whatever.
Ah, good point. Until recently systemd-coredump would leave coredumps from containers on the host, but since systemd/systemd@a108c43 it forwards the coredumps to the container to handle, so this might need some extra attention as well. We could either disable the forwarding (CoredumpReceive=no in the respective systemd-nspawn service instance) and just use coredumpctl on the host, configure systemd-coredump to store the coredumps in the journal (since the container journal is already exported on the host), or bind mount a directory from the host to /var/lib/systemd/coredump so the coredumps don't vanish together with the container.
I like the first option the most (and since we use the host's /usr we shouldn't have any issues with mismatched symbols), but I'll play with this a bit to see how well it works in practice.
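The first option could look roughly like the sketch below. The helper name disable_coredump_forwarding and its second argument (a unit directory, so the function can be pointed at a scratch location) are made up for illustration; the drop-in path and the CoredumpReceive=no setting follow what is discussed in this thread.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: disable coredump forwarding for one nspawn instance,
# so coredumps from the container are handled on the host.
# $1 = container name, $2 = unit directory (defaults to /run/systemd/system)
disable_coredump_forwarding() {
    local name="${1:?missing container name}"
    local unit_dir="${2:-/run/systemd/system}"
    local dropin_dir="$unit_dir/systemd-nspawn@$name.service.d"

    mkdir -p "$dropin_dir"
    cat >"$dropin_dir/override.conf" <<EOF
[Service]
# We'll handle the coredumps on the host instead
CoredumpReceive=no
EOF
}
```

After a systemctl daemon-reload and a restart of the container, crashes inside it should then be listed by coredumpctl on the host, assuming systemd-coredump is set up there.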
(Apologies for dropping the ball on this, got very busy with some internal stuff. Hopefully I'll get back to this soon.)
We need all the listed packages so upgrade the dependency appropriately.
@evverx I factored out the common parts into a separate utility script, prepped another test that runs dbus-broker under Valgrind, and it seems to work (or at least Valgrind seems to complain a lot). However, it will need a bit more polish, so I'll move the last two commits into a separate branch with the next push (which might take a bit, as I'd like to gather some coverage reports for the sanitized dbus-broker to see how we could improve the ASan+UBSan test), so it doesn't block the sanitizer test.
@mrc0mmand I agree that Valgrind shouldn't block this PR. As far as I can remember the launcher should be tweaked too to run dbus-broker under Valgrind and some syscalls should be instrumented. The backtraces came with PID fds as far as I can remember.
- dbus-daemon
- dfuzzer
- expat-devel
- gcc
I think it would be better to build it with clang because it comes with more checks. (Ideally it would be great to build it with both but if it's either gcc or clang I'd pick clang :-))
Hehe, I'm just playing around with sancov (which works best with clang), so I'm inclined to use clang by default as well :)
# issues:
#
# 1) We need to restart dbus-broker (and hence the machine we're currently running on)
# 2) If dbus-broker crashes due to ASan/UBSan error, the whole machine is hosed
As far as I understand it should be applicable to "plain" builds too in the sense that if dbus-broker crashes/gets stuck there the testbed is hosed as well. I wonder if this container trick should be used in the integration tests generally? (Just to clarify I'm not saying it should be implemented here and I don't fully understand what's going on under the hood. I'm guessing it isn't possible to prevent the package from being installed and started)
I thought about that as well, but I'm not sure if it's worth the trouble. I opted into the container shenanigans just because both the ASan and the Valgrind tests change how dbus-broker is invoked (and under which user), which has slightly higher probability of going south (and some services don't like restarting dbus in general).
However, even if dbus dies the machine still remains (somewhat) usable (and this scenario should be very unlikely in "plain" tests). I plan on adding an "at exit" test/task that'll run after all tests (or maybe after each test) and collect possible coredumps and other useful logs, so even if we manage to crash dbus-broker in a "plain" test, we should still have the necessary dumps and logs available to debug it.
mkdir -p "/run/systemd/system/systemd-nspawn@$CONTAINER_NAME.service.d"
cat >"/run/systemd/system/systemd-nspawn@$CONTAINER_NAME.service.d/override.conf" <<EOF
[Service]
# We'll handle the coredumps on the host instead
It's really cool that it's possible to collect coredumps on the host there. GH Actions are run in containers, so coredumps go to the underlying host, which the actions have no access to. I guess it's convincing enough to switch to this testing infrastructure :-)
_exit() calls skip at-exit hooks, which also skips a call to __gcov_dump() when collecting coverage with gcov, resulting in inaccurate coverage reports. To mitigate this, define a custom _exit() function which injects __gcov_dump() just before _exit(), and use a macro to override the already existing _exit() function. To make this work without a bunch of includes scattered across the codebase, inject the coverage-specific include into the compiler command line when -Db_coverage=true is used.
Let's introduce a test that runs dbus-broker under Address Sanitizer and Undefined Behavior Sanitizer, while running other tests against it. The setup to achieve this is slightly convoluted, since we need to run (and restart) sanitized dbus-broker without nuking the host machine. For that we set up an nspawn container that re-uses the host's rootfs (to some degree) and overlays our additions on top of that. This way we can test (not only) the full user-space boot with sanitized dbus-broker without risking "damage" to the host machine.
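To make the container idea more concrete, here is a rough sketch of what such an invocation might look like when systemd-nspawn is called directly. It is not the PR's implementation (the PR drives the container through a systemd-nspawn@ service instance), and the helper name nspawn_cmdline, the machine name, and the overlay directory layout are all assumptions. The function only prints the command line, so it can be inspected without starting anything.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print a systemd-nspawn command line for an ephemeral
# container booted from a snapshot of the host rootfs, with a directory of
# sanitized additions overlaid (read-only) on top of /usr.
# $1 = machine name, $2 = directory holding the additions (assumed to contain usr/)
nspawn_cmdline() {
    local name="${1:?missing machine name}"
    local overlay="${2:?missing overlay directory}"
    local cmd=(
        systemd-nspawn
        --machine="$name"
        --directory=/      # use the host rootfs as the base...
        --ephemeral        # ...but run on a temporary snapshot of it
        --boot             # boot the full user space (init) in the container
        # lowest tree : highest tree : mount point inside the container
        --overlay-ro="/usr:$overlay/usr:/usr"
    )
    printf '%s\n' "${cmd[*]}"
}
```

For example, nspawn_cmdline sanitized /var/tmp/overlay prints the full command for review; whether a read-only overlay of /usr alone is enough depends on where the sanitized build and its unit overrides are installed.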