Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] Add static analysis to CI to determine redundant seccomp rules #4924

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

roypat
Copy link
Contributor

@roypat roypat commented Nov 22, 2024

Use static analysis to determine the set of syscalls (+ arguments) that are actually present in the Firecracker binary, and cross-reference this with the seccomp filters, to determine redundant seccomp rules. Additionally, remove redundant seccomp rules identified by this process. Please see the individual commit messages for details.

License Acceptance

By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following Developer
Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md.

PR Checklist

  • I have read and understand CONTRIBUTING.md.
  • I have run tools/devtool checkstyle to verify that the PR passes the
    automated style checks.
  • I have described what is done in these changes, why they are needed, and
    how they are solving the problem in a clear and encompassing way.
  • I have updated any relevant documentation (both in code and in the docs)
    in the PR.
  • I have mentioned all user-facing changes in CHANGELOG.md.
  • If a specific issue led to this PR, this PR closes the issue.
  • When making API changes, I have followed the
    Runbook for Firecracker API changes.
  • I have tested all new and changed functionalities in unit tests and/or
    integration tests.
  • I have linked an issue to every new TODO.

  • This functionality cannot be added in rust-vmm.

@roypat roypat added the Status: Awaiting review Indicates that a pull request is ready to be reviewed label Nov 22, 2024
Copy link

codecov bot commented Nov 22, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 84.07%. Comparing base (4352366) to head (6a6f52c).
Report is 3 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #4924   +/-   ##
=======================================
  Coverage   84.07%   84.07%           
=======================================
  Files         251      251           
  Lines       28067    28067           
=======================================
  Hits        23597    23597           
  Misses       4470     4470           
Flag Coverage Δ
5.10-c5n.metal 84.64% <ø> (+<0.01%) ⬆️
5.10-m5n.metal 84.62% <ø> (ø)
5.10-m6a.metal 83.93% <ø> (ø)
5.10-m6g.metal 80.75% <ø> (ø)
5.10-m6i.metal 84.62% <ø> (ø)
5.10-m7g.metal 80.75% <ø> (ø)
6.1-c5n.metal 84.64% <ø> (ø)
6.1-m5n.metal 84.62% <ø> (ø)
6.1-m6a.metal 83.94% <ø> (+<0.01%) ⬆️
6.1-m6g.metal 80.75% <ø> (ø)
6.1-m6i.metal 84.62% <ø> (ø)
6.1-m7g.metal 80.75% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.


🚨 Try these New Features:

These have been determined by static analysis of a Firecracker binary
(see also follow up commits): The removed seccomp rules here trigger
syscalls that are either not present at all in the entire Firecracker
binary, or are not reachable from the entry point of a specific
Firecracker thread (this analysis has only been done for the vcpu thread
for now, due to being fairly tricky).

Some explanations for why some of these entries are no longer needed can
be found below

- The `uname` syscall was used back when we supported 4.14, to query the
  host kernel version and disable specific Firecracker features that
  were not supported pre-5.10 (io_uring and hugepages for memfd). With
  4.14 support dropped, there are no such checks anymore.
- Various KVM_SET_* ioctls do not need to be allowed on the vcpu thread,
  because they are all called _before_ the vcpu seccomp filters are
  installed (as they are only used during snapshot restore / when
  preparing KVM state for boot).
- on aarch64, we can additionally remove KVM_{SET,GET}VCPU_EVENTS, as we
  never call this ioctl on arm (only on x86)

Signed-off-by: Patrick Roy <[email protected]>
@roypat roypat force-pushed the seccomp-cleanup branch 2 times, most recently from 9dd9379 to 25ba8a0 Compare November 25, 2024 14:00
Since Firecracker uses the musl target, it is statically compiled
against all of its dependencies. This means that all syscalls that
Firecracker can possibly do, are represented by `syscall`/`svc #0x0`
instructions inside its binary. By doing some primitive static analysis,
it is possible to determine, for each of these instructions, what the
actual syscall number is (the idea being: it will probably be a constant
loaded into a specific register, determined by Linux's syscall
convention, very near the instruction itself). On a best-effort basis,
we can also try to determine the values of all registers that might
holds syscall arguments (for things like ioctls, this works very well,
because the ioctl number is almost always a constant loaded very closely
to the syscall instruction).

The static_analysis.py file implements the logic for this analysis. We
essentially start at each syscall instruction, and then work through the
preceding instructions in reverse order, trying to symbolically execute
the program in reverse until we figure out the value of the register
that holds the syscall number. Thankfully, because of the above
mentioned locally, this means we only need to really understand a
handful of instructions (based on the assumption that we can ignore
instructions that do not explicitly refer to registers that we care
about).

For example, on x86_64, the syscall number is stored in the eax
register, so if we encounter the following instruction sequence

xor %ebx,%ebx
mov %ebx,%eax
syscall

we first symbolically execute the "mov", and determine that to figure
out the syscall number, we actually need to determine the value of the
ebx register. So we go back one more instruction, and find out that %ebx
is explicitly set to 0. Thus, forward propagating this 0 to the syscall
instruction again, we find that we're doing a `read` syscall.

Things get a bit more complicated on ARM, where we don't have to only
care about register-to-register transfers, but also things like `add`
instructions. For these, we must also keep a "log" of manipulations to
apply during forward propagation. For example, on ARM the syscall number
is passed in the w8 register, so in the following sequence

mov w19, #0x6
add w8, w19, #0x1
svc #0x0

we determine that to compute the syscall number, we must first determine
the value of the w19 register, but also we must remember that when we
forward propagate the value 0x6 from the w19 register, we must add 1 to
it when propagating into the w8 register.

Sometimes, we additionally have to deal with wrapper functions around
syscalls (e.g. `libc::syscall`). For these, we need to translate the
list of syscall number and syscall arguments that the wrapper received
according to  to the architectures C calling convention (cdecl on
x86_64, w0-w7 for arguments on ARM) into what the wrapper will
eventually pass to the syscall instruction. On x86_64 this comes with
the complication that cdecl only allows passing 6 arguments in
registers, but syscall nr + all arguments are up to 7 values. Thus, when
a syscall is invoked via libc::syscall, we can only determine up to 5
syscall arguments (the final one would be on the stack, and stack
analysis would significantly complicate this script).

Lastly, we sometimes also have wrappers for specific syscalls (e.g.
ioctl). Here, we do something very similar to the above, however since
now the syscall number is "implicit" (e.g. not an argument to the
wrapper function, but rather hard-coded into the wrapper), we no not
actually run into problems with argument counts on x86_64. We implement
this logic because it allows us to determine more arguments (for
example, if we did not explicitly look at callsites of the ioctl
wrapper, we would not be able to determine which ioctl numbers are
actually invoked, greatly reducing the usefulness, as our seccomp
filters are large ioctls).

After determining all the syscalls, we then compare it to our seccomp
filters. We look at all our allowed rules, and compute the set of rules
that does not match _any_ of the actual syscalls invoked by the binary.
If any such are found, we fail inside a new integration test.

There are fairly big limitations to this approach. Because we're doing a
global analysis of the firecracker binary, we cannot differentiate
between syscalls on different threads, and also not between syscalls
done before and after actual application of seccomp filters. For this,
we would need some sort of control flow graph analysis, which is made
significantly harder due to the presence of dynamic dispatch.
Nevertheless, this simple approach already showed 4 redundant syscall
rules (and playing around with applying the script to reduced
compilation units showed a lot more unneeded rules for the vcpu thread).

For all of this to work, we need to compile Firecracker as a
non-position independent executable (to get the compiler to general
direct calls with absolute addresses, instead of instruction pointer
relative ones). Since all dependencies also need to be compiled this
way, we have to use cargo's unstable `-Zbuild-std` feature, which thus
means we need a nightly toolchain.

Signed-off-by: Patrick Roy <[email protected]>
Add the rust-src component and the $(uname -m)-unknown-linux-musl
targets for the nightly toolchain, and install python3-seccomp and
rustfilt. Since the python bindings for libseccomp are not published to
pip, we have to install it into the global python installation via
apt-get, and then copy into our venv.

Signed-off-by: Patrick Roy <[email protected]>
this commit should be dropped once we have rebuilt the devctr.

Signed-off-by: Patrick Roy <[email protected]>
@roypat
Copy link
Contributor Author

roypat commented Nov 25, 2024

Mhh, it works on ARM with the newest nightly, maybe related to rust-lang issue 125619? maybe need to part this one for a while then

@roypat roypat added Status: Parked Indicates that an issues or pull request will be revisited later and removed Status: Awaiting review Indicates that a pull request is ready to be reviewed labels Nov 25, 2024
@roypat roypat marked this pull request as ready for review November 25, 2024 15:06
Comment on lines +448 to +450
disassembly = subprocess.check_output(["rustfilt"], stdin=p_objdump.stdout).decode(
"utf-8"
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

objdump -C rust should do the trick as well.

Comment on lines +1 to +2
# Copyright 2024 Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: Apache-2.0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be a rust binary instead of python script? Smth that takes binary and a seccomp json as inputs and tells if rules are correct?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Parked Indicates that an issues or pull request will be revisited later
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants