Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] Add static analysis to CI to determine redundant seccomp rules #4924

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Commits on Nov 25, 2024

  1. Remove some redundant seccomp entries from x86_64 allow list

    These have been determined by static analysis of a Firecracker binary
    (see also follow up commits): The removed seccomp rules here trigger
    syscalls that are either not present at all in the entire Firecracker
    binary, or are not reachable from the entry point of a specific
    Firecracker thread (this analysis has only been done for the vcpu thread
    for now, due to being fairly tricky).
    
    Some explanations for why some of these entries are no longer needed can
    be found below
    
    - The `uname` syscall was used back when we supported 4.14, to query the
      host kernel version and disable specific Firecracker features that
      were not supported pre-5.10 (io_uring and hugepages for memfd). With
      4.14 support dropped, there are no such checks anymore.
    - Various KVM_SET_* ioctls do not need to be allowed on the vcpu thread,
      because they are all called _before_ the vcpu seccomp filters are
      installed (as they are only used during snapshot restore / when
      preparing KVM state for boot).
    - on aarch64, we can additionally remove KVM_{SET,GET}VCPU_EVENTS, as we
      never call this ioctl on arm (only on x86)
    
    Signed-off-by: Patrick Roy <[email protected]>
    roypat committed Nov 25, 2024
    Configuration menu
    Copy the full SHA
    7e2eb06 View commit details
    Browse the repository at this point in the history
  2. Add a test that fails if there are obviously unneeded seccomp rules

    Since Firecracker uses the musl target, it is statically compiled
    against all of its dependencies. This means that all syscalls that
    Firecracker can possibly do, are represented by `syscall`/`svc #0x0`
    instructions inside its binary. By doing some primitive static analysis,
    it is possible to determine, for each of these instructions, what the
    actual syscall number is (the idea being: it will probably be a constant
    loaded into a specific register, determined by Linux's syscall
    convention, very near the instruction itself). On a best-effort basis,
    we can also try to determine the values of all registers that might
    holds syscall arguments (for things like ioctls, this works very well,
    because the ioctl number is almost always a constant loaded very closely
    to the syscall instruction).
    
    The static_analysis.py file implements the logic for this analysis. We
    essentially start at each syscall instruction, and then work through the
    preceding instructions in reverse order, trying to symbolically execute
    the program in reverse until we figure out the value of the register
    that holds the syscall number. Thankfully, because of the above
    mentioned locally, this means we only need to really understand a
    handful of instructions (based on the assumption that we can ignore
    instructions that do not explicitly refer to registers that we care
    about).
    
    For example, on x86_64, the syscall number is stored in the eax
    register, so if we encounter the following instruction sequence
    
    xor %ebx,%ebx
    mov %ebx,%eax
    syscall
    
    we first symbolically execute the "mov", and determine that to figure
    out the syscall number, we actually need to determine the value of the
    ebx register. So we go back one more instruction, and find out that %ebx
    is explicitly set to 0. Thus, forward propagating this 0 to the syscall
    instruction again, we find that we're doing a `read` syscall.
    
    Things get a bit more complicated on ARM, where we don't have to only
    care about register-to-register transfers, but also things like `add`
    instructions. For these, we must also keep a "log" of manipulations to
    apply during forward propagation. For example, on ARM the syscall number
    is passed in the w8 register, so in the following sequence
    
    mov w19, #0x6
    add w8, w19, #0x1
    svc #0x0
    
    we determine that to compute the syscall number, we must first determine
    the value of the w19 register, but also we must remember that when we
    forward propagate the value 0x6 from the w19 register, we must add 1 to
    it when propagating into the w8 register.
    
    Sometimes, we additionally have to deal with wrapper functions around
    syscalls (e.g. `libc::syscall`). For these, we need to translate the
    list of syscall number and syscall arguments that the wrapper received
    according to  to the architectures C calling convention (cdecl on
    x86_64, w0-w7 for arguments on ARM) into what the wrapper will
    eventually pass to the syscall instruction. On x86_64 this comes with
    the complication that cdecl only allows passing 6 arguments in
    registers, but syscall nr + all arguments are up to 7 values. Thus, when
    a syscall is invoked via libc::syscall, we can only determine up to 5
    syscall arguments (the final one would be on the stack, and stack
    analysis would significantly complicate this script).
    
    Lastly, we sometimes also have wrappers for specific syscalls (e.g.
    ioctl). Here, we do something very similar to the above, however since
    now the syscall number is "implicit" (e.g. not an argument to the
    wrapper function, but rather hard-coded into the wrapper), we no not
    actually run into problems with argument counts on x86_64. We implement
    this logic because it allows us to determine more arguments (for
    example, if we did not explicitly look at callsites of the ioctl
    wrapper, we would not be able to determine which ioctl numbers are
    actually invoked, greatly reducing the usefulness, as our seccomp
    filters are large ioctls).
    
    After determining all the syscalls, we then compare it to our seccomp
    filters. We look at all our allowed rules, and compute the set of rules
    that does not match _any_ of the actual syscalls invoked by the binary.
    If any such are found, we fail inside a new integration test.
    
    There are fairly big limitations to this approach. Because we're doing a
    global analysis of the firecracker binary, we cannot differentiate
    between syscalls on different threads, and also not between syscalls
    done before and after actual application of seccomp filters. For this,
    we would need some sort of control flow graph analysis, which is made
    significantly harder due to the presence of dynamic dispatch.
    Nevertheless, this simple approach already showed 4 redundant syscall
    rules (and playing around with applying the script to reduced
    compilation units showed a lot more unneeded rules for the vcpu thread).
    
    For all of this to work, we need to compile Firecracker as a
    non-position independent executable (to get the compiler to general
    direct calls with absolute addresses, instead of instruction pointer
    relative ones). Since all dependencies also need to be compiled this
    way, we have to use cargo's unstable `-Zbuild-std` feature, which thus
    means we need a nightly toolchain.
    
    Signed-off-by: Patrick Roy <[email protected]>
    roypat committed Nov 25, 2024
    Configuration menu
    Copy the full SHA
    f21c569 View commit details
    Browse the repository at this point in the history
  3. docker: add what's needed for static seccomp analysis

    Add the rust-src component and the $(uname -m)-unknown-linux-musl
    targets for the nightly toolchain, and install python3-seccomp and
    rustfilt. Since the python bindings for libseccomp are not published to
    pip, we have to install it into the global python installation via
    apt-get, and then copy into our venv.
    
    Signed-off-by: Patrick Roy <[email protected]>
    roypat committed Nov 25, 2024
    Configuration menu
    Copy the full SHA
    dbc886a View commit details
    Browse the repository at this point in the history
  4. tmp: make seccomp test work without new docker container

    this commit should be dropped once we have rebuilt the devctr.
    
    Signed-off-by: Patrick Roy <[email protected]>
    roypat committed Nov 25, 2024
    Configuration menu
    Copy the full SHA
    13f4cf4 View commit details
    Browse the repository at this point in the history
  5. Configuration menu
    Copy the full SHA
    6a6f52c View commit details
    Browse the repository at this point in the history