-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
seccomp: force -ENOSYS as new default action #2737
Conversation
I'm not sure if this works in all scenarios because I haven't tested this thoroughly yet, but I'm pushing this so folks can comment. /cc @richfelker @jethrob @fweimer Does this seem reasonable-ish to you? This is more complicated than I'd like but it's a result of several legacy aspects of runc's seccomp code (one of which is #2735 -- and is probably a bug). |
@cyphar: I still don't entirely understand the need for inverse rules rather than a well-defined fallback order, but if the latter is hard, I guess they work. At least the patch does not look too large/invasive and the high-level explanation of what it's doing seems to be reasonable and in alignment with what I expected. |
@richfelker As far as I understand, libseccomp doesn't have per-syscall fallback order (or at least it's not guaranteed to do so) -- if you add an unconditional Once seccomp/libseccomp#286 is resolved we can switch away from this workaround. |
It appears that libseccomp doesn't support a single rule referencing the same argument more than once (this is enforced by Just to make sure, I've tried adding a basic fallback rule (even with the I'm starting to get a creeping feeling that this might've been simpler to implement in BPF without any other workarounds. /cc @pcmoore Am I misunderstanding libseccomp, or are these real restrictions on filters created with libseccomp? |
Okay this latest version handles things pretty well. The main thing which we cannot handle is OR rules for arguments (because of the above libseccomp restriction) but this doesn't impact the default Docker rules and thus is not a major issue -- besides I would argue the current implementation is actually a bug (see #2735). All-in-all, I've tested this with a variety of cases and it does appear to work correctly. I'm marking this as ready for review -- PTAL @opencontainers/runc-maintainers. I'd also appreciate a review from @justincormack et al. |
Having -EPERM is the default was a fairly significant mistake from a future-proofing standpoint in that it makes any new syscall return a non-ignorable error (from glibc's point of view). We need to correct this now because faccessat2(2) is something glibc *critically* needs to have support for, but they're blocked on container runtimes because we return -EPERM unconditionally (leading to confusion in glibc). Unfortunately there are several issues which stop us from having a clean solution to this problem: 1. libseccomp has several limitations which require us to emulate behaviour we want: a. We cannot do logic based on syscall number, meaning we cannot specify a "largest known syscall number"; b. libseccomp doesn't know in which kernel version a syscall was added, and has no API for "minimum kernel version" so we cannot simply ask libseccomp to generate sane -ENOSYS rules for us. c. Additional seccomp rules for the same syscall are not treated as distinct rules -- if rules overlap, seccomp will merge them. This means we cannot add per-syscall -EPERM fallbacks; d. There is no inverse operation for SCMP_CMP_MASKED_EQ; e. libseccomp does not allow you to specify multiple rules for a single argument, making it impossible to invert OR rules for arguments. 2. The runtime-spec does not have any way of specifying: a. The errno for the default action; b. The minimum kernel version or "newest syscall at time of profile creation"; nor c. Which syscalls were intentionally excluded from the allow list (weird syscalls that are no longer used were excluded entirely, but Docker et al expect those syscalls to get EPERM not ENOSYS). 3. Certain syscalls should not return -ENOSYS (especially only for certain argument combinations) because this could also trigger glibc confusion. This means we have to return -EPERM for certain syscalls but not as a global default. 4. There is not an obvious (and reasonable) upper limit to syscall numbers, so we cannot create a set of rules for each syscall above the largest syscall number in libseccomp. This means we must handle inverse rules as described below. 5. Any syscall can be specified multiple times, which can make generation of hotfix rules much harder. As a result, we have to work around all of these things by coming up with a heuristic to stop the bleeding. In the future we could hopefully improve the situation in the runtime-spec and libseccomp. The solution is to switch the default action errno to -ENOSYS, and simply pick a Linux kernel version which is treated as the universal minimum version (Linux 3.0 was picked since basically no widely-used distribution ships an older kernel, and it contains enough common syscalls which would cause confusion if -ENOSYS was returned). We then generate -EPERM rules for each syscall which: 1. Existed in Linux 3.0 (read: has a syscall number smaller than "setns" on that platform) and does not have any explicit rules in the configuration. These syscalls get an unconditional -EPERM, because they are assumed to have been excluded intentionally. 2. Have an explicit *conditional* rule. These syscalls get a set of -EPERM rules which invert the logic of the conditional, to ensure that syscalls which are only partially permitted do not return -ENOSYS. This is trivial for all operations *except* SCMP_CMP_MASKED_EQ, which has no inverse operatin. For MASKED_EQ we generate a series of MASKED_EQ rules except for the value which was checked in the original rule. There are additional complications, which are all treated as simply "too hard" and will thus fall back to -ENOSYS: a. Any syscall specified in more than one seccomp rule is treated as "too hard to handle" and will be ignored. This is because they are logical OR rules but things become far too complicated if the action is non-trivial. b. Syscalls which have OR rules (the same argument specified multiple times) because it would require AND support in libseccomp, which doesn't exist. c. Syscalls which do not have allow rules (ACT_LOG or ACT_ALLOW), because other actions have different logical meanings to allow rules and it'd be too complicated to handle those as well for a hotfix. None of the above happens if the default action is not SCMP_RET_ERRNO. Signed-off-by: Aleksa Sarai <[email protected]>
I have a test program I've been using, but I'm not sure what's the nicest way to convert it to an integration test: #define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/syscall.h>
#include <linux/openat2.h>
int openat2(int dfd, const char *path, struct open_how *how, size_t usize)
{
int ret = syscall(SYS_openat2, dfd, path, how, usize);
if (ret < 0)
ret = -errno;
return ret;
}
int pidfd_open(pid_t pid, unsigned int flags)
{
int ret = syscall(SYS_pidfd_open, pid, flags);
if (ret < 0)
ret = -errno;
return ret;
}
int _dup3(int oldfd, int newfd, int flags)
{
int ret = dup3(oldfd, newfd, flags);
if (ret < 0)
ret = -errno;
return ret;
}
int fake_syscall(void)
{
int ret = syscall(1000);
if (ret < 0)
ret = -errno;
return ret;
}
#define debug_assert(lval, rval) \
do { \
int L = (lval), R = (rval); \
if (L != R) { \
printf("debug_assert(%s == %s) failed: %d != %d\n", #lval, #rval, L, R); \
} \
} while (0)
int main(void)
{
// Basic rule.
// Multiple arguments with OR rules.
debug_assert(openat2(100, NULL, NULL, 0), -EINVAL);
debug_assert(openat2(9001, NULL, NULL, 0), -EINVAL);
debug_assert(openat2(0, NULL, NULL, 0), -ENOSYS); // :(
debug_assert(openat2(0, NULL, NULL, 0), -ENOSYS); // :(
// Multiple arguments with AND rules.
debug_assert(pidfd_open(1, 1337), -EINVAL);
debug_assert(pidfd_open(1, 0), -EPERM);
debug_assert(pidfd_open(500, 1337), -EPERM);
debug_assert(pidfd_open(500, 500), -EPERM);
// Multiple rules for the same syscall.
debug_assert(_dup3(0, -100, 0xFFFF), -ENOSYS);
debug_assert(_dup3(1, -100, 0xFFFF), -EINVAL);
debug_assert(_dup3(2, -100, 0xFFFF), -ENOSYS);
debug_assert(_dup3(3, -100, 0xFFFF), -EINVAL);
// Out-of-bounds syscall.
debug_assert(fake_syscall(), -ENOSYS);
} With the seccomp section in config.json "seccomp": {
"defaultAction": "SCMP_ACT_ERRNO",
"architectures": [
"SCMP_ARCH_X86",
"SCMP_ARCH_X32",
"SCMP_ARCH_X86_64"
],
"syscalls": [
{
"action": "SCMP_ACT_ALLOW",
"names": [
"accept",
"accept4",
"access",
"adjtimex",
"alarm",
"arch_prctl",
"bind",
"brk",
"capget",
"capset",
"chdir",
"chmod",
"chown",
"chown32",
"clock_adjtime",
"clock_adjtime64",
"clock_getres",
"clock_getres_time64",
"clock_gettime",
"clock_gettime64",
"clock_nanosleep",
"clock_nanosleep_time64",
"close",
"connect",
"copy_file_range",
"creat",
"dup",
"dup2",
"epoll_create",
"epoll_create1",
"epoll_ctl",
"epoll_ctl_old",
"epoll_pwait",
"epoll_wait",
"epoll_wait_old",
"eventfd",
"eventfd2",
"execve",
"execveat",
"exit",
"exit_group",
"faccessat",
"faccessat2",
"fadvise64",
"fadvise64_64",
"fallocate",
"fanotify_mark",
"fchdir",
"fchmod",
"fchmodat",
"fchown",
"fchown32",
"fchownat",
"fcntl",
"fcntl64",
"fdatasync",
"fgetxattr",
"flistxattr",
"flock",
"fork",
"fremovexattr",
"fsetxattr",
"fstat",
"fstat64",
"fstatat64",
"fstatfs",
"fstatfs64",
"fsync",
"ftruncate",
"ftruncate64",
"futex",
"futex_time64",
"futimesat",
"getcpu",
"getcwd",
"getdents",
"getdents64",
"getegid",
"getegid32",
"geteuid",
"geteuid32",
"getgid",
"getgid32",
"getgroups",
"getgroups32",
"getitimer",
"getpeername",
"getpgid",
"getpgrp",
"getpid",
"getppid",
"getpriority",
"getrandom",
"getresgid",
"getresgid32",
"getresuid",
"getresuid32",
"getrlimit",
"get_robust_list",
"getrusage",
"getsid",
"getsockname",
"getsockopt",
"get_thread_area",
"gettid",
"gettimeofday",
"getuid",
"getuid32",
"getxattr",
"inotify_add_watch",
"inotify_init",
"inotify_init1",
"inotify_rm_watch",
"io_cancel",
"ioctl",
"io_destroy",
"io_getevents",
"io_pgetevents",
"io_pgetevents_time64",
"ioprio_get",
"ioprio_set",
"io_setup",
"io_submit",
"io_uring_enter",
"io_uring_register",
"io_uring_setup",
"ipc",
"kill",
"lchown",
"lchown32",
"lgetxattr",
"link",
"linkat",
"listen",
"listxattr",
"llistxattr",
"_llseek",
"lremovexattr",
"lseek",
"lsetxattr",
"lstat",
"lstat64",
"madvise",
"membarrier",
"memfd_create",
"mincore",
"mkdir",
"mkdirat",
"mknod",
"mknodat",
"mlock",
"mlock2",
"mlockall",
"mmap",
"mmap2",
"modify_ldt",
"mprotect",
"mq_getsetattr",
"mq_notify",
"mq_open",
"mq_timedreceive",
"mq_timedreceive_time64",
"mq_timedsend",
"mq_timedsend_time64",
"mq_unlink",
"mremap",
"msgctl",
"msgget",
"msgrcv",
"msgsnd",
"msync",
"munlock",
"munlockall",
"munmap",
"nanosleep",
"newfstatat",
"_newselect",
"open",
"openat",
"pause",
"pidfd_send_signal",
"pipe",
"pipe2",
"poll",
"ppoll",
"ppoll_time64",
"prctl",
"pread64",
"preadv",
"preadv2",
"prlimit64",
"pselect6",
"pselect6_time64",
"pwrite64",
"pwritev",
"pwritev2",
"read",
"readahead",
"readlink",
"readlinkat",
"readv",
"recv",
"recvfrom",
"recvmmsg",
"recvmmsg_time64",
"recvmsg",
"remap_file_pages",
"removexattr",
"rename",
"renameat",
"renameat2",
"restart_syscall",
"rmdir",
"rseq",
"rt_sigaction",
"rt_sigpending",
"rt_sigprocmask",
"rt_sigqueueinfo",
"rt_sigreturn",
"rt_sigsuspend",
"rt_sigtimedwait",
"rt_sigtimedwait_time64",
"rt_tgsigqueueinfo",
"sched_getaffinity",
"sched_getattr",
"sched_getparam",
"sched_get_priority_max",
"sched_get_priority_min",
"sched_getscheduler",
"sched_rr_get_interval",
"sched_rr_get_interval_time64",
"sched_setaffinity",
"sched_setattr",
"sched_setparam",
"sched_setscheduler",
"sched_yield",
"seccomp",
"select",
"semctl",
"semget",
"semop",
"semtimedop",
"semtimedop_time64",
"send",
"sendfile",
"sendfile64",
"sendmmsg",
"sendmsg",
"sendto",
"setfsgid",
"setfsgid32",
"setfsuid",
"setfsuid32",
"setgid",
"setgid32",
"setgroups",
"setgroups32",
"setitimer",
"setpgid",
"setpriority",
"setregid",
"setregid32",
"setresgid",
"setresgid32",
"setresuid",
"setresuid32",
"setreuid",
"setreuid32",
"setrlimit",
"set_robust_list",
"setsid",
"setsockopt",
"set_thread_area",
"set_tid_address",
"setuid",
"setuid32",
"setxattr",
"shmat",
"shmctl",
"shmdt",
"shmget",
"shutdown",
"sigaltstack",
"signalfd",
"signalfd4",
"sigprocmask",
"sigreturn",
"socket",
"socketcall",
"socketpair",
"splice",
"stat",
"stat64",
"statfs",
"statfs64",
"statx",
"symlink",
"symlinkat",
"sync",
"sync_file_range",
"syncfs",
"sysinfo",
"tee",
"tgkill",
"time",
"timer_create",
"timer_delete",
"timer_getoverrun",
"timer_gettime",
"timer_gettime64",
"timer_settime",
"timer_settime64",
"timerfd_create",
"timerfd_gettime",
"timerfd_gettime64",
"timerfd_settime",
"timerfd_settime64",
"times",
"tkill",
"truncate",
"truncate64",
"ugetrlimit",
"umask",
"uname",
"unlink",
"unlinkat",
"utime",
"utimensat",
"utimensat_time64",
"utimes",
"vfork",
"vmsplice",
"wait4",
"waitid",
"waitpid",
"write",
"writev"
]
},
{
"action": "SCMP_ACT_ALLOW",
"names": [
"dup3"
],
"args": [
{
"index": 0,
"value": 1,
"op": "SCMP_CMP_EQ"
}
]
},
{
"action": "SCMP_ACT_ALLOW",
"names": [
"dup3"
],
"args": [
{
"index": 0,
"value": 2,
"op": "SCMP_CMP_GT"
}
]
},
{
"action": "SCMP_ACT_ALLOW",
"names": [
"pidfd_open"
],
"args": [
{
"index": 0,
"value": 1,
"op": "SCMP_CMP_EQ"
},
{
"index": 1,
"value": 1337,
"op": "SCMP_CMP_EQ"
}
]
},
{
"action": "SCMP_ACT_ALLOW",
"names": [
"openat2"
],
"args": [
{
"index": 0,
"value": 100,
"op": "SCMP_CMP_EQ"
},
{
"index": 0,
"value": 9001,
"op": "SCMP_CMP_EQ"
}
]
},
{
"action": "SCMP_ACT_ALLOW",
"names": [
"clone"
],
"args": [
{
"index": 0,
"value": 2114060288,
"op": "SCMP_CMP_MASKED_EQ"
}
]
}
]
} |
it seems to work well with the default Podman seccomp profile, that has some weird handling for |
I think we can just compile and run the C code from bats |
@giuseppe That example is actually ignored entirely by the workaround code -- you will get |
I hit my first |
I see. I guess it works because Differently, both |
|
I don't see how this could be a problem. Could you please elaborate? You can make a direct system call using |
@fweimer-rh The (potential) issue is that programs which used to get I don't think there is a nice solution for this outside of making |
@cyphar The
My mental model is that seccomp is essentially deconfiguring otherwise supported kernel interfaces, so returning (Note that glibc doesn't do anything like that with |
I appreciate this is a bit of a philosophical discussion, but my mental model is that we are acting like an ad-hoc MAC or LSM-like system which is explicitly restricting features. This isn't true for unknown syscalls, but it is definitely true for cases like |
My hope was that the
|
@pcmoore It seems that text is only available in |
Seems like the |
Hmm, now the test is failing on CentOS 7 with no warnings or any output other than |
Sometimes it's easy to miss things in the docs too; I've done it more times than I care to remember. The |
This seems totally the wrong place to do this, runc is (sigh) just doing what it is told and if you have to second guess its behaviour too thats even worse. There are only a few seccomp policies in the world, just fix them. |
I don't think this is a documentation problem. The underlying problem is that the DSL/libseccomp API for this is just utterly misdesigned in a way that's anti-intuitive, harder to use than, and less powerful than, raw BPF. Is there no way it can be fixed to be somewhat less bad while maintaining compatibility with important caller expectations? If not, can the necessary fixups just be done on the resulting BPF rather than via the deficient libseccomp API? |
That libseccomp library is just awful isn't it? As I understand it the libseccomp maintainers are open to accepting patches to improve the library; which is good, because they are clearly a bunch of idiots who don't know what they are doing. Alternatively you could simply write your own seccomp filters in raw BPF, that's clearly what a real programmer would do in this case. The libseccomp maintainers openly say that libseccomp is not for everyone or every use case; they say the goal is to lower the bar for the common seccomp use cases in hopes that more applications/developers will make use of seccomp. However, what they should really be saying is that libseccomp is always worse than writing raw BPF and only idiots should use it. |
openSUSE. The
So switch the default errno to The other (potentially simpler) solution is to temporarily implement this in BPF ourselves until libseccomp has the minimum kernel feature we need (there are some other features that would be nice like supporting multiple AND rules for the same argument, but they aren't mandatory). You also can't specify the return value in the default action, so we will need to change runc somehow. |
Leaving aside the way you phrased your point, I just tried my hand at this and it's at least somewhat non-trivial to get right. Because libseccomp doesn't have per-syscall fallback paths everything goes through a final All of this is far more complicated than just making the default So for now it may just be simplest to do the most basic After that I can work on cooking up our own BPF generation that handles these cases in the way we would like them to be handled. It is a bit sad we'll have to reinvent most of libseccomp's architecture handling, but we will still use the libseccomp syscall name lookup and architecture information to avoid having to handle all of that ourselves too... And we will need kernel version information anyway (which I will work on implementing in libseccomp). I do think long-term we could switch back to libseccomp but there are a few other limitations we would ideally like to have fixed in libseccomp (the multiple-rule-for-a-single-argument restriction would be great to remove -- separately from this whole issue). The slightly positive thing is that we can use the |
#2746 is a hotfix variant of this PR that only does |
Signed-off-by: Aleksa Sarai <[email protected]>
do you think it makes sense to update the runtime specs now to allow overriding the default action? |
@giuseppe I assume you meant the default errno for the default action? That's probably not a bad idea to do anyway, but that's not actually enough to solve this problem -- ideally the So we will also need to add either a "minimal kernel version" field to the spec or otherwise define the There are also several aspects of runc's seccomp implementation that aren't really in line with the runtime-spec -- such as #2735 (though the runtime-spec's definition of seccomp rules is woefully vague so you could argue that it's compliant purely because it's so ill-defined). We need to fix those too. |
yes, I was referring to the errno for the default action, but perhaps it is not necessary with the approach in #2746? Would exposing |
@giuseppe A better solution is something more like seccomp/libseccomp#11 -- you specify the minimum kernel version of the profile -- any syscalls newer than that kernel version get But I would prefer we don't touch the runtime-spec until libseccomp has the features we need to make a decision about what the right API should be. The current seccomp API was pretty much wholesale copied from the internal runc implementation, which is quite worrying because the runc implementation has weird footguns like #2735 and #1847. I don't want a(nother) runc hotfix to be codified in the spec. |
What is the impact of the Linux 3.0 cut-off point? I'm asking because modern glibc assumes at least kernel 3.2 (main differences are A complete different concern is who maintains the name-to-syscall mapping. @cyphar You said you wanted to get this from libseccomp. However, downstream, at least we don't treat libseccomp as a package that we need to keep updating continuously for each system call that is added to the upstream (!) kernel. Maybe this is something that we need to change downstream, and I will have to start a discussion about that. |
@fweimer-rh Well this is intended as a hotfix, if you think 3.2 is a better cutoff point (the fact it's the minimum version required for glibc is probably the best reason to pick it) we can use that instead. Especially if you think a Linux 3.0 cut-off will break glibc installations. But we don't plan to have this workaround stick around for a very long time -- which is why I've hacked up #2746 as an even simpler implementation of this PR so that we can have something merged. The longer-term goal is to get rid of the hard-coded hotfix and instead either:
I think the way we may implement this eventually is to have (1) be the preferred method but if no kernel version is specified we fall back to (2). The reason we need to do this on a kernel version level rather than syscall number level is that the syscall table can have holes in it, and thus any simplistic "just check if the syscall number is larger than X" checks can be wrong -- this happened recently with However this long-term solution requires features in libseccomp (or some other source) which don't currently exist -- see below.
We already use libseccomp for this purpose (for two reasons -- there is no real kernel API for it, and we only care about syscall names in the context of seccomp). If we change sources of this information that'd be fine so long as the mapping didn't change for old syscalls. I agree that this is less than ideal because it requires libseccomp to be updated fairly regularly, but right now these are the tools we have at our disposal (and in practice it hasn't been a huge problem -- the libseccomp maintainers are pretty reactive to new syscall additions). However it should also be noted that for libseccomp's usecase we also need to have the following information which probably won't exist in a hypothetical kernel API:
|
I've thought of a nicer interim solution to this problem -- inspired by @richfelker, except we prepend our But we still need a better long-term solution. |
#2750 is a cleaner (though still not ideal) solution to this problem. Closing in favour of that implementation. |
Having -EPERM is the default was a fairly significant mistake from a
future-proofing standpoint in that it makes any new syscall return a
non-ignorable error (from glibc's point of view). We need to correct
this now because faccessat2(2) is something glibc critically needs to
have support for, but they're blocked on container runtimes because we
return -EPERM unconditionally (leading to confusion in glibc).
Unfortunately there are several issues which stop us from having a clean
solution to this problem:
libseccomp has several limitations which require us to emulate
behaviour we want:
a. We cannot do logic based on syscall number, meaning we cannot
specify a "largest known syscall number";
b. libseccomp doesn't know in which kernel version a syscall was
added, and has no API for "minimum kernel version" so we cannot
simply ask libseccomp to generate sane -ENOSYS rules for us.
c. Additional seccomp rules for the same syscall are not treated as
distinct rules -- if rules overlap, seccomp will merge them. This
means we cannot add per-syscall -EPERM fallbacks;
d. There is no inverse operation for SCMP_CMP_MASKED_EQ;
e. libseccomp does not allow you to specify multiple rules for a
single argument, making it impossible to invert OR rules for
arguments.
The runtime-spec does not have any way of specifying:
a. The errno for the default action;
b. The minimum kernel version or "newest syscall at time of profile
creation"; nor
c. Which syscalls were intentionally excluded from the allow list
(weird syscalls that are no longer used were excluded entirely,
but Docker et al expect those syscalls to get EPERM not ENOSYS).
Certain syscalls should not return -ENOSYS (especially only for
certain argument combinations) because this could also trigger glibc
confusion. This means we have to return -EPERM for certain syscalls
but not as a global default.
There is not an obvious (and reasonable) upper limit to syscall
numbers, so we cannot create a set of rules for each syscall above
the largest syscall number in libseccomp. This means we must handle
inverse rules as described below.
Any syscall can be specified multiple times, which can make
generation of hotfix rules much harder.
As a result, we have to work around all of these things by coming up
with a heuristic to stop the bleeding. In the future we could hopefully
improve the situation in the runtime-spec and libseccomp.
The solution is to switch the default action errno to -ENOSYS, and
simply pick a Linux kernel version which is treated as the universal
minimum version (Linux 3.0 was picked since basically no widely-used
distribution ships an older kernel, and it contains enough common
syscalls which would cause confusion if -ENOSYS was returned).
We then generate -EPERM rules for each syscall which:
Existed in Linux 3.0 (read: has a syscall number smaller than
"setns" on that platform) and does not have any explicit rules in
the configuration. These syscalls get an unconditional -EPERM,
because they are assumed to have been excluded intentionally.
Have an explicit conditional rule. These syscalls get a set of
-EPERM rules which invert the logic of the conditional, to ensure
that syscalls which are only partially permitted do not return
-ENOSYS.
This is trivial for all operations except SCMP_CMP_MASKED_EQ,
which has no inverse operatin. For MASKED_EQ we generate a series of
MASKED_EQ rules except for the value which was checked in the
original rule.
There are additional complications, which are all treated as simply
"too hard" and will thus fall back to -ENOSYS:
a. Any syscall specified in more than one seccomp rule is treated as
"too hard to handle" and will be ignored. This is because they
are logical OR rules but things become far too complicated if the
action is non-trivial.
b. Syscalls which have OR rules (the same argument specified
multiple times) because it would require AND support in
libseccomp, which doesn't exist.
c. Syscalls which do not have allow rules (ACT_LOG or ACT_ALLOW),
because other actions have different logical meanings to allow
rules and it'd be too complicated to handle those as well for a
hotfix.
None of the above happens if the default action is not SCMP_RET_ERRNO.
Fixes #2151
Signed-off-by: Aleksa Sarai [email protected]