diff --git a/content/en/docs/concepts/security/linux-kernel-security-constraints.md b/content/en/docs/concepts/security/linux-kernel-security-constraints.md index 9b494a5748994..884a832aee32c 100644 --- a/content/en/docs/concepts/security/linux-kernel-security-constraints.md +++ b/content/en/docs/concepts/security/linux-kernel-security-constraints.md @@ -90,7 +90,8 @@ profile to a more permissive profile. {{}} To learn how to implement seccomp in Kubernetes, refer to -[Restrict a Container's Syscalls with seccomp](/docs/tutorials/security/seccomp/). +[Restrict a Container's Syscalls with seccomp](/docs/tutorials/security/seccomp/) +or the [Seccomp node reference](/docs/reference/node/seccomp/). To learn more about seccomp, see [Seccomp BPF](https://www.kernel.org/doc/html/latest/userspace-api/seccomp_filter.html) @@ -288,3 +289,4 @@ of support that you need. For instructions, refer to * [Learn how to use AppArmor](/docs/tutorials/security/apparmor/) * [Learn how to use seccomp](/docs/tutorials/security/seccomp/) * [Learn how to use SELinux](/docs/tasks/configure-pod-container/security-context/#assign-selinux-labels-to-a-container) +* [Seccomp Node Reference](/docs/reference/node/seccomp/) diff --git a/content/en/docs/reference/node/_index.md b/content/en/docs/reference/node/_index.md index 64a41edb61176..144ae78bf9ad0 100644 --- a/content/en/docs/reference/node/_index.md +++ b/content/en/docs/reference/node/_index.md @@ -15,6 +15,8 @@ This section contains the following reference topics about nodes: * [Node `.status` information](/docs/reference/node/node-status/) +* [Seccomp information](/docs/reference/node/seccomp/) + You can also read node reference details from elsewhere in the Kubernetes documentation, including: diff --git a/content/en/docs/reference/node/seccomp.md b/content/en/docs/reference/node/seccomp.md new file mode 100644 index 0000000000000..f663c4b29e4d3 --- /dev/null +++ b/content/en/docs/reference/node/seccomp.md @@ -0,0 +1,132 @@ +--- 
+content_type: reference +title: Seccomp and Kubernetes +weight: 80 +--- + + + +Seccomp stands for secure computing mode and has been a feature of the Linux +kernel since version 2.6.12. It can be used to sandbox the privileges of a +process, restricting the calls it is able to make from userspace into the +kernel. Kubernetes lets you automatically apply seccomp profiles loaded onto a +{{< glossary_tooltip text="node" term_id="node" >}} to your Pods and containers. + +## Seccomp fields + +{{< feature-state for_k8s_version="v1.19" state="stable" >}} + +There are four ways to specify a seccomp profile for a +{{< glossary_tooltip text="pod" term_id="pod" >}}: + +- for the whole Pod using [`spec.securityContext.seccompProfile`](/docs/reference/kubernetes-api/workload-resources/pod-v1/#security-context) +- for a single container using [`spec.containers[*].securityContext.seccompProfile`](/docs/reference/kubernetes-api/workload-resources/pod-v1/#security-context-1) +- for a (restartable / sidecar) init container using [`spec.initContainers[*].securityContext.seccompProfile`](/docs/reference/kubernetes-api/workload-resources/pod-v1/#security-context-1) +- for an [ephemeral container](/docs/concepts/workloads/pods/ephemeral-containers) using [`spec.ephemeralContainers[*].securityContext.seccompProfile`](/docs/reference/kubernetes-api/workload-resources/pod-v1/#security-context-2) + +{{% code_sample file="pods/security/seccomp/fields.yaml" %}} + +The Pod in the example above runs as `Unconfined`, while the +`ephemeral-container` and `init-container` explicitly set +`RuntimeDefault`. If the ephemeral or init container did not set the +`securityContext.seccompProfile` field explicitly, then the value would be +inherited from the Pod. The same applies to the container, which runs a +`Localhost` profile `my-profile.json`. 
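The inheritance rule described above can be sketched in a few lines of Python. This is only an illustration of the precedence order, not how the kubelet or the container runtime implements it: the helper name `effective_seccomp_profile` and the plain-dict representation of the Pod spec are assumptions made for this example, and the sketch ignores other factors such as privileged containers.

```python
from typing import Optional


def effective_seccomp_profile(pod_spec: dict, container: dict) -> Optional[dict]:
    """Return the seccompProfile that applies to one container.

    Hypothetical helper: a container-level value takes priority,
    otherwise the Pod-level value is inherited. Returns None when
    neither level sets a profile.
    """
    container_profile = container.get("securityContext", {}).get("seccompProfile")
    if container_profile is not None:
        return container_profile
    return pod_spec.get("securityContext", {}).get("seccompProfile")


# Trimmed-down dict version of a Pod spec, loosely modeled on fields.yaml.
pod_spec = {
    "securityContext": {"seccompProfile": {"type": "Unconfined"}},
    "initContainers": [
        {
            "name": "init-container",
            "securityContext": {"seccompProfile": {"type": "RuntimeDefault"}},
        },
    ],
    "containers": [
        {"name": "container"},  # no seccompProfile set: inherits from the Pod
    ],
}

init_profile = effective_seccomp_profile(pod_spec, pod_spec["initContainers"][0])
main_profile = effective_seccomp_profile(pod_spec, pod_spec["containers"][0])

print(init_profile["type"])  # RuntimeDefault (set on the container)
print(main_profile["type"])  # Unconfined (inherited from the Pod)
```

The container-level lookup comes first because a container field always wins over the Pod-level value; the Pod-level value is only a fallback.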
+ +Generally speaking, fields from (ephemeral) containers have a higher priority +than the Pod-level value, while containers which do not set the seccomp field +inherit the profile from the Pod. + +{{< note >}} +It is not possible to apply a seccomp profile to a Pod or container running with +`privileged: true` set in the container's `securityContext`. Privileged +containers always run as `Unconfined`. +{{< /note >}} + +The following values are possible for the `seccompProfile.type`: + +`Unconfined` +: The workload runs without any seccomp restrictions. + +`RuntimeDefault` +: A default seccomp profile defined by the +{{< glossary_tooltip text="container runtime" term_id="container-runtime" >}} +is applied. The default profiles aim to provide a strong set of security +defaults while preserving the functionality of the workload. It is possible that +the default profiles differ between container runtimes and their release +versions, for example when comparing those from +{{< glossary_tooltip text="CRI-O" term_id="cri-o" >}} and +{{< glossary_tooltip text="containerd" term_id="containerd" >}}. + +`Localhost` +: The `localhostProfile` will be applied, which has to be available on the node +disk (on Linux it's `/var/lib/kubelet/seccomp`). The availability of the seccomp +profile is verified by the +{{< glossary_tooltip text="container runtime" term_id="container-runtime" >}} +on container creation. If the profile does not exist, then the container +creation will fail with a `CreateContainerError`. + +### `Localhost` profiles + +Seccomp profiles are JSON files following the schema defined by the +[OCI runtime specification](https://github.com/opencontainers/runtime-spec/blob/f329913/config-linux.md#seccomp). +A profile basically defines actions based on matched syscalls, but also allows +matching on specific values passed as arguments to syscalls. 
For example: + +{{% code_sample file="pods/security/seccomp/profile.json" %}} + +The `defaultAction` in the profile above is defined as `SCMP_ACT_ERRNO` and +is returned for every syscall that does not match `syscalls.names`. The error is +defined as code `38` via the `defaultErrnoRet` field. + +The following actions are generally possible: + +`SCMP_ACT_ERRNO` +: Return the specified error code. + +`SCMP_ACT_ALLOW` +: Allow the syscall to be executed. + +`SCMP_ACT_KILL_PROCESS` +: Kill the process. + +`SCMP_ACT_KILL_THREAD` and `SCMP_ACT_KILL` +: Kill only the thread. + +`SCMP_ACT_TRAP` +: Send a `SIGSYS` signal. + +`SCMP_ACT_NOTIFY` and `SECCOMP_RET_USER_NOTIF` +: Notify user space. + +`SCMP_ACT_TRACE` +: Notify a tracing process with the specified value. + +`SCMP_ACT_LOG` +: Allow the syscall to be executed after the action has been logged to syslog or +auditd. + +Some actions like `SCMP_ACT_NOTIFY` or `SECCOMP_RET_USER_NOTIF` may not be +supported depending on the container runtime, OCI runtime or Linux kernel +version being used. There may also be further limitations, for example that +`SCMP_ACT_NOTIFY` cannot be used as `defaultAction` or for certain syscalls like +`write`. All those limitations are defined by either the OCI runtime +([runc](https://github.com/opencontainers/runc), +[crun](https://github.com/containers/crun)) or +[libseccomp](https://github.com/seccomp/libseccomp). + +The `syscalls` JSON array contains a list of objects referencing syscalls by +their respective `names`. In the example above, the listed syscalls are +allowed using the action `SCMP_ACT_ALLOW`. It would also be possible to +define another list using the action `SCMP_ACT_ERRNO` but with a different return +(`errnoRet`) value. + +It is also possible to specify the arguments (`args`) passed to certain +syscalls. 
More information about those advanced use cases can be found in the +[OCI runtime spec](https://github.com/opencontainers/runtime-spec/blob/f329913/config-linux.md#seccomp) +and the [Seccomp Linux kernel documentation](https://www.kernel.org/doc/Documentation/prctl/seccomp_filter.txt). + +## Further reading + +- [Restrict a Container's Syscalls with seccomp](/docs/tutorials/security/seccomp/) +- [Pod Security Standards](/docs/concepts/security/pod-security-standards/) diff --git a/content/en/docs/tasks/administer-cluster/securing-a-cluster.md b/content/en/docs/tasks/administer-cluster/securing-a-cluster.md index 2d319611a2a5e..7e286c3ebf30e 100644 --- a/content/en/docs/tasks/administer-cluster/securing-a-cluster.md +++ b/content/en/docs/tasks/administer-cluster/securing-a-cluster.md @@ -275,3 +275,4 @@ page for more on how to report vulnerabilities. ## What's next - [Security Checklist](/docs/concepts/security/security-checklist/) for additional information on Kubernetes security guidance. 
+- [Seccomp Node Reference](/docs/reference/node/seccomp/) diff --git a/content/en/examples/pods/security/seccomp/fields.yaml b/content/en/examples/pods/security/seccomp/fields.yaml new file mode 100644 index 0000000000000..6fb6b3fcd1a69 --- /dev/null +++ b/content/en/examples/pods/security/seccomp/fields.yaml @@ -0,0 +1,27 @@ +apiVersion: v1 +kind: Pod +metadata: + name: pod +spec: + securityContext: + seccompProfile: + type: Unconfined + ephemeralContainers: + - name: ephemeral-container + image: debian + securityContext: + seccompProfile: + type: RuntimeDefault + initContainers: + - name: init-container + image: debian + securityContext: + seccompProfile: + type: RuntimeDefault + containers: + - name: container + image: docker.io/library/debian:stable + securityContext: + seccompProfile: + type: Localhost + localhostProfile: my-profile.json diff --git a/content/en/examples/pods/security/seccomp/profile.json b/content/en/examples/pods/security/seccomp/profile.json new file mode 100644 index 0000000000000..8f2b56c4950d8 --- /dev/null +++ b/content/en/examples/pods/security/seccomp/profile.json @@ -0,0 +1,18 @@ +{ + "defaultAction": "SCMP_ACT_ERRNO", + "defaultErrnoRet": 38, + "syscalls": [ + { + "names": [ + "adjtimex", + "alarm", + "bind", + "waitid", + "waitpid", + "write", + "writev" + ], + "action": "SCMP_ACT_ALLOW" + } + ] +}
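To make the relationship between `defaultAction`, `defaultErrnoRet` and the `syscalls` list in `profile.json` more concrete, here is a minimal Python sketch that looks up which action a profile would select for a syscall name. It is a deliberate simplification and not how libseccomp or an OCI runtime evaluates filters: the helper `action_for` is hypothetical, it only matches on `names`, and it ignores `args`, architectures and any runtime-specific limitations. The embedded profile is a shortened copy of the example file.

```python
import json
from typing import Optional, Tuple

# Shortened copy of profile.json, embedded so the sketch is self-contained.
PROFILE_JSON = """
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "defaultErrnoRet": 38,
  "syscalls": [
    {
      "names": ["bind", "write", "writev"],
      "action": "SCMP_ACT_ALLOW"
    }
  ]
}
"""


def action_for(profile: dict, syscall: str) -> Tuple[str, Optional[int]]:
    """Return (action, errno) for a syscall name.

    Hypothetical helper: the first rule whose `names` contains the
    syscall wins; otherwise the profile-wide default applies.
    """
    for rule in profile.get("syscalls", []):
        if syscall in rule.get("names", []):
            return rule["action"], rule.get("errnoRet")
    return profile["defaultAction"], profile.get("defaultErrnoRet")


profile = json.loads(PROFILE_JSON)

print(action_for(profile, "write"))  # ('SCMP_ACT_ALLOW', None)
print(action_for(profile, "mount"))  # ('SCMP_ACT_ERRNO', 38)
```

A syscall that is not listed, such as `mount` here, falls through to the default action and receives error code `38`.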