Add dedicated seccomp node reference
Signed-off-by: Sascha Grunert <[email protected]>
saschagrunert committed Sep 19, 2024
1 parent 56e2fb1 commit f5e1026
Showing 6 changed files with 170 additions and 1 deletion.
@@ -90,7 +90,8 @@ profile to a more permissive profile.
{{</note>}}

To learn how to implement seccomp in Kubernetes, refer to
[Restrict a Container's Syscalls with seccomp](/docs/tutorials/security/seccomp/).
[Restrict a Container's Syscalls with seccomp](/docs/tutorials/security/seccomp/)
or the [Seccomp node reference](/docs/reference/node/seccomp/).

To learn more about seccomp, see
[Seccomp BPF](https://www.kernel.org/doc/html/latest/userspace-api/seccomp_filter.html)
@@ -288,3 +289,4 @@ of support that you need. For instructions, refer to
* [Learn how to use AppArmor](/docs/tutorials/security/apparmor/)
* [Learn how to use seccomp](/docs/tutorials/security/seccomp/)
* [Learn how to use SELinux](/docs/tasks/configure-pod-container/security-context/#assign-selinux-labels-to-a-container)
* [Seccomp Node Reference](/docs/reference/node/seccomp/)
2 changes: 2 additions & 0 deletions content/en/docs/reference/node/_index.md
@@ -15,6 +15,8 @@ This section contains the following reference topics about nodes:

* [Node `.status` information](/docs/reference/node/node-status/)

* [Seccomp information](/docs/reference/node/seccomp/)

You can also read node reference details from elsewhere in the
Kubernetes documentation, including:

119 changes: 119 additions & 0 deletions content/en/docs/reference/node/seccomp.md
@@ -0,0 +1,119 @@
---
content_type: reference
title: Seccomp and Kubernetes
weight: 80
---

<!-- overview -->

Seccomp stands for secure computing mode and has been a feature of the Linux
kernel since version 2.6.12. It can be used to sandbox the privileges of a
process, restricting the calls it is able to make from userspace into the
kernel. Kubernetes lets you automatically apply seccomp profiles loaded onto a
{{< glossary_tooltip text="node" term_id="node" >}} to your Pods and containers.

## Seccomp fields

{{< feature-state for_k8s_version="v1.19" state="stable" >}}

There are four ways to specify a seccomp profile for a
{{< glossary_tooltip text="pod" term_id="pod" >}}:

- for the whole Pod using [`spec.securityContext.seccompProfile`](/docs/reference/kubernetes-api/workload-resources/pod-v1/#security-context)
- for a single container using [`spec.containers[*].securityContext.seccompProfile`](/docs/reference/kubernetes-api/workload-resources/pod-v1/#security-context-1)
- for a (restartable / sidecar) init container using [`spec.initContainers[*].securityContext.seccompProfile`](/docs/reference/kubernetes-api/workload-resources/pod-v1/#security-context-1)
- for an [ephemeral container](/docs/concepts/workloads/pods/ephemeral-containers) using [`spec.ephemeralContainers[*].securityContext.seccompProfile`](/docs/reference/kubernetes-api/workload-resources/pod-v1/#security-context-2)

{{% code_sample file="pods/security/seccomp/fields.yaml" %}}

The Pod in the example above runs as `Unconfined`, while the
`ephemeral-container` and `init-container` explicitly define
`RuntimeDefault`. If the ephemeral or init container did not set the
`securityContext.seccompProfile` field explicitly, the value would be
inherited from the Pod. The same applies to the main container, which overrides
the Pod-level value with the `Localhost` profile `my-profile.json`.

Generally speaking, fields set on (ephemeral) containers take priority over the
Pod-level value, while containers that do not set the seccomp field inherit the
profile from the Pod.

{{< note >}}
It is not possible to apply a seccomp profile to a Pod or container running with
`privileged: true` set in the container's `securityContext`. Privileged
containers always run as `Unconfined`.
{{< /note >}}
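
The precedence rules above can be sketched as a small function. This is an
illustrative model only, not kubelet code: the helper name
`effective_seccomp_profile` and its dictionary-based arguments are assumptions
made for this example.

```python
from typing import Optional


def effective_seccomp_profile(
    pod_profile: Optional[dict],
    container_profile: Optional[dict],
    privileged: bool = False,
) -> Optional[dict]:
    """Model which seccompProfile a container ends up with (hypothetical helper)."""
    # Privileged containers always run as Unconfined.
    if privileged:
        return {"type": "Unconfined"}
    # A field set on the container takes priority over the Pod-level value.
    if container_profile is not None:
        return container_profile
    # Otherwise the container inherits the Pod-level value. None means that
    # neither level set the field; what the node does then is outside this sketch.
    return pod_profile


pod_level = {"type": "Unconfined"}
print(effective_seccomp_profile(pod_level, {"type": "RuntimeDefault"}))
print(effective_seccomp_profile(pod_level, None))
```

With the values from the example manifest, the first call yields the
container's own `RuntimeDefault` setting and the second falls back to the
Pod-level `Unconfined`.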

The following values are possible for the `seccompProfile.type`:

`Unconfined`
: The workload runs without any seccomp restrictions.

`RuntimeDefault`
: A default seccomp profile defined by the {{< glossary_tooltip text="container runtime" term_id="container-runtime" >}} is applied.

`Localhost`
: The profile referenced by `localhostProfile` is applied. The file has to be available on the node's disk (on Linux, under `/var/lib/kubelet/seccomp`).
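
As a rough illustration of how a `localhostProfile` value relates to a file on
the node, the sketch below joins it onto the directory mentioned above. The
helper name `localhost_profile_path` is hypothetical, and the checks for
absolute paths and `..` segments are this example's own guard rather than a
statement about what the API accepts.

```python
import posixpath

# Assumption: the Linux directory named in this document. The location may
# differ on nodes where the kubelet is configured differently.
SECCOMP_ROOT = "/var/lib/kubelet/seccomp"


def localhost_profile_path(localhost_profile: str, root: str = SECCOMP_ROOT) -> str:
    """Return the node path a localhostProfile value would point to (sketch)."""
    # Guard of this sketch: keep the result below the seccomp root directory.
    if posixpath.isabs(localhost_profile) or ".." in localhost_profile.split("/"):
        raise ValueError("expected a path relative to the seccomp root")
    return posixpath.join(root, localhost_profile)


print(localhost_profile_path("my-profile.json"))
```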

### `Localhost` profiles

Seccomp profiles are JSON files following the schema defined by the
[OCI runtime specification](https://github.com/opencontainers/runtime-spec/blob/f329913/config-linux.md#seccomp).
A profile defines the actions to take for matched syscalls, and can also match
on specific values passed as syscall arguments. For example:

{{% code_sample file="pods/security/seccomp/profile.json" %}}

The `defaultAction` in the profile above is defined as `SCMP_ACT_ERRNO`. It
applies to every syscall that does not match an entry in `syscalls.names`. The
returned error code is set to `38` via the `defaultErrnoRet` field.
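
To make the lookup concrete, here is a minimal sketch of how such a profile
maps a syscall name to an action. It is a simplified model, not how the OCI
runtime or libseccomp evaluates filters: it ignores `args`, architectures and
flags, and the helper name `action_for` is hypothetical. The profile is a
shortened copy of the example above.

```python
# Shortened copy of the example profile, written as a Python dict.
PROFILE = {
    "defaultAction": "SCMP_ACT_ERRNO",
    "defaultErrnoRet": 38,
    "syscalls": [
        {
            "names": ["bind", "waitid", "write", "writev"],
            "action": "SCMP_ACT_ALLOW",
        }
    ],
}


def action_for(profile: dict, syscall: str) -> tuple:
    """Return (action, errno) for a syscall name (simplified, hypothetical helper)."""
    for rule in profile.get("syscalls", []):
        if syscall in rule.get("names", []):
            # A matching rule supplies its own action and optional errnoRet.
            return rule["action"], rule.get("errnoRet")
    # No rule matched: fall back to the profile-wide default.
    return profile["defaultAction"], profile.get("defaultErrnoRet")


print(action_for(PROFILE, "write"))  # a listed syscall
print(action_for(PROFILE, "mkdir"))  # not listed, so the default applies
```

On x86-64 Linux, error code `38` corresponds to `ENOSYS`, so a process calling
a syscall outside the list sees it as "function not implemented".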

The following actions are generally possible:

`SCMP_ACT_ERRNO`
: Return the specified error code.

`SCMP_ACT_ALLOW`
: Allow the syscall to be executed.

`SCMP_ACT_KILL_PROCESS`
: Kill the process.

`SCMP_ACT_KILL_THREAD` and `SCMP_ACT_KILL`
: Kill only the thread.

`SCMP_ACT_TRAP`
: Throw a `SIGSYS` signal.

`SCMP_ACT_NOTIFY` and `SECCOMP_RET_USER_NOTIF`
: Notify user space.

`SCMP_ACT_TRACE`
: Notify a tracing process with the specified value.

`SCMP_ACT_LOG`
: Allow the syscall to be executed after the action has been logged to syslog or auditd.

Some actions, like `SCMP_ACT_NOTIFY` or `SECCOMP_RET_USER_NOTIF`, may not be
supported depending on the container runtime, OCI runtime or Linux kernel
version being used. There may also be further limitations, for example that
`SCMP_ACT_NOTIFY` cannot be used as `defaultAction` or for certain syscalls like
`write`. All of those limitations are defined by either the OCI runtime
([runc](https://github.com/opencontainers/runc),
[crun](https://github.com/containers/crun)) or
[libseccomp](https://github.com/seccomp/libseccomp).

The `syscalls` JSON array contains a list of objects referencing syscalls by
their respective `names`. In the example above, the listed syscalls are allowed
through the action `SCMP_ACT_ALLOW`. It would also be possible to define
another list using the action `SCMP_ACT_ERRNO` with a different return value
(`errnoRet`).

It is also possible to specify the arguments (`args`) passed to certain
syscalls. More information about those advanced use cases can be found in the
[OCI runtime spec](https://github.com/opencontainers/runtime-spec/blob/f329913/config-linux.md#seccomp)
and the [Seccomp Linux kernel documentation](https://www.kernel.org/doc/Documentation/prctl/seccomp_filter.txt).
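
As a hedged illustration of argument matching, the sketch below checks a list
of `args` conditions against the argument values of a call. The field names
`index`, `value` and `op` and the comparison operator names follow the OCI
runtime spec; the rule itself, the helper `args_match`, and the choice to
require every condition to hold are assumptions made for this example. The
masked comparison operator is left out on purpose.

```python
import operator

# Simple comparison operators from the OCI runtime spec, mapped to Python.
_OPS = {
    "SCMP_CMP_NE": operator.ne,
    "SCMP_CMP_LT": operator.lt,
    "SCMP_CMP_LE": operator.le,
    "SCMP_CMP_EQ": operator.eq,
    "SCMP_CMP_GE": operator.ge,
    "SCMP_CMP_GT": operator.gt,
}

# Hypothetical rule: allow `personality` only when its first argument is 0.
RULE = {
    "names": ["personality"],
    "action": "SCMP_ACT_ALLOW",
    "args": [{"index": 0, "value": 0, "op": "SCMP_CMP_EQ"}],
}


def args_match(conditions: list, syscall_args: list) -> bool:
    """Return True if every condition holds for the given argument values."""
    return all(
        _OPS[cond["op"]](syscall_args[cond["index"]], cond["value"])
        for cond in conditions
    )


print(args_match(RULE["args"], [0]))  # first argument is 0
print(args_match(RULE["args"], [8]))  # first argument differs
```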

## Further reading

- [Restrict a Container's Syscalls with seccomp](/docs/tutorials/security/seccomp/)
- [Pod Security Standards](/docs/concepts/security/pod-security-standards/)
@@ -275,3 +275,4 @@ page for more on how to report vulnerabilities.
## What's next

- [Security Checklist](/docs/concepts/security/security-checklist/) for additional information on Kubernetes security guidance.
- [Seccomp Node Reference](/docs/reference/node/seccomp/)
27 changes: 27 additions & 0 deletions content/en/examples/pods/security/seccomp/fields.yaml
@@ -0,0 +1,27 @@
apiVersion: v1
kind: Pod
metadata:
  name: pod
spec:
  securityContext:
    seccompProfile:
      type: Unconfined
  ephemeralContainers:
  - name: ephemeral-container
    image: debian
    securityContext:
      seccompProfile:
        type: RuntimeDefault
  initContainers:
  - name: init-container
    image: debian
    securityContext:
      seccompProfile:
        type: RuntimeDefault
  containers:
  - name: container
    image: docker.io/library/debian:stable
    securityContext:
      seccompProfile:
        type: Localhost
        localhostProfile: my-profile.json
18 changes: 18 additions & 0 deletions content/en/examples/pods/security/seccomp/profile.json
@@ -0,0 +1,18 @@
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "defaultErrnoRet": 38,
  "syscalls": [
    {
      "names": [
        "adjtimex",
        "alarm",
        "bind",
        "waitid",
        "waitpid",
        "write",
        "writev"
      ],
      "action": "SCMP_ACT_ALLOW"
    }
  ]
}
