Commit
Add dedicated seccomp node reference
Signed-off-by: Sascha Grunert <[email protected]>
1 parent
56e2fb1
commit c8009a6
Showing 6 changed files with 183 additions and 1 deletion.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,132 @@ | ||
--- | ||
content_type: reference | ||
title: Seccomp and Kubernetes | ||
weight: 80 | ||
--- | ||
|
||
<!-- overview --> | ||
|
||
Seccomp stands for secure computing mode and has been a feature of the Linux | ||
kernel since version 2.6.12. It can be used to sandbox the privileges of a | ||
process, restricting the calls it is able to make from userspace into the | ||
kernel. Kubernetes lets you automatically apply seccomp profiles loaded onto a | ||
{{< glossary_tooltip text="node" term_id="node" >}} to your Pods and containers. | ||
|
||
## Seccomp fields | ||
|
||
{{< feature-state for_k8s_version="v1.19" state="stable" >}} | ||
|
||
There are four ways to specify a seccomp profile for a | ||
{{< glossary_tooltip text="pod" term_id="pod" >}}: | ||
|
||
- for the whole Pod using [`spec.securityContext.seccompProfile`](/docs/reference/kubernetes-api/workload-resources/pod-v1/#security-context) | ||
- for a single container using [`spec.containers[*].securityContext.seccompProfile`](/docs/reference/kubernetes-api/workload-resources/pod-v1/#security-context-1) | ||
- for an init container (including restartable sidecar containers) using [`spec.initContainers[*].securityContext.seccompProfile`](/docs/reference/kubernetes-api/workload-resources/pod-v1/#security-context-1) | ||
- for an [ephemeral container](/docs/concepts/workloads/pods/ephemeral-containers) using [`spec.ephemeralContainers[*].securityContext.seccompProfile`](/docs/reference/kubernetes-api/workload-resources/pod-v1/#security-context-2) | ||
|
||
{{% code_sample file="pods/security/seccomp/fields.yaml" %}} | ||
|
||
The Pod in the example above runs as `Unconfined`, while the | ||
`ephemeral-container` and `init-container` explicitly define | ||
`RuntimeDefault`. If the ephemeral or init container did not set the | ||
`securityContext.seccompProfile` field explicitly, the value would be | ||
inherited from the Pod. The same applies to the container, which runs the | ||
`Localhost` profile `my-profile.json`. | ||
|
||
Generally speaking, fields from (ephemeral) containers have a higher priority | ||
than the Pod-level value, while containers that do not set the seccomp field | ||
inherit the profile from the Pod. | ||
|
||
{{< note >}} | ||
It is not possible to apply a seccomp profile to a Pod or container running with | ||
`privileged: true` set in the container's `securityContext`. Privileged | ||
containers always run as `Unconfined`. | ||
{{< /note >}} | ||
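The precedence rules described above, including the note on privileged containers, can be sketched in a few lines of Python. This is only an illustration of the documented behavior, not the kubelet's implementation: the function name `effective_seccomp_profile` and the extra `no-profile` container are hypothetical, and the Pod spec is represented as a plain dictionary.

```python
from typing import Optional


def effective_seccomp_profile(pod_spec: dict, container: dict) -> Optional[dict]:
    """Return the seccomp profile that applies to one container of a Pod."""
    ctx = container.get("securityContext") or {}
    # Privileged containers always run as Unconfined.
    if ctx.get("privileged"):
        return {"type": "Unconfined"}
    # A container-level value takes priority over the Pod-level value.
    if ctx.get("seccompProfile") is not None:
        return ctx["seccompProfile"]
    # Otherwise the Pod-level value (if any) is inherited.
    return (pod_spec.get("securityContext") or {}).get("seccompProfile")


pod_spec = {
    "securityContext": {"seccompProfile": {"type": "Unconfined"}},
    "initContainers": [
        {"name": "init-container",
         "securityContext": {"seccompProfile": {"type": "RuntimeDefault"}}},
    ],
    "containers": [
        {"name": "container",
         "securityContext": {"seccompProfile": {
             "type": "Localhost", "localhostProfile": "my-profile.json"}}},
        # Hypothetical extra container that sets no seccomp field of its own.
        {"name": "no-profile"},
    ],
}

for kind in ("initContainers", "containers"):
    for c in pod_spec[kind]:
        print(c["name"], "->", effective_seccomp_profile(pod_spec, c))
```

A return value of `None` in this sketch means that neither the container nor the Pod requested a profile.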
|
||
The following values are possible for the `seccompProfile.type`: | ||
|
||
`Unconfined` | ||
: The workload runs without any seccomp restrictions. | ||
|
||
`RuntimeDefault` | ||
: A default seccomp profile defined by the | ||
{{< glossary_tooltip text="container runtime" term_id="container-runtime" >}} | ||
is applied. The default profiles aim to provide a strong set of security | ||
defaults while preserving the functionality of the workload. It is possible that | ||
the default profiles differ between container runtimes and their release | ||
versions, for example when comparing those from | ||
{{< glossary_tooltip text="CRI-O" term_id="cri-o" >}} and | ||
{{< glossary_tooltip text="containerd" term_id="containerd" >}}. | ||
|
||
`Localhost` | ||
: The `localhostProfile` will be applied, which has to be available on the node | ||
disk (on Linux it's `/var/lib/kubelet/seccomp`). The availability of the seccomp | ||
profile is verified by the | ||
{{< glossary_tooltip text="container runtime" term_id="container-runtime" >}} | ||
on container creation. If the profile does not exist, then the container | ||
creation will fail with a `CreateContainerError`. | ||
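As a rough illustration of how a `localhostProfile` value relates to a file on the node, the following Python sketch joins the profile name with the Linux directory mentioned above and performs a minimal sanity check on a profile's contents. The helper names are hypothetical, and the check (valid JSON object with a `defaultAction` key) is an assumption made for this example; it is not the validation performed by any container runtime.

```python
import json
from pathlib import PurePosixPath

# Directory named in the text above for Linux nodes.
SECCOMP_ROOT = PurePosixPath("/var/lib/kubelet/seccomp")


def localhost_profile_path(localhost_profile: str) -> PurePosixPath:
    """Build the on-node path for a relative `localhostProfile` value."""
    return SECCOMP_ROOT / localhost_profile


def looks_like_profile(text: str) -> bool:
    """Minimal check: the text is a JSON object with a `defaultAction` key."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and "defaultAction" in data


print(localhost_profile_path("my-profile.json"))
print(looks_like_profile('{"defaultAction": "SCMP_ACT_ERRNO"}'))
```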
|
||
### `Localhost` profiles | ||
|
||
Seccomp profiles are JSON files following the schema defined by the | ||
[OCI runtime specification](https://github.com/opencontainers/runtime-spec/blob/f329913/config-linux.md#seccomp). | ||
A profile defines actions based on matched syscalls, and can also match on | ||
specific values passed as arguments to those syscalls. For example: | ||
|
||
{{% code_sample file="pods/security/seccomp/profile.json" %}} | ||
|
||
The `defaultAction` in the profile above is defined as `SCMP_ACT_ERRNO` and | ||
is applied to every syscall that is not listed in `syscalls.names`. The returned error is | ||
defined as code `38` via the `defaultErrnoRet` field. | ||
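To make the fallback to `defaultAction` concrete, here is a small Python sketch that looks up which action the example profile would select for a given syscall name. It is a deliberate simplification: it ignores `args` and every other optional field, it assumes the first matching entry wins, and the function name `action_for` is hypothetical. Real filtering happens in the kernel, not in code like this.

```python
import json

# The example profile from this page, embedded as a string.
PROFILE = json.loads("""
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "defaultErrnoRet": 38,
  "syscalls": [
    {
      "names": ["adjtimex", "alarm", "bind", "waitid",
                "waitpid", "write", "writev"],
      "action": "SCMP_ACT_ALLOW"
    }
  ]
}
""")


def action_for(profile: dict, syscall: str):
    """Return (action, errno) for a syscall name; errno may be None."""
    for rule in profile.get("syscalls", []):
        if syscall in rule.get("names", []):
            return rule["action"], rule.get("errnoRet")
    # No entry matched: fall back to the profile-wide default.
    return profile["defaultAction"], profile.get("defaultErrnoRet")


for name in ("write", "openat"):
    print(name, "->", action_for(PROFILE, name))
```

Here `write` is listed and resolves to `SCMP_ACT_ALLOW`, while `openat` is not listed and falls back to `SCMP_ACT_ERRNO` with code `38`.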
|
||
The following actions are generally possible: | ||
|
||
`SCMP_ACT_ERRNO` | ||
: Return the specified error code. | ||
|
||
`SCMP_ACT_ALLOW` | ||
: Allow the syscall to be executed. | ||
|
||
`SCMP_ACT_KILL_PROCESS` | ||
: Kill the process. | ||
|
||
`SCMP_ACT_KILL_THREAD` and `SCMP_ACT_KILL` | ||
: Kill only the thread. | ||
|
||
`SCMP_ACT_TRAP` | ||
: Throw a `SIGSYS` signal. | ||
|
||
`SCMP_ACT_NOTIFY` and `SECCOMP_RET_USER_NOTIF` | ||
: Notify user space. | ||
|
||
`SCMP_ACT_TRACE` | ||
: Notify a tracing process with the specified value. | ||
|
||
`SCMP_ACT_LOG` | ||
: Allow the syscall to be executed after the action has been logged to syslog or | ||
auditd. | ||
|
||
Some actions like `SCMP_ACT_NOTIFY` or `SECCOMP_RET_USER_NOTIF` may not be | ||
supported depending on the container runtime, OCI runtime or Linux kernel | ||
version being used. There may also be further limitations, for example that | ||
`SCMP_ACT_NOTIFY` cannot be used as `defaultAction` or for certain syscalls like | ||
`write`. All those limitations are defined by either the OCI runtime | ||
([runc](https://github.com/opencontainers/runc), | ||
[crun](https://github.com/containers/crun)) or | ||
[libseccomp](https://github.com/seccomp/libseccomp). | ||
|
||
The `syscalls` JSON array contains a list of objects referencing syscalls by | ||
their respective `names`. In the example above, the listed syscalls are | ||
allowed through the action `SCMP_ACT_ALLOW`. It would also be possible to | ||
define another list using the action `SCMP_ACT_ERRNO` but with a different return | ||
(`errnoRet`) value. | ||
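A profile with such a second list could look like the one generated by the following Python sketch. The additional entry is purely illustrative: the syscall names `mount` and `umount2` and the `errnoRet` value `1` are assumptions chosen for this example and are not part of the sample profile on this page.

```python
import json

profile = {
    "defaultAction": "SCMP_ACT_ERRNO",
    "defaultErrnoRet": 38,
    "syscalls": [
        {
            # Allow list taken from the example profile.
            "names": ["adjtimex", "alarm", "bind", "waitid",
                      "waitpid", "write", "writev"],
            "action": "SCMP_ACT_ALLOW",
        },
        {
            # Hypothetical second list: same action as the default,
            # but with a different error code for these syscalls.
            "names": ["mount", "umount2"],
            "action": "SCMP_ACT_ERRNO",
            "errnoRet": 1,
        },
    ],
}

profile_json = json.dumps(profile, indent=2)
print(profile_json)
```

Generating the JSON from a data structure rather than editing it by hand makes it harder to produce a syntactically invalid profile, though whether the result is accepted still depends on the runtime in use.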
|
||
It is also possible to specify the arguments (`args`) passed to certain | ||
syscalls. More information about those advanced use cases can be found in the | ||
[OCI runtime spec](https://github.com/opencontainers/runtime-spec/blob/f329913/config-linux.md#seccomp) | ||
and the [Seccomp Linux kernel documentation](https://www.kernel.org/doc/Documentation/prctl/seccomp_filter.txt). | ||
|
||
## Further reading | ||
|
||
- [Restrict a Container's Syscalls with seccomp](/docs/tutorials/security/seccomp/) | ||
- [Pod Security Standards](/docs/concepts/security/pod-security-standards/) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,27 @@ | ||
apiVersion: v1 | ||
kind: Pod | ||
metadata: | ||
  name: pod | ||
spec: | ||
  securityContext: | ||
    seccompProfile: | ||
      type: Unconfined | ||
  ephemeralContainers: | ||
  - name: ephemeral-container | ||
    image: debian | ||
    securityContext: | ||
      seccompProfile: | ||
        type: RuntimeDefault | ||
  initContainers: | ||
  - name: init-container | ||
    image: debian | ||
    securityContext: | ||
      seccompProfile: | ||
        type: RuntimeDefault | ||
  containers: | ||
  - name: container | ||
    image: docker.io/library/debian:stable | ||
    securityContext: | ||
      seccompProfile: | ||
        type: Localhost | ||
        localhostProfile: my-profile.json |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,18 @@ | ||
{ | ||
  "defaultAction": "SCMP_ACT_ERRNO", | ||
  "defaultErrnoRet": 38, | ||
  "syscalls": [ | ||
    { | ||
      "names": [ | ||
        "adjtimex", | ||
        "alarm", | ||
        "bind", | ||
        "waitid", | ||
        "waitpid", | ||
        "write", | ||
        "writev" | ||
      ], | ||
      "action": "SCMP_ACT_ALLOW" | ||
    } | ||
  ] | ||
} |