forked from containers/bootc
docs: Add a new "build guidance" section
I originally was thinking these docs needed to live in downstream places but...it will be really helpful to us to have generic recommended guidance here.

Signed-off-by: Colin Walters <[email protected]>
Showing 3 changed files with 289 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,90 @@ | ||
# Generic guidance for building images | ||
|
||
The bootc project intends to be as operating system and distribution independent as possible,
similar to its related projects [podman](http://podman.io/) and [systemd](https://systemd.io/).
|
||
The recommendations for creating bootc-compatible images will in general need to | ||
be owned by the OS/distribution - in particular the ones who create the default | ||
bootc base image(s). However, some guidance is very generic to most Linux | ||
systems (and bootc only supports Linux). | ||
|
||
Let's however restate a base goal of this project: | ||
|
||
> The original Docker container model of using "layers" to model | ||
> applications has been extremely successful. This project | ||
> aims to apply the same technique for bootable host systems - using | ||
> standard OCI/Docker containers as a transport and delivery format | ||
> for base operating system updates.

Every tool and technique for creating application base images
should apply to the host Linux OS as much as possible.
|
||
## Installing software | ||
|
||
For package management tools like `apt`, `dnf`, `zypper` etc. | ||
(generically, `$pkgsystem`) it is very much expected that | ||
the pattern of | ||
|
||
`RUN $pkgsystem install somepackage && $pkgsystem clean all` | ||
|
||
type flow Just Works here - the same way as it does for
"application" container images. This pattern is really how
Docker got started.
|
||
There's not much special to this that doesn't also apply | ||
to application containers; but see below. | ||
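As a minimal sketch of that pattern in a derived build: the base image reference and the `tmux` package below are illustrative assumptions, not something this project mandates; substitute whatever your OS/distribution provides.

```dockerfile
# Assumption: any bootc-compatible base image works here; this reference is illustrative.
FROM quay.io/centos-bootc/centos-bootc:stream9

# Install and clean up in a single RUN so the package cache
# does not end up stored in an image layer.
# `tmux` is only a placeholder package name.
RUN dnf -y install tmux && dnf clean all
```

On a base image using a different `$pkgsystem`, the same shape applies with that tool's install and cleanup commands.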
|
||
## systemd units | ||
|
||
The model that is most popular with the Docker/OCI world | ||
is "microservice" style containers with the application as | ||
pid 1, isolating the applications from each other and | ||
from the host system - as opposed to "system containers" | ||
which run an init system like systemd, typically also | ||
SSH and often multiple logical "application" components | ||
as part of the same container. | ||
|
||
The bootc project generally expects systemd as pid 1,
and if you embed software in your derived image, the
default expectation is that the software is
launched via a systemd unit.
|
||
```dockerfile
RUN dnf -y install postgresql
```
|
||
A package installed this way would typically also carry a systemd unit, and that
service will be launched the same way as it would be
on a package-based system.
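For software that does not come from a package, a rough sketch of the same idea follows. The `myapp` binary and `myapp.service` unit are hypothetical files assumed to exist in the build context, and the base image reference is an assumption as well.

```dockerfile
# Assumption: illustrative bootc-compatible base image.
FROM quay.io/centos-bootc/centos-bootc:stream9

# Hypothetical application binary and unit file from the build context.
COPY myapp /usr/bin/myapp
COPY myapp.service /usr/lib/systemd/system/myapp.service

# Enabling a unit at build time only creates symlinks, so no running
# systemd is needed. The unit must have an [Install] section for this to work.
RUN systemctl enable myapp.service
```

The unit then starts on boot of the deployed system, just as a packaged service would.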
|
||
## Users and groups | ||
|
||
Note that installing `postgresql` as above will today allocate a user;
this leads to the topic of [users, groups and SSH keys](users-and-groups.md).
|
||
## Configuration | ||
|
||
A key aspect of choosing a bootc-based operating system model | ||
is that *code* and *configuration* can be strictly "lifecycle bound" | ||
together in exactly the same way. | ||
|
||
(Today, that's by including the configuration in the base
container image; however, a future enhancement for bootc
will also support dynamically-injected ConfigMaps, similar
to kubelet.)
|
||
You can add configuration files to the same places they're
expected by typical package systems on Debian/Fedora/Arch
and others - in `/usr` (preferred where possible)
or `/etc`. systemd has long advocated and supported
a model where `/usr` (e.g. `/usr/lib/systemd/system`)
contains content owned by the operating system image.
|
||
`/etc` is machine-local state. However, per [filesystem.md](../filesystem.md) | ||
it's important to note that the underlying OSTree | ||
system performs a 3-way merge of `/etc`, so changes you | ||
make in the container image to e.g. `/etc/postgresql.conf` | ||
will be applied on update, assuming the file has not been modified
locally.
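A small sketch of both locations, under stated assumptions: `50-myapp.conf` and `myapp.conf` are hypothetical files in the build context, and `/usr/lib/sysctl.d` is used only as one example of a component that reads image-owned configuration from `/usr`.

```dockerfile
# Assumption: illustrative bootc-compatible base image.
FROM quay.io/centos-bootc/centos-bootc:stream9

# Preferred: image-owned configuration under /usr, for components that support it.
COPY 50-myapp.conf /usr/lib/sysctl.d/50-myapp.conf

# Otherwise: /etc. Updates to this file reach existing systems through the
# 3-way merge only if the file has not been modified locally.
COPY myapp.conf /etc/myapp.conf
```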
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,194 @@ | ||
|
||
# Users and groups | ||
|
||
This is one of the more complex topics. Generally speaking, bootc has nothing to | ||
do directly with configuring users or groups; it is a generic OS | ||
update/configuration mechanism. (There is currently just one small exception in | ||
that `bootc install` has a special case `--root-ssh-authorized-keys` argument, | ||
but it's very much optional). | ||
|
||
## Generic base images | ||
|
||
Commonly OS/distribution base images will be generic, i.e.
without any configuration. It is *very strongly recommended*
to avoid hardcoded passwords and SSH keys with publicly-available
private keys (as Vagrant does) in generic images.
|
||
### Injecting SSH keys via systemd credentials | ||
|
||
The systemd project has documentation for [credentials](https://systemd.io/CREDENTIALS/) | ||
which can be used in some environments to inject a root | ||
password or SSH authorized_keys. For many cases, this | ||
is a best practice. | ||
|
||
At the time of this writing this relies on SMBIOS, which
is mainly configurable in local virtualization environments
(e.g. qemu).
|
||
### Injecting users and SSH keys via cloud-init, etc. | ||
|
||
Many IaaS and virtualization systems are oriented towards a "metadata server"
(see e.g. [AWS instance metadata](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-metadata.html))
whose data is commonly processed by software such as [cloud-init](https://cloud-init.io/)
or [Ignition](https://github.com/coreos/ignition) or equivalent.
|
||
The base image you're using may include such software, or you | ||
can install it in your own derived images. | ||
|
||
In this model, SSH configuration is managed outside of the bootable | ||
image. See e.g. [GCP oslogin](https://cloud.google.com/compute/docs/oslogin/) | ||
for an example of this where operating system identities are linked | ||
to the underlying Google accounts. | ||
|
||
### Adding users and credentials via custom logic (container or unit) | ||
|
||
Of course, systems like `cloud-init` have no special status; you
can inject any logic you want to manage credentials via
e.g. a systemd unit (which may launch a container image)
that manages things however you prefer. Commonly,
this would use a custom network-hosted source, for example
[FreeIPA](https://www.freeipa.org/page/Main_Page).
|
||
Another example in a Kubernetes-oriented infrastructure would | ||
be a container image that fetches desired authentication | ||
credentials from a [CRD](https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definitions/) | ||
hosted in the API server. (To do things like this,
it's suggested to reuse the kubelet credentials.)
|
||
### Adding users and credentials statically in the container build | ||
|
||
Relative to package-oriented systems, a new ability is to inject | ||
users and credentials as part of a derived build: | ||
|
||
```dockerfile | ||
RUN useradd someuser | ||
``` | ||
|
||
However, it is important to understand some issues with the default | ||
`shadow-utils` implementation of `useradd`: | ||
|
||
First, typically user/group IDs are allocated dynamically, and this can result in "drift" (see below). | ||
|
||
#### User and group home directories and `/var` | ||
|
||
For systems configured with persistent `/home` → `/var/home`, any changes to `/var` made | ||
in the container image after initial installation *will not be applied on subsequent updates*. If for example you inject `/var/home/someuser/.ssh/authorized_keys` | ||
into a container build, existing systems will *not* get the updated authorized keys file. | ||
|
||
#### Using DynamicUser=yes for systemd units | ||
|
||
For "system" users it's strongly recommended to use systemd [DynamicUser=yes](https://www.freedesktop.org/software/systemd/man/latest/systemd.exec.html#DynamicUser=) where | ||
possible. | ||
|
||
This is significantly better than the pattern of allocating users/groups | ||
at "package install time" (e.g. [Fedora package user/group guidelines](https://docs.fedoraproject.org/en-US/packaging-guidelines/UsersAndGroups/)) because | ||
it avoids potential UID/GID drift (see below). | ||
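As a hedged sketch, a derived build could switch a unit over to a dynamic user with a drop-in. The `myapp.service` unit name is hypothetical and the base image reference is an assumption; `DynamicUser=` and `StateDirectory=` are the documented `systemd.exec` settings.

```dockerfile
# Assumption: illustrative bootc-compatible base image.
FROM quay.io/centos-bootc/centos-bootc:stream9

# Drop-in for a hypothetical `myapp.service`: systemd allocates a transient
# user when the service starts, and StateDirectory= gives the service a
# persistent directory under /var/lib whose ownership systemd manages.
RUN mkdir -p /usr/lib/systemd/system/myapp.service.d && \
    printf '[Service]\nDynamicUser=yes\nStateDirectory=myapp\n' \
      > /usr/lib/systemd/system/myapp.service.d/10-dynamic-user.conf
```

Because nothing is written to `/etc/passwd` at build time, there is no name-to-uid mapping in the image that could change between builds.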
|
||
#### Using systemd-sysusers | ||
|
||
See [systemd-sysusers](https://www.freedesktop.org/software/systemd/man/latest/systemd-sysusers.html). For example in your derived build: | ||
|
||
```dockerfile
COPY mycustom-user.conf /usr/lib/sysusers.d/
```
|
||
A key aspect of how this works is that `sysusers` will make changes | ||
to the traditional `/etc/passwd` file as necessary on boot. If | ||
`/etc` is persistent, this can avoid uid/gid drift (but | ||
in the general case it does mean that uid/gid allocation can | ||
depend on how a specific machine was upgraded over time). | ||
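For illustration, the file's content could look like the following; here it is written inline rather than copied from the build context. The `myapp` user name is hypothetical and the base image reference is an assumption.

```dockerfile
# Assumption: illustrative bootc-compatible base image.
FROM quay.io/centos-bootc/centos-bootc:stream9

# sysusers.d line fields: type (`u` = user plus matching group), name,
# ID (`-` lets systemd-sysusers pick a free system uid/gid), description.
RUN printf 'u myapp - "Hypothetical application user"\n' \
      > /usr/lib/sysusers.d/mycustom-user.conf
```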
|
||
#### Using systemd JSON user records | ||
|
||
See [JSON user records](https://systemd.io/USER_RECORD/). Unlike `sysusers.d`,
the canonical state for these lives in `/usr` - if a subsequent
image drops a user record, then the user will also vanish
from the system.
|
||
#### nss-altfiles | ||
|
||
The [nss-altfiles](https://github.com/aperezdc/nss-altfiles) project | ||
(long) predates systemd JSON user records. It aims to help split | ||
"system" users into `/usr/lib/passwd` and `/usr/lib/group`. It's | ||
very important to understand that this aligns with the way | ||
the OSTree project handles the "3 way merge" for `/etc` as it | ||
relates to `/etc/passwd`. Currently, if the `/etc/passwd` file is | ||
modified in any way on the local system, then subsequent changes | ||
to `/etc/passwd` in the container image *will not be applied*. | ||
|
||
Some base images may have `nss-altfiles` enabled by default; | ||
this is currently the case for base images built by | ||
[rpm-ostree](https://github.com/coreos/rpm-ostree). | ||
|
||
Commonly, base images will have some "system" users pre-allocated
and managed via these files, again to avoid uid/gid drift.
|
||
In a derived container build, you can also append users
to `/usr/lib/passwd`, for example. (At the time of this
writing there is no command-line tool to do so, though.)
|
||
Typically it is preferable to use `sysusers.d`
or `DynamicUser=yes`.
|
||
### Machine-local state for users | ||
|
||
At this point, it is important to understand the [filesystem](filesystem.md) | ||
layout - the default is up to the base image. | ||
|
||
The default Linux concept of a user has data stored in both `/etc` (`/etc/passwd`, `/etc/shadow` and groups)
and `/home`. The choice for how these work is up to the base image, but
a common default for generic base images is to have both be machine-local persistent state.
In this model `/home` would be a symlink to `/var/home`, so a user's home directory lives at e.g. `/var/home/someuser`.
|
||
But it is also valid to default to having e.g. `/home` be a `tmpfs` | ||
to ensure user data is cleaned up across reboots (and this pairs particularly | ||
well with a transient `/etc` as well). | ||
|
||
#### Injecting users and SSH keys at system provisioning time
|
||
For base images where `/etc` and `/var` are configured to persist by default,
injecting users via "installers" such
as [Anaconda](https://github.com/rhinstaller/anaconda/) (interactively or
via kickstart) or any others will generally be supported.
|
||
Typically, generic installers such as this are designed for "one time bootstrap";
after that, the configuration becomes mutable machine-local state
that can be changed "day 2" via some other mechanism.
|
||
The simple case is a user with a password - typically the installer helps | ||
set the initial password, but to change it there is a different in-system | ||
tool (such as `passwd` or a GUI as part of [Cockpit](https://cockpit-project.org/), GNOME/KDE/etc). | ||
|
||
It is intended that these flows work equivalently in a bootc-compatible | ||
system, to support users directly installing "generic" base images, without | ||
requiring changes to the tools above. | ||
|
||
### UID/GID drift | ||
|
||
Ultimately, `/etc/passwd` and similar files are a mapping
between names and numeric identifiers. A problem arises
when this mapping is dynamic and mixed with "stateless"
container image builds.
|
||
For example today the CentOS Stream 9 `postgresql` package | ||
allocates a [static uid of `26`](https://gitlab.com/redhat/centos-stream/rpms/postgresql/-/blob/a03cf81d4b9a77d9150a78949269ae52a0027b54/postgresql.spec#L847). | ||
|
||
This means that

```dockerfile
RUN dnf -y install postgresql
```
|
||
will always result in a change to `/etc/passwd` that allocates uid 26,
and data in `/var/lib/pgsql` will always be owned by that UID.
|
||
In contrast, the cockpit package allocates
[a floating cockpit-ws user](https://gitlab.com/redhat/centos-stream/rpms/cockpit/-/blob/1909236ad28c7d93238b8b3b806ecf9c4feb7e46/cockpit.spec#L506), i.e. one whose uid is assigned dynamically at install time.
|
||
This means that, without additional work, the uid may change
from one container image build to the next (due to RPM installation
ordering or other reasons).
|
||
This can be a problem if that user maintains persistent state.
Such cases are best handled by converting them to use `sysusers.d`
(see [Fedora change](https://fedoraproject.org/wiki/Changes/Adopting_sysusers.d_format)) - or again even better, using `DynamicUser=yes` (see above).
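When a user does own persistent state and `DynamicUser=yes` is not an option, one possible approach is to pin the numeric ID in a `sysusers.d` entry so that every build yields the same mapping. This is a sketch under assumptions: the `myapp` name and the ID `870` are arbitrary illustrative values, and the ID must not collide with one already used by the base image.

```dockerfile
# Assumption: illustrative bootc-compatible base image.
FROM quay.io/centos-bootc/centos-bootc:stream9

# A numeric ID in the third field requests that fixed uid (and matching gid)
# instead of a dynamically allocated one, so the name-to-uid mapping stays
# the same across rebuilds. `myapp` and `870` are hypothetical.
RUN printf 'u myapp 870 "Hypothetical application user"\n' \
      > /usr/lib/sysusers.d/myapp.conf
```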
|