docs: Add a new "build guidance" section
I originally was thinking these docs needed to live in
downstream places but...it will be really helpful
to us to have generic recommended guidance here.

Signed-off-by: Colin Walters <[email protected]>
cgwalters committed Mar 24, 2024
1 parent 60552ee commit 1789c6a
Showing 3 changed files with 289 additions and 0 deletions.
5 changes: 5 additions & 0 deletions docs/src/SUMMARY.md
@@ -6,6 +6,11 @@

- [Installation](installation.md)

# Building images

- [Building images](building/guidance.md)
- [Users, groups, SSH keys](building/users-and-groups.md)

# Using bootc

- [Upgrade and rollback](upgrades.md)
90 changes: 90 additions & 0 deletions docs/src/building/guidance.md
@@ -0,0 +1,90 @@
# Generic guidance for building images

The bootc project intends to be as operating system and distribution independent as possible,
similar to related projects such as [podman](http://podman.io/) and [systemd](https://systemd.io/).

The recommendations for creating bootc-compatible images will in general need to
be owned by the OS/distribution - in particular the ones who create the default
bootc base image(s). However, some guidance is very generic to most Linux
systems (and bootc only supports Linux).

Let's however restate a base goal of this project:

> The original Docker container model of using "layers" to model
> applications has been extremely successful. This project
> aims to apply the same technique for bootable host systems - using
> standard OCI/Docker containers as a transport and delivery format
> for base operating system updates.

Every tool and technique for creating application base images
should apply to the host Linux OS as much as possible.

## Installing software

For package management tools like `apt`, `dnf`, `zypper`, etc.
(generically, `$pkgsystem`), it is very much expected that
the pattern

`RUN $pkgsystem install somepackage && $pkgsystem clean all`

Just Works here, the same way as it does in
"application" container images. This pattern is really how
Docker got started.

There's not much special to this that doesn't also apply
to application containers; but see below.
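
As a concrete illustration of that pattern, a derived build might look like the following minimal sketch. The base image reference and the `tmux` package are assumptions chosen for illustration; substitute the bootc base image and package manager provided by your OS/distribution.

```dockerfile
# Assumption: a dnf-based bootc base image.
# Override with: --build-arg BASE_IMAGE=<your base image>
ARG BASE_IMAGE=quay.io/centos-bootc/centos-bootc:stream9
FROM ${BASE_IMAGE}

# Install a package and clean the package cache in the same layer,
# so that the cache does not end up in the image.
RUN dnf -y install tmux && dnf clean all
```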

## systemd units

The model that is most popular with the Docker/OCI world
is "microservice" style containers with the application as
pid 1, isolating the applications from each other and
from the host system - as opposed to "system containers"
which run an init system like systemd, typically also
SSH and often multiple logical "application" components
as part of the same container.

The bootc project generally expects systemd as pid 1,
and if you embed software in your derived image, the
default would then be that that software is initially
launched via a systemd unit.

```dockerfile
RUN dnf -y install postgresql
```

A package such as `postgresql` would typically also carry a systemd
unit, and that service will be launched the same way as it would be
on a package-based system.
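
The same applies to software you embed yourself. The sketch below is a hypothetical example: `myapp` and `myapp.service` are illustrative names that do not come from any real package, and the inline unit assumes a builder with Dockerfile heredoc support (e.g. BuildKit).

```dockerfile
# Fragment of a derived build; assumes a FROM line for a bootc base image.
# "myapp" is a hypothetical executable in the build context.
COPY myapp /usr/bin/myapp

# Ship the unit in /usr, alongside other image-owned content.
COPY <<EOF /usr/lib/systemd/system/myapp.service
[Unit]
Description=Hypothetical example application

[Service]
ExecStart=/usr/bin/myapp

[Install]
WantedBy=multi-user.target
EOF

# Enabling at build time creates the symlinks that start the unit on boot.
RUN systemctl enable myapp.service
```

If your builder lacks heredoc support, keeping `myapp.service` as a regular file in the build context and using a plain `COPY` gives the same result.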

## Users and groups

Note that installing `postgresql` as above currently allocates a user;
this leads to the topic of [users, groups and SSH keys](users-and-groups.md).

## Configuration

A key aspect of choosing a bootc-based operating system model
is that *code* and *configuration* can be strictly "lifecycle bound"
together in exactly the same way.

(Today, that's done by including the configuration in the base
container image; however, a future enhancement for bootc
will also support dynamically-injected ConfigMaps, similar
to kubelet.)

You can add configuration files to the same places they're
expected by typical package systems on Debian, Fedora, Arch
and others - in `/usr` (preferred where possible)
or `/etc`. systemd has long advocated and supported
a model where `/usr` (e.g. `/usr/lib/systemd/system`)
contains content owned by the operating system image.

`/etc` is machine-local state. However, per [filesystem.md](../filesystem.md)
it's important to note that the underlying OSTree
system performs a 3-way merge of `/etc`, so changes you
make in the container image to e.g. `/etc/postgresql.conf`
will be applied on update, assuming it is not modified
locally.
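
A minimal sketch of both locations follows. The source files are hypothetical files in the build context, and `/usr/lib/sysctl.d` is used only as one example of a `/usr` configuration directory.

```dockerfile
# Fragment of a derived build; assumes a FROM line for a bootc base image.

# Preferred where possible: image-owned configuration under /usr.
COPY 90-myapp.conf /usr/lib/sysctl.d/90-myapp.conf

# Also possible: a file in /etc, which is merged with machine-local
# changes on update.
COPY postgresql.conf /etc/postgresql.conf
```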

194 changes: 194 additions & 0 deletions docs/src/building/users-and-groups.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,194 @@

# Users and groups

This is one of the more complex topics. Generally speaking, bootc has nothing to
do directly with configuring users or groups; it is a generic OS
update/configuration mechanism. (There is currently just one small exception in
that `bootc install` has a special case `--root-ssh-authorized-keys` argument,
but it's very much optional).

## Generic base images

Commonly OS/distribution base images will be generic, i.e.
without any configuration. It is *very strongly recommended*
to avoid hardcoded passwords and ssh keys with publicly-available
private keys (as Vagrant does) in generic images.

### Injecting SSH keys via systemd credentials

The systemd project has documentation for [credentials](https://systemd.io/CREDENTIALS/)
which can be used in some environments to inject a root
password or SSH authorized_keys. For many cases, this
is a best practice.

At the time of this writing, this relies on SMBIOS, which
is mainly configurable in local virtualization environments
(e.g. qemu).

### Injecting users and SSH keys via cloud-init, etc.

Many IaaS and virtualization systems are oriented towards a "metadata server"
(see e.g. [AWS instance metadata](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-metadata.html))
whose data is commonly processed by software such as [cloud-init](https://cloud-init.io/)
or [Ignition](https://github.com/coreos/ignition) or equivalent.

The base image you're using may include such software, or you
can install it in your own derived images.

In this model, SSH configuration is managed outside of the bootable
image. See e.g. [GCP oslogin](https://cloud.google.com/compute/docs/oslogin/)
for an example of this where operating system identities are linked
to the underlying Google accounts.

### Adding users and credentials via custom logic (container or unit)

Of course, systems like `cloud-init` are not special in this respect; you
can inject any logic you want to manage credentials via
e.g. a systemd unit (which may launch a container image)
that manages things however you prefer. Commonly,
this would be a custom network-hosted source, for example
[FreeIPA](https://www.freeipa.org/page/Main_Page).

Another example in a Kubernetes-oriented infrastructure would
be a container image that fetches desired authentication
credentials from a [CRD](https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definitions/)
hosted in the API server. (To do things like this,
it's suggested to reuse the kubelet credentials.)

### Adding users and credentials statically in the container build

Relative to package-oriented systems, a new ability is to inject
users and credentials as part of a derived build:

```dockerfile
RUN useradd someuser
```

However, it is important to understand some issues with the default
`shadow-utils` implementation of `useradd`.

First, user/group IDs are typically allocated dynamically, and this can result in "drift" (see "UID/GID drift" below).
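
One way to sidestep dynamic allocation is to pass explicit IDs. This is only a sketch: the user name and the ID `1500` are arbitrary illustrative values, and the concerns about home directories described in the next section still apply.

```dockerfile
# Fragment of a derived build; assumes a FROM line for a bootc base image.
# Explicit IDs avoid depending on the order in which IDs happen to be allocated.
RUN groupadd --gid 1500 someuser && \
    useradd --uid 1500 --gid someuser someuser
```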

#### User and group home directories and `/var`

For systems configured with persistent `/home` → `/var/home`, any changes to `/var` made
in the container image after initial installation *will not be applied on subsequent updates*. If for example you inject `/var/home/someuser/.ssh/authorized_keys`
into a container build, existing systems will *not* get the updated authorized keys file.

#### Using DynamicUser=yes for systemd units

For "system" users it's strongly recommended to use systemd [DynamicUser=yes](https://www.freedesktop.org/software/systemd/man/latest/systemd.exec.html#DynamicUser=) where
possible.

This is significantly better than the pattern of allocating users/groups
at "package install time" (e.g. [Fedora package user/group guidelines](https://docs.fedoraproject.org/en-US/packaging-guidelines/UsersAndGroups/)) because
it avoids potential UID/GID drift (see below).
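
A minimal sketch of such a unit follows. `myapp` is a hypothetical service that is assumed to already be present in the image, and the inline unit assumes a builder with Dockerfile heredoc support (e.g. BuildKit).

```dockerfile
# Fragment of a derived build; assumes a FROM line for a bootc base image
# and a hypothetical /usr/bin/myapp already installed in the image.
COPY <<EOF /usr/lib/systemd/system/myapp.service
[Unit]
Description=Hypothetical service running as a dynamic user

[Service]
ExecStart=/usr/bin/myapp
# systemd allocates a transient UID/GID when the service starts.
DynamicUser=yes
# Persistent state is kept under /var/lib/myapp and managed by systemd.
StateDirectory=myapp

[Install]
WantedBy=multi-user.target
EOF
RUN systemctl enable myapp.service
```

Because no entry is written to `/etc/passwd`, there is no name-to-ID mapping that could differ between image builds or machines.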

#### Using systemd-sysusers

See [systemd-sysusers](https://www.freedesktop.org/software/systemd/man/latest/systemd-sysusers.html). For example in your derived build:

```dockerfile
COPY mycustom-user.conf /usr/lib/sysusers.d
```

A key aspect of how this works is that `sysusers` will make changes
to the traditional `/etc/passwd` file as necessary on boot. If
`/etc` is persistent, this can avoid uid/gid drift (but
in the general case it does mean that uid/gid allocation can
depend on how a specific machine was upgraded over time).
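
For illustration, the contents of such a file might look like the sketch below. The user names and the ID `980` are hypothetical, and writing the file inline assumes a builder with Dockerfile heredoc support (e.g. BuildKit).

```dockerfile
# Fragment of a derived build; assumes a FROM line for a bootc base image.
COPY <<EOF /usr/lib/sysusers.d/mycustom-user.conf
# Type  Name      ID   GECOS
# "-" lets systemd-sysusers pick a free system ID.
u       myapp     -    "Hypothetical application user"
# An explicit ID requests a fixed UID/GID.
u       mydaemon  980  "Hypothetical daemon user"
EOF
```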

#### Using systemd JSON user records

See [JSON user records](https://systemd.io/USER_RECORD/). Unlike `sysusers`,
the canonical state for these lives in `/usr` - if a subsequent
image drops a user record, then it will also vanish
from the system - unlike `sysusers.d`.

#### nss-altfiles

The [nss-altfiles](https://github.com/aperezdc/nss-altfiles) project
(long) predates systemd JSON user records. It aims to help split
"system" users into `/usr/lib/passwd` and `/usr/lib/group`. It's
very important to understand that this aligns with the way
the OSTree project handles the "3 way merge" for `/etc` as it
relates to `/etc/passwd`. Currently, if the `/etc/passwd` file is
modified in any way on the local system, then subsequent changes
to `/etc/passwd` in the container image *will not be applied*.

Some base images may have `nss-altfiles` enabled by default;
this is currently the case for base images built by
[rpm-ostree](https://github.com/coreos/rpm-ostree).

Commonly, base images will have some "system" users pre-allocated
and managed via this file, again to avoid uid/gid drift.

In a derived container build, you can also append users
to `/usr/lib/passwd`, for example. (At the time of this
writing there is no command-line tool to do so, though.)
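
A minimal sketch of appending entries by hand, assuming the base image has `nss-altfiles` enabled. The user name and the ID `981` are hypothetical; choose an ID that is unused in your base image.

```dockerfile
# Fragment of a derived build; assumes a FROM line for a bootc base image
# in which nss-altfiles consults /usr/lib/passwd and /usr/lib/group.
# Fields follow the standard passwd(5) and group(5) formats.
RUN echo 'myapp:x:981:981:Hypothetical app user:/var/lib/myapp:/usr/sbin/nologin' >> /usr/lib/passwd && \
    echo 'myapp:x:981:' >> /usr/lib/group
```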

Typically it is preferable to use `sysusers.d`
or `DynamicUser=yes`.

### Machine-local state for users

At this point, it is important to understand the [filesystem](../filesystem.md)
layout - the default is up to the base image.

The default Linux concept of a user has data stored in both `/etc` (`/etc/passwd`, `/etc/shadow` and groups)
and `/home`. The choice for how these work is up to the base image, but
a common default for generic base images is to have both be machine-local persistent state.
In this model `/home` would be a symlink to `/var/home`, so a user's home directory is e.g. `/var/home/someuser`.

But it is also valid to default to having e.g. `/home` be a `tmpfs`
to ensure user data is cleaned up across reboots (and this pairs particularly
well with a transient `/etc` as well).

#### Injecting users and SSH keys at system provisioning time

For base images where `/etc` and `/var` are configured to persist by default, it
will then be generally supported to inject users via "installers" such
as [Anaconda](https://github.com/rhinstaller/anaconda/) (interactively or
via kickstart) or any others.

Typically, generic installers such as this are designed for "one time bootstrap";
after that, the configuration becomes mutable machine-local state
that can be changed "day 2" via some other mechanism.

The simple case is a user with a password - typically the installer helps
set the initial password, but to change it there is a different in-system
tool (such as `passwd` or a GUI as part of [Cockpit](https://cockpit-project.org/), GNOME/KDE/etc).

It is intended that these flows work equivalently in a bootc-compatible
system, to support users directly installing "generic" base images, without
requiring changes to the tools above.

### UID/GID drift

Ultimately the `/etc/passwd` and similar files are a mapping
between names and numeric identifiers. A problem then becomes
when this mapping is dynamic and mixed with "stateless"
container image builds.

For example today the CentOS Stream 9 `postgresql` package
allocates a [static uid of `26`](https://gitlab.com/redhat/centos-stream/rpms/postgresql/-/blob/a03cf81d4b9a77d9150a78949269ae52a0027b54/postgresql.spec#L847).

This means that
```dockerfile
RUN dnf -y install postgresql
```

will always result in a change to `/etc/passwd` that allocates uid 26,
and data in `/var/lib/pgsql` will always be owned by that UID.
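
If persistent state depends on a particular UID, a build can assert it so that any drift fails the build instead of surfacing later on deployed systems. This is a sketch under assumptions: it presumes a dnf-based base image and that the `postgresql-server` subpackage is the one that creates the `postgres` user.

```dockerfile
# Fragment of a derived build; assumes a FROM line for a bootc base image.
# The build fails if the "postgres" user is missing or its UID is not 26.
RUN dnf -y install postgresql-server && dnf clean all && \
    test "$(id -u postgres)" -eq 26
```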

In contrast, the cockpit project allocates
[a floating cockpit-ws user](https://gitlab.com/redhat/centos-stream/rpms/cockpit/-/blob/1909236ad28c7d93238b8b3b806ecf9c4feb7e46/cockpit.spec#L506).

This means that each container image build (without additional work)
may (due to RPM installation ordering or other reasons) result
in the uid changing.

This can be a problem if that user maintains persistent state.
Such cases are best handled by being converted to use `sysusers.d`
(see [Fedora change](https://fedoraproject.org/wiki/Changes/Adopting_sysusers.d_format)) - or again even better, using `DynamicUser=yes` (see above).
