apparmor: build/run: net ns permission denied with pasta #5440

Closed
danishprakash opened this issue Mar 28, 2024 · 15 comments

@danishprakash
Contributor

Description
Buildah build fails with a permission denied error on the network namespace with pasta as the rootless networking backend. Folks have had success with modifying AppArmor rules (https://bugzilla.suse.com/show_bug.cgi?id=1221840#c3).

This problem arose with the switch to pasta as the default rootless networking mode. Upon switching to slirp4netns in containers.conf, everything works as expected.
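
For readers who want to try the containers.conf workaround mentioned above, here is a minimal sketch. It assumes the `default_rootless_network_cmd` key in the `[network]` table described in containers.conf(5), and that per-user drop-in files under `containers.conf.d` are read by the installed version. The function name and the `50-slirp4netns.conf` file name are made up for illustration.

```shell
# Sketch: write a per-user containers.conf drop-in that selects slirp4netns.
# Assumptions: the [network] default_rootless_network_cmd key is honored and
# drop-ins in containers.conf.d are read. The file name is hypothetical.
write_slirp4netns_dropin() {
    conf_dir=${1:-"${XDG_CONFIG_HOME:-$HOME/.config}/containers/containers.conf.d"}
    mkdir -p "$conf_dir" || return 1
    # Quoted heredoc delimiter: the TOML below is written literally.
    cat > "$conf_dir/50-slirp4netns.conf" <<'EOF'
[network]
default_rootless_network_cmd = "slirp4netns"
EOF
}
```

Called without an argument, the function targets the per-user configuration directory; passing a scratch directory lets you inspect the generated file first. Note that this only sidesteps the AppArmor denial, it does not fix it.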

Steps to reproduce the issue:

$ cat test.Dockerfile
FROM alpine
RUN apk add git

$ buildah build -f test.Dockerfile .
STEP 1/2: FROM alpine
STEP 2/2: RUN apk add git
Error: building at STEP "RUN apk add git": setup network: pasta failed with exit code 1:
Couldn't open network namespace /proc/3026/ns/net: Permission denied

$ buildah run $(buildah from registry.opensuse.org/opensuse/leap:15.5) /bin/bash
Error: setup network: pasta failed with exit code 1:
Couldn't open network namespace /proc/4596/ns/net: Permission denied

Describe the results you received:

Couldn't open network namespace /proc/1706/ns/net: Permission denied

Describe the results you expected:
Build to complete successfully as it does currently if I switch to slirp4netns.

Output of rpm -q buildah or apt list buildah:

$ rpm -q buildah passt
buildah-1.35.1-1.1.x86_64
passt-20240220.1e6f92b-1.1.x86_64

Output of buildah version:

Version:         1.35.1
Go Version:      go1.21.8
Image Spec:      1.1.0
Runtime Spec:    1.1.0
CNI Spec:        1.0.0
libcni Version:  v1.1.2
image Version:   5.30.0
Git Commit:      unknown
Built:           Tue Mar 19 15:53:06 2024
OS/Arch:         linux/amd64
BuildPlatform:   linux/amd64

Output of cat /etc/*release:

NAME="openSUSE Tumbleweed"
# VERSION="20221226"
ID="opensuse-tumbleweed"
ID_LIKE="opensuse suse"
VERSION_ID="20221226"
PRETTY_NAME="openSUSE Tumbleweed"
ANSI_COLOR="0;32"
CPE_NAME="cpe:/o:opensuse:tumbleweed:20221226"
BUG_REPORT_URL="https://bugs.opensuse.org"
HOME_URL="https://www.opensuse.org/"
DOCUMENTATION_URL="https://en.opensuse.org/Portal:Tumbleweed"
LOGO="distributor-logo-Tumbleweed"

Output of uname -a:

Linux localhost.localdomain 6.1.4-1-default #1 SMP PREEMPT_DYNAMIC Mon Jan  9 11:00:31 UTC 2023 (4b9b43c) x86_64 x86_64 x86_64 GNU/Linux

Output of cat /etc/containers/storage.conf:

# This file is the configuration file for all tools
# that use the containers/storage library. The storage.conf file
# overrides all other storage.conf files. Container engines using the
# container/storage library do not inherit fields from other storage.conf
# files.
#
#  Note: The storage.conf file overrides other storage.conf files based on this precedence:
#      /usr/containers/storage.conf
#      /etc/containers/storage.conf
#      $HOME/.config/containers/storage.conf
#      $XDG_CONFIG_HOME/containers/storage.conf (If XDG_CONFIG_HOME is set)
# See man 5 containers-storage.conf for more information
# The "container storage" table contains all of the server options.
[storage]

# Default Storage Driver, Must be set for proper operation.
driver = "overlay"

# Temporary storage location
runroot = "/run/containers/storage"

# Priority list for the storage drivers that will be tested one
# after the other to pick the storage driver if it is not defined.
# driver_priority = ["btrfs", "overlay"]

# Primary Read/Write location of container storage
# When changing the graphroot location on an SELINUX system, you must
# ensure  the labeling matches the default locations labels with the
# following commands:
# semanage fcontext -a -e /var/lib/containers/storage /NEWSTORAGEPATH
# restorecon -R -v /NEWSTORAGEPATH
graphroot = "/var/lib/containers/storage"


# Storage path for rootless users
#
# rootless_storage_path = "$HOME/.local/share/containers/storage"

# Transient store mode makes all container metadata be saved in temporary storage
# (i.e. runroot above). This is faster, but doesn't persist across reboots.
# transient_store = true

[storage.options]
# Storage options to be passed to underlying storage drivers

# AdditionalImageStores is used to pass paths to additional Read/Only image stores
# Must be comma separated list.
additionalimagestores = [
]

# Allows specification of how storage is populated when pulling images. This
# option can speed the pulling process of images compressed with format
# zstd:chunked. Containers/storage looks for files within images that are being
# pulled from a container registry that were previously pulled to the host.  It
# can copy or create a hard link to the existing file when it finds them,
# eliminating the need to pull them from the container registry. These options
# can deduplicate pulling of content, disk storage of content and can allow the
# kernel to use less memory when running containers.

# containers/storage supports four keys
#   * enable_partial_images="true" | "false"
#     Tells containers/storage to look for files previously pulled in storage
#     rather then always pulling them from the container registry.
#   * use_hard_links = "false" | "true"
#     Tells containers/storage to use hard links rather then create new files in
#     the image, if an identical file already existed in storage.
#   * ostree_repos = ""
#     Tells containers/storage where an ostree repository exists that might have
#     previously pulled content which can be used when attempting to avoid
#     pulling content from the container registry
pull_options = {enable_partial_images = "false", use_hard_links = "false", ostree_repos=""}

# Remap-UIDs/GIDs is the mapping from UIDs/GIDs as they should appear inside of
# a container, to the UIDs/GIDs as they should appear outside of the container,
# and the length of the range of UIDs/GIDs.  Additional mapped sets can be
# listed and will be heeded by libraries, but there are limits to the number of
# mappings which the kernel will allow when you later attempt to run a
# container.
#
# remap-uids = 0:1668442479:65536
# remap-gids = 0:1668442479:65536

# Remap-User/Group is a user name which can be used to look up one or more UID/GID
# ranges in the /etc/subuid or /etc/subgid file.  Mappings are set up starting
# with an in-container ID of 0 and then a host-level ID taken from the lowest
# range that matches the specified name, and using the length of that range.
# Additional ranges are then assigned, using the ranges which specify the
# lowest host-level IDs first, to the lowest not-yet-mapped in-container ID,
# until all of the entries have been used for maps.
#
# remap-user = "containers"
# remap-group = "containers"

# Root-auto-userns-user is a user name which can be used to look up one or more UID/GID
# ranges in the /etc/subuid and /etc/subgid file.  These ranges will be partitioned
# to containers configured to create automatically a user namespace.  Containers
# configured to automatically create a user namespace can still overlap with containers
# having an explicit mapping set.
# This setting is ignored when running as rootless.
# root-auto-userns-user = "storage"
#
# Auto-userns-min-size is the minimum size for a user namespace created automatically.
# auto-userns-min-size=1024
#
# Auto-userns-max-size is the minimum size for a user namespace created automatically.
# auto-userns-max-size=65536

[storage.options.overlay]
# ignore_chown_errors can be set to allow a non privileged user running with
# a single UID within a user namespace to run containers. The user can pull
# and use any image even those with multiple uids.  Note multiple UIDs will be
# squashed down to the default uid in the container.  These images will have no
# separation between the users in the container. Only supported for the overlay
# and vfs drivers.
#ignore_chown_errors = "false"

# Inodes is used to set a maximum inodes of the container image.
# inodes = ""

# Path to an helper program to use for mounting the file system instead of mounting it
# directly.
#mount_program = "/usr/bin/fuse-overlayfs"

# mountopt specifies comma separated list of extra mount options
mountopt = "nodev,metacopy=on"

# Set to skip a PRIVATE bind mount on the storage home directory.
# skip_mount_home = "false"

# Size is used to set a maximum size of the container image.
# size = ""

# ForceMask specifies the permissions mask that is used for new files and
# directories.
#
# The values "shared" and "private" are accepted.
# Octal permission masks are also accepted.
#
#  "": No value specified.
#     All files/directories, get set with the permissions identified within the
#     image.
#  "private": it is equivalent to 0700.
#     All files/directories get set with 0700 permissions.  The owner has rwx
#     access to the files. No other users on the system can access the files.
#     This setting could be used with networked based homedirs.
#  "shared": it is equivalent to 0755.
#     The owner has rwx access to the files and everyone else can read, access
#     and execute them. This setting is useful for sharing containers storage
#     with other users.  For instance have a storage owned by root but shared
#     to rootless users as an additional store.
#     NOTE:  All files within the image are made readable and executable by any
#     user on the system. Even /etc/shadow within your image is now readable by
#     any user.
#
#   OCTAL: Users can experiment with other OCTAL Permissions.
#
#  Note: The force_mask Flag is an experimental feature, it could change in the
#  future.  When "force_mask" is set the original permission mask is stored in
#  the "user.containers.override_stat" xattr and the "mount_program" option must
#  be specified. Mount programs like "/usr/bin/fuse-overlayfs" present the
#  extended attribute permissions to processes within containers rather then the
#  "force_mask"  permissions.
#
# force_mask = ""

[storage.options.thinpool]
# Storage Options for thinpool

# autoextend_percent determines the amount by which pool needs to be
# grown. This is specified in terms of % of pool size. So a value of 20 means
# that when threshold is hit, pool will be grown by 20% of existing
# pool size.
# autoextend_percent = "20"

# autoextend_threshold determines the pool extension threshold in terms
# of percentage of pool size. For example, if threshold is 60, that means when
# pool is 60% full, threshold has been hit.
# autoextend_threshold = "80"

# basesize specifies the size to use when creating the base device, which
# limits the size of images and containers.
# basesize = "10G"

# blocksize specifies a custom blocksize to use for the thin pool.
# blocksize="64k"

# directlvm_device specifies a custom block storage device to use for the
# thin pool. Required if you setup devicemapper.
# directlvm_device = ""

# directlvm_device_force wipes device even if device already has a filesystem.
# directlvm_device_force = "True"

# fs specifies the filesystem type to use for the base device.
# fs="xfs"

# log_level sets the log level of devicemapper.
# 0: LogLevelSuppress 0 (Default)
# 2: LogLevelFatal
# 3: LogLevelErr
# 4: LogLevelWarn
# 5: LogLevelNotice
# 6: LogLevelInfo
# 7: LogLevelDebug
# log_level = "7"

# min_free_space specifies the min free space percent in a thin pool require for
# new device creation to succeed. Valid values are from 0% - 99%.
# Value 0% disables
# min_free_space = "10%"

# mkfsarg specifies extra mkfs arguments to be used when creating the base
# device.
# mkfsarg = ""

# metadata_size is used to set the `pvcreate --metadatasize` options when
# creating thin devices. Default is 128k
# metadata_size = ""

# Size is used to set a maximum size of the container image.
# size = ""

# use_deferred_removal marks devicemapper block device for deferred removal.
# If the thinpool is in use when the driver attempts to remove it, the driver
# tells the kernel to remove it as soon as possible. Note this does not free
# up the disk space, use deferred deletion to fully remove the thinpool.
# use_deferred_removal = "True"

# use_deferred_deletion marks thinpool device for deferred deletion.
# If the device is busy when the driver attempts to delete it, the driver
# will attempt to delete device every 30 seconds until successful.
# If the program using the driver exits, the driver will continue attempting
# to cleanup the next time the driver is used. Deferred deletion permanently
# deletes the device and all data stored in device will be lost.
# use_deferred_deletion = "True"

# xfs_nospace_max_retries specifies the maximum number of retries XFS should
# attempt to complete IO when ENOSPC (no space) error is returned by
# underlying storage device.
# xfs_nospace_max_retries = "0"
@sbrivio-rh

As far as I know, this is specific to openSUSE, see also https://bugzilla.suse.com/show_bug.cgi?id=1221840#c6 -- this part works with Debian because of https://salsa.debian.org/sbrivio/passt/-/commit/5bb812e79143670a57440cd8aa7f2979583c5a0a.

What's missing, also for Debian and Ubuntu, is something like @{run}/user/@{uid}/** r,, that pasta needs when started from Buildah or with a custom network. We can keep this ticket to track that part, unless there's something else I'm missing.

@dylangerdaly

Any idea on fixing this? I can't update any of my podman containers.

@dylangerdaly

Adding --network slirp4netns to the build command is a workaround.

@sbrivio-rh

@dylangerdaly if you're hitting this on openSUSE, see https://bugzilla.suse.com/show_bug.cgi?id=1221840#c6 -- I'm waiting on an answer to that.

If you're hitting a similar issue on another distribution, please describe it.

@rhatdan
Member

rhatdan commented Mar 31, 2024

@Luap99 PTAL

@sbrivio-rh

@rhatdan @Luap99 just to be clear, this isn't in any way an issue with Podman or Buildah themselves. Summary of my current understanding:

  • on openSUSE, AppArmor is blocking pasta from accessing /dev/net/tun and /proc/<PID>/ns/net, because the openSUSE package ships a separate profile for pasta, but it doesn't create a hard link from /usr/bin/pasta to /usr/bin/passt -- it's a soft link, so AppArmor can't attach pasta's profile to its binary. On Debian and Ubuntu, that's a hard link.

    Hence, pasta won't start at all with default AppArmor policies on openSUSE: it can't open the tap device, and it can't join the target network namespace.

    This should be fixed in the openSUSE package itself, see https://bugzilla.suse.com/show_bug.cgi?id=1221840#c6. I don't maintain the openSUSE package.

  • on Debian and Ubuntu (I maintain those packages), as well as on openSUSE, AppArmor denies access to /run/user/<UID>/containers/networks/rootless-netns, which pasta needs to access for a rootless custom network (i.e. Buildah, or podman network create, but not podman run).

    As a quick fix, I will change the corresponding regexp in the AppArmor abstraction we ship upstream, then prepare a new version of the Debian package, and notify the package maintainer for openSUSE. Eventually, we should have a separate Podman profile including pasta's abstraction plus whatever additional rule Podman and Buildah need, like we already do for libvirt (with passt(1)) on Debian.
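
To check which of the two situations described above applies on a given system, a small helper like the following can tell a symbolic link from a hard link. It is only a sketch: `link_kind` is a hypothetical name, and the paths in the usage line are the ones mentioned in this thread.

```shell
# Classify how a second name relates to a first one.
# Prints "symlink", "hardlink", or "separate".
link_kind() {
    target=$1   # e.g. /usr/bin/passt
    name=$2     # e.g. /usr/bin/pasta
    if [ -L "$name" ]; then
        # The name itself is a symbolic link (checked first, since -ef follows links).
        echo symlink
    elif [ "$name" -ef "$target" ]; then
        # Same device and inode without being a symlink: a hard link.
        echo hardlink
    else
        echo separate
    fi
}
```

For example, `link_kind /usr/bin/passt /usr/bin/pasta` would be expected to print `symlink` on a system packaged the way this comment describes for openSUSE, and `hardlink` where the package creates a hard link.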

@danishprakash
Contributor Author

@sbrivio-rh: I responded on Bugzilla, but I still seem to be running into the same issue, unless of course I'm missing something.

So, to be clear, you're suggesting a two-part fix that creates a hard link (instead of a symlink) for pasta pointing to passt. Additionally, we also need to modify the AppArmor profiles for both usr.bin.pasta and usr.bin.passt? Am I understanding this correctly?

@sbrivio-rh

> @sbrivio-rh: I responded on Bugzilla, but I still seem to be running into the same issue, unless of course I'm missing something.
>
> So, to be clear, you're suggesting a two-part fix that creates a hard link (instead of a symlink) for pasta pointing to passt.

That will be needed, yes.

> Additionally, we also need to modify the AppArmor profiles for both usr.bin.pasta and usr.bin.passt? Am I understanding this correctly?

That too, but the one we ship upstream for usr.bin.pasta is anyway causing issues now, so let me fix that first.

@sbrivio-rh

> That too, but the one we ship upstream for usr.bin.pasta is anyway causing issues now, so let me fix that first.

This should do the trick.

@Luap99
Member

Luap99 commented Apr 1, 2024

As mentioned on the Podman issue containers/podman#22168 (comment), I do not think it is realistic to have the profile limit access to certain paths. XDG_RUNTIME_DIR and so on are configurable to use different paths, so assuming only defaults are ever used is wrong IMO. I don't think it is helpful to force every user who likes to use a different path to edit their profile, but again, I know nothing about AppArmor, whether that is common, or whether there are proper solutions for this.

I can properly document the paths we choose today, but nothing changes the fact that the base paths are configurable by users and cannot be assumed to be static.

I guess the same kind of issue applies to the pasta options --pcap, --log-file and --pid, as they can take any path as an argument.

@Luap99
Member

Luap99 commented Apr 1, 2024

Also, if pasta is so worried about being allowed to open all kinds of paths, maybe Landlock could be an option: it could add "dynamic" rules at runtime to block all paths besides the ones it got as arguments, so it wouldn't need to rely only on SELinux/AppArmor to block these.

Luap99 changed the title from "build/run: net ns permission denied with pasta" to "apparmor: build/run: net ns permission denied with pasta" on Apr 1, 2024
@sbrivio-rh

> As mentioned on the Podman issue containers/podman#22168 (comment), I do not think it is realistic to have the profile limit access to certain paths. XDG_RUNTIME_DIR and so on are configurable to use different paths, so assuming only defaults are ever used is wrong IMO. I don't think it is helpful to force every user who likes to use a different path to edit their profile, but again, I know nothing about AppArmor, whether that is common, or whether there are proper solutions for this.

Mind that AppArmor profiles are usually considered configuration matter by distributions. For example, on Debian, if you delete /etc/apparmor.d/usr.bin.passt, and reinstall the package, the profile won't be installed anymore, because it was recorded, as a configuration change, that you deleted it.

This isn't a clear-cut issue, but in general, the established practice with SELinux and AppArmor policies in distributions is to cover paths that are used by default, or commonly.

> I can properly document the paths we choose today, but nothing changes the fact that the base paths are configurable by users and cannot be assumed to be static.

I understand, but it's also reasonable to assume that if users configure base paths to values that are not the default on their distribution, they might have to adjust rules in their Linux Security Module policy.
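
One way to see whether a given setup falls outside the default paths is to compare the runtime directory against /run/user/<UID>. The helper below is a minimal sketch with a hypothetical name; it does plain prefix matching only and does not model how AppArmor expands @{run} or resolves paths.

```shell
# Return 0 if the path lies under /run/user/<uid>/, the default location
# discussed in this thread; return 1 otherwise.
# The uid defaults to the current user's.
under_default_runtime_dir() {
    path=$1
    uid=${2:-$(id -u)}
    case $path in
        "/run/user/$uid"/*) return 0 ;;
        *) return 1 ;;
    esac
}
```

A possible use is `under_default_runtime_dir "$XDG_RUNTIME_DIR/containers/networks/rootless-netns" || echo "non-default runtime dir"`, which flags setups where the Linux Security Module policy may need a local adjustment.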

Is there some other path I missed in the policies?

> I guess the same kind of issue applies to the pasta options --pcap, --log-file and --pid, as they can take any path as an argument.

Definitely, it does, and there it's even worse in some sense as there's no default path for stand-alone usage.

For libvirt, we cover --pid with the path libvirt will use (not configurable) and --log-file with the path recommended in the documentation (configurable). This is done for SELinux and AppArmor in libvirt's own policy.

For stand-alone usage, we assume the user will use their home directory, or /tmp, which is consistent with the Filesystem Hierarchy... "Standard" (FHS).

> Also, if pasta is so worried about being allowed to open all kinds of paths, maybe Landlock could be an option: it could add "dynamic" rules at runtime to block all paths besides the ones it got as arguments, so it wouldn't need to rely only on SELinux/AppArmor to block these.

The added value of AppArmor and SELinux here is that they are external components, so they would be unaffected in case of arbitrary code execution in the initial stages.

In this sense, I don't think Landlock would give us much of an advantage as passt and pasta anyway remount their / filesystem to an empty one as they start -- that would happen pretty much at the same time as we configure a Landlock policy.

@Luap99
Member

Luap99 commented Apr 2, 2024

> Is there some other path I missed in the policies?

If we assume defaults, I think they are fine now after the rootless-netns change, so that should be good. However, I do not see any patch on the passt ML for this (@{run}/user/@{uid}/** r,)?!
Also, if I read this right, we will need write access as well, for the --pid file that we use in the rootless netns case.

If this is something that should not be added to the pasta profile but rather some podman/buildah profile then we can do it as well but this is not something podman/buildah maintainers can figure out. However I am open to include such a profile upstream in our repo if that is contributed by someone so that all apparmor distros could use it from here.

> In this sense, I don't think Landlock would give us much of an advantage as passt and pasta anyway remount their / filesystem to an empty one as they start -- that would happen pretty much at the same time as we configure a Landlock policy.

Right I guess that should be sufficient then.

@Luap99
Member

Luap99 commented Apr 2, 2024

Also, I am going to close this one, as the root issue seems to be the incorrect AppArmor setup in openSUSE and not an upstream bug at all, but we can continue the conversation.

Luap99 closed this as not planned on Apr 2, 2024
@sbrivio-rh

> Is there some other path I missed in the policies?
>
> If we assume defaults, I think they are fine now after the rootless-netns change, so that should be good. However, I do not see any patch on the passt ML for this (@{run}/user/@{uid}/** r,)?!

Right, I haven't posted that yet, I was waiting for a conclusion on https://bugzilla.suse.com/show_bug.cgi?id=1221840 first.

> Also, if I read this right, we will need write access as well, for the --pid file that we use in the rootless netns case.

Oh, I forgot about that, thanks for pointing it out.

> If this is something that should not be added to the pasta profile but rather some podman/buildah profile then we can do it as well

Ideally yes, but:

> but this is not something podman/buildah maintainers can figure out. However I am open to include such a profile upstream in our repo if that is contributed by someone so that all apparmor distros could use it from here.

...that's not really a simple profile to write, so I would fix this up anyway in pasta's profile for the time being.

hswong3i pushed a commit to alvistack/passt-top-passt that referenced this issue Apr 6, 2024
…tion

For the policy to work as expected across either AppArmor commit
9d3f8c6cc05d ("parser: fix parsing of source as mount point for
propagation type flags") and commit 300889c3a4b7 ("parser: fix option
flag processing for single conditional rules"), we need one mount
rule with matching mount options as "source" (that is, without
source), and one without mount options and an explicit, empty source.

Link: containers/buildah#5440
Link: https://bugzilla.suse.com/show_bug.cgi?id=1221840
Signed-off-by: Stefano Brivio <[email protected]>
hswong3i pushed a commit to alvistack/passt-top-passt that referenced this issue Apr 6, 2024
… too

With Podman's custom networks, pasta will typically need to open the
target network namespace at /run/user/<UID>/containers/networks:
grant access to anything under /run/user/<UID> instead of limiting it
to some subpath.

Note that in this case, Podman will need pasta to write out a PID
file, so we need write access, for similar locations, too.

Reported-by: Jörg Sonnenberger <[email protected]>
Link: containers/buildah#5440
Link: https://bugzilla.suse.com/show_bug.cgi?id=1221840
Signed-off-by: Stefano Brivio <[email protected]>
hswong3i pushed a commit to alvistack/passt-top-passt that referenced this issue Apr 6, 2024
From an original patch by Danish Prakash.

With commit ff22a78 ("pasta: Don't try to watch namespaces in
procfs with inotify, use timer instead"), if a filesystem-bound
target namespace is passed on the command line, we'll grab a handle
on its parent directory. That commit, however, didn't introduce a
matching AppArmor rule. Add it here.

To access a network namespace procfs entry, we also need a 'ptrace'
rule. See commit 594dce6 ("isolation: keep CAP_SYS_PTRACE when
required") for details as to when we need this -- essentially, it's
about operation with Buildah.

Reported-by: Jörg Sonnenberger <[email protected]>
Link: containers/buildah#5440
Link: https://bugzilla.suse.com/show_bug.cgi?id=1221840
Fixes: ff22a78 ("pasta: Don't try to watch namespaces in procfs with inotify, use timer instead")
Signed-off-by: Stefano Brivio <[email protected]>
stale-locking-app bot locked this as resolved and limited conversation to collaborators on Jul 2, 2024