Quadlet: Kube image build failed when starting service with missing image #20432

Closed
zephyros-dev opened this issue Oct 21, 2023 · 10 comments · Fixed by #20889
Labels
kind/bug Categorizes issue or PR as related to a bug. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.

Comments

@zephyros-dev

Issue Description

podman kube play supports building an image from a Dockerfile/Containerfile located in a directory named after the image. When the image is missing, the command runs without issue when invoked manually, but it fails when run from a quadlet systemd kube service.

Steps to reproduce the issue


  1. Create an example kube file
# busybox.kube
[Kube]
PodmanArgs=--build
Yaml=/home/user/.config/containers/systemd/busybox/deployment.yaml
SetWorkingDirectory=yaml
[Service]
# Extend Timeout to allow time to pull the image
TimeoutStartSec=900
[Install]
# Start by default on boot
WantedBy=multi-user.target default.target
  2. Create deployment file
# busybox/deployment.yaml
spec:
  containers:
    - name: busybox
      image: busybox
metadata:
  name: busybox
  labels:
    app: busybox
  annotations:
    io.podman.annotations.infra.name: busybox-pod
kind: Pod
apiVersion: v1
  3. Create Containerfile for building
# busybox/busybox/Containerfile
FROM busybox
  4. Run the service
systemctl --user daemon-reload
systemctl --user start busybox
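
For convenience, the file layout from the steps above can be sketched as a small shell helper. This is only an illustration: the function name `write_repro_files` and its directory argument are hypothetical, and placing `busybox.kube` directly in the quadlet directory is an assumption based on the paths in the `Yaml=` line and the file comments.

```shell
# Hypothetical helper: writes the three reproduction files under the given
# quadlet directory (for rootless use, ~/.config/containers/systemd).
write_repro_files() {
    dir="$1"

    # The Containerfile lives in a directory named after the image,
    # next to the deployment YAML.
    mkdir -p "$dir/busybox/busybox" || return 1

    cat > "$dir/busybox.kube" <<EOF
[Kube]
PodmanArgs=--build
Yaml=$dir/busybox/deployment.yaml
SetWorkingDirectory=yaml

[Service]
# Extend Timeout to allow time to pull the image
TimeoutStartSec=900

[Install]
# Start by default on boot
WantedBy=multi-user.target default.target
EOF

    cat > "$dir/busybox/deployment.yaml" <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: busybox
  labels:
    app: busybox
  annotations:
    io.podman.annotations.infra.name: busybox-pod
spec:
  containers:
    - name: busybox
      image: busybox
EOF

    cat > "$dir/busybox/busybox/Containerfile" <<'EOF'
FROM busybox
EOF
}
```

Calling `write_repro_files "$HOME/.config/containers/systemd"` and then running the two `systemctl --user` commands from step 4 should recreate the setup described here.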

Describe the results you received

The kube quadlet service failed with the following logs, indicating failure at the image build step:
output.log

Describe the results you expected

The service builds the missing image and starts normally.

podman info output

host:
  arch: amd64
  buildahVersion: 1.32.0
  cgroupControllers:
  - cpu
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.7-2.fc38.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.7, commit: '
  cpuUtilization:
    idlePercent: 94.81
    systemPercent: 2.17
    userPercent: 3.02
  cpus: 8
  databaseBackend: boltdb
  distribution:
    distribution: fedora
    variant: server
    version: "38"
  eventLogger: journald
  freeLocks: 1905
  hostname: server
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  kernel: 6.5.7-200.fc38.x86_64
  linkmode: dynamic
  logDriver: journald
  memFree: 558485504
  memTotal: 32322445312
  networkBackend: netavark
  networkBackendInfo:
    backend: netavark
    dns:
      package: aardvark-dns-1.8.0-1.fc38.x86_64
      path: /usr/libexec/podman/aardvark-dns
      version: aardvark-dns 1.8.0
    package: netavark-1.8.0-2.fc38.x86_64
    path: /usr/libexec/podman/netavark
    version: netavark 1.8.0
  ociRuntime:
    name: crun
    package: crun-1.9.2-1.fc38.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.9.2
      commit: 35274d346d2e9ffeacb22cc11590b0266a23d634
      rundir: /run/user/1000/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +LIBKRUN +WASM:wasmedge +YAJL
  os: linux
  pasta:
    executable: /usr/bin/pasta
    package: passt-0^20231004.gf851084-1.fc38.x86_64
    version: |
      pasta 0^20231004.gf851084-1.fc38.x86_64
      Copyright Red Hat
      GNU General Public License, version 2 or later
        <https://www.gnu.org/licenses/old-licenses/gpl-2.0.html>
      This is free software: you are free to change and redistribute it.
      There is NO WARRANTY, to the extent permitted by law.
  remoteSocket:
    exists: true
    path: /run/user/1000/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.2.1-1.fc38.x86_64
    version: |-
      slirp4netns version 1.2.1
      commit: 09e31e92fa3d2a1d3ca261adaeb012c8d75a8194
      libslirp: 4.7.0
      SLIRP_CONFIG_VERSION_MAX: 4
      libseccomp: 2.5.3
  swapFree: 8588619776
  swapTotal: 8589930496
  uptime: 3h 30m 56.00s (Approximately 0.12 days)
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - docker.io
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - quay.io
store:
  configFile: /home/user/.config/containers/storage.conf
  containerStore:
    number: 85
    paused: 0
    running: 85
    stopped: 0
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /home/user/.local/share/containers/storage
  graphRootAllocated: 510405902336
  graphRootUsed: 85886656512
  graphStatus:
    Backing Filesystem: btrfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Supports shifting: "false"
    Supports volatile: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 36
  runRoot: /run/user/1000/containers
  transientStore: false
  volumePath: /home/user/.local/share/containers/storage/volumes
version:
  APIVersion: 4.7.1
  Built: 1696527292
  BuiltTime: Fri Oct  6 00:34:52 2023
  GitCommit: ""
  GoVersion: go1.20.8
  Os: linux
  OsArch: linux/amd64
  Version: 4.7.1

Podman in a container

No

Privileged Or Rootless

None

Upstream Latest Release

Yes

Additional environment details


Additional information

  • In the example I used rootless podman, but the exact same issue is also found in rootful mode.
  • If the image is present beforehand (manually or using kube play command), the service runs normally.
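
Since the service runs normally once the image exists, one possible stopgap is to build the image manually before starting the unit. The sketch below is an assumption-laden illustration, not a confirmed fix: `prebuild_if_missing` and the `PODMAN` override variable are hypothetical names, and it assumes `podman kube play --build` resolves the `<image>/Containerfile` directory relative to the current working directory (which is why it changes into the YAML's directory first).

```shell
# Hypothetical helper: build the image by hand only if it is not in local storage.
# PODMAN can be overridden (e.g. for a dry run); it defaults to "podman".
prebuild_if_missing() {
    image="$1"
    yaml="$2"
    podman="${PODMAN:-podman}"

    # "podman image exists" exits 0 when the image is in local storage.
    if "$podman" image exists "$image"; then
        echo "image $image already present"
        return 0
    fi

    # Build via kube play from the YAML's directory, then tear the pod down
    # again so the systemd service can start it itself. The image stays.
    (
        cd "$(dirname "$yaml")" &&
        "$podman" kube play --build "$yaml" &&
        "$podman" kube down "$yaml"
    )
}
```

For the layout in this report, that would be `prebuild_if_missing busybox ~/.config/containers/systemd/busybox/deployment.yaml` followed by `systemctl --user start busybox`.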
@zephyros-dev zephyros-dev added the kind/bug Categorizes issue or PR as related to a bug. label Oct 21, 2023
@ygalblum
Contributor

@zephyros-dev thanks for bringing this up.

I've reproduced the issue and found that the root cause is a log print here: https://github.com/containers/common/blob/main/libimage/copier.go#L364.

This print occurs only when running under systemd, which is why the issue does not occur when running the command from the command line.

Furthermore, I was able to work around this issue by providing the fully qualified image name, docker.io/library/busybox:latest, instead of just busybox in the Containerfile.

@zephyros-dev
Author

@ygalblum Thanks for the fix. Regarding the workaround, though, I couldn't get it to work: using the fully qualified image name, with or without a tag or sha256 digest, still produces the same error (I deleted the image and then tried to run the service again).

@ygalblum
Contributor

@zephyros-dev You are right. I'm sorry. I guess I did not clean up my images before trying it.

@rhatdan
Member

rhatdan commented Oct 22, 2023

@ygalblum is this an issue in buildah?

@ygalblum
Contributor

@rhatdan No and yes. The issue is in containers-common, which is also imported into buildah. However, the issue itself happens only when building under systemd (actually during the pull). The fix was already merged into containers-common.


A friendly reminder that this issue had no activity for 30 days.

@ygalblum
Contributor

This issue was addressed in another repo so it was not closed automatically.

@zephyros-dev
Author

I have updated podman to 4.8.0 from the Fedora updates-testing repo, and I can report that the issue remains. Which version will the fix be released in?

@ygalblum
Contributor

@zephyros-dev You are right, this issue is not resolved. I guess I was doing something wrong when validating my fix. The failure is in the same place, and I think the problem is with the reportWriter. However, there is a very long call stack to trace before I can tell where this variable is actually set. I think @rhatdan was correct and the problem might be in buildah, but I'm not sure yet.
I'll report back with my findings. For now, I'm reopening this issue.

@ygalblum
Contributor

@zephyros-dev FYI the fix was added to v4.8.2
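
To check whether an installed podman is at or above that release, a version comparison along these lines may help. It is a minimal sketch: `version_ge` is a hypothetical helper name, and it assumes a `sort` that supports `-V` (version sort), as GNU coreutils does.

```shell
# Hypothetical helper: succeeds when version $1 is greater than or equal to $2.
version_ge() {
    # With version sort, the smaller of the two versions comes first;
    # if that is $2 (or both are equal), then $1 >= $2.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}
```

For example, `version_ge "$(podman version --format '{{.Client.Version}}')" 4.8.2 && echo "fix should be included"` compares the installed client version against v4.8.2.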

@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Mar 11, 2024
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 11, 2024