Nested image pull inside QM container is failing #278

Closed
Yarboa opened this issue Nov 16, 2023 · 15 comments · Fixed by #280
Labels
bug: Something isn't working

Comments

Yarboa (Collaborator) commented Nov 16, 2023

With Podman 4.7 and up inside a c9s deployment,

podman exec -it qm bash
bash-5.1# podman run -it quay.io/centos-sig-automotive/ffi-tools:latest

the nested run returns the following error:
Error: writing blob: storing blob to file "/var/tmp/container_images_storage738866264/1": write /var/tmp/container_images_storage738866264/1: no space left on device

podman exec -it qm bash -c "df -kh"
Filesystem  Size  Used  Avail  Use%  Mounted on
/dev/vda1    23G  5.1G    18G   23%  /
tmpfs        64M     0    64M    0%  /dev
tmpfs       887M     0   887M    0%  /tmp
/dev/vda2    23G  618M    22G    3%  /var
tmpfs       887M  140K   887M    1%  /run
tmpfs       887M     0   887M    0%  /run/lock
tmpfs       887M     0   887M    0%  /var/tmp
tmpfs       355M  9.7M   346M    3%  /etc/hosts
shm          63M   84K    63M    1%  /dev/shm
tmpfs       887M  8.0M   879M    1%  /var/log/journal

I can think of workarounds; the question is, what is the correct solution?
cat /etc/redhat-release
CentOS Stream release 9
Host RPMs:
qm-0.6.0-1.20231113152056710175.main.7.gbdc6b1f.el9.noarch
podman-4.8.0~dev-1.20231115215527013754.main.2457.ec2e533a2.el9.x86_64
Host kernel (uname -r):
5.14.0-383.el9.x86_64

The CI FFI gate is failing because of this.

Yarboa added the bug label Nov 16, 2023
Yarboa (Collaborator, Author) commented Nov 16, 2023

@rhatdan @dougsland

Yarboa (Collaborator, Author) commented Nov 16, 2023

I can think of workarounds; the question is what the correct solution is. For example:

podman exec -it qm bash -c "mkdir -p /var/container-temp ; export TMPDIR=/var/container-temp; podman info" | grep imageCopyTmpDir
  imageCopyTmpDir: /var/container-temp

Shall we add it to the QM setup?
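If we do add it to the setup, one way to make it persistent (a sketch only; the file location and value are illustrative, not what any PR here does) would be to set the image-copy temp directory in the nested Podman's containers.conf instead of exporting TMPDIR per shell. The nested Podman's imageCopyTmpDir defaults to /var/tmp, which inside QM is a small tmpfs (see the df output above), so pointing it at the persistent /var mount avoids the "no space left on device" error:

# /etc/containers/containers.conf inside the QM rootfs (illustrative location)
[engine]
# Use a writable, non-tmpfs directory for image copy scratch space
image_copy_tmp_dir = "/var/container-temp"

The directory itself would still need to be created on /var, as in the mkdir -p step above.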

Yarboa added a series of commits to Yarboa/qm that referenced this issue Nov 16, 2023
Resolve containers#278

Signed-off-by: Yariv Rachmani <[email protected]>
Yarboa (Collaborator, Author) commented Nov 16, 2023

PR #280 does not resolve this one
https://artifacts.dev.testing-farm.io/9d310c51-8364-4978-a696-1dc4fb8de98d/work-ffimfngm2uf/log.txt

Error: failed to get new shm lock manager: failed to create 2048 locks in /libpod_lock: read-only file system

This happens inside qm while pulling the ffi-tools image.

One more detail: it happens on a Testing Farm AWS instance and is not reproduced in a c9s VM.

rhatdan (Member) commented Nov 18, 2023

Is /dev/shm read-only inside the container? Looks like something is set up incorrectly.

Yarboa (Collaborator, Author) commented Nov 20, 2023

That is not the issue; it was failing with the same error even when I remounted it read-write:
podman exec -it qm bash -c "mount -o remount,rw /dev/shm"

I will run the test on a blank c9s AWS image without partitions to see whether the issue persists there.

Yarboa (Collaborator, Author) commented Nov 20, 2023

OK, as I suspected, it is an issue with the AWS and qm setup; I removed the /var/qm partition to verify the root cause.

Added a parameter for adding the disk partition; the default is no.
https://github.com/containers/qm/pull/280/files#diff-ed4cd590d534395a418d961acd614ebff97ed5889b9ca785094dd0a10d27863dR48

The error still occurs. Will check with Testing Farm:

https://artifacts.dev.testing-farm.io/c209b61e-87bb-4027-986d-3037cdc19854/work-ffiai68i_jy/log.txt

Yarboa (Collaborator, Author) commented Nov 20, 2023

Reproduced in Testing Farm with a reserved machine. Inside the QM:

podman images
Error: failed to get new shm lock manager: failed to create 2048 locks in /libpod_lock: read-only file system
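My reading of the error (an inference, not taken from the log): the SHM lock manager creates /libpod_lock with shm_open(), which glibc backs with a file under /dev/shm, so a read-only /dev/shm inside QM would produce exactly this failure. A quick check from the host, as a sketch:

podman exec -it qm sh -c 'grep " /dev/shm " /proc/mounts'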

Yarboa (Collaborator, Author) commented Nov 20, 2023

@rhatdan
OK, I ran the tests manually. The default mount options of /dev/shm are:
shm on /dev/shm type tmpfs (ro,nosuid,nodev,noexec,relatime,rootcontext=system_u:object_r:qm_file_t:s0,seclabel,size=64000k,inode64)

After every qm restart it returns to this state. Is it a bug?

[root@ip-172-31-16-223 ~]# podman exec -it qm bash -c "mount -o remount,rw /dev/shm"
[root@ip-172-31-16-223 ~]# podman exec -it qm podman info
host:
  arch: amd64
  buildahVersion: 1.32.0
  cgroupControllers:
  - cpuset
  - cpu
  - io
  - memory
  - hugetlb
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.8-1.el9.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.8, commit: 65953271fc1e506ac4eb890c645f3f75976973b4'
  cpuUtilization:
    idlePercent: 96.48
    systemPercent: 0.79
    userPercent: 2.73
  cpus: 1
  databaseBackend: boltdb
  distribution:
    distribution: centos
    version: "9"
  eventLogger: journald
  freeLocks: 2048
  hostname: ip-172-31-16-223.us-east-2.compute.internal
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 5.14.0-383.el9.x86_64
  linkmode: dynamic
  logDriver: journald
  memFree: 1072787456
  memTotal: 3730722816
  networkBackend: netavark
  networkBackendInfo:
    backend: netavark
    dns:
      package: aardvark-dns-1.8.0-1.el9.x86_64
      path: /usr/libexec/podman/aardvark-dns
      version: aardvark-dns 1.8.0
    package: netavark-1.8.0-3.el9.x86_64
    path: /usr/libexec/podman/netavark
    version: netavark 1.8.0
  ociRuntime:
    name: crun
    package: crun-1.11.2-1.el9.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.11.2
      commit: ab0edeef1c331840b025e8f1d38090cfb8a0509d
      rundir: /run/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  pasta:
    executable: ""
    package: ""
    version: ""
  remoteSocket:
    exists: false
    path: /run/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: false
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.2.2-1.el9.x86_64
    version: |-
      slirp4netns version 1.2.2
      commit: 0ee2d87523e906518d34a6b423271e4826f71faf
      libslirp: 4.4.0
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.2
  swapFree: 0
  swapTotal: 0
  uptime: 0h 57m 8.00s
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - registry.access.redhat.com
  - registry.redhat.io
  - docker.io
store:
  configFile: /etc/containers/storage.conf
  containerStore:
    number: 0
    paused: 0
    running: 0
    stopped: 0
  graphDriverName: overlay
  graphOptions:
    overlay.mountopt: nodev,metacopy=on
  graphRoot: /var/lib/containers/storage
  graphRootAllocated: 53617864704
  graphRootUsed: 8977350656
  graphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Supports shifting: "false"
    Supports volatile: "true"
    Using metacopy: "true"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 0
  runRoot: /run/containers/storage
  transientStore: false
  volumePath: /var/lib/containers/storage/volumes
version:
  APIVersion: 4.7.2
  Built: 1699861154
  BuiltTime: Mon Nov 13 07:39:14 2023
  GitCommit: ""
  GoVersion: go1.21.3
  Os: linux
  OsArch: linux/amd64
  Version: 4.7.2

rhatdan (Member) commented Nov 20, 2023

Certainly looks like a bug. What does qm.service show for the podman command?

Yarboa (Collaborator, Author) commented Nov 20, 2023

Sure @rhatdan

ExecStart=/usr/bin/podman run --name=qm --cidfile=%t/%N.cid --replace --rm --cgroups=split --tz=local --network=host --sdnotify=conmon -d --security-opt label=type:qm_t --security-opt label=filetype:qm_file_t --security-opt label=level:s0 --device=/dev/fuse --cap-add=all --read-only --read-only-tmpfs=false -v ${RWETCFS}:/etc -v ${RWVARFS}:/var --pids-limit=-1 --security-opt label=nested --security-opt unmask=all --rootfs ${ROOTFS} /sbin/init

Yarboa (Collaborator, Author) commented Nov 20, 2023

@dougsland
What does this comment mean?
https://github.com/containers/qm/blob/main/qm.container#L36-L39
Can we bring back this service attribute in 4.7.2?

rhatdan (Member) commented Nov 20, 2023

--read-only-tmpfs=false is the problem.

Yarboa (Collaborator, Author) commented Nov 20, 2023

Yes, I see, although I have to admit that name is confusing.
I will propose a fix for that; running without the flag gave the correct mount:
shm on /dev/shm type tmpfs (rw,nosuid,nodev,noexec,relatime,rootcontext=system_u:object_r:qm_file_t:s0,seclabel,size=64000k,inode64)

rhatdan (Member) commented Nov 20, 2023

Yes, I agree, and internally in the code it is labeled ReadWriteTmpfs. Most users should never touch that flag. The basic idea of the flag is to let users configure the system so that processes within the container can write nowhere, or only to volumes mounted into the container.
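To illustrate the difference outside of QM (a sketch; the alpine image is just an example and the exact set of affected mounts follows the podman-run documentation): with the default --read-only-tmpfs=true, a --read-only container still gets writable tmpfs mounts on paths such as /tmp and /dev/shm, while with --read-only-tmpfs=false those paths stay on the read-only root or, as seen above, are mounted read-only:

podman run --rm --read-only alpine \
    sh -c 'grep -E " /(tmp|dev/shm) " /proc/mounts'
podman run --rm --read-only --read-only-tmpfs=false alpine \
    sh -c 'grep -E " /(tmp|dev/shm) " /proc/mounts'

The second form corresponds to the qm.container ExecStart shown earlier, where /dev/shm ends up ro.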

Yarboa (Collaborator, Author) commented Nov 20, 2023

Found this issue; adding 'VolatileTmp=true':
containers/podman#20439
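For reference, a minimal sketch of how that could look in a Quadlet .container unit; the keys and paths below are illustrative and not necessarily what qm.container ends up using. VolatileTmp=true asks Quadlet to mount a writable tmpfs on /tmp even when the container runs read-only:

[Container]
# Illustrative values only
Rootfs=/usr/lib/qm/rootfs
ReadOnly=true
VolatileTmp=true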

dougsland pushed a commit that referenced this issue Nov 27, 2023
Resolve #278

Signed-off-by: Yariv Rachmani <[email protected]>