kata cleanup is not completed when --rm is used #6222

Closed
snir911 opened this issue May 14, 2020 · 8 comments
Labels
kind/bug: Categorizes issue or PR as related to a bug.
locked - please file new issue/PR: Assist humans wanting to comment on an old issue or PR with locked comments.

Comments

@snir911

snir911 commented May 14, 2020

/kind bug

Description

When running kata-containers with --rm, qemu is not terminated after the
container finishes running.

Steps to reproduce the issue:

  1. Install kata

  2. sudo podman --runtime=/usr/bin/kata-runtime run --security-opt label=disable -it --rm fedora:latest sleep 1

  3. ps aux | grep qemu

Describe the results you received:

The qemu process is still running.

Describe the results you expected:

qemu should have been terminated.

Additional information you deem important (e.g. issue happens only occasionally):

This has been happening since the ContainerStateRemoving state was added
(25cc43c).

Output of podman version:

Version:            1.9.1 (+ upstream)
RemoteAPI Version:  1
Go Version:         go1.14.2
OS/Arch:            linux/amd64

kata-runtime: 1.11
qemu: 4.2.0

Output of podman info --debug:

debug:
  compiler: gc
  gitCommit: ""
  goVersion: go1.14.2
  podmanVersion: 1.9.1
host:
  arch: amd64
  buildahVersion: 1.14.8
  cgroupVersion: v1
  conmon:
    package: conmon-2.0.15-1.fc32.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.0.15, commit: 33da5ef83bf2abc7965fc37980a49d02fdb71826'
  cpus: 2
  distribution:
    distribution: fedora
    version: "32"
  eventLogger: file
  hostname: kata-f32
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  kernel: 5.6.8-300.fc32.x86_64
  memFree: 1201999872
  memTotal: 4118786048
  ociRuntime:
    name: runc
    package: runc-1.0.0-144.dev.gite6555cc.fc32.x86_64
    path: /usr/bin/runc
    version: |-
      runc version 1.0.0-rc10+dev
      commit: fbdbaf85ecbc0e077f336c03062710435607dbf1
      spec: 1.0.1-dev
  os: linux
  rootless: true
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.0.0-1.fc32.x86_64
    version: |-
      slirp4netns version 1.0.0
      commit: a3be729152a33e692cd28b52f664defbf2e7810a
      libslirp: 4.2.0
  swapFree: 0
  swapTotal: 0
  uptime: 191h 22m 42.55s (Approximately 7.96 days)
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - registry.centos.org
  - docker.io
store:
  configFile: /home/test/.config/containers/storage.conf
  containerStore:
    number: 0
    paused: 0
    running: 0
    stopped: 0
  graphDriverName: overlay
  graphOptions:
    overlay.mount_program:
      Executable: /usr/bin/fuse-overlayfs
      Package: fuse-overlayfs-1.0.0-1.fc32.x86_64
      Version: |-
        fusermount3 version: 3.9.1
        fuse-overlayfs: version 1.0.0
        FUSE library version 3.9.1
        using FUSE kernel interface version 7.31
  graphRoot: /home/test/.local/share/containers/storage
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "false"
  imageStore:
    number: 13
  runRoot: /run/user/1000/containers
  volumePath: /home/test/.local/share/containers/storage/volumes
@openshift-ci-robot added the kind/bug label May 14, 2020
@snir911
Author

snir911 commented May 14, 2020

I was able to fix it with something like this:
https://github.com/snir911/libpod/tree/kata_cleanup

I'm not sure it's a proper fix, though. I found that the issue comes down to the order of the cleanup: qemu is not terminated whenever the runtime container cleanup (delete(ctx)) runs after the storage has been torn down. (Is that a valid order of operations?)

Any guidance will be appreciated :)

@rhatdan
Member

rhatdan commented May 14, 2020

I believe we are calling kata-runtime stop and kata-runtime kill when stopping and removing the container, and then we remove the storage. kata-runtime should be stopping the qemu process. Podman does not know anything special about kata versus crun or runc.
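
A minimal Go sketch of that uniform-interface idea; the interface and method names here are hypothetical, not libpod's actual API:

```go
package ocisketch

// OCIRuntime is a hypothetical sketch of the uniform abstraction described
// above: Podman drives every configured runtime binary (runc, crun,
// kata-runtime, ...) through the same lifecycle calls and knows nothing
// kata-specific. Names are illustrative, not libpod's real types.
type OCIRuntime interface {
	// KillContainer signals the container, e.g. by exec'ing "kata-runtime kill <id>".
	KillContainer(id string, signal uint) error
	// DeleteContainer removes the container from the runtime, e.g.
	// "kata-runtime delete <id>"; for kata this is where the qemu VM
	// would be expected to shut down.
	DeleteContainer(id string) error
}
```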

@mheon WDYT

@mheon
Member

mheon commented May 14, 2020

The suggested patch is completely unworkable; we cannot allow cleanup on running containers, as it would allow the storage of still-running containers to be unmounted.

It sounds like Kata really disagrees with us unmounting the storage first and only then removing the container from the runtime. Doing it the other way around seems like a reasonable ask from an OCI runtime, so I've added a patch (#6229) to remove the container from the runtime before unmounting its storage. If that's not enough to resolve things, this is definitely a Kata bug.
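
A minimal Go sketch of the reordering being described here, using hypothetical helper names rather than libpod's real functions:

```go
package ocisketch

// Hypothetical stand-ins for the libpod steps discussed in this thread;
// the real function names and signatures differ.
type container struct{ id string }

func (c *container) deleteFromOCIRuntime() error { return nil } // e.g. "kata-runtime delete <id>"
func (c *container) teardownStorage() error      { return nil } // unmount the container's rootfs

// removeSketch shows the order #6229 moves to: ask the OCI runtime to
// delete the container first (for kata, this is what shuts down the qemu
// VM) and only then unmount its storage. The pre-patch order tore down
// the storage first, which is what left qemu running.
func (c *container) removeSketch() error {
	if err := c.deleteFromOCIRuntime(); err != nil {
		return err
	}
	return c.teardownStorage()
}
```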

@snir911
Author

snir911 commented May 14, 2020

Does the Removing state mean the container might still be running? (In this patch I changed it so the state is set to Removing only if the container is not Running.)

#6229 won't help, as cleanupRuntime checks for the Stopped or Created state at the beginning, so it returns immediately because the state is Removing.

The problem seems to be with teardownStorage, not cleanupStorage. What is actually the difference?
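
A hedged Go sketch of the kind of early-return guard being described; the state names follow the discussion, but the surrounding code is a hypothetical approximation, not libpod's actual implementation:

```go
package ocisketch

type containerState int

const (
	stateCreated containerState = iota
	stateRunning
	stateStopped
	stateRemoving
)

// cleanupRuntimeSketch illustrates the short-circuit described above: if
// the guard only lets Stopped or Created containers through, a container
// that has already been flipped to Removing returns early, the OCI
// runtime delete never runs, and (with kata) the qemu VM is left behind.
func cleanupRuntimeSketch(state containerState, deleteFromRuntime func() error) error {
	if state != stateStopped && state != stateCreated {
		return nil // a Removing container bails out here
	}
	return deleteFromRuntime()
}
```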

@mheon
Member

mheon commented May 14, 2020

Removing means the container is in the process of being deleted. I'm really confused as to how we're getting into Removing while cleanup hasn't been completed, though; Removing should guarantee cleanup already ran...

@mheon
Member

mheon commented May 14, 2020

Never mind, I think I know what's going on here. If cleanupRuntime() happens as part of remove(), we are never in a good state to actually remove the container from the runtime if it still exists there (i.e., the container was in ContainerStateStopped).

@mheon
Member

mheon commented May 14, 2020

Pushed one more commit that might resolve this

@snir911
Author

snir911 commented May 14, 2020

Fixed by #6229.

@snir911 closed this as completed May 14, 2020
@github-actions bot added the locked - please file new issue/PR label Sep 23, 2023
@github-actions bot locked as resolved and limited conversation to collaborators Sep 23, 2023