
conmon high cpu after a container exec via the rest API #221

Closed
codemaker219 opened this issue Dec 16, 2020 · 2 comments · Fixed by #222
@codemaker219

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind bug

Description

After multiple execs on a container, conmon processes spawn that take 100% CPU for about 5 minutes.

Steps to reproduce the issue:

  1. Create a container: podman run --rm --name test -d docker.io/nginx

  2. Start the REST API service: podman system service tcp:0.0.0.0:8090 -t0

  3. Do multiple execs into the container, e.g. for i in {1..20}; do podman --remote --url tcp://127.0.0.1:8090 exec test ls; done (the loop may need to be run 2 or 3 times)

Describe the results you received:
Afterwards, some conmon processes spawn that take ~100% CPU.

(Screenshot attached: 2020-12-16 at 09:53:45)

Describe the results you expected:
Normal CPU usage :-)

Additional information you deem important (e.g. issue happens only occasionally):
It seems irrelevant which container and which command is executed, but it only happens when the exec is run about 10 times or more.

Output of podman version:

Version:      2.2.1
API Version:  2.1.0
Go Version:   go1.14.7
Built:        Thu Dec 10 13:26:48 2020
OS/Arch:      linux/amd64

Output of podman info --debug:

host:
  arch: amd64
  buildahVersion: 1.18.0
  cgroupManager: systemd
  cgroupVersion: v1
  conmon:
    package: conmon-2.0.21-1.el8.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.0.21, commit: f619ab8ef5f69bd40bb75ed64f3e1dace1815c22-dirty'
  cpus: 4
  distribution:
    distribution: '"centos"'
    version: "8"
  eventLogger: journald
  hostname: localhost.localdomain
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 4.18.0-240.1.1.el8_3.x86_64
  linkmode: dynamic
  memFree: 7358492672
  memTotal: 8145018880
  ociRuntime:
    name: runc
    package: runc-1.0.0-145.rc91.git24a3cf8.el8.x86_64
    path: /usr/bin/runc
    version: 'runc version spec: 1.0.2-dev'
  os: linux
  remoteSocket:
    path: /run/podman/podman.sock
  rootless: false
  slirp4netns:
    executable: ""
    package: ""
    version: ""
  swapFree: 5368705024
  swapTotal: 5368705024
  uptime: 24m 24.79s
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - registry.centos.org
  - docker.io
store:
  configFile: /etc/containers/storage.conf
  containerStore:
    number: 4
    paused: 0
    running: 1
    stopped: 3
  graphDriverName: overlay
  graphOptions:
    overlay.mountopt: nodev,metacopy=on
  graphRoot: /var/lib/containers/storage
  graphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "true"
  imageStore:
    number: 1
  runRoot: /var/run/containers/storage
  volumePath: /var/lib/containers/storage/volumes
version:
  APIVersion: 2.1.0
  Built: 1607624808
  BuiltTime: Thu Dec 10 13:26:48 2020
  GitCommit: ""
  GoVersion: go1.14.7
  OsArch: linux/amd64
  Version: 2.2.1

Package info (e.g. output of rpm -q podman or apt list podman):

podman-2.2.1-1.el8.x86_64

Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide?

Yes

Additional environment details (AWS, VirtualBox, physical, etc.):
Tested in a VirtualBox vm

@vrothberg
Member

Thanks for reaching out, @codemaker219!

I can reproduce it, but only on cgroups v1 (crun and runc). I cannot reproduce it on cgroups v2 (crun). @giuseppe could you have a look?

giuseppe transferred this issue from containers/podman Dec 18, 2020
giuseppe added a commit to giuseppe/conmon that referenced this issue Dec 18, 2020
now that we use a delay to call the cleanup program, we might end up
in a race where the event fd used by glib is close'd and it causes the
glib event handler to keep polling the closed file descriptor in a
tight loop.

To avoid closing files that are handled by glib, store what FDs are
opened when conmon first started and close only them.

Closes: containers#221

Signed-off-by: Giuseppe Scrivano <[email protected]>
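
For context, the approach described in the commit message can be sketched roughly like this in C. This is an illustrative sketch only, not the actual conmon change from PR #222; the function names record_initial_fds and close_initial_fds are invented for the example.

```c
/*
 * Hypothetical sketch of the fix's idea: record which file descriptors are
 * open when the process starts, and later close only those, so descriptors
 * created afterwards by glib (such as its event fd) are never touched.
 */
#include <dirent.h>
#include <stdlib.h>
#include <unistd.h>

static int *initial_fds;      /* descriptors open at startup */
static size_t n_initial_fds;

/* Call once, early in main(), before glib sets up its event sources. */
static void record_initial_fds(void)
{
    DIR *d = opendir("/proc/self/fd");
    struct dirent *ent;

    if (d == NULL)
        return;

    while ((ent = readdir(d)) != NULL) {
        char *end = NULL;
        long fd = strtol(ent->d_name, &end, 10);

        if (end == ent->d_name || *end != '\0')
            continue;                 /* skip "." and ".." */
        if (fd == dirfd(d))
            continue;                 /* skip the FD backing this DIR */

        int *tmp = realloc(initial_fds, (n_initial_fds + 1) * sizeof(*tmp));
        if (tmp == NULL)
            break;
        initial_fds = tmp;
        initial_fds[n_initial_fds++] = (int) fd;
    }
    closedir(d);
}

/*
 * Called when inherited descriptors should be dropped.  Closing only the
 * recorded FDs, instead of everything listed in /proc/self/fd at that
 * moment, leaves glib's event fd alone and avoids the tight poll loop on
 * a closed descriptor.
 */
static void close_initial_fds(void)
{
    for (size_t i = 0; i < n_initial_fds; i++)
        if (initial_fds[i] > STDERR_FILENO)
            close(initial_fds[i]);
}
```

The point of the design is that any descriptor glib opens after startup is never in the recorded set, so it cannot be closed out from under the event handler.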
@giuseppe
Member

PR here: #222

giuseppe added a commit to giuseppe/conmon that referenced this issue Dec 22, 2020