This issue was moved to a discussion.

Using podman.socket from inside container results in very long 'podman ps' latency #20270

Closed
tghansen opened this issue Oct 5, 2023 · 24 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@tghansen

tghansen commented Oct 5, 2023

Issue Description

I am running podman on an OL8 (8.8) virtual machine with three containers (using either overlay or vfs storage; it makes no difference). One of the containers maps /var/run/podman/podman/socket and uses the docker CLI to list containers, etc. This is the classic docker-in-docker model that works in docker. It also works with podman; however, the CLI degrades to VERY high latency.

time docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
985d2cd1bd34 docker.io/library/escs-proxy-manager-xpercps:231010.056.0 45 minutes ago Up 38 minutes escs-proxy-manager-xpercps-231010.056.0
0430b9346a15 docker.io/library/escs-telemetry-proxy-xpercps:231010.054.0 43 minutes ago Up 28 minutes escs-telemetry-proxy_xpercps_231010.054.0
f2c774084649 docker.io/library/escs-mgmt-proxy-xpercps:231010.056.0 41 minutes ago Up 28 minutes escs-mgmt-proxy_xpercps_231010.056.0

real 2m14.452s
user 0m0.108s
sys 0m0.041s

2-3 minutes is common. Note that this develops over a period of a few minutes, so something seems to build up inside of the podman daemon, resulting in degradation. If I run 'systemctl restart podman.socket', the problem clears up and then returns after some number of minutes.

'podman info' struggles too with same kind of latency.
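
To see how the latency builds up between restarts, it may help to sample it at a fixed interval rather than timing single commands by hand. A minimal sketch, assuming a POSIX shell and GNU date (for the %N nanosecond field); the function names elapsed_ms and sample_latency are made up for illustration:

```shell
# Run a command, discard its output, and print the elapsed wall-clock
# time in milliseconds. Assumes GNU date (supports %N).
elapsed_ms() {
    start_ns=$(date +%s%N)
    "$@" >/dev/null 2>&1
    end_ns=$(date +%s%N)
    echo $(( (end_ns - start_ns) / 1000000 ))
}

# Sample the latency of a command COUNT times, INTERVAL seconds apart.
# Prints one "HH:MM:SS latency_ms" line per sample.
sample_latency() {
    count=$1
    interval=$2
    shift 2
    i=0
    while [ "$i" -lt "$count" ]; do
        printf '%s %s\n' "$(date +%H:%M:%S)" "$(elapsed_ms "$@")"
        i=$((i + 1))
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}
```

For example, `sample_latency 30 60 docker ps --all` would record one measurement per minute for half an hour, which should show whether the delay grows steadily or jumps at some point.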

Steps to reproduce the issue

I can make this problem occur consistently with:

  1. docker load -i image.tar
  2. docker create and start container with /var/run/docker.sock mapped to container
  3. container loads several other images, starts a few containers and then does a periodic docker ps to check that they are running. This is the keepalive daemon process for all services. NOTE: it is running docker cli, not podman. Seems to work fine, except for performance degradation.
  4. after several minutes, the high latency is seen on the VM host
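
Steps 1 and 2 above could be sketched roughly as follows. The image archive, image name, and container name are hypothetical placeholders, and `run -d` is used here as a shorthand for the create-then-start sequence described in step 2:

```shell
# Load the manager image and start it with the Docker socket bind-mounted,
# so the docker CLI inside the container talks to the host's API socket.
# Set CONTAINER_CLI=podman to drive the same steps with podman.
start_manager() {
    archive=$1   # e.g. image.tar (hypothetical file name)
    image=$2     # e.g. example/manager:latest (hypothetical image name)
    cli=${CONTAINER_CLI:-docker}
    "$cli" load -i "$archive" &&
    "$cli" run -d --name manager \
        -v /var/run/docker.sock:/var/run/docker.sock \
        "$image"
}
```

Usage would be something like `start_manager image.tar example/manager:latest`; steps 3 and 4 then happen inside the container.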

Describe the results you received

When I run these steps, I observe very high latency from podman ps, podman info, etc. Note that the commands do complete, but after several minutes.

Describe the results you expected

Since we use the docker cli to monitor the health of all services, this high latency will mean we cannot manage the service keepalive in a reasonable time period.

podman info output

podman info
host:
  arch: amd64
  buildahVersion: 1.29.0
  cgroupControllers:
  - cpuset
  - cpu
  - cpuacct
  - blkio
  - memory
  - devices
  - freezer
  - net_cls
  - perf_event
  - net_prio
  - hugetlb
  - pids
  - rdma
  - misc
  cgroupManager: systemd
  cgroupVersion: v1
  conmon:
    package: conmon-2.1.6-1.module+el8.8.0+21045+adcb6a64.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.6, commit: 31a72124adb6095b6be85b27e3e481313a1cea96'
  cpuUtilization:
    idlePercent: 37.56
    systemPercent: 13.7
    userPercent: 48.74
  cpus: 2
  distribution:
    distribution: '"ol"'
    variant: server
    version: "8.8"
  eventLogger: file
  hostname: scaqan11cps01vm03.us.oracle.com
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 5.15.0-3.60.5.1.el8uek.x86_64
  linkmode: dynamic
  logDriver: k8s-file
  memFree: 655323136
  memTotal: 8026165248
  networkBackend: cni
  ociRuntime:
    name: runc
    package: runc-1.1.4-1.0.1.module+el8.8.0+21119+51f68ed8.x86_64
    path: /usr/bin/runc
    version: |-
      runc version 1.1.4
      spec: 1.0.2-dev
      go: go1.19.10
      libseccomp: 2.5.2
  os: linux
  remoteSocket:
    exists: true
    path: /run/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_SYS_CHROOT,CAP_NET_RAW,CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID
    rootless: false
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.2.0-2.module+el8.8.0+21045+adcb6a64.x86_64
    version: |-
      slirp4netns version 1.2.0
      commit: 656041d45cfca7a4176f6b7eed9e4fe6c11e8383
      libslirp: 4.4.0
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.2
  swapFree: 1045266432
  swapTotal: 1073737728
  uptime: 25h 23m 23.00s (Approximately 1.04 days)
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - container-registry.oracle.com
  - docker.io
store:
  configFile: /etc/containers/storage.conf
  containerStore:
    number: 3
    paused: 0
    running: 3
    stopped: 0
  graphDriverName: vfs
  graphOptions: {}
  graphRoot: /var/lib/containers/storage
  graphRootAllocated: 78604800000
  graphRootUsed: 8803954688
  graphStatus: {}
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 3
  runRoot: /run/containers/storage
  transientStore: false
  volumePath: /var/lib/containers/storage/volumes
version:
  APIVersion: 4.4.1
  Built: 1695376079
  BuiltTime: Fri Sep 22 09:47:59 2023
  GitCommit: ""
  GoVersion: go1.19.10
  Os: linux
  OsArch: linux/amd64
  Version: 4.4.1

Podman in a container

Yes

Privileged Or Rootless

Privileged

Upstream Latest Release

No

Additional environment details

OL8 U8 VM.

cat /etc/system-release
Oracle Linux Server release

df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 3.8G 0 3.8G 0% /dev
tmpfs 3.8G 84K 3.8G 1% /dev/shm
tmpfs 3.8G 201M 3.6G 6% /run
tmpfs 3.8G 0 3.8G 0% /sys/fs/cgroup
/dev/vda3 74G 8.3G 66G 12% /
/dev/vda5 20G 172M 20G 1% /home
/dev/vdb1 98G 1.8G 96G 2% /state
/dev/vda1 1014M 308M 707M 31% /boot
tmpfs 766M 0 766M 0% /run/user/0
shm 63M 0 63M 0% /var/lib/containers/storage/vfs-containers/985d2cd1bd3458b6fd298ae0b473d016182c5e8a4a92d7e7c603cf299fcf7480/userdata/shm
shm 63M 0 63M 0% /var/lib/containers/storage/vfs-containers/f2c77408464979fc2755b5c2bdea4855ddd2e43846fff03785ed5843b7b38dac/userdata/shm
shm 63M 0 63M 0% /var/lib/containers/storage/vfs-containers/0430b9346a1500a2b6812207cc9c1911688650d8856ca7a441e9d8ef476519e6/userdata/shm

Additional information

Any ideas of where I can look at log output to get some ideas of what is happening during this long period of latency?
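
A couple of places that might be worth checking, sketched under the assumption that the API service runs under the standard systemd units (podman.service and podman.socket) and that curl is available on the host. The function names are made up for illustration; the default socket path is the one reported by 'podman info' above:

```shell
# Show recent journal entries for the Podman API service and its socket unit.
podman_service_logs() {
    since=${1:-15 minutes ago}
    journalctl --no-pager -u podman.service -u podman.socket --since "$since"
}

# Time one container-list request sent straight to the API socket,
# bypassing the docker/podman CLI. Prints the total time in seconds.
time_api_list() {
    sock=${1:-/run/podman/podman.sock}
    curl --silent --output /dev/null --write-out '%{time_total}\n' \
        --unix-socket "$sock" 'http://d/containers/json?all=true'
}
```

If `time_api_list` is slow as well, the delay is likely on the service side rather than in the CLI. Running the service in the foreground with `podman --log-level=debug system service --time=0` may give more detail, assuming the systemd units are stopped first.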

@tghansen tghansen added the kind/bug Categorizes issue or PR as related to a bug. label Oct 5, 2023
@vrothberg
Member

Thanks for reaching out, @tghansen.

Can you please share an exact reproducer? I am a bit lost as to which socket is being used where and why Docker is being used.

@tghansen
Author

tghansen commented Oct 5, 2023

Podman is running on the OL8.8 host. podman.socket is also running, which maps /var/run/docker.sock to /run/podman/podman.socket. This should get the podman daemon running.
We load an image with docker load -i image.tar. This is an OL7 image that has docker-cli installed.
We create a container from this image with -v /var/run/docker.sock:/var/run/docker.sock set. This exposes the docker socket to the container. This is the recommended way to do docker-in-docker (see https://devopscube.com/run-docker-in-docker/) if you want to control host containers.
This container now downloads and installs additional containers, also using docker load -i downloaded_image.tar, and creates containers that do NOT have /var/run/docker.sock bound.
The container that is managing containers has a daemon thread that checks every minute that the containers it created are still running and at the desired version. This is the steady state, simply running 'docker ps --all' commands.
With this steady-state load, 'podman ps' on the host VM gets slower and slower. There is no CPU usage as reported by 'time', but the delay in running the command climbs into the minutes.

@tghansen
Author

tghansen commented Oct 5, 2023

I am hoping there is some logging from podman.socket that I can find to help diagnose what it is waiting for. The latency is seen both on the host VM and inside of the container.

@tghansen
Author

tghansen commented Oct 5, 2023

Producing the problem outside of our environment will take some work, since we need the docker-in-docker container. It may be possible to create a simple stripped down version of that container that just comes up and loops doing 'docker ps ' commands.
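
Such a stripped-down keepalive loop might look roughly like the sketch below. It assumes the docker CLI (or podman, via CONTAINER_CLI) is on the PATH inside the container; the function names and the idea of passing expected container names as arguments are illustrative, not taken from the real service:

```shell
# Print every expected container name that is not currently running.
# 'ps --format {{.Names}}' lists the names of running containers only.
missing_containers() {
    running=$("${CONTAINER_CLI:-docker}" ps --format '{{.Names}}')
    for name in "$@"; do
        printf '%s\n' "$running" | grep -qxF -- "$name" || printf '%s\n' "$name"
    done
}

# Check the expected containers every INTERVAL seconds, forever.
keepalive_loop() {
    interval=$1
    shift
    while :; do
        missing=$(missing_containers "$@")
        if [ -n "$missing" ]; then
            echo "$(date) not running: $missing"
        fi
        sleep "$interval"
    done
}
```

Running `keepalive_loop 60 svc-a svc-b` as the container's entrypoint would approximate the once-a-minute polling described earlier, with svc-a and svc-b standing in for the real container names.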

@tghansen
Author

tghansen commented Oct 5, 2023

I had considered installing the podman CLI inside of the OL7 container. If I do that, will it still use /var/run/podman/podman.socket? Running the docker CLI against podman over /var/run/docker.sock seems like a bit of an anti-pattern.
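
For reference, a podman CLI can be pointed at an API socket explicitly in remote mode. A minimal sketch, assuming a podman (or podman-remote) build inside the container that supports the --remote and --url options, and that the host socket is bind-mounted at the given path; the function name and default path are assumptions:

```shell
# List all containers through a Podman API socket in remote mode.
# The socket path defaults to the one reported by 'podman info' on the host.
remote_ps() {
    sock=${1:-/run/podman/podman.sock}
    podman --remote --url "unix://$sock" ps --all
}
```

With the existing bind mount, `remote_ps /var/run/docker.sock` would talk to the same service the docker CLI uses today.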

@vrothberg
Member

There are official podman container images, for instance quay.io/podman/stable:v4.7. You can use those to run podman-in-podman. That does not require mounting the socket (and is more secure).

Maybe there is a specific reason why you're mounting the socket into the container, but you're losing quite some security by doing so, as the containers inside will have root access to your host.

@tghansen
Author

tghansen commented Oct 5, 2023

Unfortunately, I am required to use the official Oracle base image for containers. This is also why I am stuck with OL7. This was obviously all running on docker, not podman, but is now moving to podman with the transition to OL8.

@tghansen
Author

tghansen commented Oct 5, 2023

also note that I do NOT want the containers to run inside my container, I just want to manage them from inside of the container.

@rhatdan
Member

rhatdan commented Oct 5, 2023

@vrothberg this looks like it could have been some of the fixes you did to improve latency on ps and images commands.

@tghansen Can you try with a newer podman, 4.6 or later?
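
A quick way to check whether an installed podman meets that minimum, sketched under the assumption that GNU sort (with -V version sorting) is available; the function name version_at_least is made up for illustration:

```shell
# Succeed when version $1 is at least version $2.
# Any suffix after the first '-' (e.g. "-dev-<commit>") is dropped first.
version_at_least() {
    have=${1%%-*}
    want=${2%%-*}
    lowest=$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n 1)
    [ "$lowest" = "$want" ]
}
```

For example: `version_at_least "$(podman version --format '{{.Client.Version}}')" 4.6.0 && echo "new enough"`.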

@tghansen
Author

tghansen commented Oct 5, 2023

Let me try that.

@tghansen
Author

tghansen commented Oct 5, 2023

Are there any instructions for installing newer podman releases on CentOS? So far Google turns up a bunch of non-functional instructions. Thanks.

@tghansen
Author

tghansen commented Oct 5, 2023

FYI: watchtower is a good example of exactly what I am doing. https://github.com/containrrr/watchtower
Looks like podman system service is removed from 4.7 version of podman. So I am not sure how to replace podman on OL8 with the latest 4.7 remote application.

@vrothberg
Member

Regarding OL8 specific questions, it's probably best to reach out to Oracle.

Looks like podman system service is removed from 4.7 version of podman.

Can you elaborate on that?

@tghansen
Author

tghansen commented Oct 6, 2023

I was just hoping folks had the steps to get 4.6-4.7 set up on a CentOS distribution. I found some steps for CentOS 7 with older podman versions, but since I now want to test 4.6+ on OL8 (4.6.1 is going to be included in 8.9 according to folks here at Oracle), I need to do the install manually.

@tghansen
Author

tghansen commented Oct 6, 2023

I can download the podman-remote executable. But it doesn't support 'podman system service' anymore, so when systemd tries to start podman.socket, it gets errors. So I am guessing there is a new mechanism to launch that.

@vrothberg
Member

I can download the podman-remote executable. But it doesn't support 'podman system service' anymore, so when systemd tries to start podman.socket, it gets errors. So I am guessing there is a new mechanism to launch that.

No mechanism has changed, so something else might be going on

@tghansen
Author

tghansen commented Oct 6, 2023

Interesting. podman system service returns unknown command with 4.7 and works with 4.1.1.

@Luap99
Member

Luap99 commented Oct 6, 2023

podman-remote is not podman. podman-remote is just a client that connects to the podman service.
You need the full podman, not the remote client, to start the service.

@vrothberg
Member

vrothberg commented Oct 6, 2023

Ah, I think I need to clarify the difference. podman-remote is the remote client of Podman and requires talking via a socket to a podman-system-service. Only the Linux-native (local) Podman can create the system-service.

@tghansen
Author

tghansen commented Oct 6, 2023

Got it. So, my original question: how do I build or get a prebuilt podman 4.7 binary for CentOS distributions? Do I need to build it? I am hitting lots of issues with that, but slowly working through them.

@tghansen
Author

tghansen commented Oct 6, 2023

I only see the remote binary linked to the update

@Luap99
Member

Luap99 commented Oct 6, 2023

You can try https://copr.fedorainfracloud.org/coprs/rhcontainerbot/podman-next/
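
Enabling that COPR repository on a dnf-based host might look like the sketch below. It assumes root privileges and that the repository publishes builds for the host's release; the function name is made up, and podman-next carries development builds, so this is better suited to testing than to production:

```shell
# Enable the rhcontainerbot/podman-next COPR repository and install podman
# from it. The 'copr' subcommand is provided by dnf-plugins-core.
install_podman_next() {
    dnf -y install dnf-plugins-core &&
    dnf -y copr enable rhcontainerbot/podman-next &&
    dnf -y install podman
}
```

Afterwards, `podman version` should report the newly installed build.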

@tghansen
Author

tghansen commented Oct 6, 2023

Progress.

podman version
Client: Podman Engine
Version: 4.8.0-dev-f4348bab6
API Version: 4.8.0-dev-f4348bab6
Go Version: go1.20.6
Built: Fri Oct 6 11:14:54 2023
OS/Arch: linux/amd64

@tghansen
Author

tghansen commented Oct 6, 2023

I am testing with this version and so far so good. I will do some more testing and then close the issue. Hopefully OL 8.9, which should include podman 4.6.1, behaves as well as what I am seeing so far with podman 4.8.0. Thanks for all the help.

@containers containers locked and limited conversation to collaborators Oct 6, 2023
@rhatdan rhatdan converted this issue into discussion #20289 Oct 6, 2023
