podman hangs when starting mariadb #4344

Closed
codeling opened this issue Oct 25, 2019 · 8 comments · Fixed by #4438
Labels
kind/bug Categorizes issue or PR as related to a bug. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.

Comments

@codeling

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind bug

Description
podman hasn't been starting my mariadb 10.4 container since yesterday (running podman 1.6.2 on Ubuntu 18.04.3). podman just hangs, both when simply starting the existing container and when removing it and running it again. I suspect this is because podman was updated recently (from 1.6.1 to 1.6.2 on 2019-10-21); the problem appeared after the first server restart since then.

Steps to reproduce the issue:

  1. Run podman --log-level=debug run -d --name mariadb-10.4 -p 3306:3306 -v mariadbdata:/var/lib/mysql --restart=always mariadb:10.4 (as non-root)
  2. podman hangs at a slirp4netns execution:
DEBU[0000] using conmon: "/usr/bin/conmon"              
DEBU[0000] Initializing boltdb state at /home/codeling/.local/share/containers/storage/libpod/bolt_state.db 
DEBU[0000] Using graph driver vfs                       
DEBU[0000] Using graph root /home/codeling/.local/share/containers/storage 
DEBU[0000] Using run root /run/user/1000                
DEBU[0000] Using static dir /home/codeling/.local/share/containers/storage/libpod 
DEBU[0000] Using tmp dir /run/user/1000/libpod/tmp      
DEBU[0000] Using volume path /home/codeling/.local/share/containers/storage/volumes 
DEBU[0000] Set libpod namespace to ""                   
DEBU[0000] [graphdriver] trying provided driver "vfs"   
DEBU[0000] Initializing event backend journald          
DEBU[0000] using runtime "/usr/lib/cri-o-runc/sbin/runc" 
WARN[0000] Error initializing configured OCI runtime crun: no valid executable found for OCI runtime crun: invalid argument 
INFO[0000] running as rootless                          
DEBU[0000] using conmon: "/usr/bin/conmon"              
DEBU[0000] Initializing boltdb state at /home/codeling/.local/share/containers/storage/libpod/bolt_state.db 
DEBU[0000] Using graph driver vfs                       
DEBU[0000] Using graph root /home/codeling/.local/share/containers/storage 
DEBU[0000] Using run root /run/user/1000                
DEBU[0000] Using static dir /home/codeling/.local/share/containers/storage/libpod 
DEBU[0000] Using tmp dir /run/user/1000/libpod/tmp      
DEBU[0000] Using volume path /home/codeling/.local/share/containers/storage/volumes 
DEBU[0000] Set libpod namespace to ""                   
DEBU[0000] Initializing event backend journald          
DEBU[0000] using runtime "/usr/lib/cri-o-runc/sbin/runc" 
WARN[0000] Error initializing configured OCI runtime crun: no valid executable found for OCI runtime crun: invalid argument 
DEBU[0000] parsed reference into "[vfs@/home/codeling/.local/share/containers/storage+/run/user/1000]docker.io/library/mariadb:10.4" 
DEBU[0000] parsed reference into "[vfs@/home/codeling/.local/share/containers/storage+/run/user/1000]@a9e108e8ee8a9076bfc5e327f68aeba93a7625d7e6c1d3de1f0ee4c5a3cdc134" 
DEBU[0000] exporting opaque data as blob "sha256:a9e108e8ee8a9076bfc5e327f68aeba93a7625d7e6c1d3de1f0ee4c5a3cdc134" 
DEBU[0000] parsed reference into "[vfs@/home/codeling/.local/share/containers/storage+/run/user/1000]@a9e108e8ee8a9076bfc5e327f68aeba93a7625d7e6c1d3de1f0ee4c5a3cdc134" 
DEBU[0000] exporting opaque data as blob "sha256:a9e108e8ee8a9076bfc5e327f68aeba93a7625d7e6c1d3de1f0ee4c5a3cdc134" 
DEBU[0000] parsed reference into "[vfs@/home/codeling/.local/share/containers/storage+/run/user/1000]@a9e108e8ee8a9076bfc5e327f68aeba93a7625d7e6c1d3de1f0ee4c5a3cdc134" 
DEBU[0000] User mount mariadbdata:/var/lib/mysql options [] 
DEBU[0000] No hostname set; container's hostname will default to runtime default 
DEBU[0000] Using slirp4netns netmode                    
DEBU[0000] setting container name mariadb-10.4          
DEBU[0000] created OCI spec and options for new container 
DEBU[0000] Allocated lock 0 for container 6b000910645550b576ac72f9fadc429dc8fa636cfdbf71e1ca282f3eb682099f 
DEBU[0000] parsed reference into "[vfs@/home/codeling/.local/share/containers/storage+/run/user/1000]@a9e108e8ee8a9076bfc5e327f68aeba93a7625d7e6c1d3de1f0ee4c5a3cdc134" 
DEBU[0000] exporting opaque data as blob "sha256:a9e108e8ee8a9076bfc5e327f68aeba93a7625d7e6c1d3de1f0ee4c5a3cdc134" 
DEBU[0002] created container "6b000910645550b576ac72f9fadc429dc8fa636cfdbf71e1ca282f3eb682099f" 
DEBU[0002] container "6b000910645550b576ac72f9fadc429dc8fa636cfdbf71e1ca282f3eb682099f" has work directory "/home/codeling/.local/share/containers/storage/vfs-containers/6b000910645550b576ac72f9fadc429dc8fa636cfdbf71e1ca282f3eb682099f/userdata" 
DEBU[0002] container "6b000910645550b576ac72f9fadc429dc8fa636cfdbf71e1ca282f3eb682099f" has run directory "/run/user/1000/vfs-containers/6b000910645550b576ac72f9fadc429dc8fa636cfdbf71e1ca282f3eb682099f/userdata" 
DEBU[0002] New container created "6b000910645550b576ac72f9fadc429dc8fa636cfdbf71e1ca282f3eb682099f" 
DEBU[0002] container "6b000910645550b576ac72f9fadc429dc8fa636cfdbf71e1ca282f3eb682099f" has CgroupParent "/libpod_parent/libpod-6b000910645550b576ac72f9fadc429dc8fa636cfdbf71e1ca282f3eb682099f" 
DEBU[0002] Made network namespace at /run/user/1000/netns/cni-c4e3e1ed-87a4-366a-0c6c-e6ba48cbf90e for container 6b000910645550b576ac72f9fadc429dc8fa636cfdbf71e1ca282f3eb682099f 
DEBU[0002] mounted container "6b000910645550b576ac72f9fadc429dc8fa636cfdbf71e1ca282f3eb682099f" at "/home/codeling/.local/share/containers/storage/vfs/dir/9bb25892b5f5e057f9484a34683e3d88471c1d5da622a422c05561c454a028d7" 
DEBU[0002] slirp4netns command: /usr/bin/slirp4netns --api-socket /run/user/1000/libpod/tmp/6b000910645550b576ac72f9fadc429dc8fa636cfdbf71e1ca282f3eb682099f.net --disable-host-loopback --mtu 65520 --enable-sandbox -c -e 3 -r 4 --netns-type=path /run/user/1000/netns/cni-c4e3e1ed-87a4-366a-0c6c-e6ba48cbf90e tap0

Describe the results you received:
podman hangs.

Describe the results you expected:
podman continues and starts up the container.

Additional information you deem important (e.g. issue happens only occasionally):
The container ran just fine before (probably this started with an upgrade to 1.6.2, or maybe earlier with 1.6.1 already). Another rootless port-exposed container (redis) still runs just fine.

Output of podman version:

Version:            1.6.2
RemoteAPI Version:  1
Go Version:         go1.10.4
OS/Arch:            linux/amd64

Output of podman info --debug:

debug:
  compiler: gc
  git commit: ""
  go version: go1.10.4
  podman version: 1.6.2
host:
  BuildahVersion: 1.11.3
  CgroupVersion: v1
  Conmon:
    package: 'conmon: /usr/bin/conmon'
    path: /usr/bin/conmon
    version: 'conmon version 2.0.2, commit: 65fe0226d85b69fc9e527e376795c9791199153d'
  Distribution:
    distribution: ubuntu
    version: "18.04"
  IDMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  MemFree: 980115456
  MemTotal: 8152612864
  OCIRuntime:
    name: runc
    package: 'cri-o-runc: /usr/lib/cri-o-runc/sbin/runc'
    path: /usr/lib/cri-o-runc/sbin/runc
    version: 'runc version spec: 1.0.1-dev'
  SwapFree: 1022619648
  SwapTotal: 1023406080
  arch: amd64
  cpus: 4
  eventlogger: journald
  hostname: baldur
  kernel: 5.0.0-32-generic
  os: linux
  rootless: true
  slirp4netns:
    Executable: /usr/bin/slirp4netns
    Package: 'slirp4netns: /usr/bin/slirp4netns'
    Version: |-
      slirp4netns version 0.4.2
      commit: unknown
  uptime: 36h 46m 31.28s (Approximately 1.50 days)
registries:
  blocked: null
  insecure: null
  search:
  - docker.io
  - registry.fedoraproject.org
  - registry.access.redhat.com
store:
  ConfigFile: /home/codeling/.config/containers/storage.conf
  ContainerStore:
    number: 3
  GraphDriverName: vfs
  GraphOptions: {}
  GraphRoot: /home/codeling/.local/share/containers/storage
  GraphStatus: {}
  ImageStore:
    number: 4
  RunRoot: /run/user/1000
  VolumePath: /home/codeling/.local/share/containers/storage/volumes

(conmon was updated to version 2.0.2 manually, as it was suggested on the #crio channel on the Kubernetes Slack that this might fix the issue.)

Package info (e.g. output of rpm -q podman or apt list podman):

podman/bionic,now 1.6.2-1~ubuntu18.04~ppa1 amd64  [installed]

Additional environment details (AWS, VirtualBox, physical, etc.):
Ubuntu 18.04.3 LTS on a Gigabyte Brix GB-BACE 3150, 8GB RAM

@openshift-ci-robot openshift-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Oct 25, 2019
@codeling
Author

While podman hangs, a parallel podman ps also hangs. So this basically seems to freeze all podman interactions completely.
Does anyone have an idea?
In the meantime I tried running the container without the volume mapping and without --restart=always; the container then seems to start, but it dies immediately for an unknown reason (journalctl only says the container died). So maybe there's something wrong with the image?

When aborting the start (Ctrl+C) and then removing the container (via podman container rm mariadb-10.4), the output is:

Error: error removing container 6b00... root filesystem: remove /home/codeling/.local/share/containers/storage/vfs-containers/6b000.../userdata/shm: device or resource busy

(name abbreviated for better readability).
Yet the container gets removed all the same (at least podman ps -a doesn't show it anymore, and a container of the same name can be re-created).

@mheon
Member

mheon commented Oct 27, 2019 via email

@giuseppe
Member

Can you show what processes are running? The ps aux output would be helpful.

@jmigot-tehtris

jmigot-tehtris commented Oct 30, 2019

Hello, I can confirm the same problem here, which seems related to #2942.

Without port redirection, the container starts just fine.

ps aux result while hanging:

user 2642  0.0  0.8 932616 35424 pts/1    Sl+  15:48   0:00  | \_ podman run -p 9000:9000 --log-level=debug --rm -it --entrypoint /bin/bash 3a359a493bcc
user 2653  0.7  1.0 1094104 41232 pts/1   Sl+  15:48   0:06  |     \_ podman run -p 9000:9000 --log-level=debug --rm -it --entrypoint /bin/bash 3a359a493bcc
user 2656  0.0  0.0      0     0 ?        Zs   15:48   0:00  |         \_ [podman] <defunct>
user 2678  0.0  0.0   2468  1520 pts/1    S    15:48   0:00  |         \_ slirp4netns --api-socket /home/user/run/libpod/432b[...]9a0dc.net --disable-host-loopback --mtu 65520 --enable-sandbox -c -e 3 -r 4 --netns-type=path /tmp/run-1000/netns/cni-8fcdd5e8-e2cf-efbb-7376-fb4c033ec44a tap0

While it hangs, the file given to --api-socket does not exist.

I tried running slirp4netns manually, following https://github.com/rootless-containers/slirp4netns#usage and adding an --api-socket file, and everything works fine.

@jmigot-tehtris

After investigation, the problem is that the "sun_path" field of "struct sockaddr_un" is limited to 108 chars, so our api-socket path was truncated by str_dup() and passed as-is to bind().

I think libpod could check the path length and print a warning (not an error), but of course slirp4netns should too. Apparently some systems are limited to 92 chars.

The OP's problem doesn't seem to be related to path length, but the cause could be that the api-socket file can't be created for another reason.

@mheon
Member

mheon commented Oct 31, 2019

Ahhh. We have handling for that on other sockets (attach socket), but evidently not the slirp socket.

@giuseppe PTAL

@rhatdan
Member

rhatdan commented Oct 31, 2019

@AkihiroSuda FYI

giuseppe added a commit to giuseppe/libpod that referenced this issue Nov 4, 2019
the pidWaitTimeout is already a Duration so do not multiply it again
by time.Millisecond.

Closes: containers#4344

Signed-off-by: Giuseppe Scrivano <[email protected]>
@giuseppe
Member

giuseppe commented Nov 4, 2019

opened two PRs:

#4438
rootless-containers/slirp4netns#158

@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 23, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 23, 2023