Compat API endpoints for events and container logs with `follow=true` do not send response immediately like Docker #23712

jjzxcd · 2024-08-21T21:00:46Z

Issue Description

It seems that the Podman compat API does not immediately return an HTTP response when querying /events or /containers/${id}/logs?follow=true&stdout=true&stderr=true if no events or logs are produced yet. In this case, the client cannot determine whether the request was successful until some data is received.
For example, I use Vector, which in turn uses the Bollard library to interact with the Docker API, to collect logs from Docker and Podman. When Vector starts fetching system events and container logs using the Podman 5.2.1 compat API via the socket unix:///run/podman/podman.sock, the requests fail with error RequestTimeoutError after a 2 minute timeout if no response is received from Podman, causing Vector to stop monitoring the container. At the same time, everything works fine with Docker.

Vector logs:

vector[19350]: 2024-08-21T17:05:19.072757Z  INFO source{component_kind="source" component_id=src_podman component_type=docker_logs}: vector::sources::docker_logs: Capturing logs from now on. now=2024-08-21T17:05:19.072742638+00:00

vector[19350]: 2024-08-21T17:05:19.072925Z  INFO source{component_kind="source" component_id=src_podman component_type=docker_logs}: vector::sources::docker_logs: Listening to docker log events.

vector[19350]: 2024-08-21T17:05:19.114233Z  INFO source{component_kind="source" component_id=src_podman component_type=docker_logs}: vector::internal_events::docker_logs: Started watching for container logs. container_id=6c7b2459bcb71617e192003e88955527b738bb20dc4b6b5cbbc798112b45b357

vector[19350]: 2024-08-21T17:07:19.098385Z ERROR source{component_kind="source" component_id=src_podman component_type=docker_logs}: vector::internal_events::docker_logs: Error in communication with Docker daemon. error=RequestTimeoutError error_type="connection_failed" stage="receiving" container_id=None internal_log_rate_limit=true

vector[19350]: 2024-08-21T17:07:19.115508Z ERROR source{component_kind="source" component_id=src_podman component_type=docker_logs}: vector::internal_events::docker_logs: Internal log [Error in communication with Docker daemon.] is being suppressed to avoid flooding.

vector[19350]: 2024-08-21T17:07:19.115597Z  INFO source{component_kind="source" component_id=src_podman component_type=docker_logs}: vector::internal_events::docker_logs: Stopped watching for container logs. container_id=6c7b2459bcb71617e192003e88955527b738bb20dc4b6b5cbbc798112b45b357

Steps to reproduce the issue

Run any container that does not produce logs: id=$(podman run -d --rm alpine sleep infinity)
Get events: curl -v --unix-socket /run/podman/podman.sock "http://d/events" -o -
Get container logs: curl -v --unix-socket /run/podman/podman.sock "http://d/containers/${id}/logs?follow=true&stdout=true&stderr=true" -o -

Describe the results you received

There is no HTTP response if no events or logs are produced:

# curl -v --unix-socket /run/podman/podman.sock "http://d/events" -o -
*   Trying /run/podman/podman.sock:0...
* Connected to d (/run/podman/podman.sock) port 80
> GET /events HTTP/1.1
> Host: d
> User-Agent: curl/8.6.0
> Accept: */*
>
^C

# curl -v --unix-socket /run/podman/podman.sock "http://d/containers/${id}/logs?follow=true&stdout=true&stderr=true" -o -
*   Trying /run/podman/podman.sock:0...
* Connected to d (/run/podman/podman.sock) port 80
> GET /containers/6c7b2459bcb71617e192003e88955527b738bb20dc4b6b5cbbc798112b45b357/logs?follow=true&stdout=true&stderr=true HTTP/1.1
> Host: d
> User-Agent: curl/8.6.0
> Accept: */*
>
^C

Describe the results you expected

For events endpoint

Receive the HTTP response even if there are no logs or events.

Like with Podman 5.0.3:

# curl -v --unix-socket /run/podman/podman.sock "http://d/events" -o -
*   Trying /run/podman/podman.sock:0...
* Connected to d (/run/podman/podman.sock) port 80
> GET /events HTTP/1.1
> Host: d
> User-Agent: curl/8.6.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Api-Version: 1.41
< Content-Type: application/json
< Libpod-Api-Version: 5.0.3
< Server: Libpod/5.0.3 (linux)
< X-Reference-Id: 0xc00006a338
< Date: Wed, 21 Aug 2024 19:16:59 GMT
< Transfer-Encoding: chunked
<
^C

Like with Docker:

# curl -v --unix-socket /run/docker.sock "http://d/events" -o -
*   Trying /run/docker.sock:0...
* Connected to d (/run/docker.sock) port 80
> GET /events HTTP/1.1
> Host: d
> User-Agent: curl/8.6.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Api-Version: 1.46
< Content-Type: application/json
< Docker-Experimental: false
< Ostype: linux
< Server: Docker/27.1.2 (linux)
< Date: Wed, 21 Aug 2024 20:08:13 GMT
< Transfer-Encoding: chunked
<
^C

For container logs endpoint

Recevice the HTTP response even if there are no logs or events.

Like with Docker:

# curl -v --unix-socket /run/docker.sock "http://d/containers/${id}/logs?follow=true&stdout=true&stderr=true" -o -
*   Trying /run/docker.sock:0...
* Connected to d (/run/docker.sock) port 80
> GET /containers/c60d242591740394e4ef8f2dd9a4264a931dffd9f357d0adb2b1e53dd0024196/logs?follow=true&stdout=true&stderr=true HTTP/1.1
> Host: d
> User-Agent: curl/8.6.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Api-Version: 1.46
< Content-Type: application/vnd.docker.multiplexed-stream
< Docker-Experimental: false
< Ostype: linux
< Server: Docker/27.1.2 (linux)
< Date: Wed, 21 Aug 2024 20:10:30 GMT
< Transfer-Encoding: chunked
<
^C

podman info output

host:
  arch: amd64
  buildahVersion: 1.37.1
  cgroupControllers:
  - cpuset
  - cpu
  - io
  - memory
  - hugetlb
  - pids
  cgroupManager: cgroupfs
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.10-1.fc40.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.10, commit: '
  cpuUtilization:
    idlePercent: 99.44
    systemPercent: 0.27
    userPercent: 0.29
  cpus: 2
  databaseBackend: sqlite
  distribution:
    distribution: fedora
    variant: container
    version: "40"
  eventLogger: file
  freeLocks: 2048
  hostname: 2659a687fd77
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 5.15.0-208.159.3.2.el9uek.x86_64
  linkmode: dynamic
  logDriver: k8s-file
  memFree: 1310531584
  memTotal: 3624329216
  networkBackend: netavark
  networkBackendInfo:
    backend: netavark
    dns:
      package: aardvark-dns-1.12.1-1.fc40.x86_64
      path: /usr/libexec/podman/aardvark-dns
      version: aardvark-dns 1.12.1
    package: netavark-1.12.1-1.fc40.x86_64
    path: /usr/libexec/podman/netavark
    version: netavark 1.12.1
  ociRuntime:
    name: crun
    package: crun-1.15-1.fc40.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.15
      commit: e6eacaf4034e84185fd8780ac9262bbf57082278
      rundir: /run/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +LIBKRUN +WASM:wasmedge +YAJL
  os: linux
  pasta:
    executable: /usr/bin/pasta
    package: passt-0^20240814.g61c0b0d-1.fc40.x86_64
    version: |
      pasta 0^20240814.g61c0b0d-1.fc40.x86_64-pasta
      Copyright Red Hat
      GNU General Public License, version 2 or later
        <https://www.gnu.org/licenses/old-licenses/gpl-2.0.html>
      This is free software: you are free to change and redistribute it.
      There is NO WARRANTY, to the extent permitted by law.
  remoteSocket:
    exists: false
    path: /run/podman/podman.sock
  rootlessNetworkCmd: pasta
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: false
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: false
  slirp4netns:
    executable: ""
    package: ""
    version: ""
  swapFree: 2147479552
  swapTotal: 2147479552
  uptime: 27h 22m 32.00s (Approximately 1.12 days)
  variant: ""
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - docker.io
store:
  configFile: /etc/containers/storage.conf
  containerStore:
    number: 0
    paused: 0
    running: 0
    stopped: 0
  graphDriverName: overlay
  graphOptions:
    overlay.imagestore: /usr/lib/containers/storage
    overlay.mount_program:
      Executable: /usr/bin/fuse-overlayfs
      Package: fuse-overlayfs-1.13-1.fc40.x86_64
      Version: |-
        fusermount3 version: 3.16.2
        fuse-overlayfs: version 1.13-dev
        FUSE library version 3.16.2
        using FUSE kernel interface version 7.38
    overlay.mountopt: nodev,fsync=0
  graphRoot: /var/lib/containers/storage
  graphRootAllocated: 21027463168
  graphRootUsed: 10851258368
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Supports shifting: "true"
    Supports volatile: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 0
  runRoot: /run/containers/storage
  transientStore: false
  volumePath: /var/lib/containers/storage/volumes
version:
  APIVersion: 5.2.1
  Built: 1723593600
  BuiltTime: Wed Aug 14 00:00:00 2024
  GitCommit: ""
  GoVersion: go1.22.5
  Os: linux
  OsArch: linux/amd64
  Version: 5.2.1

Podman in a container

Yes

Privileged Or Rootless

Privileged

Upstream Latest Release

Yes

Additional environment details

Additional information

Additional information like issue happens only occasionally or issue happens with a particular architecture or on a particular setting

The text was updated successfully, but these errors were encountered:

Luap99 · 2024-08-22T16:35:33Z

Most likely caused by c46884a. We need to wait until we get either a successful event or error first before setting the status code so the fix is valid IMO.
So in order to fix this here we would need to have a way that tell us explicitly that events is ready to send events an no error after this will be reported in which case we can set the status code to 200 without having to send an event

github-actions · 2024-09-22T00:09:44Z

A friendly reminder that this issue had no activity for 30 days.

One of the problems with the Events() API was that you had to call it in a new goroutine. This meant the the error returned by it had to be read back via a second channel. This cuased other bugs in the past but here the biggest problem is that basic errors such as invalid since/until options were not directly returned to the caller. It meant in the API we were not able to write http code 200 quickly because we always waited for the first event or error from the channels. This in turn made some clients not happy as they assume the server hangs on time out if no such events are generated. To fix this we resturcture the entire event flow. First we spawn the goroutine inside the eventer Read() function so not all the callers have to. Then we can return the basic error quickly without the goroutine. The caller then checks the error like any normal function and the API can use this one to decide which status code to return. Second we now return errors/event in one channel then the callers can decide to ignore or log them which makes it a bit more clear. Fixes c46884a ("podman events: check for an error after we finish reading events") Fixes containers#23712 Signed-off-by: Paul Holzinger <[email protected]>

API clients expect the status code quickly otherwise they can time out. If we do not flush we may not write the header immediately and only when futher logs are send. Fixes containers#23712 Signed-off-by: Paul Holzinger <[email protected]>

One of the problems with the Events() API was that you had to call it in a new goroutine. This meant the the error returned by it had to be read back via a second channel. This cuased other bugs in the past but here the biggest problem is that basic errors such as invalid since/until options were not directly returned to the caller. It meant in the API we were not able to write http code 200 quickly because we always waited for the first event or error from the channels. This in turn made some clients not happy as they assume the server hangs on time out if no such events are generated. To fix this we resturcture the entire event flow. First we spawn the goroutine inside the eventer Read() function so not all the callers have to. Then we can return the basic error quickly without the goroutine. The caller then checks the error like any normal function and the API can use this one to decide which status code to return. Second we now return errors/event in one channel then the callers can decide to ignore or log them which makes it a bit more clear. Fixes c46884a ("podman events: check for an error after we finish reading events") Fixes containers#23712 Signed-off-by: Paul Holzinger <[email protected]>

API clients expect the status code quickly otherwise they can time out. If we do not flush we may not write the header immediately and only when futher logs are send. Fixes containers#23712 Signed-off-by: Paul Holzinger <[email protected]>

Luap99 · 2024-10-29T16:24:59Z

PR #24406 if you would like to test that

One of the problems with the Events() API was that you had to call it in a new goroutine. This meant the the error returned by it had to be read back via a second channel. This cuased other bugs in the past but here the biggest problem is that basic errors such as invalid since/until options were not directly returned to the caller. It meant in the API we were not able to write http code 200 quickly because we always waited for the first event or error from the channels. This in turn made some clients not happy as they assume the server hangs on time out if no such events are generated. To fix this we resturcture the entire event flow. First we spawn the goroutine inside the eventer Read() function so not all the callers have to. Then we can return the basic error quickly without the goroutine. The caller then checks the error like any normal function and the API can use this one to decide which status code to return. Second we now return errors/event in one channel then the callers can decide to ignore or log them which makes it a bit more clear. Fixes c46884a ("podman events: check for an error after we finish reading events") Fixes containers#23712 Signed-off-by: Paul Holzinger <[email protected]>

API clients expect the status code quickly otherwise they can time out. If we do not flush we may not write the header immediately and only when futher logs are send. Fixes containers#23712 Signed-off-by: Paul Holzinger <[email protected]>

jjzxcd added the kind/bug Categorizes issue or PR as related to a bug. label Aug 21, 2024

Luap99 added the regression label Aug 22, 2024

jjzxcd mentioned this issue Sep 11, 2024

Verify docker_logs support for podman vectordotdev/vector#6807

Closed

github-actions bot added the stale-issue label Sep 22, 2024

Luap99 mentioned this issue Oct 7, 2024

Unable to use the events streaming endpoint as when using docker engine #24185

Closed

Luap99 self-assigned this Oct 29, 2024

Luap99 removed the stale-issue label Oct 29, 2024

Luap99 mentioned this issue Oct 29, 2024

fix API issue about missing the status code in the events and logs endpoints #24406

Merged

Luap99 added the jira label Oct 29, 2024

openshift-merge-bot bot closed this as completed in #24406 Nov 4, 2024

openshift-merge-bot bot closed this as completed in 768ad86 Nov 4, 2024

petrzjunior mentioned this issue Nov 17, 2024

podman: 5.2.3 -> 5.3.0 NixOS/nixpkgs#356754

Merged

13 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Compat API endpoints for events and container logs with `follow=true` do not send response immediately like Docker #23712

Compat API endpoints for events and container logs with `follow=true` do not send response immediately like Docker #23712

jjzxcd commented Aug 21, 2024

Luap99 commented Aug 22, 2024

github-actions bot commented Sep 22, 2024

Luap99 commented Oct 29, 2024

Compat API endpoints for events and container logs with follow=true do not send response immediately like Docker #23712

Compat API endpoints for events and container logs with follow=true do not send response immediately like Docker #23712

Comments

jjzxcd commented Aug 21, 2024

Issue Description

Steps to reproduce the issue

Describe the results you received

Describe the results you expected

For events endpoint

For container logs endpoint

podman info output

Podman in a container

Privileged Or Rootless

Upstream Latest Release

Additional environment details

Additional information

Luap99 commented Aug 22, 2024

github-actions bot commented Sep 22, 2024

Luap99 commented Oct 29, 2024

Compat API endpoints for events and container logs with `follow=true` do not send response immediately like Docker #23712

Compat API endpoints for events and container logs with `follow=true` do not send response immediately like Docker #23712