Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compat API endpoints for events and container logs with follow=true do not send response immediately like Docker #23712

Closed
jjzxcd opened this issue Aug 21, 2024 · 3 comments · Fixed by #24406
Assignees
Labels
jira kind/bug Categorizes issue or PR as related to a bug. regression

Comments

@jjzxcd
Copy link

jjzxcd commented Aug 21, 2024

Issue Description

It seems that the Podman compat API does not immediately return an HTTP response when querying /events or /containers/${id}/logs?follow=true&stdout=true&stderr=true if no events or logs are produced yet. In this case, the client cannot determine whether the request was successful until some data is received.
For example, I use Vector, which in turn uses the Bollard library to interact with the Docker API, to collect logs from Docker and Podman. When Vector starts fetching system events and container logs using the Podman 5.2.1 compat API via the socket unix:///run/podman/podman.sock, the requests fail with error RequestTimeoutError after a 2 minute timeout if no response is received from Podman, causing Vector to stop monitoring the container. At the same time, everything works fine with Docker.

Vector logs:

vector[19350]: 2024-08-21T17:05:19.072757Z  INFO source{component_kind="source" component_id=src_podman component_type=docker_logs}: vector::sources::docker_logs: Capturing logs from now on. now=2024-08-21T17:05:19.072742638+00:00

vector[19350]: 2024-08-21T17:05:19.072925Z  INFO source{component_kind="source" component_id=src_podman component_type=docker_logs}: vector::sources::docker_logs: Listening to docker log events.

vector[19350]: 2024-08-21T17:05:19.114233Z  INFO source{component_kind="source" component_id=src_podman component_type=docker_logs}: vector::internal_events::docker_logs: Started watching for container logs. container_id=6c7b2459bcb71617e192003e88955527b738bb20dc4b6b5cbbc798112b45b357

vector[19350]: 2024-08-21T17:07:19.098385Z ERROR source{component_kind="source" component_id=src_podman component_type=docker_logs}: vector::internal_events::docker_logs: Error in communication with Docker daemon. error=RequestTimeoutError error_type="connection_failed" stage="receiving" container_id=None internal_log_rate_limit=true

vector[19350]: 2024-08-21T17:07:19.115508Z ERROR source{component_kind="source" component_id=src_podman component_type=docker_logs}: vector::internal_events::docker_logs: Internal log [Error in communication with Docker daemon.] is being suppressed to avoid flooding.

vector[19350]: 2024-08-21T17:07:19.115597Z  INFO source{component_kind="source" component_id=src_podman component_type=docker_logs}: vector::internal_events::docker_logs: Stopped watching for container logs. container_id=6c7b2459bcb71617e192003e88955527b738bb20dc4b6b5cbbc798112b45b357

Steps to reproduce the issue

  1. Run any container that does not produce logs: id=$(podman run -d --rm alpine sleep infinity)
  2. Get events: curl -v --unix-socket /run/podman/podman.sock "http://d/events" -o -
  3. Get container logs: curl -v --unix-socket /run/podman/podman.sock "http://d/containers/${id}/logs?follow=true&stdout=true&stderr=true" -o -

Describe the results you received

There is no HTTP response if no events or logs are produced:

# curl -v --unix-socket /run/podman/podman.sock "http://d/events" -o -
*   Trying /run/podman/podman.sock:0...
* Connected to d (/run/podman/podman.sock) port 80
> GET /events HTTP/1.1
> Host: d
> User-Agent: curl/8.6.0
> Accept: */*
>
^C

# curl -v --unix-socket /run/podman/podman.sock "http://d/containers/${id}/logs?follow=true&stdout=true&stderr=true" -o -
*   Trying /run/podman/podman.sock:0...
* Connected to d (/run/podman/podman.sock) port 80
> GET /containers/6c7b2459bcb71617e192003e88955527b738bb20dc4b6b5cbbc798112b45b357/logs?follow=true&stdout=true&stderr=true HTTP/1.1
> Host: d
> User-Agent: curl/8.6.0
> Accept: */*
>
^C

Describe the results you expected

For events endpoint

Receive the HTTP response even if there are no logs or events.

Like with Podman 5.0.3:

# curl -v --unix-socket /run/podman/podman.sock "http://d/events" -o -
*   Trying /run/podman/podman.sock:0...
* Connected to d (/run/podman/podman.sock) port 80
> GET /events HTTP/1.1
> Host: d
> User-Agent: curl/8.6.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Api-Version: 1.41
< Content-Type: application/json
< Libpod-Api-Version: 5.0.3
< Server: Libpod/5.0.3 (linux)
< X-Reference-Id: 0xc00006a338
< Date: Wed, 21 Aug 2024 19:16:59 GMT
< Transfer-Encoding: chunked
<
^C

Like with Docker:

# curl -v --unix-socket /run/docker.sock "http://d/events" -o -
*   Trying /run/docker.sock:0...
* Connected to d (/run/docker.sock) port 80
> GET /events HTTP/1.1
> Host: d
> User-Agent: curl/8.6.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Api-Version: 1.46
< Content-Type: application/json
< Docker-Experimental: false
< Ostype: linux
< Server: Docker/27.1.2 (linux)
< Date: Wed, 21 Aug 2024 20:08:13 GMT
< Transfer-Encoding: chunked
<
^C

For container logs endpoint

Recevice the HTTP response even if there are no logs or events.

Like with Docker:

# curl -v --unix-socket /run/docker.sock "http://d/containers/${id}/logs?follow=true&stdout=true&stderr=true" -o -
*   Trying /run/docker.sock:0...
* Connected to d (/run/docker.sock) port 80
> GET /containers/c60d242591740394e4ef8f2dd9a4264a931dffd9f357d0adb2b1e53dd0024196/logs?follow=true&stdout=true&stderr=true HTTP/1.1
> Host: d
> User-Agent: curl/8.6.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Api-Version: 1.46
< Content-Type: application/vnd.docker.multiplexed-stream
< Docker-Experimental: false
< Ostype: linux
< Server: Docker/27.1.2 (linux)
< Date: Wed, 21 Aug 2024 20:10:30 GMT
< Transfer-Encoding: chunked
<
^C

podman info output

host:
  arch: amd64
  buildahVersion: 1.37.1
  cgroupControllers:
  - cpuset
  - cpu
  - io
  - memory
  - hugetlb
  - pids
  cgroupManager: cgroupfs
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.10-1.fc40.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.10, commit: '
  cpuUtilization:
    idlePercent: 99.44
    systemPercent: 0.27
    userPercent: 0.29
  cpus: 2
  databaseBackend: sqlite
  distribution:
    distribution: fedora
    variant: container
    version: "40"
  eventLogger: file
  freeLocks: 2048
  hostname: 2659a687fd77
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 5.15.0-208.159.3.2.el9uek.x86_64
  linkmode: dynamic
  logDriver: k8s-file
  memFree: 1310531584
  memTotal: 3624329216
  networkBackend: netavark
  networkBackendInfo:
    backend: netavark
    dns:
      package: aardvark-dns-1.12.1-1.fc40.x86_64
      path: /usr/libexec/podman/aardvark-dns
      version: aardvark-dns 1.12.1
    package: netavark-1.12.1-1.fc40.x86_64
    path: /usr/libexec/podman/netavark
    version: netavark 1.12.1
  ociRuntime:
    name: crun
    package: crun-1.15-1.fc40.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.15
      commit: e6eacaf4034e84185fd8780ac9262bbf57082278
      rundir: /run/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +LIBKRUN +WASM:wasmedge +YAJL
  os: linux
  pasta:
    executable: /usr/bin/pasta
    package: passt-0^20240814.g61c0b0d-1.fc40.x86_64
    version: |
      pasta 0^20240814.g61c0b0d-1.fc40.x86_64-pasta
      Copyright Red Hat
      GNU General Public License, version 2 or later
        <https://www.gnu.org/licenses/old-licenses/gpl-2.0.html>
      This is free software: you are free to change and redistribute it.
      There is NO WARRANTY, to the extent permitted by law.
  remoteSocket:
    exists: false
    path: /run/podman/podman.sock
  rootlessNetworkCmd: pasta
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: false
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: false
  slirp4netns:
    executable: ""
    package: ""
    version: ""
  swapFree: 2147479552
  swapTotal: 2147479552
  uptime: 27h 22m 32.00s (Approximately 1.12 days)
  variant: ""
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - docker.io
store:
  configFile: /etc/containers/storage.conf
  containerStore:
    number: 0
    paused: 0
    running: 0
    stopped: 0
  graphDriverName: overlay
  graphOptions:
    overlay.imagestore: /usr/lib/containers/storage
    overlay.mount_program:
      Executable: /usr/bin/fuse-overlayfs
      Package: fuse-overlayfs-1.13-1.fc40.x86_64
      Version: |-
        fusermount3 version: 3.16.2
        fuse-overlayfs: version 1.13-dev
        FUSE library version 3.16.2
        using FUSE kernel interface version 7.38
    overlay.mountopt: nodev,fsync=0
  graphRoot: /var/lib/containers/storage
  graphRootAllocated: 21027463168
  graphRootUsed: 10851258368
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Supports shifting: "true"
    Supports volatile: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 0
  runRoot: /run/containers/storage
  transientStore: false
  volumePath: /var/lib/containers/storage/volumes
version:
  APIVersion: 5.2.1
  Built: 1723593600
  BuiltTime: Wed Aug 14 00:00:00 2024
  GitCommit: ""
  GoVersion: go1.22.5
  Os: linux
  OsArch: linux/amd64
  Version: 5.2.1

Podman in a container

Yes

Privileged Or Rootless

Privileged

Upstream Latest Release

Yes

Additional environment details

Additional environment details

Additional information

Additional information like issue happens only occasionally or issue happens with a particular architecture or on a particular setting

@jjzxcd jjzxcd added the kind/bug Categorizes issue or PR as related to a bug. label Aug 21, 2024
@Luap99
Copy link
Member

Luap99 commented Aug 22, 2024

Most likely caused by c46884a. We need to wait until we get either a successful event or error first before setting the status code so the fix is valid IMO.
So in order to fix this here we would need to have a way that tell us explicitly that events is ready to send events an no error after this will be reported in which case we can set the status code to 200 without having to send an event

Copy link

A friendly reminder that this issue had no activity for 30 days.

@Luap99 Luap99 self-assigned this Oct 29, 2024
Luap99 added a commit to Luap99/libpod that referenced this issue Oct 29, 2024
One of the problems with the Events() API was that you had to call it in
a new goroutine. This meant the the error returned by it had to be read
back via a second channel. This cuased other bugs in the past but here
the biggest problem is that basic errors such as invalid since/until
options were not directly returned to the caller.
It meant in the API we were not able to write http code 200 quickly
because we always waited for the first event or error from the
channels. This in turn made some clients not happy as they assume the
server hangs on time out if no such events are generated.

To fix this we resturcture the entire event flow. First we spawn the
goroutine inside the eventer Read() function so not all the callers have
to. Then we can return the basic error quickly without the goroutine.
The caller then checks the error like any normal function and the API
can use this one to decide which status code to return.
Second we now return errors/event in one channel then the callers can
decide to ignore or log them which makes it a bit more clear.

Fixes c46884a ("podman events: check for an error after we finish reading events")
Fixes containers#23712

Signed-off-by: Paul Holzinger <[email protected]>
Luap99 added a commit to Luap99/libpod that referenced this issue Oct 29, 2024
API clients expect the status code quickly otherwise they can time out.
If we do not flush we may not write the header immediately and only when
futher logs are send.

Fixes containers#23712

Signed-off-by: Paul Holzinger <[email protected]>
Luap99 added a commit to Luap99/libpod that referenced this issue Oct 29, 2024
One of the problems with the Events() API was that you had to call it in
a new goroutine. This meant the the error returned by it had to be read
back via a second channel. This cuased other bugs in the past but here
the biggest problem is that basic errors such as invalid since/until
options were not directly returned to the caller.
It meant in the API we were not able to write http code 200 quickly
because we always waited for the first event or error from the
channels. This in turn made some clients not happy as they assume the
server hangs on time out if no such events are generated.

To fix this we resturcture the entire event flow. First we spawn the
goroutine inside the eventer Read() function so not all the callers have
to. Then we can return the basic error quickly without the goroutine.
The caller then checks the error like any normal function and the API
can use this one to decide which status code to return.
Second we now return errors/event in one channel then the callers can
decide to ignore or log them which makes it a bit more clear.

Fixes c46884a ("podman events: check for an error after we finish reading events")
Fixes containers#23712

Signed-off-by: Paul Holzinger <[email protected]>
Luap99 added a commit to Luap99/libpod that referenced this issue Oct 29, 2024
API clients expect the status code quickly otherwise they can time out.
If we do not flush we may not write the header immediately and only when
futher logs are send.

Fixes containers#23712

Signed-off-by: Paul Holzinger <[email protected]>
@Luap99 Luap99 added the jira label Oct 29, 2024
@Luap99
Copy link
Member

Luap99 commented Oct 29, 2024

PR #24406 if you would like to test that

Luap99 added a commit to Luap99/libpod that referenced this issue Oct 29, 2024
One of the problems with the Events() API was that you had to call it in
a new goroutine. This meant the the error returned by it had to be read
back via a second channel. This cuased other bugs in the past but here
the biggest problem is that basic errors such as invalid since/until
options were not directly returned to the caller.
It meant in the API we were not able to write http code 200 quickly
because we always waited for the first event or error from the
channels. This in turn made some clients not happy as they assume the
server hangs on time out if no such events are generated.

To fix this we resturcture the entire event flow. First we spawn the
goroutine inside the eventer Read() function so not all the callers have
to. Then we can return the basic error quickly without the goroutine.
The caller then checks the error like any normal function and the API
can use this one to decide which status code to return.
Second we now return errors/event in one channel then the callers can
decide to ignore or log them which makes it a bit more clear.

Fixes c46884a ("podman events: check for an error after we finish reading events")
Fixes containers#23712

Signed-off-by: Paul Holzinger <[email protected]>
Luap99 added a commit to Luap99/libpod that referenced this issue Oct 29, 2024
API clients expect the status code quickly otherwise they can time out.
If we do not flush we may not write the header immediately and only when
futher logs are send.

Fixes containers#23712

Signed-off-by: Paul Holzinger <[email protected]>
Luap99 added a commit to Luap99/libpod that referenced this issue Nov 1, 2024
API clients expect the status code quickly otherwise they can time out.
If we do not flush we may not write the header immediately and only when
futher logs are send.

Fixes containers#23712

Signed-off-by: Paul Holzinger <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
jira kind/bug Categorizes issue or PR as related to a bug. regression
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants