
podman logs --follow, k8s-file, misses a log #17286

Closed
edsantiago opened this issue Jan 30, 2023 · 17 comments · Fixed by #17603 or #18752
Labels
flakes (Flakes from Continuous Integration) · kind/bug (Categorizes issue or PR as related to a bug.) · locked - please file new issue/PR (Assist humans wanting to comment on an old issue or PR with locked comments.)

Comments

@edsantiago (Member)

not ok 108 podman logs - --until --follow k8s-file
...
# podman-remote --url unix:/tmp/podman_tmp_hJUP run --log-driver=k8s-file --name dT1IxYdteS -d quay.io/libpod/testimage:20221018 sh -c while :; do echo RnePKS3YFW && sleep 2; done
26960512e2ef4dda5ac8219d66f7d678d1a30e8fea8f8a6714188468610c6cf9
# podman-remote --url unix:/tmp/podman_tmp_hJUP logs --until 3s -f dT1IxYdteS
RnePKS3YFW
#/vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
#|     FAIL: logs --until -f on running container works
#| expected: 'RnePKS3YFW
RnePKS3YFW'
#|   actual: 'RnePKS3YFW'
#\^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

I'm suspecting a test bug (because of the sleep), not an actual podman bug, but will figure it out later.
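For context, a minimal repro sketch of the failing case (my reconstruction from the log above; the image tag and intervals are copied from it):

# The container prints a line immediately, then every 2s. With a 3s --until
# window the reader should see the t=0s and t=2s lines before it stops;
# in the flake, the t=2s line never reaches the reader before the deadline.
ctr=$(podman run --log-driver=k8s-file -d quay.io/libpod/testimage:20221018 \
      sh -c 'while :; do echo hello && sleep 2; done')
podman logs --until 3s -f "$ctr"    # expected: two lines; flake: only one
podman rm -f -t 0 "$ctr"            # cleanup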

@edsantiago edsantiago added flakes Flakes from Continuous Integration remote Problem is in podman-remote labels Jan 30, 2023
@github-actions github-actions bot removed the remote Problem is in podman-remote label Jan 30, 2023
@edsantiago edsantiago added the remote Problem is in podman-remote label Jan 30, 2023
@vrothberg (Member)

@Luap99 mind taking a look? You turned out to be the "Trüffelschwein" (truffle pig) of finding subtle bugs in the log code.

@edsantiago (Member, Author)

Twice in one day, both f36 remote root: here and here

@Luap99 (Member) commented Feb 16, 2023

I suspect it is a timing issue in CI; logs in general give no guarantees about when entries are written.

@edsantiago (Member, Author)

I can reproduce locally, and in the process discovered that the flake is correlated with this server output:

time="2023-02-20T07:22:27-06:00" level=error msg="Stopping logger: hangup by client"

I'm reworking the test logic anyway, making it (sigh) sloppier in hopes of eliminating the flake, but I think this is an interesting clue.

edsantiago added a commit to edsantiago/libpod that referenced this issue Feb 22, 2023
...in hopes of fixing a flake with podman-remote. It's still
possible that there's a real problem with logs under remote,
and this will just sweep that under the rug.

Also, fix a nasty-red test warning (add cleanup), refactor
uses of $(pause_image), and improve a few test assertions.

Closes: containers#17286

Signed-off-by: Ed Santiago <[email protected]>
@edsantiago (Member, Author)

Seen just now, in a PR based on a recent main. Root Debian. Reopening.

@edsantiago edsantiago reopened this Mar 21, 2023
@edsantiago edsantiago removed the remote Problem is in podman-remote label Mar 21, 2023
@edsantiago edsantiago changed the title podman-remote, logs --follow, k8s-file, misses a log podman logs --follow, k8s-file, misses a log Mar 21, 2023
@edsantiago (Member, Author)

Removing the remote label, and the "remote" part of the title, because this new failure is local.

@edsantiago edsantiago added the kind/bug Categorizes issue or PR as related to a bug. label Mar 28, 2023
@edsantiago (Member, Author)

Thinking out loud after a long day (usually not a good idea). #15321 reported "missing log lines" as one of its symptoms; the "channel 123 found" error was drowning out any useful info. And "channel 123", in podman-remote, appears to mean "container not found". Is it possible that a container gets removed right in the middle of another process running logs on it? I'll look into it next week.

@edsantiago (Member, Author)

Is this the same bug? f37 rootless and journald, not k8s-file:

not ok 113 podman logs - --since --follow journald
...
$ podman run --log-driver=journald --name jMXWgW9IhP -d quay.io/libpod/testimage:20221018 sh -c sleep 1; while :; do echo mRs8hYocrw && sleep 5; done
7ceddbedbc4b1937a21de5e927e780a59091377557bfde5cd4716116e71add71
$ podman logs --since 0.1s -f jMXWgW9IhP
timeout: sending signal TERM to command '/var/tmp/go/src/github.com/containers/podman/bin/podman'
[ rc=124 (expected) ]
#/vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
#|     FAIL: logs --since -f on running container works
#| expected: =~ \^mRs8hYocrw
#|         >    timeout: sending signal TERM to command.\*
#|   actual:    timeout: sending signal TERM to command '/var/tmp/go/src/github.com/containers/podman/bin/podman'

@Luap99 (Member) commented May 16, 2023

I guess these are just timing issues in the test. Can we make run_podman print a timestamp when each command is executed, i.e. date +"%T.%N"? I need to see the milliseconds to judge that.
If it takes more than 1.1 seconds between the podman run -d and the podman logs, it will likely fail (the container's first echo lands about 1 second after start because of the leading sleep 1, so if podman logs starts later than ~1.1 s, that line falls outside the 0.1 s --since window).

@edsantiago (Member, Author)

That would really clutter up the logs ... but maybe I can use logformatter to help. I'll look into it.

@Luap99 (Member) commented May 16, 2023

I was thinking of just adding the timestamp here:

echo "$_LOG_PROMPT $PODMAN $*"

Would that really clutter the logs?
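A minimal sketch of that suggestion (assuming the echo above is the one in the run_podman helper in test/system/helpers.bash; the exact context may differ):

# Prefix each logged command with a high-resolution timestamp; %T.%N yields
# the [HH:MM:SS.nnnnnnnnn] form visible in the proof-of-concept output below.
echo "[$(date +%T.%N)] $_LOG_PROMPT $PODMAN $*"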

@edsantiago (Member, Author)

Sample from my proof-of-concept:

$ bats --filter can remove infra test/system/010-images.bats
010-images.bats
 ✗ podman images - rmi -f can remove infra images
   (in test file test/system/010-images.bats, line 294)
     `false' failed
   [05:32:07.011163228] $ /home/esm/src/atomic/2018-02.podman/libpod/bin/podman rm -t 0 --all --force --ignore
   [05:32:07.069125287] $ /home/esm/src/atomic/2018-02.podman/libpod/bin/podman ps --all --external --format {{.ID}} {{.Names}}
   [05:32:07.118253136] $ /home/esm/src/atomic/2018-02.podman/libpod/bin/podman images --all --format {{.Repository}}:{{.Tag}} {{.ID}}
   [05:32:07.159672429] quay.io/libpod/testimage:20221018 f5a99120db64
   [05:32:07.171796125] $ /home/esm/src/atomic/2018-02.podman/libpod/bin/podman create --pod new:VHrI9sMfcZ quay.io/libpod/testimage:20221018
   [05:32:07.474894186] 7d2ae91e515a9d5dfbe2fa81f60afd62d0864b252647d23e6b030e4071cb4e04
   [05:32:07.513549847] $ /home/esm/src/atomic/2018-02.podman/libpod/bin/podman inspect --format {{.ID}} localhost/podman-pause:4.6.0-dev-1684169501
   [05:32:07.565370054] 97b7962e651c6892bb6eae55c64d3a17d0a8f7429f0575a52ecd1d204226d058
   [05:32:07.572005456] $ /home/esm/src/atomic/2018-02.podman/libpod/bin/podman rmi localhost/podman-pause:4.6.0-dev-1684169501
   [05:32:07.620290272] Error: image used by 8a7a8c93e6cbe2b08e42374649a99924fdf0a4575d9e52ac3b258a90b82d6e21: image is in use by a container: consider listing external containers and force-removing image
   [05:32:07.623198044] [ rc=2 (expected) ]
   [05:32:07.633569688] $ /home/esm/src/atomic/2018-02.podman/libpod/bin/podman rmi -f localhost/podman-pause:4.6.0-dev-1684169501
   [05:32:07.737421482] Untagged: localhost/podman-pause:4.6.0-dev-1684169501
   Deleted: 97b7962e651c6892bb6eae55c64d3a17d0a8f7429f0575a52ecd1d204226d058
   [05:32:07.744243796] $ /home/esm/src/atomic/2018-02.podman/libpod/bin/podman ps --all --noheading
   [05:32:07.791298656] $ /home/esm/src/atomic/2018-02.podman/libpod/bin/podman pod ps --noheading
   [05:32:07.843681019] $ /home/esm/src/atomic/2018-02.podman/libpod/bin/podman image exists quay.io/libpod/testimage:20221018
   # [teardown]
   [05:32:07.897184675] $ /home/esm/src/atomic/2018-02.podman/libpod/bin/podman pod rm -t 0 --all --force --ignore
   [05:32:07.940598430] $ /home/esm/src/atomic/2018-02.podman/libpod/bin/podman rm -t 0 --all --force --ignore
   [05:32:07.983983166] $ /home/esm/src/atomic/2018-02.podman/libpod/bin/podman network prune --force
   [05:32:08.025695704] $ /home/esm/src/atomic/2018-02.podman/libpod/bin/podman volume rm -a -f

@edsantiago (Member, Author)

"If it takes more than 1.1 seconds between the podman run -d and podman logs it will likely fail."

That is consistent with the most recent failure:

[+0292s] not ok 109 podman logs - until k8s-file
...
<+1.04s> # # podman run --log-driver=k8s-file -d --name test quay.io/libpod/testimage:20221018 sh -c echo before_4XqvkEr72m_k8s-file; trap 'echo after_z1aX2YuaDu_k8s-file; exit' SIGTERM; while :; do sleep 1; done
<+329ms> # ed7a2e17b1415544c833a39939d4f282626c3029950f827ecc366cabd1584a7f
<+1.01s> # # podman stop test
<+1.17s> # test
<+009ms> # # podman wait test
<+055ms> # 0
<+011ms> # # podman logs test
<+056ms> # before_4XqvkEr72m_k8s-file
         # after_z1aX2YuaDu_k8s-file
<+018ms> # # podman logs --until 2023-05-26T07:40:31-05:00 test
<+067ms> # # podman logs --until 2023-05-26T07:40:35-05:00 test
<+057ms> # before_4XqvkEr72m_k8s-file
         # #/vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
         # #|     FAIL: podman logs --until after
         # #| expected: 'before_4XqvkEr72m_k8s-file
         # after_z1aX2YuaDu_k8s-file'
         # #|   actual: 'before_4XqvkEr72m_k8s-file'
         # #\^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Submitted #18752 as a possible fix.

@Luap99 (Member) commented May 31, 2023

I don't think #18752 fixes the flake. Testing how date --date='+1 second' works shows the issue:

$ date +"%T.%N" && date --date='+1 second' --iso-8601=seconds
14:00:39.849253983
2023-05-31T14:00:40+02:00

Obviously the '+1 second' value is truncated to whole seconds (in the example above, a call at 14:00:39.849 yields a deadline of 14:00:40, only ~150 ms away). That means if we make the date call at 0.9 seconds past the second, there is only a 0.1 second window for podman to output the logs, because --until means we exit at that given time and stop reading. I think this is racy because I coded it as two different goroutines (one that writes logs and one that waits for the until deadline and then stops the log reader). I think I can fix that in the actual code, but it may not be trivial. Maybe change the test to use '+2 seconds' in the meantime.
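To make the truncation concrete, a small shell sketch (mine, not from the test suite) that measures the window actually left for podman:

# '+1 second' rendered in whole seconds floors the deadline, so the usable
# window is one second minus the current fractional second.
now_ns=$(date +%s%N)                     # current time, nanoseconds since epoch
until_s=$(date --date='+1 second' +%s)   # deadline, truncated to whole seconds
window_ms=$(( (until_s * 1000000000 - now_ns) / 1000000 ))
echo "effective --until window: ${window_ms} ms"   # anywhere from ~0 to ~1000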

@Luap99 (Member) commented May 31, 2023

Also note that this applies to one specific test, podman logs - until ..., and NOT to the one with --follow.

@edsantiago (Member, Author)

Would this help?

 ..._log_test_until()
...
-    after=$(date --date='+1 second' --iso-8601=seconds)
+    after=$(date --date='+1 second' --iso-8601=ns)

Seems to work, although I can't tell if it does the right thing internally.
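For reference, a sketch of what the two formats produce (timestamps illustrative, matching the example earlier in the thread):

$ date --date='+1 second' --iso-8601=seconds
2023-05-31T14:00:40+02:00
$ date --date='+1 second' --iso-8601=ns
2023-05-31T14:00:40,849253983+02:00

With nanosecond precision the --until deadline keeps its fractional part, so the window stays close to a full second instead of shrinking toward zero.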

@Luap99 (Member) commented May 31, 2023

I think this is fine.

@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Aug 30, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 30, 2023