podman logs: missing output #18501

Closed
edsantiago opened this issue May 8, 2023 · 29 comments · Fixed by #21234
Labels
  • flakes (Flakes from Continuous Integration)
  • kind/bug (Categorizes issue or PR as related to a bug)
  • locked - please file new issue/PR (Assist humans wanting to comment on an old issue or PR with locked comments)

Comments

@edsantiago
Member

 [It] tail 800 lines
...
podman [options] logs --tail 800 CID
...
[FAILED] Expected <LONG STRING> to have length 800

Different CI runs on different days. root/rootless, f37/38, journald/json-file

  • fedora-37 : int podman fedora-37 root host sqlite
  • fedora-38 : int podman fedora-38 rootless host sqlite

Echoes of #14362, except these failures are on main and include the wait fix.

@edsantiago edsantiago added flakes Flakes from Continuous Integration kind/bug Categorizes issue or PR as related to a bug. labels May 8, 2023
@edsantiago
Member Author

I wonder if this could be a different manifestation of the same bug? Basically, podman runs a systemd container, spins waiting for "Reached target multi-user.target", and the test fails because the string never appears in podman logs. This is podman-remote, though, so it could be a different issue.

@Luap99
Member

Luap99 commented May 8, 2023

I see some weird output formatting in the json-file case:

...
           line 424
line 425
         
           line 426
...

However, I also see this in tests that pass, so I am not sure whether it is caused by ginkgo or maybe by the logformatter.
I also counted the lines in the log and see 800, so why does the matcher say 799?

The journald case is more concerning: we are actually missing 261 lines from the log.


In any case, each line also prints a \r because we start the container with -t; should we drop that?

@edsantiago
Member Author

As a general rule I like removing -t from tests, but ISTR that podman logs can behave differently with -t, and I don't know if there's a good reason for the -t in this test. (From a quick tig blame session, I don't think so, but I can't be sure.) I think it's worth looking into removing -t here.

@Luap99
Member

Luap99 commented May 8, 2023

Yeah, with -t we only have one stream instead of separate stdout/stderr streams. However, the point of the 800-line test, which I added a long time ago, was to exercise logs --tail with output larger than the default page-size read (usually 4096 bytes). See #7232.
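
For context, a minimal Go sketch (an illustration only, not podman's actual log reader) of a page-sized backward tail read; the point of the 800-line test is to force output larger than one page, so the loop below runs more than once:

// Illustration only -- not podman's actual implementation.
// Read page-sized chunks from the end of a file until we have
// enough newlines to cover the requested number of tail lines.
package main

import (
	"bytes"
	"fmt"
	"os"
)

const pageSize = 4096

func tailLines(path string, n int) ([][]byte, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	info, err := f.Stat()
	if err != nil {
		return nil, err
	}

	var buf []byte
	offset := info.Size()
	// A tail larger than one page (e.g. 800 lines) forces this
	// loop through more than one chunk, which the test exercises.
	for offset > 0 && bytes.Count(buf, []byte{'\n'}) <= n {
		chunk := int64(pageSize)
		if offset < chunk {
			chunk = offset
		}
		offset -= chunk
		page := make([]byte, chunk)
		if _, err := f.ReadAt(page, offset); err != nil {
			return nil, err
		}
		buf = append(page, buf...)
	}

	lines := bytes.Split(bytes.TrimRight(buf, "\n"), []byte{'\n'})
	if len(lines) > n {
		lines = lines[len(lines)-n:]
	}
	return lines, nil
}

func main() {
	// Hypothetical log path, for illustration.
	lines, err := tailLines("/tmp/ctr.log", 800)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(lines))
}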

@edsantiago edsantiago changed the title e2e: log tail 800 lines: flaking again, journald & json-file podman logs: missing output May 31, 2023
@edsantiago
Member Author

Repurposing this issue as a gathering place for all "podman logs missing output" flakes.

@edsantiago
Member Author

Hey, is anyone looking into this? It's a serious problem, hitting a lot of different tests (suggesting it's not a badly written test). I would hate for customers to start hitting this in the field.

  • debian-12 : int podman debian-12 root host sqlite
  • debian-12 : int podman debian-12 rootless host sqlite
  • debian-12 : int remote debian-12 root host sqlite [remote]
    • 05-05 15:57 in podman run container with systemd PID1
  • fedora-37 : int podman fedora-37 root host sqlite
  • fedora-37 : int remote fedora-37 root host sqlite [remote]
  • fedora-38 : int podman fedora-38 root host boltdb
  • fedora-38 : int podman fedora-38 root host sqlite
  • fedora-38 : int podman fedora-38 rootless host sqlite
  • rawhide : int podman rawhide root host sqlite
    • 05-27 09:52 in using container with container log-size: journald

@vrothberg
Member

I am flooded with upstream issues and bugs at the moment, but I totally agree with you, Ed.

@mheon, can we schedule a bug week soon-ish?

@vrothberg
Member

It's bookmarked, so I will take a look when I have some cycles (no ETA).

@Luap99
Member

Luap99 commented Jun 21, 2023

Some tests are definitely broken (missing a wait, or using run -d where they should not, e.g. podman pod logs -l).


The 800-lines thing is interesting because the output is somehow screwed up with \r bytes: https://api.cirrus-ci.com/v1/artifact/task/6317242757939200/html/int-podman-debian-12-root-host-sqlite.log.html#t--tail-800-lines--json-file--1
But it does the same thing in tests that pass, so I don't think that is the cause. Even weirder, the log actually shows the full 800 lines of output, so why does ginkgo say it is only 799 lines?


This one is odd; it looks like the dynamic linker is broken: https://api.cirrus-ci.com/v1/artifact/task/6365092854366208/html/int-podman-debian-12-rootless-host-sqlite.log.html#t--podman-remote-pod-logs-test--1
But then again, other tests in this run pass, so why would loading a shared library flake with ENOENT?

@edsantiago
Member Author

Oh, I didn't look closely enough at that one. It's probably #17042, another scary-nasty flake. I've reclassified it.

edsantiago added a commit to edsantiago/libpod that referenced this issue Jun 21, 2023
A few tests were doing "podman run -d" + "podman logs".
This is racy. Remove the unnecessary "-d".

And, as long as we're mucking around in here:
 - remove the "-t" from the 800-lines test, so we get
   clean output without ^Ms
 - remove unnecessary "sh", "-c" from simple echo commands
 - add actual error-message checks to two places that
   were only checking exit status

Resolves one (not all) of the flakes tracked in containers#18501

Signed-off-by: Ed Santiago <[email protected]>
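
To illustrate the race this commit removes, a self-contained Go sketch using os/exec (not the actual e2e test code); the image matches the reproducer used later in this thread:

// Sketch of the race: "podman run -d" returns as soon as the
// container is started, so reading logs immediately afterwards
// may race with the container still writing its output.
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

func main() {
	out, err := exec.Command("podman", "run", "-d",
		"quay.io/libpod/testimage:20221018", "echo", "hello").Output()
	if err != nil {
		panic(err)
	}
	cid := strings.TrimSpace(string(out))

	// Racy: at this point "podman logs" may still see nothing.
	// The fix in the commit above is to drop -d entirely; the
	// equivalent here is to wait for the container to exit
	// before reading its logs.
	if err := exec.Command("podman", "wait", cid).Run(); err != nil {
		panic(err)
	}
	logs, err := exec.Command("podman", "logs", cid).Output()
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q\n", logs) // "hello\n" once the container has exited
}
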
@edsantiago
Member Author

Here's one from yesterday: the 800-line test, journald. It's a humongous failure, expecting 800 lines and getting 551. Looking at the journal and searching in-page for "line 551", we find:

Jun 26 16:46:46 cirrus-task-6755306202464256 eloquent_williamson[223648]: line 551
Jun 26 16:46:46 cirrus-task-6755306202464256 eloquent_williamson[223648]: line 552
Jun 26 16:46:47 cirrus-task-6755306202464256 systemd-journald[557]: Data hash table of /var/log/journal/eedc8417f8cb3f847c826e088dfd67ab/system.journal has a fill level at 75.0 (174763 of 233016 items, 50331648 file size, 287 bytes per hash table item), suggesting rotation.
Jun 26 16:46:47 cirrus-task-6755306202464256 systemd-journald[557]: /var/log/journal/eedc8417f8cb3f847c826e088dfd67ab/system.journal: Journal header limits reached or header out-of-date, rotating.
........
Jun 26 16:46:46 cirrus-task-6755306202464256 rsyslogd[666]: imjournal: journal files changed, reloading...  [v8.2210.0-4.fc38 try https://www.rsyslog.com/e/0 ]
Jun 26 16:46:46 cirrus-task-6755306202464256 eloquent_williamson[223648]: line 553
Jun 26 16:46:47 cirrus-task-6755306202464256 rsyslogd[666]: imjournal: journal files changed, reloading...  [v8.2210.0-4.fc38 try https://www.rsyslog.com/e/0 ]

Looks like a smoking gun to me.

@Luap99
Member

Luap99 commented Jun 27, 2023

Yeah, that is definitely suspicious, and in fact we are missing the logs after that line. But there are also other test failures where we actually saw all lines in the log, so I am not sure it is the only bug.

@edsantiago
Member Author

The new logformatter adds direct links to the journal, making it much easier to look into these. So here's one of those 800-line failures, k8s-file (not journald), and it looks like all the lines are there (I didn't count) ... but the journal shows a conmon error:

Jun 15 16:51:11 cirrus-task-6083635929939968 conmon[93450]: conmon 6a82177ee9e67f96f12c <nwarn>: Failed to open cgroups file: /sys/fs/cgroup/machine.slice/libpod-6a82177ee9e67f96f12c938169b73c16537188fa8d712e151d2f7a2505ee21aa.scope/container/memory.events
Jun 15 16:51:11 cirrus-task-6083635929939968 peaceful_bhaskara[93450]: line 488
Jun 15 16:51:11 cirrus-task-6083635929939968 conmon[93450]: conmon 6a82177ee9e67f96f12c <nwarn>: stdio_input read failed Input/output error

@Luap99
Member

Luap99 commented Jun 27, 2023

OK, I guess the first step is to make ginkgo print the full matcher output so we at least know which element is missing, because AFAICS podman logs printed all the lines we would expect.

Luap99 added a commit to Luap99/libpod that referenced this issue Jun 27, 2023
Sometimes this test flakes, but in the CI log I see all expected lines
printed, yet for some reason the matcher still fails.
Right now it truncates the array, so it is not possible to verify
what the matcher sees. Change this by removing the truncation limit
for this specific test only.

see containers#18501

Signed-off-by: Paul Holzinger <[email protected]>
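
The knob involved is Gomega's format.MaxLength (a real Gomega setting); the helper that scopes it to a single test is a sketch:

package logs_test

import "github.com/onsi/gomega/format"

// Gomega truncates long objects in failure messages at
// format.MaxLength characters (default 4000); setting it to 0
// removes the limit, so a failing 800-element match prints in
// full. Restore the old value so only this test is affected.
func withFullMatcherOutput(run func()) {
	old := format.MaxLength
	format.MaxLength = 0
	defer func() { format.MaxLength = old }()
	run()
}
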
@edsantiago
Member Author

Here's an interesting variation: remote f38 root. The interesting thing about it is that journalctl is run directly and sees no output (vs podman logs).

@edsantiago
Copy link
Member Author

So, it's not just journald: tests fail in json-file too: f38 rootless, rawhide rootless. We've seen json-file flakes here before, but most failures are journald. Two json flakes in one run seems unusual.

Does this help chase down the bug?

@edsantiago
Member Author

I have a reproducer with json-file, nothing special needed, just a laptop (mine is f38):

$ nl="        <----- basically doublequote, enter, doublequote, enter
"
$ cr="^M"      <-------- control-V for quote-next-character, control-M for CR
$ while :; do
    cid=$(bin/podman run --log-driver json-file -dt quay.io/libpod/testimage:20221018 sh -c "echo podman;echo podman;echo podman")
    bin/podman wait $cid
    logs=$(bin/podman logs --tail 2 $cid)
    test "$logs" = "podman${cr}${nl}podman${cr}" || break
    bin/podman rm $cid
  done
...
0
$ echo "$logs"
podman

$ echo "$logs" | cat -vET
podman^Mpodman$         <<<---------- Ooooh, isn't this interesting!
^M$
$ bin/podman logs $cid | cat -vET
podman^M$
podman^M$
podman^M$

@edsantiago
Member Author

Another failure (it doesn't take long). More info:

$ echo "$logs"|cat -vET
^M$
podman^M$
$ bin/podman logs $cid | cat -vET
podman^M$
podman^M$
podman^M$
$ bin/podman logs --tail 2 $cid | cat -vET
^M$
podman^M$                        <<<<---- OK, at least it's consistent
$ cat -vET ~/.local/share/containers/storage/overlay-containers/$cid/userdata/ctr.log
2023-08-07T19:56:34.223758260-06:00 stdout F podman^M$
2023-08-07T19:56:34.223758260-06:00 stdout P podman$
2023-08-07T19:56:34.223816701-06:00 stdout F ^M$
2023-08-07T19:56:34.223816701-06:00 stdout F podman^M$

I doubt that this is connected in any way to the journald bug, since that one reproduces easily without podman or conmon anywhere (or even installed), but this is still a bug.

@Luap99
Member

Luap99 commented Aug 8, 2023

Thanks for the reproducer; yes, this seems to be a bug in our log reader.
Basically, with --tail we read the last N lines from the file, which breaks when the file contains partial (P) lines. Reading the file backwards we see two full (F) lines, so it looks good, but F only means "append a newline". The correct approach is to continue reading backwards until the next F line, keeping all the partial lines in between as well.
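
To make that concrete, a sketch (simplified k8s-file entries of the form "TIMESTAMP STREAM FLAG MSG"; not the actual podman code) of tail logic that counts only F entries as line boundaries and keeps the partials in between:

package main

import (
	"fmt"
	"strings"
)

type entry struct {
	flag string // "F" = full line (append newline), "P" = partial
	msg  string
}

func parse(raw string) []entry {
	var out []entry
	for _, line := range strings.Split(strings.TrimSpace(raw), "\n") {
		fields := strings.SplitN(line, " ", 4)
		out = append(out, entry{flag: fields[2], msg: fields[3]})
	}
	return out
}

// tail returns the last n logical lines. A logical line is every P
// entry since the previous F, plus the terminating F entry.
func tail(entries []entry, n int) string {
	full := 0
	start := 0
	// Walk backwards counting F entries only; a naive reader that
	// counted raw entries would cut a logical line in half.
	for i := len(entries) - 1; i >= 0; i-- {
		if entries[i].flag == "F" {
			full++
			if full > n {
				start = i + 1
				break
			}
		}
	}
	var sb strings.Builder
	for _, e := range entries[start:] {
		sb.WriteString(e.msg)
		if e.flag == "F" {
			sb.WriteString("\n")
		}
	}
	return sb.String()
}

func main() {
	// Hypothetical log: an F entry, then a P entry continued by an
	// F entry, then a final F entry.
	raw := `2023-08-07T19:56:34Z stdout F podman
2023-08-07T19:56:34Z stdout P podman
2023-08-07T19:56:34Z stdout F !
2023-08-07T19:56:34Z stdout F podman`
	// Last two logical lines are "podman!" and "podman"; a naive
	// reader taking the last two raw entries would lose the partial.
	fmt.Print(tail(parse(raw), 2))
}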

@edsantiago
Member Author

Yes, precisely. Filed #19545.

edsantiago added a commit to edsantiago/libpod that referenced this issue Aug 9, 2023
...to reduce flakes.

Reason: journald makes no guarantees. Just because a systemd job
has finished, or podman has written+flushed log entries, doesn't
mean that journald will actually know about them:

   systemd/systemd#28650

Workaround: wrap some podman-logs tests inside Eventually()
so they will be retried when log == journald

This addresses, but does not close, containers#18501. That's a firehose,
with many more failures than I can possibly cross-reference.
I will leave it open, then keep monitoring missing-logs flakes
over time, and pick those off as they occur.

Signed-off-by: Ed Santiago <[email protected]>
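
The retry pattern the commit describes, sketched with Gomega's Eventually (the Eventually/Should API is real; the container ID and expected string are placeholders):

package main

import (
	"os/exec"
	"time"

	"github.com/onsi/gomega"
)

func main() {
	g := gomega.NewGomega(func(message string, _ ...int) {
		panic(message)
	})

	cid := "CID" // placeholder container ID

	// journald gives no read-after-write guarantee
	// (systemd/systemd#28650), so instead of asserting once,
	// poll "podman logs" until the expected line shows up.
	g.Eventually(func() string {
		out, _ := exec.Command("podman", "logs", cid).Output()
		return string(out)
	}, 5*time.Second, 100*time.Millisecond).
		Should(gomega.ContainSubstring("Reached target multi-user.target"))
}
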
@edsantiago
Member Author

Hi, this is still a big source of frustration. Any new insights?

Seen in: int+sys podman+remote debian-13+fedora-37+fedora-38 root+rootless host boltdb+sqlite

edsantiago added a commit to edsantiago/libpod that referenced this issue Jan 11, 2024
Add a wait_for_ready() to one kube-play test, to make sure
container output has made it to the journal.

Probably does not fix containers#18501, but I think it might fix its
most common presentation.

Signed-off-by: Ed Santiago <[email protected]>
openshift-cherrypick-robot pushed a commit to openshift-cherrypick-robot/podman that referenced this issue Feb 1, 2024
mheon pushed a commit to mheon/libpod that referenced this issue Feb 1, 2024
mheon pushed a commit to mheon/libpod that referenced this issue Feb 1, 2024
mheon pushed a commit to mheon/libpod that referenced this issue Feb 2, 2024
mheon pushed a commit to mheon/libpod that referenced this issue Feb 2, 2024
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Apr 15, 2024
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 15, 2024