podman logs: missing output #18501
Comments
I wonder if this could be a different manifestation of the same bug? Basically, podman runs a systemd container and spins waiting for expected output that never shows up in the logs.
I see some weird output formatting in the json-file case:
However, I also see this in tests which pass, so I am not sure whether it is caused by ginkgo or maybe the logformatter. The journal case is more concerning: we are actually missing 261 lines from the log. In any case, each line also prints a trailing ^M.
As a general rule I like removing -t from tests unless a terminal is actually needed.
yeah, with -t the container gets a terminal, which is what adds the ^M (CR) to every output line.
Repurposing this issue as a gathering place for all "podman logs missing output" flakes.
Hey, is anyone looking into this? It's a serious problem, hitting a lot of different tests (suggesting it's not a badly written test). I would hate for customers to start hitting this in the field.
I am flooded with upstream issues and bugs at the moment, but I totally agree with you, Ed. @mheon, can we schedule a bug week soon-ish?
It's bookmarked, so I will take a look when I have some cycles (no ETA).
Some tests are definitely broken (missing wait, or should not use run -d). The 800-lines thing is interesting because the output is screwed up with ^Ms. This one is odd, looks like the linker is broken: https://api.cirrus-ci.com/v1/artifact/task/6365092854366208/html/int-podman-debian-12-rootless-host-sqlite.log.html#t--podman-remote-pod-logs-test--1
Oh, I didn't look closely enough at that one. It's probably #17042, another scary-nasty flake. I've reclassified it.
A few tests were doing "podman run -d" + "podman logs". This is racy. Remove the unnecessary "-d". And, as long as we're mucking around in here:
- remove the "-t" from the 800-lines test, so we get clean output without ^Ms
- remove unnecessary "sh", "-c" from simple echo commands
- add actual error-message checks to two places that were only checking exit status

Resolves one (not all) of the flakes tracked in containers#18501

Signed-off-by: Ed Santiago <[email protected]>
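To make the race concrete, here is a minimal Go sketch of the pattern that commit describes. It is an illustration only, not part of podman's test suite, and it assumes a local podman binary plus the testimage used elsewhere in this issue: "run -d" returns as soon as the container starts, so an immediate "logs" can come up short, while waiting for the container first closes the window.

// race_sketch.go: illustrative only; assumes podman and the test image are available.
package main

import (
    "fmt"
    "os/exec"
    "strings"
)

// run invokes the podman CLI and returns its combined output.
func run(args ...string) string {
    out, err := exec.Command("podman", args...).CombinedOutput()
    if err != nil {
        fmt.Println("podman", strings.Join(args, " "), "error:", err)
    }
    return strings.TrimSpace(string(out))
}

func main() {
    // Racy pattern: "run -d" returns once the container is started, not once
    // it has produced output, so "logs" may see nothing yet.
    cid := run("run", "-d", "quay.io/libpod/testimage:20221018", "echo", "hello")
    fmt.Printf("logs right after run -d: %q\n", run("logs", cid))

    // Closing the race: wait for the container to exit before reading logs
    // (equivalently, drop -d so "run" itself blocks until the echo finishes).
    run("wait", cid)
    fmt.Printf("logs after podman wait:  %q\n", run("logs", cid))
    run("rm", cid)
}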
Here's one from yesterday, 800-line test, journald. It's a humongous failure, expecting 800 lines, getting 551. Looking at the journal, search in-page for the line where the log entries cut off.
Looks like a smoking gun to me.
Yeah, that is definitely suspicious, and in fact we miss the logs after that line. But there are also other test failures where we actually saw all lines in the log, so I am not sure it is the only bug.
The new logformatter adds direct links to the journal, making it much easier to look into these. So like here's one of those 800-line failures:
OK, I guess the first step is to actually make ginkgo print the full matcher output, so we at least know which element is missing. Because AFAICS podman logs printed all the lines we would expect.
Sometimes this test flakes: in the CI log I see all expected lines printed, but the matcher still fails for some reason. Right now it truncates the array, so it is not possible to verify what the matcher sees. Change this by removing the truncate limit for this specific test only.

See containers#18501

Signed-off-by: Paul Holzinger <[email protected]>
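For reference, gomega's knob for this is format.MaxLength, which truncates long actual/expected dumps in failure messages. A minimal sketch of lifting that limit for a single test follows; the test body and log lines here are stand-ins, not the real e2e test:

package e2e_test

import (
    . "github.com/onsi/ginkgo/v2"
    . "github.com/onsi/gomega"

    "github.com/onsi/gomega/format"
)

var _ = Describe("podman logs", func() {
    It("shows the full matcher output on failure", func() {
        // format.MaxLength == 0 disables truncation of failure output.
        // Save and restore it so other tests keep the default limit.
        old := format.MaxLength
        format.MaxLength = 0
        DeferCleanup(func() { format.MaxLength = old })

        // Stand-in for the real assertion on "podman logs" output.
        logLines := []string{"line 1", "line 2"}
        Expect(logLines).To(HaveLen(2))
    })
})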
Here's an interesting variation: remote f38 root. The interesting thing about it is the log driver.
So, it's not just journald: tests fail with other log drivers too. Does this help chase down the bug?
I have a reproducer with the json-file driver:

$ nl="            <----- basically doublequote, enter, doublequote, enter
"
$ cr="^M"         <----- control-V for quote-next-character, control-M for CR
$ while :;do cid=$(bin/podman run --log-driver json-file -dt quay.io/libpod/testimage:20221018 sh -c "echo podman;echo podman;echo podman");bin/podman wait $cid;logs=$(bin/podman logs --tail 2 $cid); test "$logs" = "podman${cr}${nl}podman${cr}" || break;bin/podman rm $cid;done
...
0
$ echo "$logs"
podman
$ echo "$logs" | cat -vET
podman^Mpodman$   <<<---------- Ooooh, isn't this interesting!
^M$
$ bin/podman logs $cid | cat -vET
podman^M$
podman^M$
podman^M$
Another failure (it doesn't take long). More info:

$ echo "$logs"|cat -vET
^M$
podman^M$
$ bin/podman logs $cid | cat -vET
podman^M$
podman^M$
podman^M$
$ bin/podman logs --tail 2 $cid | cat -vET
^M$
podman^M$         <<<<---- OK, at least it's consistent
$ cat -vET ~/.local/share/containers/storage/overlay-containers/$cid/userdata/ctr.log
2023-08-07T19:56:34.223758260-06:00 stdout F podman^M$
2023-08-07T19:56:34.223758260-06:00 stdout P podman$
2023-08-07T19:56:34.223816701-06:00 stdout F ^M$
2023-08-07T19:56:34.223816701-06:00 stdout F podman^M$

I doubt that this is connected in any way to the journald bug, since that one reproduces easily without podman or conmon anywhere (or even installed), but this is still a bug.
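For context, the ctr.log dump above is in the "<timestamp> <stream> <P|F> <text>" log format, where P marks a partial line that has to be joined with the following entries until an F. The sketch below is illustrative only (toy data, the CRs from the real log omitted); it shows how the entries reassemble into lines, and why counting raw entries instead of reassembled lines, e.g. for a --tail read, can return the wrong thing:

package main

import (
    "bufio"
    "fmt"
    "strings"
)

// reassemble joins P (partial) entries with subsequent entries until an F
// (full) entry closes the line.
func reassemble(raw string) []string {
    var lines []string
    var partial strings.Builder
    sc := bufio.NewScanner(strings.NewReader(raw))
    for sc.Scan() {
        // Fields: timestamp, stream, flag, then the message itself.
        parts := strings.SplitN(sc.Text(), " ", 4)
        if len(parts) < 4 {
            continue
        }
        flag, msg := parts[2], parts[3]
        partial.WriteString(msg)
        if flag == "F" { // full line: flush what we have accumulated
            lines = append(lines, partial.String())
            partial.Reset()
        } // "P": keep accumulating, the line is not finished yet
    }
    return lines
}

func main() {
    raw := "2023-08-07T19:56:34.0-06:00 stdout F podman\n" +
        "2023-08-07T19:56:34.1-06:00 stdout P pod\n" +
        "2023-08-07T19:56:34.2-06:00 stdout F man\n" +
        "2023-08-07T19:56:34.3-06:00 stdout F podman\n"

    entries := strings.Count(raw, "\n")
    lines := reassemble(raw)
    // 4 raw entries, but only 3 actual lines: tailing by entry count is wrong.
    fmt.Printf("raw entries: %d, reassembled lines: %d\n", entries, len(lines))
    for i, l := range lines {
        fmt.Printf("line %d: %q\n", i+1, l)
    }
}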
Thanks for the reproducer; yes, this seems to be a bug in our log reader.
Yes, precisely. Filed #19545.
...to reduce flakes.

Reason: journald makes no guarantees. Just because a systemd job has finished, or podman has written+flushed log entries, doesn't mean that journald will actually know about them: systemd/systemd#28650

Workaround: wrap some podman-logs tests inside Eventually() so they will be retried when log == journald.

This addresses, but does not close, containers#18501. That's a firehose, with many more failures than I can possibly cross-reference. I will leave it open, keep monitoring missing-logs flakes over time, and pick those off as they occur.

Signed-off-by: Ed Santiago <[email protected]>
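A minimal sketch of that Eventually() wrapping, using a plain exec-based helper rather than podman's actual e2e test helpers; the container ID and the 800-line expectation are placeholders:

package e2e_test

import (
    "os/exec"
    "strings"
    "time"

    . "github.com/onsi/ginkgo/v2"
    . "github.com/onsi/gomega"
)

// podmanLogs returns the container's log lines (error ignored for brevity).
func podmanLogs(cid string) []string {
    out, _ := exec.Command("podman", "logs", cid).Output()
    return strings.Split(strings.TrimRight(string(out), "\n"), "\n")
}

var _ = It("eventually sees all log lines under journald", func() {
    cid := "REPLACE-WITH-CONTAINER-ID" // placeholder: created earlier in the test

    // journald offers no read-after-write guarantee, so poll until the
    // expected number of lines shows up (or the timeout expires), instead
    // of asserting once and flaking.
    Eventually(func() []string {
        return podmanLogs(cid)
    }).WithTimeout(30 * time.Second).WithPolling(time.Second).
        Should(HaveLen(800))
})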
Hi, this is still a big source of frustration. Any new insights?
Seen in: int+sys podman+remote debian-13+fedora-37+fedora-38 root+rootless host boltdb+sqlite |
Add a wait_for_ready() to one kube-play test, to make sure container output has made it to the journal. Probably does not fix containers#18501, but I think it might fix its most common presentation. Signed-off-by: Ed Santiago <[email protected]>
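The commit's wait_for_ready() lives in podman's system-test suite; as a rough illustration of the same idea under the assumptions of the earlier sketches (local podman binary, testimage), the standalone helper below polls "podman logs" for a marker line before any assertions are made. The function name and the "READY" marker are hypothetical, not podman's helper:

package main

import (
    "fmt"
    "os/exec"
    "strings"
    "time"
)

// waitForReady polls the container's logs until marker appears or timeout expires.
func waitForReady(cid, marker string, timeout time.Duration) error {
    deadline := time.Now().Add(timeout)
    for time.Now().Before(deadline) {
        out, _ := exec.Command("podman", "logs", cid).CombinedOutput()
        if strings.Contains(string(out), marker) {
            return nil // the marker has made it to the log backend
        }
        time.Sleep(500 * time.Millisecond)
    }
    return fmt.Errorf("container %s never logged %q", cid, marker)
}

func main() {
    // Hypothetical container that prints READY once it is up.
    out, err := exec.Command("podman", "run", "-d",
        "quay.io/libpod/testimage:20221018", "sh", "-c",
        "echo READY; sleep 60").Output()
    if err != nil {
        panic(err)
    }
    cid := strings.TrimSpace(string(out))
    if err := waitForReady(cid, "READY", 30*time.Second); err != nil {
        panic(err)
    }
    fmt.Println("safe to assert on podman logs output now")
}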
Different CI runs on different days. root/rootless, f37/38, journald/json-file
Echoes of #14362, except these failures are on main and include the wait fix.