Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Epic: ginkgo: ExitWithError() considered harmful #18188

Closed
edsantiago opened this issue Apr 13, 2023 · 14 comments
Closed

Epic: ginkgo: ExitWithError() considered harmful #18188

edsantiago opened this issue Apr 13, 2023 · 14 comments
Labels
locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.

Comments

@edsantiago
Copy link
Member

Deep-diving into fixing logformatter so it handles ginkgo v2, I found a log that included our number-one flake :

# podman-remote [options] exec -it test ip addr show eth0
Error: container create failed (no logs from conmon): .....

...but the test in question passed! TL;DR the root cause is:

exec = podmanTest.Podman([]string{"exec", "-it", "test", "ip", "addr", "show", "eth0"})
exec.WaitWithDefaultTimeout()
Expect(exec).Should(ExitWithError())

This is one of my pet peeves. All of you have heard me rant angrily about that, but I'm going to repeat it here anyway: please, please do not just check that "some error occured"! Everywhere possible—and it's usually possible—tests should:

  • Check for a PRECISE EXIT CODE; and
  • Check for an ERROR MESSAGE.

I count over 200 instances of ExitWithError under tests/e2e. Those need to be fixed. I'll tackle some of them over the next few weeks/months/years, but please, everyone, when writing new tests, please please please be more careful.

@vrothberg
Copy link
Member

vrothberg commented Apr 14, 2023

Fair to rephrase that just checking exit code != 0 will hide flakes of some entirely different error?

I also count 239 but will refrain from changing until @Luap99's PR migrating to v2 has merged.

@Luap99
Copy link
Member

Luap99 commented Apr 14, 2023

Given that podman mostly exits 125 anyway I see little problems with ExitWithError(), the big thing here is, as you said, that they do not check error messages which is a big problem.

The e2e test have unique advantage to the systems test they we get individual stdout/err streams. So if a test checks Expect(output).To(Equal("foo")) it will likely work in most cases even when there are unexpected logrus warnings/errors logged on stderr.

What each test should do is also Expect(syderr).To(Equal("")). And then in cases where expects an error match it correctly.

@edsantiago
Copy link
Member Author

I'm just going to leave this here:

It("podman system prune --all --external fails", func() {
prune := podmanTest.Podman([]string{"system", "prune", "--all", "--enternal"})
prune.WaitWithDefaultTimeout()
Expect(prune).Should(Exit(125))
})

The test passes. This is the kind of stuff that I'm always terrified will make its way into our code base.

@edsantiago
Copy link
Member Author

Here's a beautiful example of what I'm talking about: machine e2e tests. There is one failure, at the start: timeout waiting for machine. No other tests fail! A human reader can see error after error, but the tests don't actually check output, so la la la la. All those tests need major surgery to add message checking; I started doing that just now but it's too much to do in a day.

@Luap99
Copy link
Member

Luap99 commented May 9, 2023

Yeah there is a lot of nasty stuff in there. I am currently going through most problems I encountered while working on ginkgo v2. Will report an issue in the next few days. Poorly written tests are definitely at the top of that list.

@Luap99 Luap99 mentioned this issue May 11, 2023
13 tasks
@github-actions
Copy link

A friendly reminder that this issue had no activity for 30 days.

@rhatdan
Copy link
Member

rhatdan commented Jun 20, 2023

@edsantiago @Luap99 Is this still an issue?

@edsantiago
Copy link
Member Author

Very much so.

@Luap99
Copy link
Member

Luap99 commented Apr 4, 2024

@edsantiago Can this be closed with since you added all these stderr checks?

@edsantiago
Copy link
Member Author

No, this is completely different and unrelated. (The stderr checks I added were only for Exit(0)).

The purpose of this epic is:

-    Expect(ExitWithError)
+    Expect(ExactExitStatusNotJustSomeVagueErrorCode)
+    Expect(ExactErrorMessageOrAtLeastADistinctiveSubstring)

@Luap99
Copy link
Member

Luap99 commented Apr 4, 2024

ok, agreed somehow I thought you added many of them while you did your other changes.

@edsantiago
Copy link
Member Author

These are hard, and I'm lazy. For every individual ExitWithError() I need to manually find a test log, guess the exit code, identify the error message, see if it differs between root/rootless/remote, and add an ExpectSubstring() that matches all cases (or, yuk, a conditional).

What I'm thinking about now is a commit hook that prevents new ExitWithError. To at least prevent the problem from getting worse.

@Luap99
Copy link
Member

Luap99 commented Apr 4, 2024

These are hard, and I'm lazy. For every individual ExitWithError() I need to manually find a test log, guess the exit code, identify the error message, see if it differs between root/rootless/remote, and add an ExpectSubstring() that matches all cases (or, yuk, a conditional).

I wonder if this could be automated. We already have the logs and know the stderr output, so technically it would juts need to be captured then matched for common substrings and then added as check in the right place. IF there are enough of them I think it may be worth to automate this. Of course we still need review the results to make sure the error messages are actually the ones that should be happening in the tests.

What I'm thinking about now is a commit hook that prevents new ExitWithError. To at least prevent the problem from getting worse.

This sounds like a good first step.

edsantiago added a commit to edsantiago/libpod that referenced this issue Apr 10, 2024
...and an optional error-message string, to be checked
against stderr.

This is a starting point and baby-steps progress toward containers#18188.
There are 249 ExitWithError() checks in test/e2e. It will take
weeks to fix them all. This commit enables new functionality:

    Expect(ExitWithError(125, "expected substring"))

...while also allowing the current empty-args form. Once
all 249 empty-args uses are modernized, the matcher code
will be cleaned up.

I expect it will take several months of light effort to get
all e2e tests transitioned to the new form. I am choosing to
do so in pieces, for (relative) ease of review. This PR:

  1) makes the initial changes described above; and
  2) updates a small subset of e2e _test.go files such that:
     a) ExitWithError() is given an exit code and error string; and
     b) Exit(Nonzero) is changed to ExitWithError(Nonzero, "string")
        (when possible)

Signed-off-by: Ed Santiago <[email protected]>
@edsantiago
Copy link
Member Author

ExitWithError() now requires two args: exit status and error substring. I can't (automatically) enforce that new test code test for meaningful substrings, we will all have to be diligent in writing and reviewing new tests.

There's still the Expect(Should(Exit(N))) pattern. There's no way to block that, because it is sometimes impossible to use ExitWithError(). Again, we'll try to flag those in review and allow them only if nothing else will work.

Current list of exceptions that I can't clean up:

$ ack '\(Exit\([^0]' test/e2e
test/e2e/containers_conf_test.go:606:		Eventually(session, DefaultWaitTimeout).Should(Exit(125), description)
test/e2e/images_test.go:301:			Expect(session).Should(Exit(result))
test/e2e/info_test.go:43:			Expect(session).Should(Exit(tt.exitCode), desc)
test/e2e/play_kube_test.go:2093:		Expect(kube).Should(Exit(-1))
test/e2e/play_kube_test.go:2105:		Expect(exec).Should(Exit(-1))
test/e2e/quadlet_test.go:638:			Expect(session).Should(Exit(exitCode))
test/e2e/quadlet_test.go:708:			Expect(session).Should(Exit(1))
test/e2e/systemd_activate_test.go:137:		Expect(activateSession).To(Exit(125))
test/e2e/version_test.go:59:			Expect(session).Should(Exit(tt.exitCode), desc)

I consider this closed. This was an interesting challenge.

@stale-locking-app stale-locking-app bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Oct 1, 2024
@stale-locking-app stale-locking-app bot locked as resolved and limited conversation to collaborators Oct 1, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.
Projects
None yet
Development

No branches or pull requests

4 participants