Failed e2e tests don't appear to be stored #910

Closed
NeilW opened this issue Sep 27, 2019 · 6 comments · Fixed by #916

NeilW commented Sep 27, 2019

What steps did you take and what happened:
e2e fails some conformance tests, but sonobuoy retrieve doesn't store the failures.

$ sonobuoy e2e 201909261134_sonobuoy_b7635f06-19b1-4037-9c46-e6b35dd62ece.tar.gz 
ERRO[0000] could not get tests from archive: failed to find results file "plugins/e2e/results/global/junit_01.xml" in archive
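
(A quick sanity check with plain tar confirms whether the results file named in the error is actually absent from the archive:)

$ tar -tzf 201909261134_sonobuoy_b7635f06-19b1-4037-9c46-e6b35dd62ece.tar.gz | grep 'plugins/e2e'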

What did you expect to happen:

To be able to inspect and list the failures that have occurred.

Anything else you would like to add:

Checking the logs of the e2e container shows that the tests ran and reported errors. Two sig-network tests failed.

The e2e container is gone once the run completes.

Environment:

  • Sonobuoy version: 0.16.0
  • Kubernetes version: (use kubectl version):
    Server Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.0", GitCommit:"2bd9643cee5b3b3a5ecbd3af49d09018f0773c77", GitTreeState:"clean", BuildDate:"2019-09-18T14:27:17Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"linux/amd64"}
  • Kubernetes installer & version:
    kubeadm version: &version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.0", GitCommit:"2bd9643cee5b3b3a5ecbd3af49d09018f0773c77", GitTreeState:"clean", BuildDate:"2019-09-18T14:34:01Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"linux/amd64"}
  • Cloud provider or hardware configuration:
    Brightbox
  • OS (e.g. from /etc/os-release):
    Ubuntu 18.04.3 LTS
@johnSchnake (Contributor)

Could you share your tarball with me?

This is a somewhat common failure mode in which the plugin fails to report its results to Sonobuoy.
We only see that the tests ran because Sonobuoy gathers the logs afterwards. You can usually work out what went wrong with the reporting from the info in the rest of the tarball.

Without seeing the logs it is hard to diagnose though.
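
If you want to poke at it yourself first, a rough way to inspect the archive by hand (the plugins/e2e path matches the layout in the error above; the errors/ and podlogs/ paths are the usual tarball layout and may differ by version):

$ mkdir results && tar -xzf 201909261134_sonobuoy_*.tar.gz -C results
$ ls results/plugins/e2e/        # a results/ dir means reporting worked; an errors/ dir means it did not
$ cat results/plugins/e2e/errors/global/error.json
$ grep -ri error results/podlogs/ | tail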

NeilW (Author) commented Sep 28, 2019

The logs from the e2e container (captured in a terminal session running sonobuoy logs -p e2e -f):

Sep 28 08:19:20.448: INFO: Running AfterSuite actions on all nodes
Sep 28 08:19:20.448: INFO: Running AfterSuite actions on node 1
Sep 28 08:19:20.448: INFO: Skipping dumping logs from cluster


Summarizing 2 Failures:

[Fail] [sig-network] Services [It] should be able to change the type from ExternalName to NodePort [Conformance] 
/workspace/anago-v1.16.0-rc.2.1+2bd9643cee5b3b/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/framework/service/jig.go:821

[Fail] [sig-network] Services [It] should be able to create a functioning NodePort service [Conformance] 
/workspace/anago-v1.16.0-rc.2.1+2bd9643cee5b3b/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/framework/service/jig.go:821

Ran 22 of 4897 Specs in 752.071 seconds
FAIL! -- 20 Passed | 2 Failed | 0 Pending | 4875 Skipped
--- FAIL: TestE2E (752.21s)
FAIL

Ginkgo ran 1 suite in 12m34.479661122s
Test Suite Failed

namespace="sonobuoy" pod="sonobuoy-e2e-job-91599be9d346456e" container="sonobuoy-worker"
time="2019-09-28T08:24:23Z" level=info msg="received a signal. Waiting then sending the real shutdown signal." signal=terminated
$ sonobuoy status
         PLUGIN     STATUS    RESULT   COUNT
            e2e     failed   unknown       1
   systemd-logs   complete    passed       1
   systemd-logs   complete                 3

Sonobuoy has completed. Use `sonobuoy retrieve` to get results.
$ sonobuoy retrieve
201909280805_sonobuoy_94247daa-c113-4ad4-a100-5079409bdebe.tar.gz

johnSchnake (Contributor) commented Sep 28, 2019

Thanks so much for the logs.

Some things expected, some things not:
Expected:

  • the aggregator doesn't ever report being given results for that plugin
  • as a result, the tarball has an errors directory for the plugin instead of a results dir. The reason it gives for the failure is that the pod exited without reporting results.

Unexpected:

  • the tests do seem to run fully (22 tests were expected and I see all of them reported in the logs)
  • I don't see any timeouts or odd exit reasons

Note:
The e2e container did terminate with a non-zero status, and I am concerned that this, combined with some flaw in the upstream e2e image, is what caused the reporting not to take place:

"exitCode": 1,
"finishedAt": "2019-09-28T08:19:20Z",
"reason": "Error",

Two possibilities I'm most concerned about right now:

  • either the upstream v1.16.0 conformance image can exit without reporting results, or
  • Sonobuoy is prematurely recording it as a failure before the results have time to be uploaded.

Continuing to investigate.

@johnSchnake (Contributor)

From the pod info, the container exited with code 1 at

"finishedAt": "2019-09-28T08:19:20Z"

And in the Sonobuoy logs, it stopped waiting for results at:

time="2019-09-28T08:24:20Z" level=info msg="Last update to annotations on exit"

So it did wait 5 more minutes after the container terminated for results to appear.

  1. I think more logging was needed here to make this clear. There are logs in place
    for when we get results, but apparently the logs are not sufficiently clear when
    this error mode occurs.
  2. The error reported mentioned the termination; it would also have been helpful to list the
    time that the error message was generated (so I didn't have to cross-reference the other set of logs).
  3. It does seem to be an issue with the upstream image. Going to take a look there and try
    to repro. Unfortunately this sort of thing has happened before and caused us to push our own
    conformance image to patch the issue.

@johnSchnake (Contributor)

So it does seem the problem lies in the upstream image.

kubernetes/kubernetes@2242718#diff-0efd26687540ba2cbfd9547019622551R70 modified the code to run the tests in the foreground and eliminated a race.

However, when the test run exits non-zero, the script exits immediately without saving the results and sending them to Sonobuoy.

Remediation:
I'll have to fix the upstream image bash runner, but this is part of the reason I personally wanted to get away from a bash script like this (clarity, logging, testing, etc).
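
To illustrate the failure mode, here's a sketch (not the actual upstream script; paths, flags, and the results/done locations are stand-ins for the plugin's usual contract with the Sonobuoy worker):

# problematic pattern: with errexit on, a failing test run aborts the script
set -e
/usr/local/bin/e2e.test ...                                               # exits 1 when any test fails; the script dies here
tar czf ${RESULTS_DIR}/e2e.tar.gz -C ${RESULTS_DIR} junit_01.xml e2e.log  # never reached
echo ${RESULTS_DIR}/e2e.tar.gz > ${RESULTS_DIR}/done                      # so the worker is never told where the results are

# one way to fix it: capture the exit code and always save/report results before exiting
set +e
/usr/local/bin/e2e.test ...
rc=$?
tar czf ${RESULTS_DIR}/e2e.tar.gz -C ${RESULTS_DIR} junit_01.xml e2e.log
echo ${RESULTS_DIR}/e2e.tar.gz > ${RESULTS_DIR}/done
exit $rc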

You can also leverage the go-based runner, which was added this release, by setting E2E_USE_GO_RUNNER=true.
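
For v0.16.0 that means opting in per run; the flag syntax below is the same one used in the fix commit at the end of this issue:

$ sonobuoy run --plugin-env e2e.E2E_USE_GO_RUNNER=true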

We may need to make this the default for Sonobuoy sooner than expected. We wanted to let it soak a bit and use it in our own testing first, but if the e2e image isn't reporting results in failure situations then we need to address that.

We may be able to publish a new e2e conformance image once it's fixed, but in the past I think we had to just wait until the next release, especially if Sonobuoy was able to mitigate the problem.

johnSchnake added the kind/bug (Behavior isn't as expected or intended), area/k8s-upstream, and p0-highest labels on Sep 28, 2019
johnSchnake added a commit that referenced this issue Sep 29, 2019
In Kubernetes 1.16.0, a problem in the bash runner of the conformance
tests caused the plugin to fail to report results whenever a test
failure occurred.

This problem does not impact the golang-based runner, which was also
introduced in Kubernetes 1.16.0. As a result, we would like to use it
by default. It is a new feature, but so far it has been shown to be
functional and an improvement over the bash-based runner (as this
issue illustrates).

Users of Sonobuoy v0.16.0 should update to a version which utilizes
this env var by default, but they can also specify
--plugin-env e2e.E2E_USE_GO_RUNNER=true to work around the issue.

If users of new versions of Sonobuoy do NOT wish to use the go runner,
they can also set --plugin-env e2e.E2E_USE_GO_RUNNER to unset the value.

Fixes #910

Signed-off-by: John Schnake <[email protected]>