cluster-launch-installer-e2e: Only pull pod and container logs on failures #1815

Closed
wants to merge 1 commit into from

@wking wking commented Oct 4, 2018

Pulling pod and container logs currently generates a lot of error messages. From here (testing openshift/installer#415):

Gathering artifacts ...
Error from server (Forbidden): Forbidden (user=kube-apiserver, verb=get, resource=nodes, subresource=log)
Error from server (Forbidden): Forbidden (user=kube-apiserver, verb=get, resource=nodes, subresource=log)
Error from server (Forbidden): Forbidden (user=kube-apiserver, verb=get, resource=nodes, subresource=log)
Error from server (NotFound): the server could not find the requested resource
...
Error from server (BadRequest): previous terminated container "registry" in pod "registry-b6df966cf-fkhpl" not found
Error from server (BadRequest): previous terminated container "kube-apiserver" in pod "kube-apiserver-2hf2w" not found
Error from server (BadRequest): previous terminated container "kube-apiserver" in pod "kube-apiserver-7pgl9" not found
...

Looking at the extracted logs, many of them are empty (an empty log compresses to 20 bytes):

$ POD_LOGS="$(w3m -dump https://gcsweb-ci.svc.ci.openshift.org/gcs/origin-ci-test/pr-logs/pull/openshift_installer/415/pull-ci-openshift-installer-master-e2e-aws/456/artifacts/e2e-aws/pods/)"
$ echo "${POD_LOGS}" | grep '^ *20$' | wc -l
86
$ echo "${POD_LOGS}" | grep '\[file\]' | wc -l
172
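
As a quick sanity check on the 20-byte figure and the counts above, here is a minimal sketch. It assumes a `gzip` binary on the PATH; the helper names `empty_gzip_size` and `percent` are made up for illustration and are not part of the CI template.

```shell
#!/bin/sh
# Size in bytes of the gzip stream produced from empty input.
# Reading from stdin means no file name is stored in the header.
empty_gzip_size() {
    gzip -c </dev/null | wc -c | tr -d '[:space:]'
}

# Integer percentage of $1 out of $2 (hypothetical helper).
percent() {
    echo $(( $1 * 100 / $2 ))
}

echo "empty gzip stream: $(empty_gzip_size) bytes"
echo "empty logs: $(percent 86 172)% of the listed files"
```

With the counts from the listing (86 files of size 20 out of 172 files), half of the gathered logs carry no content.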

And, possibly because of the errors, the commands are slow, with one of the above lines coming out every second or so. The teardown container obviously does some other things as well, but it's taking a significant chunk of our e2e-aws time. From here:

2018/10/04 17:59:00 Running pod e2e-aws
2018/10/04 18:03:25 Container setup in pod e2e-aws completed successfully
2018/10/04 18:16:37 Container test in pod e2e-aws completed successfully
2018/10/04 18:33:31 Container teardown in pod e2e-aws completed successfully
2018/10/04 18:33:31 Pod e2e-aws succeeded after 34m31s

So roughly 4.5 minutes for setup, 13 minutes for test, and 17 minutes for teardown.
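
The per-phase durations can be recomputed from the timestamps in the build log. This is a small sketch under the assumption that all timestamps fall on the same day; `to_seconds` and `elapsed` are hypothetical helpers, not something the CI tooling provides.

```shell
#!/bin/sh
# Convert HH:MM:SS to seconds since midnight.
# One leading zero is stripped per field so "08" and "09" are not read as octal.
to_seconds() {
    old_ifs="$IFS"
    IFS=:
    set -- $1
    IFS="$old_ifs"
    echo $(( ${1#0} * 3600 + ${2#0} * 60 + ${3#0} ))
}

# Print the time between two same-day timestamps as <minutes>m<seconds>s.
elapsed() {
    secs=$(( $(to_seconds "$2") - $(to_seconds "$1") ))
    echo "$(( secs / 60 ))m$(( secs % 60 ))s"
}

echo "setup:    $(elapsed 17:59:00 18:03:25)"
echo "test:     $(elapsed 18:03:25 18:16:37)"
echo "teardown: $(elapsed 18:16:37 18:33:31)"
echo "total:    $(elapsed 17:59:00 18:33:31)"
```

This prints 4m25s, 13m12s, 16m54s, and 34m31s, so teardown accounts for roughly half of the pod's run time.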

When the tests pass, we probably aren't going to be poking around in the logs, so drop log acquisition in those cases to speed up our CI.
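
The idea could look something like the sketch below. It is an illustration only, not the actual change in this PR: it assumes the test container writes its exit status to a file that the teardown container can read, and the function names and the status-file mechanism are hypothetical.

```shell
#!/bin/sh
# Placeholder for the expensive step; the real template would loop over
# pods and containers here (for example with `oc logs`).
gather_logs() {
    echo "Gathering pod and container logs ..."
}

# $1: path to a file holding the test container's exit status (assumption).
# A missing or empty file is treated as a failure so logs are still gathered.
teardown() {
    status="$(cat "$1" 2>/dev/null)"
    if [ "${status:-1}" -ne 0 ]; then
        gather_logs
    else
        echo "Tests passed; skipping pod and container logs"
    fi
}
```

Calling `teardown` with a file containing `0` skips the gathering step; any other status, or an unreadable file, falls through to `gather_logs`, so a broken status hand-off does not silently drop logs.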

CC @abhinavdahiya

…lures

These are currently generating a lot of error messages.  From [1]
(testing openshift/installer#415):

  Gathering artifacts ...
  Error from server (Forbidden): Forbidden (user=kube-apiserver, verb=get, resource=nodes, subresource=log)
  Error from server (Forbidden): Forbidden (user=kube-apiserver, verb=get, resource=nodes, subresource=log)
  Error from server (Forbidden): Forbidden (user=kube-apiserver, verb=get, resource=nodes, subresource=log)
  Error from server (NotFound): the server could not find the requested resource
  ...
  Error from server (BadRequest): previous terminated container "registry" in pod "registry-b6df966cf-fkhpl" not found
  Error from server (BadRequest): previous terminated container "kube-apiserver" in pod "kube-apiserver-2hf2w" not found
  Error from server (BadRequest): previous terminated container "kube-apiserver" in pod "kube-apiserver-7pgl9" not found
  ...

Looking at the extracted logs, many of them are empty (an empty log
compresses to 20 bytes):

  $ POD_LOGS="$(w3m -dump https://gcsweb-ci.svc.ci.openshift.org/gcs/origin-ci-test/pr-logs/pull/openshift_installer/415/pull-ci-openshift-installer-master-e2e-aws/456/artifacts/e2e-aws/pods/)"
  $ echo "${POD_LOGS}" | grep '^ *20$' | wc -l
  86
  $ echo "${POD_LOGS}" | grep '\[file\]' | wc -l
  172

And, possibly because of the errors, the commands are slow, with one
of the above lines coming out every second or so.  The teardown
container obviously does some other things as well, but it's taking a
significant chunk of our e2e-aws time [2]:

  2018/10/04 17:59:00 Running pod e2e-aws
  2018/10/04 18:03:25 Container setup in pod e2e-aws completed successfully
  2018/10/04 18:16:37 Container test in pod e2e-aws completed successfully
  2018/10/04 18:33:31 Container teardown in pod e2e-aws completed successfully
  2018/10/04 18:33:31 Pod e2e-aws succeeded after 34m31s

So roughly 4.5 minutes for setup, 13 minutes for test, and 17 minutes
for teardown.

When the tests pass, we probably aren't going to be poking around in
the logs, so drop log acquisition in those cases to speed up our CI.

[1]: https://api.ci.openshift.org/console/project/ci-op-w11cl72x/browse/pods/e2e-aws?tab=logs
[2]: https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_installer/415/pull-ci-openshift-installer-master-e2e-aws/456/build-log.txt
@openshift-ci-robot openshift-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Oct 4, 2018
@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 4, 2018
@smarterclayton
Contributor

I want pod logs all the time - there is valuable info even on success

@smarterclayton
Contributor

If the problem is speed, we can make them faster

@wking
Member Author

wking commented Oct 4, 2018

> If the problem is speed, we can make them faster

I'm happy to work up anything that helps with this, if you point me in the direction you want me to go ;).

@wking wking closed this Oct 4, 2018
@wking wking deleted the only-get-logs-on-failures branch October 4, 2018 19:15