
Set up artifact reporting for ci-operator jobs #867

Merged: 1 commit merged into openshift:master on May 18, 2018


smarterclayton
Contributor

No description provided.

openshift-ci-robot added the size/M label ("Denotes a PR that changes 30-99 lines, ignoring generated files.") on May 18, 2018
smarterclayton merged commit 0cfbe4e into openshift:master on May 18, 2018
openshift-ci-robot
Contributor

@smarterclayton: Updated the config configmap from file cluster/ci/config/prow/config.yaml


wking added a commit to wking/openshift-release that referenced this pull request Oct 4, 2018
With 10 pulls going at once.

These are currently generating a lot of error messages.  From recent
openshift/installer#415 tests [1]:

  $ oc project ci-op-w11cl72x
  $ oc logs e2e-aws -c teardown --timestamps
  2018-10-04T18:17:06.557740109Z Gathering artifacts ...
  2018-10-04T18:17:24.875374828Z Error from server (Forbidden): Forbidden (user=kube-apiserver, verb=get, resource=nodes, subresource=log)
  ...
  2018-10-04T18:17:29.331684772Z Error from server (Forbidden): Forbidden (user=kube-apiserver, verb=get, resource=nodes, subresource=log)
  2018-10-04T18:17:29.351919855Z Error from server (NotFound): the server could not find the requested resource
  2018-10-04T18:17:39.592948165Z Error from server (BadRequest): previous terminated container "registry" in pod "registry-b6df966cf-fkhpl" not found
  ...
  2018-10-04T18:29:24.457841097Z Error from server (BadRequest): previous terminated container "kube-addon-operator" in pod "kube-addon-operator-775d4c8f8d-289zm" not found
  2018-10-04T18:29:24.466213055Z Waiting for node logs to finish ...
  2018-10-04T18:29:24.466289887Z Deprovisioning cluster ...
  2018-10-04T18:29:24.483065903Z level=debug msg="Deleting security groups"
  ...
  2018-10-04T18:33:29.857465158Z level=debug msg="goroutine deleteVPCs complete"

So that's about 12 minutes to pull the logs, followed by about 4 minutes
for destroy-cluster.

Looking at the extracted logs, lots of them are empty (zero bytes, which
gzip compresses to 20 bytes):

  $ POD_LOGS="$(w3m -dump https://gcsweb-ci.svc.ci.openshift.org/gcs/origin-ci-test/pr-logs/pull/openshift_installer/415/pull-ci-openshift-installer-master-e2e-aws/456/artifacts/e2e-aws/pods/)"
  $ echo "${POD_LOGS}" | grep '^ *20$' | wc -l
  86
  $ echo "${POD_LOGS}" | grep '\[file\]' | wc -l
  172
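The `grep '^ *20$'` count relies on the size of a gzipped empty stream. A quick bash check, assuming GNU gzip reading from stdin (so no file name is stored in the header); `EMPTY_GZ_SIZE` is just an illustrative variable name:

```shell
# Compress an empty stdin stream. The result is a 10-byte gzip header,
# a 2-byte empty deflate block, and an 8-byte trailer (CRC32 plus
# uncompressed size), i.e. 20 bytes in total.
EMPTY_GZ_SIZE="$(printf '' | gzip | wc -c)"
echo "${EMPTY_GZ_SIZE}"
```

So every 20-byte entry in the listing corresponds to a log that had no content at all.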

So it's possible that the delay is due to the errors, or to a few
large logs blocking the old, serial pod/container pulls.

With this commit, I've added a new 'queue' command.  This command
checks to see how many background jobs we have using 'jobs' [2], and
idles until we get below 10.  Then it launches its particular command
in the background.  By using 'queue', we'll keep up to 10 log-fetches
running in parallel, and the final 'wait' will block for any which
still happen to be running by that point.
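The commit describes 'queue' only in prose. Here is a minimal bash sketch of that behavior, not the commit's actual code: the function signature, the `MAX_JOBS` variable, and the one-second polling interval are assumptions. The redirect target is passed as the first positional argument (`${1}`), matching the approach the commit message describes further down.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the 'queue' helper (not the commit's code).
# Usage: queue TARGET COMMAND [ARG...]
# Waits until fewer than MAX_JOBS background jobs are running, then
# runs COMMAND in the background with its stdout redirected to TARGET.
MAX_JOBS=10

queue() {
  local target="${1}"   # redirect target, passed positionally as ${1}
  shift
  # 'jobs -r -p' prints one PID per running background job. The -r flag
  # is a bash extension; POSIX 'jobs' only specifies -l and -p.
  while [[ "$(jobs -r -p | wc -l)" -ge "${MAX_JOBS}" ]]; do
    sleep 1
  done
  "${@}" >"${target}" &
}
```

A caller would issue a series of calls such as `queue pods.txt oc get pods`, then a single `wait` so the script blocks until any remaining background fetches finish. Polling with `jobs` keeps the sketch to plain bash builtins, at the cost of up to a second of idle time per slot.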

The previous gzip invocations used -c, which dates back to 82d333e
(Set up artifact reporting for ci-operator jobs, 2018-05-17, openshift#867).
But with these gzip filters running on stdin anyway, the -c was
superfluous.  I've dropped it in this commit.
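With no file operands, gzip already reads standard input and writes standard output, so `-c` changes nothing when gzip is used as a filter. A small round-trip check in bash, assuming GNU gzip; the temporary paths and variable names are illustrative only:

```shell
# No file operands: gzip compresses stdin to stdout without needing -c.
TMP_DIR="$(mktemp -d)"
printf 'log line\n' | gzip >"${TMP_DIR}/log.gz"
# 'gzip -dc' decompresses the named file to stdout for comparison.
ROUND_TRIP="$(gzip -dc "${TMP_DIR}/log.gz")"
echo "${ROUND_TRIP}"
rm -rf "${TMP_DIR}"
```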

Moving the redirect target to a positional argument is a bit kludgy.
I'd rather have a more familiar way of phrasing that redirect, but
passing it in as ${1} was the best I've come up with.

[1]: https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_installer/415/pull-ci-openshift-installer-master-e2e-aws/456/build-log.txt
[2]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/jobs.html
smarterclayton deleted the prow2 branch on April 24, 2019 14:01
derekhiggins pushed a commit to derekhiggins/release that referenced this pull request Oct 24, 2023
…#867)

Convert OS Image cache to use new install-config interface