Boskos issues causing CI/presubmits to fail #638

Closed
adrcunha opened this issue Mar 28, 2019 · 12 comments
Labels
bug Something isn't working


@adrcunha
Contributor

We've been seeing failures when acquiring projects from boskos in the CI/presubmit flows.

A generic "resource not found" error:

https://storage.googleapis.com/knative-prow/logs/ci-knative-serving-continuous/1111055658830008320/build-log.txt

Same error, but with permission denials listed:

https://gubernator.knative.dev/build/knative-prow/pr-logs/pull/knative_eventing/951/pull-knative-eventing-integration-tests/1110601542886494209/

One possibility is that the boskos pool gets exhausted at peak times.

@adrcunha adrcunha added the bug Something isn't working label Mar 28, 2019
@adrcunha
Contributor Author

We're using a pretty old boskos image (https://github.com/knative/test-infra/blob/master/ci/prow/boskos/config.yaml#L56); we should at least update it to get the latest improvements and bug fixes.

@steuhs
Contributor

steuhs commented Mar 28, 2019

This is the same issue as #625

@adrcunha
Contributor Author

From #625, another example of the permission denied errors:

Mar 25th
Mar 26th

https://gubernator.knative.dev/build/knative-prow/pr-logs/pull/knative_eventing/951/pull-knative-eventing-integration-tests/1110603806170681344/

```
I0326 18:06:23.420] 2019/03/26 18:06:23 main.go:312: Something went wrong: failed to prepare test environment: --provider=gke boskos failed to acquire project: resource not found
I0326 18:06:23.422] Test subprocess exited with code 1
I0326 18:06:45.429] ERROR: (gcloud.compute.target-pools.list) Some requests did not succeed:
I0326 18:06:45.429] - Required 'compute.targetPools.list' permission for 'projects/knative-tests'
I0326 18:06:45.429]
I0326 18:07:07.246] ERROR: (gcloud.compute.target-pools.list) Some requests did not succeed:
I0326 18:07:07.246] - Required 'compute.targetPools.list' permission for 'projects/knative-tests'
I0326 18:07:07.247]
I0326 18:07:07.318] Artifacts were written to /workspace/_artifacts
I0326 18:07:07.319] Test result code is 1
I0326 18:07:07.322] ==================================
I0326 18:07:07.322] ==== INTEGRATION TESTS FAILED ====
I0326 18:07:07.322] ==================================
```

@adrcunha
Contributor Author

We can ignore the permission denied issues; they're not related to boskos. They actually come from the network cleanup code in e2e-tests.sh:

```
local http_health_checks="$(gcloud compute target-pools list \
```

@adrcunha
Contributor Author

In addition to updating the boskos service, I also suggest expanding the pool (it definitely won't hurt).

@adrcunha
Contributor Author

@steuhs @jessiezcc Just got pinged about this issue. Stephen, I think you should prioritize this, at least the initial tentative fixes I suggested.

steuhs added a commit that referenced this issue Mar 29, 2019
The existing one is pretty outdated and may lead to the issue here #638
knative-prow-robot pushed a commit that referenced this issue Mar 29, 2019
* update boskos version

The existing one is pretty outdated and may lead to the issue here #638

* update outdated reaper and janitor image
@steuhs
Contributor

steuhs commented Mar 29, 2019

@adrcunha By expanding the pool, I guess you mean increasing E2E_MAX_CLUSTER_NODES?

@evankanderson
Member

I have several small cleanup PRs that seem to be hitting this.

Is there some way that we can queue (rather than fail) the e2e tests when the Boskos pool is empty? Something like a max_concurrency for serverless functions?

@adrcunha
Contributor Author

@steuhs No. There are instructions for expanding the pool in the README in the ci/prow dir.
@evankanderson Sorry, there isn't. kubetest interacts with boskos under the hood.

@chaodaiG
Contributor

chaodaiG commented Apr 1, 2019

The permission denied issue is getting worse; it has failed at least the latest 10 continuous runs in serving:
https://testgrid.knative.dev/knative-serving#continuous

@adrcunha
Contributor Author

adrcunha commented Apr 1, 2019

As I mentioned in #638 (comment), the permission denied error is a red herring. I believe Stephen is working on expanding the boskos pool; hopefully that will solve (or at least alleviate) the problem.

steuhs added a commit to steuhs/test-infra-1 that referenced this issue Apr 1, 2019
steuhs added a commit to steuhs/test-infra-1 that referenced this issue Apr 1, 2019
knative-prow-robot pushed a commit that referenced this issue Apr 1, 2019
steuhs added a commit to steuhs/test-infra-1 that referenced this issue Apr 1, 2019
@steuhs
Contributor

steuhs commented Apr 1, 2019

knative-prow-robot pushed a commit that referenced this issue Apr 2, 2019
* update boskos images and flags

Fixed #638

* use images that exists

also the most recent ones with the same tags
Cynocracy pushed a commit to Cynocracy/test-infra that referenced this issue Jun 13, 2020
…native#638)

* Introduce error util ErrInvalidCombination for invalid combination

Sometimes individually valid values become invalid in combination.

example 1. knative/serving#5382
example 2. following combination in `spec.traffic`.

```
  traffic:
  - latestRevision: true
    revisionName: hello-example-dk7nd
    percent: 100
```

But there is no error util for them, so we need to create a
custom error like knative/serving@c1583f3
or use `ErrInvalidValue`.

A custom error makes the code complicated, and `ErrInvalidValue` is
not debug-friendly.

To solve this, this patch introduces a util func `ErrInvalidCombination`.

* Introduce ErrGeneric instead of ErrInvalidCombination
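The point of the commit message above is that a validator must check fields together, not only one at a time: `latestRevision: true` and `revisionName` are each valid on their own, but not in the same traffic entry. A minimal Go sketch of such a combined check follows. `trafficTarget` is a hypothetical, trimmed-down stand-in for a `spec.traffic` entry (not the real knative/serving type), and it uses plain `fmt.Errorf` rather than reproducing the knative error helpers the commit introduces.

```go
package main

import "fmt"

// trafficTarget is a hypothetical, simplified stand-in for one entry of
// spec.traffic; it is not the actual knative/serving API type.
type trafficTarget struct {
	LatestRevision bool
	RevisionName   string
	Percent        int
}

// validateTarget rejects the combination from the example above: each field
// is fine alone, but latestRevision: true plus a revisionName is contradictory.
// The error text is illustrative only.
func validateTarget(t trafficTarget) error {
	if t.LatestRevision && t.RevisionName != "" {
		return fmt.Errorf("invalid combination: latestRevision: true cannot be used with revisionName %q", t.RevisionName)
	}
	return nil
}

func main() {
	bad := trafficTarget{LatestRevision: true, RevisionName: "hello-example-dk7nd", Percent: 100}
	good := trafficTarget{LatestRevision: true, Percent: 100}
	fmt.Println(validateTarget(bad))
	fmt.Println(validateTarget(good))
}
```

Naming both fields in one message tells the user which pair conflicts, which is the debuggability gap the commit message attributes to `ErrInvalidValue`.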