Boskos issues causing CI/presubmits to fail #638
Comments
We're using a pretty old boskos image (https://github.com/knative/test-infra/blob/master/ci/prow/boskos/config.yaml#L56). We should at least update it to get the latest improvements and bugfixes.
This is the same issue as #625.
From #625, another example of permission denied errors (Mar 25th).
We can ignore the permission denied issues; they're not related to boskos. They actually come from the network cleanup code in e2e-tests.sh (test-infra/scripts/e2e-tests.sh, line 190 in f2aa306).
In addition to updating the boskos service, I also suggest expanding the pool (it definitely won't hurt).
@steuhs @jessiezcc Just got pinged about this issue. Stephen, I think you should prioritize that; at least the initial tentative fixes I suggested.
The existing one is pretty outdated and may lead to the issue here #638
* update boskos version The existing one is pretty outdated and may lead to the issue here #638 * update outdated reaper and janitor image
@adrcunha by expanding the pool, I guess you mean increasing the value of E2E_MAX_CLUSTER_NODES?
I have several small cleanup PRs that seem to be hitting this. Is there some way that we can queue (rather than fail) the e2e tests when the Boskos pool is empty? Something like a max_concurrency for serverless functions?
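To make the "queue rather than fail" idea concrete, here is a minimal sketch of a wait-and-retry wrapper around project acquisition. It is not how Boskos or e2e-tests.sh behave today: `acquire_with_wait`, `try_acquire`, the timeout, and the poll interval are all hypothetical names and values chosen for illustration, and `try_acquire` stands in for whatever call actually asks the pool for a free project.

```python
import time
from typing import Callable, Optional


def acquire_with_wait(
    try_acquire: Callable[[], Optional[str]],
    timeout_s: float = 600.0,
    poll_interval_s: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll try_acquire until it returns a project name or the timeout expires.

    try_acquire is a hypothetical stand-in for the real pool client call;
    it must return None when the pool has no free project.
    """
    deadline = clock() + timeout_s
    while True:
        project = try_acquire()
        if project is not None:
            return project
        if clock() >= deadline:
            # Fail only after waiting, instead of on the first empty response.
            raise TimeoutError("no free project in the pool before the timeout")
        sleep(poll_interval_s)


if __name__ == "__main__":
    # Simulated pool that is empty for the first two polls.
    responses = iter([None, None, "project-a"])
    print(acquire_with_wait(lambda: next(responses), sleep=lambda s: None))
```

A job wrapped this way would wait for a project to be released and fail only once the timeout passes, which trades longer presubmit latency at peak times for fewer spurious failures.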
@steuhs No. There are instructions about expanding the pool in the ci/prow dir README.
The permission denied issue is getting worse; it has failed at least the last 10 continuous runs in serving.
As I mentioned in #638 (comment), the permission denied error is a red herring. I believe Stephen is working on expanding the boskos pool; hopefully that will solve (or at least alleviate) the problem.
attempt to solve this issue: knative#638
attempt to solve this issue: knative#638
attempt to solve this issue: #638
After three actions were taken to address it, this issue stopped occurring as of 2019-04-01 10:36:10.000 PDT.
The problem was resolved after the last action was taken. This link can be used to verify that the error stopped occurring: https://pantheon.corp.google.com/logs/viewer?project=knative-tests&minLogLevel=0&expandAll=false&timestamp=2019-04-01T20:38:06.120000000Z&customFacets=jsonPayload.to&limitCustomFacetWidth=true&dateRangeStart=2019-03-25T20:37:42.005Z&interval=P7D&resource=container&scrollTimestamp=2019-04-01T17:36:10.000000000Z&filters=text:%22failed%20to%20acquire%20project:%20resource%20not%20found%22&dateRangeUnbound=forwardInTime
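For anyone without access to that log viewer, the same check can be approximated against downloaded log text. The sketch below searches for the error string used in the link's filter (`failed to acquire project: resource not found`) and returns the last line that contains it. The function name and the sample log lines are hypothetical and only illustrate the idea.

```python
from typing import Iterable, Optional

# Error text taken from the filter in the log viewer link above.
ERROR_TEXT = "failed to acquire project: resource not found"


def last_error_line(lines: Iterable[str], needle: str = ERROR_TEXT) -> Optional[str]:
    """Return the last line containing needle, or None if it never appears."""
    last = None
    for line in lines:
        if needle in line:
            last = line.rstrip("\n")
    return last


if __name__ == "__main__":
    # Hypothetical log excerpt; real log lines may be formatted differently.
    sample = [
        "10:01 failed to acquire project: resource not found\n",
        "10:05 acquired project\n",
        "10:09 failed to acquire project: resource not found\n",
        "10:20 acquired project\n",
    ]
    print(last_error_line(sample))
```

If the last matching line predates the final fix, that supports the conclusion that the error stopped after it.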
* update boskos images and flags Fixed #638 * use images that exists also the most recent ones with the same tags
…native#638)
* Introduce error util ErrInvalidCombination for invalid combination

Sometimes valid value becomes invalid value by combination.
example 1. knative/serving#5382
example 2. following combination in `spec.traffic`.
```
traffic:
- latestRevision: true
  revisionName: hello-example-dk7nd
  percent: 100
```
But there are no error util for them, so we need to create custom error like knative/serving@c1583f3 or `ErrInvalidValue`. The custom error will make code complicated and `ErrInvalidValue` is not debug friendly. To solve it, this patch introduces an util func `ErrInvalidCombination`.
* Introduce ErrGeneric instead of ErrInvalidCombination
We've been seeing failures with acquiring projects from boskos for the CI/presubmit flows.
A generic "resource not found" error:
https://storage.googleapis.com/knative-prow/logs/ci-knative-serving-continuous/1111055658830008320/build-log.txt
Same error, but with permission denials listed:
https://gubernator.knative.dev/build/knative-prow/pr-logs/pull/knative_eventing/951/pull-knative-eventing-integration-tests/1110601542886494209/
One possibility is that the boskos pool could be eventually exhausted at peak times.