test: passed test suite contains failed test cases #5123
Comments
This appears to happen for Linux as well: https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/directory/pull-kubernetes-e2e-capz-conformance/1416217627528794112 |
/priority important-soon |
@CecileRobertMichon: Please ensure the request meets the requirements listed here. If this request no longer meets these requirements, the label can be removed In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Looks like this is an issue in the CAPI test framework; CAPA has the same problem: https://prow.k8s.io/view/gs/kubernetes-jenkins/logs/periodic-cluster-api-provider-aws-e2e-conformance-with-k8s-ci-artifacts/1422914264003252224 |
Transferred to CAPI repo as this is an issue in the CAPI test framework and affects CAPI as well https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api#capi-e2e-main-1-18-1-19.
|
/help |
From a chat on Slack: it looks like we're going to have to parse the JUnit XML and figure out whether something failed, which is pretty terrible. Philosophically, the current behaviour was intended in the initial implementation, given that getting any sort of result from conformance was a big success in itself. Taking the next step and making a failed conformance test fail the suite is reasonable. |
fyi. Prow is using GCP/testgrid to parse junit reports for the junit lens: No idea if it's a good idea to use it. |
If it does what we want, I don't see why not. |
I was just thinking that we might not want a dependency on testgrid, and I'm not sure how stable it is as a library (current version is 0.0.91). But since the worst case is copying ~200 lines of code, it shouldn't hurt to depend on it. |
We have the /milestone v1.0 |
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale |
/lifecycle frozen |
/triage accepted |
Definitely still the case. This can be reproduced ~ like this:
cc @knabben (Just in case you're looking for more work in a bit :), I can help you get the local e2e test up & running) |
This issue is labeled with You can:
For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/ /remove-triage accepted |
/triage accepted |
@sbueringer Can we close this now that kubernetes-sigs/cluster-api-provider-azure#2265 is solved? |
Not sure about CAPZ; I think they had other problems, but I don't think our problem is solved. It should still be reproducible via: #5123 (comment). We didn't make any changes in core CAPI. |
/assign @chrischdi |
I tried to reproduce it again and @sbueringer already had the correct idea here:
source: kubernetes-sigs/cluster-api-provider-azure#2265
So before merging #7946 there was a race condition between reading the exit code and the restart triggered by docker, which cleared the exit code. So if ginkgo fails, it should always return its non-zero exit code and thereby make the test fail.
One improvement which could maybe be done: still run the
/close
Appendix: I was able to reproduce the issue by using the instructions from here plus:
Note: this means that every usage of |
@chrischdi: Closing this issue. In response to this:
@chrischdi Just to confirm I got it right. Current behavior:
|
Exactly. The case described in this issue, where the test "passed", was caused by the race condition: docker restarted the container.
We would get them in Prow, but hidden in artifacts as an XML file and not visualised by Prow. But this will be fixed via #10493 |
Not sure I got this part. I think we only have one xml file in artifacts, and that is the one that Prow uses to visualize: https://gcsweb.k8s.io/gcs/kubernetes-jenkins/logs/periodic-cluster-api-e2e-conformance-main/1780353942337622016/artifacts/ Are there more? (that are also uploaded to artifacts) |
Ah, we move the files (in GatherJUnitReports)? So if the move doesn't happen they are still here? https://gcsweb.k8s.io/gcs/kubernetes-jenkins/logs/periodic-cluster-api-e2e-conformance-main/1780353942337622016/artifacts/kubetest/k8s-conformance-zcv9fy/ |
Yeah, to make them visible to prow we have to move them up. |
/kind bug
What steps did you take and what happened:
example: https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/directory/pull-kubernetes-e2e-capz-windows-dockershim/1417250279195152384
The job above has two failed test cases but it's marked as passed:
What did you expect to happen:
Anything else you would like to add:
Environment:
Kubernetes version (use kubectl version):
OS (e.g. from /etc/os-release):