CI job passed and test script exit 0, but failed by timeout #186

Open
yih-redhat opened this issue May 13, 2024 · 6 comments

@yih-redhat

Type of issue

None

Description

This bug is the same as #166. That issue was closed and I cannot reopen it, so I created a new one to track this.

Description:

  1. I have a pull request, "Test testing-farm v2" (yih-redhat/tmt-demo#42), that runs all test cases in testing-farm with v2.
  2. In this pull request, the sub job "Testing Farm - edge-9to9-9.4" behaves strangely: the test script passes and exits with 0, but the testing-farm plugin always reports a timeout error. Job link: https://artifacts.osci.redhat.com/testing-farm/befe8230-0cca-4417-816c-af13e20f564f/
  3. The sub job "Testing Farm - edge-8to9-9.4" has the same issue. In this job, I checked for all leftover processes in the VM that might cause the timeout and printed them to the log. Job link: https://artifacts.osci.redhat.com/testing-farm/24f28bf8-1c7d-47d5-9779-63723ecfb222/
  4. All sub jobs in this pull request have the same configuration, but only "Testing Farm - edge-9to9-9.4" and "Testing Farm - edge-8to9-9.4" hit this timeout. This suggests the cause is something in the test scripts rather than the configuration. The test scripts for these two sub jobs are https://github.com/yih-redhat/tmt-demo/blob/main/ostree-9-to-9.sh and https://github.com/yih-redhat/tmt-demo/blob/main/ostree-8-to-9.sh, but I cannot see anything special in them; they are normal shell scripts, like the other test scripts in my repo.

Reproducer

No response

@jamacku
Member

jamacku commented May 13, 2024

I would suggest increasing the timeout. The test ran for 9000 s (150 min):

Maximum test time '150m' exceeded.
Adjust the test 'duration' attribute if necessary.
https://tmt.readthedocs.io/en/stable/spec/tests.html#duration

@yih-redhat
Author

If you look at the log, you can see that the test script actually passed and exited with 0, but it looks like some child process kept the job from completing until the timeout.
I have tried setting the timeout to a very long value and still hit this issue. With the same timeout value, other sub jobs that take much longer than this script pass.
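
For illustration, here is a minimal sketch of one common way a script can exit 0 while its caller keeps waiting: a background child inherits the script's stdout/stderr pipe, so a caller that reads the output until EOF stays blocked until that child exits. Whether this is what happens in the two jobs above is not confirmed anywhere in this thread, and the sketch does not model how tmt or Testing Farm actually run tests. The helper name `run_and_time` and the `sleep 2` stand-in for a leftover process are hypothetical.

```python
import subprocess
import time


def run_and_time(script):
    """Run `script` with sh, capture its output through a pipe, and
    return (exit code, seconds until the output pipe reached EOF)."""
    start = time.monotonic()
    proc = subprocess.Popen(
        ["sh", "-c", script],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
    )
    # communicate() reads until EOF, which only arrives once every process
    # holding the write end of the pipe has closed it or exited.
    proc.communicate()
    return proc.returncode, time.monotonic() - start


if __name__ == "__main__":
    # The background child inherits the pipe: sh exits 0 right away, but
    # the caller keeps waiting until the child is done.
    print(run_and_time("sleep 2 & exit 0"))
    # Same child with stdout/stderr detached: the caller returns promptly.
    print(run_and_time("sleep 2 >/dev/null 2>&1 & exit 0"))
```

In both calls the exit code is 0; only the first one makes the caller wait for the background `sleep`. If a test script starts long-lived helpers in the background, redirecting their output away from the inherited pipe (or stopping them before the script exits) is one thing worth checking.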

@yih-redhat
Author

@jamacku Could you please take a look at this bug?
Because of it, our CI job never turns green, and I have to check manually whether it passed.
It only happens on these two sub jobs: no matter how long I set the timeout, the script always exits 0 and the job then times out.

@jamacku
Member

jamacku commented Jul 23, 2024

Is this a duplicate of #209?

@mcattamoredhat

I believe they are slightly different.
The failed test mentioned above (edge-9to9-94) took 2h 35m 21s, whereas the #209 issue occurs after 6h (canceled request).

@jamacku
Member

jamacku commented Jul 23, 2024

I see. But I believe that this is not our bug. We only request job runs on TF and do not block anything.

sclorg locked as resolved and limited conversation to collaborators Aug 16, 2024