Increase timeout for cluster integration tests #7620

Merged
berland merged 1 commit into equinor:main from bump_integration_test_timeout
Apr 10, 2024

Conversation

berland
Contributor

@berland berland commented Apr 10, 2024

The default timeout of 10 minutes can be too low when the compute cluster has few resources available, because the time a compute job spends waiting in the queue counts toward the timeout. This PR increases the timeout to 1 hour, but only when the tests run against the real compute cluster.
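As an illustration only, the conditional timeout described above could be selected along these lines. This is a minimal sketch, not the code from this PR: the function name `integration_test_timeout` and the environment variable `RUN_AGAINST_REAL_CLUSTER` are hypothetical, and only the 10-minute and 1-hour values come from the description.

```python
import os

DEFAULT_TIMEOUT_SECONDS = 10 * 60       # the 10-minute default from the description
REAL_CLUSTER_TIMEOUT_SECONDS = 60 * 60  # the 1-hour limit for real-cluster runs


def integration_test_timeout(env=None) -> int:
    """Return the timeout (in seconds) to apply to an integration test.

    RUN_AGAINST_REAL_CLUSTER is a hypothetical switch used only for this
    sketch; the actual test suite may detect real-cluster runs differently.
    """
    env = os.environ if env is None else env
    if env.get("RUN_AGAINST_REAL_CLUSTER") == "1":
        # Queue waiting time counts toward the timeout, so allow more time.
        return REAL_CLUSTER_TIMEOUT_SECONDS
    return DEFAULT_TIMEOUT_SECONDS
```

With the pytest-timeout plugin, a value like this could be passed to `pytest.mark.timeout(...)`, so that local and mocked runs keep the short limit while real-cluster runs get the longer one.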

The timeout for test_kill had to be removed, as that test can suffer from the same problem. Note that this can hide a bug in the LSF driver, as the driver cannot distinguish between what happens when the job is not killed and the exit code when it is killed.

Issue
Resolves #7595

Approach

  • PR title captures the intent of the changes, and is fitting for release notes.
  • Added appropriate release note label
  • Commit history is consistent and clean, in line with the contribution guidelines.
  • Make sure tests pass locally (after every commit!)

When applicable

  • When there are user facing changes: Updated documentation
  • New behavior or changes to existing untested code: Ensured that unit tests are added (See Ground Rules).
  • Large PR: Prepare changes in small commits for more convenient review
  • Bug fix: Add regression test for the bug
  • Bug fix: Create Backport PR to latest release

@berland berland self-assigned this Apr 10, 2024
@codecov-commenter

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 85.18%. Comparing base (20768fa) to head (4ba2107).
Report is 1 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #7620   +/-   ##
=======================================
  Coverage   85.18%   85.18%           
=======================================
  Files         383      383           
  Lines       23305    23305           
  Branches      879      886    +7     
=======================================
  Hits        19852    19852           
- Misses       3343     3346    +3     
+ Partials      110      107    -3     


@berland berland added the release-notes:skip and testing labels Apr 10, 2024
Contributor

@jonathan-eq jonathan-eq left a comment


🏃 Let's go!

@berland berland merged commit 61046df into equinor:main Apr 10, 2024
37 checks passed
@berland berland deleted the bump_integration_test_timeout branch May 29, 2024 12:31
Labels
release-notes:skip (if there should be no mention of this in release notes), testing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Flaky scheduler integration tests running against real LSF
3 participants