Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve error handling in runner.py job.run(...) #8761

Conversation

jonathan-eq
Copy link
Contributor

@jonathan-eq jonathan-eq commented Sep 20, 2024

Issue
Resolves #8738

Approach
This commit wraps the job.run(...) method in a try-except block making sure the Exited message is sent if the job_runner crashes. This fixes the bug where ert is hanging when unhandled exceptions are raised in the job runner.

The PR also contains a commit that refactors the mega-method job.run(...) into separate methods, making it more readable.

(Screenshot of new behavior in GUI if applicable)

  • PR title captures the intent of the changes, and is fitting for release notes.
  • Added appropriate release note label
  • Commit history is consistent and clean, in line with the contribution guidelines.
  • Make sure unit tests pass locally after every commit (git rebase -i main --exec 'pytest tests/unit_tests -n logical -m "not integration_test"')

When applicable

  • When there are user facing changes: Updated documentation
  • New behavior or changes to existing untested code: Ensured that unit tests are added (See Ground Rules).
  • Large PR: Prepare changes in small commits for more convenient review
  • Bug fix: Add regression test for the bug
  • Bug fix: Create Backport PR to latest release

@jonathan-eq jonathan-eq force-pushed the improve-forward-model-runner-exception-handling branch 4 times, most recently from bfc53cf to 6587920 Compare September 23, 2024 11:17
@codecov-commenter
Copy link

codecov-commenter commented Sep 23, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 91.53%. Comparing base (a342528) to head (71daee6).
Report is 29 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #8761      +/-   ##
==========================================
+ Coverage   91.44%   91.53%   +0.09%     
==========================================
  Files         344      345       +1     
  Lines       21122    21252     +130     
==========================================
+ Hits        19314    19454     +140     
+ Misses       1808     1798      -10     
Flag Coverage Δ
cli-tests 39.59% <ø> (-0.05%) ⬇️
gui-tests 73.37% <ø> (-0.16%) ⬇️
performance-tests 50.17% <ø> (+0.02%) ⬆️
unit-tests 80.30% <ø> (+0.20%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@jonathan-eq jonathan-eq force-pushed the improve-forward-model-runner-exception-handling branch from 07228b8 to 8a9720c Compare September 24, 2024 12:17
@jonathan-eq jonathan-eq self-assigned this Sep 25, 2024
@jonathan-eq jonathan-eq added maintenance Not a bug now but could be one day, repaying technical debt release-notes:skip If there should be no mention of this in release notes labels Sep 25, 2024
@jonathan-eq jonathan-eq force-pushed the improve-forward-model-runner-exception-handling branch 2 times, most recently from 58f3fc2 to f2f17d7 Compare September 25, 2024 07:03

# exit_code is 0

if self.job_data.get("error_file") and os.path.exists(
self.job_data["error_file"]
):
yield exited_message.with_error(
exited_message = Exited(self, exit_code)
return exited_message.with_error(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return?? previously we yielded ....

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but this is a helper function for creating the message that is being yielded from the calling method.

@xjules
Copy link
Contributor

xjules commented Sep 25, 2024

Lots of errors. Wondering if it is related that you have changed some yield statements with return 🤔

@jonathan-eq jonathan-eq force-pushed the improve-forward-model-runner-exception-handling branch from f2f17d7 to baa4c51 Compare September 25, 2024 09:54
@jonathan-eq jonathan-eq force-pushed the improve-forward-model-runner-exception-handling branch from baa4c51 to b688ca8 Compare October 3, 2024 06:44
@jonathan-eq
Copy link
Contributor Author

Lots of errors. Wondering if it is related that you have changed some yield statements with return 🤔

It should be fine. The methods returning values are just for creating the messages that are being yielded in the _run method.

@jonathan-eq jonathan-eq force-pushed the improve-forward-model-runner-exception-handling branch 5 times, most recently from e401754 to a641fdf Compare October 4, 2024 05:41
This commit wraps the `job.run(...)` method in a try-except block making sure the `Exited` message is sent if the job_runner crashes. This fixes the bug where ert is hanging when unhandled exceptions are raised in the job runner.
@jonathan-eq jonathan-eq force-pushed the improve-forward-model-runner-exception-handling branch from a641fdf to 72e838e Compare October 4, 2024 06:48
@jonathan-eq jonathan-eq added improvement Something nice to have, that will make life easier for developers or users or both. and removed release-notes:skip If there should be no mention of this in release notes labels Oct 7, 2024
@jonathan-eq jonathan-eq requested a review from xjules October 8, 2024 11:32
@jonathan-eq jonathan-eq enabled auto-merge (rebase) October 9, 2024 09:11
@jonathan-eq jonathan-eq disabled auto-merge October 9, 2024 09:43
@jonathan-eq jonathan-eq force-pushed the improve-forward-model-runner-exception-handling branch from 72e838e to 71daee6 Compare October 9, 2024 09:47
@jonathan-eq jonathan-eq added release-notes:improvement Automatically categorise as improvement in release notes release-notes:refactor PR changes code without changing ANY (!) behavior. labels Oct 9, 2024
@jonathan-eq jonathan-eq enabled auto-merge (rebase) October 9, 2024 09:59
@jonathan-eq jonathan-eq merged commit 86c0127 into equinor:main Oct 9, 2024
56 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
improvement Something nice to have, that will make life easier for developers or users or both. maintenance Not a bug now but could be one day, repaying technical debt release-notes:improvement Automatically categorise as improvement in release notes release-notes:refactor PR changes code without changing ANY (!) behavior.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Exceptions in forward_model_runner will cause it to hang
4 participants