BazelTestRunner: exit with 137 on OOMs #24436
Motivation: Make it easier to detect whether a test failed due to an OutOfMemoryError.
Current behavior:
The test runner exits with code 1 on any test failure. This makes it hard to tell programmatically whether a test failed because of an actual test failure or because it was aborted by an OutOfMemoryError.
Proposed change:
If a run of a test suite fails, the failures are checked for OOMs. If at least one failure was caused by an OOM, the program exits with code 137 rather than 1. The same exit code is used if the test runner itself OOMs.
This distinction helps with automatically detecting OOMs and retrying the execution with more memory available. It is especially helpful in the context of remote execution, where the action may be retried on a larger executor.
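For readers skimming the thread, the check described above could look roughly like the following. This is a minimal sketch and not the code in this PR: the class name `OomExitCode`, the constants, and the helper methods are hypothetical, and walking the cause chain of each failure (as well as returning 0 when there are no failures) is an assumption about how "caused by an OOM" might be decided.

```java
import java.util.List;

public class OomExitCode {
    // Exit codes named in the description; the constant names are hypothetical.
    static final int EXIT_OK = 0;            // assumption: 0 when nothing failed
    static final int EXIT_TEST_FAILURE = 1;
    static final int EXIT_OOM = 137;

    // Returns true if the throwable or any of its causes is an OutOfMemoryError.
    // The depth bound guards against cyclic cause chains.
    static boolean causedByOom(Throwable t) {
        for (int depth = 0; t != null && depth < 100; depth++) {
            if (t instanceof OutOfMemoryError) {
                return true;
            }
            t = t.getCause();
        }
        return false;
    }

    // Picks the exit code for a finished run from the collected failures.
    static int exitCodeFor(List<Throwable> failures) {
        if (failures.isEmpty()) {
            return EXIT_OK;
        }
        for (Throwable failure : failures) {
            if (causedByOom(failure)) {
                return EXIT_OOM;
            }
        }
        return EXIT_TEST_FAILURE;
    }

    public static void main(String[] args) {
        List<Throwable> failures = List.of(
            new AssertionError("expected 1 but was 2"),
            new RuntimeException("wrapped", new OutOfMemoryError("Java heap space")));
        System.out.println(exitCodeFor(failures)); // prints 137
    }
}
```

137 is also the status a shell reports for a process killed by SIGKILL (128 + 9), which is what the Linux OOM killer sends, so a caller that treats 137 as "retry with more memory" would handle both cases the same way.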
This seems reasonable to me. Any chance we could add a test for this?
I had looked, but didn't see where I could easily add a test. Do you have guidance on where/how I could add a test for this?
Added a test - is that the right way to do it?
Thanks for adding the tests. Could you maybe separate out the new test into a different class? It would be preferable to not modify existing tests so we can be sure we're not inadvertently affecting something else.
Just so I head in the right direction:
A separate Java test class and a new test case in the same sh file SG.
…it4_testbridge_integration_tests.sh`
PTAL.
Thanks!