
How to detect if test crashed from stack overflow by examining console runner output? #266

Closed
MikeTheGreat opened this issue May 8, 2018 · 4 comments

Comments

@MikeTheGreat

Hello!

Let's say that I've got a method that I'm testing, and it's got an infinite recursion problem in it. I've got an NUnit test which calls this method. The method then crashes and takes down NUnit with it (since user code can't catch a StackOverflowException in .NET 2.0 and later).

The console runner doesn't seem to be able to detect that the test process crashed from the stack overflow (which is reasonable) and instead writes a single result to the output XML saying "An existing connection was forcibly closed by the remote host" (i.e., the process running NUnit disappeared).

I'd like to write some code that can programmatically detect this. It'll have access to the XML file (I assume I can get the return value from the console runner's command line invocation, too, if that's helpful).

It would be nice to diagnose the problem down to the 'infinite recursion' level of detail, but even being able to determine that some tested code caused a problem so bad that the CLR crashed is fine. I'm testing code written by my students and can reasonably expect that a catastrophic crash is being caused by recursion (since it's what we're covering now) :)

If y'all don't mind my asking, what's a good way to detect that NUnit has crashed (given the results.xml file)?

What I'm looking for is something to check for in the file that will reliably indicate an error like this. Some possibilities:

  • Is it enough to check for the string that indicates the client died ("An existing connection...")?
  • When it crashes, NUnit sets 'testcasecount' to 1 (on the test-run element). I know that I'm running more than 1 test, so would it be sufficient to check for that?
  • Within the test-suite XML element there's an attribute named "runstate" that's set to NotRunnable - is that a good indicator?
  • Would it be better to check for several things?
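The three checks listed above can be combined into one small script. This is a minimal Python sketch, assuming the results file is NUnit 3 XML whose root element is `<test-run>`; the function name `crash_signals` and its `expected_min_cases` parameter are hypothetical. It can only report that a run ended abnormally: none of these indicators is specific to a stack overflow.

```python
import xml.etree.ElementTree as ET

# Text reported in results.xml when the process running the tests disappeared.
CONNECTION_CLOSED = "An existing connection was forcibly closed by the remote host"


def crash_signals(xml_text: str, expected_min_cases: int) -> list[str]:
    """Return the heuristic crash indicators found in a results file.

    Heuristic only: each signal says "the run ended abnormally",
    not "a stack overflow happened".
    """
    root = ET.fromstring(xml_text)  # assumed to be the <test-run> element
    signals = []

    # 1. The "connection forcibly closed" text anywhere in the document.
    if any(CONNECTION_CLOSED in (el.text or "") for el in root.iter()):
        signals.append("connection-closed message")

    # 2. Fewer test cases than we know we were running.
    count = int(root.get("testcasecount", "0"))
    if count < expected_min_cases:
        signals.append(f"testcasecount={count}")

    # 3. Any test-suite element marked NotRunnable.
    if any(s.get("runstate") == "NotRunnable" for s in root.iter("test-suite")):
        signals.append("NotRunnable test-suite")

    return signals
```

For example, `crash_signals(open("results.xml", encoding="utf-8-sig").read(), expected_min_cases=10)` returns an empty list for a normal run. Requiring several signals at once reduces false positives but still cannot tell a stack overflow apart from any other agent crash.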

(Also: I haven't posted this on StackOverflow.com yet, but I'd be happy to if that's a better way to get help - I greatly appreciate all your help and want to be respectful of your time!)

@MikeTheGreat MikeTheGreat changed the title How to detect if test crashed from stack overflow by examining console runner output? is:question How to detect if test crashed from stack overflow by examining console runner output? May 8, 2018
@CharliePoole
Member

Your objective is a good one, but I think you have the wrong end of the stick in trying to resolve it by looking at the output created by the console runner. The console runner gets a top-element-only XML result from the engine because there is nothing better available. So any fields in that result were set without any real information to go on, and trying to make them mean something more specific is futile.

You have to look closer to the source in order to detect what causes the exception itself. Then you would try to figure out how to leave behind information for the runner.

In this case, the source would be the code in the framework that calls the method that overflows. In theory, it could be any of several cases where user code is invoked, but starting with the invocation of the test method probably makes sense.

@ChrisMaddock
Member

ChrisMaddock commented May 9, 2018

I do like this idea! It's a lot like something I posted at nunit/nunit-console#391.

It's definitely possible to detect the agent crash, and I believe a process that dies from a StackOverflowException exits with a specific error code, as detailed in the above issue, so it's theoretically possible to detect when a test agent has crashed specifically because of a stack overflow.
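On Windows, that code is the NTSTATUS value STATUS_STACK_OVERFLOW, 0xC00000FD, which reads as -1073741571 as a signed 32-bit integer and as 3221225725 as an unsigned one; different tools report it in either form. A minimal Python sketch of the check, assuming Windows and assuming you can get the exit code of the process that actually died (in NUnit's design that is the agent, so the console runner's own exit code may not carry this value). The helper names are hypothetical.

```python
# Windows NTSTATUS STATUS_STACK_OVERFLOW: the exit code of a process
# terminated by an unhandled stack overflow.
STATUS_STACK_OVERFLOW = 0xC00000FD


def is_stack_overflow_exit(returncode: int) -> bool:
    """True if an exit code means STATUS_STACK_OVERFLOW.

    Masking to 32 bits makes the signed form (-1073741571) and the
    unsigned form (3221225725) compare equal.
    """
    return (returncode & 0xFFFFFFFF) == STATUS_STACK_OVERFLOW


def classify_exit(returncode: int) -> str:
    """Coarse label for a child process exit code (hypothetical helper)."""
    if returncode == 0:
        return "ok"
    if is_stack_overflow_exit(returncode):
        return "stack overflow"
    return "nonzero exit"
```

Usage would look like `classify_exit(subprocess.run(cmd).returncode)`, where `cmd` stands for whatever command line launches the process you want to check. Any nonzero code other than 0xC00000FD is deliberately left unclassified, since its meaning depends on the program that exited.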

My plan was just to be lazy and print that to the console. The 'nice' solution, however, would be to update the test result XML that's written, as you say. It should be possible to work out which test assembly was running on the agent that crashed and modify the XML accordingly; @CharliePoole may be better able to advise on the architecture there.
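Until something like that exists inside NUnit, the "modify the XML" idea could be approximated as a post-processing step once you know which assembly's agent crashed. A minimal Python sketch, assuming NUnit 3 result XML in which each assembly appears as `<test-suite type="Assembly" name="...">`; `mark_assembly_crashed` is a hypothetical helper, not part of NUnit, and where NUnit itself would place a `<failure>` element among the suite's children may differ from this sketch.

```python
import xml.etree.ElementTree as ET


def mark_assembly_crashed(xml_text: str, assembly_name: str, reason: str) -> str:
    """Mark one assembly-level test-suite as failed with an explanatory message.

    Hypothetical post-processing step; raises ValueError if the
    assembly is not present in the results.
    """
    root = ET.fromstring(xml_text)
    for suite in root.iter("test-suite"):
        if suite.get("type") == "Assembly" and suite.get("name") == assembly_name:
            suite.set("result", "Failed")
            # Record why, in a <failure><message> child.
            failure = ET.SubElement(suite, "failure")
            ET.SubElement(failure, "message").text = reason
            break
    else:
        # Loop finished without break: no matching assembly.
        raise ValueError(f"assembly {assembly_name!r} not found in results")
    return ET.tostring(root, encoding="unicode")
```

Combined with an exit-code check, a wrapper script could call this with a reason such as "Test agent crashed: stack overflow" and write the returned text back to the results file.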

Your points:

Is it enough to check for the string that indicates the client died ("An existing connection...")?

  • Not to ensure StackOverflow - this crash can also be caused by other reasons. It would be good to handle 'agent crashed for unknown reasons' in the same way, however.

When it crashes, NUnit sets 'testcasecount' to 1 (on the test-run element). I know that I'm running more than 1 test, so would it be sufficient to check for that?

Within the test-suite XML element there's an attribute named "runstate" that's set to NotRunnable - is that a good indicator?
Would it be better to check for several things?

Personally, if it's stack overflow specifically that you want to track, I think it's best to look at the agent's exit code from inside the NUnit console source. I wrote a little about how we can currently handle different exit codes in the above issue; let us know how it works out! 😄

@MikeTheGreat
Author

Huh, so the OS (yes?) notices that the console runner exited from a stack overflow, and then uses -1073741571 as the command-line return value?
I had assumed that since user code couldn't catch it, it wasn't going to be directly detectable (i.e., uncatchable means we could only infer that the console runner had crashed catastrophically).

I'm a bit strapped for time right now, but if I can find some time in the future I'll see about looking into this.
Thanks!
--Mike

@ChrisMaddock
Member

I think that's the case, but please check my working!


3 participants