Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Build hangs when test spawns processes which do not terminate properly #1183

Closed
Tracked by #1258
ValentijnMakkenze opened this issue May 11, 2022 · 16 comments
Closed
Tracked by #1258
Assignees
Milestone

Comments

@ValentijnMakkenze
Copy link
Contributor

ValentijnMakkenze commented May 11, 2022

Hi,

After upgrading NUnit and NUnit.ConsoleRunner our build hangs and errors out after the maximum time has exceeded

We came from
NUnit version=3.13.2
NUnit.ConsoleRunner version=3.12.0

And went to
NUnit version=3.13.3
NUnit.ConsoleRunner version=3.15.0

The main difference is that the project that hangs runs in ApartmentState.STA as it uses some proprietary COM objects.
The projects that don't run in ApartmentState.STA seem to work fine.

The last line in our build log is
Results (nunit3) saved as

We are not seeing the committing line, so I guess the nunit-agent.exe process is not terminating correctly
Committing...

Due to the proprietary COM objects I am unable to provide a small repro.

I did some more experiments and it turns out to be the NUnit.ConsoleRunner upgrade that was the culprit.
I started downgrading it and when I got back to 3.12.0 the test didn't hang anymore, so from 3.13.0 and onwards it seems broken.

We are running through OpenCover and I added --trace=Debug and ran once with 3.12.0 and once with 3.13.0

"C:\Git\Neuron Stelsel Registratie\packages\OpenCover.4.7.1221\tools\OpenCover.Console.exe" -register:administrator "-target:C:\Git\Neuron Stelsel Registratie\packages\NUnit.ConsoleRunner.3.12.0\tools\nunit3-console.exe" "-targetargs:""C:\Git\NEURON~1\\Vicrea.GVR.UI.Tests\bin\Debug\Vicrea.GVR.UI.Tests.dll"" --trace=Debug --result=""C:\Git\NEURON~1\/NUnitVicrea.GVR.UI.Tests.xml""" -output:C:\Git\NEURON~1\\Vicrea.GVR.UI.Tests.xml -filter:+[Vicrea.GVR.UI]* -excludebyfile:*Reference*;*Generated* -excludebyattribute:System.Diagnostics.CodeAnalysis.ExcludeFromCodeCoverageAttribute;System.Runtime.CompilerServices.CompilerGeneratedAttribute;System.CodeDom.Compiler.GeneratedCodeAttribute -skipautoprops -hideskipped:all -returntargetcode

Logfiles:
InternalTrace.29508-version3.12.log
nunit-agent_35952-version3.12-.log
InternalTrace.15540-version3.12.log
nunit-agent_31012-version3.12.log

Kind regards,

Valentijn

@ValentijnMakkenze
Copy link
Contributor Author

I accidentally did an upgrade to 3.15.2 and the build hangs again, downgrading back to 3.12 fixed it

@tcali101
Copy link

We have the same problem. Nunit-console runner just hangs and never terminates after test execution. Version 3.12 works fine and newer versions hang. It could be related to Selenium tests. We are using Chrome / ChromeDriver and the libraries Selenium.Support and Selenium.WebDriver to execute Selenium tests with Nunit.

@CharliePoole
Copy link
Member

@tcali101 @ValentijnMakkenze I'm marking this confirmed. Unfortunately, the logs from @ValentijnMakkenze show no errors at all. The tests terminate successfully. Therefore the hang is happening in the console runner itself.

Can either of you verify that this happens when not using OpenCover? @tcali101 Can you provide a simple repro, not involving any special software?

@ValentijnMakkenze
Copy link
Contributor Author

Ok so I ran once with 3.12
"packages\NUnit.ConsoleRunner.3.12.0\tools\nunit3-console.exe" "Vicrea.GVR.UI.Tests\bin\Debug\Vicrea.GVR.UI.Tests.dll" --trace=Debug --result="NUnitVicrea.GVR.UI.Tests.xml"

And once with 3.15.2
"packages\NUnit.ConsoleRunner.3.15.2\tools\nunit3-console.exe" "Vicrea.GVR.UI.Tests\bin\Debug\Vicrea.GVR.UI.Tests.dll" --trace=Debug --result="NUnitVicrea.GVR.UI.Tests.xml"

The first one finished, the second one hangs
image

I had to kill the nunit3-console.exe process to be able to zip the logs, but then I noticed that there were two nunit-agent.exe processes. So maybe the agent doesn't close correctly and the console now notices that and waits.
3.15.2logs.zip
3.12logs.zip

@tcali101
Copy link

It has something to do with spawned processes. Selenium tests spawn processes and I managed to reproduce it with a simplistic test.

Create a .NET Framework Class Library project.
Packages:

  <package id="NUnit" version="3.13.3" targetFramework="net48" />
  <package id="Selenium.Support" version="4.4.0" targetFramework="net48" />
  <package id="Selenium.WebDriver" version="4.4.0" targetFramework="net48" />

In a new class:

        [Test]
        public void TestHang()
        {
            var driver = new ChromeDriver(@"C:\git\chromedriver_win32");
            driver.Url = "https://google.com";
            //driver.Close();
            //driver.Quit();
        }

Adjust the path to the chrome driver location, get it from https://chromedriver.chromium.org

I tried something more simplistic like:

        [Test]
        public void Test()
        {
            Process.Start(@"notepad.exe");
        }

But that does not reproduce it.

@CharliePoole
Copy link
Member

@ValentijnMakkenze
The logs are pretty much identical between the two runs. In particular, the agent terminates normally in both runs and the console does everything it needs to do and then tries to exit.

As you suggest, I think that the agent is spawning some other processes, which do not close correctly. Neither the agent nor the runner process are able to terminate while those child processes are alive. It's possible that you have some subtle error in your tests, similar to the more obvious error in your example. However, I think that the runner and engine should ultimately deal with this, terminating all the child processes forcibly if they don't actually close.

@CharliePoole CharliePoole modified the milestones: 3.15.3, 3.16 Sep 24, 2022
@CharliePoole CharliePoole changed the title Build hangs after upgrading NUnit.ConsoleRunner to version 3.13.0 Build hangs when test spawns processes which do not terminate properly Sep 25, 2022
@CharliePoole
Copy link
Member

Interesting! My initial assumption was that 3.12 didn't leave the ChromDriver process running, hence no hang. But it turns out that 3.12 also fails to stop the ChromeDriver process. The difference is that the console runner was not bothered by the hanging process in 3.12 and terminated properly. In newer releases it hangs rather than terminating.

@tcali101
Copy link

The difference is that the console runner was not bothered by the hanging process in 3.12 and terminated properly. In newer releases it hangs rather than terminating.

I think that is the correct behavior, it should not be bothered by any child processes. It should not try to terminate processes. Just ignore any running children and finish.

@CharliePoole
Copy link
Member

I agree. The trick is to figure out what changed. Since nothing was done deliberately, it has to be a side effect of some other change.

@CharliePoole
Copy link
Member

Interesting... it seems to be a side effect of adding the TCP transport back in April 2021.

If the TCP transport is started and any child processes are left running, we continue to wait for a connection. This happens even if we don't use that transport (.NET Framework tests use remoting).

@CharliePoole
Copy link
Member

After going down a lot of TCP rabbit holes, I realized that the thread listening for TCP connections was a foreground thread. Making it a background thread resolves the problem.

@CharliePoole CharliePoole self-assigned this Oct 4, 2022
@CharliePoole CharliePoole added the PortToVersion4 Version 3 change needs to be ported to the version 4 code in main label Oct 31, 2022
@CharliePoole
Copy link
Member

This issue has been resolved in version 3.16.0

The release is available on:
GitHub.
NuGet packages are also available NuGet.org and
Chocolatey Packages may be found at Chocolatey.org

@CharliePoole CharliePoole removed the PortToVersion4 Version 3 change needs to be ported to the version 4 code in main label Jan 6, 2023
@michael77777777777
Copy link

Hello @CharliePoole

unfortunately this issue is still alive in 3.16.3.

We have a gitlab-runner Windows service installed on our "Windows11 Pro 22H2" build machine. This service allows up to 4 parallel gitlab-pipelines. In each gitlab-pipeline we call nunit3-console.exe to execute our test assembly.

From time to time we observe that the nunit3-console.exe ist not terminated by its parent process.
Using Microsofts "ProcessExplorer" tool to analyze the stuck nunit3.console.exe process shows that the parent process is not existing anymore. This could be one consequence of your fix. When switching to the "Threads" tab in ProcessExplorer, a Windows internal process lifecycle-check must be triggered as the stuck nunit3.console.exe terminates immediately. So I am pretty sure that the nunit3.console.exe process is not terminated because of a threading issue. Maybe due to open TCP-connections?

Also, similar to #1183 (comment)_, the nunit log shows no error at all. All tests have been executed successfully.

Any help is highly appreciated.

@CharliePoole
Copy link
Member

Hi @michael77777777777

If you read through the comments, you'll see that this issue was actually about non-terminating agent processes when running in the STA. Failing to terminate correctly is a fairly generic error, so your situation may not actually be the same bug.

I thought I'd just mention that in passing in case it helps resolve the issue. I'm no longer working on this project but others will undoubtedly pick up the issue.

Charlie

@michael77777777777
Copy link

Thank you for the remark @CharliePoole. You are right, the original cause is likely not the same.

However I am pretty sure that the observed behaviour of stuck nunit3.console.exe goes in a similar direction: TCP connections/threading

Would it be better to create a FollowUp ticket as this one here is already closed?

@CharliePoole
Copy link
Member

Yes, a new issue with details is most likely best. Threads in the example I worked were not hung but were merely waiting for the process to terminate. Once the process was created as a background process, that happened and all was well. OTOH thread hanging issue tends to be due to very specific causes.

Bear in mind that the engine does not monitor threads created by your tests. Although that was something we once considered, it would have required a fairly large addition to the engine, probably as a new service.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants