[Hostfxr] [Windows] High CPU usage on background thread after invoking `hostfxr_get_runtime_delegate` on non-main thread. #39176

Sewer56 · 2020-07-13T08:22:37Z

Description

In .NET 5, invoking hostfxr_get_runtime_delegate from a non-main thread appears to leave a background task running, consuming a full CPU thread.

Specifically, the issue can be reproduced in the following fashion on Windows:

int main(int argc, char* argv[])
{
    // load_runtime sets up the runtime, calling `hostfxr_get_runtime_delegate` down the road
    auto handle = CreateThread(nullptr, 0, &load_runtime, 0, 0, nullptr);
    Sleep(INFINITE); // Stall the program to observe the effect.
}

This does not occur if the runtime is initialised on the main thread.

Reproduction

Minimal reproduction is available at Reloaded.Core.Bootstrap/runtimebug.
Simply execute Reloaded.Core.Bootstrap.Example and observe CPU utilisation once the program begins to sleep.

The offending function call that triggers the issue can be found at Reloaded.Core.Bootstrap/CoreCLR.cpp#L71.

(Requires Visual Studio 16.7 preview or newer, with Desktop development with C++ workload as well as the latest .NET 5 SDK).

Reproduction Debugging

To trigger the correct, intended behaviour (no loop/bug), simply load the runtime on the main thread, i.e. replace

initializeThreadHandle = CreateThread(nullptr, 0, &load_bootstrapper_async, 0, 0, nullptr);
WaitForSingleObject(initializeThreadHandle, INFINITE);

with

load_bootstrapper_async(0);

in the example's entry point.
I have set it up this way to make control flow comparisons easier between intended and unintended behaviour should that be helpful.

Configuration

Runtime: 5.0.0-preview.6
OS: Win 10 2004 (Build 19041.329)
Architecture: x86, x64

Regression?

This is a regression from .NET Core 3.X.
(i.e. the code runs as expected all the way from 3.0 Preview 6, when it first became available, to the latest 3.1 release).

I have not yet identified the first .NET 5 preview/build that exhibits the issue.

Images

[Screenshots for .NET 5, netcoreapp3.1 and netcoreapp3.0 were attached here.]


ghost · 2020-07-13T08:22:41Z

Tagging subscribers to this area: @vitek-karas, @swaroop-sridhar, @agocke
Notify danmosemsft if you want to be subscribed.

vitek-karas · 2020-07-13T10:16:41Z

Thanks a lot for the repro, @Sewer56. I debugged into it a little bit and it seems to be related to the diagnostic server thread. It loops in this callstack:

>	coreclr.dll!IpcStreamFactory::GetNextAvailableStream(void(*)(const char *, unsigned int) callback) Line 166	C++
 	coreclr.dll!DiagnosticServer::DiagnosticsServerThread(void * __formal) Line 50	C++
 	kernel32.dll!BaseThreadInitThunk()	Unknown
 	ntdll.dll!RtlUserThreadStart()	Unknown

It calls IpcStream::DiagnosticsIpc::Poll, which "successfully" waits on the handle (there seems to be no actual wait time), then gets a stream which is always NULL. That causes the caller to loop and try again, and so on.

I don't know what's different running it from managed exe versus hosted via the hosting APIs - obviously a simple console app doesn't repro this.

ghost · 2020-07-13T10:16:53Z

Tagging subscribers to this area: @tommcdon
Notify danmosemsft if you want to be subscribed.

Sewer56 · 2020-07-13T11:28:20Z

> I don't know what's different running it from managed exe versus hosted via the hosting APIs - obviously a simple console app doesn't repro this.

Indeed, it would be interesting to know, even if only at a high level.
I'd be interested in digging into the runtime internals myself someday.
Thanks for reaching out so quickly.

On an unrelated note, there was something I forgot to mention in the repro. It should be fairly obvious and implicit from the issue description, but I should mention it just in case.

To trigger the correct, intended behaviour (no loop), simply load the runtime on the main thread, i.e. replace

initializeThreadHandle = CreateThread(nullptr, 0, &load_bootstrapper_async, 0, 0, nullptr);
WaitForSingleObject(initializeThreadHandle, INFINITE);

with

load_bootstrapper_async(0);

in the example's entry point. I edited the opening post to reflect this.

Explicitly, I set it up this way in the repro so it's easier to test & compare control flow.

hoyosjs · 2020-07-13T22:41:13Z

@dotnet/dotnet-diag @josalem

josalem · 2020-07-13T23:58:21Z

Interesting! I haven't had a chance to try the repro since I'm on my Mac right now.

@vitek-karas when you run this under the debugger, does a NamedPipe with a name like dotnet-diagnostic-<pid> show up for the application? IpcStreamFactory::GetNextAvailableStream returning nullptr indicates a failure state of the underlying NamedPipe (on Windows). Could you give some more details about what's happening inside the call to GetNextAvailableStream and Poll? Alternatively, could you turn on StressLog with the diagnostics log facility (0x00001000) and post the result? I'll try to repro this tomorrow.

josalem · 2020-10-20T00:47:57Z

I put this under a debugger this afternoon. The server is being put into an error state where the overlapped IO is continually returning ERROR_OPERATION_ABORTED. The description for that error states that

> The I/O operation has been aborted because of either a thread exit or an application request.

In other words, the fact that you loaded the runtime on one thread but ended that thread broke the asynchronous IO. The abort is being recognized here at line 327:

runtime/src/coreclr/src/vm/ipcstreamfactory.cpp

Lines 324 to 330 in 50f82db

    
    case IpcStream::DiagnosticsIpc::PollEvents::SIGNALED:
        if (pStream == nullptr) // only use first signaled stream; will get others on subsequent calls
        {
            pStream = ((DiagnosticPort*)(rgIpcPollHandles[i].pUserData))->GetConnectedStream(callback);
            s_currentPort = (DiagnosticPort*)(rgIpcPollHandles[i].pUserData);
        }
        STRESS_LOG2(LF_DIAGNOSTICS_PORT, LL_INFO10, "IpcStreamFactory::GetNextAvailableStream - SIG :: Poll attempt: %d, connection %d signalled.\n", nPollAttempts, i);

which unfortunately doesn't set fSawError if the call to GetConnectedStream returns nullptr. In that case we never reset the asynchronous call to ConnectNamedPipe, so the pipe always comes back marked as signaled and then fails in the same way.

This should be fixable by special casing the abort case and/or resetting the connection if the callback returns nullptr.

Sewer56 added the tenet-performance Performance related issue label Jul 13, 2020

Dotnet-GitSync-Bot added area-Host untriaged New issue has not been triaged by the area owner labels Jul 13, 2020

vitek-karas added the area-Diagnostics-coreclr label Jul 13, 2020

vitek-karas removed the area-Host label Jul 13, 2020

tommcdon removed the untriaged New issue has not been triaged by the area owner label Jul 20, 2020

tommcdon added this to the 5.0.0 milestone Jul 20, 2020

tommcdon modified the milestones: 5.0.0, 6.0.0 Jul 30, 2020

This was referenced Aug 19, 2020

Reloaded II - Support .NET 5 Reloaded-Project/Reloaded-II#12

Closed

Default to standard x64 calling convention if none is supplied Reloaded-Project/Reloaded.Hooks#3

Merged

josalem mentioned this issue Oct 22, 2020

Update IpcStreamFactory state machine to handle being started on a thread that ends #43711

Merged

josalem closed this as completed in #43711 Nov 5, 2020

josalem mentioned this issue Nov 5, 2020

[release/5.0] Update IpcStreamFactory state machine to handle being started on a thread that ends #44267

Merged

ghost locked as resolved and limited conversation to collaborators Dec 8, 2020
