Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

17.12 VS hang on solution close #11044

Open
JanKrivanek opened this issue Nov 27, 2024 · 4 comments
Open

17.12 VS hang on solution close #11044

JanKrivanek opened this issue Nov 27, 2024 · 4 comments
Assignees

Comments

@JanKrivanek
Copy link
Member

JanKrivanek commented Nov 27, 2024

Context

microsoft.visualstudio.projectservices.dll!Microsoft.VisualStudio.ProjectServices.DesignTimeBuildService.SolutionClosedEventHandler
https://prism.vsdata.io/failure/?query=ch%3Drelease%20r%3D17.12&eventType=hang&failureHash=cdfda8b0-fdb8-70cd-57c9-ad606708a21f
116 hits in the wild - ranks as top 14 hang

It seems related to previously fixed case: #10709 (comment) (thoug it seems slightly different)

FYI @davkean (pointed this case out)

@JanKrivanek
Copy link
Member Author

JanKrivanek commented Nov 27, 2024

It looks like we lost information about worker node shutdown and hence keep waiting for it (nodeId is 2 - pointing out to OutOfProcNode).
Otherwise the submission is completed with build cancelled result - so this is distinct from #10709

Image

@JanKrivanek
Copy link
Member Author

Through ComponentHost DI I got to the PID of the out of proc node that the BuildManager was waiting for - its: 10396:

Image

From the case ETL

Image

I can see that that that process did not terminate:

Image

So it might be possible that shutdown request wasn't even properly sent out

@JanKrivanek
Copy link
Member Author

I can see that the main node requested that node to be shutdown

Image

But without the worker node dump it's hard to tell why that got stuck.

In any case - we might want to asynchronously wait abit after signaling shutdown request and just close the connection if we do not hear back:

protected void ShutdownConnectedNodes(List<NodeContext> contextsToShutDown, bool enableReuse)
{
// Send the build completion message to the nodes, causing them to shutdown or reset.
_processesToIgnore.Clear();
// We wait for child nodes to exit to avoid them changing the terminal
// after this process terminates.
bool waitForExit = !enableReuse &&
!Console.IsInputRedirected &&
Traits.Instance.EscapeHatches.EnsureStdOutForChildNodesIsPrimaryStdout;
Task[] waitForExitTasks = waitForExit && contextsToShutDown.Count > 0 ? new Task[contextsToShutDown.Count] : null;
int i = 0;
var loggingService = _componentHost.LoggingService;
foreach (NodeContext nodeContext in contextsToShutDown)
{
if (nodeContext is null)
{
continue;
}
nodeContext.SendData(new NodeBuildComplete(enableReuse));
if (waitForExit)
{
waitForExitTasks[i++] = nodeContext.WaitForExitAsync(loggingService);
}
}
if (waitForExitTasks != null)
{
Task.WaitAll(waitForExitTasks);
}
}

We do similar under some conditions (however the wait is 30 seconds - which is too long for UI actions):

public async Task WaitForExitAsync(ILoggingService loggingService)
{
if (_exitPacketState == ExitPacketState.ExitPacketQueued)
{
// Wait up to 100ms until all remaining packets are sent.
// We don't need to wait long, just long enough for the Task to start running on the ThreadPool.
await Task.WhenAny(_packetWriteDrainTask, Task.Delay(100));
}
if (_exitPacketState == ExitPacketState.ExitPacketSent)
{
CommunicationsUtilities.Trace("Waiting for node with pid = {0} to exit", _process.Id);
// .NET 5 introduces a real WaitForExitAsyc.
// This is a poor man's implementation that uses polling.
int timeout = TimeoutForWaitForExit;
int delay = 5;
while (timeout > 0)
{
bool exited = _process.WaitForExit(milliseconds: 0);
if (exited)
{
return;
}
timeout -= delay;
await Task.Delay(delay).ConfigureAwait(false);
// Double delay up to 500ms.
delay = Math.Min(delay * 2, 500);
}
}
// Kill the child and do a blocking wait.
loggingService?.LogWarning(
BuildEventContext.Invalid,
null,
BuildEventFileInfo.Empty,
"KillingProcessWithPid",
_process.Id);
CommunicationsUtilities.Trace("Killing node with pid = {0}", _process.Id);
_process.KillTree(timeoutMilliseconds: 5000);
}

But we do not need to necessarily kill the process - we can first just close the connection - node should augument NodeShutdown packet in such case.

@YuliiaKovalova YuliiaKovalova self-assigned this Jan 6, 2025
@YuliiaKovalova
Copy link
Member

reported bug with the same problem: https://dev.azure.com/devdiv/DevDiv/_workitems/edit/2342674

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants