Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix hanging of NamedPipeClientStream when connecting with infinite timeout #66877

Merged

Conversation

cranberry-clockworks
Copy link
Contributor

Resolved Problem

Affected OS: Only Windows.

The NamedPipeClientStream could hang on connection forever while doesn't respond on a cancellation token. This happens when several conditions are met:

  1. An underlying file for named pipe is created. It means a pipe server accept connection from a different pipe.
  2. The second client tries to connect with infinite timeout to the server.
  3. The internal system function WainNamedPipe is called by the second client.

After that the second client doesn't exit when passed cancellation token is passed. The situation is
reproduces in the implemented test. One could compare test results before and after the bug fix.

Explanation of the Error

First, when a timeout parameter is not passed the infinite timeout is used.

public Task ConnectAsync(CancellationToken cancellationToken)
{
    return ConnectAsync(Timeout.Infinite, cancellationToken);
}

After that private void ConnectInternal(int timeout, CancellationToken cancellationToken, int startTime) is called.

The Timeout.Infinite is resolved to the -1 integer value. Therefore in the ConnectInternal() method int waitTime is resolved to -1 because elpased is 0 at first call. And TryConnect(waitTime, cancellationToken) is called with waitTime as -1.

If one inspects implementation of the TryConnect method for Windows he could discover two things:

  1. The passed cancellation token not used at all.
  2. The timeout is passed to the system method WaitNamedPipe as integer value:
if (!Interop.Kernel32.WaitNamedPipe(_normalizedPipePath, timeout))
{
    errorCode = Marshal.GetLastPInvokeError();
    ...

But as MSDN states WaitNamedPipe has timeout parameter as DWORD which is really resolved to uint. And there are two special values for the timeout parameters:

  1. NMPWAIT_USE_DEFAULT_WAIT (0x00000000) to used a timeout specified in the CreateNamedPipe function.
  2. NMPWAIT_WAIT_FOREVER (0xffffffff) to wait infinite until the instance is available.

The passed timeout value to the TryConnect() as -1 of type int is marshalled to 0xffffffff as DWORD. And the TryConnect() never returns until the pipe will be broken. And therefore passed cancellation token doesn't cancel the operation.

Implemented Solution

It's not correct to calculate difference between the timeout and the elapsed time when the timeout is infinite. One could do it only if the timeout is finite and grater or equals to zero.

In case of infinite timeout one should use CancellationCheckInterval as it was designed at first place.

Aleksei Abakumkin added 3 commits March 18, 2022 20:03
The test includes the situation when the NamedPipeClientStream
doesn't response to the cancellation token.
The test modified to highlight the place when a hang task could appear.
@ghost ghost added the community-contribution Indicates that the PR has been added by a community member label Mar 19, 2022
@ghost
Copy link

ghost commented Mar 19, 2022

Tagging subscribers to this area: @dotnet/area-system-io
See info in area-owners.md if you want to be subscribed.

Issue Details

Resolved Problem

Affected OS: Only Windows.

The NamedPipeClientStream could hang on connection forever while doesn't respond on a cancellation token. This happens when several conditions are met:

  1. An underlying file for named pipe is created. It means a pipe server accept connection from a different pipe.
  2. The second client tries to connect with infinite timeout to the server.
  3. The internal system function WainNamedPipe is called by the second client.

After that the second client doesn't exit when passed cancellation token is passed. The situation is
reproduces in the implemented test. One could compare test results before and after the bug fix.

Explanation of the Error

First, when a timeout parameter is not passed the infinite timeout is used.

public Task ConnectAsync(CancellationToken cancellationToken)
{
    return ConnectAsync(Timeout.Infinite, cancellationToken);
}

After that private void ConnectInternal(int timeout, CancellationToken cancellationToken, int startTime) is called.

The Timeout.Infinite is resolved to the -1 integer value. Therefore in the ConnectInternal() method int waitTime is resolved to -1 because elpased is 0 at first call. And TryConnect(waitTime, cancellationToken) is called with waitTime as -1.

If one inspects implementation of the TryConnect method for Windows he could discover two things:

  1. The passed cancellation token not used at all.
  2. The timeout is passed to the system method WaitNamedPipe as integer value:
if (!Interop.Kernel32.WaitNamedPipe(_normalizedPipePath, timeout))
{
    errorCode = Marshal.GetLastPInvokeError();
    ...

But as MSDN states WaitNamedPipe has timeout parameter as DWORD which is really resolved to uint. And there are two special values for the timeout parameters:

  1. NMPWAIT_USE_DEFAULT_WAIT (0x00000000) to used a timeout specified in the CreateNamedPipe function.
  2. NMPWAIT_WAIT_FOREVER (0xffffffff) to wait infinite until the instance is available.

The passed timeout value to the TryConnect() as -1 of type int is marshalled to 0xffffffff as DWORD. And the TryConnect() never returns until the pipe will be broken. And therefore passed cancellation token doesn't cancel the operation.

Implemented Solution

It's not correct to calculate difference between the timeout and the elapsed time when the timeout is infinite. One could do it only if the timeout is finite and grater or equals to zero.

In case of infinite timeout one should use CancellationCheckInterval as it was designed at first place.

Author: cranberry-knight
Assignees: -
Labels:

area-System.IO

Milestone: -

@dnfadmin
Copy link

dnfadmin commented Mar 19, 2022

CLA assistant check
All CLA requirements met.

@cranberry-clockworks cranberry-clockworks changed the title Fixing a hang of NamedPipeClientStream when connecting with infinite timeout Fixing hanging of NamedPipeClientStream when connecting with infinite timeout Mar 23, 2022
@cranberry-clockworks cranberry-clockworks changed the title Fixing hanging of NamedPipeClientStream when connecting with infinite timeout Fix hanging of NamedPipeClientStream when connecting with infinite timeout Mar 23, 2022
@cranberry-clockworks
Copy link
Contributor Author

@jeffhandley
Hi! Can you please provide a feedback?

@GSPP
Copy link

GSPP commented Mar 23, 2022

I'm linking to an issue that I opened that I believe to be relevant:

In this issue, there's a lot of discussion around how difficult it is to cancel a named pipe operation.

It seems you (@cranberry-knight) would like the CancellationToken to be respected. The linked issue contains a pretty thorough discussion for why that is rather difficult to do.

@cranberry-clockworks
Copy link
Contributor Author

@GSPP Thanks for linking the issue and providing more information.

In my opinion, if the API provides method with cancellation token it should be possible to use at
any time. That for a cancellation token is! And also passed timeout treated well and works as
expected in the NamedPipeClientStream.

The async-over-sync pattern is implemented by dividing wait time into small pieces to be able check
the passed cancellation token. I think the problem of that PR not in the design domain but just a
regular bug where timeout handled wrong.

@danmoseley
Copy link
Member

@adamsitnik looks like this one needs a review from IO.

@adamsitnik adamsitnik self-assigned this Apr 8, 2022
@adamsitnik adamsitnik added this to the 7.0.0 milestone Apr 8, 2022
Copy link
Member

@adamsitnik adamsitnik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cranberry-knight big thanks for your contribution and apologies for the delayed review! PTAL at my comments

The `WaitForConnectionAsync()` method is already returns a task.
Copy link
Member

@adamsitnik adamsitnik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you for your contribution @cranberry-knight !

@adamsitnik
Copy link
Member

the failures are unrelated, I am merging

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.IO community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants