Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PendingConnectionTimeout logic can be too aggressive in certain scenarios #74581

Closed
MihaZupan opened this issue Aug 25, 2022 · 7 comments
Closed
Assignees
Labels
area-System.Net.Http bug tenet-compatibility Incompatibility with previous versions or .NET Framework
Milestone

Comments

@MihaZupan
Copy link
Member

MihaZupan commented Aug 25, 2022

Creating an issue based on discussion on the PR - #71785 (comment).

With #71785, we now tear down connection attempts after 5 seconds if the request that triggered the attempt is canceled.
With busy client/server communication with many requests over multiple connections, that has an affect similar to setting the ConnectTimeout to 5 seconds.

I fear that this will cause problems in cases where users are expecting their ConnectTimeout to be honored even if it's longer than 5 seconds.
There are cases where the connection establishment (emphasis on TLS handshakes) can take longer than 5 seconds and the client is okay with that. We saw two such users in #11594:

#11594 (comment),

We have Connect timeout 10 seconds and timeout for request handling 5 seconds ... In our production environment sometimes it takes up to 7-10 seconds to finish SSL handshake

#11594 (comment)

we need to connect to a server in China, the time it takes until the TLS handshake completes can be up to 20 seconds, from that point on everything is normal

In these scenarios, you could end up with the following behavior:

  • In 6.0, your first few requests would fail due to a timeout as no connection is available. After that, things would work fine.
  • In 7.0, you may never establish any connection and all requests would fail.

Note: this sort of setup (ConnectTimeout > RequestTimeout) would only make sense >= 6.0.
In .NET 5.0 and earlier, we would always kill the connection attempt, so you'd also never get any connections/requests through.

I think we should consider only setting the PendingConnectionTimeout if the ConnectTimeout was not manually changed from infinite. This gives the user a reasonable way to tell us they're fine with connection attempts taking a long time, without having to set the AppContext switch. This isn't a great solution either.

@ghost ghost added the untriaged New issue has not been triaged by the area owner label Aug 25, 2022
@ghost
Copy link

ghost commented Aug 25, 2022

Tagging subscribers to this area: @dotnet/ncl
See info in area-owners.md if you want to be subscribed.

Issue Details

Creating an issue based on discussion on the PR - #71785 (comment).

With #71785, we now tear down connection attempts after 5 seconds if the request that triggered the attempt is canceled.
With busy client/server communication with many requests over multiple connections, that has an affect similar to setting the ConnectTimeout to 5 seconds.

I fear that this will cause problems in cases where users are expecting their ConnectTimeout to be honored even if it's longer than 5 seconds.
There are cases where the connection establishment (emphasis on TLS handshakes) can take longer than 5 seconds and the client is okay with that. We saw two such users in #11594:

#11594 (comment),

We have Connect timeout 10 seconds and timeout for request handling 5 seconds ... In our production environment sometimes it takes up to 7-10 seconds to finish SSL handshake

#11594 (comment)

we need to connect to a server in China, the time it takes until the TLS handshake completes can be up to 20 seconds, from that point on everything is normal

In these scenarios, you could end up with the following behavior:

  • In 6.0, your first few requests would fail due to a timeout as no connection is available. After that, things would work fine.
  • In 7.0, you may never establish any connection and all requests would fail.

I think we should consider only setting the PendingConnectionTimeout if the ConnectTimeout was not manually changed from infinite. This gives the user a reasonable way to tell us they're fine with connection attempts taking a long time, without having to set the AppContext switch.

Author: MihaZupan
Assignees: -
Labels:

area-System.Net.Http

Milestone: -

@antonfirsov
Copy link
Member

antonfirsov commented Aug 25, 2022

In 7.0, you may never establish any connection and all requests would fail.

Cancellation after PendingConnectionTimeout only kicks in only if the originating request completes either by (1) succeeding or (2) after a request timeout. In case (1) there will be at least one working connection that should eventually serve all queued requests. In case (2), timeouts are not unexpected.

@MihaZupan
Copy link
Member Author

MihaZupan commented Aug 25, 2022

If every connection attempt takes longer than (RequestTimeout + 5 seconds), then in case (2) you go from "some timeouts" in 6.0 to "all requests always fail" in 7.0.

And even in case (1), having a max of one connection is not ideal.

@antonfirsov
Copy link
Member

If every connection attempt takes longer than (RequestTimeout + 5 seconds), then in case (2) you go from "some timeouts" in 6.0 to "all requests always fail" in 7.0.

We had the same behavior pre-6.0, I don't think it's incorrect. I would expect RequestTimeout to consider connection times, the fact that we pool connections (& spin them up in the background) is an implementation detail.

And even in case (1), having a max of one connection is not ideal.

I'm not 100% sure if we will max at one connection, but we should investigate.

If it turns out to be a problem, another mitigation might be to dynamically increase PendingConnectionTimeout when we detect successful connections taking longer than the current PendingConnectionTimeout. Expecting the user to change the infinite ConnectTimeout to a smaller arbitrary value to avoid timeouts feels counter-intuitive to me.

@MihaZupan
Copy link
Member Author

I agree this is a 6.0 => 7.0 issue.
The ConnectTimeout wouldn't do anything in these scenarios in 5.0.

the fact that we pool connections (& spin them up in the background) is an implementation detail

The fact that we pool connections is the contract. How exactly we do it are technically implementation details, but they have functional differences that can break users moving from 6.0 to 7.0.

Expecting the user to change the infinite ConnectTimeout to a smaller arbitrary value to avoid timeouts feels counter-intuitive to me.

That's a very good point. It would only help users that chose upper limits for the ConnectTimeout (possibly as a mitigation for the original issue we had in 6.0 with stalled connections).

@karelz karelz added this to the 8.0.0 milestone Aug 30, 2022
@ghost ghost removed the untriaged New issue has not been triaged by the area owner label Aug 30, 2022
@karelz karelz added bug tenet-compatibility Incompatibility with previous versions or .NET Framework labels Aug 30, 2022
@karelz
Copy link
Member

karelz commented Aug 30, 2022

Triage: This may be potentially a breaking change from 6.0 to 7.0.
However, the scenario is very rare - a corner case. Let's figure out a "right" fix in 8.0 and then we can decide to backport, or wait for customers hitting the problem with 7.0.

In general, it is really weird to have ConnectionTimeout > RequestTimeout. We have to decide what to do in such situations.
A workaround would be to avoid that.

@MihaZupan MihaZupan changed the title Consider disabling PendingConnectionTimeout if a ConnectTimeout is set PendingConnectionTimeout logic can be too aggressive in certain scenarios Aug 30, 2022
@MihaZupan
Copy link
Member Author

Triage: We believe this should be a corner case. We're unlikely to make any changes here unless we get customer feedback that someone is hitting it.

@ghost ghost locked as resolved and limited conversation to collaborators Mar 2, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.Net.Http bug tenet-compatibility Incompatibility with previous versions or .NET Framework
Projects
None yet
Development

No branches or pull requests

3 participants