Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[HttpRequestError] Detect NameResolutionError more reliably #89096

Closed
antonfirsov opened this issue Jul 18, 2023 · 6 comments · Fixed by #89948
Closed

[HttpRequestError] Detect NameResolutionError more reliably #89096

antonfirsov opened this issue Jul 18, 2023 · 6 comments · Fixed by #89948
Assignees
Milestone

Comments

@ghost ghost added the untriaged New issue has not been triaged by the area owner label Jul 18, 2023
@antonfirsov antonfirsov added this to the 8.0.0 milestone Jul 18, 2023
@ghost
Copy link

ghost commented Jul 18, 2023

Tagging subscribers to this area: @dotnet/ncl
See info in area-owners.md if you want to be subscribed.

Issue Details

Follow up on https://github.com/dotnet/runtime/pull/88974/files#r1265749282:

https://github.com/antonfirsov/runtime/blob/b54e8c72cb63de3549ae4bff268d88daa628404c/src/libraries/System.Net.Http/tests/FunctionalTests/SocketsHttpHandlerTest.cs#L4372-L4374

Author: antonfirsov
Assignees: -
Labels:

area-System.Net.Http

Milestone: -

@ghost ghost removed the untriaged New issue has not been triaged by the area owner label Jul 18, 2023
@liveans liveans assigned liveans and unassigned antonfirsov Jul 20, 2023
@karelz karelz added the bug label Jul 31, 2023
@antonfirsov
Copy link
Member Author

antonfirsov commented Aug 2, 2023

When a SocketException is thrown by Socket.ConnectAsync(DnsEndPoint), there is no reliable way to decide if it was coming Dns.GetHostAddressesAsync() or from the actual connect operation that follows the name resolution. getaddrinfo may return EAI_AGIAN on both Linux and Windows, which is mapped to SocketError.TryAgain, and it may or may not be a temporary issue (eg. timeout). On CI we see EAI_AGAIN/TryAgain being returned for invalid hosts on some Ubuntu images. It would be interesting to know why, but I think we should assume this can happen in any environment, and we should map these SocketExceptions to HttpRequestError.NameResolutionError in a reliable manner.

I see two ways to deal with this:

  1. Define a way to distinguish SocketException coming from Dns.* calls. Either store an entry in Exception.Data (which costs 2x allocations, but we probably don't care on an exception path anyways), or define a new public DnsException : SocketException subtype.
  2. Since TCP connect() should not return EAGAIN/WSATRY_AGAIN according to the docs (Linux, Windows) and my limited research, we can decide to map all cases of SocketError.TryAgain to HttpRequestError.NameResolutionError. @wfurt @tmds any thoughts?

@antonfirsov
Copy link
Member Author

antonfirsov commented Aug 2, 2023

The Linux man page of connect + EAGAIN actually says:

For nonblocking UNIX domain sockets, the socket is nonblocking, and the connection cannot be completed immediately. For other socket families, there are insufficient entries in the routing cache.

I'm not sure how common is the case with "insufficient entries in the routing cache", but my gut feel is that it's much less likely than getting EAGAIN from getaddrinfo, which means that even if it's imperfect to classify all TryAgain errors as DNS errors because of some rare edge-cases, it could be much better than the behavior we have now, where we falsely classify DNS errors as connection errors in known cases on our CI.

@wfurt
Copy link
Member

wfurt commented Aug 2, 2023

We should check it but lack if Kernel resources should be rare and when that happen the system is in much bigger troubles IMHO. In future, we may add separate code for NameResolution failures IMHO but in mean time I feel mapping that into HttpRequestError should be fine. We should verify the completeness as there could at least two types of failures: The resolver asked but did not get answer on time. And then it asked and there was authoritative answer that given host does not exist.

@antonfirsov
Copy link
Member Author

We should verify the completeness as there could at least two types of failures

I'm not sure I follow what do you mean by this. I'm suggesting to map both SocketError.HostNotFound and SocketError.TryAgain to HttpRequestError.NameResolutionError. If someone is interested which of these two cases caused the HttpRequestException, they can observe the inner SocketException.

Are you suggesting there might be other cases we are missing?

@wfurt
Copy link
Member

wfurt commented Aug 2, 2023

that should be fine. That generally covers two cases I had in mind.

@antonfirsov antonfirsov self-assigned this Aug 3, 2023
@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Aug 3, 2023
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Aug 4, 2023
@ghost ghost locked as resolved and limited conversation to collaborators Sep 4, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants