Reduce the impact of blocking DNS calls on Unix #48566
Comments
Tagging subscribers to this area: @dotnet/ncl

Issue Details

After helping a customer look at a thread pool starvation case on Linux on .NET Core 3.1 I ended up here. After doing some research and with some discussion on Twitter, it turns out that getaddrinfo_a uses an internal thread pool and blocks on getaddrinfo, and isn't doing any async IO. This change is an improvement over what we had before because our threadpool doesn't grow, but I'm not sure this change is a net positive in the long run. The thread pool limits are controlled by compile-time constants in glibc (essentially, another library is doing async over sync for us on a less controllable threadpool...). I wonder if we're better off controlling this blocking code, and maybe it should be possible to turn this off with a configuration switch.

The other improvement I was thinking about was only allowing one pending request to a specific host name concurrently. That would improve situations where DNS is slow and new blocking calls are issued for the same host name (which is the case the customer ran into) on thread pool threads.

cc @geoffkizer @stephentoub @scalablecory
Also, most other platforms have a similar issue here but usually offer the "built in OS version" and another version of the API. See node's excellent documentation here: https://nodejs.org/docs/latest/api/dns.html#dns_implementation_considerations
Just to be clear, that implementation hasn't shipped yet in any supported .NET release; it was only merged in January for .NET 6.
This is largely orthogonal. Regardless of how it's implemented, a "coalesce all pending calls for a given address into one" can be added, layered on top of the underlying operation via a transient address->task cache (that's possible for pretty much any kind of async operation). It is more impactful, though, the more resources are consumed by a pending operation, and if that includes a blocked thread, that's relatively large.
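As a rough sketch of that transient address->task cache (hypothetical names, not code from dotnet/runtime), concurrent callers for the same host name could share a single in-flight lookup like this:

```csharp
using System;
using System.Collections.Concurrent;
using System.Net;
using System.Threading.Tasks;

static class CoalescingDns
{
    private static readonly ConcurrentDictionary<string, Task<IPAddress[]>> s_pending =
        new ConcurrentDictionary<string, Task<IPAddress[]>>(StringComparer.OrdinalIgnoreCase);

    public static Task<IPAddress[]> GetHostAddressesAsync(string hostName)
    {
        // Callers that arrive while a lookup for this name is in flight share its task.
        // (GetOrAdd's factory isn't guaranteed to run exactly once under races; good
        // enough for a sketch.)
        return s_pending.GetOrAdd(hostName, key =>
        {
            Task<IPAddress[]> task = Dns.GetHostAddressesAsync(key);

            // Drop the entry once the lookup completes: this is a transient coalescing
            // cache, not a TTL-based DNS cache, so the next caller starts a fresh lookup.
            task.ContinueWith(t => s_pending.TryRemove(key, out _), TaskScheduler.Default);

            return task;
        });
    }
}
```

As the discussion later in the thread notes, the dictionary node and the extra continuations aren't free, and a shared task complicates per-caller cancellation.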
I agree, though if the implementation is exactly as you describe, I'm not even sure it's a worthwhile improvement over what we had before. Such an implementation has two main benefits over using the same pool:
But it also has cons over using the same pool, namely that those other-pool threads end up competing for resources with the main thread pool, which can't use or manage or scale them. It also further bifurcates our implementations, adding some non-trivial interop code that only works on Linux; on macOS, for example, we still just do our own async-over-sync. Interestingly, there's a follow-up PR in #47036 to add cancellation to that Linux code, but that PR calls out that all it supports is removing the pending entry from a queue, and once it's in flight, it can't be canceled: that fits with the implementation you describe. And if that's all it's doing, enabling that same form of cancellation with our own async-over-sync should be close to a one-line change (passing the cancellation token as an argument to the task).
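As a sketch of what that could mean in practice (assuming the fallback path queues the blocking call via Task.Factory.StartNew; the helper below is hypothetical, not the actual System.Net.Dns internals):

```csharp
using System;
using System.Net;
using System.Threading;
using System.Threading.Tasks;

static class DnsAsyncOverSyncSketch
{
    // The blocking OS lookup runs on a thread pool thread. Passing the caller's
    // CancellationToken to StartNew means that if cancellation is requested before a
    // pool thread picks the work item up, the task completes as canceled and the
    // blocking call never runs -- roughly the same "cancel only while still queued"
    // semantics described for the Linux getaddrinfo_a path.
    public static Task<IPAddress[]> GetHostAddressesAsync(string hostName, CancellationToken cancellationToken) =>
        Task.Factory.StartNew(
            state => Dns.GetHostAddresses((string)state),
            hostName,
            cancellationToken,                    // the "one line" in question
            TaskCreationOptions.DenyChildAttach,
            TaskScheduler.Default);
}
```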
If we rule out caring about callers doing sync-over-async with the resulting task (I'm not saying we can discard that, just that doing so is problematic code, anyway), then I'd expect just reverting back to what we had previously would be a better choice; in typical usage, it'll effectively become as if the call was just made sync, with the pool thread plucking the item it just queued out of its local queue. (And then consider the aforementioned coalescing, which could help in some circumstances but add overhead in others.) cc: @gfoidl
Oh, that's interesting and makes #34633 less useful, as pointed out. A different way to mitigate this is to have a look at https://c-ares.haxx.se (pointed to from the Twitter discussion mentioned above). (There are other libs listed, but none of them seems fine to use -- either the license or the implementation.) If this is an option, I can try to make a prototype to see how things go. So the steps would be:
Yep, I'm aware.
Right. It is orthogonal, but I think it would reduce the chance of the starvation it causes.
👍🏾 I think we should revert.
Yes, I think this might be worth looking at (it also has wide platform support). This is what nodejs uses under the covers. The other option is a fully managed version of DNS resolution. It would also help us improve our UDP stack. This is the approach that golang and mono took (cc @migueldeicaza). I think no matter what we do, we'd need a way to "use OS defaults" in case people configure custom DNS servers and want their .NET apps to automagically use them.
2 birds with one stone then! #19443
I think that if we want to switch implementations of our DNS client, then a fully managed implementation is probably the best option -- as opposed to relying on a 3rd-party native library. We could use the mono code as the basis for this (though I haven't looked at it and have no idea how complete/performant it is). There are some other DNS-related features we could enable if we did that. One is retrieving TTL info. Another is discovering HTTPSVC records, see #43643.
I think the biggest difficulty would be matching OS behavior, e.g. I feel it would be strange if OS resolution gives different answers than .NET.
Yeah, I agree. Golang lets you choose -- you can use OS resolution or you can use their custom name resolution. If we implemented our own resolver we would probably need to do the same, which isn't ideal.
Yeah, this would be a great project for runtime labs.
OK so summary:
I'll revert and close the related PRs.
Triage: Scoped down to the third bullet:
Why is this future? Do we think it's too risky?
Future does not exclude now/6.0, but we would not hold the release for it @davidfowl
That seems fine, but I would leave it in 6.0 until the end of the milestone, then it could be moved. Otherwise there's no way to tell the difference between "we will look at it in 6.0 if we have time" and "we're not going to look at it in 6.0".
https://github.com/uber/denial-by-dns has a great test suite
Here's an example of what this could look like:

```csharp
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System.Linq;
using System.Net;
using System.Threading.Tasks;

[MemoryDiagnoser]
public class Program
{
    static void Main(string[] args) => BenchmarkSwitcher.FromAssemblies(new[] { typeof(Program).Assembly }).Run(args);

    [Benchmark]
    public async Task DnsSerial()
    {
        for (int i = 0; i < 1_000; i++)
            await Dns.GetHostAddressesAsync("www.microsoft.com");
    }

    [Benchmark]
    public async Task DnsConcurrent()
    {
        for (int i = 0; i < 100; i++)
            await Task.WhenAll(Enumerable.Range(0, 10).Select(_ => Dns.GetHostAddressesAsync("www.microsoft.com")));
    }

    [Benchmark]
    public async Task DnsUniqueParallel()
    {
        await Task.WhenAll(Enumerable.Range(0, 4).Select(i => Task.Run(async () =>
        {
            string address = i switch
            {
                0 => "www.microsoft.com",
                1 => "www.bing.com",
                2 => "dotnet.microsoft.com",
                _ => "azure.microsoft.com",
            };

            for (int j = 0; j < 1_000; j++)
                await Dns.GetHostAddressesAsync(address);
        })));
    }
}
```
On the pros side, concurrent requests for the same address are able to coalesce to the same task, which means no additional allocation for that request, no additional threads blocked for that request, and, for some of the requests, significantly reduced time to execution, because they may come in at the tail end of the actual DNS request finishing and be able to use it. If you were in a bad situation where you were blocking tons of thread pool threads all for the same address, this would help a lot.

On the cons side, the implementation adds relatively large allocation overheads, since when a task isn't reused, it's still paying for the ability for it to be: the allocation inside the ConcurrentDictionary for the relevant node, plus the additional ContinueWith allocations to be able to remove the node from the dictionary, plus an extra continuation in the task, which basically guarantees the task will need to upgrade from storing the caller's await continuation in its continuation field to allocating a list to store the multiple continuations. It also means some contention for mutations on the dictionary. And cancelable Dns requests that do have cancellation requested, that are stuck in the thread pool's queue, and that aren't reused will no longer be removed from the thread pool's queue.

There are obviously possible variations on my implementation, but they all have similar tradeoffs. For example, we could enable ref counting on the tracked requests in order to be able to cancel them if all associated requests had cancellation requested, but that would incur another relatively large allocation increase.
That is my opinion as well. How do we validate that hypothesis?
I don't want to add API for that. Any such mitigation would be temporary, only intended for platforms where we don't yet have an async I/O implementation, but we'd end up having to apply it to all implementations and support it forever more. If we really needed our default to be overridable in case something goes awry, we could respect a temporary env var that gets removed in the future. But, unless concrete evidence dictated it was really critical, I'd rather not add such global throttling at all.
Do a survey? Ask a couple of customers if they have this in their telemetry over the course of a single app instance run? Seems like doing this would only change the second point about limiting the number of overall concurrent requests, not the per-name queuing.
That sounds even better.
Actually, thinking about it more, I don't think anyone would know how many they need concurrently based on telemetry. I think they would need to know based on what the app does and how it issues requests. Bear in mind we're only talking about concurrent dns requests, not even the http requests themselves. We should also add an event source event that tracks the number of pending unique dns requests and concurrent dns requests overall.
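For illustration, such counters could be shaped roughly like this (the event source and counter names are invented for this sketch; there are no System.Net.NameResolution counters at the time of this discussion):

```csharp
using System.Diagnostics.Tracing;
using System.Threading;

[EventSource(Name = "Demo-Dns-Counters")]
sealed class DnsCountersEventSource : EventSource
{
    public static readonly DnsCountersEventSource Log = new DnsCountersEventSource();

    private long _currentRequests;    // lookups in flight right now, regardless of host name
    private long _pendingUniqueHosts; // distinct host names with a lookup in flight

    private PollingCounter _currentRequestsCounter;
    private PollingCounter _pendingUniqueHostsCounter;

    protected override void OnEventCommand(EventCommandEventArgs command)
    {
        if (command.Command == EventCommand.Enable)
        {
            // PollingCounter samples the current value whenever a listener
            // (dotnet-counters, an EventListener, etc.) asks for it.
            _currentRequestsCounter ??= new PollingCounter("current-dns-requests", this,
                () => Volatile.Read(ref _currentRequests));
            _pendingUniqueHostsCounter ??= new PollingCounter("pending-unique-hosts", this,
                () => Volatile.Read(ref _pendingUniqueHosts));
        }
    }

    [NonEvent]
    public void RequestStarted(bool firstForHost)
    {
        Interlocked.Increment(ref _currentRequests);
        if (firstForHost) Interlocked.Increment(ref _pendingUniqueHosts);
    }

    [NonEvent]
    public void RequestCompleted(bool lastForHost)
    {
        Interlocked.Decrement(ref _currentRequests);
        if (lastForHost) Interlocked.Decrement(ref _pendingUniqueHosts);
    }
}
```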
This is what serialization per host/type might look like, with the same benchmarks as earlier:
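As a rough sketch of the idea (hypothetical helper names, assuming one async gate per host name, which is not necessarily how the snippet referenced above did it):

```csharp
using System;
using System.Collections.Concurrent;
using System.Net;
using System.Threading;
using System.Threading.Tasks;

static class SerializedDns
{
    // One gate per host name; a real implementation would also need to trim unused gates.
    private static readonly ConcurrentDictionary<string, SemaphoreSlim> s_gates =
        new ConcurrentDictionary<string, SemaphoreSlim>(StringComparer.OrdinalIgnoreCase);

    public static async Task<IPAddress[]> GetHostAddressesAsync(string hostName)
    {
        SemaphoreSlim gate = s_gates.GetOrAdd(hostName, _ => new SemaphoreSlim(1, 1));

        // Only one lookup per host name is in flight at a time; the rest wait
        // asynchronously instead of each occupying another blocked pool thread.
        await gate.WaitAsync().ConfigureAwait(false);
        try
        {
            return await Dns.GetHostAddressesAsync(hostName).ConfigureAwait(false);
        }
        finally
        {
            gate.Release();
        }
    }
}
```

Unlike the coalescing sketch earlier, waiters here don't share the first caller's result; they each run their own lookup, just never more than one at a time per name.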
You mean counters, right? Seems reasonable. We don't have any counters yet for System.Net.NameResolution, do we? I think we'd also just want overall DNS request count. I assume "concurrent dns requests overall" just means how many requests are currently in flight at that moment, regardless of hostname. Probably worth a separate issue for such telemetry.
Yeah, you convinced me that this is a reasonable thing to do. In particular, in this case we can know we are issuing requests that are strongly related in their blocking characteristics -- if the first blocks, then every subsequent one will block too until the underlying DNS query is completed. That said, this doesn't do anything to solve the general problem. DNS requests to different hostnames can still block, and this can lead to starvation. And I doubt that it would make sense to serialize all DNS requests, since requests to different hostnames are basically unrelated.
We can tune the heuristics however we want to avoid unbounded thread growth. To me, the interesting question here is: could the thread pool do better at avoiding starvation if it knew that some threads were blocked, by being somewhat more aggressive about spinning up more threads? I think the answer here is quite likely yes. Having info about thread blocking could have other benefits, too. Like, if it were comprehensive enough, it might allow us to reduce the number of threads executing when it is pretty sure there is no blocking happening, and thus improve overall performance.
The difference is that other components can use that same API. A DNS-specific solution only helps if DNS is the sole problem. I suspect that's not at all the case; often it's a different component entirely, or it's user code, or it's a mix of a lot of things that add up.
I think yes, for async over sync, we should run these on a different set of threads.
Why different threads? Threads are threads. What I think matters is whether any given thread is blocking at the moment, or not.
Treating all work as homogeneous is why the thread pool behaves like it does today. Work segregation is a very common way to isolate workloads of different types. While it doesn't need to be a different physical OS thread, it needs to be treated differently, and there's a non-trivial cost to adding a thread pool thread (work stealing...).
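As one sketch of what that segregation could look like without dedicating physical OS threads (an illustration, not a design proposed in this thread), known-blocking calls could be funneled through a scheduler with a small fixed concurrency limit:

```csharp
using System;
using System.Net;
using System.Threading;
using System.Threading.Tasks;

static class BlockingWork
{
    // At most 4 thread pool threads can be tied up in known-blocking calls at a time;
    // anything beyond that waits in this scheduler's queue instead of burning more threads.
    private static readonly ConcurrentExclusiveSchedulerPair s_pair =
        new ConcurrentExclusiveSchedulerPair(TaskScheduler.Default, maxConcurrencyLevel: 4);

    public static Task<T> Run<T>(Func<T> blockingCall) =>
        Task.Factory.StartNew(
            blockingCall,
            CancellationToken.None,
            TaskCreationOptions.DenyChildAttach,
            s_pair.ConcurrentScheduler);
}

// Usage: route a call we know will block through the segregated scheduler, e.g.
// IPAddress[] addresses = await BlockingWork.Run(() => Dns.GetHostAddresses("www.microsoft.com"));
```

The work still executes on thread pool threads, so the costs mentioned above (work stealing, queue management) don't disappear; the limit just caps how many of those threads can be blocked by this class of work at once.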
+1
Yeah, good point; I'm not sure how work stealing fits in here. I have heard that there is some consideration being given to make work stealing queues per-CPU instead of per-thread. @jkotas @kouvel is that correct?
We've experimented with it in the past. @VSadov had a prototype.
There's also a cost to competing pools.
Work segregation requires code to make an up-front decision about whether to queue a new Task to threadpool X (e.g. the regular threadpool) vs threadpool Y (e.g. the threadpool for stuff that may block) vs threadpool Z (whatever). This would work fine in the specific example here (async DNS) because we know we're about to invoke a (potentially) blocking call. But I've seen lots of code that ends up blocking somewhere deep in a call stack, and callers aren't even aware this is happening. In this case, the code is executing on the regular threadpool and it's not obvious how it would be moved to a separate threadpool with different execution semantics. That's why I think the solution here has to address arbitrary blocking code on the regular thread pool.
The first part is absolutely right. This case is async over sync, where we want to throttle the work so it doesn't burn the rest of the thread pool. The other scenario, sync over async, usually wants to inject more threads in the hope that things resolve themselves. Though, obviously, there are cases where injecting more threads can be worse.
I think that's why, if you look at our competition, they usually do something similar (to the throttle) when calling getaddrinfo. They don't inject more threads to handle the load.
Do they do this for sync p/invokes in general?
BTW, I assume you mean Golang and Node.js here; if you mean something else let me know. The thing that these frameworks do that's very different from us is that they have very hard guarantees that work on the "regular threadpool" will never block.
They handle this case differently from the general case of blocking.
Yes, they have different principles, that's right, but they still need to handle the general case of blocking. They just really go out of their way to avoid it happening (for good reason), especially when they know there's no way around it (like calling getaddrinfo).
As I said above, I'm happy to throttle/synchronize/whatever specific kinds of blocking calls when it makes sense, because we have knowledge of what the calls are doing and how they are likely to block, etc. The per-hostname serialization here is a good example of it. But I don't think this addresses the general problem.
Which is what #47366 was about: detect the blocking, then apply queuing mitigations for that call path.
Yes, there was a prototype where:
I think Go handles blocking calls in a similar way, except in our case threads are always 1:1 with OS threads. The occasional need for an extra 10-100 threads was easily tolerated. I had tests that did random Sleep(100) in tasks and yet completed without minute-long hiccups. As I see it: if you have a call that often blocks, let's say Sleep(100) for simplicity, and you must call it 100 times with no way around that, then you can do it concurrently or you can do it sequentially. There are obviously other costs to adding a thread, and hogging apps will eventually see them. Starvation tolerance is a plan-B feature, to be used after plan-A, which is "use async".
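To make the Sleep(100) framing concrete, here's a toy illustration (not code from the thread) of the sequential vs concurrent tradeoff:

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

class SleepTradeoff
{
    static async Task Main()
    {
        var sw = Stopwatch.StartNew();

        // Sequential: one thread blocked for ~100 x 100 ms = ~10 s; no pressure on the pool.
        for (int i = 0; i < 100; i++)
            Thread.Sleep(100);
        Console.WriteLine($"Sequential: {sw.ElapsedMilliseconds} ms");

        sw.Restart();

        // Concurrent: 100 work items that each block a pool thread. Wall time now depends
        // on how aggressively the pool injects threads, and anything queued behind these
        // items waits -- the starvation-vs-extra-threads tradeoff described above.
        await Task.WhenAll(Enumerable.Range(0, 100).Select(_ => Task.Run(() => Thread.Sleep(100))));
        Console.WriteLine($"Concurrent: {sw.ElapsedMilliseconds} ms");
    }
}
```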
Several distros use systemd-resolved. Are there open bugs for the issues you're running into with system DNS?
Yes, if there is a managed implementation, it should be opt-in. The system DNS is aware of configuration stuff the managed implementation would not know about. For example, |
Seems like it's asynchronous based on this text https://www.freedesktop.org/software/systemd/man/systemd-resolved.service.html