Thread pool starvation caused by sync waiting inside PhysicalBridge #2664

krasin-ga · 2024-03-07T13:14:39Z

Here is the code that starts a dedicated backlog processing thread and provides some motivation in comments on the decision not to use the thread pool:

StackExchange.Redis/src/StackExchange.Redis/PhysicalBridge.cs

Lines 901 to 912 in 441f89a

    
           // Start the backlog processor; this is a bit unorthodox, as you would *expect* this to just 
        
           // be Task.Run; that would work fine when healthy, but when we're falling on our face, it is 
        
           // easy to get into a thread-pool-starvation "spiral of death" if we rely on the thread-pool 
        
           // to unblock the thread-pool when there could be sync-over-async callers. Note that in reality, 
        
           // the initial "enough" of the back-log processor is typically sync, which means that the thread 
        
           // we start is actually useful, despite thinking "but that will just go async and back to the pool" 
        
           var thread = new Thread(s => ((PhysicalBridge)s!).ProcessBacklogAsync().RedisFireAndForget()) 
        
           { 
        
               IsBackground = true,                  // don't keep process alive (also: act like the thread-pool used to) 
        
               Name = "StackExchange.Redis Backlog", // help anyone looking at thread-dumps 
        
           }; 
        
           thread.Start(this);

But after asynchronously awaiting ProcessBridgeBacklogAsync, the rest of the code (including loop iterations) will not only continue on the thread pool thread but also block it on a sync lock:"

StackExchange.Redis/src/StackExchange.Redis/PhysicalBridge.cs

Lines 988 to 1012 in 441f89a

    
           private async Task ProcessBacklogAsync() 
        
           { 
        
               _backlogStatus = BacklogStatus.Starting; 
        
               try 
        
               { 
        
                   while (!isDisposed) 
        
                   { 
        
                       if (!_backlog.IsEmpty) 
        
                       { 
        
                           // TODO: vNext handoff this backlog to another primary ("can handle everything") connection 
        
                           // and remove any per-server commands. This means we need to track a bit of whether something 
        
                           // was server-endpoint-specific in PrepareToPushMessageToBridge (was the server ref null or not) 
        
                           await ProcessBridgeBacklogAsync().ForAwait(); 
        
                       } 
        
                       // The cost of starting a new thread is high, and we can bounce in and out of the backlog a lot. 
        
                       // So instead of just exiting, keep this thread waiting for 5 seconds to see if we got another backlog item. 
        
                       _backlogStatus = BacklogStatus.SpinningDown; 
        
                       // Note this is happening *outside* the lock 
        
                       var gotMore = _backlogAutoReset.WaitOne(5000); 
        
                       if (!gotMore) 
        
                       { 
        
                           break; 
        
                       } 
        
                   }

In our case, we have several relatively large Redis clusters with tens to hundreds of nodes, and sometimes we experience situations where the majority of thread pool threads are being blocked inside instances of PhysicalBridge:

So we have the best of two worlds: hundreds of expensive creations of worker threads and thread pool starvation 🙂

The text was updated successfully, but these errors were encountered:

mgravell · 2024-03-07T13:38:19Z

Now this is very interesting (pinging @NickCraver @philon-msft ) - it looks like the intent is for ProcessBacklogAsync to execute on a dedicated thread, but the fact that we have an await in there means that if anything is there, we immediately jump to the thread pool - and we also have a big fat WaitOne(5000). This looks... unintentional. And bad, especially (as you say) for large clusters.

Great report @krasin-ga ! Will investigate further.

NickCraver · 2024-03-08T01:05:56Z

Yep, fully agree, we changed how this works several times back at the beginning but the thread pool hopping would happen at several points now. Needs love and potentially capping somehow in thread creation lest we create hundreds for hundreds of endpoints during a storm.

krasin-ga · 2024-03-09T11:03:55Z

Thanks for the quick reply!

Please allow me to add my two cents.

I understand that this logic was added to circumvent issues related to using synchronous operations and thread pool starvation in general.

However, it seems that this is not the problem that the Redis client library should solve. It would be more rational to focus on minimizing overhead within the library itself. At first glance, it appears that abandoning synchronous locks in favor of wait-free alternatives and using System.Threading.Channels for the backlog could simplify the code and improve performance in high-load scenarios. Even better would be to extract the backlog and the logic of its processing out of PhysicalBridge into its own interface to make it configurable for different scenarios.

NickCraver · 2024-03-09T21:48:55Z

We're tweaking how this works in #2667 to reflect the current dedicated thread behavior. With respect to the follow-up comments: I'm just going to assume you've only taken a glance because the rest of it's pretty uninformed with respect to how things in the library behave.

I'm going to write off the other comments as not having taken any deep view of things in the library or dealt with thread pool starvation in hundreds of apps as we have and continue to help others with and leave it at that.

krasin-ga · 2024-03-10T13:57:32Z

As I understand the current architecture:

From the top view, PhysicalPridge is responsible for pushing messages to the backlog from which they will be read by a dedicated thread and serialized to a pipe (from which searialized messages will be pushed to the socket by PipeScheduler) or completed by timeout. Dedicated threads are only used to deal with situations when the runtime experiences thread pool starvation. There is one PhysicalPridge instance per connection. As of now there is a way to cofigure PipeScheduler, but there is no way to configure backlog processing behavior.

My points:

Having a thread per backlog/connection is far from optimal, especially when the Redis cluster is quite large. More threads mean more overhead on context switches, scheduling, etc.
If my application is experiencing thread pool starvation due to the usage of synchronous I/O or synchronous locking, having this will only hide the real source of the issue and not help much (especially if I use SocketManager.ThreadPool), and my application will still perform very poorly.
The .NET thread pool has gotten significantly better at detecting starvation and will quickly create threads by itself.
It's an absolutely good idea to help people diagnose that the thread pool is experiencing starvation, as in the message of RedisTimeoutException. Or to prevent such issues with the help of some Roslyn analyzer or find possible sources of starvation with Ben.BlockingDetector, etc.
It will not make things worse if backlog behavior can be make configurable

I wrote a quick benchmark that emulates backlog behavior and processes 10,000,000 items:

BenchmarkDotNet v0.13.12, Windows 11
AMD Ryzen 9 3900X, 1 CPU, 24 logical and 12 physical cores
.NET SDK 8.0.200
  [Host]     : .NET 8.0.2 (8.0.224.6711), X64 RyuJIT AVX2
  DefaultJob : .NET 8.0.2 (8.0.224.6711), X64 RyuJIT AVX2

| Method         | NumberOfConsumers | NumberOfProducers | Mean        | Error     | StdDev    | Median      |
|----------------|------------------ |------------------ |------------:|----------:|----------:|------------:|
| Tasks+Channels | 1                 | 1                 |  2,515.8 ms |  48.65 ms |  47.78 ms |  2,500.9 ms |
| Threads+CQ     | 1                 | 1                 |  2,518.7 ms |  44.62 ms |  41.74 ms |  2,517.4 ms |
| Tasks+Channels | 1                 | 8                 |  2,774.9 ms |  22.63 ms |  21.17 ms |  2,770.6 ms |
| Threads+CQ     | 1                 | 8                 |  3,265.1 ms |  88.79 ms | 261.79 ms |  3,432.6 ms |
| Tasks+Channels | 1                 | 32                |  2,831.8 ms |  55.89 ms |  87.02 ms |  2,825.2 ms |
| Threads+CQ     | 1                 | 32                |  3,791.6 ms |  72.86 ms |  77.96 ms |  3,770.4 ms |
| Tasks+Channels | 1                 | 128               |  3,117.2 ms |  59.07 ms |  68.03 ms |  3,127.5 ms |
| Threads+CQ     | 1                 | 128               |  3,986.0 ms |  77.86 ms |  98.47 ms |  3,987.2 ms |
| Tasks+Channels | 8                 | 1                 |  5,156.9 ms |  41.39 ms |  38.72 ms |  5,152.8 ms |
| Threads+CQ     | 8                 | 1                 | 19,966.5 ms |  86.58 ms |  80.99 ms | 19,966.4 ms |
| Tasks+Channels | 8                 | 8                 |    864.5 ms |  16.60 ms |  18.45 ms |    859.2 ms |
| Threads+CQ     | 8                 | 8                 |  1,586.8 ms |  31.37 ms |  55.76 ms |  1,601.4 ms |
| Tasks+Channels | 8                 | 32                |  1,117.6 ms |  16.38 ms |  15.32 ms |  1,116.2 ms |
| Threads+CQ     | 8                 | 32                |    912.3 ms |  18.10 ms |  47.67 ms |    913.7 ms |
| Tasks+Channels | 8                 | 128               |  1,206.8 ms |  23.42 ms |  30.45 ms |  1,198.5 ms |
| Threads+CQ     | 8                 | 128               |    983.0 ms |  18.51 ms |  19.80 ms |    988.7 ms |
| Tasks+Channels | 32                | 1                 |  5,191.6 ms |  31.90 ms |  28.28 ms |  5,184.7 ms |
| Threads+CQ     | 32                | 1                 | 19,619.6 ms | 162.84 ms | 144.36 ms | 19,644.6 ms |
| Tasks+Channels | 32                | 8                 |    977.1 ms |   8.87 ms |   8.29 ms |    978.7 ms |
| Threads+CQ     | 32                | 8                 |  1,825.8 ms |  35.98 ms |  38.50 ms |  1,838.9 ms |
| Tasks+Channels | 32                | 32                |    652.4 ms |   5.04 ms |   4.47 ms |    654.0 ms |
| Threads+CQ     | 32                | 32                |    835.5 ms |  16.55 ms |  45.58 ms |    839.3 ms |
| Tasks+Channels | 32                | 128               |    661.7 ms |  13.03 ms |  19.10 ms |    659.8 ms |
| Threads+CQ     | 32                | 128               |    839.7 ms |  16.69 ms |  36.29 ms |    844.8 ms |
| Tasks+Channels | 128               | 1                 |  4,756.7 ms |  21.07 ms |  17.59 ms |  4,760.6 ms |
| Threads+CQ     | 128               | 1                 | 19,599.3 ms | 262.33 ms | 245.39 ms | 19,649.7 ms |
| Tasks+Channels | 128               | 8                 |    995.9 ms |   9.45 ms |   8.84 ms |    997.0 ms |
| Threads+CQ     | 128               | 8                 |  2,869.4 ms |  56.35 ms | 100.17 ms |  2,864.1 ms |
| Tasks+Channels | 128               | 32                |    581.5 ms |  11.47 ms |  20.09 ms |    583.2 ms |
| Threads+CQ     | 128               | 32                |    867.5 ms |  17.07 ms |  27.56 ms |    869.9 ms |
| Tasks+Channels | 128               | 128               |    618.2 ms |   6.56 ms |   6.14 ms |    618.5 ms |
| Threads+CQ     | 128               | 128               |    912.5 ms |  19.55 ms |  57.66 ms |    921.6 ms |

Both methods can be optimized, but I hope they are still close enough for illustrative purposes.

PS: I'm not a native English speaker and don't use English for communication very often, so my selection of words may not be very accurate. Please forgive any unintended offense.

NickCraver · 2024-03-10T14:24:27Z

@krasin-ga I think we have mismatch in understanding here: that the backlog is part of the normal flow at all. It is not. The backlog is not normally written to, but it must be synchronously drained from to help prevent spiral of death scenarios (because it's quite likely that writes or response processing cannot complete due to starvation). The normal path for async commands is fully async and uses Pipes.

The backlog is only used as a buffer when we can't write to the connection, e.g. we're disconnected. And as per the docs, you can turn this off if you want to fail immediately indicating no connections are available. You're stating this can be optimized: hey sure, I'm all ears, but it needs to be in a way that the backlog has a chance of draining thread pool starvation is hit (which is a common scenario for applications losing connections unexpectedly, often having some sync-over-async elsewhere we don't control). That's why we create a thread to do this work and listen for incoming for 5 more seconds instead of repeatedly spinning up a thread if we're recovering from overwhelming the outbound socket buffer.

As for the thread pool growing, this is often a misconception when it gets into the details.

We support more than .NET 6+, e.g. .NET Framework which doesn't have any love for rapid thread pool growth expansion at all.
The .NET Core mechanism you're thinking of doesn't apply to Monitor.Wait. We discovered this after discussion with Stephen triggered by Rollback: remove .NET 6 build #1992 when we attempted to utilize that behavior.

krasin-ga · 2024-03-10T14:59:33Z

I think we have mismatch in understanding here: that the backlog is part of the normal flow at all. It is not

Yep, I certainly misunderstood that part. Thanks for the clarification!

Upd: But not that much: if we can't take mutex we still end up enqueuing to backlog, and this will happen quite often under load.

StackExchange.Redis/src/StackExchange.Redis/PhysicalBridge.cs

Lines 780 to 791 in 60e5d17

    
                           gotLock = _singleWriterMutex.Wait(0); 
        
                           if (!gotLock) 
        
           #else 
        
                           token = _singleWriterMutex.TryWait(WaitOptions.NoDelay); 
        
                           if (!token.Success) 
        
           #endif 
        
                           { 
        
                               // If we can't get it *instantaneously*, pass it to the backlog for throughput 
        
                               if (TryPushToBacklog(message, onlyIfExists: false)) 
        
                               { 
        
                                   return WriteResult.Success; // queued counts as success 
        
                               }

… dedicated thread (#2667) Resolving #2664. This went through a few iterations early on, but in the current form the issue is that we can await hop to a thread pool thread and have our wait blocking there - not awesome. We want to avoid that entirely and do a full sync path here.

NickCraver · 2024-03-11T15:34:41Z

@krasin-ga Thanks again for the report! Version 2.7.33 with this fixed up is available on NuGet now :)

krasin-ga · 2024-03-11T16:02:27Z

Nice, thanks for the quick fix!

giuliano-macedo · 2024-04-19T20:38:19Z

@krasin-ga @NickCraver Do you guys know which version this issue started to happen?

NickCraver · 2024-04-20T11:29:34Z

@giuliano-macedo I'd assume this was a potential issue when consuming a lot of endpoints since 2.5.x, that's my best guess here.

giuliano-macedo · 2024-04-22T12:17:47Z

@NickCraver Great! And thanks for the fix!

NickCraver mentioned this issue Mar 8, 2024

Fix #2664 Backlog: move to full synchronous behavior since we're on a dedicated thread #2667

Merged

NickCraver linked a pull request Mar 11, 2024 that will close this issue

Fix #2664 Backlog: move to full synchronous behavior since we're on a dedicated thread #2667

Merged

NickCraver closed this as completed in #2667 Mar 11, 2024

gabbsmo mentioned this issue Nov 15, 2024

Worker/IOCP Thread, Timeouts and Azure App Service #2311

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Thread pool starvation caused by sync waiting inside PhysicalBridge #2664

Thread pool starvation caused by sync waiting inside PhysicalBridge #2664

krasin-ga commented Mar 7, 2024

mgravell commented Mar 7, 2024

NickCraver commented Mar 8, 2024

krasin-ga commented Mar 9, 2024

NickCraver commented Mar 9, 2024

krasin-ga commented Mar 10, 2024

NickCraver commented Mar 10, 2024

krasin-ga commented Mar 10, 2024 •

edited

Loading

NickCraver commented Mar 11, 2024

krasin-ga commented Mar 11, 2024

giuliano-macedo commented Apr 19, 2024

NickCraver commented Apr 20, 2024

giuliano-macedo commented Apr 22, 2024

Thread pool starvation caused by sync waiting inside PhysicalBridge #2664

Thread pool starvation caused by sync waiting inside PhysicalBridge #2664

Comments

krasin-ga commented Mar 7, 2024

mgravell commented Mar 7, 2024

NickCraver commented Mar 8, 2024

krasin-ga commented Mar 9, 2024

NickCraver commented Mar 9, 2024

krasin-ga commented Mar 10, 2024

NickCraver commented Mar 10, 2024

krasin-ga commented Mar 10, 2024 • edited Loading

NickCraver commented Mar 11, 2024

krasin-ga commented Mar 11, 2024

giuliano-macedo commented Apr 19, 2024

NickCraver commented Apr 20, 2024

giuliano-macedo commented Apr 22, 2024

krasin-ga commented Mar 10, 2024 •

edited

Loading