CounterEvent's synchronous design causes thread starvation under load #755
Hi, @DaRosenberg. Thank you for opening this issue. I have added it to our team's backlog for further consideration. We will get back to you when we have more information to share.
@rickle-msft Thanks! May I ask - are you accepting community contributions at all? If so, I am happy to submit a PR for this, since we successfully improved this in our fork. I ask because, while your readme says you do welcome contributions, I have had a quite small and simple PR open since January which nobody has approved, rejected, or commented on. I went to great lengths to ensure it's up to your standards, so it doesn't feel very encouraging that you guys don't even look at it... Can you clarify whether you do in fact welcome contributions, and if not, maybe remove the section of your readme that solicits them?
@DaRosenberg We do accept community contributions. Evidently, we have not been as responsive to all PRs as we ought to have been, and I am very sorry for the frustration this has caused. Let me discuss with my team on Monday morning why we have not reviewed your other PR, and whether we would realistically be able to review a PR you submit for this before you go through the hassle of doing so. Thank you again for your communication, contributions, and patience.
It looks like an implementation of CounterEventAsync made it into our split library some time ago, but hasn't been put to use in the methods you're discussing. I've made some changes and am running tests now. |
@kfarmer-msft That would be sweet. Would love to move off our fork and back onto the official package. |
@DaRosenberg: Please try v9.4.2
@kfarmer-msft Did some perf testing on 9.4.2 today. It is definitely an improvement, but we are not seeing the same near-perfect linear scalability we are getting with our fork. I've reviewed the commit you referenced, and as far as I can see you've covered all the places we needed to change. Probably something else was introduced into the library since then that causes a bottleneck somewhere else. Maybe I'll find the time to do some more line-level profiling some day.
Which service (blob, file, queue, table) does this issue concern?
Blob
Which version of the SDK was used?
9.3.0
Which platform are you using? (ex: .NET Core 2.1)
.NET Core 2.1
What problem was encountered?
The type CounterEvent is used in the stream implementations to wait for all pending operations to finish before returning from a flush operation.
The implementation of `CounterEvent` is synchronous and based on an underlying `ManualResetEvent`. Stream implementations queue up a thread pool operation to wait for the counter to reach zero. This thread pool operation gets scheduled on a dedicated thread, which then blocks for the duration of the wait (azure-storage-net/Lib/WindowsRuntime/Blob/BlobWriteStream.cs, line 192 in 38425e7).
This has turned out to be a significant scalability bottleneck for us: under high concurrency, many of these threads sit blocked in the waiting state for long periods, which quickly leads to thread pool starvation.
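For illustration, here is a minimal C# sketch of the pattern described above. The type and member names (`SyncCounterEvent`, `FlushExample`, `noPendingWrites`) are hypothetical stand-ins, not the SDK's actual source; what matters is the shape: a counter backed by a `ManualResetEvent`, and a flush path that parks a thread pool thread in a blocking wait.

```csharp
using System.Threading;
using System.Threading.Tasks;

// Hypothetical sketch of a synchronous counter event backed by a ManualResetEvent.
internal sealed class SyncCounterEvent
{
    private readonly ManualResetEvent internalEvent = new ManualResetEvent(true);
    private int counter;

    public void Increment()
    {
        if (Interlocked.Increment(ref this.counter) > 0)
        {
            // At least one operation is pending; block waiters.
            this.internalEvent.Reset();
        }
    }

    public void Decrement()
    {
        if (Interlocked.Decrement(ref this.counter) == 0)
        {
            // Last pending operation finished; release waiters.
            this.internalEvent.Set();
        }
    }

    // Blocks the calling thread until the count returns to zero.
    public void Wait() => this.internalEvent.WaitOne();
}

internal static class FlushExample
{
    private static readonly SyncCounterEvent noPendingWrites = new SyncCounterEvent();

    // The blocking wait is wrapped in Task.Run, so a thread pool thread sits
    // blocked inside WaitOne() for the whole flush. Under high concurrency many
    // such threads pile up, which is the thread pool starvation described above.
    public static Task FlushAsync() => Task.Run(() => noPendingWrites.Wait());
}
```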
Have you found a mitigation/solution?
In our fork, we changed the `CounterEvent` implementation to use an `AsyncManualResetEvent` behind the scenes and to provide a `WaitAsync()` method which the stream implementations can use to wait without blocking a whole thread (see the sketch below). An even cleaner alternative could be to replace `CounterEvent` completely with an `AsyncCountdownEvent`.

This change removed the scalability bottleneck for us and allowed us to reach near-perfect linear scalability in our service. Scalability increased by at least 20x (we stopped measuring), and CPU and the capacity of the backend blob storage account are now the only limits.
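For reference, below is a minimal sketch of what an awaitable counter event could look like. It is built directly on `TaskCompletionSource` rather than a library `AsyncManualResetEvent`, and it is an assumed illustration of the approach, not the fork's actual implementation.

```csharp
using System.Threading.Tasks;

// Hypothetical awaitable counter event: WaitAsync() returns a task that completes
// when the count drops back to zero, so callers never block a thread pool thread.
internal sealed class AsyncCounterEvent
{
    private readonly object syncRoot = new object();
    private TaskCompletionSource<bool> completionSource = CreateCompletedSource();
    private int counter;

    private static TaskCompletionSource<bool> CreateCompletedSource()
    {
        var tcs = new TaskCompletionSource<bool>(TaskCreationOptions.RunContinuationsAsynchronously);
        tcs.SetResult(true);
        return tcs;
    }

    public void Increment()
    {
        lock (this.syncRoot)
        {
            if (++this.counter == 1)
            {
                // First pending operation: swap in an incomplete task for new waiters.
                this.completionSource =
                    new TaskCompletionSource<bool>(TaskCreationOptions.RunContinuationsAsynchronously);
            }
        }
    }

    public void Decrement()
    {
        lock (this.syncRoot)
        {
            if (--this.counter == 0)
            {
                // Last pending operation finished: release all waiters.
                this.completionSource.TrySetResult(true);
            }
        }
    }

    // Completes when no operations are pending; awaiting this does not block a thread.
    public Task WaitAsync()
    {
        lock (this.syncRoot)
        {
            return this.completionSource.Task;
        }
    }
}
```

With something along these lines, the flush path can simply `await` the wait task instead of wrapping a blocking wait in `Task.Run`, so no thread is tied up while pending uploads drain.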