
CounterEvent's synchronous design causes thread starvation under load #755

Closed
DaRosenberg opened this issue Aug 14, 2018 · 7 comments

@DaRosenberg

Which service(blob, file, queue, table) does this issue concern?

Blob

Which version of the SDK was used?

9.3.0

Which platform are you using? (ex: .NET Core 2.1)

.NET Core 2.1

What problem was encountered?

The type CounterEvent is used in the stream implementations to wait for all pending operations to finish before returning from a flush operation.

The implementation of CounterEvent is synchronous and based on an underlying ManualResetEvent. Stream implementations queue up a thread pool operation to wait for the counter to reach zero. This thread pool operation gets scheduled on a dedicated thread, which then blocks for the duration of the wait:

await Task.Run(() => this.noPendingWritesEvent.Wait(), cancellationToken);

This has turned out to be a significant scalability bottleneck for us: high concurrency quickly leads to thread pool starvation, because a lot of these threads sit in this waiting state for a long time.
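
For context, here is a minimal sketch of the synchronous pattern as we understand it (CounterEvent is internal to the SDK, so the class and member names below are illustrative rather than the actual SDK code):

```csharp
using System.Threading;

// Illustrative sketch only: the event is signaled whenever the count is zero,
// and Wait() blocks the calling thread until it is signaled.
internal sealed class SyncCounterEventSketch
{
    private readonly ManualResetEventSlim zeroEvent = new ManualResetEventSlim(initialState: true);
    private readonly object sync = new object();
    private int count;

    public void Increment()
    {
        lock (this.sync)
        {
            this.count++;
            this.zeroEvent.Reset();   // pending work exists: un-signal the event
        }
    }

    public void Decrement()
    {
        lock (this.sync)
        {
            if (--this.count == 0)
            {
                this.zeroEvent.Set(); // no pending work left: release waiters
            }
        }
    }

    // Blocks the calling thread. This is the call that ends up wrapped in Task.Run()
    // by the stream implementations, tying up a thread pool thread for the whole wait.
    public void Wait() => this.zeroEvent.Wait();
}
```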

Have you found a mitigation/solution?

In our fork, we changed the CounterEvent implementation to use an AsyncManualResetEvent behind the scenes and to expose a WaitAsync() method, which the stream implementations can use to wait without blocking a whole thread:

await this.noPendingWritesEvent.WaitAsync(cancellationToken);

An even cleaner alternative could be to replace the CounterEvent completely with an AsyncCountdownEvent.
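
To illustrate, here is a minimal sketch of what such an async counter event can look like, built on TaskCompletionSource (the class and member names are made up for this example, and cancellation handling is omitted for brevity):

```csharp
using System.Threading;
using System.Threading.Tasks;

// Illustrative sketch: the counter is backed by a TaskCompletionSource instead of a
// ManualResetEvent, so callers can await completion without blocking a thread.
internal sealed class AsyncCounterEventSketch
{
    private readonly object sync = new object();
    private TaskCompletionSource<bool> zeroTcs = CreateCompletedTcs();
    private int count;

    public void Increment()
    {
        lock (this.sync)
        {
            this.count++;
            if (this.zeroTcs.Task.IsCompleted)
            {
                // Pending work exists again: swap in a fresh, uncompleted TCS.
                this.zeroTcs = new TaskCompletionSource<bool>(
                    TaskCreationOptions.RunContinuationsAsynchronously);
            }
        }
    }

    public void Decrement()
    {
        lock (this.sync)
        {
            if (--this.count == 0)
            {
                this.zeroTcs.TrySetResult(true); // release everyone awaiting WaitAsync()
            }
        }
    }

    // Returns a task that completes when the count reaches zero; no thread is blocked.
    // A real implementation would also register on the cancellation token.
    public Task WaitAsync(CancellationToken cancellationToken = default)
    {
        lock (this.sync)
        {
            return this.zeroTcs.Task;
        }
    }

    private static TaskCompletionSource<bool> CreateCompletedTcs()
    {
        var tcs = new TaskCompletionSource<bool>(TaskCreationOptions.RunContinuationsAsynchronously);
        tcs.SetResult(true);
        return tcs;
    }
}
```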

This change removed the scalability bottleneck for us and allowed our service to scale almost perfectly linearly. Throughput increased by at least 20x (we stopped measuring); CPU and the capacity of the backend blob storage account are now the only limits.

@rickle-msft
Contributor

Hi, @DaRosenberg. Thank you for opening this issue. I have added it to our team's backlog for further consideration. We will get back to you when we have more information to share.

@DaRosenberg
Author

@rickle-msft Thanks! May I ask: are you accepting community contributions at all? If so, I am happy to submit a PR for this, since we have already made this improvement successfully in our fork.

I ask because, while your readme says you welcome contributions, I have had a quite small and simple PR open since January that nobody has approved, rejected, or commented on. I went to great lengths to ensure it meets your standards, so it is not very encouraging that nobody has even looked at it...

Can you clarify whether you do in fact welcome contributions, and if not, maybe remove the section of your readme which solicits them?

@rickle-msft
Contributor

@DaRosenberg We do accept community contributions. Evidently, we have not been as responsive to all PRs as we ought to have been, and I am very sorry for the frustration this has caused. Let me discuss with my team on Monday morning why we have not reviewed your other PR, and whether we would realistically be able to review a PR you submit for this before you go through the hassle of doing so.

Thank you again for your communication, contributions, and patience.

@kfarmer-msft
Contributor

@DaRosenberg

It looks like an implementation of CounterEventAsync made it into our split library some time ago, but it hasn't been put to use in the methods you're discussing. I've made some changes and am running tests now.

@DaRosenberg
Author

@kfarmer-msft That would be sweet. Would love to move off our fork and back onto the official package.

kfarmer-msft added a commit that referenced this issue Dec 4, 2018
@kfarmer-msft
Contributor

@DaRosenberg: Please try v9.4.2

@DaRosenberg
Author

DaRosenberg commented Dec 23, 2018

@kfarmer-msft Did some perf testing on 9.4.2 today. It is definitely an improvement, but we are not seeing the same near-perfect linear scalability we are getting with our fork.

I've reviewed the commit you referenced, and as far as I can see you've covered all the places we needed to change. Probably something else has been introduced into the library since then that creates a bottleneck somewhere else. Maybe I'll find the time to do some more line-level profiling some day.
