This repository has been archived by the owner on Jan 23, 2023. It is now read-only.
corefx (public archive)

Ensure ConcurrentBag's TryTake is linearizable #30947

Merged: 1 commit merged into dotnet:master on Jul 12, 2018

Conversation

@stephentoub (Member) commented Jul 10, 2018

For .NET Core 2.0, I ported the ThreadPool's work-stealing implementation to ConcurrentBag, leading to significant throughput and allocation improvements. However, there's a subtle difference between the concurrency guarantees the ThreadPool's implementation provided and what ConcurrentBag needs, which ends up breaking certain usage patterns built on top of ConcurrentBag.

Specifically, ThreadPool's "steal" implementation need not be fully linearizable. It's possible for a thread to see the bag's count as 1, and then, while that thread is doing a take/steal, for the count to never drop below 1 yet for the steal to still fail, even though there was always an item available. This is OK for the thread pool, because it separately manages a known count of work items in the queues, and if it sees that there are still items available after a steal has failed, it tries again. That "try again" logic lived above the work-stealing queue and thus didn't make it over to ConcurrentBag, which breaks some usages of ConcurrentBag, in particular cases where a type like BlockingCollection wraps the bag and manages its own count. It's now possible for BlockingCollection to know that there's an item in the bag but then fail to take it, which causes problems such as exceptions being thrown.

The fix is to port back the relevant portion of ConcurrentBag from .NET Core 1.x / .NET Framework, where local push operations on a list track the number of times the list transitions from empty to non-empty. A steal operation then looks at those counts prior to doing the steal, and if the steal fails, it looks again afterward: if the count has increased, it retries. This unfortunately means that local pushes on small lists are now more expensive than in .NET Core 2.0/2.1, because a local push takes the lock when there are <= 2 items in the list, but this seems unavoidable given the work-stealing design.

Contributes to https://github.com/dotnet/corefx/issues/30781
cc: @kouvel, @tarekgh, @ReubenBond
Please pay close attention to the details of the change. I've convinced myself that it's correct, but I've also been staring at it for a while, so it's very possible I've missed something. I tried to keep the logic as close to the original .NET Core 1.x / .NET Framework logic as possible. The main difference is that rather than having each list maintain its own empty-to-non-empty transition count, I have a single count that all of them increment via interlocked increment. I did that in part to keep things simpler, even though it adds a bit more expense, but also because I couldn't convince myself that the original logic used by netfx here was actually correct; if you can, we could consider switching to that, which would make some adds slightly less expensive while making steals more expensive.
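The transition-count scheme described above can be sketched as follows. This is a deliberately simplified, hypothetical model (a single locked list standing in for the per-thread work-stealing queues; the type and member names are invented), not the actual corefx implementation:

```csharp
using System.Collections.Generic;
using System.Threading;

// Hypothetical sketch of the scheme: pushes that take a list from empty to
// non-empty bump a shared counter; a failed take re-reads the counter and
// retries if it moved, since new work may have appeared mid-steal.
public class TransitionCountedBag<T>
{
    private readonly List<T> _items = new List<T>(); // stand-in for the work-stealing queues
    private long _emptyToNonEmptyListTransitionCount;

    public void Add(T item)
    {
        lock (_items)
        {
            if (_items.Count == 0)
            {
                // Empty -> non-empty transition: record it so concurrent
                // takers know an item may have appeared.
                Interlocked.Increment(ref _emptyToNonEmptyListTransitionCount);
            }
            _items.Add(item);
        }
    }

    public bool TryTake(out T item)
    {
        while (true)
        {
            long before = Interlocked.Read(ref _emptyToNonEmptyListTransitionCount);
            if (TryTakeCore(out item))
            {
                return true;
            }
            // The take failed. If no empty -> non-empty transition happened
            // in the meantime, the bag was genuinely empty; otherwise retry.
            if (Interlocked.Read(ref _emptyToNonEmptyListTransitionCount) == before)
            {
                return false;
            }
        }
    }

    private bool TryTakeCore(out T item)
    {
        lock (_items)
        {
            if (_items.Count > 0)
            {
                item = _items[_items.Count - 1];
                _items.RemoveAt(_items.Count - 1);
                return true;
            }
        }
        item = default(T);
        return false;
    }
}
```

In the real change the counter is compared before and after a round of steals across all queues; the single-list model here only shows the before/after comparison that makes a failed take safe to report as "empty".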

// We work around this by looking at the number of times any list transitions from == 0 to > 0,
// checking that before and after the steal attempts. We don't care about > 0 to > 0 transitions,
// because a steal from a list with > 0 elements would have been successful.
long initialEmptyToNonEmptyCounts = Interlocked.Read(ref _emptyToNonEmptyListTransitionCount);
Member, on `initialEmptyToNonEmptyCounts`:

Is it possible that in some worst case this operation can starve? I mean that while executing here, other threads can push items (which changes _emptyToNonEmptyListTransitionCount), then some third thread takes the pushed item, and this can keep happening without the current thread ever getting a chance to steal an item, all while _emptyToNonEmptyListTransitionCount keeps changing. I know this is hypothetical, but it may be possible.

Member Author:

Yes, it's possible, but the chances of that happening are small, and they get smaller and smaller the more times it would need to loop. It's also true of ConcurrentBag in netfx today, as this is just adopting the same scheme it has.

@@ -62,7 +64,7 @@ public ConcurrentBag(IEnumerable<T> collection)
WorkStealingQueue queue = GetCurrentThreadWorkStealingQueue(forceCreate: true);
foreach (T item in collection)
{
-                queue.LocalPush(item);
+                queue.LocalPush(item, ref _emptyToNonEmptyListTransitionCount);
@tarekgh (Member) commented Jul 10, 2018, on `ref _emptyToNonEmptyListTransitionCount`:

Why are we passing _emptyToNonEmptyListTransitionCount to LocalPush rather than letting LocalPush access it directly, since it's an object field anyway? I see we always access it with an interlocked operation, so it should be safe. Is this for perf reasons?

Member Author:

_emptyToNonEmptyListTransitionCount is a field on ConcurrentBag; LocalPush is a method on WorkStealingQueue, which doesn't have a back reference to the ConcurrentBag with which it's associated.
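As a minimal illustration of that constraint (hypothetical names, not the actual corefx types), the nested queue type takes the counter as a `ref` parameter precisely because it has no reference back to its owning bag:

```csharp
using System.Threading;

// Illustrative sketch: the owning Bag threads its counter field through the
// call, because the nested Queue holds no back-reference to the Bag.
public class Bag
{
    private long _transitionCount;
    private readonly Queue _queue = new Queue();

    public void Add(int item) => _queue.LocalPush(item, ref _transitionCount);
    public long TransitionCount => Interlocked.Read(ref _transitionCount);

    private class Queue
    {
        private int _count;

        public void LocalPush(int item, ref long transitionCount)
        {
            // No way to reach the owning Bag from here; the shared counter
            // arrives as a ref parameter instead.
            if (_count == 0)
            {
                Interlocked.Increment(ref transitionCount);
            }
            _count++;
        }
    }
}
```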

/// <summary>The head work stealing queue in a linked list of queues.</summary>
private volatile WorkStealingQueue _workStealingQueues;
/// <summary>Number of times any list transitions from empty to non-empty.</summary>
private long _emptyToNonEmptyListTransitionCount;
Member, on the `long` declaration:

Does it make any difference if we have this as ulong instead, just to always have a positive number? No strong feeling, though.

Member Author:

Not really... we're just doing an == check on the two values.

Debug.Assert(_headIndex <= _tailIndex);

-            _currentOp = (int)Operation.Add;
+            Interlocked.Exchange(ref _currentOp, (int)Operation.Add); // ensure subsequent reads aren't reordered before this
Member:

Wouldn't the surrounding lock already guarantee that?

Member Author:

I was under the impression that releasing a lock has store release semantics rather than being a full barrier. If it is store release, that would prevent any reads/writes from inside the lock from moving out, but it wouldn't prevent a later read from moving up before this write to _currentOp, right?

Member:

Oh yeah that's true
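The store-release vs. full-fence distinction discussed in this thread can be sketched as follows (a simplified, hypothetical type; only the `_currentOp`/`_headIndex` names come from the actual code):

```csharp
using System.Threading;

// Sketch of the reordering concern: with a plain (or even volatile) store to
// _currentOp, a later read of _headIndex may be hoisted above the store
// (store-load reordering). Interlocked.Exchange is a full fence, so reads
// that follow it cannot move before the store.
class OperationGuard
{
    private int _currentOp;
    private int _headIndex;

    public int BeginAdd()
    {
        // Not sufficient on its own; the read below could move above it:
        //     _currentOp = 1;
        Interlocked.Exchange(ref _currentOp, 1);  // full barrier
        return Volatile.Read(ref _headIndex);     // guaranteed to happen after the store
    }

    public int CurrentOp => Volatile.Read(ref _currentOp);
}
```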

@stephentoub stephentoub merged commit 864e82e into dotnet:master Jul 12, 2018
@stephentoub stephentoub deleted the cbfix branch July 12, 2018 02:14
stephentoub added a commit to stephentoub/corefx that referenced this pull request Jul 12, 2018
danmoseley pushed a commit that referenced this pull request Jul 17, 2018
danmoseley added a commit that referenced this pull request Jul 17, 2018
danmoseley added a commit that referenced this pull request Jul 18, 2018
@karelz karelz added this to the 3.0 milestone Jul 19, 2018
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
Commit migrated from dotnet/corefx@864e82e