Ensure ConcurrentBag's TryTake is linearizable #30947

stephentoub · 2018-07-10T15:25:42Z

For .NET Core 2.0, I ported the ThreadPool's work-stealing implementation to ConcurrentBag, leading to significant throughput and allocation improvements. However, there's a subtle difference between the concurrency guarantees the ThreadPool's implementation provides and what ConcurrentBag needs, which ends up breaking certain usage patterns built on top of ConcurrentBag.

Specifically, ThreadPool's "steal" implementation need not be fully linearizable. It's possible for a thread to see the bag's count as 1 and, while that thread is doing a take/steal, for the count to never drop below 1, yet for the steal to still fail even though there was always an item available. This is ok for the thread pool because it separately tracks a known count of work items in the queues, and if it sees that items are still available after a steal has failed, it tries again. That "try again" logic lives above the work-stealing queue and so didn't make it over to ConcurrentBag, which breaks some usages of ConcurrentBag, in particular cases where a type like BlockingCollection wraps the bag and manages its own count. It's now possible for BlockingCollection to know that there's an item in the bag but then fail to take it, which causes problems such as exceptions being thrown.
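The usage pattern at risk can be sketched as follows. This is only an illustration of the wrapping described above, not a reliable repro of the race: `BagBackedCollectionDemo` and `ProduceAndConsume` are hypothetical names, while `BlockingCollection<T>`, `ConcurrentBag<T>`, `CompleteAdding`, and `GetConsumingEnumerable` are the existing public APIs.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class BagBackedCollectionDemo
{
    // Adds itemCount items from one task while the calling thread takes them.
    // Returns the number of items successfully taken.
    public static int ProduceAndConsume(int itemCount)
    {
        // BlockingCollection tracks how many items are available on its own and
        // only calls TryTake on the wrapped bag once that count says one exists.
        // It therefore relies on the bag's TryTake succeeding at that point.
        using var collection = new BlockingCollection<int>(new ConcurrentBag<int>());

        Task producer = Task.Run(() =>
        {
            for (int i = 0; i < itemCount; i++)
            {
                collection.Add(i);
            }
            collection.CompleteAdding();
        });

        int taken = 0;
        foreach (int item in collection.GetConsumingEnumerable())
        {
            taken++; // the consumer steals from the producer thread's list
        }

        producer.Wait();
        return taken;
    }
}
```

With a linearizable TryTake, every added item should eventually be taken, so the method should return `itemCount`.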

The fix is to port back the relevant portion of ConcurrentBag from .NET Core 1.x / .NET Framework, where local push operations on a list track the number of times the list transitions from empty to non-empty. A steal operation reads those counts before attempting the steal, and if the steal fails, reads them again afterward: if the count has increased, it retries. Unfortunately this means local pushes on small lists are now more expensive than in .NET Core 2.0/2.1, because a push takes the lock whenever there are <= 2 items in the list, but this seems unavoidable given the work-stealing design.
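A minimal sketch of the count-and-retry idea, under simplifying assumptions: every list is a plain `Queue<T>` guarded by a lock (the real work-stealing queue is mostly lock-free), and `TransitionCountingBag`, `AddQueue`, `Push`, and `TrySteal` are hypothetical names used only for illustration.

```csharp
using System.Collections.Generic;
using System.Threading;

public sealed class TransitionCountingBag<T>
{
    // Hypothetical stand-in for the per-thread work-stealing lists.
    private readonly List<Queue<T>> _queues = new List<Queue<T>>();

    // Number of times any list has gone from empty to non-empty.
    private long _emptyToNonEmptyTransitionCount;

    public int AddQueue()
    {
        lock (_queues)
        {
            _queues.Add(new Queue<T>());
            return _queues.Count - 1;
        }
    }

    public void Push(int queueIndex, T item)
    {
        Queue<T> queue;
        lock (_queues) { queue = _queues[queueIndex]; }

        lock (queue)
        {
            queue.Enqueue(item);
            if (queue.Count == 1)
            {
                // Only empty -> non-empty transitions matter: a steal from a
                // list that already had items would have succeeded anyway.
                Interlocked.Increment(ref _emptyToNonEmptyTransitionCount);
            }
        }
    }

    public bool TrySteal(out T item)
    {
        while (true)
        {
            long before = Interlocked.Read(ref _emptyToNonEmptyTransitionCount);

            if (TryStealOnce(out item)) return true;

            // If no list became non-empty during the scan, the bag really was
            // empty at some point; otherwise an item may have been missed.
            if (Interlocked.Read(ref _emptyToNonEmptyTransitionCount) == before) return false;
        }
    }

    private bool TryStealOnce(out T item)
    {
        Queue<T>[] snapshot;
        lock (_queues) { snapshot = _queues.ToArray(); }

        foreach (Queue<T> queue in snapshot)
        {
            lock (queue)
            {
                if (queue.Count > 0)
                {
                    item = queue.Dequeue();
                    return true;
                }
            }
        }

        item = default;
        return false;
    }
}
```

The retry covers the case where a stealer scans a list while it is empty and an item lands in it before the scan of the remaining lists finishes.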

Contributes to https://github.com/dotnet/corefx/issues/30781
cc: @kouvel, @tarekgh, @ReubenBond
Please pay close attention to the details of the change. I've convinced myself that it's correct, but I've also been staring at it for a while, so it's very possible I've missed something. I tried to keep the logic as close to the original .NET Core 1.x / .NET Framework logic as possible. The main difference is that rather than having each list maintain its own empty-to-nonempty transition count, I have a single count that's incremented via interlocked increment by all of them. I did that in part to keep things simpler, even though it adds a bit more expense, but also because I couldn't convince myself that the original logic used by netfx here was actually correct; if you can, we could consider switching to that, which would make some adds slightly less expensive while making steals more expensive.

tarekgh · 2018-07-10T16:21:20Z

src/System.Collections.Concurrent/src/System/Collections/Concurrent/ConcurrentBag.cs

+                // We work around this by looking at the number of times any list transitions from == 0 to > 0,
+                // checking that before and after the steal attempts.  We don't care about > 0 to > 0 transitions,
+                // because a steal from a list with > 0 elements would have been successful.
+                long initialEmptyToNonEmptyCounts = Interlocked.Read(ref _emptyToNonEmptyListTransitionCount);


initialEmptyToNonEmptyCounts

Is it possible that, in some worst case, this operation starves? While this thread is executing here, other threads can push items (which changes _emptyToNonEmptyListTransitionCount) and a third thread can then take each pushed item. That could keep happening, with _emptyToNonEmptyListTransitionCount changing every time, without the current thread ever getting a chance to steal an item. I know this is hypothetical, but it may be possible.

Yes, it's possible, but the chances of that happening are small, and get smaller and smaller the more times it would need to loop. It's also true with ConcurrentBag in netfx today, as this is just adopting the same scheme it has.

tarekgh · 2018-07-10T16:26:01Z

src/System.Collections.Concurrent/src/System/Collections/Concurrent/ConcurrentBag.cs

@@ -62,7 +64,7 @@ public ConcurrentBag(IEnumerable<T> collection)
            WorkStealingQueue queue = GetCurrentThreadWorkStealingQueue(forceCreate: true);
            foreach (T item in collection)
            {
-                queue.LocalPush(item);
+                queue.LocalPush(item, ref _emptyToNonEmptyListTransitionCount);


ref _emptyToNonEmptyListTransitionCount

Why are we passing _emptyToNonEmptyListTransitionCount to LocalPush rather than letting LocalPush access it directly, since it's an object field anyway? I see we always access it with Interlocked operations, so it should be safe. Is this for perf reasons?

_emptyToNonEmptyListTransitionCount is a field on ConcurrentBag; LocalPush is a method on WorkStealingQueue, which doesn't have a back reference to the ConcurrentBag with which it's associated.
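A small sketch of that arrangement, assuming a nested class with no reference to its owner. `OwnerWithSharedCounter` and `Worker` are hypothetical names, and the item storage is reduced to a counter to keep the example short.

```csharp
using System.Threading;

public sealed class OwnerWithSharedCounter
{
    // Lives on the owner, like the transition count on ConcurrentBag.
    private long _transitionCount;

    private readonly Worker _worker = new Worker();

    public long TransitionCount => Interlocked.Read(ref _transitionCount);

    // The owner hands its field to the worker by reference on each call.
    public void Push(int item) => _worker.LocalPush(item, ref _transitionCount);

    public void Pop() => _worker.LocalPop();

    // Nested class with no back reference: it cannot reach _transitionCount
    // on its own, so the caller must pass it in.
    private sealed class Worker
    {
        private int _count; // stand-in for the actual item storage

        public void LocalPush(int item, ref long emptyToNonEmptyTransitions)
        {
            if (_count == 0)
            {
                Interlocked.Increment(ref emptyToNonEmptyTransitions);
            }
            _count++;
        }

        public void LocalPop()
        {
            if (_count > 0) _count--;
        }
    }
}
```

Passing the field by `ref` avoids storing an owner reference in every queue, at the cost of an extra parameter on each push.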

tarekgh · 2018-07-10T16:27:33Z

src/System.Collections.Concurrent/src/System/Collections/Concurrent/ConcurrentBag.cs

        /// <summary>The head work stealing queue in a linked list of queues.</summary>
        private volatile WorkStealingQueue _workStealingQueues;
+        /// <summary>Number of times any list transitions from empty to non-empty.</summary>
+        private long _emptyToNonEmptyListTransitionCount;


long

Does it make any difference if we declare this as ulong instead, just so the number is always positive? No strong feeling, though.

Not really... we're just doing an == check on the two values.

kouvel · 2018-07-11T20:12:06Z

src/System.Collections.Concurrent/src/System/Collections/Concurrent/ConcurrentBag.cs

                            Debug.Assert(_headIndex <= _tailIndex);

-                            _currentOp = (int)Operation.Add;
+                            Interlocked.Exchange(ref _currentOp, (int)Operation.Add); // ensure subsequent reads aren't reordered before this


Wouldn't the surrounding lock already guarantee that?

I was under the impression that releasing a lock has store release semantics rather than being a full barrier. If it is store release, that would prevent any reads/writes from inside the lock from moving out, but it wouldn't prevent a later read from moving up before this write to _currentOp, right?

Oh yeah that's true
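The ordering concern in this thread can be sketched as below. This is an illustration under assumptions, not the actual ConcurrentBag code: `PublishThenCheck`, `_gate`, and `_frozen` are hypothetical, and the sketch relies on `Interlocked.Exchange` acting as a full fence, which is how Interlocked operations are generally described for .NET.

```csharp
using System.Threading;

public sealed class PublishThenCheck
{
    private readonly object _gate = new object();
    private int _currentOp; // 0 = none, 1 = add (hypothetical encoding)
    private int _frozen;    // hypothetical flag set by another thread

    public int CurrentOp => Volatile.Read(ref _currentOp);

    public void Freeze() => Interlocked.Exchange(ref _frozen, 1);

    // Publishes "add in progress", then checks the other thread's flag.
    public bool BeginAddAndSeeFrozen()
    {
        lock (_gate)
        {
            // A plain store here could still be pending when the read after
            // the lock executes, if releasing the lock is only a store
            // release. Interlocked.Exchange keeps that later read from
            // moving ahead of this write.
            Interlocked.Exchange(ref _currentOp, 1);
        }

        // This read happens after the lock has been released.
        return Volatile.Read(ref _frozen) != 0;
    }
}
```

The single-threaded behaviour is trivial; the point of the sketch is only where the fence sits relative to the later read.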

Commit migrated from dotnet/corefx@864e82e

tarekgh reviewed Jul 10, 2018

tarekgh approved these changes Jul 10, 2018

kouvel approved these changes Jul 11, 2018

stephentoub merged commit 864e82e into dotnet:master Jul 12, 2018

stephentoub deleted the cbfix branch July 12, 2018 02:14

stephentoub mentioned this pull request Jul 12, 2018

[release/2.1] Ensure ConcurrentBag's TryTake is linearizable (#30947) #31009

Merged

danmoseley added a commit that referenced this pull request Jul 17, 2018

Revert "Ensure ConcurrentBag's TryTake is linearizable (#30947) (#31009…

3fb080c

…)" This reverts commit b6be5d7.

danmoseley added a commit that referenced this pull request Jul 18, 2018

Revert "Ensure ConcurrentBag's TryTake is linearizable (#30947) (#31009…

778a21f

…)" (#31132) This reverts commit b6be5d7.

karelz added this to the 3.0 milestone Jul 19, 2018
