Performance Regression in Concurrent (Stack/Queue)? #61673

Open
indy-singh opened this issue Nov 16, 2021 · 9 comments
Labels: area-System.Collections, help wanted [up-for-grabs] (Good issue for external contributors), tenet-performance (Performance related issue)
Milestone: Future

@indy-singh commented Nov 16, 2021

Hopefully this is the right place; short and sweet...

Description

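```csharp
using System.Collections.Concurrent;
using System.Threading.Tasks;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

[MemoryDiagnoser] // also report GC collection counts and allocated bytes
public class CBench
{
    private const int ITERATIONS = 10_000_000;

    // Each benchmark adds ITERATIONS ints from a Parallel.For loop,
    // so several threads write to the same collection concurrently.
    [Benchmark]
    public void ConcurrentBag()
    {
        var collection = new ConcurrentBag<int>();

        Parallel.For(0, ITERATIONS, i =>
        {
            collection.Add(i);
        });
    }

    [Benchmark]
    public void ConcurrentStack()
    {
        var collection = new ConcurrentStack<int>();

        Parallel.For(0, ITERATIONS, i =>
        {
            collection.Push(i);
        });
    }

    [Benchmark]
    public void ConcurrentQueue()
    {
        var collection = new ConcurrentQueue<int>();

        Parallel.For(0, ITERATIONS, i =>
        {
            collection.Enqueue(i);
        });
    }
}

public static class Program
{
    // Run from a Release build; BenchmarkDotNet handles warm-up and statistics.
    public static void Main() => BenchmarkRunner.Run<CBench>();
}
```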

Data (.NET Framework 4.8)

```
BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19043.1348 (21H1/May2021Update)
Intel Core i7-6700K CPU 4.00GHz (Skylake), 1 CPU, 8 logical and 4 physical cores
  [Host]     : .NET Framework 4.8 (4.8.4420.0), X64 RyuJIT
  DefaultJob : .NET Framework 4.8 (4.8.4420.0), X64 RyuJIT
```

| Method | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---:|---:|---:|---:|---:|---:|---:|
| ConcurrentBag | 817.8 ms | 15.76 ms | 32.90 ms | 66000.0000 | 23000.0000 | 2000.0000 | 383 MB |
| ConcurrentStack | 982.1 ms | 19.53 ms | 24.69 ms | 54000.0000 | 20000.0000 | 3000.0000 | 307 MB |
| ConcurrentQueue | 428.0 ms | 4.41 ms | 3.68 ms | 15000.0000 | 6000.0000 | 2000.0000 | 82 MB |

Data (.NET 6.0.0)

```
BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19043.1348 (21H1/May2021Update)
Intel Core i7-6700K CPU 4.00GHz (Skylake), 1 CPU, 8 logical and 4 physical cores
.NET SDK=6.0.100
  [Host]     : .NET 6.0.0 (6.0.21.52210), X64 RyuJIT
  DefaultJob : .NET 6.0.0 (6.0.21.52210), X64 RyuJIT
```

| Method | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---:|---:|---:|---:|---:|---:|---:|
| ConcurrentBag | 60.36 ms | 1.187 ms | 2.287 ms | 1375.0000 | 1250.0000 | 1000.0000 | 120 MB |
| ConcurrentStack | 1,308.30 ms | 21.322 ms | 19.945 ms | 54000.0000 | 20000.0000 | 3000.0000 | 305 MB |
| ConcurrentQueue | 1,120.96 ms | 22.321 ms | 34.087 ms | 2000.0000 | 2000.0000 | 2000.0000 | 80 MB |

Haven't done much digging beyond the above. Not really sure where to begin; is there a guide for tracking down perf regressions? Happy to get stuck in :)

Regards,
Indy

@indy-singh added the tenet-performance (Performance related issue) label Nov 16, 2021
@dotnet-issue-labeler (bot) added the area-System.Collections and untriaged (New issue has not been triaged by the area owner) labels Nov 16, 2021
@ghost commented Nov 16, 2021

Tagging subscribers to this area: @dotnet/area-system-collections
See info in area-owners.md if you want to be subscribed.


@indy-singh (Author)

Adding data for .NET Core 3.1.21.

Data (.NET Core 3.1.21)

```
BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19043.1348 (21H1/May2021Update)
Intel Core i7-6700K CPU 4.00GHz (Skylake), 1 CPU, 8 logical and 4 physical cores
.NET SDK=6.0.100
  [Host]     : .NET Core 3.1.21 (CoreCLR 4.700.21.51404, CoreFX 4.700.21.51508), X64 RyuJIT
  DefaultJob : .NET Core 3.1.21 (CoreCLR 4.700.21.51404, CoreFX 4.700.21.51508), X64 RyuJIT
```

| Method | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---:|---:|---:|---:|---:|---:|---:|
| ConcurrentBag | 75.90 ms | 1.422 ms | 2.874 ms | 1857.1429 | 1714.2857 | 1428.5714 | 128 MB |
| ConcurrentStack | 1,255.74 ms | 24.361 ms | 29.000 ms | 54000.0000 | 20000.0000 | 3000.0000 | 305 MB |
| ConcurrentQueue | 384.13 ms | 7.615 ms | 22.335 ms | 2000.0000 | 2000.0000 | 2000.0000 | 80 MB |

@indy-singh (Author)

Adding data for .NET 5.0.12.

Data (.NET 5.0.12)

```
BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19043.1348 (21H1/May2021Update)
Intel Core i7-6700K CPU 4.00GHz (Skylake), 1 CPU, 8 logical and 4 physical cores
.NET SDK=6.0.100
  [Host]     : .NET 5.0.12 (5.0.1221.52207), X64 RyuJIT
  DefaultJob : .NET 5.0.12 (5.0.1221.52207), X64 RyuJIT
```

| Method | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---:|---:|---:|---:|---:|---:|---:|
| ConcurrentBag | 75.85 ms | 1.508 ms | 3.148 ms | 1666.6667 | 1500.0000 | 1333.3333 | 121 MB |
| ConcurrentStack | 1,281.29 ms | 24.748 ms | 21.938 ms | 54000.0000 | 20000.0000 | 3000.0000 | 305 MB |
| ConcurrentQueue | 381.40 ms | 8.557 ms | 25.231 ms | 2000.0000 | 2000.0000 | 2000.0000 | 80 MB |

@EgorBo (Member) commented Nov 16, 2021

Probably related: #56017

@EgorBo (Member) commented Nov 16, 2021

Took a quick look; here is where most of the time is spent:
[Profiler screenshot not preserved]

@EgorBo (Member) commented Nov 16, 2021

The only PR that touched it was #46120 (see the comment in #46714).

@indy-singh changed the title from "Regression in Concurrent (Stack/Queue)?" to "Performance Regression in Concurrent (Stack/Queue)?" Nov 16, 2021
@scalablecory (Contributor)

Looks like the code can be made slightly better by not re-reading _headAndTail.Tail when CompareExchange fails, but that won't make up for the difference...
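The suggestion rests on the fact that `Interlocked.CompareExchange` returns the value it actually found at the target location, whether or not the exchange happened, so a failed attempt already supplies the value to retry with. A minimal sketch of the two loop shapes, using a hypothetical `CasLoop` helper on a plain `int` rather than the real `_headAndTail` fields of `ConcurrentQueue<T>`:

```csharp
using System.Threading;

// Hypothetical helper illustrating a compare-and-swap (CAS) retry loop.
// This is NOT the ConcurrentQueue source; it only shows the general pattern.
public static class CasLoop
{
    // Variant 1: re-reads the shared location on every failed attempt.
    public static int AddRereading(ref int location, int delta)
    {
        while (true)
        {
            int observed = Volatile.Read(ref location);
            if (Interlocked.CompareExchange(ref location, observed + delta, observed) == observed)
            {
                return observed + delta;
            }
        }
    }

    // Variant 2: reuses the value CompareExchange returns (the value it actually
    // found in 'location') instead of issuing another read before retrying.
    public static int AddReusingResult(ref int location, int delta)
    {
        int observed = Volatile.Read(ref location);
        while (true)
        {
            int found = Interlocked.CompareExchange(ref location, observed + delta, observed);
            if (found == observed)
            {
                return observed + delta;
            }
            observed = found; // retry against the freshly returned value
        }
    }
}
```

Both variants still perform one contended atomic operation per attempt on the same memory location, which is presumably why dropping the extra read is expected to be only a small win here.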

@eiriktsarpalis removed the untriaged (New issue has not been triaged by the area owner) label Nov 23, 2021
@eiriktsarpalis added this to the Future milestone Nov 23, 2021
@eiriktsarpalis added the help wanted [up-for-grabs] (Good issue for external contributors) label Nov 23, 2021
@Marluwe commented Feb 14, 2023

Hi everyone,

this is my first active contribution - hope I can actually help :)

I had a look at ConcurrentStack. For me, it looks like the performance regression comes from a modification in the SpinWait. A test with a modified version of SpinWait on .NET 7 gave results similar to .NET Framework 4.8. The modification is motivated by a diff of SpinWait against an earlier version from 2019 (7c0a404).

Modification:

```csharp
private void SpinOnceCore(int sleep1Threshold)
{
    // ...
    if (_count >= sleep1Threshold && sleep1Threshold >= 0)
    {
        Thread.Sleep(1);
    }
    else
    {
        // ...
    }
    // ...
}
```

changed to

```csharp
private void SpinOnceCore(int sleep1Threshold)
{
    // ...
    if (_count >= sleep1Threshold /*&& sleep1Threshold >= 0*/)
    {
        // Allow the sleep1Threshold with value -1 case
        Thread.Sleep(1);
    }
    else
    {
        // ...
    }
    // ...
}
```
```
BenchmarkDotNet=v0.13.4, OS=Windows 10 (10.0.19044.2486/21H2/November2021Update)
Intel Core i7-6700HQ CPU 2.60GHz (Skylake), 1 CPU, 8 logical and 4 physical cores
.NET SDK=7.0.102
  [Host]   : .NET 7.0.2 (7.0.222.60605), X64 RyuJIT AVX2
  .NET 7.0 : .NET 7.0.2 (7.0.222.60605), X64 RyuJIT AVX2

Job=.NET 7.0  Runtime=.NET 7.0
```

| Method | Mean | Error | StdDev | Ratio | RatioSD | Gen0 | Gen1 | Gen2 | Allocated | Alloc Ratio |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| ConcurrentStack | 2.992 s | 0.0529 s | 0.0741 s | 1.00 | 0.00 | 56000.0000 | 29000.0000 | 3000.0000 | 305.94 MB | 1.00 |
| ConcurrentStackAlternative | 2.577 s | 0.0491 s | 0.0821 s | 0.86 | 0.04 | 56000.0000 | 30000.0000 | 4000.0000 | 305.58 MB | 1.00 |

| Method | Job | Runtime | Mean | Error | StdDev | Ratio | RatioSD | Gen0 | Gen1 | Gen2 | Allocated | Alloc Ratio |
|---|---|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| ConcurrentStack | .NET 7.0 | .NET 7.0 | 2.987 s | 0.0589 s | 0.0702 s | 1.00 | 0.00 | 54000.0000 | 28000.0000 | 2000.0000 | 305.6 MB | 1.00 |
| ConcurrentStack | .NET Framework 4.8 | .NET Framework 4.8 | 2.526 s | 0.0473 s | 0.0464 s | 0.85 | 0.02 | 53000.0000 | 19000.0000 | 3000.0000 | 306.93 MB | 1.00 |
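For experimenting with the back-off policy without rebuilding the runtime, a minimal sketch under stated assumptions: a hypothetical Treiber-style lock-free stack, `TinyStack<T>` (not the `ConcurrentStack<T>` source), whose CAS retry loop backs off with the public `SpinWait.SpinOnce(int sleep1Threshold)` overload (.NET Core 3.0 and later). Per that API's documentation, `-1` disables `Thread.Sleep(1)`, while a non-negative value allows it once that many spins have happened.

```csharp
using System.Threading;

// Hypothetical Treiber-style lock-free stack, used only to experiment with
// SpinWait back-off; it is not the runtime's ConcurrentStack<T> implementation.
public sealed class TinyStack<T>
{
    private sealed class Node
    {
        public readonly T Value;
        public Node Next;
        public Node(T value) { Value = value; }
    }

    private Node _head; // top of the stack; null when empty

    // sleep1Threshold is passed straight to SpinWait.SpinOnce:
    // -1 never calls Thread.Sleep(1); a value >= 0 allows it after that many spins.
    public void Push(T value, int sleep1Threshold = -1)
    {
        var node = new Node(value); // one heap allocation per pushed item
        var spin = new SpinWait();
        while (true)
        {
            Node head = Volatile.Read(ref _head);
            node.Next = head;
            // Publish the node only if no other thread changed _head in the meantime.
            if (Interlocked.CompareExchange(ref _head, node, head) == head)
            {
                return;
            }
            spin.SpinOnce(sleep1Threshold); // back off before retrying the CAS
        }
    }

    public bool TryPop(out T value, int sleep1Threshold = -1)
    {
        var spin = new SpinWait();
        while (true)
        {
            Node head = Volatile.Read(ref _head);
            if (head == null)
            {
                value = default(T);
                return false;
            }
            // Nodes are GC-managed and never reused, so the classic ABA problem does not arise.
            if (Interlocked.CompareExchange(ref _head, head.Next, head) == head)
            {
                value = head.Value;
                return true;
            }
            spin.SpinOnce(sleep1Threshold);
        }
    }
}
```

Timing `Push` from a `Parallel.For` loop (as in `CBench`) with `-1` versus a small non-negative threshold may give a rough feel for how much the `Sleep(1)` decision matters under contention; results will depend on core count and load.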


@stephentoub (Member)

> For me, it looks like the performance regression comes from a modification in the SpinWait.

Thanks for taking a look. cc: @kouvel
