
built-in accurate and cross platform Memory Diagnoser #284

Merged: 14 commits merged into master from universalMemoryDiagnoser, Nov 25, 2016

Conversation

adamsitnik
Member

@adamsitnik adamsitnik commented Oct 16, 2016

Fixed issues:
#186 - don't include allocations from setup and cleanup
#200 - be accurate about allocated bytes/op (possible mostly thanks to #277)
#208 - without using ETW, tests now pass on AppVeyor
#133 - scale the results, make them stable

Extras:

  • cross platform Memory Diagnoser
  • Memory Diagnoser as part of BenchmarkDotNet.Core, enabled by default

Todos:

  1. Recently MS added GC.GetAllocatedBytesForCurrentThread to the public API surface. We can't use it yet, because that version of the Runtime is not yet available on nuget.org. As soon as they release it, it should be relatively easy to adopt. This will give us bytes allocated per operation for .NET Core as well. Done
  2. When we benchmark a Task-returning method we call .GetAwaiter, which most probably allocates memory. This should be verified; if it's true, the cost should be excluded from the final results. Verified: the awaiter is a struct, no extra allocations ;)

@adamsitnik
Member Author

@mattwarren @AndreyAkinshin any comments on the PR?

@AndreyAkinshin
Member

@adamsitnik, the source code looks good to me. Give me some time, I will check how it works on Linux.

// "This instance Int64 property returns the number of bytes that have been allocated
// by a specific AppDomain. The number is accurate as of the last garbage collection." - CLR via C#
// so we enforce GC.Collect here just to make sure we get accurate results
GC.Collect();
return AppDomain.CurrentDomain.MonitoringTotalAllocatedMemorySize;


if (results.ContainsKey(benchmark))
{
var result = results[benchmark];
// TODO scale this based on the minimum value in the column, i.e. use B/KB/MB as appropriate


@mattwarren
Contributor

@adamsitnik Finally had a chance to play with this, it's really nice, great job!!

I've tested out scenarios similar to #186 and #200 and I agree that they're fixed.

I also tested #133 (reported by @xoofx). To do this I ran the IntroGC benchmark 3 times with a modified build that prints out the total # of ops per benchmark, and got the following results:

                  Method | GcServer | GcForce |         Run 1 |         Run 2 |         Run 3 |
------------------------ |--------- |-------- |-------------- |-------------- |-------------- |
 'stackalloc byte[10KB]' |    False |    True | 1,006,632,960 | 1,006,632,960 | 1,006,632,960 |
 'stackalloc byte[10KB]' |    False |   False | 1,006,632,960 | 1,006,632,960 | 1,006,632,960 |
 'stackalloc byte[10KB]' |     True |    True |   503,316,480 |   503,316,480 |   503,316,480 |
 'stackalloc byte[10KB]' |     True |   False |   503,316,480 |   503,316,480 |   503,316,480 |
        'new byte[10KB]' |    False |    True |     7,864,320 |     7,864,320 |     7,864,320 |
        'new byte[10KB]' |    False |   False |     7,864,320 |     7,864,320 |     7,864,320 |
        'new byte[10KB]' |     True |    True |     7,864,320 |     7,864,320 |     7,864,320 |
        'new byte[10KB]' |     True |   False |     7,864,320 |     7,864,320 |     7,864,320 |

Also, as you can see in the outputs below, the # of Gen 0 collections and Bytes Allocated/Op are stable across the runs (full output: Run 1, Run 2, Run 3).


Run 1

                  Method | GcServer | GcForce |        Mean |     StdErr |      StdDev |      Median |     Gen 0 | Gen 1 | Gen 2 | Bytes Allocated/Op |
------------------------ |--------- |-------- |------------ |----------- |------------ |------------ |---------- |------ |------ |------------------- |
 'stackalloc byte[10KB]' |    False |    True |   1.0580 ns |  0.1378 ns |   1.0671 ns |   1.0284 ns |         - |     - |     - |               0.00 |
 'stackalloc byte[10KB]' |    False |   False |   1.0666 ns |  0.1389 ns |   1.0758 ns |   1.0266 ns |         - |     - |     - |               0.00 |
 'stackalloc byte[10KB]' |     True |    True |   1.0732 ns |  0.1399 ns |   1.0833 ns |   1.0236 ns |         - |     - |     - |               0.00 |
 'stackalloc byte[10KB]' |     True |   False |   1.0801 ns |  0.1408 ns |   1.0903 ns |   1.0410 ns |         - |     - |     - |               0.00 |
        'new byte[10KB]' |    False |    True | 108.7057 ns | 14.2016 ns | 110.0048 ns | 102.1161 ns | 24,835.00 |     - |     - |          10,048.00 |
        'new byte[10KB]' |    False |   False | 110.6094 ns | 14.4250 ns | 111.7358 ns | 106.6167 ns | 25,045.00 |     - |     - |          10,048.00 |
        'new byte[10KB]' |     True |    True | 155.6753 ns | 20.5542 ns | 159.2124 ns | 139.0744 ns |  1,131.00 |     - |     - |          10,042.00 |
        'new byte[10KB]' |     True |   False | 168.1659 ns | 22.5059 ns | 174.3302 ns | 137.8017 ns |  1,330.00 |     - |     - |          10,042.00 |

Run 2

                  Method | GcServer | GcForce |        Mean |     StdErr |      StdDev |      Median |     Gen 0 | Gen 1 | Gen 2 | Bytes Allocated/Op |
------------------------ |--------- |-------- |------------ |----------- |------------ |------------ |---------- |------ |------ |------------------- |
 'stackalloc byte[10KB]' |    False |    True |   1.0694 ns |  0.1392 ns |   1.0786 ns |   1.0444 ns |         - |     - |     - |               0.00 |
 'stackalloc byte[10KB]' |    False |   False |   1.0722 ns |  0.1396 ns |   1.0814 ns |   1.0501 ns |         - |     - |     - |               0.00 |
 'stackalloc byte[10KB]' |     True |   False |   1.0765 ns |  0.1402 ns |   1.0861 ns |   1.0419 ns |         - |     - |     - |               0.00 |
 'stackalloc byte[10KB]' |     True |    True |   1.0790 ns |  0.1405 ns |   1.0883 ns |   1.0459 ns |         - |     - |     - |               0.00 |
        'new byte[10KB]' |    False |    True | 111.6729 ns | 14.6014 ns | 113.1016 ns | 102.3328 ns | 24,831.00 |     - |     - |          10,048.00 |
        'new byte[10KB]' |    False |   False | 112.5393 ns | 14.6706 ns | 113.6381 ns | 106.2042 ns | 25,045.00 |     - |     - |          10,048.00 |
        'new byte[10KB]' |     True |   False | 158.9486 ns | 21.4685 ns | 166.2941 ns | 128.0673 ns |  1,331.00 |     - |     - |          10,042.00 |
        'new byte[10KB]' |     True |    True | 161.1846 ns | 21.2168 ns | 164.3446 ns | 137.8021 ns |  1,132.00 |     - |     - |          10,042.00 |

Run 3

                  Method | GcServer | GcForce |        Mean |     StdErr |      StdDev |      Median |     Gen 0 | Gen 1 | Gen 2 | Bytes Allocated/Op |
------------------------ |--------- |-------- |------------ |----------- |------------ |------------ |---------- |------ |------ |------------------- |
 'stackalloc byte[10KB]' |    False |   False |   1.0690 ns |  0.1392 ns |   1.0783 ns |   1.0454 ns |         - |     - |     - |               0.00 |
 'stackalloc byte[10KB]' |     True |    True |   1.0748 ns |  0.1400 ns |   1.0842 ns |   1.0400 ns |         - |     - |     - |               0.00 |
 'stackalloc byte[10KB]' |     True |   False |   1.0772 ns |  0.1403 ns |   1.0866 ns |   1.0403 ns |         - |     - |     - |               0.00 |
 'stackalloc byte[10KB]' |    False |    True |   1.0872 ns |  0.1423 ns |   1.1023 ns |   1.0260 ns |         - |     - |     - |               0.00 |
        'new byte[10KB]' |    False |    True | 108.5197 ns | 14.1317 ns | 109.4635 ns | 105.7219 ns | 24,839.00 |     - |     - |          10,048.00 |
        'new byte[10KB]' |    False |   False | 109.8449 ns | 14.3067 ns | 110.8190 ns | 106.2894 ns | 25,045.00 |     - |     - |          10,048.00 |
        'new byte[10KB]' |     True |    True | 154.6476 ns | 20.1560 ns | 156.1277 ns | 148.9501 ns |  1,114.00 |     - |     - |          10,042.00 |
        'new byte[10KB]' |     True |   False | 158.9589 ns | 20.7968 ns | 161.0914 ns | 142.1361 ns |  1,326.00 |     - |     - |          10,042.00 |

@adamsitnik
Member Author

@mattwarren Thanks for the review and feedback!

There is one thing that I am not sure how to solve; please consider the following scenario:

A user wants to compare two different methods in terms of CPU and memory. For time and bytes allocated/op we give results that can be compared, because they are = sum / operationsCount.

However, this is not true for the Gen 0/1/2 stats. Two benchmarks might be executed a different number of times, but we don't scale the GC collection counts. Let's say the first benchmark is executed 10k times and has 100 Gen 0 collections, while the other is executed 20k times and also has 100 Gen 0 collections. If the user looks at the GC column, he or she might think both exert the same GC pressure, but the one executed 10k times is actually two times worse per operation.

@mattwarren @AndreyAkinshin We should most probably scale the results. What do you think? What should we call such a column?
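To make the problem concrete, the arithmetic can be sketched like this (hypothetical numbers, not taken from the benchmarks above):

```csharp
// Two benchmarks report the same raw Gen 0 count...
long gen0CountA = 100, operationsA = 10000;  // benchmark A
long gen0CountB = 100, operationsB = 20000;  // benchmark B

// ...but per operation the GC pressure differs by 2x:
double gen0PerOpA = (double)gen0CountA / operationsA;  // 0.01
double gen0PerOpB = (double)gen0CountB / operationsB;  // 0.005
```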

@xoofx
Member

xoofx commented Nov 3, 2016

Yes, a total amount for Gen0/Gen1/Gen2 doesn't make sense, because you can't compare the numbers when the op count changes (typically issue #133); they should be converted to a gcCount/op instead.

@adamsitnik
Member Author

Ok, I have updated the code. It's very stable now and fixes #133. Sample output:

                  Method | GcServer | GcForce |        Mean |     StdErr |      StdDev |      Median | Gen 0/op | Gen 1/op | Gen 2/op | Bytes Allocated/Op |
------------------------ |--------- |-------- |------------ |----------- |------------ |------------ |--------- |--------- |--------- |------------------- |
 'stackalloc byte[10KB]' |    False |    True |   1.2132 ns |  0.1581 ns |   1.2243 ns |   1.1721 ns |        - |        - |        - |                  0 |
 'stackalloc byte[10KB]' |    False |   False |   1.2206 ns |  0.1591 ns |   1.2323 ns |   1.1408 ns |        - |        - |        - |                  0 |
 'stackalloc byte[10KB]' |     True |   False |   1.2342 ns |  0.1608 ns |   1.2459 ns |   1.1820 ns |        - |        - |        - |                  0 |
 'stackalloc byte[10KB]' |     True |    True |   1.2348 ns |  0.1616 ns |   1.2517 ns |   1.1148 ns |        - |        - |        - |                  0 |
        'new byte[10KB]' |    False |    True | 131.1965 ns | 17.2203 ns | 133.3878 ns | 118.7028 ns | 0.003157 |        - |        - |             10,048 |
        'new byte[10KB]' |    False |   False | 131.3826 ns | 17.1396 ns | 132.7630 ns | 125.7887 ns | 0.003185 |        - |        - |             10,048 |
        'new byte[10KB]' |     True |   False | 234.2503 ns | 32.6858 ns | 253.1828 ns | 159.1990 ns | 0.000168 |        - |        - |             10,042 |
        'new byte[10KB]' |     True |    True | 250.3041 ns | 35.5450 ns | 275.3304 ns | 150.8811 ns | 0.000118 |        - |        - |             10,042 |

@xoofx
Member

xoofx commented Nov 3, 2016

@adamsitnik amazing! Wondering if we should scale GenX/op by 1k ops... we will most likely always get a GC only after a certain number of ops, and barely ever for just 1 op (though this could happen with a loop doing lots of allocations...)

@mattwarren
Contributor

Wondering if we should scale Genx/op by 1k op

Agree, I was going to propose something similar. Scaling GenX to per-op gives pretty small values; I think 'GenX/1k op' should be okay for most/all scenarios.

@adamsitnik
Member Author

@xoofx @mattwarren Thanks for the ideas!

Initially I started with / 1k op:

                  Method | GcServer | GcForce |        Mean |     StdErr |      StdDev |      Median | Gen 0/1k op | Gen 1/1k op | Gen 2/1k op | Bytes Allocated/Op |
------------------------ |--------- |-------- |------------ |----------- |------------ |------------ |------------ |------------ |------------ |------------------- |
 'stackalloc byte[10KB]' |    False |    True |   1.1611 ns |  0.1513 ns |   1.1719 ns |   1.1014 ns |           - |           - |           - |                  0 |
 'stackalloc byte[10KB]' |     True |    True |   1.1684 ns |  0.1522 ns |   1.1791 ns |   1.0974 ns |           - |           - |           - |                  0 |
 'stackalloc byte[10KB]' |    False |   False |   1.1904 ns |  0.1554 ns |   1.2034 ns |   1.0478 ns |           - |           - |           - |                  0 |
 'stackalloc byte[10KB]' |     True |   False |   1.1938 ns |  0.1560 ns |   1.2085 ns |   1.0838 ns |           - |           - |           - |                  0 |
        'new byte[10KB]' |    False |   False | 118.1717 ns | 15.3887 ns | 119.2000 ns | 114.3850 ns |    3.184636 |           - |           - |             10,048 |
        'new byte[10KB]' |    False |    True | 119.3012 ns | 15.5924 ns | 120.7781 ns | 111.0194 ns |    3.158442 |           - |           - |             10,048 |
        'new byte[10KB]' |     True |    True | 193.8840 ns | 25.3369 ns | 196.2585 ns | 173.9980 ns |    0.140254 |           - |           - |             10,042 |
        'new byte[10KB]' |     True |   False | 238.4089 ns | 77.1975 ns | 597.9696 ns | 105.6934 ns |    0.167592 |           - |           - |             10,042 |

but then I decided to try the per mille placeholder:

                  Method | GcServer | GcForce |        Mean |     StdErr |      StdDev |      Median |  Gen 0 | Gen 1 | Gen 2 | Bytes Allocated/Op |
------------------------ |--------- |-------- |------------ |----------- |------------ |------------ |------- |------ |------ |------------------- |
 'stackalloc byte[10KB]' |    False |   False |   1.1704 ns |  0.1525 ns |   1.1811 ns |   1.0855 ns |      - |     - |     - |                  0 |
 'stackalloc byte[10KB]' |    False |    True |   1.1802 ns |  0.1543 ns |   1.1954 ns |   1.0718 ns |      - |     - |     - |                  0 |
 'stackalloc byte[10KB]' |     True |    True |   1.1890 ns |  0.1550 ns |   1.2007 ns |   1.0957 ns |      - |     - |     - |                  0 |
 'stackalloc byte[10KB]' |     True |   False |   1.1931 ns |  0.1555 ns |   1.2045 ns |   1.1226 ns |      - |     - |     - |                  0 |
        'new byte[10KB]' |    False |    True | 116.5610 ns | 15.1830 ns | 117.6073 ns | 113.6389 ns | 3.158‰ |     - |     - |             10,048 |
        'new byte[10KB]' |    False |   False | 122.1617 ns | 15.9805 ns | 123.7846 ns | 114.1776 ns | 3.185‰ |     - |     - |             10,048 |
        'new byte[10KB]' |     True |   False | 198.6071 ns | 27.4427 ns | 212.5701 ns | 162.8640 ns | 0.168‰ |     - |     - |             10,042 |
        'new byte[10KB]' |     True |    True | 271.9210 ns | 54.2128 ns | 419.9302 ns | 101.6848 ns | 0.119‰ |     - |     - |             10,042 |

What do you think? Which option is better? Personally I like the per-mille approach, but I am afraid that people reading the console output might be misled by the ‰ sign.

… + reduce the column's name length (everything is per operation now)
@mattwarren
Contributor

mattwarren commented Nov 4, 2016

I find the ‰ character really hard to read; I had to set my browser zoom to 200% before I could make it out! I don't know what to suggest as a good unit of measure though.

Either way, I think it would be good to add a note in the "Diagnostic Output" section explaining how the calculations are done, something like:

        'new byte[10KB]' |     True |    True | 193.8840 ns | 25.3369 ns | 196.2585 ns | 173.9980 ns |    0.140254 |           - |           - |             10,042 |
        'new byte[10KB]' |     True |   False | 238.4089 ns | 77.1975 ns | 597.9696 ns | 105.6934 ns |    0.167592 |           - |           - |             10,042 |

// * Diagnostic Output - MemoryDiagnoser *
Note: the Gen 0/1/2 Measurements are per ???? Operations

Conflicts:
	samples/BenchmarkDotNet.Samples/Intro/IntroGcMode.cs
	src/BenchmarkDotNet.Core/Engines/Engine.cs
	src/BenchmarkDotNet.Core/Engines/RunResults.cs
	tests/BenchmarkDotNet.IntegrationTests/CustomEngineTests.cs
	tests/BenchmarkDotNet.IntegrationTests/MemoryDiagnoserTests.cs
private static Func<long> GetAllocatedBytesForCurrentThread()
{
// for some versions of .NET Core this method is internal,
// for some public and for others public and exposed ;)


@adamsitnik
Member Author

@AndreyAkinshin @mattwarren I have finished working on this PR. Could you do a code review?

@AndreyAkinshin We have no beta package dependency now, so we can release 0.10.1 to nuget.org. Due to the update to netcoreapp1.1 we need a new SDK to be installed; that's why tests on AppVeyor may fail until they install it.

@adamsitnik adamsitnik mentioned this pull request Nov 18, 2016
@AndreyAkinshin
Member

@adamsitnik, nice job! Will do a review on this weekend.

@AndreyAkinshin
Member

@adamsitnik, everything looks great! It prints correct results even on Linux. However, I have some additional minor requests:

  • What should happen when working with MemoryDiagnoser + MonoJob? For now, IntroGcMode throws a strange exception when we try to run it against Mono. Could we print a friendly message in this case?
  • I'm not sure about the hardcoded 1k-scaling for GCCollectionColumn. We could have one method with 0.0001 collections per operation and another with 1000 collections per operation. I think we need adaptive scaling logic here (the same as we have in TimeUnit: we choose ns/us/ms/s based on the obtained measurements).
  • Why do we always print bytes in the AllocationColumn? What if a method allocates KB or MB (a usual situation in macrobenchmarking)? Could we also use the adaptive approach here? (Check out the implementation of TimeUnit/TimeInterval and FrequencyUnit/Frequency; probably we could do the same for Memory.)
  • Could you also update the documentation?
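For the adaptive-unit idea, a rough sketch of what such a column could do (a hypothetical `FormatAllocated` helper mirroring the TimeUnit approach, not the actual BenchmarkDotNet implementation):

```csharp
// Pick B/KB/MB/GB the same way TimeUnit picks ns/us/ms/s for times:
// divide by 1024 until the value falls into a readable range.
static string FormatAllocated(long bytes)
{
    string[] units = { "B", "KB", "MB", "GB" };
    double value = bytes;
    int unit = 0;
    while (value >= 1024 && unit < units.Length - 1)
    {
        value /= 1024;
        unit++;
    }
    return $"{value:0.##} {units[unit]}";
}

// FormatAllocated(512)   -> "512 B"
// FormatAllocated(10048) -> "9.81 KB"
```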

@AndreyAkinshin
Member

@adamsitnik: additional thoughts about columns/scaling/hints. We have the following problem: it's hard to explain the meaning of some columns with a small number of characters. Maybe each column could provide a Legend value which would be printed after the table (when it's not empty). What do you think?

@adamsitnik
Member Author

@AndreyAkinshin thanks for the review! I'll address your comments.

I agree that we need some explanation, especially for the Gen 0/1/2 columns, where the contained values are not what they used to be anymore. I'll see what I can do.

Conflicts:
	src/BenchmarkDotNet.Core/Engines/RunResults.cs
	src/BenchmarkDotNet.Core/Reports/BenchmarkReport.cs
@adamsitnik
Member Author

@AndreyAkinshin I updated the docs and added a smarter way of formatting the allocated memory. But I am still not sure about scaling the Gen collection counts. The problem here is that we have no unit :/

I tried to reproduce the Mono issue and it worked fine for me. Could you provide some more details? OS etc.

           Method |        Mean |     StdErr |      StdDev |      Median |  Gen 0 | Allocated |
----------------- |------------ |----------- |------------ |------------ |------- |---------- |
 'new byte[10KB]' | 884.4896 ns | 46.3528 ns | 245.2762 ns | 776.4237 ns | 0.1183 |     10 kB |

@mattwarren
Contributor

BTW, this is what JMH does, I'm not saying we should do this, but it might prompt some ideas!

[image: screenshot of JMH's output]

@AndreyAkinshin
Member

Here the problem is that we have no unit :/

Yes, it's a problem. =(

I tried to reproduce the Mono issue and it worked.

Ok, I will debug it myself.

BTW, this is what JMH does

@mattwarren, thanks for the input. Will think about it.

@AndreyAkinshin
Member

AndreyAkinshin commented Nov 24, 2016

Guys, I don't know the best way to do it. Let's merge the PR, release a new version, and try to use it.
@mattwarren, are you happy with the new MemoryDiagnoser?

@adamsitnik
Member Author

@AndreyAkinshin I really like this idea!

@mattwarren
Contributor

@mattwarren, are you happy with the new MemoryDiagnoser?

I'll have a play around with the latest version this weekend, but I've been using the code from this branch for the last week or so and it looks good to me!

@benaadams
Member

⌚️ Should also resolve #301 ?

@adamsitnik
Member Author

@benaadams Yes, exactly

@AndreyAkinshin AndreyAkinshin merged commit f1f2317 into master Nov 25, 2016
@AndreyAkinshin
Member

Ok, it's merged. If everything is fine, I will release v0.10.1 next week.

@adamsitnik adamsitnik deleted the universalMemoryDiagnoser branch December 2, 2016 20:08
@mattwarren
Contributor

Okay, I just noticed some interesting behaviour when using MonitoringTotalAllocatedMemorySize. If you run code like this:

public static void TestMonitoringTotalAllocatedMemorySize()
{
    AppDomain.MonitoringIsEnabled = true;

    // provoke JIT, static ctors etc (was allocating 1740 bytes with first call)
    var list = new List<string> { "stringA", "stringB" };
    list.Sort(); 
    var temp = new HashSet<string>();

    var currentDomain = AppDomain.CurrentDomain;

    var hashSetBefore = currentDomain.MonitoringTotalAllocatedMemorySize;
    var countersToUse = new [] {1, 5, 10, 25, 50, 100, 250, 500, 1000, 2500, 5000, 10000, 100000, 250000, 500000};
    foreach (var counter in countersToUse)
    {
        var loopCounter = 0;
        for (int i = 0; i < counter; i++)
        {
            var test = new HashSet<string>();
            loopCounter += test.Count;
        }
    
        Thread.Sleep(10);
        var hashSetAfter = currentDomain.MonitoringTotalAllocatedMemorySize;
        var totalAlloc = hashSetAfter - hashSetBefore;
        Console.WriteLine(
            "HashSet<string>() = {0,8:N2} bytes (Total = {1,12:N0} bytes, Counter = {2,8:N0})",
            (double)totalAlloc / counter, totalAlloc, counter);
    }            
}

You get the following output:

HashSet<string>() =     0.00 bytes (Total =            0 bytes, Counter =        1)
HashSet<string>() =     0.00 bytes (Total =            0 bytes, Counter =        5)
HashSet<string>() =     0.00 bytes (Total =            0 bytes, Counter =       10)
HashSet<string>() =   327.68 bytes (Total =        8,192 bytes, Counter =       25)
HashSet<string>() =   163.84 bytes (Total =        8,192 bytes, Counter =       50)
HashSet<string>() =   163.84 bytes (Total =       16,384 bytes, Counter =      100)
HashSet<string>() =   131.07 bytes (Total =       32,768 bytes, Counter =      250)
HashSet<string>() =   131.07 bytes (Total =       65,536 bytes, Counter =      500)
HashSet<string>() =   131.07 bytes (Total =      131,072 bytes, Counter =    1,000)
HashSet<string>() =   117.96 bytes (Total =      294,912 bytes, Counter =    2,500)
HashSet<string>() =   122.88 bytes (Total =      614,400 bytes, Counter =    5,000)
HashSet<string>() =   125.34 bytes (Total =    1,253,376 bytes, Counter =   10,000)
HashSet<string>() =    76.56 bytes (Total =    7,655,696 bytes, Counter =  100,000)
HashSet<string>() =    94.62 bytes (Total =   23,655,112 bytes, Counter =  250,000)
HashSet<string>() =   111.32 bytes (Total =   55,662,136 bytes, Counter =  500,000)

Note the variance in the amount of bytes in the 1st column, which depends on the different values of Counter. It seems like MonitoringTotalAllocatedMemorySize often (but not always) measures allocations in 8,192-byte (8K) increments (see the Total = column), which is suspiciously close to the allocation quantum the GC uses; from Design of Allocator:

The Allocation quantum is the size of memory that the allocator allocates each time it needs more memory, in order to perform object allocations within an allocation context. The allocation is typically 8k and the average size of managed objects are around 35 bytes, enabling a single allocation quantum to be used for many object allocations.

Now I know that we generally do 1000s of iterations, so it might not be a problem, but I just wanted to raise it and see what others thought.

For instance, in the test above we don't get a consistent value for how many bytes are allocated when creating an empty HashSet<T>.

@adamsitnik
Member Author

I encountered a similar problem when I was implementing this and found an interesting note:

"This instance Int64 property returns the number of bytes that have been allocated by a specific AppDomain. The number is accurate as of the last garbage collection." - CLR via C#

@mattwarren Could you try enforcing GC.Collect first? Like we do here.

@mattwarren
Contributor

I tried adding GC.Collect() and it didn't seem to make any difference; current code in this gist.

@adamsitnik
Member Author

@mattwarren sorry for the late response; I finally had some time to try it.

Here is my code for both desktop .NET and .NET Core.

Results for dotnet run -c Release -f net45

HashSet<string>() = 8.192,00 bytes (Total =        8.192 bytes, Counter =        1), Current =   44.816 bytes
HashSet<string>() = 1.638,40 bytes (Total =        8.192 bytes, Counter =        5), Current =   57.536 bytes
HashSet<string>() =   819,20 bytes (Total =        8.192 bytes, Counter =       10), Current =   57.536 bytes
HashSet<string>() =   327,68 bytes (Total =        8.192 bytes, Counter =       25), Current =   57.536 bytes
HashSet<string>() =   163,84 bytes (Total =        8.192 bytes, Counter =       50), Current =   57.536 bytes
HashSet<string>() =    81,92 bytes (Total =        8.192 bytes, Counter =      100), Current =   57.536 bytes
HashSet<string>() =    65,54 bytes (Total =       16.384 bytes, Counter =      250), Current =   57.536 bytes
HashSet<string>() =    65,54 bytes (Total =       32.768 bytes, Counter =      500), Current =   57.536 bytes
HashSet<string>() =    66,42 bytes (Total =       66.416 bytes, Counter =    1.000), Current =   57.536 bytes
HashSet<string>() =    65,89 bytes (Total =      164.720 bytes, Counter =    2.500), Current =   57.536 bytes
HashSet<string>() =    64,07 bytes (Total =      320.368 bytes, Counter =    5.000), Current =   57.536 bytes
HashSet<string>() =    64,80 bytes (Total =      648.048 bytes, Counter =   10.000), Current =   57.536 bytes
HashSet<string>() =    64,06 bytes (Total =    6.406.320 bytes, Counter =  100.000), Current =   57.536 bytes
HashSet<string>() =    64,03 bytes (Total =   16.007.608 bytes, Counter =  250.000), Current =   57.536 bytes
HashSet<string>() =    64,01 bytes (Total =   32.007.024 bytes, Counter =  500.000), Current =   57.536 bytes

Results for dotnet run -c Release -f netcoreapp1.1:

HashSet<string>() = 6.560,00 bytes (Total =        6.560 bytes, Counter =        1), Current =   54.016 bytes
HashSet<string>() = 1.633,60 bytes (Total =        8.168 bytes, Counter =        5), Current =   58.480 bytes
HashSet<string>() =   816,80 bytes (Total =        8.168 bytes, Counter =       10), Current =   58.480 bytes
HashSet<string>() =   326,72 bytes (Total =        8.168 bytes, Counter =       25), Current =   58.480 bytes
HashSet<string>() =   163,36 bytes (Total =        8.168 bytes, Counter =       50), Current =   58.480 bytes
HashSet<string>() =    81,68 bytes (Total =        8.168 bytes, Counter =      100), Current =   58.480 bytes
HashSet<string>() =    65,34 bytes (Total =       16.336 bytes, Counter =      250), Current =   58.480 bytes
HashSet<string>() =    61,18 bytes (Total =       30.592 bytes, Counter =      500), Current =   58.480 bytes
HashSet<string>() =    63,26 bytes (Total =       63.264 bytes, Counter =    1.000), Current =   58.480 bytes
HashSet<string>() =    57,98 bytes (Total =      144.944 bytes, Counter =    2.500), Current =   58.480 bytes
HashSet<string>() =    56,76 bytes (Total =      283.800 bytes, Counter =    5.000), Current =   58.480 bytes
HashSet<string>() =    56,15 bytes (Total =      561.512 bytes, Counter =   10.000), Current =   58.480 bytes
HashSet<string>() =    55,87 bytes (Total =    5.586.944 bytes, Counter =  100.000), Current =   58.480 bytes
HashSet<string>() =    55,84 bytes (Total =   13.959.336 bytes, Counter =  250.000), Current =   58.480 bytes
HashSet<string>() =    55,84 bytes (Total =   27.918.672 bytes, Counter =  500.000), Current =   58.480 bytes

What helps us in the new MemoryDiagnoser is the large number of runs, no extra allocations, and the fact that the result is a long, not a double, so the remaining measurement overhead is truncated away by the integer division (long = long / long).
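The amortization can be sketched with illustrative numbers (assuming the ~8 KB allocation quantum discussed above; these are not actual measurements):

```csharp
long quantumNoise = 8192;  // worst-case granularity of the measurement
long bytesPerOp = 64;      // true allocation per operation

long fewOps = 10;
long manyOps = 1000000;

long measuredFew = bytesPerOp * fewOps + quantumNoise;    // 8832
long measuredMany = bytesPerOp * manyOps + quantumNoise;  // 64008192

long perOpFew = measuredFew / fewOps;    // 883 -> dominated by the quantum
long perOpMany = measuredMany / manyOps; // 64  -> long/long division truncates the noise away
```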
