
Memory Randomization #1587

Merged
9 commits merged from randomMemory into master
Jan 20, 2021

Conversation

adamsitnik
Member

A very simple implementation of #1513 that I hope will help us answer the question of whether we should invest more in this direction.

I am creating the PR just to have a NuGet package published by our CI so others can easily give it a try.

Sample benchmark:

using BenchmarkDotNet.Attributes;

public class IntroMemoryRandomization
{
    [Params(512 * 4)]
    public int Size;

    private int[] _array;
    private int[] _destination;

    [GlobalSetup]
    public void Setup()
    {
        _array = new int[Size];
        _destination = new int[Size];
    }

    [Benchmark]
    public void Array() => System.Array.Copy(_array, _destination, Size);
}

Default settings

dotnet run -c Release -f netcoreapp2.1 --filter IntroMemoryRandomization
-------------------- Histogram --------------------
[502.859 ns ; 508.045 ns) | @@@@@@@@@@@@@@@
---------------------------------------------------

MemoryRandomization set to true and Default Outliers setting (remove upper)

dotnet run -c Release -f netcoreapp2.1 --filter IntroMemoryRandomization --memoryRandomization true --maxIterationCount 50
-------------------- Histogram --------------------
[117.514 ns ; 203.847 ns) | @@@@@@@@@@@@
[203.847 ns ; 287.079 ns) |
[287.079 ns ; 362.172 ns) |
[362.172 ns ; 445.404 ns) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@
---------------------------------------------------

MemoryRandomization set to true and custom Outliers setting: don't remove any

dotnet run -c Release -f netcoreapp2.1 --filter IntroMemoryRandomization --memoryRandomization true --outliers DontRemove --maxIterationCount 50
-------------------- Histogram --------------------
[108.803 ns ; 213.537 ns) | @@@@@@@@@@@@@@@
[213.537 ns ; 315.458 ns) |
[315.458 ns ; 446.853 ns) | @@@@@@@@@@@@@@@@@@@@
[446.853 ns ; 559.259 ns) | @@@@@@@@@@@@@@@
---------------------------------------------------
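The core idea behind the `--memoryRandomization` switch can be sketched in a few lines. This is a hypothetical illustration (the class and method names are mine, not BenchmarkDotNet's): before each iteration, allocate an object of random size so that the arrays created by the subsequent `GlobalSetup` call land at a different heap offset every time.

```csharp
using System;

public static class MemoryRandomizationSketch
{
    private static readonly Random random = new Random();
    private static byte[] _padding; // kept alive so the padding survives into the iteration

    // Hypothetical sketch: shift the Gen 0 allocation pointer by a random
    // 0..31 bytes before GlobalSetup runs, so the benchmark's data gets a
    // different alignment on each iteration. Returns the padding size.
    public static int RandomizeHeapOffset()
    {
        _padding = new byte[random.Next(32)];
        return _padding.Length;
    }
}
```

Run between iterations (followed by a fresh `GlobalSetup` call, as the commit message below describes), this is enough to surface the multi-modal distributions shown in the histograms above.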

Commit: allocates a random-size array between iterations and calls global setup after it
@adamsitnik
Member Author

With MemoryRandomization enabled and no outlier removal we get a distribution (3 buckets/modes) similar to what we have gathered from multiple runs in the past: https://pvscmdupload.blob.core.windows.net/reports/allTestHistory%2frefs%2fheads%2fmaster_x64_Windows%2010.0.18362%2fSystem.Collections.CopyTo(Int32).Array(Size%3a%202048).html

(screenshot: multi-modal timing distribution from historical runs)

@adamsitnik
Member Author

@kunalspathak if this build gets green (a matter of 30 minutes from now) then a new package 0.12.1.1459 should become available at our CI feed:

<packageSources>
  <add key="bdn-ci" value="https://ci.appveyor.com/nuget/benchmarkdotnet" />
</packageSources>

@adamsitnik
Member Author

cc @AndyAyersMS

@AndyAyersMS
Member

This looks really promising.

I wonder if we might need something more sophisticated eventually. We don't know which GC heap(s) the benchmark is accessing. We can impact Gen0/LOH alignments, but it's trickier to impact Gen1/Gen2. Stack alignment might also come into play -- perhaps the benchmark runner can do random-sized stackallocs too?

I suspect for Gen0 we only need very small alignment changes, perhaps just fractions of cache line sizes, though we might need to go all the way up to fractions of page sizes (it is hard to imagine us doing enough iterations to really cover that space of possibilities). For the LOH likewise: some 85K+ base plus a cache-line-sized random amount on top.

This also tells us we might need to pay more attention to controlling the alignment of some key data in the runtime (e.g. frequently accessed static arrays). Worth thinking about, anyway.

@adamsitnik
Member Author

adamsitnik commented Nov 7, 2020

We don't know which GC heap(s) the benchmark is accessing.

By default every benchmark is single-threaded, so it should not be a problem in most cases. For multithreaded benchmarks we could achieve that by, for example, having an affinitized thread per core, but that would require much more work... I'll keep it in the back of my mind and try to improve the solution when it becomes a problem.

We can impact Gen0/LOH alignments but it's trickier to impact Gen1/Gen2.

Very good point! I've modified the code and made sure that the object gets promoted to Gen 1 and then Gen 2 by keeping it alive across two GC collections:

Debug.Assert(GC.GetGeneration(gen0object) == 0);
GC.Collect(0); // get it promoted to Gen 1
GC.Collect(1); // get it promoted to Gen 2

GC.KeepAlive(gen0object);
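Put together as a standalone, runnable version (the class name, the allocation of `gen0object`, and the final assertion are mine; the snippet above elides them), the promotion trick looks like this:

```csharp
using System;

public static class PromotionSketch
{
    // Hypothetical standalone version of the snippet above: allocate a small
    // object (which starts in Gen 0) and let it survive two blocking
    // collections so it ends up in Gen 2. Returns the final generation.
    public static int PromoteToGen2()
    {
        byte[] gen0object = new byte[new Random().Next(1, 32)];
        GC.Collect(0); // survives a Gen 0 collection -> promoted to Gen 1
        GC.Collect(1); // survives a Gen 1 collection -> promoted to Gen 2
        int generation = GC.GetGeneration(gen0object);
        GC.KeepAlive(gen0object); // keep the reference live past GetGeneration
        return generation;
    }
}
```

`GC.Collect(int)` is blocking by default, so by the time the second call returns the object should report generation 2 via `GC.GetGeneration`.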

Stack alignment might also come into play -- perhaps the benchmark runner can do random-sized stackallocs too?

And another great point! Edit: I've added a way to allocate stack memory and keep it alive for the duration of an iteration.
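The stack-randomization idea can be sketched roughly as follows (a hypothetical illustration, not the exact code that was added): consume a random amount of stack via `stackalloc` before invoking the workload, so every frame beneath it gets a different alignment, and keep the buffer in scope until the workload returns.

```csharp
using System;

public static class StackRandomizationSketch
{
    private static readonly Random random = new Random();

    // Hypothetical sketch: stackalloc a random 1..63 bytes before running
    // the workload, shifting the stack alignment of all frames below.
    public static void RunWithRandomStackOffset(Action workload)
    {
        Span<byte> buffer = stackalloc byte[random.Next(1, 64)];
        buffer[0] = 1; // touch the buffer so it cannot be elided
        workload();
        // buffer stays in scope (alive on the stack) until workload returns
    }
}
```

Using `Span<byte>` keeps the sketch free of `unsafe` code while still reserving real stack space.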

For LOH likewise, some 85K+ base plus a cache-line-sized random amount on top.

I have added that as well:

var lohObject = new byte[85 * 1024 + random.Next(32)];
Debug.Assert(GC.GetGeneration(lohObject) == 2);

GC.KeepAlive(lohObject);

This also tells us we might need to pay more attention to controlling alignment of some key data in the runtime

Maybe we should make the ArrayPool<byte>.Shared arrays aligned by default, as they are used everywhere in the runtime, BCL, and ASP.NET? Assuming we don't already do that (cc @stephentoub @VSadov)

@adamsitnik
Member Author

If anyone wants to give it a try then you need to use the 0.12.1.1462 version from

<packageSources>
  <add key="bdn-ci" value="https://ci.appveyor.com/nuget/benchmarkdotnet" />
</packageSources>

@AndyAyersMS
Member

made sure that the object gets promoted to Gen 1 and Gen 2

I'm not sure whether promoting will help or not... I guess my point was that the code being benchmarked may read from all sorts of objects, and as far as I can tell we can't reliably randomize all their addresses.

What you had initially may end up working better, if benchmarks tend to read from objects allocated after the random object. And perhaps the random LOH allocation will help too. But influencing Gen1 / Gen2 addresses seems harder.

@VSadov
Member

VSadov commented Nov 7, 2020

I am not sure that targeting specific GC behaviors (i.e. promotions) is necessary. Besides, some of those behaviors could change with different tunings, the state of the machine, server vs. workstation GC, etc.

It is fairly safe to assume, though, that objects allocated in sequence will typically be placed together and will likely stay together even when relocated.

I think just allocating batches of objects of varying size should be sufficient, if I get the idea here.
Also, since LOH objects live in a separate heap, they would need to be handled separately.

Varying the size differences within a cache line could be enough.

64-bit is the largest granularity that the GC aligns to by itself.
(SOH objects are 32-bit aligned on 32-bit platforms, with a few exceptions where 64-bit alignment is used even there.)
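The batch-allocation idea suggested above could look roughly like this (a sketch under the stated assumption that SOH objects allocated in sequence are placed together; the class name and the 64-byte cache-line constant are mine):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BatchRandomizationSketch
{
    private const int CacheLineSize = 64;
    private static readonly Random random = new Random();

    // Hypothetical sketch: allocate a batch of small objects whose sizes vary
    // within one cache line, so objects allocated afterwards start at a
    // different offset on every iteration. The caller must keep the returned
    // list alive for the duration of the iteration.
    public static List<byte[]> AllocateVaryingBatch(int count)
    {
        var keepAlive = new List<byte[]>(count);
        for (int i = 0; i < count; i++)
            keepAlive.Add(new byte[random.Next(CacheLineSize)]);
        return keepAlive;
    }
}
```

As noted above, LOH objects live elsewhere, so a batch like this only perturbs the small object heap; the 85K+ allocation shown earlier in the thread covers the LOH case.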

@adamsitnik adamsitnik changed the title [Experiment][No Merge] Memory Randomization Memory Randomization Nov 18, 2020
@adamsitnik adamsitnik added this to the v0.13.0 milestone Jan 20, 2021
@adamsitnik adamsitnik merged commit d5f7b9f into master Jan 20, 2021
@adamsitnik adamsitnik deleted the randomMemory branch January 20, 2021 16:14