
Memory Randomization #1587

Merged
9 commits merged from randomMemory into master
Jan 20, 2021

Conversation

adamsitnik
Member

A very simple implementation of #1513 that I hope will help us answer the question of whether we should invest more in this direction.

I am creating the PR just to have a NuGet package published by our CI so others can easily give it a try.

Sample benchmark:

using BenchmarkDotNet.Attributes;

public class IntroMemoryRandomization
{
    [Params(512 * 4)]
    public int Size;

    private int[] _array;
    private int[] _destination;

    [GlobalSetup]
    public void Setup()
    {
        _array = new int[Size];
        _destination = new int[Size];
    }

    [Benchmark]
    public void Array() => System.Array.Copy(_array, _destination, Size);
}

Default settings

dotnet run -c Release -f netcoreapp2.1 --filter IntroMemoryRandomization
-------------------- Histogram --------------------
[502.859 ns ; 508.045 ns) | @@@@@@@@@@@@@@@
---------------------------------------------------

MemoryRandomization set to true and Default Outliers setting (remove upper)

dotnet run -c Release -f netcoreapp2.1 --filter IntroMemoryRandomization --memoryRandomization true --maxIterationCount 50
-------------------- Histogram --------------------
[117.514 ns ; 203.847 ns) | @@@@@@@@@@@@
[203.847 ns ; 287.079 ns) |
[287.079 ns ; 362.172 ns) |
[362.172 ns ; 445.404 ns) | @@@@@@@@@@@@@@@@@@@@@@@@@@@@@
---------------------------------------------------

MemoryRandomization set to true and custom Outliers setting: don't remove any

dotnet run -c Release -f netcoreapp2.1 --filter IntroMemoryRandomization --memoryRandomization true --outliers DontRemove --maxIterationCount 50
-------------------- Histogram --------------------
[108.803 ns ; 213.537 ns) | @@@@@@@@@@@@@@@
[213.537 ns ; 315.458 ns) |
[315.458 ns ; 446.853 ns) | @@@@@@@@@@@@@@@@@@@@
[446.853 ns ; 559.259 ns) | @@@@@@@@@@@@@@@
---------------------------------------------------
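The core idea behind the `--memoryRandomization` switch can be sketched in a few lines. This is a hypothetical illustration (the class and method names are mine, not BenchmarkDotNet's): before each iteration, allocate an object of random size so that the arrays created by the subsequent `GlobalSetup` call land at a different heap offset every time.

```csharp
using System;

public static class MemoryRandomizationSketch
{
    private static readonly Random random = new Random();
    private static byte[] _padding; // kept alive so the padding survives into the iteration

    // Hypothetical sketch: shift the Gen 0 allocation pointer by a random
    // 0..31 bytes before GlobalSetup runs, so the benchmark's data gets a
    // different alignment on each iteration. Returns the padding size.
    public static int RandomizeHeapOffset()
    {
        _padding = new byte[random.Next(32)];
        return _padding.Length;
    }
}
```

Run between iterations (followed by a fresh `GlobalSetup` call, as the commit message below describes), this is enough to surface the multi-modal distributions shown in the histograms above.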

Commit: allocates a random-size array between iterations and calls global setup after it
@adamsitnik
Member Author

With MemoryRandomization enabled and no outlier removal we get a distribution (3 buckets/modes) similar to what we have gathered from multiple runs in the past: https://pvscmdupload.blob.core.windows.net/reports/allTestHistory%2frefs%2fheads%2fmaster_x64_Windows%2010.0.18362%2fSystem.Collections.CopyTo(Int32).Array(Size%3a%202048).html

(screenshot: multi-modal timing distribution from historical runs)

@adamsitnik
Member Author

@kunalspathak if this build gets green (a matter of 30 minutes from now) then a new package 0.12.1.1459 should become available at our CI feed:

<packageSources>
  <add key="bdn-ci" value="https://ci.appveyor.com/nuget/benchmarkdotnet" />
</packageSources>

@adamsitnik
Member Author

cc @AndyAyersMS

@AndyAyersMS
Member

This looks really promising.

I wonder if we might need something more sophisticated eventually. We don't know which GC heap(s) the benchmark is accessing. We can impact Gen0/LOH alignments, but it's trickier to impact Gen1/Gen2. Stack alignment might also come into play -- perhaps the benchmark runner can do random-sized stackallocs too?

I suspect for Gen0 we only need very small alignment changes, perhaps just fractions of cache line sizes, though we might need to go all the way up to fractions of page sizes (it is hard to imagine us doing enough iterations to really cover that space of possibilities). For the LOH likewise: some 85K+ base plus a cache-line-sized random amount on top.

This also tells us we might need to pay more attention to controlling the alignment of some key data in the runtime (e.g. frequently accessed static arrays). Worth thinking about, anyway.

@adamsitnik
Member Author

adamsitnik commented Nov 7, 2020

We don't know which GC heap(s) the benchmark is accessing.

By default every benchmark is single-threaded, so it should not be a problem in most cases. For multithreaded benchmarks we could achieve that by, for example, having an affinitized thread per core, but that would require much more work... I'll keep it in the back of my mind and try to improve the solution when it becomes a problem.

We can impact Gen0/LOH alignments but it's trickier to impact Gen1/Gen2.

Very good point! I've modified the code and made sure that the object gets promoted to Gen 1 and then Gen 2 by keeping it alive across two GC collections:

Debug.Assert(GC.GetGeneration(gen0object) == 0);
GC.Collect(0); // get it promoted to Gen 1
GC.Collect(1); // get it promoted to Gen 2

GC.KeepAlive(gen0object);
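Put together as a standalone, runnable version (the class name, the allocation of `gen0object`, and the final assertion are mine; the snippet above elides them), the promotion trick looks like this:

```csharp
using System;

public static class PromotionSketch
{
    // Hypothetical standalone version of the snippet above: allocate a small
    // object (which starts in Gen 0) and let it survive two blocking
    // collections so it ends up in Gen 2. Returns the final generation.
    public static int PromoteToGen2()
    {
        byte[] gen0object = new byte[new Random().Next(1, 32)];
        GC.Collect(0); // survives a Gen 0 collection -> promoted to Gen 1
        GC.Collect(1); // survives a Gen 1 collection -> promoted to Gen 2
        int generation = GC.GetGeneration(gen0object);
        GC.KeepAlive(gen0object); // keep the reference live past GetGeneration
        return generation;
    }
}
```

`GC.Collect(int)` is blocking by default, so by the time the second call returns the object should report generation 2 via `GC.GetGeneration`.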

Stack alignment might also come into play -- perhaps the benchmark runner can do random-sized stackallocs too?

And another great point! Edit: I've added a way to allocate stack memory and keep it alive for the duration of an iteration.
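The stack-randomization idea can be sketched roughly as follows (a hypothetical illustration, not the exact code that was added): consume a random amount of stack via `stackalloc` before invoking the workload, so every frame beneath it gets a different alignment, and keep the buffer in scope until the workload returns.

```csharp
using System;

public static class StackRandomizationSketch
{
    private static readonly Random random = new Random();

    // Hypothetical sketch: stackalloc a random 1..63 bytes before running
    // the workload, shifting the stack alignment of all frames below.
    public static void RunWithRandomStackOffset(Action workload)
    {
        Span<byte> buffer = stackalloc byte[random.Next(1, 64)];
        buffer[0] = 1; // touch the buffer so it cannot be elided
        workload();
        // buffer stays in scope (alive on the stack) until workload returns
    }
}
```

Using `Span<byte>` keeps the sketch free of `unsafe` code while still reserving real stack space.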

For LOH likewise, some 85K+ base plus a cache-line-sized random amount on top.

I have added that as well:

var lohObject = new byte[85 * 1024 + random.Next(32)];
Debug.Assert(GC.GetGeneration(lohObject) == 2);

GC.KeepAlive(lohObject);

This also tells us we might need to pay more attention to controlling alignment of some key data in the runtime

Maybe we should make the ArrayPool<byte>.Shared arrays aligned by default, as they are used everywhere in the runtime, BCL, and ASP.NET? Assuming we don't already do that (cc @stephentoub @VSadov)

@adamsitnik
Member Author

If anyone wants to give it a try then you need to use the 0.12.1.1462 version from

<packageSources>
  <add key="bdn-ci" value="https://ci.appveyor.com/nuget/benchmarkdotnet" />
</packageSources>

@AndyAyersMS
Member

made sure that the object gets promoted to Gen 1 and Gen 2

I'm not sure whether promoting will help or not... I guess my point was that the code being benchmarked may read from all sorts of objects, and as far as I can tell we can't reliably randomize all their addresses.

What you had initially may end up working better, if benchmarks tend to read from objects allocated after the random object. And perhaps the random LOH allocation will help too. But influencing Gen1 / Gen2 addresses seems harder.

@VSadov
Member

VSadov commented Nov 7, 2020

I am not sure that targeting specific GC behaviors (i.e. promotions) is necessary. Besides, some of those behaviors could change with different tunings, the state of the machine, server vs. workstation GC, etc.

It is fairly safe to assume, though, that objects allocated in sequence will typically be placed together and will likely stay together even when relocated.

I think just allocating batches of objects of varying size should be sufficient, if I get the idea here.
Also, since LOH objects live in a separate heap, they would need to be handled separately.

Varying the size differences within a cache line could be enough.

64-bit is the largest granularity that the GC aligns to by itself.
(SOH objects are 32-bit aligned on 32-bit platforms, with a few exceptions where 64-bit alignment is used even there.)
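The batch-allocation idea suggested above could look roughly like this (a sketch under the stated assumption that SOH objects allocated in sequence are placed together; the class name and the 64-byte cache-line constant are mine):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BatchRandomizationSketch
{
    private const int CacheLineSize = 64;
    private static readonly Random random = new Random();

    // Hypothetical sketch: allocate a batch of small objects whose sizes vary
    // within one cache line, so objects allocated afterwards start at a
    // different offset on every iteration. The caller must keep the returned
    // list alive for the duration of the iteration.
    public static List<byte[]> AllocateVaryingBatch(int count)
    {
        var keepAlive = new List<byte[]>(count);
        for (int i = 0; i < count; i++)
            keepAlive.Add(new byte[random.Next(CacheLineSize)]);
        return keepAlive;
    }
}
```

As noted above, LOH objects live elsewhere, so a batch like this only perturbs the small object heap; the 85K+ allocation shown earlier in the thread covers the LOH case.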

@adamsitnik adamsitnik changed the title [Experiment][No Merge] Memory Randomization Memory Randomization Nov 18, 2020
@adamsitnik adamsitnik added this to the v0.13.0 milestone Jan 20, 2021
@adamsitnik adamsitnik merged commit d5f7b9f into master Jan 20, 2021
@adamsitnik adamsitnik deleted the randomMemory branch January 20, 2021 16:14