Suppress resource data deduplication #492

Merged 5 commits on Nov 1, 2023
115 changes: 71 additions & 44 deletions docs/guides/dotnet/advanced-pe-image-building.md
@@ -7,73 +7,77 @@ The easiest way to write a .NET module to the disk is by using the
module.Write(@"C:\Path\To\Output\Binary.exe");
```

This method is essentially a shortcut for invoking the
`ManagedPEImageBuilder` and `ManagedPEFileBuilder` classes, and will
completely reconstruct the PE image, serialize it into a PE file and
write the PE file to the disk.
Behind the scenes, this creates and invokes a
`ManagedPEImageBuilder` and a `ManagedPEFileBuilder` with their default
settings, and will completely reconstruct the PE image, serialize it into
a PE file and write the PE file to the disk.

While this is easy, and would probably work for most .NET module
processing, it does not provide much flexibility. To get more control
over the construction of the new PE image, it is therefore not
recommended to use a different overload of the `Write` method that takes
instances of `IPEImageBuilder` instead:
To get more control over the construction of the new PE image, we can use
and configure our own instance of an `IPEImageBuilder` instead:

``` csharp
var imageBuilder = new ManagedPEImageBuilder();

/* Configuration of imageBuilder here... */

module.Write(@"C:\Path\To\Output\Binary.exe", imageBuilder);
/* ... Configuration of imageBuilder here... */
```

Alternatively, it is possible to call `ModuleDefinition::ToPEImage` to
turn the module into a `PEImage` first, that can then later be
post-processed and transformed into a `PEFile` to write it to the disk:
After configuring, the builder can then be passed on to the `ModuleDefinition::Write`
method as a second argument:

``` csharp
var imageBuilder = new ManagedPEImageBuilder();
module.Write(@"C:\Path\To\Output\Binary.exe", imageBuilder);
```

/* Configuration of imageBuilder here... */
It is also possible to call `ModuleDefinition::ToPEImage` to turn
the module into a `PEImage` first. This image can then be post-processed
and later transformed into a `PEFile` to write it to the disk:

// Construct image.
``` csharp
// Turn module into a new PE image.
var image = module.ToPEImage(imageBuilder);

// Write image to the disk.
/* ... Post processing of the PE image here ... */

// Construct a new PE file.
var fileBuilder = new ManagedPEFileBuilder();
var file = fileBuilder.CreateFile(image);

/* ... Post processing of the PE file here ... */

// Write PE file to disk.
file.Write(@"C:\Path\To\Output\Binary.exe");
```

To get even more control, it is possible to call the `CreateImage`
method from the image builder directly. This allows for inspecting all
build artifacts, as well as post-processing of the constructed PE image
before it is written to the disk.
To get access to additional build artifacts, such as new metadata tokens
and builder diagnostics, it is possible to call the `CreateImage`
method from the image builder directly, and inspect the resulting
`PEImageBuildResult` object:

``` csharp
var imageBuilder = new ManagedPEImageBuilder();

/* Configuration of imageBuilder here... */

// Construct image.
var result = imageBuilder.CreateImage(module);

/* Inspect build result ... */
/* ... Inspect build result here ... */

// Obtain constructed PE image.
var image = result.ConstructedImage;

/* Post processing of image happens here... */
/* ... Post processing of the PE image here ... */

// Write image to the disk.
// Construct a new PE file.
var fileBuilder = new ManagedPEFileBuilder();
var file = fileBuilder.CreateFile(image);

/* ... Post processing of the PE file here ... */

// Write PE file to disk.
file.Write(@"C:\Path\To\Output\Binary.exe");
```

This article explores various features about the `ManagedPEImageBuilder`
class.

## Token mappings
## Token Mappings

Upon constructing a new PE image for a module, members defined in the
module might be re-ordered. This can make post-processing of the PE
@@ -98,7 +102,7 @@ var mainMethodRow = result.ConstructedImage.DotNetDirectory.Metadata
.GetByRid(newToken.Rid);
```

## Preserving raw metadata structure
## Preserving Raw Metadata Structure

Some .NET modules are carefully crafted and rely on the raw structure of
all metadata streams. These kinds of modules often rely on one of the
@@ -135,7 +139,7 @@ blob data and all metadata tokens to type references:

``` csharp
var factory = new DotNetDirectoryFactory();
factory.MetadataBuilderFlags = MetadataBuilderFlags.PreserveBlobIndices
factory.MetadataBuilderFlags = MetadataBuilderFlags.PreserveBlobIndices
| MetadataBuilderFlags.PreserveTypeReferenceIndices;
imageBuilder.DotNetDirectoryFactory = factory;
```
@@ -159,7 +163,7 @@ imageBuilder.DotNetDirectoryFactory = factory;
> `#~` to `#-`, and the file size might increase.


## String folding in #Strings stream
## String Folding in #Strings Stream

Named metadata members (such as types, methods and fields) are assigned
a name by referencing a string in the `#Strings` stream by its starting
@@ -194,7 +198,27 @@ factory.MetadataBuilderFlags |= MetadataBuilderFlags.NoStringsStreamOptimization
> However, it will still try to reuse these original strings as much as
> possible.

## Preserving maximum stack depth
## Deduplication of Embedded Resource Data

By default, when adding two embedded resources to a file with identical
contents, AsmResolver will not add the second copy of the data to the
output file and instead reuse the first blob. This can drastically
reduce the size of the final output file, especially for larger applications
with many (small) identical resource files (e.g., many Windows Forms
Applications).

While supported by most implementations of the .NET runtime, some assembly
post-processors (e.g., obfuscators) may not work well with this or depend
on individual resource items to be present.

To stop AsmResolver from performing this optimization, specify the
`NoResourceDataDeduplication` metadata builder flag:

``` csharp
factory.MetadataBuilderFlags |= MetadataBuilderFlags.NoResourceDataDeduplication;
```

## Preserving Maximum Stack Depth

CIL method bodies work with a stack, and the stack has a pre-defined
size. This pre-defined size is defined by the `MaxStack` property of the
@@ -205,13 +229,16 @@ disk. However, this is not always desirable.
To override this behaviour, set `ComputeMaxStackOnBuild` to `false` on
all method bodies to exclude in the maximum stack depth calculation.

Alternatively, if you want to force the maximum stack depths should be
either preserved or recalculated, it is possible to provide a custom
implemenmtation of the `IMethodBodySerializer`, or configure the
`CilMethodBodySerializer`.
``` csharp
MethodDefinition method = ...
method.CilMethodBody.ComputeMaxStackOnBuild = false;
```

Below an example on how to preserve maximum stack depths for all methods
in the assembly:
Alternatively, if you want to force the maximum stack depths to be
either preserved or recalculated for **all** methods defined in the target
assembly, it is possible to provide a custom implementation of the
`IMethodBodySerializer`, or set up a new `CilMethodBodySerializer` with
the `ComputeMaxStackOnBuildOverride` property set to any overriding value:

``` csharp
DotNetDirectoryFactory factory = ...;
@@ -225,7 +252,7 @@ factory.MethodBodySerializer = new CilMethodBodySerializer
> Disabling max stack computation may have unexpected side-effects (such
> as rendering certain CIL method bodies invalid).

## Strong name signing
## Strong Name Signing

Assemblies can be signed with a strong-name signature. Open a strong
name private key from a file:
@@ -295,8 +322,8 @@ imageBuilder.ErrorListener = EmptyErrorListener.Instance;
> [!NOTE]
> Setting an instance of `IErrorListener` in the image builder will only
> affect the building process. If the input module is initialized from a
> file containing invalid metadata, you may still experience reader
> errors, even if an `EmptyErrorListener` is specified. See
> file containing invalid metadata, **you may still experience reader
> errors, even if an `EmptyErrorListener` is specified to the builder**. See
> [Advanced Module Reading](advanced-module-reading.md) for
> handling reader diagnostics.

31 changes: 22 additions & 9 deletions src/AsmResolver.DotNet/Builder/DotNetDirectoryBuffer.MemberTree.cs
@@ -74,31 +74,43 @@ public void FinalizeModule(ModuleDefinition module)

AddFileReferencesInModule(module);
AddExportedTypesInModule(module);
AddResourcesInModule(module);
AddCustomAttributes(token, module);
}

private void AddResourcesInModule(ModuleDefinition module)
/// <summary>
/// Adds a collection of manifest resources to the directory buffer.
/// </summary>
/// <param name="resources">The resources to add.</param>
/// <param name="deduplicateData">
/// <c>true</c> if resource data can be reused when identical, <c>false</c> when each embedded resource should
/// get its own data offset.
/// </param>
public void DefineManifestResources(IEnumerable<ManifestResource> resources, bool deduplicateData = true)
{
for (int i = 0; i < module.Resources.Count; i++)
AddManifestResource(module.Resources[i]);
foreach (var resource in resources)
DefineManifestResource(resource, deduplicateData);
}

/// <summary>
/// Adds a single manifest resource to the buffer.
/// </summary>
/// <param name="resource">The resource to add.</param>
/// <param name="deduplicateData">
/// <c>true</c> if resource data can be reused when identical, <c>false</c> when each embedded resource should
/// get its own data offset.
/// </param>
/// <returns>The new metadata token of the resource.</returns>
public MetadataToken AddManifestResource(ManifestResource resource)
public MetadataToken DefineManifestResource(ManifestResource resource, bool deduplicateData = true)
{
uint offset = resource.Offset;
if (resource.IsEmbedded)
{
if (resource.EmbeddedDataSegment is {} segment)
if (resource.EmbeddedDataSegment is { } segment)
{
using var stream = new MemoryStream();
segment.Write(new BinaryStreamWriter(stream));
offset = Resources.GetResourceDataOffset(stream.ToArray());
byte[] data = segment.WriteIntoArray();
offset = deduplicateData
? Resources.GetResourceDataOffset(data)
: Resources.AppendLengthPrefixedData(data);
}
else
{
@@ -117,6 +129,7 @@ public MetadataToken AddManifestResource(ManifestResource resource)
var token = table.Add(row);
_tokenMapping.Register(resource, token);
AddCustomAttributes(token, resource);

return token;
}

5 changes: 5 additions & 0 deletions src/AsmResolver.DotNet/Builder/DotNetDirectoryFactory.cs
@@ -118,6 +118,11 @@ public virtual DotNetDirectoryBuildResult CreateDotNetDirectory(
if (module.Assembly?.ManifestModule == module)
buffer.DefineAssembly(module.Assembly);

// Add resources (if any).
buffer.DefineManifestResources(
module.Resources,
(MetadataBuilderFlags & MetadataBuilderFlags.NoResourceDataDeduplication) == 0);

// Finalize module.
buffer.FinalizeModule(module);
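
The new flag is an opt-out: deduplication stays enabled unless the `NoResourceDataDeduplication` bit is set. A minimal standalone sketch of that check follows. `SketchBuilderFlags` and `FlagSketch` are hypothetical stand-ins, not AsmResolver types; only the two numeric values are copied from `MetadataBuilderFlags` as shown in this change.

``` csharp
using System;

// Hypothetical stand-in for MetadataBuilderFlags, reduced to two members.
[Flags]
public enum SketchBuilderFlags
{
    None = 0,
    NoStringsStreamOptimization = 0x20000,
    NoResourceDataDeduplication = 0x40000,
}

public static class FlagSketch
{
    // Mirrors the expression passed to DefineManifestResources:
    // deduplicate unless the opt-out bit is present, ignoring all other bits.
    public static bool ShouldDeduplicate(SketchBuilderFlags flags) =>
        (flags & SketchBuilderFlags.NoResourceDataDeduplication) == 0;
}
```

Because only the opt-out bit is tested, combining the flag with any other builder flag does not change the outcome.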

13 changes: 13 additions & 0 deletions src/AsmResolver.DotNet/Builder/MetadataBuilderFlags.cs
@@ -155,5 +155,18 @@ public enum MetadataBuilderFlags
/// </para>
/// </summary>
NoStringsStreamOptimization = 0x20000,

/// <summary>
/// <para>
/// By default, when adding two embedded resources to a file with identical contents, AsmResolver will not
/// add the second copy of the data to the output file and instead reuse the first blob. This can drastically
/// reduce the size of the final output file.
/// </para>
/// <para>
/// While supported by the .NET runtime, some post-processors (e.g., obfuscators) may not work well with this
/// or depend on individual resource items to be present. Setting this flag will disable this optimization.
/// </para>
/// </summary>
NoResourceDataDeduplication = 0x40000
}
}
@@ -34,8 +34,9 @@ public DotNetResourcesDirectoryBuffer()
/// <param name="data">The data to append.</param>
/// <returns>The index to the start of the data.</returns>
/// <remarks>
/// This method does not index the resource data. Calling <see cref="AppendRawData"/> or <see cref="GetResourceDataOffset(byte[])"/>
/// on the same data will append the data a second time.
/// This method does not index the resource data. Calling <see cref="AppendRawData"/>,
/// <see cref="AppendLengthPrefixedData"/> or <see cref="GetResourceDataOffset(byte[])"/> on the same data will
/// append the data a second time.
/// </remarks>
public uint AppendRawData(byte[] data)
{
@@ -44,6 +45,24 @@ public uint AppendRawData(byte[] data)
return offset;
}

/// <summary>
/// Appends raw data to the stream, prepending the data with a length.
/// </summary>
/// <param name="data">The data to append.</param>
/// <returns>The index to the start of the prefixed data.</returns>
/// <remarks>
/// This method does not index the resource data. Calling <see cref="AppendRawData"/>,
/// <see cref="AppendLengthPrefixedData"/> or <see cref="GetResourceDataOffset(byte[])"/> on the same data will
/// append the data a second time.
/// </remarks>
public uint AppendLengthPrefixedData(byte[] data)
{
uint offset = (uint) _rawStream.Length;
_writer.WriteUInt32((uint) data.Length);
AppendRawData(data);
return offset;
}

/// <summary>
/// Gets the index to the provided resource data. If the blob is not present in the buffer, it will be appended
/// to the end of the stream.
@@ -57,9 +76,7 @@ public uint GetResourceDataOffset(byte[]? data)

if (!_dataOffsets.TryGetValue(data, out uint offset))
{
offset = (uint) _rawStream.Length;
_writer.WriteUInt32((uint) data.Length);
AppendRawData(data);
offset = AppendLengthPrefixedData(data);
_dataOffsets.Add(data, offset);
}
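
To illustrate the difference between the two code paths, here is a simplified standalone model of the buffer. It is a sketch only: `ResourceBufferSketch` is a hypothetical class, the Base64 string key stands in for whatever byte-array comparison the real `_dataOffsets` dictionary uses, and `BitConverter` writes the length in platform byte order (little-endian on common targets).

``` csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical, simplified model of a resources buffer; not the AsmResolver class.
public sealed class ResourceBufferSketch
{
    private readonly MemoryStream _raw = new MemoryStream();
    private readonly Dictionary<string, uint> _offsets = new Dictionary<string, uint>();

    public long Length => _raw.Length;

    // Always appends: a 4-byte length followed by the data. Nothing is indexed.
    public uint AppendLengthPrefixedData(byte[] data)
    {
        uint offset = (uint) _raw.Length;
        _raw.Write(BitConverter.GetBytes((uint) data.Length), 0, 4);
        _raw.Write(data, 0, data.Length);
        return offset;
    }

    // Deduplicating path: identical contents map to the first offset.
    public uint GetResourceDataOffset(byte[] data)
    {
        string key = Convert.ToBase64String(data);
        if (!_offsets.TryGetValue(key, out uint offset))
        {
            offset = AppendLengthPrefixedData(data);
            _offsets.Add(key, offset);
        }

        return offset;
    }
}
```

In this model, calling `GetResourceDataOffset` twice with equal three-byte arrays returns offset 0 both times and leaves the buffer at 7 bytes, while a following `AppendLengthPrefixedData` call with the same bytes returns offset 7 and grows the buffer to 14 bytes.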

@@ -62,6 +62,13 @@ public override bool TryCreateManifestResourceReader(uint offset, out BinaryStre

reader = _reader.ForkRelative(offset);
uint length = reader.ReadUInt32();

if (!reader.CanRead(length))
{
reader = default;
return false;
}

reader = reader.ForkAbsolute(reader.Offset, length);
return true;
}
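
The added `CanRead` guard rejects a length prefix that claims more bytes than the stream still holds. The same idea can be sketched over a plain byte array. `LengthPrefixSketch` and `TryReadLengthPrefixed` are hypothetical names used for illustration only, and `BitConverter` reads the prefix in platform byte order (little-endian on common targets).

``` csharp
using System;

// Hypothetical helper; not part of AsmResolver.
public static class LengthPrefixSketch
{
    public static bool TryReadLengthPrefixed(byte[] buffer, int offset, out ArraySegment<byte> data)
    {
        data = default;

        // Four bytes are needed for the length prefix itself.
        if (offset < 0 || offset > buffer.Length - 4)
            return false;

        uint length = BitConverter.ToUInt32(buffer, offset);
        int start = offset + 4;

        // Reject prefixes that claim more bytes than remain in the buffer.
        if (length > (uint) (buffer.Length - start))
            return false;

        data = new ArraySegment<byte>(buffer, start, (int) length);
        return true;
    }
}
```

Without the second check, a corrupted or hand-crafted length would produce a slice that extends past the end of the data, which is the situation the guard in this hunk avoids.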
2 changes: 2 additions & 0 deletions test/AsmResolver.DotNet.Tests/AsmResolver.DotNet.Tests.csproj
@@ -6,6 +6,8 @@
<IsPackable>false</IsPackable>

<Nullable>disable</Nullable>

<LangVersion>11</LangVersion>
</PropertyGroup>

<ItemGroup>