Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove unnecessary allocations in Hash task. #7162

Merged
merged 11 commits into from
Jan 21, 2022
101 changes: 79 additions & 22 deletions src/Tasks/Hash.cs
Original file line number Diff line number Diff line change
Expand Up @@ -18,8 +18,6 @@ namespace Microsoft.Build.Tasks
/// </remarks>
public class Hash : TaskExtension
{
private const char ItemSeparatorCharacter = '\u2028';

/// <summary>
/// Items from which to generate a hash.
/// </summary>
Expand All @@ -37,61 +35,120 @@ public class Hash : TaskExtension
[Output]
public string HashResult { get; set; }

private static readonly char ItemSeparatorCharacter = '\u2028';
rokonec marked this conversation as resolved.
Show resolved Hide resolved

private static readonly Encoding encoding = Encoding.UTF8;
AR-May marked this conversation as resolved.
Show resolved Hide resolved

private static readonly byte[] ItemSeparatorCharacterBytes = encoding.GetBytes(new char[] { ItemSeparatorCharacter });
rokonec marked this conversation as resolved.
Show resolved Hide resolved

Forgind marked this conversation as resolved.
Show resolved Hide resolved
private const int sha1BufferSize = 512;

private const int maxInputChunkLength = 256;

/// <summary>
/// Execute the task.
/// </summary>
public override bool Execute()
{
if (ItemsToHash?.Length > 0)
if (ItemsToHash != null && ItemsToHash.Length > 0)
AR-May marked this conversation as resolved.
Show resolved Hide resolved
{
using (var sha1 = SHA1.Create())
{
var concatenatedItemStringSize = ComputeStringSize(ItemsToHash);
// Buffer in which bytes of the strings are to be stored until their number reachs the limit size.
AR-May marked this conversation as resolved.
Show resolved Hide resolved
// Once the limit is reached, the sha1.TransformBlock is be run on all the bytes of this buffer.
AR-May marked this conversation as resolved.
Show resolved Hide resolved
byte[] sha1Buffer = System.Buffers.ArrayPool<byte>.Shared.Rent(sha1BufferSize);

var hashStringSize = sha1.HashSize;
int maxItemStringSize = encoding.GetMaxByteCount(Math.Min(ComputeMaxItemSpecLength(ItemsToHash), maxInputChunkLength));
AR-May marked this conversation as resolved.
Show resolved Hide resolved
byte[] bytesBuffer = System.Buffers.ArrayPool<byte>.Shared.Rent(maxItemStringSize);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I am not completely sure that it is a good idea to rent a buffer from a pool.
We might just use allocations instead here and it will not be much worse.
What do you think of using System.Buffers in a task? Could it lead to any problems? For example, for users who would like to use this task somehow?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They'd have to JIT something extra if they hadn't already...it looks like we use System.Buffers in InterningBinaryReader.cs and System.Buffers.Binary in NodeProviderOutOfProcBase.cs. I'm not sure if System.Buffers.Binary is a separate assembly, but if it is, maybe not?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If System.Buffers is present (net6+), the BCL already uses it in many places, so I guess it's JITted already


using (var stringBuilder = new ReuseableStringBuilder(Math.Max(concatenatedItemStringSize, hashStringSize)))
int bufferLength = 0;
for (int i = 0; i < ItemsToHash.Length; i++)
{
foreach (var item in ItemsToHash)
string itemSpec = IgnoreCase ? ItemsToHash[i].ItemSpec.ToUpperInvariant() : ItemsToHash[i].ItemSpec;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possible benefit of char-by-char copying and capitalizing?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I do not yet have a good idea how I can make an efficient char-by-char in place capitalization, without extra allocations and in a way that code is short and clear. I am not sure this question is worth looking further, because when msbuild is using this task, these itemSpec are just file paths and not huge strings.


// Slice the itemSpec string into chunks of reasonable size and add them to sha1 buffer.
int itemSpecLength = itemSpec.Length;
int itemSpecPosition = 0;
while (itemSpecLength > 0)
AR-May marked this conversation as resolved.
Show resolved Hide resolved
{
string itemSpec = item.ItemSpec;
stringBuilder.Append(IgnoreCase ? itemSpec.ToUpperInvariant() : itemSpec);
stringBuilder.Append(ItemSeparatorCharacter);
int charsToProcess = Math.Min(itemSpecLength, maxInputChunkLength);
int bytesNumber = encoding.GetBytes(itemSpec, itemSpecPosition, charsToProcess, bytesBuffer, 0);
itemSpecPosition += charsToProcess;
itemSpecLength -= charsToProcess;

AddBytesToSha1Buffer(sha1, sha1Buffer, ref bufferLength, sha1BufferSize, bytesBuffer, bytesNumber);
}

var hash = sha1.ComputeHash(Encoding.UTF8.GetBytes(stringBuilder.ToString()));
AddBytesToSha1Buffer(sha1, sha1Buffer, ref bufferLength, sha1BufferSize, ItemSeparatorCharacterBytes, ItemSeparatorCharacterBytes.Length);
AR-May marked this conversation as resolved.
Show resolved Hide resolved
}

stringBuilder.Clear();
sha1.TransformFinalBlock(sha1Buffer, 0, bufferLength);

foreach (var b in hash)
using (var stringBuilder = new ReuseableStringBuilder(sha1.HashSize))
{
foreach (var b in sha1.Hash)
{
stringBuilder.Append(b.ToString("x2"));
AR-May marked this conversation as resolved.
Show resolved Hide resolved
}

HashResult = stringBuilder.ToString();
}
}
}

return true;
}

private int ComputeStringSize(ITaskItem[] itemsToHash)
/// <summary>
/// Add bytes to the sha1 buffer. Once the limit size is reached, sha1.TransformBlock is called and the buffer is flushed.
/// </summary>
/// <param name="sha1">Hashing algorithm sha1.</param>
/// <param name="sha1Buffer">The sha1 buffer which stores bytes of the strings. Bytes should be added to this buffer.</param>
/// <param name="sha1BufferLength">Number of used bytes of the sha1 buffer.</param>
/// <param name="sha1BufferSize">The size of sha1 buffer.</param>
/// <param name="bytesBuffer">Bytes buffer which contains bytes to be written to sha1 buffer.</param>
/// <param name="bytesNumber">Amount of bytes that are to be added to sha1 buffer.</param>
/// <returns></returns>
private bool AddBytesToSha1Buffer(SHA1 sha1, byte[] sha1Buffer, ref int sha1BufferLength, int sha1BufferSize, byte[] bytesBuffer, int bytesNumber)
{
if (itemsToHash.Length == 0)
int bytesProcessedNumber = 0;
AR-May marked this conversation as resolved.
Show resolved Hide resolved
while (sha1BufferLength + bytesNumber >= sha1BufferSize)
{
return 0;
int sha1BufferFreeSpace = sha1BufferSize - sha1BufferLength;

if (sha1BufferLength == 0)
{
// If sha1 buffer is empty and bytes number is big enough there is no need to copy bytes to sha1 buffer.
// Pass the bytes to TransformBlock right away.
sha1.TransformBlock(bytesBuffer, 0, sha1BufferSize, null, 0);
}
else
{
Array.Copy(bytesBuffer, bytesProcessedNumber, sha1Buffer, sha1BufferLength, sha1BufferFreeSpace);
sha1.TransformBlock(sha1Buffer, 0, sha1BufferSize, null, 0);
sha1BufferLength = 0;
}

bytesProcessedNumber += sha1BufferFreeSpace;
bytesNumber -= sha1BufferFreeSpace;
}

var totalItemSize = 0;
Array.Copy(bytesBuffer, bytesProcessedNumber, sha1Buffer, sha1BufferLength, bytesNumber);
sha1BufferLength += bytesNumber;

return true;
AR-May marked this conversation as resolved.
Show resolved Hide resolved
}

private int ComputeMaxItemSpecLength(ITaskItem[] itemsToHash)
{
int maxItemSpecLength = 0;
foreach (var item in itemsToHash)
{
totalItemSize += item.ItemSpec.Length;
if (item.ItemSpec.Length > maxItemSpecLength)
{
maxItemSpecLength = item.ItemSpec.Length;
}
}

// Add one ItemSeparatorCharacter per item
return totalItemSize + itemsToHash.Length;
return maxItemSpecLength;
}
}
}