Remove unnecessary allocations in Hash task. #7162
Benchmark results
src/Tasks/Hash.cs
var hashStringSize = sha1.HashSize;
int maxItemStringSize = encoding.GetMaxByteCount(Math.Min(ComputeMaxItemSpecLength(ItemsToHash), maxInputChunkLength));
byte[] bytesBuffer = System.Buffers.ArrayPool<byte>.Shared.Rent(maxItemStringSize);
I'm not completely sure that renting a buffer from a pool is a good idea here. We could just allocate instead, and it would not be much worse.
What do you think about using System.Buffers in a task? Could it lead to any problems, for example for users who use this task in other ways?
They'd have to JIT something extra if they hadn't already... it looks like we already use System.Buffers in InterningBinaryReader.cs and System.Buffers.Binary in NodeProviderOutOfProcBase.cs. I'm not sure whether System.Buffers.Binary is a separate assembly, but if it is, maybe not?
If System.Buffers is present (net6+), the BCL already uses it in many places, so I'd guess it's already JITted.
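For reference on the Rent call in the diff, here is a minimal sketch of the rent/use/return pattern, assuming one item spec is encoded per rental. EncodeWithPooledBuffer and its consume callback are hypothetical names, not code from the PR. Two details are easy to miss: Rent can return an array longer than requested, and the array should go back to the pool in a finally block.

```csharp
using System;
using System.Buffers;
using System.Text;

// Hypothetical helper (not the PR's code): encode one item spec into a pooled
// buffer sized with GetMaxByteCount, then give the buffer back to the pool.
static int EncodeWithPooledBuffer(string itemSpec, Encoding encoding, Action<byte[], int> consume)
{
    // Worst-case byte count for this many chars, so GetBytes cannot overflow the buffer.
    int maxBytes = encoding.GetMaxByteCount(itemSpec.Length);

    // Rent may hand back an array LONGER than requested (with stale contents),
    // so only the first `written` bytes are meaningful.
    byte[] buffer = ArrayPool<byte>.Shared.Rent(maxBytes);
    try
    {
        int written = encoding.GetBytes(itemSpec, 0, itemSpec.Length, buffer, 0);
        consume(buffer, written); // e.g. sha1.TransformBlock(buffer, 0, written, null, 0)
        return written;
    }
    finally
    {
        // Return in finally so the array goes back to the pool even if consume throws.
        ArrayPool<byte>.Shared.Return(buffer);
    }
}

int byteCount = EncodeWithPooledBuffer("src/Foo.cs", Encoding.UTF8,
    (bytes, count) => Console.WriteLine($"{count} bytes"));
```

Whether this beats a plain new byte[] depends on how often the task runs and how large the items are; the sketch shows the pattern only, not a measurement.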
src/Tasks/Hash.cs
{
foreach (var item in ItemsToHash)
string itemSpec = IgnoreCase ? ItemsToHash[i].ItemSpec.ToUpperInvariant() : ItemsToHash[i].ItemSpec;
Possible benefit of char-by-char copying and capitalizing?
I don't yet have a good idea of how to do efficient char-by-char in-place capitalization without extra allocations while keeping the code short and clear. I'm also not sure this is worth pursuing further: when MSBuild uses this task, these itemSpec values are just file paths, not huge strings.
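On the char-by-char question, one option that stays short is the span overload MemoryExtensions.ToUpperInvariant(ReadOnlySpan<char>, Span<char>), which copies and uppercases in one pass into a caller-supplied buffer (here rented from ArrayPool<char>). This is a sketch under assumptions: UppercaseInto is a hypothetical helper, the overload exists on .NET Core 2.1 and later, and I have not checked whether the .NET Framework build of the task can use it.

```csharp
using System;
using System.Buffers;

// Hypothetical helper: uppercase an item spec into a caller-supplied char buffer
// instead of allocating a new string with string.ToUpperInvariant().
static int UppercaseInto(string itemSpec, char[] destination)
{
    // Copies and converts using invariant casing rules in a single pass;
    // returns -1 when the destination is too small.
    int written = itemSpec.AsSpan().ToUpperInvariant(destination);
    if (written < 0)
        throw new ArgumentException("Destination buffer is too small.", nameof(destination));
    return written;
}

string spec = "src/Tasks/Hash.cs";
char[] chars = ArrayPool<char>.Shared.Rent(spec.Length);
try
{
    int len = UppercaseInto(spec, chars);
    Console.WriteLine(new string(chars, 0, len)); // a string is created here only for display
}
finally
{
    ArrayPool<char>.Shared.Return(chars);
}
```

In the task itself, the uppercased chars could be passed straight to Encoding.GetBytes(char[], int, int, byte[], int), so no uppercased string would ever be created.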
Won't the BCL already use ArrayPool, so it would already be JITted?
I'm not sure about that. Do you know anyone we could ask to figure this out?
@danmoseley maybe?
Yes, the core libraries use … Eventually I expect MSBuild could make use of …
Presumably sha1.TransformBlock has a fixed per-call cost, which led to introducing the intermediate buffer. Do you have numbers on the performance characteristics of TransformBlock to back this up?
In other words, I'm wondering how the code would perform without sha1Buffer, i.e. if you always called TransformBlock, regardless of how small the incoming buffer is.
I did try to write this without the additional buffer at first, but it ran slower, which forced me to think about trading CPU for memory improvements. Adding this buffer let me avoid that trade-off: both CPU and memory improve in this PR. I would need more time to repeat the measurements to show you numbers, but you could check the comments under the previous PR about the Hash function. They include such measurements, and they agree with what I got. Their …
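To make the sha1Buffer trade-off concrete, here is a minimal sketch of batching small inputs into an intermediate buffer and calling TransformBlock only when it fills up. BatchedSha1 and its bufferSize parameter are hypothetical; the sketch makes no claim about the actual per-call cost of TransformBlock, which is exactly what would need measuring.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical sketch: collect small inputs in an intermediate buffer and push
// them into SHA-1 only when the buffer is full, instead of once per input.
static byte[] BatchedSha1(byte[][] inputs, int bufferSize)
{
    using var sha1 = SHA1.Create();
    var buffer = new byte[bufferSize];
    int filled = 0;

    foreach (byte[] input in inputs)
    {
        if (filled > 0 && filled + input.Length > buffer.Length)
        {
            // Not enough room left: hash what is buffered so far.
            sha1.TransformBlock(buffer, 0, filled, null, 0);
            filled = 0;
        }

        if (input.Length > buffer.Length)
        {
            // Inputs larger than the whole buffer go straight to the hash.
            sha1.TransformBlock(input, 0, input.Length, null, 0);
        }
        else
        {
            Buffer.BlockCopy(input, 0, buffer, filled, input.Length);
            filled += input.Length;
        }
    }

    // Hash the remainder and finish.
    sha1.TransformFinalBlock(buffer, 0, filled);
    return sha1.Hash!;
}

byte[][] items = { Encoding.UTF8.GetBytes("a.cs\0"), Encoding.UTF8.GetBytes("b.cs\0") };
Console.WriteLine(BitConverter.ToString(BatchedSha1(items, 4096)));
```

With bufferSize set to 1 this degenerates to roughly one TransformBlock call per input, which gives a simple way to benchmark both variants against each other on the same data.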
Thank you! Two random thoughts:
- Would wrapping sha1Buffer with a MemoryStream simplify the code, or would it make it worse?
- Do the existing unit tests cover all code paths?
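One way to answer the MemoryStream question is to try it. This hypothetical variant wraps the fixed buffer in a non-resizable MemoryStream so that its Position replaces a hand-maintained fill counter; otherwise it follows the same buffer-then-TransformBlock approach. HashViaMemoryStream is an illustrative name, not the PR's code.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;
using System.Text;

// Hypothetical sketch: a non-resizable MemoryStream over the fixed buffer
// tracks how much of the buffer is filled (Position).
static byte[] HashViaMemoryStream(byte[][] inputs, byte[] buffer)
{
    using var sha1 = SHA1.Create();
    using var stream = new MemoryStream(buffer); // fixed capacity: writing past the end throws

    void FlushToHash()
    {
        if (stream.Position > 0)
        {
            sha1.TransformBlock(buffer, 0, (int)stream.Position, null, 0);
            stream.Position = 0; // start filling the buffer from the beginning again
        }
    }

    foreach (byte[] input in inputs)
    {
        if (stream.Position + input.Length > buffer.Length)
            FlushToHash();

        if (input.Length > buffer.Length)
            sha1.TransformBlock(input, 0, input.Length, null, 0); // too big to buffer at all
        else
            stream.Write(input, 0, input.Length);
    }

    sha1.TransformFinalBlock(buffer, 0, (int)stream.Position);
    return sha1.Hash!;
}

byte[][] paths = { Encoding.UTF8.GetBytes("a.cs\0"), Encoding.UTF8.GetBytes("b.cs\0") };
Console.WriteLine(BitConverter.ToString(HashViaMemoryStream(paths, new byte[4096])));
```

In this form the MemoryStream saves one variable but adds a long-to-int cast and another disposable object, so it is not clearly a simplification.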
I ran the tests again: the failure was not related to my changes. Now the tests pass.
Are you planning to incorporate any of these changes into this PR, or do you want it merged as-is and perhaps tackle some in a follow-up? (If follow-up, maybe make an issue for that?)
I synced with @AR-May offline. I think it would be better to incorporate some of the changes into this PR once she gets back from vacation.
Force-pushed from 55d9346 to 19c60f4.
src/Tasks.UnitTests/Hash_Tests.cs
array[i] = new string[inputSize];
for (int j = 0; j < array[i].Length; j++)
{
array[i][j] = $"Item{i}{j}";
This is 1_000_000 iterations, isn't it? What is the usual duration of this unit test?
It's 1 sec, which is less than HashTaskDifferentInputSizesTest takes.
The duration of these tests worries me though.
1 sec is too much IMHO; can we get it under 100 ms?
This one is the least necessary test of all. Maybe drop it entirely?
However, I would like to keep the test with different input sizes (HashTaskDifferentInputSizesTest) even though it takes 3 sec, because it covers a dangerous spot where it is easy to make a mistake.
+1 on trying to make the tests fast. I'd say 3 seconds is too much.
OK, got it. I will remove this test as unnecessary and improve HashTaskDifferentInputSizesTest. So far I've got it down to ~0.5 sec on my machine instead of 3.
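On keeping HashTaskDifferentInputSizesTest fast: one option is to cover only input lengths around chunk boundaries and to make the chunk length small in the test. The sketch below assumes the chunk length can be passed in (hypothetical; in the task it is maxInputChunkLength) and also shows one pitfall at those boundaries: a chunk must not end between the two halves of a surrogate pair. ChunkedHash and BoundaryLengths are hypothetical names, and I can't tell from this page how the PR handles surrogates.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

// Hypothetical sketch: hash a string by UTF-8 encoding it in chunks of at most
// chunkLength chars, never splitting a surrogate pair. Requires chunkLength >= 2.
static byte[] ChunkedHash(string s, int chunkLength)
{
    using var sha1 = SHA1.Create();
    byte[] bytes = new byte[Encoding.UTF8.GetMaxByteCount(chunkLength)];

    for (int start = 0; start < s.Length;)
    {
        int count = Math.Min(chunkLength, s.Length - start);
        // Ending a chunk on a high surrogate would encode the pair as two invalid
        // halves (replacement characters) and change the hash, so stop one char earlier.
        if (count > 1 && char.IsHighSurrogate(s[start + count - 1]))
            count--;

        int written = Encoding.UTF8.GetBytes(s, start, count, bytes, 0);
        sha1.TransformBlock(bytes, 0, written, null, 0);
        start += count;
    }

    sha1.TransformFinalBlock(Array.Empty<byte>(), 0, 0);
    return sha1.Hash!;
}

// Lengths just below, at, and just above the first few chunk boundaries,
// where off-by-one mistakes tend to hide.
static List<int> BoundaryLengths(int chunkLength)
{
    var lengths = new List<int> { 0, 1 };
    for (int k = 1; k <= 3; k++)
        for (int d = -1; d <= 1; d++)
            lengths.Add(k * chunkLength + d);
    return lengths;
}

Console.WriteLine(string.Join(" ", BoundaryLengths(4))); // 0 1 3 4 5 7 8 9 11 12 13
```

Comparing ChunkedHash(s, 4) with a one-shot SHA1 ComputeHash over Encoding.UTF8.GetBytes(s) for each of these lengths, using strings that contain surrogate pairs, exercises every boundary case with a handful of short strings instead of a million items.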
Force-pushed from d0462e3 to 67b0a90.
Fixes #7086
Context
Hash.Execute() allocates a string that ends up on the large object heap. This can be avoided without changing the resulting hash.
Changes Made
The hashing code is rewritten.
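Avoiding the big string without changing the resulting hash works because SHA-1 hashes a byte stream: feeding the same bytes in pieces via TransformBlock gives the same digest as one ComputeHash call over their concatenation. Here is a minimal sketch comparing the two shapes, assuming UTF-8 and a hypothetical '\0' separator between items (I haven't checked the task's actual separator or encoding). HashConcatenated and HashStreaming are illustrative names.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// "Before" shape (hypothetical, following the PR description): join every item
// into one string, encode it, hash it. With many items this string can exceed
// 85,000 bytes, the large object heap threshold.
static byte[] HashConcatenated(string[] items, char separator)
{
    var sb = new StringBuilder();
    foreach (string item in items)
        sb.Append(item).Append(separator);

    using var sha1 = SHA1.Create();
    return sha1.ComputeHash(Encoding.UTF8.GetBytes(sb.ToString()));
}

// "After" shape: feed each item's bytes to the hash as they are produced.
// The digest matches as long as the bytes fed in are identical, however
// they are split across TransformBlock calls.
static byte[] HashStreaming(string[] items, char separator)
{
    using var sha1 = SHA1.Create();
    byte[] separatorBytes = Encoding.UTF8.GetBytes(new[] { separator });

    foreach (string item in items)
    {
        // Allocates a small array per item for brevity.
        byte[] itemBytes = Encoding.UTF8.GetBytes(item);
        sha1.TransformBlock(itemBytes, 0, itemBytes.Length, null, 0);
        sha1.TransformBlock(separatorBytes, 0, separatorBytes.Length, null, 0);
    }

    sha1.TransformFinalBlock(Array.Empty<byte>(), 0, 0);
    return sha1.Hash!;
}

string[] specs = { "src/Tasks/Hash.cs", "src/Tasks.UnitTests/Hash_Tests.cs" };
Console.WriteLine(BitConverter.ToString(HashStreaming(specs, '\0')));
```

HashStreaming still allocates per item; the PR's version encodes into a buffer rented from ArrayPool instead.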
Testing
Unit tests & manual testing