-
Notifications
You must be signed in to change notification settings - Fork 4.9k
Conversation
|
||
// The extracted bits can be anywhere between 0 and 255, so we normalise the value to either 0 or 1 | ||
// to ensure compatibility with "C# bool" (0 for false, 1 for true, rest undefined) | ||
Vector256<byte> normalized = Avx2.Min(extracted, ones); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems you don't do this kind of normalization for BitArray(bool[])
constructor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is handled by comparing the bytes with zero (checking if the bytes are false) then negating the result: 72477e7#diff-e2f01cf03382b7d63fc3a67ad77fcedcR140-R142
{ | ||
for (; (i + Vector256<byte>.Count) <= m_length; i += Vector256<byte>.Count) | ||
{ | ||
int bits = m_array[i / BitsPerInt32]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, you can load m_array as a Vector and spawn vectors for each integer
@@ -275,16 +309,34 @@ public unsafe BitArray And(BitArray value) | |||
if (Length != value.Length || (uint)count > (uint)thisArray.Length || (uint)count > (uint)valueArray.Length) | |||
throw new ArgumentException(SR.Arg_ArrayLengthsDiffer); | |||
|
|||
// Unroll loop for count less than Vector256 size. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After the vectorized version there's a sequential loop to process the remaining elements. Why not jump to this switch instead the loop?
(Of course, keep the loop if no Avx2 or Sse2 is available.)
1. Use AVX2, if available, for And/Or/Xor 2. Vectorize Not 3. Use Span<T>.Fill() for SetAll() 4. Add more test sizes to account for And/Or/Xor/Not loop unrolling cases
* Fix bugs present in BitArray(bool[]) * Vectorise CopyTo(Array, int) when copying to a bool[] * Add test data for random values & larger array
* Use Vector128/256.Create and store it in static readonly field instead of loading from PE header
@Gnbrkm41 thanks for your work on this PR. As you probably saw this repo will move to a new one so we are hoping to finish as many active PR's as possible by 11/13 so they don't have to be manually re-created. Are you able to keep moving this along? |
I'll push a few commits today; I've been investigating whether fetching and storing I think it'll be fine to get this merged with the current logic (that is, after I push my commits). Alternatively I'm also fine with digging more into it then just re-opening the PR after the consolidation. |
@tannergooding @BruceForstall could you please take a look? I would love to merge it before repo consolidation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good/correct to me.
Thank you for your contribution. As announced in dotnet/coreclr#27549 this repository will be moving to dotnet/runtime on November 13. If you would like to continue working on this PR after this date, the easiest way to move the change to dotnet/runtime is:
|
@Gnbrkm41 thank you! |
Fixes https://github.com/dotnet/corefx/issues/41762 and https://github.com/dotnet/corefx/issues/37946
Related #39173
This PR continues from the previous PR from @BruceForstall (#39173) in an attempt to speed up various operations of BitArray by vectorisation and using AVX2 256-bit wide instructions.
The performance difference, compared to before the optimizations were applied are as following, when operating on arrays of size 4/512/32768 (Threshold 5%):
Regarding the slowdown of
BitArraySetAll
, I have re-run the benchmarks with various sizes to see at which point the new implementation outrun the current implementation.(Threshold 5%)
Which suggests that it may be faster for filling BitArray that contains more than 32 elements. One thing to note though, is that the numbers for small sizes seem to fluctuate around for small sizes, so I suppose the results may be inaccurate. (Even for this benchmark, I expect the numbers to be similar for Size 4/16/32, since they are all stored in one
int
and therefore should be just a single copy of an int; but they all seem to give different results)Furthermore, since the current implementation of
SetAll
operates on the whole of the backing array, this may result in unnecessary copying to unused area when theBitArray.Length
has been set to make the BitArray smaller but the backing array hasn't been resized due to the new length not meeting the_ShrinkThreshold
(inint
counts):corefx/src/System.Collections/src/System/Collections/BitArray.cs
Line 23 in 2b92fc0
The new implementation uses
GetInt32ArrayLengthFromBitLength(Length)
method to calculate where the used area are and only copies to that region. Unfortunately, since this happens for smaller sized arrays as well, this check on itself seem to results in approximately 0.7x slowdown when the array has less than 32 elements.Regarding the use of AVX2, I figured out that AVX2 generally improved the performance despite the concerns about downclocking. This is an example comparison between various paths for BitArray(Array, int) with bool arrays (See https://github.com/dotnet/corefx/issues/41762#issuecomment-542658154 and https://github.com/dotnet/corefx/issues/41762#issuecomment-542831649 for benchmarks of And/or/xor/not and
BitArray(bool[])
):