Add vectorized paths for Span<T>.Reverse #64412

alexcovington · 2022-01-27T23:32:35Z

Adds vectorized paths to Span<T>.Reverse for types that are supported. Falls back to previous behavior if T is not a value type or too big for a vector.
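Here is a minimal sketch of a vectorized reverse for `Span<byte>`, assuming x86 hardware with SSSE3. It is not the PR's code. The helper name `ReverseBytes` is hypothetical, only the byte case is shown (wider element types would need different shuffle masks), and the PR's AVX2 path is omitted. The sketch loads a 16-byte block from each end, reverses the bytes inside each block with one `Ssse3.Shuffle` (`pshufb`), and stores each block at the opposite end. The original scalar swap loop then finishes whatever is left in the middle.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

// 34 bytes: one pair of 16-byte blocks plus 2 bytes left for the scalar loop.
byte[] sample = new byte[34];
for (int i = 0; i < sample.Length; i++)
    sample[i] = (byte)i;

ReverseBytes(sample);
Console.WriteLine(string.Join(",", sample));

// Hypothetical helper (not the PR's implementation): reverses a byte span in place.
static void ReverseBytes(Span<byte> span)
{
    ref byte start = ref MemoryMarshal.GetReference(span);
    int blockSize = Vector128<byte>.Count;   // 16 bytes
    int left = 0;                            // start of the front block
    int right = span.Length - blockSize;     // start of the back block

    if (Ssse3.IsSupported)
    {
        // pshufb control: result byte k is taken from input byte (15 - k).
        Vector128<byte> reverseMask = Vector128.Create(
            (byte)15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);

        // Swap reversed blocks from both ends until the two blocks would overlap.
        while (left + blockSize <= right)
        {
            Vector128<byte> front = Unsafe.ReadUnaligned<Vector128<byte>>(ref Unsafe.Add(ref start, left));
            Vector128<byte> back = Unsafe.ReadUnaligned<Vector128<byte>>(ref Unsafe.Add(ref start, right));

            Unsafe.WriteUnaligned(ref Unsafe.Add(ref start, left), Ssse3.Shuffle(back, reverseMask));
            Unsafe.WriteUnaligned(ref Unsafe.Add(ref start, right), Ssse3.Shuffle(front, reverseMask));

            left += blockSize;
            right -= blockSize;
        }
    }

    // Scalar swap loop: reverses the untouched middle, or the whole span when SSSE3
    // is unavailable or the span is too short for a pair of blocks.
    int lo = left;
    int hi = right + blockSize - 1;
    while (lo < hi)
    {
        byte tmp = Unsafe.Add(ref start, lo);
        Unsafe.Add(ref start, lo) = Unsafe.Add(ref start, hi);
        Unsafe.Add(ref start, hi) = tmp;
        lo++;
        hi--;
    }
}
```

On hardware without SSSE3, and for spans shorter than two blocks, execution drops straight to the scalar loop. This is one way to keep short or unsupported inputs on the old code path.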

Compared against 65a5d0e.

Using this microbenchmark to compare performance, modified to use more buffer sizes and types:

Microbenchmark changes

diff --git a/src/benchmarks/micro/libraries/System.Memory/Span.cs b/src/benchmarks/micro/libraries/System.Memory/Span.cs
index e696e141..75d28d7d 100644
--- a/src/benchmarks/micro/libraries/System.Memory/Span.cs
+++ b/src/benchmarks/micro/libraries/System.Memory/Span.cs
@@ -14,11 +14,20 @@ namespace System.Memory
     [GenericTypeArguments(typeof(byte))]
     [GenericTypeArguments(typeof(char))]
     [GenericTypeArguments(typeof(int))]
+    [GenericTypeArguments(typeof(long))]
+    [GenericTypeArguments(typeof(float))]
+    [GenericTypeArguments(typeof(double))]
     [BenchmarkCategory(Categories.Runtime, Categories.Libraries, Categories.Span)]
     public class Span<T>
         where T : struct, IComparable<T>, IEquatable<T>
     {
-        [Params(Utils.DefaultCollectionSize)]
+        [Params(
+            8 /* No vectorization */,
+            34 /* SSSE3 with leftover */,
+            68 /* AVX2 path with leftover bytes */,
+            Utils.DefaultCollectionSize,
+            Utils.DefaultCollectionSize * 2
+            )]
         public int Size;

         private T[] _array, _same, _emptyWithSingleValue;
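
The size comments in this diff appear to be written in bytes, so they line up with `Span<byte>`. For wider `T`, the byte length is `Size * sizeof(T)`. Assume, for illustration only, a two-pointer scheme that swaps one vector-sized block from each end per iteration (the PR's exact loop structure isn't shown here). Under that assumption, the split between vectorized and leftover bytes works out as shown below. `Split` is a hypothetical helper.

```csharp
using System;

// Assumed scheme: each iteration swaps one vector-sized block from the front with one
// from the back, so one iteration consumes 2 * vectorBytes bytes. Whatever remains in
// the middle is handled by the scalar loop.
static (int Vectorized, int Leftover) Split(int lengthInBytes, int vectorBytes)
{
    int iterations = lengthInBytes / (2 * vectorBytes);
    int vectorized = iterations * 2 * vectorBytes;
    return (vectorized, lengthInBytes - vectorized);
}

Console.WriteLine(Split(8, 16));   // (0, 8): scalar only
Console.WriteLine(Split(34, 32));  // (0, 34): too short for a pair of 32-byte AVX2 blocks
Console.WriteLine(Split(34, 16));  // (32, 2): one pair of 16-byte blocks, 2 leftover bytes
Console.WriteLine(Split(68, 32));  // (64, 4): one pair of 32-byte blocks, 4 leftover bytes
```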

Performance results:

$ py .\scripts\benchmarks_ci.py -c Release -f net7.0 --filter *Span*Reverse* --corerun C:\Users\acovingt\source\repos\runtime-master\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe --bdn-artifacts C:\Users\acovingt\Documents\vectorize-span-reverse-base
$ py .\scripts\benchmarks_ci.py -c Release -f net7.0 --filter *Span*Reverse* --corerun C:\Users\acovingt\source\repos\runtime\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe --bdn-artifacts C:\Users\acovingt\Documents\vectorize-span-reverse-diff
$ cd .\src\tools\ResultsComparer\ 
$ dotnet run -- --base C:\Users\acovingt\Documents\vectorize-span-reverse-base\ --diff C:\Users\acovingt\Documents\vectorize-span-reverse-diff\ --threshold 3% --noise 5ns
summary:
better: 23, geomean: 3.946
total diff: 23

No Slower results for the provided threshold = 3% and noise filter = 5ns.

| Faster                                         | base/diff | Base Median (ns) | Diff Median (ns) | Modality|
| ---------------------------------------------- | ---------:| ----------------:| ----------------:| -------- |
| System.Memory.Span<Byte>.Reverse(Size: 1024)   |     33.45 |           616.32 |            18.42 |         |
| System.Memory.Span<Byte>.Reverse(Size: 512)    |     28.16 |           306.00 |            10.87 |         |
| System.Memory.Span<Char>.Reverse(Size: 512)    |     16.26 |           300.26 |            18.47 |         |
| System.Memory.Span<Char>.Reverse(Size: 1024)   |     14.25 |           609.56 |            42.77 |         |
| System.Memory.Span<Byte>.Reverse(Size: 68)     |      5.75 |            33.59 |             5.84 |         |
| System.Memory.Span<Char>.Reverse(Size: 68)     |      5.54 |            39.56 |             7.14 | bimodal |
| System.Memory.Span<Single>.Reverse(Size: 512)  |      4.57 |           152.19 |            33.33 |         |
| System.Memory.Span<Char>.Reverse(Size: 34)     |      4.13 |            21.56 |             5.22 | bimodal |
| System.Memory.Span<Single>.Reverse(Size: 1024) |      3.93 |           303.01 |            77.20 |         |
| System.Memory.Span<Int32>.Reverse(Size: 1024)  |      3.88 |           307.61 |            79.20 |         |
| System.Memory.Span<Int32>.Reverse(Size: 512)   |      3.67 |           154.75 |            42.20 |         |
| System.Memory.Span<Byte>.Reverse(Size: 34)     |      3.03 |            15.56 |             5.14 | several?|
| System.Memory.Span<Int32>.Reverse(Size: 68)    |      2.59 |            23.28 |             8.98 |         |
| System.Memory.Span<Single>.Reverse(Size: 68)   |      2.45 |            19.77 |             8.06 |         |
| System.Memory.Span<Single>.Reverse(Size: 34)   |      2.21 |            12.41 |             5.62 | several?|
| System.Memory.Span<Int64>.Reverse(Size: 1024)  |      2.01 |           302.07 |           150.24 |         |
| System.Memory.Span<Int32>.Reverse(Size: 34)    |      1.99 |            15.11 |             7.61 |         |
| System.Memory.Span<Int64>.Reverse(Size: 512)   |      1.98 |           153.95 |            77.64 |         |
| System.Memory.Span<Double>.Reverse(Size: 512)  |      1.97 |           152.21 |            77.18 |         |
| System.Memory.Span<Double>.Reverse(Size: 1024) |      1.93 |           301.68 |           156.33 |         |
| System.Memory.Span<Int64>.Reverse(Size: 68)    |      1.91 |            23.54 |            12.35 |         |
| System.Memory.Span<Int64>.Reverse(Size: 34)    |      1.76 |            14.90 |             8.49 |         |
| System.Memory.Span<Double>.Reverse(Size: 68)   |      1.64 |            19.80 |            12.08 |         |

Please let me know if I can provide any other information. Thanks!

ghost · 2022-01-27T23:32:43Z

Tagging subscribers to this area: @dotnet/area-system-memory
See info in area-owners.md if you want to be subscribed.

Author:	alexcovington
Assignees:	-
Labels:	`area-System.Memory`, `community-contribution`
Milestone:	-

src/libraries/System.Private.CoreLib/src/System/SpanHelpers.Byte.cs

src/libraries/System.Private.CoreLib/src/System/MemoryExtensions.cs

stephentoub · 2022-01-28T17:03:56Z

Thanks for sharing the perf tests. The results only show down to a size of 32 elements, and all of them show improvements. Is there a smaller size at which this is actually a regression?

stephentoub · 2022-01-28T17:05:25Z

Array.Reverse<T> has its own almost identical implementation that won't benefit from these improvements. Can we change Array to delegate to the same underlying implementation being added here so that both arrays and spans benefit equally?

runtime/src/libraries/System.Private.CoreLib/src/System/Array.cs

Lines 1610 to 1641 in 953fd35

    
    public static void Reverse<T>(T[] array)
    {
        if (array == null)
            ThrowHelper.ThrowArgumentNullException(ExceptionArgument.array);
        Reverse(array, 0, array.Length);
    }

    public static void Reverse<T>(T[] array, int index, int length)
    {
        if (array == null)
            ThrowHelper.ThrowArgumentNullException(ExceptionArgument.array);
        if (index < 0)
            ThrowHelper.ThrowIndexArgumentOutOfRange_NeedNonNegNumException();
        if (length < 0)
            ThrowHelper.ThrowLengthArgumentOutOfRange_ArgumentOutOfRange_NeedNonNegNum();
        if (array.Length - index < length)
            ThrowHelper.ThrowArgumentException(ExceptionResource.Argument_InvalidOffLen);
        if (length <= 1)
            return;

        ref T first = ref Unsafe.Add(ref MemoryMarshal.GetArrayDataReference(array), index);
        ref T last = ref Unsafe.Add(ref Unsafe.Add(ref first, length), -1);
        do
        {
            T temp = first;
            first = last;
            last = temp;
            first = ref Unsafe.Add(ref first, 1);
            last = ref Unsafe.Add(ref last, -1);
        } while (Unsafe.IsAddressLessThan(ref first, ref last));
    }
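
The suggestion (have `Array.Reverse<T>` delegate to the same span-based implementation) could look roughly like the sketch below. `ReverseArraySketch` is a hypothetical name, and public exception types stand in for CoreLib's internal `ThrowHelper`. A straightforward `new Span<T>(array, index, length).Reverse()` would introduce a subtle change. The `Span<T>` constructor throws `ArrayTypeMismatchException` for covariant arrays, but the code above accepts them because it goes through `MemoryMarshal.GetArrayDataReference`. The sketch therefore builds the span from that same data reference.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

object[] names = new string[] { "a", "b", "c", "d" }; // covariant array
ReverseArraySketch(names, 0, names.Length);
Console.WriteLine(string.Join(",", names)); // d,c,b,a

// Hypothetical stand-in for Array.Reverse<T>(T[], int, int): the same argument checks
// as the existing code, but the swapping is delegated to MemoryExtensions.Reverse.
static void ReverseArraySketch<T>(T[] array, int index, int length)
{
    ArgumentNullException.ThrowIfNull(array);
    if (index < 0)
        throw new ArgumentOutOfRangeException(nameof(index), "Non-negative number required.");
    if (length < 0)
        throw new ArgumentOutOfRangeException(nameof(length), "Non-negative number required.");
    if (array.Length - index < length)
        throw new ArgumentException("Offset and length were out of bounds.");
    if (length <= 1)
        return;

    // new Span<T>(array, index, length) would throw ArrayTypeMismatchException for the
    // string[]-as-object[] case above, so build the span from the raw data reference.
    ref T first = ref Unsafe.Add(ref MemoryMarshal.GetArrayDataReference(array), index);
    MemoryMarshal.CreateSpan(ref first, length).Reverse();
}
```

Skipping the covariance check is safe here because reversing only permutes values that were already stored in the same array.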

src/libraries/System.Private.CoreLib/src/System/MemoryExtensions.cs

alexcovington · 2022-01-28T17:37:09Z

> Thanks for sharing the perf tests. The results only show down to a size of 32 elements, and all of them show improvements. Is there a smaller size at which this is actually a regression?

@stephentoub I didn't notice that the table didn't include the results for the 8-element spans. I loosened the comparer's threshold and noise filter (from 3% and 5ns to 1% and 1ns) so that more results are included:

PS C:\Users\acovingt\source\repos\performance\src\tools\ResultsComparer> dotnet run -- --base C:\Users\acovingt\Documents\vectorize-span-reverse-base\ --diff C:\Users\acovingt\Documents\vectorize-span-reverse-diff\ --threshold 1% --noise 1ns
summary:
better: 24, geomean: 3.806
worse: 4, geomean: 1.463
total diff: 28

| Slower                                      | diff/base | Base Median (ns) | Diff Median (ns) | Modality|
| ------------------------------------------- | ---------:| ----------------:| ----------------:| --------:|
| System.Memory.Span<Double>.Reverse(Size: 8) |      1.69 |             4.20 |             7.08 |         |
| System.Memory.Span<Single>.Reverse(Size: 8) |      1.55 |             4.21 |             6.53 |         |
| System.Memory.Span<Int32>.Reverse(Size: 8)  |      1.34 |             5.02 |             6.74 |         |
| System.Memory.Span<Byte>.Reverse(Size: 8)   |      1.30 |             5.24 |             6.82 |         |

| Faster                                         | base/diff | Base Median (ns) | Diff Median (ns) | Modality|
| ---------------------------------------------- | ---------:| ----------------:| ----------------:| -------- |
| System.Memory.Span<Byte>.Reverse(Size: 1024)   |     33.45 |           616.32 |            18.42 |         |
| System.Memory.Span<Byte>.Reverse(Size: 512)    |     28.16 |           306.00 |            10.87 |         |
| System.Memory.Span<Char>.Reverse(Size: 512)    |     16.26 |           300.26 |            18.47 |         |
| System.Memory.Span<Char>.Reverse(Size: 1024)   |     14.25 |           609.56 |            42.77 |         |
| System.Memory.Span<Byte>.Reverse(Size: 68)     |      5.75 |            33.59 |             5.84 |         |
| System.Memory.Span<Char>.Reverse(Size: 68)     |      5.54 |            39.56 |             7.14 | bimodal |
| System.Memory.Span<Single>.Reverse(Size: 512)  |      4.57 |           152.19 |            33.33 |         |
| System.Memory.Span<Char>.Reverse(Size: 34)     |      4.13 |            21.56 |             5.22 | bimodal |
| System.Memory.Span<Single>.Reverse(Size: 1024) |      3.93 |           303.01 |            77.20 |         |
| System.Memory.Span<Int32>.Reverse(Size: 1024)  |      3.88 |           307.61 |            79.20 |         |
| System.Memory.Span<Int32>.Reverse(Size: 512)   |      3.67 |           154.75 |            42.20 |         |
| System.Memory.Span<Byte>.Reverse(Size: 34)     |      3.03 |            15.56 |             5.14 | several?|
| System.Memory.Span<Int32>.Reverse(Size: 68)    |      2.59 |            23.28 |             8.98 |         |
| System.Memory.Span<Single>.Reverse(Size: 68)   |      2.45 |            19.77 |             8.06 |         |
| System.Memory.Span<Single>.Reverse(Size: 34)   |      2.21 |            12.41 |             5.62 | several?|
| System.Memory.Span<Int64>.Reverse(Size: 1024)  |      2.01 |           302.07 |           150.24 |         |
| System.Memory.Span<Int32>.Reverse(Size: 34)    |      1.99 |            15.11 |             7.61 |         |
| System.Memory.Span<Int64>.Reverse(Size: 512)   |      1.98 |           153.95 |            77.64 |         |
| System.Memory.Span<Double>.Reverse(Size: 512)  |      1.97 |           152.21 |            77.18 |         |
| System.Memory.Span<Double>.Reverse(Size: 1024) |      1.93 |           301.68 |           156.33 |         |
| System.Memory.Span<Int64>.Reverse(Size: 68)    |      1.91 |            23.54 |            12.35 |         |
| System.Memory.Span<Int64>.Reverse(Size: 34)    |      1.76 |            14.90 |             8.49 |         |
| System.Memory.Span<Double>.Reverse(Size: 34)   |      1.66 |            12.35 |             7.44 |         |
| System.Memory.Span<Double>.Reverse(Size: 68)   |      1.64 |            19.80 |            12.08 |         |

There's a regression of roughly 1.6 to 2.9 ns (1.30x to 1.69x) for the 8-element Byte, Int32, Single and Double spans, possibly due to the overhead of the extra conditional checks, or it could just be noise.

Let me know if you'd like more analysis on it.

alexcovington · 2022-01-28T17:41:04Z

> Array.Reverse<T> has its own almost identical implementation that won't benefit from these improvements. Can we change Array to delegate to the same underlying implementation being added here so that both arrays and spans benefit equally?

I didn't notice this, but you're right that Array can also benefit from this. I'll try adding the paths there as well and send an update.

alexcovington · 2022-01-29T00:06:25Z

@stephentoub Updated PR to include the optimizations for Array.Reverse as well.

To verify the performance, I made a quick copy of the Reverse benchmark that reverses the original array instead of a span.

Microbenchmark changes

@@ -14,11 +14,20 @@ namespace System.Memory
     [GenericTypeArguments(typeof(byte))]
     [GenericTypeArguments(typeof(char))]
     [GenericTypeArguments(typeof(int))]
+    [GenericTypeArguments(typeof(long))]
+    [GenericTypeArguments(typeof(float))]
+    [GenericTypeArguments(typeof(double))]
     [BenchmarkCategory(Categories.Runtime, Categories.Libraries, Categories.Span)]
     public class Span<T>
         where T : struct, IComparable<T>, IEquatable<T>
     {
-        [Params(Utils.DefaultCollectionSize)]
+        [Params(
+            8 /* No vectorization */,
+            34 /* SSSE3 with leftover */,
+            68 /* AVX2 path with leftover bytes */,
+            Utils.DefaultCollectionSize,
+            Utils.DefaultCollectionSize * 2
+            )]
         public int Size;

         private T[] _array, _same, _emptyWithSingleValue;
@@ -41,6 +50,9 @@ namespace System.Memory
         [Benchmark]
         public void Reverse() => new System.Span<T>(_array).Reverse();

+        [Benchmark]
+        public void ReverseArray() => _array.Reverse();
+
         [Benchmark]
         public T[] ToArray() => new System.Span<T>(_array).ToArray();
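
One thing to double-check in the `ReverseArray` benchmark above is which method `_array.Reverse()` binds to. `T[]` has no instance `Reverse()` method. `MemoryExtensions.Reverse` extends `Span<T>`, and prior to C# 14 an array receiver is not converted to a span when binding extension methods. So `_array.Reverse()` most likely resolves to LINQ's `Enumerable.Reverse`, which returns a deferred iterator and leaves the array untouched. That would be consistent with the `ReverseArray` rows below, which take a flat ~8 ns at both sizes shown (8 and 34) and allocate 48 B. A benchmark body of `Array.Reverse(_array)` would exercise the method this PR changes. The short program below illustrates the difference.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int[] data = { 1, 2, 3, 4 };

// Enumerable.Reverse only allocates a deferred iterator over the source; nothing is
// reordered unless the result is enumerated, and the array itself is never modified.
IEnumerable<int> deferred = Enumerable.Reverse(data);
Console.WriteLine(string.Join(",", data)); // 1,2,3,4

// Array.Reverse reorders the elements of the array in place.
Array.Reverse(data);
Console.WriteLine(string.Join(",", data)); // 4,3,2,1
```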

Performance results

BenchmarkDotNet=v0.13.1.1669-nightly, OS=Windows 10 (10.0.19044.1415/21H2/November2021Update)
AMD Ryzen 5 3600, 1 CPU, 12 logical and 6 physical cores
.NET SDK=7.0.100-preview.2.22078.1
  [Host]     : .NET 7.0.0 (7.0.22.7408), X64 RyuJIT
  Job-FHRFIJ : .NET 7.0.0 (42.42.42.42424), X64 RyuJIT
  Job-KGLSXA : .NET 7.0.0 (42.42.42.42424), X64 RyuJIT

PowerPlanMode=00000000-0000-0000-0000-000000000000  Arguments=/p:DebugType=portable,-bl:benchmarkdotnet.binlog  IterationTime=250.0000 ms  
MaxIterationCount=20  MinIterationCount=15  WarmupCount=1

Type	Method	Job	Toolchain	Size	Mean	Error	StdDev	Median	Min	Max	Ratio	RatioSD	Gen 0	Allocated	Alloc Ratio
Span<Byte>	Reverse	Job-FHRFIJ	\runtime-master\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	8	5.520 ns	0.3235 ns	0.3595 ns	5.408 ns	5.130 ns	6.543 ns	1.00	0.00	-	-	NA
Span<Byte>	Reverse	Job-KGLSXA	\runtime\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	8	5.266 ns	0.1361 ns	0.1456 ns	5.237 ns	5.047 ns	5.545 ns	0.96	0.07	-	-	NA

Span<Char>	Reverse	Job-FHRFIJ	\runtime-master\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	8	5.577 ns	0.2455 ns	0.2827 ns	5.525 ns	5.261 ns	6.255 ns	1.00	0.00	-	-	NA
Span<Char>	Reverse	Job-KGLSXA	\runtime\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	8	5.160 ns	0.1787 ns	0.1987 ns	5.107 ns	4.780 ns	5.524 ns	0.93	0.06	-	-	NA

Span<Double>	Reverse	Job-FHRFIJ	\runtime-master\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	8	5.019 ns	0.2419 ns	0.2689 ns	5.010 ns	4.598 ns	5.424 ns	1.00	0.00	-	-	NA
Span<Double>	Reverse	Job-KGLSXA	\runtime\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	8	3.372 ns	0.0810 ns	0.0676 ns	3.369 ns	3.237 ns	3.494 ns	0.67	0.03	-	-	NA

Span<Int32>	Reverse	Job-FHRFIJ	\runtime-master\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	8	5.367 ns	0.3346 ns	0.3853 ns	5.219 ns	4.900 ns	6.279 ns	1.00	0.00	-	-	NA
Span<Int32>	Reverse	Job-KGLSXA	\runtime\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	8	4.585 ns	0.1160 ns	0.1028 ns	4.581 ns	4.423 ns	4.781 ns	0.84	0.07	-	-	NA

Span<Int64>	Reverse	Job-FHRFIJ	\runtime-master\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	8	5.366 ns	0.1760 ns	0.1956 ns	5.313 ns	5.069 ns	5.743 ns	1.00	0.00	-	-	NA
Span<Int64>	Reverse	Job-KGLSXA	\runtime\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	8	3.281 ns	0.0995 ns	0.1065 ns	3.258 ns	3.104 ns	3.492 ns	0.61	0.03	-	-	NA

Span<Single>	Reverse	Job-FHRFIJ	\runtime-master\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	8	5.096 ns	0.3850 ns	0.4279 ns	4.928 ns	4.449 ns	6.003 ns	1.00	0.00	-	-	NA
Span<Single>	Reverse	Job-KGLSXA	\runtime\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	8	5.450 ns	0.2560 ns	0.2846 ns	5.336 ns	4.920 ns	6.008 ns	1.08	0.12	-	-	NA

Span<Byte>	ReverseArray	Job-FHRFIJ	\runtime-master\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	8	7.587 ns	0.2562 ns	0.2847 ns	7.534 ns	7.171 ns	8.182 ns	1.00	0.00	0.0057	48 B	1.00
Span<Byte>	ReverseArray	Job-KGLSXA	\runtime\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	8	8.353 ns	0.2116 ns	0.2352 ns	8.403 ns	7.905 ns	8.828 ns	1.10	0.04	0.0057	48 B	1.00

Span<Char>	ReverseArray	Job-FHRFIJ	\runtime-master\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	8	7.876 ns	0.2675 ns	0.2863 ns	7.863 ns	7.416 ns	8.510 ns	1.00	0.00	0.0057	48 B	1.00
Span<Char>	ReverseArray	Job-KGLSXA	\runtime\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	8	8.004 ns	0.4029 ns	0.4478 ns	7.962 ns	7.293 ns	8.817 ns	1.02	0.06	0.0057	48 B	1.00

Span<Double>	ReverseArray	Job-FHRFIJ	\runtime-master\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	8	8.090 ns	0.3468 ns	0.3711 ns	8.000 ns	7.534 ns	8.745 ns	1.00	0.00	0.0057	48 B	1.00
Span<Double>	ReverseArray	Job-KGLSXA	\runtime\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	8	7.939 ns	0.3305 ns	0.3806 ns	7.929 ns	7.438 ns	8.833 ns	0.99	0.08	0.0057	48 B	1.00

Span<Int32>	ReverseArray	Job-FHRFIJ	\runtime-master\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	8	8.231 ns	0.4143 ns	0.4771 ns	8.007 ns	7.617 ns	9.212 ns	1.00	0.00	0.0057	48 B	1.00
Span<Int32>	ReverseArray	Job-KGLSXA	\runtime\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	8	8.063 ns	0.3655 ns	0.4210 ns	8.143 ns	7.353 ns	9.113 ns	0.98	0.08	0.0057	48 B	1.00

Span<Int64>	ReverseArray	Job-FHRFIJ	\runtime-master\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	8	9.724 ns	0.5913 ns	0.6809 ns	9.604 ns	8.832 ns	11.157 ns	1.00	0.00	0.0057	48 B	1.00
Span<Int64>	ReverseArray	Job-KGLSXA	\runtime\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	8	7.725 ns	0.3031 ns	0.3369 ns	7.730 ns	7.026 ns	8.475 ns	0.80	0.06	0.0057	48 B	1.00

Span<Single>	ReverseArray	Job-FHRFIJ	\runtime-master\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	8	7.824 ns	0.2631 ns	0.2702 ns	7.771 ns	7.417 ns	8.564 ns	1.00	0.00	0.0057	48 B	1.00
Span<Single>	ReverseArray	Job-KGLSXA	\runtime\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	8	7.809 ns	0.3318 ns	0.3821 ns	7.761 ns	7.206 ns	8.428 ns	1.01	0.05	0.0057	48 B	1.00

Span<Byte>	Reverse	Job-FHRFIJ	\runtime-master\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	34	19.987 ns	1.4561 ns	1.6768 ns	20.243 ns	16.980 ns	22.752 ns	1.00	0.00	-	-	NA
Span<Byte>	Reverse	Job-KGLSXA	\runtime\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	34	3.772 ns	0.2250 ns	0.2592 ns	3.765 ns	3.346 ns	4.285 ns	0.19	0.01	-	-	NA

Span<Char>	Reverse	Job-FHRFIJ	\runtime-master\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	34	18.797 ns	0.3681 ns	0.3443 ns	18.846 ns	18.237 ns	19.413 ns	1.00	0.00	-	-	NA
Span<Char>	Reverse	Job-KGLSXA	\runtime\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	34	4.134 ns	0.2428 ns	0.2597 ns	4.101 ns	3.823 ns	4.735 ns	0.22	0.02	-	-	NA

Span<Double>	Reverse	Job-FHRFIJ	\runtime-master\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	34	14.286 ns	0.3942 ns	0.4540 ns	14.150 ns	13.689 ns	15.292 ns	1.00	0.00	-	-	NA
Span<Double>	Reverse	Job-KGLSXA	\runtime\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	34	7.089 ns	0.3223 ns	0.3582 ns	7.022 ns	6.581 ns	7.855 ns	0.50	0.03	-	-	NA

Span<Int32>	Reverse	Job-FHRFIJ	\runtime-master\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	34	15.881 ns	0.4342 ns	0.5000 ns	15.744 ns	15.185 ns	16.850 ns	1.00	0.00	-	-	NA
Span<Int32>	Reverse	Job-KGLSXA	\runtime\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	34	4.948 ns	0.2778 ns	0.3199 ns	4.954 ns	4.458 ns	5.643 ns	0.31	0.02	-	-	NA

Span<Int64>	Reverse	Job-FHRFIJ	\runtime-master\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	34	15.256 ns	0.3355 ns	0.3729 ns	15.151 ns	14.861 ns	16.243 ns	1.00	0.00	-	-	NA
Span<Int64>	Reverse	Job-KGLSXA	\runtime\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	34	6.799 ns	0.2343 ns	0.2698 ns	6.805 ns	6.303 ns	7.297 ns	0.45	0.02	-	-	NA

Span<Single>	Reverse	Job-FHRFIJ	\runtime-master\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	34	14.213 ns	0.4007 ns	0.4453 ns	14.075 ns	13.660 ns	15.193 ns	1.00	0.00	-	-	NA
Span<Single>	Reverse	Job-KGLSXA	\runtime\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	34	4.810 ns	0.2720 ns	0.2910 ns	4.724 ns	4.462 ns	5.345 ns	0.34	0.02	-	-	NA

Span<Byte>	ReverseArray	Job-FHRFIJ	\runtime-master\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	34	8.077 ns	0.6692 ns	0.7706 ns	8.044 ns	7.043 ns	9.612 ns	1.00	0.00	0.0057	48 B	1.00
Span<Byte>	ReverseArray	Job-KGLSXA	\runtime\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	34	7.788 ns	0.4962 ns	0.5714 ns	7.681 ns	7.107 ns	8.911 ns	0.97	0.08	0.0057	48 B	1.00

Span<Char>	ReverseArray	Job-FHRFIJ	\runtime-master\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	34	7.844 ns	0.5070 ns	0.5839 ns	7.713 ns	7.224 ns	9.234 ns	1.00	0.00	0.0057	48 B	1.00
Span<Char>	ReverseArray	Job-KGLSXA	\runtime\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	34	9.578 ns	1.5863 ns	1.8268 ns	8.997 ns	7.225 ns	13.210 ns	1.23	0.25	0.0057	48 B	1.00

Span<Double>	ReverseArray	Job-FHRFIJ	\runtime-master\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	34	7.779 ns	0.1663 ns	0.1474 ns	7.751 ns	7.554 ns	8.004 ns	1.00	0.00	0.0057	48 B	1.00
Span<Double>	ReverseArray	Job-KGLSXA	\runtime\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	34	7.673 ns	0.2380 ns	0.2646 ns	7.632 ns	7.274 ns	8.135 ns	0.98	0.04	0.0057	48 B	1.00

Span<Int32>	ReverseArray	Job-FHRFIJ	\runtime-master\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	34	7.794 ns	0.2142 ns	0.2381 ns	7.749 ns	7.446 ns	8.395 ns	1.00	0.00	0.0057	48 B	1.00
Span<Int32>	ReverseArray	Job-KGLSXA	\runtime\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	34	7.672 ns	0.3186 ns	0.3541 ns	7.593 ns	7.256 ns	8.481 ns	0.98	0.04	0.0057	48 B	1.00

Span<Int64>	ReverseArray	Job-FHRFIJ	\runtime-master\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	34	9.618 ns	0.3395 ns	0.3909 ns	9.571 ns	9.099 ns	10.511 ns	1.00	0.00	0.0057	48 B	1.00
Span<Int64>	ReverseArray	Job-KGLSXA	\runtime\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	34	7.367 ns	0.3446 ns	0.3538 ns	7.273 ns	6.963 ns	8.370 ns	0.76	0.06	0.0057	48 B	1.00

Span<Single>	ReverseArray	Job-FHRFIJ	\runtime-master\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	34	7.862 ns	0.3622 ns	0.4171 ns	7.650 ns	7.316 ns	8.645 ns	1.00	0.00	0.0057	48 B	1.00
Span<Single>	ReverseArray	Job-KGLSXA	\runtime\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	34	7.850 ns	0.3969 ns	0.4411 ns	7.840 ns	7.281 ns	8.807 ns	1.00	0.09	0.0057	48 B	1.00

Span<Byte>	Reverse	Job-FHRFIJ	\runtime-master\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	68	34.032 ns	1.1898 ns	1.2731 ns	33.821 ns	32.554 ns	36.661 ns	1.00	0.00	-	-	NA
Span<Byte>	Reverse	Job-KGLSXA	\runtime\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	68	4.381 ns	0.1225 ns	0.1311 ns	4.425 ns	4.091 ns	4.608 ns	0.13	0.01	-	-	NA

Span<Char>	Reverse	Job-FHRFIJ	\runtime-master\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	68	34.295 ns	0.7447 ns	0.6966 ns	34.213 ns	33.174 ns	35.414 ns	1.00	0.00	-	-	NA
Span<Char>	Reverse	Job-KGLSXA	\runtime\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	68	5.452 ns	0.2268 ns	0.2611 ns	5.440 ns	5.078 ns	5.893 ns	0.16	0.01	-	-	NA

Span<Double>	Reverse	Job-FHRFIJ	\runtime-master\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	68	24.392 ns	1.0467 ns	1.0748 ns	24.309 ns	23.170 ns	27.153 ns	1.00	0.00	-	-	NA
Span<Double>	Reverse	Job-KGLSXA	\runtime\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	68	15.108 ns	0.2478 ns	0.2069 ns	15.092 ns	14.856 ns	15.580 ns	0.62	0.03	-	-	NA

Span<Int32>	Reverse	Job-FHRFIJ	\runtime-master\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	68	24.522 ns	0.7755 ns	0.8930 ns	24.090 ns	23.431 ns	26.373 ns	1.00	0.00	-	-	NA
Span<Int32>	Reverse	Job-KGLSXA	\runtime\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	68	7.718 ns	0.2908 ns	0.3349 ns	7.766 ns	7.222 ns	8.354 ns	0.32	0.02	-	-	NA

Span<Int64>	Reverse	Job-FHRFIJ	\runtime-master\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	68	24.862 ns	0.7633 ns	0.8484 ns	24.687 ns	23.698 ns	26.954 ns	1.00	0.00	-	-	NA
Span<Int64>	Reverse	Job-KGLSXA	\runtime\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	68	10.736 ns	0.4633 ns	0.5149 ns	10.555 ns	10.111 ns	11.744 ns	0.43	0.03	-	-	NA

Span<Single>	Reverse	Job-FHRFIJ	\runtime-master\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	68	24.264 ns	1.1811 ns	1.3602 ns	23.699 ns	22.872 ns	26.753 ns	1.00	0.00	-	-	NA
Span<Single>	Reverse	Job-KGLSXA	\runtime\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	68	8.674 ns	0.2843 ns	0.3160 ns	8.662 ns	8.160 ns	9.195 ns	0.36	0.02	-	-	NA

Span<Byte>	ReverseArray	Job-FHRFIJ	\runtime-master\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	68	7.554 ns	0.2177 ns	0.2507 ns	7.497 ns	7.024 ns	8.018 ns	1.00	0.00	0.0057	48 B	1.00
Span<Byte>	ReverseArray	Job-KGLSXA	\runtime\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	68	7.463 ns	0.2131 ns	0.2369 ns	7.450 ns	7.114 ns	7.988 ns	0.99	0.05	0.0057	48 B	1.00

Span<Char>	ReverseArray	Job-FHRFIJ	\runtime-master\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	68	7.896 ns	0.3654 ns	0.3909 ns	7.884 ns	7.302 ns	8.653 ns	1.00	0.00	0.0057	48 B	1.00
Span<Char>	ReverseArray	Job-KGLSXA	\runtime\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	68	7.856 ns	0.4322 ns	0.4978 ns	7.765 ns	7.275 ns	9.036 ns	0.99	0.06	0.0057	48 B	1.00

Span<Double>	ReverseArray	Job-FHRFIJ	\runtime-master\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	68	8.021 ns	0.3683 ns	0.4242 ns	8.009 ns	7.412 ns	8.776 ns	1.00	0.00	0.0057	48 B	1.00
Span<Double>	ReverseArray	Job-KGLSXA	\runtime\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	68	7.520 ns	0.1368 ns	0.1142 ns	7.543 ns	7.334 ns	7.714 ns	0.94	0.05	0.0057	48 B	1.00

Span<Int32>	ReverseArray	Job-FHRFIJ	\runtime-master\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	68	7.878 ns	0.4089 ns	0.4709 ns	7.818 ns	7.351 ns	8.969 ns	1.00	0.00	0.0057	48 B	1.00
Span<Int32>	ReverseArray	Job-KGLSXA	\runtime\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	68	7.759 ns	0.3057 ns	0.3398 ns	7.656 ns	7.306 ns	8.605 ns	0.99	0.06	0.0057	48 B	1.00

Span<Int64>	ReverseArray	Job-FHRFIJ	\runtime-master\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	68	9.062 ns	0.2920 ns	0.3362 ns	9.197 ns	8.153 ns	9.445 ns	1.00	0.00	0.0057	48 B	1.00
Span<Int64>	ReverseArray	Job-KGLSXA	\runtime\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	68	7.693 ns	0.4299 ns	0.4951 ns	7.554 ns	7.178 ns	9.011 ns	0.85	0.05	0.0057	48 B	1.00

Span<Single>	ReverseArray	Job-FHRFIJ	\runtime-master\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	68	7.918 ns	0.2114 ns	0.2262 ns	7.910 ns	7.506 ns	8.302 ns	1.00	0.00	0.0057	48 B	1.00
Span<Single>	ReverseArray	Job-KGLSXA	\runtime\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	68	7.989 ns	0.4591 ns	0.5287 ns	7.913 ns	7.267 ns	9.029 ns	1.01	0.08	0.0057	48 B	1.00

Span<Byte>	Reverse	Job-FHRFIJ	\runtime-master\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	512	308.148 ns	1.9129 ns	1.7894 ns	307.750 ns	305.312 ns	311.041 ns	1.00	0.00	-	-	NA
Span<Byte>	Reverse	Job-KGLSXA	\runtime\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	512	9.791 ns	0.5977 ns	0.6883 ns	9.736 ns	8.779 ns	11.055 ns	0.03	0.00	-	-	NA

Span<Char>	Reverse	Job-FHRFIJ	\runtime-master\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	512	308.872 ns	4.4762 ns	4.1870 ns	307.695 ns	303.549 ns	317.944 ns	1.00	0.00	-	-	NA
Span<Char>	Reverse	Job-KGLSXA	\runtime\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	512	24.216 ns	0.4776 ns	0.5110 ns	24.171 ns	23.394 ns	25.113 ns	0.08	0.00	-	-	NA

Span<Double>	Reverse	Job-FHRFIJ	\runtime-master\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	512	167.078 ns	6.1122 ns	6.7936 ns	167.062 ns	155.270 ns	177.899 ns	1.00	0.00	-	-	NA
Span<Double>	Reverse	Job-KGLSXA	\runtime\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	512	79.162 ns	1.4420 ns	1.2783 ns	79.150 ns	76.854 ns	81.755 ns	0.47	0.02	-	-	NA

Span<Int32>	Reverse	Job-FHRFIJ	\runtime-master\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	512	159.908 ns	3.2113 ns	3.2978 ns	159.484 ns	154.711 ns	165.094 ns	1.00	0.00	-	-	NA
Span<Int32>	Reverse	Job-KGLSXA	\runtime\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	512	42.780 ns	1.8013 ns	2.0744 ns	42.565 ns	37.955 ns	46.418 ns	0.27	0.02	-	-	NA

Span<Int64>	Reverse	Job-FHRFIJ	\runtime-master\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	512	159.236 ns	3.6277 ns	3.8816 ns	159.271 ns	152.683 ns	166.493 ns	1.00	0.00	-	-	NA
Span<Int64>	Reverse	Job-KGLSXA	\runtime\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	512	74.144 ns	2.0413 ns	2.3508 ns	73.866 ns	70.691 ns	78.616 ns	0.46	0.02	-	-	NA

Span<Single>	Reverse	Job-FHRFIJ	\runtime-master\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	512	166.105 ns	6.8180 ns	7.8516 ns	163.366 ns	157.062 ns	182.688 ns	1.00	0.00	-	-	NA
Span<Single>	Reverse	Job-KGLSXA	\runtime\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	512	41.930 ns	1.2742 ns	1.4674 ns	41.705 ns	39.787 ns	44.695 ns	0.25	0.01	-	-	NA

Span<Byte>	ReverseArray	Job-FHRFIJ	\runtime-master\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	512	7.418 ns	0.2660 ns	0.2846 ns	7.373 ns	6.995 ns	8.044 ns	1.00	0.00	0.0057	48 B	1.00
Span<Byte>	ReverseArray	Job-KGLSXA	\runtime\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	512	7.452 ns	0.3129 ns	0.3477 ns	7.431 ns	7.026 ns	8.084 ns	1.00	0.07	0.0057	48 B	1.00

Span<Char>	ReverseArray	Job-FHRFIJ	\runtime-master\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	512	7.895 ns	0.2964 ns	0.3294 ns	7.934 ns	7.330 ns	8.437 ns	1.00	0.00	0.0057	48 B	1.00
Span<Char>	ReverseArray	Job-KGLSXA	\runtime\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	512	7.862 ns	0.2800 ns	0.3224 ns	7.770 ns	7.219 ns	8.578 ns	1.00	0.07	0.0057	48 B	1.00

Span<Double>	ReverseArray	Job-FHRFIJ	\runtime-master\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	512	8.273 ns	0.3909 ns	0.4501 ns	8.225 ns	7.613 ns	9.046 ns	1.00	0.00	0.0057	48 B	1.00
Span<Double>	ReverseArray	Job-KGLSXA	\runtime\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	512	8.043 ns	0.3316 ns	0.3686 ns	7.953 ns	7.540 ns	8.741 ns	0.98	0.07	0.0057	48 B	1.00

Span<Int32>	ReverseArray	Job-FHRFIJ	\runtime-master\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	512	7.767 ns	0.2597 ns	0.2887 ns	7.772 ns	7.318 ns	8.222 ns	1.00	0.00	0.0057	48 B	1.00
Span<Int32>	ReverseArray	Job-KGLSXA	\runtime\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	512	7.767 ns	0.2868 ns	0.3069 ns	7.805 ns	7.165 ns	8.346 ns	1.00	0.07	0.0057	48 B	1.00

Span<Int64>	ReverseArray	Job-FHRFIJ	\runtime-master\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	512	9.678 ns	0.5640 ns	0.6269 ns	9.615 ns	8.812 ns	11.005 ns	1.00	0.00	0.0057	48 B	1.00
Span<Int64>	ReverseArray	Job-KGLSXA	\runtime\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	512	7.483 ns	0.3955 ns	0.4396 ns	7.413 ns	6.938 ns	8.390 ns	0.78	0.07	0.0057	48 B	1.00

Span<Single>	ReverseArray	Job-FHRFIJ	\runtime-master\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	512	8.161 ns	0.3820 ns	0.4087 ns	8.140 ns	7.579 ns	8.966 ns	1.00	0.00	0.0057	48 B	1.00
Span<Single>	ReverseArray	Job-KGLSXA	\runtime\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	512	7.750 ns	0.2831 ns	0.3147 ns	7.681 ns	7.376 ns	8.542 ns	0.95	0.06	0.0057	48 B	1.00

Span<Byte>	Reverse	Job-FHRFIJ	\runtime-master\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	1024	633.223 ns	8.3046 ns	7.7682 ns	632.430 ns	620.162 ns	647.521 ns	1.00	0.00	-	-	NA
Span<Byte>	Reverse	Job-KGLSXA	\runtime\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	1024	21.942 ns	2.2872 ns	2.6340 ns	23.206 ns	17.887 ns	26.030 ns	0.03	0.00	-	-	NA

Span<Char>	Reverse	Job-FHRFIJ	\runtime-master\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	1024	616.793 ns	10.7145 ns	10.0224 ns	615.495 ns	603.050 ns	639.831 ns	1.00	0.00	-	-	NA
Span<Char>	Reverse	Job-KGLSXA	\runtime\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	1024	41.262 ns	1.0656 ns	1.1845 ns	41.207 ns	39.861 ns	44.710 ns	0.07	0.00	-	-	NA

Span<Double>	Reverse	Job-FHRFIJ	\runtime-master\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	1024	322.755 ns	13.7626 ns	15.8490 ns	316.026 ns	300.807 ns	353.163 ns	1.00	0.00	-	-	NA
Span<Double>	Reverse	Job-KGLSXA	\runtime\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	1024	160.278 ns	3.6299 ns	4.1802 ns	160.077 ns	155.045 ns	170.722 ns	0.50	0.03	-	-	NA

Span<Int32>	Reverse	Job-FHRFIJ	\runtime-master\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	1024	319.603 ns	8.6402 ns	9.9500 ns	321.806 ns	302.775 ns	333.367 ns	1.00	0.00	-	-	NA
Span<Int32>	Reverse	Job-KGLSXA	\runtime\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	1024	75.303 ns	1.8537 ns	2.1347 ns	75.466 ns	71.836 ns	79.226 ns	0.24	0.01	-	-	NA

Span<Int64>	Reverse	Job-FHRFIJ	\runtime-master\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	1024	312.095 ns	7.5790 ns	8.7280 ns	308.705 ns	302.279 ns	331.895 ns	1.00	0.00	-	-	NA
Span<Int64>	Reverse	Job-KGLSXA	\runtime\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	1024	162.231 ns	3.8566 ns	4.2866 ns	162.476 ns	155.171 ns	170.240 ns	0.52	0.02	-	-	NA

Span<Single>	Reverse	Job-FHRFIJ	\runtime-master\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	1024	320.737 ns	10.5037 ns	11.6748 ns	321.167 ns	302.824 ns	345.795 ns	1.00	0.00	-	-	NA
Span<Single>	Reverse	Job-KGLSXA	\runtime\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	1024	79.105 ns	1.5449 ns	1.4451 ns	79.080 ns	76.844 ns	81.631 ns	0.25	0.01	-	-	NA

Span<Byte>	ReverseArray	Job-FHRFIJ	\runtime-master\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	1024	7.848 ns	0.6288 ns	0.7241 ns	7.610 ns	6.913 ns	9.330 ns	1.00	0.00	0.0057	48 B	1.00
Span<Byte>	ReverseArray	Job-KGLSXA	\runtime\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	1024	7.390 ns	0.2733 ns	0.3148 ns	7.373 ns	6.872 ns	8.104 ns	0.95	0.09	0.0057	48 B	1.00

Span<Char>	ReverseArray	Job-FHRFIJ	\runtime-master\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	1024	8.250 ns	0.4993 ns	0.5750 ns	8.298 ns	7.401 ns	9.308 ns	1.00	0.00	0.0057	48 B	1.00
Span<Char>	ReverseArray	Job-KGLSXA	\runtime\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	1024	7.871 ns	0.2963 ns	0.3412 ns	7.870 ns	7.388 ns	8.526 ns	0.96	0.07	0.0057	48 B	1.00

Span<Double>	ReverseArray	Job-FHRFIJ	\runtime-master\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	1024	7.799 ns	0.3453 ns	0.3837 ns	7.745 ns	7.263 ns	8.400 ns	1.00	0.00	0.0057	48 B	1.00
Span<Double>	ReverseArray	Job-KGLSXA	\runtime\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	1024	7.917 ns	0.4342 ns	0.5000 ns	7.966 ns	7.250 ns	8.866 ns	1.02	0.09	0.0057	48 B	1.00

Span<Int32>	ReverseArray	Job-FHRFIJ	\runtime-master\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	1024	8.124 ns	0.6008 ns	0.6919 ns	8.000 ns	7.292 ns	9.662 ns	1.00	0.00	0.0057	48 B	1.00
Span<Int32>	ReverseArray	Job-KGLSXA	\runtime\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	1024	7.962 ns	0.4545 ns	0.5052 ns	7.826 ns	7.385 ns	9.180 ns	0.98	0.08	0.0057	48 B	1.00

Span<Int64>	ReverseArray	Job-FHRFIJ	\runtime-master\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	1024	9.337 ns	0.3945 ns	0.4385 ns	9.382 ns	8.083 ns	10.056 ns	1.00	0.00	0.0057	48 B	1.00
Span<Int64>	ReverseArray	Job-KGLSXA	\runtime\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	1024	7.301 ns	0.2937 ns	0.3382 ns	7.216 ns	6.798 ns	8.042 ns	0.78	0.06	0.0057	48 B	1.00

Span<Single>	ReverseArray	Job-FHRFIJ	\runtime-master\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	1024	10.709 ns	0.6433 ns	0.7150 ns	10.478 ns	9.719 ns	12.393 ns	1.00	0.00	0.0057	48 B	1.00
Span<Single>	ReverseArray	Job-KGLSXA	\runtime\artifacts\bin\testhost\net7.0-windows-Release-x64\shared\Microsoft.NETCore.App\7.0.0\corerun.exe	1024	7.895 ns	0.4347 ns	0.4831 ns	7.682 ns	7.318 ns	8.894 ns	0.74	0.05	0.0057	48 B	1.00

Please let me know if there is anything else I can look at.

stephentoub · 2022-02-01T03:20:27Z

src/libraries/System.Private.CoreLib/src/System/Array.cs

+                    return;
+                }
+            }
+            ReverseInner(array, index, length);


Do we need to duplicate all of the above? How about instead having a single method in SpanHelpers:

public static void Reverse<T>(ref T elements, nuint length);

or something similar. That method can do all of the delegation to the other helpers (ReverseByRef, ReverseInner, etc.), and then this method can be:

[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static void Reverse<T>(T[] array, int index, int length)
{
    // ... argument validation
    SpanHelpers.Reverse(ref Unsafe.Add(ref MemoryMarshal.GetArrayDataReference(array), index), (nuint)length);
}

and Span.Reverse can be:

[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static void Reverse<T>(this Span<T> span) => SpanHelpers.Reverse(ref MemoryMarshal.GetReference(span), (nuint)span.Length);

?

I see, yes I agree that would be a lot cleaner! I'll go ahead and make the changes and push an update.
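A minimal sketch of the single-entry-point shape suggested above, assuming .NET 7 (the PR's target). `ReverseDispatch`, `ReverseScalar`, `ReverseSpan` and `ReverseArray` are hypothetical names, since the real `SpanHelpers` type is internal to CoreLib. The size-specific branches call a scalar helper in places where the PR would call its vectorized byte, char, int and long paths. The sketch shows the delegation structure only and does not reproduce the merged code.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

// Hypothetical single entry point modeled on the suggested
// SpanHelpers.Reverse<T>(ref T elements, nuint length).
static void ReverseDispatch<T>(ref T elements, nuint length)
{
    if (length <= 1)
        return;

    // Only types without GC references may be reinterpreted as a primitive
    // of the same size; everything else takes the generic path.
    if (!RuntimeHelpers.IsReferenceOrContainsReferences<T>())
    {
        switch (Unsafe.SizeOf<T>())
        {
            // A real implementation would call vectorized helpers here;
            // the scalar helper stands in for them in this sketch.
            case sizeof(byte):
                ReverseScalar(ref Unsafe.As<T, byte>(ref elements), length);
                return;
            case sizeof(char):
                ReverseScalar(ref Unsafe.As<T, char>(ref elements), length);
                return;
            case sizeof(int):
                ReverseScalar(ref Unsafe.As<T, int>(ref elements), length);
                return;
            case sizeof(long):
                ReverseScalar(ref Unsafe.As<T, long>(ref elements), length);
                return;
        }
    }

    ReverseScalar(ref elements, length);
}

// Generic fallback: swap from both ends toward the middle with an explicit temp.
static void ReverseScalar<TElem>(ref TElem first, nuint length)
{
    ref TElem lo = ref first;
    ref TElem hi = ref Unsafe.Add(ref first, (nint)(length - 1));
    while (Unsafe.IsAddressLessThan(ref lo, ref hi))
    {
        TElem temp = hi;
        hi = lo;
        lo = temp;
        lo = ref Unsafe.Add(ref lo, 1);
        hi = ref Unsafe.Subtract(ref hi, 1);
    }
}

// Hypothetical wrappers mirroring the suggested Span and Array entry points.
static void ReverseSpan<T>(Span<T> span) =>
    ReverseDispatch(ref MemoryMarshal.GetReference(span), (nuint)span.Length);

static void ReverseArray<T>(T[] array, int index, int length)
{
    if (array is null)
        throw new ArgumentNullException(nameof(array));
    if ((uint)index > (uint)array.Length)
        throw new ArgumentOutOfRangeException(nameof(index));
    if ((uint)length > (uint)(array.Length - index))
        throw new ArgumentOutOfRangeException(nameof(length));

    ReverseDispatch(ref Unsafe.Add(ref MemoryMarshal.GetArrayDataReference(array), index), (nuint)length);
}
```

Because the `IsReferenceOrContainsReferences` check lives in the one shared method, both wrappers inherit the fallback for reference types and other unsupported element types. That shared fallback is exactly the duplication the review comment was trying to remove.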

stephentoub · 2022-02-08T03:20:02Z

src/libraries/System.Private.CoreLib/src/System/SpanHelpers.Byte.cs

+                    Unsafe.As<byte, Vector256<byte>>(ref last) = tempFirst;
+                    first = ref Unsafe.Add(ref first, Vector256<byte>.Count);
+                    last = ref Unsafe.Add(ref last, -Vector256<byte>.Count);
+                    numBytesWritten += Vector256<byte>.Count * 2;


Comments describing what this dense block of code is doing would be helpful.

I've added comments to hopefully clear up what the operation is. Please let me know if I can clarify anything.
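The quoted lines come from the 256-bit (Vector256) path. The sketch below is a simplified illustration of the same idea at 128-bit width. It reverses a byte span by loading mirrored 16-byte blocks from both ends, reversing each block in-register with `Ssse3.Shuffle` and the mask 15, 14, ..., 0, and storing each block at the opposite end. `ReverseBytesSsse3` is a hypothetical name, and this is not the merged implementation. It assumes .NET 7; on hardware without SSSE3 only the scalar loop runs.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

// Sketch: reverse a byte span by swapping mirrored 16-byte blocks.
static void ReverseBytesSsse3(Span<byte> span)
{
    int length = span.Length;
    int blockPairs = 0;

    if (Ssse3.IsSupported)
    {
        ref byte start = ref MemoryMarshal.GetReference(span);

        // Shuffle control: output byte i takes input byte (15 - i).
        Vector128<byte> reverseMask = Vector128.Create(
            (byte)15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);

        // Number of non-overlapping (front, back) 16-byte block pairs.
        blockPairs = length / (2 * Vector128<byte>.Count);
        for (int p = 0; p < blockPairs; p++)
        {
            int frontOffset = p * Vector128<byte>.Count;
            int backOffset = length - (p + 1) * Vector128<byte>.Count;

            Vector128<byte> front = Unsafe.ReadUnaligned<Vector128<byte>>(ref Unsafe.Add(ref start, frontOffset));
            Vector128<byte> back = Unsafe.ReadUnaligned<Vector128<byte>>(ref Unsafe.Add(ref start, backOffset));

            // Reverse the bytes inside each block, then store each block at the other end.
            Unsafe.WriteUnaligned(ref Unsafe.Add(ref start, frontOffset), Ssse3.Shuffle(back, reverseMask));
            Unsafe.WriteUnaligned(ref Unsafe.Add(ref start, backOffset), Ssse3.Shuffle(front, reverseMask));
        }
    }

    // Scalar pass over the middle the vector loop did not cover
    // (or over everything when SSSE3 is unavailable).
    int done = blockPairs * Vector128<byte>.Count;
    for (int i = done, j = length - done - 1; i < j; i++, j--)
    {
        byte temp = span[i];
        span[i] = span[j];
        span[j] = temp;
    }
}
```

A 256-bit version needs an extra cross-lane step, because the byte shuffle on `Vector256<byte>` only moves bytes within each 128-bit half. That extra permute is the cost weighed in the AVX2-versus-SSSE3 question later in the thread.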

stephentoub

Thanks for working on this. Other than my remaining comments, this LGTM, but @tannergooding should sign off as well.

deeprobin

Do not be alarmed. I just noted a few coding style things.

For the places where I changed the naming from CamelCase to lowerCamelCase, please double-check that the usages were also updated, rather than just applying the suggested diff 1:1.

Otherwise, it looks good to me at first glance.

saucecontrol · 2022-03-10T21:36:02Z

Did you benchmark the AVX2 version against the SSSE3 version (run with DOTNET_EnableAVX2=0 to allow VEX encoding but force the SSSE3 fallback)? I would expect that with the additional permutes required for 256-bit vectors, the perf difference might not justify the extra code paths.

alexcovington · 2022-03-18T23:34:04Z

@saucecontrol Haven't tried yet, I'll give it a shot and update this thread with results once finished.
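When benchmarking with DOTNET_EnableAVX2=0, it helps to confirm which path the process will actually take. The sketch below uses a hypothetical helper, `SelectReversePath`, and assumes the AVX2, then SSSE3, then scalar ordering described in the PR. It prints the path that IsSupported-gated code would choose. As far as I know, the runtime reports `Avx2.IsSupported` as false when that variable is set, but please verify this on your own setup.

```csharp
using System;
using System.Runtime.Intrinsics.X86;

// Hypothetical helper: which path an IsSupported-gated Reverse would take in this process.
static string SelectReversePath() =>
    Avx2.IsSupported ? "AVX2"
    : Ssse3.IsSupported ? "SSSE3"
    : "scalar";

Console.WriteLine($"Reverse path in this process: {SelectReversePath()}");
```

On AVX2-capable x64 hardware, a normal run should print AVX2, and a run with DOTNET_EnableAVX2=0 in the environment should print SSSE3.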

jkotas · 2022-04-25T16:08:49Z

src/libraries/System.Private.CoreLib/src/System/SpanHelpers.cs

+            ref T last = ref Unsafe.Subtract(ref Unsafe.Add(ref first, (int)length), 1);
+            do
+            {
+                (last, first) = (first, last);


This trick is neat, but it generates worse IL:

https://sharplab.io/#v2:EYLgtghglgdgNAFxFANgHwAIBYAEAVAUwGcEBGAHjwD4AKAJwIDN8dGo6S4cHm8cUIJAJQBYAFABvcThksEBMAAccAXlbsSAbmmy2HBKv6CE2sbKMlD8paYC+48dnzEEAJkq0eLPZ25MWAsLiUmayNIEIXD4IQoY00VwRQnZAA==

Notice that there is one local temp for the existing code and two local temps with the tuples trick.

Is the JIT able to optimize out the extra local temp in all cases, even for larger structs? We do not seem to have coverage for larger structs in dotnet/performance.

The tuples trick was mostly just for styling, but if explicitly using the temp generates better IL then we should probably go with that. I've updated the PR.
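The sketch below makes the two swap shapes discussed above concrete, using hypothetical helper names. It contains both forms plus a scalar reverse loop that uses the explicit temporary. It shows only the source shapes being compared and makes no claim about the exact IL or machine code either form produces.

```csharp
using System;

// Swap via tuple deconstruction: the form the review found emits an extra IL local.
static void SwapWithTuple<T>(ref T first, ref T last) => (last, first) = (first, last);

// Swap via an explicit temporary: the form the PR switched to.
static void SwapWithTemp<T>(ref T first, ref T last)
{
    T temp = last;
    last = first;
    first = temp;
}

// Scalar reverse in the style of the generic fallback, using the explicit-temp swap.
static void ReverseFallback<T>(Span<T> span)
{
    for (int i = 0, j = span.Length - 1; i < j; i++, j--)
        SwapWithTemp(ref span[i], ref span[j]);
}
```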

This is probably a case where either Roslyn or the JIT could be updated.

Edit: I missed that the other JIT example above was for byref, which is probably something only Roslyn can fix...

At a high level...

The (last, first) = (first, last) generates:

IL_0000: ldarg.1
IL_0001: ldarg.2
IL_0002: stloc.0
IL_0003: starg.s last
IL_0005: ldloc.0
IL_0006: starg.s first
IL_0008: ret

The var temp = last; last = first; first = temp; generates:

IL_0000: ldarg.2
IL_0001: ldarg.1
IL_0002: starg.s last
IL_0004: starg.s first
IL_0006: ret

Why can't the JIT optimize this to the same code? All this code does is shuffle values around. There is no memory or computation involved.

I filed an issue in dotnet/roslyn: dotnet/roslyn#61127

It seems that the JIT optimizes it for int:
https://sharplab.io/#v2:EYLgtghglgdgNAFxBAzmAPgAQAwAIDKAFhAE4AOAMhMAHQBKArjAlGAKYDcAsAFC8DaAKSgIA4mxhsSUAMYAKBAE8ybAPYAzObAQBKHQF1emACy4AKmxQIAjAB4zAPjkk2687nVQSVuLhduzXAAbVF1eAG9eXGj3BDYwMlwAXg8vK24eGNTvBGTg0IyskKs8uISMgF9eAWExCSlZBWU1TW09Qx4Tc0sEACZ7J393TxzfIcDisJ5IzJi5Sd8Rqx08uSWEX0mdSqA=
but not for other types: https://sharplab.io/#v2:EYLgtghglgdgNAFxBAzmAPgAQAwAIDKAFhAE4AOAMhMAHQBKArjAlGAKYDcAsAFC8DaAKSgIA4mxhsSUAMYAKBAE8ybAPYAzOZgCMNACIQEbACqs2ASnMBdXpgAsuY2xQJtAHmMA+OSTbrHuOpQJC5wuL7+xrgANqgI5rwA3ry4qQFGYGS4ALyBwS7cPGl5IQg5MXGFxbEu5RlkhQC+vALCYhJSsgrKapo6+oYmZpY2PPaOzggATB7eEQFBpWHzUTXxSSlpcmthiy7m5XJ7CGFr5k1AA

Likely because it can't "see" that the larger struct case is the same pattern as the smaller one. There's also the question of whether it's a worthwhile pattern to spend time trying to recognise.

adamsitnik · 2022-04-25T18:23:11Z

@alexcovington I've merged #68493, could you please sync your branch?

alexcovington · 2022-04-25T18:54:16Z

@adamsitnik I've synced the PR to include your updated tests, please let me know if there is anything else I can look at.

adamsitnik

LGTM, thank you @alexcovington !

dakersnar · 2022-06-08T19:49:02Z

The Preview 5 perf report detected significant improvements from this change across a variety of benchmarks, most notably:

System.Memory.Span<Byte>.Reverse(Size: 512), System.Memory.Span<Char>.Reverse(Size: 512), System.Memory.Span<Int32>.Reverse(Size: 512), System.Tests.Perf_Array.Reverse

I've included the raw data below in "details". Notably, while the x64 configurations sped up, we do see some slowdowns on Arm64 configurations. This is being tracked here: #68667

I also see some speedup in System.Memory.Span<Byte>.Fill(Size: 512), System.Memory.Span<Byte>.Clear(Size: 512). Do you know if these are related?

System.Memory.Span.Reverse(Size: 512)

Result	Ratio	Operating System	Bit	Processor Name
Slower	0.74	debian 11	Arm64	Unknown processor
Slower	0.80	ubuntu 18.04	Arm64	Unknown processor
Slower	0.71	ubuntu 20.04	Arm64	Unknown processor
Slower	0.74	Windows 11	Arm64	Microsoft SQ1 3.0 GHz
Slower	0.89	macOS Monterey 12.3	Arm64	Apple M1 Max
Faster	11.72	Windows 10	X64	Intel Core i7-6700 CPU 3.40GHz (Skylake)
Faster	10.60	Windows 10	X64	Intel Core i7-8650U CPU 1.90GHz (Kaby Lake R)
Faster	11.56	Windows 10	X64	Intel Core i9-10900K CPU 3.70GHz
Faster	31.21	Windows 11	X64	AMD Ryzen 9 5900X
Faster	31.78	Windows 11	X64	AMD Ryzen 9 5950X
Faster	11.16	Windows 11	X64	Intel Core i7-8700 CPU 3.20GHz (Coffee Lake)
Faster	12.09	Windows 11	X64	11th Gen Intel Core i9-11900H 2.50GHz
Faster	12.43	Windows 11	X64	Intel Core i9-9900T CPU 2.10GHz
Faster	9.05	ubuntu 18.04	X64	Intel Xeon CPU E5530 2.40GHz
Faster	7.64	ubuntu 18.04	X64	Intel Core i7-2720QM CPU 2.20GHz (Sandy Bridge)
Faster	12.95	ubuntu 20.04	X64	Intel Core i7-8700 CPU 3.20GHz (Coffee Lake)
Faster	9.42	Windows 10	X86	Intel Core i7-6700 CPU 3.40GHz (Skylake)
Faster	12.58	macOS Big Sur 11.6.6	X64	Intel Core i5-4278U CPU 2.60GHz (Haswell)

System.Tests.Perf_Array.Reverse

Result	Ratio	Operating System	Bit	Processor Name
Slower	0.60	debian 11	Arm64	Unknown processor
Slower	0.68	ubuntu 18.04	Arm64	Unknown processor
Slower	0.58	ubuntu 20.04	Arm64	Unknown processor
Slower	0.60	Windows 11	Arm64	Microsoft SQ1 3.0 GHz
Slower	0.75	macOS Monterey 12.3	Arm64	Apple M1 Max
Faster	4.76	Windows 10	X64	Intel Core i7-6700 CPU 3.40GHz (Skylake)
Faster	3.20	Windows 10	X64	Intel Core i7-8650U CPU 1.90GHz (Kaby Lake R)
Faster	4.47	Windows 10	X64	Intel Core i9-10900K CPU 3.70GHz
Faster	2.93	Windows 11	X64	AMD Ryzen 9 5900X
Faster	5.13	Windows 11	X64	AMD Ryzen 9 5950X
Faster	3.39	Windows 11	X64	Intel Core i7-8700 CPU 3.20GHz (Coffee Lake)
Faster	3.80	Windows 11	X64	11th Gen Intel Core i9-11900H 2.50GHz
Faster	3.46	Windows 11	X64	Intel Core i9-9900T CPU 2.10GHz
Faster	2.06	ubuntu 18.04	X64	Intel Xeon CPU E5530 2.40GHz
Faster	1.57	ubuntu 18.04	X64	Intel Core i7-2720QM CPU 2.20GHz (Sandy Bridge)
Faster	4.12	ubuntu 20.04	X64	Intel Core i7-8700 CPU 3.20GHz (Coffee Lake)
Faster	3.99	Windows 10	X86	Intel Core i7-6700 CPU 3.40GHz (Skylake)
Faster	2.21	macOS Big Sur 11.6.6	X64	Intel Core i5-4278U CPU 2.60GHz (Haswell)

System.Memory.Span.Reverse(Size: 512)

Result	Ratio	Operating System	Bit	Processor Name
Slower	0.57	debian 11	Arm64	Unknown processor
Slower	0.67	ubuntu 18.04	Arm64	Unknown processor
Slower	0.58	ubuntu 20.04	Arm64	Unknown processor
Slower	0.60	Windows 11	Arm64	Microsoft SQ1 3.0 GHz
Slower	0.77	macOS Monterey 12.3	Arm64	Apple M1 Max
Faster	4.30	Windows 10	X64	Intel Core i7-6700 CPU 3.40GHz (Skylake)
Faster	3.64	Windows 10	X64	Intel Core i7-8650U CPU 1.90GHz (Kaby Lake R)
Faster	3.76	Windows 10	X64	Intel Core i9-10900K CPU 3.70GHz
Faster	3.41	Windows 11	X64	AMD Ryzen 9 5900X
Faster	4.02	Windows 11	X64	AMD Ryzen 9 5950X
Faster	4.05	Windows 11	X64	Intel Core i7-8700 CPU 3.20GHz (Coffee Lake)
Faster	3.42	Windows 11	X64	11th Gen Intel Core i9-11900H 2.50GHz
Faster	3.99	Windows 11	X64	Intel Core i9-9900T CPU 2.10GHz
Faster	2.25	ubuntu 18.04	X64	Intel Xeon CPU E5530 2.40GHz
Faster	1.79	ubuntu 18.04	X64	Intel Core i7-2720QM CPU 2.20GHz (Sandy Bridge)
Faster	3.56	ubuntu 20.04	X64	Intel Core i7-8700 CPU 3.20GHz (Coffee Lake)
Faster	4.00	Windows 10	X86	Intel Core i7-6700 CPU 3.40GHz (Skylake)
Faster	2.84	macOS Big Sur 11.6.6	X64	Intel Core i5-4278U CPU 2.60GHz (Haswell)

System.Memory.Span.Reverse(Size: 512)

Result	Ratio	Operating System	Bit	Processor Name
Slower	0.57	debian 11	Arm64	Unknown processor
Slower	0.67	ubuntu 18.04	Arm64	Unknown processor
Slower	0.58	ubuntu 20.04	Arm64	Unknown processor
Slower	0.59	Windows 11	Arm64	Microsoft SQ1 3.0 GHz
Slower	0.33	macOS Monterey 12.3	Arm64	Apple M1 Max
Faster	7.07	Windows 10	X64	Intel Core i7-6700 CPU 3.40GHz (Skylake)
Faster	6.52	Windows 10	X64	Intel Core i7-8650U CPU 1.90GHz (Kaby Lake R)
Faster	6.29	Windows 10	X64	Intel Core i9-10900K CPU 3.70GHz
Faster	11.22	Windows 11	X64	AMD Ryzen 9 5900X
Faster	15.82	Windows 11	X64	AMD Ryzen 9 5950X
Faster	6.58	Windows 11	X64	Intel Core i7-8700 CPU 3.20GHz (Coffee Lake)
Faster	7.88	Windows 11	X64	11th Gen Intel Core i9-11900H 2.50GHz
Faster	6.65	Windows 11	X64	Intel Core i9-9900T CPU 2.10GHz
Faster	4.59	ubuntu 18.04	X64	Intel Xeon CPU E5530 2.40GHz
Faster	2.97	ubuntu 18.04	X64	Intel Core i7-2720QM CPU 2.20GHz (Sandy Bridge)
Faster	7.16	ubuntu 20.04	X64	Intel Core i7-8700 CPU 3.20GHz (Coffee Lake)
Faster	6.09	Windows 10	X86	Intel Core i7-6700 CPU 3.40GHz (Skylake)
Faster	7.24	macOS Big Sur 11.6.6	X64	Intel Core i5-4278U CPU 2.60GHz (Haswell)

AndyAyersMS · 2022-06-28T16:37:33Z

Improvements:

ghost added the community-contribution label Jan 27, 2022

dotnet-issue-labeler bot added the area-System.Memory label Jan 27, 2022

danmoseley reviewed Jan 28, 2022

teo-tsirpanis reviewed Jan 28, 2022

src/libraries/System.Private.CoreLib/src/System/MemoryExtensions.cs (outdated)

stephentoub reviewed Jan 28, 2022

src/libraries/System.Private.CoreLib/src/System/MemoryExtensions.cs (outdated)

alexcovington requested a review from stephentoub January 29, 2022 00:08

runfoapp bot mentioned this pull request Jan 31, 2022

STATUS_UNSUCCESSFUL in RsaCryptRoundtrip_OaepSHA1 #29683

Open

stephentoub reviewed Feb 1, 2022

alexcovington requested a review from stephentoub February 1, 2022 19:17

stephentoub reviewed Feb 7, 2022

stephentoub reviewed Feb 8, 2022

src/libraries/System.Private.CoreLib/src/System/SpanHelpers.Byte.cs (outdated)

stephentoub reviewed Feb 8, 2022

alexcovington requested review from stephentoub and tannergooding February 8, 2022 16:58

runfoapp bot mentioned this pull request Feb 8, 2022

System.Net.NetworkInformation.Tests.PingTest.SendPingWithHostAndTimeoutAndBuffer failing on OSX #64963

Closed

deeprobin reviewed Mar 9, 2022

saucecontrol reviewed Mar 10, 2022

src/libraries/System.Private.CoreLib/src/System/SpanHelpers.cs (outdated)

alexcovington requested review from jeffhandley and MichalStrehovsky as code owners March 18, 2022 22:53

alexcovington force-pushed the vectorize-span-reverse branch from 750ddcd to 9501a0f March 18, 2022 23:02

MichalStrehovsky removed their request for review March 21, 2022 01:30

alexcovington force-pushed the vectorize-span-reverse branch from c663fa4 to 7ef62ab April 25, 2022 15:59

jkotas reviewed Apr 25, 2022

alexcovington and others added 14 commits April 25, 2022 11:46

Adding vectorized path for Span<byte>.Reverse that uses SSSE3 and AVX2 where possible

904bb85

Added vectorized paths for Span<T>.Reverse for primitive types that are the same size as char, int, or long that use AVX2 or SSSE3 where possible

088771f

Apply suggestions from code review

fd882e9

Co-authored-by: Theodore Tsirpanis

Added vectorized paths for Span.Reverse to Array.Reverse. Added explicit inlining and moved generic fallbacks into their own private methods.

74deff7

Consolidate fall back case into single method, use one wrapper for both Span.Reverse and Array.Reverse

2899789

Remove redundant AggressiveInlining, add AggressiveInlining to single wrapper instead

a6c8101

Simplify method names, add comments

3aa7cf1

Just overload Reverse

86caa86

Use Unsafe.Subtract where it is semantically more intuitive, camelCase for reverseMask variable

7c4cd52

Camel case formatting, add condition check for Array.Reverse to avoid reversing empty or single-element array, better shuffle for int and long Reverse using bit control mask instead of vector control mask

3c3f140

Rework loops to use new LoadUnsafe/StoreUnsafe vector APIs. Use Permute4x64 and PermuteVar8x32 for Int32 and Int64 respectively to reduce total operations.

94f45cc

Improve readability of code

39e906f

Fix formatting, fix typos in comments

1ed5bef

Use temporary variable for generic case instead for better IL

80ae8ab

alexcovington force-pushed the vectorize-span-reverse branch from 17702cd to 80ae8ab April 25, 2022 18:52

adamsitnik approved these changes Apr 26, 2022

adamsitnik merged commit 8006e6a into dotnet:main Apr 26, 2022

adamsitnik added this to the 7.0.0 milestone Apr 26, 2022

AndyAyersMS mentioned this pull request Apr 28, 2022

Regressions in System.Tests.Perf_Array for ARM64 #68667 (closed)

lewing mentioned this pull request May 3, 2022

[Perf] Changes at 4/26/2022 10:28:11 AM dotnet/perf-autofiling-issues#5014 (closed)

This was referenced May 4, 2022

Generate better IL for (a, b) = (b, a) dotnet/roslyn#61127 (closed)

Swapping with tuple deconstruction compiles to inferior IL code compared to using a temporary variable dotnet/roslyn#53300 (closed)

AndyAyersMS mentioned this pull request May 20, 2022

Test failure Interop\\COM\\NETClients\\Primitives\\NETClientPrimitivesInALC\\NETClientPrimitivesInALC.cmd #68880 (closed)

ghost locked as resolved and limited conversation to collaborators Jun 3, 2022

alexcovington commented Jan 27, 2022

ghost commented Jan 27, 2022

stephentoub commented Jan 28, 2022

stephentoub commented Jan 28, 2022

alexcovington commented Jan 28, 2022

alexcovington commented Jan 28, 2022

alexcovington commented Jan 29, 2022

stephentoub Feb 1, 2022

alexcovington Feb 1, 2022

stephentoub Feb 8, 2022

alexcovington Feb 8, 2022

stephentoub left a comment

deeprobin left a comment

saucecontrol commented Mar 10, 2022

alexcovington commented Mar 18, 2022

jkotas Apr 25, 2022

alexcovington Apr 25, 2022

tannergooding Apr 25, 2022 (edited)

GSPP May 4, 2022

Neme12 May 4, 2022

Neme12 May 4, 2022 (edited)

Wraith2 May 4, 2022

adamsitnik commented Apr 25, 2022

alexcovington commented Apr 25, 2022

adamsitnik left a comment

dakersnar commented Jun 8, 2022 (edited)

System.Memory.Span.Reverse(Size: 512)

System.Tests.Perf_Array.Reverse

System.Memory.Span.Reverse(Size: 512)

System.Memory.Span.Reverse(Size: 512)

AndyAyersMS commented Jun 28, 2022 (edited)
