Updating a few BitConverter APIs to be intrinsic #71567

tannergooding · 2022-07-01T22:14:02Z

This helps cleanup and improve codegen for this core method to help with a few generic math related scenarios.

It does so by importing BitConverter.DoubleToInt64Bits, BitConverter.Int32BitsToSingle, BitConverter.Int64BitsToDouble, and BitConverter.SingleToInt32Bits as intrinsic GT_BITCAST operations and adding minimal support (such as value numbering) for these operations in the front end.

ghost · 2022-07-01T22:14:17Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

This helps cleanup and improve codegen for this core method to help with a few generic math related scenarios.

Author:	tannergooding
Assignees:	tannergooding
Labels:	`area-CodeGen-coreclr`
Milestone:	-

tannergooding · 2022-07-01T22:24:30Z

A local --pmi diff of the --frameworks shows:

Total bytes of base: 64517925
Total bytes of diff: 64515513
Total bytes of delta: -2412 (-0.00 % of base)
Total relative delta: -8.42
    diff is an improvement.
    relative diff is an improvement.


Top file regressions (bytes):
         348 : Microsoft.CodeAnalysis.dasm (0.02% of base)

Top file improvements (bytes):
       -1985 : System.Data.Common.dasm (-0.11% of base)
        -189 : System.Private.CoreLib.dasm (-0.00% of base)
        -131 : System.Runtime.Numerics.dasm (-0.11% of base)
         -90 : Microsoft.VisualBasic.Core.dasm (-0.02% of base)
         -62 : Microsoft.Diagnostics.Tracing.TraceEvent.dasm (-0.00% of base)
         -54 : System.Private.Xml.dasm (-0.00% of base)
         -44 : System.Text.Json.dasm (-0.00% of base)
         -34 : System.Formats.Cbor.dasm (-0.07% of base)
         -32 : System.Drawing.Primitives.dasm (-0.08% of base)
         -26 : System.Data.Odbc.dasm (-0.01% of base)
         -24 : xunit.runner.utility.netcoreapp10.dasm (-0.01% of base)
         -24 : xunit.execution.dotnet.dasm (-0.01% of base)
         -16 : Newtonsoft.Json.Bson.dasm (-0.01% of base)
         -15 : Microsoft.CodeAnalysis.VisualBasic.dasm (-0.00% of base)
         -12 : System.Data.OleDb.dasm (-0.00% of base)
          -8 : Newtonsoft.Json.dasm (-0.00% of base)
          -7 : FSharp.Core.dasm (-0.00% of base)
          -4 : System.Net.Http.dasm (-0.00% of base)
          -2 : System.Private.DataContractSerialization.dasm (-0.00% of base)
          -1 : System.Drawing.Common.dasm (-0.00% of base)

21 total files with Code Size differences (20 improved, 1 regressed), 252 unchanged.

Top method regressions (bytes):
         223 (73.36% of base) : Microsoft.CodeAnalysis.dasm - ILBuilder:EmitDoubleConstant(double):this
         157 (12.38% of base) : Microsoft.CodeAnalysis.dasm - ILBuilder:EmitConstantValue(ConstantValue):this
         109 ( 3.91% of base) : System.Private.CoreLib.dasm - Utf8Parser:TryParse(ReadOnlySpan`1,byref,byref,ushort):bool (16 methods)
         104 ( 4.50% of base) : System.Private.CoreLib.dasm - Variant:MarshalHelperCastVariant(Object,int,byref)
          22 ( 5.66% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - TimeHistogramController:AddSample(Histogram,StackSourceSample):this
          10 ( 0.29% of base) : System.Private.CoreLib.dasm - EventPipePayloadDecoder:DecodePayload(byref,ReadOnlySpan`1):ref
           6 ( 3.31% of base) : System.Private.Xml.dasm - FloatingDecimal:InitFromDouble(double):this
           6 ( 1.77% of base) : System.Private.DataContractSerialization.dasm - ValueHandle:ToSingle():float:this
           3 ( 4.62% of base) : System.Private.CoreLib.dasm - BinaryReader:ReadDouble():double:this
           3 ( 0.54% of base) : System.Drawing.Common.dasm - Font:.ctor(String,float,int,int,ubyte,bool):this
           3 ( 0.49% of base) : System.Private.CoreLib.dasm - Number:FormatDouble(byref,double,ReadOnlySpan`1,NumberFormatInfo):String
           2 ( 3.12% of base) : System.Private.CoreLib.dasm - BinaryReader:ReadSingle():float:this
           1 ( 1.39% of base) : System.Private.CoreLib.dasm - Single:System.Numerics.IFloatingPoint<System.Single>.TryWriteSignificandBigEndian(Span`1,byref):bool:this
           1 ( 1.43% of base) : System.Private.CoreLib.dasm - Single:System.Numerics.IFloatingPoint<System.Single>.TryWriteSignificandLittleEndian(Span`1,byref):bool:this

Top method improvements (bytes):
        -453 (-14.08% of base) : System.Data.Common.dasm - SqlSingleStorage:Aggregate(ref,int):Object:this
        -384 (-11.91% of base) : System.Data.Common.dasm - SqlDoubleStorage:Aggregate(ref,int):Object:this
        -155 (-5.41% of base) : System.Data.Common.dasm - SqlByteStorage:Aggregate(ref,int):Object:this
        -155 (-4.55% of base) : System.Data.Common.dasm - SqlDecimalStorage:Aggregate(ref,int):Object:this
        -155 (-5.44% of base) : System.Data.Common.dasm - SqlInt16Storage:Aggregate(ref,int):Object:this
        -155 (-5.45% of base) : System.Data.Common.dasm - SqlInt32Storage:Aggregate(ref,int):Object:this
        -155 (-4.79% of base) : System.Data.Common.dasm - SqlInt64Storage:Aggregate(ref,int):Object:this
        -155 (-4.50% of base) : System.Data.Common.dasm - SqlMoneyStorage:Aggregate(ref,int):Object:this
         -36 (-1.61% of base) : System.Data.Common.dasm - SqlConvert:ConvertToSqlDouble(Object):SqlDouble
         -36 (-1.72% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - TraceGarbageCollector:Calculate():this
         -34 (-33.01% of base) : System.Private.CoreLib.dasm - UInt128:op_Explicit(double):UInt128
         -32 (-1.70% of base) : System.Data.Common.dasm - SqlConvert:ConvertToSqlSingle(Object):SqlSingle
         -27 (-0.92% of base) : System.Private.CoreLib.dasm - HillClimbing:Update(int,double,int):ValueTuple`2:this
         -25 (-7.25% of base) : System.Runtime.Numerics.dasm - Complex:MinMagnitude(Complex,Complex):Complex
         -25 (-7.25% of base) : System.Runtime.Numerics.dasm - Complex:System.Numerics.INumberBase<System.Numerics.Complex>.MinMagnitudeNumber(Complex,Complex):Complex
         -24 (-0.72% of base) : xunit.runner.utility.netcoreapp10.dasm - XunitSerializationInfo:Deserialize(Type,String):Object
         -24 (-0.67% of base) : xunit.execution.dotnet.dasm - XunitSerializationInfo:Deserialize(Type,String):Object
         -22 (-6.71% of base) : System.Runtime.Numerics.dasm - Complex:MaxMagnitude(Complex,Complex):Complex
         -22 (-6.71% of base) : System.Runtime.Numerics.dasm - Complex:System.Numerics.INumberBase<System.Numerics.Complex>.MaxMagnitudeNumber(Complex,Complex):Complex
         -21 (-2.05% of base) : System.Private.Xml.dasm - FloatingDecimal:AdjustDbl(double):double:this

Top method regressions (percentages):
         223 (73.36% of base) : Microsoft.CodeAnalysis.dasm - ILBuilder:EmitDoubleConstant(double):this
         157 (12.38% of base) : Microsoft.CodeAnalysis.dasm - ILBuilder:EmitConstantValue(ConstantValue):this
          22 ( 5.66% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - TimeHistogramController:AddSample(Histogram,StackSourceSample):this
           3 ( 4.62% of base) : System.Private.CoreLib.dasm - BinaryReader:ReadDouble():double:this
         104 ( 4.50% of base) : System.Private.CoreLib.dasm - Variant:MarshalHelperCastVariant(Object,int,byref)
         109 ( 3.91% of base) : System.Private.CoreLib.dasm - Utf8Parser:TryParse(ReadOnlySpan`1,byref,byref,ushort):bool (16 methods)
           6 ( 3.31% of base) : System.Private.Xml.dasm - FloatingDecimal:InitFromDouble(double):this
           2 ( 3.12% of base) : System.Private.CoreLib.dasm - BinaryReader:ReadSingle():float:this
           6 ( 1.77% of base) : System.Private.DataContractSerialization.dasm - ValueHandle:ToSingle():float:this
           1 ( 1.43% of base) : System.Private.CoreLib.dasm - Single:System.Numerics.IFloatingPoint<System.Single>.TryWriteSignificandLittleEndian(Span`1,byref):bool:this
           1 ( 1.39% of base) : System.Private.CoreLib.dasm - Single:System.Numerics.IFloatingPoint<System.Single>.TryWriteSignificandBigEndian(Span`1,byref):bool:this
           3 ( 0.54% of base) : System.Drawing.Common.dasm - Font:.ctor(String,float,int,int,ubyte,bool):this
           3 ( 0.49% of base) : System.Private.CoreLib.dasm - Number:FormatDouble(byref,double,ReadOnlySpan`1,NumberFormatInfo):String
          10 ( 0.29% of base) : System.Private.CoreLib.dasm - EventPipePayloadDecoder:DecodePayload(byref,ReadOnlySpan`1):ref

Top method improvements (percentages):
         -10 (-76.92% of base) : Microsoft.CodeAnalysis.dasm - DoubleFloatingPointType:get_Zero():long:this
         -10 (-41.67% of base) : Newtonsoft.Json.Bson.dasm - AsyncBinaryWriter:WriteAsync(double,CancellationToken):Task:this
         -10 (-41.67% of base) : System.Data.OleDb.dasm - DbBuffer:WriteDouble(int,double):this
         -10 (-41.67% of base) : System.Data.Odbc.dasm - DbBuffer:WriteDouble(int,double):this
         -10 (-41.67% of base) : System.Private.CoreLib.dasm - UnmanagedMemoryAccessor:Write(long,double):this
         -10 (-41.67% of base) : System.Private.CoreLib.dasm - UnmanagedMemoryAccessor:Write(long,float):this
          -6 (-35.29% of base) : Microsoft.CodeAnalysis.dasm - DoubleFloatingPointType:get_Infinity():long:this
         -34 (-33.01% of base) : System.Private.CoreLib.dasm - UInt128:op_Explicit(double):UInt128
        -453 (-14.08% of base) : System.Data.Common.dasm - SqlSingleStorage:Aggregate(ref,int):Object:this
        -384 (-11.91% of base) : System.Data.Common.dasm - SqlDoubleStorage:Aggregate(ref,int):Object:this
          -4 (-8.51% of base) : System.Private.CoreLib.dasm - Single:IsInteger(float):bool
         -16 (-8.42% of base) : Microsoft.VisualBasic.Core.dasm - ObjectType:DivSingle(float,float):Object
         -16 (-8.42% of base) : Microsoft.VisualBasic.Core.dasm - Operators:DivideSingle(float,float):Object
         -25 (-7.25% of base) : System.Runtime.Numerics.dasm - Complex:MinMagnitude(Complex,Complex):Complex
         -25 (-7.25% of base) : System.Runtime.Numerics.dasm - Complex:System.Numerics.INumberBase<System.Numerics.Complex>.MinMagnitudeNumber(Complex,Complex):Complex
         -12 (-7.10% of base) : System.Private.CoreLib.dasm - Single:Hypot(float,float):float
          -4 (-6.90% of base) : System.Private.CoreLib.dasm - Single:IsPow2(float):bool
         -22 (-6.71% of base) : System.Runtime.Numerics.dasm - Complex:MaxMagnitude(Complex,Complex):Complex
         -22 (-6.71% of base) : System.Runtime.Numerics.dasm - Complex:System.Numerics.INumberBase<System.Numerics.Complex>.MaxMagnitudeNumber(Complex,Complex):Complex
          -4 (-6.67% of base) : System.Private.CoreLib.dasm - Math:Max(float,float):float

214 total methods with Code Size differences (200 improved, 14 regressed), 279395 unchanged.

The majority of wins are simply case like:

-       vmovaps  xmm1, xmm7
-       vmovd    rcx, xmm1
+       vmovd    rcx, xmm7

There are many cases where we do less inlining and therefore create less local variable assignments and such (this is one of the smaller diffs, others have 6 or more local variables removed):

-; 0 inlinees with PGO data; 2 single block inlinees; 1 inlinees without PGO data
+; 0 inlinees with PGO data; 1 single block inlinees; 0 inlinees without PGO data
 ; Final local variable assignments
 ;
 ;  V00 this         [V00,T00] (  3,  3   )   byref  ->  rcx         this single-def
 ;* V01 loc0         [V01    ] (  0,  0   )     int  ->  zero-ref   
 ;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [rsp+00H]   "OutgoingArgSpace"
-;* V03 tmp1         [V03    ] (  0,  0   )   float  ->  zero-ref    "Inlining Arg"
-;* V04 tmp2         [V04    ] (  0,  0   )     int  ->  zero-ref    "Inline return value spill temp"
-;* V05 tmp3         [V05    ] (  0,  0   )   float  ->  zero-ref    ld-addr-op "Inlining Arg"
-;* V06 tmp4         [V06    ] (  0,  0   )  simd16  ->  zero-ref    "Inline stloc first use temp"
 ;
 ; Lcl frame size = 0

In user and test code there are also places where Foo(dblCns) can have improved codegen since getting the backing bits of a CNS_DBL can itself return a GT_CNS_INT.

Regressions look to largely be cases where we optimize more/differently such as by unrolling or cloning loops. There does look to be one case where we might be missing a containment check and I'll see if I can fix that.

SingleAccretion

VN support needs a bit of restructuring.

src/coreclr/jit/valuenum.cpp

src/coreclr/jit/valuenum.h

src/coreclr/jit/valuenum.cpp

src/coreclr/jit/importer.cpp

src/coreclr/jit/codegenxarch.cpp

gfoidl · 2022-07-02T14:15:57Z

Will this fix #11413?

SingleAccretion · 2022-07-02T14:19:12Z

Will this fix #11413?

Not by itself, but after we enable fully bitcasts in the frontend, #11413 will become trivial.

stephentoub · 2022-07-02T17:03:12Z

src/libraries/System.Private.CoreLib/src/System/BitConverter.cs

-            return *((long*)&value);
-        }
+        [Intrinsic]
+        public static unsafe long DoubleToInt64Bits(double value) => *((long*)&value);


Does the removal of the SSE2 path have any negative impact when using mono?

Possibly for Mono JIT. Mono LLVM correctly handles this as a bitcast already.

Mono JIT doesn't support those at all. LLVM will be 100% fine without it 🙂

stephentoub · 2022-07-02T17:07:48Z

I'm curious about the general approach of making these intrinsics. Is this just working around other JIT limitations today?

src/coreclr/jit/gentree.h

src/coreclr/jit/valuenum.cpp

Co-authored-by: SingleAccretion <[email protected]>

tannergooding · 2022-07-02T18:14:04Z

I'm curious about the general approach of making these intrinsics. Is this just working around other JIT limitations today?

In this case, it is working around a JIT limitation in recognizing, dealing with, and optimizing the "trivial" pattern of *(T*)&data to "reinterpret cast" the bits. Namely it avoids the data being marked "address taken", it avoids the inliner needing to kick in, and it avoids the need for forward substitution to help fixup values passed in. However, this also ends up putting in place some of the front-end support in the JIT around GT_BITCAST so that, in the future, we can properly recognize the "trivial pattern" and do the "right thing" even if devs aren't using this API.

However, there are many cases, even in C/C++ and other native compilers where intrinsics are provided even where some general pattern is recognized and supported. Even in the scenario where this was being handled by other intrinsics, the diff I linked above shows examples of where this introduced a larger number of locals and forced the JIT to do "more work" to accomplish the same thing. This means more time spent compiling, more risk that we run into the locals, CSE, and other limits in the JIT, and more. So having this "core" function be intrinsic anyways helps the JIT even where it otherwise handled the existing pattern well.

-- This being "core" as its a piece of functionality many runtimes and libraries provide and getting the underlying bits for a floating-point value being "central" to implementing and dealing with floating-point types compliantly. These ones being intrinsic means that we can get better codegen and throughput for most float and double APIs without needing to specially handle all the individual APIs.

…en appropriate

… normal value now

tannergooding · 2022-07-05T21:14:39Z

This should be ready for review now. Will post updated SPMI diffs after CI completes

tannergooding · 2022-07-06T16:21:07Z

SPMI says 0 diffs. Manually running jit-diff.exe diff --diff --pmi --frameworks gives #71567 (comment)

Total bytes of base: 64517925
Total bytes of diff: 64515513
Total bytes of delta: -2412 (-0.00 % of base)
Total relative delta: -8.42
    diff is an improvement.
    relative diff is an improvement.

tannergooding · 2022-07-06T18:50:03Z

CC. @dotnet/jit-contrib for review

tannergooding · 2022-07-07T19:47:12Z

Failure is the #71684 that was already resolved

Updating a few BitConverter APIs to be intrinsic

75aa87b

ghost assigned tannergooding Jul 1, 2022

dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jul 1, 2022

huoyaoyuan mentioned this pull request Jul 2, 2022

Redo minor cleanups in System.Runtime.Numerics #71274

Merged

SingleAccretion reviewed Jul 2, 2022

View reviewed changes

stephentoub reviewed Jul 2, 2022

View reviewed changes

Reacting to PR feedback

143cfe1

SingleAccretion reviewed Jul 2, 2022

View reviewed changes

src/coreclr/jit/gentree.h Outdated Show resolved Hide resolved

src/coreclr/jit/valuenum.cpp Outdated Show resolved Hide resolved

tannergooding and others added 2 commits July 2, 2022 11:11

Update src/coreclr/jit/valuenum.cpp

8e73a11

Co-authored-by: SingleAccretion <[email protected]>

Update src/coreclr/jit/gentree.h

d688376

Co-authored-by: SingleAccretion <[email protected]>

Apply formatting patch

d1743d9

tannergooding added 4 commits July 5, 2022 08:07

Merge remote-tracking branch 'dotnet/main' into float-to-bits

1cbd4bd

Ensure DoubleToInt64Bits and Int64BitsToDouble intrinsics only pop wh…

01059f8

…en appropriate

Merge remote-tracking branch 'dotnet/main' into float-to-bits

58596b7

Fix VNForBitCast to not handle exception set logic since its always a…

5d1ce7e

… normal value now

Applying formatting patch

cec314d

runfoapp bot mentioned this pull request Jul 6, 2022

system.text.regularexpressions.tests Failing on ARM64 linux #71722

Closed

jakobbotsch approved these changes Jul 7, 2022

View reviewed changes

EgorBo approved these changes Jul 7, 2022

View reviewed changes

tannergooding merged commit 79d5e09 into dotnet:main Jul 7, 2022

This was referenced Jul 8, 2022

IsMultiRegNode on ARM assumes all BITCASTs are multi-reg #71831

Closed

Delete the zero-offset field sequence map #71455

Merged

tannergooding mentioned this pull request Jul 8, 2022

Updating lclmorph to support bitcast #71839

Closed

ghost locked as resolved and limited conversation to collaborators Aug 7, 2022

tannergooding deleted the float-to-bits branch November 11, 2022 15:13

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Updating a few BitConverter APIs to be intrinsic #71567

Updating a few BitConverter APIs to be intrinsic #71567

tannergooding commented Jul 1, 2022 •

edited

Loading

ghost commented Jul 1, 2022

tannergooding commented Jul 1, 2022 •

edited

Loading

SingleAccretion left a comment

gfoidl commented Jul 2, 2022

SingleAccretion commented Jul 2, 2022

stephentoub Jul 2, 2022

tannergooding Jul 2, 2022

EgorBo Jul 6, 2022

stephentoub commented Jul 2, 2022

tannergooding commented Jul 2, 2022

tannergooding commented Jul 5, 2022

tannergooding commented Jul 6, 2022

tannergooding commented Jul 6, 2022

tannergooding commented Jul 7, 2022

Updating a few BitConverter APIs to be intrinsic #71567

Updating a few BitConverter APIs to be intrinsic #71567

Conversation

tannergooding commented Jul 1, 2022 • edited Loading

ghost commented Jul 1, 2022

tannergooding commented Jul 1, 2022 • edited Loading

SingleAccretion left a comment

Choose a reason for hiding this comment

gfoidl commented Jul 2, 2022

SingleAccretion commented Jul 2, 2022

stephentoub Jul 2, 2022

Choose a reason for hiding this comment

tannergooding Jul 2, 2022

Choose a reason for hiding this comment

EgorBo Jul 6, 2022

Choose a reason for hiding this comment

stephentoub commented Jul 2, 2022

tannergooding commented Jul 2, 2022

tannergooding commented Jul 5, 2022

tannergooding commented Jul 6, 2022

tannergooding commented Jul 6, 2022

tannergooding commented Jul 7, 2022

tannergooding commented Jul 1, 2022 •

edited

Loading

tannergooding commented Jul 1, 2022 •

edited

Loading