Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updating a few BitConverter APIs to be intrinsic #71567

Merged
merged 10 commits into from
Jul 7, 2022

Conversation

tannergooding
Copy link
Member

@tannergooding tannergooding commented Jul 1, 2022

This helps cleanup and improve codegen for this core method to help with a few generic math related scenarios.

It does so by importing BitConverter.DoubleToInt64Bits, BitConverter.Int32BitsToSingle, BitConverter.Int64BitsToDouble, and BitConverter.SingleToInt32Bits as intrinsic GT_BITCAST operations and adding minimal support (such as value numbering) for these operations in the front end.

@ghost ghost assigned tannergooding Jul 1, 2022
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jul 1, 2022
@ghost
Copy link

ghost commented Jul 1, 2022

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

This helps cleanup and improve codegen for this core method to help with a few generic math related scenarios.

Author: tannergooding
Assignees: tannergooding
Labels:

area-CodeGen-coreclr

Milestone: -

@tannergooding
Copy link
Member Author

tannergooding commented Jul 1, 2022

A local --pmi diff of the --frameworks shows:

Total bytes of base: 64517925
Total bytes of diff: 64515513
Total bytes of delta: -2412 (-0.00 % of base)
Total relative delta: -8.42
    diff is an improvement.
    relative diff is an improvement.


Top file regressions (bytes):
         348 : Microsoft.CodeAnalysis.dasm (0.02% of base)

Top file improvements (bytes):
       -1985 : System.Data.Common.dasm (-0.11% of base)
        -189 : System.Private.CoreLib.dasm (-0.00% of base)
        -131 : System.Runtime.Numerics.dasm (-0.11% of base)
         -90 : Microsoft.VisualBasic.Core.dasm (-0.02% of base)
         -62 : Microsoft.Diagnostics.Tracing.TraceEvent.dasm (-0.00% of base)
         -54 : System.Private.Xml.dasm (-0.00% of base)
         -44 : System.Text.Json.dasm (-0.00% of base)
         -34 : System.Formats.Cbor.dasm (-0.07% of base)
         -32 : System.Drawing.Primitives.dasm (-0.08% of base)
         -26 : System.Data.Odbc.dasm (-0.01% of base)
         -24 : xunit.runner.utility.netcoreapp10.dasm (-0.01% of base)
         -24 : xunit.execution.dotnet.dasm (-0.01% of base)
         -16 : Newtonsoft.Json.Bson.dasm (-0.01% of base)
         -15 : Microsoft.CodeAnalysis.VisualBasic.dasm (-0.00% of base)
         -12 : System.Data.OleDb.dasm (-0.00% of base)
          -8 : Newtonsoft.Json.dasm (-0.00% of base)
          -7 : FSharp.Core.dasm (-0.00% of base)
          -4 : System.Net.Http.dasm (-0.00% of base)
          -2 : System.Private.DataContractSerialization.dasm (-0.00% of base)
          -1 : System.Drawing.Common.dasm (-0.00% of base)

21 total files with Code Size differences (20 improved, 1 regressed), 252 unchanged.

Top method regressions (bytes):
         223 (73.36% of base) : Microsoft.CodeAnalysis.dasm - ILBuilder:EmitDoubleConstant(double):this
         157 (12.38% of base) : Microsoft.CodeAnalysis.dasm - ILBuilder:EmitConstantValue(ConstantValue):this
         109 ( 3.91% of base) : System.Private.CoreLib.dasm - Utf8Parser:TryParse(ReadOnlySpan`1,byref,byref,ushort):bool (16 methods)
         104 ( 4.50% of base) : System.Private.CoreLib.dasm - Variant:MarshalHelperCastVariant(Object,int,byref)
          22 ( 5.66% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - TimeHistogramController:AddSample(Histogram,StackSourceSample):this
          10 ( 0.29% of base) : System.Private.CoreLib.dasm - EventPipePayloadDecoder:DecodePayload(byref,ReadOnlySpan`1):ref
           6 ( 3.31% of base) : System.Private.Xml.dasm - FloatingDecimal:InitFromDouble(double):this
           6 ( 1.77% of base) : System.Private.DataContractSerialization.dasm - ValueHandle:ToSingle():float:this
           3 ( 4.62% of base) : System.Private.CoreLib.dasm - BinaryReader:ReadDouble():double:this
           3 ( 0.54% of base) : System.Drawing.Common.dasm - Font:.ctor(String,float,int,int,ubyte,bool):this
           3 ( 0.49% of base) : System.Private.CoreLib.dasm - Number:FormatDouble(byref,double,ReadOnlySpan`1,NumberFormatInfo):String
           2 ( 3.12% of base) : System.Private.CoreLib.dasm - BinaryReader:ReadSingle():float:this
           1 ( 1.39% of base) : System.Private.CoreLib.dasm - Single:System.Numerics.IFloatingPoint<System.Single>.TryWriteSignificandBigEndian(Span`1,byref):bool:this
           1 ( 1.43% of base) : System.Private.CoreLib.dasm - Single:System.Numerics.IFloatingPoint<System.Single>.TryWriteSignificandLittleEndian(Span`1,byref):bool:this

Top method improvements (bytes):
        -453 (-14.08% of base) : System.Data.Common.dasm - SqlSingleStorage:Aggregate(ref,int):Object:this
        -384 (-11.91% of base) : System.Data.Common.dasm - SqlDoubleStorage:Aggregate(ref,int):Object:this
        -155 (-5.41% of base) : System.Data.Common.dasm - SqlByteStorage:Aggregate(ref,int):Object:this
        -155 (-4.55% of base) : System.Data.Common.dasm - SqlDecimalStorage:Aggregate(ref,int):Object:this
        -155 (-5.44% of base) : System.Data.Common.dasm - SqlInt16Storage:Aggregate(ref,int):Object:this
        -155 (-5.45% of base) : System.Data.Common.dasm - SqlInt32Storage:Aggregate(ref,int):Object:this
        -155 (-4.79% of base) : System.Data.Common.dasm - SqlInt64Storage:Aggregate(ref,int):Object:this
        -155 (-4.50% of base) : System.Data.Common.dasm - SqlMoneyStorage:Aggregate(ref,int):Object:this
         -36 (-1.61% of base) : System.Data.Common.dasm - SqlConvert:ConvertToSqlDouble(Object):SqlDouble
         -36 (-1.72% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - TraceGarbageCollector:Calculate():this
         -34 (-33.01% of base) : System.Private.CoreLib.dasm - UInt128:op_Explicit(double):UInt128
         -32 (-1.70% of base) : System.Data.Common.dasm - SqlConvert:ConvertToSqlSingle(Object):SqlSingle
         -27 (-0.92% of base) : System.Private.CoreLib.dasm - HillClimbing:Update(int,double,int):ValueTuple`2:this
         -25 (-7.25% of base) : System.Runtime.Numerics.dasm - Complex:MinMagnitude(Complex,Complex):Complex
         -25 (-7.25% of base) : System.Runtime.Numerics.dasm - Complex:System.Numerics.INumberBase<System.Numerics.Complex>.MinMagnitudeNumber(Complex,Complex):Complex
         -24 (-0.72% of base) : xunit.runner.utility.netcoreapp10.dasm - XunitSerializationInfo:Deserialize(Type,String):Object
         -24 (-0.67% of base) : xunit.execution.dotnet.dasm - XunitSerializationInfo:Deserialize(Type,String):Object
         -22 (-6.71% of base) : System.Runtime.Numerics.dasm - Complex:MaxMagnitude(Complex,Complex):Complex
         -22 (-6.71% of base) : System.Runtime.Numerics.dasm - Complex:System.Numerics.INumberBase<System.Numerics.Complex>.MaxMagnitudeNumber(Complex,Complex):Complex
         -21 (-2.05% of base) : System.Private.Xml.dasm - FloatingDecimal:AdjustDbl(double):double:this

Top method regressions (percentages):
         223 (73.36% of base) : Microsoft.CodeAnalysis.dasm - ILBuilder:EmitDoubleConstant(double):this
         157 (12.38% of base) : Microsoft.CodeAnalysis.dasm - ILBuilder:EmitConstantValue(ConstantValue):this
          22 ( 5.66% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - TimeHistogramController:AddSample(Histogram,StackSourceSample):this
           3 ( 4.62% of base) : System.Private.CoreLib.dasm - BinaryReader:ReadDouble():double:this
         104 ( 4.50% of base) : System.Private.CoreLib.dasm - Variant:MarshalHelperCastVariant(Object,int,byref)
         109 ( 3.91% of base) : System.Private.CoreLib.dasm - Utf8Parser:TryParse(ReadOnlySpan`1,byref,byref,ushort):bool (16 methods)
           6 ( 3.31% of base) : System.Private.Xml.dasm - FloatingDecimal:InitFromDouble(double):this
           2 ( 3.12% of base) : System.Private.CoreLib.dasm - BinaryReader:ReadSingle():float:this
           6 ( 1.77% of base) : System.Private.DataContractSerialization.dasm - ValueHandle:ToSingle():float:this
           1 ( 1.43% of base) : System.Private.CoreLib.dasm - Single:System.Numerics.IFloatingPoint<System.Single>.TryWriteSignificandLittleEndian(Span`1,byref):bool:this
           1 ( 1.39% of base) : System.Private.CoreLib.dasm - Single:System.Numerics.IFloatingPoint<System.Single>.TryWriteSignificandBigEndian(Span`1,byref):bool:this
           3 ( 0.54% of base) : System.Drawing.Common.dasm - Font:.ctor(String,float,int,int,ubyte,bool):this
           3 ( 0.49% of base) : System.Private.CoreLib.dasm - Number:FormatDouble(byref,double,ReadOnlySpan`1,NumberFormatInfo):String
          10 ( 0.29% of base) : System.Private.CoreLib.dasm - EventPipePayloadDecoder:DecodePayload(byref,ReadOnlySpan`1):ref

Top method improvements (percentages):
         -10 (-76.92% of base) : Microsoft.CodeAnalysis.dasm - DoubleFloatingPointType:get_Zero():long:this
         -10 (-41.67% of base) : Newtonsoft.Json.Bson.dasm - AsyncBinaryWriter:WriteAsync(double,CancellationToken):Task:this
         -10 (-41.67% of base) : System.Data.OleDb.dasm - DbBuffer:WriteDouble(int,double):this
         -10 (-41.67% of base) : System.Data.Odbc.dasm - DbBuffer:WriteDouble(int,double):this
         -10 (-41.67% of base) : System.Private.CoreLib.dasm - UnmanagedMemoryAccessor:Write(long,double):this
         -10 (-41.67% of base) : System.Private.CoreLib.dasm - UnmanagedMemoryAccessor:Write(long,float):this
          -6 (-35.29% of base) : Microsoft.CodeAnalysis.dasm - DoubleFloatingPointType:get_Infinity():long:this
         -34 (-33.01% of base) : System.Private.CoreLib.dasm - UInt128:op_Explicit(double):UInt128
        -453 (-14.08% of base) : System.Data.Common.dasm - SqlSingleStorage:Aggregate(ref,int):Object:this
        -384 (-11.91% of base) : System.Data.Common.dasm - SqlDoubleStorage:Aggregate(ref,int):Object:this
          -4 (-8.51% of base) : System.Private.CoreLib.dasm - Single:IsInteger(float):bool
         -16 (-8.42% of base) : Microsoft.VisualBasic.Core.dasm - ObjectType:DivSingle(float,float):Object
         -16 (-8.42% of base) : Microsoft.VisualBasic.Core.dasm - Operators:DivideSingle(float,float):Object
         -25 (-7.25% of base) : System.Runtime.Numerics.dasm - Complex:MinMagnitude(Complex,Complex):Complex
         -25 (-7.25% of base) : System.Runtime.Numerics.dasm - Complex:System.Numerics.INumberBase<System.Numerics.Complex>.MinMagnitudeNumber(Complex,Complex):Complex
         -12 (-7.10% of base) : System.Private.CoreLib.dasm - Single:Hypot(float,float):float
          -4 (-6.90% of base) : System.Private.CoreLib.dasm - Single:IsPow2(float):bool
         -22 (-6.71% of base) : System.Runtime.Numerics.dasm - Complex:MaxMagnitude(Complex,Complex):Complex
         -22 (-6.71% of base) : System.Runtime.Numerics.dasm - Complex:System.Numerics.INumberBase<System.Numerics.Complex>.MaxMagnitudeNumber(Complex,Complex):Complex
          -4 (-6.67% of base) : System.Private.CoreLib.dasm - Math:Max(float,float):float

214 total methods with Code Size differences (200 improved, 14 regressed), 279395 unchanged.

The majority of wins are simply case like:

-       vmovaps  xmm1, xmm7
-       vmovd    rcx, xmm1
+       vmovd    rcx, xmm7

There are many cases where we do less inlining and therefore create less local variable assignments and such (this is one of the smaller diffs, others have 6 or more local variables removed):

-; 0 inlinees with PGO data; 2 single block inlinees; 1 inlinees without PGO data
+; 0 inlinees with PGO data; 1 single block inlinees; 0 inlinees without PGO data
 ; Final local variable assignments
 ;
 ;  V00 this         [V00,T00] (  3,  3   )   byref  ->  rcx         this single-def
 ;* V01 loc0         [V01    ] (  0,  0   )     int  ->  zero-ref   
 ;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [rsp+00H]   "OutgoingArgSpace"
-;* V03 tmp1         [V03    ] (  0,  0   )   float  ->  zero-ref    "Inlining Arg"
-;* V04 tmp2         [V04    ] (  0,  0   )     int  ->  zero-ref    "Inline return value spill temp"
-;* V05 tmp3         [V05    ] (  0,  0   )   float  ->  zero-ref    ld-addr-op "Inlining Arg"
-;* V06 tmp4         [V06    ] (  0,  0   )  simd16  ->  zero-ref    "Inline stloc first use temp"
 ;
 ; Lcl frame size = 0

In user and test code there are also places where Foo(dblCns) can have improved codegen since getting the backing bits of a CNS_DBL can itself return a GT_CNS_INT.

Regressions look to largely be cases where we optimize more/differently such as by unrolling or cloning loops. There does look to be one case where we might be missing a containment check and I'll see if I can fix that.

Copy link
Contributor

@SingleAccretion SingleAccretion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

VN support needs a bit of restructuring.

src/coreclr/jit/valuenum.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/valuenum.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/valuenum.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/valuenum.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/valuenum.h Outdated Show resolved Hide resolved
src/coreclr/jit/valuenum.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/importer.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/codegenxarch.cpp Outdated Show resolved Hide resolved
@gfoidl
Copy link
Member

gfoidl commented Jul 2, 2022

Will this fix #11413?

@SingleAccretion
Copy link
Contributor

Will this fix #11413?

Not by itself, but after we enable fully bitcasts in the frontend, #11413 will become trivial.

return *((long*)&value);
}
[Intrinsic]
public static unsafe long DoubleToInt64Bits(double value) => *((long*)&value);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the removal of the SSE2 path have any negative impact when using mono?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly for Mono JIT. Mono LLVM correctly handles this as a bitcast already.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mono JIT doesn't support those at all. LLVM will be 100% fine without it 🙂

@stephentoub
Copy link
Member

I'm curious about the general approach of making these intrinsics. Is this just working around other JIT limitations today?

src/coreclr/jit/gentree.h Outdated Show resolved Hide resolved
src/coreclr/jit/valuenum.cpp Outdated Show resolved Hide resolved
tannergooding and others added 2 commits July 2, 2022 11:11
@tannergooding
Copy link
Member Author

I'm curious about the general approach of making these intrinsics. Is this just working around other JIT limitations today?

In this case, it is working around a JIT limitation in recognizing, dealing with, and optimizing the "trivial" pattern of *(T*)&data to "reinterpret cast" the bits. Namely it avoids the data being marked "address taken", it avoids the inliner needing to kick in, and it avoids the need for forward substitution to help fixup values passed in. However, this also ends up putting in place some of the front-end support in the JIT around GT_BITCAST so that, in the future, we can properly recognize the "trivial pattern" and do the "right thing" even if devs aren't using this API.

However, there are many cases, even in C/C++ and other native compilers where intrinsics are provided even where some general pattern is recognized and supported. Even in the scenario where this was being handled by other intrinsics, the diff I linked above shows examples of where this introduced a larger number of locals and forced the JIT to do "more work" to accomplish the same thing. This means more time spent compiling, more risk that we run into the locals, CSE, and other limits in the JIT, and more. So having this "core" function be intrinsic anyways helps the JIT even where it otherwise handled the existing pattern well.

-- This being "core" as its a piece of functionality many runtimes and libraries provide and getting the underlying bits for a floating-point value being "central" to implementing and dealing with floating-point types compliantly. These ones being intrinsic means that we can get better codegen and throughput for most float and double APIs without needing to specially handle all the individual APIs.

@tannergooding
Copy link
Member Author

This should be ready for review now. Will post updated SPMI diffs after CI completes

@tannergooding
Copy link
Member Author

SPMI says 0 diffs. Manually running jit-diff.exe diff --diff --pmi --frameworks gives #71567 (comment)

Total bytes of base: 64517925
Total bytes of diff: 64515513
Total bytes of delta: -2412 (-0.00 % of base)
Total relative delta: -8.42
    diff is an improvement.
    relative diff is an improvement.

@tannergooding
Copy link
Member Author

CC. @dotnet/jit-contrib for review

@tannergooding
Copy link
Member Author

Failure is the #71684 that was already resolved

@tannergooding tannergooding merged commit 79d5e09 into dotnet:main Jul 7, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Aug 7, 2022
@tannergooding tannergooding deleted the float-to-bits branch November 11, 2022 15:13
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants