
Update the JIT to track Span.Length and ReadOnlySpan.Length as "never negative" #81055

Merged
merged 9 commits into from
Jan 26, 2023

Conversation

tannergooding
Member

@tannergooding tannergooding commented Jan 23, 2023

This resolves #59922

This doesn't do extensive propagation or anything; it primarily focuses on the "normal" case where the imported GT_FIELD is promoted to a GT_LCL_VAR.

There is room to potentially track this metadata further through FIELD_ADDR, LCL_VAR_ADDR, LCL_FLD, and LCL_FLD_ADDR. There is likewise potential to make this "attribute" driven. For example, it would be possible to effectively create an internal IsNeverNegative attribute and put it on things like List<T>.Length or similar and make this a bigger feature.

It's worth calling out that it is possible for a user to use unsafe code to modify a Span<T>/ROSpan<T>'s length field to be negative. However, much like mutating a static readonly field or using reflection/unsafe to modify other things considered "immutable", we can treat such things as strictly "undefined behavior".

Much like with GT_ARR_LEN this doesn't cover all cases of new locals getting created and it only participates where IsNeverNegative was already being used. We have various issues covering codegen improvements related to IntegralRange. Once this is in, future improvements to utilize IsNeverNegative instead of purely relying on base type or GTF_UNSIGNED will implicitly light up for all of T[], Span<T>, and ROSpan<T>.

@ghost ghost assigned tannergooding Jan 23, 2023
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jan 23, 2023

@tannergooding
Member Author

tannergooding commented Jan 23, 2023

The local diff for jit-diff --pmi --frameworks is:

Total bytes of base: 59681274
Total bytes of diff: 59680774
Total bytes of delta: -500 (-0.00 % of base)
Total relative delta: NaN
    diff is an improvement.
    relative diff is a regression.


Top file improvements (bytes):
        -118 : System.Private.CoreLib.dasm (-0.00% of base)
         -64 : System.Numerics.Tensors.dasm (-0.02% of base)
         -41 : System.Security.Cryptography.dasm (-0.00% of base)
         -38 : System.Memory.dasm (-0.01% of base)
         -28 : System.Private.Xml.dasm (-0.00% of base)
         -24 : System.Console.dasm (-0.04% of base)
         -21 : System.Net.Http.dasm (-0.00% of base)
         -19 : System.Net.NetworkInformation.dasm (-0.05% of base)
         -19 : System.Security.Cryptography.Pkcs.dasm (-0.00% of base)
         -18 : System.Text.Json.dasm (-0.00% of base)
         -17 : System.Runtime.Numerics.dasm (-0.01% of base)
         -14 : System.IO.Hashing.dasm (-0.04% of base)
         -13 : System.Net.Quic.dasm (-0.01% of base)
         -11 : System.Net.WebSockets.dasm (-0.01% of base)
         -10 : System.Text.RegularExpressions.dasm (-0.00% of base)
          -9 : System.Data.Common.dasm (-0.00% of base)
          -8 : System.Formats.Asn1.dasm (-0.01% of base)
          -5 : System.Private.DataContractSerialization.dasm (-0.00% of base)
          -4 : System.Private.Xml.Linq.dasm (-0.00% of base)
          -3 : Newtonsoft.Json.dasm (-0.00% of base)

32 total files with Code Size differences (32 improved, 0 regressed), 243 unchanged.

Top method regressions (bytes):
           4 ( 1.00% of base) : System.Private.CoreLib.dasm - System.Threading.WaitHandle:WaitAnyMultiple(System.ReadOnlySpan`1[Microsoft.Win32.SafeHandles.SafeWaitHandle],int):int
           1 ( 0.18% of base) : System.Text.Json.dasm - System.Text.Json.JsonDocument:TextEquals(int,System.ReadOnlySpan`1[ubyte],bool,bool):bool:this

Top method improvements (bytes):
         -19 (-2.77% of base) : System.Net.NetworkInformation.dasm - System.Net.NetworkInformation.PhysicalAddress:TryParse(System.ReadOnlySpan`1[ushort],byref):bool
         -17 (-1.79% of base) : System.Private.CoreLib.dasm - System.Text.StringBuilder:ReplaceAllInChunk(System.ReadOnlySpan`1[int],System.Text.StringBuilder,int,System.String):this
         -14 (-1.48% of base) : System.Console.dasm - System.ConsolePal+WindowsConsoleStream:WriteFileNative(long,System.ReadOnlySpan`1[ubyte],bool):int (2 methods)
         -14 (-3.07% of base) : System.Security.Cryptography.dasm - System.Security.Cryptography.SymmetricAlgorithm:DecryptCbc(System.ReadOnlySpan`1[ubyte],System.ReadOnlySpan`1[ubyte],int):ubyte[]:this
         -14 (-2.31% of base) : System.Private.Xml.dasm - System.Text.RegularExpressions.Generated.<RegexGenerator_g>F74B1AE921BCEFE4BA601AA541C7A23B1CA9711EA81E8FE504B5B6446748E035A__EncodeCharRegex_1+RunnerFactory+Runner:TryMatchAtCurrentPosition(System.ReadOnlySpan`1[ushort]):bool:this
         -13 (-1.61% of base) : System.Private.CoreLib.dasm - System.Array:Reverse(System.Array,int,int)
         -12 (-0.99% of base) : System.IO.Hashing.dasm - System.IO.Hashing.XxHashShared:Append(byref,System.ReadOnlySpan`1[ubyte])
         -11 (-3.13% of base) : System.Net.WebSockets.dasm - System.Net.WebSockets.ManagedWebSocket:WriteHeader(ubyte,ubyte[],System.ReadOnlySpan`1[ubyte],bool,bool,bool):int
         -10 (-2.03% of base) : System.Console.dasm - System.ConsolePal+WindowsConsoleStream:ReadFileNative(long,System.Span`1[ubyte],bool,byref,bool):int
          -8 (-2.29% of base) : System.Numerics.Tensors.dasm - System.Numerics.Tensors.Tensor`1[double]:.ctor(System.ReadOnlySpan`1[int],bool):this
          -8 (-2.29% of base) : System.Numerics.Tensors.dasm - System.Numerics.Tensors.Tensor`1[int]:.ctor(System.ReadOnlySpan`1[int],bool):this
          -8 (-2.29% of base) : System.Numerics.Tensors.dasm - System.Numerics.Tensors.Tensor`1[long]:.ctor(System.ReadOnlySpan`1[int],bool):this
          -8 (-2.29% of base) : System.Numerics.Tensors.dasm - System.Numerics.Tensors.Tensor`1[short]:.ctor(System.ReadOnlySpan`1[int],bool):this
          -8 (-2.29% of base) : System.Numerics.Tensors.dasm - System.Numerics.Tensors.Tensor`1[System.__Canon]:.ctor(System.ReadOnlySpan`1[int],bool):this
          -8 (-2.29% of base) : System.Numerics.Tensors.dasm - System.Numerics.Tensors.Tensor`1[System.Nullable`1[int]]:.ctor(System.ReadOnlySpan`1[int],bool):this
          -8 (-2.29% of base) : System.Numerics.Tensors.dasm - System.Numerics.Tensors.Tensor`1[System.Numerics.Vector`1[float]]:.ctor(System.ReadOnlySpan`1[int],bool):this
          -8 (-2.29% of base) : System.Numerics.Tensors.dasm - System.Numerics.Tensors.Tensor`1[ubyte]:.ctor(System.ReadOnlySpan`1[int],bool):this
          -8 (-0.74% of base) : System.Text.RegularExpressions.dasm - System.Text.RegularExpressions.Regex:Replace(System.Text.RegularExpressions.MatchEvaluator,System.Text.RegularExpressions.Regex,System.String,int,int):System.String
          -7 (-0.64% of base) : System.Formats.Asn1.dasm - System.Formats.Asn1.AsnWriter:WriteObjectIdentifierCore(System.Formats.Asn1.Asn1Tag,System.ReadOnlySpan`1[ushort]):this
          -7 (-2.10% of base) : System.Net.Http.dasm - System.Net.MultiMemory:CopyTo(System.Span`1[ubyte]):this

Top method regressions (percentages):
           4 ( 1.00% of base) : System.Private.CoreLib.dasm - System.Threading.WaitHandle:WaitAnyMultiple(System.ReadOnlySpan`1[Microsoft.Win32.SafeHandles.SafeWaitHandle],int):int
           1 ( 0.18% of base) : System.Text.Json.dasm - System.Text.Json.JsonDocument:TextEquals(int,System.ReadOnlySpan`1[ubyte],bool,bool):bool:this

Top method improvements (percentages):
          -2 (-3.57% of base) : System.Private.CoreLib.dasm - System.MemoryExtensions:Overlaps[ubyte](System.ReadOnlySpan`1[ubyte],System.ReadOnlySpan`1[ubyte]):bool
          -2 (-3.23% of base) : System.Private.CoreLib.dasm - System.MemoryExtensions:Overlaps[short](System.ReadOnlySpan`1[short],System.ReadOnlySpan`1[short]):bool
          -2 (-3.17% of base) : System.Private.CoreLib.dasm - System.MemoryExtensions:Overlaps[ubyte](System.ReadOnlySpan`1[ubyte],System.ReadOnlySpan`1[ubyte],byref):bool
         -11 (-3.13% of base) : System.Net.WebSockets.dasm - System.Net.WebSockets.ManagedWebSocket:WriteHeader(ubyte,ubyte[],System.ReadOnlySpan`1[ubyte],bool,bool,bool):int
          -2 (-3.12% of base) : System.Private.CoreLib.dasm - System.MemoryExtensions:Overlaps[double](System.ReadOnlySpan`1[double],System.ReadOnlySpan`1[double]):bool
          -2 (-3.12% of base) : System.Private.CoreLib.dasm - System.MemoryExtensions:Overlaps[int](System.ReadOnlySpan`1[int],System.ReadOnlySpan`1[int]):bool
          -2 (-3.12% of base) : System.Private.CoreLib.dasm - System.MemoryExtensions:Overlaps[long](System.ReadOnlySpan`1[long],System.ReadOnlySpan`1[long]):bool
          -2 (-3.12% of base) : System.Private.CoreLib.dasm - System.MemoryExtensions:Overlaps[System.Nullable`1[int]](System.ReadOnlySpan`1[System.Nullable`1[int]],System.ReadOnlySpan`1[System.Nullable`1[int]]):bool
          -2 (-3.12% of base) : System.Private.CoreLib.dasm - System.MemoryExtensions:Overlaps[System.Numerics.Vector`1[float]](System.ReadOnlySpan`1[System.Numerics.Vector`1[float]],System.ReadOnlySpan`1[System.Numerics.Vector`1[float]]):bool
         -14 (-3.07% of base) : System.Security.Cryptography.dasm - System.Security.Cryptography.SymmetricAlgorithm:DecryptCbc(System.ReadOnlySpan`1[ubyte],System.ReadOnlySpan`1[ubyte],int):ubyte[]:this
          -1 (-2.78% of base) : System.Private.CoreLib.dasm - System.MemoryExtensions:Reverse[double](System.Span`1[double])
          -1 (-2.78% of base) : System.Private.CoreLib.dasm - System.MemoryExtensions:Reverse[int](System.Span`1[int])
          -1 (-2.78% of base) : System.Private.CoreLib.dasm - System.MemoryExtensions:Reverse[long](System.Span`1[long])
          -1 (-2.78% of base) : System.Private.CoreLib.dasm - System.MemoryExtensions:Reverse[short](System.Span`1[short])
          -1 (-2.78% of base) : System.Private.CoreLib.dasm - System.MemoryExtensions:Reverse[System.Nullable`1[int]](System.Span`1[System.Nullable`1[int]])
          -1 (-2.78% of base) : System.Private.CoreLib.dasm - System.MemoryExtensions:Reverse[ubyte](System.Span`1[ubyte])
         -19 (-2.77% of base) : System.Net.NetworkInformation.dasm - System.Net.NetworkInformation.PhysicalAddress:TryParse(System.ReadOnlySpan`1[ushort],byref):bool
         -14 (-2.31% of base) : System.Private.Xml.dasm - System.Text.RegularExpressions.Generated.<RegexGenerator_g>F74B1AE921BCEFE4BA601AA541C7A23B1CA9711EA81E8FE504B5B6446748E035A__EncodeCharRegex_1+RunnerFactory+Runner:TryMatchAtCurrentPosition(System.ReadOnlySpan`1[ushort]):bool:this
          -8 (-2.29% of base) : System.Numerics.Tensors.dasm - System.Numerics.Tensors.Tensor`1[double]:.ctor(System.ReadOnlySpan`1[int],bool):this
          -8 (-2.29% of base) : System.Numerics.Tensors.dasm - System.Numerics.Tensors.Tensor`1[int]:.ctor(System.ReadOnlySpan`1[int],bool):this

214 total methods with Code Size differences (212 improved, 2 regressed), 375933 unchanged.

So nothing "amazing", but there are some nice wins in known hot methods/scenarios. Most of the diffs are changing movsxd to just mov. Likewise, there are a few cases where comparisons and branches are simplified.

The diffs are notably "small" for corelib because we have a lot of places where we intentionally do tricks to avoid sign-extension and treat the length as "unsigned" already. I expect a higher band of profit in user code, plus the general ability to simplify our own logic in some scenarios rather than relying on manual casting.

@benaadams
Member

Also apply to Array.Length?

@tannergooding
Member Author

Array.Length is already treated as never negative.

@tannergooding
Member Author

Diffs

Size savings are basically what I saw locally. There is no perf difference on x64 or x86. Arm64 shows up as "overall +0.00%", with benchmarks.run being +0.01%. It's unclear why this is the only platform impacted at all.

@tannergooding tannergooding marked this pull request as ready for review January 24, 2023 01:06
@tannergooding
Member Author

CC. @dotnet/jit-contrib

@tannergooding
Member Author

Updated to track the info as part of LclVarDsc rather than as a GenTree flag, per a conversation with @jakobbotsch on Discord.

@@ -1392,6 +1392,11 @@ class LocalAddressVisitor final : public GenTreeVisitor<LocalAddressVisitor>
{
    GenTreeFlags lclVarFlags = node->gtFlags & (GTF_NODE_MASK | GTF_DONT_CSE);

    if (node->OperIs(GT_FIELD) && node->AsField()->IsNeverNegative())
    {
        m_compiler->lvaGetDesc(fieldLclNum)->SetIsNeverNegative(true);
Member

@jakobbotsch jakobbotsch Jan 24, 2023

Does it make any difference in the PMI diffs if we add CORINFO_FLG_SPAN that is returned by getClassAttribs for Span<T> and ReadOnlySpan<T>, and then use it in Compiler::StructPromotionHelper::PromoteStructVar to set this bit on the second local created?

Member Author

I'll take a look independently, but I wouldn't expect significant diffs and I don't think we'd want this to be the exclusive way to determine that.

-- Posting a rough summary of what got discussed on Discord.

For user code to access the field, it must go through get_Length. It could reinterpret cast, use pointers, or do other things, but those are all rarer and potentially dangerous.

For the block copy case, there aren't really interesting opts possible. The user could then access the copied field, but that would still go through get_Length and therefore the intrinsic path.

We'd want to keep the flag we set as part of import to handle any early constant folding. Such optimizations generally have a positive impact on TP due to the early dead code removal they allow (making the total IR to process significantly smaller from the start).

There are other potential opts and tracking we could do, including loop cloning if we had some CORINFO_FLG_SPAN, and there might be some cases, like multi-reg arg passing, where the intrinsic path wouldn't necessarily catch things.

Member

I am still not a fan of this propagation of "never negative" from a GT_FIELD access to the LclVarDsc. It makes an implicit assumption that the local will always satisfy this condition. That is ok for the uses of IsNeverNegative() introduced by this PR but is a very surprising precondition to require in JIT IR (and in contrast to things like GTF_IND_NONNULL). I think it makes the state fragile in the future.

If we want to keep it this way I think the new accessors/information on GT_FIELD needs to be renamed into something like IsNonNegativeLengthAccess() that makes it more clear that the access is on a local that is expected not to break the invariant.

My preference would still be to introduce CORINFO_FLG_SPAN and set it on span locals. We can teach GenTree:IsNeverNegative/IntegralRange::ForNode to recognize accesses of the length (for both promoted access and potential non-promoted access, before promotion happens). It would be more in line with ARR_LENGTH today and would not require adding new state to any nodes (only a lvIsSpan on LclVarDsc that can be set as part of lvaSetStruct -- we already call getClassAttribs).

Member Author

My preference would still be to introduce CORINFO_FLG_SPAN and set it on span locals. We can teach GenTree:IsNeverNegative/IntegralRange::ForNode to recognize accesses of the length (for both promoted access and potential non-promoted access, before promotion happens). It would be more in line with ARR_LENGTH today and would not require adding new state to any nodes (only a lvIsSpan on LclVarDsc that can be set as part of lvaSetStruct -- we already call getClassAttribs).

I'm concerned about the additional cost that doing that would bring. What we have here is simple, effective, extensible, and works without negatively impacting throughput (a small sub-0.01% gain on most platforms and a small sub-0.01% regression on a couple).

While this wouldn't add overhead to the promotion case since we already call getClassAttribs, doing pre-promotion recognition would require additional JIT/EE calls to be introduced which can negatively impact throughput. We would then need to either track that information somehow or continue doing lookups against the field handle where necessary (and we cannot trivially cache since we have a different handle per T).

If we want to keep it this way I think the new accessors/information on GT_FIELD needs to be renamed into something like IsNonNegativeLengthAccess() that makes it more clear that the access is on a local that is expected not to break the invariant.

I think this is reasonable for now. That being said, I think we want and need this to be more generally extensible long term. The concept of exposing an int but requiring it to be "never negative" is integrated fairly deeply throughout the entirety of .NET. In practice it is similar to most APIs not allowing references to be null. There are a good deal of optimizations that are only valid on unsigned types, and recognizing this will allow us to provide significantly better perf/codegen. -- The main reason this diff "looks small" is that we are already explicitly casting to the equivalent unsigned type in much of our managed BCL code, so the remaining cases are places where we weren't or couldn't.

That is ok for the uses of IsNeverNegative() introduced by this PR but is a very surprising precondition to require in JIT IR (and in contrast to things like GTF_IND_NONNULL). I think it makes the state fragile in the future.

Can you elaborate on the fragility and how you think future changes could impact this?

Are you referring to locals getting tagged and then reused for something else? Users assigning negative values (which should only be possible via unsafe code)? Or something else?

Member

While this wouldn't add overhead to the promotion case since we already call getClassAttribs, doing pre-promotion recognition would require additional JIT/EE calls to be introduced which can negatively impact throughput. We would then need to either track that information somehow or continue doing lookups against the field handle where necessary (and we cannot trivially cache since we have a different handle per T).

The overhead would be in the pattern recognition inside GenTree::IsNeverNegative/IntegralRange::ForNode, but we would need to measure it to make any conclusions.

There would be no new EE calls since lvaSetStruct already calls getClassAttribs. It would make getClassAttribs more expensive on the EE side, but I expect we will want to do this anyway at some point in the (near) future to support loop cloning for spans.

Can you elaborate on the fragility and how you think future changes could impact this?

Are you referring to locals getting tagged and then reused for something else? Users assigning negative values (which should only be possible via unsafe code)? Or something else?

I am referring to a future JIT dev marking a GT_FIELD node as never negative because the field access is non-negative at that particular point in time, in the same way one would with GTF_IND_NONNULL. That's not legal due to this propagation into LclVarDsc that actually requires the backing storage to always be non-negative globally in the function.

Member Author

Talked a bit more on Discord to get clarification.

I've updated GenTreeField::IsNeverNegative to be GenTreeField::IsSpanLength to help avoid any issues around the current propagation. There exists a comment in the method that goes into more detail around the "why".

The plan after this PR is to look into also providing the getClassAttribs support since that should allow other Array like optimizations to kick in for span. However, the intent for the support being added in this PR is that it will stick around both to help catch pre-promotion dead code elimination and because the belief is that there are more general purpose areas where the information that a value is known to be non-negative will be profitable (much like GTF_IND_NONNULL).

Member

@kunalspathak kunalspathak left a comment

LGTM

@tannergooding tannergooding merged commit f54716d into dotnet:main Jan 26, 2023
@tannergooding tannergooding deleted the span-intrin branch January 26, 2023 01:24
@ghost ghost locked as resolved and limited conversation to collaborators Feb 25, 2023
Successfully merging this pull request may close these issues.

JIT should optimize {array | span}.Length division by constant that is power of 2