Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT: "is" keyword generates boxing in generic function #8576

Closed
roji opened this issue Jul 18, 2017 · 26 comments
Closed

JIT: "is" keyword generates boxing in generic function #8576

roji opened this issue Jul 18, 2017 · 26 comments
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI enhancement Product code improvement that does NOT require public API changes/additions optimization tenet-performance Performance related issue
Milestone

Comments

@roji
Copy link
Member

roji commented Jul 18, 2017

It seems that using the is keyword in a generic function causes value types to be boxed. Consider the following program:

class Program
{
    const long Iterations = 9999999999;

    static void Foo<T>(T bar)
    {
        //if (bar == null)                   // No allocation
        //if (typeof(T) == typeof(string))   // No allocation
        if (bar is string)                   // Allocation
            throw new Exception("Never happens");
    }

    static void Main(string[] args)
    {
        for (var i = 0; i < Iterations; i++)
            Foo(i);
    }
}

This is especially odd as comparison to null doesn't box. This can be worked around by avoiding is and comparing types (and can also provide a form of "template specialization" as in https://github.com/MikePopoloski/StringFormatter/blob/c36fb18edd44b4d2c75e210e85dc93a3fd58efe0/StringFormatter/StringBuffer.cs#L523).

@Joe4evr
Copy link
Contributor

Joe4evr commented Jul 18, 2017

This is because is translates to isinst (which has to produce an object of the target type, since that's the direct translation of as) followed by a null-check, so a boxing conversion is inevitable.

I guess the JIT can be improved to remove unused codepaths predicated on T is x the same way it optimizes the mentioned typeof(T) == typeof(x) kind of checks? Not sure how much that would impact the runtime perf, though.

@roji
Copy link
Member Author

roji commented Jul 18, 2017

I guess the JIT can be improved to remove unused codepaths predicated on T is x the same way it optimizes the mentioned typeof(T) == typeof(x) kind of checks? Not sure how much that would impact the runtime perf, though.

It definitely sounds like a worthy optimization to me...

One additional comment - with the new pattern matching the impact of this additional allocation may be even worse, as people use it more.

@AndyAyersMS
Copy link
Member

Handling T is x is a bit more complex than handling typeof(T) == x, since subtyping comes into play (as well as special rules for nullable, icastable, etc).

I've been playing with some prototypes here, but nothing worth calling attention to quite yet.

@redknightlois
Copy link

@AndyAyersMS Wouldn't restricting the optimization to struct/primitive types be an acceptable tradeoff here?

@AndyAyersMS
Copy link
Member

Yes, looking at value classes here makes sense, since the box is the expensive bit.

My prototyping was at the isinst stage, which probably is too narrow a view. Likely even if we can prove the isinst will succeed we might not be able to clean up the rest (and as you can see from dotnet/coreclr#13016 cleaning up from an unnecessary box was not trivial).

@AndyAyersMS
Copy link
Member

With dotnet/coreclr#13169 my prototype gets the example above and also can now clean up after the box. However, aside from the example above, I'm not seeing any cases so far where a value type box feeds an isinst or castclass.

It's possible I am missing out on cases that could be optimized because the ability of the jit to reason about types is still somewhat limited (see for instance #7816). For example when crossgenning System.Private.CoreLib the jit sees around 3100 optimizable istinst/castclass operations, and is able to determine the (approximate) type of the object being tested in around 1400 cases. From that the jit is able to successfully determine the result of the cast in only 2 cases.

I will revisit as I improve local type propagation but unless the cases with missing types are somehow correlated with the optimizable cases it seems unlikely the overall success rate will improve.

@roji was this just a "hey, that's unexpected" kind of observation? Or can you refer me to examples where this pattern happens frequently?

@AndyAyersMS
Copy link
Member

Prototype is on this fork, for future reference. Output on the case above is

### M op BOX kind isinst have System.Int32 want System.String Optimizing FAILURE, unrelated types
; Assembly listing for method Program:Foo(int)
; Emitting BLENDED_CODE for X64 CPU with AVX
; optimized code
; rsp based frame
; partially interruptible
; Final local variable assignments
;
;* V00 arg0         [V00    ] (  0,  0   )     int  ->  zero-ref
;* V01 tmp0         [V01    ] (  0,  0   )     ref  ->  zero-ref
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [rsp+0x00]
;
; Lcl frame size = 0

G_M45348_IG01:

G_M45348_IG02:
       C3                   ret

; Total bytes of code 1, prolog size 0 for method Program:Foo(int)

@pentp
Copy link
Contributor

pentp commented Aug 2, 2017

I suspect the relative rarity of this pattern is caused in part by people actively avoiding it because they know it allocates.

@roji
Copy link
Member Author

roji commented Aug 2, 2017

@pentp first, I'm not sure we know much about the relative rarity of this pattern, and in any case I would highly doubt people are aware of this boxing allocation - it is extremely unlikely to me that someone would assume that an is check allocates.

@AndyAyersMS I ran into this issue while doing memory profiling on my project, where a generic function needed some specialized logic. Of course I have no idea how frequent specialization is, but as I said above I can imagine it becoming a problem with the new pattern matching feature. For example, I would expect that in contexts needing specialization (like mine) you'd fine the following pattern:

if (t is string s) {
    // string specialization
}

The alternative non-allocating alternative would be to compare types instead, but that would imply an additional cast which pattern matching is intended to avoid.

@pentp
Copy link
Contributor

pentp commented Aug 2, 2017

I have avoided this pattern for many years and I didn't assume - I checked the x86 asm after I saw a box in IL.

@roji
Copy link
Member Author

roji commented Aug 3, 2017

I have avoided this pattern for many years and I didn't assume - I checked the x86 asm after I saw a box in IL.

The problem here is that unlike most other boxing scenarios where the need for boxing is intuitive to the programmer, this one is completely unexpected and needless (which is why this issue exists). I would expect the number of programmers which regularly review IL/asm to be quite low.

@benaadams
Copy link
Member

benaadams commented Sep 13, 2017

Would be nice if this worked without boxing (allocates 24 bytes)

int key0, key1;
PatternMatchEquality<int>.Equals(key0, key1)

Where

public class PatternMatchEquality<TKey>
{
    private static readonly EqualityComparer<TKey> _defaultComparer = EqualityComparer<TKey>.Default;

    public static bool Equals(TKey key0, TKey key1)
    {
        switch (key0)
        {
            case IEquatable<TKey> key:
                return key.Equals(key1);
            default:
                //return _defaultComparer.Equals(key0, key1);
                throw new Exception();
        }
    }
}

Though is the issue the C# compiler doesn't emit a constrained il instruction in pattern matching?

  .method public hidebysig static bool 
    Equals(
      !0/*TKey*/ key0, 
      !0/*TKey*/ key1
    ) cil managed 
  {
    .maxstack 2
    .locals init (
      [0] !0/*TKey*/ V_0,
      [1] class [System.Runtime]System.IEquatable`1<!0/*TKey*/> V_1
    )

    // [185 13 - 185 26]
    IL_0000: ldarg.0      // key0
    IL_0001: stloc.0      // V_0
    IL_0002: ldloc.0      // V_0
    IL_0003: box          !0/*TKey*/
    IL_0008: brfalse.s    IL_0021
    IL_000a: ldloc.0      // V_0
    IL_000b: box          !0/*TKey*/
    IL_0010: isinst       class [System.Runtime]System.IEquatable`1<!0/*TKey*/>
    IL_0015: dup          
    IL_0016: stloc.1      // V_1
    IL_0017: brfalse.s    IL_0021

    IL_0019: ldloc.1      // V_1

    // [188 21 - 188 45]
    IL_001a: ldarg.1      // key1
    IL_001b: callvirt     instance bool class [System.Runtime]System.IEquatable`1<!0/*TKey*/>::Equals(!0/*TKey*/)
    IL_0020: ret          

    // [190 22 - 190 44]
    IL_0021: newobj       instance void [System.Runtime]System.Exception::.ctor()
    IL_0026: throw        

  } // end of method PatternMatchEquality`1::Equals

@benaadams
Copy link
Member

Raised issue for constrained dotnet/roslyn#22092

Though would still have the issue with boxing for isinst

@AndyAyersMS
Copy link
Member

I just updated my branch for this yesterday; let me see how it fares on this case.

Main concern with the proposed change is whether I have the logic for isinst properly captured.... there are a lot of tricky cases out there and the jit's view of types is often limited.

@benaadams
Copy link
Member

Building :)

@AndyAyersMS
Copy link
Member

The fork's version is not updated, sorry.... anyways the local update doesn't get the above case because it is looking for a specific cast helper and that rules out interface checks.

@benaadams
Copy link
Member

:(

@AndyAyersMS
Copy link
Member

I can generalize the checks, but will need some new methods on the jit interface so the jit can call back to the EE to ask questions about whether or not casts won't/may/must succeed.

@benaadams
Copy link
Member

benaadams commented Sep 13, 2017

As background...

The scenario I'm trying to cover is calling struct generics if the type implements the the interface; so in effect getting the benefit of having a generic constraint without one being present.

The example I was looking at was the default comparer on dictionary; which currently goes via hoops and inheritance to implement the constrained EqualityComparer since TKey itself is not constrained.

However while that works, it also ends up with virtual calls for Equals and GetHashCode that can't be inlined

                Method |        Mean | Scaled |  Gen 0 | Allocated |
---------------------- |------------:|-------:|-------:|----------:|
    ConstrainedGeneric |   0.7292 ns |   1.00 |      - |       0 B |
OverridableConstrained |   1.2454 ns |   1.71 |      - |       0 B |
      EqualityComparer |   1.8306 ns |   2.51 |      - |       0 B |
     IEqualityComparer |   2.7401 ns |   3.76 |      - |       0 B |*
          FuncComparer |   6.3357 ns |   8.69 |      - |       0 B | 
  PatternMatchEquality |  11.6117 ns |  15.92 | 0.0076 |      24 B |
       EqualityGeneric | 571.9018 ns | 784.25 | 0.0582 |     184 B |**

*  Dictionary     
** Uses key0.Equals(key1) no-constraint so via Object.Equals

As highlighted by ravendb in their fast-dictionary blog post; and the same thing could be achieved with the default comparer and struct keys in the regular dictionary.

Using a non-inherited Equatable as a static default and then with an override when the instance comparer is not null; gets most of the way there (second test)

internal class DictionaryEquatableComparer<T> where T : IEquatable<T>
{
    public bool Equals(T x, T y) => x.Equals(y);
    public int GetHashCode(T obj) => obj.GetHashCode();
}

public class OverridableComparer<TKey> where TKey : IEquatable<TKey>
{
    private static readonly DictionaryEquatableComparer<TKey> s_comparer = new DictionaryEquatableComparer<TKey>();
    private IEqualityComparer<TKey> _comparer;

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public bool KeyEquals(TKey key0, TKey key1)
    {
        return _comparer == null ? s_comparer.Equals(key0, key1) : _comparer.Equals(key0, key1);
    }
}

However can't be used in dictionary as TKey is not constrained.

So was looking to see if pattern matching could solve it; as at Jit time everything is known about the key type (if struct; i.e. whether it implements IEquatable) so it could elide everything out in a method; rather than on an object. Essentially applying a generic constraint after the fact/internally.

That was the hope and motivation anyway :)

@benaadams
Copy link
Member

Specifically I'd like to do something like this for Dictionary<TKey, TValue>

private static readonly EqualityComparer<TKey> s_defaultComparer;
private readonly IEqualityComparer<TKey> _customComparer;

[MethodImpl(MethodImplOptions.AggressiveInlining)]
public bool KeyEquals(TKey key0, TKey key1)
{
    bool result;
    if (_customComparer == null)
    {
        switch (key0)
        {
            case IEquatable<TKey> key:
                result = key.Equals(key1);
                break;
            default:
                result = s_defaultComparer.Equals(key0, key1);
                break;
        }
    }
    else
    {
        result = _customComparer.Equals(key0, key1);
    }

    return result;
}

[MethodImpl(MethodImplOptions.AggressiveInlining)]
public int KeyHashCode(TKey key)
{
    int result;
    if (_customComparer == null)
    {
        switch (key)
        {
            case IEquatable<TKey> k:
                result = k.GetHashCode();
                break;
            default:
                result = s_defaultComparer.GetHashCode(key);
                break;
        }
    }
    else
    {
        result = _customComparer.GetHashCode(key);
    }

    return result & 0x7FFFFFFF;
}

@AndyAyersMS
Copy link
Member

Interesting -- I had been planning on looking at ways t the jit could improve dictionary perf for simple key types and this might be the way to go.

@benaadams
Copy link
Member

Don't think its currently expressible in C# as above would be

var key = key0 as IEquatable<TKey>
return key.Equals(key1);

😢

Working on proposal

@benaadams
Copy link
Member

Proposal dotnet/csharplang#905

@omariom
Copy link
Contributor

omariom commented Sep 14, 2017

@benaadams @AndyAyersMS

For Dictionary's case we could have a tactical solution . Jan Kotas mentioned that it was already implemented by .NET Native for UWP.

@AndyAyersMS
Copy link
Member

@omariom Thanks for the reminder. That is one of the things I want to look into further.

@stephentoub
Copy link
Member

stephentoub commented Sep 26, 2017

dotnet/coreclr#14178 hits this as well.

AndyAyersMS referenced this issue in AndyAyersMS/coreclr Oct 13, 2017
Implement the jit interface compareTypesForEquality method
to handle casts from known types to known types, and from
shared types to certain interface types.

Call this method in the jit for castclass and isinst, using
`gtGetClassHandle` to obtain the from type. Optimize sucessful
casts and unsuccessful isinsts when the from type is known
exactly.

Undo part of the type-equality based optimization/workaround
in the AsyncMethodBuilder code that was introduced in dotnet#14178
in favor of interface checks. There is more here that can
be done here before this issue is entirely closed and I will
look at this subsequently.

This optimization allows the jit to remove boxes that are
used solely to feed type casts, and so closes #12877.
@msftgits msftgits transferred this issue from dotnet/coreclr Jan 31, 2020
@msftgits msftgits added this to the Future milestone Jan 31, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 21, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI enhancement Product code improvement that does NOT require public API changes/additions optimization tenet-performance Performance related issue
Projects
None yet
Development

No branches or pull requests

9 participants