Make typedefof more efficient #5019

ForNeVeR · 2018-05-27T06:37:20Z

I don't think it's actually a suggestion, so I'm filing it as a compiler issue.

As @rspeele mentions in comment to TaskBuilder.fs repo, currently typedefof looks rather inefficient. Compare the following C# and the following F# snippets:

using System.Collections.Generic;
public class C {
    public void M() {
        var x = typeof(List<>);
    }
}

open System.Collections.Generic
type C() =
    member this.M() =
        let x = typedefof<List<_>>
        ()

C# compiles to the following CIL:

         ldtoken [mscorlib]System.Collections.Generic.List`1
         call class [mscorlib]System.Type [mscorlib]System.Type::GetTypeFromHandle(valuetype [mscorlib]System.RuntimeTypeHandle)
         stloc.0

While F# compiles to the following:

         ldtoken class [mscorlib]System.Collections.Generic.List`1<object>
         call class [mscorlib]System.Type [mscorlib]System.Type::GetTypeFromHandle(valuetype [mscorlib]System.RuntimeTypeHandle)
         stloc.1
         ldloc.1
         callvirt instance bool [mscorlib]System.Type::get_IsGenericType()
         brfalse.s IL_001b
         
         ldloc.1
         callvirt instance class [mscorlib]System.Type [mscorlib]System.Type::GetGenericTypeDefinition()
         br.s IL_001c
         
IL_001b: ldloc.1
IL_001c: stloc.0

which is an equivalent of

Type typeFromHandle = typeof(List<object>);
Type type = (!typeFromHandle.IsGenericType) ? typeFromHandle : typeFromHandle.GetGenericTypeDefinition()

Could we improve that? Are there any issues I miss with plain ldtoken that require the current approach?

The text was updated successfully, but these errors were encountered:

buybackoff · 2018-05-27T13:18:03Z

If you use let x = typeof<List<_>> instead of typedefof the IL is the same as in C#. My original question in TaskBuilder issue was about which one to pick to make sure JIT-time constant magic doesn't break. Since it is JIT work, typeof should be fine. Much bigger issue would be if that didn't work from F#.

I created a benchmark for that. If writing simple F# code like below it is 100+ time slower than C#, because the method equality uses F# GenericEquality and when I just want to generate 10 millions longs profiler reminds me about my old SO question. Also the ULong64 case in F# generates many GC0, which indicates that casting is not optimized away by JIT.

type Worker() =
  [<MethodImplAttribute(MethodImplOptions.AggressiveInlining)>]
  member this.Work<'T>(x: 'T) : 'T =
    if typeof<'T> = typeof<uint64> then
      let d : uint64 = unbox(box(x))
      unbox(box(d))
    else 
      x

main
F#

Case	MOPS	Elapsed	GC0	GC1	GC2	Memory
F# Long	45.46	22 ms	0.0	0.0	0.0	0.008 MB
F# Ulong	37.04	27 ms	3.0	0.0	0.0	4.891 MB

Main
C#

Case	MOPS	Elapsed	GC0	GC1	GC2	Memory
C# Ulong	5000.00	2 ms	0.0	0.0	0.0	0.000 MB
C# Long	5000.00	2 ms	0.0	0.0	0.0	0.000 MB

Then I changed the code to

type Worker() =
  [<MethodImplAttribute(MethodImplOptions.AggressiveInlining)>]
  member this.Work<'T>(x: 'T) : 'T =
    if obj.ReferenceEquals(typeof<'T>, typeof<uint64>) then
      let d : uint64 = unbox(box(x))
      unbox(box(d))
    else 
      x

and found that the method is not inlined for some reason even though it has the attribute (I believe the issue on this attributed was merged long time ago)

But adding inline to the F# method makes it exactly the same as C# without allocations etc.:

type Worker() =
  [<MethodImplAttribute(MethodImplOptions.AggressiveInlining)>]
  member inline this.Work<'T>(x: 'T) : 'T =
    if obj.ReferenceEquals(typeof<'T>, typeof<uint64>) then
      let d : uint64 = unbox(box(x))
      unbox(box(d + 1UL))
    else 
      x

main
F#

Case	MOPS	Elapsed	GC0	GC1	GC2	Memory
F# Long	2500.00	4 ms	0.0	0.0	0.0	0.000 MB
F# Ulong	2500.00	4 ms	0.0	0.0	0.0	0.000 MB

Main
C#

Case	MOPS	Elapsed	GC0	GC1	GC2	Memory
C# Ulong	2500.00	4 ms	0.0	0.0	0.0	0.000 MB
C# Long	2500.00	4 ms	0.0	0.0	0.0	0.000 MB

But increasing the number of operations for more precise measurement reveals that F# does less operations per second in both cases and the difference to C# is similar to the difference between the two cases (they differ by ULong addition) so likely the if branch is not eliminated in F#. I do not want to disassembly and dig deeper for now, probably @adamsitnik could help to show how to use disassembly with BenchmarkDotNet.

main
F#

Case	MOPS	Elapsed	GC0	GC1	GC2	Memory
F# Long	2173.91	46 ms	0.0	0.0	0.0	0.000 MB
F# Ulong	1923.08	52 ms	0.0	0.0	0.0	0.000 MB

Main
C#

Case	MOPS	Elapsed	GC0	GC1	GC2	Memory
C# Ulong	2325.58	43 ms	0.0	0.0	0.0	0.000 MB
C# Long	2272.73	44 ms	0.0	0.0	0.0	0.000 MB

The C# code is as simple/idiomatic as one could write, no special tricks are needed.

    class Worker
    {
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        public T Work<T>(T x)
        {
            if (typeof(T) == typeof(ulong))
            {
                var u = (ulong)(object)x;
                u = u + 1;
                return (T)(object)u;
            }
            else
            {
                return x;
            }
        }
    }

Update:
I have looked into the IL and found that C# generates equality using System.Type::op_Equality where in F# I used obj.ReferenceEquals. How to make F# generate the same? When I use simple .Equals it generates a method call and the benchmark becomes many times slower.

call bool [System.Runtime]System.Type::op_Equality(class [System.Runtime]System.Type, class [System.Runtime]System.Type)

rspeele · 2018-05-27T17:50:18Z

Hi, perhaps I misunderstand your meaning with this statement:

If you use let x = typeof<List<_>> instead of typedefof the IL is the same as in C#

But F#'s typeof<List<_>> is equivalent to C#'s typeof(List<object>).

It's not a replacement for C#'s typeof(List<>), which is why in F# you must use either typedefof<List<_>> or typeof<List<_>>.GetGenericTypeDefinition(). In both of these scenarios, again, the <_> is effectively <obj>, it's just idiomatic to use _.

I should mention regardless of this being changed I still probably wouldn't want to do reflection-based checks to return cached bool tasks in TaskBuilder.fs, just because it adds complexity and there would always be some overhead to it in the majority of cases where not returning bools. But this issue is a separate thing and I must admit there have been times when doing heavy reflection code in F# that I have (probably as an unnecessary premature optimization) chosen to save the result of typedefof in a variable.

buybackoff · 2018-05-27T18:32:21Z

@rspeele in the original issue my focus was on JIT compile-time constant and if branch elimination. If it worked then...

there would always be some overhead to it in the majority of cases where not returning bools.

... there would be exactly zero overhead because JIT compiler just eliminates the wrong branches entirely (details here). I promised some benchmarks in the original issue and just posted it there.

zpodlovics · 2018-05-28T08:31:41Z

@buybackoff I guess this issue #526 still haunt us. I guess of the performance related issues comes from unwanted heap allocations (or even stack because of the excessive struct copy due the inability to pass struct's by reference,hopefully this will change with Span<'T> and inref<'T>).

F# should have some kind of infrastructure (even better: constraints or even attributes something like the roslyn checker provides!) to check the heap / stack allocations - perferrably by the compiler. It it possible to calculate the heap / stack allocation based on the TAST and the generated IL? While for functional languages It's an active research area [1], even a basic or a Roslyn HeapAllocationAnalyzer like tool will help a lot due the quick feedback loop to fix existing F# problems like this issue (it also helps that we know it's possible to solve it, because C# already have these tricks):

https://github.com/Microsoft/RoslynClrHeapAllocationAnalyzer

Also take a look at this "constraint" for C# as "The proposal is to declare a local function as non-capturing. The keyword to indicate that would be static":
dotnet/csharplang#1565

[1] http://www.raml.co/
[2] https://arxiv.org/abs/1801.01896

dsyme · 2018-05-29T22:14:35Z

I have looked into the IL and found that C# generates equality using System.Type::op_Equality where in F# I used obj.ReferenceEquals. How to make F# generate the same?

@buybackoff What happens when you use System.Type.op_Equality explicitly, e.g.

    if System.Type.op_Equality(typeof<'T>, typeof<uint64>) then
      ...

Note that open FSharp.Core.Operators.NonStructuralComparison may be sensible for this kind of low-allocation code in any case. In that case, I believe op_Equality will always be called directly for all uses of a = b but of course you will lose structural quality calls and won't be able to use = on some types

buybackoff · 2018-05-29T22:32:22Z

@dsyme yep, this removes the difference (and actually F# becomes slightly faster). Haven't looked into IL.

But what is the deal with inlining here? Is F# compiler or JIT does the work when the inline is present on the method?

dsyme · 2018-05-29T23:29:10Z

But what is the deal with inlining here? Is F# compiler or JIT does the work when the inline is present on the method?

The F# compiler does the work for inline. The JIT does the work for the AggressiveInlining attribute if there are any remaining calls to the method (which there won't be in F# code).

buybackoff · 2018-05-29T23:37:15Z

@dsyme I formulated the question incorrectly. When F# inlines the method (I realize that this is like literally pasting IL code to a call site) then boxing disappears. Does F# compiler eliminate the unbox(box(x)) part when types are the same or JIT? And why do you thing this method is not inlined with the AggressiveInlining attribute?

dsyme · 2018-05-29T23:42:46Z

@dsyme I formulated the question incorrectly. When F# inlines the method (I realize that this is like literally pasting IL code to a call site) then boxing disappears. Does F# compiler eliminate the unbox(box(x)) part when types are the same or JIT?

I think the JIT will. F# doesn't. The inlining will also type-specialise the code and that might help the JIT.

And why do you thing this method is not inlined with the AggressiveInlining attribute?

No idea I'm afraid. I'd just be guessing.

zpodlovics · 2018-05-30T06:49:36Z

FYI:

"Also, there are a number of reasons why methods with AggressiveInlining might not get inlined. If you can drop a checked clrjit.dll into your dotnet install, you can set COMPlus_JitPrintInlinedMethods to get a log of inlining decisions with failure reasons (this is less voluminous than a full jit dump)."

https://github.com/dotnet/coreclr/issues/5996#issuecomment-228467831

There is an optimization pass to detect box + isinst: (https://github.com/dotnet/coreclr/issues/12877 + https://github.com/dotnet/coreclr/issues/14472) https://github.com/dotnet/coreclr/blob/master/src/jit/compiler.h#L3153 however I did not yet found a box + unbox optimization pass in compiler.h / compiler.cpp.

https://github.com/dotnet/coreclr/issues/14305

zpodlovics · 2018-05-30T07:17:46Z

@buybackoff last time I checked the if typeof<'T> = typeof then else if pattern created a somewhat horrible and the slowest code (fsharp/fslang-design#287 (comment)).

 TypeofStructDU | 342.482 ns | 3.3741 ns | 3.1561 ns |

    [<MethodImpl(MethodImplOptions.AggressiveInlining)>]
    let inline matchTypeofStructDU(x: 'T) (v: int) = 
        if typeof<StructCase1> = typeof<'T> then 
            Baseline.Invoke(v,1)
        else if typeof<StructCase2> = typeof<'T> then
                Baseline.Invoke(v,2)
        else if typeof<StructCase3> = typeof<'T> then
                Baseline.Invoke(v,3)
        else if typeof<StructCase4> = typeof<'T> then
                Baseline.Invoke(v,4)
        else if typeof<StructCase5> = typeof<'T> then
                Baseline.Invoke(v,5)
        else if typeof<StructCase6> = typeof<'T> then
                Baseline.Invoke(v,6)
        else if typeof<StructCase7> = typeof<'T> then
                Baseline.Invoke(v,7)
        else if typeof<StructCase8> = typeof<'T> then
                Baseline.Invoke(v,8)
        else if typeof<StructCase9> = typeof<'T> then
                Baseline.Invoke(v,9)
        else if typeof<StructCase10> = typeof<'T> then
                Baseline.Invoke(v,10)
        else 0

    let typeofStructDUInstance = (Unchecked.defaultof<StructCase10>) 

    [<MethodImpl(MethodImplOptions.NoInlining)>]
    let typeofStructDU() = matchTypeofStructDU typeofStructDUInstance 1

Generated IL code snippet (full namespace retracted) for typeof as match:

  .method assembly static int32  matchTypeofStructDU$cont@396(class [FSharp.Core]Microsoft.FSharp.Core.Unit unitVar) cil managed
  {
    .custom instance void [System.Runtime]System.Runtime.CompilerServices.CompilerGeneratedAttribute::.ctor() = ( 01 00 00 00 ) 
    // Code size       284 (0x11c)
    .maxstack  4
    IL_0000:  ldtoken    StructCase3
    IL_0005:  call       class [System.Runtime]System.Type [System.Runtime]System.Type::GetTypeFromHandle(valuetype [System.Runtime]System.RuntimeTypeHandle)
    IL_000a:  ldtoken    StructCase10
    IL_000f:  call       class [System.Runtime]System.Type [System.Runtime]System.Type::GetTypeFromHandle(valuetype [System.Runtime]System.RuntimeTypeHandle)
    IL_0014:  call       bool [FSharp.Core]Microsoft.FSharp.Core.LanguagePrimitives/HashCompare::GenericEqualityIntrinsic<class [System.Runtime]System.Type>(!!0,
                                                                                                                                                             !!0)
    IL_0019:  brfalse.s  IL_0023

    IL_001b:  ldc.i4.1
    IL_001c:  ldc.i4.3
    IL_001d:  call       int32 Baseline::Invoke(int32,
                                                                         int32)
    IL_0022:  ret

    IL_0023:  ldtoken    StructCase4
    IL_0028:  call       class [System.Runtime]System.Type [System.Runtime]System.Type::GetTypeFromHandle(valuetype [System.Runtime]System.RuntimeTypeHandle)
    IL_002d:  ldtoken    StructCase10
    IL_0032:  call       class [System.Runtime]System.Type [System.Runtime]System.Type::GetTypeFromHandle(valuetype [System.Runtime]System.RuntimeTypeHandle)
    IL_0037:  call       bool [FSharp.Core]Microsoft.FSharp.Core.LanguagePrimitives/HashCompare::GenericEqualityIntrinsic<class [System.Runtime]System.Type>(!!0,
                                                                                                                                                             !!0)
    IL_003c:  brfalse.s  IL_0046

    IL_003e:  ldc.i4.1
    IL_003f:  ldc.i4.4
    IL_0040:  call       int32 Baseline::Invoke(int32,
                                                                         int32)
    IL_0045:  ret

    IL_0046:  ldtoken    StructCase5
    IL_004b:  call       class [System.Runtime]System.Type [System.Runtime]System.Type::GetTypeFromHandle(valuetype [System.Runtime]System.RuntimeTypeHandle)
    IL_0050:  ldtoken    StructCase10
    IL_0055:  call       class [System.Runtime]System.Type [System.Runtime]System.Type::GetTypeFromHandle(valuetype [System.Runtime]System.RuntimeTypeHandle)
    IL_005a:  call       bool [FSharp.Core]Microsoft.FSharp.Core.LanguagePrimitives/HashCompare::GenericEqualityIntrinsic<class [System.Runtime]System.Type>(!!0,
                                                                                                                                                             !!0)
    IL_005f:  brfalse.s  IL_0069

    IL_0061:  ldc.i4.1
    IL_0062:  ldc.i4.5
    IL_0063:  call       int32 Baseline::Invoke(int32,
                                                                         int32)
    IL_0068:  ret

    IL_0069:  ldtoken    StructCase6
    IL_006e:  call       class [System.Runtime]System.Type [System.Runtime]System.Type::GetTypeFromHandle(valuetype [System.Runtime]System.RuntimeTypeHandle)
    IL_0073:  ldtoken    StructCase10
    IL_0078:  call       class [System.Runtime]System.Type [System.Runtime]System.Type::GetTypeFromHandle(valuetype [System.Runtime]System.RuntimeTypeHandle)
    IL_007d:  call       bool [FSharp.Core]Microsoft.FSharp.Core.LanguagePrimitives/HashCompare::GenericEqualityIntrinsic<class [System.Runtime]System.Type>(!!0,
                                                                                                                                                             !!0)
    IL_0082:  brfalse.s  IL_008c

    IL_0084:  ldc.i4.1
    IL_0085:  ldc.i4.6
    IL_0086:  call       int32 Baseline::Invoke(int32,
                                                                         int32)
    IL_008b:  ret

    IL_008c:  ldtoken    StructCase7
    IL_0091:  call       class [System.Runtime]System.Type [System.Runtime]System.Type::GetTypeFromHandle(valuetype [System.Runtime]System.RuntimeTypeHandle)
    IL_0096:  ldtoken    StructCase10
    IL_009b:  call       class [System.Runtime]System.Type [System.Runtime]System.Type::GetTypeFromHandle(valuetype [System.Runtime]System.RuntimeTypeHandle)
    IL_00a0:  call       bool [FSharp.Core]Microsoft.FSharp.Core.LanguagePrimitives/HashCompare::GenericEqualityIntrinsic<class [System.Runtime]System.Type>(!!0,
                                                                                                                                                             !!0)
    IL_00a5:  brfalse.s  IL_00af

    IL_00a7:  ldc.i4.1
    IL_00a8:  ldc.i4.7
    IL_00a9:  call       int32 Baseline::Invoke(int32,
                                                                         int32)
    IL_00ae:  ret

    IL_00af:  ldtoken    StructCase8
    IL_00b4:  call       class [System.Runtime]System.Type [System.Runtime]System.Type::GetTypeFromHandle(valuetype [System.Runtime]System.RuntimeTypeHandle)
    IL_00b9:  ldtoken    StructCase10
    IL_00be:  call       class [System.Runtime]System.Type [System.Runtime]System.Type::GetTypeFromHandle(valuetype [System.Runtime]System.RuntimeTypeHandle)
    IL_00c3:  call       bool [FSharp.Core]Microsoft.FSharp.Core.LanguagePrimitives/HashCompare::GenericEqualityIntrinsic<class [System.Runtime]System.Type>(!!0,
                                                                                                                                                             !!0)
    IL_00c8:  brfalse.s  IL_00d2

    IL_00ca:  ldc.i4.1
    IL_00cb:  ldc.i4.8
    IL_00cc:  call       int32 Baseline::Invoke(int32,
                                                                         int32)
    IL_00d1:  ret

    IL_00d2:  ldtoken    StructCase9
    IL_00d7:  call       class [System.Runtime]System.Type [System.Runtime]System.Type::GetTypeFromHandle(valuetype [System.Runtime]System.RuntimeTypeHandle)
    IL_00dc:  ldtoken    StructCase10
    IL_00e1:  call       class [System.Runtime]System.Type [System.Runtime]System.Type::GetTypeFromHandle(valuetype [System.Runtime]System.RuntimeTypeHandle)
    IL_00e6:  call       bool [FSharp.Core]Microsoft.FSharp.Core.LanguagePrimitives/HashCompare::GenericEqualityIntrinsic<class [System.Runtime]System.Type>(!!0,
                                                                                                                                                             !!0)
    IL_00eb:  brfalse.s  IL_00f6

    IL_00ed:  ldc.i4.1
    IL_00ee:  ldc.i4.s   9
    IL_00f0:  call       int32 Baseline::Invoke(int32,
                                                                         int32)
    IL_00f5:  ret

    IL_00f6:  ldtoken    StructCase10
    IL_00fb:  call       class [System.Runtime]System.Type [System.Runtime]System.Type::GetTypeFromHandle(valuetype [System.Runtime]System.RuntimeTypeHandle)
    IL_0100:  ldtoken    StructCase10
    IL_0105:  call       class [System.Runtime]System.Type [System.Runtime]System.Type::GetTypeFromHandle(valuetype [System.Runtime]System.RuntimeTypeHandle)
    IL_010a:  call       bool [FSharp.Core]Microsoft.FSharp.Core.LanguagePrimitives/HashCompare::GenericEqualityIntrinsic<class [System.Runtime]System.Type>(!!0,
                                                                                                                                                             !!0)
    IL_010f:  brfalse.s  IL_011a

    IL_0111:  ldc.i4.1
    IL_0112:  ldc.i4.s   10
    IL_0114:  call       int32 Baseline::Invoke(int32,
                                                                         int32)
    IL_0119:  ret

    IL_011a:  ldc.i4.0
    IL_011b:  ret
  } // end of method MatchBenchmark::matchTypeofStructDU$cont@396

Generated IL code snippet (full namespace retracted) for benchmark:

 .method public static int32  typeofStructDU() cil managed noinlining
  {
    // Code size       77 (0x4d)
    .maxstack  4
    IL_0000:  ldtoken    StructCase1
    IL_0005:  call       class [System.Runtime]System.Type [System.Runtime]System.Type::GetTypeFromHandle(valuetype [System.Runtime]System.RuntimeTypeHandle)
    IL_000a:  ldtoken    StructCase10
    IL_000f:  call       class [System.Runtime]System.Type [System.Runtime]System.Type::GetTypeFromHandle(valuetype [System.Runtime]System.RuntimeTypeHandle)
    IL_0014:  call       bool [FSharp.Core]Microsoft.FSharp.Core.LanguagePrimitives/HashCompare::GenericEqualityIntrinsic<class [System.Runtime]System.Type>(!!0,
                                                                                                                                                             !!0)
    IL_0019:  brfalse.s  IL_0023

    IL_001b:  ldc.i4.1
    IL_001c:  ldc.i4.1
    IL_001d:  call       int32 Baseline::Invoke(int32,
                                                                         int32)
    IL_0022:  ret

    IL_0023:  ldtoken    StructCase2
    IL_0028:  call       class [System.Runtime]System.Type [System.Runtime]System.Type::GetTypeFromHandle(valuetype [System.Runtime]System.RuntimeTypeHandle)
    IL_002d:  ldtoken    StructCase10
    IL_0032:  call       class [System.Runtime]System.Type [System.Runtime]System.Type::GetTypeFromHandle(valuetype [System.Runtime]System.RuntimeTypeHandle)
    IL_0037:  call       bool [FSharp.Core]Microsoft.FSharp.Core.LanguagePrimitives/HashCompare::GenericEqualityIntrinsic<class [System.Runtime]System.Type>(!!0,
                                                                                                                                                             !!0)
    IL_003c:  brfalse.s  IL_0046

    IL_003e:  ldc.i4.1
    IL_003f:  ldc.i4.2
    IL_0040:  call       int32 Baseline::Invoke(int32,
                                                                         int32)
    IL_0045:  ret

    IL_0046:  ldnull
    IL_0047:  call       int32 MatchBenchmark::matchTypeofStructDU$cont@396(class [FSharp.Core]Microsoft.FSharp.Core.Unit)
    IL_004c:  ret
  } // end of method MatchBenchmark::typeofStructDU

buybackoff · 2018-05-30T07:17:59Z

@zpodlovics the point is that the same C# method is inlined by JIT and they use the same JIT... which probably means IL is different. But in such cases like this I prefer F# order, not a hint, to inline.

There are not so many reasons when a simple method is not inlined - the most common is exception throwing, like in this case:

The method is F#-intrinsic, but not JIT-intrisic, so it is a normal one and any throw 100% makes a method not a candidate for inlining (which is easy to avoid with ThrowHelper pattern).

buybackoff · 2018-05-30T07:21:54Z

@zpodlovics

a somewhat horrible and the slowest code

"horrible and the slowest code" is this part: typeof<StructCase2> = typeof<'T> which translates to bool [FSharp.Core]Microsoft.FSharp.Core.LanguagePrimitives/HashCompare::GenericEqualityIntrinsic<class [System.Runtime]System.Type>.

Try using what @dsyme showed above: if System.Type.op_Equality(typeof<'T>, typeof<uint64>) and JIT will happily recognize that this condition is JIT-time constant and will remove false branches completely.

zpodlovics · 2018-05-30T07:32:08Z

@buybackoff I did not noticed that System.Type.op_Equality comment earlier, I will try it out.

It may worth to reconsider the exception handling in F# code base expecially in standard library and move out the slow path to helpers (throwhelpers). This could shrink the fastpath number of ILs, that could also improve the JIT inlining decisions, but that's a different story / issue.

AndyAyersMS · 2018-05-30T07:49:44Z

Adjacent box / unbox.any is handled via a peephole optimization in the jit importer (look at importer.cpp around line 14890).

I tried looking at inlining decisions in your linked repro case but it appears that inlining of Worker.Work was done by the frontend compiler (even without the inline keyword).

The jit in .Net Core 2.1 has a number of enhancements to type equality opts and boxing opts so you might also give that a try, if you're still using 2.0.

buybackoff · 2018-05-30T10:38:23Z

It may worth to reconsider the exception handling in F# code base expecially in standard library and move out the slow path to helpers (throwhelpers). This could shrink the fastpath number of ILs, that could also improve the JIT inlining decisions, but that's a different story / issue.

This is a low-hanging fruit. Corefx uses this everywhere with interesting comments on why two nested method. Since @AndyAyersMS is here he could give authoritative answer if this pattern is worth it to change many LOCs

Just looked into Array.fs and found that exceptions are inlined with F# inline keyword, which is quite opposite to ThrowHelper pattern.

zpodlovics · 2018-05-30T12:08:25Z

@buybackoff Exactly, this why I mentioned. It's assumed to be a low-hanging fruit, but I am afraid it will be probably the most critical realization for F# performance related issues. It's not accidental that CoreCLR/CoreFX/Kestrel code refactored to that style.

a) The most common execution path shrinked significantly (fastpath). The less code means lot more opportunity to inline (some slow code path could also prevent inlining due the complexity, size, etc) and the JIT will generate lot less code to execute that path.
b) The non-common path (slowpath) moved out to own method/function to avoid code bloat when the fastpath is inlinied.
c) The other non-common path (exception) moved out to own method/function to avoid code bloat when the fastpath is inlinied.

Inlining/NoInlining flags should be used wisely: https://github.com/dotnet/coreclr/issues/1270#issuecomment-180862050

The original reason why raise marked inline is ... to avoid warnings is code coverage tests (instead of fixing the code coverage tool). That was reasonably reasonable at that time, but the CoreCLR changed/improved significantly since then. Please see my late concerns about here: #3121 (comment)

There are some legitimate case when inlining (at IL level by the compiler) exception throwing is required for example NoDynamicInvocation case here: https://github.com/dotnet/coreclr/issues/1270#issuecomment-181767435

buybackoff · 2018-05-30T12:17:43Z

@zpodlovics but that is tedious work without much fun. I try to use ThrowHelper by default after I created such helper for myself, but usually it makes sense when a profiler shows that e.g. method call takes more time than some numeric calculation. But at least in F# Core many throwing functions are reused from helper modules and it's easier to change code there than everywhere.

zpodlovics · 2018-05-30T12:33:45Z

@buybackoff I did the same in my code. Well, we can always introduce a an attribute for forced inline exception throwing (on raise) eg.: ForceInlineThrow (or something else), and the compiler do a transformation and generate helper static methods for raise code. But I am afraid this automatic tranformation may result a non-equivalent program and/or error prone (but the same could be true for inlining exception throwing).

AndyAyersMS · 2018-05-30T16:48:42Z

I would say in CoreCLR/CoreFX where're kind of in an ok but funny place. We have some number of methods that throw manually converted over to use throw helpers. But not all. We'd really prefer that this transformation be done automatically, behind the scenes (via say the ILLinker). But we don't have that yet.

The pattern is:

The main method does its checks (say argument validation) and then conditionally calls throw helper with any needed args. Type the helper so no implicit boxing or parms array allocation is needed.
The throw helper does all formatting and exception object creation, then unconditionally throws. It is deliberately not marked with [MethodImpl(MethodImplOptions.NoInlining)].

The jit recognizes this pattern and does not inline the throw helper, and moves its call site to the "cold" section of the calling method's code (either end of method when jitting, or in a split off cold part when prejitting).

Note that because of resource lookup, boxing, etc the formatting of the exception message can be a fairly large chunk of code (and the trend is to make these messages be more detailed). So we also get overall size wins when throw helpers can be reused across a number of different methods.

Whether this is worth doing is hard to say -- if you have a number of frequently called methods that have conditional throws then it probably is helpful. You can be selective and only convert things gradually.

manofstick · 2018-07-09T06:58:01Z

Not re: the original topic, but where it meandered to... #5307 affects this (sprinkling breadcrumbs...)

(Oh, and BTW, not sure if it was ever changed, but I remember looked many moons ago at AggressiveInlining in f#... From memory the MethodImplAttributes were handled as special cases, and AggressiveInlining were not passed through to the IL (probably the enum value was created after the fsharp compiler did that bit...), which might explain why it didn't work. From memory again I think I had a crack at adding it, and it wasn't particularly hard - not sure why I never posted a PR though... Hmmm. Maybe there was some issue...)

zpodlovics · 2018-07-09T12:10:11Z

@manofstick Last time I checked AggressiveInlining was fine in F#. The decompiled IL (with ildasm) have the aggressiveinlining marker. Congratulation for your amazing work on equality!

    [<MethodImpl(MethodImplOptions.AggressiveInlining)>]
    static member op_Equality(this : SOption<'a>, other : SOption<'a>) =
        this.IsSome = other.IsSome && this.Value = other.Value

    .method public specialname static bool 
            op_Equality(valuetype Program/VOption`1<!a> this,
                        valuetype Program/VOption`1<!a> other) cil managed aggressiveinlining
    {
      // Code size       44 (0x2c)
      .maxstack  4
      .locals init (!a V_0,
               !a V_1)
      IL_0000:  ldarga.s   this
      IL_0002:  call       instance bool valuetype Program/VOption`1<!a>::get_IsSome()
      IL_0007:  ldarga.s   other
      IL_0009:  call       instance bool valuetype Program/VOption`1<!a>::get_IsSome()
      IL_000e:  bne.un.s   IL_002a

      IL_0010:  ldarga.s   this
      IL_0012:  call       instance !0 valuetype Program/VOption`1<!a>::get_Value()
      IL_0017:  stloc.0
      IL_0018:  ldarga.s   other
      IL_001a:  call       instance !0 valuetype Program/VOption`1<!a>::get_Value()
      IL_001f:  stloc.1
      IL_0020:  ldloc.0
      IL_0021:  ldloc.1
      IL_0022:  tail.
      IL_0024:  call       bool [FSharp.Core]Microsoft.FSharp.Core.LanguagePrimitives/HashCompare::GenericEqualityIntrinsic<!a>(!!0,
                                                                                                                                !!0)
      IL_0029:  ret

      IL_002a:  ldc.i4.0
      IL_002b:  ret
    } // end of method VOption`1::op_Equality

NinoFloris · 2020-02-03T16:41:00Z

Any way to move this forward?

@dsyme what are your thoughts on loading an open generic token like C# does by repurposing _ within a typedefof call by adding an optimization in the compiler?

dsyme · 2020-03-23T14:04:04Z

@dsyme what are your thoughts on loading an open generic token like C# does by repurposing _ within a typedefof call by adding an optimization in the compiler?

Couldn't we just add a specific optimization to generate the open ldtoken?

NinoFloris · 2020-03-23T14:24:00Z

That was the suggestion indeed!

Without adding discards support it is a bit confusing you must fully specify the type yet you don't care about the arguments.

Related to this is something I ran into a while back with a complex type; as typedefof is also just a normal generic function it must also respect the constraints on the type parameters involved. This can lead to awkward signatures where you must put a lot of emphasis on the arguments (think nonsense like Foo<'T, 'U when 'T :> seq<'U>, 'U :> IEquatable<'U>>). I'm assuming it would have to become an intrinsic to loosen type checks for typedef, if it's even something you want to entertain.

ForNeVeR mentioned this issue May 27, 2018

Caching bool tasks rspeele/TaskBuilder.fs#21

Open

zpodlovics mentioned this issue Jun 12, 2018

Performance: Not possible to prevent inlining the IL by the F# compiler without the MethodImplOptions.NoInlining flags #5178

Closed

cartermp added Area-Compiler Feature Improvement labels Aug 13, 2018

cartermp added this to the Unknown milestone Aug 25, 2018

zpodlovics mentioned this issue Sep 1, 2020

Nop IL instructions in release code #6026

Closed

cartermp modified the milestones: Feature requests and improvements, Backlog Oct 28, 2020

teo-tsirpanis mentioned this issue Nov 11, 2020

typeof doesn't work with ref struct types. #10426

Open

dsyme added Area-Compiler-Optimization The F# optimizer, release code gen etc. and removed Area-Compiler labels Mar 31, 2022

vzarytovskii added this to F# Compiler and Tooling Jun 17, 2022

vzarytovskii moved this to Not Planned in F# Compiler and Tooling Jun 17, 2022

NinoFloris mentioned this issue Feb 20, 2023

F# typeof comparison suboptimal codegen dotnet/runtime#82394

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make typedefof more efficient #5019

Make typedefof more efficient #5019

ForNeVeR commented May 27, 2018 •

edited

Loading

buybackoff commented May 27, 2018 •

edited

Loading

rspeele commented May 27, 2018 •

edited

Loading

buybackoff commented May 27, 2018

zpodlovics commented May 28, 2018 •

edited

Loading

dsyme commented May 29, 2018

buybackoff commented May 29, 2018

dsyme commented May 29, 2018 •

edited

Loading

buybackoff commented May 29, 2018

dsyme commented May 29, 2018

zpodlovics commented May 30, 2018 •

edited

Loading

zpodlovics commented May 30, 2018

buybackoff commented May 30, 2018

buybackoff commented May 30, 2018

zpodlovics commented May 30, 2018 •

edited

Loading

AndyAyersMS commented May 30, 2018

buybackoff commented May 30, 2018

zpodlovics commented May 30, 2018 •

edited

Loading

buybackoff commented May 30, 2018

zpodlovics commented May 30, 2018 •

edited

Loading

AndyAyersMS commented May 30, 2018

manofstick commented Jul 9, 2018

zpodlovics commented Jul 9, 2018

NinoFloris commented Feb 3, 2020

dsyme commented Mar 23, 2020

NinoFloris commented Mar 23, 2020

Make typedefof more efficient #5019

Make typedefof more efficient #5019

Comments

ForNeVeR commented May 27, 2018 • edited Loading

buybackoff commented May 27, 2018 • edited Loading

rspeele commented May 27, 2018 • edited Loading

buybackoff commented May 27, 2018

zpodlovics commented May 28, 2018 • edited Loading

dsyme commented May 29, 2018

buybackoff commented May 29, 2018

dsyme commented May 29, 2018 • edited Loading

buybackoff commented May 29, 2018

dsyme commented May 29, 2018

zpodlovics commented May 30, 2018 • edited Loading

zpodlovics commented May 30, 2018

buybackoff commented May 30, 2018

buybackoff commented May 30, 2018

zpodlovics commented May 30, 2018 • edited Loading

AndyAyersMS commented May 30, 2018

buybackoff commented May 30, 2018

zpodlovics commented May 30, 2018 • edited Loading

buybackoff commented May 30, 2018

zpodlovics commented May 30, 2018 • edited Loading

AndyAyersMS commented May 30, 2018

manofstick commented Jul 9, 2018

zpodlovics commented Jul 9, 2018

NinoFloris commented Feb 3, 2020

dsyme commented Mar 23, 2020

NinoFloris commented Mar 23, 2020

ForNeVeR commented May 27, 2018 •

edited

Loading

buybackoff commented May 27, 2018 •

edited

Loading

rspeele commented May 27, 2018 •

edited

Loading

zpodlovics commented May 28, 2018 •

edited

Loading

dsyme commented May 29, 2018 •

edited

Loading

zpodlovics commented May 30, 2018 •

edited

Loading

zpodlovics commented May 30, 2018 •

edited

Loading

zpodlovics commented May 30, 2018 •

edited

Loading

zpodlovics commented May 30, 2018 •

edited

Loading