[mono] Append -sroa and -instcombine to LLVM-JIT pass manager #37458

EgorBo · 2020-06-04T22:49:18Z

Partially fixes #37449

static ReadOnlySpan<byte> Arr
{
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    get => new byte[] {1, 2, 3, 4};
}

[MethodImpl(MethodImplOptions.NoInlining)]
public static byte GetByte(int i) => Arr[0];

Codegen for GetByte() in LLVM-JIT mode:

0000000000000000 <gram_GetByte__int_>:
<BB>:1
   0:   48 83 ec 28             sub    $0x28,%rsp
   4:   c5 f8 57 c0             vxorps %xmm0,%xmm0,%xmm0
   8:   c5 f8 29 04 24          vmovaps %xmm0,(%rsp)
   d:   48 b8 b0 bd 55 d2 3f    movabs $0x563fd255bdb0,%rax
  14:   56 00 00
  17:   48 89 04 24             mov    %rax,(%rsp)
  1b:   c7 44 24 08 04 00 00    movl   $0x4,0x8(%rsp)
  22:   00
  23:   48 8b 04 24             mov    (%rsp),%rax
  27:   48 89 44 24 10          mov    %rax,0x10(%rsp)
  2c:   8b 44 24 08             mov    0x8(%rsp),%eax
  30:   89 44 24 18             mov    %eax,0x18(%rsp)
  34:   8b 44 24 0c             mov    0xc(%rsp),%eax
  38:   89 44 24 1c             mov    %eax,0x1c(%rsp)
  3c:   83 7c 24 18 00          cmpl   $0x0,0x18(%rsp)
  41:   74 0c                   je     4f <gram_GetByte__int_+0x4f>
  43:   48 8b 44 24 10          mov    0x10(%rsp),%rax
  48:   8a 00                   mov    (%rax),%al
  4a:   48 83 c4 28             add    $0x28,%rsp
  4e:   c3                      retq
  4f:   48 b8 d0 ae 42 d2 3f    movabs $0x563fd242aed0,%rax
  56:   56 00 00
  59:   bf 02 01 00 00          mov    $0x102,%edi
  5e:   ff 10                   callq  *(%rax)

Codegen after I appended -sroa -instcombine (makes sense to add/move extra -simplifycfg for hygiene) to the end of LLVM passes list:

0000000000000000 <gram_GetByte__int_>:
<BB>:1
   0:   48 b8 d0 ef a6 ba 31    movabs $0x5631baa6efd0,%rax
   7:   56 00 00
   a:   8a 00                   mov    (%rax),%al
   c:   c3                      retq

This https://godbolt.org/z/YKjzsV link explains motivation (on the left is the "final" LLVM IR our llvm-jit produces after the default optimizations). Zoltan noticed that LLVM-AOT where we use opt -O2 instead of custom pass order optimized that code perfectly.

The other issue remains: for some reason we don't inline get_Arr() without AggressiveInlining, coreclr does inline it:

Successfully inlined Program:get_Arr():System.ReadOnlySpan`1[Byte] (12 IL bytes) (depth 1) [below ALWAYS_INLINE size]

ghost · 2020-06-04T22:49:24Z

Tagging subscribers to this area: @lewurm
Notify danmosemsft if you want to be subscribed.

vargaz · 2020-06-05T07:07:02Z

We don't inline get_Arr () because it hits the inlining cost limit of 60.

Mono's inliner is quite conservative (see dotnet/runtime#34286) so we have to workaround some limitations, e.g. ```csharp static ReadOnlySpan<byte> Arr => new byte[] {1, 2, 3, 4}; // access example: byte GetByte(int i) => Arr[i]; ``` #### Current codegen for `GetByte` (with dotnet/runtime#37458 included) ```asm 0000000000000000 <gram_GetByte__int_>: <BB>:1 0: 48 83 ec 18 sub $0x18,%rsp 4: c5 f8 57 c0 vxorps %xmm0,%xmm0,%xmm0 8: c5 f8 29 04 24 vmovaps %xmm0,(%rsp) d: 48 b8 50 22 73 d2 6e movabs $0x556ed2732250,%rax 14: 55 00 00 17: 48 89 e7 mov %rsp,%rdi 1a: ff 10 callq *(%rax) ;;;; get_Arr() is not inlined! 1c: 83 7c 24 08 00 cmpl $0x0,0x8(%rsp) 21: 74 0b je 2e <gram_GetByte__int_+0x2e> 23: 48 8b 04 24 mov (%rsp),%rax 27: 8a 00 mov (%rax),%al 29: 48 83 c4 18 add $0x18,%rsp 2d: c3 retq 2e: 48 b8 68 22 73 d2 6e movabs $0x556ed2732268,%rax 35: 55 00 00 38: bf 02 01 00 00 mov $0x102,%edi 3d: ff 10 callq *(%rax) ``` As you can see, `get_Arr()` is not inlined - it happened because the calculated size is 60 and the major contributor is `ReadOnlySpan<>.ctor` which is marked with [AggressiveInlining], so Mono inlined that `.ctor` into `get_Arr` and the total cost for `get_Arr` increased to 60. #### Codegen with this PR: ```asm 0000000000000000 <gram_GetByte__int_>: <BB>:1 0: 48 b8 00 6e 1f 91 11 movabs $0x5611911f6e00,%rax 7: 56 00 00 a: 8a 00 mov (%rax),%al c: c3 retq ``` Thus, the fix helped to inline more and even reduced function size. There are other places where we append method call costs to `inline_costs` but I decided to start with this one for constructors and only for LLVM to minimize possible regressions. Possible regressions match current CoreCLR behavior, bad case: [sharplab.io](https://sharplab.io/#v2:EYLgxg9gTgpgtADwGwBYA0AXEBDAzgWwB8ABAJgEYBYAKGIAYACY8gOgCUBXAOwwEt8YLAMIR8AB14AbGFADKMgG68wMXAG4aNYgGYm5JE1IMhNAN4MLNCxYDaAWRgYAFhAAmASXGSAFA+dvPMUkAeTE+CC5cFgBBAHNY2FxcXgUYdy5JXi4s2IBKAF0rax09A2IUBgAhXli/F1dvXKKLU2brC2YATm8AIgAJHtyNanb2rt6YQeHR4vJunskptrG53sWh5dn5iCWRmb1ujb2Z8Z6AdV39g96do6vTqEv90/Xp59We1yeTj4BCb4YAF9NMcmLpmGUKtVao56mcoNgxGIZI02q1Qe0APSYqo1OpuRoMXi4IkZLIwVwMPAMGAIZFgDAUhhwBgQADWmws0PxDTu1mB1DaJQhTAqABVVBhUaD0VdsQx3FT8AwuBAMDS6TAGTlcTD/JSMBAGMAYKTMlwmVlDQwJbgpU0MdZ5cAOOreG6SSzVer2ZzdTz4YjkVBGm8LALAUA===)

#19930) Mono's inliner is quite conservative (see dotnet/runtime#34286) so we have to workaround some limitations, e.g. ```csharp static ReadOnlySpan<byte> Arr => new byte[] {1, 2, 3, 4}; // access example: byte GetByte(int i) => Arr[i]; ``` #### Current codegen for `GetByte` (with dotnet/runtime#37458 included) ```asm 0000000000000000 <gram_GetByte__int_>: <BB>:1 0: 48 83 ec 18 sub $0x18,%rsp 4: c5 f8 57 c0 vxorps %xmm0,%xmm0,%xmm0 8: c5 f8 29 04 24 vmovaps %xmm0,(%rsp) d: 48 b8 50 22 73 d2 6e movabs $0x556ed2732250,%rax 14: 55 00 00 17: 48 89 e7 mov %rsp,%rdi 1a: ff 10 callq *(%rax) ;;;; get_Arr() is not inlined! 1c: 83 7c 24 08 00 cmpl $0x0,0x8(%rsp) 21: 74 0b je 2e <gram_GetByte__int_+0x2e> 23: 48 8b 04 24 mov (%rsp),%rax 27: 8a 00 mov (%rax),%al 29: 48 83 c4 18 add $0x18,%rsp 2d: c3 retq 2e: 48 b8 68 22 73 d2 6e movabs $0x556ed2732268,%rax 35: 55 00 00 38: bf 02 01 00 00 mov $0x102,%edi 3d: ff 10 callq *(%rax) ``` As you can see, `get_Arr()` is not inlined - it happened because the calculated size is 60 and the major contributor is `ReadOnlySpan<>.ctor` which is marked with [AggressiveInlining], so Mono inlined that `.ctor` into `get_Arr` and the total cost for `get_Arr` increased to 60. #### Codegen with this PR: ```asm 0000000000000000 <gram_GetByte__int_>: <BB>:1 0: 48 b8 00 6e 1f 91 11 movabs $0x5611911f6e00,%rax 7: 56 00 00 a: 8a 00 mov (%rax),%al c: c3 retq ``` Thus, the fix helped to inline more and even reduced function size. There are other places where we append method call costs to `inline_costs` but I decided to start with this one for constructors and only for LLVM to minimize possible regressions. Possible regressions match current CoreCLR behavior, bad case: [sharplab.io](https://sharplab.io/#v2:EYLgxg9gTgpgtADwGwBYA0AXEBDAzgWwB8ABAJgEYBYAKGIAYACY8gOgCUBXAOwwEt8YLAMIR8AB14AbGFADKMgG68wMXAG4aNYgGYm5JE1IMhNAN4MLNCxYDaAWRgYAFhAAmASXGSAFA+dvPMUkAeTE+CC5cFgBBAHNY2FxcXgUYdy5JXi4s2IBKAF0rax09A2IUBgAhXli/F1dvXKKLU2brC2YATm8AIgAJHtyNanb2rt6YQeHR4vJunskptrG53sWh5dn5iCWRmb1ujb2Z8Z6AdV39g96do6vTqEv90/Xp59We1yeTj4BCb4YAF9NMcmLpmGUKtVao56mcoNgxGIZI02q1Qe0APSYqo1OpuRoMXi4IkZLIwVwMPAMGAIZFgDAUhhwBgQADWmws0PxDTu1mB1DaJQhTAqABVVBhUaD0VdsQx3FT8AwuBAMDS6TAGTlcTD/JSMBAGMAYKTMlwmVlDQwJbgpU0MdZ5cAOOreG6SSzVer2ZzdTz4YjkVBGm8LALAUA===) Co-authored-by: EgorBo <[email protected]>

Append -sroa -instcombine

df98db1

EgorBo requested review from vargaz and imhameed June 4, 2020 22:49

Dotnet-GitSync-Bot added the area-Codegen-LLVM-mono label Jun 4, 2020

monojenkins mentioned this pull request Jun 4, 2020

[mono] Append -sroa and -instcombine to LLVM-JIT pass manager mono/mono#19916

Merged

imhameed approved these changes Jun 5, 2020

View reviewed changes

jaredpar mentioned this pull request Jun 5, 2020

Errors installing the SDK during builds #34015

Closed

EgorBo mentioned this pull request Jun 6, 2020

[mono] Don't contribute to inline_const if ctor has AggressiveInlining #37544

Merged

monojenkins mentioned this pull request Jun 6, 2020

[mono] Don't contribute to inline_const if ctor has AggressiveInlining mono/mono#19930

Merged

EgorBo mentioned this pull request Jun 6, 2020

[mono] ReadOnlySpan<byte> hack doesn't work in Mono #37449

Closed

EgorBo merged commit d01bb68 into dotnet:master Jun 8, 2020

ghost locked as resolved and limited conversation to collaborators Dec 8, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[mono] Append -sroa and -instcombine to LLVM-JIT pass manager #37458

[mono] Append -sroa and -instcombine to LLVM-JIT pass manager #37458

EgorBo commented Jun 4, 2020 •

edited

Loading

ghost commented Jun 4, 2020

vargaz commented Jun 5, 2020

[mono] Append -sroa and -instcombine to LLVM-JIT pass manager #37458

[mono] Append -sroa and -instcombine to LLVM-JIT pass manager #37458

Conversation

EgorBo commented Jun 4, 2020 • edited Loading

Codegen for GetByte() in LLVM-JIT mode:

ghost commented Jun 4, 2020

vargaz commented Jun 5, 2020

EgorBo commented Jun 4, 2020 •

edited

Loading