Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unroll StringBuilder.Append for const string #85894

Merged
merged 5 commits into from
May 9, 2023

Conversation

yesmey
Copy link
Contributor

@yesmey yesmey commented May 7, 2023

Buffer.Memmove is now being unrolled for constant lengths. If we simplify Append(ref char, int) a bit, the JIT can inline it.
As a result, this allows the methods StringBuilder.Append(string) and StringBuilder.AppendLine() to be able to be unrolled.
I didn't seem to need to add any AggressiveInlining attribute, so I left it out.

Const string example

public static void Example(StringBuilder stringBuilder)
{
    stringBuilder.Append("1234567890abcdefghijklmnopqrstuvwxyzåäö");
}

Before:

; Method Program:Example(System.Text.StringBuilder)
G_M35345_IG01:  ;; offset=0000H
       sub      rsp, 40
						;; size=4 bbWeight=1 PerfScore 0.25

G_M35345_IG02:  ;; offset=0004H
       cmp      byte  ptr [rcx], cl
       mov      rdx, 0x154802028D8      ; '1'
       add      rdx, 12
       mov      r8d, 39
       call     [System.Text.StringBuilder:Append(byref,int):this]
       nop      
						;; size=29 bbWeight=1 PerfScore 7.00

G_M35345_IG03:  ;; offset=0021H
       add      rsp, 40
       ret      
						;; size=5 bbWeight=1 PerfScore 1.25
; Total bytes of code: 38

After:

; Method Program:Example(System.Text.StringBuilder)
G_M35345_IG01:  ;; offset=0000H
       sub      rsp, 40
       vzeroupper 
						;; size=7 bbWeight=1 PerfScore 1.25

G_M35345_IG02:  ;; offset=0007H
       mov      rdx, 0x18C802028D8      ; '1'
       add      rdx, 12
       mov      r8, gword ptr [rcx+08H]
       mov      eax, dword ptr [rcx+18H]
       lea      r9d, [rax+27H]
       cmp      dword ptr [r8+08H], r9d
       jb       SHORT G_M35345_IG04
						;; size=31 bbWeight=1 PerfScore 9.00

G_M35345_IG03:  ;; offset=0026H
       movsxd   r9, eax
       lea      r8, bword ptr [r8+2*r9+10H]
       vmovdqu  ymm0, ymmword ptr [rdx]
       vmovdqu  ymm1, ymmword ptr [rdx+20H]
       vmovdqu  xmm2, xmmword ptr [rdx+3EH]
       vmovdqu  ymmword ptr [r8], ymm0
       vmovdqu  ymmword ptr [r8+20H], ymm1
       vmovdqu  xmmword ptr [r8+3EH], xmm2
       add      eax, 39
       mov      dword ptr [rcx+18H], eax
       jmp      SHORT G_M35345_IG05
						;; size=47 bbWeight=0.50 PerfScore 12.25

G_M35345_IG04:  ;; offset=0055H
       mov      r8d, 39
       call     [System.Text.StringBuilder:AppendWithExpansion(byref,int):this]
						;; size=12 bbWeight=0.50 PerfScore 1.62

G_M35345_IG05:  ;; offset=0061H
       nop      
						;; size=1 bbWeight=1 PerfScore 0.25

G_M35345_IG06:  ;; offset=0062H
       add      rsp, 40
       ret      
						;; size=5 bbWeight=1 PerfScore 1.25
; Total bytes of code: 103

AppendLine example

public static void Example(StringBuilder stringBuilder)
{
    stringBuilder.AppendLine();
}

Before:

; Method Program:Example(System.Text.StringBuilder)
G_M35345_IG01:  ;; offset=0000H
       sub      rsp, 40
						;; size=4 bbWeight=1 PerfScore 0.25

G_M35345_IG02:  ;; offset=0004H
       cmp      byte  ptr [rcx], cl
       mov      rdx, 0x195802028D8      ; '  '
       add      rdx, 12
       mov      r8d, 2
       call     [System.Text.StringBuilder:Append(byref,int):this]
       nop      
						;; size=29 bbWeight=1 PerfScore 7.00

G_M35345_IG03:  ;; offset=0021H
       add      rsp, 40
       ret      
						;; size=5 bbWeight=1 PerfScore 1.25
; Total bytes of code: 38

After:

; Method Program:Example(System.Text.StringBuilder)
G_M35345_IG01:  ;; offset=0000H
       sub      rsp, 40
						;; size=4 bbWeight=1 PerfScore 0.25

G_M35345_IG02:  ;; offset=0004H
       mov      rdx, 0x17C002028D8      ; '  '
       add      rdx, 12
       mov      r8, gword ptr [rcx+08H]
       mov      eax, dword ptr [rcx+18H]
       lea      r9d, [rax+02H]
       cmp      dword ptr [r8+08H], r9d
       jb       SHORT G_M35345_IG04
						;; size=31 bbWeight=1 PerfScore 9.00

G_M35345_IG03:  ;; offset=0023H
       movsxd   r9, eax
       lea      r8, bword ptr [r8+2*r9+10H]
       mov      r9d, dword ptr [rdx]
       mov      dword ptr [r8], r9d
       add      eax, 2
       mov      dword ptr [rcx+18H], eax
       jmp      SHORT G_M35345_IG05
						;; size=22 bbWeight=0.50 PerfScore 3.75

G_M35345_IG04:  ;; offset=0039H
       mov      r8d, 2
       call     [System.Text.StringBuilder:AppendWithExpansion(byref,int):this]
						;; size=12 bbWeight=0.50 PerfScore 1.62

G_M35345_IG05:  ;; offset=0045H
       nop      
						;; size=1 bbWeight=1 PerfScore 0.25

G_M35345_IG06:  ;; offset=0046H
       add      rsp, 40
       ret      
						;; size=5 bbWeight=1 PerfScore 1.25
; Total bytes of code: 75

@EgorBo can you please take a look at this

@dotnet-issue-labeler dotnet-issue-labeler bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label May 7, 2023
@ghost ghost added the community-contribution Indicates that the PR has been added by a community member label May 7, 2023
@EgorBo
Copy link
Member

EgorBo commented May 7, 2023

Very nice use-case! One problem that it might slightly regress the case when it's not inlined and Mono.
The method seems to be compiled to a small codegen (when not unrolled) so presumably it won't hurt to mark it as AggressiveInlining.

Another option is to tune inliner - if it sees Buffer.Memmove and the last arg is a constant - boost inlining (but you need to invoke a different overload of Buffer.Memmove, the one that accepts byte)

@EgorBo
Copy link
Member

EgorBo commented May 7, 2023

EgorBo@b79af3f

this seems to be enough to make it inlineable for const input (it gives inliner extra hint)

@yesmey
Copy link
Contributor Author

yesmey commented May 8, 2023

You are right we need to keep small size optimization, I figured it was only there for constant separators/newline, and I didn't think about Mono.

EgorBo@b79af3f

this seems to be enough to make it inlineable for const input (it gives inliner extra hint)

That's a cool trick! It seem to make it inline it pretty aggressively unfortunately, even when the input is not constant. It causes the code size to increase a lot where its not really necessary.

Another option is to tune inliner - if it sees Buffer.Memmove and the last arg is a constant - boost inlining (but you need to invoke a different overload of Buffer.Memmove, the one that accepts byte)

What's funny is that if I use the byte variant of Buffer.Memmove instead, it already seems to make it inlinable! (without removing the optimization for small input of course). I'm guessing directly calling an intrinsic gives an inline hint?

It seems to hit a pretty good spot now where it inlines for a constant length, but not otherwise.

I know very little about how the jit works, so I'm out of my league here - but is it possible an even better solution is to somehow make both variants of Buffer.Memmove have the same inlining "weight"? It feels a bit odd that they behave differently here.

@yesmey
Copy link
Contributor Author

yesmey commented May 8, 2023

With current code:

public static void Example(StringBuilder builder, string s)
{
    builder.Append(s);
    builder.Append("1234567890abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMN");
    builder.AppendLine();
}
; Method Program:Example(System.Text.StringBuilder,System.String)
G_M17123_IG01:  ;; offset=0000H
       push     rsi
       sub      rsp, 32
       vzeroupper 
       mov      rsi, rcx
						;; size=11 bbWeight=1 PerfScore 2.50

G_M17123_IG02:  ;; offset=000BH
       cmp      byte  ptr [rsi], sil
       test     rdx, rdx
       je       SHORT G_M17123_IG04
						;; size=8 bbWeight=1 PerfScore 4.25

G_M17123_IG03:  ;; offset=0013H
       mov      r8d, dword ptr [rdx+08H]
       add      rdx, 12
       mov      rcx, rsi
       call     [System.Text.StringBuilder:Append(byref,int):this]
						;; size=17 bbWeight=0.50 PerfScore 2.75

G_M17123_IG04:  ;; offset=0024H
       mov      rdx, 0x1EB802028D8      ; '1234567890abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMN'
       add      rdx, 12
       mov      rcx, gword ptr [rsi+08H]
       mov      r8d, dword ptr [rsi+18H]
       lea      eax, [r8+32H]
       cmp      dword ptr [rcx+08H], eax
       jb       SHORT G_M17123_IG06
						;; size=31 bbWeight=1 PerfScore 9.00

G_M17123_IG05:  ;; offset=0043H
       movsxd   rax, r8d
       lea      rcx, bword ptr [rcx+2*rax+10H]
       vmovdqu  ymm0, ymmword ptr [rdx]
       vmovdqu  ymm1, ymmword ptr [rdx+20H]
       vmovdqu  ymm2, ymmword ptr [rdx+40H]
       vmovdqu  xmm3, xmmword ptr [rdx+54H]
       vmovdqu  ymmword ptr [rcx], ymm0
       vmovdqu  ymmword ptr [rcx+20H], ymm1
       vmovdqu  ymmword ptr [rcx+40H], ymm2
       vmovdqu  xmmword ptr [rcx+54H], xmm3
       add      r8d, 50
       mov      dword ptr [rsi+18H], r8d
       jmp      SHORT G_M17123_IG07
						;; size=56 bbWeight=0.50 PerfScore 15.75

G_M17123_IG06:  ;; offset=007BH
       mov      rcx, rsi
       mov      r8d, 50
       call     [System.Text.StringBuilder:AppendWithExpansion(byref,int):this]
						;; size=15 bbWeight=0.50 PerfScore 1.75

G_M17123_IG07:  ;; offset=008AH
       mov      rdx, 0x1EB80202958      ; '  '
       add      rdx, 12
       mov      rcx, gword ptr [rsi+08H]
       mov      r8d, dword ptr [rsi+18H]
       lea      eax, [r8+02H]
       cmp      dword ptr [rcx+08H], eax
       jb       SHORT G_M17123_IG09
						;; size=31 bbWeight=1 PerfScore 9.00

G_M17123_IG08:  ;; offset=00A9H
       movsxd   rdx, r8d
       lea      rcx, bword ptr [rcx+2*rdx+10H]
       mov      word  ptr [rcx], 13
       mov      word  ptr [rcx+02H], 10
       mov      dword ptr [rsi+18H], eax
       jmp      SHORT G_M17123_IG10
						;; size=24 bbWeight=0.50 PerfScore 3.12

G_M17123_IG09:  ;; offset=00C1H
       mov      rcx, rsi
       mov      r8d, 2
       call     [System.Text.StringBuilder:AppendWithExpansion(byref,int):this]
						;; size=15 bbWeight=0.50 PerfScore 1.75

G_M17123_IG10:  ;; offset=00D0H
       nop      
						;; size=1 bbWeight=1 PerfScore 0.25

G_M17123_IG11:  ;; offset=00D1H
       add      rsp, 32
       pop      rsi
       ret      
						;; size=6 bbWeight=1 PerfScore 1.75
; Total bytes of code: 215

@EgorBo
Copy link
Member

EgorBo commented May 8, 2023

@yesmey can you apply this patch 47eb0eb

it allows us to avoid using uglier version of Memmove + possibly improves other similiar cases

@yesmey
Copy link
Contributor Author

yesmey commented May 8, 2023

@yesmey can you apply this patch 47eb0eb

it allows us to avoid using uglier version of Memmove + possibly improves other similiar cases

Works perfectly for this scenario, thanks! I'll post some examples here. I assume you will want to take it from here and close this?

Append non-const
public static void Append(StringBuilder builder, string s)
{
    builder.Append(s);
}

Before

; Method Program:Append(System.Text.StringBuilder,System.String)
G_M43621_IG01:  ;; offset=0000H
       sub      rsp, 40
						;; size=4 bbWeight=1 PerfScore 0.25

G_M43621_IG02:  ;; offset=0004H
       cmp      byte  ptr [rcx], cl
       test     rdx, rdx
       je       SHORT G_M43621_IG04
						;; size=7 bbWeight=1 PerfScore 4.25

G_M43621_IG03:  ;; offset=000BH
       mov      r8d, dword ptr [rdx+08H]
       add      rdx, 12
       call     [System.Text.StringBuilder:Append(byref,int):this]
						;; size=14 bbWeight=0.49 PerfScore 2.57

G_M43621_IG04:  ;; offset=0019H
       nop      
						;; size=1 bbWeight=1 PerfScore 0.25

G_M43621_IG05:  ;; offset=001AH
       add      rsp, 40
       ret      
						;; size=5 bbWeight=1 PerfScore 1.25
; Total bytes of code: 31

After

; Method Program:Append(System.Text.StringBuilder,System.String)
G_M43621_IG01:  ;; offset=0000H
       sub      rsp, 40
						;; size=4 bbWeight=1 PerfScore 0.25

G_M43621_IG02:  ;; offset=0004H
       cmp      byte  ptr [rcx], cl
       test     rdx, rdx
       je       SHORT G_M43621_IG04
						;; size=7 bbWeight=1 PerfScore 4.25

G_M43621_IG03:  ;; offset=000BH
       mov      r8d, dword ptr [rdx+08H]
       add      rdx, 12
       call     [System.Text.StringBuilder:Append(byref,int):this]
						;; size=14 bbWeight=0.50 PerfScore 2.62

G_M43621_IG04:  ;; offset=0019H
       nop      
						;; size=1 bbWeight=1 PerfScore 0.25

G_M43621_IG05:  ;; offset=001AH
       add      rsp, 40
       ret      
						;; size=5 bbWeight=1 PerfScore 1.25
; Total bytes of code: 31
Append by index non-const
public static void AppendIndex(StringBuilder builder, string s)
{
    builder.Append(s, 2, 16);
}

Before

; Method Program:AppendIndex(System.Text.StringBuilder,System.String)
G_M19771_IG01:  ;; offset=0000H
       sub      rsp, 40
						;; size=4 bbWeight=1 PerfScore 0.25

G_M19771_IG02:  ;; offset=0004H
       mov      r8d, 2
       mov      r9d, 16
       cmp      dword ptr [rcx], ecx
       call     [System.Text.StringBuilder:Append(System.String,int,int):System.Text.StringBuilder:this]
       nop      
						;; size=21 bbWeight=1 PerfScore 6.75

G_M19771_IG03:  ;; offset=0019H
       add      rsp, 40
       ret      
						;; size=5 bbWeight=1 PerfScore 1.25
; Total bytes of code: 30

After

; Method Program:AppendIndex(System.Text.StringBuilder,System.String)
G_M19771_IG01:  ;; offset=0000H
       sub      rsp, 40
						;; size=4 bbWeight=1 PerfScore 0.25

G_M19771_IG02:  ;; offset=0004H
       mov      r8d, 2
       mov      r9d, 16
       cmp      dword ptr [rcx], ecx
       call     [System.Text.StringBuilder:Append(System.String,int,int):System.Text.StringBuilder:this]
       nop      
						;; size=21 bbWeight=1 PerfScore 6.75

G_M19771_IG03:  ;; offset=0019H
       add      rsp, 40
       ret      
						;; size=5 bbWeight=1 PerfScore 1.25
; Total bytes of code: 30
AppendLine
public static void AppendLine(StringBuilder builder, string s)
{
    builder.AppendLine(s);
}

Before

; Method Program:AppendLine(System.Text.StringBuilder,System.String)
G_M48267_IG01:  ;; offset=0000H
       push     rsi
       sub      rsp, 32
       mov      rsi, rcx
						;; size=8 bbWeight=1 PerfScore 1.50

G_M48267_IG02:  ;; offset=0008H
       cmp      byte  ptr [rsi], sil
       test     rdx, rdx
       je       SHORT G_M48267_IG04
						;; size=8 bbWeight=1 PerfScore 4.25

G_M48267_IG03:  ;; offset=0010H
       mov      r8d, dword ptr [rdx+08H]
       add      rdx, 12
       mov      rcx, rsi
       call     [System.Text.StringBuilder:Append(byref,int):this]
						;; size=17 bbWeight=0.49 PerfScore 2.70

G_M48267_IG04:  ;; offset=0021H
       mov      rdx, 0x283002028D8      ; '  '
       add      rdx, 12
       mov      rcx, rsi
       mov      r8d, 2
       call     [System.Text.StringBuilder:Append(byref,int):this]
       nop      
						;; size=30 bbWeight=1 PerfScore 4.25

G_M48267_IG05:  ;; offset=003FH
       add      rsp, 32
       pop      rsi
       ret      
						;; size=6 bbWeight=1 PerfScore 1.75
; Total bytes of code: 69

After

; Method Program:AppendLine(System.Text.StringBuilder,System.String)
G_M48267_IG01:  ;; offset=0000H
       push     rsi
       sub      rsp, 32
       mov      rsi, rcx
						;; size=8 bbWeight=1 PerfScore 1.50

G_M48267_IG02:  ;; offset=0008H
       cmp      byte  ptr [rsi], sil
       test     rdx, rdx
       je       SHORT G_M48267_IG04
						;; size=8 bbWeight=1 PerfScore 4.25

G_M48267_IG03:  ;; offset=0010H
       mov      r8d, dword ptr [rdx+08H]
       add      rdx, 12
       mov      rcx, rsi
       call     [System.Text.StringBuilder:Append(byref,int):this]
						;; size=17 bbWeight=0.50 PerfScore 2.75

G_M48267_IG04:  ;; offset=0021H
       mov      rdx, 0x207003028D8      ; '  '
       add      rdx, 12
       mov      rcx, gword ptr [rsi+08H]
       mov      r8d, dword ptr [rsi+18H]
       lea      eax, [r8+02H]
       cmp      dword ptr [rcx+08H], eax
       jb       SHORT G_M48267_IG06
						;; size=31 bbWeight=1 PerfScore 9.00

G_M48267_IG05:  ;; offset=0040H
       movsxd   rdx, r8d
       lea      rcx, bword ptr [rcx+2*rdx+10H]
       mov      word  ptr [rcx], 13
       mov      word  ptr [rcx+02H], 10
       mov      dword ptr [rsi+18H], eax
       jmp      SHORT G_M48267_IG07
						;; size=24 bbWeight=0.50 PerfScore 3.12

G_M48267_IG06:  ;; offset=0058H
       mov      rcx, rsi
       mov      r8d, 2
       call     [System.Text.StringBuilder:AppendWithExpansion(byref,int):this]
						;; size=15 bbWeight=0.50 PerfScore 1.75

G_M48267_IG07:  ;; offset=0067H
       nop      
						;; size=1 bbWeight=1 PerfScore 0.25

G_M48267_IG08:  ;; offset=0068H
       add      rsp, 32
       pop      rsi
       ret      
						;; size=6 bbWeight=1 PerfScore 1.75
; Total bytes of code: 110
Append const
public static void AppendConst(StringBuilder builder)
{
    builder.Append("1234567890abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMN");
}

Before

; Method Program:AppendConst(System.Text.StringBuilder)
G_M5650_IG01:  ;; offset=0000H
       sub      rsp, 40
						;; size=4 bbWeight=1 PerfScore 0.25

G_M5650_IG02:  ;; offset=0004H
       cmp      byte  ptr [rcx], cl
       mov      rdx, 0x14A003028D8      ; '1234567890abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMN'
       add      rdx, 12
       mov      r8d, 50
       call     [System.Text.StringBuilder:Append(byref,int):this]
       nop      
						;; size=29 bbWeight=1 PerfScore 7.00

G_M5650_IG03:  ;; offset=0021H
       add      rsp, 40
       ret      
						;; size=5 bbWeight=1 PerfScore 1.25
; Total bytes of code: 38

After

; Method Program:AppendConst(System.Text.StringBuilder)
G_M5650_IG01:  ;; offset=0000H
       sub      rsp, 40
       vzeroupper 
						;; size=7 bbWeight=1 PerfScore 1.25

G_M5650_IG02:  ;; offset=0007H
       mov      rdx, 0x20B002028D8      ; '1234567890abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMN'
       add      rdx, 12
       mov      r8, gword ptr [rcx+08H]
       mov      eax, dword ptr [rcx+18H]
       lea      r9d, [rax+32H]
       cmp      dword ptr [r8+08H], r9d
       jb       SHORT G_M5650_IG04
						;; size=31 bbWeight=1 PerfScore 9.00

G_M5650_IG03:  ;; offset=0026H
       movsxd   r9, eax
       lea      r8, bword ptr [r8+2*r9+10H]
       vmovdqu  ymm0, ymmword ptr [rdx]
       vmovdqu  ymm1, ymmword ptr [rdx+20H]
       vmovdqu  ymm2, ymmword ptr [rdx+40H]
       vmovdqu  xmm3, xmmword ptr [rdx+54H]
       vmovdqu  ymmword ptr [r8], ymm0
       vmovdqu  ymmword ptr [r8+20H], ymm1
       vmovdqu  ymmword ptr [r8+40H], ymm2
       vmovdqu  xmmword ptr [r8+54H], xmm3
       add      eax, 50
       mov      dword ptr [rcx+18H], eax
       jmp      SHORT G_M5650_IG05
						;; size=58 bbWeight=0.50 PerfScore 15.75

G_M5650_IG04:  ;; offset=0060H
       mov      r8d, 50
       call     [System.Text.StringBuilder:AppendWithExpansion(byref,int):this]
						;; size=12 bbWeight=0.50 PerfScore 1.62

G_M5650_IG05:  ;; offset=006CH
       nop      
						;; size=1 bbWeight=1 PerfScore 0.25

G_M5650_IG06:  ;; offset=006DH
       add      rsp, 40
       ret      
						;; size=5 bbWeight=1 PerfScore 1.25
; Total bytes of code: 114
Append by index const
public static void AppendConstIndex(StringBuilder builder)
{
    builder.Append("1234567890abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMN", 2, 16);
}

Before

; Method Program:AppendConstIndex(System.Text.StringBuilder)
G_M5612_IG01:  ;; offset=0000H
       sub      rsp, 40
       vzeroupper 
						;; size=7 bbWeight=1 PerfScore 1.25

G_M5612_IG02:  ;; offset=0007H
       mov      rdx, 0x257002028D8      ; '1234567890abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMN'
       add      rdx, 12
       add      rdx, 4
       mov      r8, gword ptr [rcx+08H]
       mov      eax, dword ptr [rcx+18H]
       lea      r9d, [rax+10H]
       cmp      dword ptr [r8+08H], r9d
       jb       SHORT G_M5612_IG04
						;; size=35 bbWeight=1 PerfScore 9.25

G_M5612_IG03:  ;; offset=002AH
       movsxd   r9, eax
       lea      r8, bword ptr [r8+2*r9+10H]
       vmovdqu  ymm0, ymmword ptr [rdx]
       vmovdqu  ymmword ptr [r8], ymm0
       add      eax, 16
       mov      dword ptr [rcx+18H], eax
       jmp      SHORT G_M5612_IG05
						;; size=25 bbWeight=0.50 PerfScore 5.75

G_M5612_IG04:  ;; offset=0043H
       mov      r8d, 16
       call     [System.Text.StringBuilder:AppendWithExpansion(byref,int):this]
						;; size=12 bbWeight=0.50 PerfScore 1.62

G_M5612_IG05:  ;; offset=004FH
       nop      
						;; size=1 bbWeight=1 PerfScore 0.25

G_M5612_IG06:  ;; offset=0050H
       add      rsp, 40
       ret      
						;; size=5 bbWeight=1 PerfScore 1.25
; Total bytes of code: 85

After

; Method Program:AppendConstIndex(System.Text.StringBuilder)
G_M5612_IG01:  ;; offset=0000H
       sub      rsp, 40
       vzeroupper 
						;; size=7 bbWeight=1 PerfScore 1.25

G_M5612_IG02:  ;; offset=0007H
       mov      rdx, 0x18E802028D8      ; '1234567890abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMN'
       add      rdx, 12
       add      rdx, 4
       mov      r8, gword ptr [rcx+08H]
       mov      eax, dword ptr [rcx+18H]
       lea      r9d, [rax+10H]
       cmp      dword ptr [r8+08H], r9d
       jb       SHORT G_M5612_IG04
						;; size=35 bbWeight=1 PerfScore 9.25

G_M5612_IG03:  ;; offset=002AH
       movsxd   r9, eax
       lea      r8, bword ptr [r8+2*r9+10H]
       vmovdqu  ymm0, ymmword ptr [rdx]
       vmovdqu  ymmword ptr [r8], ymm0
       add      eax, 16
       mov      dword ptr [rcx+18H], eax
       jmp      SHORT G_M5612_IG05
						;; size=25 bbWeight=0.50 PerfScore 5.75

G_M5612_IG04:  ;; offset=0043H
       mov      r8d, 16
       call     [System.Text.StringBuilder:AppendWithExpansion(byref,int):this]
						;; size=12 bbWeight=0.50 PerfScore 1.62

G_M5612_IG05:  ;; offset=004FH
       nop      
						;; size=1 bbWeight=1 PerfScore 0.25

G_M5612_IG06:  ;; offset=0050H
       add      rsp, 40
       ret      
						;; size=5 bbWeight=1 PerfScore 1.25
; Total bytes of code: 85

@EgorBo
Copy link
Member

EgorBo commented May 8, 2023

I assume you will want to take it from here and close this?

Feel free to integrate into your PR so we can merge it

Copy link
Member

@EgorBo EgorBo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for noticing the opportunity!

@EgorBo EgorBo added area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI and removed needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels May 9, 2023
@ghost
Copy link

ghost commented May 9, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

Buffer.Memmove is now being unrolled for constant lengths. If we simplify Append(ref char, int) a bit, the JIT can inline it.
As a result, this allows the methods StringBuilder.Append(string) and StringBuilder.AppendLine() to be able to be unrolled.
I didn't seem to need to add any AggressiveInlining attribute, so I left it out.

Const string example

public static void Example(StringBuilder stringBuilder)
{
    stringBuilder.Append("1234567890abcdefghijklmnopqrstuvwxyzåäö");
}

Before:

; Method Program:Example(System.Text.StringBuilder)
G_M35345_IG01:  ;; offset=0000H
       sub      rsp, 40
						;; size=4 bbWeight=1 PerfScore 0.25

G_M35345_IG02:  ;; offset=0004H
       cmp      byte  ptr [rcx], cl
       mov      rdx, 0x154802028D8      ; '1'
       add      rdx, 12
       mov      r8d, 39
       call     [System.Text.StringBuilder:Append(byref,int):this]
       nop      
						;; size=29 bbWeight=1 PerfScore 7.00

G_M35345_IG03:  ;; offset=0021H
       add      rsp, 40
       ret      
						;; size=5 bbWeight=1 PerfScore 1.25
; Total bytes of code: 38

After:

; Method Program:Example(System.Text.StringBuilder)
G_M35345_IG01:  ;; offset=0000H
       sub      rsp, 40
       vzeroupper 
						;; size=7 bbWeight=1 PerfScore 1.25

G_M35345_IG02:  ;; offset=0007H
       mov      rdx, 0x18C802028D8      ; '1'
       add      rdx, 12
       mov      r8, gword ptr [rcx+08H]
       mov      eax, dword ptr [rcx+18H]
       lea      r9d, [rax+27H]
       cmp      dword ptr [r8+08H], r9d
       jb       SHORT G_M35345_IG04
						;; size=31 bbWeight=1 PerfScore 9.00

G_M35345_IG03:  ;; offset=0026H
       movsxd   r9, eax
       lea      r8, bword ptr [r8+2*r9+10H]
       vmovdqu  ymm0, ymmword ptr [rdx]
       vmovdqu  ymm1, ymmword ptr [rdx+20H]
       vmovdqu  xmm2, xmmword ptr [rdx+3EH]
       vmovdqu  ymmword ptr [r8], ymm0
       vmovdqu  ymmword ptr [r8+20H], ymm1
       vmovdqu  xmmword ptr [r8+3EH], xmm2
       add      eax, 39
       mov      dword ptr [rcx+18H], eax
       jmp      SHORT G_M35345_IG05
						;; size=47 bbWeight=0.50 PerfScore 12.25

G_M35345_IG04:  ;; offset=0055H
       mov      r8d, 39
       call     [System.Text.StringBuilder:AppendWithExpansion(byref,int):this]
						;; size=12 bbWeight=0.50 PerfScore 1.62

G_M35345_IG05:  ;; offset=0061H
       nop      
						;; size=1 bbWeight=1 PerfScore 0.25

G_M35345_IG06:  ;; offset=0062H
       add      rsp, 40
       ret      
						;; size=5 bbWeight=1 PerfScore 1.25
; Total bytes of code: 103

AppendLine example

public static void Example(StringBuilder stringBuilder)
{
    stringBuilder.AppendLine();
}

Before:

; Method Program:Example(System.Text.StringBuilder)
G_M35345_IG01:  ;; offset=0000H
       sub      rsp, 40
						;; size=4 bbWeight=1 PerfScore 0.25

G_M35345_IG02:  ;; offset=0004H
       cmp      byte  ptr [rcx], cl
       mov      rdx, 0x195802028D8      ; '  '
       add      rdx, 12
       mov      r8d, 2
       call     [System.Text.StringBuilder:Append(byref,int):this]
       nop      
						;; size=29 bbWeight=1 PerfScore 7.00

G_M35345_IG03:  ;; offset=0021H
       add      rsp, 40
       ret      
						;; size=5 bbWeight=1 PerfScore 1.25
; Total bytes of code: 38

After:

; Method Program:Example(System.Text.StringBuilder)
G_M35345_IG01:  ;; offset=0000H
       sub      rsp, 40
						;; size=4 bbWeight=1 PerfScore 0.25

G_M35345_IG02:  ;; offset=0004H
       mov      rdx, 0x17C002028D8      ; '  '
       add      rdx, 12
       mov      r8, gword ptr [rcx+08H]
       mov      eax, dword ptr [rcx+18H]
       lea      r9d, [rax+02H]
       cmp      dword ptr [r8+08H], r9d
       jb       SHORT G_M35345_IG04
						;; size=31 bbWeight=1 PerfScore 9.00

G_M35345_IG03:  ;; offset=0023H
       movsxd   r9, eax
       lea      r8, bword ptr [r8+2*r9+10H]
       mov      r9d, dword ptr [rdx]
       mov      dword ptr [r8], r9d
       add      eax, 2
       mov      dword ptr [rcx+18H], eax
       jmp      SHORT G_M35345_IG05
						;; size=22 bbWeight=0.50 PerfScore 3.75

G_M35345_IG04:  ;; offset=0039H
       mov      r8d, 2
       call     [System.Text.StringBuilder:AppendWithExpansion(byref,int):this]
						;; size=12 bbWeight=0.50 PerfScore 1.62

G_M35345_IG05:  ;; offset=0045H
       nop      
						;; size=1 bbWeight=1 PerfScore 0.25

G_M35345_IG06:  ;; offset=0046H
       add      rsp, 40
       ret      
						;; size=5 bbWeight=1 PerfScore 1.25
; Total bytes of code: 75

@EgorBo can you please take a look at this

Author: yesmey
Assignees: -
Labels:

area-CodeGen-coreclr, community-contribution

Milestone: -

@EgorBo
Copy link
Member

EgorBo commented May 9, 2023

@AndyAyersMS PTAL inliner change

@EgorBo EgorBo requested a review from AndyAyersMS May 9, 2023 15:14
Copy link
Member

@AndyAyersMS AndyAyersMS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@EgorBo EgorBo merged commit 58a1365 into dotnet:main May 9, 2023
@yesmey yesmey deleted the unroll-stringbuilder branch May 9, 2023 18:11
@sebastienros
Copy link
Member

This apparently is regressing Fortunes benchmarks with middleware in aspnet.

#86113

@EgorBo
Copy link
Member

EgorBo commented May 11, 2023

This apparently is regressing Fortunes benchmarks with middleware in aspnet.

#86113

Yes, this PR introduces more unrollings => more avx512 usages => throttle

This and other regressions are expected to be fixed by #85551

@sebastienros
Copy link
Member

Thanks! The one that keeps giving. Makes sense since I couldn't repro the regression on the newer machines (aspnet-perf-lin).

@ghost ghost locked as resolved and limited conversation to collaborators Jun 17, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants