Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inefficient recursive pattern matching (?) native code generated #37986

Open
myblindy opened this issue Jun 11, 2020 · 13 comments
Open

Inefficient recursive pattern matching (?) native code generated #37986

myblindy opened this issue Jun 11, 2020 · 13 comments
Assignees
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI optimization
Milestone

Comments

@myblindy
Copy link

myblindy commented Jun 11, 2020

Version Used:

VS 16.6

Steps to Reproduce:

I'm using the following code:

using System;
public class C 
{
    public void M() 
    {
        object o = new System.Data.DataTable().Rows[0][0];
        //if (o is object obj) Console.WriteLine($"obj {o}");
        //if (o is string s) Console.WriteLine($"str {s}");
        //if (o is DateTime dt) Console.WriteLine($"dt {dt}");
        if (o is string { Length: 3 } ss) 
            if (ss is { Length: 3 } s40) 
                Console.WriteLine($"arf {s40}");
    }
}

And the issue is that while the code generated is correct, it is indeed checking for null and for the Length property twice, even in release mode, in both ILASM and native assembly (so the JIT is also failing to optimize this):

    L0060: mov edx, eax
    L0062: test edx, edx
    L0064: je short L0070
    L0066: cmp dword ptr [edx], 0x34aacd0
    L006c: je short L0070
    L006e: xor edx, edx
    L0070: test edx, edx
    L0072: je short L0093
    L0074: mov ecx, [edx+4]
    L0077: cmp ecx, 3
    L007a: jne short L0093
    L007c: cmp ecx, 3
    L007f: jne short L0093
    L0081: mov ecx, [0xb7d2270]
    L0087: call System.String.Concat(System.String, System.String)
    L008c: mov ecx, eax
    L008e: call System.Console.WriteLine(System.String)
    L0093: lea esp, [ebp-4]

Expected Behavior:

I would expect the 2nd check to be eliminated, since it's checking exactly the same conditions as the first.

category:cq
theme:basic-cq
skill-level:intermediate
cost:medium

@myblindy
Copy link
Author

The question mark is regarding the fact that I don't know the official name of this pattern, I just found out it exists yesterday. Let me know if it's wrong and I'll edit it.

@alrz
Copy link
Member

alrz commented Jun 11, 2020

        if (o is string { Length: 3 } ss) 
            if (ss is { Length: 3 } s40) 

You don't need the second if. You're checking length twice in source, hence the duplicated codegen.

@myblindy
Copy link
Author

myblindy commented Jun 11, 2020

I understand I don't need the second if, I just think the compiler should also understand that and not emit the same code twice.

If you're having doubts about this, consider this other chunk of code:

        var i = (int)o;
        if(i == 2)
            if(i == 2)
                Console.WriteLine("arf");

which generates the following correct native code:

    L00ad: mov ecx, [esi+4]
    L00b0: cmp ecx, 2
    L00b3: jne short L00c0
    L00b5: mov ecx, [0xb7d2354]
    L00bb: call System.Console.WriteLine(System.String)

@alrz
Copy link
Member

alrz commented Jun 11, 2020

If anything, the compiler won't usually do this kind of optimizations. If you get optimal code for some of code patterns, those are most likely recognized by JIT -- the compiler, on the other hand, still emits the source as-is, you can verify that in the resulting IL.

@gafter gafter transferred this issue from dotnet/roslyn Jun 16, 2020
@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI untriaged New issue has not been triaged by the area owner labels Jun 16, 2020
@myblindy
Copy link
Author

myblindy commented Jun 17, 2020

I was about to close this issue after thinking about it for a couple of days and coming to the conclusion that checking the Length property is probably not guaranteed to not have changed (ie in a badly written multi-threaded scenario).

However, if you check the generated assembly code again, it doesn't actually check the Length property twice, it checks the copy in the register twice! This is absolutely useless, the register value is never updated between the checks.

So if the code should check the property twice to consider (again, badly written) multi-threaded code, it should copy the length in the register twice as well. And if it shouldn't (this is my vote), then it shouldn't compare the register twice to the same value (like it already does for plain integers). Either way, something is not completely correct here.

Edit: Also strings in .Net are immutable, so the length should never change unless some dark magic in the VM changes them.

@AndyAyersMS
Copy link
Member

Thanks for the report. Codegen for a slightly simplified version:

using System;
public class C 
{
    static object o_s;

    public static void Main() 
    {
        object o = o_s;
        if (o is string { Length: 3 } ss) 
            if (ss is { Length: 3 } s40) 
                Console.WriteLine($"arf {s40}");
    }
}

shows the same features -- inefficient transition from the castclass checks to the utlimate use, and the redundant length check.

; Assembly listing for method C:Main()
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; optimized code
; rsp based frame
; partially interruptible
; Final local variable assignments
;
;  V00 loc0         [V00,T02] (  4,  3   )     ref  ->  rdx         class-hnd
;* V01 loc1         [V01    ] (  0,  0   )     ref  ->  zero-ref    class-hnd
;  V02 OutArgs      [V02    ] (  1,  1   )  lclBlk (32) [rsp+0x00]   "OutgoingArgSpace"
;  V03 tmp1         [V03,T01] (  2,  4   )     ref  ->  rdx         "CASTCLASS eval op1"
;  V04 tmp2         [V04,T00] (  5,  6.74)     ref  ->  rdx         class-hnd "spilling QMark2"
;  V05 cse0         [V05,T03] (  3,  1.50)     int  ->  rcx         "CSE - moderate"
;
; Lcl frame size = 40

G_M48905_IG01:
       4883EC28             sub      rsp, 40
						;; bbWeight=1    PerfScore 0.25
G_M48905_IG02:
       48B9882C63296B020000 mov      rcx, 0x26B29632C88
       488B11               mov      rdx, gword ptr [rcx]
       4885D2               test     rdx, rdx
       7411                 je       SHORT G_M48905_IG05
						;; bbWeight=1    PerfScore 3.50
G_M48905_IG03:
       48B9285F13FCFB7F0000 mov      rcx, 0x7FFBFC135F28
       48390A               cmp      qword ptr [rdx], rcx
       7402                 je       SHORT G_M48905_IG05
						;; bbWeight=0.25 PerfScore 0.81
G_M48905_IG04:
       33D2                 xor      rdx, rdx
						;; bbWeight=0.12 PerfScore 0.03
G_M48905_IG05:
       4885D2               test     rdx, rdx
       7427                 je       SHORT G_M48905_IG07
						;; bbWeight=1    PerfScore 1.25
G_M48905_IG06:
       8B4A08               mov      ecx, dword ptr [rdx+8]
       83F903               cmp      ecx, 3
       751F                 jne      SHORT G_M48905_IG07
       83F903               cmp      ecx, 3
       751A                 jne      SHORT G_M48905_IG07
       48B9883163296B020000 mov      rcx, 0x26B29633188
       488B09               mov      rcx, gword ptr [rcx]
       E8551BFFFF           call     System.String:Concat(System.String,System.String):System.String
       488BC8               mov      rcx, rax
       E8C5F8FFFF           call     System.Console:WriteLine(System.String)
						;; bbWeight=0.50 PerfScore 4.50
G_M48905_IG07:
       90                   nop      
						;; bbWeight=1    PerfScore 0.25
G_M48905_IG08:
       4883C428             add      rsp, 40
       C3                   ret      
						;; bbWeight=1    PerfScore 1.25

; Total bytes of code 89, prolog size 4, PerfScore 20.74, (MethodHash=eba540f6) for method C:Main()

We are in the process of buttoning up work for 5.0, so going to mark this as future.

@AndyAyersMS AndyAyersMS added optimization and removed untriaged New issue has not been triaged by the area owner labels Jun 17, 2020
@myblindy
Copy link
Author

I don't know how to get the fancy output you're getting, but you can see the same problem in a much simpler scenario:

using System;
public class C 
{
    public void M() 
    {
        string s = System.Reflection.Assembly.GetExecutingAssembly().CodeBase;
        if(s.Length == 2)
            if(s.Length == 2)
                Console.WriteLine("arf");
    }
}

Simply reading the property (the weird string assignment is so the optimizer doesn't get rid of it) is enough, it's not a pattern matching issue.

@myblindy
Copy link
Author

The asm result is:

    L0016: mov ecx, [rax+0x8]
    L0019: cmp ecx, 0x2
    L001c: jnz L0035
    L001e: cmp ecx, 0x2
    L0021: jnz L0035

Again, the first line copies Length in a register, then it's compared twice, with a conditional jump each time.

@AndyAyersMS
Copy link
Member

I don't know how to get the fancy output you're getting

This comes from the jit, if you're using a checked build of the jit, and set COMPlus_JitDisasm. See viewing jit dumps for details.

you can see the same problem in a much simpler scenario:

Thanks. As you can see the jit is not very good at removing certain kinds of redundant branches.

@AndyAyersMS AndyAyersMS added this to the Future milestone Jun 17, 2020
@BruceForstall BruceForstall added the JitUntriaged CLR JIT issues needing additional triage label Oct 28, 2020
@BruceForstall BruceForstall modified the milestones: Future, 6.0.0 Nov 7, 2020
@BruceForstall BruceForstall removed the JitUntriaged CLR JIT issues needing additional triage label Nov 7, 2020
@JulieLeeMSFT JulieLeeMSFT added the needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration label Mar 23, 2021
@AndyAyersMS
Copy link
Member

AndyAyersMS commented Mar 30, 2021

FWIW the latest 6.0 codegen does not contain redundant length compares:

; Assembly listing for method C:Main()
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; optimized code
; rsp based frame
; partially interruptible
; No PGO data
; Final local variable assignments
;
;  V00 loc0         [V00,T02] (  4,  3   )     ref  ->  rdx         class-hnd
;* V01 loc1         [V01    ] (  0,  0   )     ref  ->  zero-ref    class-hnd
;  V02 OutArgs      [V02    ] (  1,  1   )  lclBlk (32) [rsp+00H]   "OutgoingArgSpace"
;  V03 tmp1         [V03,T01] (  2,  4   )     ref  ->  rdx         "CASTCLASS eval op1"
;  V04 tmp2         [V04,T00] (  5,  6.75)     ref  ->  rdx         class-hnd "spilling QMark2"
;
; Lcl frame size = 40

G_M48905_IG01:              ;; offset=0000H
       4883EC28             sub      rsp, 40
                                                ;; bbWeight=1    PerfScore 0.25
G_M48905_IG02:              ;; offset=0004H
       48B9382DF1390B020000 mov      rcx, 0x20B39F12D38
       488B11               mov      rdx, gword ptr [rcx]
       4885D2               test     rdx, rdx
       7411                 je       SHORT G_M48905_IG05
                                                ;; bbWeight=1    PerfScore 3.50
G_M48905_IG03:              ;; offset=0016H
       48B900BA2897FD7F0000 mov      rcx, 0x7FFD9728BA00
       48390A               cmp      qword ptr [rdx], rcx
       7402                 je       SHORT G_M48905_IG05
                                                ;; bbWeight=0.25 PerfScore 0.81
G_M48905_IG04:              ;; offset=0025H
       33D2                 xor      rdx, rdx
                                                ;; bbWeight=0.12 PerfScore 0.03
G_M48905_IG05:              ;; offset=0027H
       4885D2               test     rdx, rdx
       7420                 je       SHORT G_M48905_IG07
                                                ;; bbWeight=1    PerfScore 1.25
G_M48905_IG06:              ;; offset=002CH
       837A0803             cmp      dword ptr [rdx+8], 3
       751A                 jne      SHORT G_M48905_IG07
       48B9D831F1390B020000 mov      rcx, 0x20B39F131D8
       488B09               mov      rcx, gword ptr [rcx]
       E8B4D0FEFF           call     System.String:Concat(System.String,System.String):System.String
       488BC8               mov      rcx, rax
       E8E4FCFFFF           call     System.Console:WriteLine(System.String)
                                                ;; bbWeight=0.50 PerfScore 3.75
G_M48905_IG07:              ;; offset=004CH
       90                   nop
                                                ;; bbWeight=1    PerfScore 0.25
G_M48905_IG08:              ;; offset=004DH
       4883C428             add      rsp, 40
       C3                   ret
                                                ;; bbWeight=1    PerfScore 1.25
; Total bytes of code 82, prolog size 4, PerfScore 19.29, instruction count 21, allocated bytes for code 82 (MethodHash=eba540f6) for method C:Main()

It still has one other redundant compare at IG05. Perhaps once I sort out the issues with #49713 we can get rid of this too.

@AndyAyersMS AndyAyersMS removed the needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration label Mar 30, 2021
@AndyAyersMS
Copy link
Member

Going to mark this as future, though I am hoping to find time to work on #49713 before 6.0 is done.

@AndyAyersMS
Copy link
Member

Was hoping #76283 would apply here, but it's blocked by a side effecting assignment.

@myblindy
Copy link
Author

Getting better one PR at a time!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI optimization
Projects
None yet
Development

No branches or pull requests

6 participants