JIT: extend redundant branch opt to catch partially redundant cases #49713

Closed

AndyAyersMS
Member

If the current relop has PHI inputs, see if any of those inputs would
produce the same relop VN as a dominating compare; if so the current
relop is partially redundant, and we may be able to optimize some of
the paths through the relop block via jump threading.

Addresses cases like the one seen in #48115, though that particular
case is not optimized as the current relop block has a side effect.
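The search described above can be sketched with a toy model in Python. This is not the JIT's C++ code: `relop_vn`, `find_partially_redundant`, the tuple-based value numbers, and the second phi input `$other` are all hypothetical names chosen for illustration.

```python
def relop_vn(op, left_vn, right_vn):
    """Toy value number for `left op right`: just a hashable tuple."""
    return (op, left_vn, right_vn)


def find_partially_redundant(op, phi_inputs, other_vn, dominating_relop_vns):
    """Return the phi inputs whose substitution reproduces a dominating compare's VN.

    phi_inputs: VNs flowing into the PHI operand of the current relop.
    other_vn: VN of the relop's non-PHI operand.
    dominating_relop_vns: set of VNs of compares found in dominating blocks.
    """
    matches = []
    for input_vn in phi_inputs:
        substituted = relop_vn(op, input_vn, other_vn)
        if substituted in dominating_relop_vns:
            matches.append(input_vn)
    return matches


# Current relop: PHI($280, $other) == null, dominated by a compare `$280 == null`.
dominators = {relop_vn("EQ", "$280", "null")}
matches = find_partially_redundant("EQ", ["$280", "$other"], "null", dominators)
print(matches)
```

In this toy, only the `$280` input reproduces the dominating compare, so only paths that bring in that value are candidates for jump threading.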

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Mar 16, 2021
@AndyAyersMS
Member Author

@EgorBo PTAL
cc @dotnet/jit-contrib

SPMI sees ~300 cases across all the collections. Regressions are small.

33 total methods with Code Size differences (32 improved, 1 regressed), 1 unchanged.
15 total methods with Code Size differences (14 improved, 1 regressed), 3 unchanged.
89 total methods with Code Size differences (78 improved, 11 regressed), 7 unchanged.
94 total methods with Code Size differences (85 improved, 9 regressed), 8 unchanged.
107 total methods with Code Size differences (98 improved, 9 regressed), 9 unchanged.
149 total methods with Code Size differences (148 improved, 1 regressed), 1 unchanged.

Total bytes of delta: -1217 (-6.41% of base)
Total bytes of delta: -192 (-1.41% of base)
Total bytes of delta: -1221 (-1.17% of base)
Total bytes of delta: -1279 (-1.32% of base)
Total bytes of delta: -1647 (-1.63% of base)
Total bytes of delta: -7183 (-10.01% of base)
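
For a quick aggregate view, the per-collection lines above can be summed. This is a throwaway Python sketch; the numbers are copied from the summary lines above and the variable names are illustrative only.

```python
# (total, improved, regressed) methods with code size differences, per collection.
methods = [
    (33, 32, 1),
    (15, 14, 1),
    (89, 78, 11),
    (94, 85, 9),
    (107, 98, 9),
    (149, 148, 1),
]
# Total bytes of delta, per collection.
byte_deltas = [-1217, -192, -1221, -1279, -1647, -7183]

total_methods = sum(total for total, _, _ in methods)
total_improved = sum(improved for _, improved, _ in methods)
total_regressed = sum(regressed for _, _, regressed in methods)
total_bytes = sum(byte_deltas)

print(total_methods, total_improved, total_regressed, total_bytes)
```

Across the six lines that comes to 455 improved and 32 regressed out of 487 methods, for a total of -12739 bytes.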

Sample diff. In the Before picture, IG06 has 3 preds; two for which RAX is null, and one where it's not null. The compare in IG06 in VN space is a constant compared to a PHI. The opt scans through the phi and discovers one of the phi input VNs was redundantly compared in a dominating block, and so we try jump threading, and that succeeds...

;;; Assembly listing for method System.IO.Pipelines.Pipe:GetReadAsyncStatus():int:this

;;; BEFORE

G_M35282_IG05:        ; gcVars=0000000000000000 {}, gcrefRegs=00000002 {rcx}, byrefRegs=00000000 {}, gcvars, byref, isz
       ; gcrRegs +[rcx]
       add      rcx, 216
       ; gcrRegs -[rcx]
       ; byrRegs +[rcx]
       mov      rax, gword ptr [rcx]
       ; gcrRegs +[rax]
       test     rax, rax
       je       SHORT G_M35282_IG06
       mov      rdx, 0xD1FFAB1E
       cmp      qword ptr [rax], rdx
       je       SHORT G_M35282_IG06
       xor      rax, rax
						;; bbWeight=0    PerfScore 0.00
G_M35282_IG06:        ; gcrefRegs=00000001 {rax}, byrefRegs=00000000 {}, byref, isz
       ; byrRegs -[rcx]
       test     rax, rax
       je       SHORT G_M35282_IG08
       mov      eax, 2
       ; gcrRegs -[rax]
						;; bbWeight=0    PerfScore 0.00
G_M35282_IG07:        ; , epilog, nogc, extend
       ret      
						;; bbWeight=0    PerfScore 0.00
G_M35282_IG08:        ; gcVars=0000000000000000 {}, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, gcvars, byref
       mov      eax, 1
						;; bbWeight=0    PerfScore 0.00
G_M35282_IG09:        ; , epilog, nogc, extend
       ret      
						;; bbWeight=0    PerfScore 0.00

;;; AFTER

G_M35282_IG05:        ; gcVars=0000000000000000 {}, gcrefRegs=00000002 {rcx}, byrefRegs=00000000 {}, gcvars, byref, isz
       ; gcrRegs +[rcx]
       add      rcx, 216
       ; gcrRegs -[rcx]
       ; byrRegs +[rcx]
       mov      rax, gword ptr [rcx]
       ; gcrRegs +[rax]
       test     rax, rax
       je       SHORT G_M35282_IG07
       mov      eax, 2
       ; gcrRegs -[rax]
						;; bbWeight=0    PerfScore 0.00
G_M35282_IG06:        ; , epilog, nogc, extend
       ret      
						;; bbWeight=0    PerfScore 0.00
G_M35282_IG07:        ; gcVars=0000000000000000 {}, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, gcvars, byref
       ; byrRegs -[rcx]
       mov      eax, 1
						;; bbWeight=0    PerfScore 0.00
G_M35282_IG08:        ; , epilog, nogc, extend
       ret      
						;; bbWeight=0    PerfScore 0.00

Jit dump for the above

... [hasInterestingVN] in BB06 relop first operand VN is PhiDef for V05
N003 (  5,  4) [000054] N------N----              *  EQ        int    $250
N001 (  3,  2) [000052] ------------              +--*  LCL_VAR   ref    V05 tmp4         u:2 (last use) $281
N002 (  1,  1) [000053] ------------              \--*  CNS_INT   ref    null $VN.Null
... phi input [0] has VN $280
... substitute VN is $24b
... phi search succeeded after checking 1 cases

Dominator BB03 of BB06 has relop with an interesting liberal VN:
N003 (  5,  4) [000042] J------N----              *  EQ        int    <l:$24b, c:$24a>
N001 (  3,  2) [000041] ------------              +--*  LCL_VAR   ref    V05 tmp4         u:1 <l:$280, c:$2c0>
N002 (  1,  1) [000040] ------------              \--*  CNS_INT   ref    null $VN.Null
Implies current relop is partially redundant compare
N003 (  5,  4) [000054] N------N----              *  EQ        int    $250
N001 (  3,  2) [000052] ------------              +--*  LCL_VAR   ref    V05 tmp4         u:2 (last use) $281
N002 (  1,  1) [000053] ------------              \--*  CNS_INT   ref    null $VN.Null
Both successors of IDom BB03 reach BB06 -- attempting jump threading
BB03 is a true pred
BB04 is a false pred
BB05 is a false pred
BB05 is the fall-through pred
Optimizing via jump threading
Jump flow from pred BB03 -> BB06 implies predicate true; we can safely redirect flow to be BB03 -> BB08
Setting edge weights for BB03 -> BB08 to [0 .. 0]
Jump flow from pred BB04 -> BB06 implies predicate

Still need to look at TP (throughput) here: in the worst case (large number of PHI inputs, deep dominance tree) we could be doing a lot of searching.

@AndyAyersMS
Member Author

Here is some TP data (from spmi's -v v over the aspnet collection, release builds). Doesn't appear to be anything problematic in the methods processed there (~39000 or so).

BASE
Total time: 7811.973500ms
Total time: 7754.420600ms
Total time: 7749.713900ms
Total time: 7772.103300ms
Total time: 7782.580800ms
DIFF
Total time: 7805.535700ms
Total time: 7764.570400ms
Total time: 7785.820900ms
Total time: 7757.510900ms
Total time: 7721.618100ms
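
One way to read these timings is to compare the difference in means against the run-to-run spread. The Python sketch below does that with the ten numbers above; using the max-min spread of the BASE runs as a noise proxy is an assumption made here, not something the measurements themselves establish.

```python
from statistics import mean

# Total times in ms, copied from the BASE and DIFF runs above.
base = [7811.9735, 7754.4206, 7749.7139, 7772.1033, 7782.5808]
diff = [7805.5357, 7764.5704, 7785.8209, 7757.5109, 7721.6181]

base_mean = mean(base)
diff_mean = mean(diff)

# Relative change of DIFF versus BASE, in percent.
delta_pct = 100.0 * (diff_mean - base_mean) / base_mean
# Spread between the slowest and fastest BASE run, in percent of the mean.
base_spread_pct = 100.0 * (max(base) - min(base)) / base_mean

print(f"base {base_mean:.2f} ms, diff {diff_mean:.2f} ms, delta {delta_pct:+.3f}%")
print(f"base run-to-run spread {base_spread_pct:.2f}%")
```

The means differ by roughly 0.1%, well inside the roughly 0.8% spread seen among the BASE runs alone.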

@AndyAyersMS
Member Author

Test failures indicate there's a flaw in the logic here... not quite sure what it is just yet. Looks like I'll have plenty of cases to choose from when debugging, though.

Suspect perhaps that we can only jump thread those preds that bring in the same VN as the one we matched, and right now we're optimizing all the preds, and this is too aggressive.
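
A toy Python illustration of that narrower rule follows. It assumes a mapping from pred block to incoming phi VN is simply available; the mapping, the VNs `$2c1` and `$2c2`, and the helper name `threadable_preds` are hypothetical.

```python
def threadable_preds(pred_to_phi_vn, matched_vn):
    """Keep only the preds that bring in the phi input VN we actually matched."""
    return [pred for pred, vn in pred_to_phi_vn.items() if vn == matched_vn]


# Hypothetical pred -> incoming VN mapping for the relop block.
preds = {"BB03": "$280", "BB04": "$2c1", "BB05": "$2c2"}
safe = threadable_preds(preds, "$280")
print(safe)
```

Under this rule only `BB03` would be redirected; the other preds would be left alone rather than optimized along with it.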

@EgorBo
Member

EgorBo commented Mar 17, 2021

A minimal repro:

using System;

internal class SmallLoop1
{
    public static void Main()
    {
        int j = 2;
        for (int i = 1; i < 5; i++)
            j++;

        if (j != 6) 
            Console.WriteLine("FAILED");

        Console.ReadKey();
    }
}

Jit dump for the above

... [hasInterestingVN] in BB03 relop first operand VN is PhiDef for V00
N003 (  3,  3) [000022] J------N----              *  EQ        int    $105
N001 (  1,  1) [000020] ------------              +--*  LCL_VAR   int    V00 loc0         u:3 (last use) $142
N002 (  1,  1) [000021] ------------              \--*  CNS_INT   int    6 $46
... phi input [0] has VN $42
... substitute VN is $40
... phi search succeeded after checking 1 cases

Dominator BB01 of BB03 has relop with an interesting liberal VN:
N003 (  3,  3) [000045] J------N----              *  GE        int    $40
N001 (  1,  1) [000046] ------------              +--*  LCL_VAR   int    V01 loc1         u:2 $43
N002 (  1,  1) [000047] ------------              \--*  CNS_INT   int    5 $44
Implies current relop is partially redundant compare
N003 (  3,  3) [000022] J------N----              *  EQ        int    $105
N001 (  1,  1) [000020] ------------              +--*  LCL_VAR   int    V00 loc0         u:3 (last use) $142
N002 (  1,  1) [000021] ------------              \--*  CNS_INT   int    6 $46
Both successors of IDom BB01 reach BB03 -- attempting jump threading
BB01 is a true pred
BB02 is a false pred
BB02 is the fall-through pred
Optimizing via jump threading
Jump flow from pred BB01 -> BB03 implies predicate true; we can safely redirect flow to be BB01 -> BB05
Setting edge weights for BB01 -> BB05 to [0 .. 3.402823e+38]

Removing statement STMT00005 (IL 0x012...0x014)
N004 (  5,  5) [000023] ------------              *  JTRUE     void  
N003 (  3,  3) [000022] J------N----              \--*  EQ        int    $105
N001 (  1,  1) [000020] ------------                 +--*  LCL_VAR   int    V00 loc0         u:3 (last use) $142
N002 (  1,  1) [000021] ------------                 \--*  CNS_INT   int    6 $46
 in BB03 as useless:

Fall through flow from pred BB02 -> BB03 implies predicate false
  repurposing BB03 to always fall through to BB04

@AndyAyersMS
Member Author

Thanks, seems like there are actually several interesting aspects:

  • if the current relop VN evaluates to a constant then any dominating compare with the same outcome will match. Here we have false and false. When something like this happens we'll reach the wrong conclusions.
    • the existing optimization (which is not destructuring phis) seems vulnerable to this too. The case where the current compare has a known outcome without any consideration of dominating compares should likely be handled specially. There might be something upstream handling these cases already which is why this hasn't tripped anything up.
    • if we substitute in a phi value and we get a constant then the branch is partially redundant regardless of any dominating compare, for any pred that would bring in that phi value. We don't keep track of the correspondence between phis and pred BBs (indeed we don't even have a 1-1 correspondence) so we'd need to reconstruct this info somehow.
    • phi inputs that lead to ambiguous outcomes may be partially redundant with a dominating compare (as was the intention of this PR) but again the resolution is only for those pred blocks that bring in that value. Other preds associated with other phis may have the same, the opposite, or an ambiguous outcome.
    • for the phi-destructuring we may need to disregard dominators with known-outcome VNs, and/or disregard the unreachable paths from those dominating compares. Here the path from BB01->BB03 is unreachable and so the phi value (j = 2) is unrealizable.
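
The "false and false" problem can be reproduced in a toy value-numbering model in Python. It assumes, following the comment above, that both compares fold to constant false; `relop_vn` and the `("CONST", n)` encoding are hypothetical, not the JIT's representation.

```python
import operator

OPS = {"EQ": operator.eq, "GE": operator.ge}


def relop_vn(op, left, right):
    """Toy value numbering: a compare of two known ints folds to a constant VN."""
    if isinstance(left, int) and isinstance(right, int):
        return ("CONST", int(OPS[op](left, right)))
    return (op, left, right)


# Dominating compare in BB01: `i >= 5`, with i known to be 1 on loop entry.
dominator_vn = relop_vn("GE", 1, 5)
# Current compare in BB03: `j == 6`, with the phi input j = 2 substituted in.
substituted_vn = relop_vn("EQ", 2, 6)

# Both fold to "constant false", so a plain VN equality test reports a match
# even though the two compares are about different variables.
spurious_match = dominator_vn == substituted_vn
print(dominator_vn, substituted_vn, spurious_match)
```

Because constants share value numbers, equality of relop VNs alone cannot distinguish "same compare" from "same constant outcome".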

So the upshot is that if we substitute in a phi we can either get:

  • a known outcome -- branch is partially redundant along some paths regardless of dominator behavior
  • an unknown outcome with matching dominator -- branch is partially redundant along some paths, behavior requires path analysis
  • an unknown outcome

and we need to account for those paths and pass them down to the jump threader in some fashion; this will feed into the true/false/ambiguous preds detection done there.
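
The three cases could be told apart along these lines. This is a minimal Python sketch under stated assumptions: `Outcome`, `classify`, and the sets of constant and dominating VNs are hypothetical, and treating `$40` as a constant VN is an assumption for illustration.

```python
from enum import Enum


class Outcome(Enum):
    KNOWN = "known outcome"
    MATCHES_DOMINATOR = "unknown outcome with matching dominator"
    UNKNOWN = "unknown outcome"


def classify(substituted_vn, constant_vns, dominating_relop_vns):
    """Classify the VN obtained by substituting one phi input into the relop."""
    # Check for a constant first, so a constant VN is never treated as a
    # match against a dominating compare that happens to fold the same way.
    if substituted_vn in constant_vns:
        return Outcome.KNOWN
    if substituted_vn in dominating_relop_vns:
        return Outcome.MATCHES_DOMINATOR
    return Outcome.UNKNOWN


constant_vns = {"$40"}           # assumed to be a constant VN
dominating_relop_vns = {"$24b"}  # VN of a dominating compare
results = {
    vn: classify(vn, constant_vns, dominating_relop_vns)
    for vn in ["$40", "$24b", "$250"]
}
for vn, outcome in results.items():
    print(vn, outcome.value)
```

Each result would still need to be paired with the preds that actually bring in the corresponding phi input before being handed to the jump threader.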

@AndyAyersMS
Member Author

Need to rethink this.

@ghost ghost locked as resolved and limited conversation to collaborators Apr 16, 2021
@karelz karelz added this to the 6.0.0 milestone May 20, 2021