
JIT: Slightly refactor handling of conditions/conditional branches in the backend #82020

Merged: 26 commits into dotnet:main from jakobbotsch:cpu-flag-nodes-backend, Feb 16, 2023

Conversation

@jakobbotsch (Member) commented Feb 13, 2023

Slightly refactor how compares are represented and processed in the backend, in preparation for lifting the remaining cmov restrictions.

  • Change all compare nodes to always produce values and never set flags. That is, GT_EQ/GT_NE/GT_LT/GT_LE/GT_GE/GT_GT/GT_TEST_EQ/GT_TEST_NE are now all "normal" nodes, always.

  • Stop optimizing compares by looking for a user JTRUE node. Instead, we optimize the compare when we see it (for example, EQ/NE(AND(x, y), 0) is turned into GT_TEST_EQ/GT_TEST_NE(x, y)), but we do not change the user JTRUE node to a JCC node. We now never create SETCC nodes when we see the compare, and we always turn JTRUE nodes into JCC nodes with a proper condition.

  • Handle the transformation from JTRUE to JCC when we actually get to the JTRUE (a toy sketch of this staged flow follows this list). The transformation that now happens is:

    1. The compare node is turned into a GT_CMP (for GT_EQ through GT_GT) or a GT_TEST node (for GT_TEST_EQ/GT_TEST_NE). These nodes are always void-typed and do not produce values; they only set the CPU flags, and correspond directly to low-level target instructions (like cmp/test). GT_CMP already existed; GT_TEST is added as part of this PR.
    2. The JTRUE is turned into JCC with the condition from the compare.

    For arm64/xarch, we handle optimizing EQ/NE(node_that_sets_zero_flag, 0) at this stage too now.

  • Introduce new xarch backend-only GT_BITTEST_EQ/GT_BITTEST_NE that semantically correspond to (x & (1 << y)) != 0. The corresponding "set flags" node is GT_BT and already exists. We would previously transform (x & (1 << y)) != 0 into GT_BT + GT_SETCC (when the user was not a JTRUE). This is now transformed into GT_BITTEST_EQ/GT_BITTEST_NE instead, which are completely normal nodes. The processing of JTRUE handles the transformation into BT + JCC.

  • Change some of the emitter compare peepholes to work with this scheme. This requires some lookahead when we see them, but allows us to get rid of an "ugly" flag.

  • Allow liveness to DCE GT_CMP/GT_TEST when they have had GTF_SET_FLAGS cleared (due to flowgraph optimizations). We already allowed this for GT_EQ, so not allowing it for the new nodes showed up as regressions.
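To make the staged flow concrete, here is a minimal toy model of the JTRUE lowering described above. It is illustrative only: the shapes mirror the description, not the JIT's real GenTree/LowerJTrue API.

    // Toy model of the staged transformation: the relop stays a normal
    // value-producing node until its JTRUE user is processed; only then is
    // it turned into a void, flags-only CMP/TEST, and the JTRUE into a JCC.
    #include <cassert>
    #include <cstdio>

    enum Oper { EQ, NE, LT, LE, GE, GT, TEST_EQ, TEST_NE, CMP, TEST, JTRUE, JCC };

    struct Node
    {
        Oper  oper;
        Node* op1      = nullptr;
        bool  isVoid   = false; // models TYP_VOID
        bool  setFlags = false; // models GTF_SET_FLAGS
        Oper  cond     = EQ;    // the condition a JCC carries
    };

    void LowerJTrueToJcc(Node* jtrue)
    {
        assert(jtrue->oper == JTRUE);
        Node* relop = jtrue->op1;
        // 1. Remember the condition, then turn the relop into a flags-only
        //    node: TEST for TEST_EQ/TEST_NE, CMP for the EQ..GT family.
        jtrue->cond     = relop->oper;
        relop->oper     = (relop->oper == TEST_EQ || relop->oper == TEST_NE) ? TEST : CMP;
        relop->isVoid   = true;
        relop->setFlags = true;
        // 2. Turn the JTRUE into a JCC that consumes the flags.
        jtrue->oper = JCC;
    }

    int main()
    {
        Node ge{GE};
        Node jtrue{JTRUE, &ge};
        LowerJTrueToJcc(&jtrue);
        std::printf("%s + JCC(cond=%d)\n", ge.oper == CMP ? "CMP" : "TEST", (int)jtrue.cond);
    }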

What I was after is moving all handling of the conditions to when we process the JTRUE node, in anticipation of doing the same for GT_SELECT. For GT_SELECT we are going to have to reorder the condition with the true/false values, and handling some of these transformations as part of the compare seemed dangerous.

Some example LIR diffs:

  Generating: N033 (???,???) [000136] -----------                            IL_OFFSET void   INLRT @ 0x00A[E-] REG NA
  Generating: N035 (  1,  1) [000009] -----------                    t9 =    LCL_VAR   int    V02 loc1         u:1 ecx REG ecx $141
  Generating: N037 (  1,  1) [000011] -c---------                   t11 =    CNS_INT   int    16 REG NA $42
                                                                          ┌──▌  t9     int    
                                                                          ├──▌  t11    int    
- Generating: N039 (  3,  3) [000012] N------N-U-                         ▌  GE        void   REG NA $142
+ Generating: N039 (  3,  3) [000012] -------N-U-                         ▌  CMP       void   REG NA
  IN0005:        cmp      ecx, 16
- Generating: N041 (  5,  5) [000013] -----------                         ▌  JTRUE     void   REG NA $VN.Void
+ Generating: N041 (???,???) [000144] -----------                            JCC       void   cond=UGE REG NA
  IN0006:        jae      L_M20112_BB03
                                                                          ┌──▌  t112   ubyte  
                                                                          ├──▌  t54    int    
- Generating: N061 ( 22, 14) [000060] ---XGO-----                         ▌  BT        void   REG NA
+ Generating: N061 ( 22, 14) [000060] ---XGO-----                   t60 = ▌  BITTEST_NE int    REG eax
  IN0009:        bt       eax, esi
- Generating: N063 (???,???) [000144] -----------                  t144 =    SETCC     int    cond=C REG eax
  IN000a:        setb     al
  IN000b:        movzx    eax, al
-                                                                         ┌──▌  t144   int    
+                                                                         ┌──▌  t60   int    
  Generating: N065 ( 23, 15) [000061] ---XGO-----                         ▌  RETURN    int    REG NA <l:$1c8, c:$1c7>

@ghost ghost assigned jakobbotsch Feb 13, 2023
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Feb 13, 2023
@ghost commented Feb 13, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch, @kunalspathak
See info in area-owners.md if you want to be subscribed.


@jakobbotsch (Member, Author):

Looks like I need to update a few peepholes, and there's an arm32 issue.

@jakobbotsch jakobbotsch marked this pull request as draft February 13, 2023 12:49
@jakobbotsch (Member, Author):

Some diffs are because we treat GT_BITTEST nodes like other compares now, so we get some of the peepholes from there for free:

@@ -18,15 +18,15 @@
 G_M16052_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, nogc <-- Prolog IG
 						;; size=0 bbWeight=1 PerfScore 0.00
 G_M16052_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
+       xor      eax, eax
        bt       rcx, rdx
        setb     al
-       movzx    rax, al
-						;; size=10 bbWeight=1 PerfScore 1.75
+						;; size=9 bbWeight=1 PerfScore 1.75
 G_M16052_IG03:        ; bbWeight=1, epilog, nogc, extend
        ret      
 						;; size=1 bbWeight=1 PerfScore 1.00
 
-; Total bytes of code 11, prolog size 0, PerfScore 3.85, instruction count 4, allocated bytes for code 11 (MethodHash=2849c14b) for method Microsoft.CodeAnalysis.BitVector:IsTrue(ulong,int):bool
+; Total bytes of code 10, prolog size 0, PerfScore 3.75, instruction count 4, allocated bytes for code 10 (MethodHash=2849c14b) for method Microsoft.CodeAnalysis.BitVector:IsTrue(ulong,int):bool
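For reference, the method in this diff is a plain bit test; a C++ rendering of its shape is below (the original is C#). The peephole places the xor ahead of the bt because xor eax, eax itself writes the flags: it must not run between the bt and the setb that consumes the carry flag, and zeroing the full register up front makes the trailing movzx unnecessary.

    // (x & (1 << y)) != 0: becomes GT_BITTEST_NE in the backend and
    // bt/setb at the machine level.
    bool IsTrue(unsigned long long word, int bit)
    {
        return (word & (1ULL << bit)) != 0;
    }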

@jakobbotsch (Member, Author):

Some other diffs are because the new "staged" approach composes better, so one optimization can now enable another:

@@ -29,14 +29,9 @@ G_M26981_IG05:        ; bbWeight=0.50, gcVars=0000000000000000 {}, gcrefRegs=000
        blsr     eax, ecx
        setne    al
        movzx    rax, al
-       mov      edx, 1
-       xor      r8d, r8d
-       test     eax, eax
-       cmove    edx, r8d
-       mov      eax, edx
        cmp      ecx, 1
        je       SHORT G_M26981_IG07
-						;; size=32 bbWeight=0.50 PerfScore 2.12
+						;; size=16 bbWeight=0.50 PerfScore 1.50
 G_M26981_IG06:        ; bbWeight=4, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, isz
        inc      eax
        sar      ecx, 1
@@ -47,7 +42,7 @@ G_M26981_IG07:        ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byr
        ret      
 						;; size=1 bbWeight=0.50 PerfScore 0.50
 
-; Total bytes of code 49, prolog size 0, PerfScore 17.40, instruction count 19, allocated bytes for code 49 (MethodHash=e65b969a) for method ILCompiler.Reflection.ReadyToRun.GcInfoTypes:CeilOfLog2(int):uint
+; Total bytes of code 33, prolog size 0, PerfScore 15.18, instruction count 14, allocated bytes for code 33 (MethodHash=e65b969a) for method ILCompiler.Reflection.ReadyToRun.GcInfoTypes:CeilOfLog2(int):uint

We have the following LIR:

               [000073] -----------                            IL_OFFSET void   INLRT @ 0x00C[E-]
N001 (  1,  1) [000004] -----------                    t4 =    LCL_VAR   int    V00 arg0         u:1 $80
N002 (  1,  1) [000005] -----------                    t5 =    LCL_VAR   int    V00 arg0         u:1 $80
N003 (  1,  1) [000006] -----------                    t6 =    CNS_INT   int    -1 $41
                                                            ┌──▌  t5     int    
                                                            ├──▌  t6     int    
N004 (  3,  3) [000007] -----------                    t7 = ▌  ADD       int    $101
                                                            ┌──▌  t4     int    
                                                            ├──▌  t7     int    
N005 (  5,  5) [000008] -----------                    t8 = ▌  AND       int    $102
N006 (  1,  1) [000009] -----------                    t9 =    CNS_INT   int    0 $40
                                                            ┌──▌  t8     int    
                                                            ├──▌  t9     int    
N007 (  7,  7) [000010] J------N---                   t10 = ▌  NE        int    $103
N008 (  1,  1) [000012] -----------                   t12 =    CNS_INT   int    1 $42
N009 (  1,  1) [000035] -----------                   t35 =    CNS_INT   int    0 $40
                                                            ┌──▌  t10    int    
                                                            ├──▌  t12    int    
                                                            ├──▌  t35    int    
N010 ( 10, 10) [000067] -----------                   t67 = ▌  SELECT    int   
                                                            ┌──▌  t67    int    

We first transform the AND(V00, SUB(V00, 1)) into HWINTRINSIC(ResetLowestSetBit). Previously we then lowered NE(HWINTRINSIC(ResetLowestSetBit), 0) into HWINTRINSIC + SETCC since this intrinsic sets ZF. We then ended up with SELECT(SETCC(HWINTRINSIC(ResetLowestSetBit)), 0, 1), which the SELECT transformation I added recently does not handle.

With this PR we instead end up with SELECT(NE(HWINTRINSIC(ResetLowestSetBit), 0), 1, 0), which is handled and turned into just NE(HWINTRINSIC(ResetLowestSetBit), 0). There is an emitter peephole that then omits generating a compare for the final NE, since it knows the HW intrinsic already set the zero flag.
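A toy sketch of the fold that now fires (illustrative shapes only, not the JIT's real API): SELECT(relop, 1, 0) folds to the relop itself, because relops are now ordinary nodes producing exactly 0 or 1. Under the old scheme the condition had already become a SETCC with the 0/1 operands swapped, so the pattern never matched.

    #include <cstdio>

    struct Node
    {
        bool  isRelop = false;
        bool  isCon   = false;
        int   conVal  = 0;
        Node* op[3]   = {}; // condition, true value, false value for a SELECT
    };

    static bool IsIntCon(Node* n, int v) { return n->isCon && n->conVal == v; }

    // SELECT(cond, 1, 0) where cond is a value-producing relop folds to cond.
    static Node* TryFoldSelectOfRelop(Node* sel)
    {
        Node* cond = sel->op[0];
        if (cond->isRelop && IsIntCon(sel->op[1], 1) && IsIntCon(sel->op[2], 0))
        {
            return cond; // relops produce exactly 0 or 1, so the SELECT is redundant
        }
        return sel;
    }

    int main()
    {
        Node ne{true};
        Node one, zero;
        one.isCon  = true; one.conVal  = 1;
        zero.isCon = true; zero.conVal = 0;
        Node sel;
        sel.op[0] = &ne; sel.op[1] = &one; sel.op[2] = &zero;
        std::printf("folded to relop: %s\n", TryFoldSelectOfRelop(&sel) == &ne ? "yes" : "no");
    }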

@jakobbotsch (Member, Author) commented Feb 13, 2023:

Some of the other improvements are because we now support dead code elimination for the arithmetic + JCC pairs that we emit in some cases. In those cases, even before this change, we were emitting:

N023 ( 13, 13) [000087] ----GO-----                   t87 =XOR       int    <l:$500, c:$501>
N026 ( 17, 17) [000090] ----GO-----                            JCC       void   cond=UNE

We did not handle this in fgOptimizeBranchToNext before (we only handled JTRUE with an operand that sets the flags), so we would remove the JCC but leave the XOR with GTF_SET_FLAGS set so that it could not be DCE'd.
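In toy form, the fix looks like this (illustrative; not the actual fgOptimizeBranchToNext code). Removing a JCC that branches to the fall-through block must also clear GTF_SET_FLAGS on the node that produced the flags; otherwise liveness still sees a side effect and cannot remove the now-useless arithmetic.

    #include <cstdio>

    struct Node
    {
        bool setFlags = false; // models GTF_SET_FLAGS
        bool used     = false; // is the value consumed by anything else?
    };

    // Toy model: drop a JCC whose target is the next block and un-pin its
    // flags producer so a later liveness pass can DCE it.
    static void OptimizeBranchToNext(Node* flagsProducer, bool targetIsNextBlock)
    {
        if (!targetIsNextBlock)
        {
            return;
        }
        // The JCC is removed; the XOR (etc.) only stayed alive to feed it.
        flagsProducer->setFlags = false;
    }

    static bool CanDce(const Node& n) { return !n.setFlags && !n.used; }

    int main()
    {
        Node xorNode;
        xorNode.setFlags = true;
        OptimizeBranchToNext(&xorNode, /* targetIsNextBlock */ true);
        std::printf("XOR can now be DCE'd: %s\n", CanDce(xorNode) ? "yes" : "no");
    }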

jakobbotsch added a commit to jakobbotsch/runtime that referenced this pull request Feb 13, 2023
…o be tracked

Otherwise we may track it after lowering and invalidate the assumptions
made by the optimization.

Fix dotnet#82020
@dotnet dotnet deleted a comment from azure-pipelines bot Feb 13, 2023
@dotnet dotnet deleted a comment from azure-pipelines bot Feb 13, 2023
Comment on lines +6686 to +6689
// The emitter's general logic handles op1/op2 for bt reversed. As a
// small hack we reverse it in codegen instead of special casing the
// emitter throughout.
std::swap(op1, op2);
@jakobbotsch (Member, Author):

This predates this PR; the old genCodeForBT had this handling too.

Comment on lines -511 to -512
GTF_RELOP_SJUMP_OPT = 0x04000000, // GT_<relop> -- Swap signed jl/jge with js/jns during emitter, reuses flags
// from previous instruction.
@jakobbotsch (Member, Author):

We now handle this peephole fully when we see the flag-setting instruction, so we don't need this flag to pass the information along.
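Roughly, the peephole now works by looking ahead from the flag-setting instruction instead of having the relop carry GTF_RELOP_SJUMP_OPT forward; a toy sketch of the decision (illustrative, not the emitter's real code):

    #include <cstdio>
    #include <cstring>

    // If the previous instruction already set the flags for `reg` and is one
    // that clears OF (and/or/xor/test do), then "cmp reg, 0; jl/jge" can be
    // emitted as js/jns with the cmp dropped: jl tests SF != OF, which with
    // OF known to be 0 is just SF, i.e. js.
    static const char* PeepSignedBranch(const char* lastDefReg, const char* cmpReg, const char* cc)
    {
        if (std::strcmp(lastDefReg, cmpReg) == 0)
        {
            if (std::strcmp(cc, "jl") == 0)  return "js";  // negative
            if (std::strcmp(cc, "jge") == 0) return "jns"; // non-negative
        }
        return cc; // no match: keep the cmp and the original jcc
    }

    int main()
    {
        std::printf("%s\n", PeepSignedBranch("eax", "eax", "jl")); // prints "js"
    }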

// Arguments:
// other - The range to move from.
//
LIR::ReadOnlyRange& LIR::ReadOnlyRange::operator=(ReadOnlyRange&& other)
@jakobbotsch (Member, Author):
We already have the move constructor on LIR::ReadOnlyRange and I needed this version to write some logic cleanly.
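For reference, a plausible implementation sketch, assuming ReadOnlyRange is the simple [m_firstNode, m_lastNode] pair that its move constructor suggests (the real code may differ, e.g. in debug-only validation):

    // Move-assignment: steal the endpoints and leave `other` empty,
    // mirroring the existing move constructor.
    LIR::ReadOnlyRange& LIR::ReadOnlyRange::operator=(ReadOnlyRange&& other)
    {
        m_firstNode = other.m_firstNode;
        m_lastNode  = other.m_lastNode;

        other.m_firstNode = nullptr;
        other.m_lastNode  = nullptr;

        return *this;
    }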

@@ -2122,7 +2122,6 @@ void Compiler::fgComputeLifeLIR(VARSET_TP& life, BasicBlock* block, VARSET_VALAR
case GT_STORE_BLK:
case GT_STORE_DYN_BLK:
case GT_JCMP:
case GT_CMP:
@jakobbotsch (Member, Author) commented Feb 13, 2023:

This is what now allows us to do DCE for GT_CMP nodes if they don't have GTF_SET_FLAGS set. Even before this PR we allowed that for the GT_EQ family of nodes, which were exactly the same as GT_CMP when void typed with GTF_SET_FLAGS set.
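In toy form, the rule this enables (illustrative; the real logic lives in fgComputeLifeLIR's switch): a flags-only compare is removable by liveness exactly when nothing consumes its flags anymore.

    #include <cstdio>

    enum Oper { GT_CMP, GT_TEST, OTHER };
    const unsigned GTF_SET_FLAGS = 0x1;

    struct Node
    {
        Oper     oper;
        unsigned flags;
    };

    // A GT_CMP/GT_TEST whose GTF_SET_FLAGS has been cleared (e.g. after its
    // consuming JCC was optimized away) produces no value and has no
    // remaining side effect, so liveness may delete it.
    static bool IsRemovableFlagsNode(const Node& n)
    {
        return (n.oper == GT_CMP || n.oper == GT_TEST) && ((n.flags & GTF_SET_FLAGS) == 0);
    }

    int main()
    {
        Node deadCmp{GT_CMP, 0};
        std::printf("removable: %s\n", IsRemovableFlagsNode(deadCmp) ? "yes" : "no");
    }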

@@ -1673,7 +1673,7 @@ int LinearScan::ComputeOperandDstCount(GenTree* operand)
        // Stores and void-typed operands may be encountered when processing call nodes, which contain
        // pointers to argument setup stores.
        assert(operand->OperIsStore() || operand->OperIsBlkOp() || operand->OperIsPutArgStk() ||
-              operand->OperIsCompare() || operand->OperIs(GT_CMP) || operand->TypeGet() == TYP_VOID);
+              operand->TypeIs(TYP_VOID));
@jakobbotsch (Member, Author) commented Feb 13, 2023:

This OperIs(GT_CMP) was seemingly only here because of code in LowerModPow2 that created a non-void typed GT_CMP, but GT_CMP does not produce a value so that wasn't expected.
The OperIsCompare() check here is unnecessary as those nodes always produce a value now.

@jakobbotsch (Member, Author):

/azp run runtime-coreclr libraries-jitstress

@azure-pipelines:

Azure Pipelines successfully started running 1 pipeline(s).

relop->gtType = TYP_VOID;
relop->gtFlags |= GTF_SET_FLAGS;

if (relop->OperIs(GT_EQ, GT_NE, GT_LT, GT_LE, GT_GE, GT_GT))
Contributor:

With this code, a JTRUE(LT) is always turned into a JCC(CMP) with the JCC containing a condition code within gtCondition. In codegen for JCC, we don't need to check the child for the condition. I'm happy with that.

In my and-chains patch, I let optimise bools create a chain of the form NE(AND(LT,EQ),0). LowerConstCompare() turns it into TEST_NE(LT,EQ), and then the new function lowerTestCompare() turns it into CCMP_LT(EQ). A JTRUE node is allowed to have either a standard compare child or a CCMP child, e.g. JTRUE(CCMP_LT(EQ)).

When combining both worlds, the lowering stage gets a little complicated, but everything else feels simple:

The CCMP_LT has a gtCondition, plus there is also a CCMP node. lowerTestCompare() turns TEST_NE(LT,EQ) into CCMP_LT(CMP). Then LowerJTrue() would turn it into CCMP(CMP), giving us JCC(CCMP(CMP)).

@jakobbotsch (Member, Author):

Right. After this PR the plan is that we then introduce GT_SELECTCC which is to GT_SELECT as GT_JCC is to GT_JTRUE (with the additional caveat that we never see GT_JTRUE in codegen, because we have an invariant that there is nothing between the relop and GT_JTRUE).

When lowering GT_SELECT and GT_JTRUE we will invoke a common function to transform compare chains into CMP + CCMP + ... , followed up by the transformation into GT_SELECTCC and GT_JCC. For GT_SELECT this function also needs to verify that reordering the compare to happen before GT_SELECT is safe (if it isn't we'll leave GT_SELECT as GT_SELECT instead of transforming into GT_SELECTCC). This should put everything on the same plan and I think it will unify the last of the handling.

@jakobbotsch (Member, Author):

FWIW, it sounds like CCMP_LT would not be necessary with generic CCMP. I.e. we can transform TEST_NE(LT,EQ) into SETCC(CCMP(CMP(x, y))) and teach LowerJTrue to pass through SETCC nodes.
I actually considered using this approach instead of having GT_TEST_NE/GT_TEST_EQ/GT_BITTEST_NE/GT_BITTEST_EQ but it creates an additional node and will require some bespoke interference logic for GT_SELECT (but I think that logic will be necessary on arm64 in that case regardless).
I can try to see what it would look like in a follow-up.

Contributor:

> we can transform TEST_NE(LT,EQ) into SETCC(CCMP(CMP(x, y)))

That's better. Reduces the number of node types.

> When lowering GT_SELECT and GT_JTRUE we will invoke a common function to transform compare chains into CMP + CCMP + ...

If the chain parsing is also done at the CMP and AND nodes, then we should be able to check only the parent and child at each stage, so there is probably no need to recursively go through the chain.

Contributor:

So, my question then is what to do with my and-chains patch? I should wait until this patch is merged before continuing with it. Should I also wait for GT_SELECTCC or any other patches? I don't think I need to; they should be mostly independent, but they will create merge conflicts.

@jakobbotsch (Member, Author):

I think waiting for this one is a good idea; I hope we can merge it by today or tomorrow. I'm not sure how much it will conflict with your PR.
I don't think you need to wait for GT_SELECTCC either. I will probably move the xarch backend to GT_SELECTCC as the first step and leave arm64 alone, so they shouldn't conflict.

Contributor:

> For GT_SELECT this function also needs to verify that reordering the compare to happen before GT_SELECT is safe (if it isn't we'll leave GT_SELECT as GT_SELECT instead of transforming into GT_SELECTCC). This should put everything on the same plan and I think it will unify the last of the handling.

This will drop all the containment within SELECT? I.e. SELECT will always take its condition from a register, but SELECTCC will require the previous node to be a compare (like JCC) and will always rely on the condition flags being set.

@jakobbotsch (Member, Author):

Yes, exactly.

@jakobbotsch (Member, Author) commented Feb 14, 2023:

cc @dotnet/jit-contrib PTAL @BruceForstall

Diffs

This turned out to be a bit more than a 'slight' refactoring, but I think it's an overall simplification of how conditions are handled, and it should allow more principled handling for SELECT and for the work that @a74nh is doing in the future.

Note that I ran outerloop, jitstress, libraries-jitstress and Fuzzlyn on this (I deleted some of the comments, if you're wondering where I did that).

@BruceForstall (Member) left a comment:

LGTM. Looks like a nice simplification.

@jakobbotsch jakobbotsch merged commit f2f734a into dotnet:main Feb 16, 2023
@jakobbotsch jakobbotsch deleted the cpu-flag-nodes-backend branch February 16, 2023 09:36
@jakobbotsch (Member, Author):

cc @shushanhf -- the LA backend might need some fixups for this.

@shushanhf (Contributor):

> cc @shushanhf -- the LA backend might need some fixups for this.

Thanks very much.
I will do it for LA.

@ghost ghost locked as resolved and limited conversation to collaborators Mar 18, 2023