[DAGCombine] Transform `shl X, cttz(Y)` to `mul (Y & -Y), X` if cttz is unsupported #85066
Conversation
✅ With the latest revision this PR passed the C/C++ code formatter.
Does this need to happen in CGP for some reason, or would DAGCombine also work?
I implement this in CGP to avoid duplicating the logic in GISel.
CGP should only be used for transforms that require cross-block reasoning, which does not seem to be the case here. Aspirationally GlobalISel does not need CGP at all, because it can perform those optimizations itself. (Realistically, we are far from that...)
I agree this should be done in DAGCombiner.
llvm/lib/CodeGen/CodeGenPrepare.cpp
Outdated
// shl X, cttz(Y) -> mul (Y & -Y), X if cttz is unsupported on the target.
Value *Y;
if (match(I->getOperand(1),
          m_OneUse(m_Intrinsic<Intrinsic::cttz>(m_Value(Y))))) {
You can match an intrinsic without specifying a match for all operands? That's surprising.
@arsenm Any comments?
Yes, this is straightforward combining. The downside is then you have to do it twice, in the DAG and GISel
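For reference, a hypothetical IR-level sketch of the rewrite being discussed (`shl X, cttz(Y)` -> `mul (Y & -Y), X`) — this is not the code from this PR; the helper name `foldShlByCttz` and the omission of the target-legality query are assumptions made purely for illustration:

```cpp
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/Instructions.h"
#include "llvm/IR/PatternMatch.h"

using namespace llvm;
using namespace llvm::PatternMatch;

// Rewrite `shl X, cttz(Y)` into `mul (Y & -Y), X` at the IR level.
// Y & -Y isolates the lowest set bit of Y, i.e. 1 << cttz(Y) for Y != 0.
static bool foldShlByCttz(BinaryOperator *I) {
  Value *Y;
  if (I->getOpcode() != Instruction::Shl ||
      !match(I->getOperand(1),
             m_OneUse(m_Intrinsic<Intrinsic::cttz>(m_Value(Y)))))
    return false;

  IRBuilder<> Builder(I);
  Value *NegY = Builder.CreateNeg(Y);
  Value *LowBit = Builder.CreateAnd(Y, NegY);
  Value *Mul = Builder.CreateMul(LowBit, I->getOperand(0));
  I->replaceAllUsesWith(Mul);
  I->eraseFromParent();
  return true;
}
```

A real implementation would additionally consult TargetLowering (as the DAGCombine version below does) so the rewrite only fires when cttz is not natively supported but mul is.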
Please move this into DAGCombine (and GISel if you want to handle both)
@llvm/pr-subscribers-llvm-selectiondag
Author: Yingwei Zheng (dtcxzyw)
Changes: This patch folds `shl X, cttz(Y)` to `mul (Y & -Y), X` if cttz is unsupported by the target.
Patch is 28.06 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/85066.diff
2 Files Affected:
diff --git a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
index dcd0310734ad72..a77054d1e33d61 100644
--- a/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -9962,6 +9962,18 @@ SDValue DAGCombiner::visitSHL(SDNode *N) {
if (SDValue NewSHL = visitShiftByConstant(N))
return NewSHL;
+ // fold (shl X, cttz(Y)) -> (mul (Y & -Y), X) if cttz is unsupported on the
+ // target.
+ if ((N1.getOpcode() == ISD::CTTZ || N1.getOpcode() == ISD::CTTZ_ZERO_UNDEF) &&
+ N1.hasOneUse() && !TLI.isOperationLegalOrCustom(ISD::CTTZ, VT) &&
+ TLI.isOperationLegalOrCustom(ISD::MUL, VT)) {
+ SDValue Y = N1.getOperand(0);
+ SDLoc DL(N);
+ SDValue NegY = DAG.getNode(ISD::SUB, DL, VT, DAG.getConstant(0, DL, VT), Y);
+ SDValue And = DAG.getNode(ISD::AND, DL, VT, Y, NegY);
+ return DAG.getNode(ISD::MUL, DL, VT, And, N0);
+ }
+
if (SimplifyDemandedBits(SDValue(N, 0)))
return SDValue(N, 0);
diff --git a/llvm/test/CodeGen/RISCV/shl-cttz.ll b/llvm/test/CodeGen/RISCV/shl-cttz.ll
new file mode 100644
index 00000000000000..e3ed16d4971410
--- /dev/null
+++ b/llvm/test/CodeGen/RISCV/shl-cttz.ll
@@ -0,0 +1,807 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 4
+; RUN: llc -mtriple=riscv32 -mattr=+m -verify-machineinstrs < %s \
+; RUN: | FileCheck %s -check-prefix=RV32I
+; RUN: llc -mtriple=riscv32 -mattr=+m,+zbb -verify-machineinstrs < %s \
+; RUN: | FileCheck %s -check-prefix=RV32ZBB
+; RUN: llc -mtriple=riscv64 -mattr=+m -verify-machineinstrs < %s \
+; RUN: | FileCheck %s -check-prefixes=RV64I,RV64IILLEGALI32
+; RUN: llc -mtriple=riscv64 -mattr=+m,+zbb -verify-machineinstrs < %s \
+; RUN: | FileCheck %s -check-prefixes=RV64ZBB,RV64ZBBILLEGALI32
+; RUN: llc -mtriple=riscv64 -mattr=+m -riscv-experimental-rv64-legal-i32 -verify-machineinstrs < %s \
+; RUN: | FileCheck %s -check-prefixes=RV64I,RV64ILEGALI32
+; RUN: llc -mtriple=riscv64 -mattr=+m,+zbb -riscv-experimental-rv64-legal-i32 -verify-machineinstrs < %s \
+; RUN: | FileCheck %s -check-prefixes=RV64ZBB,RV64ZBBLEGALI32
+
+define i8 @shl_cttz_i8(i8 %x, i8 %y) {
+; RV32I-LABEL: shl_cttz_i8:
+; RV32I: # %bb.0: # %entry
+; RV32I-NEXT: addi a2, a1, -1
+; RV32I-NEXT: not a1, a1
+; RV32I-NEXT: and a1, a1, a2
+; RV32I-NEXT: srli a2, a1, 1
+; RV32I-NEXT: andi a2, a2, 85
+; RV32I-NEXT: sub a1, a1, a2
+; RV32I-NEXT: andi a2, a1, 51
+; RV32I-NEXT: srli a1, a1, 2
+; RV32I-NEXT: andi a1, a1, 51
+; RV32I-NEXT: add a1, a2, a1
+; RV32I-NEXT: srli a2, a1, 4
+; RV32I-NEXT: add a1, a1, a2
+; RV32I-NEXT: andi a1, a1, 15
+; RV32I-NEXT: sll a0, a0, a1
+; RV32I-NEXT: ret
+;
+; RV32ZBB-LABEL: shl_cttz_i8:
+; RV32ZBB: # %bb.0: # %entry
+; RV32ZBB-NEXT: ctz a1, a1
+; RV32ZBB-NEXT: sll a0, a0, a1
+; RV32ZBB-NEXT: ret
+;
+; RV64IILLEGALI32-LABEL: shl_cttz_i8:
+; RV64IILLEGALI32: # %bb.0: # %entry
+; RV64IILLEGALI32-NEXT: addi a2, a1, -1
+; RV64IILLEGALI32-NEXT: not a1, a1
+; RV64IILLEGALI32-NEXT: and a1, a1, a2
+; RV64IILLEGALI32-NEXT: srli a2, a1, 1
+; RV64IILLEGALI32-NEXT: andi a2, a2, 85
+; RV64IILLEGALI32-NEXT: subw a1, a1, a2
+; RV64IILLEGALI32-NEXT: andi a2, a1, 51
+; RV64IILLEGALI32-NEXT: srli a1, a1, 2
+; RV64IILLEGALI32-NEXT: andi a1, a1, 51
+; RV64IILLEGALI32-NEXT: add a1, a2, a1
+; RV64IILLEGALI32-NEXT: srli a2, a1, 4
+; RV64IILLEGALI32-NEXT: add a1, a1, a2
+; RV64IILLEGALI32-NEXT: andi a1, a1, 15
+; RV64IILLEGALI32-NEXT: sll a0, a0, a1
+; RV64IILLEGALI32-NEXT: ret
+;
+; RV64ZBBILLEGALI32-LABEL: shl_cttz_i8:
+; RV64ZBBILLEGALI32: # %bb.0: # %entry
+; RV64ZBBILLEGALI32-NEXT: ctz a1, a1
+; RV64ZBBILLEGALI32-NEXT: sll a0, a0, a1
+; RV64ZBBILLEGALI32-NEXT: ret
+;
+; RV64ILEGALI32-LABEL: shl_cttz_i8:
+; RV64ILEGALI32: # %bb.0: # %entry
+; RV64ILEGALI32-NEXT: addi a2, a1, -1
+; RV64ILEGALI32-NEXT: not a1, a1
+; RV64ILEGALI32-NEXT: and a1, a1, a2
+; RV64ILEGALI32-NEXT: srliw a2, a1, 1
+; RV64ILEGALI32-NEXT: andi a2, a2, 85
+; RV64ILEGALI32-NEXT: subw a1, a1, a2
+; RV64ILEGALI32-NEXT: andi a2, a1, 51
+; RV64ILEGALI32-NEXT: srliw a1, a1, 2
+; RV64ILEGALI32-NEXT: andi a1, a1, 51
+; RV64ILEGALI32-NEXT: add a1, a2, a1
+; RV64ILEGALI32-NEXT: srliw a2, a1, 4
+; RV64ILEGALI32-NEXT: add a1, a1, a2
+; RV64ILEGALI32-NEXT: andi a1, a1, 15
+; RV64ILEGALI32-NEXT: sllw a0, a0, a1
+; RV64ILEGALI32-NEXT: ret
+;
+; RV64ZBBLEGALI32-LABEL: shl_cttz_i8:
+; RV64ZBBLEGALI32: # %bb.0: # %entry
+; RV64ZBBLEGALI32-NEXT: ctzw a1, a1
+; RV64ZBBLEGALI32-NEXT: sllw a0, a0, a1
+; RV64ZBBLEGALI32-NEXT: ret
+entry:
+ %cttz = call i8 @llvm.cttz.i8(i8 %y, i1 true)
+ %res = shl i8 %x, %cttz
+ ret i8 %res
+}
+
+define i8 @shl_cttz_constant_i8(i8 %y) {
+; RV32I-LABEL: shl_cttz_constant_i8:
+; RV32I: # %bb.0: # %entry
+; RV32I-NEXT: addi a1, a0, -1
+; RV32I-NEXT: not a0, a0
+; RV32I-NEXT: and a0, a0, a1
+; RV32I-NEXT: srli a1, a0, 1
+; RV32I-NEXT: andi a1, a1, 85
+; RV32I-NEXT: sub a0, a0, a1
+; RV32I-NEXT: andi a1, a0, 51
+; RV32I-NEXT: srli a0, a0, 2
+; RV32I-NEXT: andi a0, a0, 51
+; RV32I-NEXT: add a0, a1, a0
+; RV32I-NEXT: srli a1, a0, 4
+; RV32I-NEXT: add a0, a0, a1
+; RV32I-NEXT: andi a0, a0, 15
+; RV32I-NEXT: li a1, 4
+; RV32I-NEXT: sll a0, a1, a0
+; RV32I-NEXT: ret
+;
+; RV32ZBB-LABEL: shl_cttz_constant_i8:
+; RV32ZBB: # %bb.0: # %entry
+; RV32ZBB-NEXT: ctz a0, a0
+; RV32ZBB-NEXT: li a1, 4
+; RV32ZBB-NEXT: sll a0, a1, a0
+; RV32ZBB-NEXT: ret
+;
+; RV64IILLEGALI32-LABEL: shl_cttz_constant_i8:
+; RV64IILLEGALI32: # %bb.0: # %entry
+; RV64IILLEGALI32-NEXT: addi a1, a0, -1
+; RV64IILLEGALI32-NEXT: not a0, a0
+; RV64IILLEGALI32-NEXT: and a0, a0, a1
+; RV64IILLEGALI32-NEXT: srli a1, a0, 1
+; RV64IILLEGALI32-NEXT: andi a1, a1, 85
+; RV64IILLEGALI32-NEXT: subw a0, a0, a1
+; RV64IILLEGALI32-NEXT: andi a1, a0, 51
+; RV64IILLEGALI32-NEXT: srli a0, a0, 2
+; RV64IILLEGALI32-NEXT: andi a0, a0, 51
+; RV64IILLEGALI32-NEXT: add a0, a1, a0
+; RV64IILLEGALI32-NEXT: srli a1, a0, 4
+; RV64IILLEGALI32-NEXT: add a0, a0, a1
+; RV64IILLEGALI32-NEXT: andi a0, a0, 15
+; RV64IILLEGALI32-NEXT: li a1, 4
+; RV64IILLEGALI32-NEXT: sll a0, a1, a0
+; RV64IILLEGALI32-NEXT: ret
+;
+; RV64ZBBILLEGALI32-LABEL: shl_cttz_constant_i8:
+; RV64ZBBILLEGALI32: # %bb.0: # %entry
+; RV64ZBBILLEGALI32-NEXT: ctz a0, a0
+; RV64ZBBILLEGALI32-NEXT: li a1, 4
+; RV64ZBBILLEGALI32-NEXT: sll a0, a1, a0
+; RV64ZBBILLEGALI32-NEXT: ret
+;
+; RV64ILEGALI32-LABEL: shl_cttz_constant_i8:
+; RV64ILEGALI32: # %bb.0: # %entry
+; RV64ILEGALI32-NEXT: addi a1, a0, -1
+; RV64ILEGALI32-NEXT: not a0, a0
+; RV64ILEGALI32-NEXT: and a0, a0, a1
+; RV64ILEGALI32-NEXT: srliw a1, a0, 1
+; RV64ILEGALI32-NEXT: andi a1, a1, 85
+; RV64ILEGALI32-NEXT: subw a0, a0, a1
+; RV64ILEGALI32-NEXT: andi a1, a0, 51
+; RV64ILEGALI32-NEXT: srliw a0, a0, 2
+; RV64ILEGALI32-NEXT: andi a0, a0, 51
+; RV64ILEGALI32-NEXT: add a0, a1, a0
+; RV64ILEGALI32-NEXT: srliw a1, a0, 4
+; RV64ILEGALI32-NEXT: add a0, a0, a1
+; RV64ILEGALI32-NEXT: andi a0, a0, 15
+; RV64ILEGALI32-NEXT: li a1, 4
+; RV64ILEGALI32-NEXT: sllw a0, a1, a0
+; RV64ILEGALI32-NEXT: ret
+;
+; RV64ZBBLEGALI32-LABEL: shl_cttz_constant_i8:
+; RV64ZBBLEGALI32: # %bb.0: # %entry
+; RV64ZBBLEGALI32-NEXT: ctzw a0, a0
+; RV64ZBBLEGALI32-NEXT: li a1, 4
+; RV64ZBBLEGALI32-NEXT: sllw a0, a1, a0
+; RV64ZBBLEGALI32-NEXT: ret
+entry:
+ %cttz = call i8 @llvm.cttz.i8(i8 %y, i1 true)
+ %res = shl i8 4, %cttz
+ ret i8 %res
+}
+
+define i16 @shl_cttz_i16(i16 %x, i16 %y) {
+; RV32I-LABEL: shl_cttz_i16:
+; RV32I: # %bb.0: # %entry
+; RV32I-NEXT: addi a2, a1, -1
+; RV32I-NEXT: not a1, a1
+; RV32I-NEXT: and a1, a1, a2
+; RV32I-NEXT: srli a2, a1, 1
+; RV32I-NEXT: lui a3, 5
+; RV32I-NEXT: addi a3, a3, 1365
+; RV32I-NEXT: and a2, a2, a3
+; RV32I-NEXT: sub a1, a1, a2
+; RV32I-NEXT: lui a2, 3
+; RV32I-NEXT: addi a2, a2, 819
+; RV32I-NEXT: and a3, a1, a2
+; RV32I-NEXT: srli a1, a1, 2
+; RV32I-NEXT: and a1, a1, a2
+; RV32I-NEXT: add a1, a3, a1
+; RV32I-NEXT: srli a2, a1, 4
+; RV32I-NEXT: add a1, a1, a2
+; RV32I-NEXT: andi a2, a1, 15
+; RV32I-NEXT: slli a1, a1, 20
+; RV32I-NEXT: srli a1, a1, 28
+; RV32I-NEXT: add a1, a2, a1
+; RV32I-NEXT: sll a0, a0, a1
+; RV32I-NEXT: ret
+;
+; RV32ZBB-LABEL: shl_cttz_i16:
+; RV32ZBB: # %bb.0: # %entry
+; RV32ZBB-NEXT: ctz a1, a1
+; RV32ZBB-NEXT: sll a0, a0, a1
+; RV32ZBB-NEXT: ret
+;
+; RV64IILLEGALI32-LABEL: shl_cttz_i16:
+; RV64IILLEGALI32: # %bb.0: # %entry
+; RV64IILLEGALI32-NEXT: addi a2, a1, -1
+; RV64IILLEGALI32-NEXT: not a1, a1
+; RV64IILLEGALI32-NEXT: and a1, a1, a2
+; RV64IILLEGALI32-NEXT: srli a2, a1, 1
+; RV64IILLEGALI32-NEXT: lui a3, 5
+; RV64IILLEGALI32-NEXT: addiw a3, a3, 1365
+; RV64IILLEGALI32-NEXT: and a2, a2, a3
+; RV64IILLEGALI32-NEXT: sub a1, a1, a2
+; RV64IILLEGALI32-NEXT: lui a2, 3
+; RV64IILLEGALI32-NEXT: addiw a2, a2, 819
+; RV64IILLEGALI32-NEXT: and a3, a1, a2
+; RV64IILLEGALI32-NEXT: srli a1, a1, 2
+; RV64IILLEGALI32-NEXT: and a1, a1, a2
+; RV64IILLEGALI32-NEXT: add a1, a3, a1
+; RV64IILLEGALI32-NEXT: srli a2, a1, 4
+; RV64IILLEGALI32-NEXT: add a1, a1, a2
+; RV64IILLEGALI32-NEXT: andi a2, a1, 15
+; RV64IILLEGALI32-NEXT: slli a1, a1, 52
+; RV64IILLEGALI32-NEXT: srli a1, a1, 60
+; RV64IILLEGALI32-NEXT: add a1, a2, a1
+; RV64IILLEGALI32-NEXT: sll a0, a0, a1
+; RV64IILLEGALI32-NEXT: ret
+;
+; RV64ZBBILLEGALI32-LABEL: shl_cttz_i16:
+; RV64ZBBILLEGALI32: # %bb.0: # %entry
+; RV64ZBBILLEGALI32-NEXT: ctz a1, a1
+; RV64ZBBILLEGALI32-NEXT: sll a0, a0, a1
+; RV64ZBBILLEGALI32-NEXT: ret
+;
+; RV64ILEGALI32-LABEL: shl_cttz_i16:
+; RV64ILEGALI32: # %bb.0: # %entry
+; RV64ILEGALI32-NEXT: addi a2, a1, -1
+; RV64ILEGALI32-NEXT: not a1, a1
+; RV64ILEGALI32-NEXT: and a1, a1, a2
+; RV64ILEGALI32-NEXT: srliw a2, a1, 1
+; RV64ILEGALI32-NEXT: lui a3, 5
+; RV64ILEGALI32-NEXT: addi a3, a3, 1365
+; RV64ILEGALI32-NEXT: and a2, a2, a3
+; RV64ILEGALI32-NEXT: subw a1, a1, a2
+; RV64ILEGALI32-NEXT: lui a2, 3
+; RV64ILEGALI32-NEXT: addi a2, a2, 819
+; RV64ILEGALI32-NEXT: and a3, a1, a2
+; RV64ILEGALI32-NEXT: srliw a1, a1, 2
+; RV64ILEGALI32-NEXT: and a1, a1, a2
+; RV64ILEGALI32-NEXT: add a1, a3, a1
+; RV64ILEGALI32-NEXT: srliw a2, a1, 4
+; RV64ILEGALI32-NEXT: add a1, a1, a2
+; RV64ILEGALI32-NEXT: andi a2, a1, 15
+; RV64ILEGALI32-NEXT: slli a1, a1, 52
+; RV64ILEGALI32-NEXT: srli a1, a1, 60
+; RV64ILEGALI32-NEXT: add a1, a2, a1
+; RV64ILEGALI32-NEXT: sllw a0, a0, a1
+; RV64ILEGALI32-NEXT: ret
+;
+; RV64ZBBLEGALI32-LABEL: shl_cttz_i16:
+; RV64ZBBLEGALI32: # %bb.0: # %entry
+; RV64ZBBLEGALI32-NEXT: ctzw a1, a1
+; RV64ZBBLEGALI32-NEXT: sllw a0, a0, a1
+; RV64ZBBLEGALI32-NEXT: ret
+entry:
+ %cttz = call i16 @llvm.cttz.i16(i16 %y, i1 true)
+ %res = shl i16 %x, %cttz
+ ret i16 %res
+}
+
+define i16 @shl_cttz_constant_i16(i16 %y) {
+; RV32I-LABEL: shl_cttz_constant_i16:
+; RV32I: # %bb.0: # %entry
+; RV32I-NEXT: addi a1, a0, -1
+; RV32I-NEXT: not a0, a0
+; RV32I-NEXT: and a0, a0, a1
+; RV32I-NEXT: srli a1, a0, 1
+; RV32I-NEXT: lui a2, 5
+; RV32I-NEXT: addi a2, a2, 1365
+; RV32I-NEXT: and a1, a1, a2
+; RV32I-NEXT: sub a0, a0, a1
+; RV32I-NEXT: lui a1, 3
+; RV32I-NEXT: addi a1, a1, 819
+; RV32I-NEXT: and a2, a0, a1
+; RV32I-NEXT: srli a0, a0, 2
+; RV32I-NEXT: and a0, a0, a1
+; RV32I-NEXT: add a0, a2, a0
+; RV32I-NEXT: srli a1, a0, 4
+; RV32I-NEXT: add a0, a0, a1
+; RV32I-NEXT: andi a1, a0, 15
+; RV32I-NEXT: slli a0, a0, 20
+; RV32I-NEXT: srli a0, a0, 28
+; RV32I-NEXT: add a0, a1, a0
+; RV32I-NEXT: li a1, 4
+; RV32I-NEXT: sll a0, a1, a0
+; RV32I-NEXT: ret
+;
+; RV32ZBB-LABEL: shl_cttz_constant_i16:
+; RV32ZBB: # %bb.0: # %entry
+; RV32ZBB-NEXT: ctz a0, a0
+; RV32ZBB-NEXT: li a1, 4
+; RV32ZBB-NEXT: sll a0, a1, a0
+; RV32ZBB-NEXT: ret
+;
+; RV64IILLEGALI32-LABEL: shl_cttz_constant_i16:
+; RV64IILLEGALI32: # %bb.0: # %entry
+; RV64IILLEGALI32-NEXT: addi a1, a0, -1
+; RV64IILLEGALI32-NEXT: not a0, a0
+; RV64IILLEGALI32-NEXT: and a0, a0, a1
+; RV64IILLEGALI32-NEXT: srli a1, a0, 1
+; RV64IILLEGALI32-NEXT: lui a2, 5
+; RV64IILLEGALI32-NEXT: addiw a2, a2, 1365
+; RV64IILLEGALI32-NEXT: and a1, a1, a2
+; RV64IILLEGALI32-NEXT: sub a0, a0, a1
+; RV64IILLEGALI32-NEXT: lui a1, 3
+; RV64IILLEGALI32-NEXT: addiw a1, a1, 819
+; RV64IILLEGALI32-NEXT: and a2, a0, a1
+; RV64IILLEGALI32-NEXT: srli a0, a0, 2
+; RV64IILLEGALI32-NEXT: and a0, a0, a1
+; RV64IILLEGALI32-NEXT: add a0, a2, a0
+; RV64IILLEGALI32-NEXT: srli a1, a0, 4
+; RV64IILLEGALI32-NEXT: add a0, a0, a1
+; RV64IILLEGALI32-NEXT: andi a1, a0, 15
+; RV64IILLEGALI32-NEXT: slli a0, a0, 52
+; RV64IILLEGALI32-NEXT: srli a0, a0, 60
+; RV64IILLEGALI32-NEXT: add a0, a1, a0
+; RV64IILLEGALI32-NEXT: li a1, 4
+; RV64IILLEGALI32-NEXT: sll a0, a1, a0
+; RV64IILLEGALI32-NEXT: ret
+;
+; RV64ZBBILLEGALI32-LABEL: shl_cttz_constant_i16:
+; RV64ZBBILLEGALI32: # %bb.0: # %entry
+; RV64ZBBILLEGALI32-NEXT: ctz a0, a0
+; RV64ZBBILLEGALI32-NEXT: li a1, 4
+; RV64ZBBILLEGALI32-NEXT: sll a0, a1, a0
+; RV64ZBBILLEGALI32-NEXT: ret
+;
+; RV64ILEGALI32-LABEL: shl_cttz_constant_i16:
+; RV64ILEGALI32: # %bb.0: # %entry
+; RV64ILEGALI32-NEXT: addi a1, a0, -1
+; RV64ILEGALI32-NEXT: not a0, a0
+; RV64ILEGALI32-NEXT: and a0, a0, a1
+; RV64ILEGALI32-NEXT: srliw a1, a0, 1
+; RV64ILEGALI32-NEXT: lui a2, 5
+; RV64ILEGALI32-NEXT: addi a2, a2, 1365
+; RV64ILEGALI32-NEXT: and a1, a1, a2
+; RV64ILEGALI32-NEXT: subw a0, a0, a1
+; RV64ILEGALI32-NEXT: lui a1, 3
+; RV64ILEGALI32-NEXT: addi a1, a1, 819
+; RV64ILEGALI32-NEXT: and a2, a0, a1
+; RV64ILEGALI32-NEXT: srliw a0, a0, 2
+; RV64ILEGALI32-NEXT: and a0, a0, a1
+; RV64ILEGALI32-NEXT: add a0, a2, a0
+; RV64ILEGALI32-NEXT: srliw a1, a0, 4
+; RV64ILEGALI32-NEXT: add a0, a0, a1
+; RV64ILEGALI32-NEXT: andi a1, a0, 15
+; RV64ILEGALI32-NEXT: slli a0, a0, 52
+; RV64ILEGALI32-NEXT: srli a0, a0, 60
+; RV64ILEGALI32-NEXT: add a0, a1, a0
+; RV64ILEGALI32-NEXT: li a1, 4
+; RV64ILEGALI32-NEXT: sllw a0, a1, a0
+; RV64ILEGALI32-NEXT: ret
+;
+; RV64ZBBLEGALI32-LABEL: shl_cttz_constant_i16:
+; RV64ZBBLEGALI32: # %bb.0: # %entry
+; RV64ZBBLEGALI32-NEXT: ctzw a0, a0
+; RV64ZBBLEGALI32-NEXT: li a1, 4
+; RV64ZBBLEGALI32-NEXT: sllw a0, a1, a0
+; RV64ZBBLEGALI32-NEXT: ret
+entry:
+ %cttz = call i16 @llvm.cttz.i16(i16 %y, i1 true)
+ %res = shl i16 4, %cttz
+ ret i16 %res
+}
+
+define i32 @shl_cttz_i32(i32 %x, i32 %y) {
+; RV32I-LABEL: shl_cttz_i32:
+; RV32I: # %bb.0: # %entry
+; RV32I-NEXT: neg a2, a1
+; RV32I-NEXT: and a1, a1, a2
+; RV32I-NEXT: mul a0, a1, a0
+; RV32I-NEXT: ret
+;
+; RV32ZBB-LABEL: shl_cttz_i32:
+; RV32ZBB: # %bb.0: # %entry
+; RV32ZBB-NEXT: ctz a1, a1
+; RV32ZBB-NEXT: sll a0, a0, a1
+; RV32ZBB-NEXT: ret
+;
+; RV64I-LABEL: shl_cttz_i32:
+; RV64I: # %bb.0: # %entry
+; RV64I-NEXT: negw a2, a1
+; RV64I-NEXT: and a1, a1, a2
+; RV64I-NEXT: lui a2, 30667
+; RV64I-NEXT: addi a2, a2, 1329
+; RV64I-NEXT: mul a1, a1, a2
+; RV64I-NEXT: srliw a1, a1, 27
+; RV64I-NEXT: lui a2, %hi(.LCPI4_0)
+; RV64I-NEXT: addi a2, a2, %lo(.LCPI4_0)
+; RV64I-NEXT: add a1, a2, a1
+; RV64I-NEXT: lbu a1, 0(a1)
+; RV64I-NEXT: sllw a0, a0, a1
+; RV64I-NEXT: ret
+;
+; RV64ZBB-LABEL: shl_cttz_i32:
+; RV64ZBB: # %bb.0: # %entry
+; RV64ZBB-NEXT: ctzw a1, a1
+; RV64ZBB-NEXT: sllw a0, a0, a1
+; RV64ZBB-NEXT: ret
+entry:
+ %cttz = call i32 @llvm.cttz.i32(i32 %y, i1 true)
+ %res = shl i32 %x, %cttz
+ ret i32 %res
+}
+
+define i32 @shl_cttz_i32_zero_is_defined(i32 %x, i32 %y) {
+; RV32I-LABEL: shl_cttz_i32_zero_is_defined:
+; RV32I: # %bb.0: # %entry
+; RV32I-NEXT: beqz a1, .LBB5_2
+; RV32I-NEXT: # %bb.1: # %cond.false
+; RV32I-NEXT: neg a2, a1
+; RV32I-NEXT: and a1, a1, a2
+; RV32I-NEXT: lui a2, 30667
+; RV32I-NEXT: addi a2, a2, 1329
+; RV32I-NEXT: mul a1, a1, a2
+; RV32I-NEXT: srli a1, a1, 27
+; RV32I-NEXT: lui a2, %hi(.LCPI5_0)
+; RV32I-NEXT: addi a2, a2, %lo(.LCPI5_0)
+; RV32I-NEXT: add a1, a2, a1
+; RV32I-NEXT: lbu a1, 0(a1)
+; RV32I-NEXT: sll a0, a0, a1
+; RV32I-NEXT: ret
+; RV32I-NEXT: .LBB5_2:
+; RV32I-NEXT: li a1, 32
+; RV32I-NEXT: sll a0, a0, a1
+; RV32I-NEXT: ret
+;
+; RV32ZBB-LABEL: shl_cttz_i32_zero_is_defined:
+; RV32ZBB: # %bb.0: # %entry
+; RV32ZBB-NEXT: ctz a1, a1
+; RV32ZBB-NEXT: sll a0, a0, a1
+; RV32ZBB-NEXT: ret
+;
+; RV64I-LABEL: shl_cttz_i32_zero_is_defined:
+; RV64I: # %bb.0: # %entry
+; RV64I-NEXT: sext.w a2, a1
+; RV64I-NEXT: beqz a2, .LBB5_2
+; RV64I-NEXT: # %bb.1: # %cond.false
+; RV64I-NEXT: negw a2, a1
+; RV64I-NEXT: and a1, a1, a2
+; RV64I-NEXT: lui a2, 30667
+; RV64I-NEXT: addi a2, a2, 1329
+; RV64I-NEXT: mul a1, a1, a2
+; RV64I-NEXT: srliw a1, a1, 27
+; RV64I-NEXT: lui a2, %hi(.LCPI5_0)
+; RV64I-NEXT: addi a2, a2, %lo(.LCPI5_0)
+; RV64I-NEXT: add a1, a2, a1
+; RV64I-NEXT: lbu a1, 0(a1)
+; RV64I-NEXT: sllw a0, a0, a1
+; RV64I-NEXT: ret
+; RV64I-NEXT: .LBB5_2:
+; RV64I-NEXT: li a1, 32
+; RV64I-NEXT: sllw a0, a0, a1
+; RV64I-NEXT: ret
+;
+; RV64ZBB-LABEL: shl_cttz_i32_zero_is_defined:
+; RV64ZBB: # %bb.0: # %entry
+; RV64ZBB-NEXT: ctzw a1, a1
+; RV64ZBB-NEXT: sllw a0, a0, a1
+; RV64ZBB-NEXT: ret
+entry:
+ %cttz = call i32 @llvm.cttz.i32(i32 %y, i1 false)
+ %res = shl i32 %x, %cttz
+ ret i32 %res
+}
+
+define i32 @shl_cttz_constant_i32(i32 %y) {
+; RV32I-LABEL: shl_cttz_constant_i32:
+; RV32I: # %bb.0: # %entry
+; RV32I-NEXT: neg a1, a0
+; RV32I-NEXT: and a0, a0, a1
+; RV32I-NEXT: slli a0, a0, 2
+; RV32I-NEXT: ret
+;
+; RV32ZBB-LABEL: shl_cttz_constant_i32:
+; RV32ZBB: # %bb.0: # %entry
+; RV32ZBB-NEXT: ctz a0, a0
+; RV32ZBB-NEXT: li a1, 4
+; RV32ZBB-NEXT: sll a0, a1, a0
+; RV32ZBB-NEXT: ret
+;
+; RV64I-LABEL: shl_cttz_constant_i32:
+; RV64I: # %bb.0: # %entry
+; RV64I-NEXT: negw a1, a0
+; RV64I-NEXT: and a0, a0, a1
+; RV64I-NEXT: lui a1, 30667
+; RV64I-NEXT: addi a1, a1, 1329
+; RV64I-NEXT: mul a0, a0, a1
+; RV64I-NEXT: srliw a0, a0, 27
+; RV64I-NEXT: lui a1, %hi(.LCPI6_0)
+; RV64I-NEXT: addi a1, a1, %lo(.LCPI6_0)
+; RV64I-NEXT: add a0, a1, a0
+; RV64I-NEXT: lbu a0, 0(a0)
+; RV64I-NEXT: li a1, 4
+; RV64I-NEXT: sllw a0, a1, a0
+; RV64I-NEXT: ret
+;
+; RV64ZBB-LABEL: shl_cttz_constant_i32:
+; RV64ZBB: # %bb.0: # %entry
+; RV64ZBB-NEXT: ctzw a0, a0
+; RV64ZBB-NEXT: li a1, 4
+; RV64ZBB-NEXT: sllw a0, a1, a0
+; RV64ZBB-NEXT: ret
+entry:
+ %cttz = call i32 @llvm.cttz.i32(i32 %y, i1 true)
+ %res = shl i32 4, %cttz
+ ret i32 %res
+}
+
+define i32 @shl_cttz_multiuse_i32(i32 %x, i32 %y) {
+; RV32I-LABEL: shl_cttz_multiuse_i32:
+; RV32I: # %bb.0: # %entry
+; RV32I-NEXT: addi sp, sp, -16
+; RV32I-NEXT: .cfi_def_cfa_offset 16
+; RV32I-NEXT: sw ra, 12(sp) # 4-byte Folded Spill
+; RV32I-NEXT: sw s0, 8(sp) # 4-byte Folded Spill
+; RV32I-NEXT: sw s1, 4(sp) # 4-byte Folded Spill
+; RV32I-NEXT: .cfi_offset ra, -4
+; RV32I-NEXT: .cfi_offset s0, -8
+; RV32I-NEXT: .cfi_offset s1, -12
+; RV32I-NEXT: neg a2, a1
+; RV32I-NEXT: and a1, a1, a2
+; RV32I-NEXT: lui a2, 30667
+; RV32I-NEXT: addi a2, a2, 1329
+; RV32I-NEXT: mul a1, a1, a2
+; RV32I-NEXT: srli a1, a1, 27
+; RV32I-NEXT: lui a2, %hi(.LCPI7_0)
+; RV32I-NEXT: addi a2, a2, %lo(.LCPI7_0)
+; RV32I-NEXT: add a1, a2, a1
+; RV32I-NEXT: lbu s0, 0(a1)
+; RV32I-NEXT: mv s1, a0
+; RV32I-NEXT: mv a0, s0
+; RV32I-NEXT: call use32
+; RV32I-NEXT: sll a0, s1, s0
+; RV32I-NEXT: lw ra, 12(sp) # 4-byte Folded Reload
+; RV32I-NEXT: lw s0, 8(sp) # 4-byte Folded Reload
+; RV32I-NEXT: lw s1, 4(sp) # 4-byte Folded Reload
+; RV32I-NEXT: addi sp, sp, 16
+; RV32I-NEXT: ret...
[truncated]
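As a sanity check on the identity behind the combine (a standalone sketch, independent of the patch and of LLVM itself), the following program verifies that `x << cttz(y)` and `x * (y & -y)` agree for nonzero `y`:

```cpp
// shl X, cttz(Y) == mul (Y & -Y), X  for Y != 0, because Y & -Y isolates the
// lowest set bit of Y, i.e. 1 << cttz(Y), and the multiply wraps the same way
// the shift does.
#include <cstdint>
#include <cstdio>
#include <random>

int main() {
  std::mt19937_64 Rng(0);
  for (int i = 0; i < 1000000; ++i) {
    uint32_t X = static_cast<uint32_t>(Rng());
    uint32_t Y = static_cast<uint32_t>(Rng());
    if (Y == 0)
      continue; // cttz with is_zero_poison=true does not cover Y == 0.
    uint32_t Shl = X << __builtin_ctz(Y);
    uint32_t Mul = (Y & -Y) * X;
    if (Shl != Mul) {
      std::printf("mismatch: X=%u Y=%u\n", X, Y);
      return 1;
    }
  }
  std::puts("identity holds for all sampled values");
  return 0;
}
```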
Done (only for DAGCombine).
// fold (shl X, cttz(Y)) -> (mul (Y & -Y), X) if cttz is unsupported on the
// target.
if ((N1.getOpcode() == ISD::CTTZ || N1.getOpcode() == ISD::CTTZ_ZERO_UNDEF) &&
    N1.hasOneUse() && !TLI.isOperationLegalOrCustom(ISD::CTTZ, VT) &&
You check cttz||cttz_zero_undef but hardcode the opcode in the legality check. Should you check for getOpcode's legality instead?
I hardcode the opcode to avoid introducing regressions on rv64+zbb :(
Do you have a better solution?
Right condition might be isLegalOrCustom(CTTZ||CTTZ_ZERO_UNDEF)
Right condition might be isLegalOrCustom(CTTZ||CTTZ_ZERO_UNDEF)
Unfortunately it doesn't work :(
Is it suitable to add a TLI hook?
Can you also port the same to globalisel?
Ping
This is already approved?
LGTM
…is unsupported (llvm#85066)
This patch folds `shl X, cttz(Y)` to `mul (Y & -Y), X` if cttz is unsupported by the target.
Alive2: https://alive2.llvm.org/ce/z/AtLN5Y
Fixes llvm#84763.
This patch causes a crash when building the Linux kernel for PowerPC. A reduced C reproducer:

struct {
  short active_links;
} *iwl_mvm_exit_esr_vif;
short iwl_mvm_exit_esr_new_active_links;
void iwl_mvm_exit_esr(int link_to_keep) {
  int __trans_tmp_10;
  if (({
        int __ret_warn_on =
            iwl_mvm_exit_esr_vif->active_links & 1UL << link_to_keep;
        __asm__("");
        __builtin_expect(__ret_warn_on, 0);
      })) {
    long word = iwl_mvm_exit_esr_vif->active_links;
    __trans_tmp_10 = __builtin_ctzl(word);
    link_to_keep = __trans_tmp_10;
  }
  iwl_mvm_exit_esr_new_active_links = 1UL << link_to_keep;
}
A reduced LLVM IR reproducer:

target datalayout = "e-m:e-Fn32-i64:64-n32:64-S128-v256:256:256-v512:512:512"
target triple = "powerpc64le-unknown-linux-gnu"

define void @iwl_mvm_exit_esr(i16 %0) {
entry:
  %1 = tail call i16 @llvm.cttz.i16(i16 %0, i1 false)
  %2 = zext i16 %1 to i64
  %.pre9 = shl i64 1, %2
  %conv7 = trunc i64 %.pre9 to i16
  store i16 %conv7, ptr null, align 2
  ret void
}

; Function Attrs: nocallback nofree nosync nounwind speculatable willreturn memory(none)
declare i16 @llvm.cttz.i16(i16, i1 immarg) #0

attributes #0 = { nocallback nofree nosync nounwind speculatable willreturn memory(none) }
I will have a look.
Same problem as #92753. I will post a fix later :)
@nathanchance Should be fixed by #94008.
… X)` (#94008)
Proof: https://alive2.llvm.org/ce/z/J7GBMU
Same as #92753, the types of LHS and RHS in shift nodes may differ.
+ When VT is smaller than ShiftVT, it is safe to use trunc.
+ When VT is larger than ShiftVT, it is safe to use zext iff `is_zero_poison` is true (i.e., `opcode == ISD::CTTZ_ZERO_UNDEF`).
See also the counterexample `src_shl_cttz2 -> tgt_shl_cttz2` in the alive2 proofs.
Fixes issue #85066 (comment).
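To make the `zext` condition above concrete, here is a small standalone demonstration (hypothetical values mirroring the shape of the reproducer, not code from the fix): with an `i16` `Y` equal to zero and a cttz that is defined at zero, the wider shift is still well defined, but zero-extending `Y & -Y` yields 0, so the multiply form would compute the wrong value:

```cpp
// With Y == 0 and cttz(0) defined as the bit width (16), the i64 shift
// produces 1 << 16 == 65536, but zext(Y & -Y) is 0, so the mul form disagrees.
// This is why zext is only safe when the cttz zero input is poison.
#include <cstdint>
#include <cstdio>

int main() {
  uint16_t Y = 0;
  uint64_t X = 1;
  unsigned Ctz = (Y == 0) ? 16u : static_cast<unsigned>(__builtin_ctz(Y));
  uint64_t ShlForm = X << Ctz;
  uint64_t MulForm = static_cast<uint64_t>(static_cast<uint16_t>(Y & -Y)) * X;
  std::printf("shl form = %llu, mul form = %llu\n",
              static_cast<unsigned long long>(ShlForm),
              static_cast<unsigned long long>(MulForm));
  return 0;
}
```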
This patch folds `shl X, cttz(Y)` to `mul (Y & -Y), X` if cttz is unsupported by the target.
Alive2: https://alive2.llvm.org/ce/z/AtLN5Y
Fixes #84763.