[DAGCombiner] Transform (icmp eq/ne (and X,C0),(shift X,C1)) to use rotate or to get better constants.

If `C0` is a mask and `C1` shifts out all the masked bits (so that we are
essentially comparing two subsets of `X`), we can freely implement the
shift as either `srl` or `shl`.

If `C1` (shift amount) is a power of 2, we can replace the and+shift
with a rotate.

Otherwise, based on target preference, we can swap `shl` and `srl`
in/out to get better constants.

On x86 we can use this re-ordering to:
    1) get better `and` constants for `C0` (zero-extended moves or
       avoiding imm64).
    2) convert `srl` to `shl` if the `shl` can be implemented with `lea`
       or `add` (both of which can be preferable).

Proofs: https://alive2.llvm.org/ce/z/qzGM_w
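
The Alive2 proofs above cover the general case; as a quick concrete
illustration (my own sketch, not part of the commit), the i8 half-width
instance of the rotate equivalence can be brute-forced:

```cpp
#include <cassert>
#include <cstdint>

// Rotate left for 8-bit values; R must be in [1, 7].
static uint8_t rotl8(uint8_t X, unsigned R) {
  return (uint8_t)((X << R) | (X >> (8 - R)));
}

int main() {
  // C0 = 0x0f masks the low nibble; C1 = 4 shifts out exactly those bits,
  // so both forms compare the low nibble of X against the high nibble.
  for (unsigned V = 0; V <= 0xff; ++V) {
    uint8_t X = (uint8_t)V;
    bool AndShiftForm = (X & 0x0f) == (X >> 4); // (and X, C0) vs (srl X, C1)
    bool RotateForm = rotl8(X, 4) == X;         // X vs (rotl X, C1)
    assert(AndShiftForm == RotateForm);
  }
  return 0;
}
```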

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D152116
goldsteinn committed Oct 18, 2023
1 parent 0c2d28a commit 112e49b
Showing 5 changed files with 279 additions and 86 deletions.
18 changes: 18 additions & 0 deletions llvm/include/llvm/CodeGen/TargetLowering.h
@@ -828,6 +828,24 @@ class TargetLoweringBase {
return N->getOpcode() == ISD::FDIV;
}

// Given:
// (icmp eq/ne (and X, C0), (shift X, C1))
// or
// (icmp eq/ne X, (rotate X, CPow2))
//
// If C0 is a mask or shifted mask and the shift amount (C1) isolates the
// remaining bits (i.e. something like `(x64 & UINT32_MAX) == (x64 >> 32)`),
// do we prefer the shift to be a shift-right, a shift-left, or a rotate?
// Note: it is only valid to convert the rotate version to the shift version
// iff the shift amount (`C1`) is a power of 2 (including 0).
// Returning ShiftOpc (the current opcode) means make no change.
virtual unsigned preferedOpcodeForCmpEqPiecesOfOperand(
EVT VT, unsigned ShiftOpc, bool MayTransformRotate,
const APInt &ShiftOrRotateAmt,
const std::optional<APInt> &AndMask) const {
return ShiftOpc;
}

/// These two forms are equivalent:
/// sub %y, (xor %x, -1)
/// add (add %x, 1), %y
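For reference, a minimal sketch of how a backend might override this hook.
This is hypothetical: `MyTargetLowering` is an invented class, and the
policy (always prefer `srl` whenever the rewrite is legal) is made up for
illustration, not what any in-tree target does:

```cpp
// Hypothetical override, for illustration only. `MyTargetLowering` is an
// invented target; the "always use srl when legal" policy is an assumption.
unsigned MyTargetLowering::preferedOpcodeForCmpEqPiecesOfOperand(
    EVT VT, unsigned ShiftOpc, bool MayTransformRotate,
    const APInt &ShiftOrRotateAmt, const std::optional<APInt> &AndMask) const {
  bool IsRotate = ShiftOpc == ISD::ROTL || ShiftOpc == ISD::ROTR;
  // A rotate may only be rewritten as shift+and when the amount is a power
  // of 2 (MayTransformRotate); otherwise keep the current opcode.
  if (IsRotate && !MayTransformRotate)
    return ShiftOpc;
  return ISD::SRL;
}
```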
130 changes: 115 additions & 15 deletions llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
@@ -12443,27 +12443,127 @@ SDValue DAGCombiner::visitSETCC(SDNode *N) {

ISD::CondCode Cond = cast<CondCodeSDNode>(N->getOperand(2))->get();
EVT VT = N->getValueType(0);
SDValue N0 = N->getOperand(0), N1 = N->getOperand(1);

SDValue Combined = SimplifySetCC(VT, N0, N1, Cond, SDLoc(N), !PreferSetCC);

if (Combined) {
// If we prefer to have a setcc, and we don't, we'll try our best to
// recreate one using rebuildSetCC.
if (PreferSetCC && Combined.getOpcode() != ISD::SETCC) {
SDValue NewSetCC = rebuildSetCC(Combined);

// We don't have anything interesting to combine to.
if (NewSetCC.getNode() == N)
return SDValue();

if (NewSetCC)
return NewSetCC;
}
return Combined;
}
// Optimize
// 1) (icmp eq/ne (and X, C0), (shift X, C1))
// or
// 2) (icmp eq/ne X, (rotate X, C1))
// If C0 is a mask or shifted mask and the shift amount (C1) isolates the
// remaining bits (i.e. something like `(x64 & UINT32_MAX) == (x64 >> 32)`),
// then:
// If C1 is a power of 2, the rotate and shift+and versions are
// equivalent, so we can interchange them depending on target preference.
// Otherwise, if we have the shift+and version, we can interchange srl/shl,
// which in turn affects the constant C0. We can use this to get better
// constants, again depending on target preference.
if (Cond == ISD::SETNE || Cond == ISD::SETEQ) {
auto IsAndWithShift = [](SDValue A, SDValue B) {
return A.getOpcode() == ISD::AND &&
(B.getOpcode() == ISD::SRL || B.getOpcode() == ISD::SHL) &&
A.getOperand(0) == B.getOperand(0);
};
auto IsRotateWithOp = [](SDValue A, SDValue B) {
return (B.getOpcode() == ISD::ROTL || B.getOpcode() == ISD::ROTR) &&
B.getOperand(0) == A;
};
SDValue AndOrOp = SDValue(), ShiftOrRotate = SDValue();
bool IsRotate = false;

// Find either shift+and or rotate pattern.
if (IsAndWithShift(N0, N1)) {
AndOrOp = N0;
ShiftOrRotate = N1;
} else if (IsAndWithShift(N1, N0)) {
AndOrOp = N1;
ShiftOrRotate = N0;
} else if (IsRotateWithOp(N0, N1)) {
IsRotate = true;
AndOrOp = N0;
ShiftOrRotate = N1;
} else if (IsRotateWithOp(N1, N0)) {
IsRotate = true;
AndOrOp = N1;
ShiftOrRotate = N0;
}

if (AndOrOp && ShiftOrRotate && ShiftOrRotate.hasOneUse() &&
(IsRotate || AndOrOp.hasOneUse())) {
EVT OpVT = N0.getValueType();
// Get the constant shift/rotate amount and possibly the mask (if it's
// the shift+and variant).
auto GetAPIntValue = [](SDValue Op) -> std::optional<APInt> {
ConstantSDNode *CNode = isConstOrConstSplat(Op, /*AllowUndefs*/ false,
/*AllowTrunc*/ false);
if (CNode == nullptr)
return std::nullopt;
return CNode->getAPIntValue();
};
std::optional<APInt> AndCMask =
IsRotate ? std::nullopt : GetAPIntValue(AndOrOp.getOperand(1));
std::optional<APInt> ShiftCAmt =
GetAPIntValue(ShiftOrRotate.getOperand(1));
unsigned NumBits = OpVT.getScalarSizeInBits();

// We found constants.
if (ShiftCAmt && (IsRotate || AndCMask) && ShiftCAmt->ult(NumBits)) {
unsigned ShiftOpc = ShiftOrRotate.getOpcode();
// Check that the constants meet the constraints.
bool CanTransform =
IsRotate ||
(*ShiftCAmt == (~*AndCMask).popcount() && ShiftOpc == ISD::SHL
? (~*AndCMask).isMask()
: AndCMask->isMask());

// See if target prefers another shift/rotate opcode.
unsigned NewShiftOpc = TLI.preferedOpcodeForCmpEqPiecesOfOperand(
OpVT, ShiftOpc, ShiftCAmt->isPowerOf2(), *ShiftCAmt, AndCMask);
// Transform is valid and we have a new preference.
if (CanTransform && NewShiftOpc != ShiftOpc) {
SDLoc DL(N);
SDValue NewShiftOrRotate =
DAG.getNode(NewShiftOpc, DL, OpVT, ShiftOrRotate.getOperand(0),
ShiftOrRotate.getOperand(1));
SDValue NewAndOrOp = SDValue();

if (NewShiftOpc == ISD::SHL || NewShiftOpc == ISD::SRL) {
APInt NewMask =
NewShiftOpc == ISD::SHL
? APInt::getHighBitsSet(NumBits,
NumBits - ShiftCAmt->getZExtValue())
: APInt::getLowBitsSet(NumBits,
NumBits - ShiftCAmt->getZExtValue());
NewAndOrOp =
DAG.getNode(ISD::AND, DL, OpVT, ShiftOrRotate.getOperand(0),
DAG.getConstant(NewMask, DL, OpVT));
} else {
NewAndOrOp = ShiftOrRotate.getOperand(0);
}

return DAG.getSetCC(DL, VT, NewAndOrOp, NewShiftOrRotate, Cond);
}
}
}
}
return SDValue();
}

SDValue DAGCombiner::visitSETCCCARRY(SDNode *N) {
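As a cross-check of the `NewMask` construction above (`getHighBitsSet` for
the `shl` form, `getLowBitsSet` for the `srl` form), here is a standalone
brute-force sketch of the underlying identity on i8. It is my own
illustration, not part of the commit:

```cpp
#include <cassert>
#include <cstdint>

int main() {
  const unsigned NumBits = 8;
  for (unsigned Amt = 1; Amt < NumBits; ++Amt) {
    // Mirrors APInt::getLowBitsSet(8, 8 - Amt) / getHighBitsSet(8, 8 - Amt).
    uint8_t LowMask = (uint8_t)(0xffu >> Amt);
    uint8_t HighMask = (uint8_t)(0xffu << Amt);
    for (unsigned V = 0; V <= 0xff; ++V) {
      uint8_t X = (uint8_t)V;
      // Both compare the low (8 - Amt) bits of X against the high
      // (8 - Amt) bits, so the srl and shl variants (with swapped
      // masks) are interchangeable.
      bool SrlForm = (uint8_t)(X >> Amt) == (X & LowMask);
      bool ShlForm = (uint8_t)(X << Amt) == (X & HighMask);
      assert(SrlForm == ShlForm);
    }
  }
  return 0;
}
```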
66 changes: 66 additions & 0 deletions llvm/lib/Target/X86/X86ISelLowering.cpp
@@ -3257,6 +3257,72 @@ bool X86TargetLowering::
return NewShiftOpcode == ISD::SHL;
}

unsigned X86TargetLowering::preferedOpcodeForCmpEqPiecesOfOperand(
EVT VT, unsigned ShiftOpc, bool MayTransformRotate,
const APInt &ShiftOrRotateAmt, const std::optional<APInt> &AndMask) const {
if (!VT.isInteger())
return ShiftOpc;

bool PreferRotate = false;
if (VT.isVector()) {
// For vectors, if we have rotate instruction support, then it's definitely
// best. Otherwise it's not clear which is best, so don't make changes.
PreferRotate = Subtarget.hasAVX512() && (VT.getScalarType() == MVT::i32 ||
VT.getScalarType() == MVT::i64);
} else {
// For scalar, if we have BMI2, prefer rotate for rorx. Otherwise prefer
// rotate unless we have a zext mask+shr.
PreferRotate = Subtarget.hasBMI2();
if (!PreferRotate) {
unsigned MaskBits =
VT.getScalarSizeInBits() - ShiftOrRotateAmt.getZExtValue();
PreferRotate = (MaskBits != 8) && (MaskBits != 16) && (MaskBits != 32);
}
}

if (ShiftOpc == ISD::SHL || ShiftOpc == ISD::SRL) {
assert(AndMask.has_value() && "Null andmask when querying about shift+and");

if (PreferRotate && MayTransformRotate)
return ISD::ROTL;

// For vectors we don't really get much benefit from swapping around
// constants. Maybe in the future we could check if the DAG already has
// the flipped node.
if (VT.isVector())
return ShiftOpc;

// See if it's beneficial to swap the shift type.
if (ShiftOpc == ISD::SHL) {
// If the current setup has an imm64 mask, then the inverse will need at
// most an imm32 mask (or be zext i32 -> i64).
if (VT == MVT::i64)
return AndMask->getSignificantBits() > 32 ? ISD::SRL : ShiftOpc;

// We can only benefit if the mask requires at least 7 bits. We don't
// want to replace a shl of 1, 2, or 3, as those can be implemented with
// lea/add.
return ShiftOrRotateAmt.uge(7) ? ISD::SRL : ShiftOpc;
}

if (VT == MVT::i64)
// Keep the mask if it fits in 32 bits; that is a zext i32 -> i64, which
// is extremely efficient.
return AndMask->getSignificantBits() > 33 ? ISD::SHL : ShiftOpc;

// Keep small shifts as shl so we can generate add/lea.
return ShiftOrRotateAmt.ult(7) ? ISD::SHL : ShiftOpc;
}

// We prefer rotate for vectors, or if we won't get a zext mask with SRL
// (PreferRotate will be set in the latter case).
if (PreferRotate || VT.isVector())
return ShiftOpc;

// Non-vector type and we have a zext mask with SRL.
return ISD::SRL;
}

bool X86TargetLowering::preferScalarizeSplat(SDNode *N) const {
return N->getOpcode() != ISD::FP_EXTEND;
}
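To make the i64 heuristics above concrete, here is a small standalone
sketch (my own illustration, not from the commit) of the two equivalent
comparison forms the hook chooses between: the `shl` form needs a full
imm64 `and` mask, while the `srl` form's mask is a 32-bit pattern that x86
gets essentially for free as a zext:

```cpp
#include <cassert>
#include <cstdint>

// Both predicates test "low 32 bits of X == high 32 bits of X".
bool shlForm(uint64_t X) { // and-mask requires an imm64 encoding
  return (X << 32) == (X & 0xffffffff00000000ULL);
}
bool srlForm(uint64_t X) { // and-mask is the cheap zext i32 -> i64 pattern
  return (X >> 32) == (X & 0x00000000ffffffffULL);
}

int main() {
  uint64_t Samples[] = {0, 1, 0xffffffffULL, 0x123456789abcdef0ULL,
                        0x1234567812345678ULL, ~0ULL};
  for (uint64_t X : Samples)
    assert(shlForm(X) == srlForm(X));
  return 0;
}
```

This is the situation the `getSignificantBits() > 32` check above steers
toward the `srl` form.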
5 changes: 5 additions & 0 deletions llvm/lib/Target/X86/X86ISelLowering.h
@@ -1138,6 +1138,11 @@ namespace llvm {
unsigned OldShiftOpcode, unsigned NewShiftOpcode,
SelectionDAG &DAG) const override;

unsigned preferedOpcodeForCmpEqPiecesOfOperand(
EVT VT, unsigned ShiftOpc, bool MayTransformRotate,
const APInt &ShiftOrRotateAmt,
const std::optional<APInt> &AndMask) const override;

bool preferScalarizeSplat(SDNode *N) const override;

bool shouldFoldConstantShiftPairToMask(const SDNode *N,

3 comments on commit 112e49b

@alexfh (Contributor) commented on 112e49b, Nov 7, 2023

This change is causing a miscompile: https://gcc.godbolt.org/z/Ko3MsGYGs

Please fix or revert.

@alexfh (Contributor) commented on 112e49b, Nov 7, 2023

A test case without gtest: https://gcc.godbolt.org/z/fdbs8rx75

@alexfh (Contributor) commented on 112e49b, Nov 7, 2023

I've sent #71598 to revert this, but if you see an obvious problem with a reliable fix, feel free to suggest an alternative solution.
